Finally, to transform, process, and analyze the data sitting in these delimited text-based files, spreadsheets, or relational databases, an analyst, data engineer, or software engineer would typically have written some code.
This code could, for example, take the form of formulas or Visual Basic for Applications (VBA) in spreadsheets, or Structured Query Language (SQL) in relational databases, and would be used for the following purposes (a short sketch of this kind of code follows the list):
- Loading data, including batch loading and data migration
- Transforming data, including data cleansing, joins, merges, enrichment, and validation
- Standard statistical aggregations, including computing averages, counts, totals, and pivot tables
- Reporting, including graphs, charts, tables, and dashboards
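To make these four categories of work concrete, here is a minimal sketch of this kind of analyst code in Python with pandas. The file names and column names (sales.csv, customers.csv, customer_id, region, amount) are purely illustrative placeholders, not taken from any particular system:

```python
import pandas as pd

# Loading: batch-load delimited text files into memory
sales = pd.read_csv("sales.csv")
customers = pd.read_csv("customers.csv")

# Transforming: cleansing, joins, enrichment, and validation
sales = sales.dropna(subset=["customer_id", "amount"])           # cleansing
enriched = sales.merge(customers, on="customer_id", how="left")  # join/enrichment
enriched = enriched[enriched["amount"] > 0]                      # validation

# Standard statistical aggregations: averages, counts, totals, pivot tables
summary = enriched.pivot_table(
    index="region", values="amount", aggfunc=["mean", "count", "sum"]
)

# Reporting: a simple tabular report (charts and dashboards could be built
# from the same summary frame)
print(summary)
```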
To perform more complex statistical calculations, such as building predictive models, advanced analysts could turn to more capable programming languages such as Python, R, SAS, or even Java.
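As a rough illustration of that more advanced, predictive work, the following Python sketch fits a simple classifier with scikit-learn. The input file and the feature and target columns (customer_history.csv, tenure, monthly_spend, churned) are hypothetical:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical extract of historical customer data
df = pd.read_csv("customer_history.csv")
X = df[["tenure", "monthly_spend"]]
y = df["churned"]

# Hold out a test set, fit a simple predictive model, and check its accuracy
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
model = LogisticRegression().fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```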
Crucially, however, this data transformation, processing, and analysis would have been executed in one of two ways. Either the code ran directly on the server on which the data was persisted (for example, SQL statements executed on the relational database server, competing with other business-as-usual read and write requests), or the data was moved over the network to a remote analytical processing server, either via a programmatic query (for example, an ODBC or JDBC connection) or via flat files (for example, CSV or XML files). The code could then be executed against that data, assuming, of course, that the remote processing server, being a single machine, had sufficient CPUs, memory, and/or disk space to execute the job in question. In other words, the data would have been moved to the code in one way or another.
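The programmatic-query path can be sketched as follows: a Python script opens an ODBC connection, pulls the selected rows off the database server and over the network into a single machine's memory, and only then does the processing run. The pyodbc driver is just one illustrative choice, and the DSN, credentials, and table name are hypothetical placeholders:

```python
import pandas as pd
import pyodbc

# Connect to a remote relational database over ODBC (placeholder DSN/credentials)
conn = pyodbc.connect("DSN=SalesWarehouse;UID=analyst;PWD=secret")

# Every row selected here travels over the network and must fit in this
# one machine's memory and disk
cursor = conn.cursor()
cursor.execute("SELECT customer_id, region, amount FROM sales")
rows = [tuple(row) for row in cursor.fetchall()]

# The processing itself happens locally, not where the data is persisted:
# the data has been moved to the code
df = pd.DataFrame.from_records(rows, columns=["customer_id", "region", "amount"])
print(df.groupby("region")["amount"].sum())

conn.close()
```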