Data Modeling with Microsoft Excel

By: Bernard Obeng Boateng

Overview of this book

Microsoft Excel's BI solutions have evolved, offering users more flexibility and control over analyzing data directly in Excel. Features like PivotTables, Data Model, Power Query, and Power Pivot empower Excel users to efficiently get, transform, model, aggregate, and visualize data. Data Modeling with Microsoft Excel offers a practical way to demystify the use and application of these tools using real-world examples and simple illustrations. This book will introduce you to the world of data modeling in Excel, as well as definitions and best practices in data structuring for both normalized and denormalized data. The next set of chapters will take you through the useful features of Data Model and Power Pivot, helping you get to grips with the types of schemas (snowflake and star) and create relationships within multiple tables. You’ll also understand how to create powerful and flexible measures using DAX and Cube functions. By the end of this book, you’ll be able to apply the acquired knowledge in real-world scenarios and build an interactive dashboard that will help you make important decisions.
Table of Contents (16 chapters)
  • Part 1: Overview and Introduction to Data Modeling in Microsoft Excel
  • Part 2: Creating Insightful Calculations from your Data Model using DAX and Cube Functions
  • Part 3: Putting it all together with a Dashboard

Understanding the concept of data modeling

Data modeling is the process of structuring and organizing data so that it can be easily analyzed and reported. Think of it like arranging books in a library. If you just threw all the books into a room, it would be hard to find what you need. But if you categorize them by genre, author, or publication date, it becomes much easier to locate a specific book.

Similarly, data modeling helps in organizing data so that you can easily derive insights from it.

Just as a business plan serves as a blueprint for a company, a data model acts as a blueprint that shows the relationships between different datasets. The activity of creating this blueprint is known as data modeling.

It serves as the backbone for your visuals and calculations, allowing for more complex data analysis. A data model gives you a visual or conceptual view of how the datasets you are working with connect to produce the results or insights you need. Getting it right can be the difference between well-optimized data analytics and analytics filled with redundant data that offers little insight.

Microsoft offers the following definitions for a data model in Excel and Power BI:

  • A data model allows you to integrate data from multiple tables, effectively building a relational data source inside an Excel workbook.
  • Data modeling is the process of analyzing and defining all the different data types your business collects and produces, as well as the relationships between those bits of data. By using text, symbols, and diagrams, data modeling concepts create visual representations of data as it’s captured, stored, and used in your business. As your business determines how data is used and when, the data modeling process becomes an exercise in understanding and clarifying your data requirements.

In Excel, a data model can help you connect to one or many tables and summarize the data with PivotTables.

Figure 1.1 – Comparing a one-table analysis to multiple-table analysis


Besides Excel, the concept also applies to other data tools and database management systems, such as Power BI, Access, and Oracle.

With a data model, analyzing your data becomes easier because you can clearly define each dataset, the role it plays, and how it connects to other datasets to give you the results you need.

Comparing a one-table analysis to multiple-table analysis in Microsoft Excel

Often, we store our data in a range of cells in Microsoft Excel. Converting that range into a table makes it easier to reference the dataset in calculations and to analyze it further with a PivotTable. Referring to a table and its columns by name in formulas is called structured referencing. With any cell in the range selected, you can insert a table by going to Insert > Table in the ribbon or by pressing Ctrl + T.

When data is stored in a table, simple aggregations such as SUM, AVERAGE, and COUNT can be performed using the table name and the column. For instance, summing sales from a table named Table1 can be simply done using =SUM(Table1[Sales]).

Data in the table can also be used in a PivotTable. This way, when the source data changes with the addition of more rows or columns, the PivotTables automatically update with the new data in the table when it is refreshed. This avoids the need to update the source reference of cells in the PivotTable.
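The idea that a calculation written against a named column keeps working as the table grows can be sketched outside Excel. The following Python snippet is only an illustration, not how Excel works internally; the table contents and the helper names column and total_sales are made up for the example.

```python
# A worksheet table modelled as a list of rows. Column names play the role
# of structured references such as Table1[Sales]. The data is invented.
table1 = [
    {"Product": "Laptop", "Sales": 1200.0},
    {"Product": "Phone", "Sales": 800.0},
]


def column(table, name):
    """Return every value in the named column, similar to Table1[name]."""
    return [row[name] for row in table]


def total_sales(table):
    """Similar in spirit to =SUM(Table1[Sales])."""
    return sum(column(table, "Sales"))


before = total_sales(table1)

# Appending a row is like typing a new record under the table: the column
# reference grows with the table, so total_sales needs no change.
table1.append({"Product": "Tablet", "Sales": 500.0})
after = total_sales(table1)

print(before, after)
```

The calculation is defined once against the column name, so the second call picks up the new row without any reference being edited.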

Most Excel users tend to store all their data in one table for their analysis. This can be referred to as One-Table Analysis. There is nothing wrong with this approach. However, if the data you are working with grows and you have a situation where you need to add other tables to your analysis, it can become complex with just one table and a PivotTable.

Creating a data model in Power Pivot in Excel allows you to have access to multiple tables for your analysis without the need for complex lookup formulas. It improves performance and gives you a clear overview of how the tables relate.

Here are some of the key advantages of using a data model in Power Pivot:

  • It gives you a broad overview of your datasets or tables. This ensures that all the tables and datasets you require in your model are accurately captured. Take a look at the following example data model for a sales report.
Figure 1.2 – A Diagram view of an example sales report data model


You’ll realize that even though there are several tables used in the creation of the final dashboard, the data model gives a good overview of how each table connects and contributes to delivering the final results.

  • It is an abstract representation of the real-world situation you are analyzing. With the data model, you are in a good position to generate accurate measures and calculations for the KPIs in your report.
  • The data model helps reduce the occurrence of redundant data, that is, the repetition of the same data at different points in your dataset. This helps improve performance as your data grows.
  • The data model can also be a good blueprint for developing web or frontend applications for your dataset. For example, Power Apps, AppSheet, Caspio, and Squirrel are some of the applications that can benefit from a well-designed data model.

Most of these are low-code tools that use data models as a blueprint to create interactive apps for users. The data model then becomes an indirect way for developers to document the data that will be required to build these apps.
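To make the point about redundant data concrete, here is a minimal Python sketch that compares one wide table with the same data split into a transactions table and a customer lookup table. It is an illustration under assumptions: the customer names, segments, and amounts are invented, and counting stored values is only a rough stand-in for how much data a model has to hold.

```python
# One wide table: customer details are repeated on every transaction.
# All names and figures are invented for this example.
flat = [
    {"Customer ID": "C1", "Customer Name": "Ama", "Segment": "Retail", "Revenue": 100},
    {"Customer ID": "C1", "Customer Name": "Ama", "Segment": "Retail", "Revenue": 250},
    {"Customer ID": "C2", "Customer Name": "Kojo", "Segment": "Corporate", "Revenue": 400},
    {"Customer ID": "C1", "Customer Name": "Ama", "Segment": "Retail", "Revenue": 75},
]

# Split the wide table into a lookup table (one entry per customer)
# and a slim transactions table that keeps only the ID and the amount.
customers = {}
transactions = []
for row in flat:
    customers[row["Customer ID"]] = {
        "Customer Name": row["Customer Name"],
        "Segment": row["Segment"],
    }
    transactions.append(
        {"Customer ID": row["Customer ID"], "Revenue": row["Revenue"]}
    )


def count_values(rows):
    """Count how many individual values a list of rows stores."""
    return sum(len(row) for row in rows)


flat_values = count_values(flat)
# Each customer entry stores its ID plus its attributes.
model_values = count_values(transactions) + sum(
    1 + len(details) for details in customers.values()
)

print(flat_values, model_values)
```

With only four rows the saving is small, but every additional transaction adds four values to the wide table and only two to the split version, so the gap widens as the data grows.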

So far, we have covered what a data model is and why you should consider using one: it structures datasets that are split into related tables, connects them, and lets you visualize those connections so that your analysis is efficient and insightful.

In the following section, we will look at some practical use cases of a data model. We will look at the cases of an accountant and a salesperson and see how data models can reduce the effort and the number of steps required to analyze data.

Practical use cases for a data model

This section explores practical use cases of data models in various workplace scenarios.

The accountant

Mr. Owusu Yeboah is a chartered accountant. He enters his accounting records in a table on the Journal worksheet in Microsoft Excel, where he records the Date, Description, Amount, Debit Account, and Credit Account of every transaction.

Figure 1.3 – Journal showing accounting entries


In another worksheet named COA, he has a table containing his chart of accounts with account codes, sorted to classify the various accounts into assets, liabilities, equity, revenue, and expenses. The other columns in his chart of accounts describe how each account has to be treated to produce a monthly and an annual financial statement.

Figure 1.4 – Sample chart of accounts


For Mr. Owusu Yeboah to determine the movements in and out of each account or to create a trial balance, he would need many lookup formulas to connect the two tables. In addition, when new data is added to the tables, he must manually update all his workings to capture the new entries. Storing data in Excel tables is one way to avoid manually updating calculations when your data changes.

How does a data model help in this situation?

Using a data model, Mr. Owusu Yeboah can load the two tables and connect them through their common columns. These common columns establish a relationship between the tables, which is what makes the data model possible. He can then add a calendar table to help him produce monthly or annual financial statements.
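The effect of relating the journal to the chart of accounts through a common column can be sketched in Python. This is a simplified illustration, not the workbook from the figures: the account codes, account names, and amounts are invented, and the sign convention (debits positive, credits negative) is an assumption made for the example.

```python
from collections import defaultdict
from datetime import date

# Chart of accounts keyed by account code (the common column).
# Codes, names, and classes are invented for this example.
chart_of_accounts = {
    "1000": {"Account Name": "Cash", "Class": "Assets"},
    "4000": {"Account Name": "Sales Revenue", "Class": "Revenue"},
    "5000": {"Account Name": "Rent Expense", "Class": "Expenses"},
}

# Journal entries with the columns described in the text.
journal = [
    {"Date": date(2024, 1, 5), "Description": "Cash sale", "Amount": 500.0,
     "Debit Account": "1000", "Credit Account": "4000"},
    {"Date": date(2024, 1, 9), "Description": "Office rent", "Amount": 200.0,
     "Debit Account": "5000", "Credit Account": "1000"},
]


def trial_balance(entries, accounts):
    """Net each account: debits add to the balance, credits subtract."""
    balances = defaultdict(float)
    for entry in entries:
        balances[entry["Debit Account"]] += entry["Amount"]
        balances[entry["Credit Account"]] -= entry["Amount"]
    # The account code is the relationship back to the chart of accounts.
    return {accounts[code]["Account Name"]: bal for code, bal in balances.items()}


tb = trial_balance(journal, chart_of_accounts)
print(tb)
```

Because the account code links the two tables, a new journal entry only needs to be appended to the journal; the balances are recomputed from the relationship rather than from formulas maintained by hand.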

A calendar table in Excel is a special table with a series of sequential dates that helps you keep track of dates and times in your data. It’s great for looking at things such as sales or expenses by day, month, or year. If your data is missing information for certain dates, a calendar table makes it easy to spot those gaps so you can fill them in. This ensures you’re not missing out on important details when making decisions.

In addition to helping Mr. Owusu Yeboah sort and analyze his data over time, a calendar table makes sure that all the date information in his various tables lines up correctly. This helps him avoid mistakes and makes it easier to combine different sets of data. It also lets Excel perform more advanced calculations for him, such as figuring out his total sales for each month or calculating averages over specific time periods.
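As a rough illustration of what a calendar table contains and how it exposes gaps, the following Python sketch builds one row per day and lists the days that have no data. The date range, the sample transaction dates, and the function names build_calendar and missing_dates are assumptions made for the example; in Excel, the calendar table would be created inside the workbook instead.

```python
from datetime import date, timedelta


def build_calendar(start, end):
    """Return one row per day from start to end, inclusive."""
    number_of_days = (end - start).days + 1
    calendar = []
    for offset in range(number_of_days):
        day = start + timedelta(days=offset)
        calendar.append(
            {"Date": day, "Year": day.year, "Month": day.month, "Day": day.day}
        )
    return calendar


def missing_dates(calendar, data_dates):
    """List calendar dates that do not appear in the data."""
    present = set(data_dates)
    return [row["Date"] for row in calendar if row["Date"] not in present]


# Invented transaction dates with two days missing.
sales_dates = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4)]
calendar = build_calendar(date(2024, 1, 1), date(2024, 1, 5))
gaps = missing_dates(calendar, sales_dates)

print(gaps)
```

The calendar holds every date in the period whether or not a transaction occurred, which is why comparing it with the data reveals the missing days.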

His data model will look something like the following screenshot:

Figure 1.5 – A screenshot of a data model with accounting data


This will help him easily capture new information in the journal and chart of accounts and create a dynamic financial statement for his users.

The salesperson

Ferdinand Attobra is a sales executive with Finex, an online electronics shop. Each day, he is required to prepare a report for his supervisors that captures the top-performing products, branches, and customers.

Figure 1.6 – Sales transactions


To create his report, he downloads four datasets from his sales software:

  • Transactions: This captures all the revenue as well as the cost of sales per transaction. The table also has fields that identify the customer, product, and store related to each transaction. These are represented by Customer ID, Product ID, and Store ID.

    Apart from the Transactions table, there are three other tables he uses to look up the details of each customer, product, or store that appeared in the Transactions table.

Figure 1.7 – Sample lookup tables


  • Customers: This table contains the unique customer IDs, along with each customer’s name and customer segment.
  • Products: This table contains the unique product IDs, along with each product’s category, sub-category, and name.
  • Location: This table contains the unique store IDs, along with each store’s city, region, and country.

The challenge Ferdinand faces in creating his report is how he can use the various IDs stored in the Transactions table to look up the customer, product, and store involved in each transaction.

How does a data model help in this situation?

Using a data model, Ferdinand can load the Customers, Products, and Location tables and connect them to the Transactions table using the Customer ID, Product ID, and Store ID columns respectively. A calendar table, although supplementary, is very useful here and would be connected to the model as well. He can then use this model to generate his daily reports and analyze sales by Product, Geography, Customer, and Date.
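The way the ID columns let one transactions table be summarized by attributes stored in other tables can be sketched in Python. This is a minimal illustration with invented rows, and it assumes the Location table is related through Store ID, as the table descriptions above suggest; the helper name revenue_by is hypothetical.

```python
from collections import defaultdict

# Lookup tables keyed by their ID columns. All rows are invented.
customers = {
    "C1": {"Customer Name": "Customer A", "Segment": "Consumer"},
    "C2": {"Customer Name": "Customer B", "Segment": "Corporate"},
}
products = {
    "P1": {"Product Name": "Laptop", "Category": "Computers"},
    "P2": {"Product Name": "Phone", "Category": "Phones"},
}
location = {
    "S1": {"City": "Accra", "Region": "Greater Accra", "Country": "Ghana"},
    "S2": {"City": "Kumasi", "Region": "Ashanti", "Country": "Ghana"},
}

# The Transactions table stores only IDs and amounts.
transactions = [
    {"Customer ID": "C1", "Product ID": "P1", "Store ID": "S1", "Revenue": 1200.0},
    {"Customer ID": "C2", "Product ID": "P2", "Store ID": "S2", "Revenue": 800.0},
    {"Customer ID": "C1", "Product ID": "P2", "Store ID": "S1", "Revenue": 750.0},
]


def revenue_by(rows, lookup, key_column, attribute):
    """Total revenue grouped by an attribute found in a related table."""
    totals = defaultdict(float)
    for row in rows:
        related = lookup[row[key_column]]  # follow the relationship
        totals[related[attribute]] += row["Revenue"]
    return dict(totals)


by_city = revenue_by(transactions, location, "Store ID", "City")
by_category = revenue_by(transactions, products, "Product ID", "Category")
by_segment = revenue_by(transactions, customers, "Customer ID", "Segment")

print(by_city, by_category, by_segment)
```

Each summary follows a single relationship from the Transactions table to a lookup table, which mirrors how the report can be sliced by Geography, Product, or Customer without copying those details into every transaction row.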

The model will look like the following screenshot:

Figure 1.8 – A screenshot of a data model showing sales data


From the two case studies, we can appreciate that using Excel’s data model can help us overcome some of the typical challenges in our routine office work.

Excel’s data model allows you to integrate data from multiple sources in an efficient manner. The diagram that shows the tables in the model and the relationships between them is called an Entity Relationship Diagram (ERD).

Figure 1.9 – Sample ERD for a sales report in Excel

Apart from this key advantage, the data model can also do the following:

  • Store and analyze data beyond the roughly 1-million-row limit (1,048,576 rows) of an Excel worksheet.
  • Create more powerful formulas to help you analyze your data more efficiently.
  • Work together with tools such as Power Query to transform and shape your data and to maintain a dynamic connection to your data sources.

In the next topic, we will dive into the main tool for data modeling and explore some best practices to help you get more insights from your datasets.