-
Book Overview & Buying
-
Table Of Contents
-
Feedback & Rating

Microsoft Power BI Cookbook
By :

One of the most valuable features of Power BI is its deep support for real-time and streaming datasets, with the ability to provide immediate visibility to business processes and events as this data is created or updated. As Power BI Desktop's data modeling engine reflects the latest Analysis Services features, it becomes feasible to design DirectQuery models or composite models (DirectQuery and import) in Power BI Desktop, and thus avoid the scalability limitations and scheduled refresh requirements of models based on importing data.
The three most common candidates for DirectQuery or composite model projects are as follows:
This recipe walks through the primary steps in designing the data access layer that supports a DirectQuery model in Power BI Desktop. As these models are not cached into memory and dynamically convert the DAX queries from report visualizations to SQL statements, guidance is provided to maintain performance. Additional details, resources, and documentation on DirectQuery's current limitations and comparisons with the default import mode are also included to aid your design decision.
Be aware that DirectQuery models are limited to a single source database and not all databases are supported for DirectQuery. If multiple data sources are needed, such as SQL Server and Oracle, or Teradata and Excel, then the default Import mode model, with a scheduled refresh to the Power BI Service, will be the only option.
For this recipe, we will use the AdventureWorksDW2019 database that has been used thus far in this chapter. To implement this recipe, follow these steps:
Figure 2.14: Creating a DirectQuery data source
Item
in the formula bar or by clicking on the gear icon next to the Navigation step under APPLIED STEPS in the Query Settings pane. Also, rename this query to reflect the data being referenced.Figure 2.15: Editing the Navigation step in the formula bar
DimEmployee
DimPromotion
DimCurrency
DimSalesTerritory
The Report Canvas view will confirm that the model is in DirectQuery mode via the status bar at the bottom right (see Figure 2.16). In addition, the Data view in the left-hand pane, which is visible for import models, will not be visible.
Figure 2.16: DirectQuery Status in Power BI Desktop
The M transformation functions supported in DirectQuery are limited by compatibility with the source system. The Power Query Editor will advise when a transformation is not supported in DirectQuery mode, per Figure 2.17.
Figure 2.17: A warning in Query Editor that the IsEven function is not supported in DirectQuery mode
Given this limitation and the additional complexity the M-based transforms would add to the solution, it is recommended that you embed all the necessary logic and transforms in the source relational layer. Ideally, the base tables in the source database themselves would reflect these needs. As a secondary option, a layer of views can be created to support the DirectQuery model.
If the database objects themselves cannot be revised, the Value.Native
M function can be used to directly pass the SQL statement from Power BI Desktop to the source database, as per Figure 2.18.
Figure 2.18: The Value.Native function used to pass a SQL statement to a source system
As report visualizations are refreshed or interacted with in Power BI, the DAX queries from each visualization are translated into SQL statements, utilizing the source SQL statements to return the results. Be aware that Power BI does cache query results with DirectQuery models. Therefore, when accessing a recently utilized visual, a local cache may be used rather than a new query sent to the source.
The SQL statements passed from Power BI to the DirectQuery data source include all columns from the tables referenced by the visual.
For example, a Power BI visual with SalesAmount from the FactResellerSales table grouped by ResellerName from DimReseller would result in a SQL statement that selects the columns from both tables and implements the join defined in the model. However, as the SQL statement passed embeds these source views as derived tables, the relational engine is able to generate a query plan that only scans the required columns to support the join and aggregation.
The performance and scalability of DirectQuery models are primarily driven by the relational data source. A denormalized star schema with referential integrity and a system that is isolated from OLTP workloads is recommended if near real-time visibility is not required. Additionally, in-memory and columnar features available to supported DirectQuery sources are recommended for reporting and analytical queries.
By default, DirectQuery models generate outer join SQL queries to ensure that measures return the correct value even if there's not a related dimension. However, you can configure DirectQuery models to send inner join queries. This is done by editing the relationship between tables in the modeling view by checking the Assume referential integrity setting (see Figure 2.19). Along with source system resources, this is one of the top factors contributing to the DirectQuery model's performance.
Figure 2.19: Activating referential integrity assumption in relationships
Of course, you should ensure that there is referential integrity in the source before enabling this setting; otherwise, incorrect results could be returned.
The design of the source relational schema and the hardware resources of this system can, of course, greatly impact the performance of DirectQuery models.
A classic star-schema design with denormalized tables is recommended to reduce the required join operations at query time. Optimizing relational fact tables with column store technologies such as the Clustered Columnstore Index for SQL Server and table partitions will also significantly benefit DirectQuery models.