-
Book Overview & Buying
-
Table Of Contents
-
Feedback & Rating

Microsoft Power BI Cookbook
By :

The transformations applied within Power BI's M queries serve to protect the integrity of the data model and to support enhanced analysis and visualization. The specific transformations to implement varies based on data quality, integration needs, and the goals of the overall solution. However, at a minimum, developers should look to protect the integrity of the model's relationships and to simplify the user experience via denormalization and standardization. Additionally, developers should check with owners of the data source to determine whether certain required transformations can be implemented in the source, or perhaps made available via SQL view objects such that Power Query (M) expressions are not necessary.
This recipe demonstrates how to protect a model from duplicate values within the source data that can prevent forming proper relationships within the data model, which may even result in query failures. While a simple scenario is used, this recipe demonstrates scenarios you may run into while attempting to merge multiple data sources and eliminating duplicates.
To prepare, start by importing the DimProduct and FactResellerSales tables from the AdventureWorksDW2019 database by doing the following:
let
Source = Sql.Database("localhost\MSSQLSERVERDEV", "AdventureWorksDW2019")
in
Source
Figure 2.31: Many-Many relationship cardinality warning
For additional details on performing these steps, see the Managing Queries and Data Sources recipe in this chapter.
We wish to remove duplicates from the EnglishProductName column in our DimProduct query. To implement this recipe, use the following steps:
EnglishProductName
column with a Text.Trim
function.EnglishProductName
key column with the Table.DuplicateColumn
function and name this new column Product Name
.EnglishProductName
column via the Table.TransformColumns
function. This new expression must be applied before the duplicate removal expressions are applied.Table.Distinct
function to remove duplicate rows.Table.Distinct
expression to specifically remove duplicate values from the EnglishProductName
column.EnglishProductName
column via Table.RemoveColumns
.The final query should resemble the following:
let
Source = AdWorksDW,
dbo_DimProduct = Source{[Schema="dbo",Item="DimProduct"]}[Data],
RemoveColumns = Table.SelectColumns(dbo_DimProduct,{"EnglishProductName"}),
TrimText =
Table.TransformColumns(
RemoveColumns,{"EnglishProductName",Text.Trim}
),
DuplicateKey =
Table.DuplicateColumn(
TrimText,"EnglishProductName","Product Name"
),
UpperCase =
Table.TransformColumns(
DuplicateKey,{{"EnglishProductName", Text.Upper}}
),
DistinctProductRows = Table.Distinct(UpperCase),
DistinctProductNames =
Table.Distinct(
DistinctProductRows, {"EnglishProductName"}
),
RemoveEnglishProductName =
Table.RemoveColumns(
DistinctProductNames,"EnglishProductName"
)
in
RemoveEnglishProductName
In the TrimText
expression, the Trim.Text
function removes white space from the beginning and end of a column. Different amounts of empty space make those rows distinct within the query engine, but not necessarily distinct within the model. Therefore, it is always a good idea to use Trim.Text
first and then remove duplicate rows and values.
In the next expression, DuplicateKey
, the Table.DuplicateColumn
function is used to duplicate the column where we will be removing duplicate values. We give this new column the name that we desire for our final column. This is done because we will need to transform the values in the column we are removing duplicates from, in order to account for mixed cases such as "Fender Set" and "Fender set". Thus, we wish to preserve the original values and casing by using this duplicate column.
In order to eliminate mixed casing issues, the UpperCase
expression changes all values in the EnglishProductName
column to uppercase using the Table.TransformColumns
function, and specifying Text.Upper
. The M engine considers mixed casing values unique, but the data model engine does not.
The next two expressions, DistinctProductRows
and DistinctProductNames
, simply demonstrate two different methods of using the Table.Distinct
function. The first, DistinctProductRows
, eliminates rows where the entire row (all column values) are identical. The second version looks only at the values in a single column when determining whether or not the row is a duplicate.
At this point, the query is now resilient to duplicate values and rows, mixed cases, and spaces. However, the EnglishProductName
column is now in the uppercase format. Since we preserved a copy of the original values and casing in our Product Name
column, we can simply drop the EnglishProductName
column using the Table.RemoveColumns
function.
We can now form a one-to-many relationship between our DimProduct and FactResellerSales tables.
Figure 2.32: Simple one-to-many relationship model
To support troubleshooting, create a query that accesses the same source table and retrieves the values from the EnglishProductName
column with more than one row.
let
Source = AdWorksDW,
dbo_DimProduct = Source{[Schema="dbo",Item="DimProduct"]}[Data],
RemoveColumns = Table.SelectColumns(dbo_DimProduct,{"EnglishProductName"}),
TrimText =
Table.TransformColumns(
RemoveColumns,{"EnglishProductName",Text.Trim}
),
UpperCase =
Table.TransformColumns(
TrimText,{{"EnglishProductName", Text.Upper}}
),
GroupedRows =
Table.Group(
UpperCase, {"EnglishProductName"},
{{"Rows", each Table.RowCount(_), Int64.Type}}
),
Duplicates = Table.SelectRows(GroupedRows, each [Rows] > 1)
in
Duplicates
The EnglishProductName
column is selected, trimmed, converted to uppercase, grouped, and then filtered to always retrieve any duplicate key values. Disable the loading of this query, as the query would only exist for troubleshooting purposes.
Table.SelectColumns
: http://bit.ly/38Qk7LtTable.RemoveColumns
: http://bit.ly/3cJju7pTable.TransformColumns
: http://bit.ly/3tsdxm2Table.DuplicateColumn
: http://bit.ly/3cIF63XTable.Distinct
: http://bit.ly/38V8mmNText.Trim
: http://bit.ly/3eUmAZ0Text.Upper
: http://bit.ly/3vFW2R6