Precedence is one of the aggregation properties that you can define on an aggregation table in Power BI. In a model with multiple aggregation tables, you can use Precedence to define the order in which aggregation tables will be considered by Power BI to answer queries. The higher the Precedence number, the sooner the aggregation table will be considered. Very simple and easy to understand. but the question is:
What does Power BI do when there are multiple aggregation tables configured with the same Precedence value and they can all answer the same query? Which one is considered first? Does it choose the smallest one? or is there another rule in place?
Spoiler Alert: It does not seem to choose the smaller aggregation table.
In my previous post on Aggregations in Power BI, I used an example where the model had two aggregation tables:
One table at Product and Year level with 288 rows:
CREATE TABLE [dbo].[Sales_Agg1_By_Product_Year]( [ProductKey] [int] NOT NULL, [Calendar Year] [smallint] NOT NULL, [Sales Amount] [money] NULL, [Unit Price Sum] [money] NULL, [Unit Price Count] [int] NULL, [InternetSalesRowCount] [int] NULL ) ON [PRIMARY] INSERT INTO dbo.Sales_Agg1_By_Product_Year SELECT fs.ProductKey, dd.[Calendar Year], SUM([Sales Amount]) AS [Sales Amount], SUM([Unit Price]) AS [Unit Price Sum], COUNT([Unit Price]) AS [Unit Price Count], COUNT(*) AS InternetSalesRowCount FROM [dbo].[vFactInternetSales] fs JOIN [dbo].[vDimDate] dd ON fs.DateKey = dd.DateKey GROUP By fs.ProductKey, dd.[Calendar Year]
One table at Year level with 5 rows:
CREATE TABLE [dbo].[Sales_Agg2_By_Year]( [Calendar Year] [smallint] NOT NULL, [Sales Amount] [money] NULL ) ON [PRIMARY] INSERT INTO dbo.Sales_Agg2_By_Year SELECT dd.[Calendar Year], SUM([Sales Amount]) AS [Sales Amount] FROM [dbo].[vFactInternetSales] fs JOIN [dbo].[vDimDate] dd ON fs.DateKey = dd.DateKey GROUP By dd.[Calendar Year]
Obviously, before configuring the second aggregation table, Power BI used the first and only aggregation table available to answer the test query.
After configuring the second aggregation table, Power BI picked the second one.
Since the second table was smaller, this seemed like a desired behavior but the question is: Did Power BI pick it because it was smaller or was there another factor in place? Unfortunately as you will see shortly, there seems to be something else at play.
Before I show you that, for reference, when I set the Precedence property of the larger aggregation tableto 10, Power BI chose the larger aggregation table to answer the respective queries.
This last example was only used to demonstrate that the Precedence property can direct Power BI to consider one aggregation table over another one.
In a real case, if there is more than one aggregation table configured and they can all be used to answer the same query, you do want the smallest aggregation table to be considered first.
What does Power BI do when aggregation tables have the exact same Precedence
To find out why Power BI picked the second aggregation table in the absence of precedence (0), I added a third aggregation table to the mix.
CREATE TABLE [dbo].[Sales_Agg3_By_ProductCategory_Year]( [ProductCategoryKey] [int] NOT NULL, [Calendar Year] [smallint] NOT NULL, [Sales Amount] [money] NULL ) ON [PRIMARY] INSERT INTO dbo.Sales_Agg3_By_ProductCategory_Year --49 rows SELECT dp.[ProductSubcategoryKey], dd.[Calendar Year], SUM([Sales Amount]) AS [Sales Amount] FROM [dbo].[vFactInternetSales] fs JOIN [dbo].[vDimDate] dd ON fs.DateKey = dd.DateKey JOIN [dbo].[DimProduct] dp ON fs.ProductKey = dp.ProductKey GROUP By dp.[ProductSubcategoryKey], dd.[Calendar Year]
This table has 49 rows. It is smaller than the first aggregation table with 288 rows but larger than the second one with 5 rows.
I configured the third aggregation table similar to others.
Since the query in question, is by Calendar Year, I did not configure ProductCategoryKey and did not connect the ProductCategoryKey column between the Sales_Agg3_By_ProductCategory_Year and the Product Category tables in the model.
EVALUATE SUMMARIZECOLUMNS( 'Date'[Calendar Year], "Sales Amount", SUM ('Internet Sales'[Sales Amount]) )
Multiple Aggregation tables with the same Precedence
After adding and configuring the third aggregation table in the model, the test query was answered from the third aggregation table:
It seems like Power BI chose the last configured Aggregation table to answer the query.
(Note: As expected, connecting ProductCategoryKey between the Sales_Agg3_By_ProductCategory_Year and Product Category tables in the model, does not make any difference in this example. This is because that column is not in use for the query in question.)
I repeatetest the same test as before but this time I changed the order of the opeations. I added Sales_Agg3_By_ProductCategory to the model first, configured it, then added Sales_Agg2_By_Year and configured it. As I was suspecting at this point, Power BI chose Sales_Agg2_By_year which was the last table added to the model.
At this point I had yet another question in mind: In the absence of different Precedence properties, is it the order in which aggregation tables are added to the model or the order in which they are configured that affects the order in which they are considered to answer queries?
To find out, I tested the following combinations:
|Agg2 configured 1st.|
Agg3 configured 2nd.
|Agg 2 configured 2nd.|
Agg 3 configured 1st.
|Agg 2 added first. Close and Apply.|
Agg 3 added second. Close and Apply.
|Query uses Agg3.||Query uses Agg3.|
|Agg 3 added first. Close and Apply.|
Agg 2 added second. Close and Apply.
|Query uses Agg2.||Query uses Agg2.|
The aggregation table last added to the model wins in all combinations.
A couple of notes:
(1) Agggration table one was not considered in either combinaton.
(2) I repaeted the same test with the Precedence set to 10 instead of 0 and got the same results.
It seems like Power BI is favoring the last aggregation table added to a model when there is a tie in the Precedence property between them. At the moment, there are not many details available about the inner workings of how Power BI decides which aggregation tables to use to answer queries if there are more than one candidate table available in the model. It is important to note, the dataset I worked with in this blog post was very small. It is not clear if the same scenario would happen in an actual large dataset.
SQL Server Analysis Services (SSAS) Multidimensional, (the other system I used agggrations extensively in) has a complex set of rules when considering aggregations to answer queries. For example, one of the rules was the 1/3 rule. If an aggregation was larger than 1/3 of the main table (fact table) it would not be created at all. I wonder if Power BI uses a similar set of rules based on either the number of rows in a table or the table’s compressed size. Hopefully Power BI team will provide more details over time.
Regardless of which rules Power BI uses to choose agggration tabltes, always check the important queries you intend to answer from the agggration tables, to make sure the aggregation table you intended to be used is the one being used. You can use either DAX Studio as I showed in my previous post or Extended Events for this.