During one of my presentations on Incremental Refresh (IR) in Power BI, someone asked what happens during a Power BI automatic partition maintenance window when Power BI has an opportunity to merge smaller partitions into larger ones. Does Power BI use the data that is already imported into Power BI for the smaller partitions and combine it into a bigger one or does it re-read the data for those smaller partitions again. For example, if a dataset has an IR policy to refresh the last 1 day, and it has read data for all the days in a previous month, one day for each, when the new month arrives, it has an opportunity to merge the smaller day partitions into a month partition for the previous month. Does it re-read the previous month’s data from the source again or does it use what it already has in its memory?
Theory
My hope and gut feeling (based on years of working with SSAS partitions) was that Power BI does not re-read the data for the previous periods and uses what’s already in memory. This is important because when we talk about IR we are usually talking about large datasets which means full data reads from the backend source can be slow and/or expensive so we want to avoid re-reads if we can.
Test
To prove my theory, I set up a Power BI dataset with an IR policy of “Import Last 5 Years/Refresh Last 1 Day” and used XMLA process commands from SSMS with the Effective Date parameter to simulate a process (refresh) on the last day of a year and the first day of the next year and tracked the partitions. The Effective Date parameter can be used with an XMLA process command and is useful for testing scenarios like this. If you want to find out more about the test, read the Details section in this post.
Results
Power BI does not re-read the data for the smaller partitions, individually (one per day) or as a group (a month), to create the bigger partition. It only re-reads the data for the last time period as defined in the IR policy.
The period here is whatever was set in the Refresh the last …. in the IR. In my example, it was Days.
This is super important if you are dealing with a large dataset or if there is a cost for the reads, you don’t want Power BI to re-read all those 30 days. It makes sense that it re-reads the data for the last period (day) to make sure it had all the updates for that period (day) if anything changed since it read it and when the period (day) closed.
Read/Merge
In the example of a new month:
On the first refresh of a new day, Power BI reads the data for the 1st day of the new month, and it also re-reads the data for the last day of the previous month. No merges happen yet. Power BI does not issue a read for the entire data for the previous month.
On the second refresh of a new day, Power BI merges the smaller partitions (days) into a bigger one (month) and it re-reads the data for the current period (day) per IR policy. When there is only one daily refresh scheduled per day, the merge will happen with the refresh on the next day.
In the rest of this post, I show you how I did the test. Read on if you want to know the details or if you are interested to see a real use case of using the EffectiveDate parameter. Note that once you publish a Power BI dataset with IR to the service, after the first refresh from the service has been completed, the only way to reload older partitions (full process or incrementally using a test date) is to submit a command using the XMLA endpoint, for example via SSMS.
Details
I set up a Power BI dataset with the following IR policy.

I published it to Power BI Service.

At this point, the dataset only has one partition which can be seen from SSMS.

At this point, a Refresh command from Power BI service will generate the partitions based on the IR policy and pull in data for the last 5 years. Since I wrote this post in November and I want to simulate what happens at the end of a period change, I use the XMLA endpoint (fancy way of saying connecting to the Power BI datasets) from SSMS to issue process commands instead of using Power BI service directly. Note that from SSMS, you can issue commands that force a full refresh of historical data (by setting the applyRefreshPolicy to false, it is true by default), or you can issue process commands that makes Power BI think the date is a different date than what actually is on the calendar. The latter one is what I used to make Power BI think we are on December 31st, 2021, and then on January 1st, 2022.
Process full command with EffectiveDate set to 12/31/2021
First process command:
I ran the following command from SSMS. It took about 23 seconds for a full load of historical data.
{
"refresh": {
"type": "full",
"applyRefreshPolicy": true,
"effectiveDate": "12/31/2021",
"objects": [
{
"database": "Hybrid Tables Demo 5b - Last 5 Years Last 1 Day IR - Process with EffectiveDate",
"table": "vFactInternetSalesBig"
}
]
}
}
I used Azure Data Studio to track the queries Power BI submitted to my Azure SQL DB. As expected, there are queries for 2016, 2017, 2018, 2019, 2020 as years. From that point on, there are queries for 2021 Q1, Q2, and Q3, then October and November, and finally days in December leading up to December 31st.

A look at the partitions from SSMS, confirms that partitions have been created by Power BI service.

Notice how processed time of all partitions is at 10:39 am. The last partition is for 12/31.

Second process command:
I submitted the same command a second time to verify IR was in place. This second process took 5 seconds (compared to 23 before)
{
"refresh": {
"type": "full",
"applyRefreshPolicy": true,
"effectiveDate": "12/31/2021",
"objects": [
{
"database": "Hybrid Tables Demo 5b - Last 5 Years Last 1 Day IR - Process with EffectiveDate",
"table": "vFactInternetSalesBig"
}
]
}
}
Looking at the partitions from SSMS confirmed that only the partition for 12/31 was processed. Timestamp showed 10:46.

I captured the queries in Azure Data Studio, which showed only one query to the data for 12/31. This is expected behavior and the definition of IR.

Process full command with EffectiveDate set to 01/01/2022
First process command:
This time I moved the date to January 1st, 2022, to see how/if/when Power BI merges the old daily/monthly/quarterly partitions into one partition for 2021. This refresh took 7 seconds.
{
"refresh": {
"type": "full",
"applyRefreshPolicy": true,
"effectiveDate": "01/01/2022",
"objects": [
{
"database": "Hybrid Tables Demo 5b - Last 5 Years Last 1 Day IR - Process with EffectiveDate",
"table": "vFactInternetSalesBig"
}
]
}
}
SSMS showed that a new partition was created for January 1st, 2022, and that the partition for December 31, 2021, was read again. The latter one caught me by surprise the first time I saw it, but it does make sense. This is because Power BI needs to capture any changes in the last partition in case data has changed in that last partition since it read it. Remember that once a partition is outside of IR window, it will be read again, unless forced.

Additionally, no partitions were merged yet. At this point, I could see daily partitions for December, monthly for November and October, then one for Q1 and one for Q2 for 2021. The yearly partition for 2016 is dropped at this point.
I also tracked the queries from Azure Data Studio which confirmed that only the data for the last two days were read:


Second process command:
I submitted another process command from SSMS with effective date set to January 1st, 2022.
{
“refresh”: {
“type”: “full”,
“applyRefreshPolicy”: true,
“effectiveDate”: “01/01/2022”,
“objects”: [
{
“database”: “Hybrid Tables Demo 5b – Last 5 Years Last 1 Day IR – Process with EffectiveDate”,
“table”: “vFactInternetSalesBig”
}
]
}
}
This process took 4 seconds and executed the merge. As I had seen in my tests for my previous blog post on IR, it is the second refresh of a new time period that triggers the merge.
The following pictures shows the partitions from SSMS. Notice the follwoing changes:
- Year 2021 has its own partition, but its Last processed date shows blank.
- The partition for January 1st, 2022, was updated.
- The partition for year 2016 is gone. This is because the IR policy was set to keep only the last 5 years. This one was actually dropped with the first refresh.

Azure Data Studio also confirmed that only the data for January 1st, 2022, was read and nothing else.

Summary
When Power BI has an opportunity to merge smaller partitions into bigger ones, it does not re-read the data for the smaller partitions, individually (one per day) or as a group (a month), to create the bigger partition. The merge happens with the second refresh of a new period. The first one gets the data for the last small previous period and the second one uses that data when a merge is possible.
I have a few other video/blog posts on Incremental Refresh and Hybrid Tables if you want to learn more:
Incremental Refresh and Hybrid Tables in Power BI – Shabnam Watson’s Blog (wpcomstaging.com)
Hybrid Tables (In Power BI) Q&A – Shabnam Watson’s Blog (wpcomstaging.com)
One thought on “Exploring Automatic Partition Maintenance in Power BI datasets with Incremental Refresh, using EffectiveDate parameter”