As you may have already heard, one of the easiest ways to reduce the size of a Power BI model (dataset) is to split DateTime columns into separate Date and Time columns. The question is how much space you can actually save by doing so. As I show in this blog post, the reduction can be significant: up to 80% or 90%, depending on the number and cardinality of the DateTime columns.
To test the size reduction, I created a bigger version of the FactInternetSales table from AdventureWorks. The new table, FactInternetSalesBig, has 1,270,517 rows. As you can see from the report shown below, it has 1.89 million distinct OrderDate, DueDate, and ShipDate values. While this is not a lot of rows compared to many business scenarios, it is big enough to measure the size reduction both as pbix file size on disk and as model size in memory.
Once I had the table, I created a Power BI model/report containing just this one table and measured its size on disk and in memory (using DAX Studio).
The following image shows the baseline for the test. Without any changes, the pbix file is 49 MB on disk and 265 MB in memory.
Next, I split OrderDate into separate Date and Time columns. I chose to do this in Power Query, which makes this kind of transformation extremely easy. Alternatively, it can be done in the SQL data source using a view. (The downside of using Power Query here is that certain transformations, including the ones I used in these pbix files, prevent incremental refresh from being set up, but that was not a factor for this test.)
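The Power Query step can be sketched along these lines. This is not the exact query from my pbix files, just an illustration of the approach; the server and database names, and the `OrderDateOnly`/`OrderTime` column names, are placeholders you would adapt to your own source:

```m
let
    // connect to the source; server/database names are placeholders
    Source = Sql.Database("localhost", "AdventureWorksDW"),
    Fact = Source{[Schema = "dbo", Item = "FactInternetSalesBig"]}[Data],
    // derive a pure Date column and a pure Time column from the DateTime
    AddOrderDate = Table.AddColumn(Fact, "OrderDateOnly",
        each DateTime.Date([OrderDate]), type date),
    AddOrderTime = Table.AddColumn(AddOrderDate, "OrderTime",
        each DateTime.Time([OrderDate]), type time),
    // drop the original high-cardinality DateTime column
    RemovedOriginal = Table.RemoveColumns(AddOrderTime, {"OrderDate"}),
    // keep the familiar column name for the date part
    Renamed = Table.RenameColumns(RemovedOriginal, {{"OrderDateOnly", "OrderDate"}})
in
    Renamed
```

The Date column can have at most a few thousand distinct values and the Time column at most 86,400 (one per second), so each compresses far better than a single DateTime column whose cardinality can approach the row count.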
The following shows how the pbix file size reduced to 27 MB on disk and 184 MB in memory.
Next I split DueDate into separate Date and Time columns. The following shows how the pbix file size reduced to 23 MB on disk and 101 MB in memory.
Finally I split ShipDate into separate Date and Time columns. The following shows how the pbix file size reduced to 9 MB on disk and 21 MB in memory.
Here is the summary of all the steps:
| Test | Disk (MB) | Memory (MB) | Disk reduction vs. original | Memory reduction vs. original |
|---|---|---|---|---|
| Baseline (no split) | 49 | 265 | – | – |
| OrderDate split | 27 | 184 | 45% | 30% |
| DueDate split | 23 | 101 | 53% | 62% |
| ShipDate split | 9 | 21 | 82% | 92% |
As you can see, just by splitting three DateTime columns into separate pairs of Date and Time columns, the model size shrinks by 82% on disk and 92% in memory. This is a significant reduction given how little effort is involved.
You can find all of these files and the SQL code I used to generate FactInternetSalesBig here on my GitHub.
In this post I was purely interested in the effect of splitting DateTime columns into Date and Time columns. In some businesses, the time of an event is not significant for reporting but may still be generated by the back-end systems. In such scenarios, the Time portion can be removed from the Power BI model completely, or at least rounded to hours/minutes instead of seconds/milliseconds.
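Rounding to the minute cuts the Time column's maximum cardinality from 86,400 (one value per second) to 1,440. A minimal Power Query sketch of that truncation, using an inline sample table standing in for the fact table (the `OrderTime` column name is an assumption):

```m
let
    // sample table with a time-typed column, standing in for the fact table
    Sample = #table(type table [OrderTime = time], {{#time(13, 42, 57)}}),
    // round down to the minute: keep hour and minute, zero out the seconds
    ToMinute = Table.TransformColumns(
        Sample,
        {{"OrderTime", each #time(Time.Hour(_), Time.Minute(_), 0), type time}}
    )
in
    ToMinute
```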
Additionally, in the presence of a good dimensional model, one can achieve an even higher compression rate by using DateID/TimeID keys from Date/Time dimension tables instead of the actual date/time values. While this is recommended, and necessary for Time Intelligence calculations, it can perform worse than keeping some of the DateTime values intact if reporting is done at the grain of a fact table with multiple DateTime columns that need to be recombined in measures. You need to test the performance of your own model and reports to know which approach is faster.
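As a sketch of the surrogate-key variant, integer DateID/TimeID columns can be derived in Power Query like this. The key formats (yyyymmdd for dates, seconds-since-midnight for times) and all column names are illustrative assumptions, not the only valid scheme:

```m
let
    // sample table with one DateTime value, standing in for the fact table
    Sample = #table(type table [OrderDate = datetime],
        {{#datetime(2024, 3, 15, 13, 42, 57)}}),
    // DateID as an integer in yyyymmdd form, matching a Date dimension key
    AddDateID = Table.AddColumn(Sample, "OrderDateID",
        each Date.Year([OrderDate]) * 10000
            + Date.Month([OrderDate]) * 100
            + Date.Day([OrderDate]),
        Int64.Type),
    // TimeID as whole seconds since midnight, matching a Time dimension key
    AddTimeID = Table.AddColumn(AddDateID, "OrderTimeID",
        each Time.Hour([OrderDate]) * 3600
            + Time.Minute([OrderDate]) * 60
            + Time.Second([OrderDate]),
        Int64.Type)
in
    // the original DateTime column is no longer needed
    Table.RemoveColumns(AddTimeID, {"OrderDate"})
```

Small integer keys like these compress very well in VertiPaq, but as noted above, measures that have to reassemble a full DateTime from the two keys can be slower than querying an intact column, so test both.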