How the partitioning of Delta Lake tables can impact the performance of your Direct Lake semantic models.
Introduction.
First off, this blog post is not meant to tell you how you should partition your Delta tables when consuming them in Direct Lake models. Nor does it suggest that you should partition your Delta tables at all. Instead, this post is about acknowledging that partitioning can have an impact on the performance of your Direct Lake semantic models. To showcase this, we created the same 100-million-row table four times with different partitioning schemes and tested their performance against each other in DAX Studio by utilising two distinct measures.
1. What’s the setup?
Let’s start with the tables themselves. We created the same Delta table four times in our Lakehouse. The only difference between those tables is how we partitioned each of them: one table had no partition at all, whereas the others were partitioned by month, week and date respectively. Each table consists of 100 million rows and three columns: IDInt (a running row counter as integer), AttrString (with the three distinct values Banana, Apple and Orange) and a Date column (with a random date within the year 2024). The script we used can be found in Appendix I. Below is a snippet of the table:
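The full generation script is in Appendix I; as a rough, plain-Python sketch of the same idea (the column names IDInt, AttrString and Date come from the table described above, while the Month and Week derivations and the `make_rows` helper are our own illustrative additions):

```python
import random
from datetime import date, timedelta

FRUITS = ["Banana", "Apple", "Orange"]

def make_rows(n, seed=42):
    """Generate n rows matching the schema described above:
    IDInt (running integer), AttrString (one of three fruits),
    Date (a random day in 2024), plus the Month and Week values
    one could use as partition keys in the partitioned variants."""
    rng = random.Random(seed)
    start = date(2024, 1, 1)
    rows = []
    for i in range(1, n + 1):
        d = start + timedelta(days=rng.randrange(366))  # 2024 is a leap year
        iso = d.isocalendar()
        rows.append({
            "IDInt": i,
            "AttrString": rng.choice(FRUITS),
            "Date": d,
            "Month": d.strftime("%Y-%m"),       # month partition key
            "Week": f"{iso[0]}-W{iso[1]:02d}",  # ISO week partition key
        })
    return rows

for row in make_rows(3):
    print(row)
```

In the real tables this logic runs at 100-million-row scale inside a Fabric notebook, with the resulting DataFrame written out once per partitioning scheme (no partition, by month, by week, by date).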
Here is how the tables look in OneLake’s file explorer. I used the add-on that you can download from here.
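To give an idea of what the file explorer shows, a Delta table partitioned by date is stored with one Hive-style folder per partition value, alongside the transaction log (the table and file names below are made up for illustration):

```
Tables/fact_100m_date/
├── _delta_log/
├── Date=2024-01-01/
│   └── part-00000-<uuid>.parquet
├── Date=2024-01-02/
│   └── part-00000-<uuid>.parquet
└── ...
```

The unpartitioned table, by contrast, keeps all of its Parquet files directly in the table folder, which is why the four variants look so different when browsed side by side.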