April 2, 2023 | developer tip

why you should not build one gigantic super semantic model (formerly known as dataset) in power bi.

You might have been faced with a requirement to consolidate data from various information areas like invoice, order, finance, sales, supply chain etc. into one single semantic model. Here are 13 reasons (in no particular order) why one should reconsider building such a gigantic super semantic model. Instead, I advocate creating one semantic model for each subject or information area: So, one for finance, one for order and one for sales. Note that this is not the same as separating them by source system! Of course, there are other ways to split your semantic models than by business function. My point is to divide and conquer!
By the way, the same applies to the other extreme. You should not create one semantic model per report! This is just as wrong as implementing one semantic model for everything!

Disclaimer: The answer to how a semantic model structure in Power BI should look will typically be "it depends". The approach suggested in this post primarily targets large enterprise business intelligence solutions. Yet, the reasons below might come in handy for smaller projects, too.


the 13 reasons.

1. Barely any interest in data from other information areas
2. Difficulties comprehending and maintaining the semantic model and its dependencies
3. Lack of clarity over the relationships of dimensions and facts
4. Star schema? Rather a star-spangled galaxy
5. More naming ambiguity issues for measures and business processes
6. The maze
7. Cling together, swing together
8. Losing a precious friend for your data governance
9. Confusing ownership
10. Increase of refresh time
11. Lack of clarity on data freshness
12. Bumping quicker into premium capacity restrictions
13. Missing out on dedicating compute and memory for certain parts of the model


1. Barely any interest in data from other information areas.

The requirement for combining data from various information areas usually originates from a cross-functional perspective (e.g. financial controllers or business analysts) or from higher up the organisational ladder (e.g. middle or top management). Most of your users, however, won't need access to data from other business functions. Needless to say, it is always crucial to scrutinise the analytical need, particularly when you are asked to link data across several business functions.
I get it: Analysts want to combine, slice and dice data in almost every imaginable way. And good for you if you have such ambitious and demanding colleagues! As the gatekeeper, though, you must keep your data models clean and neat rather than building something just because it would be nice to have.
But what if you get a specific use case where you need to compare, say, sales with order data? One option could be to create a slim order fact table in the sales semantic model, or a slim sales fact table in the order semantic model. Which path you take depends on the specific use case. Either way, that special fact table should only contain the attributes you really need to fulfil the analytical requirement. Another alternative: Power BI's composite models. With this feature, you are able to build a combined model on top of your segregated sales and order semantic models. Composite models, however, come with their own challenges. I won't dig into these here, but the feature itself is definitely something to consider in such a scenario.


2. Difficulties comprehending and maintaining the semantic model and its dependencies.

With a super semantic model, both your semantic model and your Power BI infrastructure will become progressively harder to understand. And this applies to literally everyone working with that semantic model: Power BI admins, semantic model developers, report builders and even end consumers. Sooner or later, they are all going to suffer from the advancing complexity. An obvious one: It will take time and resources to learn and understand the model. But that's not all. For example, even a minor adjustment to the semantic model might take quite some time to cross-check for its impact on the model, both semantically and performance-wise.
From an infrastructural point of view, there will be plenty of connections to that model, too (like reports, Excel pivots, apps, dashboards etc.). As a result, it will be tough to track which of these objects are affected by changes to the semantic model. Inevitably, this will make developers hesitant to make larger modifications. You are likely to end up with a "holy" monolith, where barely anyone dares to alter or delete existing things.
On the flip side, when you divide the semantic model into smaller areas, the models become easier to understand, manage and maintain, allowing for faster updates and changes.


3. Lack of clarity over the relationships of dimensions and facts.

There will be cases in your model where measures cannot be used with certain dimensions, because no logical relationship exists. You might already have experienced this issue when dragging a measure into a visual or a pivot table and the overall total of that measure gets displayed for every member of the dimension. Having such "do-not-use-with-certain-dimensions" measures is very common, even if you follow a strict one-semantic-model-per-information-area approach. To be honest, even then you find yourself repeatedly explaining why this one measure cannot be used with that specific dimension. Quite naturally, your super semantic model will have even more such cases. One might argue this is just a documentation or education issue, and I concede that point. Still, the effort of recurrently explaining each one of them, preferably in a pedagogical way, sounds pretty exhausting.
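One common mitigation, if you want to at least make the problem visible instead of showing a misleading repeated grand total, is to blank out the measure whenever it is sliced by a dimension it has no relationship to. Here is a minimal DAX sketch, assuming hypothetical names (an existing [Sales Amount] measure and an unrelated 'Production Plant' dimension):

```
-- Hypothetical names: [Sales Amount] is an existing measure,
-- 'Production Plant' is a dimension with no relationship to the sales fact.
Guarded Sales Amount =
IF (
    ISFILTERED ( 'Production Plant'[Plant Name] ),
    BLANK (),            -- hide the misleading repeated total
    [Sales Amount]
)
```

Every such guard is, of course, one more piece of logic to write, test and explain, which is exactly the maintenance effort this bullet is about.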


4. Star schema? Rather a star-spangled galaxy.

The Vertipaq engine (Power BI's motor under the hood) loves a well-designed star schema. This has to do with the fact that its underlying storage engine is a columnar database. Actually, that's why Power BI is so fast (at least it's one of the reasons). With a super semantic model in place, however, you will have plenty of fact tables, which possibly share quite a number of dimensions with other fact tables. As a result, your model diverges from a simple star schema. Instead, you have a model containing several, partly intertwined stars. Besides the usual suspects like reduced comprehensibility and a larger memory footprint, such a model invites suboptimal modelling decisions: Circular, many-to-many or fact-to-fact relationships are all waiting around the corner!
With that said, semantic models containing more than one fact table are nothing unusual (e.g. models containing header and detail facts). So technically, having multiple facts is not a problem per se, as long as you are on top of things. Yet, the "being on top of things" is nothing you should underestimate. Models with a dozen facts can cause quite some headaches, even for experienced data modelers (I am not sure experienced data modelers would even end up with a dozen-fact model in the first place). Star schemas, however, have a huge advantage: They are simple. As a consequence, they prevent you, the data modeler, from doing crazy things to your model. Needless to say, classic star schemas come along quite naturally if you conform to the one-information-area approach.


5. More naming ambiguity issues for measures and business processes.

Blending multiple information areas into one data model will sooner or later require you to come up with quite some creative names for your measures and dimensions. This is because Power BI does not allow several objects with the same name (rightly so!). "Amount" probably has different meanings in the order, finance and invoice parts of your super model. Another example: "Weight" in transport vs. production. Likewise, there will be dimensions with different meanings per information area. Your options are either to go with plenty of role-playing dimensions, or to push the logic to the measure side by making use of the dear USERELATIONSHIP function. Whatever you do, it'll inflate your model!
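To illustrate the measure-side option: the sketch below assumes a hypothetical order fact with an active relationship to a shared date dimension on the order date and an inactive one on the delivery date. USERELATIONSHIP activates the inactive relationship for a single calculation:

```
-- Hypothetical model: 'Fact Order'[Order Date] holds the active relationship
-- to 'Date'[Date], while 'Fact Order'[Delivery Date] holds an inactive one.
Ordered Amount by Delivery Date =
CALCULATE (
    [Ordered Amount],
    USERELATIONSHIP ( 'Fact Order'[Delivery Date], 'Date'[Date] )
)
```

Multiply this by every information area that needs its own date, location or product perspective and you can see how the measure list inflates.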
By the way, this is again an issue you will face with non-super semantic models, too. You are just going to bump into such cases way more often with a super semantic model.


6. The maze.

Even if your Power BI developers are able to comprehend the semantic model holistically (see bullet 2), it will be impractical and difficult to navigate through the excessive list of tables, columns, measures, calculation groups and KPIs during report creation or exploratory analysis. You might be tempted to restrict the number of visible objects for certain developers to increase clarity. Unfortunately, as the eighth bullet explains, users with edit permission will always see all objects in the data model. As a consequence, such users will have to deal with a super semantic model involving every single data asset.


7. Cling together, swing together.

When deploying code, there is the risk of negatively impacting parts of the model that are not supposed to be touched in the first place. While there is the option of incremental deployments in Power BI, you still do not want to find yourself in a situation where the entire semantic model gets corrupted just because of a small change in one of the areas. Development errors happen, and they always will. The issue is that a super semantic model makes you way more vulnerable to such events, as it will always be the entire semantic model, and thus all information areas, suffering from bugs.
Another potential issue: Imagine you have made modifications to different information areas of the model during a sprint. You then deploy the model to your test environment, where you realise that one of the changes must not be released to production. Personally, I do not (yet) know of any practical way to cherry-pick particular modifications. So you won't be able to publish only the code that would be good to go. Instead, you need to fix the bug first (or revert it) before you can deploy all other changes to production. The result: Users from all business areas are impacted and need to wait for their features to come to prod.
From a more general development point of view, collaboration improves with several smaller semantic models, as multiple people can work simultaneously on different information areas without interfering with each other. Worth noting: with the introduction of TMDL, version control and CI/CD of Power BI semantic models will drastically improve. This means that some of these issues might not even be a problem in the future. Personally, I think TMDL really deserves the current buzz!


8. Losing a precious friend for your data governance.

Most likely, you want to restrict your consumers to only have access to certain information, both in terms of object-level and row-level security. When separating your semantic models per information area, access to certain measures, tables and other objects is quite naturally segregated. If you have one big semantic model, however, you are forced to use object-level security (OLS) by baking it into the semantic model definition itself. This not only results in additional maintenance effort, it also binds data governance responsibility to your semantic model developers. Depending on your IT organization, this might not be appreciated.
Also, OLS in Power BI semantic models comes with some limitations as of today. As Microsoft's documentation explains, OLS only applies to viewers. For example, members of the contributor role, essentially everyone who needs to build stuff on top of your semantic model, will always see all objects. More OLS limitations are described in Microsoft's documentation. In addition, concerning row-level security (RLS), a super semantic model will likely come along with higher complexity, too.
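To make the RLS point a bit more tangible, here is a minimal sketch of a dynamic row-level security rule, assuming a hypothetical 'Sales Territory' dimension that stores the responsible user's e-mail address:

```
-- Hypothetical RLS table filter on 'Sales Territory':
-- each signed-in user only sees the territories they own.
'Sales Territory'[Territory Owner Email] = USERPRINCIPALNAME ()
```

In a per-area model, a handful of such rules per model is often enough. In a super semantic model, equivalent rules have to be defined and kept consistent for every information area within one single model definition.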


9. Confusing ownership.

Depending on your overall data governance strategy, distributing ownership within one model can be a burden. If you follow the subject-specific approach, there is a chance for a one-to-one match between an information expert and a semantic model. For instance, an expert on your organization's transport data is likely to be a good choice as information owner of the transport semantic model. This makes it way easier for business users to know whom to approach with questions about transport data. Vice versa, the transport owner probably appreciates having ownership of a concrete Power BI object instead of a few intangible areas within a massive semantic model. To be fair, the problem still exists when having subject-specific semantic models. For instance, the owner of the product dimension will find themselves in several semantic models, being responsible for just a small part of each. Yet, from an organizational perspective, these additional semantic model containers (that is, one per information area) provide your organization with some natural ownership structure and guidance.


10. Increase of refresh time.

Generally, if you refresh the whole semantic model, e.g. via scheduled refresh or via the refreshes REST API, the time it takes to process the model will be higher for a large super semantic model compared to several smaller ones. This is mainly because Power BI needs to load the whole model into memory before a refresh can be started (full process). This can be circumvented with REST API calls that trigger refreshes only for certain parts, like specific tables or partitions. With such an approach you could indeed process, for instance, only the finance part of your super semantic model. Thus, Power BI would also only need to load that part into memory, saving time and resources. The problem is that while that processing is running, you cannot trigger refreshes for other parts, since only one refresh can run on a semantic model at a time. This means you need to implement wait logic or other dependencies when trying to keep the data in your model fresh. In contrast, multiple semantic models can be refreshed simultaneously and in parallel, with potentially significant performance benefits.


11. Lack of clarity on data freshness.

Since you likely want to refresh designated parts of your gigantic data model on different schedules (e.g. some more frequently than others), you need to implement a similar approach as mentioned in the previous bullet. However, when certain tables are more up to date than others, it becomes hard for business users to know how fresh their piece of data is. It is way easier to communicate data freshness with a per-information-area strategy.


12. Bumping quicker into premium capacity restrictions.

As Microsoft's documentation states, any Power BI Premium capacity sets limits on the maximum memory workload per semantic model. This implies you might need to purchase a larger and more expensive premium capacity just because your super semantic model requires it. In contrast, many smaller semantic models could keep you under the threshold. Together with Power BI's semantic model eviction, there is a good chance that your solution works just fine even with a cheaper capacity.


13. Missing out on dedicating compute and memory for certain parts of the model.

With the capability to purchase several Power BI Premium capacities, you are able to customize resource allocation. Say your order semantic model is operationally crucial and requires the highest SLA. This semantic model must not be disturbed by other workloads, meaning it should not share a capacity with other semantic models. In that case, you can purchase an additional capacity and dedicate all of its compute and memory exclusively to that order semantic model. With a super model, you miss out on this option, because a semantic model can only run on one capacity. If you need to increase the SLA, there is only one option: a higher premium tier.


final thoughts.

Don't get me wrong, there are understandable reasons to implement what I call a "super semantic model". For instance, it reduces the risk of repeating business logic. Also, data duplication (e.g. when a dimension is used in different semantic models) is mitigated with a super semantic model, which can have some positive effect on performance. A super semantic model might even work completely fine on your end, at least for now. As time passes, things could change though. I advocate checking your model against the bullets above, preferably already when designing your Power BI semantic models. If you go for a super semantic model, challenge this decision proactively and regularly. As soon as some of the yellow flags start to turn orange, split your model before you have built a monolith: You do not want to keep growing your super semantic model to the point where it has become too large with too many dependencies. Understanding and unwrapping all logic and references will come at a high cost and effort. You might even be forced to build new subject-specific models next to your super semantic model so you can switch slots as soon as you are ready. Depending on the model and the number of reports and apps in production, this migration can require a lot of resources!
Personally, I find Power BI sometimes almost too forgiving: At the beginning, things work pretty smoothly despite some major flaws. And such flaws lurk around literally everywhere: in your data model design, in your infrastructural setup, in your data governance. But after a while, these flaws will make things shaky, and it'll be just a matter of time until your stakeholders approach you with disappointed faces. So if you do not do things right from the start (and that is way easier said than done), you can run into severe challenges later on. Arbitrarily throwing data from all corners of your data warehouse into a super semantic model is one of those temptations you should resist right from the beginning. Even if it is well-intended, it's probably going to backfire sooner or later.

2 Comments

  1. ManuelGP April 3, 2023 at 12:46 pm

    Great post! I agree probably with everything. The key IMO is as you mention in the end or point 1) Power BIs composite models. But the important challenge is that the relationships across datasets are limited. I hope that this can be changed / improved because it would simplify so many scenarios at once. I put this as feedback to the DQ for PBI Datasets preview feature here: https://community.powerbi.com/t5/Desktop/Share-your-thoughts-on-DirectQuery-for-Power-BI-datasets-and/m-p/3124293/highlight/true#M1057702

  2. AlejandroM April 6, 2023 at 5:23 pm

    Thanks for this great post. I enjoyed it a lot
