October 20, 2024|tutorial|

how to reframe your direct lake semantic model and restore in-memory columns with fabric data pipelines.

introduction.

Currently, when proactively refreshing (or rather reframing) a direct lake semantic model, the data held in memory is entirely evicted. This means, data needs to be paged back into memory again which usually is done by interacting with the model, often through activity in downstream reports. After such reframing, depending on various factors (data volume, capacity etc.), report users might experience some form of delay.
In this blog article, we explore an automated way to reload the same columns into memory that were present before the semantic model was reframed. In the approach below, we’ll first identify what is currently stored in memory, then refresh the model, and finally reload the exact same columns back into the cache. This entire process will be executed using Fabric Data Pipelines exclusively.
In fact, the approach is very similar to what has already been implemented in the python module _warm_cache from the semantic-link-labs repository. To be fair, I got quite inspired by the code, so Kudos to Michael and whoever has been involved. In general, I strongly recommend checking out semantic-link-labs. There is a lot of cool stuff in there and it’s pretty easy to use, too. Thus, you could also just use the function warm_direct_lake_cache_isresident() from that module in a dedicated python notebook and directly trigger it from a Fabric Data Pipeline. However, if you feel more comfortable with sticking entirely to data pipelines, or if this simply is a better fit for your overall architecture, go ahead with the way describe below. Also to some extent, this post is a continuation of the one called how to pre warm your semantic model from fabric data pipelines.
Finally, I want to recommend checking out a really good video of Patrick from Guy in a Cube about reframing, e.g. why and when we even wanna be doing this (I feel like saying both really good and Guy in a Cube in the same sentence is somewhat redundant; all their stuff is really good).

prerequisites.

1. A Fabric capacity and workspace
2. A semantic model in Fabric
3. A Web v2 connection to the Power BI API endpoint. You can read up here on how to set this up. In this article, we use the oAuth2 method and our connection is called pbi_connection
4. A Power BI Semantic Model connection. You can read up here on how to set this up. This will be used for the in-built Semantic Model Refresh Activity. In this article, this connection is called PowerBIDataset Admin

plan of action.

1. What’s the goal?
2. The pipeline
3. Showtime

1. What’s the goal?

The goal is to put back the same columns that were held in memory before the cache had been cleared as part of reframing a semantic model. Below, we can see the columns that are paged back into our semantic model after we reframed it. The column IS_RESIDENT states whether a column is in memory. The DAX code for the view below can be found in the appendix, although you could also just query the INFO.STORAGETABLECOLUMNSEGMENTS() DAX function for a similar overview.

2. The pipeline

We start off with the first activity GetResidentColumnsAndTables where we query a semantic model with the DAX Info functions to find out which columns are currently held in memory.

Below the inputs. Please note, make sure that the quotes are correct as " and not as ". When copying from here it can be that the quotes are coming along in the wrong format. If the wrong ones come in, you might get a bad request error in your Fabric pipeline, like .

Connection: pbi_connection
Relative URL: groups/[WorkspaceID]/datasets/[SemanticModelID]/executeQueries (you can find the WorkspaceID and SemanticModelID in the URL bar of your browser by clicking on the semantic model in Fabric, similar to here)
Body:
Headers: content-type as Name and application/json as Value

Below the DAX query that shall be pasted in PasteDAXQueryFromBelowinHere.

Copy to Clipboard

DEFINE 
	-- build _storageTableColumnSegments to find residential columns
    VAR _storageTableColumnSegments = 
    FILTER(
        SELECTCOLUMNS(
            INFO.STORAGETABLECOLUMNSEGMENTS(),
            \"COLUMN_NAME\", [COLUMN_ID],
            \"ID\", 
                LEFT(
                    TRIM(
                        RIGHT(
                            SUBSTITUTE(
                                [COLUMN_ID],
                                \"(\",
                                REPT(\" \", LEN([COLUMN_ID]))
                            ),
                            LEN([COLUMN_ID])
                        )
                    ),
                    LEN(
                        TRIM(
                            RIGHT(
                                SUBSTITUTE(
                                    [COLUMN_ID],
                                    \"(\",
                                    REPT(\" \", LEN([COLUMN_ID]))
                                ),
                                LEN([COLUMN_ID])
                            )
                        )
                    ) - 1
                ),
            \"ISRESIDENT\", [ISRESIDENT]
        ),
        RIGHT([COLUMN_NAME], 1) = \")\" && [ISRESIDENT] = TRUE && LEFT([COLUMN_NAME], 10) <> \"RowNumber \"
    )

-- build _viewColumns for friendly names	
    VAR _viewColumns =
    SELECTCOLUMNS(
        INFO.VIEW.COLUMNS(),
        \"ID\", FORMAT([ID], \"0\"),
        \"COLUMN\", [Name],
        \"TABLE\", [Table]    
	)

-- join the two columns
    VAR _joinedTable = 
    NATURALINNERJOIN(
        _storageTableColumnSegments,
        _viewColumns
    )
	
EVALUATE
SUMMARIZE(
    _joinedTable,
    [TABLE], 
    \"COLUMN_LIST\", CONCATENATEX(
        FILTER(
            _joinedTable, 
            [TABLE] = EARLIER([Table])
        ),
        \"'\" & [TABLE] & \"'[\" & [COLUMN] & \"]\", 
        \", \", 
        \"'\" & [TABLE] & \"'[\" & [COLUMN] & \"]\", ASC
    )
)

You can also run the above DAX query directly in Fabric to check out what the code exactly does. For this, find your semantic model in your workspace, press the ellipsis and click Write DAX queries.

As you can see in the picture above, the DAX query returns a comma separated list of columns that are residential (i.e. paged into memory) grouped by table. We will be using the comma separated list later to run a simple DAX query, effectively pushing those columns back into memory.
Next, we reframe the semantic model by using the new Semantic model refresh activity. For this, just choose the workspace and model from the dropdown. For a little bit more customization capabilities, check out my previous blog article on how to reframe/refresh your semantic model from Fabric Data Pipelines.

And finally, we add a ForEach activity looping over each table with their respective comma separated list of columns. We wrap this list into an EVALUATE TOPN(1,SUMMARIZECOLUMNS(')) DAX statement, effectively querying each specified column. Once again, this approach is identical to the logic found in the warm_cache module from semantic-link-labs.

Items: @activity(‘GetResidentColumnsAndTables’).output.results[0].tables[0].rows

Below the Web activity in the for each loop calling the DAX statement for each table:

Below the inputs. Please note: Make sure the quotes are correctly set as " and not as ". When copying from here, it can be that the quotes come along in the wrong format.

Connection: pbi_connection
Relative URL: groups/[WorkspaceID]/datasets/[SemanticModelID]/executeQueries (you can find the WorkspaceID and SemanticModelID in the URL bar of your browser by clicking on the semantic model in Fabric, similar to here)
Body: @CONCAT('')
Headers: content-type as Name and application/json as Value

3. Showtime

Let’s start with checking out what columns currently are paged into memory for our semantic model. You can find the query for the below view in the appendix.

Now, lets run the pipeline:

Right at this point, during the refreshing/reframing of the semantic model, our DAX query from above returns nothing, as expected:

However, when the pipeline has successfully completed…

…we can see that all the columns from before are back into memory again.

end.

This is it! Note, this approach does not take the temperature into consideration. As you can see above, the temperature of the columns were higher after we ran our queries. Also, did you notice, that we used the new DAX Info function INFO.COLUMNS.VIEW()?

appendix.

Here the script we used above that can be useful for debugging purposes:

Copy to Clipboard

DEFINE
	-- build _storageTableColumnSegments to find residential columns
    VAR _storageTableColumnSegments = 
    FILTER(
        SELECTCOLUMNS(
            INFO.STORAGETABLECOLUMNSEGMENTS(),
            "Column Name", [COLUMN_ID],
            "ID", 
                LEFT(
                    TRIM(
                        RIGHT(
                            SUBSTITUTE(
                                [COLUMN_ID],
                                "(",
                                REPT(" ", LEN([COLUMN_ID]))
                            ),
                            LEN([COLUMN_ID])
                        )
                    ),
                    LEN(
                        TRIM(
                            RIGHT(
                                SUBSTITUTE(
                                    [COLUMN_ID],
                                    "(",
                                    REPT(" ", LEN([COLUMN_ID]))
                                ),
                                LEN([COLUMN_ID])
                            )
                        )
                    ) - 1
                ),
            "ISRESIDENT", [ISRESIDENT],
            "TEMPERATURE", [TEMPERATURE]
        ),
        RIGHT([Column Name], 1) = ")" && [ISRESIDENT] = TRUE && LEFT([Column Name], 10) <> "RowNumber "
    )

-- build _viewColumns for friendly names	
    VAR _viewColumns =
    SELECTCOLUMNS(
        INFO.VIEW.COLUMNS(),
        "ID", FORMAT([ID], "0"),
        "Name", [Name],
        "Table", [Table]    
	)

-- join the two columns
    VAR _joinedTable = 
    NATURALINNERJOIN(
        _storageTableColumnSegments,
        _viewColumns
    )

EVALUATE
	SELECTCOLUMNS(
		_joinedTable,
		"COLUMN_ID", [ID],
		"TABLE_NAME", [Table],
		"COLUMN_NAME", [Name],
		"COLUMN_NAME_STOR_TAB_COL_SEG", [Column Name],
		"FULL_COLUMN_NAME", "'" & [Table] & "'[" & [Name] & "]",
		"IS_RESIDENT", [ISRESIDENT],
		"TEMPERATURE", [TEMPERATURE]
	) ORDER BY [TABLE_NAME]

tackytechtomOctober 20, 2024tutorialdata engineering, dax, fabric, power bi 0