August 28, 2023|tutorial|

how to replace several characters in a column in just one power query step.

introduction.

Usually you would want to outsource such cleansing tasks to the source during the ETL flow i.e. into or within your lakehouse or data warehouse. But we all know that sometimes you still get dirty data and you are forced to do the cleansing yourself in Power Query or you just do not have such a data infrastructure in place. Either way, this post provides a function where you can scale that cleansing just by adding replacement rules to a cleansing table.

prerequisites.

1. Power BI Desktop

1. What’s the goal?

The goal itself becomes quite clear after illustrating the problem. Say you have the following table:

We would like to clean the values of the table as we want our data to follow the pattern Fruit123. Also, there are a few spelling mistakes in our data. The following picture marks all the issues that we want to clear out:

Power Query provides the Table.ReplaceValue function for such use cases. Unfortunately, you cannot provide the function with several values at the same time; instead you have to add a new step for each of your replacements. That would be one for deleting “-“, another one for replacing “Kiwii” with “Kiwi”, one more handling for “Aple” etc:

The goal of this blog post is to provide a solution that does not require us to add multiple steps. Instead, we will be entering all our cleaning rules into a cleaning table that will be used by a function applying all those rule onto our target table.

2. Create the cleaning table

For the function to work, we need a cleaning table with the attributes oldValue and newValue. For this, add your mapping/replacement data into a new table, by clicking Enter Data under the Home ribbon. The beauty of it? Any time you run into another cleaning case, just add a row to that table and it will automatically be applied on the target table via the function.


3. Create the function

Here, the function. If you’d like an explanation, check out the appendix I. Add a blank query and copy and paste the following code into the advanced editor window:

Copy to Clipboard

4. Showtime

To invoke the function, select it in the query pane, fill in the values and click Invoke. Note, if you do not want to create an entirely new query window and instead incorporate it into your own existing query, please check out the appendix II.

And here the result, the cleaned table:

end.

If you find yourself in the situation where you need to do cleaning on multiple target tables, it might be worthwhile to add an additional column with the name of the target table that each rule shall apply to. You would then need to add that constraint to the function as well though. In the appendix you can read up on what the function actually does, to do that customization yourself! Lastly, I wanna inform you about other Power Query tips and small scripts here on tackytech, as well as how to get the next scheduled refresh into your Power BI report.

appendix I.

There are a couple of parts to the function that need to be understood before we can make sense of it as a whole:

Copy to Clipboard

This line of code calculates how many rows there are in the CleaningTable (aka how many values shall we look for to replace) and saves that value in a variable called maxcount

Copy to Clipboard

This code block takes the value from the first row (counter = 0) of the oldValue column (CTOldValue) in the CleaningTable and looks for that value in the column (OTColumnToBeCleaned) of the target table (OriginalTable). It replaces all of these values with the value from the first row (counter = 0) found in the newValue column of the CleaningTable. Finally, it saves that table in the result variable. In other words, we replace all the values in our target table according to the rules of the first row from the CleaningTable.

Copy to Clipboard

This one is the secret sauce of the whole function. Understanding this, will help you form other iterative functions in Power Query and is a great tool to have in your toolbox. Let’s go through it.

() => [firstthing] – this acts as the base for everything to come. In our case we assign the counter variable the value 0 and we replace the first value in our target table.

each condition – here we test a specific condition for each of our iteraton. In our case, we test whether our counter variable is below our maxcount. Since the counter variable in the first iteration has the value 0, this is always true (unless there is no rows in the cleaning table). Since the condition is true, we proceed to the next part of the function (otherwise we stop the iteration).

each [secondthing] – here we first increase the counter by one and replace all the values in our target table (note, we already have done one cleaning step on that exact table) according to the next value from the cleaningtable. We save the freshly cleaned target table in the variable result.

each [thirdthing] – finally we return the table result

Copy to Clipboard

This line is needed since the List.Generate function returns a table for every iteration where the condition is true. This means, we end up with a list of maxccount – 1 rows where each row contains one version of the cleaned target table. We, however, need just the final version of our cleaned table. Thus, we specifically ask for that row.
And that was all. Just go through the code once more and check whether you grasped the whole thing.

appendix II.

There were quite a few comments and questions about how to incorporate the function as a single / one step in an existing query instead of creating an entirely new one. So here a quick walkthrough on how you can achieve this.
We start with the initial table now called “table 2” and add a dummy index column. This is just for the sake of showing that we did some kind of transformation before invoking the function. In fact, you could have either multiple transformations before running the function or none. Same applies for steps after the function has been invoked.