August 28, 2023|tutorial|

how to replace several characters in a column in just one power query step.

introduction.

Usually you would want to outsource such cleansing tasks to the source during the ETL flow i.e. into or within your lakehouse or data warehouse. But we all know that sometimes you still get dirty data and you are forced to do the cleansing yourself in Power Query or you just do not have such a data infrastructure in place. Either way, this post provides a function where you can scale that cleansing just by adding replacement rules to a cleansing table.

prerequisites.

1. Power BI Desktop

1. What’s the goal?

The goal itself becomes quite clear after illustrating the problem. Say you have the following table:

We would like to clean the values of the table as we want our data to follow the pattern Fruit123. Also, there are a few spelling mistakes in our data. The following picture marks all the issues that we want to clear out:

Power Query provides the Table.ReplaceValue function for such use cases. Unfortunately, you cannot provide the function with several values at the same time; instead you have to add a new step for each of your replacements. That would be one for deleting “-“, another one for replacing “Kiwii” with “Kiwi”, one more handling for “Aple” etc:

The goal of this blog post is to provide a solution that does not require us to add multiple steps. Instead, we will be entering all our cleaning rules into a cleaning table that will be used by a function applying all those rule onto our target table.

2. Create the cleaning table

For the function to work, we need a cleaning table with the attributes oldValue and newValue. For this, add your mapping/replacement data into a new table, by clicking Enter Data under the Home ribbon. The beauty of it? Any time you run into another cleaning case, just add a row to that table and it will automatically be applied on the target table via the function.


3. Create the function

Here, the function. If you’d like an explanation, check out the appendix I. Add a blank query and copy and paste the following code into the advanced editor window:

Copy to Clipboard

4. Showtime

To invoke the function, select it in the query pane, fill in the values and click Invoke. Note, if you do not want to create an entirely new query window and instead incorporate it into your own existing query, please check out the appendix II.

And here the result, the cleaned table:

end.

If you find yourself in the situation where you need to do cleaning on multiple target tables, it might be worthwhile to add an additional column with the name of the target table that each rule shall apply to. You would then need to add that constraint to the function as well though. In the appendix you can read up on what the function actually does, to do that customization yourself! Lastly, I wanna inform you about other Power Query tips and small scripts here on tackytech, as well as how to get the next scheduled refresh into your Power BI report.

appendix I.

There are a couple of parts to the function that need to be understood before we can make sense of it as a whole:

Copy to Clipboard

This line of code calculates how many rows there are in the CleaningTable (aka how many values shall we look for to replace) and saves that value in a variable called maxcount

Copy to Clipboard

This code block takes the value from the first row (counter = 0) of the oldValue column (CTOldValue) in the CleaningTable and looks for that value in the column (OTColumnToBeCleaned) of the target table (OriginalTable). It replaces all of these values with the value from the first row (counter = 0) found in the newValue column of the CleaningTable. Finally, it saves that table in the result variable. In other words, we replace all the values in our target table according to the rules of the first row from the CleaningTable.

Copy to Clipboard

This one is the secret sauce of the whole function. Understanding this, will help you form other iterative functions in Power Query and is a great tool to have in your toolbox. Let’s go through it.

() => [firstthing] – this acts as the base for everything to come. In our case we assign the counter variable the value 0 and we replace the first value in our target table.

each condition – here we test a specific condition for each of our iteraton. In our case, we test whether our counter variable is below our maxcount. Since the counter variable in the first iteration has the value 0, this is always true (unless there is no rows in the cleaning table). Since the condition is true, we proceed to the next part of the function (otherwise we stop the iteration).

each [secondthing] – here we first increase the counter by one and replace all the values in our target table (note, we already have done one cleaning step on that exact table) according to the next value from the cleaningtable. We save the freshly cleaned target table in the variable result.

each [thirdthing] – finally we return the table result

Copy to Clipboard

This line is needed since the List.Generate function returns a table for every iteration where the condition is true. This means, we end up with a list of maxccount – 1 rows where each row contains one version of the cleaned target table. We, however, need just the final version of our cleaned table. Thus, we specifically ask for that row.
And that was all. Just go through the code once more and check whether you grasped the whole thing.

appendix II.

There were quite a few comments and questions about how to incorporate the function as a single / one step in an existing query instead of creating an entirely new one. So here a quick walkthrough on how you can achieve this.
We start with the initial table now called “table 2” and add a dummy index column. This is just for the sake of showing that we did some kind of transformation before invoking the function. In fact, you could have either multiple transformations before running the function or none. Same applies for steps after the function has been invoked.

Now, all we need to do is to open the advanced editor once again and paste the invoking function code line:

Copy to Clipboard

Note, in the example above, the name of our function is CleaningFunction – you might need to align this. Also, make sure that the first parameter of the function (OriginalTable) equals the name of your previous step of your query (in our case #”Added Index”). Further, align the other parameters accordingly in case you are using other names for them. Lastly, you need to also update the very last row under the in with the name of the added step (in our case #”Invoke Function”). That’s it!

Now the function has cleaned the original table directly and you can continue with other transformations.

13 Comments

  1. Petra October 6, 2023 at 2:17 pm - Reply

    it looks like the query I am looking for – could you attach screenshot of the end of the query, please? I am a beginner and I am not able to end it :( thank you very much

    • tackytechtom October 6, 2023 at 4:15 pm - Reply

      Hi Petra :)

      Thank you for your message. I realised the script has disappeared from the post somehow. I republished it and now it is back.

      This behaviour has been popping up now and then and my host and myself are currently investigating why. In case, anyone else has the same problem, please just shoot me a message and I’ll send you the script.

      Thanks,

      Tom

  2. stephen strickland October 24, 2023 at 2:57 pm - Reply

    Is there a way to use this as a step in a query?

    • tackytechtom October 26, 2023 at 8:37 pm - Reply

      Hi Stephen,
      I kindly refer to appendix II :)
      /Tom

  3. Kar October 26, 2023 at 7:08 pm - Reply

    Hi Tom,
    Invoking the query will create a new table in Power Query editor. Is it possible to call for the query and clean up the table in itself?

    • tackytechtom October 26, 2023 at 8:38 pm - Reply

      Hi Kar,
      Yes, that’s doable! :) Check out appendix II.
      Hope this helps!
      /Tom

  4. James October 26, 2023 at 7:13 pm - Reply

    This is great but I am a little confused. When I set this up it creates a whole new query with that data in it, how can I either alter a column in place on a table or add a column with the updated values to a table?

    • tackytechtom October 26, 2023 at 8:39 pm - Reply

      Hi James,
      Thanks for your question. I updated the post with an appendix II that hopefully does what you are looking for :)
      Thanks
      /Tom

  5. Alex December 6, 2023 at 2:10 pm - Reply

    Great content! Greets, Alex

    • tackytechtom December 6, 2023 at 2:14 pm - Reply

      Cheers Alex! :)

      • Alex December 7, 2023 at 2:36 pm - Reply

        While applying your solution to some script / Dataflow one problem appeared..

        when I try to replace a string with a comma within – eg oldValue (EUR,) ; new Value (EUR 0,00) – the replace isn’t applied.
        whithout the comma everything is just fine. While analyzing the steps I saw that Table.ToList starts to change the value from EUR, to “EUR,”
        I tried to adjust the function the folowing:
        Text.Remove(Table.ToList(Table.SelectColumns(CleaningTable, CTOldValue)){counter}, {“”””}),
        while looking fine in the single step output (the ” characters are removed). still the replace isn’t applied.

        a normal Table.ReplaceValue step can handle the string with comma just fine..

        I am running out of ideas how to solve this. maybe you have a solution.

        KR Alex

  6. Alex December 7, 2023 at 3:16 pm - Reply

    Sorry, just a typing error on my side.
    the adjustment (wrap the line in a Text.Remove) works!

    • tackytechtom December 7, 2023 at 8:07 pm - Reply

      Hi Alex! Great you could solve the issue :) And thanks for sharing!
      /Tom

Leave A Comment