avatarNikola Ilic

Summary

Query folding in Power BI is a critical optimization technique that leverages the underlying data source's capabilities to efficiently handle complex data transformations, which is essential for improved performance and effective data shaping.

Abstract

Query folding is an essential concept in Power BI's Power Query tool, allowing for data transformations to be executed at the data source level, thereby enhancing performance and reducing load times. By "folding" multiple M language transformation steps into a single SQL query or equivalent, the Mashup engine can significantly streamline the data preparation process. This is particularly useful when working with large datasets in relational databases and other supported data sources such as OData, SSAS, SharePoint lists, and Exchange. While not all Power Query transformations can be folded, common ones like removing or renaming columns, filtering, grouping, and summarizing can. The ability of query folding to push transformations to the data source depends on the data source's querying language support and the specific transformations applied. The article emphasizes the importance of understanding data shaping and query folding to optimize Power BI data models and encourages readers to anticipate common pitfalls, such as treating similar data entities as separate ones, which can be mitigated during the data shaping phase. The upcoming parts of the series are expected to provide deeper insights into query folding's benefits and how to determine if a query will fold, highlighting the nuanced aspects that affect foldability.

Opinions

  • The author suggests that understanding query folding is crucial for anyone dealing with data shaping and business intelligence solutions in Power BI.
  • Query folding is praised for its ability to execute complex data transformations close to the source, which can lead to better performance by leveraging the efficient computation capabilities of the source's engine.
  • Data shaping is highlighted as a key process in data preparation that should ideally occur before data enters the data model, emphasizing the importance of clean, high-quality data.
  • The author indicates that data shaping within Power Query offers significant advantages for users with minimal coding skills by automating transformation processes upon dataset refresh.
  • The author notes that query folding can only be leveraged with specific data sources, with relational databases and some non-relational databases being mentioned as those that can benefit from this feature.
  • Some Power Query transformations are acknowledged as non-foldable, which can impact the overall data processing efficiency when used.
  • The importance of consistency in naming conventions for entities like cities (e.g., New York vs. NYC) is mentioned as a critical aspect of the data shaping process.
  • The author expresses that the true value of Power Query's Mashup engine lies in its ability to push transformations to the source through query folding, effectively reducing the volume of data being moved and transformed at the destination.
  • The article sets expectations for follow-up discussions that will explore why query folding matters and provide guidance on identifying foldable queries, with a nod to the idea that the details often contain the critical insights.

What is a Query folding in Power BI and why should I care?

“Will that break a query folding?” “Does your query fold?”… Maybe someone asked you those questions, but you were like: “Query…Whaaaat?!” In this series of articles, let’s learn one of the most important concepts in Power Query!

https://www.pexels.com/photo/green-grass-field-with-wind-turbines-under-blue-sky-6604984/

“Will that break a query folding?” “Does your query fold?”… Maybe someone asked you those questions, but you were like: “Query…Whaaaat?!”

Or, maybe you’ve heard about query folding in Power BI, but didn’t know how to take advantage of it in real-life scenarios.

If you recognized yourself in (at least) one of the two situations specified above, then please continue reading this short series of blog posts!

Fine, you are curious to find out what is a Query folding. But, first things first…Before you get into it, we’ll need to put some theoretical foundations which will place the Query folding feature in the proper context.

Data Shaping

I’ve already written about data shaping and why is it one of the key concepts in the data preparation phase. Now, I would like to expand on that in a (maybe) unusual way:

I guess you all know about the book written by Thomas More, called “Utopia”.

In that story, everything is perfect and everyone is satisfied. In an ideal world, let’s call it “Data Utopia”, we have clean, high-quality data, that just flies into our reports “as-it-is”, without needing to perform any kind of face-lifting or transformations along the way. Unfortunately, “Data Utopia” can exist only in books — the reality is crueler — as we have to deal with a whole bunch of challenges while nurturing our data.

That being said, one of the key concepts that we have to absorb is Data Shaping. Data shaping is the process you should perform once you get familiar with your data, and become aware of possible pitfalls within the data you are planning to use in your business intelligence solution.

I’ve intentionally used the term “Business Intelligence” instead of “Power BI”, as this is a general concept that should be used outside of Power BI solutions too.

In most simple words, data shaping is the process of data consolidation, BEFORE it becomes part of your data model. The key thing to keep in mind is the word: BEFORE! So, one would perform data shaping before the data goes into the report itself. Data shaping can be done at different places, and, depending on where you apply data shaping techniques, at different points in time during the data preparation process.

WHERE should you perform data shaping?

Source Database — This is the most obvious choice and in most cases the most desirable scenario. It is based on traditional data warehousing principles of Extracting-Transforming-Loading (ETL) data. In this scenario, you define what data you want to extract (not all data from the database is needed, and it’s usually not a good idea to import all the data). Then, you decide if your data needs to be transformed along the way, to better suit your reporting needs — for example, do you want to perform currency conversion, or do you need to conform country and city names.

Do you recognize the city in the following image?

https://www.pexels.com/photo/skyline-photo-of-empire-state-building-in-new-york-city-2190283/

Yes, it’s New York. Or, is it NYC? Or, is it New York City? Which one of these three names is correct? Yes, all of them are correct — but if you import the data in your data model like this, you will get incorrect results — as each New York, NYC, and New York City will be treated as a separate entity. This, and many more potential caveats, need to be solved during the Data Shaping phase, and that’s why important to spend some time massaging your data.

Power Query

If you don’t perform data transformations on the source side, the next station is Power Query — it’s the built-in tool within Power BI, that enables you to perform all kinds of transformations to your data. According to Microsoft’s official documentation, you can apply more than 300 different transformations, and that number is constantly increasing!

The key advantage of Power Query is that you can perform complex data transformations with little or no coding skills! Additionally, all steps you’ve applied during the data transformation process are being saved, so every time you refresh your dataset, those steps will be automatically applied to shape your data and prepare it for consuming via reports.

Under the hood of Power Query is a Mashup engine, that enables your data shaping to run smoothly. Power Query uses very powerful M language for data manipulation. And, now you are probably asking yourselves, what all this story about data shaping, Power Query, Mashup engine, M language, etc. has to do with Query folding. I don’t blame you, it’s a fair question, but we will come back soon to answer it.

What is a Query folding?

For some data sources, such as relational databases, but also non-relational data sources, for example, OData, AD, or Exchange, Mashup engine is able to “translate” M language to a language that the underlying data source will “understand” — in most cases it’s SQL.

https://www.pexels.com/photo/black-cables-1054397/

By pushing complex calculations and transformations directly to a source, Power Query leverages the capabilities of the robust relational database engines, that are built to cope with large volumes of data in the most efficient way.

That ability of Power Query’s Mashup engine to create a single SQL statement combining all M statements behind your transformations is what we call Query folding.

Or, let`s make it simple: if the Mashup engine is able to generate a single SQL query that is going to be executed on the data source side, we say that the query folds.

Data sources that support Query folding

As already mentioned, the most obvious beneficiary of query folding is relational database sources, such as SQL Server, Oracle, or MySQL. However, it`s not just that SQL databases take advantage of the query folding concept. Essentially, any data source that supports some kind of querying language, can possibly take advantage of query folding. Those other data sources are OData, SSAS, Sharepoint lists, Exchange and AD.

On the other hand, when you use data sources such as Excel files, BLOB storage files, flat files, etc. in your Power BI datasets, the query can’t fold.

Data Transformations that support Query folding

However, when it comes to data sources that support query folding in general, it’s important to keep in mind that not all transformations can be folded and pushed to a data source. So, just to be clear, a fact that SQL database supports query folding, doesn’t necessarily mean that your query will fold! There are some Power Query transformations that simply can`t be pushed to a SQL database engine.

Very often, some subtle differences in the Power Query transformations can be decisive in the final outcome, and if your query will fold or not. I’ll show you a few of those subtle differences in the next parts of this series.

Generally speaking, the following transformations, when applied in Power Query, can be “translated” to a single SQL statement:

  • Removing columns
  • Renaming columns
  • Filtering rows, with static values or Power Query parameters, as they are treated as WHERE clause predicates in SQL
  • Grouping and summarizing, which have equivalent in SQL’S Group by clause
  • Merging of foldable queries based on the same source — as this operation can be translated to JOIN in SQL. When I said, merging of foldable queries — that means it will work if you are joining two SQL server tables, but it will not work if you are trying to join SQL table and Excel file
  • Appending foldable queries based on the same source — this transformation relates to the UNION ALL operator in SQL
  • Adding custom columns with simple logic. What does simple logic mean? Using M functions that have equivalents in SQL language, for example, mathematical functions, or text manipulation functions
  • Pivot and Unpivot transformations
https://www.pexels.com/photo/wood-red-industry-writing-3018978/

On the other hand, some transformations that will prevent the query from folding are:

  • Merging queries based on different sources, as explained previously
  • Appending (union-ing) queries based on different sources — similar logic as in the previous case
  • Adding custom columns with complex logic or using some M functions that don’t have a counterpart in SQL
  • Adding index columns
  • Changing a column data type. This one is a typical “it-depends” case. I will show you in the next part of the series what it depends on, but just keep in mind that changing a column data type can be both foldable and non-foldable transformation

Conclusion

By learning about Data Shaping and Power Query, we laid a solid theoretical background to understand the Query folding feature in Power BI, or maybe it’s better to say in Power Query Editor.

In the next part of the series, we will examine why Query folding is so important, and why should we care about the fact if the query folds or not. We will also learn how to check if the query folds or not, and why people say that the devil is in the detail…

Thanks for reading!

Become a member and read every story on Medium!

Subscribe here to get more insightful data articles!

Power Bi
Towards Data Science
Data Science
Data
Power Query
Recommended from ReadMedium