Incremental Refresh in Power BI, Part 1: Implementation in Power BI Desktop

Incremental refresh (IR) means loading data incrementally, a technique that has been used in ETL for data warehousing for a long time. Let us discuss incremental refresh (or incremental data loading) in simple language to better understand how it works.

From a data movement standpoint, there are always two options when we transfer data from location A to location B:

  1. Truncation and load: We transfer the data as a whole from location A to location B. If location B already has some data, we truncate location B entirely and reload all the data from A to B
  2. Incremental load: We transfer the data as a whole from location A to location B only once, the first time. After that, we only load the data changes from A to B. In this approach, we never truncate B. Instead, we only transfer the data that exists in A but not in B
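
The two options above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Power BI implements it: the dictionaries `location_a` and `location_b` and both function names are made up for this example, and rows are matched by a simple key.

```python
def truncate_and_load(source, target):
    """Empty the target, then copy every row from the source."""
    target.clear()
    target.update(source)
    return len(source)  # number of rows transferred


def incremental_load(source, target):
    """Copy only the rows that exist in the source but not yet in the target."""
    missing = {key: row for key, row in source.items() if key not in target}
    target.update(missing)
    return len(missing)  # number of rows transferred


# Hypothetical sample data: row key -> row value
location_a = {1: "2024-01-01", 2: "2024-01-02", 3: "2024-01-03"}
location_b = {}

first_run = incremental_load(location_a, location_b)   # first run moves everything
location_a[4] = "2024-01-04"                            # a new row arrives in A
second_run = incremental_load(location_a, location_b)  # only the new row moves
full_reload = truncate_and_load(location_a, location_b)

print(first_run, second_run, full_reload)
```

The second incremental run transfers a single row, while the truncation and load transfers all four rows again.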

When we refresh the data in Power BI without having configured an incremental refresh, Power BI uses the first approach, truncation and load. In Power BI, that approach only applies to tables with Import or Dual storage modes. Previously, incremental load was also available only on tables with either Import or Dual storage modes, but Microsoft’s announcement of Hybrid Tables greatly affects how incremental load works. With Hybrid Tables, incremental load is available on a portion of the table when a specific partition is in DirectQuery mode, while the rest of the partitions are in Import storage mode.

Incremental refresh used to be available only on Premium capacities, but since February 2020 it has also been available in Power BI Pro with some limitations. However, Hybrid Tables are currently available only on Power BI Premium Capacity and Premium Per User (PPU), not Pro. Let’s hope that Microsoft will change its licensing plan for Hybrid Tables in the future and make them available in Pro.

I will write about Hybrid Tables in a future blog post.

When we successfully configure the incremental refresh policies in Power BI, we always have two ranges of data: the historical range and the incremental range. The historical range includes all data processed in the past, and the incremental range is the current range of data to process. Incremental refresh in Power BI always looks for data changes in the incremental range, not the historical range. Therefore, the incremental refresh will not notice any changes in the historical data. When we talk about data changes, we mean rows that are inserted, updated or deleted. Note, however, that the incremental refresh treats an updated row as the deletion of the old row followed by the insertion of a new row.

Benefits of Incremental Refresh

Configuring incremental refresh is beneficial for large tables with hundreds of millions of rows. The following are some benefits of configuring incremental refresh in Power BI:

  • The data refreshes much faster than when we truncate and load the data as the incremental refresh only refreshes the incremental range
  • The data refresh process is less resource-intensive than refreshing the entire data all the time
  • The data refresh is less expensive and more maintainable than the non-incremental refreshes over large tables
  • The incremental refresh is unavoidable when dealing with massive datasets with billions of rows that do not fit into our data model in Power BI Desktop. Remember, Power BI uses an in-memory data processing engine; therefore, it is improbable that our local machine can handle importing billions of rows of data into memory

Now that we understand the basic concepts of the incremental refresh, let us see how it works in Power BI.

Implementing Incremental Refresh Policies with Power BI Desktop

We can currently configure incremental refresh in Power BI Desktop and in Dataflows contained in a Premium Workspace. This blog post looks at the incremental refresh implementation within Power BI Desktop.

After successfully implementing the incremental refresh policies with the desktop, we publish the model to Power BI Service. The first data refresh takes longer as we transfer all data from the data source(s) to Power BI Service for the first time. After the first load, all future data refreshes will be incremental.

How to Implement Incremental Refresh

Implementing incremental refresh in Power BI is simple. There are two generic parts of the implementation:

  1. Preparing some prerequisites in Power Query and defining incremental policies in the data model
  2. Publishing the model to Power BI Service and refreshing the dataset

Let’s briefly get to some more details to quickly understand how the implementation works.

  • Preparing Prerequisites in Power Query
    • We need to define two parameters with the DateTime data type in Power Query Editor. The names of the two parameters are RangeStart and RangeEnd, which are reserved for defining incremental refresh policies. As you know, Power Query is case-sensitive, so the names of the parameters must be exactly RangeStart and RangeEnd.
    • The next step is to filter the table by a DateTime column using the RangeStart and RangeEnd parameters, so that we only keep the rows whose DateTime value is between RangeStart and RangeEnd.
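
To make the filtering step more concrete, here is a minimal sketch in Python. In Power BI the filter is written in Power Query, not Python, so this only models the logic. The `sales` rows, the `OrderDate` column and the `filter_by_range` function are hypothetical; only the RangeStart and RangeEnd names come from Power BI.

```python
from datetime import datetime


def filter_by_range(rows, range_start, range_end):
    """Keep rows whose OrderDate is on or after range_start and before range_end."""
    return [row for row in rows if range_start <= row["OrderDate"] < range_end]


# Hypothetical sample rows
sales = [
    {"OrderId": 1, "OrderDate": datetime(2023, 12, 31, 23, 59)},
    {"OrderId": 2, "OrderDate": datetime(2024, 1, 1, 0, 0)},
    {"OrderId": 3, "OrderDate": datetime(2024, 1, 15, 8, 30)},
    {"OrderId": 4, "OrderDate": datetime(2024, 2, 1, 0, 0)},
]

# Both parameters are DateTime values, as the incremental refresh requires
RangeStart = datetime(2024, 1, 1)
RangeEnd = datetime(2024, 2, 1)

january = filter_by_range(sales, RangeStart, RangeEnd)
january_ids = [row["OrderId"] for row in january]
print(january_ids)
```

Only the orders dated within January are kept. The order dated exactly at RangeEnd is left out because this sketch puts the “equal to” on the RangeStart side only.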

Notes

  • The data type of the parameters must be DateTime
  • The data type of the column we use for incremental refresh must be Int64 (integer), Date or DateTime. Therefore, in scenarios where our table has a smart date key instead of a Date or DateTime column, we have to convert the RangeStart and RangeEnd parameters to Int64
  • When we filter a table using the RangeStart and RangeEnd parameters, Power BI uses the filter on the DateTime column for creating partitions on the table. So it is important to pay attention to the DateTime ranges when filtering the values: only one of the filter conditions may have an “equal to” on RangeStart or RangeEnd, not both. Otherwise, a row that falls exactly on a boundary ends up in two partitions
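
The last two notes can be illustrated with a small Python sketch. It is only a model of the logic under some assumptions: the smart date key is assumed to be an integer in yyyymmdd form, and the function names, the sample keys and the two monthly partitions are made up for this example.

```python
from datetime import datetime


def to_date_key(value):
    """Convert a DateTime value to an integer date key in yyyymmdd form."""
    return value.year * 10000 + value.month * 100 + value.day


def count_loaded_rows(date_keys, partitions, equal_on_both_sides):
    """Count the rows loaded across all partitions for a given filter style."""
    total = 0
    for start, end in partitions:
        low, high = to_date_key(start), to_date_key(end)
        for key in date_keys:
            upper_ok = key <= high if equal_on_both_sides else key < high
            if low <= key and upper_ok:
                total += 1
    return total


# Hypothetical smart date keys; 20240201 sits exactly on a partition boundary
date_keys = [20240131, 20240201, 20240215]

# Two consecutive monthly partitions as (RangeStart, RangeEnd) pairs
partitions = [
    (datetime(2024, 1, 1), datetime(2024, 2, 1)),
    (datetime(2024, 2, 1), datetime(2024, 3, 1)),
]

equal_on_one_side = count_loaded_rows(date_keys, partitions, equal_on_both_sides=False)
equal_on_both = count_loaded_rows(date_keys, partitions, equal_on_both_sides=True)
print(equal_on_one_side, equal_on_both)
```

With the “equal to” on one side only, the three rows are loaded once each. With “equal to” on both sides, the row on the boundary is loaded by both partitions, so four rows are loaded in total.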

Good Practices: Always Display the Reporting Environment

When you work on real-world projects in Power BI, you will probably have different environments like DEV, UAT, Pre-Prod and Prod. It is important for you and your audience to know where the data is coming from: am I looking at DEV or UAT data, or am I actually looking at real data in the Production environment? You may have asked, or been asked, a question like “Where is the data coming from?”. It is important to know how trustworthy the data you’re analysing is. In this post I show you an easy way to display the environment your Power BI report is connected to.

How It Works

To display the environment name, you use a query parameter, then you reference that parameter, turn it into a table and add columns to show the environments accordingly. Easy, right?

Read more about query parameters from a list output here.

Depending on your scenario, the implementation might be slightly different, but the principles are the same. In this post I use a SQL Server database, so I need to parameterise the server name. In the real world you may also need to parameterise the database name. Again, your case may be quite different: if you get data from Excel, for example, the Excel file path can be different for different environments. Let’s dig in.

  • Open Query Editor
  • Click “Manage Parameters”
  • Click “New”
    • Enter “Name” and “description”
    • Tick “Required”
    • Select “Text” in “Type”
    • Select “List of values” in “Suggested Values” and type in server names for different environments
    • Pick a “Default Value” and a “Current Value”
    • Click OK
Creating new parameter in Power BI Query Editor

So far you have created a new parameter that can be used to get data from a SQL Server data source. The next steps show you how to use that parameter to show the environments in your reports.
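
The idea of turning the parameter into a small table with an environment column can be sketched in Python as follows. In Power BI this is done in Query Editor, so this only models the logic. The server names, the `ENVIRONMENTS` mapping and the `environment_table` function are all hypothetical.

```python
# Hypothetical server names; use the values you typed into the parameter
ENVIRONMENTS = {
    "sql-dev-01": "DEV",
    "sql-uat-01": "UAT",
    "sql-prod-01": "PROD",
}


def environment_table(server_name):
    """Return a one-row table that labels the server with its environment."""
    environment = ENVIRONMENTS.get(server_name.lower(), "Unknown")
    return [{"Server": server_name, "Environment": environment}]


# The current value of the query parameter
server_parameter = "sql-uat-01"

table = environment_table(server_parameter)
print(table)
```

A report can then show the Environment value in a card, so that the audience can see which environment the data comes from.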

Webinar Materials: Power BI Under the Hood

Pass DW BI VC Power BI Under the Hood with Soheil Bakhshi

I’d like to thank you all for attending the webinar held on 30th September 2016. I talked about some amazing under-the-cover aspects of the Power BI Desktop model. In this session you learnt:

If you missed the webinar, you can watch it online here:

Download the PowerPoint presentation file here:

[gview file=”https://www.biinsight.com/wp-content/uploads/2016/10/Pass-DW-BI^LJ-Power-BI-Under-The-Hood-with-Soheil-Bakhshi-Presentation-2.pptx”]

Here is the PDF version of presentation:

[gview file=”https://www.biinsight.com/wp-content/uploads/2016/10/Pass-DW-BI^LJ-Power-BI-Under-The-Hood-with-Soheil-Bakhshi-Presentation.pdf”]

Power BI Desktop Query Parameters, Part 3, List Output

List Output in Power BI Query Parameters

In the previous posts, here and here, I explained how you can use Power BI Desktop Query Parameters for many different use cases. The Power BI development team added another cool feature to Power BI Desktop in July 2016: the ability to add a List Query output to a query parameter as its “Suggested Values” (formerly “Allowed Values”). This feature is very useful, and from now on we are not restricted to providing a static list of values in “Manage Parameters”. In this post I show you how to use a list output in query parameters.

Note: This feature is NOT available in DirectQuery mode at the time of writing this post.

Requirements

In this post as usual I’ll connect to a SQL Server database as a sample. To be able to follow this post you have to have:

  1. The latest version of Power BI Desktop (current version is 2.38.4491.282 64-bit (August 2016))
  2. AdventureWorksDW

Scenario

In the first post of this series I explained how to create dynamic data sources using Query Parameters. You also learnt how to use Query Parameters in Filter Rows. But what if we want to filter query results based on the values of a column from a particular table? Previously, we couldn’t handle these sorts of scenarios with Query Parameters, for example filtering FactInternetSales based on selected values of the EnglishProductName column from DimProductCategories. Now we can easily implement them.
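
Before we implement it in Power BI Desktop, here is a rough Python sketch of the scenario. It only models the logic: the list of suggested values comes from a query instead of a static list, and the selected value filters the fact rows. The sample rows and the function names are made up for this example.

```python
# Hypothetical sample rows standing in for the product and sales tables
products = [
    {"ProductKey": 1, "EnglishProductName": "Road Bike"},
    {"ProductKey": 2, "EnglishProductName": "Helmet"},
    {"ProductKey": 3, "EnglishProductName": "Road Bike"},
]
internet_sales = [
    {"SalesOrder": "SO1", "ProductKey": 1},
    {"SalesOrder": "SO2", "ProductKey": 2},
    {"SalesOrder": "SO3", "ProductKey": 3},
]


def suggested_values(rows, column):
    """Build the list of distinct values that the parameter can take."""
    return sorted({row[column] for row in rows})


def filter_sales(sales, product_rows, selected_name):
    """Keep the sales rows whose product matches the selected parameter value."""
    keys = {
        row["ProductKey"]
        for row in product_rows
        if row["EnglishProductName"] == selected_name
    }
    return [sale for sale in sales if sale["ProductKey"] in keys]


names = suggested_values(products, "EnglishProductName")
selected = names[1]  # the value picked for the parameter
orders = [sale["SalesOrder"] for sale in filter_sales(internet_sales, products, selected)]
print(names, selected, orders)
```

The list of names changes whenever the product rows change, which is the advantage of a list output over a static list of values.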

Let’s implement this scenario.

Loading Data into the Model:

  • Open Power BI Desktop
  • Get data from SQL Server and connect to Adventure Works DW 2016 CTP3
  • Select “FactInternetSales”, “DimProduct”, “DimProductSubCategory” and “DimProductCategory” tables then click “Load”

Power BI Desktop Loading Data into the Model

  • Switch to the “Relationships” view to make sure the relationships were detected correctly, then click “Edit Queries” from the ribbon

Power BI Desktop Edit Queries
