Power BI is a business analytics solution that lets you visualize your data and share insights across your organization or embed them in your app or website. Connect to a wide variety of data sources and bring your data to life with live dashboards and reports.
Power BI connects to a large number of varied data sources, including:
Power BI works in three modes to connect to on-premise or cloud-based sources.
For small setups with few data models, data imports into Power BI will solve reporting requirements. However, for enterprises with TBs of historical data and GB's daily incremental data, the import will lead to data governance issues like:
With Import and DirectQuery modes, we have observed limitations in terms of import data size and performance dependency on source systems.
In Live Mode, Power BI doesn't make a copy of data in cache nor does it make a copy of schema / allow data modeling in Power BI design. Instead, it consumes source as is, i.e., Power BI is just the front end for your reporting.
Live Mode overcomes all scalability and performance issues by connecting to a Tabular cube defined on Azure Analysis Services OR On-Premise SQL Server Analysis Services.
Analysis Services Tabular cube comes with the below advantages.
Cube size Focus on Data model design to minimize volume and size. As we learned Azure AS /SSAS vertipaq engine stores data in columnar format, it is advised to minimize the column dictionary by reducing the unique values in aggregated tables. If we include a fact table (10+mln records) with a surrogate key column, it creates a columnar dictionary of 10+mln records. However, the surrogate key is not actually used in PowerBI reporting.
Remove all ETL columns For ETL maintenance, certain date and datetime columns, Surrogate keys often used in DWH development, which are not necessary for reporting. Hence, stop using the same in the data model and avoid high volume data dictionary.
Are all columns relevant take conscious decisions to include only necessary columns for reporting. Pull only selected columns that are the most relevant and are needed for the reporting. Instead of pulling the entire table and hiding which are not required for reporting, import only required columns and avoid wide tables.
Partition your facts Azure AS/SSAS allows you to partition fact tables, improving cube performance. In most cases, the current period (day/week/month/quarter) only gets updates, so it is advised to create the right partition and minimize overall cube processing time by process relevant partitions. This will certainly allow multiple processing windows and the availability of the latest data in the reporting layer.
While Designing Azure AS/SSAS cubes, use business-friendly names for tables/columns/calculations because the same will be visible when PowerBI reports are designed.
Hide all columns in the tables that are not intended for reporting like the Key columns, Base Measure columns and any other metadata-related columns.
Use a standard Date Dimension defined in the Data Warehouse or DataMart as the date dimension source.
Mark the Date Dimension as such to Enable Time Intelligence in the Tabular cube.
Calculations By limiting complex calculations to Azure AS/SSAS cube, it encapsulates business logic and ensures a single version across all users. However, more calculations in the cube occupy more memory and eventually slows down performance. To mitigate this, plan to add calculations in the Import query OR include it in the dimensional model population.
Optimize expressions Use tools like DAX studio to analyze execution plan and cost analysis of Data Analysis Expressions used in calculations.
Use inexpensive functions - ADDCOLUMNS instead of SUMMARIZE or COUNTROWS instead of DISTINCTCOUNT.
Try to implement Vertipaq complaint code rewrite nested functions, so that it is resolved as a single function.
Filter unused rows - make sure to remove empty rows or the rows that are not required before an expensive computation to minimize the overall dataset and improve the performance.
For a memory-intensive application like Azure AS/SSAS Tabular, it is important to do a quality due diligence considering various factors, few mentioned below
CPU Speed: Faster CPU speed helps in computing results faster.
CPU Cores: Address concurrency issues by increasing CPU cores while striking a balance between CPU speed and cores.
CPU Cache: as retrieving data from the CPU cache is much faster than retrieving data from RAM, plan it more.
RAM size: Make sure RAM size is more than two times the cube size.
RAM Speed: For a memory-intensive application like Azure AS/SSAS Tabular, faster the RAM speed, the better is the performance.
Consider switching off Or minimize the interaction between visuals, which reduces the query load when users cross-highlight.
Limit the number of visuals and the data per visuals.
Enabling row-level security can result in large changes in performance. Be sure to test the different row-level security roles that your users will assume.
Visuals pinned to dashboards are served by the query cache when the dashboard loads. Conversely, when visiting a report, the queries are made on-the-fly to the data source either the Power BI service (in the case of import) or the data source that you specify (in the case of DirectQuery or live connection).
Chandra Matam is a Sr Technical Manager at KPI Partners and has around 15+ years of experience in data warehousing, governance, modelling and designing business intelligence applications. He has extensively worked on Microsoft BI, PowerBI and Azure data projects using Azure data factory, Azure databricks, azure SQL DWH. He is working on developing optimized real-time ETL processes using spark,python