Exploring the different ways Tableau connects to data and whether dimensional modeling is necessary when using Tableau’s architecture.
Tableau has two primary ways to connect to data sources and work with them as shown in this simplified architecture diagram shown below .
What does In-Memory Data mean?
Basically, In-Memory Data means that Tableau stores data locally in your computer in a columnar format, using data compression along with a design to take advantage of the computer’s RAM and processors.
It is not necessary for all the data to fit in the RAM (random access memory). When the data set is larger than the available RAM, the Tableau Data Engine intelligently allocates only a portion of the data extract to reside in the RAM.
The advantage of this design is that we can avoid connecting to databases with slow analytic query performance and do our analysis at a faster pace by extracting the data locally. This also reduces the load on the source database.
The disadvantages are that we will be creating a silo of data and not working with the latest data. Also, in order to create an extract of data, it will take some time initially to pull in the data into the local system, which is a one-time activity until you need another refreshed set of data.
When should we use a Live or Direct Connection?
When there is a business need to use real-time data for decision making, a live connection to the data is required. If there are high speed analytical databases as part of the infrastructure, local extracts of data are not required and direct connection can be made to these fast databases.
The primary issue with direct connections is when there is not fast database infrastructure in place and the queries take a long time to process.
What type of data model should we use with Tableau?
Traditionally, dimensional data models or star schemas have been used in a data warehouse to serve the purposes of integrating data from different sources and improving the performance of analytical queries. With Tableau’s data engine, we may or may not need a dimensional model in order to support our analysis depending on whether we are using In-Memory Data or a Direct Connection and considering Performance and Data Integration objectives.
From a performance perspective, there is no need for a dimensional model in the database if a Tableau extract is used. If a fast database infrastructure is available, a dimensional model is not required to improve the performance of queries.
It is preferred to use a dimensional model if a direct connection is made to a non-high sped analytical database.
The decision tree below is a visual guide to help determine whether nor not we need a dimensional model. The decision is based on whether the business needs real time data and if the infrastructure has high speed analytical database.
From a data integration perspective, Tableau provides a feature called Data Blending where different data sources can be integrated based on common fields/dimensions. For purposes of data integration, a dimensional model is still recommended to help properly conduct master data management. It is also recommended that conformed dimensions are utilized for the purpose of easier data analysis across different facts & data sources.
Tableau has revolutionized the Business Intelligence (BI) domain with its self-service BI offering. Tableau features outstanding visual interfaces and data blending. Utilizing a star schema in the Tableau world depends heavily on the data integration objectives, real-time data requirements, and the type of database infrastructure available.
Avinash Mohan is a Manager at KPI Partners. He is a Reporting and DW specialist who has worked within in the industry since 2001. His experience includes working with tools such as Business Objects, Tableau, Dell Boomi, Informatica, Netsuite ERP, Oracle BI Applications and Oracle E-Business Suite. Check out Avinash's blog at KPIPartners.com.