Migrating Code from Development to Production in the Cloud
The cost of cloud migration is not cheap but the promise is that the company will acheive a significant return on investment over the long haul. The objective of this article is to help us understand how to setup Development and Production environments in the cloud, specifically regarding these components of a data warehouse (DW):
- Data model definitions
- Reporting definitions
Setting Up Development and Production Environments in the Cloud
There are multiple options to set up a data warehouse in the cloud. For ease of understanding, we can examine one setup that has worked successfully and extrapolate from there.
Primarily, we have 3 major components in a data warehouse system:
- Database/Data Model
- ETL Tool
- Reporting/Common Logical Model Tool
These components can be hosted on a Cloud Platform like Microsoft Azure or Amazon EC2 by using Platform as a Service (PaaS) options.
For example we could host a database like SQL Server/Oracle, ETL related installations of Dell Boomi Atomsphere/Informatica Cloud and a reporting tool like Tableau/OBIEE on a Microsoft Azure Platform.
Installing all the components on one server is not recommended. The Reporting Tool is ideally installed on a one box and the ETL Tool and Database on a separate box.
There is the option of using a cloud-based ETL tool such as Informatica Cloud or Dell Boomi. Even though the ETL Tool is hosted on the cloud, a few other components, like that of an Atom or Secure Agent, should be installed where the target database resides. This helps with scheduling jobs, identity management, and specifying connection details between source and Database.
Defining the Development and Production Environments:
Data warehouse components need to be setup on the servers designated for Development/QA/Production.
- A Windows Azure box for ETL/DB Install Dell Boomi Atom + SQL Server
- A Windows Azure box for Reporting Tool Install Tableau Server
Note: In this example, the SQL Server is maintained in-house and not by the PaaS provider.
A Quality Assurance (QA) environment could also be leveraged, but that would increase the cost. As environments are added for Development/QA/Production/Other purposes, the cost for PaaS increase proportionately. Apart from the licensing costs, additional time is spent every time we migrate the code from Dev to Prod. Based on business requirements, we need to decide on the number of migration steps we want to introduce between Development and Production environments.
Migration of Code from Development to Production Environment
The migration of data warehouse components (i.e. the data model definitions, ETL and reporting/common logical model definitions) includes the movement of code for each of these components from one environment to another.
Moving Data Model Definitions
We need to create the DDL scripts of the required components in the database that are required to move which are available in the Development environment. These DDL scripts will contain the definitions of Tables, Views, Indexes, Triggers, Procedures and Functions.These DDL scripts are executed in Production to recreate the data model from Development into Production. We have options where this type of synchronization can be done through some wizard/tools, but they generally come with a licensing cost.
When we use an Integration PaaS ETL tool like Dell Boomi Atomsphere or Informatica Cloud, we need to install the Atom or Secure agent on the Production Box with a configuration that connects to the appropriate Production Source.
To move the mapping/execution plans created in the development environment to Production using Dell Boomi, we need to deploy the same execution plans using the Production Atom using a single account which has access to both the environments.
If we use Informatica Cloud, then we will have separate accounts for each environment. Here we need to use the Migrate Objects option from the destination location (Production) and specify the source account credentials (Development) in order to move objects from source to destination.
We have to be extra careful when we are migrating ETL related objects and ensure that the connections are all pointing towards Production sources and targets. The references to objects in Development shouldn't be present in Production objects.
Moving report/common logical model definitions
If we are using a reporting tool like Tableau, Cognos, or Oracle BI, we need to move the data-sources and reports/workbooks from the Development Server to the Production Server.
This is accomplished by first installing the appropriate BI Tool on the Production box. Then we open the data-sources and workbooks that need to be migrated into the BI tool and change the connection to point to the Production Server and publish them to Production.
Note - We need to ensure that the database connections are changed to point to production databases when publishing the BI tool's data-sources.
In Tableau, we can automate the publishing of workbooks by the use of the tabcmd utility which comes along with Tableau Server. We can use the same command line utility to import the users from Development to Production.
This concludes the steps necessary to migrate data warehouse code from Development/QA to a Production environment.
Ksheetij Dongre is a Principal Consultant at KPI Partners. He is a business intelligence professional specializing in data warehouse reporting. In addition to his ETL experience, he has worked extensively on reporting tools like Tableau, OBIEE, and Business Objects. He has implemented custom BI DW solutions for clients across various industries including Retail, Manufacturing, Logistics, High Technology. Check out Ksheetij's blog at KPIPartners.com.