The client's marketing team was looking for ways to gather data from various sources, including all major social data platforms, and import it into the client's data lake for analytics.
The client selected KPI over multiple vendors through a rigorous RFP process. KPI's expertise in the Big Data ecosystem, Airflow, PySpark, and AWS was the key differentiator for the client, along with KPI's blended shore model, which minimizes cost and risk for the client.
What KPI Delivered
The KPI team delivered multiple pipelines that automate the ingestion of data from various sources, including all major social media platforms such as Apple, Google, Facebook, and Twitter, into the client's data lake (AWS S3) on a daily, weekly, and monthly basis.
The data pipelines fetch data from APIs, SFTP, and S3 using Python, then perform transformations, aggregations, and consolidations using PySpark before loading the results into the client's data lake.
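The consolidation step described above can be illustrated with a minimal sketch. It is written in plain Python as a stand-in for the PySpark jobs, which are not shown in this case study; the field names, the `FIELD_MAPS` table, and the `normalize` and `consolidate` helpers are all invented for illustration and do not reflect any platform's actual API or the delivered code. The idea is that records from each source are mapped to one common schema and then aggregated per date, source, and campaign.

```python
from collections import defaultdict

# Hypothetical mapping from each source's field names to a common schema.
# These names are assumptions for illustration, not real API fields.
FIELD_MAPS = {
    "facebook": {"date": "report_date", "campaign": "campaign_label", "spend": "cost"},
    "twitter": {"date": "day", "campaign": "campaign", "spend": "billed_amount"},
}


def normalize(source, record):
    """Map one raw record from a source onto the common schema."""
    fields = FIELD_MAPS[source]
    return {
        "source": source,
        "date": record[fields["date"]],
        "campaign": record[fields["campaign"]],
        "spend": float(record[fields["spend"]]),
    }


def consolidate(batches):
    """Aggregate spend per (date, source, campaign) across all sources.

    `batches` maps a source name to a list of raw records from that source.
    """
    totals = defaultdict(float)
    for source, records in batches.items():
        for record in records:
            row = normalize(source, record)
            totals[(row["date"], row["source"], row["campaign"])] += row["spend"]
    # Sort keys so the output order is deterministic.
    return [
        {"date": d, "source": s, "campaign": c, "spend": round(total, 2)}
        for (d, s, c), total in sorted(totals.items())
    ]
```

In PySpark, the same shape would typically be a column rename per source, a union of the resulting DataFrames, and a `groupBy(...).agg(...)` over the common columns.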
KPI also provided data to Gain Theory, a third-party AI/ML platform for marketing decisions, which is critical to the client's business.
- The business saves hours every month through the automated data pipelines KPI implemented. The data is also optimized and refined for further AI/ML analytics on current and historical data
- The client now has the ability to perform uplift analysis on campaigns to gauge the effectiveness of the advertisement
- Supports data delivery to external vendors such as Gain Theory (GT), an AI/ML platform for marketing decisions, which is critical to the client's business
- Established data governance, including data quality and data security
- Repeatable processes with restartability and recoverability
- Automated ad hoc data scheduling using Apache Airflow, so data can be made available for any given date, including historical backfills
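The restartability and backfill points above can be sketched as follows. This is a plain-Python illustration of the general pattern, not the Airflow DAGs KPI delivered, which are not shown here. It assumes each run is keyed by a date and writes to its own date partition; the `dt=YYYY-MM-DD` layout, the local filesystem standing in for S3, and the `partition_path`, `run_for_date`, and `backfill` helpers are hypothetical. Because a finished partition is skipped and a partial one is never left in place, a failed run can simply be restarted over the same date range.

```python
import json
from datetime import timedelta
from pathlib import Path


def partition_path(root, run_date):
    """Return the output file for one run date (hypothetical dt= layout)."""
    return Path(root) / f"dt={run_date.isoformat()}" / "data.json"


def run_for_date(root, run_date, fetch):
    """Ingest one date. Returns True if data was written, False if skipped."""
    target = partition_path(root, run_date)
    if target.exists():
        # Already ingested by an earlier run, so a restart skips it.
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first, then rename, so a crash mid-write
    # never leaves a half-written partition that looks complete.
    tmp = target.with_suffix(".tmp")
    tmp.write_text(json.dumps(fetch(run_date)))
    tmp.replace(target)
    return True


def backfill(root, start, end, fetch):
    """Run every date from start to end inclusive; return the dates written."""
    written = []
    day = start
    while day <= end:
        if run_for_date(root, day, fetch):
            written.append(day)
        day += timedelta(days=1)
    return written
```

Calling `backfill` twice over overlapping ranges only writes the dates that are still missing, which is the behavior that makes reruns after a failure safe. In Airflow, the run date would normally come from the scheduler rather than from a loop like this one.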
"This work is invaluable. KPI’s Analytics Lead and the team have done SUCH an awesome job ingesting all of our disparate social data sources"
Social Media Manager
“Big-time $$ saver !!”
Sr Manager Marketing Operations