Big Data Analytics on Marketing Data

Case Studies

About Online Marketplace Company

The leading real estate marketplace. Search millions of for-sale and rental listings, compare Zestimate® home values and connect with local professionals to get into your next home with speed, certainty, and ease.

A $3.3 billion company with more than 5,500 experienced employees.


  • Python
  • PySpark
  • AWS
  • Airflow


Business Driver

The client's marketing team was looking for ways to gather data from various sources, including all major social data platforms, and import it into the client's data lake for analytics.


Selection Process

The client selected KPI over multiple vendors through a rigorous RFP process. KPI's expertise in the big data ecosystem, Airflow, PySpark, and AWS was the key differentiator for the client, in addition to KPI's blended shore model, which minimizes cost and risk for the client.


What KPI Delivered

The KPI team delivered multiple pipelines that automate the ingestion of data from various sources, including all major social media platforms such as Apple, Google, Facebook, and Twitter, into the client's data lake (AWS S3) on a daily, weekly, and monthly basis.

The data pipelines fetch data from APIs, SFTP, and S3 using Python, then perform transformations, aggregations, and consolidations using PySpark before loading the results into the client's data lake.
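To make the consolidation step concrete, here is a minimal sketch in plain Python. It is not the client's pipeline: the real work is described as PySpark, and this stand-in only mirrors the idea of normalizing rows from several sources onto one schema and aggregating them per day and campaign. The field names (`date`, `campaign`, `spend`, `clicks`) and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def normalize(source, row):
    """Map a source-specific row onto a shared schema (field names are hypothetical)."""
    return {
        "source": source,
        "day": date.fromisoformat(row["date"]),
        "campaign": row["campaign"].strip().lower(),
        "spend": float(row.get("spend", 0) or 0),
        "clicks": int(row.get("clicks", 0) or 0),
    }


def consolidate(batches):
    """Aggregate rows from all sources by (day, campaign).

    `batches` maps a source name (for example "api" or "sftp") to a list of raw rows.
    """
    totals = defaultdict(lambda: {"spend": 0.0, "clicks": 0, "sources": set()})
    for source, rows in batches.items():
        for raw in rows:
            rec = normalize(source, raw)
            bucket = totals[(rec["day"], rec["campaign"])]
            bucket["spend"] += rec["spend"]
            bucket["clicks"] += rec["clicks"]
            bucket["sources"].add(source)
    # Emit one consolidated record per (day, campaign), ordered by day then campaign.
    return [
        {
            "day": day,
            "campaign": campaign,
            "spend": round(values["spend"], 2),
            "clicks": values["clicks"],
            "sources": sorted(values["sources"]),
        }
        for (day, campaign), values in sorted(totals.items())
    ]
```

In a PySpark job the same shape would typically be a grouped aggregation over a DataFrame; the plain-Python version is used here only so the logic can be read and run without a cluster.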

The team also provided data to Gain Theory, a third-party AI/ML platform for marketing decisions that is critical to the client's business.


Business Benefits

  • The business saves hours every month through the automated data pipelines KPI has implemented
  • Data is optimized and refined for further AI/ML analytics on current and historical data
  • The client now has the ability to perform uplift analysis on campaigns to gauge the effectiveness of the advertisement
  • Supports data delivery to external vendors such as Gain Theory (GT), an AI/ML platform for marketing decisions that is critical to the client's business
  • Established data governance, including data quality and data security
  • Repeatable processes with restartability and recoverability
  • Automated scheduling, including ad-hoc runs, using Apache Airflow, so data can now be made available for any given date, including historical backfill
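
The restartability and backfill benefits above can be sketched as follows. This is an illustration in plain Python, not the Airflow API and not the client's implementation: in Airflow the scheduler enumerates run dates and tracks task state itself. The names `run_dates`, `backfill`, `load_partition`, and `completed` are hypothetical.

```python
from datetime import date, timedelta


def run_dates(start, end, cadence):
    """Yield logical run dates from start to end (inclusive).

    cadence is "daily", "weekly", or "monthly". Monthly runs after the
    first one land on the first day of each following month.
    """
    if cadence not in ("daily", "weekly", "monthly"):
        raise ValueError(f"unknown cadence: {cadence}")
    current = start
    while current <= end:
        yield current
        if cadence == "daily":
            current += timedelta(days=1)
        elif cadence == "weekly":
            current += timedelta(weeks=1)
        else:
            # Step to the first day of the next month, rolling over the year.
            year = current.year + current.month // 12
            month = current.month % 12 + 1
            current = date(year, month, 1)


def backfill(start, end, cadence, load_partition, completed):
    """Load every run date in the range that is not already in `completed`.

    `load_partition` is a caller-supplied function that loads one date.
    A date is recorded as completed only after its load returns, so a
    failed run can be restarted and will resume at the failed date.
    """
    ran = []
    for day in run_dates(start, end, cadence):
        if day in completed:
            continue  # already loaded on a previous attempt
        load_partition(day)
        completed.add(day)
        ran.append(day)
    return ran
```

Because completed dates are skipped, running the same backfill twice does no extra work, which is the property that makes a process repeatable and recoverable.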


"This work is invaluable. KPI’s Analytics Lead and the team have done SUCH an awesome job ingesting all of our disparate social data sources"

Madelyn T.
Social Media Manager


“Big-time $$ saver !!”

Justin M
Sr Manager Marketing Operations

