Edmunds Tech

Building a Data Warehouse for Business Analytics using Spark SQL - Spark Summit 2015

Edmunds.com is a car-shopping website that serves more than 19 million visitors each month, and we rely heavily on data analysis to optimize the experience for each visitor. To accomplish that goal, the engineering team at Edmunds processes terabytes of data, and our business analysts use rich visualizations of traffic, revenue, and car-lead metrics to gain insights into the car shopper's journey. When our team was faced with the challenge of speeding up the pipeline and empowering business analysts to be fully self-sufficient in creating, aggregating, and visualizing datasets, we decided to use Apache Spark. This talk is about that migration process and the bumps along the road. First, the talk addresses the technical hurdles we had to clear in bringing up Spark, including the process of exposing our data in S3 for productionized ETL and ad hoc analysis using Spark SQL in combination with libraries that we built in Scala. Then, it covers the benefits we achieved: better data refresh intervals, faster query times, and even increased productivity in our development process. Lastly, it covers the rich set of visualization and analysis tools we employ to make all these data marts easily accessible to our business analysts.

Learn more about Spark Summit 2015, held June 15-17 in San Francisco, on the Spark Summit website.

Blagoy Kaloferov is a Big Data Software Engineer at Edmunds.com with experience in designing reliable services that process very large quantities of structured and unstructured data, and in creating toolsets that make analyzing this data simple. He is currently pushing the boundaries of interactive Big Data applications with Apache Spark. He is passionate about architecting solutions that are powerful, yet as simple and fast as possible for end users.

At Edmunds, we're not just about making car buying easier; we're also passionate about technology!

As with any website that has millions of unique visitors, it's essential that we maintain a scalable, highly available infrastructure with reliable services.

We are excited by software design and strive to create engaging experiences while using coding practices that promote streamlined website production and experimentation. We embrace continuous delivery and DevOps, and we are constantly improving our processes so we can stay lean and innovative. We also prioritize giving back to the community by providing open APIs for our automotive data and open-sourcing projects whenever possible.
