Catalina Marketing is the market leader in shopper intelligence and targeted in-store and digital media. The company delivers $6.1 billion in consumer value annually, combining the richest buyer history database in the world with its own deep analytics and insights to help retailers, CPG brands, and agencies optimize every stage of media planning, execution, and measurement.
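Until recently, the company’s enterprise data warehouse ran entirely on an IBM Netezza system. It supported two main workloads: data processing, including ETL, and consumption of the processed data, including queries from the Analytics team.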
However, the Netezza system lacked the compute capacity to support both workloads. "It was an unsustainable environment, in which we weren't able to finish our data loads because we had 15-20 queries running at any given time," says Luis Velez, Data Engineering Director at Catalina. "Every day, it was getting a little bit worse."
The company’s Analytics team—the source of those queries—was also hamstrung by the lack of compute capacity, having to wait 20 minutes or more for their queries to run. "Sometimes queries took hours, and other times they were simply killed so ETL processes could run," says Aaron Augustine, Executive Director, Analytics Architecture at Catalina. "We only had small windows of time for heavy analysis—like maybe 25% of the day. Given that we have Analytics teams overseas and in Japan, it was a 24x7 problem."
Catalina decided to augment its existing data warehouse with a second system, dividing the compute workload into two parts. Data processing would continue to run on the Netezza system, while consumption of the processed data—including queries by the Analytics team—would be supported by Yellowbrick. "The Netezza box was running some pretty advanced ETL processes, which were working fine when there weren’t a lot of big queries hitting the system," explains Augustine. "At the same time, we needed to give data scientists, analysts and other power users the means to ask some really big questions of the data—as needed to discover entirely new insights into buyer behavior. Moving that ad-hoc, advanced analytics workload onto Yellowbrick was an incremental, low-risk approach that made a lot of sense."
Catalina discovered Yellowbrick after a failed proof-of-technology (POT) with a different vendor, which couldn’t deliver the needed performance or compatibility. During a successful 3-week POT on Yellowbrick, a single 10-U, 30-node Yellowbrick system delivered up to 182x better performance for specific use cases than an 8-rack, 56-node Netezza Mako system. The company immediately purchased the Yellowbrick system and spent the next 4-6 weeks migrating its 260 TB of data. Over the following two months, Catalina moved the Analytics team onto Yellowbrick. "All in all, our migration onto Yellowbrick took four months—far faster than we expected," says Augustine.
Today, the company’s Analytics team can work unimpeded. Queries that used to take up to 30 minutes—if they weren’t killed first—are now completed in a few seconds to a few minutes. "Our Yellowbrick system has made our analytics team a lot more productive," says Augustine. "These are power users doing deep and complex analytics—using tools like Python, R and SAS to query three years of point-of-sale data. We’re continuously hammering on Yellowbrick with some really big queries, and it’s handling them very well."
Now that it’s no longer hamstrung by a lack of compute capacity, the Analytics team can fully contribute to all parts of the business.
Although these technical benefits are impressive on their own, Augustine says the largest benefit of partnering with Yellowbrick has been the vendor’s commitment to Catalina’s success. "To me, Yellowbrick is a lot more than just great hardware," he says. "What really impressed me was the great service and support. From the very beginning of the project, our partnership with Yellowbrick has given us confidence that what we were doing would work out well, and it has."