Catalina Marketing is the market leader in shopper intelligence and targeted in-store and digital media. The company delivers $6.1 billion in consumer value annually, combining the richest buyer history database in the world with its own deep analytics and insights to help retailers, CPG brands, and agencies optimize every stage of media planning, execution, and measurement.
Catalina understands there’s a science behind every buy and a unique buyer behind the data. To uncover those insights, the company must ingest terabytes of data, process and transform it, and then analyze the output to help customers mobilize meaningful, real-time engagement and results.
Until recently, the company’s enterprise data warehouse ran entirely on an IBM Netezza system. It supported two main workloads: daily ETL processes and advanced analysis of the processed data by the company’s Analytics team.
However, the Netezza system lacked the compute capacity to support both workloads. “It was an unsustainable environment, in which we weren’t able to finish our data loads because we had 15-20 queries running at any given time,” says Luis Velez, Data Engineering Manager at Catalina. “Every day, it was getting a little bit worse.”
The company’s Analytics team—the source of those queries—was also hamstrung by the lack of compute capacity, having to wait 20 minutes or more for their queries to run. “Sometimes queries took hours, and other times they were simply killed so ETL processes could run,” says Aaron Augustine, Executive Director of Data Science at Catalina. “We only had small windows of time for heavy analysis—like maybe 25% of the day. Given that we have Analytics teams overseas and in Japan, it was a 24x7 problem.”
Catalina decided to augment its existing data warehouse with a second system, dividing the compute workload into two parts. Data processing would continue to run on the Netezza system, while consumption of the processed data—including queries by the Analytics team—would be supported by Yellowbrick. “The Netezza box was running some pretty advanced ETL processes, which were working fine when there weren’t a lot of big queries hitting the system,” explains Augustine. “At the same time, we needed to give data scientists and other power users the means to ask some really big questions of the data—as needed to discover entirely new insights into buyer behavior. Moving that ad-hoc, advanced analytics workload onto Yellowbrick was an incremental, low-risk approach that made a lot of sense.”
Catalina discovered Yellowbrick after a failed proof-of-technology (POT) with a different vendor, which couldn’t deliver the needed performance or compatibility. During a successful 3-week POT on Yellowbrick, a single 10-U, 30-node Yellowbrick system delivered up to 182x better performance than an 8-rack, 56-node Netezza Mako system. The company immediately purchased the Yellowbrick system and spent the next 4-6 weeks migrating its 260 TB of data. Over the following two months, Catalina moved the Analytics team onto Yellowbrick. “All in all, our migration onto Yellowbrick took four months—far faster than we expected,” says Augustine.
Today, the company’s Analytics team can work unimpeded. Queries that used to take up to 30 minutes—if they weren’t killed first—are now completed in a few seconds to a few minutes. “Our Yellowbrick system has made our analytics team a lot more productive,” says Augustine. “These are power users doing deep and complex analytics—using tools like SAS, R, and Python to query three years of point-of-sale data. We’re continuously hammering on Yellowbrick with some really big queries, and it’s handling them very well.”
Now that it’s no longer hamstrung by a lack of compute capacity, the Analytics team can fully contribute to all parts of the business.
“Yellowbrick is being used across all of Catalina’s work streams: data science and data mining, serving our retail partners, serving our brand partners, and serving our digital analytics teams,” says Augustine. “And it’s performing wonderfully across all those use cases. We were so happy with our first Yellowbrick system that we purchased a second one, which we plan to use to modernize some of our legacy applications.”
Catalina is also looking at how it can take advantage of Yellowbrick’s hybrid cloud architecture to help the company move its computing workloads to the cloud. “Everything we’re doing on-premises is about keeping the lights on and making sure existing applications are running effectively,” says Velez. “Our long-term vision is to retire our on-premises footprint and create a new platform in the cloud. We’re excited to explore how we can partner with Yellowbrick on those efforts.”
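Through its use of Yellowbrick, Catalina is benefiting in several ways: queries that used to take up to 30 minutes now complete in a few seconds to a few minutes, the Analytics team no longer competes with ETL processes for compute capacity, and the migration was completed in four months.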
Although these technical benefits are impressive on their own, Augustine says the largest benefit of partnering with Yellowbrick has been the vendor’s commitment to Catalina’s success. “To me, Yellowbrick is a lot more than just great hardware,” he says. “What really impressed me was the great service and support. From the very beginning of the project, our partnership with Yellowbrick has given us confidence that what we were doing would work out well, and it has.”
Velez echoes Augustine’s thoughts on how the two companies have worked together. “Hands-on help from Yellowbrick is a big reason why we completed our migration so quickly,” he says. “Unlike some other vendors, who tend to sit back and wait for us to request their assistance, Yellowbrick proactively predicts what we need and then helps make it happen—like how they’re now socializing with our users to help fine-tune their queries and optimize resource usage. Our relationship with Yellowbrick has been very effective—it was a big win for us internally and I’m excited to see what else we can do through our continued strategic partnership.”
Catalina Marketing helps retailers, CPG brands, and agencies optimize media planning, execution, and measurement.
The company’s enterprise data warehouse ran entirely on an IBM Netezza system. It lacked the compute capacity to support both daily ETL processes and advanced analysis of that processed data by the company’s Analytics team.
Catalina moved its analytics workload onto Yellowbrick, enabling the company’s 100-member Analytics team to run deep, complex queries on three years of POS data using tools like SAS, R, and Python.