ThreatMetrix (now part of LexisNexis Risk Solutions) is a global digital fraud detection and identity authentication service. One of its key business applications is an online portal used by thousands of people around the world. The portal serves over 5,000 global brands, helping them verify more than 20 billion financial transactions each year.
- Customers query a 300TB multi-tenant database over 25,000 times per day, with up to 1TB of new data ingested daily in real time from a data lake via Kafka.
- Hundreds of external users generate queries simultaneously.
- Many queries are also complex, accessing over 6 months of stored data spread across millions of records.
Frustrated business users
The application relied on several data processing technologies, including Greenplum Database and Apache Impala. Even with complex, hard-to-manage optimizations, these could not respond interactively during busy periods as data sets and user numbers grew.
- Some customers waited up to 3 minutes for queries to complete.
- Unpredictable Impala outages were common, frustrating customers.
- Business process changes, such as adding new columns, took weeks to implement.
3X speed from 4X fewer nodes
After replacing Impala with Yellowbrick, portal end users noticed performance improvements immediately, with most operations completing in milliseconds or seconds. Yellowbrick achieves this with 4X fewer nodes and 20X less memory than Impala required.
- Faster, more accurate insights for customers. Because Yellowbrick performs faster and more consistently, even with real-time ingestion running in the background, LexisNexis can deliver richer insights to its customers more quickly and with fresher data.
- Far less time needed for management. Yellowbrick automatically reallocates resources to respond to spikes or unusual usage patterns, and performance tuning is no longer needed.
- Better customer experience. Downtime is no longer a concern, and with Yellowbrick instances located in different global regions, workloads can shift seamlessly between clusters when needed.