The original promise of the data lake was to provide low-cost storage for all unstructured enterprise data (originally in HDFS, and more recently in object stores such as Amazon S3) where it could subsequently be analyzed in place. In reality, even with a wide selection of data lake query engines available for the analytics layer (Apache Hive, Apache Impala, Presto, and others), most data lakes still don’t fulfill the promise of large-scale, real-time analytics. That promise includes support for hundreds or thousands of concurrent BI tool users, sophisticated ad hoc queries by data scientists, and data-intensive reports.
Yellowbrick Data Warehouse is now the fastest and most modern solution available for data lake analytics, letting thousands of concurrent users explore all your data at speed and at a fraction of the cost of alternatives. Even conventional cloud data warehouses, which rely on VMs tied to general-purpose hardware, fail to match Yellowbrick's breakthrough price/performance.
With Yellowbrick, you can continue to use your data lake for what it does really well while meeting the original goal of enabling real-time analytics at scale, and much more.
Yellowbrick makes it easy to ingest, process, and query data for any SQL analytics use case
Yellowbrick is built for a world where most queries are ad hoc and the data warehouse isn’t running a predefined, repeatable workload day in and day out. This requires the following characteristics:
Ad hoc users make mistakes and submit poorly coded queries that return too much data, produce enormous cross-products, or are simply very complicated. In Yellowbrick, such queries can be reprioritized at run time and placed into a “penalty box” so that shorter, interactive queries still complete and resources aren’t tied up.
Yellowbrick doesn't rely on inverted indexing or partitioning strategies for peak performance. Forward indexes and statistics are gathered at import and kept up to date automatically, and data is reformatted into an optimal columnar form for fast querying.
Yellowbrick requires virtually no management of space. Data partitioning, while supported, is typically unnecessary, and storage utilization problems caused by skewed partitioned data don’t exist.
Yellowbrick is highly available with no single point of failure, and has fault tolerance suitable for "Tier 1" applications.
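To picture the "penalty box" idea described above, here is a minimal toy sketch in Python. It is not Yellowbrick's implementation or API: the `Query` class, the `run` function, and the `PENALTY_AFTER_SECONDS` threshold are all hypothetical names chosen for illustration. The sketch only shows the general technique of demoting long-running queries at run time so that short, interactive queries finish first.

```python
from dataclasses import dataclass

# Hypothetical threshold: a query that has consumed this much run time
# without finishing is moved to the penalty box.
PENALTY_AFTER_SECONDS = 5.0


@dataclass
class Query:
    name: str
    remaining: float         # seconds of work left (illustrative)
    elapsed: float = 0.0     # seconds of work already done
    penalized: bool = False  # True once the query is in the penalty box


def run(queries, time_slice=1.0):
    """Toy scheduler: serve non-penalized queries first.

    Penalized queries only get time when no other query is waiting.
    Returns the names of the queries in the order they finish.
    """
    pending = list(queries)
    finished = []
    while pending:
        # Prefer interactive (non-penalized) queries; otherwise drain the box.
        active = [q for q in pending if not q.penalized] or list(pending)
        for q in active:
            step = min(time_slice, q.remaining)
            q.remaining -= step
            q.elapsed += step
            if q.remaining <= 0:
                finished.append(q.name)
                pending.remove(q)
            elif q.elapsed >= PENALTY_AFTER_SECONDS:
                q.penalized = True  # demote the runaway query at run time
    return finished


workload = [Query("runaway_join", 20.0), Query("dashboard_1", 2.0), Query("dashboard_2", 3.0)]
order = run(workload)
```

In this toy workload the two short dashboard queries finish before the runaway join, which is demoted once it crosses the threshold and completes only when nothing else is waiting.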
Yellowbrick reads numerous Hadoop file formats and integrates with R, Python, SAS, Kafka, and Spark, as well as with common BI and visualization tools like SAS, Tableau, Power BI, and MicroStrategy.
Migrations are fast and easy from any legacy platform, and we'll work with you to validate your use cases and success metrics along the way.
Try our free 7-day Test Drive: yellowbrick.com/test-drive
One of the top 10 mobile operators in the world depends on Yellowbrick to safeguard millions in revenue and speed up time-to-insight by 20X.
Previously, the company used a legacy data warehouse on top of a data lake to analyze customer data for use by multiple departments, for example to reconcile revenue from prepaid SIM cards sold by retailers. But with an 800% growth in data volume over time, that platform could no longer keep up with business needs.
Results of a painless migration to Yellowbrick Data Warehouse include:
ThreatMetrix (part of LexisNexis Risk Solutions) is a global anti-fraud SaaS product accessed by thousands of end users. With Yellowbrick Data Warehouse adding a high-speed SQL analytics layer to its Hadoop data lake, ThreatMetrix helps 5,000 brands validate 5 billion online transactions per month.