Yellowbrick Data Warehouse provides near-real-time answers from data at any scale, for any workload, in any environment. Its uniquely efficient use of resources (processor, memory, storage, and network) in an adaptive “cut-through” architecture helps you solve data-processing-intensive problems you couldn’t solve before while getting the best possible performance value from the available infrastructure. Yellowbrick also offers flexibility for enabling distributed clouds, with deployment options that include Andromeda optimized instances in private clouds, a service or instance in public clouds, and K8S containers at the edge, with all data warehouses managed through a single, unified control plane (Yellowbrick Manager).
Yellowbrick was conceived with the goal of optimizing price/performance. New SQL analytics use cases are emerging all the time, and more concurrent users are consuming more ad hoc analytics. Meeting that demand requires more performance per dollar spent, and the Yellowbrick architecture leapfrogs the industry in this respect.
In the cloud, Yellowbrick’s efficiency brings big savings compared to rivals that combine unremarkable performance with consumption pricing. It’s not uncommon for 16 nodes of Yellowbrick to beat 128 nodes of our closest competitor on both performance and concurrency, and by healthy margins. This means you can solve the same problem in a fraction of the time, and at a fraction of the cost.
Yellowbrick also addresses workload needs with incredible flexibility. Inside Yellowbrick, data is actively managed in a hybrid row and column store. The front-end row store minimizes commit latency for real-time inserts, while the back-end column store handles massive ad hoc queries, giving customers the ability to address the most demanding workloads with ease.
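The split between a front-end row store and a back-end column store can be pictured with a toy model. The sketch below is only an illustration of the general pattern, not Yellowbrick's implementation: the class name `HybridStore`, the `flush_threshold` parameter, and the in-memory lists are all assumptions made for the example.

```python
class HybridStore:
    """Toy hybrid store: a row-oriented write buffer in front of a column store.

    Hypothetical illustration only; names and thresholds are assumptions.
    """

    def __init__(self, columns, flush_threshold=4):
        self.columns = list(columns)
        self.flush_threshold = flush_threshold
        self.row_buffer = []  # recent inserts, one tuple per row
        self.column_store = {c: [] for c in self.columns}  # one list per column

    def insert(self, row):
        """Append one row (a dict) to the row buffer, which is a cheap operation."""
        self.row_buffer.append(tuple(row[c] for c in self.columns))
        if len(self.row_buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Move buffered rows into the per-column lists."""
        for row in self.row_buffer:
            for name, value in zip(self.columns, row):
                self.column_store[name].append(value)
        self.row_buffer.clear()

    def column_sum(self, name):
        """Sum one column across both stores so fresh rows are visible."""
        idx = self.columns.index(name)
        buffered = sum(row[idx] for row in self.row_buffer)
        return sum(self.column_store[name]) + buffered


store = HybridStore(["id", "amount"], flush_threshold=3)
for i, amount in enumerate([10, 20, 30, 40]):
    store.insert({"id": i, "amount": amount})

print(store.column_sum("amount"))  # covers flushed and still-buffered rows
```

In this sketch, inserts touch only the small row buffer, while an aggregate scans a single column list and then adds whatever is still buffered, so newly inserted rows are queryable before they reach the column store.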
The following are examples of some key Yellowbrick innovations. For a comprehensive overview of the Yellowbrick Data Warehouse architecture, see our technical white paper at yellowbrick.com/whitepaper.
Yellowbrick makes it easy to ingest, process, and query data for any SQL analytics use case
To avoid performance limitations in Linux, we built a new Linux-based OS kernel from scratch that achieves maximum CPU efficiency by keeping memory pinned to the correct core until queries complete. Our optimized non-blocking thread management ensures the same data remains in the CPU caches while queries run. The Yellowbrick Kernel is also deployment-aware and will select which code and drivers to use based on the environment.
While most databases need to keep hot data in a memory buffer cache for performance, Yellowbrick never caches data; rather, its optimized data path provides the experience of all data living in cache all the time. Data is read directly from primary storage into the CPU cache, so no cache rewarming is ever needed.
Yellowbrick borrows concepts from high-performance computing, high-frequency trading, and public cloud stacks to offer the most efficient data warehouse drivers. Our networking drivers are 20X more efficient than standard Linux drivers: We’ve clocked a single CPU core sending and receiving 16 GB/sec!
To lower costs while improving availability and reliability, the Yellowbrick cluster filesystem, ParityFS, implements n+2 erasure coding. Data is reconstructed on the fly, and when nodes are replaced, the original data is rebuilt automatically. Non-HA instances, or those with ephemeral storage in public clouds, use Yellowbrick Cloud Mirror technology. This approach, along with its TCO benefits, performance benefits, and integration into MPP database processing, is unique to Yellowbrick.
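The cost side of n+2 erasure coding comes down to simple arithmetic. The sketch below compares raw storage per byte of user data for an n+2 layout against three-way mirroring. It assumes a code in which any two lost fragments can be reconstructed (as with Reed-Solomon-style codes), and the stripe width of 8 data fragments is an illustrative assumption, not a documented ParityFS setting; the function names are hypothetical.

```python
def erasure_overhead(data_fragments, parity_fragments=2):
    """Raw bytes stored per byte of user data for an n+k erasure code."""
    if data_fragments < 1 or parity_fragments < 0:
        raise ValueError("need at least one data fragment and non-negative parity")
    return (data_fragments + parity_fragments) / data_fragments


def mirror_overhead(copies):
    """Raw bytes stored per byte of user data when keeping full copies."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return float(copies)


# Assumed stripe width: 8 data fragments plus 2 parity fragments.
n_plus_2 = erasure_overhead(8)   # 10 fragments stored for every 8 of data
three_way = mirror_overhead(3)   # three full copies also survive two losses

print(f"8+2 erasure coding: {n_plus_2:.2f}x raw storage")
print(f"3-way mirroring:    {three_way:.2f}x raw storage")
```

Under these assumptions, both layouts survive the loss of two fragments or copies, but the erasure-coded layout stores 1.25 bytes per byte of data rather than 3, which is where the storage savings come from.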