The Data Warehousing Divide

Chief Technology Officer

June 14, 2022

Today we announced the latest version of the Yellowbrick Data Warehouse, continuing a journey to the cloud that started with us hosting our optimized hardware offering in Yellowbrick Cloud and now brings our technology to the public cloud. The cloud data warehousing market is bustling, so what does Yellowbrick do that other cloud solutions don't, and who will be most interested in this?

The first question is easy. Just as with our current on-premises Andromeda platform, performance is designed in. Most notably, our solution is engineered to consume resources efficiently, making it cheaper to run in the cloud. Second, driven by the requirements of our enterprise customers who want to avoid cloud concentration risk, this release is multi-cloud, allowing you to put your data warehouse(s) anywhere. Together with our on-premises and edge heritage, users also get total freedom in hybrid environments. Our customers don’t compromise on data ownership and still retain all the benefits of a SaaS experience. As you would expect with a cloud solution, true separation of compute and storage allows for elastic scaling on demand. Finally, roll in all the other benefits that Yellowbrick customers regard as table stakes – high concurrency, query complexity, large data volumes, streaming analytics – and we begin to look very different from every other cloud data warehouse.

Now the more challenging question: who are we aiming our solution at? There’s a gap in the data warehouse market that isn’t being served by the current cloud data warehouses or the legacy on-premises vendors. At one end of the spectrum, this divide is most evident in large enterprises that are early on in their cloud journey. At the other end are those that are very advanced in their cloud strategy as shown here:

[Figure: cloud strategy]

  • Cloud Starters - Typically large businesses in heavily regulated industries that are taking a 3-5-year view of which workloads they move to the cloud and when. Data security is a huge concern for these companies.
  • Cloud Only - Often small/medium enterprises with relatively small data volumes and simpler workloads that want to consume data warehousing as a SaaS and have no interest in growing a CloudOps team to manage it themselves.
  • Cloud First - Large enterprises that were first movers to the new crop of cloud data warehouses but operate at a scale and concurrency level that is testing the limits of these modern solutions. Performance and price predictability are important here.

For Cloud Starters, a significant portion of their data warehousing workloads is on-premises. As they journey to the cloud, these companies are forced to adopt solutions that differ significantly from their existing on-prem data warehouses. The current cloud data warehouse vendors don't offer solutions that run in an enterprise's own data center, so these businesses end up doubling the complexity of their analytical ecosystem and doubling the skillsets needed to support the hybrid cloud stance desired. They have upstream data pipelines and ETL jobs that use the likes of Informatica or DataStage, and a rich set of downstream reporting solutions that can be a drag on migration. Migrations to the cloud must account for these other components, and not just focus on swapping out the data warehouse.

The Cloud Only folks – generally SMBs – are well served by the “good enough” cloud-only data warehouse solutions that exist today. They don’t have a complexity and scale problem and don’t have a significant data warehouse migration problem.

Cloud Firsts are in optimization mode, looking to extract every ounce of efficiency from the cloud. They struggle with scale (concurrency, out-of-control costs, and performance) and, because they want to eliminate cloud concentration risk, with the challenges of manually managing hybrid environments. Often a victim of their own success in attracting new analytics users, they are witnessing spiraling costs and inadequate performance. Cloud Firsts are forced to consume more cloud compute to try to address the scale challenges they face. However, Cloud First companies tell us they are reaching the limits in scalability that cloud data warehouse vendors can provide.

The challenges faced by Cloud Starters and Cloud First enterprises have not been addressed by data warehouse vendors up to this point, and this is the gap in the market I’m referring to.

Yellowbrick Data Warehouse addresses these problems in the AWS public cloud today, with Google Cloud and Azure support following shortly.

Wherever you choose to run Yellowbrick, it’s the same software and the same user experience everywhere, managed from a single control plane – the Yellowbrick Manager. We call the notion of deploying the same data management and analytics stack on-prem, in public cloud, and even at the network edge a Distributed Data Cloud. A Distributed Data Cloud significantly simplifies hybrid cloud operations.

Cloud Starters choose to modernize their on-prem data warehouse stack with Yellowbrick knowing they’ve got a smooth route to running workloads on the cloud of their choice in the future. They also take their first cloud data warehouse steps with Yellowbrick. We’ve lowered the barrier to modernization by maintaining support for existing data pipelines and downstream applications. Migrating to Yellowbrick isn’t a “boil the ocean” exercise, and businesses swap their existing solution for Yellowbrick without having to change their entire ecosystem. Yellowbrick's replication capabilities make it trivial to keep data in sync between different instances in different locations, providing a simple, manageable hybrid cloud solution.

Cloud Firsts can now take advantage of the price/performance characteristics, which our on-premises customers love, in the cloud(s) of their choice. The combination of Yellowbrick’s core technology, our advanced workload management, and our fixed-capacity subscription or consumption-based pricing means that they can squeeze the best performance out of their public cloud infrastructure at the highest concurrency levels without having large and unpredictable cloud bills at the end of the month.

Cloud Only organizations can cheerfully stay with Snowflake, AWS, and the plethora of other simple cloud-only data warehouse solutions.

With this new release of Yellowbrick, we’ve addressed the gaping holes on either side of the cloud maturity divide. Now everyone is happy!
