Snowflake’s consumption model, in which you purchase credits up front and burn through them while your virtual data warehouses are running, makes it nearly impossible to predict costs, because you don’t know in advance how long your queries will take to run, particularly in a mixed workload environment.
This is especially the case for ad hoc discovery workloads, whose runtimes depend heavily on the data and the query. The open-ended nature of this work restricts innovation, because business analysts, data scientists and data engineers must become cost-conscious or risk exceeding their analytics budget. Snowflake’s autoscaling makes costs even harder to predict, as scaling out your cluster dynamically in response to new workloads or concurrent users means doubling the size of the cluster. You might find yourself doubling costs just because you’ve added a single new user to the system.
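To make the cost uncertainty concrete, here is a minimal sketch that models credit spend as credits per hour, times hours running, times price per credit, with a multiplier for scaled-out clusters. The burn rate of 32 credits per hour and the price of $3.00 per credit are hypothetical values chosen for illustration, not quoted prices, and the function name is an assumption of this sketch.

```python
def consumption_cost(credits_per_hour, hours_running, price_per_credit, clusters=1):
    """Estimated spend under a credit-based model (illustrative only)."""
    return credits_per_hour * hours_running * price_per_credit * clusters


# Hypothetical inputs, not taken from any price list.
CREDITS_PER_HOUR = 32.0
PRICE_PER_CREDIT = 3.0

# The bill tracks runtime, which is unknown ahead of time for ad hoc work,
# and doubles as soon as a second cluster is started.
for hours in (2, 6, 12):
    one = consumption_cost(CREDITS_PER_HOUR, hours, PRICE_PER_CREDIT)
    two = consumption_cost(CREDITS_PER_HOUR, hours, PRICE_PER_CREDIT, clusters=2)
    print(f"{hours:>2} h/day: 1 cluster = ${one:,.2f}, 2 clusters = ${two:,.2f}")
```

Under these assumed inputs the daily bill ranges sixfold depending only on how long the warehouse runs, before any scale-out is considered.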
Snowflake has little incentive to make its software more efficient, because faster queries would consume fewer credits and so cut into its business model. Since Snowflake has no workload management, the only way to improve performance is to double the nodes and the costs.
With Yellowbrick, you don’t buy credits. You simply decide what size of data warehouse you want in terms of the number of nodes, then pay a flat subscription for the software over a 1- or 3-year term for that number of nodes.
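The flat model can be sketched in the same way. In the illustration below the total depends only on node count and term length, and query runtime does not appear as an input at all. The per-node annual price is a hypothetical placeholder, not a Yellowbrick list price, and the function name is an assumption of this sketch.

```python
def flat_subscription_cost(nodes, price_per_node_per_year, term_years):
    """Total subscription cost for a fixed node count (illustrative only)."""
    if term_years not in (1, 3):
        raise ValueError("the text describes 1- or 3-year terms")
    return nodes * price_per_node_per_year * term_years


# Hypothetical price per node per year, chosen only for illustration.
PRICE_PER_NODE_PER_YEAR = 10_000.0

total = flat_subscription_cost(30, PRICE_PER_NODE_PER_YEAR, term_years=3)
per_month = total / (3 * 12)
print(f"3-year total: ${total:,.2f}, equivalent to ${per_month:,.2f} per month")
```

Because the number of hours the warehouse is busy is not a parameter, the figure can be budgeted before a single query is run.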
Yellowbrick is regularly benchmarked against other data warehousing solutions during the procurement process. In this document we present query performance comparisons from a real-world benchmark against Snowflake, conducted by a customer using its own data and query workloads.
Snowflake is a massively parallel processing (MPP) cloud data warehouse delivered “as-a-service” on all three major public clouds. Advocates of Snowflake cite its elasticity features, its ability to share data easily and its compelling user experience. While it is certainly true that Snowflake has made data warehousing more accessible to many smaller enterprises, its performance struggles as data volumes and concurrency requirements grow and mixed workloads become more complex. Snowflake’s answer to higher concurrency and query volumes is to spin up more virtual data warehouses, which means partitioning your users and workloads manually. This quickly becomes cost-prohibitive, particularly if you are running business-critical “always-on” applications.
A customer in the digital marketing space wanted to reduce the time to insight into campaign performance for its clients, who are primarily in the financial services sector. The customer decided to replace its existing 40-node Vertica data warehouse platform to achieve the performance improvements its advertising platform needed. To identify the replacement, the customer ran proof-of-concept (POC) exercises with Snowflake and Yellowbrick. As part of the POC, it conducted query performance tests using 12 representative SQL workloads against its own data. The data consisted of 3.5 trillion records distributed across 82 tables, with a volume of approximately 500 TB.
The query tests compared a 32-node 2XL Snowflake cluster with a 30-node Yellowbrick cluster. Yellowbrick was 6X faster on average across all 12 queries. Ten of the queries had runtimes less than 2 minutes on both platforms:
The queries ranged in complexity from simple aggregations to complex joins and large updates. Yellowbrick was 147X faster in the case of one of the longest running queries involving a semi-join scanning over a year of data:
In addition to outperforming Snowflake on query performance, Yellowbrick demonstrated a significant improvement in the performance of critical ELT tasks, taking ELT times down from over 10 hours to just 45 minutes. Near real-time data ingestion was also important to the customer, and tests demonstrated that Yellowbrick was able to ingest data from Kafka topics at 2 million records per second, far exceeding the requirement of 900,000 records per second.
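The ratios implied by these figures can be worked out directly. The short sketch below uses only the numbers quoted above; because the original ELT time is given as "over 10 hours", the resulting speedup is a lower bound. The variable names are assumptions of this sketch.

```python
# Figures quoted in the text above.
elt_before_minutes = 10 * 60      # "over 10 hours", so this is a lower bound
elt_after_minutes = 45
ingest_measured = 2_000_000       # records per second from Kafka topics
ingest_required = 900_000         # records per second required by the customer

elt_speedup = elt_before_minutes / elt_after_minutes
ingest_headroom = ingest_measured / ingest_required

print(f"ELT speedup: at least {elt_speedup:.1f}x")          # at least 13.3x
print(f"Ingest rate vs. requirement: {ingest_headroom:.1f}x")  # 2.2x
```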
After selecting Yellowbrick, this customer can now update campaigns every 2 hours, enabling it to modify underperforming campaigns in flight rather than letting an obviously underperforming campaign run all day. This has helped the customer increase the performance of the upsell campaigns it runs for its clients.
In a side-by-side comparison of Yellowbrick vs. Snowflake, it quickly becomes clear that Yellowbrick is in a league of its own. Simply put, Yellowbrick can do everything Snowflake can at a lower cost. Conversely, Snowflake simply does not offer the features and capabilities of Yellowbrick.
To illustrate this stark contrast, please review the comparison chart below.
As acceptance of cloud data warehousing has grown, so have the challenges. That is where Yellowbrick takes the baton from Snowflake and moves data warehousing forward. Our service architecture was designed to address the concerns raised in the areas of support for hybrid/distributed cloud models, data and infrastructure ownership, predictable cost, performance and concurrency, workload management, and real-time streaming ingest.
If you want to learn more about how Yellowbrick offers the best data warehousing service in the industry at the most competitive price, take a look at the resources below.