How Yellowbrick integrates into existing environments


June 28, 2019 | Webcasts

Yellowbrick recently hosted a webcast discussing how Yellowbrick offloads overburdened legacy data warehouses to extend their life, reduce costs, and increase the performance of your analytic environment. Read on for a summary of the webcast.

Market drivers

Legacy data warehouses struggle to meet modern demands, including:

  • supporting more internal and external users
  • providing simultaneous services for:
    • new analytic applications
    • ad hoc BI queries
    • operational dashboards
    • predictive analytics over deep historical data, and real-time streaming of IoT and OLTP workloads
  • curbing costs through deployment flexibility and consolidation

Yellowbrick Data Warehouse overview and landscape

Yellowbrick Data helps enterprises meet today’s analytic and data warehouse challenges with the following features:

  • Always on and available
  • Ad hoc SQL queries that do not impact operational workloads
  • Correct answers on any schema
  • Easily scales to petabytes of capacity in a compact footprint
  • Fast performance for simultaneous, mixed workloads, including real-time inserts, batch jobs, and interactive applications
  • Support for thousands of concurrent users, so you can add analysts and discover new optimizations

The chart below illustrates the current data warehouse landscape, sorting the primary vendors by different approaches:

Data warehousing approaches

Some vendors offer simplicity, with a scale-up data warehouse in a single server. Other vendors deliver more performance and capacity than single-server solutions by scaling out across many servers in a parallel processing architecture. Many single-server customers migrate to scale-out solutions once their data volume exceeds the capabilities of a single server. Yellowbrick Data is currently the only solution on the market that offers organizations a high-performance data warehouse that scales to petabytes of capacity. Notably, it is available either in the cloud or as a compact 6U on-premises system.

Data management evolution

The webcast discusses how data management has evolved in recent years.

  1. Enterprises began by consolidating key application data sets into a data warehouse in a server, where they could run analytics on the data.
  2. Companies soon wanted insights from internet data. However, the volume of this data exceeded the capabilities of single-server solutions. This led to the creation of the data lake. Solutions like MapReduce emerged to meet these needs.
  3. Because data lake technologies are challenging to work with, particularly for business users, SQL abstraction technologies like Hive and Impala emerged. While these SQL-as-a-layer technologies made Big Data more user-friendly, their slower performance and limited SQL surface area make them unacceptable in many of today’s competitive environments.

Yellowbrick provides a modern architecture for scalable SQL analytics

As the slide above illustrates, platforms like Yellowbrick Data give enterprises a solution to this problem. Enterprises can move high-value data to Yellowbrick, giving these workloads access to sophisticated SQL analytics and the highest possible performance, while easing resource contention in their data lake so that the rest of the business also runs at top speed.

Integration recommendations

Yellowbrick is compatible with the PostgreSQL dialect, which provides easy ecosystem connectivity. Installation, deployment, and integration are fast and simple in almost any existing environment.

Some common integration cases for Yellowbrick include:


  • Data ingest directly from SQL applications, via real-time streams from Kafka, or transformations with Spark.
  • Bulk loading using common connectors like ODBC, JDBC, and ADO.NET, or the Yellowbrick 1 GB/s YBLOAD tool.
  • Load and transform with Informatica, Attunity, Talend, Syncsort, and Spark ETL.


  • Interactive applications. Yellowbrick enables organizations to build new analytical applications.
  • Powerful BI analytics. Organizations can perform ad hoc BI queries from applications like MicroStrategy, Tableau, and Business Objects and support many more users without increasing infrastructure footprint.
  • Business critical reporting. Build prioritized responses and multi-department support with workload management.
  • Data mining with SAS, R, and Python.
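As a rough illustration of the connector-based loading path listed above, the sketch below batches rows through Python's standard DB-API interface. It is a minimal sketch under stated assumptions, not Yellowbrick's own tooling: the `bulk_insert` helper, the `sales` table, and its columns are hypothetical, and an in-memory SQLite database stands in for the warehouse so the example runs anywhere. Against a PostgreSQL-compatible system you would presumably pass a connection from a PostgreSQL driver instead and use that driver's placeholder style (for example `%s` with psycopg2).

```python
import sqlite3


def bulk_insert(conn, table, columns, rows, placeholder="?"):
    """Insert all rows in one batch through a DB-API 2.0 connection.

    `table` and `columns` are interpolated into the SQL text, so they must
    come from trusted code, never from user input.
    """
    rows = list(rows)
    column_list = ", ".join(columns)
    markers = ", ".join([placeholder] * len(columns))
    sql = f"INSERT INTO {table} ({column_list}) VALUES ({markers})"
    cur = conn.cursor()
    # One batched call instead of a Python-level loop of execute() calls.
    cur.executemany(sql, rows)
    conn.commit()
    return len(rows)


# Stand-in database (assumption): SQLite in memory, used only so the
# sketch is self-contained. The table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id INTEGER, sku TEXT, amount REAL)")

loaded = bulk_insert(
    conn,
    "sales",
    ["store_id", "sku", "amount"],
    [(1, "A-100", 19.99), (1, "B-200", 5.50), (2, "A-100", 21.00)],
)
total = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(loaded, total)
```

Row-by-row connectors like this suit modest batches; for very large loads, a dedicated bulk loader such as the YBLOAD tool mentioned above is the more likely fit.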

When you should consider Yellowbrick

The webcast provides recommendations about when customers should consider using Yellowbrick, depending on their current environment.

Environment type: Single server

When to consider Yellowbrick:

  • When your needs are outgrowing single-server capabilities
  • When you are spending too much time juggling the workloads of many disparate systems

Benefits of Yellowbrick:

  • More capacity (compute and historical data)
  • Simplified operation
  • Same well-known SQL models

Environment type: Legacy MPP systems (Netezza, Vertica, Greenplum, Teradata)

When to consider Yellowbrick:

  • Netezza users: now, as Netezza is ending support
  • Vertica and Greenplum: when you want modern features from products receiving more active development
  • Teradata: when cost reduction becomes important. You can save millions of dollars by moving select workflows to a Yellowbrick system.

Benefits of Yellowbrick:

  • Netezza, Vertica, and Greenplum: seamless integration with a PostgreSQL database
  • For all legacy MPP systems: massive savings with 3-100x performance in a fraction of the footprint

Environment type: Pre-configured systems (Oracle or SAP)

When to consider Yellowbrick:

  • When you want to make analytics available to more users
  • To meet corporate objectives to reduce infrastructure costs by moving analytics to Yellowbrick

Benefits of Yellowbrick:

  • Significantly lower costs from a flash-centric instead of DRAM-centric architecture
  • Opens analytic access to many teams simultaneously at a much lower comparative cost

Environment type: Cloud data warehouse (Redshift)

When to consider Yellowbrick:

  • When costs become too high or unpredictable
  • When you reach performance or concurrency limits
  • When you want to gain more control through a hybrid environment

Benefits of Yellowbrick:

  • Redshift is PostgreSQL-compatible, making migration easy
  • Yellowbrick offers cloud and on-premises solutions with predictable costs

A customer example

Symphony RetailAI told us: “[Query] performance improvements we saw were from 3x to 10x…basically running them as is.” They also noted that the ANSI SQL standard architecture of Yellowbrick is simple to learn: “We had six engineers touch the system and all found it very easy to use because there was a lot of commonality with the existing system we already had.”

Security application demonstration

The demonstration illustrates analytics on a Netflow dataset captured over a six-month period to detect intrusion.

The demonstration illustrates the following Yellowbrick capabilities:

  • Integration with BI applications like Tableau and MicroStrategy.
  • Fast performance: the demonstration shows the results of a query that maps protocols by frequency in Tableau. The query required a table scan of 8 billion rows and an inner join across 17 billion rows to match protocols to ports. It completed in just 11.5 seconds.
  • High concurrency: JMeter runs hundreds of simultaneous queries against the system without slowdown.

Importantly, this is all possible in an efficient solution that can be deployed in your data center or the cloud.
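To make the shape of the demonstration query concrete, here is a small sketch of a "protocols by frequency" query: an inner join from flow records to a port-to-protocol lookup, followed by a count per protocol. The webcast's actual schema and SQL are not given in this summary, so the `flows` and `port_protocols` tables, their columns, and the sample rows are all hypothetical, and an in-memory SQLite database is used purely as a stand-in so the example is runnable.

```python
import sqlite3

# Stand-in database (assumption): in-memory SQLite, not Yellowbrick.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per captured flow, plus a lookup table
# that maps well-known destination ports to protocol names.
conn.execute("CREATE TABLE flows (src_ip TEXT, dst_ip TEXT, dst_port INTEGER)")
conn.execute("CREATE TABLE port_protocols (port INTEGER PRIMARY KEY, protocol TEXT)")

conn.executemany(
    "INSERT INTO port_protocols (port, protocol) VALUES (?, ?)",
    [(22, "ssh"), (80, "http"), (443, "https")],
)
conn.executemany(
    "INSERT INTO flows (src_ip, dst_ip, dst_port) VALUES (?, ?, ?)",
    [
        ("10.0.0.1", "10.0.1.5", 443),
        ("10.0.0.2", "10.0.1.5", 443),
        ("10.0.0.3", "10.0.1.5", 443),
        ("10.0.0.1", "10.0.1.6", 80),
        ("10.0.0.4", "10.0.1.6", 80),
        ("10.0.0.9", "10.0.1.7", 22),
        ("10.0.0.9", "10.0.1.8", 9999),  # unknown port: dropped by the inner join
    ],
)

# Match each flow to a protocol by port, then rank protocols by flow count.
PROTOCOL_FREQUENCY_SQL = """
    SELECT p.protocol, COUNT(*) AS flow_count
    FROM flows AS f
    INNER JOIN port_protocols AS p ON f.dst_port = p.port
    GROUP BY p.protocol
    ORDER BY flow_count DESC, p.protocol
"""

protocol_counts = conn.execute(PROTOCOL_FREQUENCY_SQL).fetchall()
for protocol, flow_count in protocol_counts:
    print(protocol, flow_count)
```

The query uses only standard SQL (an inner join, `GROUP BY`, and `ORDER BY`), so the same statement should carry over to a PostgreSQL-compatible warehouse with little or no change; a BI tool like Tableau would typically generate an equivalent query behind a chart.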

You can view the 23-minute on-demand webcast here.