Spark and ybrelay Glossary

Apache Spark
Open-source platform for cluster computing; a framework for solving analytics problems at large scale.
Avro
Row-based storage format that defines its schema in JSON and stores the data itself in binary form, making it compact and efficient.
FIFO
Named pipe, as created by the Linux mkfifo command.
HDFS
Hadoop Distributed File System, the open-source Apache Hadoop file system; manages very large data sets on clusters of commodity hardware.
Parquet
Apache Parquet, an open-source column-oriented data storage format commonly used in Hadoop projects.
Spark application
A generic application that consumes whatever data Spark feeds it, in row form.
Spark job
A job that is submitted to Spark to handle large-scale data export or import.
ybrelay
Yellowbrick "relay" client that accepts incoming data in various formats from any external file system and calls ybload to bulk load it into tables.
ybload
Yellowbrick bulk load client tool.
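
To illustrate the named-pipe entry above, the following is a minimal Python sketch of how one side can stream rows through a pipe while the other side reads them, without an intermediate file on disk. It assumes a Linux or other POSIX system; the function name fifo_round_trip, the file name demo.fifo, and the sample rows are hypothetical and chosen only for illustration. It does not show how ybrelay or ybload use pipes internally.

```python
import os
import tempfile
import threading


def fifo_round_trip(lines):
    """Send lines through a named pipe and return what the reader receives."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "demo.fifo")  # hypothetical name
    os.mkfifo(fifo_path)  # same effect as running `mkfifo demo.fifo`

    def writer():
        # Opening a FIFO for writing blocks until a reader opens the other end.
        with open(fifo_path, "w") as pipe:
            for line in lines:
                pipe.write(line + "\n")

    thread = threading.Thread(target=writer)
    thread.start()
    try:
        # Reading continues until the writer closes its end of the pipe.
        with open(fifo_path) as pipe:
            received = [line.rstrip("\n") for line in pipe]
    finally:
        thread.join()
        os.remove(fifo_path)
        os.rmdir(workdir)
    return received


if __name__ == "__main__":
    print(fifo_round_trip(["1,alpha", "2,beta"]))
```

The writer runs in a separate thread because each end of a named pipe blocks until the other end is opened; a single-threaded version would hang on the first open call.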