Spark and ybrelay Glossary

Apache Spark
Open-source platform for distributed cluster computing; a framework for solving analytics problems at large scale.
Avro
Row-based storage format whose schema is defined in JSON and whose data is stored in binary, making it compact and efficient.
FIFOs
Named pipes, as produced by the Linux mkfifo command.
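As an illustration of how a named pipe behaves, the following minimal Python sketch creates a FIFO with os.mkfifo (the programmatic counterpart of the mkfifo command), writes to it from one thread, and reads from it in another. The function name fifo_round_trip and the file name example.fifo are hypothetical, chosen only for this example; this is not how ybrelay itself is implemented.

```python
import os
import tempfile
import threading


def fifo_round_trip(payload: bytes) -> bytes:
    """Send payload through a named pipe and return what the reader receives."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Hypothetical file name; any path on a local Linux file system works.
        fifo_path = os.path.join(tmp_dir, "example.fifo")
        os.mkfifo(fifo_path)  # same effect as running: mkfifo example.fifo

        def writer() -> None:
            # Opening a FIFO for writing blocks until a reader opens it too.
            with open(fifo_path, "wb") as pipe:
                pipe.write(payload)

        thread = threading.Thread(target=writer)
        thread.start()

        # Opening for reading unblocks the writer; read() returns once
        # the writer has closed its end of the pipe.
        with open(fifo_path, "rb") as pipe:
            received = pipe.read()

        thread.join()
        return received
```

Calling fifo_round_trip(b"1,alice\n2,bob\n") returns the same bytes that were written. The data passes from writer to reader through the pipe without being stored as a regular file on disk.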
HDFS
Hadoop Distributed File System, the open-source Apache Hadoop file system; manages very large data sets on clusters of commodity hardware.
Parquet
Apache Parquet, an open-source column-oriented data storage format commonly used in Hadoop projects.
Spark application
An application that consumes, in row form, whatever data Spark feeds it.
Spark job
A job submitted to Spark to handle large-scale data export or import.
ybrelay
Yellowbrick "relay" client that accepts incoming data in various formats from any external file system and calls ybload to bulk load it into tables.
ybload
Yellowbrick bulk load client tool.