Encrypting Sensitive Data

You can use the ENCRYPT_KS and DECRYPT_KS SQL functions to protect access to sensitive character data, such as Personally Identifiable Information (PII). Encryption functions scramble the input data they receive, rendering it unreadable to any user or role who does not have both the privileges and the access key required to decrypt it. Authorized users can run the encryption functions against any valid SQL expression that evaluates to a character string.

Configuring Encryption for SQL Access

The ENCRYPT_KS and DECRYPT_KS functions are backed by an integrated "vault" on the manager node, which functions as a key manager (or keystore). The same vault is used in the manager node to secure keys for unlocking encrypted drives; see System Hardware Encryption.

Before you can use the SQL encryption functions, you have to complete the following configuration steps:
  1. Enable SQL encryption for the system by running the following ybcli command on the manager node:
    YBCLI(12545) (PRIMARY - yb100-mgr0)> config keystore sql enable
    
    Are you sure you want to configure keystore for SQL integration?
    Response (yes/no): yes
    ...

    This command sets up access to the vault. The system looks up keys in the vault when they are specified as arguments to encryption functions in SQL queries.

  2. Grant CREATE privilege per schema to users and roles who will be able to create, delete, and describe keys created in that schema. (This is the same type of privilege that you would grant to users who are going to create tables and other objects.) For example:
    premdb=# grant create on schema public to yb100;
    GRANT
  3. Create keys with the CREATE KEY SQL command, using a valid hexadecimal string to define a "secret." Keys belong to the database and schema where you create them. For example:
    premdb=# CREATE KEY yb100key SECRET 'a1b2c3d4e5f0';
    CREATE KEY
    This command creates a key for the public schema in the premdb database. The secret for the key is saved in the encryption keystore. (Authorized users pass in the key name, not its associated secret, when they run encryption functions.) Administrators can query the sys.key view to return a list of keys that have been created per database and schema. For example:
    premdb=# select * from sys.key;
     key_id |   name   | schema_id | owner_id |         creation_time         
    --------+----------+-----------+----------+-------------------------------
      16480 | yb100key |      2200 |    16007 | 2019-11-07 15:26:30.577896-08
    (1 row)
  4. Grant ENCRYPT and/or DECRYPT access on keys to specific users and roles. For example:
    premdb=# grant all on key yb100key to yb100;
    GRANT

    This example grants ENCRYPT and DECRYPT privileges to members of the role named yb100.

  5. Test the functions by encrypting and then decrypting a column from a table:
    premdb=# create table encrypted_names as select encrypt_ks(nickname, yb100key) enc from team;
    SELECT 50
    premdb=# select decrypt_ks(enc,yb100key) from encrypted_names;
     decrypt_ks 
    ------------
     Gunners
     Villains
     Tykes
     Blues
     Rovers
    ...
For more details about the commands used in these steps, see:

Calling the Functions

The ENCRYPT_KS and DECRYPT_KS functions take the following arguments:
  • An input expression (required), which must evaluate to a character string. For example, this could be a column name, the concatenation of values from two columns, or the result of another function.
  • The name of the key (required). Users specify keys to ensure that the data is encrypted and decrypted consistently every time the same source data is accessed. Keys are mapped to "secrets" stored in the vault.
  • One of three standard AES algorithms, all of which use Output Feedback Mode (OFB):
    • AES with a 128-bit key (the default)
    • AES with a 192-bit key
    • AES with a 256-bit key
  • An optional "initialization vector" provides an additional level of obfuscation. This parameter provides an efficient way to re-scramble the encrypted data without having to rebuild the overall character mapping that is generated when a new key is used. Instead of using a different key for each function call, you can use the same key and adjust the vector. Taken together, the key and vector parameters function like a login credential with two levels of privacy and security.

Using ybunload, you can unload data with a query that contains the ENCRYPT_KS function. You can reload the data as is with ybload, then decrypt it with a query or CTAS statement that uses the DECRYPT_KS function. (You cannot load and decrypt data in one step with ybload.)

The Yellowbrick implementation of the ENCRYPT_KS and DECRYPT_KS functions contains code under the OpenSSL license.