Unloading Data to an S3 Bucket

These instructions assume that you already have an Amazon Simple Storage Service (Amazon S3) account, know the S3 location where you want to unload the data, and have your access keys at hand. If not, follow the instructions on the AWS site before proceeding. Specific prerequisites include:

  • An Amazon S3 account with valid credentials. S3 credentials must be provided in a manner supported by the AWS SDK for Java:
    • Secure methods (integrated into your organization's login/identity mechanism):
      • EC2 roles (when running on Amazon EC2 instances)
      • SAML 2.0-compatible identity provider
      • Custom identity provider bridge to Amazon AWS
    • Other methods:
      • Environment variables: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
      • URI query parameters: aws_access_key_id and aws_secret_access_key
      • A credential file, typically located at ~/.aws/credentials (location may vary per platform)

    Your organization's AWS administrator should provide instructions for the mechanism you use. See Best Practices for Managing AWS Access Keys for further recommendations.

  • An installation of the AWS CLI (required on Linux platforms and optional on Windows platforms). This CLI provides the aws configure command for setting credentials. For details, see the AWS Command Line Interface documentation.
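Before running an unload, it can help to confirm which of the "other methods" listed above is actually in place on the client system. The following is a minimal sketch, not part of ybunload: the function name credential_source is hypothetical, and the check only looks for the two environment variables and a readable credential file. It does not validate the keys, and it does not detect EC2 roles or identity-provider logins.

```shell
# Hypothetical helper: report which simple credential source is present.
# Prints one of: environment, file, none.
# Optional argument: path to the credential file
# (defaults to ~/.aws/credentials; the location may vary per platform).
credential_source() {
  local cred_file="${1:-$HOME/.aws/credentials}"
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    # Both access key environment variables are set and non-empty.
    echo "environment"
  elif [ -r "$cred_file" ]; then
    # A readable credential file exists (its contents are not inspected).
    echo "file"
  else
    echo "none"
  fi
}

credential_source
```

A result of none does not necessarily mean that the unload will fail; one of the secure methods may still supply credentials.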

Assuming that these prerequisites are in place, you can unload data from a table or a query to a folder in an S3 bucket.

To unload a table or query results to an Amazon S3 bucket:

  1. Record your AWS security credentials (access key ID and secret access key).
  2. Run aws configure to store the access keys in your credential file, or use set commands to set the environment variables for the AWS access keys.
    For example, on Linux:
    $ aws configure
    AWS Access Key ID [****************SWUA]: 
    AWS Secret Access Key [****************n2L+]: 
    Default region name [None]: 
    Default output format [None]: 
    For example, on Windows:
    set AWS_ACCESS_KEY_ID=<key value>
    set AWS_SECRET_ACCESS_KEY=<key value>
  3. Record the S3 location of the folder you want to use as the destination for the unload operation. Two formats are supported in the ybunload syntax:
    • The complete HTTPS link. For example:
      https://s3-us-west-2.amazonaws.com/yb-tmp/premdb/premdb_unloads
    • The abbreviated S3 path. For example:
      s3://yb-tmp/premdb/premdb_unloads
    In these examples, yb-tmp is the S3 bucket name and premdb/premdb_unloads is the folder path below it.
  4. Run the ybunload command in the usual way, providing the correct S3 path to the destination folder. For example:
    $ ybunload -d premdb -t match --username bobr -W -o s3://yb-tmp/premdb/premdb_unloads
    Here is the equivalent command with the longer-form HTTPS link to the S3 folder:
    $ ybunload -d premdb -t match --username bobr -W -o https://s3-us-west-2.amazonaws.com/yb-tmp/premdb/premdb_unloads
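The relationship between the two destination formats in step 3 can be sketched as a small shell function. This is an illustration only: the function name https_to_s3 is hypothetical, and the sketch assumes a path-style link like the one shown above, where the bucket name is the first path segment after the host name. Links that carry the bucket name in the host name are not handled.

```shell
# Hypothetical helper: convert a path-style HTTPS link to the abbreviated
# s3:// form by dropping the scheme and the host name.
https_to_s3() {
  local url="$1"
  case "$url" in
    https://*) ;;
    *) echo "not an HTTPS link: $url" >&2; return 1 ;;
  esac
  local rest="${url#https://}"   # host/bucket/folder/...
  local path="${rest#*/}"        # bucket/folder/...
  if [ "$path" = "$rest" ] || [ -z "$path" ]; then
    # No bucket segment follows the host name.
    echo "no bucket in link: $url" >&2
    return 1
  fi
  printf 's3://%s\n' "$path"
}

https_to_s3 "https://s3-us-west-2.amazonaws.com/yb-tmp/premdb/premdb_unloads"
# prints s3://yb-tmp/premdb/premdb_unloads
```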
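If an S3 unload fails and is left in a hung state, you can use the ybunload command with the --cancel-hung-uploads option to abort the upload. For example, the following session interrupts an unload with Ctrl+C, then runs ybunload again with the same destination and the --cancel-hung-uploads option to cancel the incomplete multi-part upload: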
$ ybunload -d premdb -t newmatchstats --username bobr -W -o s3://yb-tmp/premdb/premdb_unloads
Database login password: 
18:10:15.922 [ INFO] ybunload version 1.2.2-5686
18:10:15.924 [ INFO] COMMAND LINE = -d premdb -t newmatchstats --username bobr -W -o s3://yb-tmp/premdb/premdb_unloads
18:10:15.930 [ INFO] Creating S3 client with default credential search path
18:10:16.429 [ INFO] Verifying unload statement...
18:10:16.501 [ INFO] Unload statement verified
18:10:16.501 [ INFO] Beginning unload to s3://yb-tmp/premdb/premdb_unloads
18:10:18.890 [ INFO] Key Name: unload_1_1_.csv Upload ID = Z9FDA9SSSrBY_1wLK8RKYPDggkmOa.RGhmuY3.BY0kfCdfR._bWBMzK_7D.y33sQ5DYD1Xe8I.Q3.JSW0G3m.C8iyR8WE7m9ltzIZ7yoQrzmBfNvhArcKI3X3IaZ_nkx
18:10:19.070 [ INFO] state: RUNNING - Network BW: 11.56 MB/s Disk BW: 0.00 KB/s
18:10:20.069 [ INFO] state: RUNNING - Network BW: 3.08 MB/s Disk BW: 5.69 MB/s
...
^C18:10:43.385 [ERROR] Caught ^C Forcing abort
18:10:43.386 [ WARN] Aborting multi part upload: Z9FDA9SSSrBY_1wLK8RKYPDggkmOa.RGhmuY3.BY0kfCdfR._bWBMzK_7D.y33sQ5DYD1Xe8I.Q3.JSW0G3m.C8iyR8WE7m9ltzIZ7yoQrzmBfNvhArcKI3X3IaZ_nkx
...
[1]+  Stopped                 ybunload -d premdb -t newmatchstats --username bobr -W -o s3://yb-tmp/premdb/premdb_unloads
$ ybunload -d premdb --username bobr -W -o s3://yb-tmp/premdb/premdb_unloads --cancel-hung-uploads
Database login password: 
18:13:02.479 [ INFO] ybunload version 1.2.2-5686
18:13:02.481 [ INFO] COMMAND LINE = -d premdb --username bobr -W -o s3://yb-tmp/premdb/premdb_unloads --cancel-hung-uploads
18:13:02.487 [ INFO] Creating S3 client with default credential search path
18:13:02.998 [ INFO] Looking for uploads with prefix unload
18:13:03.633 [ INFO] Found 1 multi part uploads.
18:13:03.634 [ INFO] Cancelling unload_1_1_.csv +ID: Z9FDA9SSSrBY_1wLK8RKYPDggkmOa.RGhmuY3.BY0kfCdfR._bWBMzK_7D.y33sQ5DYD1Xe8I.Q3.JSW0G3m.C8iyR8WE7m9ltzIZ7yoQrzmBfNvhArcKI3X3IaZ_nkx
$
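The two commands in the session above can be combined in a small wrapper script. The following is a sketch under stated assumptions, not a documented feature: the function name unload_with_cleanup and the YBUNLOAD variable are hypothetical, and the sketch assumes that ybunload returns a non-zero exit status when an unload fails. It uses only the options shown in the examples above. Because both commands use -W, the password prompt appears once for each command that runs.

```shell
# Hypothetical wrapper: run an unload and, if it fails, cancel any hung
# uploads at the same destination.
# Assumption: ybunload exits with a non-zero status on failure.
# YBUNLOAD can be overridden to point at a specific ybunload executable.
YBUNLOAD="${YBUNLOAD:-ybunload}"

unload_with_cleanup() {
  local db="$1" table="$2" user="$3" dest="$4"
  if "$YBUNLOAD" -d "$db" -t "$table" --username "$user" -W -o "$dest"; then
    return 0
  fi
  echo "Unload failed; cancelling hung uploads under $dest" >&2
  "$YBUNLOAD" -d "$db" --username "$user" -W -o "$dest" --cancel-hung-uploads
  # Report the original failure even if the cleanup succeeds.
  return 1
}

# Example call (not executed here):
#   unload_with_cleanup premdb newmatchstats bobr s3://yb-tmp/premdb/premdb_unloads
```

The wrapper does not cover the case shown in the session above, where the unload process is stopped rather than exiting; in that case, run the --cancel-hung-uploads command manually.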