Amazon Data Firehose

Send many different types of AWS-related log data to your Hydrolix cluster.

Overview

Hydrolix can ingest data streams from Amazon Data Firehose, bringing logs from a wide range of AWS services into your Hydrolix cluster. This enables rapid queries while providing economical long-term storage. This feature was introduced in Hydrolix version 4.20.

You can configure Amazon Data Firehose to accept data from any of these Amazon Data Firehose-compatible sources, and then use Amazon Data Firehose to send data to your Hydrolix cluster via the Hydrolix HTTP Streaming API. In the steps below, you’ll set up a Hydrolix table and associated transform to accept the data.

Before You Begin

Gather the following information before you begin. You’ll determine the HTTP Endpoint URL and Access Key in the steps following this table.

| Item | Description | Example Value | Source |
| --- | --- | --- | --- |
| Firehose Stream Name | The name of the Amazon Data Firehose stream. This is meaningful in the AWS context, but not in Hydrolix. | My-Sample-Firehose | Your choice |
| HTTP Endpoint URL | The URL of your Hydrolix cluster’s ingest endpoint, appended with table, project, and transform information. | https://hydrolix.hostname.com/ingest/event?table=sample_project.sample_table&transform=sample_transform | Hydrolix project and table setup (see below) |
| Access Key | A base64-encoded string containing a valid Hydrolix username and password delimited with a colon. | Unencoded: [email protected]:your.password — Encoded: eW91QHlvdXIuZW1haWwuZG9tYWluOnlvdXIucGFzc3dvcmQ= | Username/password encoding process (see below) |
| Destination Project | The project in your Hydrolix cluster that contains the destination table. | sample_project | Your choice |
| Destination Table | The table in your Hydrolix cluster that will accept the data from Amazon Data Firehose. | sample_table | Your choice |
| Transform Name | The name of the transform that will accept Amazon Data Firehose-formatted data and add it to the destination table. | sample_transform | Your choice |
| S3 Backup Bucket Name | The name of a bucket where Amazon Data Firehose will save data from failed transfers. | s3://firehose-test-bucket | Your AWS S3 administrator |

Create Credential Information

Amazon Data Firehose sends data to the Hydrolix HTTP Streaming API with an accompanying X-Amz-Firehose-Access-Key HTTP header. The content of this header is the Access Key: a base64-encoded, colon-separated concatenation of these two values:

  • A valid username from your Hydrolix cluster
  • The username’s password

There are many ways to base64-encode this information. Here are example commands to base64-encode strings on Linux, macOS, and Windows platforms:

% echo -n "[email protected]:your.password" | base64
eW91QHlvdXIuZW1haWwuZG9tYWluOnlvdXIucGFzc3dvcmQ=

C:\Users\Test>powershell "[convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes(\"[email protected]:your.password\"))"
eW91QHlvdXIuZW1haWwuZG9tYWluOnlvdXIucGFzc3dvcmQ=

Configure a Hydrolix Table and Transform

Define a table in your Hydrolix cluster to receive the data from Amazon Data Firehose, then define a transform to accept the data.

Create the Destination Table

Create a table via the Hydrolix UI using the “Table” option under the “Add new” menu on the upper right-hand corner of the screen:

You can also use the Hydrolix Configuration API to create the table. See Create a Table via API.
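As a sketch, creating the table through the Configuration API looks like the following. The org UUID, project UUID, bearer token, and hostname are placeholders you would replace with values from your own cluster; consult the Configuration API reference for the exact request shape and any additional table settings you need:

```shell
# Hypothetical sketch: create sample_table via the Hydrolix Configuration API.
# ORG_ID, PROJECT_ID, and HDX_TOKEN are placeholders from your own cluster.
ORG_ID="<org-uuid>"
PROJECT_ID="<project-uuid>"
HDX_TOKEN="<bearer-token>"

curl -s -X POST \
  "https://hydrolix.hostname.com/config/v1/orgs/${ORG_ID}/projects/${PROJECT_ID}/tables/" \
  -H "Authorization: Bearer ${HDX_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"name": "sample_table"}'
```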

Create the Transform with Subtype firehose

Amazon Data Firehose sends data in a special format that includes repeated base64-encoded data fields. Hydrolix JSON transforms have a transform “subtype” called firehose to ingest this Amazon Data Firehose format.

As an example, the following steps describe a transform that accepts data from the sample data generator in the AWS Console’s “Test with demo data” option in the Amazon Data Firehose configuration UI:

Create the Transform via the UI

  1. Attach a new transform to your table via the Hydrolix UI using the “Table Transform” option under the “Add new” menu on the upper right-hand corner of the screen:
  2. Fill in the resulting form with the name of your new destination table, a name for your transform, and a description:

Select the “Create” button, then select the “JSON” Data type button.

  3. This will bring you into the Hydrolix transform editor. Under “Transform Output Columns,” under the “</> JSON” tab, include the transform that will accept data from your data source. For example, this transform will accept data generated by the AWS Console’s “Test with demo data” option:
[
 {
   "name": "timestamp",
   "datatype": {
     "type": "epoch",
     "index": false,
     "primary": true,
     "format": "ms",
     "resolution": "seconds",
     "default": null,
     "script": null,
     "source": null,
     "suppress": false
   }
 },
 {
   "name": "requestId",
   "datatype": {
     "type": "string",
     "index": true,
     "format": null,
     "resolution": "seconds",
     "default": null,
     "script": null,
     "source": null,
     "suppress": false
   }
 },
 {
   "name": "CHANGE",
   "datatype": {
     "type": "double",
     "index": false,
     "format": null,
     "resolution": "seconds",
     "default": null,
     "script": null,
     "source": null,
     "suppress": false
   }
 },
 {
   "name": "PRICE",
   "datatype": {
     "type": "double",
     "index": false,
     "format": null,
     "resolution": "seconds",
     "default": null,
     "script": null,
     "source": null,
     "suppress": false
   }
 },
 {
   "name": "TICKER_SYMBOL",
   "datatype": {
     "type": "string",
     "index": true,
     "format": null,
     "resolution": "seconds",
     "default": null,
     "script": null,
     "source": null,
     "suppress": false
   }
 },
 {
   "name": "SECTOR",
   "datatype": {
     "type": "string",
     "index": true,
     "format": null,
     "resolution": "seconds",
     "default": null,
     "script": null,
     "source": null,
     "suppress": false
   }
 }
]

  4. Remove any text in the middle “Transform SQL” column on that page.

In the right-most “Sample data” column, you have the option of pasting in sample data to validate the transform in the left-most column. For demonstration purposes, here is some sample data that was produced by the AWS Console’s “Test with demo data” option:

{
 "records": [
   {
     "data": "eyJDSEFOR0UiOiAtMC4wMSwgIlBSSUNFIjogMTExLjE3LCAiVElDS0VSX1NZTUJPTCI6ICJHT09HTCIsICJTRUNUT1IiOiAiRW5lcmd5In0="
   },
   {
     "data": "eyJDSEFOR0UiOiAtMC4yNiwgIlBSSUNFIjogNzkuNDgsICJUSUNLRVJfU1lNQk9MIjogIkFNWk4iLCAiU0VDVE9SIjogIkhlYWx0aCJ9Cg=="
   },
   {
     "data": "eyJDSEFOR0UiOiAtMC4xMSwgIlBSSUNFIjogMTM2LjgxLCAiVElDS0VSX1NZTUJPTCI6ICJNU0ZUIiwgIlNFQ1RPUiI6ICJFbmVyZ3kifQo="
   },
   {
     "data": "eyJDSEFOR0UiOiAtMC40NCwgIlBSSUNFIjogMTEwLjUyLCAiVElDS0VSX1NZTUJPTCI6ICJBQVBMIiwgIlNFQ1RPUiI6ICJIZWFsdGgifQo="
   },
   {
     "data": "eyJDSEFOR0UiOiAtMC4yNiwgIlBSSUNFIjogMTI1LjksICJUSUNLRVJfU1lNQk9MIjogIkFBUEwiLCAiU0VDVE9SIjogIkZpbmFuY2UifQo="
   },
   {
     "data": "eyJDSEFOR0UiOiAtMC4yNSwgIlBSSUNFIjogOTYuNDgsICJUSUNLRVJfU1lNQk9MIjogIkFNWk4iLCAiU0VDVE9SIjogIkZpbmFuY2UifQo="
   },
   {
     "data": "eyJDSEFOR0UiOiAwLjE3LCAiUFJJQ0UiOiAxMzkuNTIsICJUSUNLRVJfU1lNQk9MIjogIk1TRlQiLCAiU0VDVE9SIjogIlRlY2gifQ=="
   },
   {
     "data": "eyJDSEFOR0UiOiAtMC4xLCAiUFJJQ0UiOiAxNDguODcsICJUSUNLRVJfU1lNQk9MIjogIkFBUEwiLCAiU0VDVE9SIjogIlFFIn0K"
   },
   {
     "data": "eyJDSEFOR0UiOiAwLjY4LCAiUFJJQ0UiOiA5Ny4xOSwgIlRJQ0tFUl9TWU1CT0wiOiAiQU1aTiIsICJTRUNUT1IiOiAiUUUifQo="
   },
   {
     "data": "eyJDSEFOR0UiOiAwLjI0LCAiUFJJQ0UiOiA2Ny42LCAiVElDS0VSX1NZTUJPTCI6ICJNU0ZUIiwgIlNFQ1RPUiI6ICJIZWFsdGgifQo="
   }
 ],
 "requestId": "e18945fc-6df4-4356-87d0-6d09d58b5a48",
 "timestamp": 1727768303433
}
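Each “data” value in this payload is a base64-encoded JSON event, which is exactly what the firehose transform subtype decodes for you. You can inspect one locally; decoding the first sample record above yields the fields the transform maps:

```shell
# Decode the first sample record's "data" field to see the underlying event.
echo 'eyJDSEFOR0UiOiAtMC4wMSwgIlBSSUNFIjogMTExLjE3LCAiVElDS0VSX1NZTUJPTCI6ICJHT09HTCIsICJTRUNUT1IiOiAiRW5lcmd5In0=' | base64 --decode
# {"CHANGE": -0.01, "PRICE": 111.17, "TICKER_SYMBOL": "GOOGL", "SECTOR": "Energy"}
```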

  5. Designate this transform as a Firehose transform. Find the “Format Options” button underneath the right-most “Sample data” field:

In the resulting options screen, select the “Firehose” transform subtype:

Select “Save changes”.

  6. Optionally, validate the transform format against the sample data by selecting the “Validate transform” button. You should see JSON-formatted results populating the “Output” box.
  7. The UI should show a sample of the properly parsed data. Here’s an example using the above sample data:
  8. Once you’re satisfied with the results, select “Publish Transform.”

Create the Transform via the API

Transforms can also be created and published via the Hydrolix Configuration API.

Learn More About Transforms (Optional)

If you’re unfamiliar with Hydrolix transforms, consult our Write Transforms page to help you get started. To help you see the “shape” of your incoming data, consider using a catch-all transform to help you write your final transform.

Create Your Amazon Data Firehose

Now that you have a Hydrolix project, table, transform, and encoded access key, you’re ready to configure your Amazon Data Firehose.

  • In the Amazon Data Firehose UI, click “Create Firehose Stream.”

Specify Stream Source, Destination, and Name

  • Select the source of your data. For this example, use “Direct PUT.”

  • For the destination, select “HTTP Endpoint.” This will unfold more options to identify and authenticate to your Hydrolix HTTP Streaming API.

  • Under “Firehose stream name,” enter a descriptive name. Note that this name has no effect on where the data goes, and is not used in your Hydrolix cluster:

  • Skip the “Transform records” optional field.

Configure the Destination

Fill in the required fields in “Destination Settings” to point your Amazon Data Firehose to your Hydrolix cluster.

  • The HTTP endpoint name is for display and use in AWS, not Hydrolix.

  • Enter the URL for your Hydrolix cluster’s HTTP Streaming API, augmented with your chosen project name, table name, and transform name. This takes the form of
    https://<hostname>/ingest/event?table=<project_name>.<table_name>&transform=<transform_name>. Using the values from this document, it looks like this:
    https://hydrolix.hostname.com/ingest/event?table=sample_project.sample_table&transform=sample_transform

  • Under “Authentication,” make sure “Use access key” is selected.

  • Enter the base64 access key you created in the “Create Credential Information” step above.

  • Choose GZIP compression. Hydrolix also supports non-compressed payloads if you prefer.

Designate S3 Bucket for Failed Transfers

  • Configure which S3 bucket you would like Amazon Data Firehose to store data from failed transfers.

  • Accept the defaults in the remaining fields and click “Create Firehose Stream.”

Test the Integration

At this point, everything should be set up, connected, and ready for a test. Send data through the service you’ve selected to be the source of your Amazon Data Firehose data and check the AWS Console’s Destination Error Logs.

If you didn’t set up a source for your Amazon Data Firehose as in the examples above, you can still test by using the aforementioned “Test with demo data” functionality in the AWS Console, which is accessible from your Firehose Stream’s UI.

After you click “Start sending demo data,” Amazon Data Firehose will send data to your Hydrolix cluster. If you used the transform given in the examples above, you can now query this data in Hydrolix. Data should start appearing within 15 seconds. Use the Hydrolix UI’s query interface to run a simple select * from sample_project.sample_table limit 10 to see the data you’ve sent. Here’s an example of the output:

Troubleshooting

Test the Authentication String

It’s easy to make mistakes while encoding, copying, and pasting authorization information. If you’re unsure your encoded string is correct, try decoding the string and attempt to validate against the Hydrolix HTTP Streaming API using command-line tools.

Decode Authentication

The string resulting from these decoding commands should match the username:password you entered in the previous step:

% echo -n eW91QHlvdXIuZW1haWwuZG9tYWluOnlvdXIucGFzc3dvcmQ= | base64 --decode
[email protected]:your.password%

cURL the Hydrolix HTTP Streaming API

The cURL command is useful for testing authentication. Using commands similar to the one below, verify that your Hydrolix cluster accepts your access key. The /version endpoint is a simple API endpoint to test against. In the example below, replace hydrolix.hostname.com with the hostname of your Hydrolix cluster.

% curl -H 'X-Amz-Firehose-Access-Key: eW91QHlvdXIuZW1haWwuZG9tYWluOnlvdXIucGFzc3dvcmQ=' https://hydrolix.hostname.com/version
v4.20.0%

An unsuccessful validation will produce an Unauthorized error response like this:

% curl -H 'X-Amz-Firehose-Access-Key: eR91QHlvdXIuZW1haWruZG9tYWluOnlvdXIucGFzc3dgcmQ=' https://hydrolix.hostname.com/version
401: Unauthorized%
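If authentication succeeds but data still isn’t arriving, you can also simulate a delivery by POSTing a Firehose-style payload straight to the ingest endpoint, just as Amazon Data Firehose would. This sketch reuses the example hostname, access key, and first sample record from this document; the requestId is a made-up value:

```shell
# Simulate an Amazon Data Firehose delivery with curl (example values only).
curl -s -X POST \
  'https://hydrolix.hostname.com/ingest/event?table=sample_project.sample_table&transform=sample_transform' \
  -H 'X-Amz-Firehose-Access-Key: eW91QHlvdXIuZW1haWwuZG9tYWluOnlvdXIucGFzc3dvcmQ=' \
  -H 'Content-Type: application/json' \
  -d '{
        "records": [
          {"data": "eyJDSEFOR0UiOiAtMC4wMSwgIlBSSUNFIjogMTExLjE3LCAiVElDS0VSX1NZTUJPTCI6ICJHT09HTCIsICJTRUNUT1IiOiAiRW5lcmd5In0="}
        ],
        "requestId": "test-request-1",
        "timestamp": 1727768303433
      }'
```

A successful delivery should then be queryable in sample_project.sample_table, as described in the “Test the Integration” section above.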