Step 2 - Ingest Data

Most clients using Hydrolix stream data through the REST API. If you want to see how that works:

  1. Go to the file in your starter project and replace {$your_cluster_DNS} with your cluster's DNS name.
  2. Run the Python script.
  3. Study the structure of the Python code.
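The steps above boil down to a single HTTP POST per batch of events. Here is a minimal sketch of what that request looks like, assuming the cluster's `/ingest/event` streaming endpoint and an `x-hdx-table` header naming the target table; the cluster DNS and the `sample_project.sample_table` target are hypothetical placeholders you would replace with your own values:

```python
import json
import urllib.request

# Hypothetical values -- replace with your cluster's DNS name and your
# own project/table before running.
CLUSTER_DNS = "your-cluster.example.com"
TABLE = "sample_project.sample_table"

def build_stream_request(events):
    """Build an HTTP POST that streams a batch of JSON events to Hydrolix."""
    url = f"https://{CLUSTER_DNS}/ingest/event"
    headers = {
        "content-type": "application/json",
        "x-hdx-table": TABLE,  # which project.table receives the rows
    }
    body = json.dumps(events).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

# One dict per row; field names must match the table's transform schema.
req = build_stream_request([{"timestamp": "2024-01-01T00:00:00Z", "value": 42}])
# urllib.request.urlopen(req)  # uncomment to actually send the batch
```

Batching several events into one request, as the list argument allows, keeps per-request overhead low at higher volumes.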

Note that security (basic authentication) is not turned on, for simplicity. If it were enabled, a token would need to be provided in the request headers. Rotating tokens can be configured through table settings.
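With authentication enabled, the token rides along in the request headers. A minimal sketch of what those headers might look like, using a hypothetical token value and table name (obtain a real token from your cluster's auth endpoint or your admin):

```python
# Hypothetical token -- in practice this comes from your cluster's
# authentication flow, and rotating tokens can be managed per table.
HDX_TOKEN = "example-token"

headers = {
    "content-type": "application/json",
    "x-hdx-table": "sample_project.sample_table",
    # The extra header that an authenticated stream request would carry:
    "Authorization": f"Bearer {HDX_TOKEN}",
}
```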

The stream request API documentation provides a handy tool for forming an API call in the language of your choice.

A working example of a CURL stream request is provided in the sample code you downloaded.

There are other ways to ingest data into Hydrolix that are useful to know about, but outside the scope of getting started:

  1. Kafka streaming - uses the same streaming principles
  2. Kinesis streaming - uses the same streaming principles
  3. S3 batch loads - purely batch
  4. Autoingest - batch loads driven by a Lambda triggered by updates to cloud storage

Each of these solutions has its own architectural advantages and should run on a cluster tuned to work well with it.

Similarly, there are excellent tools like Logstash and Fluentd that work nicely with the streaming API layer.

So, now you have it! You are ready to configure and stream in terabytes a day of data - into the same cluster, without ever adding disks to do this again! (Just be sure to work with your admins to assign enough stream peers to cover that kind of streaming load.) Petabytes can now be achieved!