Step 2 - Ingest Data

Most clients using Hydrolix stream data in through the REST API. If you want to see how that works:

  1. Go to the request.py file in your starter project and update it with your cluster's DNS (replace {$your_cluster_DNS})
  2. Run python request.py
  3. Study the structure of the Python code (a minimal sketch of such a request follows this list)
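
If you would like to preview the shape of that script before running it, here is a minimal sketch of a streaming request in Python. The endpoint path (/ingest/event), the x-hdx-table and x-hdx-transform header names, and the project/table/transform values are assumptions for illustration; defer to the actual request.py and the stream request API documentation.

```python
# Minimal sketch of a Hydrolix streaming ingest request.
# The endpoint path, header names, and table/transform values are assumptions;
# check request.py and the stream request API docs for the authoritative form.
import json

import requests

CLUSTER_DNS = "your.cluster.dns"  # replace with your cluster's DNS
URL = f"https://{CLUSTER_DNS}/ingest/event"

headers = {
    "content-type": "application/json",
    "x-hdx-table": "my_project.my_table",  # assumed project.table pair
    "x-hdx-transform": "my_transform",     # assumed transform name
}

# One JSON event shaped to match the table's transform.
event = {"timestamp": "2023-01-01T00:00:00Z", "message": "hello hydrolix"}

resp = requests.post(URL, headers=headers, data=json.dumps(event), timeout=10)
resp.raise_for_status()
print(resp.status_code, resp.text)
```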

Before using any of these methods, turn Unified Authentication off for simplicity by adding unified_auth: false to your hydrolixconfig.yaml file.
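
For reference, the setting looks like the excerpt below in place; its exact position within the file is an assumption, so match the layout of your existing hydrolixconfig.yaml.

```yaml
# hydrolixconfig.yaml (excerpt); placement within the file is an assumption,
# so match the structure of your existing config.
unified_auth: false
```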

The stream request API documentation includes a handy tool that builds a request for a chosen API call in the language you want.

A working example of a cURL stream request is included in the sample code you downloaded.

There are other ways to ingest data into Hydrolix that are useful to know about but outside the scope of getting started:

  1. Kafka streaming - uses the same streaming principles
  2. Kinesis streaming - uses the same streaming principles
  3. S3 batch loads - purely batch
  4. Autoingest - batch loads driven by a Lambda triggered by updates to cloud storage (a conceptual sketch of this pattern follows this list)
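
To make the autoingest pattern concrete, here is a conceptual Python sketch of a Lambda-style handler reacting to a cloud storage event. The NOTIFY_URL and payload shape are hypothetical stand-ins, not Hydrolix's actual autoingest interface; the sketch only illustrates the trigger-and-notify flow.

```python
# Conceptual sketch of the autoingest pattern: a Lambda handler fires on an
# S3 "object created" event and forwards each new object's location so a
# batch load can be kicked off. NOTIFY_URL and the payload are hypothetical,
# not Hydrolix's real autoingest wiring.
import json
import urllib.request

NOTIFY_URL = "https://your.cluster.dns/hypothetical/batch-notify"  # placeholder

def handler(event, context):
    # An S3 event notification can carry multiple records.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        payload = json.dumps({"source": f"s3://{bucket}/{key}"}).encode()
        req = urllib.request.Request(
            NOTIFY_URL,
            data=payload,
            headers={"content-type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(f"notified for s3://{bucket}/{key}: {resp.status}")
```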

Each of these solutions has its own architectural advantages and should run on a cluster tuned for it, but all of them are beyond the scope of getting started.

Similarly, excellent tools like Logstash, Fluentd, and vector.dev work nicely with the streaming API layer.

There you have it! You are ready to configure and stream in terabytes of data a day, into the same cluster, without ever worrying about adding disks to do this again (just be sure to work with your admins to assign enough stream peers to cover that kind of streaming load). Petabytes are now within reach!