Prerequisites

To follow along with this tutorial, you will need the following:

  • A running Hydrolix cluster that you have access to
  • Python 3
  • HDXCLI, a command-line tool for performing administrative tasks on Hydrolix
  • A copy of the starter project from GitHub; the nginx_web_access_logs folder is used extensively in this tutorial

Hydrolix lets you store high-volume time-series data at low cost for use cases such as CDN logs, security logs, web server logs, and sensor logs. Some companies use Hydrolix to store 1 to 5 TB a day and retain it for a full year.

Hydrolix indexes the data and keeps it in cloud storage, so once your installation is complete, the bulk of the project work typically consists of the following:

  1. Modeling how data is stored
  2. Ingesting data
  3. Querying data
  4. Adding advanced features such as dashboards and real-time aggregations
  5. Optimizing for high-volume performance and lower costs

In this tutorial, you will learn the basics of modeling, ingesting, and querying data. You'll work with an NGINX access log and try out the provided example by downloading the sample code and following along. You are encouraged to experiment and explore!
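
Before modeling the data, it can help to see which fields a single NGINX access log line carries. The sketch below is not part of the starter project. It assumes the logs use NGINX's default "combined" format, and the sample line and the parse_line helper are made up for illustration; the logs in nginx_web_access_logs may use a different format.

```python
import re
from datetime import datetime

# NGINX's default "combined" log format (an assumption about the tutorial's logs):
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_line(line):
    """Hypothetical helper: turn one combined-format line into a dict, or None."""
    match = COMBINED.match(line)
    if match is None:
        return None
    row = match.groupdict()
    # Time-series stores need a typed timestamp, so parse it explicitly.
    row["timestamp"] = datetime.strptime(
        row.pop("time_local"), "%d/%b/%Y:%H:%M:%S %z"
    )
    # Numeric fields are converted so they can be filtered and aggregated later.
    row["status"] = int(row["status"])
    row["body_bytes_sent"] = int(row["body_bytes_sent"])
    return row


# Made-up sample line, for illustration only.
SAMPLE = (
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 612 "-" "curl/8.0.1"'
)

if __name__ == "__main__":
    print(parse_line(SAMPLE))
```

Each named group is a candidate column when you model the data: one timestamp, a few numeric fields, and several strings.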