Projects & Tables

Hydrolix stores data within tables. You can group tables together in logical namespaces called projects. To reference your data, use the full path project.table, e.g. monitoring.http_logs.

Projects

Projects are equivalent to databases in a traditional RDBMS. You can create any number of projects, as long as each name is unique. We recommend short, lowercase names. When you must use a longer name, break it up logically with underscores to improve readability and reduce the likelihood of typos.
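
The naming recommendation above can be checked before a name is ever sent to the API. The following is a minimal sketch, not a Hydrolix rule: the regular expression, the 32-character cap, and both function names are assumptions introduced for illustration only.

```python
import re

# Hypothetical convention check: lowercase letters and digits, starting with a
# letter, with single underscores between words. The 32-character cap is an
# arbitrary illustration, not a limit documented by Hydrolix.
_NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")
MAX_LENGTH = 32


def is_recommended_name(name: str) -> bool:
    """Return True if name follows the short, lowercase, underscore-separated style."""
    return len(name) <= MAX_LENGTH and _NAME_PATTERN.fullmatch(name) is not None


def qualified_table_name(project: str, table: str) -> str:
    """Build the full project.table path used to reference data."""
    return f"{project}.{table}"
```

With these helpers, a name such as stock_trading passes the check, while a display-style name such as "Systems Monitoring" does not.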

For example, an organization could contain three different projects:

  • "Systems Monitoring"
  • "Stock Trading"
  • "IOT"

These projects could hold three independent, unrelated sets of data, all coexisting in the same deployment.

You can manage projects via the API or the Portal UI.

Creating a Project via API

You must authenticate to use the API.

  1. Log in with your username/password.
  2. Create a project, providing a name and description.

The following snippets show an example request to the create project API and the response Hydrolix returns after it successfully creates the project.

Request:

{
  "name": "monitoring",
  "description": "Global monitoring of web services"
}

Response:

{
    "uuid": "dfadb1a9-c2ec-4e3e-aab6-1117c5532843",
    "name": "monitoring",
    "description": "Global monitoring of web services",
    ...
}

The response contains the uuid of the created project. To reference resources contained within a project, like tables and transforms, include the project uuid as a path parameter in requests to those API endpoints.
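
The create project exchange can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference client: the /config/v1/orgs/{org_uuid}/projects/ path is inferred from the resource URLs the API returns, the Authorization: Bearer header is assumed from the bearer token issued at login, and the function names are hypothetical. The request is built but not sent.

```python
import json
import urllib.request


def build_create_project_request(hostname, org_uuid, token, name, description):
    """Build (but do not send) a POST request that creates a project."""
    # Assumed endpoint shape, inferred from the resource URLs the API returns.
    url = f"https://{hostname}/config/v1/orgs/{org_uuid}/projects/"
    body = json.dumps({"name": name, "description": description}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed header format for the bearer token obtained at login.
            "Authorization": f"Bearer {token}",
        },
    )


def project_uuid_from_response(response_text):
    """Extract the project uuid from the JSON body of a successful response."""
    return json.loads(response_text)["uuid"]
```

Passing the built request to urllib.request.urlopen would send it. Keep the uuid that project_uuid_from_response returns, since later requests for tables and transforms need it.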

Tables

A table, together with its associated write transforms, represents your dataset. Hydrolix stores it as a compressed, sorted, two-dimensional data structure across a number of .hdx files in cloud storage (AWS/GCP). You reference a table through its project, and a single project can hold many tables.

The Table API endpoint allows you to define a name for your data, along with:

  • controls for stream ingest - hot/cold data parameters
  • controls for auto-ingest - patterns and queues to read for notifications
  • controls for background merge - enable or disable merging to optimize your data storage
  • controls for data retention - TTL and removal of old data

You have full control over how data is ingested into a table. If you choose not to modify settings, such as those for streaming ingest, Hydrolix applies sensible defaults.

Before you can ingest data, you need to define a transform (a write schema) for the table that describes the data types to use.

Advanced concept: Tables are flexible by design!

One table may have multiple ingest transforms, essentially expanding the column width of a table.

  • multiple transforms on a single table must share the same datetime column
  • the resulting table column width is a union of all transforms
  • ideal for very closely associated data-sets arriving from different ingest methods
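
The union behaviour described above can be modelled with a few lines of Python. This is a conceptual sketch only: it does not call Hydrolix, the transform and column names are hypothetical, and the datetime check is reduced to confirming that every transform lists the same column name.

```python
def merged_table_columns(transforms, datetime_column):
    """Return the union of column names across transforms, in first-seen order.

    `transforms` maps a transform name to its list of column names.
    Raises ValueError if any transform lacks the shared datetime column.
    """
    merged = []
    for name, columns in transforms.items():
        if datetime_column not in columns:
            raise ValueError(f"transform {name!r} is missing {datetime_column!r}")
        for column in columns:
            if column not in merged:
                merged.append(column)
    return merged


# Hypothetical transforms for one table fed by two ingest methods.
transforms = {
    "stream_transform": ["timestamp", "hostname", "region"],
    "batch_transform": ["timestamp", "hostname", "status_code"],
}
columns = merged_table_columns(transforms, "timestamp")
```

Here the resulting table is four columns wide: the three from the first transform plus status_code from the second.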

The sample project used in the tutorials includes a variety of tables, each with several columns. For example, the table metrics has the columns timestamp, hostname, region, and so on.

You can manage tables via REST API or the Web UI.

Creating a Table via API

You will need to be authenticated to use the API.

  • Log in with your username/password. Here's how to do it with cURL:

Get the bearer token, which is valid for 24 hours, and use it to authenticate future API calls. This command assumes you've set the $HDX_HOSTNAME, $HDX_USER and $HDX_PASSWORD environment variables:

export HDX_TOKEN=$(
  curl -v -X POST -H "Content-Type: application/json" \
  "https://$HDX_HOSTNAME/config/v1/login/" \
  -d "{
    \"username\":\"$HDX_USER\",
    \"password\":\"$HDX_PASSWORD\"
  }" | jq -r ".auth_token.access_token"
)

  • Create a table, providing a name and, optionally, ingest settings:

An example create table API request and response:

Request:

{
  "name": "http_logs",
  "description": "web logs"
}

Response:

{
	"project": "6b0692f9-c040-47b1-988a-582e57dd3631",
	"name": "http_logs",
	"description": "web logs",
	"uuid": "94dba0fa-24f6-4962-9190-e47ead444ec4",
	"created": "2022-05-31T03:48:55.172580Z",
	"modified": "2022-05-31T03:48:55.172599Z",
	"settings": {
		"stream": {
			"hot_data_max_age_minutes": 3,
			"hot_data_max_active_partitions": 3,
			"hot_data_max_rows_per_partition": 12288000,
			"hot_data_max_minutes_per_partition": 1,
			"hot_data_max_open_seconds": 60,
			"hot_data_max_idle_seconds": 30,
			"cold_data_max_age_days": 3650,
			"cold_data_max_active_partitions": 50,
			"cold_data_max_rows_per_partition": 12288000,
			"cold_data_max_minutes_per_partition": 60,
			"cold_data_max_open_seconds": 300,
			"cold_data_max_idle_seconds": 60
		},
		"age": {
			"max_age_days": 0
		},
		"reaper": {
			"max_age_days": 1
		},
		"merge": {
			"enabled": true,
			"partition_duration_minutes": 60,
			"input_aggregation": 20000000000,
			"max_candidates": 20,
			"max_rows": 10000000,
			"max_partitions_per_candidate": 100,
			"min_age_mins": 1,
			"max_age_mins": 10080
		},
		"autoingest": {
			"enabled": false,
			"source": "",
			"pattern": "",
			"max_rows_per_partition": 12288000,
			"max_minutes_per_partition": 60,
			"max_active_partitions": 50,
			"input_aggregation": 1073741824,
			"dry_run": false
		},
		"sort_keys": [],
		"shard_key": null,
		"max_future_days": 0
	},
	"url": "https://my-domain.hydrolix.live/config/v1/orgs/0ffa6312-61ba-4620-8d57-96514a7f3859/projects/6b0692f9-c040-47b1-988a-582e57dd3631/tables/94dba0fa-24f6-4962-9190-e47ead444ec4"
}

The response contains the uuid of the created table. API endpoints for resources contained within a table, like transforms, take both the project uuid and the table uuid as path parameters, so store the table uuid for later requests.
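
The create table exchange can be sketched in Python as follows, again without sending anything. The function names are hypothetical, the Authorization: Bearer header is assumed from the bearer token issued at login, and the tables path mirrors the url field in the example response. The /transforms/ suffix and the idea that a partial settings object is merged with the defaults are assumptions, not behaviour confirmed on this page.

```python
import json
import urllib.request


def build_create_table_request(hostname, org_uuid, project_uuid, token,
                               name, description, settings=None):
    """Build (but do not send) a POST request that creates a table in a project."""
    url = (f"https://{hostname}/config/v1/orgs/{org_uuid}"
           f"/projects/{project_uuid}/tables/")
    payload = {"name": name, "description": description}
    if settings is not None:
        # Optional ingest settings; assumed to be merged with the defaults.
        payload["settings"] = settings
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def transforms_url(table_response_text):
    """Derive the table's transforms endpoint from the create table response."""
    table = json.loads(table_response_text)
    # Assumed path: transforms live under the table's own resource URL.
    return table["url"].rstrip("/") + "/transforms/"
```

For example, passing settings={"merge": {"enabled": False}} would ask for a table with background merge turned off, using the field names shown in the response above.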

