Projects & Tables

Hydrolix stores data within tables. You can group tables together in logical namespaces called projects. To reference your data, use the full path project.table, for example monitoring.http_logs.

Projects

Projects are equivalent to databases in a traditional RDBMS. You can create any number of projects, as long as each name is unique. We recommend picking short, lowercase names. When you must use a longer name, break it up logically with underscores to improve readability and reduce the likelihood of typos.

For example, an organization could contain three different projects:

  • "Systems Monitoring"
  • "Stock Trading"
  • "IOT"

These projects could contain 3 independent sets of unrelated data. They can all coexist in the same deployment.

You can manage projects via the API or the Portal UI.

Creating a Project via API

You must authenticate to use the API.

  1. Log in with your username and password.
  2. Create a project, providing a name and description.

The following snippets show an example request body sent to the create project API, followed by the response that Hydrolix returns after it successfully creates the project:

{
  "name": "monitoring",
  "description": "Global monitoring of web services"
}

{
    "uuid": "dfadb1a9-c2ec-4e3e-aab6-1117c5532843",
    "name": "monitoring",
    "description": "Global monitoring of web services",
    ...
}

The response contains the uuid of the created project. To reference resources contained within a project, like tables and transforms, include the project uuid as a path parameter in requests to those API endpoints.
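
As a rough sketch of the flow above, the shell functions below send the create request with cURL and print the uuid from the response. The /config/v1/orgs/<org-uuid>/projects/ path is inferred from the url field in the table response shown later on this page, and the Authorization: Bearer header is the conventional way to pass a bearer token; treat both as assumptions rather than confirmed API details. The function names and the $HDX_ORG variable are hypothetical, and $HDX_TOKEN is assumed to hold a token obtained from the login endpoint.

```shell
# Hypothetical helper: build the projects collection URL.
# $1 = hostname, $2 = org uuid. The path shape is an assumption.
hdx_projects_url() {
  printf 'https://%s/config/v1/orgs/%s/projects/' "$1" "$2"
}

# Hypothetical helper: create a project and print the uuid from the response.
# $1 = project name, $2 = description.
# Requires curl and jq, plus HDX_HOSTNAME, HDX_ORG, and HDX_TOKEN in the environment.
hdx_create_project() {
  curl -s -X POST "$(hdx_projects_url "$HDX_HOSTNAME" "$HDX_ORG")" \
    -H "Authorization: Bearer $HDX_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(jq -n --arg name "$1" --arg description "$2" \
          '{name: $name, description: $description}')" |
    jq -r '.uuid'
}
```

With those definitions loaded, something like export HDX_PROJECT=$(hdx_create_project monitoring "Global monitoring of web services") would keep the project uuid available for later requests.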

Settings

The settings object specifies project-level configuration, such as default options for queries against data stored in the project and rate limits.

| Property | Type | Purpose | Default | Required |
|---|---|---|---|---|
| default_query_options | object | See Advanced Query Options for descriptions of each option. | See Advanced Query Options for default values. | No |
| blob | object | Do not use this API option. | null | No |
| rate_limit | object | Limits bytes per second ingest rate and max payload size. See Rate Limits. | no limit | No |

Tables

A table (and its associated write transforms) represents your dataset. Hydrolix stores it as a compressed, sorted, two-dimensional data structure in a number of .hdx files in cloud storage (AWS/GCP). A table is referenced via its project, and you can have many tables in the same project.

The Table API endpoint allows you to define a name for your data, along with:

  • controls for stream ingest - hot/cold data parameters
  • controls for autoingest - patterns and queues to read for notifications
  • enable/disable background merge functionality to optimize your data storage
  • TTL and removal of old data

You have full control over how data is ingested into a table. If you choose not to modify settings, such as those for streaming ingest, Hydrolix applies sensible defaults.

Before you can ingest data, you need to define a transform (a write schema) for the table that describes the data types to use.

Advanced concept: Tables are flexible by design!

One table may have multiple ingest transforms, essentially expanding the column width of a table.

  • multiple transforms on a single table must share the same datetime column
  • the resulting table column width is a union of all transforms
  • ideal for very closely associated data-sets arriving from different ingest methods
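
To make the union rule concrete, here is a small shell sketch that merges the column lists of two hypothetical transforms. The function name and the column lists are illustrative only (status_code is invented for the example); the shared timestamp column appears once in the result.

```shell
# Hypothetical helper: print the distinct column names across all arguments.
# Each argument is a space-separated column list for one transform.
union_columns() {
  # One column name per line, then sort and de-duplicate in the C locale.
  printf '%s\n' "$@" | tr -s ' ' '\n' | LC_ALL=C sort -u
}

# Two transforms that share the same datetime column, timestamp.
union_columns "timestamp hostname region" "timestamp status_code"
# prints, one per line: hostname, region, status_code, timestamp
```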

The sample project used in tutorials includes a variety of tables, each with several columns. For example, the table metrics has columns such as timestamp, hostname, and region.

You can manage tables via REST API or the Web UI.

Create a Table via API

You will need to be authenticated to use the API.

  • Log in with your username/password. Here's how to do it with cURL:

Get the bearer token, which is good for the next 24 hours, to authenticate future API calls. This command assumes you've set the $HDX_HOSTNAME, $HDX_USER and $HDX_PASSWORD environment variables:

export HDX_TOKEN=$(
  curl -v -X POST -H "Content-Type: application/json" \
  https://$HDX_HOSTNAME/config/v1/login/ \
  -d "{
    \"username\":\"$HDX_USER\",
    \"password\":\"$HDX_PASSWORD\"  
  }" | jq -r ".auth_token.access_token"
)

  • Create a table, providing a name and, optionally, ingest settings:

An example create table API request body, followed by the response:

{
  "name": "http_logs",
  "description" : "web logs"
}

{
	"project": "6b0692f9-c040-47b1-988a-582e57dd3631",
	"name": "http_logs",
	"description": "web logs",
	"uuid": "94dba0fa-24f6-4962-9190-e47ead444ec4",
	"created": "2022-05-31T03:48:55.172580Z",
	"modified": "2022-05-31T03:48:55.172599Z",
	"settings": {
		"stream": {
			"hot_data_max_age_minutes": 3,
			"hot_data_max_active_partitions": 3,
			"hot_data_max_rows_per_partition": 12288000,
			"hot_data_max_minutes_per_partition": 1,
			"hot_data_max_open_seconds": 60,
			"hot_data_max_idle_seconds": 30,
			"cold_data_max_age_days": 3650,
			"cold_data_max_active_partitions": 50,
			"cold_data_max_rows_per_partition": 12288000,
			"cold_data_max_minutes_per_partition": 60,
			"cold_data_max_open_seconds": 300,
			"cold_data_max_idle_seconds": 60
		},
		"age": {
			"max_age_days": 0
		},
		"reaper": {
			"max_age_days": 1
		},
		"merge": {
			"enabled": true,
			"partition_duration_minutes": 60,
			"input_aggregation": 20000000000,
			"max_candidates": 20,
			"max_rows": 10000000,
			"max_partitions_per_candidate": 100,
			"min_age_mins": 1,
			"max_age_mins": 10080
		},
		"autoingest": {
			"enabled": false,
			"source": "",
			"pattern": "",
			"max_rows_per_partition": 12288000,
			"max_minutes_per_partition": 60,
			"max_active_partitions": 50,
			"input_aggregation": 1073741824,
			"dry_run": false
		},
		"sort_keys": [],
		"shard_key": null,
		"max_future_days": 0
	},
	"url": "https://my-domain.hydrolix.live/config/v1/orgs/0ffa6312-61ba-4620-8d57-96514a7f3859/projects/6b0692f9-c040-47b1-988a-582e57dd3631/tables/94dba0fa-24f6-4962-9190-e47ead444ec4"
}

The response contains the uuid of the created table. API endpoints for resources contained within a table (like transforms) take both the project uuid and the table uuid as path parameters, so store the table uuid for later requests.
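
A minimal sketch of the create table request with cURL, capturing the table uuid for later use. The tables collection path follows the shape of the url field in the response above. The Authorization: Bearer header, the function names, and the $HDX_ORG and $HDX_PROJECT variables (org and project uuids) are assumptions for illustration. $HDX_TOKEN is the token exported by the login command earlier in this section.

```shell
# Hypothetical helper: build the tables collection URL for a project.
# $1 = hostname, $2 = org uuid, $3 = project uuid.
hdx_tables_url() {
  printf 'https://%s/config/v1/orgs/%s/projects/%s/tables/' "$1" "$2" "$3"
}

# Hypothetical helper: create a table and print the uuid from the response.
# $1 = table name, $2 = description.
# Requires curl and jq, plus HDX_HOSTNAME, HDX_ORG, HDX_PROJECT, and HDX_TOKEN.
hdx_create_table() {
  curl -s -X POST "$(hdx_tables_url "$HDX_HOSTNAME" "$HDX_ORG" "$HDX_PROJECT")" \
    -H "Authorization: Bearer $HDX_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(jq -n --arg name "$1" --arg description "$2" \
          '{name: $name, description: $description}')" |
    jq -r '.uuid'
}
```

For example, export HDX_TABLE=$(hdx_create_table http_logs "web logs") would store the new table's uuid in an environment variable.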

Settings

The settings object specifies table-level configuration, describing default options at data storage and query time such as default query options, rate limits, shard keys, and other behaviors.

| Property | Type | Purpose | Default | Required |
|---|---|---|---|---|
| default_query_options | object | Configure the default query option settings for queries against data in this table. See Advanced Query Options for descriptions of each option. | See Advanced Query Options for default values. | No |
| rate_limit | object | Limits bytes per second ingest rate and max payload size. See Rate Limits. | no limit | No |
| summary | object | Set this option if you want to create a summary table. | null | No |
| stream | object | Set this option to configure stream ingest options for the table. | none | No |
| age | object | Use this setting to configure a TTL after which data will be deactivated. | See Data Lifecycle Management for default values. | No |
| reaper | object | Use this setting to configure a TTL after which data will be deleted. | See Data Lifecycle Management for default values. | No |
| merge | object | Enable/disable merge and configure the merge pools. You can read more in the Merge Pools documentation. | "enabled": true. All other nested options default to null. | No |
| autoingest | array[object] | Enable and configure a continuous, batch ingest task for this table. More information on the Batch Ingest page. | "enabled": false | No |
| sort_keys | array[string] | Change the sort order of data as it's ingested and stored. You can read more about this setting in the Table Settings Reference. | Defaults to null. By default, Hydrolix sorts columns according to cardinality. | No |
| shard_key | string | Shard based on a specified key rather than by the default sharding mechanism (time-based). You can read more about this setting in the Table Settings Reference. | Defaults to null, which results in time-based sharding. | No |
| max_future_days | integer | Retain rows with a timestamp less than this configured value of unit days. Read more in the Table Settings Reference. | 0 | No |
| max_request_bytes | integer | Maximum allowed request size in bytes as measured by the content length of the request. | 0, which defaults to no configured maximum | No |
| storage_map | object | Assigns a default storage bucket to a table. You can read more in the Table Settings Reference. | turbine | No |
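
As an illustration of how these settings might be supplied at creation time, the sketch below prints a create table request body that overrides two of them. The reaper.max_age_days and merge.enabled keys come from the example response earlier on this page. The function name and its positional arguments are hypothetical, and the values are not JSON-escaped, so this only suits simple table names.

```shell
# Hypothetical helper: print a create table request body with a few settings.
# $1 = table name, $2 = reaper TTL in days, $3 = merge enabled (true or false).
# Values are interpolated as-is, without JSON escaping.
hdx_table_body() {
  cat <<EOF
{
  "name": "$1",
  "settings": {
    "reaper": { "max_age_days": $2 },
    "merge": { "enabled": $3 }
  }
}
EOF
}

# Example: a table whose data is deleted after 30 days, with merge enabled.
hdx_table_body http_logs 30 true
```

The printed body could then be passed to cURL with -d in the create table request. Any setting left out of the body should fall back to the defaults listed in the table above.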
