Projects & Tables

Hydrolix stores data within tables. You can group tables together in logical namespaces called projects. To reference your data, use the full path project.table, for example monitoring.http_logs.

Projects

Projects are equivalent to databases in a traditional RDBMS. You can create any number of projects, as long as each name is unique. We recommend short, lowercase names. When you must use a longer name, separate its words with underscores to improve readability and reduce the likelihood of typos.
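As a rough illustration of the naming advice and the project.table path, here is a minimal Python sketch. The regular expression and the 32-character limit are assumptions chosen for this example; they encode the recommendation above and are not rules that Hydrolix enforces.

```python
import re

# Assumption: this pattern encodes the naming recommendation (lowercase words
# joined by underscores). It is not a rule enforced by Hydrolix.
RECOMMENDED_NAME = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def follows_recommendation(name: str, max_length: int = 32) -> bool:
    """Return True if name is lowercase, underscore-separated, and short.

    max_length is a hypothetical cutoff picked for this sketch.
    """
    return len(name) <= max_length and RECOMMENDED_NAME.fullmatch(name) is not None


def full_path(project: str, table: str) -> str:
    """Build the project.table path used to reference data."""
    return f"{project}.{table}"


print(follows_recommendation("systems_monitoring"))  # True
print(follows_recommendation("Stock Trading"))       # False
print(full_path("monitoring", "http_logs"))          # monitoring.http_logs
```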

For example, an organization could contain three different projects:

  • "Systems Monitoring"
  • "Stock Trading"
  • "IOT"

Each project could hold an independent, unrelated set of data, and all three can coexist in the same deployment.

You can manage projects via the API or the Portal UI.

Creating a Project via API

You must authenticate to use the API.

  1. Log in with your username and password.
  2. Create a project, providing a name and description.
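The login step can be sketched in Python as follows. The sketch builds the request without sending it and uses the same login endpoint and auth_token.access_token response field as the cURL example later on this page. The host, username, password, and sample response body are placeholders.

```python
import json
import urllib.request


def login_request(host: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the login endpoint."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"https://{host}/config/v1/login/",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def bearer_token(login_response_body: str) -> str:
    """Pull the access token out of a login response body."""
    return json.loads(login_response_body)["auth_token"]["access_token"]


req = login_request("my-domain.hydrolix.live", "user@example.com", "secret")
print(req.get_method(), req.full_url)

# Hypothetical, trimmed response body used only to exercise the parser.
sample = '{"auth_token": {"access_token": "example-token"}}'
print(bearer_token(sample))
# To send the request for real: urllib.request.urlopen(req).read().decode()
```

Building the body with json.dumps means that quotes or backslashes in a password are escaped correctly.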

The following snippets show an example request to the create project API and the response Hydrolix returns after it creates the project:

Request:

{
  "name": "monitoring",
  "description": "Global monitoring of web services"
}

Response:

{
    "uuid": "dfadb1a9-c2ec-4e3e-aab6-1117c5532843",
    "name": "monitoring",
    "description": "Global monitoring of web services",
    ...
}

The response contains the uuid of the created project. To reference resources contained within a project, such as tables and transforms, include the project uuid as a path parameter in requests to their API endpoints.
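A minimal Python sketch of this flow, assuming the projects endpoint follows the same URL shape as the resource URLs that the API returns (/config/v1/orgs/{org}/projects/...). The exact collection paths and trailing slashes are assumptions, and the host, org uuid, and token are placeholders. The requests are built but not sent.

```python
import json
import urllib.request

# Assumed base path, modeled on the resource URLs that API responses contain.
BASE = "https://{host}/config/v1/orgs/{org_id}/projects/"


def create_project_request(host, org_id, token, name, description):
    """Build (but do not send) a POST request that creates a project."""
    body = json.dumps({"name": name, "description": description}).encode("utf-8")
    return urllib.request.Request(
        BASE.format(host=host, org_id=org_id),
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def tables_url(host, org_id, create_project_response: str) -> str:
    """Use the project uuid from the response as a path parameter."""
    project_uuid = json.loads(create_project_response)["uuid"]
    return BASE.format(host=host, org_id=org_id) + f"{project_uuid}/tables/"


HOST = "my-domain.hydrolix.live"
ORG = "0ffa6312-61ba-4620-8d57-96514a7f3859"

req = create_project_request(
    HOST, ORG, "example-token", "monitoring", "Global monitoring of web services"
)
print(req.get_method(), req.full_url)

# Trimmed copy of the example response, keeping only the fields used here.
response = '{"uuid": "dfadb1a9-c2ec-4e3e-aab6-1117c5532843", "name": "monitoring"}'
project_tables = tables_url(HOST, ORG, response)
print(project_tables)
```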

Project settings

The settings object holds project-level configuration, such as default options for queries against data stored in the project and ingest rate limits.

| Property | Type | Purpose | Default | Required |
|---|---|---|---|---|
| default_query_options | object | See Query Options Reference for descriptions of each option. | See Query Options Reference | No |
| blob | object | Don't use this API option. | null | No |
| rate_limit | object | Limits bytes per second ingest rate and max payload size. See Rate Limits. | no limit | No |

Tables

A table, together with its associated write transforms, represents your dataset. Hydrolix stores it as a compressed, sorted, two-dimensional data structure in a number of .hdx files in cloud storage (AWS/GCP). You reference a table through its project, and a project can contain many tables.

The Table API endpoint allows you to define a name for your data, along with:

  • controls for stream ingest: hot and cold data parameters
  • controls for autoingest: patterns and queues to read for notifications
  • a switch to enable or disable background merge, which optimizes data storage
  • TTL and removal of old data

You have full control over how data is ingested into a table. If you choose not to modify settings such as those for streaming ingest, Hydrolix applies sensible defaults.

Before you can ingest data, you need to define a transform (a write schema) for the table that describes the data types to use.

Advanced concept: Tables are flexible by design!

One table can have multiple ingest transforms, which effectively widens the table by adding columns.

  • multiple transforms on a single table must share the same datetime column
  • the resulting set of table columns is the union of the columns in all transforms
  • ideal for closely related datasets that arrive through different ingest methods
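The union rule can be pictured with a small Python sketch. The Transform class below is a hypothetical, simplified stand-in used only for illustration; it is not the Hydrolix transform schema, and the transform and column names are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transform:
    """Hypothetical, simplified stand-in for a write transform."""
    name: str
    datetime_column: str
    columns: frozenset


def table_columns(transforms):
    """Return the union of all transform columns, sorted by name.

    Raises ValueError if the transforms disagree on the datetime column.
    """
    datetime_columns = {t.datetime_column for t in transforms}
    if len(datetime_columns) > 1:
        raise ValueError(
            f"transforms must share one datetime column: {sorted(datetime_columns)}"
        )
    merged = set()
    for t in transforms:
        merged |= t.columns
    return sorted(merged)


stream = Transform("stream_json", "timestamp", frozenset({"timestamp", "hostname", "region"}))
batch = Transform("batch_csv", "timestamp", frozenset({"timestamp", "hostname", "status"}))
print(table_columns([stream, batch]))
# ['hostname', 'region', 'status', 'timestamp']
```

Columns that appear in only one transform (region and status here) still become part of the table.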

The sample project used in the tutorials includes a variety of tables, each with several columns. For example, the table metrics has the columns timestamp, hostname, region, and others.

You can manage tables via REST API or the Web UI.

Create a Table via API

You will need to be authenticated to use the API.

  • Log in with your username and password to get a bearer token. The token is valid for 24 hours and authenticates subsequent API calls. The following cURL command assumes you've set the $HDX_HOSTNAME, $HDX_USER, and $HDX_PASSWORD environment variables:

export HDX_TOKEN=$(
  curl -v -X POST -H "Content-Type: application/json" \
  https://$HDX_HOSTNAME/config/v1/login/ \
  -d "{
    \"username\":\"$HDX_USER\",
    \"password\":\"$HDX_PASSWORD\"  
  }" | jq -r ".auth_token.access_token"
)

  • Create a table, providing a name and, optionally, ingest settings:

An example create table API request/response exchange:

Request:

{
  "name": "http_logs",
  "description": "web logs"
}

Response:

{
    "project": "6b0692f9-c040-47b1-988a-582e57dd3631",
    "name": "http_logs",
    "description": "web logs",
    "uuid": "94dba0fa-24f6-4962-9190-e47ead444ec4",
    "created": "2022-05-31T03:48:55.172580Z",
    "modified": "2022-05-31T03:48:55.172599Z",
    "settings": {
        "stream": {
            "hot_data_max_age_minutes": 3,
            "hot_data_max_active_partitions": 3,
            "hot_data_max_rows_per_partition": 12288000,
            "hot_data_max_minutes_per_partition": 1,
            "hot_data_max_open_seconds": 60,
            "hot_data_max_idle_seconds": 30,
            "cold_data_max_age_days": 3650,
            "cold_data_max_active_partitions": 50,
            "cold_data_max_rows_per_partition": 12288000,
            "cold_data_max_minutes_per_partition": 60,
            "cold_data_max_open_seconds": 300,
            "cold_data_max_idle_seconds": 60
        },
        "age": {
            "max_age_days": 0
        },
        "reaper": {
            "max_age_days": 1
        },
        "merge": {
            "enabled": true,
            "partition_duration_minutes": 60,
            "input_aggregation": 20000000000,
            "max_candidates": 20,
            "max_rows": 10000000,
            "max_partitions_per_candidate": 100,
            "min_age_mins": 1,
            "max_age_mins": 10080
        },
        "autoingest": {
            "enabled": false,
            "source": "",
            "pattern": "",
            "max_rows_per_partition": 12288000,
            "max_minutes_per_partition": 60,
            "max_active_partitions": 50,
            "input_aggregation": 1073741824,
            "dry_run": false
        },
        "sort_keys": [],
        "shard_key": null,
        "max_future_days": 0
    },
    "url": "https://my-domain.hydrolix.live/config/v1/orgs/0ffa6312-61ba-4620-8d57-96514a7f3859/projects/6b0692f9-c040-47b1-988a-582e57dd3631/tables/94dba0fa-24f6-4962-9190-e47ead444ec4"
}

The response contains the uuid of the created table. API endpoints for resources contained within a table, such as transforms, take both the project uuid and the table uuid as path parameters, so store the table uuid for later requests.
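The create table exchange can be sketched in Python as below. Two things are assumptions: the collection URL, which is taken to be the url field of the response minus the table uuid, and the use of a top-level settings key in the request body to carry the optional ingest settings. The token is a placeholder, and the request is built but not sent.

```python
import json
import urllib.request


def create_table_request(host, org_id, project_uuid, token, name,
                         description="", settings=None):
    """Build (but do not send) a POST request that creates a table."""
    # Assumed collection URL: the "url" in the response minus the table uuid.
    url = f"https://{host}/config/v1/orgs/{org_id}/projects/{project_uuid}/tables/"
    payload = {"name": name, "description": description}
    if settings is not None:
        payload["settings"] = settings  # assumed key for optional ingest settings
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def table_ids(create_table_response: str):
    """Return (project uuid, table uuid) from a create table response."""
    table = json.loads(create_table_response)
    return table["project"], table["uuid"]


req = create_table_request(
    "my-domain.hydrolix.live",
    "0ffa6312-61ba-4620-8d57-96514a7f3859",
    "6b0692f9-c040-47b1-988a-582e57dd3631",
    "example-token",
    "http_logs",
    description="web logs",
    settings={"merge": {"enabled": False}},
)
print(req.full_url)

# Trimmed copy of the example response, keeping only the two identifiers.
sample = (
    '{"project": "6b0692f9-c040-47b1-988a-582e57dd3631",'
    ' "uuid": "94dba0fa-24f6-4962-9190-e47ead444ec4"}'
)
ids = table_ids(sample)
print(ids)
```

Keeping both identifiers together makes it straightforward to build paths for resources nested under the table later.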

Table settings

The settings object holds table-level configuration that applies at data storage and query time, such as default query options, rate limits, and shard keys.

| Property | Type | Purpose | Default | Required |
|---|---|---|---|---|
| default_query_options | object | Set query options for this table. See Query Options Precedence and Query Options Reference. | See Query Options Reference | No |
| rate_limit | object | Limits bytes per second ingest rate and max payload size. See Rate Limits. | no limit | No |
| summary | object | Set this option if you want to create a summary table. | null | No |
| stream | object | Set this option to configure stream ingest options for the table. | none | No |
| age | object | Use this setting to configure a TTL after which data will be deactivated. | See Data Lifecycle Management | No |
| reaper | object | Use this setting to configure a TTL after which data will be deleted. | See Data Lifecycle Management | No |
| merge | object | Enable/disable merge and configure the merge pools. See the Merge Pools documentation. | "enabled": true, all other nested options default to null | No |
| autoingest | array[object] | Enable and configure a continuous, batch ingest task for this table. See also Batch Ingest. | "enabled": false | No |
| sort_keys | array[string] | Change the sort order of data as it's ingested and stored. See also Table Settings Reference. | null, Hydrolix sorts columns according to cardinality. | No |
| shard_key | string | Shard based on a specified key rather than the default, time-based. See also Table Settings Reference. | null, results in time-based sharding | No |
| max_future_days | integer | Retain rows with a timestamp less than this configured value of unit days. See also Table Settings Reference. | 0 | No |
| max_request_bytes | integer | Maximum allowed request size in bytes as measured by the content length of the request. | 0, specifies no configured maximum | No |
| storage_map | object | Assigns a default storage bucket to a table. See also Table Settings Reference. | turbine | No |
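Because every setting is optional, a misspelled key is easy to miss. The following Python sketch is a client-side check against the property names in the table above. It does not reflect how the API itself validates requests, it ignores nested options, and the example values are invented.

```python
# Top-level table settings keys, copied from the properties table above.
TABLE_SETTINGS_KEYS = frozenset({
    "default_query_options", "rate_limit", "summary", "stream", "age",
    "reaper", "merge", "autoingest", "sort_keys", "shard_key",
    "max_future_days", "max_request_bytes", "storage_map",
})


def unknown_settings_keys(settings: dict) -> list:
    """Return the top-level keys that are not documented table settings."""
    return sorted(set(settings) - TABLE_SETTINGS_KEYS)


settings = {
    "merge": {"enabled": False},
    "max_future_days": 1,
    "shard_keys": "hostname",  # deliberate typo for "shard_key"
}
print(unknown_settings_keys(settings))  # ['shard_keys']
```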
