Projects & Tables
Hydrolix stores data within tables. You can group tables together in logical namespaces called projects. To reference your data, use the full path project.table, e.g. monitoring.http_logs.
Projects
Projects are equivalent to databases in a traditional RDBMS. You can create any number of projects, as long as each name is unique. We recommend picking short lowercase names. When you must use a longer name, break it up logically with underscores to improve readability and reduce the likelihood of typos.
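The naming recommendation above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper named is_recommended_name and a pattern chosen for illustration (lowercase words separated by single underscores). It reflects the recommendation in this section, not the set of names Hydrolix actually accepts.

```shell
# Hypothetical helper: succeeds when a name follows the recommendation above
# (lowercase, words separated by underscores). This is an illustrative
# convention check, not Hydrolix's own validation rule.
is_recommended_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z][a-z0-9]*(_[a-z0-9]+)*$'
}

# Usage: report on a few candidate project names.
for name in monitoring stock_trading "Systems Monitoring"; do
  if is_recommended_name "$name"; then
    echo "$name: ok"
  else
    echo "$name: consider renaming"
  fi
done
```

A display name such as "Systems Monitoring" fails the check, while monitoring and stock_trading pass.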
For example, an organization could contain three different projects:
- "Systems Monitoring"
- "Stock Trading"
- "IOT"
These projects could contain three independent sets of unrelated data, and all of them can coexist in the same deployment.
You can manage projects via the API or the Portal UI.
Creating a Project via API
You must authenticate to use the API.
- Log in with your username/password.
- Create a project, providing a name and description.
The following snippets show an example request body sent to the create project API, followed by the response Hydrolix returns after it successfully creates the project:
{
  "name": "monitoring",
  "description": "Global monitoring of web services"
}
{
  "uuid": "dfadb1a9-c2ec-4e3e-aab6-1117c5532843",
  "name": "monitoring",
  "description": "Global monitoring of web services",
  ...
}
The response contains the uuid of the created project. To reference resources contained within a project, like tables and transforms, include the project uuid as a path parameter in requests to those API endpoints.
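The create request can be sketched with cURL. This is a minimal sketch under stated assumptions: the environment variables $HDX_HOSTNAME, $HDX_ORG (your organization uuid), and $HDX_TOKEN (a bearer token from the login call) are set, and the projects path follows the same layout as the resource URLs the API returns. The helper names projects_url, project_payload, and create_project are hypothetical; confirm the endpoint against the API reference before relying on it.

```shell
# Assumed environment variables: HDX_HOSTNAME, HDX_ORG (organization uuid),
# and HDX_TOKEN (bearer token from the login call).

# Build the projects collection URL. The path layout is inferred from the
# resource URLs the API returns; it is an assumption, not a documented value.
projects_url() {
  printf 'https://%s/config/v1/orgs/%s/projects/' "$HDX_HOSTNAME" "$HDX_ORG"
}

# Build the JSON request body from a name and a description.
# Note: this simple formatter does not escape quotes inside the values.
project_payload() {
  printf '{"name": "%s", "description": "%s"}' "$1" "$2"
}

# Send the create request and print the JSON response.
create_project() {
  curl -s -X POST "$(projects_url)" \
    -H "Authorization: Bearer $HDX_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(project_payload "$1" "$2")"
}

# Usage (performs a network call, so it is left commented out):
# create_project monitoring "Global monitoring of web services"
```

Keeping the URL and payload builders separate from the network call makes it easy to print and inspect exactly what would be sent before sending it.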
Tables
A table, together with its associated write transforms, represents your data set. Hydrolix stores it as a compressed, sorted, two-dimensional data structure in a number of .hdx files in cloud storage (AWS/GCP). You reference a table through its project, and a project can contain many tables.
The Table API endpoint allows you to define a name for your data, along with:
- controls for stream ingest: hot/cold data parameters
- controls for auto-ingest: patterns and queues to read for notifications
- a switch to enable or disable background merge, which optimizes your data storage
- TTL and removal of old data
You have full control over how data is ingested into a table. If you choose not to modify settings, such as those for streaming ingest, sensible defaults apply.
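One way to see which defaults were applied is to read them out of a table's JSON representation with jq. The following is a minimal sketch: the helper name show_table_defaults is hypothetical, and the field names are taken from the example create table response on this page.

```shell
# Hypothetical helper: print a few settings from a table JSON document
# read on standard input. Requires jq.
show_table_defaults() {
  jq -r '.settings
    | "merge enabled: \(.merge.enabled)",
      "hot data max age (minutes): \(.stream.hot_data_max_age_minutes)",
      "cold data max age (days): \(.stream.cold_data_max_age_days)"'
}

# Usage with a trimmed copy of a table response (left commented out):
# printf '%s' '{"settings": {"stream": {"hot_data_max_age_minutes": 3, "cold_data_max_age_days": 3650}, "merge": {"enabled": true}}}' | show_table_defaults
```

The same filter works on a saved response file by redirecting it to the function's standard input.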
Before you can ingest data you will need to define a transform (a write schema) for a table, describing the data types to use.
Advanced concept: Tables are flexible by design!
One table may have multiple ingest transforms, essentially expanding the column width of a table.
- multiple transforms on a single table must share the same datetime column
- the resulting table column width is the union of the columns from all transforms
- ideal for very closely associated data sets arriving from different ingest methods
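The union rule above can be illustrated with plain column lists. This is a minimal sketch, assuming two hypothetical transforms (one for syslog data, one for metrics) whose column names are invented for illustration. It only models the resulting column set and does not call Hydrolix.

```shell
# Column lists for two hypothetical transforms on the same table.
# Both include the shared datetime column, here called "timestamp".
syslog_columns() { printf '%s\n' timestamp hostname severity message; }
metrics_columns() { printf '%s\n' timestamp hostname region cpu_load; }

# The table's columns are the union of both lists:
# one name per line, duplicates removed.
table_columns() {
  { syslog_columns; metrics_columns; } | LC_ALL=C sort -u
}

table_columns
```

Each transform contributes four columns, and because timestamp and hostname are shared, the resulting table has six columns rather than eight.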
The sample project used in tutorials includes a variety of tables. Each table has several columns. For example, the table metrics has columns timestamp, hostname, region, and so on.
You can manage tables via the REST API or the Web UI.
Create a Table via API
You will need to be authenticated to use the API.
- Log in with your username/password. Here's how to do it with cURL:
Get the bearer token, which is good for the next 24 hours, to authenticate future API calls. This command assumes you've set the $HDX_HOSTNAME, $HDX_USER and $HDX_PASSWORD environment variables:
export HDX_TOKEN=$(
  curl -v -X POST -H "Content-Type: application/json" \
    "https://$HDX_HOSTNAME/config/v1/login/" \
    -d "{
      \"username\":\"$HDX_USER\",
      \"password\":\"$HDX_PASSWORD\"
    }" | jq -r ".auth_token.access_token"
)
- Create a table, providing a name and, optionally, ingest settings:
An example create table API request/response exchange:
{
  "name": "http_logs",
  "description": "web logs"
}
{
  "project": "6b0692f9-c040-47b1-988a-582e57dd3631",
  "name": "http_logs",
  "description": "web logs",
  "uuid": "94dba0fa-24f6-4962-9190-e47ead444ec4",
  "created": "2022-05-31T03:48:55.172580Z",
  "modified": "2022-05-31T03:48:55.172599Z",
  "settings": {
    "stream": {
      "hot_data_max_age_minutes": 3,
      "hot_data_max_active_partitions": 3,
      "hot_data_max_rows_per_partition": 12288000,
      "hot_data_max_minutes_per_partition": 1,
      "hot_data_max_open_seconds": 60,
      "hot_data_max_idle_seconds": 30,
      "cold_data_max_age_days": 3650,
      "cold_data_max_active_partitions": 50,
      "cold_data_max_rows_per_partition": 12288000,
      "cold_data_max_minutes_per_partition": 60,
      "cold_data_max_open_seconds": 300,
      "cold_data_max_idle_seconds": 60
    },
    "age": {
      "max_age_days": 0
    },
    "reaper": {
      "max_age_days": 1
    },
    "merge": {
      "enabled": true,
      "partition_duration_minutes": 60,
      "input_aggregation": 20000000000,
      "max_candidates": 20,
      "max_rows": 10000000,
      "max_partitions_per_candidate": 100,
      "min_age_mins": 1,
      "max_age_mins": 10080
    },
    "autoingest": {
      "enabled": false,
      "source": "",
      "pattern": "",
      "max_rows_per_partition": 12288000,
      "max_minutes_per_partition": 60,
      "max_active_partitions": 50,
      "input_aggregation": 1073741824,
      "dry_run": false
    },
    "sort_keys": [],
    "shard_key": null,
    "max_future_days": 0
  },
  "url": "https://my-domain.hydrolix.live/config/v1/orgs/0ffa6312-61ba-4620-8d57-96514a7f3859/projects/6b0692f9-c040-47b1-988a-582e57dd3631/tables/94dba0fa-24f6-4962-9190-e47ead444ec4"
}
The response contains the uuid of the created table. All resources contained within a table, like transforms, are referenced by the project uuid and table uuid path parameters in their API endpoints, so store the table uuid for later requests.
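Creating the table and keeping its uuid can be sketched with cURL and jq. This is a minimal sketch under stated assumptions: $HDX_HOSTNAME, $HDX_ORG (organization uuid), $HDX_PROJECT (project uuid), and $HDX_TOKEN are set, and the tables path matches the layout of the url field in the response above. The helper names tables_url and create_table and the variable HDX_TABLE are hypothetical.

```shell
# Assumed environment variables: HDX_HOSTNAME, HDX_ORG (organization uuid),
# HDX_PROJECT (project uuid), and HDX_TOKEN (bearer token from the login call).

# Build the tables collection URL for the project. The path layout is
# inferred from the "url" field in the example response; it is an assumption.
tables_url() {
  printf 'https://%s/config/v1/orgs/%s/projects/%s/tables/' \
    "$HDX_HOSTNAME" "$HDX_ORG" "$HDX_PROJECT"
}

# Send the create request with a name and description, and print the
# JSON response. Values containing double quotes are not escaped here.
create_table() {
  curl -s -X POST "$(tables_url)" \
    -H "Authorization: Bearer $HDX_TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"name\": \"$1\", \"description\": \"$2\"}"
}

# Usage (performs a network call, so it is left commented out).
# The uuid is read from the top-level "uuid" field of the response:
# export HDX_TABLE=$(create_table http_logs "web logs" | jq -r ".uuid")
```

Exporting the uuid into an environment variable keeps it available for later requests that address resources inside the table.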