Projects & Tables
Hydrolix stores data within tables. You can group tables together in logical namespaces called projects. To reference your data, use the full path project.table, e.g. monitoring.http_logs.
Projects
Projects are equivalent to databases in a traditional RDBMS. You can create any number of projects, as long as each name is unique. We recommend picking short lowercase names. When you must use a longer name, break it up logically with underscores to improve readability and reduce the likelihood of typos.
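As a rough illustration of the naming advice above, the sketch below checks a candidate name before you send it to the API. The pattern and the length cap are assumptions chosen for this example; they encode the recommendation, not a rule that Hydrolix is known to enforce.

```python
import re

# Hypothetical convention check (not a Hydrolix-enforced rule):
# lowercase letters, digits, and underscores, starting with a letter.
_NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")

# Arbitrary cap used only to illustrate "short" names.
MAX_NAME_LENGTH = 32


def is_recommended_name(name: str) -> bool:
    """Return True if `name` follows the short, lowercase naming advice."""
    return len(name) <= MAX_NAME_LENGTH and _NAME_PATTERN.fullmatch(name) is not None


print(is_recommended_name("monitoring"))          # True
print(is_recommended_name("stock_trading"))       # True
print(is_recommended_name("Systems Monitoring"))  # False
```

Under this convention, the "Systems Monitoring" project from the example below might be named systems_monitoring.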
For example, an organization could contain three different projects:
- "Systems Monitoring"
- "Stock Trading"
- "IOT"
These projects could contain three independent sets of unrelated data, and they can all coexist in the same deployment.
You can manage projects via the API or the Portal UI.
Creating a Project via API
You must authenticate to use the API.
- Log in with your username/password.
- Create a project, providing a name and description.
The following snippets show an example request to the create project API, followed by the response Hydrolix returns after it successfully creates the project:
{
"name": "monitoring",
"description": "Global monitoring of web services"
}
{
"uuid": "dfadb1a9-c2ec-4e3e-aab6-1117c5532843",
"name": "monitoring",
"description": "Global monitoring of web services",
...
}
The response contains the uuid of the created project. To reference resources contained within a project, such as tables and transforms, include the project uuid as a path parameter in requests to those API endpoints.
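The exchange above could be scripted roughly as follows. This is a minimal sketch under several assumptions: the host and org uuid are copied from the example table URL on this page, the /projects/ path is inferred from that URL's structure, and the bearer token header is assumed rather than taken from the API reference. The function only builds the request; it does not send it.

```python
import json
import urllib.request

# Assumptions for illustration: host and org uuid are placeholders taken from
# the example URL on this page; the endpoint path is inferred, not confirmed.
BASE_URL = "https://my-domain.hydrolix.live/config/v1"
ORG_ID = "0ffa6312-61ba-4620-8d57-96514a7f3859"


def build_create_project_request(token: str, name: str, description: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that would create a project."""
    body = json.dumps({"name": name, "description": description}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/orgs/{ORG_ID}/projects/",
        data=body,
        method="POST",
        headers={
            # Assumed auth scheme: a token obtained when you log in.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def project_uuid(response_body: str) -> str:
    """Extract the project uuid from a create-project response body."""
    return json.loads(response_body)["uuid"]
```

To send the request, you could pass it to urllib.request.urlopen and feed the decoded response body to project_uuid, then keep the result for later calls that need the project uuid path parameter.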
Tables
A table, together with its associated write transforms, represents your data set. Hydrolix stores it as a compressed, sorted, two-dimensional data structure in a number of .hdx files in cloud storage (AWS/GCP). You reference a table through its project, and a project can contain many tables.
The Table API endpoint allows you to define a name for your data, along with:
- controls for stream ingest - hot/cold data parameters
- controls for auto-ingest - patterns and queues to read for notifications
- controls to enable or disable background merge, which optimizes your data storage
- TTL and removal of old data
You have full control over how data is ingested into a table. For any setting you choose not to modify, such as those for streaming ingest, Hydrolix applies sane defaults.
Before you can ingest data, you must define a transform (a write schema) for the table that describes the data types to use.
Advanced concept: Tables are flexible by design!
One table may have multiple ingest transforms, which effectively expands the column width of the table.
- multiple transforms on a single table must share the same datetime column
- the resulting table column width is the union of the columns in all transforms
- ideal for very closely associated data sets arriving from different ingest methods
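To make the union rule concrete, here is a small sketch that models each transform as a plain mapping of column name to type label. The transform contents and type labels are hypothetical and heavily simplified; real transforms carry more detail.

```python
from typing import Dict, List

# Hypothetical, simplified stand-in for a transform: column name -> type label.
Transform = Dict[str, str]


def table_columns(transforms: List[Transform], datetime_column: str) -> List[str]:
    """Return the union of all transform columns, in first-seen order.

    Raises ValueError if any transform lacks the shared datetime column.
    """
    columns: List[str] = []
    for transform in transforms:
        if datetime_column not in transform:
            raise ValueError(f"every transform must define {datetime_column!r}")
        for name in transform:
            if name not in columns:
                columns.append(name)
    return columns


# Two made-up transforms feeding the same table from different ingest methods.
stream_transform = {"timestamp": "datetime", "hostname": "string", "status": "uint32"}
batch_transform = {"timestamp": "datetime", "hostname": "string", "region": "string"}

print(table_columns([stream_transform, batch_transform], "timestamp"))
# ['timestamp', 'hostname', 'status', 'region']
```

Both transforms share timestamp, so the resulting table is four columns wide even though each transform defines only three.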
The sample project used in tutorials includes a variety of tables. Each table has several columns. The table metrics has columns timestamp, hostname, region, etc.
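For instance, a query against that table would address it by its full project.table path. The helper below is a hypothetical convenience that only builds the query text; it assumes the column names listed above and a plain SELECT with a LIMIT clause.

```python
from typing import List


def select_columns(project: str, table: str, columns: List[str], limit: int = 10) -> str:
    """Build a simple SELECT that references a table by its project.table path."""
    column_list = ", ".join(columns)
    return f"SELECT {column_list} FROM {project}.{table} LIMIT {limit}"


query = select_columns("sample", "metrics", ["timestamp", "hostname", "region"])
print(query)
# SELECT timestamp, hostname, region FROM sample.metrics LIMIT 10
```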
You can manage tables via the REST API or the Web UI.
Create a Table via API
You must authenticate to use the API.
- Log in with your username/password.
- Create a table, providing a name and, optionally, ingest settings.
An example create table API request/response exchange:
{
"name": "http_logs",
"description": "web logs"
}
{
"project": "6b0692f9-c040-47b1-988a-582e57dd3631",
"name": "http_logs",
"description": "web logs",
"uuid": "94dba0fa-24f6-4962-9190-e47ead444ec4",
"created": "2022-05-31T03:48:55.172580Z",
"modified": "2022-05-31T03:48:55.172599Z",
"settings": {
"stream": {
"hot_data_max_age_minutes": 3,
"hot_data_max_active_partitions": 3,
"hot_data_max_rows_per_partition": 12288000,
"hot_data_max_minutes_per_partition": 1,
"hot_data_max_open_seconds": 60,
"hot_data_max_idle_seconds": 30,
"cold_data_max_age_days": 365,
"cold_data_max_active_partitions": 50,
"cold_data_max_rows_per_partition": 12288000,
"cold_data_max_minutes_per_partition": 60,
"cold_data_max_open_seconds": 300,
"cold_data_max_idle_seconds": 60
},
"age": {
"max_age_days": 0
},
"reaper": {
"max_age_days": 1
},
"merge": {
"enabled": true,
"partition_duration_minutes": 60,
"input_aggregation": 20000000000,
"max_candidates": 20,
"max_rows": 10000000,
"max_partitions_per_candidate": 100,
"min_age_mins": 1,
"max_age_mins": 10080
},
"autoingest": {
"enabled": false,
"source": "",
"pattern": "",
"max_rows_per_partition": 12288000,
"max_minutes_per_partition": 60,
"max_active_partitions": 50,
"input_aggregation": 1073741824,
"dry_run": false
},
"sort_keys": [],
"shard_key": null,
"max_future_days": 0
},
"url": "https://my-domain.hydrolix.live/config/v1/orgs/0ffa6312-61ba-4620-8d57-96514a7f3859/projects/6b0692f9-c040-47b1-988a-582e57dd3631/tables/94dba0fa-24f6-4962-9190-e47ead444ec4"
}
The response contains the uuid of the created table. API endpoints for resources contained within a table, such as transforms, take both the project uuid and the table uuid as path parameters, so store the table uuid for later requests.
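A script might handle this exchange as sketched below. As before, this rests on assumptions: the host and org uuid come from the example URL in the response above, the bearer token header is assumed, passing optional overrides under a "settings" key mirrors the response shape but is not confirmed for requests, and the /transforms/ path is a guess based on the URL structure. Nothing here sends a request.

```python
import json
import urllib.request
from typing import Optional, Tuple

# Placeholders copied from the example response URL; adjust for your deployment.
BASE_URL = "https://my-domain.hydrolix.live/config/v1"
ORG_ID = "0ffa6312-61ba-4620-8d57-96514a7f3859"


def build_create_table_request(
    token: str,
    project_uuid: str,
    name: str,
    description: str = "",
    settings: Optional[dict] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request that would create a table."""
    payload = {"name": name, "description": description}
    if settings is not None:
        # Assumption: optional ingest settings use the same "settings" key
        # that appears in the response.
        payload["settings"] = settings
    return urllib.request.Request(
        url=f"{BASE_URL}/orgs/{ORG_ID}/projects/{project_uuid}/tables/",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


def table_ids(response_body: str) -> Tuple[str, str]:
    """Return (project uuid, table uuid) from a create-table response body."""
    data = json.loads(response_body)
    return data["project"], data["uuid"]


def transforms_url(project_uuid: str, table_uuid: str) -> str:
    """Build the (assumed) transforms endpoint for a given table."""
    return f"{BASE_URL}/orgs/{ORG_ID}/projects/{project_uuid}/tables/{table_uuid}/transforms/"
```

Keeping the two uuids returned by table_ids lets later calls, such as those that define a transform, fill in both path parameters without another lookup.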