Storage Settings Reference
Storage settings tell your cluster where to store your data. You can also tune performance, spread data across locations, and set the default location of your data.
Performance Mode
The io_perf_mode storage setting tunes the parallelism and striping behavior when reading data from that particular storage. It accepts one of three values, each of which has a keyword form and an integer form:
Keyword | Integer Value |
---|---|
aggressive | 0 |
moderate | 1 |
relaxed | 2 |
If you set io_perf_mode to a keyword rather than an integer, the field is updated to the corresponding integer value.
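The keyword-to-integer conversion in the table above can be sketched in a few lines of Python. This is only an illustration: normalize_io_perf_mode is a hypothetical helper name, not part of any Hydrolix client, and the error handling is an assumption about how you might want to treat unknown values.

```python
# Keyword-to-integer table copied from the reference above.
IO_PERF_MODES = {"aggressive": 0, "moderate": 1, "relaxed": 2}


def normalize_io_perf_mode(value):
    """Return the integer form of an io_perf_mode value.

    Accepts either a keyword or an integer. Hypothetical helper for
    illustration only; rejecting unknown values is an assumption.
    """
    if isinstance(value, str) and value in IO_PERF_MODES:
        return IO_PERF_MODES[value]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        if value in IO_PERF_MODES.values():
            return value
    raise ValueError(f"unknown io_perf_mode: {value!r}")


print(normalize_io_perf_mode("moderate"))  # 1
print(normalize_io_perf_mode(2))           # 2
```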
The default is aggressive, which issues more read requests to storage, each covering a smaller byte range. If a cloud vendor with stringent IOPS limits is throttling your requests, set io_perf_mode to moderate or even relaxed, which issue fewer but larger requests. The example below sets it to moderate.
"storage" : {
  "name": "hdx_primary",
  "description": "The initial bucket that comes up with the hdx cluster",
  "org": "<example org>",
  "settings": {
    "bucket_name": "<example name>",
    "bucket_path": "/",
    "region": "<example region>",
    "cloud": "<vendor>",
    "endpoint": "<endpoint URL>",
    "io_perf_mode": "moderate"
  }
}
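To see why larger byte ranges ease pressure on request-rate limits, the arithmetic can be sketched as follows. The object size and range sizes here are made-up numbers chosen for illustration; this reference does not state the byte ranges Hydrolix actually uses for each mode, and count_range_requests is a hypothetical helper.

```python
def count_range_requests(total_bytes, range_bytes):
    """Number of byte-range reads needed to cover total_bytes."""
    if total_bytes < 0 or range_bytes <= 0:
        raise ValueError("sizes must be non-negative and range_bytes positive")
    # Integer ceiling division: a partial final range still costs a request.
    return (total_bytes + range_bytes - 1) // range_bytes


MIB = 1024 * 1024
total = 64 * MIB  # hypothetical amount of data to read

# Hypothetical range sizes; not the values used by any io_perf_mode.
small = count_range_requests(total, 1 * MIB)
large = count_range_requests(total, 8 * MIB)

print(f"1 MiB ranges: {small} requests")  # 64 requests
print(f"8 MiB ranges: {large} requests")  # 8 requests
```

Reading the same data with ranges eight times larger takes one eighth as many requests, which is the direction moderate and relaxed move in.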
Store Data Across Multiple Buckets
Hydrolix supports storing data in multiple different storage types. You can mix storage provided by different cloud providers, storage with different configurations within the same provider, or both.
Use Cases
There are many reasons why you might use multiple storage buckets in your Hydrolix cluster. Some applications separate data by region. Others segment data by customer. In some cases, certain data requires additional layers of security. Hydrolix supports a wide variety of use cases.
Storage Mapping
You can use storage mapping to shard data stored in a single table across multiple storage buckets. Hydrolix stores data based on literal values within the mapping column. You can provide a list of values that maps to each storage bucket for the given column.
The following example shards data using the column named "US State". This table sorts data into buckets according to the following rules:
- rows where the "US State" column contains values "New York" or "Colorado" map to "8bc2f07d-cdfc-storage-2"
- rows where the "US State" column contains values "Oregon" or "New Hampshire" map to "8bc2f07d-cdfc-storage-3"
- all other rows map to the default storage bucket, "8bc2f07d-cdfc-storage-1"
"my_table": {
  "name": "<table_name>",
  ...
  "storage_map": {
    "default_storage_id": "8bc2f07d-cdfc-storage-1",
    "column_name": "US State",
    "column_value_mapping": {
      "8bc2f07d-cdfc-storage-2": [ "New York", "Colorado" ],
      "8bc2f07d-cdfc-storage-3": [ "Oregon", "New Hampshire" ]
    }
  }
}
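The routing rules above can be expressed as a small lookup. The following Python sketch mirrors the storage_map from the example; resolve_storage_id is a hypothetical function written for illustration, and it assumes exact, case-sensitive matching on the column value, which is one reading of "literal values".

```python
# storage_map copied from the example above.
storage_map = {
    "default_storage_id": "8bc2f07d-cdfc-storage-1",
    "column_name": "US State",
    "column_value_mapping": {
        "8bc2f07d-cdfc-storage-2": ["New York", "Colorado"],
        "8bc2f07d-cdfc-storage-3": ["Oregon", "New Hampshire"],
    },
}


def resolve_storage_id(row, storage_map):
    """Return the storage ID a row maps to under storage_map.

    Hypothetical helper: compares the row's value in the mapping column
    against each bucket's list using exact equality (an assumption).
    """
    value = row.get(storage_map["column_name"])
    for storage_id, values in storage_map["column_value_mapping"].items():
        if value in values:
            return storage_id
    # No list contains the value, or the column is missing from the row.
    return storage_map["default_storage_id"]


print(resolve_storage_id({"US State": "Colorado"}, storage_map))  # 8bc2f07d-cdfc-storage-2
print(resolve_storage_id({"US State": "Oregon"}, storage_map))    # 8bc2f07d-cdfc-storage-3
print(resolve_storage_id({"US State": "Texas"}, storage_map))     # 8bc2f07d-cdfc-storage-1
```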
You can also configure storage mappings in the Hydrolix Portal UI:
- Log into the UI, hosted at https://<hostname>.
- Navigate to the Data view by clicking Data in the left sidebar.
- Click on the name of the project that contains the table that you want to configure.
- Click on the name of the table that you want to configure.
- Under Advanced Options, click the three-dot menu to the right of bucket settings.
- In the dropdown, click Edit.
- Select the column you would like to use to sort data into separate buckets.
- In the right sidebar, click Add mapping to configure a storage mapping.
- In the Storage ID dropdown, select the ID of the storage where you'd like to store a subset of data.
- In the Values text entry box, enter the values you would like to map to the storage bucket you just selected. Press space after entering a value to persist it to the list of mapping values.
- Configure additional mappings by clicking Add mapping and repeating the previous three steps.
- Click Save changes to persist your storage mapping settings for the table.
Tradeoffs
Separating data into multiple storage buckets can impact system performance depending on your query patterns and resources.
Set Default Storage
You can configure a default storage bucket for any table in your Hydrolix cluster. When you assign a default storage bucket for a table, new data ingested into that table automatically uses that storage bucket, unless specified otherwise.
Set Default Storage for a Cluster
You can also designate one bucket as the default for every table that does not specify its own default bucket. Set settings.is_default to true in your storage configuration to assign a default storage bucket in a Hydrolix cluster:
"storage" : {
  "name": "hdx_primary",
  "description": "The initial bucket that comes up with the hdx cluster",
  "org": "<example org>",
  "settings": {
    "bucket_name": "<example name>",
    "bucket_path": "/",
    "region": "<example region>",
    "cloud": "<vendor>",
    "endpoint": "<endpoint URL>",
    "is_default": true
  }
}
If you do not specify a default storage bucket, but your cluster only contains a single storage bucket, Hydrolix automatically assigns that single bucket as the default storage bucket.
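The two rules above (an explicit is_default flag, or a lone bucket) can be sketched as a selection function. pick_cluster_default and the bucket name hdx_archive are hypothetical, introduced only for this illustration. The reference does not say what happens when several buckets are flagged or when none is flagged among many, so the sketch's behavior in those cases is an assumption.

```python
def pick_cluster_default(storages):
    """Return the storage entry that acts as the cluster default, or None.

    `storages` is a list of dicts shaped like the "storage" example above.
    Hypothetical helper; not a Hydrolix API.
    """
    flagged = [
        s for s in storages
        if s.get("settings", {}).get("is_default") is True
    ]
    if flagged:
        # Assumption: if several buckets are flagged, take the first.
        return flagged[0]
    if len(storages) == 1:
        # A single bucket becomes the default automatically.
        return storages[0]
    # Assumption: with several unflagged buckets there is no default.
    return None


primary = {"name": "hdx_primary", "settings": {"is_default": True}}
archive = {"name": "hdx_archive", "settings": {}}  # hypothetical bucket

print(pick_cluster_default([archive, primary])["name"])  # hdx_primary
print(pick_cluster_default([archive])["name"])           # hdx_archive
```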
Set Default Storage for a Table
The following snippet demonstrates how to assign a default storage bucket to a table:
"example_table": {
  "name": "<example_table_name>",
  ...
  "storage_map": {
    "default_storage_id": "<example_storage_id>"
  }
}
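Taken together, the table-level and cluster-level defaults form a fallback chain: a table's own default_storage_id applies if present, and otherwise the cluster default applies. A minimal Python sketch of that chain follows; effective_default_storage is a hypothetical name, and the storage IDs are placeholders in the style of the earlier mapping example.

```python
def effective_default_storage(table, cluster_default_id=None):
    """Return the storage ID that new data in `table` lands in by default.

    Hypothetical helper: prefers the table's storage_map.default_storage_id
    and falls back to the cluster-wide default otherwise.
    """
    storage_map = table.get("storage_map") or {}
    return storage_map.get("default_storage_id") or cluster_default_id


with_default = {
    "name": "example_table",
    "storage_map": {"default_storage_id": "8bc2f07d-cdfc-storage-2"},
}
without_default = {"name": "other_table"}

cluster_default = "8bc2f07d-cdfc-storage-1"
print(effective_default_storage(with_default, cluster_default))     # 8bc2f07d-cdfc-storage-2
print(effective_default_storage(without_default, cluster_default))  # 8bc2f07d-cdfc-storage-1
```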
You can also configure default table storage in the Hydrolix Portal UI:
- Log into the UI, hosted at https://<hostname>.
- Navigate to the Data view by clicking Data in the left sidebar.
- Click on the name of the project that contains the table that you want to configure.
- Click on the name of the table that you want to configure.
- Under Advanced Options, click the three-dot menu to the right of bucket settings.
- In the dropdown, click Edit.
- In the right sidebar, under Default Storage ID, select the ID of the storage you would like to use as the default storage.
- Click Save changes to persist your default storage setting for the table.
Per-Vendor Examples
Configure storage buckets in the storage field of your cluster configuration.
AWS
The following snippet demonstrates how to define a storage bucket using Amazon Simple Storage Service (S3):
"storage" : {
  "name": "example_amzs3",
  "description": "An Amazon S3 bucket.",
  "org": "<example org>",
  "settings": {
    "bucket_name": "<example name>",
    "bucket_path": "/",
    "region": "<example region>",
    "cloud": "aws",
    "endpoint": "<endpoint URL>"
  }
}
Use aws as the value of the "cloud" field to indicate that the storage bucket is hosted on S3.
Hydrolix storage bucket configuration supports the following AWS region labels:
us-east-1
us-east-2
us-west-1
us-west-2
af-south-1
ap-east-1
ap-south-1
ap-northeast-1
ap-northeast-2
ap-northeast-3
ap-southeast-1
ap-southeast-2
ca-central-1
eu-central-1
eu-west-1
eu-west-2
eu-west-3
eu-north-1
me-south-1
sa-east-1
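A configuration can be checked against the list above before it is applied. The sketch below is a hypothetical pre-flight check, not something Hydrolix provides: check_aws_settings is an invented name, the region set is copied from the list above, and the bucket name is a made-up placeholder. The same shape would work for the other vendors by swapping in their "cloud" value and region list.

```python
# Region labels copied from the AWS list above.
AWS_REGIONS = frozenset({
    "us-east-1", "us-east-2", "us-west-1", "us-west-2",
    "af-south-1", "ap-east-1", "ap-south-1",
    "ap-northeast-1", "ap-northeast-2", "ap-northeast-3",
    "ap-southeast-1", "ap-southeast-2", "ca-central-1",
    "eu-central-1", "eu-west-1", "eu-west-2", "eu-west-3",
    "eu-north-1", "me-south-1", "sa-east-1",
})


def check_aws_settings(settings):
    """Return a list of problems found in an S3 "settings" object.

    Hypothetical helper; an empty list means both checks passed.
    """
    problems = []
    if settings.get("cloud") != "aws":
        problems.append('"cloud" must be "aws" for an S3 bucket')
    region = settings.get("region")
    if region not in AWS_REGIONS:
        problems.append(f"unsupported AWS region label: {region!r}")
    return problems


# "example-bucket" is a placeholder name for illustration.
good = {"bucket_name": "example-bucket", "cloud": "aws", "region": "eu-west-1"}
bad = {"bucket_name": "example-bucket", "cloud": "aws", "region": "eu-west-9"}

print(check_aws_settings(good))  # []
print(check_aws_settings(bad))
```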
GCS
The following snippet demonstrates how to define a storage bucket using Google Cloud Storage (GCS):
"storage" : {
  "name": "example_gcs",
  "description": "A Google Cloud Storage bucket.",
  "org": "<example org>",
  "settings": {
    "bucket_name": "<example name>",
    "bucket_path": "/",
    "region": "<example region>",
    "cloud": "gcp",
    "endpoint": "<endpoint URL>"
  }
}
Use gcp as the value of the "cloud" field to indicate that the storage bucket is hosted on GCS.
Hydrolix storage bucket configuration supports the following Google Cloud Storage region labels:
asia-east1
asia-east2
asia-northeast1
asia-northeast2
asia-northeast3
asia-south1
asia-southeast1
asia-southeast2
australia-southeast1
europe-central2
europe-north1
europe-west1
europe-west2
europe-west3
europe-west4
europe-west6
northamerica-northeast1
southamerica-east1
us-central1
us-east1
us-east4
us-west1
us-west2
us-west3
us-west4
Linode
The following snippet demonstrates how to define a storage bucket using Linode Cloud Storage:
"storage" : {
  "name": "example_linode",
  "description": "A Linode Cloud Storage bucket.",
  "org": "<example org>",
  "settings": {
    "bucket_name": "<example name>",
    "bucket_path": "/",
    "region": "<example region>",
    "cloud": "linode",
    "endpoint": "<endpoint URL>"
  }
}
Use linode as the value of the "cloud" field to indicate that the storage bucket is hosted on Linode.
Hydrolix storage bucket configuration supports the following Linode region labels:
ap-west
ca-central
ap-southeast
us-central
us-west
us-southeast
us-east
eu-west
ap-south
eu-central
ap-northeast
Azure
The following snippet demonstrates how to define a storage bucket using Azure Storage:
"storage" : {
  "name": "example_azure",
  "description": "An Azure Storage bucket.",
  "org": "<example org>",
  "settings": {
    "bucket_name": "<example name>",
    "bucket_path": "/",
    "region": "<example region>",
    "cloud": "azure",
    "endpoint": "<endpoint URL>"
  }
}
Use azure as the value of the "cloud" field to indicate that the storage bucket is hosted on Azure.
Hydrolix storage bucket configuration supports the following Azure region labels:
eastus
eastus2
southcentralus
westus
westus2
centralus
northcentralus
canadacentral
canadaeast
brazilsouth
northeurope
westeurope
uksouth
ukwest
francecentral
francesouth
switzerlandnorth
switzerlandwest
germanywestcentral
germanynorth
norwaywest
norwayeast
eastasia
southeastasia
japaneast
japanwest
koreacentral
koreasouth
australiaeast
australiasoutheast
centralindia
southindia
westindia
uaenorth
uaecentral
southafricanorth
southafricawest