Storage Settings Reference

Storage settings tell your cluster where to store your data. You can also tune performance, spread data across locations, and set the default location of your data.

Performance Mode

The io_perf_mode storage setting tunes the parallelism and striping behavior when reading data from that particular storage. It has one of three settings:

Keyword       Integer Value
aggressive    0
moderate      1
relaxed       2

If the io_perf_mode value is a keyword (not an integer), the field will be updated to its integer value.
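
The keyword-to-integer conversion described above can be sketched in Python. This is an illustration only: the helper name normalize_io_perf_mode is hypothetical and not part of Hydrolix, and rejecting unknown values with an error is an assumption. The mapping itself comes from the table above.

```python
# Keyword-to-integer mapping taken from the table above.
IO_PERF_MODES = {"aggressive": 0, "moderate": 1, "relaxed": 2}


def normalize_io_perf_mode(value):
    """Return the integer form of an io_perf_mode value (hypothetical helper)."""
    # Integers pass through unchanged if they are one of the known values.
    if isinstance(value, int) and not isinstance(value, bool):
        if value in IO_PERF_MODES.values():
            return value
    # Keywords are replaced by their integer value.
    elif isinstance(value, str) and value in IO_PERF_MODES:
        return IO_PERF_MODES[value]
    # Assumption: anything else is treated as an error in this sketch.
    raise ValueError(f"unsupported io_perf_mode: {value!r}")
```

For example, normalize_io_perf_mode("moderate") returns 1, and normalize_io_perf_mode(2) returns 2 unchanged.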

The default is aggressive, which issues more read requests to storage, each covering a smaller byte range. If cloud vendors with stringent IOPS limits are throttling your requests, set it to moderate or even relaxed, which issue fewer but larger requests. The example below sets it to moderate.

"storage" : {
	"name": "hdx_primary",
	"description": "The initial bucket that comes up with the hdx cluster",
	"org": "<example org>",
	"settings": {
		"bucket_name": "<example name>",
		"bucket_path": "/",
		"region": "<example region>",
		"cloud": "<vendor>",
		"endpoint": "<endpoint URL>",
		"io_perf_mode": "moderate"
	}
}

Store Data Across Multiple Buckets

Hydrolix supports storing data in multiple different storage types. You can mix storage provided by different cloud providers, storage with different configurations within the same provider, or both.

Use Cases

There are many reasons why you might use multiple storage buckets in your Hydrolix cluster. Some applications separate data by region. Others segment data by customer. In some cases, certain data requires additional layers of security. Hydrolix supports a wide variety of use cases.

Storage Mapping

You can use storage mapping to shard data stored in a single table across multiple storage buckets. Hydrolix stores data based on literal values within the mapping column. You can provide a list of values that maps to each storage bucket for the given column.

The following example shards data using the column named "US State". This table sorts data into buckets according to the following rules:

  • rows where the "US State" column contains values "New York" or "Colorado" map to "8bc2f07d-cdfc-storage-2"
  • rows where the "US State" column contains values "Oregon" or "New Hampshire" map to "8bc2f07d-cdfc-storage-3"
  • all other rows map to the default storage bucket, "8bc2f07d-cdfc-storage-1"

"my_table": {
  "name": "<table_name>",
  ...
  "storage_map": {
    "default_storage_id": "8bc2f07d-cdfc-storage-1",
    "column_name": "US State",
    "column_value_mapping": {
      "8bc2f07d-cdfc-storage-2": [ "New York", "Colorado" ],
      "8bc2f07d-cdfc-storage-3": [ "Oregon", "New Hampshire" ]
    }
  }
}
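
The mapping rules above can be sketched in Python to show which storage bucket a given row would land in. This is a minimal illustration, not Hydrolix's implementation: the function name resolve_storage_id is hypothetical, and exact, case-sensitive matching of the literal column value is an assumption.

```python
# The storage_map from the example above, as a Python dictionary.
storage_map = {
    "default_storage_id": "8bc2f07d-cdfc-storage-1",
    "column_name": "US State",
    "column_value_mapping": {
        "8bc2f07d-cdfc-storage-2": ["New York", "Colorado"],
        "8bc2f07d-cdfc-storage-3": ["Oregon", "New Hampshire"],
    },
}


def resolve_storage_id(row, storage_map):
    """Return the storage ID a row maps to (hypothetical helper)."""
    # Read the value of the mapping column; None if the row lacks it.
    value = row.get(storage_map["column_name"])
    # Assumption: values are compared literally and case-sensitively.
    for storage_id, values in storage_map["column_value_mapping"].items():
        if value in values:
            return storage_id
    # Rows with no matching value fall back to the default storage.
    return storage_map["default_storage_id"]
```

With this sketch, a row whose "US State" is "Oregon" resolves to "8bc2f07d-cdfc-storage-3", while a row with any unlisted value resolves to the default, "8bc2f07d-cdfc-storage-1".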

You can also configure storage mappings in the Hydrolix Portal UI:

  1. Log into the UI, hosted at https://<hostname>.
  2. Navigate to the Data view by clicking Data in the left sidebar.
  3. Click on the name of the project that contains the table that you want to configure.
  4. Click on the name of the table that you want to configure.
  5. Under Advanced Options, click the three-dot menu to the right of bucket settings.
  6. In the dropdown, click Edit.
  7. Select the column you would like to use to sort data into separate buckets.
  8. In the right sidebar, click Add mapping to configure a storage mapping.
  9. In the Storage ID dropdown, select the ID of the storage where you'd like to store a subset of data.
  10. In the Values text entry box, enter the values you would like to map to the storage bucket you just selected. Press space after entering a value to persist it to the list of mapping values.
  11. Configure additional mappings by repeating the previous three steps.
  12. Click Save changes to persist your storage mapping settings for the table.

Tradeoffs

Separating data into multiple storage buckets can impact system performance depending on your query patterns and resources.

Set Default Storage

You can configure a default storage bucket for any table in your Hydrolix cluster. When you assign a default storage bucket for a table, new data ingested into that table automatically uses that storage bucket, unless specified otherwise.

Set Default Storage for a Cluster

You can also designate one bucket as the cluster-wide default, which Hydrolix uses for every table that does not specify its own default bucket. Set settings.is_default in your storage configuration to assign the default storage bucket for a Hydrolix cluster:

"storage" : {
	"name": "hdx_primary",
	"description": "The initial bucket that comes up with the hdx cluster",
	"org": "<example org>",
	"settings": {
		"bucket_name": "<example name>",
		"bucket_path": "/",
		"region": "<example region>",
		"cloud": "<vendor>",
		"endpoint": "<endpoint URL>",
		"is_default": true
	}
}

If you do not specify a default storage bucket, but your cluster only contains a single storage bucket, Hydrolix automatically assigns that single bucket as the default storage bucket.
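
The cluster default rule can be sketched in Python as follows. This is an illustration under stated assumptions, not Hydrolix's implementation: the function name cluster_default_storage is hypothetical, and the behavior when several buckets are flagged (the first one wins) or when no default can be determined (None is returned) is assumed, since the text above does not specify it.

```python
def cluster_default_storage(storages):
    """Return the name of the cluster default storage (hypothetical helper).

    `storages` is a list of storage definitions shaped like the example above.
    """
    # Buckets that set settings.is_default explicitly.
    flagged = [s for s in storages if s.get("settings", {}).get("is_default")]
    if flagged:
        # Assumption: if more than one bucket is flagged, take the first.
        return flagged[0]["name"]
    # A cluster with exactly one bucket uses it as the default.
    if len(storages) == 1:
        return storages[0]["name"]
    # Assumption: no default can be determined in any other case.
    return None
```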

Set Default Storage for a Table

The following snippet demonstrates how to assign a default storage bucket to a table:

"example_table": {
  "name": "<example_table_name>",
  ...
  "storage_map": {
    "default_storage_id": "<example_storage_id>"
  }
}
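
The relationship between a table default and the cluster default can be sketched in Python. This is a minimal illustration: the function name table_default_storage and the cluster_default_id parameter are hypothetical, and the sketch assumes the cluster default has already been determined.

```python
def table_default_storage(table, cluster_default_id):
    """Return the storage ID used for new data in a table (hypothetical helper)."""
    # A table may omit storage_map entirely.
    storage_map = table.get("storage_map") or {}
    # The table's own default wins; otherwise fall back to the cluster default.
    return storage_map.get("default_storage_id") or cluster_default_id
```

A table that sets storage_map.default_storage_id resolves to that ID, and a table without one resolves to the cluster default.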

You can also configure default table storage in the Hydrolix Portal UI:

  1. Log into the UI, hosted at https://<hostname>.
  2. Navigate to the Data view by clicking Data in the left sidebar.
  3. Click on the name of the project that contains the table that you want to configure.
  4. Click on the name of the table that you want to configure.
  5. Under Advanced Options, click the three-dot menu to the right of bucket settings.
  6. In the dropdown, click Edit.
  7. In the right sidebar, under Default Storage ID, select the ID of the storage you would like to use as the default storage.
  8. Click Save changes to persist your default storage setting for the table.

Per-Vendor Examples

Configure storage buckets in the storage field of your cluster configuration.

AWS

The following snippet demonstrates how to define a storage bucket using Amazon Simple Storage Service (S3):

"storage" : {
	"name": "example_amzs3",
	"description": "An Amazon S3 bucket.",
	"org": "<example org>",
	"settings": {
		"bucket_name": "<example name>",
		"bucket_path": "/",
		"region": "<example region>",
		"cloud": "aws",
		"endpoint": "<endpoint URL>"
	}
}

Use aws as the value of the "cloud" field to indicate that the storage bucket is hosted on S3.
Hydrolix storage bucket configuration supports the following AWS region labels:

  • us-east-1
  • us-east-2
  • us-west-1
  • us-west-2
  • af-south-1
  • ap-east-1
  • ap-south-1
  • ap-northeast-1
  • ap-northeast-2
  • ap-northeast-3
  • ap-southeast-1
  • ap-southeast-2
  • ca-central-1
  • eu-central-1
  • eu-west-1
  • eu-west-2
  • eu-west-3
  • eu-north-1
  • me-south-1
  • sa-east-1

GCS

The following snippet demonstrates how to define a storage bucket using Google Cloud Storage (GCS):

"storage" : {
	"name": "example_gcs",
	"description": "A Google Cloud Storage bucket.",
	"org": "<example org>",
	"settings": {
		"bucket_name": "<example name>",
		"bucket_path": "/",
		"region": "<example region>",
		"cloud": "gcp",
		"endpoint": "<endpoint URL>"
	}
}

Use gcp as the value of the "cloud" field to indicate that the storage bucket is hosted on GCS.
Hydrolix storage bucket configuration supports the following Google Cloud Storage region labels:

  • asia-east1
  • asia-east2
  • asia-northeast1
  • asia-northeast2
  • asia-northeast3
  • asia-south1
  • asia-southeast1
  • asia-southeast2
  • australia-southeast1
  • europe-central2
  • europe-north1
  • europe-west1
  • europe-west2
  • europe-west3
  • europe-west4
  • europe-west6
  • northamerica-northeast1
  • southamerica-east1
  • us-central1
  • us-east1
  • us-east4
  • us-west1
  • us-west2
  • us-west3
  • us-west4

Linode

The following snippet demonstrates how to define a storage bucket using Linode Cloud Storage:

"storage" : {
	"name": "example_linode",
	"description": "A Linode Cloud Storage bucket.",
	"org": "<example org>",
	"settings": {
		"bucket_name": "<example name>",
		"bucket_path": "/",
		"region": "<example region>",
		"cloud": "linode",
		"endpoint": "<endpoint URL>"
	}
}

Use linode as the value of the "cloud" field to indicate that the storage bucket is hosted on Linode.
Hydrolix storage bucket configuration supports the following Linode region labels:

  • ap-west
  • ca-central
  • ap-southeast
  • us-central
  • us-west
  • us-southeast
  • us-east
  • eu-west
  • ap-south
  • eu-central
  • ap-northeast

Azure

The following snippet demonstrates how to define a storage bucket using Azure Storage:

"storage" : {
	"name": "example_azure",
	"description": "An Azure Storage bucket.",
	"org": "<example org>",
	"settings": {
		"bucket_name": "<example name>",
		"bucket_path": "/",
		"region": "<example region>",
		"cloud": "azure",
		"endpoint": "<endpoint URL>"
	}
}

Use azure as the value of the "cloud" field to indicate that the storage bucket is hosted on Azure.
Hydrolix storage bucket configuration supports the following Azure region labels:

  • eastus
  • eastus2
  • southcentralus
  • westus
  • westus2
  • centralus
  • northcentralus
  • canadacentral
  • canadaeast
  • brazilsouth
  • northeurope
  • westeurope
  • uksouth
  • ukwest
  • francecentral
  • francesouth
  • switzerlandnorth
  • switzerlandwest
  • germanywestcentral
  • germanynorth
  • norwaywest
  • norwayeast
  • eastasia
  • southeastasia
  • japaneast
  • japanwest
  • koreacentral
  • koreasouth
  • australiaeast
  • australiasoutheast
  • centralindia
  • southindia
  • westindia
  • uaenorth
  • uaecentral
  • southafricanorth
  • southafricawest
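
Before applying a storage definition, it can help to check the "cloud" and "region" values against the lists in this section. The following Python sketch shows one way to do that. It is an illustration only: the function name check_storage_settings is hypothetical, and the region table is a short excerpt of the lists above rather than the full set.

```python
# Excerpt only: a few region labels per vendor, copied from the lists above.
# A real check would include every label listed for each vendor.
SUPPORTED_REGIONS = {
    "aws": {"us-east-1", "us-west-2", "eu-west-1"},
    "gcp": {"us-central1", "europe-west1", "asia-east1"},
    "linode": {"us-east", "eu-central", "ap-south"},
    "azure": {"eastus", "westeurope", "japaneast"},
}


def check_storage_settings(settings):
    """Return a list of problems found in a storage "settings" object."""
    problems = []
    cloud = settings.get("cloud")
    region = settings.get("region")
    if cloud not in SUPPORTED_REGIONS:
        # The "cloud" field must be one of aws, gcp, linode, or azure.
        problems.append(f"unknown cloud: {cloud!r}")
    elif region not in SUPPORTED_REGIONS[cloud]:
        # The region label must belong to the selected vendor.
        problems.append(f"region {region!r} is not listed for {cloud}")
    return problems
```

An empty list means the sketch found no problems. A label that is valid for one vendor, such as us-east-1 for aws, is reported when paired with a different "cloud" value.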