Usage
See the table of contents for an overview of the advice on this page.
Initial Setup and the init Command⚓︎
To start using hdxcli, you need a profile configuration that tells the CLI how to connect to your Hydrolix cluster.
Automatic Setup on First Use:⚓︎
If you run any hdxcli command (for example, hdxcli project list) and no existing configuration is found, the CLI will automatically guide you through the initial setup process. This process includes:
- 'default' Profile Configuration: You will be prompted for your cluster's hostname and whether you want to use https (TLS) for the connection. This information is saved in the configuration file (by default, in ~/.hdx_cli/config.toml). You can customize the configuration directory location by setting the HDX_CONFIG_DIR environment variable.
- Initial Login: Next, you will be asked for your Hydrolix username (usually your email) and password to authenticate with the cluster.
- Service Account Option: After a successful login, you will have the option to:
- Continue using your user session token.
- Set up the CLI to use a token generated from a Service Account (either an existing one or by creating a new one). This allows the CLI to use an SA token for later operations, which is useful for long-running sessions or automation.
Important Note: If the process of setting up or generating a token from a Service Account fails for any reason, hdxcli will inform you of the error. The CLI will then continue to use the token generated from your initial username and password login for the current session. Your ability to use the CLI will not be blocked.
The resulting token (from user or SA) will be cached.
Use the hdxcli init Command (Optional)⚓︎
If you prefer to explicitly set up the CLI before running other commands, you can use hdxcli init. This command will guide you through the same three-step process described above (default profile configuration, login, and Service Account option).
Example Flow with hdxcli init:
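The exact prompt text varies by version, so the sketch below only outlines the steps; the hostname and email shown are placeholders.

```shell
# Start the guided setup
hdxcli init

# The CLI then walks through the three steps described above:
#   1. 'default' profile: cluster hostname (e.g. my-cluster.hydrolix.live) and whether to use HTTPS
#   2. Login: Hydrolix username (usually your email) and password
#   3. Optional: keep the user session token, or switch to a Service Account token
```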
Command-line Tool Organization⚓︎
The tool is organized mostly around the following general invocation form:
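A rough sketch of that form, using the project listing example from earlier on this page (the bracketed placeholders are illustrative, not literal syntax):

```shell
# hdxcli [global options] <resource> <verb> [arguments]
hdxcli project list
```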
Table and project resources have defaults that depend on the profile you are working with, so they can be omitted if you previously used the set command.
For all other resources, you can use --transform, --dictionary, --source, and so on. See the command-line help for more information.
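For example, a minimal sketch assuming the set command takes the project and table as positional arguments (the names are placeholders):

```shell
# Remember a default project and table for the active profile
hdxcli set my_project my_table

# Later commands can then omit the project/table options
hdxcli transform list
```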
Profiles⚓︎
hdxcli supports multiple profiles. You can use a default profile or use the --profile option to operate on a non-default profile.
When invoking a command, if a login to the server is necessary (for example, if no valid token is cached or if the token has expired), a prompt for your user credentials will be shown. After you successfully authenticate with your username and password, you will be presented with the option to continue with that user session token or to configure and use a token from a Service Account for subsequent operations. The chosen token is cached for the active profile.
For automation or scripts where interactive prompts are not suitable, you can provide the --username and --password global options directly with your command. If the current token is invalid or expired, hdxcli will attempt to re-authenticate using these provided credentials without any extra output or interactive prompts.
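For example (the profile name, username, and HDX_PASSWORD environment variable are placeholders; any way of supplying the password string works):

```shell
# Non-interactive call: re-authenticates with the given credentials if the cached token is invalid
hdxcli --profile automation --username ci-bot@example.com --password "$HDX_PASSWORD" project list
```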
Listing and Showing Profiles⚓︎
Listing profiles:
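A sketch, assuming profiles are addressed like other resources:

```shell
hdxcli profile list
```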
Showing default profile:
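A sketch (the show verb here is an assumption):

```shell
hdxcli profile show
```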
Logging out of a profile (clears the cached token):
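A sketch; the logout subcommand name and its placement are assumptions:

```shell
hdxcli --profile default logout
```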
Projects, Tables, and Transforms⚓︎
The basic operations you can do with these resources are:
- list them
- create a new resource
- delete an existing resource
- modify an existing resource
- show a resource in raw JSON format
- show settings from a resource
- write a setting
- show a single setting
Work with transforms⚓︎
You can create and override transforms with the following commands.
Create a transform:
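A minimal sketch, assuming the transform settings are read from a local JSON file passed with -f (the flag spelling, file name, and transform name are placeholders/assumptions):

```shell
# Create a transform from a local settings file in the currently set project and table
hdxcli transform create -f transform_settings.json my_transform
```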
Remember that a transform is applied to a table in a project, so the project and table currently set in the command-line tool will be the target of your transform.
If you want to override it, specify the table name with the --table setting:
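For example (the placement of the --project/--table options and the -f flag are assumptions; all names are placeholders):

```shell
# Target an explicit project and table instead of the profile defaults
hdxcli --project my_project --table other_table transform create -f transform_settings.json my_transform
```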
For an example of a valid transform file structure, see our Transform Structure page.
Data Migration Command for Hydrolix Tables⚓︎
This command provides a way to migrate Hydrolix tables and their data to a target cluster, or even within the same cluster. You only need to specify the source and target table names in the format project_name.table_name and the RClone service information. The migration process will handle creating the project, functions, dictionaries, table, and transforms at the target location. It will then copy the partitions from the source bucket to the target bucket and finally upload the catalog so that Hydrolix can associate the created table with the migrated partitions.
Usage
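A sketch of the invocation shape implied by the options below; the migrate subcommand name, the positional argument order, and the trailing options are assumptions, and every name is a placeholder:

```shell
hdxcli migrate <source_project.table> <target_project.table> \
  --target-profile <target-profile> \
  --rc-user <rclone-user> --rc-pass <rclone-password> \
  [other options described below]
```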
Options⚓︎
--target-profile⚓︎
This option must be used to provide the name of the profile for the target cluster connection during the migration. You can specify an existing profile if it has already been created, or alternatively, you can provide the required connection options manually, such as --target-hostname, --target-username, --target-password, and --target-uri-scheme.
--allow-merge⚓︎
This flag skips the pre-migration check that the merge setting is disabled on the source table.
--only⚓︎
This option expects either resources or data. If resources is selected, only the resources (project, functions, dictionaries, table, and transforms) will be migrated. If data is selected, only the data will be migrated, and the resources must already exist.
--from-date and --to-date⚓︎
These options help filter the partitions to be migrated. They expect dates in the format: YYYY-MM-DD HH:MM:SS.
--reuse-partitions⚓︎
This option enables dry migration. Both the source and target clusters must share the storage where the table's partitions are located. This allows migrating the table to the target cluster while reusing the partitions from the source cluster without creating new ones. This results in an almost instant migration but requires that the same partitions are shared by different tables across clusters. Note: Modifying data in one table may cause issues in the other.
--rc-user and --rc-pass⚓︎
These options specify the credentials required to authenticate with the RClone service. Ensure you provide valid credentials to enable file migration functionality.
--concurrency⚓︎
This option allows manually setting the concurrency level for partition migration. The default value is 20, with a minimum of 1 and a maximum of 50. Note: Generally, higher concurrency is beneficial when migrating a large number of small partitions.
--temp-catalog⚓︎
This option uses a temporarily saved version of the table catalog stored in the /tmp directory, if it exists. This is particularly useful when handling large catalogs, as it avoids downloading the catalog multiple times.
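Putting a few of these options together, a filtered migration might look like this (all names are placeholders, and the overall syntax follows the assumed shape from the Usage sketch above):

```shell
hdxcli migrate sample_project.logs sample_project.logs_copy \
  --target-profile target-cluster \
  --rc-user rclone_user --rc-pass rclone_password \
  --from-date "2024-01-01 00:00:00" --to-date "2024-06-30 23:59:59" \
  --concurrency 30
```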
Supported Cloud Storages:⚓︎
- AWS
- GCP
- Azure
- Linode
During the migration process, credentials to access these clouds will likely be required. These credentials need to be provided when prompted:
- GCP: You need the path to the JSON file of the service account with access to the bucket.
- AWS and Linode: Requires access key and secret key.
- Azure: Account and key must be provided.
Pre-Migration Checks and Validations⚓︎
Before creating resources and migrating partitions, the following checks are performed:
- The source table does not have the merge setting enabled (use --allow-merge to skip this validation)
- There are no running alter jobs on the source table
- If filtering is applied, it validates that there are partitions remaining to migrate after filtering
- If using the --reuse-partitions option, it checks that the storage where the partitions are located is shared between both clusters
Migrating Resources⚓︎
This command migrates resources from one cluster to another (or even within the same cluster). It supports the following resources: projects, tables, transforms, functions, and dictionaries. These resources are cloned with the same settings to ensure uniqueness in the target cluster.
General Command Syntax⚓︎
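A sketch of the shape described in the explanation below; the exact subcommand and argument order are assumptions:

```shell
hdxcli --profile <source-profile> project migrate <project-name> <new-project-name> \
  --target-profile <target-profile>
```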
Explanation⚓︎
The above command migrates a project (<project-name>) from the --profile specified as <source-profile> to a new project (<new-project-name>) in the <target-profile>. By default, it migrates all related resources in the project’s configuration tree, including tables and transforms.
Flags to Customize Behavior⚓︎
- --only: Migrates only the project without its related configuration tree resources (tables + transforms).
- --functions: Includes functions associated with the project during migration.
- --dictionaries: Includes dictionaries associated with the project during migration.
- --no-rollback: Disables the rollback process in case an issue occurs during migration.
Cluster Connection Details⚓︎
If you need to specify the target cluster's connection information directly:
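For example (the flag names are the ones listed earlier under --target-profile; everything else is a placeholder, and the command shape follows the assumed syntax above):

```shell
hdxcli --profile <source-profile> project migrate <project-name> <new-project-name> \
  --target-hostname <target-hostname> \
  --target-username <target-username> \
  --target-password <target-password> \
  --target-uri-scheme https
```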
Examples⚓︎
Project Migration⚓︎
- Migrate a project with tables and transforms
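One possible invocation (the syntax is assumed; the names come from the description below):

```shell
hdxcli --profile default project migrate hydrolix new_hydrolix --target-profile test
```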
Migrates the project hydrolix from the default profile to the test profile. The new project name will be new_hydrolix. This includes the project's tables and transforms.
- Include functions and dictionaries
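For example (same assumed syntax as above):

```shell
hdxcli --profile default project migrate hydrolix new_hydrolix --target-profile test --functions --dictionaries
```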
Same as above, but also migrates functions and dictionaries associated with the project.
- Migrate only the project
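For example (same assumed syntax as above):

```shell
hdxcli --profile default project migrate hydrolix new_hydrolix --target-profile test --only
```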
Migrates only the hydrolix project without any related tables or transforms.
Table Migration⚓︎
- Migrate a table with transforms
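One possible invocation (the syntax, including how the target project and table are given, is assumed; the names come from the description below):

```shell
hdxcli --profile default --project hydrolix table migrate logs new_hydro.new_logs --target-profile test
```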
Migrates the table logs (within the hydrolix project) from the default profile to the test profile. The new table name will be new_logs (within the new_hydro project). This includes any transforms associated with the table.
- Migrate only the table
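For example (same assumed syntax as above):

```shell
hdxcli --profile default --project hydrolix table migrate logs new_hydro.new_logs --target-profile test --only
```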
Migrates only the logs table without any associated transforms.
Handling Interactive Prompts During Migration⚓︎
In some scenarios, the CLI requires user input to handle specific resource configurations during the migration process. These cases ensure that important settings are either preserved, updated, or removed based on the user's decision.
Common Scenarios Requiring User Input⚓︎
- Default Storage Settings. If a table has a default storage configuration, the CLI prompts the user to choose how to handle it:
- Preserve the current settings
- Specify a new default storage ID
- Remove the settings and use the cluster's default storage
- Auto-Ingest Settings. For tables with auto-ingest configurations, the user can choose whether to keep or remove these settings during the migration.
- Merge Pool Names. Tables with merge pool configurations prompt the user to specify how to handle these settings.
- Summary Tables. If a table is a summary table, the CLI will request the new parent project.table for the summary query.
Example: Interactive Migration Workflow⚓︎
Here is an example of how the CLI handles these prompts during a project and table migration:
Ingest⚓︎
Batch Job⚓︎
Create a batch job:
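A sketch, assuming the batch ingest lives under a job batch subcommand (the job name and settings path are placeholders):

```shell
hdxcli job batch ingest my_batch_job ./job_settings.json
```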
job-name is the name of the job that will be displayed when listing batch jobs. job-settings is the path to the file containing the specifications required to create that ingestion (for more information on the required specifications, see Hydrolix API Reference).
In this case, the project, table, and transform are being omitted. hdxcli will use the default transform within the project and table previously configured in the profile with the set command. Otherwise, you can add --project <project-name> --table <table-name> --transform <transform-name>.
This allows you to execute the command as follows:
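For example, with the targets spelled out explicitly (option placement is an assumption; all names are placeholders):

```shell
hdxcli --project my_project --table my_table --transform my_transform job batch ingest my_batch_job ./job_settings.json
```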
Stream⚓︎
Create the streaming ingest as follows:
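A sketch, assuming a stream ingest subcommand and relying on the profile defaults for project, table, and transform (the data file path is a placeholder):

```shell
hdxcli stream ingest ./data.csv

# Or name the targets explicitly (option placement assumed):
hdxcli --project my_project --table my_table --transform my_transform stream ingest ./data.csv
```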
data-file is the path of the data file to be used for the ingest. This can be a .csv, .json, or compressed file. The transform must be configured with the matching type and compression.