Usage
Work with data, projects, transforms, and many other Hydrolix features using the HDXCLI.
See the table of contents for an overview of this page.
Initial Setup and the init Command
To start using hdxcli, you need a profile configuration that tells the CLI how to connect to your Hydrolix cluster.
Automatic Setup on First Use:
If you run any hdxcli command (for example, hdxcli project list) and no existing configuration is found, the CLI will automatically guide you through the initial setup process. This process includes:
- 'default' Profile Configuration: You will be prompted for your cluster's hostname and whether you want to use https (TLS) for the connection. This information is saved in the configuration file (by default, in ~/.hdx_cli/config.toml). You can customize the configuration directory location by setting the HDX_CONFIG_DIR environment variable.
- Initial Login: Next, you will be asked for your Hydrolix username (usually your email) and password to authenticate with the cluster.
- Service Account Option: After a successful login, you will have the option to:
  - Continue using your user session token.
  - Set up the CLI to use a token generated from a Service Account (either an existing one or by creating a new one). This allows the CLI to use an SA token for later operations, which is useful for long-running sessions or automation.
Important Note: If the process of setting up or generating a token from a Service Account fails for any reason, hdxcli will inform you of the error. The CLI will then continue to use the token generated from your initial username and password login for the current session. Your ability to use the CLI will not be blocked.
The resulting token (from user or SA) will be cached.
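The configuration file lookup described above can be sketched as a small shell helper. This is only an illustration: the function name hdx_config_path is hypothetical, and it assumes the file keeps the name config.toml inside the directory named by HDX_CONFIG_DIR, which the text above does not state explicitly.

```shell
# Hypothetical helper: print the path where the profile configuration is expected.
# Assumption: when HDX_CONFIG_DIR is set, the file is still called config.toml.
hdx_config_path() {
    printf '%s/config.toml\n' "${HDX_CONFIG_DIR:-$HOME/.hdx_cli}"
}
```

Running hdx_config_path before and after exporting HDX_CONFIG_DIR is a quick way to check which file a script expects the CLI to read.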
Using the hdxcli init Command (Optional)
If you prefer to explicitly set up the CLI before running other commands, you can use hdxcli init. This command guides you through the same three-step process described above (default profile configuration, login, and Service Account option).
Example Flow with hdxcli init:
$ hdxcli init
No configuration found for your Hydrolix cluster. # (This message appears on first time use)
Let's create the 'default' profile to get you started.
----- Configuring Profile [default] -----
Enter the host address for the profile: my-cluster.example.com
Use TLS (https) for connection? (Y/n): Y
The configuration for 'default' profile has been created at /home/user/.hdx_cli/config.toml
------------------------------
Please login to profile 'default' (my-cluster.example.com) to continue.
Username: [email protected]
Password for [[email protected]]: ****
----- Service Account Configuration -----
You can configure a Service Account for automated access.
How would you like to proceed for future authentications?
1. Use an existing Service Account
2. Create a new Service Account
3. Continue using my user credentials (default)
Choose an option: 3
Continuing with your user credentials for this profile.
----- End of Service Account Configuration -----
Command-line Tool Organization
Most commands follow this general invocation form:
hdxcli <resource> [subresource] <verb> [resource_name]
Table and project resources have defaults that depend on the profile you are working with, so they can be omitted if you previously used the set command.
For all other resources, you can use --transform, --dictionary, --source, etc. Please see the command line help for more information.
Profiles
hdxcli supports multiple profiles. You can use a default profile or use the --profile option to operate on a non-default profile.
When invoking a command, if a login to the server is necessary (e.g., if no valid token is cached or if the token has expired), a prompt for your user credentials will be shown. After you successfully authenticate with your username and password, you will be presented with the option to continue with that user session token or to configure and use a token from a Service Account for subsequent operations. The chosen token is cached for the active profile.
For automation or scripts where interactive prompts are not suitable, you can provide the --username and --password global options directly with your command. If the current token is invalid or expired, hdxcli will attempt to re-authenticate using these provided credentials without any extra output or interactive prompts.
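A minimal sketch of a non-interactive wrapper for scripts follows. The environment variable names HDX_USERNAME and HDX_PASSWORD and the function name are assumptions made for this example, not something hdxcli reads on its own. The HDXCLI variable is also hypothetical and only lets you substitute another binary (for example echo) to preview the command.

```shell
# Run an hdxcli command with the documented --username/--password global options.
# HDX_USERNAME, HDX_PASSWORD and HDXCLI are names invented for this sketch.
hdx_noninteractive() {
    if [ -z "${HDX_USERNAME:-}" ] || [ -z "${HDX_PASSWORD:-}" ]; then
        echo "HDX_USERNAME and HDX_PASSWORD must be set" >&2
        return 1
    fi
    # Global options go before the resource, like --profile does.
    "${HDXCLI:-hdxcli}" --username "$HDX_USERNAME" --password "$HDX_PASSWORD" "$@"
}

# Example call (not executed here): hdx_noninteractive project list
```

Reading the credentials from the environment keeps them out of the script file itself, although they will still be visible in the process argument list while the command runs.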
Listing and Showing Profiles
Listing profiles:
hdxcli profile list
Showing default profile:
hdxcli profile show default
Logging out of a profile (clears the cached token):
hdxcli profile logout <profile-name>
Projects, Tables, and Transforms
The basic operations you can do with these resources are:
- list them
- create a new resource
- delete an existing resource
- modify an existing resource
- show a resource in raw JSON format
- show settings from a resource
- write a setting
- show a single setting
Working with Transforms
You can create and override transforms with the following commands.
Create a transform:
hdxcli transform create -f <transform-settings-file>.json <transform-name>
Remember that a transform is applied to a table in a project, so the project and table you configured with the set command will be the target of your transform.
If you want to override them, specify the project and table names with the --project and --table options:
hdxcli transform --project <project-name> --table <table-name> create -f <transform-settings>.json <transform-name>
For an example of a valid transform file structure, see our Transform Structure page.
Data Migration Command for Hydrolix Tables
This command migrates a Hydrolix table and its data to a target cluster, or even within the same cluster. You only need to specify the source and target table names in the format project_name.table_name and the RClone service information. The migration process handles creating the project, functions, dictionaries, table, and transforms at the target location. It then copies the partitions from the source bucket to the target bucket and finally uploads the catalog so that Hydrolix can associate the created table with the migrated partitions.
Usage
hdxcli migrate [OPTIONS] SOURCE_TABLE TARGET_TABLE RCLONE_HOST
Options
-tp, --target-profile
-h, --target-hostname
-u, --target-username
-p, --target-password
-s, --target-uri-scheme
--allow-merge Allow migration if the merge setting is enabled.
--only The migration type: "resources" or "data".
--from-date Minimum timestamp for filtering partitions in YYYY-MM-DD HH:MM:SS format.
--to-date Maximum timestamp for filtering partitions in YYYY-MM-DD HH:MM:SS format.
--reuse-partitions Perform a dry migration without moving partitions. Both clusters must share the bucket(s) where the partitions are stored.
--rc-user The username for authenticating with the RClone server.
--rc-pass The password for authenticating with the RClone server.
--concurrency Number of concurrent requests during file migration. Default is 20.
--temp-catalog Use a previously downloaded catalog stored in a temporary file, instead of downloading it again.
--help Show this message and exit.
--target-profile
This option provides the name of the profile used to connect to the target cluster during the migration. You can specify an existing profile or, alternatively, provide the required connection options manually: --target-hostname, --target-username, --target-password, and --target-uri-scheme.
--allow-merge
This flag allows skipping the check for the merge setting enabled on the source table.
--only
This option expects either resources or data. If resources is selected, only the resources (project, functions, dictionaries, table, and transforms) will be migrated. If data is selected, only the data will be migrated, and the resources must already exist.
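Because the data phase requires the resources to exist already, the two values can be run back to back. The following is a sketch under stated assumptions: the function name is hypothetical, the caller supplies the target profile, table names, and RClone host, and the HDXCLI variable is an invented override that lets you preview the commands with echo.

```shell
# Two-phase table migration: resources first, then data.
# Arguments: <target-profile> <source project.table> <target project.table> <rclone-host>
# Note: POSIX sh has no "local", so these variables are global to the script.
migrate_in_two_phases() {
    target_profile=$1 source_table=$2 target_table=$3 rclone_host=$4
    for phase in resources data; do
        "${HDXCLI:-hdxcli}" migrate --target-profile "$target_profile" --only "$phase" \
            "$source_table" "$target_table" "$rclone_host" || return 1
    done
}

# Example call (not executed here):
# migrate_in_two_phases test hydrolix.logs new_hydro.new_logs rclone.example.com
```

Stopping at the first failure means the data phase never starts if the resources could not be created.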
--from-date and --to-date
These options help filter the partitions to be migrated. They expect dates in the format: YYYY-MM-DD HH:MM:SS.
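Since the expected value contains a space, it has to be quoted on the command line. A script can also check the shape of the value before calling the CLI. The helper below is a hypothetical pre-check written for this page. It only verifies the digit layout and does not reject impossible dates such as month 13.

```shell
# Return 0 when the argument has the shape "YYYY-MM-DD HH:MM:SS".
# Shape check only: it does not validate calendar ranges.
is_hdx_timestamp() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]' '[0-9][0-9]:[0-9][0-9]:[0-9][0-9])
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Example call (not executed here); note the quotes around each timestamp:
# hdxcli migrate --from-date "2024-01-01 00:00:00" --to-date "2024-02-01 00:00:00" ...
```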
--reuse-partitions
This option enables dry migration. Both the source and target clusters must share the storage where the table's partitions are located. This allows migrating the table to the target cluster while reusing the partitions from the source cluster without creating new ones. This results in an almost instant migration but requires that the same partitions are shared by different tables across clusters.
Note: Modifying data in one table may cause issues in the other.
--rc-user and --rc-pass
These options specify the credentials required to authenticate with the RClone service. Ensure you provide valid credentials to enable file migration functionality.
--concurrency
This option allows manually setting the concurrency level for partition migration. The default value is 20, with a minimum of 1 and a maximum of 50.
Note: Generally, higher concurrency is beneficial when migrating a large number of small partitions.
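When the concurrency value comes from a script parameter, it can be checked against the documented range before it is passed to --concurrency. The function below is a hypothetical local pre-check. How hdxcli itself reacts to an out-of-range value is not described here, so the sketch makes no assumption about it.

```shell
# Return 0 when the argument is an integer within the documented 1..50 range.
is_valid_concurrency() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 50 ]
}
```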
--temp-catalog
This option uses a temporarily saved version of the table catalog stored in the /tmp directory, if it exists. This is particularly useful when handling large catalogs, as it avoids downloading the catalog multiple times.
Supported Cloud Storages:
- AWS
- GCP
- Azure
- Linode
During the migration process, credentials to access these clouds will likely be required. These credentials need to be provided when prompted:
- GCP: You need the path to the JSON file of the service account with access to the bucket.
- AWS and Linode: Requires access key and secret key.
- Azure: Account and key must be provided.
Pre-Migration Checks and Validations
Before creating resources and migrating partitions, the following checks are performed:
- The source table does not have the merge setting enabled (use --allow-merge to skip this validation)
- There are no running alter jobs on the source table
- If filtering is applied, it validates that there are partitions remaining to migrate after filtering
- If using the --reuse-partitions option, it checks that the storage where the partitions are located is shared between both clusters
Migrating Resources
This command migrates resources from one cluster to another (or even within the same cluster). It supports the following resources: projects, tables, transforms, functions, and dictionaries. These resources are cloned with the same settings to ensure uniqueness in the target cluster.
General Command Syntax
hdxcli --profile <source-profile> project --project <project-name> migrate <new-project-name> --target-profile <target-profile>
Explanation
The above command migrates a project (<project-name>) from the --profile specified as <source-profile> to a new project (<new-project-name>) in the <target-profile>. By default, it migrates all related resources in the project’s configuration tree, including tables and transforms.
Flags to Customize Behavior
- --only: Migrates only the project without its related configuration tree resources (tables + transforms).
- --functions: Includes functions associated with the project during migration.
- --dictionaries: Includes dictionaries associated with the project during migration.
- --no-rollback: Disables the rollback process in case an issue occurs during migration.
Cluster Connection Details
If you need to specify the target cluster's connection information directly:
hdxcli --profile <source-profile> project --project <project-name> migrate <new-project-name> \
--target-cluster-hostname <target-cluster-hostname> \
--target-cluster-username <target-cluster-username> \
--target-cluster-password <target-cluster-password> \
--target-cluster-uri-scheme <http/https>
Examples
Project Migration
- Migrate a project with tables and transforms
Migrates the project hydrolix from the default profile to the test profile. The new project name will be new_hydrolix. This includes the project's tables and transforms.
hdxcli --profile default project --project hydrolix migrate new_hydrolix --target-profile test
- Include functions and dictionaries
Same as above, but also migrates functions and dictionaries associated with the project.
hdxcli --profile default project --project hydrolix migrate new_hydrolix --target-profile test --functions --dictionaries
- Migrate only the project
Migrates only the hydrolix project without any related tables or transforms.
hdxcli --profile default project --project hydrolix migrate new_hydrolix --target-profile test --only
Table Migration
- Migrate a table with transforms
Migrates the table logs (within the hydrolix project) from the default profile to the test profile. The new table name will be new_logs (within the new_hydro project). This includes any transforms associated with the table.
hdxcli --profile default table --project hydrolix --table logs migrate new_hydro new_logs --target-profile test
- Migrate only the table
Migrates only the logs table without any associated transforms.
hdxcli --profile default table --project hydrolix --table logs migrate new_hydro new_logs --target-profile test --only
Handling Interactive Prompts During Migration
In some scenarios, the CLI requires user input to handle specific resource configurations during the migration process. These cases ensure that important settings are either preserved, updated, or removed based on the user's decision.
Common Scenarios Requiring User Input
- Default Storage Settings. If a table has a default storage configuration, the CLI prompts the user to choose how to handle it:
  - Preserve the current settings
  - Specify a new default storage ID
  - Remove the settings and use the cluster's default storage
- Auto-Ingest Settings. For tables with auto-ingest configurations, the user can choose whether to keep or remove these settings during the migration.
- Merge Pool Names. Tables with merge pool configurations prompt the user to specify how to handle these settings.
- Summary Tables. If a table is a summary table, the CLI will request the new parent project.table for the summary query.
Example: Interactive Migration Workflow
Here is an example of how the CLI handles these prompts during a project and table migration:
hdxcli project --project hydrolix migrate new_hydrolix --target-profile test
[INFO] Migrating project 'new_hydrolix'...
[SUCCESS] Project 'new_hydrolix' Migrated
[INFO] Migrating table 'logs'...
[WARNING] Storage settings found in the table 'logs'
Default Storage ID: 24aa950d-71cc-4940-a34d-da4567cf838a
Column Name: None
Column Value Mapping: -
[PROMPT] How would you like to proceed?
1) Preserve all existing settings without any changes
2) Specify a new default storage ID
3) Remove the storage settings (use cluster default)
Please enter your choice (1/2/3): 2
Please enter the new default storage ID: 0d42b1e9-a1e7-4e5a-96c3-72bd05e580a8
[SUCCESS] Table 'logs' Migrated
[INFO] Migrating table 'summary'...
[WARNING] Summary settings found in the table 'summary'
The current parents for the summary table are: hydrolix.logs
[PROMPT] Please enter a new project and table in 'project.table' format (leave blank to keep current)
New parents 'project.table': new_hydrolix.logs
[WARNING] Storage settings found in the table 'summary'
Default Storage ID: 24aa950d-71cc-4940-a34d-da4567cf838a
Column Name: None
Column Value Mapping: -
[PROMPT] How would you like to proceed?
1) Preserve all existing settings without any changes
2) Specify a new default storage ID
3) Remove the storage settings (use cluster default)
Please enter your choice (1/2/3): 3
[SUCCESS] Table 'summary' Migrated
Ingest
Batch Job
Create a batch job:
hdxcli job batch ingest <job-name> <job-settings>.json
job-name is the name of the job that will be displayed when listing batch jobs. job-settings is the path to the file containing the specifications required to create that ingestion (for more information on the required specifications, see Hydrolix API Reference).
In this case, the project, table, and transform are omitted. hdxcli will use the default transform within the project and table previously configured in the profile with the set command. Otherwise, you can add --project <project-name> --table <table-name> --transform <transform-name>.
This allows you to execute the command as follows:
hdxcli job batch --project <project-name> --table <table-name> --transform <transform-name> ingest <job-name> <job-settings>.json
Stream
Create the streaming ingest as follows:
hdxcli stream --project <project-name> --table <table-name> --transform <transform-name> ingest <data-file>
data-file is the path of the data file to be used for the ingest. This can be a .csv, .json, or compressed file. The transform must be configured for the same file type and compression.
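To stream several files, the command can be wrapped in a loop. This is a sketch, not part of hdxcli: the function name and the HDXCLI override variable are hypothetical, and it assumes every file in the directory matches the type and compression that the chosen transform is configured for.

```shell
# Stream every regular file in a directory through the same transform.
# Arguments: <project-name> <table-name> <transform-name> <directory>
ingest_directory() {
    project=$1 table=$2 transform=$3 dir=$4
    for file in "$dir"/*; do
        [ -f "$file" ] || continue   # skip subdirectories and unmatched globs
        "${HDXCLI:-hdxcli}" stream --project "$project" --table "$table" \
            --transform "$transform" ingest "$file" || return 1
    done
}

# Example call (not executed here): ingest_directory hydrolix logs my_transform ./data
```

The loop stops at the first failed ingest so that a bad file is noticed before the remaining files are sent.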