Usage

Work with data, projects, transforms, and many other Hydrolix features using hdxcli.

See the table of contents for an overview of this page.

Initial Setup and the init Command

To start using hdxcli, you need a profile configuration that tells the CLI how to connect to your Hydrolix cluster.

Automatic Setup on First Use:

If you run any hdxcli command (for example, hdxcli project list) and no existing configuration is found, the CLI will automatically guide you through the initial setup process. This process includes:

  1. 'default' Profile Configuration: You will be prompted for your cluster's hostname and whether you want to use https (TLS) for the connection. This information is saved in the configuration file (by default, in ~/.hdx_cli/config.toml). You can customize the configuration directory location by setting the HDX_CONFIG_DIR environment variable.
  2. Initial Login: Next, you will be asked for your Hydrolix username (usually your email) and password to authenticate with the cluster.
  3. Service Account Option: After a successful login, you will have the option to:
    1. Continue using your user session token.
    2. Set up the CLI to use a token generated from a Service Account (either an existing one or by creating a new one). This allows the CLI to use an SA token for later operations, which is useful for long-running sessions or automation.

Important Note: If the process of setting up or generating a token from a Service Account fails for any reason, hdxcli will inform you of the error. The CLI will then continue to use the token generated from your initial username and password login for the current session. Your ability to use the CLI will not be blocked.

The resulting token (from user or SA) will be cached.
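As a rough sketch of the configuration lookup described above, the following shell function prints where the configuration file would be expected. The function name hdx_config_path is hypothetical and not part of hdxcli, and the sketch assumes that a custom HDX_CONFIG_DIR holds a file named config.toml, as the default directory does.

```shell
# Sketch only: hdx_config_path is a hypothetical helper, not an hdxcli command.
# Assumption: a custom HDX_CONFIG_DIR contains config.toml, like the default directory.
hdx_config_path() {
  # Fall back to the documented default directory when HDX_CONFIG_DIR is unset or empty.
  printf '%s/config.toml\n' "${HDX_CONFIG_DIR:-$HOME/.hdx_cli}"
}

# Print the path for the current environment.
hdx_config_path
```

Setting HDX_CONFIG_DIR before the first run (for example, export HDX_CONFIG_DIR=/path/to/dir) is one way to keep separate configurations per environment.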

Using the hdxcli init Command (Optional)

If you prefer to explicitly set up the CLI before running other commands, you can use hdxcli init. This command will guide you through the same three-step process described above (default profile configuration, login, and Service Account option).

Example Flow with hdxcli init:

$ hdxcli init
No configuration found for your Hydrolix cluster. # (This message appears on first time use)
Let's create the 'default' profile to get you started.

----- Configuring Profile [default] -----
Enter the host address for the profile: my-cluster.example.com
Use TLS (https) for connection? (Y/n): Y

The configuration for 'default' profile has been created at /home/user/.hdx_cli/config.toml
------------------------------
Please login to profile 'default' (my-cluster.example.com) to continue.
Username: <your-email>
Password for [<your-email>]: ****

----- Service Account Configuration -----
You can configure a Service Account for automated access.

How would you like to proceed for future authentications?
  1. Use an existing Service Account
  2. Create a new Service Account
  3. Continue using my user credentials (default)
Choose an option: 3

Continuing with your user credentials for this profile.

----- End of Service Account Configuration -----

Command-line Tool Organization

Most commands follow this general invocation form:

hdxcli <resource> [subresource] <verb> [resource_name]

Table and project resources have defaults that depend on the profile you are working with, so they can be omitted if you previously used the set command.

For other resources, pass the matching option, such as --transform, --dictionary, or --source. See the command-line help for more information.

Profiles

hdxcli supports multiple profiles. You can use a default profile or use the --profile option to operate on a non-default profile.

When invoking a command, if a login to the server is necessary (e.g., if no valid token is cached or if the token has expired), a prompt for your user credentials will be shown. After you successfully authenticate with your username and password, you will be presented with the option to continue with that user session token or to configure and use a token from a Service Account for subsequent operations. The chosen token is cached for the active profile.

For automation or scripts where interactive prompts are not suitable, you can provide the --username and --password global options directly with your command. If the current token is invalid or expired, hdxcli will attempt to re-authenticate using these provided credentials without any extra output or interactive prompts.
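For scripts, a small wrapper function can keep the credentials out of each call. This is a minimal sketch, not an hdxcli feature: HDXCLI, HDX_USER, and HDX_PASS are hypothetical variable names, and it assumes the global options are placed before the resource, the same way --profile is used elsewhere on this page.

```shell
# Sketch only: HDXCLI, HDX_USER, and HDX_PASS are hypothetical variable names.
# HDXCLI lets you swap the binary, for example HDXCLI=echo for a dry run.
hdx_auto() {
  # --username and --password are global options, so they go before the resource.
  "${HDXCLI:-hdxcli}" --username "$HDX_USER" --password "$HDX_PASS" "$@"
}

# Usage, for example from a scheduled job:
#   HDX_USER=<username> HDX_PASS=<password> hdx_auto project list
```

Reading the credentials from environment variables avoids hard-coding them in the script itself.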

Listing and Showing Profiles

Listing profiles:

hdxcli profile list

Showing default profile:

hdxcli profile show default

Logging out of a profile (clears the cached token):

hdxcli profile logout <profile-name>

Projects, Tables, and Transforms

The basic operations you can do with these resources are:

  • list them
  • create a new resource
  • delete an existing resource
  • modify an existing resource
  • show a resource in raw JSON format
  • show settings from a resource
  • write a setting
  • show a single setting

Working with Transforms

You can create and override transforms with the following commands.

Create a transform:

hdxcli transform create -f <transform-settings-file>.json <transform-name>

A transform is applied to a table in a project, so the project and table configured in your profile with the set command are the target of the transform.

To override that target, specify the project and table with the --project and --table options:

hdxcli transform --project <project-name> --table <table-name> create -f <transform-settings>.json <transform-name>

For an example of a valid transform file structure, see our Transform Structure page.
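If you keep several transform settings files in one directory, a loop around the create command can register them all. The sketch below assumes each <name>.json file is a valid transform settings file and uses the file name as the transform name; create_transforms and HDXCLI are hypothetical names, not part of hdxcli.

```shell
# Sketch only: create_transforms and HDXCLI are hypothetical names.
# Assumption: each <name>.json file in the directory is a valid transform settings file.
create_transforms() {
  project=$1 table=$2 dir=$3
  for file in "$dir"/*.json; do
    [ -e "$file" ] || continue          # skip when the directory has no .json files
    name=$(basename "$file" .json)      # transform name taken from the file name
    "${HDXCLI:-hdxcli}" transform --project "$project" --table "$table" \
      create -f "$file" "$name"
  done
}

# Usage:
#   create_transforms <project-name> <table-name> <directory>
```

Running it first with HDXCLI=echo prints the arguments that would be passed, which is a cheap way to check them before creating anything.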

Data Migration Command for Hydrolix Tables

This command migrates a Hydrolix table and its data to a target cluster, or to another table within the same cluster. You only need to specify the source and target table names in the format project_name.table_name and the RClone service information. The migration process creates the project, functions, dictionaries, table, and transforms at the target location. It then copies the partitions from the source bucket to the target bucket and finally uploads the catalog so that Hydrolix can associate the created table with the migrated partitions.

Usage

hdxcli migrate [OPTIONS] SOURCE_TABLE TARGET_TABLE RCLONE_HOST

Options

-tp, --target-profile
-h, --target-hostname 
-u, --target-username
-p, --target-password
-s, --target-uri-scheme
--allow-merge             Allow migration if the merge setting is enabled.
--only                    The migration type: "resources" or "data".
--from-date               Minimum timestamp for filtering partitions in YYYY-MM-DD HH:MM:SS format.
--to-date                 Maximum timestamp for filtering partitions in YYYY-MM-DD HH:MM:SS format.
--reuse-partitions        Perform a dry migration without moving partitions. Both clusters must share the bucket(s) where the partitions are stored.
--rc-user                 The username for authenticating with the RClone server.
--rc-pass                 The password for authenticating with the RClone server.
--concurrency             Number of concurrent requests during file migration. Default is 20.
--temp-catalog            Use a previously downloaded catalog stored in a temporary file, instead of downloading it again.
--help                    Show this message and exit.

--target-profile

Use this option to provide the name of an existing profile for the target cluster connection during the migration. If no such profile has been created, provide the required connection options manually instead: --target-hostname, --target-username, --target-password, and --target-uri-scheme.
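The two ways of pointing at the target cluster can be sketched as shell functions. All values are placeholders, and the function names, HDXCLI, TARGET_USER, and TARGET_PASS are hypothetical. The sketch places the options before the positional arguments, following the usage line above, and assumes --target-uri-scheme takes http or https.

```shell
# Sketch only: function names and the HDXCLI, TARGET_USER, TARGET_PASS variables are hypothetical.
# HDXCLI=echo prints the arguments instead of running a migration.

# Variant 1: reuse an existing profile for the target cluster.
migrate_with_profile() {
  source_table=$1 target_table=$2 rclone_host=$3 target_profile=$4
  "${HDXCLI:-hdxcli}" migrate --target-profile "$target_profile" \
    "$source_table" "$target_table" "$rclone_host"
}

# Variant 2: pass the target connection details manually.
# Assumption: the URI scheme value is http or https.
migrate_with_connection() {
  source_table=$1 target_table=$2 rclone_host=$3 target_host=$4 scheme=$5
  "${HDXCLI:-hdxcli}" migrate \
    --target-hostname "$target_host" \
    --target-username "$TARGET_USER" \
    --target-password "$TARGET_PASS" \
    --target-uri-scheme "$scheme" \
    "$source_table" "$target_table" "$rclone_host"
}
```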

--allow-merge

By default, the migration is refused when the source table has the merge setting enabled. This flag skips that check.

--only

This option expects either resources or data. If resources is selected, only the resources (project, functions, dictionaries, table, and transforms) will be migrated. If data is selected, only the data will be migrated, and the resources must already exist.
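One possible use of --only is to split a migration into two phases: resources first, then data once the resources exist. The following is a sketch under that assumption; migrate_in_two_phases and HDXCLI are hypothetical names, and the sketch assumes RCLONE_HOST is still passed in both phases because the usage line lists it as a positional argument.

```shell
# Sketch only: migrate_in_two_phases and HDXCLI are hypothetical names.
# The data phase runs only if the resources phase succeeds.
migrate_in_two_phases() {
  src=$1 dst=$2 rclone_host=$3 profile=$4
  "${HDXCLI:-hdxcli}" migrate --target-profile "$profile" --only resources \
    "$src" "$dst" "$rclone_host" &&
  "${HDXCLI:-hdxcli}" migrate --target-profile "$profile" --only data \
    "$src" "$dst" "$rclone_host"
}

# Usage:
#   migrate_in_two_phases <project.table> <project.table> <rclone-host> <target-profile>
```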

--from-date and --to-date

These options help filter the partitions to be migrated. They expect dates in the format: YYYY-MM-DD HH:MM:SS.
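Because the value contains a space, it must be quoted on the command line. A script can also check the shape of the value before starting a migration. The helper below is a hypothetical sketch, not part of hdxcli; it only checks the digit layout of YYYY-MM-DD HH:MM:SS and does not verify that the date exists in the calendar.

```shell
# Sketch only: is_hdx_timestamp is a hypothetical helper.
# It checks the digit layout only, not whether the date is a real calendar date.
is_hdx_timestamp() {
  case $1 in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]' '[0-9][0-9]:[0-9][0-9]:[0-9][0-9])
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Usage: quote the value so the shell passes it as one argument.
#   is_hdx_timestamp "2024-01-01 00:00:00" && echo "format looks valid"
```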

--reuse-partitions

This option enables dry migration. Both the source and target clusters must share the storage where the table's partitions are located. This allows migrating the table to the target cluster while reusing the partitions from the source cluster without creating new ones. This results in an almost instant migration but requires that the same partitions are shared by different tables across clusters.
Note: Modifying data in one table may cause issues in the other.

--rc-user and --rc-pass

These options specify the credentials required to authenticate with the RClone service. Ensure you provide valid credentials to enable file migration functionality.

--concurrency

This option allows manually setting the concurrency level for partition migration. The default value is 20, with a minimum of 1 and a maximum of 50.
Note: Generally, higher concurrency is beneficial when migrating a large number of small partitions.
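When the concurrency level comes from a script parameter, it can be checked against the documented range before it is passed to --concurrency. A minimal sketch, with valid_concurrency as a hypothetical helper name:

```shell
# Sketch only: valid_concurrency is a hypothetical helper.
# Accepts whole numbers in the documented range of 1 to 50.
valid_concurrency() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;   # empty or not a whole number
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 50 ]
}

# Usage:
#   valid_concurrency "$level" || echo "concurrency must be between 1 and 50" >&2
```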

--temp-catalog

This option uses a temporarily saved version of the table catalog stored in the /tmp directory, if it exists. This is particularly useful when handling large catalogs, as it avoids downloading the catalog multiple times.

Supported Cloud Storages:

  • AWS
  • GCP
  • Azure
  • Linode

During the migration process, credentials to access these clouds will likely be required. These credentials need to be provided when prompted:

  • GCP: The path to the JSON file of the service account with access to the bucket.
  • AWS and Linode: An access key and a secret key.
  • Azure: An account and a key.

Pre-Migration Checks and Validations

Before creating resources and migrating partitions, the following checks are performed:

  • The source table does not have the merge setting enabled (use --allow-merge to skip this validation)
  • There are no running alter jobs on the source table
  • If filtering is applied, it validates that there are partitions remaining to migrate after filtering
  • If using the --reuse-partitions option, it checks that the storage where the partitions are located is shared between both clusters

Migrating Resources

This command migrates resources from one cluster to another (or even within the same cluster). It supports the following resources: projects, tables, transforms, functions, and dictionaries. These resources are cloned with the same settings to ensure uniqueness in the target cluster.

General Command Syntax

hdxcli --profile <source-profile> project --project <project-name> migrate <new-project-name> --target-profile <target-profile>

Explanation

The above command migrates a project (<project-name>) from the --profile specified as <source-profile> to a new project (<new-project-name>) in the <target-profile>. By default, it migrates all related resources in the project’s configuration tree, including tables and transforms.

Flags to Customize Behavior

  • --only: Migrates only the project without its related configuration tree resources (tables + transforms).
  • --functions: Includes functions associated with the project during migration.
  • --dictionaries: Includes dictionaries associated with the project during migration.
  • --no-rollback: Disables the rollback process in case an issue occurs during migration.

Cluster Connection Details

If you need to specify the target cluster's connection information directly:

hdxcli --profile <source-profile> project --project <project-name> migrate <new-project-name> \
  --target-cluster-hostname <target-cluster-hostname> \
  --target-cluster-username <target-cluster-username> \
  --target-cluster-password <target-cluster-password> \
  --target-cluster-uri-scheme <http/https>

Examples

Project Migration

  • Migrate a project with tables and transforms

Migrates the project hydrolix from the default profile to the test profile. The new project name will be new_hydrolix. This includes the project's tables and transforms.

hdxcli --profile default project --project hydrolix migrate new_hydrolix --target-profile test

  • Include functions and dictionaries

Same as above, but also migrates functions and dictionaries associated with the project.

hdxcli --profile default project --project hydrolix migrate new_hydrolix --target-profile test --functions --dictionaries

  • Migrate only the project

Migrates only the hydrolix project without any related tables or transforms.

hdxcli --profile default project --project hydrolix migrate new_hydrolix --target-profile test --only

Table Migration

  • Migrate a table with transforms

Migrates the table logs (within the hydrolix project) from the default profile to the test profile. The new table name will be new_logs (within the new_hydro project). This includes any transforms associated with the table.

hdxcli --profile default table --project hydrolix --table logs migrate new_hydro new_logs --target-profile test

  • Migrate only the table

Migrates only the logs table without any associated transforms.

hdxcli --profile default table --project hydrolix --table logs migrate new_hydro new_logs --target-profile test --only

Handling Interactive Prompts During Migration

In some scenarios, the CLI requires user input to handle specific resource configurations during the migration process. These cases ensure that important settings are either preserved, updated, or removed based on the user's decision.

Common Scenarios Requiring User Input

  • Default Storage Settings. If a table has a default storage configuration, the CLI prompts the user to choose how to handle it:
    • Preserve the current settings
    • Specify a new default storage ID
    • Remove the settings and use the cluster's default storage
  • Auto-Ingest Settings. For tables with auto-ingest configurations, the user can choose whether to keep or remove these settings during the migration.
  • Merge Pool Names. Tables with merge pool configurations prompt the user to specify how to handle these settings.
  • Summary Tables. If a table is a summary table, the CLI will request the new parent project.table for the summary query.

Example: Interactive Migration Workflow

Here is an example of how the CLI handles these prompts during a project and table migration:

hdxcli project --project hydrolix migrate new_hydrolix --target-profile test
[INFO] Migrating project 'new_hydrolix'...
[SUCCESS] Project 'new_hydrolix' Migrated

[INFO] Migrating table 'logs'...
  [WARNING] Storage settings found in the table 'logs'
      Default Storage ID: 24aa950d-71cc-4940-a34d-da4567cf838a
      Column Name: None
      Column Value Mapping: -
      
  [PROMPT] How would you like to proceed?
      1) Preserve all existing settings without any changes
      2) Specify a new default storage ID
      3) Remove the storage settings (use cluster default)
      Please enter your choice (1/2/3): 2
      Please enter the new default storage ID: 0d42b1e9-a1e7-4e5a-96c3-72bd05e580a8
[SUCCESS] Table 'logs' Migrated

[INFO] Migrating table 'summary'...
  [WARNING] Summary settings found in the table 'summary'
      The current parents for the summary table are: hydrolix.logs

  [PROMPT] Please enter a new project and table in 'project.table' format (leave blank to keep current)
      New parents 'project.table': new_hydrolix.logs

  [WARNING] Storage settings found in the table 'summary'
      Default Storage ID: 24aa950d-71cc-4940-a34d-da4567cf838a
      Column Name: None
      Column Value Mapping: -
      
  [PROMPT] How would you like to proceed?
      1) Preserve all existing settings without any changes
      2) Specify a new default storage ID
      3) Remove the storage settings (use cluster default)
      Please enter your choice (1/2/3): 3
[SUCCESS] Table 'summary' Migrated

Ingest

Batch Job

Create a batch job:

hdxcli job batch ingest <job-name> <job-settings>.json

job-name is the name of the job that will be displayed when listing batch jobs. job-settings is the path to the file containing the specifications required to create the ingestion job (for more information on the required specifications, see the Hydrolix API Reference).

The command above omits the project, table, and transform, so hdxcli uses the default transform of the project and table previously configured in the profile with the set command. To target a specific project, table, and transform, add the --project, --table, and --transform options:

hdxcli job batch --project <project-name> --table <table-name> --transform <transform-name> ingest <job-name> <job-settings>.json

Stream

Create the streaming ingest as follows:

hdxcli stream --project <project-name> --table <table-name> --transform <transform-name> ingest <data-file>

data-file is the path of the data file to ingest. It can be a .csv file, a .json file, or a compressed file. The transform must be configured for the same data type and compression as the file.
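To send several files through the same transform, the stream command can be wrapped in a loop. This is a sketch that assumes every file matches the type and compression configured in the transform; stream_files and HDXCLI are hypothetical names, not part of hdxcli.

```shell
# Sketch only: stream_files and HDXCLI are hypothetical names.
# Assumption: every file matches the transform's configured type and compression.
stream_files() {
  project=$1 table=$2 transform=$3
  shift 3                                # remaining arguments are data files
  for data_file in "$@"; do
    "${HDXCLI:-hdxcli}" stream --project "$project" --table "$table" \
      --transform "$transform" ingest "$data_file" || return 1   # stop at the first failure
  done
}

# Usage:
#   stream_files <project-name> <table-name> <transform-name> <data-file> [<data-file> ...]
```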