The Hydrolix Platform is deployed with a set of "Core" and base components that are required to operate the system. The main core components are the Bastion, the Database bucket, and the Catalog.

The Database bucket

The Database bucket contains the platform configuration, table and database configurations, platform tunables, logs, and the data files themselves, encoded in the HDX file format. Database files are stored as time-defined partitions containing the raw data and the indexes. The bucket name has the format hdxcli-xxxxxx.

For example, a service deployed on AWS has the following directory structure:

$ aws s3 ls hdxcli-xxxxyy99
                           PRE cf_templates/
                           PRE config/
                           PRE db/
                           PRE hdxinf/
                           PRE logs/
                           PRE results/
                           PRE secrets/
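
If the top-level layout needs to be checked from a script, the listing above can be parsed with a few lines of Python. This is a minimal sketch rather than part of the platform: the names `top_level_prefixes` and `SAMPLE_LISTING` are hypothetical, and it assumes only that `aws s3 ls` marks each prefix line with `PRE`, as in the output shown above.

```python
# Hypothetical helper: extract the prefix names from `aws s3 ls` output.
# Assumes each prefix line has the form "PRE <name>/", as in the listing above.

SAMPLE_LISTING = """\
                           PRE cf_templates/
                           PRE config/
                           PRE db/
                           PRE hdxinf/
                           PRE logs/
                           PRE results/
                           PRE secrets/
"""


def top_level_prefixes(listing: str) -> list[str]:
    """Return the prefix names found in the listing, without the trailing slash."""
    prefixes = []
    for line in listing.splitlines():
        parts = line.split()
        # Prefix lines split into exactly two tokens: "PRE" and the name.
        if len(parts) == 2 and parts[0] == "PRE":
            prefixes.append(parts[1].rstrip("/"))
    return prefixes


if __name__ == "__main__":
    for name in top_level_prefixes(SAMPLE_LISTING):
        print(name)
```

Lines for individual objects, which carry a date, time, and size rather than `PRE`, are skipped by the two-token check.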

The Catalog

The Catalog is a deployed database instance that acts as the "glue" between the stateless and the stateful components. It contains metadata about the data partitions stored in the Database bucket, and information about the jobs and tasks that have been executed as part of the ingest load.

The Catalog can be queried through the Query infrastructure. Each table has a reserved view, addressed by appending #.catalog to the table name. For example, for a table named my_data:

query-peer :) select * from sample.`my_data#.catalog` limit 1

SELECT *
FROM sample.`my_data#.catalog`
LIMIT 1

Query id: 2627bd78-bc1b-4e4d-bff1-1d30789d9d69

┌─partition────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬───────min_timestamp─┬───────max_timestamp─┬─manifest_size─┬─data_size─┬─index_size─┬─rows─┬─mem_size─┬─root_path─────────────────────────────────────────────────────────────────┬─shard_key─┐
│ b78ff71c-639e-4da4-b194-732630e6b5bb/55a880fb-a606-4bec-88a6-277c6bc9ec03/data/v2/current/1230768000-1435622400-e9a81367ca4523ea.hdx │ 2009-01-01 00:00:00 │ 2015-06-30 00:00:00 │           880 │     15815 │         29 │ 5000 │   141845 │ b78ff71c-639e-4da4-b194-732630e6b5bb/55a880fb-a606-4bec-88a6-277c6bc9ec03 │           │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────┴─────────────────────┴───────────────┴───────────┴────────────┴──────┴──────────┴───────────────────────────────────────────────────────────────────────────┴───────────┘
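
In the sample row above, the first two numeric fields of the partition file name (1230768000 and 1435622400) correspond to the min_timestamp and max_timestamp columns when read as Unix epoch seconds. The sketch below decodes them. It is an inference from this one example, not a documented naming contract: the function name `partition_time_range` is hypothetical, and it assumes the displayed timestamps are in UTC.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath

# Partition path copied from the sample catalog row above.
SAMPLE_PARTITION = (
    "b78ff71c-639e-4da4-b194-732630e6b5bb/"
    "55a880fb-a606-4bec-88a6-277c6bc9ec03/"
    "data/v2/current/1230768000-1435622400-e9a81367ca4523ea.hdx"
)


def partition_time_range(partition: str) -> tuple[datetime, datetime]:
    """Decode the two leading epoch-second fields of a partition file name.

    Assumption (inferred from the sample row only): the file name has the
    form "<min_epoch>-<max_epoch>-<suffix>.hdx".
    """
    stem = PurePosixPath(partition).stem  # file name without ".hdx"
    min_epoch, max_epoch, _suffix = stem.split("-", 2)
    start = datetime.fromtimestamp(int(min_epoch), tz=timezone.utc)
    end = datetime.fromtimestamp(int(max_epoch), tz=timezone.utc)
    return start, end


if __name__ == "__main__":
    start, end = partition_time_range(SAMPLE_PARTITION)
    print(start.isoformat(), end.isoformat())
```

For anything beyond a quick sanity check, the min_timestamp and max_timestamp columns returned by the catalog view remain the authoritative values.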

Query Metadata

To learn more about table metadata, see Querying table metadata.

The Bastion

The Bastion is a deployed instance that serves as a gateway to the underlying platform. The different service types (query-peer, stream peers, etc.) can be accessed directly via the Bastion, and on-box logs, Podman containers, and similar resources can be inspected from there. Access is allowed only through IP allow-lists and SSH; by default, public access is removed.

Base Components

The base components include networking, public IPs, security permissions, roles and groups, and various other platform-specific components that are needed to operate and run the data platform. More information on these can be found in the respective cloud platform sections.

