Skip to content

Catalog Metadata

Every table makes metadata about itself available through special virtual tables that you can query through Hydrolix's SQL API or query UI. These include catalogs of the files and directories that hold tables' data and indexes, and descriptions of the columns and time-intervals held by each table's constituent partitions.

Catalog⚓︎

A catalog virtual table contains one row for every partition used by its associated fluid table. The row's data include that partition's filesystem path, the span of time-series data it contains, and the amount of storage space it takes up.

Catalog columns⚓︎

In the following table, "this partition" means the particular partition of fluid table data that each catalog row describes.

Name Purpose
partition This partition's filesystem path, relative to the table's cloud-storage location.
min_timestamp The earliest timestamp of the contiguous time-series data held by this partition.
max_timestamp The latest timestamp of the contiguous time-series data held by this partition.
manifest_size The size of this partition's manifest file, in bytes.
data_size The size of this partition's data file, in bytes.
index_size The size of this partition's index file, in bytes.

Query a table's catalog⚓︎

Query a table's catalog using the special view name .catalog.

Concatenate the table and view with # and enclose the combined table and view name inside backticks.

The sample queries demonstrate how to query the catalog of the cluster's table hydro.logs.

SELECT * FROM hydro.`logs#.catalog` LIMIT 10

To view the total table size in bytes, use

1
2
3
SELECT
  sum(manifest_size + data_size + index_size) as total_size
FROM hydro.`logs#.catalog`

Use catalog access to analyze effects of configuration changes, especially in the ingestion software. Hydrolix recommends new partitions are created between 1 GB and 3 GB in size. The merge system compacts partitions to reach optimal storage size of 2 GB and 4 GB per partition.

Visualize the distribution of partition sizes per day for the last ten days with this query.

WITH 1048576 as mb, 1073741824 as gb
SELECT
  toStartOfDay(min_timestamp) as time,
  count(rows) as partitions,
  countIf(mem_size < 1 * mb) as "< 1MB",
  countIf(mem_size >= 1 * mb and mem_size < 5 * mb) as  "< 5MB",
  countIf(mem_size >= 5 * mb and mem_size < 50 * mb) as " < 50MB",
  countIf(mem_size >= 50 * mb and mem_size < 500 * mb) as " < 500MB",
  countIf(mem_size >= 500 * mb and mem_size < gb) as " < 1 GB",
  countIf(mem_size >= gb and mem_size < 2 * gb) as " < 2 GB",
  countIf(mem_size >= 2 * gb and mem_size < 3 * gb) as " < 3 GB",
  countIf(mem_size >= 3 * gb and mem_size < 4 * gb) as " < 4 GB"
FROM hydro."logs#.catalog"
where min_timestamp >= now() - INTERVAL 10 DAY
GROUP BY time
ORDER BY time DESC