Subsystem - Storage

Description of the storage subsystem

Overview

Relevant components

The storage subsystem holds the raw data and state for a running cluster.

Bucket

Startup dependencies

  • storage must be accessible
  • catalog (postgres) must be accessible
  • a cluster configuration file config.json must be available (is this true?)

Filesystem layout

TODO: https://hydrolix.atlassian.net/wiki/spaces/~89001792/pages/1965424641/Paths
Using the above as a guide.

  • backups/keycloak_<version>-<timestamp> - periodic keycloak snapshots
  • config/v2/manifest.json - pointer to current configuration snapshot
  • config/v2/<org>_<increment>.json - sequence of configuration files for the cluster
  • db/hdx/<project>/<table>/data/v2/current/<shard>/<timestamp_min>-<timestamp_max>-<digest?>.hdx/ -
  • spill/raw/<partition folder>/<reverse timestamp millis>/<project_id>/<table_id>/<shard key hash>/<unique id>.tar.gz
  • logs/<podtype>/<pod>/<YYYY-MM-DD>/<timestamp>-<hash>.log.gz
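As a rough illustration of how tooling might pick these paths apart, here is a minimal sketch that parses the logs/ and config/v2/ patterns listed above. The character classes are assumptions: the layout only names the placeholders, not their formats (for example, that <increment> is numeric). The function names and the sample pod and org names are hypothetical.

```python
import re
from typing import Dict, Optional

# Patterns mirror the layout list above. What each placeholder may contain
# is an assumption; adjust once the real formats are confirmed.
LOG_PATH = re.compile(
    r"^logs/(?P<podtype>[^/]+)/(?P<pod>[^/]+)/"
    r"(?P<date>\d{4}-\d{2}-\d{2})/"
    r"(?P<timestamp>[^/-]+)-(?P<hash>[^/]+)\.log\.gz$"
)

# Assumes <increment> is a run of digits; manifest.json does not match.
CONFIG_PATH = re.compile(r"^config/v2/(?P<org>[^/]+)_(?P<increment>\d+)\.json$")


def parse_log_path(path: str) -> Optional[Dict[str, str]]:
    """Return the named parts of a logs/ path, or None if it does not match."""
    m = LOG_PATH.match(path)
    return m.groupdict() if m else None


def parse_config_path(path: str) -> Optional[Dict[str, str]]:
    """Return org and increment for a config/v2/ snapshot path, or None."""
    m = CONFIG_PATH.match(path)
    return m.groupdict() if m else None


if __name__ == "__main__":
    # Sample values are made up for illustration only.
    print(parse_log_path("logs/query-peer/query-peer-0/2024-01-15/1705312800-abc123.log.gz"))
    print(parse_config_path("config/v2/acme_12.json"))
```

Returning None for non-matching paths keeps the manifest.json pointer and any unknown files out of the parsed results, so unexpected entries can be handled separately.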

TODO QUESTIONS

  • backups/keycloak: file extension is .enc -- does this mean encrypted? If yes, link to a runbook on restoring from backup, etc.?

  • What is config/alt? It sits parallel to config/v2, which looks like the handoff spot in the storage directory for the periodically updated config blobs.

Runtime dependencies

Runtime behavior

Configuration

What breaks if storage is failing

Effects of breakage

  • The system stops working.

Troubleshooting