Endpoint Errors

Issue: Client Connection Timeout

Check

Check whether the Traefik replica count is less than 1.

kubectl get deployment/traefik -o wide
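
The replica check can also be scripted. The following is a minimal Python sketch, not part of the Hydrolix tooling. It assumes the deployment is captured as JSON with kubectl get deployment/traefik -o json, and it reads the standard Kubernetes Deployment fields spec.replicas and status.readyReplicas. The function names and the minimum parameter are illustrative.

```python
import json


def replica_counts(deployment_json):
    """Return (desired, ready) replica counts from Deployment JSON text."""
    deployment = json.loads(deployment_json)
    desired = deployment.get("spec", {}).get("replicas", 0)
    # status.readyReplicas is omitted by Kubernetes when no pod is ready.
    ready = deployment.get("status", {}).get("readyReplicas", 0)
    return desired, ready


def needs_scaling(deployment_json, minimum=1):
    """Return True if fewer than `minimum` replicas are desired or ready."""
    desired, ready = replica_counts(deployment_json)
    return desired < minimum or ready < minimum
```

The same helper could be pointed at other deployments by changing the deployment name in the kubectl command and, where a higher count is expected, the minimum argument.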

Fix

Scale Traefik via hydrolixcluster.yaml.

traefik:
   replicas: <Scale>

For more information, see the documentation on scale profiles.

Check

Check that the IP allowlist contains the requesting IP address.

Download the cluster configuration:

kubectl get hydrolixcluster -o yaml

Fix

  1. Download the cluster configuration:
kubectl get hydrolixcluster -o yaml > hydrolixcluster.yaml
  2. Add the IPs to the allowlist.
  3. Apply the changes to your cluster:
kubectl apply -f hydrolixcluster.yaml

You can find more information on IP allowlists in the documentation.
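
To confirm whether a given client address is covered by the allowlist, a small script can do the comparison. This is a sketch under one assumption: that the allowlist entries copied out of the cluster configuration are plain IP addresses or CIDR ranges. The function name is hypothetical, and the check uses only Python's standard ipaddress module.

```python
import ipaddress


def ip_is_allowed(client_ip, allowlist):
    """Return True if client_ip falls inside any entry of allowlist."""
    address = ipaddress.ip_address(client_ip)
    for entry in allowlist:
        # strict=False accepts entries with host bits set, e.g. "10.0.0.5/8".
        network = ipaddress.ip_network(entry, strict=False)
        # An IPv4 address never matches an IPv6 range, and vice versa.
        if address.version == network.version and address in network:
            return True
    return False
```

For example, ip_is_allowed("203.0.113.7", ["203.0.113.0/24"]) returns True, while an address outside every listed range returns False and needs to be added.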

Issue: Client Authorization Error

Check

Check if Query endpoint authentication is enabled.

Download your cluster configuration:

kubectl get hydrolixcluster -o yaml

In the configuration, look for enable_query_auth:

spec:
  enable_query_auth: true
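
If you prefer to check the flag programmatically, the sketch below reads it from JSON output instead of YAML, which avoids a third-party YAML parser. It assumes the configuration was fetched with kubectl get hydrolixcluster -o json. Without a resource name, kubectl wraps the result in a List object with an items array, so both shapes are handled. The function names are illustrative.

```python
import json


def cluster_specs(cluster_json):
    """Return the spec mapping of every cluster object in the JSON text."""
    document = json.loads(cluster_json)
    # A List wrapper has "items"; a single object is treated as a one-item list.
    objects = document.get("items", [document])
    return [obj.get("spec", {}) for obj in objects]


def flag_enabled(cluster_json, key):
    """Return True only if every cluster spec sets `key` to true."""
    specs = cluster_specs(cluster_json)
    return bool(specs) and all(spec.get(key) is True for spec in specs)
```

Calling flag_enabled(text, "enable_query_auth") answers the check above. The same function works for other boolean keys in spec, such as use_tls.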

Fix

If query authentication is enabled, confirm that the user is supplying the correct username and password or bearer token. For more information, see the query authentication documentation.

If query authentication is not enabled, enable it.

Check

Check if TLS is enabled.

Download your cluster configuration:

kubectl get hydrolixcluster -o yaml

In the configuration, look for use_tls:

spec:
  use_tls: true

Fix

If TLS is enabled, check that the client is connecting over TLS.

If TLS is not enabled, enable it.

For native protocol connections, the secure (TLS) port is 9440 and the non-secure port is 9000.
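
A quick way to rule out a port mismatch is to probe the port that matches the TLS setting. The sketch below is an illustration only: the port numbers come from this page, while the function names are hypothetical. It checks that a TCP connection opens but does not perform a TLS handshake or speak the native protocol, so a successful result only shows that the port is reachable.

```python
import socket

SECURE_NATIVE_PORT = 9440    # native protocol over TLS
INSECURE_NATIVE_PORT = 9000  # native protocol without TLS


def native_port(use_tls):
    """Return the native protocol port that matches the TLS setting."""
    return SECURE_NATIVE_PORT if use_tls else INSECURE_NATIVE_PORT


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For a cluster with use_tls set to true, can_connect("<yourhost>.hydrolix.live", native_port(True)) tests port 9440, where the hostname is a placeholder for your own.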

Issue: HTTP 503 Service Temporarily Unavailable

Check

Check that the query-head replica count is at least 1:

kubectl get deployment/query-head -o wide

Fix

Scale query-head via hydrolixcluster.yaml:

query-head:
   replicas: <scale number>

For more information, see the documentation on scale profiles.

Check

Check that the query-peer replica count is at least 1:

kubectl get deployment/query-peer -o wide

Fix

Scale query-peer via hydrolixcluster.yaml:

query-peer:
   replicas: <Scale>

For more information, see the documentation on scale profiles.

Pools

If using Pools, check that the pool has query-peers.

Issue: Database Exceptions

Including:

  • DB::Exception: HdxStorageError: No peers available to run query in pool
  • DB::Exception: Database '_local' doesn't exist
  • DB::NetException: Net Exception: No route to host

Check

Check that the query-peer instance count is at least 1:

kubectl get deployment/query-peer -o wide

Fix

Scale query-peer via hydrolixcluster.yaml:

query-peer:
   replicas: <Scale>

For more information, see the documentation on scale profiles.

Check

Check that the Zookeeper instance count is 3:

kubectl get deployment/zookeeper -o wide

Fix

Scale zookeeper via hydrolixcluster.yaml:

zookeeper:
   replicas: <Scale>

For more information, see the documentation on scale profiles.

Issue: Database Timeout

Including:

  • DB::NetException: Timeout: connect timed out

Check

Check that the Zookeeper instance count is 3:

kubectl get deployment/zookeeper -o wide

Fix

Scale zookeeper via hydrolixcluster.yaml:

zookeeper:
   replicas: <Scale>

For more information, see the documentation on scale profiles.

Issue: Database Lost Connection

Including:

  • DB::NetException: Error: Lost connection to the database server. (version 3.x.x)>. (STD_EXCEPTION)

Check

This error should be transient. It occurs occasionally when a query head has been idle for a very long period (days or weeks).

Fix

Retry the query.

If retrying doesn’t resolve the error, check that the PostgreSQL instance is suitably scaled. Increase CPU or memory if required.

  1. Download the cluster configuration:
kubectl get hydrolixcluster -o yaml > hydrolixcluster.yaml
  2. Update the scale.
  3. Apply the configuration to your cluster:
kubectl apply -f hydrolixcluster.yaml

Check

Check that PostgreSQL is running and in a healthy state.

kubectl describe statefulset postgres

Fix

Update the scale of PostgreSQL (memory, CPU, disk).

  1. Download the cluster configuration:
kubectl get hydrolixcluster -o yaml > hydrolixcluster.yaml
  2. Update the scale.
  3. Apply the configuration to your cluster:
kubectl apply -f hydrolixcluster.yaml

A failing PostgreSQL instance is a Severity 1 incident. Notify Hydrolix as soon as possible, even if you have already resolved the problem.

Syntax and User Query Errors

Check

Unknown Table/Project.

DB::Exception: Table _local.XXXX does not exist

Fix

The project must be specified as well as the table, e.g.:

SELECT x FROM project.table WHERE ...

Check

Unknown Table/Project. The table and/or project has a hyphen in its name, e.g. my-project.my-table

Expected one of: token, Dot, UUID, alias, AS, identifier, FINAL, SAMPLE, table, table function...

Fix

Add backticks (`) around the table and project names.

select x from `my-project`.`my-table` where ...
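
When queries are built in application code, the quoting can be applied automatically. The following is a minimal sketch with hypothetical helper names. It assumes that names made up only of letters, digits, and underscores (and not starting with a digit) need no quoting, and it rejects names that themselves contain a backtick instead of guessing at an escaping rule.

```python
import re

# Names matching this pattern are emitted unquoted.
_PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name):
    """Wrap name in backticks unless it is a plain identifier."""
    if _PLAIN_IDENTIFIER.fullmatch(name):
        return name
    if "`" in name:
        raise ValueError("identifier contains a backtick: %r" % name)
    return "`" + name + "`"


def qualified_table(project, table):
    """Return the project.table reference with quoting where needed."""
    return quote_identifier(project) + "." + quote_identifier(table)
```

With these helpers, qualified_table("my-project", "my-table") produces the backtick-quoted form shown above, while qualified_table("project", "table") is left unquoted.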

Other Errors

When a query fails with a syntax or execution error, the output can be very verbose: errors can be large and contain a lot of information. A good place to start when debugging is the first couple of lines, which often contain the reason for the error. For example, the following output shows that the requested database does not exist:

query-peer :) select thisdoesntexist from some.table

SELECT thisdoesntexist
FROM some.table

Query id: f8f0d160-cc58-4466-9c4a-011b5067a152

[query-head-7b65c8b674-dsnlv] 2022.08.31 17:00:10.197969 [ 10 ] {f8f0d160-cc58-4466-9c4a-011b5067a152} <Error> executeQuery: Code: 81. DB::Exception: Database some doesn't exist. (UNKNOWN_DATABASE) (version 22.3.2.1) (from 0.0.0.0:0) (in query: select thisdoesntexist from some.table), Stack trace (when copying this message, always include the lines below):

0. StackTrace::StackTrace() @ 0xac3418c in /usr/bin/turbine_server
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xabe9fad in /usr/bin/turbine_server
2. DB::DatabaseCatalog::assertDatabaseExistsUnlocked(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) const @ 0x1129bbd7 in /usr/bin/turbine_server
3. DB::DatabaseCatalog::getDatabase(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) const @ 0x1129d9a8 in /usr/bin/turbine_server
4. DB::Context::resolveStorageID(DB::StorageID, DB::Context::StorageNamespace) const @ 0x111e5e8c in /usr/bin/turbine_server
5. DB::JoinedTables::getLeftTableStorage() @ 0x11711c1e in /usr/bin/turbine_server
...
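
Because the useful part of the output is usually the first error line, it can help to pull the key fields out of it automatically. The sketch below is an illustration based only on the layout of the example above (Code: N. DB::Exception: message (ERROR_NAME)). Other error lines may be laid out differently, and the function name is hypothetical.

```python
import re

# Matches e.g. "Code: 81. DB::Exception: <message> (UNKNOWN_DATABASE)".
_ERROR_SUMMARY = re.compile(
    r"Code: (?P<code>\d+)\. (?P<type>DB::\w+): "
    r"(?P<message>.*?) \((?P<name>[A-Z0-9_]+)\)"
)


def summarize_error(log_text):
    """Return code, type, message, and name of the first error found, or None."""
    match = _ERROR_SUMMARY.search(log_text)
    if match is None:
        return None
    summary = match.groupdict()
    summary["code"] = int(summary["code"])
    return summary
```

Applied to the log line above, the function returns code 81, type DB::Exception, the message "Database some doesn't exist.", and the name UNKNOWN_DATABASE, leaving the stack trace out.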

Query Circuit Breaker Errors

Hydrolix can apply circuit breakers to a query to help protect the infrastructure from abuse. For more information, see https://docs.hydrolix.io/docs/query-circuit-breakers.

| Error | Circuit Breaker Information |
| --- | --- |
| DB::Exception: HdxStorageError Maximum number of rows exceeded for query: XXX rows (maximum is YYY) | https://docs.hydrolix.io/docs/query-circuit-breakers#hdx_query_max_rows |
| DB::Exception: Limit for result exceeded, max bytes: XXX B, current bytes: YYY | https://docs.hydrolix.io/docs/query-circuit-breakers#hdx_query_max_result_bytes |
| DB::Exception: Limit for result exceeded, max rows: XX, current rows: YYY thousand | https://docs.hydrolix.io/docs/query-circuit-breakers#hdx_query_max_result_rows |
| DB::Exception: h::db::HdxStorageError: <HdxStorageError Maximum time range exceeded for query: XXX seconds (maximum is YYY) | https://docs.hydrolix.io/docs/query-circuit-breakers#hdx_query_max_timerange_sec |
| DB::Exception: h::db::HdxStorageError: <HdxStorageError hdx_query_timerange_required is set to true. Your query needs a time range filter in a WHERE clause | https://docs.hydrolix.io/docs/query-circuit-breakers#hdx_query_timerange_required |
| DB::Exception: h::db::HdxStorageError: <HdxStorageError Maximum number of partitions exceeded for query: XXX partitions (maximum is YYY) | https://docs.hydrolix.io/docs/query-circuit-breakers#hdx_query_max_partitions |
| DB::Exception: Timeout exceeded: elapsed XXX seconds, maximum: YYY. | https://docs.hydrolix.io/docs/query-circuit-breakers#hdx_query_max_execution_time |
| DB::Exception: Limit for number of columns to read exceeded. Requested: 2, maximum: 1. | https://docs.hydrolix.io/docs/query-circuit-breakers#hdx_query_max_columns_to_read |
| DB::Exception: Memory limit (for query) exceeded: would use 16.00 KiB (attempt to allocate chunk of 4096 bytes), maximum: 1.00 KiB. (MEMORY_LIMIT_EXCEEDED) | https://docs.hydrolix.io/docs/query-circuit-breakers#hdx_query_max_memory_usage |
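
The mapping from error text to circuit breaker can be automated, for example in a client that wants to point users at the right documentation. The following is a sketch: the message fragments and setting names are taken from the errors and links listed above, while the function names and the substring-matching approach are assumptions made for illustration.

```python
DOCS_URL = "https://docs.hydrolix.io/docs/query-circuit-breakers"

# (fragment of the error message, circuit breaker setting it points to)
_BREAKERS = [
    ("Maximum number of rows exceeded", "hdx_query_max_rows"),
    ("Limit for result exceeded, max bytes", "hdx_query_max_result_bytes"),
    ("Limit for result exceeded, max rows", "hdx_query_max_result_rows"),
    ("Maximum time range exceeded", "hdx_query_max_timerange_sec"),
    ("hdx_query_timerange_required is set to true", "hdx_query_timerange_required"),
    ("Maximum number of partitions exceeded", "hdx_query_max_partitions"),
    ("Timeout exceeded", "hdx_query_max_execution_time"),
    ("Limit for number of columns to read exceeded", "hdx_query_max_columns_to_read"),
    ("Memory limit (for query) exceeded", "hdx_query_max_memory_usage"),
]


def circuit_breaker_for(error_message):
    """Return the setting name matching the error message, or None."""
    for fragment, setting in _BREAKERS:
        if fragment in error_message:
            return setting
    return None


def docs_link(error_message):
    """Return the documentation URL for the matching circuit breaker, or None."""
    setting = circuit_breaker_for(error_message)
    return DOCS_URL + "#" + setting if setting else None
```

An error that matches none of the fragments returns None, which suggests it was not raised by a circuit breaker.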

Query Components

Accessible via the path: https://<yourhost>.hydrolix.live/query/

| Component | Used to |
| --- | --- |
| Traefik - Application Load-balancer | Routes requests to appropriate endpoints. Requests to the path /query or via port 9440(TLS)/9000 are routed to query head. |
| Query-Head | Receives all incoming queries for data and farms out work to query-peers for query execution. Aggregates response to be sent to the end-user/application. |
| Query-Peer | Query workers that retrieve partitions and execute query logic. Can have “pools” of peers for different workloads/teams. |
| Zookeeper | Informs the Query Heads of available Query peers. |
| Catalog (Postgres) | Stores information on the basic storage structure and partitions of the data within Cloud Storage (GCS, S3 etc). Includes a persistent volume within Kubernetes. |
| Cloud Storage Bucket | Storage bucket (GCS, S3 etc) containing the “stateful” data required to run the system, including configuration files (/config/), database (/db/) and a copy of the system logs (/logs/). |