Query

Query enables you to read data out of your Hydrolix cluster. You can scale query resources with Query Pools, which let you sandbox different workflows to avoid "noisy neighbor" performance issues.

You can query a Hydrolix cluster in the following ways:

The Hydrolix Connector for Apache Spark does not depend on all of the query subsystem components described on this page. It queries Hydrolix partitions with access only to the Config API, which mediates access to the catalog database. Spark retrieves and reads the Hydrolix partitions itself.

Components

The following components comprise Hydrolix's Query architecture:

Diagram of Query system and supporting components

Data is always stored centrally within the Hydrolix Database storage bucket, no matter how many query pools you create. Because query peers are stateless, they do not cache data in between queries.

| Component | Description | Scale to 0 |
| --- | --- | --- |
| Load Balancer | An application load balancer used to route traffic to the Query Heads. Requests to the path /query or via port 9440 (TLS)/9000 are routed to Query Heads. | Yes |
| Query Head | Receives queries. Delegates sub-queries on partition ranges to Query Peers. Aggregates Query Peer responses to generate a query result. | Yes |
| Query Peer Pool | The workers that execute queries. They retrieve partitions from the Hydrolix Database storage bucket and execute their portion of the query on that dataset. Then they return the result to the Query Head. | Yes |
| ZooKeeper | Used for cluster management of the Query Head and Query Peer pools. The Query Head learns of Query Peer availability through ZooKeeper. | Yes |
| Hydrolix Database Storage Bucket | Contains the partitions that comprise the database. Part of the core infrastructure. | No |
| Catalog | Contains metadata regarding data stored in Hydrolix. Part of the core infrastructure. | No |
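
The division of labor between the Query Head and the Query Peers can be pictured as a scatter-gather pattern. The following Python sketch is a toy model only, not Hydrolix code: the in-memory `PARTITIONS` dictionary stands in for the storage bucket, and the names `split_partitions`, `peer_execute`, and `head_execute` are hypothetical. It shows a head splitting partitions across peers, each peer computing a partial aggregate, and the head merging the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for partitions in the storage bucket:
# partition name -> list of row values.
PARTITIONS = {
    "p0": [3, 5],
    "p1": [7],
    "p2": [1, 1, 2],
    "p3": [10],
}


def split_partitions(names, peer_count):
    """Deal partition names round-robin into one work list per peer."""
    return [names[i::peer_count] for i in range(peer_count)]


def peer_execute(assigned):
    """A peer reads its assigned partitions and returns a partial (count, sum)."""
    rows = [value for name in assigned for value in PARTITIONS[name]]
    return len(rows), sum(rows)


def head_execute(peer_count):
    """The head fans out sub-queries and merges the partial results."""
    work = split_partitions(sorted(PARTITIONS), peer_count)
    with ThreadPoolExecutor(max_workers=peer_count) as pool:
        partials = list(pool.map(peer_execute, work))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return {"count": total_count, "sum": total_sum, "avg": total_sum / total_count}


result = head_execute(peer_count=2)
```

Because each peer returns only a small partial result, the merge step on the head stays cheap regardless of how many rows the peers scanned. Averages are merged from counts and sums rather than from per-peer averages, which would give the wrong answer when peers scan different numbers of rows.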

🚲 Try a Query

To get started with writing your first query, see Query.
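
As a rough illustration of the /query path mentioned in the component table, the sketch below builds an HTTP request with Python's standard library. Several details are assumptions that this page does not confirm: that SQL is sent as the POST body, the hostname `hydrolix.example.com`, the table name `my_project.my_table`, and the absence of authentication headers. The helper `build_query_request` is hypothetical, and the request is constructed but not sent.

```python
import urllib.request


def build_query_request(base_url, sql):
    """Build (but do not send) a POST request to the cluster's /query path.

    Assumption: the endpoint accepts the SQL text as the request body.
    """
    url = base_url.rstrip("/") + "/query"
    return urllib.request.Request(
        url,
        data=sql.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


# Hypothetical hostname and table name, for illustration only.
request = build_query_request(
    "https://hydrolix.example.com",
    "SELECT count() FROM my_project.my_table",
)

# To send it against a real cluster (credentials may also be required):
# body = urllib.request.urlopen(request).read()
```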

Query Pools

Because Hydrolix query infrastructure is decoupled from the storage layer, you can create separate query pools for different groups of users. For example, you might configure separate sandboxes for administrator queries, interactive analyst queries, and monitoring queries.

Query pools scale independently, so capacity for each pool can adjust automatically to satisfy demand. You can even scale an entire pool to zero when demand is negligible, for example over the weekend when staff have no need to access data. When demand returns, you can scale the pool back up within minutes.
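
To make the scale-to-zero idea concrete, here is a minimal sketch of one possible per-pool scaling policy. It is not how Hydrolix decides pool size: the function `desired_replicas`, its parameters, and the thresholds are all hypothetical. It returns zero peers on an idle weekend and otherwise sizes the pool to the number of queued queries, up to a cap.

```python
from datetime import datetime


def desired_replicas(now, queued_queries, max_replicas=8, queries_per_peer=4):
    """Return a hypothetical target peer count for one query pool."""
    is_weekend = now.weekday() >= 5  # Monday is 0, so Saturday is 5 and Sunday is 6
    if is_weekend and queued_queries == 0:
        return 0  # no demand: scale the pool to zero
    # Ceiling division: enough peers to cover the queue, and at least one.
    needed = max(1, -(-queued_queries // queries_per_peer))
    return min(needed, max_replicas)


# An idle Saturday versus a busy Monday morning.
idle_weekend = desired_replicas(datetime(2024, 1, 6, 12, 0), queued_queries=0)
busy_monday = desired_replicas(datetime(2024, 1, 8, 9, 0), queued_queries=10)
```

Because each pool would be evaluated separately, a monitoring pool could stay up over the weekend while an analyst pool drops to zero.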

Conceptual visual of query pools devoted to specific usage or users

🛠️ Configure Query Pools

To configure query pools in your Hydrolix cluster, see Resource Pools.