The Hydrolix Query platform is a modular infrastructure focused on extracting data from the datastore (commodity cloud storage). Like other components of the platform, it can be scaled and configured independently of other workloads. It can also be scaled using pools, in which resources are sandboxed and used by different teams and users.

The Query infrastructure has a number of access points:

  • Native Interface
  • JDBC


Query: Use it!

More information on using query can be found on the Query page.

Query Components

The components that make up the Query architecture.


Data is always stored centrally within the HDX DB Storage bucket, along with other configuration information used in operating the platform. Regardless of how many servers are created, the same data repository is used. Data is not cached on the servers either, and the servers themselves retain no state.

This means that servers can be destroyed and rebuilt quickly, without fear of data loss.

| Component | Description | Scale to 0 |
| --- | --- | --- |
| App. Load-balancer | An application load-balancer used to route traffic to the Query Heads. | Yes |
| Query Head | The Query Head servers receive the requested queries and farm them out to the Query Peers for execution. | Yes |
| Query Peer Pool | The Query Peer servers are the workers that execute queries. They retrieve partitions from the HDX DB Storage Bucket and execute their portion of the query on that dataset. They feed their results back to the Query Head servers for the aggregate response to be supplied back to the end-user (or written to storage). | Yes |
| Zookeeper | Zookeeper servers are used for cluster management of the Query Head and Peer Server Pool. | Yes |
| HDX DB Storage Bucket | Contains the database (including partitions), configuration and other state information concerning the platform. Forms part of the core infrastructure. | No |
| Catalog | Contains metadata on the database, partitions and job tasks. Forms part of the core infrastructure. | No |
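
The head-and-peer flow described above follows a scatter-gather pattern, which can be sketched in a few lines of Python. This is only a conceptual illustration, not Hydrolix code: the in-memory `PARTITIONS` dictionary stands in for the storage bucket, threads stand in for Query Peer servers, and the names `peer_execute` and `head_execute` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for partitions held in the storage bucket:
# each partition is a list of (status_code, bytes_sent) rows.
PARTITIONS = {
    "p1": [(200, 120), (500, 40), (200, 300)],
    "p2": [(200, 80), (404, 10)],
    "p3": [(500, 60), (200, 25)],
}

def peer_execute(partition_id):
    """Peer role: read one partition and compute a partial aggregate."""
    rows = PARTITIONS[partition_id]  # a real peer would read from object storage
    partial = {}
    for status, sent in rows:
        count, total = partial.get(status, (0, 0))
        partial[status] = (count + 1, total + sent)
    return partial

def head_execute(partition_ids, peers=2):
    """Head role: fan partitions out to peers, then merge the partial results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=peers) as pool:
        for partial in pool.map(peer_execute, partition_ids):
            for status, (count, total) in partial.items():
                c, t = merged.get(status, (0, 0))
                merged[status] = (c + count, t + total)
    return merged

# Request count and bytes sent per status code, across all partitions.
result = head_execute(["p1", "p2", "p3"])
```

Because each peer works only from the partition it is handed and keeps nothing afterwards, the number of workers in this sketch can change between queries without affecting the result, which mirrors the stateless design described above.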

Query Pools & Role Based Performance Control (RBPC)


Query Pools: Use it!

To use pools, see Service Pools.

Due to the decoupled nature of the Hydrolix platform, it is possible to extend traditional Role Based Controls to include performance requirements for accessing data.

Where a system or team requires a specific or known level of performance, a separate pool of query servers can be sandboxed away from the general population. Individual resource groups can be assigned to groups (or roles) without the need to duplicate data or infrastructure.

Noisy-neighbour applications and heavy-usage roles (such as batch jobs or DevOps staff) can have specific infrastructure assigned to them, away from regular users, ensuring they have timely access at a specific level of performance.
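
As a rough illustration of the idea, the sketch below maps roles to dedicated pools and falls back to a shared pool for everyone else. The role names, pool names, and the `pool_for` helper are all hypothetical and are not part of any Hydrolix API; the Service Pools documentation describes how pools are actually defined.

```python
# Hypothetical role-to-pool mapping; all names are illustrative only.
ROLE_POOLS = {
    "batch": "query-peer-batch",
    "devops": "query-peer-ops",
}
DEFAULT_POOL = "query-peer-default"

def pool_for(roles):
    """Return the first dedicated pool matching the given roles, else the shared pool."""
    for role in roles:
        if role in ROLE_POOLS:
            return ROLE_POOLS[role]
    return DEFAULT_POOL
```

For example, `pool_for(["analyst", "batch"])` selects the batch pool, while `pool_for(["analyst"])` falls through to the shared default. All pools read the same stored data, so no data is duplicated.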

In addition, pool groups support independent scaling. As demand rises and falls for certain pools, capacity can be adjusted independently to meet the need. To save cost, it's possible to scale an entire pool to 0 at times (for example, at the weekend when staff are not there to access data), and then scale it back up in a matter of minutes to a level where users can use it.
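
The weekend example could be expressed as a simple scheduling rule like the one below. This is a sketch under assumptions: the `desired_replicas` function and the weekday size of 4 are hypothetical, and applying the resulting number to a pool would go through whatever scaling mechanism the deployment provides.

```python
from datetime import datetime

def desired_replicas(now, weekday_replicas=4):
    """Return the peer count for a pool: 0 at weekends, a fixed size otherwise."""
    # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    if now.weekday() >= 5:
        return 0
    return weekday_replicas
```

A scheduler could evaluate this rule periodically for each pool, with a different `weekday_replicas` value per pool so that capacity changes in one pool do not affect another.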