The Hydrolix Data Platform

Hydrolix Overview

The Hydrolix platform is a high-performance, petabyte-scale time-series database platform.

Designed to change the expectations and economics of managing massive amounts of data, the platform combines commodity cloud compute with commodity object storage, such as Google Cloud Storage and Amazon S3, to provide performance close to that of locally attached ("on-box") storage across massive datasets, all hosted in a distributed systems environment.

By separating compute from storage, the platform achieves workload independence: ingest, lifecycle, and query can each be scaled dynamically, both horizontally and vertically, without affecting one another. In addition, further architectures can be deployed to provide Role Based Performance Control without the need for data duplication.

The Hydrolix platform offers significant cost savings while providing interactive query performance over large data sets.

A Data Platform for Massive Datasets

Existing solutions operate under the assumption that object storage is slow and that it is always fastest to work with locally attached storage, even though this typically limits the amount of data that can be stored. The premise of Hydrolix is that object storage need not be "slow" in comparison to locally attached storage if the system is designed to leverage the strengths of the underlying cloud platform. Although cloud object storage is limited by network transfer speeds, it can serve massive numbers of parallel operations on files, whereas locally attached storage can serve only a few requests at a time, albeit very quickly. As data sets and data consumers grow, locally attached storage cannot scale with demand in its ability to store, query, or manage data cost-effectively.
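The trade-off between per-request speed and parallelism can be illustrated with a back-of-the-envelope model. This is a minimal sketch, not a description of how Hydrolix schedules reads: the function name, the latency figure, and the per-stream bandwidth are all hypothetical values chosen only to show the shape of the argument.

```python
def estimated_fetch_seconds(total_bytes, request_latency_s,
                            stream_bytes_per_s, parallel_streams):
    """Estimate the time to fetch total_bytes split evenly across streams.

    Assumes (hypothetically) that every stream pays the same fixed request
    latency, that all streams run concurrently, and that each stream gets
    the full per-stream bandwidth.
    """
    bytes_per_stream = total_bytes / parallel_streams
    return request_latency_s + bytes_per_stream / stream_bytes_per_s


# Illustrative numbers only, not measurements of any real system.
ONE_GB = 1_000_000_000
serial = estimated_fetch_seconds(ONE_GB, 0.05, 100_000_000, 1)
parallel = estimated_fetch_seconds(ONE_GB, 0.05, 100_000_000, 100)

print(f"1 stream:    {serial:.2f} s")
print(f"100 streams: {parallel:.2f} s")
```

Under these assumed numbers, the fixed request latency is paid once per wave of requests, while transfer time shrinks with the number of streams. That is the sense in which a network-limited store can still keep up when reads are spread widely enough.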

With Hydrolix, scaling the data, and the platform that ingests and queries it, is significantly simpler than with traditional databases. Traditional approaches involve keeping many copies of the data across a number of architectures, with varying levels of data detail and query performance. Noisy neighbour issues are also common, as heavy users or systems cause access or performance problems for others.

Hydrolix does not suffer from these shortcomings. Through a patented storage and retrieval technology, which effectively creates an interface on top of cloud object storage, the Hydrolix platform can use distributed storage at speeds comparable to locally attached storage. Significant research has gone into ensuring that raw data and indexes are highly compacted and cheap to decompress, reducing the overhead of fetching them from remote object storage.

Some highlights of the platform

  • Index choice need not be feared: Hydrolix can fully index all columns and store the data in object storage at 5-8% of the original size, so there is no need to "choose" which columns should be indexed.
  • Data transfer is kept to a minimum. Metadata processed at query time points the query servers not just to the partitions that need to be read, but to the specific bytes within those files.
  • Massively parallel processing (MPP) is used to query data; you decide how massive (and how fast) you want the cluster to be.
  • No noisy neighbours: give R&D and DevOps teams their own query sandbox (pool) without affecting everyone else.
  • Workload independence: scale your ingest separately from your query architecture. There is no need to transfer data to new clusters, because all data is written to a centralized store for all to access.
  • Improved reliability: no need for data and compute replication to ensure system access. Because system and database state is held in centralized storage and compute is stateless, clusters can be destroyed without impact to data integrity, and new clusters can be created in minutes.
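The idea of metadata pointing a query at specific bytes within a file can be sketched as follows. This is an illustrative toy, not the Hydrolix file format: `build_partition`, `CountingStore`, and the column layout are hypothetical, and an in-memory byte string stands in for an object in cloud storage (a real client would issue a ranged read against the object store instead).

```python
def build_partition(columns):
    """Concatenate column blobs into one file-like byte string.

    Returns (blob, index), where index maps each column name to the
    (offset, length) of its bytes inside the blob.
    """
    blob = bytearray()
    index = {}
    for name, data in columns.items():
        index[name] = (len(blob), len(data))
        blob.extend(data)
    return bytes(blob), index


class CountingStore:
    """Hypothetical stand-in for object storage that counts bytes served."""

    def __init__(self, blob):
        self._blob = blob
        self.bytes_transferred = 0

    def read_range(self, offset, length):
        chunk = self._blob[offset:offset + length]
        self.bytes_transferred += len(chunk)
        return chunk


# Made-up partition with three columns of very different sizes.
blob, index = build_partition({
    "timestamp": b"T" * 800,
    "status": b"S" * 100,
    "message": b"M" * 5000,
})
store = CountingStore(blob)

# A query that needs only "status" consults the index and reads just that range.
offset, length = index["status"]
status_bytes = store.read_range(offset, length)

print(f"partition size: {len(blob)} bytes")
print(f"transferred:    {store.bytes_transferred} bytes")
```

In this toy layout, the query moves 100 bytes rather than the whole 5,900-byte partition, because the index tells it exactly where the needed column lives.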

The platform is composed of numerous services:
