The Hydrolix Data Platform

Hydrolix Overview

The Hydrolix platform is a high-performance, petabyte-scale time-series database platform. The Hydrolix platform offers significant cost savings while providing interactive query performance over large data sets.

Designed to change the expectations and economics of managing massive amounts of data, the Hydrolix platform combines commodity cloud compute with commodity object storage, such as Google Cloud Storage and Amazon S3, to provide SSD-like performance across massive datasets, all hosted in a distributed systems environment.

Workload independence is achieved by separating compute from storage. Ingest, query and other data lifecycle services can be independently auto-scaled, both horizontally and vertically. Furthermore, dedicated compute resources can be used to enable Role Based Performance Control without the need for any data duplication.


A Data Platform for Massive Datasets

Legacy data platforms operate under the assumption that object storage is slow, and that it is always fastest to work with locally attached storage (or data cached in memory), even though this typically limits the volume of data that can be stored and queried. Hydrolix was founded on the idea that object storage could deliver performance similar to locally attached storage if the storage format and query execution engine were redesigned to leverage the strengths of cloud object storage. Although cloud object storage is penalized by network latency, it is capable of servicing massively parallel operations. As data sets and data consumers grow, locally attached storage cannot scale with demand in its ability to store, query, or manage data cost-effectively.
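
Object stores such as Amazon S3 and Google Cloud Storage accept HTTP range requests, so the latency of each request can be overlapped with others instead of paid one after another. The Python sketch below is a toy model of that principle only, not a description of Hydrolix internals: the in-memory OBJECT_STORE, the fetch_range helper, and the 50 ms latency are hypothetical stand-ins for a real object-store client.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an object store: a single 1 MiB object held in memory.
OBJECT_STORE = {"table/partition-0001": bytes(range(256)) * 4096}
SIMULATED_LATENCY_S = 0.05  # assumed fixed network latency per request


def fetch_range(key: str, start: int, end: int) -> bytes:
    """Simulate a ranged GET (start inclusive, end exclusive) that pays a fixed latency."""
    time.sleep(SIMULATED_LATENCY_S)
    return OBJECT_STORE[key][start:end]


def fetch_ranges_parallel(key: str, ranges, max_workers: int = 32):
    """Issue all range requests concurrently so their latencies overlap instead of adding up.

    Results are returned in the same order as `ranges`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_range, key, start, end) for start, end in ranges]
        return [future.result() for future in futures]


if __name__ == "__main__":
    ranges = [(i * 4096, i * 4096 + 512) for i in range(32)]

    t0 = time.perf_counter()
    for start, end in ranges:
        fetch_range("table/partition-0001", start, end)
    sequential = time.perf_counter() - t0

    t0 = time.perf_counter()
    fetch_ranges_parallel("table/partition-0001", ranges)
    parallel = time.perf_counter() - t0

    print(f"sequential: {sequential:.2f}s, parallel: {parallel:.2f}s")
```

Under these assumptions the sequential loop pays roughly 32 × 50 ms, while the parallel version finishes in not much more than a single latency, which is the property that makes a latency-bound store usable for fast queries.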

Independent auto-scaling of ingestion and querying infrastructure is significantly simpler with Hydrolix than with traditional databases. Traditional approaches involve many copies of the data across a number of shards, with varying levels of data detail and query performance. Noisy-neighbor problems are common, because heavy users or systems degrade access and performance for everyone else, and compute resources often remain under-utilized because data is statically assigned to specific nodes.

Hydrolix does not suffer from these shortcomings. Our patented storage and retrieval technology establishes a high-performance interface on top of cloud object storage. The Hydrolix platform can execute sub-second queries directly against decoupled data at speeds comparable to locally attached storage. Significant research has gone into ensuring that (lossless) data and indexes are highly compacted and cheap to decompress, avoiding the cost and overhead of full-table or full-column scans.

Some highlights of the platform

  • Fearless indexing and retention. Hydrolix is able to fully index every dimension in a table and still reduce storage costs by 95% (on average, 55GB per 1TB of raw data). With Hydrolix, there is no need to "choose" which columns should be indexed.
  • Exceptional "needle in a haystack" query performance. Per-column indexes enable Hydrolix query servers to selectively read relevant byte ranges within each column, avoiding the brute-force full-table and full-column scans generally associated with serverless databases and data lake technology. By minimizing the amount of data that needs to be transferred on each query, Hydrolix also reduces costs and avoids any dependency on caching.
  • Flexible Massive Parallel Processing (fmpp). Because data is not statically partitioned, customers can decide on a per-query basis how massive (and how fast/expensive) they want each query to be without the time (and expense) of shard re-balancing.
  • Sandbox isolation. Hydrolix avoids "noisy neighbor" problems by allowing per-team compute resources (i.e., query pools), so that individual workloads can run independently of one another while sharing only a single copy of the data.
  • Workload independence. Ingest, query, and data lifecycle infrastructure can each be auto-scaled independently and dynamically resized without downtime. All services within Hydrolix are optimized for stateless operation, relying only on cloud object storage as a centralized "source of truth".
  • Improved reliability and data durability. Stateless computing means that data is stored solely in decoupled object storage. Individual resources can therefore be scaled down, upgraded or destroyed at any time without any impact to data integrity. New clusters can be created in minutes.
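
To make the "needle in a haystack" and flexible parallelism points concrete, here is a minimal Python sketch under simplifying assumptions: a column is split into fixed-size blocks of fixed-width, uncompressed values, and a per-column inverted index maps each value to the blocks that contain it. This is a generic block index chosen for illustration, not Hydrolix's patented format; build_column_index, byte_ranges_for, and assign_to_workers are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    start: int  # byte offset of the block within the column file
    end: int    # exclusive end offset


def build_column_index(values, block_size, bytes_per_value):
    """Split a column into blocks and map each distinct value to the ids of blocks containing it.

    Assumes fixed-width values for simplicity; real columns are compressed and variable-width.
    """
    blocks = []
    inverted = defaultdict(set)
    for block_id, first in enumerate(range(0, len(values), block_size)):
        chunk = values[first:first + block_size]
        start = first * bytes_per_value
        blocks.append(Block(start, start + len(chunk) * bytes_per_value))
        for value in chunk:
            inverted[value].add(block_id)
    return blocks, inverted


def byte_ranges_for(value, blocks, inverted):
    """Return only the byte ranges that can contain `value`, coalescing adjacent blocks."""
    ranges = []
    for block_id in sorted(inverted.get(value, ())):
        block = blocks[block_id]
        if ranges and ranges[-1][1] == block.start:
            ranges[-1] = (ranges[-1][0], block.end)  # merge contiguous blocks into one request
        else:
            ranges.append((block.start, block.end))
    return ranges


def assign_to_workers(ranges, n_workers):
    """Spread byte ranges round-robin over however many workers this query was given."""
    plan = [[] for _ in range(n_workers)]
    for i, byte_range in enumerate(ranges):
        plan[i % n_workers].append(byte_range)
    return plan


if __name__ == "__main__":
    column = ["ok"] * 1000
    column[123] = "needle"
    blocks, inverted = build_column_index(column, block_size=100, bytes_per_value=8)
    ranges = byte_ranges_for("needle", blocks, inverted)
    print(ranges)                        # [(800, 1600)]
    print(assign_to_workers(ranges, 4))  # [[(800, 1600)], [], [], []]
```

In this toy column the lookup for "needle" reads one 800-byte block out of 8,000 bytes instead of scanning the whole column. Because the ranges are computed per query rather than pinned to shards, the worker count passed to assign_to_workers can change from one query to the next without moving any data.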

The platform is composed of numerous services:

Advanced Options