How Turbine Works
Because storage, ingest, and query are architecturally independent, Turbine can scale each one according to the workload placed on the data. Combining commodity storage and compute with our unique, cluster-free architecture enables business owners to scale dynamically, meeting the demands of data ingest while delivering the same sub-second query performance as an expensive, clustered SSD solution, all on one platform.
Built as a stateless architecture, Hydrolix Turbine does not need knowledge of previous interactions and does not store session information. Storage is completely independent of compute resources. Turbine compute scales horizontally because any of the available compute resources can service any request. Without stored session data, you can simply add more compute resources as needed. When that capacity is no longer required, you can safely terminate those individual resources after their running tasks have been drained.
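The scale-up and drain-then-terminate behavior described above can be illustrated with a minimal sketch. The `Worker` and `Pool` classes and their methods are hypothetical names invented for this example; they are not part of Turbine and only model the idea that any worker can take any request and that a worker is removed only after its running tasks finish.

```python
class Worker:
    """Hypothetical stateless compute node: keeps no session data between requests."""

    def __init__(self, name):
        self.name = name
        self.in_flight = 0      # tasks currently running on this worker
        self.draining = False   # when True, the worker accepts no new tasks

    def can_terminate(self):
        # Safe to terminate only once draining has started and all tasks are done.
        return self.draining and self.in_flight == 0


class Pool:
    """Hypothetical pool of interchangeable workers."""

    def __init__(self):
        self.workers = []

    def scale_up(self, name):
        # No state to copy or rebalance: a new worker is usable immediately.
        self.workers.append(Worker(name))

    def assign(self):
        # Any non-draining worker can service any request; pick the least loaded.
        candidates = [w for w in self.workers if not w.draining]
        if not candidates:
            raise RuntimeError("no workers available")
        worker = min(candidates, key=lambda w: w.in_flight)
        worker.in_flight += 1
        return worker

    def complete(self, worker):
        worker.in_flight -= 1

    def scale_down(self, name):
        # Mark the worker as draining; it keeps running tasks it already has.
        for w in self.workers:
            if w.name == name:
                w.draining = True

    def reap(self):
        # Remove workers that have finished draining and return their names.
        done = [w.name for w in self.workers if w.can_terminate()]
        self.workers = [w for w in self.workers if not w.can_terminate()]
        return done
```

In this sketch, calling `scale_down` on a busy worker does not interrupt it. The worker simply stops receiving new requests, and `reap` removes it once its in-flight count reaches zero.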
What’s special about Turbine is its ability to filter and retrieve records from remote object storage at a very fast rate without having to download all the data. Conventional wisdom says local attached disk is the fastest, and it is when you are serving small amounts of data to a limited number of requests. To scale for more data and more requests, you add more nodes with more local attached storage. Turbine instead leverages the massive scale of object storage, its unparalleled compression to minimize the bytes transferred, and metadata to retrieve only the remote bytes needed. Copies of data are not required to serve more requests.
Turbine keeps the raw data highly compacted and cheaply decompressible to reduce the overhead needed to fetch it from remote object storage.
Turbine is a high performance, petabyte-scale time-series database & service. It separates compute and storage, allowing them to scale independently by relying on object storage services like Google Cloud Storage and Amazon S3 to store and retrieve data. The separation of compute and storage enables the ingest and query services to be completely stateless, allowing them to scale dynamically and independently. This offers significant cost savings while providing interactive query performance over large data sets.
Existing solutions operate under the assumption that object storage is slow, and it is always fastest to work with local attached storage. The premise of Turbine is that object storage is not “slow”, and local attached storage ultimately does not scale. Although cloud object storage is limited by network transfer speeds, it can serve massive, parallel operations on files, while local attached storage can only serve a few requests very quickly. As data sets and data consumers grow, local attached storage is not a scalable model for resource or cost optimization.
Cloud object storage can serve files with massive parallelism. Large amounts of data can be operated on in parallel, whereas local attached storage can only serve a few requests very fast.
With Turbine, scaling means many observers working against one or a few copies of the data. Traditional approaches instead involve many copies of the data, with lots of synchronization between them and further copying for distributed systems management.
Turbine achieves this through patented storage and retrieval technology, effectively creating an interface on top of cloud object storage that allows it to be used as database storage at the speed of local attached storage. A lot of thought was given to keeping the raw data highly compacted and cheaply decompressible to reduce the overhead needed to fetch it from remote object storage.
- On ingestion, Turbine is able to fully index all columns and store the data in object storage at 5-8% of the original size.
- Metadata is stored and then processed at query time to point Turbine not just to the files, but to the specific bytes containing the requested data.
- On query, a Turbine QueryHead processes the query and assigns the work to QueryPeers. The number of QueryPeers is up to the user.
- Because the data is so highly compressed, very little data has to be transferred from S3 to the QueryPeers, so the transfer is very fast.
- On each QueryPeer, the data is decompressed and the result is returned to the QueryHead.
- The QueryHead combines the results and returns them.
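The query steps above follow a scatter-gather pattern, which can be sketched roughly as follows. This is an illustration only, not Turbine's implementation: the function names, the JSON-plus-zlib partition encoding, and the use of a thread pool to stand in for QueryPeers are all assumptions made for the example.

```python
import json
import zlib
from concurrent.futures import ThreadPoolExecutor


def make_partition(rows):
    # Hypothetical storage step: keep each partition compressed at rest.
    return zlib.compress(json.dumps(rows).encode("utf-8"))


def query_peer(compressed_partition, min_status):
    # A peer receives compressed bytes, decompresses them locally,
    # and returns only a small partial result (here, a count).
    rows = json.loads(zlib.decompress(compressed_partition).decode("utf-8"))
    return sum(1 for row in rows if row["status"] >= min_status)


def query_head(partitions, min_status, num_peers):
    # The head assigns partitions to peers, then combines the partial results.
    # num_peers is chosen by the caller, mirroring "up to the user".
    with ThreadPoolExecutor(max_workers=num_peers) as peers:
        partials = list(
            peers.map(lambda part: query_peer(part, min_status), partitions)
        )
    return sum(partials)


partitions = [
    make_partition([{"status": 200}, {"status": 500}]),
    make_partition([{"status": 503}, {"status": 404}]),
]
server_errors = query_head(partitions, min_status=500, num_peers=2)
```

Only compressed bytes travel to the peers and only small partial results travel back to the head, which is the property the list above relies on.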
Only the data required to answer the query is pulled from storage, rather than entire files that happen to contain the single data point needed to answer it.
The HDX file format is designed so that remotely stored records can be filtered and retrieved with the help of meta information.
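The idea of using meta information to fetch only the needed bytes can be sketched with a generic block index. The layout below (a list of time ranges with byte offsets and lengths) is an assumption made for illustration and is not the actual HDX format; `RangeStore` is a hypothetical in-memory stand-in for an object store that supports ranged reads.

```python
import zlib


def build_object(blocks):
    # blocks: list of (min_ts, max_ts, payload_bytes).
    # Returns the packed blob plus an index of (min_ts, max_ts, offset, length).
    blob = bytearray()
    index = []
    for min_ts, max_ts, payload in blocks:
        compressed = zlib.compress(payload)
        index.append((min_ts, max_ts, len(blob), len(compressed)))
        blob.extend(compressed)
    return bytes(blob), index


class RangeStore:
    """Hypothetical stand-in for remote object storage with ranged reads."""

    def __init__(self, blob):
        self._blob = blob
        self.bytes_read = 0  # tracks how many bytes were actually transferred

    def read_range(self, offset, length):
        self.bytes_read += length
        return self._blob[offset:offset + length]


def fetch(store, index, start_ts, end_ts):
    # Consult the index first, then read and decompress only overlapping blocks.
    results = []
    for min_ts, max_ts, offset, length in index:
        if max_ts < start_ts or min_ts > end_ts:
            continue  # block cannot contain matching records; skip its bytes
        results.append(zlib.decompress(store.read_range(offset, length)))
    return results


blocks = [
    (0, 9, b"a" * 1000),
    (10, 19, b"b" * 1000),
    (20, 29, b"c" * 1000),
]
blob, index = build_object(blocks)
store = RangeStore(blob)
matching = fetch(store, index, start_ts=12, end_ts=15)
```

Here a query for timestamps 12 through 15 touches only the second block, so `store.bytes_read` equals that block's compressed length rather than the size of the whole object.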
The Turbine service is composed of the following services:
- Index Service
- Query Service
- Merger Service
- Catalog Service