Sizing Guidelines

Hydrolix understands that data comes in all shapes and sizes. The following guidance describes how to scale your architecture to suit your data.

Batch Intake

Batch Intake scaling is typically determined by the size of the files being imported, because file size directly affects memory utilization. For this reason, Hydrolix recommends the memory-optimized AWS r5 instance family.

Files Under 2 GB Compressed (~20 GB Raw)

The recommended instance is r5.2xlarge.

Files Greater Than 2 GB Compressed

Batch peers consume roughly 4x the size of the raw data in RAM, or roughly 10x the size of the compressed data. For files larger than 2 GB compressed, apply the following formula:

(Max compressed batch file size) × 10 = Instance memory requirement

Example: 8 GB × 10 = 80 GB

80 GB exceeds the 64 GB of an r5.2xlarge, so in this case the recommended instance size is the next size up, r5.4xlarge (128 GB).
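The rule of thumb above can be written as a small helper. This is a sketch, assuming the size of the largest file is known in GB (compressed, raw, or both); the function name `estimate_batch_memory_gb` is hypothetical and not part of any Hydrolix tool. When both sizes are supplied it returns the larger estimate, which is a conservative choice of this sketch rather than a Hydrolix rule.

```python
from typing import Optional

# Rules of thumb from this guide: a batch peer uses roughly 10x the
# compressed file size in RAM, or roughly 4x the raw (uncompressed) size.
COMPRESSED_MULTIPLIER = 10
RAW_MULTIPLIER = 4


def estimate_batch_memory_gb(compressed_gb: Optional[float] = None,
                             raw_gb: Optional[float] = None) -> float:
    """Estimate batch peer RAM (GB) for the largest file to be imported.

    If both sizes are given, the larger estimate is returned so the result
    errs on the safe side (an assumption of this sketch).
    """
    estimates = []
    if compressed_gb is not None:
        estimates.append(compressed_gb * COMPRESSED_MULTIPLIER)
    if raw_gb is not None:
        estimates.append(raw_gb * RAW_MULTIPLIER)
    if not estimates:
        raise ValueError("provide compressed_gb, raw_gb, or both")
    return max(estimates)


if __name__ == "__main__":
    # The worked example: an 8 GB compressed file.
    print(estimate_batch_memory_gb(compressed_gb=8))  # 80
```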

Memory (GB)   Instance Size
64            r5.2xlarge
128           r5.4xlarge
256           r5.8xlarge
384           r5.12xlarge
512           r5.16xlarge
768           r5.24xlarge
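Choosing an instance then amounts to taking the smallest r5 size in the table whose memory covers the estimate. A minimal sketch, with the table values copied into a list; `pick_r5_instance` is a hypothetical helper, and it returns `None` when the requirement exceeds the largest size listed.

```python
from typing import Optional

# (memory in GB, instance size), smallest first, copied from the table.
R5_SIZES = [
    (64, "r5.2xlarge"),
    (128, "r5.4xlarge"),
    (256, "r5.8xlarge"),
    (384, "r5.12xlarge"),
    (512, "r5.16xlarge"),
    (768, "r5.24xlarge"),
]

# Batch peers need roughly 10x the compressed file size in RAM.
COMPRESSED_MULTIPLIER = 10


def pick_r5_instance(max_compressed_file_gb: float) -> Optional[str]:
    """Return the smallest listed r5 size that fits the largest batch file."""
    required_gb = max_compressed_file_gb * COMPRESSED_MULTIPLIER
    for memory_gb, size in R5_SIZES:
        if memory_gb >= required_gb:
            return size
    return None  # beyond r5.24xlarge; not covered by this table


if __name__ == "__main__":
    print(pick_r5_instance(2))  # r5.2xlarge (needs 20 GB)
    print(pick_r5_instance(8))  # r5.4xlarge (needs 80 GB)
```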

Streaming Intake

In theory, there is no hard limit on the size of messages Hydrolix can ingest through the Stream API. The only practical consideration is ensuring that the Stream Head instances have enough RAM to hold the uncompressed size of each message. Behind the scenes, the stream head splits messages as needed to fit within the limits of Kinesis.
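This guide does not describe how the stream head splits messages, so the following is only an illustration of the general idea: greedily grouping newline-delimited records into chunks that stay under a byte cap. The `split_into_chunks` function and its 1,000,000-byte default are hypothetical; the real split size depends on the Kinesis quotas in effect and on Hydrolix internals.

```python
from typing import Iterable, Iterator, List

# Hypothetical per-chunk cap; not a documented Hydrolix or Kinesis value.
MAX_CHUNK_BYTES = 1_000_000


def split_into_chunks(lines: Iterable[bytes],
                      max_bytes: int = MAX_CHUNK_BYTES) -> Iterator[List[bytes]]:
    """Group newline-delimited records into chunks of at most max_bytes.

    Records are never cut in half; each one counts its length plus one
    byte for the newline separator.
    """
    chunk: List[bytes] = []
    size = 0
    for line in lines:
        line_size = len(line) + 1  # +1 for the newline separator
        if line_size > max_bytes:
            raise ValueError("a single record exceeds max_bytes")
        # Start a new chunk when this record would overflow the current one.
        if chunk and size + line_size > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += line_size
    if chunk:
        yield chunk
```

Keeping records whole means a chunk may fall short of the cap, but no downstream consumer ever sees a partial record.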

Query

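The following table maps ingest rate to a recommended partition size and query peer configuration. Max Rows is the number of rows per partition (events per second × partition size in seconds), and Partitions/Core is partitions per day divided by available cores, rounded to the nearest whole number.

Events/sec   Partition Size (min)   Max Rows     Partitions/Day   Recommended Query Peers   Available Cores   Partitions/Core
18,500       60                     66,600,000   24               1x c5n.xlarge             3                 8
37,000       30                     66,600,000   48               1x c5n.2xlarge            7                 7
74,000       15                     66,600,000   96               1x c5n.4xlarge            15                6
222,000      5                      66,600,000   288              1x c5n.9xlarge            35                8
1,110,000    1                      66,600,000   1,440            3x c5n.18xlarge           213               7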
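The derived columns in the query table follow from simple arithmetic, and this sketch reproduces them. The 66,600,000-row target is taken from the constant Max Rows column and treated as an assumption; `partition_minutes_for`, `plan_partitions`, and `PartitionPlan` are hypothetical names for illustration.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60
# Assumed target: every row of the table uses the same rows-per-partition.
TARGET_ROWS_PER_PARTITION = 66_600_000


@dataclass
class PartitionPlan:
    partition_minutes: int
    max_rows: int
    partitions_per_day: int
    partitions_per_core: int


def partition_minutes_for(events_per_second: int,
                          target_rows: int = TARGET_ROWS_PER_PARTITION) -> int:
    """Longest whole-minute partition that stays at or under target_rows.

    Never shorter than one minute, the smallest size in the table.
    """
    return max(1, target_rows // (events_per_second * 60))


def plan_partitions(events_per_second: int, available_cores: int) -> PartitionPlan:
    minutes = partition_minutes_for(events_per_second)
    max_rows = events_per_second * minutes * 60
    # Assumes the partition size divides a day evenly, as in the table.
    partitions_per_day = MINUTES_PER_DAY // minutes
    # The table appears to round partitions per core to the nearest integer.
    partitions_per_core = round(partitions_per_day / available_cores)
    return PartitionPlan(minutes, max_rows, partitions_per_day, partitions_per_core)


if __name__ == "__main__":
    print(plan_partitions(222_000, available_cores=35))
```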