Ingest API Options

Hydrolix data is partitioned by table, then by time. All events for a given table and time interval go into a single partition.

Time intervals for event data fall into two separate classifications:

  • recent (or “hot”) data
  • historical (or “cold”) data

The width of each time interval and the number of intervals that may be active at once are both configurable.

Example Configuration settings

    "settings": {
        "stream": {
            "max_minutes_per_partition": 15,
            "max_rows_per_partition": 33554432,
            "hot_data_max_age_minutes": 15,
            "hot_data_max_active_partitions": 4,
            "hot_data_max_rows_per_partition": 1000000,
            "hot_data_max_minutes_per_partition": 5,
            "hot_data_max_open_seconds": 60,
            "hot_data_max_idle_seconds": 30,
            "cold_data_max_age_days": 365,
            "cold_data_max_active_partitions": 5,
            "cold_data_max_rows_per_partition": 1000000,
            "cold_data_max_minutes_per_partition": 15,
            "cold_data_max_open_seconds": 60,
            "cold_data_max_idle_seconds": 30
        }
    }

Element | Description | Default
max_minutes_per_partition | The target partition width, in minutes. | 1
max_rows_per_partition | The target number of rows to store in a partition. Events are buffered until this row count is reached. | 1000000
hot_data_max_age_minutes | Applies to the primary timestamp of a row. A row whose primary timestamp is older than this duration is too old to be treated as recent (hot) data. | 15
hot_data_max_active_partitions | The maximum number of hot partitions to keep open at any one time. | 4
hot_data_max_rows_per_partition | The maximum size, measured in rows, that any open hot partition may reach. | 1000000
hot_data_max_minutes_per_partition | The maximum width in time of a hot partition, that is, the maximum allowable distance between the newest and oldest primary timestamps of the rows in the partition. | 5
hot_data_max_open_seconds | The maximum duration, in wall clock time, to wait for events to trickle in for a recent-data partition. | 60
hot_data_max_idle_seconds | The maximum duration to wait for any new data to appear before an open hot partition is closed automatically. | 30
cold_data_max_age_days | Applies to the primary timestamp of a row. A row whose primary timestamp is older than this duration is considered too old to be worth indexing and is discarded. | 365
cold_data_max_active_partitions | The maximum number of cold partitions to keep open at any one time. | 5
cold_data_max_rows_per_partition | The maximum size, measured in rows, that any open cold partition may reach. | 1000000
cold_data_max_minutes_per_partition | The maximum width in time of a cold partition, that is, the maximum allowable distance between the newest and oldest primary timestamps of the rows in the partition. | 15
cold_data_max_open_seconds | The maximum duration, in wall clock time, to wait for events to trickle in for a historical-data partition. | 60
cold_data_max_idle_seconds | The maximum duration to wait for any new data to appear before an open cold partition is closed automatically. | 30
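
To make the interaction between the row, width, open, and idle limits concrete, here is a minimal Python sketch of how an open partition might be checked against them. It is an illustration only, not Hydrolix's implementation: the Limits and OpenPartition classes, the close_reason function, and the order in which the limits are checked are all assumptions. The default values mirror the hot_data_* defaults listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limits:
    # Hypothetical container; defaults mirror the hot_data_* defaults in the table.
    max_rows_per_partition: int = 1_000_000
    max_minutes_per_partition: int = 5
    max_open_seconds: int = 60
    max_idle_seconds: int = 30


@dataclass
class OpenPartition:
    # Hypothetical bookkeeping for one open partition (all times in epoch seconds).
    rows: int
    oldest_primary: float   # oldest primary timestamp among buffered rows
    newest_primary: float   # newest primary timestamp among buffered rows
    opened_at: float        # wall clock time the partition was opened
    last_write_at: float    # wall clock time of the most recent write


def close_reason(p: OpenPartition, limits: Limits, now: float) -> Optional[str]:
    """Return the name of the first limit reached, or None if the partition may stay open.

    The check order is an assumption made for this sketch.
    """
    if p.rows >= limits.max_rows_per_partition:
        return "max_rows"
    if p.newest_primary - p.oldest_primary >= limits.max_minutes_per_partition * 60:
        return "max_minutes"
    if now - p.opened_at >= limits.max_open_seconds:
        return "max_open"
    if now - p.last_write_at >= limits.max_idle_seconds:
        return "max_idle"
    return None


partition = OpenPartition(rows=10, oldest_primary=0, newest_primary=60,
                          opened_at=100, last_write_at=120)
print(close_reason(partition, Limits(), now=125))  # no limit reached yet
print(close_reason(partition, Limits(), now=151))  # idle for 31 seconds
```

In this sketch a partition that receives no writes for longer than max_idle_seconds closes before max_open_seconds is reached, which is why the idle limit is normally set lower than the open limit.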

Hot vs Cold Data - an event is classified as hot or cold based on the primary datetime value in the incoming event. Hot data is an event whose primary datetime falls within hot_data_max_age_minutes of the current time. Cold data is late-arriving event data whose primary datetime is older than hot_data_max_age_minutes (but no older than cold_data_max_age_days). Distinguishing hot from cold is useful when a portion of event logs may arrive late, for example CDN logs where 95% of logs are delivered within 15 minutes and the remaining 5% arrive within 24 hours.
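
The hot/cold decision can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Hydrolix's code: the classify_event function and its "too_old" label are hypothetical, the boundaries are treated as inclusive, and a primary datetime in the future is simply treated as hot. The parameter defaults come from the table above.

```python
from datetime import datetime, timedelta, timezone


def classify_event(primary: datetime, now: datetime,
                   hot_data_max_age_minutes: int = 15,
                   cold_data_max_age_days: int = 365) -> str:
    """Classify an event by the age of its primary datetime (hypothetical helper)."""
    age = now - primary
    if age <= timedelta(minutes=hot_data_max_age_minutes):
        return "hot"       # recent data (assumption: future timestamps land here too)
    if age <= timedelta(days=cold_data_max_age_days):
        return "cold"      # late-arriving historical data
    return "too_old"       # beyond cold_data_max_age_days


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(classify_event(now - timedelta(minutes=5), now))  # hot
print(classify_event(now - timedelta(hours=20), now))   # cold
print(classify_event(now - timedelta(days=400), now))   # too_old
```

With the CDN example, a log line delivered 20 hours after the event it describes would be routed to a cold partition rather than reopening a hot one.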