Dynamic Ingest Routing
Route data to different ingest pools using headers and query parameters.
Feature introduced in v5.7.4
Overview
Dynamic ingest routing forwards requests arriving at the default ingestion URL (/ingest/event) to different intake processing pools based on the HTTP headers or query parameters of the incoming request.
The Hydrolix cluster ingest system respects these headers and query parameters on requests arriving at the default ingest endpoint and routes the traffic to the appropriate intake-head pool. When more than one rule could apply, it uses rules of precedence and rule-length priority to choose the pool.
If no headers or query parameters match, the default intake-head pool will process the incoming request.
This feature is implemented in the cluster's reverse proxy, Traefik, which lies in front of the ingestion system. It's useful in clusters with multiple resource pools for the HTTP Stream API service.
Dynamic routing with headers
The reverse proxy can inspect a standard or configurable set of HTTP headers and match header values exactly or by regular expression to select a non-default intake pool.
Default headers
You can configure dynamic routing using the following standard headers:
- x-hdx-table
- x-hdx-transform
- x-hdx-project: only used for dynamic ingest routing
Clients frequently specify the destination table and transform in HTTP headers. The dynamic ingest routing feature allows the cluster administrator to select an alternate intake-head pool without updating the endpoint in the client configuration.
Enable dynamic routing with headers
Update your Hydrolix cluster config for an intake-head pool with the following routing block:
pools:
  routing_demo:
    routing:
      headers:
        x-hdx-table: my_project.my_table
        x-hdx-transform: my_transform
    name: routing_demo
    service: intake-head
With the above configuration, incoming requests matching both headers x-hdx-table: my_project.my_table and x-hdx-transform: my_transform will be routed through the intake-head pool called routing_demo. Requests matching only one of the two specified headers are sent through the default intake-head pool.
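The all-headers-must-match behavior described above can be sketched in a few lines of Python. This is only an illustration of the documented outcome, not how Traefik or the Hydrolix operator is implemented; the names `ROUTING_DEMO_HEADERS` and `select_pool` are hypothetical.

```python
# Hypothetical sketch: these names are illustrative, not part of Hydrolix or Traefik.
ROUTING_DEMO_HEADERS = {
    "x-hdx-table": "my_project.my_table",
    "x-hdx-transform": "my_transform",
}


def select_pool(request_headers: dict) -> str:
    """Return "routing_demo" only when every configured header matches exactly."""
    # HTTP header names are case-insensitive, so normalize them before comparing.
    normalized = {name.lower(): value for name, value in request_headers.items()}
    all_match = all(
        normalized.get(name) == value for name, value in ROUTING_DEMO_HEADERS.items()
    )
    return "routing_demo" if all_match else "default"


both = {"X-Hdx-Table": "my_project.my_table", "X-Hdx-Transform": "my_transform"}
only_table = {"x-hdx-table": "my_project.my_table"}
print(select_pool(both))
print(select_pool(only_table))
```

A request carrying only one of the two headers falls through to the default pool, which is the case the second call exercises.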
For more information on creating and updating service pools, see the Resource Pools page.
Custom headers
Set the traefik_service_allowed_headers tunable to specify custom headers for controlling intake-head routing.
The list must name every header your routing rules use.
If the list is empty, Traefik defaults to x-hdx-table, x-hdx-transform, and x-hdx-project.
To add a custom header while keeping the defaults, list the default headers you still use alongside the new one.
Custom header keys should match the entries passed in the pool annotations. Values must match an existing pool name.
The following Hydrolix spec configures only HTTP header x-hdx-intake-pool for dynamic routing:
spec:
  pools:
    intake-head-private-pool:
      routing:
        headers:
          x-hdx-intake-pool: intake-head-private-pool
      name: intake-head-private-pool
      service: intake-head
      cpu: "8"
      memory: 8Gi
      replicas: "5"
  traefik_service_allowed_headers: ['x-hdx-intake-pool']
Incoming traffic sent to the default ingest endpoint /ingest/event and containing the header key/value pair x-hdx-intake-pool: intake-head-private-pool will be routed through the intake-head-private-pool ingest pool.
All other incoming requests use the default ingest pool.
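Because the allowed-headers list must cover every header used in a routing rule, it can help to check a spec before applying it. The following Python sketch assumes the spec has already been loaded into a dictionary shaped like the example above; the function name `unlisted_routing_headers` is hypothetical, and what the cluster actually does with an unlisted header is not stated here.

```python
# Defaults used when traefik_service_allowed_headers is empty, per this page.
DEFAULT_HEADERS = ["x-hdx-table", "x-hdx-transform", "x-hdx-project"]


def unlisted_routing_headers(spec: dict) -> set:
    """Return routing headers that are missing from the allowed-headers list."""
    allowed = spec.get("traefik_service_allowed_headers") or DEFAULT_HEADERS
    allowed = {name.lower() for name in allowed}
    used = set()
    for pool in spec.get("pools", {}).values():
        headers = pool.get("routing", {}).get("headers", {})
        used.update(name.lower() for name in headers)
    return used - allowed


spec = {
    "pools": {
        "intake-head-private-pool": {
            "routing": {"headers": {"x-hdx-intake-pool": "intake-head-private-pool"}},
            "name": "intake-head-private-pool",
            "service": "intake-head",
        },
        "routing_demo": {
            "routing": {"headers": {"x-hdx-table": "my_project.my_table"}},
            "name": "routing_demo",
            "service": "intake-head",
        },
    },
    "traefik_service_allowed_headers": ["x-hdx-intake-pool"],
}
print(unlisted_routing_headers(spec))
```

Here the second pool routes on x-hdx-table, which the custom list no longer includes, so the check reports it.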
Regex matching on headers
Regular expression matching on headers is supported.
pools:
  routing_demo:
    routing:
      headers:
        x-hdx-table: regex|company[.]cdn.*
    name: routing_demo
    service: intake-head
Prefix the value with regex| to signal the value is a regular expression.
Any incoming request with header x-hdx-table matching the regular expression company[.]cdn.* will be sent through the routing_demo intake pool.
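To preview which header values a pattern would accept, the regex| convention can be mimicked locally. The sketch below uses Python's `re` module, whereas Traefik evaluates patterns with Go's regular expression engine, so treat it as an approximation; it also assumes unanchored matching, which this page does not specify. The function name `value_matches` is hypothetical.

```python
import re

REGEX_PREFIX = "regex|"


def value_matches(rule_value: str, actual: str) -> bool:
    """Compare a request value against a rule value, honoring the regex| prefix."""
    if rule_value.startswith(REGEX_PREFIX):
        pattern = rule_value[len(REGEX_PREFIX):]
        # Assumption: the pattern may match anywhere in the value (unanchored).
        return re.search(pattern, actual) is not None
    # Without the prefix, the rule is an exact string comparison.
    return rule_value == actual


rule = "regex|company[.]cdn.*"
for table in ["company.cdn.logs", "company.cdn", "companyXcdn.logs"]:
    print(table, value_matches(rule, table))
```

The `[.]` character class matches a literal dot, so `companyXcdn.logs` is rejected while both `company.cdn` values are accepted.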
Dynamic routing with query parameters
The reverse proxy can inspect a standard or configurable set of query parameters and match values exactly or by regular expression to select a non-default intake pool.
Default query parameters
Clients frequently use query parameters to specify table and transform. The dynamic ingest routing feature allows the cluster administrator to select an alternate intake-head pool without updating the endpoint in the client configuration.
For example, the following Hydrolix configuration spec:
pools:
  intake-head-private-pool:
    routing:
      query_params:
        table: my_project.my_table
        transform: my_transform
    name: intake-head-private-pool
    service: intake-head
allows a user to send data to intake-head-private-pool with a request using the specified query parameters:
POST /ingest/event?table=my_project.my_table&transform=my_transform HTTP/1.1
Host: {myhost}.hydrolix.live
Content-Type: application/json
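A client can assemble the same request URL programmatically. This Python sketch only builds the URL and does not send anything; the helper name `build_ingest_url` and the host value are placeholders, not part of any Hydrolix client library.

```python
from urllib.parse import urlencode


def build_ingest_url(host: str, table: str, transform: str) -> str:
    """Build the default ingest URL with table and transform query parameters."""
    query = urlencode({"table": table, "transform": transform})
    return f"https://{host}/ingest/event?{query}"


# "myhost.hydrolix.live" stands in for your cluster hostname.
url = build_ingest_url("myhost.hydrolix.live", "my_project.my_table", "my_transform")
print(url)
```

Using `urlencode` keeps the parameter values correctly escaped if a table or transform name ever contains characters that are not URL-safe.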
Custom query parameters
Set the traefik_service_allowed_query_params tunable to specify custom query parameters for controlling intake-head routing.
When customizing the tunable traefik_service_allowed_query_params list, you must specify the full list of query parameters used in any routing rules. When the list is empty, only table and transform are used, as demonstrated above. If setting a custom query parameter with this tunable, include all the standard query parameters already in use.
Custom parameter keys should match the entries passed in the pool annotations. Values must match an existing pool name.
The following Hydrolix spec configures only query parameter intake-pool for dynamic routing:
spec:
  pools:
    intake-head-private-pool:
      routing:
        query_params:
          intake-pool: intake-head-private-pool
      name: intake-head-private-pool
      service: intake-head
      cpu: "8"
      memory: 8Gi
      replicas: "5"
  traefik_service_allowed_query_params: ['intake-pool']
Incoming traffic sent to the default ingest endpoint with the specified query parameter:
https://{myhost}.hydrolix.live/ingest/event?intake-pool=intake-head-private-pool
is then routed through the intake-head-private-pool ingest pool.
Regex matching on query parameters
Regular expression matching on query parameters is supported.
pools:
  intake-head-private-pool:
    routing:
      query_params:
        table: regex|company[.]cdn.*
    name: intake-head-private-pool
    service: intake-head
Prefix the value with regex| to signal the value is a regular expression.
Any incoming request with query parameter table matching the regular expression company[.]cdn.* will be sent through the intake-head-private-pool intake pool.
Route from one intake pool to multiple tables
Traffic streams with different destination tables can route through the same intake-head pool. For example, consider the following Hydrolix cluster configuration:
pools:
  multi-table-routing-demo:
    name: multi-table-routing-demo
    service: intake-head
    routing:
      headers:
        x-hdx-table: demoproject.demotable
        x-hdx-transform: demotransform
      query_params:
        table:
          - project.table0
          - regex|project2[.]table[0-9]+
This results in the following Traefik proxy configuration:
PathPrefix(`/pool/multi-table-routing-demo`) ||
(PathPrefix(`/ingest`) && (Header(`x-hdx-table`,`demoproject.demotable`) && Header(`x-hdx-transform`,`demotransform`))) ||
(PathPrefix(`/ingest`) && (Query(`table`,`project.table0`) || QueryRegexp(`table`,`project2[.]table[0-9]+`)))
With this configuration, traffic that meets any of the following requirements is directed through the multi-table-routing-demo intake pool:
- Traffic sent directly to the multi-table-routing-demo pool.
- Traffic sent to the default ingest endpoint with the header x-hdx-table: demoproject.demotable and the header x-hdx-transform: demotransform.
- Traffic sent to the default ingest endpoint with the query parameter table=project.table0 or a table query parameter matching the regex project2[.]table[0-9]+. For example, table=project2.table0 and table=project2.table99.
As a result, traffic for the tables demoproject.demotable, project.table0, and any table in project2 whose name matches table[0-9]+ routes through the multi-table-routing-demo pool.
See Reverse proxy configuration for an explanation of Traefik routing rules.
Reverse proxy configuration
Dynamic routing configuration updates the Traefik configuration. It uses Traefik rules and priority to determine which ingest pool should handle an incoming request.
Dynamic ingestion rules are only executed on requests that arrive at the default pool (/ingest/event). Incoming requests to any non-default pool (/pool/{pool_name}/ingest/event) are handled exclusively by that pool.
Traefik matchers
The Hydrolix operator dynamically reconfigures the Traefik matchers according to instructions in the cluster spec file.
The following matchers are used:
- PathPrefix - explicit routing to a specific pool, always present for each defined pool
- Header - exact string match on HTTP header and value
- Query - exact string match on query parameter and value
- HeaderRegexp - regular expression match on a value in a specific HTTP header
- QueryRegexp - regular expression match on a value in a specific query parameter
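To make the mapping from a pool's routing block to these matchers concrete, here is a rough Python sketch that renders a rule string in the shape of the multi-table example on this page. It is not the operator's code: the function names are hypothetical, and the way several query parameter keys are combined (with &&) is an assumption, since the examples here only show headers joined with && and multiple values of one key joined with ||.

```python
REGEX_PREFIX = "regex|"


def matcher(kind: str, key: str, value: str) -> str:
    """Render one matcher, using the Regexp variant for regex| values."""
    if value.startswith(REGEX_PREFIX):
        pattern = value[len(REGEX_PREFIX):]
        return f"{kind}Regexp(`{key}`,`{pattern}`)"
    return f"{kind}(`{key}`,`{value}`)"


def build_rule(pool_name: str, headers: dict, query_params: dict) -> str:
    """Combine the explicit pool path with header and query clauses."""
    clauses = [f"PathPrefix(`/pool/{pool_name}`)"]  # always present for a pool
    if headers:
        joined = " && ".join(matcher("Header", k, v) for k, v in headers.items())
        clauses.append(f"(PathPrefix(`/ingest`) && ({joined}))")
    if query_params:
        groups = []
        for key, values in query_params.items():
            values = [values] if isinstance(values, str) else values
            alternatives = " || ".join(matcher("Query", key, v) for v in values)
            groups.append(f"({alternatives})")
        joined = " && ".join(groups)  # assumption: distinct keys are ANDed
        clauses.append(f"(PathPrefix(`/ingest`) && {joined})")
    return " || ".join(clauses)


rule = build_rule(
    "multi-table-routing-demo",
    {"x-hdx-table": "demoproject.demotable", "x-hdx-transform": "demotransform"},
    {"table": ["project.table0", "regex|project2[.]table[0-9]+"]},
)
print(rule)
```

Comparing the printed rule with the contents of the Traefik dynamic configuration on a real cluster is the reliable way to confirm what the operator generates.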
Verify Traefik config updates
If you haven't already, install K9s.
Suppose an existing pool called intake-head-private-pool starts with the following configuration:
spec:
  pools:
    intake-head-private-pool:
      name: intake-head-private-pool
      service: intake-head
and is then updated with the following routing configuration:
spec:
  pools:
    intake-head-private-pool:
      name: intake-head-private-pool
      routing:
        headers:
          x-hdx-table: my_project.my_table
        query_params:
          intake-pool: private
      service: intake-head
You can confirm the Traefik configuration has been updated using the following steps.
- Start k9s from a shell: k9s.
- Open up the pods selector by entering: :pods.
- Select the Traefik pod and shell into the traefik container using the command s.
- Run the following command:
watch -n 1 grep -B8 -A2 'PathPrefix\(\`/pool/intake-head-p' /etc/traefik/dynamic_conf.yaml
After a few minutes of latency at most, the router entry changes from:
http:
  routers:
    slash-pool/intake-head-private-pool-router:
      rule: PathPrefix(`/pool/intake-head-private-pool`)
      service: intake-head-private-pool
to:
http:
  routers:
    slash-pool/intake-head-private-pool-router:
      rule: PathPrefix(`/pool/intake-head-private-pool`) || (PathPrefix(`/ingest`) && Header(`x-hdx-table`, `my_project.my_table`)) && Query(`intake-pool`, `private`)
      service: intake-head-private-pool
Rules and priority
Rules and priority within a Hydrolix cluster therefore respect the following descending order of precedence:
- Explicit pool endpoint, PathPrefix(/pool/pool-name): Which endpoint the request arrives at, if the endpoint is a non-default ingest endpoint. For example, requests arriving at https://{myhost}.hydrolix.live/pool/{pool_name}/ingest/event will be handled by the ingest pool called {pool_name} regardless of the headers or query parameters included.
- Default ingestion endpoint and query parameters, PathPrefix(/ingest) and Query Parameters (Query(key, value)): For example, a request sent to https://{myhost}.hydrolix.live/ingest/event?table=my_project.my_table&transform=my_transform with the header x-hdx-myheader: secondary_pool and the following cluster configuration:

  spec:
    pools:
      custom-ingest-pool:
        routing:
          query_params:
            table: my_project.my_table
            transform: my_transform
        name: custom-ingest-pool
        service: intake-head
      secondary-pool:
        routing:
          headers:
            x-hdx-myheader: secondary_pool
        name: secondary-pool
        service: intake-head

  would be handled by custom-ingest-pool rather than secondary-pool.
- Default ingestion endpoint and HTTP headers, PathPrefix(/ingest) and HTTP Headers (Header(key, value)): For example, a request sent to https://{myhost}.hydrolix.live/ingest/event with headers x-hdx-table: my_project.my_table and x-hdx-transform: my_transform and the following cluster configuration:

  spec:
    pools:
      custom-ingest-pool:
        routing:
          headers:
            x-hdx-table: my_project.my_table
            x-hdx-transform: my_transform
        name: custom-ingest-pool
        service: intake-head

  would be handled by the custom-ingest-pool ingest pool.
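The descending order of precedence can be restated as a small decision function. This Python sketch hardcodes the custom-ingest-pool and secondary-pool example from the query parameter case and mirrors only the documented order; the actual decision is made by Traefik rules and priorities, and the function name `choose_pool` is hypothetical.

```python
def choose_pool(path: str, query: dict, headers: dict) -> str:
    """Pick a pool using the documented order: explicit path, query, headers."""
    # 1. An explicit /pool/{pool_name}/... endpoint always wins.
    parts = path.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == "pool":
        return parts[1]
    # 2. Query parameters on the default endpoint come next.
    if (query.get("table") == "my_project.my_table"
            and query.get("transform") == "my_transform"):
        return "custom-ingest-pool"
    # 3. HTTP headers on the default endpoint are considered last.
    if headers.get("x-hdx-myheader") == "secondary_pool":
        return "secondary-pool"
    return "default"


query = {"table": "my_project.my_table", "transform": "my_transform"}
headers = {"x-hdx-myheader": "secondary_pool"}
print(choose_pool("/ingest/event", query, headers))
print(choose_pool("/pool/secondary-pool/ingest/event", query, headers))
```

The first call carries both a matching query string and a matching header and resolves to the query parameter pool; the second shows the explicit pool path overriding both.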
Overlapping rules
Multiple rules can match an incoming request's header and query parameter configuration.
In this case, Traefik determines which ingest pool will handle the request using a rules length priority calculation.
For example, given the following configuration:
spec:
  pools:
    long-rule-pool:
      routing:
        query_params:
          table: my_project.my_table
          transform: my_transform
      name: long-rule-pool
      service: intake-head
    short-rule-pool:
      routing:
        query_params:
          table: my_project.my_table
      name: short-rule-pool
      service: intake-head
This generates the Traefik rules:
rule: PathPrefix(/pool/long-rule-pool) || PathPrefix(/ingest) && Query(table, my_project.my_table) && Query(transform, my_transform)
rule: PathPrefix(/pool/short-rule-pool) || PathPrefix(/ingest) && Query(table, my_project.my_table)
A request sent to https://{myhost}.hydrolix.live/ingest/event?table=my_project.my_table&transform=my_transform matches both rules. The longer rule has priority, so the long-rule-pool processes the incoming request.
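By default, Traefik assigns each router a priority equal to the length of its rule string, so the tie-break can be reproduced by comparing lengths. The Python sketch below uses the two rule strings exactly as printed above; the names `RULES` and `winning_pool` are hypothetical, and a cluster that sets explicit router priorities would not follow this calculation.

```python
# Rule strings copied from the example above, keyed by pool name.
RULES = {
    "long-rule-pool": (
        "PathPrefix(/pool/long-rule-pool) || PathPrefix(/ingest) && "
        "Query(table, my_project.my_table) && Query(transform, my_transform)"
    ),
    "short-rule-pool": (
        "PathPrefix(/pool/short-rule-pool) || PathPrefix(/ingest) && "
        "Query(table, my_project.my_table)"
    ),
}


def winning_pool(matching_pools: list) -> str:
    """Among pools whose rules match, return the one with the longest rule."""
    return max(matching_pools, key=lambda pool: len(RULES[pool]))


# Both rules match a request with table and transform set, so length decides.
for pool, rule in RULES.items():
    print(pool, len(rule))
print(winning_pool(["long-rule-pool", "short-rule-pool"]))
```

One consequence worth keeping in mind is that the comparison is purely textual: a longer pool name or regular expression also lengthens the rule and can change which pool wins.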