Dynamic Ingest Routing

Overview

Dynamic ingest routing directs a cluster to forward requests arriving at the default ingestion URL (https://{myhost}.hydrolix.live/ingest/event) to different intake processing pools based on the HTTP headers or query parameters of the incoming request. The cluster's ingest system reads these headers and query parameters on requests arriving at the default ingest endpoint and routes the traffic to the appropriate intake-head pool, applying rules of precedence and rule-length priority where necessary. If no headers or query parameters match an existing rule, the default intake-head pool processes the incoming request.

This feature is useful in clusters with multiple resource pools for the HTTP Stream API service.

Dynamic routing with headers

Default headers

You can configure dynamic routing using the following headers:

  • x-hdx-project
  • x-hdx-table
  • x-hdx-transform

This allows clients to specify the destination project, table, and transform, and to select an intake-head pool for their data, without changing the endpoint to which they send data.

To enable this feature, update your Kubernetes config for an intake-head pool with the following routing block:

pools:
  routing_demo:
    routing:
      headers:
        x-hdx-table: my_project.my_table
        x-hdx-transform: my_transform
    name: routing_demo
    service: intake-head

With the above configuration, any traffic that contains both headers, x-hdx-table: my_project.my_table and x-hdx-transform: my_transform, is routed through the intake-head pool called routing_demo.
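
As a client-side illustration, the following minimal Python sketch builds a request to the default ingest endpoint carrying both routing headers from the configuration above. The hostname, the payload, and the helper name build_ingest_request are placeholders introduced here, and authentication is omitted; adapt both to your cluster.

```python
import json
import urllib.request


def build_ingest_request(host, table, transform, events):
    """Build (but do not send) a POST to the default ingest endpoint."""
    body = json.dumps(events).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{host}/ingest/event",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Routing headers: both must match the pool's routing block.
            "x-hdx-table": table,
            "x-hdx-transform": transform,
        },
    )


request = build_ingest_request(
    host="myhost.hydrolix.live",    # placeholder hostname
    table="my_project.my_table",
    transform="my_transform",
    events=[{"message": "hello"}],  # hypothetical payload
)

# Sending is left commented out so the sketch has no side effects:
# urllib.request.urlopen(request)
```

Because the URL stays the same, moving a client to a different pool only requires changing the header values it sends.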

For more information on creating and updating service pools, see the Resource Pools page.

Custom headers

By setting the traefik_service_allowed_headers tunable, you can configure custom headers to control intake-head routing in addition to the default headers. Custom header keys should match the entries passed in the pool annotations. Values must match an existing pool name. For example, consider the following Hydrolix spec:

spec:
  pools:
    intake-head-private-pool:
      routing:
        headers:
          x-hdx-intake-pool: intake-head-private-pool
      name: intake-head-private-pool
      service: intake-head
      cpu: "8"
      memory: 8Gi
      replicas: "5"

  traefik_service_allowed_headers: ['x-hdx-intake-pool']

Incoming traffic sent to the default ingest endpoint:

https://{myhost}.hydrolix.live/ingest/event

containing the header key/value pair x-hdx-intake-pool: intake-head-private-pool will be routed through the intake-head-private-pool ingest pool.
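
A custom header that is used in a routing block but missing from traefik_service_allowed_headers is an easy mistake to make. The sketch below is a hypothetical pre-deployment check, not part of Hydrolix: it assumes the spec has already been loaded into a Python dictionary, and the function name missing_allowed_headers is introduced here for illustration.

```python
# The three default routing headers do not need to be listed in the tunable.
DEFAULT_ROUTING_HEADERS = {"x-hdx-project", "x-hdx-table", "x-hdx-transform"}


def missing_allowed_headers(spec):
    """Return custom routing header keys absent from traefik_service_allowed_headers."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    allowed = {h.lower() for h in spec.get("traefik_service_allowed_headers", [])}
    missing = set()
    for pool in spec.get("pools", {}).values():
        for key in pool.get("routing", {}).get("headers", {}):
            key = key.lower()
            if key not in DEFAULT_ROUTING_HEADERS and key not in allowed:
                missing.add(key)
    return sorted(missing)


spec = {
    "pools": {
        "intake-head-private-pool": {
            "routing": {"headers": {"x-hdx-intake-pool": "intake-head-private-pool"}},
            "name": "intake-head-private-pool",
            "service": "intake-head",
        }
    },
    "traefik_service_allowed_headers": ["x-hdx-intake-pool"],
}

print(missing_allowed_headers(spec))  # prints []
```

Removing the traefik_service_allowed_headers entry from this dictionary makes the function report x-hdx-intake-pool as missing.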

Dynamic routing with query parameters

Default query parameters

You can use the table and transform query parameters to route traffic to a particular intake-head pool without updating the endpoint to which the client is sending data. For example, the following Hydrolix configuration spec:

pools:
  intake-head-private-pool:
    routing:
      query_params:
        table: my_project.my_table
        transform: my_transform
    name: intake-head-private-pool
    service: intake-head

allows a user to send data to intake-head-private-pool with a request using the specified query parameters:

POST /ingest/event?table=my_project.my_table&transform=my_transform HTTP/1.1
Host: {myhost}.hydrolix.live
Content-Type: application/json
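
A client can assemble the same request URL programmatically. The following Python sketch is one way to do that with the standard library; the hostname and the helper name build_ingest_url are placeholders introduced here.

```python
from urllib.parse import urlencode


def build_ingest_url(host, routing_params):
    """Append routing query parameters to the default ingest endpoint."""
    # urlencode percent-escapes values, so names with special characters are safe.
    return f"https://{host}/ingest/event?{urlencode(routing_params)}"


url = build_ingest_url(
    "myhost.hydrolix.live",  # placeholder hostname
    {"table": "my_project.my_table", "transform": "my_transform"},
)
print(url)
# https://myhost.hydrolix.live/ingest/event?table=my_project.my_table&transform=my_transform
```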

Custom query parameters

By setting the traefik_service_allowed_query_params tunable, you can configure custom query parameters to control intake-head routing in addition to the defaults. Custom parameter keys should match the entries passed in the pool annotations. Values must match an existing pool name. For example, consider the following Hydrolix spec:

spec:
  pools:
    intake-head-private-pool:
      routing:
        query_params:
          intake-pool: intake-head-private-pool
      name: intake-head-private-pool
      service: intake-head
      cpu: "8"
      memory: 8Gi
      replicas: "5"

  traefik_service_allowed_query_params: ['intake-pool']

Then incoming traffic sent to the default ingest endpoint with the specified query parameter:

https://{myhost}.hydrolix.live/ingest/event?intake-pool=intake-head-private-pool

will be routed through the intake-head-private-pool ingest pool.

Verify Traefik config updates

Dynamic routing settings in the Hydrolix cluster spec are propagated to Traefik's configuration, which is updated with any new routing rules based on headers or query parameters.

If you haven't already, install K9s.

Suppose an existing pool called intake-head-private-pool starts with the following configuration:

spec:
  pools:
    intake-head-private-pool:
      name: intake-head-private-pool
      service: intake-head

and is then updated with a routing block:

spec:
  pools:
    intake-head-private-pool:
      name: intake-head-private-pool
      routing:
        headers:
          x-hdx-table: my_project.my_table
        query_params:
          intake-pool: private
      service: intake-head

You can confirm the Traefik configuration has been updated using the following steps.

  1. Start k9s from a shell: k9s.
  2. Open up the pods selector by entering: :pods.
  3. Select the Traefik pod and shell into the traefik container using the command s.
  4. Run the following command:
watch -n 1 grep -B8 -A2 'PathPrefix\(\`/pool/intake-head-p' /etc/traefik/dynamic_conf.yaml

Within a few minutes at most, the router rule changes from:

http:
  routers:
    slash-pool/intake-head-private-pool-router:
      rule: PathPrefix(`/pool/intake-head-private-pool`)
      service: intake-head-private-pool

to:

http:
  routers:
    slash-pool/intake-head-private-pool-router:
      rule: PathPrefix(`/pool/intake-head-private-pool`) || (PathPrefix(`/ingest`) && Header(`x-hdx-table`, `my_project.my_table`)) && Query(`intake-pool`, `private`)
      service: intake-head-private-pool

Precedence

Dynamic routing uses Traefik rules and priority to determine which ingest pool should handle an incoming request. Dynamic routing rules are evaluated only for requests that arrive at the default ingest endpoint (https://{myhost}.hydrolix.live/ingest/event). Incoming requests to any non-default pool endpoint (https://{myhost}.hydrolix.live/pool/{pool_name}/ingest/event) are handled exclusively by that pool.

Rules and priority within a Hydrolix cluster therefore respect the following descending order of precedence:

  1. Explicit pool endpoint, PathPrefix(/pool/pool-name): A request that arrives at a non-default ingest endpoint is handled by that endpoint's pool. For example, requests arriving at https://{myhost}.hydrolix.live/pool/{pool_name}/ingest/event will be handled by the ingest pool called {pool_name} regardless of the headers or query parameters included.
  2. Default ingestion endpoint and query parameters, PathPrefix(/ingest) and Query(key, value): For example, a request sent to https://{myhost}.hydrolix.live/ingest/event?table=my_table&transform=my_transform with the header x-hdx-myheader: secondary_pool, under the following cluster configuration:
spec:
  pools:
    custom-ingest-pool:
      routing:
        query_params:
          table: my_table
          transform: my_transform
      name: custom-ingest-pool
      service: intake-head
    secondary-pool:
      routing:
        headers:
          x-hdx-myheader: secondary_pool
      name: secondary-pool
      service: intake-head

would be handled by custom-ingest-pool rather than secondary-pool, because query parameters take precedence over headers.

  3. Default ingestion endpoint and HTTP headers, PathPrefix(/ingest) and Header(key, value): For example, a request sent to https://{myhost}.hydrolix.live/ingest/event with headers x-hdx-table: my_table and x-hdx-transform: my_transform, under the following cluster configuration:
spec:
  pools:
    custom-ingest-pool:
      routing:
        headers:
          x-hdx-table: my_table
          x-hdx-transform: my_transform
      name: custom-ingest-pool
      service: intake-head

would be handled by the custom-ingest-pool ingest pool.
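
The order of precedence above can be modeled in a few lines of Python. This is only a sketch of the documented behavior, not how Traefik evaluates rules internally: the function names, the "default" pool name, and the choice to return the first match within a tier are assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlsplit


def pool_matches(pool, query, headers):
    """True if every configured query parameter and header is present."""
    routing = pool.get("routing", {})
    wanted_query = routing.get("query_params", {})
    wanted_headers = routing.get("headers", {})
    if not wanted_query and not wanted_headers:
        return False  # pools without routing conditions never match dynamically
    return all(query.get(k) == v for k, v in wanted_query.items()) and all(
        headers.get(k.lower()) == v for k, v in wanted_headers.items()
    )


def resolve_pool(url, headers, pools, default_pool="default"):
    """Pick a pool name for an ingest request, following the documented order."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")

    # 1. Explicit pool endpoint: /pool/{pool_name}/... always wins.
    if len(segments) >= 2 and segments[0] == "pool":
        return segments[1]

    query = dict(parse_qsl(parts.query))
    headers = {k.lower(): v for k, v in headers.items()}
    matches = [n for n, p in pools.items() if pool_matches(p, query, headers)]

    # 2. Pools matched through query parameters come before
    # 3. pools matched through headers only.
    with_query = [n for n in matches if pools[n]["routing"].get("query_params")]
    if with_query:
        return with_query[0]  # overlapping rules in one tier are not modeled here
    if matches:
        return matches[0]
    return default_pool


pools = {
    "custom-ingest-pool": {
        "routing": {"query_params": {"table": "my_table", "transform": "my_transform"}},
    },
    "secondary-pool": {
        "routing": {"headers": {"x-hdx-myheader": "secondary_pool"}},
    },
}

print(resolve_pool(
    "https://myhost.hydrolix.live/ingest/event?table=my_table&transform=my_transform",
    {"x-hdx-myheader": "secondary_pool"},
    pools,
))  # prints custom-ingest-pool
```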

Overlapping rules

Multiple rules can match an incoming request's header and query parameter configuration.

In this case, Traefik determines which ingest pool will handle the request using a rules length priority calculation.

For example, given the following configuration:

spec:
  pools:
    long-rule-pool:
      routing:
        query_params:
          table: my_table
          transform: my_transform
      name: long-rule-pool
      service: intake-head
    short-rule-pool:
      routing:
        query_params:
          table: my_table
      name: short-rule-pool
      service: intake-head

This generates the Traefik rules:

rule: PathPrefix(`/pool/long-rule-pool`) || PathPrefix(`/ingest`) && Query(`table`, `my_table`) && Query(`transform`, `my_transform`)
rule: PathPrefix(`/pool/short-rule-pool`) || PathPrefix(`/ingest`) && Query(`table`, `my_table`)

A request sent to https://{myhost}.hydrolix.live/ingest/event?table=my_table&transform=my_transform matches both rules. The longer rule has priority, so long-rule-pool processes the incoming request.
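
When no explicit priority is set on a router, Traefik uses the length of the rule string as its priority. The short Python sketch below applies that comparison to the two rules from this example, written with Traefik's backtick quoting. It assumes the generated routers carry no explicit priority override.

```python
# Rule strings for the two pools in the example configuration.
rules = {
    "long-rule-pool": (
        "PathPrefix(`/pool/long-rule-pool`) || PathPrefix(`/ingest`) "
        "&& Query(`table`, `my_table`) && Query(`transform`, `my_transform`)"
    ),
    "short-rule-pool": (
        "PathPrefix(`/pool/short-rule-pool`) || PathPrefix(`/ingest`) "
        "&& Query(`table`, `my_table`)"
    ),
}

# Default priority is the character length of the rule; the highest value wins.
priorities = {pool: len(rule) for pool, rule in rules.items()}
winner = max(priorities, key=priorities.get)

print(winner)  # prints long-rule-pool
```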