Skip to content

Calculated Columns

Overview⚓︎

Hydrolix separates the write schema (storage) from the read schema (views). Calculated columns exist only in the read schema and are calculated at query time from a SQL expression. They're defined in the table schema, and can be referenced as output once the schema is applied.

Calculated columns can make queries simpler by combining logic and related fields into a single reusable expression.

Calculated columns aren't the same as the aliases attribute for stored columns.

Calculated columns define new columns in views that aren't stored. The aliases attribute refers to column renaming functions.

Define calculated columns⚓︎

Calculated columns aren't created in transforms, but you can reference them in the output_columns section once they're defined in the table schema.

In this example, the day of the week expression converts the date into a more readable output.

  1. Add a new ALIAS column to the columns list:
    • Add the name to reference in queries.
    • Add the type: ALIAS.
    • Add the SQL "expression".
  2. Save the schema.

    Day of the Week
    1
    2
    3
    4
    5
    {
      "name": "day_of_week",
      "type": "ALIAS",
      "expression": "toDayOfWeek(timestamp)"
    }
    

Use calculated columns⚓︎

When querying a table, calculated columns behave like a shortcut for an expression.

For example, a calculated column may be an expression that calculates a value added tax (VAT) rate, and can be reused anywhere that calculation is needed.

Column schema (column object)⚓︎

An example of the differences between a real column and a calculated column.

## Real column
{
  "id": "1234",
  "name": "userAgent",
  "type": "string",
  "aliases": ["userAgent", "ua"],
  "index": true
},

## Calculated column
{
  "name": "ua_normalized",
  "type": "string",
  "expression": "coalesce(userAgent, UA)",
  "index": false
}

Schema examples⚓︎

Calculated columns use:

  • name - the column name used for identification
  • expression - required for calculated columns
  • type - inferred automatically

Chain calculated columns⚓︎

In this example, one calculated column references another, but isn't a cyclical chain.

{
  "name": "alias_int_1",
  "type": "int32",
  "expression": "int + 1"
},
{
  "name": "alias_int_2",
  "type": "int64",
  "expression": "alias_int_1 * 2"
}

In transform output_columns⚓︎

The output_columns section can include a defined calculated column like ua_normalized. This ensures consistent logic without rewriting expressions in every transform or query.

"output_columns": [
  {
    "name": "userAgent",
    "id": "userAgent_uuid",
    "datatype": { "type": "string" }
  },
  {
    "name": "ua_normalized",
    "expression": "coalesce(userAgent, UA)"
  }
]

Timestamp parsing⚓︎

1
2
3
4
5
{
  "name": "event_hour",
  "type": "ALIAS",
  "expression": "toStartOfHour(event_time)"
}

This calculated column groups data by hour without storing a separate value.

Calculate with expressions⚓︎

In this example, the total calculated column expands to price + (price * tax_rate) when the query runs:

1
2
3
4
5
{
  "name": "total",
  "type": "uint32",
  "expression": "price + (price * tax_rate)"
}

The SQL query is now much shorter with total:

1
2
3
4
5
SELECT
  price,
  tax_rate,
  total
FROM products;

The total calculated column isn't stored, but is calculated each time the query runs.

Rules and limitations⚓︎

Query behavior⚓︎

  • The query must specifically reference the calculated column by name. For example, SELECT calculated_column FROM works, but SELECT * doesn't.
  • Calculated columns can be used with WHERE, JOIN, and SELECT.
  • Calculated columns can't be indexed. To optimize performance, index the underlying columns instead.
  • Calculated columns can't be used as inputs for summary tables. As an alternative, use standard SQL aggregation aliases in your SELECT statement when defining a summary table.
  • The calculated column type must be valid. For example, the expression must resolve to a supported data type, and can't contain tuples.
  • Calculated columns can be deleted if they have no dependencies.

Insertion and storage⚓︎

  • Calculated columns aren't stored and don't exist on disk. The values are calculated when they're run in a query.
  • You can't insert data directly into a calculated column.
  • Calculated columns aren't stored in partitions, or returned in .medatata or .catalog.

Naming rules⚓︎

A calculated column can reference constants, regular columns, or other calculated columns. However, calculated columns can't reference each other in a circular way. For example, if calculated_col_1 uses calculated_col_2, then calculated_col_2 can't use calculated_col_1.

If calculated columns are set incorrectly, you'll receive an error indicating Alias loop detected or Cyclic alias dependency.

See an example of a Chain calculated columns.