Transform Structure

A transform (or write schema) is a JSON document that configures how Hydrolix will index and store incoming data onto a given table. A transform's content includes a general description of the incoming data's format, as well as a description of every column that data will produce and populate in the target table.

The following example transform would let a Hydrolix table ingest a GZIP-compressed, CSV-formatted data stream with two columns:

{
    "name": "my_special_transform",
    "description": "description of my data",
    "type": "csv",
    "settings": {
        "output_columns": [
            {
                "name": "timestamp",
                "datatype": {
                    "type": "datetime",
                    "primary": true,
                    "format": "2006-01-02 15:04:05 MST",
                    "source": { "from_input_index": 0 }
                }
            },
            {
                "name": "the_data",
                "datatype": {
                    "type": "uint64",
                    "source": { "from_input_index": 1 }
                }
            }
        ],
        "compression": "gzip",
        "format_details": {
            "skip_head": 1,
            "delimiter": "\t"
        }
    }
}

Basic properties

Transform documents have the following top-level properties:

PropertyPurpose
nameA name for this transform; must be unique within its table.
descriptionAn optional description of this transform.
typeThe file type of the input data this transform handles. One of csv, or json.
settingsA JSON object describing the data to ingest.

Settings

"settings": {
	"is_default": true,
    "compression": "gzip",
    "sql_transform": "select ..... from {STREAM}",
    "null_values" : ["^","-"],
    "format_details": {
        // Configuration specific to the data's overall 
        // format goes here
    },
    "output_columns": [
        // A list of column-definition objects goes here
    ]
}

The settings object contains the details of the content and structure of the data. This is where the data input format, compression, and resulting columns are defined.

PropertyPurposeRequired
is_defaultIf set to true, then this transform's table will apply this transform to a streaming ingest that doesn't otherwise specify a transform. Default is required for auto-ingestY
output_columnsAn array of JSON objects with a description of every column in the target table that the incoming data will populate. See Column definitionsY
compressionA string describing the compression applied to the data (or "none", the default). See CompressionY
format_detailsA JSON object containing configuration information specific to the data's overall format, just as field-delimiter characters with CSV files. See Format OptionsY
sql_transformA SQL statement runs at ingest allowing users to aggregate, enrich, modify and/or filter the data.
See Real-Time Enrichment and filtering at Ingest and Summary Tables (Aggregation)
N
null_valuesAn array of string which will be replaced by null if found in the value.
For example ["^","-"] will replace ^ OR - by null on any column.
N

Settings: Output Columns

The core of a transform is the Output Columns section under settings. This section is where you define what the data looks like and how it should be loaded into the Hydrolix platform. These definitions are used to specify Data-Types, indexing and other characteristics your data should have when it is being written into the HDX format.

"settings": {
   "output_columns": [
        {
            "name": "timestamp",
            "datatype": {
                "type": "datetime",
                "primary": true,
                "format": "2006-01-02 15:04:05 MST",
                "source": { "from_input_index": 0 }
            }
        },
        {
            "name": "data",
            "datatype": {
                "type": "uint64",
                "source": { "from_input_index": 1 }
            }
        },
        {
            ...
    ],
    
    // Other "settings" properties go here

}

A transform must contain one object in its output_columns list for every column in the target table that it ingests.
This includes the table's primary timestamp column, as well as any column that will copy a new value based directly on incoming data. The transform enables Hydrolix to copy these values into the target table by specifying a positional (CSV) or name-based (JSON) mapping between the incoming data fields and the table's columns.

The transform can also set columns to derive a new value based on a JavaScript expression, or a simple default.

The properties supported by column definition objects include the following:

AttributePurposeRequired
nameA name for the column, for use by subsequent SQL queries.Yes
datatypeAn object containing a detailed column definition.Yes

The datatype property requires an object as its value in order to allow recursive definitions, as described in complex data types. The properties supported by datatype include the following:

👍

Primary

Every table has to have a primary field set. This is a datetime/epoch field that is a primary index used by Hydrolix for partitioning.

AttributePurposeRequired
catch_allPrimarily used with JSON encoded datasets. If true, then Hydrolix will put any data that is not matched in JSON into this field as a string.No
catch_rejectsCatches rejects where a column exists however the data type does not match the incoming data. See the Rejects page for more information.No
defaultA default value to apply when the source data's value for this column is empty.No
denullifyForces a column to have a "non-nullable" entry. For example a String is stored as "" or a UInt8 as 0.No
elementsA description of the constituent elements of array-type data. See arrays.No, except for array and map datatypes.
formatParsing information for datetime and epoch data types. See Timestamp formatting.No, except for datetime and epoch.
ignoreBoolean to specify if the field is ignored or not by indexer. An output column marked as ignore:true will not be processed and will not be available to the SQL transform nor appear in the output. Compare this to suppress and virtual in this table.No
indexEnable indexing on the column. Defaults to true (indexed), except datatype double.No
index_optionsElement which contains advanced indexing options such as fulltext boolean.
More details on fulltext page.
No
limitsReject rows or clamp values based on rules. More information can be found on these here: LimitersNo
null_valuesArray of strings which are replaced by a null.
Example:
["^","-"] will replace ^ OR - by null.
No
primaryIf true, then this marks the target table's single, primary field. It must have either datetime or epoch type.Exactly one per transform.
resolutionSet to "ms" (milliseconds) or "s" (seconds) to define the time-granularity that Hydrolix will apply when storing this column's datetime or epoch-type data. Defaults to "s" (seconds).No.
scriptA JavaScript expression whose output becomes the stored value of this column. See Scripting and Auto Values.No.
sourceUsed with CSV and JSON encoded files, if specified this is used as the source column name. When data is then written to the database the column will be called by the "name" value above. It is possible to specify multiple source columns, the first value found for the column is the one that is stored. This is best used when there is an OR requirement between two columns (e.g. If column A has a value OR column B has a value then store it in this column).No
suppressBoolean to state that the field is not available in the output, but is available to an SQL transform. In other words, If a field is marked "suppress": true, the field will be processed normally but will be suppressed in the final output. Compare this to ignore and virtual in this table.No
typeThe data type of the field.Yes
virtualIf true, then Hydrolix will not map this column to any incoming data. Useful for constant or derived data when paired with the default or script settings, respectively. Compare this to ignore and suppress in this table.No