String functions are grouped into the following catagories:

The majority of Search Functions take one or more regular expressions as a pattern. The regex syntax is re2. You can test expressions online here.

General Functions

startsWith

Checks if the string starts with the specified prefix

Parameters:

  • string: the target String
  • prefix: the prefix String

Usage:

  • startsWith(string, prefix)

Returns:

  • UInt8 0 | 1

endsWith

Checks if the string ends with the specified suffix.

Parameters:

  • string: the target String
  • suffix: the suffix String

Usage:

  • endsWith(string,suffix)

Returns:

  • UInt8 0 | 1

substring

Extracts substring starting with the byte from the 'offset' index that is 'length' bytes long. Character indexing starts from one (as in standard SQL).

The 'offset' and 'length' arguments must be constants.

Parameters:

  • string: the target String
  • offset: starting byte
  • length: number of bytes long

Usage:

  • substring(string, offset, length)

Returns:

  • String - extracted substring

upper

Converts ASCII Latin symbols in a string to uppercase. For UTF-8 encoded strings use upperUTF8(string).

Parameters:

  • string: the target String

Usage:

  • upper(string)
  • upperUTF8(string)

Returns:

  • String

lower

Converts ASCII Latin symbols in a string to lowercase. For UTF-8 encoded strings use lowerUTF8.

Parameters:

  • string: the target String

Usage:

  • lower(string)
  • lowerUTF8(string)

Returns:

  • String

length

Calculates the length of a string in bytes. For UTF-8 encoded strings use lengthUTF8(string). Returns the length in Unicode code points.

Parameters:

  • string: the target String

Usage:

  • length(string)
  • lengthUTF8(string)

Returns:

  • UInt32 - length in bytes

format

Applies formatting constant pattern with the string listed in the arguments. The pattern is a simplified Python format pattern.

Format pattern contains "replacement fields" surrounded by curly braces {}. Anything that is not contained in braces is considered literal text, which is copied unchanged to the output.

Brace character in the literal text, need to be escaped by doubling: {{ and }}. Field names can be numbers (starting from zero) or empty (then they are treated as consequence numbers).

Parameters:

  • pattern: the formating pattern to apply
  • string: 1 or more string arguments

Usage:

  • format(pattern, string0, string1...)

Returns:

  • String

empty, notEmpty

Checks if a string is empty or inversely not-empty. A string is considered non-empty if it contains at least one byte, even if this is a space or a null byte.

Parameters:

  • string: the target String

Usage:

  • empty(string)
  • notEmpty(string)

Returns:

  • UInt8 0 false| 1 true

trimRight, trimLeft, trimBoth

Removes all consecutive occurrences of common whitespace (ASCII character 32) from the beginning, end or both of a string. It doesn't remove other kinds of whitespace characters (tab, no-break space, etc.).

Parameters:

  • string: the target String

Usage:

  • trimRight(string)
  • trimLeft(string)
  • trimBoth(string)

Returns:

  • String with whitespace removed

trim

Removes all specified characters from a string depending on mode. Defaults to trimBoth(string) usage if mode not specified.

Parameters:

  • mode (Optional): LEADING|TRAILING|BOTH
  • characters: a string of characters
  • string: the target String

Usage:

  • trim('% ' FROM string)
  • trim(LEADING '% ' FROM string)
  • trim(TRAILING '% ' FROM string)
  • trim(BOTH '% ' FROM string)

Returns:

  • String with characters removed

repeat

Repeats a string as many times as specified and concatenates the replicated values as a single string. If n < 1 an empty string is returned.

Parameters:

  • string: the target String
  • n: number of times to repeat

Usage:

  • repeat(string, n)

Returns:

  • String

concat

Concatenates the strings listed in the arguments, without a separator.

Parameters:

  • string: listed String

Usage:

  • concat(string0, string1, ...)

Returns:

  • String

reverse

Reverses the string (as a sequence of bytes). For UTF-8 encoded strings use reverseUTF8(string).

Parameters:

  • string: the target String

Usage:

  • reverse(string)
  • reverseUTF8(string)

Returns:

  • String

appendTrailingCharIfAbsent

Appends specified character to the end of a string - except when the string is empty or specified character is already at the end.

Parameters:

  • string: the target String
  • character: a single char

Usage:

  • appendTrailingCharIfAbsent(string,character)

Returns:

  • String

base64Encode

Encodes string into base64

Parameters:

  • string: the target String

Usage:

  • base64Encode(string)

Returns:

  • String

tryBase64Decode, base64Decode

Decode base64-encoded string into original string. tryBase64Decode returns an empty string on failure while base64Decode raises an exception .

Parameters:

  • string: the target String

Usage:

  • tryBase64Decode(string)
  • base64Decode(string)

Returns:

  • String

CRC32, CRC64

Computes the checksum of a string.

CRC32 using IEEE 802.3 polynomial and CRC64 using CRC-64-ECMA polynomial.

Initial value 0xffffffff (zlib implementation).

Parameters:

  • string: the target String

Usage:

  • CRC32(string)
  • CRC64(string)

Returns:

  • UInt32 CRC32
  • UInt64 CRC64

Search Functions

position

Searches for a pattern within string. Positions are indexed starting from 1. For case-insensitive position or/and in UTF-8 format use functions
positionCaseInsensitiveUTF8, positionUTF8.

Parameters:

  • string: the target String
  • regex: regex string pattern

Usage:

  • position(string, regex)

Returns:

  • UInt16 - in bytes if found else 0

multiSearchAllPositions

Searches for a list of patterns within a string. Positions are indexed starting from 1.

The search is performed on sequences of bytes without respect to string encoding and collation. For case-insensitive search or/and in UTF-8 format use functions multiSearchAllPositionsCaseInsensitive, multiSearchAllPositionsUTF8, multiSearchAllPositionsCaseInsensitiveUTF8.

Parameters:

  • string: the target String
  • regex: a list of regex string patterns

Usage:

  • multiSearchAllPositions(string, [regix1, regex2, ...])

Returns:

  • Array(Integer) - in bytes if found else 0

multiSearchAny

Searches a list of regex patterns matching the string.

Parameters:

  • string: the target String
  • regex: a list of regex string patterns

Usage:

  • multiSearchAny(string, [regex1, regex2, ..])

Returns:

  • UInt8 0 | 1

multiSearchFirstIndex

Searches a list of regex patterns stopping at the first (left most) patterns that matches.

For a case-insensitive search or/and in UTF-8 format use functions multiSearchFirstIndexCaseInsensitive, multiSearchFirstIndexUTF8, multiSearchFirstIndexCaseInsensitiveUTF8

Parameters:

  • string: the target String
  • regex: a list of regex string patterns

Usage:

  • multiSearchFirstIndex(string, [regex1, regex2, ...])

Returns:

  • UInt16 - the index (starting from 1) of found pattern else 0

match

Search a string for matching pattern.

Parameters:

  • string: the target String
  • regex: a regex String pattern

Usage:

  • match(string, regex')

Returns:

  • UInt8 0 | 1

extract

Extracts a fragment of a string using a regular expression.

If the regex doesn't contain subpatterns, it takes the fragment that matches the entire regex. Otherwise, it takes the fragment that matches the first subpattern.

Parameters:

  • string: the target String
  • regex: a regex string pattern

Usage:

  • extract(string, regex)

Returns:

  • String - empty string if no match

extractAll

Extracts all the fragments of a string using a regular expression.

Parameters:

  • string: the target String
  • regex: a regex string pattern

Usage:

  • extractAll(string, regex)

Returns:

  • String Array - empty string if no match

like

Checks if a string matches a simple regular expression.

The regular expression can contain the metasymbols % and _. Use the backslash (\) for escaping metasymbols.

  • % indicates any quantity of any bytes (including zero characters).

  • _ indicates any one byte.

Parameters:

  • string: the target String
  • regex: a simple regex string pattern

Usage:

  • like(string, regex)

Returns:

  • UInt8 0 | 1

notLike

Checks if a string does not matches a simple regular expression. Opposite of the like function.

Parameters:

  • string: the target String
  • regex: a regex string pattern

Usage:

  • notLike(string, regex)

Returns:

  • UInt8 0 | 1

ngramDistance

Calculates the 4-gram distance between string and pattern: counts the symmetric difference between two multisets of 4-grams and normalizes it by the sum of their cardinalities.

For case-insensitive search or/and in UTF-8 format use functions ngramDistanceCaseInsensitive, ngramDistanceUTF8, ngramDistanceCaseInsensitiveUTF8.

Parameters:

  • string: the target String
  • regex: a regex string pattern

Usage:

  • ngramDistance(string, regex)

Returns:

  • Float32 between 0 and 1. The closer to zero, the more strings are similar to each other

ngramSearch

Same as ngramDistance but calculates the non-symmetric difference between pattern and string -- the number of n-grams from pattern minus the common number of n-grams normalized by the number of pattern n-grams. The closer to one, the more likely pattern is in the string. Can be useful for fuzzy string search.

For case-insensitive search or/and in UTF-8 format use functions ngramSearchCaseInsensitive, ngramSearchUTF8, ngramSearchCaseInsensitiveUTF8.

Parameters:

  • string: the target String
  • regex: a regex string pattern

Usage:

  • ngramSearch(sring, regex)

Returns:

  • Float32 between 0 and 1. The closer to zero, the more strings are similar to each other

Split Merge Functions

splitByChar

Splits a string into substrings separated by character.

Empty substrings may be selected if the separator occurs at the beginning or end of the string, or if there are multiple consecutive separators.

Parameters:

  • seperator: character to seperate on
  • string: the target String

Usage:

  • splitByChar(seperator, string)

Returns:

  • String array

splitByString

Splits a string into substrings separated by multiple characters as the separator. The string must be non-empty.

Parameters:

  • seperator: the string to seperate on
  • string: the target String

Usage:

  • splitByString(sperator, string)

Returns:

  • String array

alphaTokens

Selects substrings of consecutive bytes from the ranges a-z and A-Z.

Parameters:

  • string: the target String

Usage:

  • alphaTokens(string)

Returns:

  • String array

arrayStringConcat

Concatenates the strings listed in the array with the separator. The separator is optional (default empty string).

Parameters:

  • string_array: an array of strings
  • seperator (Optional): seperating character

Usage:

  • arrayStringConcat(string_array)
  • arrayStringConcat(string_array, seperator)

Returns:

  • String

Did this page help you?