A hash is a small, fixed-size value calculated from a larger piece of data. A given hash function typically returns the same hash for the same data on every call. Hashes can be used cryptographically, but they can also be used to map values for fast processing or distribution. For example, hashes are used in "HashMaps" to help quickly find a value.

A hash is also called a "hash value", "hash code", or "digest".

A "collision" occurs when two different pieces of data hash to the same hash value. Collisions are not a problem for every application of hashes.

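Cryptographically secure hashes are meant to be irreversible, meaning it is extremely hard, or practically impossible, to recreate the original value from the hashed value. They are also designed so that finding two different inputs with the same hash is computationally infeasible, so in practice each unique input produces a unique hash.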
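
The properties described above (fixed size, consistency between calls) can be illustrated outside the database. The following is a minimal Python sketch using the standard hashlib module; the helper name digest_hex is hypothetical and is not one of the functions documented on this page.

```python
import hashlib


def digest_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


short_digest = digest_hex(b"a")
long_digest = digest_hex(b"a" * 1_000_000)

# Fixed size: a 1-byte input and a 1 MB input both give 64 hex characters.
print(len(short_digest), len(long_digest))

# Consistent: hashing the same data again gives the same digest.
print(digest_hex(b"a") == short_digest)

# Different inputs give different digests in this case.
print(short_digest != long_digest)
```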

cityHash64

Returns a 64 bit CityHash of a value. This hash is NOT cryptographically secure.

Parameters:

  • the_value - the value to hash. Takes all data types.

Usage:

  • cityHash64(the_value) - Takes a comma delimited list of n values.

Returns:

  • UInt64 representing the 64 bit CityHash of the value.

farmHash64

FarmHash is a successor to cityHash. It returns a 64 bit FarmHash of a String. This hash is NOT cryptographically secure.

Parameters:

  • the_value - the value to hash.

Usage:

  • farmHash64(the_value) - Takes a comma delimited list of n values.

Returns:

  • UInt64 representing the 64 bit FarmHash of the value.

halfMD5

Returns a UInt64 that represents the first 8 bytes of the MD5 of a String, reinterpreted as a UInt64. To get the String value, hex can be used.

Note: This function is not cryptographically secure.

Parameters:

  • the_string - the value to hash. Only takes String values

Usage:

  • halfMD5(the_string) - to return the UInt64 value
  • lower(hex(halfMD5(the_string))) - the String value of halfMD5 calculation

Returns:

  • UInt64 representing the 64 bit halfMD5 hash of the value.
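
To make "the first 8 bytes of the MD5 reinterpreted as UInt64" concrete, here is a rough Python sketch for a single string. It assumes the 8 bytes are read in big-endian order, which is what makes the hex form line up with the start of the MD5 hex string; the function name half_md5 is hypothetical, and the sketch is not guaranteed to match halfMD5 for every input.

```python
import hashlib


def half_md5(text: str) -> int:
    """First 8 bytes of the MD5 digest, read as an unsigned 64-bit integer.

    Assumption: the bytes are interpreted in big-endian order.
    """
    digest = hashlib.md5(text.encode("utf-8")).digest()  # 16 raw bytes
    return int.from_bytes(digest[:8], byteorder="big")


value = half_md5("abc")
print(value)

# Formatting as 16 hex digits gives the first half of the MD5 hex string.
print(format(value, "016x"))
```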

hiveHash

HiveHash is a hashing algorithm that was part of early Apache Hive releases, similar to javaHash. It should not be used except as needed when interacting with systems that use this method. In the example query, you can see that javaHash and hiveHash return the same values except on the DateTime data type.

Note: This function is not cryptographically secure.

Parameters:

  • the_value - the value to hash. Takes any data type. Only accepts one value.

Usage:

  • hiveHash(the_value)

Returns:

  • Int32 hash of the_value. If the_value is an integer, that integer IS the hash value returned.

intHash 32|64

intHash32 and intHash64 return a hash of an integer value. Both are relatively fast and generate hashes of average quality. intHash64 is a bit faster than intHash32.

Note: These are not cryptographically secure hashes.

Parameters:

  • the_integer - the value to hash. Data types other than integers will cause an error. Only accepts one value.

Usage:

  • intHash32(the_integer)
  • intHash64(the_integer)

Returns:

  • UInt32 or UInt64 hash of the_integer, depending on which specific method was called.
  • Values other than integers throw an error.

javaHash

javaHash calculates the hash of a value using the Java String.hashCode() default implementation.

Note: This function is not cryptographically secure.

Parameters:

  • the_value - the value to hash. Takes any data type. Only accepts one value.

Usage:

  • javaHash(the_value)

Returns:

  • Int32 hash of the_value. If the_value is an integer, that integer IS the hash value returned.
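
For reference, Java's String.hashCode() computes h = 31 * h + c over the string's UTF-16 code units using wrapping 32-bit arithmetic. Below is a minimal Python sketch of that calculation; the name java_string_hash is hypothetical, and how javaHash treats non-ASCII bytes is not verified here, so treat it as an illustration for plain ASCII strings.

```python
def java_string_hash(text: str) -> int:
    """Mimic Java's String.hashCode(): h = 31 * h + code_unit, wrapped to Int32."""
    data = text.encode("utf-16-le")  # Java strings are sequences of UTF-16 code units
    h = 0
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], byteorder="little")
        h = (31 * h + unit) & 0xFFFFFFFF  # keep only the low 32 bits
    # Reinterpret the unsigned 32-bit result as a signed Int32.
    return h - 0x100000000 if h >= 0x80000000 else h


print(java_string_hash("hello"))

# "Aa" and "BB" are a well-known collision for this hash.
print(java_string_hash("Aa"), java_string_hash("BB"))
```

The collision in the last line is one reason this hash is unsuitable for anything security related.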

javaHashUTF16LE

javaHashUTF16LE is the same as javaHash, but assumes the value is encoded as UTF-16LE (little-endian UTF-16).

Note: This function is not cryptographically secure.

Parameters:

  • the_value - the value to hash. Takes any data type. Only accepts one value.

Usage:

  • javaHashUTF16LE(the_value)

Returns:

  • Int32 hash of the_value. If the_value is an integer, that integer IS the hash value returned.

jumpConsistentHash

jumpConsistentHash is a hashing function typically used for "sharding" data across several partitions in a distributed system.

Parameters:

  • the_key - the integer to hash. Accepts integers only.
  • num_buckets - the number of buckets, or partitions, the data needs to be divided into. Accepts integers only.

Usage:

  • jumpConsistentHash(the_key, num_buckets)

Returns:

  • Int32 representing jumpConsistentHash result.
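
The published jump consistent hash algorithm (Lamping and Veach) is short enough to sketch in Python. This is an illustration of the general technique under the assumption that the function follows that algorithm; the name jump_consistent_hash is hypothetical, and bucket-for-bucket agreement with jumpConsistentHash is not verified here.

```python
MASK64 = (1 << 64) - 1


def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Map an integer key to a bucket in the range [0, num_buckets)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    key &= MASK64  # treat the key as an unsigned 64-bit integer
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # Linear congruential step, wrapped to 64 bits.
        key = (key * 2862933555777941757 + 1) & MASK64
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b


# Growing from 10 to 11 buckets only moves keys into the new bucket (10).
moved = [k for k in range(1000)
         if jump_consistent_hash(k, 10) != jump_consistent_hash(k, 11)]
print(len(moved), all(jump_consistent_hash(k, 11) == 10 for k in moved))
```

The useful property for sharding is that adding a bucket relocates only a small share of the keys, and every relocated key lands in the new bucket.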

MD5

MD5 returns a FixedString that represents the MD5 hash of the value.

To use this hash as a String for comparison, it needs to be converted to a String using lower + hex.

Note: This function is not cryptographically secure.

Parameters:

  • the_string - the value to hash. Only takes String values. Only accepts one value.

Usage:

  • MD5(the_string) - the FixedString value.
  • lower(hex(MD5(the_string))) - the String value of MD5 calculation that can be used for comparison purposes.

Returns:

  • FixedString representing the MD5 hash

metroHash64

MetroHash is a more performant successor to farmHash. It returns a 64 bit MetroHash of a String. This hash is NOT cryptographically secure.

Parameters:

  • the_value - the value to hash.

Usage:

  • metroHash64(the_value)

Returns:

  • UInt64 representing the 64 bit MetroHash of the value.

murmurHash2 32|64, murmurHash3 32|64

Murmur Hashes are hash functions intended to help with fast lookups. Murmur 2 is a predecessor to Murmur 3. Murmur 2 is provided for compatibility with systems that use this hash method; Murmur 3 is recommended for new applications.

Note: These functions are not cryptographically secure.

Parameters:

  • the_value - the value to hash. Accepts a comma delimited list of n arguments.

Usage:

  • murmurHash2_32(the_value)
  • murmurHash2_64(the_value)
  • murmurHash3_32(the_value)
  • murmurHash3_64(the_value)

Returns:

  • UInt of the appropriate length (32 or 64) based on the method called.

murmurHash3_128

Murmur Hashes are hash functions intended to help with fast lookups. murmurHash3_128 returns a FixedString representing the 128-bit murmur3 hash of the value.

Note: This function is not cryptographically secure.

Parameters:

  • the_string - the String to hash. Only accepts one string

Usage:

  • murmurHash3_128(the_string)

Returns:

  • FixedString representing the 128-bit murmur3 hash of the_value

SHA1

Secure Hash Algorithms, or SHAs, are hash functions intended for cryptography.

SHA1 is similar to MD5 and has not been considered cryptographically secure since 2010. It is only provided for backward compatibility with older applications. However, SHA1 is frequently used to validate whether or not data has been corrupted.

Parameters:

  • the_string - the value to hash. Only takes String values. Only accepts one value.

Usage:

  • SHA1(the_string) - the FixedString[20] value.
  • lower(hex(SHA1(the_string))) - the String value of SHA1 calculation that can be used for comparison purposes.

Returns:

  • FixedString[20] representing the SHA1 hash

SHA 224|256

Secure Hash Algorithm 2, or SHA-2, is a set of cryptographic algorithms that are still deemed reasonably secure, though they can be broken with a significant amount of computation. They were built to address security problems in SHA-1.

SHA-256 is computed with 32-bit words.
SHA-224 is a truncated version of SHA-256.

These both tend to be compute intensive, and therefore slow, to run.

Parameters:

  • the_string - the value to hash. Only takes String values. Only accepts one value.

Usage:

  • SHA256(the_string) - the FixedString[32] value.
  • lower(hex(SHA256(the_string))) - the String value of SHA256 calculation that can be used for comparison purposes.
  • SHA224(the_string) - the FixedString[28] value.
  • lower(hex(SHA224(the_string))) - the String value of SHA224 calculation that can be used for comparison purposes.

Returns:

  • FixedString[32] for SHA-256
  • FixedString[28] for SHA-224
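
The FixedString lengths listed for SHA1, SHA224, and SHA256 correspond to the raw digest sizes of those algorithms. The following Python sketch uses the standard hashlib module to show the raw sizes next to the hex form used for comparison; the helper name raw_and_hex is hypothetical.

```python
import hashlib


def raw_and_hex(algorithm: str, data: bytes) -> tuple[bytes, str]:
    """Return the raw digest bytes and their lowercase hex representation."""
    digest = hashlib.new(algorithm, data).digest()
    return digest, digest.hex()


for name in ("sha1", "sha224", "sha256"):
    raw, text = raw_and_hex(name, b"abc")
    # The raw digest is 20, 28, or 32 bytes; the hex string is twice as long.
    print(name, len(raw), len(text), text)
```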

sipHash64

SipHash is used for message authentication and has performance comparable to cityHash. sipHash64 returns a UInt64. sipHash64 is 3x faster than MD5. Each argument is hashed individually, then the hashes are combined iteratively: the hashes of value1 and value2 are hashed together to make hash1, then the hash of value3 is hashed with hash1 to get a new hash. This is repeated until all of the hashes have been processed.

Parameters:

  • the_value - the value to hash. Takes any data type. Accepts a comma delimited list of n values.

Usage:

  • sipHash64(the_value)
  • sipHash64(the_value1, the_value2...)

Returns:

  • UInt64 representing the hash code.
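
The way multiple arguments are combined for sipHash64 (hash each value, then fold the hashes together one at a time) can be sketched in Python. SipHash is not exposed in the Python standard library, so this sketch swaps in BLAKE2b with an 8-byte digest as a stand-in 64-bit hash; the names hash64, combine, and fold_hash are hypothetical, and the output will NOT match sipHash64. Only the folding pattern is illustrated.

```python
import hashlib


def hash64(data: bytes) -> int:
    """Stand-in 64-bit hash (BLAKE2b with an 8-byte digest), NOT SipHash."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def combine(left: int, right: int) -> int:
    """Hash two 64-bit hashes together into a new 64-bit hash."""
    return hash64(left.to_bytes(8, "big") + right.to_bytes(8, "big"))


def fold_hash(*values: str) -> int:
    """Hash each value individually, then fold the hashes left to right."""
    if not values:
        raise ValueError("at least one value is required")
    hashes = [hash64(v.encode("utf-8")) for v in values]
    result = hashes[0]
    for h in hashes[1:]:
        result = combine(result, h)
    return result


# Argument order matters because the hashes are folded in sequence.
print(fold_hash("a", "b", "c"))
print(fold_hash("c", "b", "a"))
```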

sipHash128

SipHash is used as a message authentication protocol with comparable performance to cityHash. sipHash128 returns a 128 bit hash code as a FixedString.

Parameters:

  • the_string - the value to hash. Only takes String values. Only accepts one value.

Usage:

  • sipHash128(the_string)

Returns:

  • FixedString[16]

URLHash

URLHash is a function specific to Yandex.Metrica. It returns a hash of a URL. Optionally, it can be told to hash only up to a certain level of the URL hierarchy. It is not intended for cryptographic purposes.

Parameters:

  • the_url - the URL, represented as a String to hash
  • OPTIONAL:url_level - the level of the URL hierarchy to hash up to.

Usage:

  • URLHash(the_url)
  • URLHash(the_url, url_level)

Returns:

  • FixedString[16]

xxHash 32|64

xxHash is a very fast non-cryptographic hash function. Returns a UInt.

Parameters:

  • the_string - the value to hash. Only takes String values. Only accepts one value.

Usage:

  • xxHash32(the_string)
  • xxHash64(the_string)

Returns:

  • UInt32 or UInt64 depending on which method is called
