Full Text Search
Hydrolix supports full text search.
Text is split into major and minor segments. Each segment type has its own set of separators.
Major separators:
[ ] < > ( ) { } | ! ; , ' " * \n \r \s \t & ? +
Minor separators:
/ : = @ . - $ # % \ _
Let's take the following log message:
66.249.65.159 - - [06/Nov/2014:19:10:38 +0600] "GET /news/53f8d72920ba2744fe873ebc.html HTTP/1.1" 404 177 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
The first major segment is: 66.249.65.159
It is then split into the following minor segments:
- 66
- 249
- 65
- 159
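The two-level split described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Hydrolix's actual indexing code: the names `split_on` and `segments` are hypothetical, and it assumes that \n, \r, \s and \t in the separator list stand for newline, carriage return, space and tab.

```python
# Default separators as listed above. Assumption for this sketch:
# \n, \r, \s and \t mean newline, carriage return, space and tab.
MAJOR_SEPARATORS = set("[]<>(){}|!;,'\"*&?+") | {"\n", "\r", " ", "\t"}
MINOR_SEPARATORS = set("/:=@.-$#%\\_")


def split_on(text, separators):
    """Split text on any single-character separator, dropping empty pieces."""
    pieces, current = [], []
    for ch in text:
        if ch in separators:
            if current:
                pieces.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        pieces.append("".join(current))
    return pieces


def segments(text):
    """Return (major_segment, [minor_segments]) pairs for the given text."""
    return [
        (major, split_on(major, MINOR_SEPARATORS))
        for major in split_on(text, MAJOR_SEPARATORS)
    ]


if __name__ == "__main__":
    log = '66.249.65.159 - - [06/Nov/2014:19:10:38 +0600] "GET /news/53f8d72920ba2744fe873ebc.html HTTP/1.1" 404 177'
    for major, minors in segments(log):
        print(major, minors)
```

In this sketch the first pair printed for the sample log line is the major segment 66.249.65.159 with the minor segments 66, 249, 65 and 159, matching the breakdown above.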
To enable full text search, define the following in the transform:
{
  "name": "message",
  "datatype": {
    "type": "string",
    "index": true,
    "index_options": {
      "fulltext": true,
      "major_separators": "[ ] < > ( ) { } | ! ; , ' \" * \\n \\r \\s \\t & ? +",
      "minor_separators": "\/ : = @ . - $ # % \\ _"
    }
  }
}
In this example, the column message is a string with full text search enabled using the default separators.
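Because the separator strings contain quotes and backslashes, the JSON escaping is easy to get wrong when editing them. A minimal check, using only Python's standard json module, is to parse the column definition and list the separators it decodes to. The variable names are illustrative, and splitting the strings on spaces is an assumption based on how the defaults are written above.

```python
import json

# The column definition from the transform, kept in a raw string so the
# JSON escapes (\" \\n \/ ...) reach the parser unchanged.
column_json = r'''
{
  "name": "message",
  "datatype": {
    "type": "string",
    "index": true,
    "index_options": {
      "fulltext": true,
      "major_separators": "[ ] < > ( ) { } | ! ; , ' \" * \\n \\r \\s \\t & ? +",
      "minor_separators": "\/ : = @ . - $ # % \\ _"
    }
  }
}
'''

column = json.loads(column_json)  # raises json.JSONDecodeError if the escaping is wrong
options = column["datatype"]["index_options"]

# Assumption: separators are space-delimited inside each string.
major = options["major_separators"].split(" ")
minor = options["minor_separators"].split(" ")

print(len(major), "major separators:", major)
print(len(minor), "minor separators:", minor)
```

After decoding, \\n becomes the two-character token \n and \/ becomes a plain /, which is the form shown in the separator lists at the top of this page.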
By default, Hydrolix uses the LIKE operator to search the full text index:
SELECT message
FROM project.table
WHERE message LIKE '%error%'
AND timestamp < now()
AND timestamp > (now() - INTERVAL 60 MINUTE)
ORDER BY timestamp DESC
LIMIT 50
SETTINGS hdx_query_debug=true
In this example we are looking for the word error in the column message over the last hour.
With query debugging enabled (hdx_query_debug=true), the response shows that the query used the index:
X-Hdx-Query-Stats: exec_time=107 rows_read=0 bytes_read=0 num_partitions=58 num_peers=3 query_attempts=1 memory_usage=9491822
index_stats=[{"project.table":{"columns_read":["message","timestamp"],"indexes_used":["message","timestamp"],"shard_key_values_used":[]}}]
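To check index usage programmatically, the index_stats part of the debug output can be parsed as JSON. The following is a rough sketch that assumes the output has exactly the shape shown above; the function name `indexes_used` is hypothetical and not part of any Hydrolix client.

```python
import json


def indexes_used(index_stats_line):
    """Map each table to the indexed columns the query used.

    Assumes a line of the form index_stats=[{...}] as in the sample above.
    """
    _, _, payload = index_stats_line.partition("index_stats=")
    result = {}
    for entry in json.loads(payload):
        for table, stats in entry.items():
            result[table] = stats.get("indexes_used", [])
    return result


if __name__ == "__main__":
    line = (
        'index_stats=[{"project.table":{"columns_read":["message","timestamp"],'
        '"indexes_used":["message","timestamp"],"shard_key_values_used":[]}}]'
    )
    used = indexes_used(line)
    print(used)
    # The full text index was used if the searched column is listed.
    print("message" in used["project.table"])
```

If the searched column does not appear under indexes_used, the query did not use the full text index for that column.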
By enabling Full Text Search you'll be able to filter and search for words much faster using standard delimiters.