Full Text Search
Full text search improves performance when searching for a specific word, term, or character string in a column.
Example
Consider the following log message:
66.249.65.159 - - [06/Nov/2014:19:10:38 +0600] "GET /news/53f8d72920ba2744fe873ebc.html HTTP/1.1" 404 177 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Based on the above separator rules, full text search finds a first major segment value of 66.249.65.159
.
This major segment contains the following minor segments:
66
249
65
159
Enable
To enable full text search, set the following configuration in your transform on the column you would like to full text search:
- set
index
totrue
- set
index_options
to the following:
{
"name": "message",
"datatype": {
"type":"string",
"index": true,
"index_options": {
"fulltext": true
}
}
}
You can use LIKE
to search the full text index:
SELECT message
FROM project.table
WHERE message LIKE '%error%'
AND timestamp < now()
AND timestamp > (now() - INTERVAL 60 MINUTE)
ORDER BY timestamp DESC
LIMIT 50
SETTINGS hdx_query_debug=true
This example searches for the word error
in the message
column within the last hour.
You can confirm that your query uses the full text index with query debug. Check to make sure your full text indexed column name appears in the indexes_used
field:
X-Hdx-Query-Stats: exec_time=107 rows_read=0 bytes_read=0 num_partitions=58 num_peers=3 query_attempts=1 memory_usage=9491822
index_stats=[{"project.table":{"columns_read":["message","timestamp"],"indexes_used":["message","timestamp"],"shard_key_values_used":[]}}]
Separators
Full text search splits text into major and minor segments. Hydrolix determines these segments using the following separators:
Major separators
[]
<>
()
{}
|
!
;
,
'
"
*
\n
\r
\s
\t
&
?
+
Minor Separators
/
:
=
@
.
-
$
#
%
\
_
Best Practices
When field values contain punctuation symbols, follow these best practices for optimum performance:
- Match punctuation symbols with literals, not wildcards.
- Avoid wildcards in the beginning or middle of a term, which can reduce search performance and lead to unexpected results.
Updated 3 months ago