A new InnoDB variable to control whether InnoDB FTS should ignore the stopword list

Registered by Yura Sorokin

https://github.com/percona/percona-server/pull/1988

Description:
Ngram indexes also check the stopword list to see whether any indexed token *contains* one of the words on that list. This is the expected behaviour, but I don't think the default stopword table is suitable for use with ngram.

For example, any token that contains 'a' or 'i' will be ignored. If you have the word "east", you cannot search for "ea" because that token was never indexed.
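
The effect can be reproduced with a small model of the ngram tokenizer. This is only a sketch, not InnoDB's code: it assumes a token size of 2 (the default 'ngram_token_size'), a stopword list reduced to the two entries mentioned above, and the hypothetical helper names 'ngrams' and 'indexed_tokens'.

```python
# Simplified model of ngram stopword filtering; not InnoDB's implementation.
STOPWORDS = {"a", "i"}  # assumption: only the two entries named in the text


def ngrams(word, n=2):
    """Split a word into overlapping n-character tokens."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def indexed_tokens(word, stopwords=STOPWORDS, n=2):
    """Return the ngram tokens that survive stopword filtering.

    A token is dropped when it *contains* any stopword, which mirrors the
    behaviour described above for ngram indexes.
    """
    return [t for t in ngrams(word, n) if not any(s in t for s in stopwords)]


print(indexed_tokens("east"))                   # ['st']
print(indexed_tokens("east", stopwords=set()))  # ['ea', 'as', 'st']
```

With the two-entry list, only "st" is indexed, so a search for "ea" finds nothing. With an empty list, all three tokens are kept.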

Ngram should have a different default list of stopwords, or an empty list.

Suggestion:
Introduce a new 'innodb_ft_ignore_stopwords' InnoDB session/global variable which can be set to 'ON' to instruct InnoDB Full Text Search to ignore stopwords.

Notes:
Although this variable is introduced to resolve the ngram issue, it also affects non-ngram FTS, where it has exactly the same meaning: when it is enabled, FTS does not check whether the current token is a stopword when building or updating an FTS index. However, being a stopword does not only mean being one of the predefined words on the list. Tokens shorter than 'innodb_ft_min_token_size' or longer than 'innodb_ft_max_token_size' are also considered stopwords. Therefore, when 'innodb_ft_ignore_stopwords' is set to 'ON', 'innodb_ft_min_token_size' and 'innodb_ft_max_token_size' are ignored even for non-ngram FTS, which means that very short and very long words are indexed as well.
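
The interaction with the token size limits can be illustrated with a short model. It is a sketch under stated assumptions, not InnoDB's code: the size limits use the documented defaults (3 and 84), the stopword list is a small illustrative subset, and 'is_stopword' and 'tokens_to_index' are hypothetical names.

```python
# Simplified model of the stopword check described in the notes;
# not InnoDB's implementation.
MIN_TOKEN_SIZE = 3   # default innodb_ft_min_token_size
MAX_TOKEN_SIZE = 84  # default innodb_ft_max_token_size
STOPWORD_LIST = {"the", "of"}  # assumption: small illustrative list


def is_stopword(token):
    """A token counts as a stopword if it is listed or outside the size limits."""
    return (
        token in STOPWORD_LIST
        or len(token) < MIN_TOKEN_SIZE
        or len(token) > MAX_TOKEN_SIZE
    )


def tokens_to_index(tokens, ignore_stopwords=False):
    """Return the tokens that would be added to the index."""
    if ignore_stopwords:
        # Models innodb_ft_ignore_stopwords = ON: the stopword check is
        # skipped entirely, so the size limits are not applied either.
        return list(tokens)
    return [t for t in tokens if not is_stopword(t)]


words = ["the", "db", "engine", "x" * 100]
print(len(tokens_to_index(words)))                         # 1
print(len(tokens_to_index(words, ignore_stopwords=True)))  # 4
```

In the default case only "engine" is indexed. With the flag on, the listed word, the two-character word, and the 100-character word are all indexed too.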

See also:
https://bugs.launchpad.net/percona-server/+bug/1679135
https://bugs.mysql.com/bug.php?id=84420

Blueprint information

Status: Complete
Approver: Laurynas Biveinis
Priority: High
Drafter: Yura Sorokin
Direction: Approved
Assignee: Yura Sorokin
Definition: Approved
Series goal: Accepted for 5.7
Implementation: Implemented
Milestone target: 5.7.20-18
Started by: Yura Sorokin
Completed by: Yura Sorokin
