Integrate Sphinx into Drizzle

Registered by CaptTofu

Integrate Sphinx more closely with Drizzle

Blueprint information

Status:
Not started
Approver:
None
Priority:
Low
Drafter:
CaptTofu
Direction:
Needs approval
Assignee:
CaptTofu
Definition:
New
Series goal:
Accepted for trunk
Implementation:
Not started
Milestone target:
Future

Related branches

Sprints

Whiteboard

Discussed with Andrew on July 23rd; four possible approaches came up:

1. have a wrapper on the db side that manages the .conf file, launches indexer, and talks to searchd, but does nothing itself
2. have a more tightly integrated wrapper on the db side that links to libsphinx directly and manages indexes itself; it does not care about indexer/searchd/sphinx.conf, and in fact does not want to
3. wait an unspecified amount of time until at least alpha dynamic updates are implemented, then have a wrapper that uses them by talking to searchd
4. same as 3, but by linking against libsphinx instead of talking to searchd

From further discussion, it seems option 2 is the most feasible and best choice. This option would have these requirements/issues:

* libsphinx is used within the Drizzle server
* only the needed functionality that searchd performs is taken from libsphinx
* configuration information is stored somewhere - my.cnf?
* the data source for Sphinx sits at the handler or storage engine level (?)
* searchd currently listens, logs, and proxies; the key is to identify which of those are needed. Logging can be performed by Drizzle.
* details of indexes are "hidden" from the user
* How do we get a query within Drizzle to result in the index being searched? What syntax is used - FT functions, or new Sphinx functions?
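A rough sketch of what the option-2 wrapper's surface might look like, assuming the index machinery is linked into the server. Every name here is invented for illustration (none of this is the real libsphinx API): it only shows where searchd's responsibilities would move - logging to Drizzle, configuration to server options, index details hidden behind a small interface.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical configuration passed in from server options,
// replacing sphinx.conf (names are assumptions, not libsphinx).
struct FtIndexConfig {
  std::string name;       // internal index name, not user-visible
  std::string data_path;  // where index data would live
};

// Hypothetical hooks the server would call; the wrapper manages
// indexes itself and never talks to indexer or searchd.
class EmbeddedFtIndex {
 public:
  virtual ~EmbeddedFtIndex() {}
  // CREATE INDEX hook: build from rows supplied by the storage layer.
  virtual void build(const std::vector<std::pair<int, std::string>>& rows) = 0;
  // SELECT hook: answer a full-text query against the built index.
  virtual std::vector<int> search(const std::string& term) const = 0;
  // DROP INDEX hook: discard the index.
  virtual void drop() = 0;
};

// Toy in-memory stand-in so the interface shape can be exercised.
class ToyFtIndex : public EmbeddedFtIndex {
 public:
  explicit ToyFtIndex(const FtIndexConfig& cfg) : cfg_(cfg) {}
  void build(const std::vector<std::pair<int, std::string>>& rows) override {
    for (const auto& row : rows) {
      std::istringstream words(row.second);
      std::string w;
      while (words >> w) postings_[w].insert(row.first);
    }
  }
  std::vector<int> search(const std::string& term) const override {
    auto it = postings_.find(term);
    if (it == postings_.end()) return {};
    return std::vector<int>(it->second.begin(), it->second.end());
  }
  void drop() override { postings_.clear(); }

 private:
  FtIndexConfig cfg_;
  std::map<std::string, std::set<int>> postings_;
};
```

The open syntax question above (FT functions versus new Sphinx functions) only affects how `search()` gets invoked from the parser; the interface itself is agnostic.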

1st stage/version:

1) hook CREATE INDEX (or something similar) and build an index (slowly) from it
2) hook SELECT and do the searching
3) do not handle INSERT/UPDATE/DELETE at this stage

The idea is to get something that at least searches the index; perhaps the index is built upon 'create index' being issued. The index is built once and can then be searched. Nothing else is implemented at this stage. It is also possible to have 'drop index' delete the index.
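The build-once semantics above can be sketched as follows (a toy model, not Sphinx code): CREATE INDEX snapshots the table, SELECT searches that snapshot, and because INSERT/UPDATE/DELETE are not handled yet, the index goes stale until it is rebuilt.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Stage-1 lifecycle sketch with invented names: one full build,
// read-only searches, no incremental updates.
class StageOneIndex {
 public:
  // CREATE INDEX hook: one slow, full build over the table's current rows.
  void create_index(const std::map<int, std::string>& table) {
    postings_.clear();
    for (const auto& row : table) index_row(row.first, row.second);
  }
  // SELECT hook: searches the snapshot taken at CREATE INDEX time.
  std::vector<int> select_matching(const std::string& term) const {
    auto it = postings_.find(term);
    if (it == postings_.end()) return {};
    return std::vector<int>(it->second.begin(), it->second.end());
  }
  // DROP INDEX hook: possible at this stage too.
  void drop_index() { postings_.clear(); }

 private:
  void index_row(int id, const std::string& text) {
    std::istringstream words(text);
    std::string w;
    while (words >> w) postings_[w].insert(id);
  }
  std::map<std::string, std::set<int>> postings_;
};
```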

2nd stage/version:
1) Build() (builds the index) - where does it get its data source - the handler? No need for a database client library; this works at a lower level.
(shodan) this belongs to the 1st stage in fact - Build() should be implemented for CREATE INDEX anyway
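One way to picture Build() pulling rows from the handler level rather than through a client library (all names are assumptions for illustration): the storage layer hands Build() a row cursor, and Build() consumes it directly inside the server.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Row {
  int id;
  std::string text;
};

// Hypothetical handler-level scan: no client library, just a cursor
// the storage engine supplies from inside the server.
class RowCursor {
 public:
  virtual ~RowCursor() {}
  virtual bool next(Row* out) = 0;  // false at end of scan
};

// In-memory stand-in for a table scan, for demonstration only.
class VectorCursor : public RowCursor {
 public:
  explicit VectorCursor(std::vector<Row> rows) : rows_(std::move(rows)) {}
  bool next(Row* out) override {
    if (pos_ >= rows_.size()) return false;
    *out = rows_[pos_++];
    return true;
  }

 private:
  std::vector<Row> rows_;
  size_t pos_ = 0;
};

// Build() pulls every row once through the cursor and indexes it.
std::map<std::string, std::set<int>> Build(RowCursor& cursor) {
  std::map<std::string, std::set<int>> postings;
  Row row;
  while (cursor.next(&row)) {
    std::istringstream words(row.text);
    std::string w;
    while (words >> w) postings[w].insert(row.id);
  }
  return postings;
}
```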

Other useful info:

Searchd
* 6500 lines of code..
* first 1400 are logging, globals, helpers, network buffers, etc.
* 800 lines of distributed querying code...
* 50 lines of schema minimization..
* 50 lines of really old network proto fixup..
* 600 lines of parsing the network search request and emitting network search result..
* 800 lines that do all the searching including distributed stuff, multiquery optimizations, and merging search results across several indexes.
* 100 lines of search command network chatter again... ;)
* 200 lines of index rotation, as well as search unrelated (excerpts updates buildkeywords etc)
* The remaining 2000 lines cover some async ops, signal handling, config parsing, startup in general, and all that "main loop" stuff.

Of these:
the 800 + 50 + 800 lines could be reused if you want distributed support,
and less than 1000 would be needed if you don't.


Work Items
