RPM

Ensure that solvedb lookups scale

Registered by Jeff Johnson

There are several scaling issues with solvedbs:

1) The "best match" policy forces every match to be examined.
    One fix is a "first-found" policy that uses the first matching solution.
    A "nearest match" policy for intra-solvedb affinity (i.e. prefer
    answers from the same solvedb) needs to be tried as well.

2) A failed search MUST loop over all solvedbs.
    A Bloom filter attached to each solvedb avoids unnecessary lookups.

3) (generation) Only the Providename and Filepaths indices are currently needed/used.
    Setups that create large per-file indices SHOULD be avoided. Perhaps create only
    Packages, and then rely on lazy index creation?

4) Tuning for solvedbs currently uses the defaults (and DB_CONFIG setup is manual).
    (bdb) Setting mp_mmapsize to 25% of available memory is a win.
    (bdb generate) Using "nofsync" is another win. O_DIRECT should be looked at too.
    (bdb generate) "private" is another win, since it disables locking overhead.

5) Lookups in multiple solvedbs might benefit from multi-threading and/or map/reduce.
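For item 1, here is a minimal C sketch of the three lookup policies. It runs against hypothetical in-memory stand-ins for solvedbs. `entry_t`, `solvedb_t`, `solvedb_lookup()`, and the integer `score` used to rank candidates are all illustrative assumptions, not RPM's rpmdb API. The sketch counts the entries it compares so the cost difference is visible. "Best match" must scan every entry of every solvedb. "First-found" stops at the first hit. "Nearest match" is modeled here, as an assumption, as trying a caller-chosen `home` solvedb before the others.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one solvedb row (not RPM's data layout). */
typedef struct {
    const char *provides;   /* provided name, e.g. "libc.so.6" */
    const char *package;    /* package that provides it */
    int score;              /* assumed ranking: higher is "better" */
} entry_t;

typedef struct {
    const entry_t *entries;
    size_t n;
} solvedb_t;

typedef enum { POLICY_BEST_MATCH, POLICY_FIRST_FOUND, POLICY_NEAREST_MATCH } policy_t;

typedef struct {
    const entry_t *hit;     /* chosen solution, or NULL */
    size_t db;              /* index of the solvedb that supplied it */
    size_t examined;        /* entries compared: the cost being measured */
} result_t;

/* Scan one solvedb; keep the highest-scoring match, or stop at the first. */
static const entry_t *scan_db(const solvedb_t *db, const char *name,
                              int stop_at_first, size_t *examined)
{
    const entry_t *best = NULL;
    for (size_t i = 0; i < db->n; i++) {
        (*examined)++;
        if (strcmp(db->entries[i].provides, name) != 0)
            continue;
        if (best == NULL || db->entries[i].score > best->score)
            best = &db->entries[i];
        if (stop_at_first)
            break;
    }
    return best;
}

result_t solvedb_lookup(const solvedb_t *dbs, size_t ndbs, const char *name,
                        policy_t policy, size_t home)
{
    result_t r = { NULL, 0, 0 };
    assert(dbs != NULL || ndbs == 0);

    /* Nearest match: affinity for the preferred ("home") solvedb. */
    if (policy == POLICY_NEAREST_MATCH && home < ndbs) {
        r.hit = scan_db(&dbs[home], name, 1, &r.examined);
        if (r.hit != NULL) {
            r.db = home;
            return r;
        }
    }
    for (size_t d = 0; d < ndbs; d++) {
        if (policy == POLICY_NEAREST_MATCH && d == home)
            continue;       /* already searched above */
        const entry_t *e = scan_db(&dbs[d], name,
                                   policy != POLICY_BEST_MATCH, &r.examined);
        if (e == NULL)
            continue;
        if (policy != POLICY_BEST_MATCH) {
            r.hit = e;      /* first-found: stop at the first solvedb that matches */
            r.db = d;
            return r;
        }
        if (r.hit == NULL || e->score > r.hit->score) {
            r.hit = e;      /* best match: keep looking across all solvedbs */
            r.db = d;
        }
    }
    return r;
}
```

The trade-off with first-found is that the order of the solvedbs becomes part of the policy, because whichever solvedb is searched first wins.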
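For item 2, this is a minimal sketch of a per-solvedb Bloom filter. The filter is consulted before a solvedb is opened, so a failed search can skip every solvedb whose filter answers "definitely absent." The hash choice is an assumption: 64-bit FNV-1a, with a second seeded variant for double hashing. `bloom_t`, `bloom_add()`, and `bloom_maybe_contains()` are likewise hypothetical names, not existing RPM code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOOM_SEED2 0x9e3779b97f4a7c15ULL   /* arbitrary seed for the second hash */

/* Hypothetical Bloom filter over the names one solvedb can answer for. */
typedef struct {
    uint8_t *bits;
    size_t nbits;
    unsigned k;              /* probes per key */
} bloom_t;

/* 64-bit FNV-1a; the seed perturbs the offset basis to give a second hash. */
static uint64_t fnv1a(const char *s, uint64_t seed)
{
    uint64_t h = 14695981039346656037ULL ^ seed;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

int bloom_init(bloom_t *b, size_t nbits, unsigned k)
{
    assert(b != NULL && nbits > 0 && k > 0);
    b->bits = calloc((nbits + 7) / 8, 1);   /* all bits start cleared */
    b->nbits = nbits;
    b->k = k;
    return b->bits != NULL ? 0 : -1;
}

void bloom_free(bloom_t *b)
{
    free(b->bits);
    b->bits = NULL;
}

/* Double hashing: probe i lands on bit (h1 + i * h2) mod nbits. */
static size_t bloom_bit(const bloom_t *b, uint64_t h1, uint64_t h2, unsigned i)
{
    return (size_t)((h1 + (uint64_t)i * h2) % b->nbits);
}

/* Called for every provide/file name while the solvedb is generated. */
void bloom_add(bloom_t *b, const char *key)
{
    uint64_t h1 = fnv1a(key, 0), h2 = fnv1a(key, BLOOM_SEED2) | 1;
    for (unsigned i = 0; i < b->k; i++) {
        size_t bit = bloom_bit(b, h1, h2, i);
        b->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* 0: key is definitely not in this solvedb, so skip the lookup.
 * 1: key may be present (false positives happen); do the real lookup. */
int bloom_maybe_contains(const bloom_t *b, const char *key)
{
    uint64_t h1 = fnv1a(key, 0), h2 = fnv1a(key, BLOOM_SEED2) | 1;
    for (unsigned i = 0; i < b->k; i++) {
        size_t bit = bloom_bit(b, h1, h2, i);
        if ((b->bits[bit / 8] & (1u << (bit % 8))) == 0)
            return 0;
    }
    return 1;
}
```

As a sizing rule of thumb, about 10 bits per key with k = 7 gives roughly a 1% false-positive rate. A plain Bloom filter cannot delete keys, so it would need to be rebuilt whenever its solvedb is regenerated.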
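For item 3, the "create only Packages, then index lazily" idea can be sketched as an index that is built on the first lookup instead of at generation time. Everything in this sketch is a hypothetical simplification. That covers `pkg_t`, `lazydb_t`, and `lookup_provides()`, as well as the assumption of one provide per package. It also covers the sorted array searched with `bsearch()`, which stands in for a Berkeley DB secondary index such as Providename.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "Packages" record: one provide per package keeps it short. */
typedef struct {
    const char *name;
    const char *provides;
} pkg_t;

typedef struct {
    const char *key;        /* provided name */
    size_t row;             /* row in the Packages array */
} idx_entry_t;

typedef struct {
    const pkg_t *pkgs;      /* the only data written at generation time */
    size_t npkgs;
    idx_entry_t *idx;       /* Providename-like index; NULL until first lookup */
    unsigned builds;        /* how many times the index was built */
} lazydb_t;

static int cmp_entry(const void *a, const void *b)
{
    return strcmp(((const idx_entry_t *)a)->key, ((const idx_entry_t *)b)->key);
}

/* Build the index from Packages: this is the work deferred from generation. */
static int build_index(lazydb_t *db)
{
    idx_entry_t *idx = malloc((db->npkgs ? db->npkgs : 1) * sizeof *idx);
    if (idx == NULL)
        return -1;
    for (size_t i = 0; i < db->npkgs; i++) {
        idx[i].key = db->pkgs[i].provides;
        idx[i].row = i;
    }
    qsort(idx, db->npkgs, sizeof *idx, cmp_entry);
    db->idx = idx;
    db->builds++;
    return 0;
}

const pkg_t *lookup_provides(lazydb_t *db, const char *name)
{
    assert(db != NULL && name != NULL);
    if (db->idx == NULL && build_index(db) != 0)   /* first lookup pays the cost */
        return NULL;
    idx_entry_t probe = { name, 0 };
    const idx_entry_t *hit = bsearch(&probe, db->idx, db->npkgs,
                                     sizeof *db->idx, cmp_entry);
    return hit != NULL ? &db->pkgs[hit->row] : NULL;
}

void lazydb_free(lazydb_t *db)
{
    free(db->idx);
    db->idx = NULL;
}
```

The index cost is deferred, not removed. Generation gets faster, and the first query against each solvedb gets slower instead.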
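For item 4, the "mp_mmapsize at 25% of available memory" heuristic can be turned into a DB_CONFIG line instead of being set up by hand. Berkeley DB's DB_CONFIG file accepts a `set_mp_mmapsize <bytes>` line. The helper names `phys_mem_bytes()` and `format_mmapsize_line()` are hypothetical. One assumption is worth flagging: this sketch reads *total* physical memory. If "available" is meant as currently free memory, glibc's `_SC_AVPHYS_PAGES` is the counterpart. `_SC_PHYS_PAGES` itself is a common extension (glibc, BSDs) rather than strict POSIX.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Total physical memory in bytes, or 0 if sysconf cannot report it. */
unsigned long long phys_mem_bytes(void)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long pagesize = sysconf(_SC_PAGESIZE);
    if (pages <= 0 || pagesize <= 0)
        return 0;
    return (unsigned long long)pages * (unsigned long long)pagesize;
}

/* Format a DB_CONFIG line setting mp_mmapsize to 25% of mem_bytes.
 * Returns the line length, or -1 if buf is too small. */
int format_mmapsize_line(char *buf, size_t len, unsigned long long mem_bytes)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "set_mp_mmapsize %llu\n", mem_bytes / 4);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

For example, 8 GiB of memory yields `set_mp_mmapsize 2147483648`. Berkeley DB reads DB_CONFIG from the environment home directory when the environment is opened, so the line would go into the solvedb's home directory before generation or lookup.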
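For item 5, one way to parallelize lookups is a tiny map/reduce with POSIX threads. Each thread scans one solvedb (map). The caller then joins the threads and keeps the earliest solvedb, in search order, that matched (reduce), which preserves first-found semantics. The types, `parallel_lookup()`, and the `MAX_DBS` limit are hypothetical. The in-memory name arrays stand in for real Berkeley DB handles, which have their own thread-safety requirements that this sketch ignores.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_DBS 16                   /* hypothetical cap on solvedbs */

typedef struct {
    const char *const *provides;     /* hypothetical: one solvedb's provide names */
    size_t n;
} solvedb_t;

typedef struct {
    const solvedb_t *db;
    const char *name;
    long found;                      /* map output: row in this solvedb, or -1 */
} job_t;

/* Map step: each worker searches exactly one solvedb. */
static void *map_one(void *arg)
{
    job_t *job = arg;
    job->found = -1;
    for (size_t i = 0; i < job->db->n; i++) {
        if (strcmp(job->db->provides[i], job->name) == 0) {
            job->found = (long)i;
            break;
        }
    }
    return NULL;
}

/* Returns the index of the first solvedb (in search order) with a match,
 * or -1; if row is non-NULL it receives the matching row. */
int parallel_lookup(const solvedb_t *dbs, size_t ndbs, const char *name, long *row)
{
    pthread_t tids[MAX_DBS];
    job_t jobs[MAX_DBS];
    int threaded[MAX_DBS];
    assert(ndbs <= MAX_DBS);

    for (size_t d = 0; d < ndbs; d++) {
        jobs[d] = (job_t){ &dbs[d], name, -1 };
        threaded[d] = pthread_create(&tids[d], NULL, map_one, &jobs[d]) == 0;
        if (!threaded[d])
            map_one(&jobs[d]);       /* fall back to a serial scan */
    }

    /* Reduce step: wait for every worker, keep the earliest solvedb that matched. */
    int hit = -1;
    for (size_t d = 0; d < ndbs; d++) {
        if (threaded[d])
            pthread_join(tids[d], NULL);
        if (hit < 0 && jobs[d].found >= 0) {
            hit = (int)d;
            if (row != NULL)
                *row = jobs[d].found;
        }
    }
    return hit;
}
```

This lowers latency, not total work: a failed search still touches every solvedb, so it pairs naturally with per-solvedb Bloom filters from item 2. Build with `-pthread`.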

Blueprint information

Status: Started
Approver: Jeff Johnson
Priority: Medium
Drafter: None
Direction: Approved
Assignee: Jeff Johnson
Definition: Discussion
Series goal: Accepted for 5.3
Implementation: Good progress
Milestone target: 5.3.6
Started by: Jeff Johnson

Whiteboard

Investigated: Pokylinux solvedb generation for ~5000 pkgs went from 24min to 2min after some tuning fiddle-ups.
Enabling O_DIRECT is slower than leaving it disabled. There's a /proc "swappiness" tunable (iirc) to try instead.
Disabling fsync with "nofsync" still appears to be the biggest win.

