background-statistics

Registered by Vladimir Kolesnikov

Currently, key distribution statistics are gathered in the foreground, which makes it hard to maintain fine-grained statistics. The idea is to move statistics gathering to a background thread. This would keep the stats more relevant and allow them to be refreshed more flexibly as new changes come in.
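As a rough illustration of the foreground/background split, here is a minimal Python sketch. The class and method names (BackgroundStatsCollector, record_insert, snapshot, stop) are hypothetical and not part of any existing code; the sketch assumes that foreground code only enqueues changed keys and that a single worker thread folds them into per-key counts.

```python
import queue
import threading
from collections import Counter


class BackgroundStatsCollector:
    """Hypothetical collector: callers enqueue keys, a worker thread counts them."""

    _STOP = object()  # sentinel that tells the worker to exit

    def __init__(self):
        self._changes = queue.Queue()
        self._counts = Counter()
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record_insert(self, key):
        # Cheap for the foreground caller: no statistics work happens here.
        self._changes.put(key)

    def _run(self):
        while True:
            key = self._changes.get()
            if key is self._STOP:
                break
            with self._lock:
                self._counts[key] += 1

    def snapshot(self):
        # Copy under the lock so readers never see a half-applied update.
        with self._lock:
            return dict(self._counts)

    def stop(self):
        # The queue is FIFO, so everything queued before the sentinel is processed first.
        self._changes.put(self._STOP)
        self._worker.join()
```

In this sketch the cost paid in the foreground is a single queue put per change; how often the worker persists or rebuilds the stats is left open.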

Having a background stats collector would also make it possible to implement a more sophisticated stats gathering algorithm. Some ideas for the algorithm:

1. Keep an index-like on-disk structure (probably in a separate file, one file per key or per field) with a key1 -> count1; key2 -> count2; ... layout. This would be interpreted as "there are count2 - count1 records between key1 and key2".

2. Approximate the actual value distribution with a well-known distribution. This way we would only need to store a distribution id and a few distribution-specific parameters.
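The first idea (key -> count pairs) could be sketched in memory as follows. This is only an illustration under assumptions: the class name KeyCountIndex is hypothetical, the counts are taken to be cumulative, and a lookup key that is not a stored boundary is rounded down to the nearest stored boundary, so estimates are exact only at the stored keys.

```python
import bisect


class KeyCountIndex:
    """Hypothetical in-memory stand-in for the key -> cumulative count layout."""

    def __init__(self, boundaries):
        # boundaries: (key, cumulative_count) pairs, sorted by key.
        self._keys = [k for k, _ in boundaries]
        self._counts = [c for _, c in boundaries]

    def _cumulative(self, key):
        # Count stored at the last boundary <= key, or 0 before the first boundary.
        i = bisect.bisect_right(self._keys, key)
        return self._counts[i - 1] if i else 0

    def estimate_between(self, low, high):
        # "There are count2 - count1 records between key1 and key2."
        return max(0, self._cumulative(high) - self._cumulative(low))


index = KeyCountIndex([(10, 0), (20, 500), (30, 800)])
between_10_and_20 = index.estimate_between(10, 20)
between_20_and_30 = index.estimate_between(20, 30)
```

The more boundaries are stored, the finer the estimate, at the cost of a larger structure to keep up to date.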
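The second idea (a distribution id plus parameters) could look like the sketch below. It assumes numeric key values and uses a normal distribution purely as an example of a "well-known distribution"; the function names and the dictionary layout are hypothetical. It relies on NormalDist from the Python standard library, which needs at least two samples and a non-zero standard deviation.

```python
from statistics import NormalDist


def fit_summary(samples):
    """Reduce the samples to a distribution id and its parameters (assumed layout)."""
    dist = NormalDist.from_samples(samples)
    return {"dist_id": "normal", "params": (dist.mean, dist.stdev)}


def estimate_fraction_between(summary, low, high):
    """Estimated fraction of records whose value lies between low and high."""
    if summary["dist_id"] != "normal":
        raise ValueError("only the normal distribution is handled in this sketch")
    mean, stdev = summary["params"]
    dist = NormalDist(mean, stdev)
    return dist.cdf(high) - dist.cdf(low)


summary = fit_summary([10, 20, 30, 40, 50])
upper_half = estimate_fraction_between(summary, 30, 1e9)
```

Multiplying the returned fraction by the total record count gives an estimated number of records in the range. The stored state is just the id and two numbers, but the estimate is only as good as the fit.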

Blueprint information

Status: Not started
Approver: None
Priority: Undefined
Drafter: None
Direction: Needs approval
Assignee: None
Definition: New
Series goal: None
Implementation: Unknown
Milestone target: None
