RPM

Multi-threaded --verify

Registered by Jeff Johnson

OpenMP #pragmas were added to --verify as a proof of concept
for multi-threaded dependency assertion checks.

The larger parallelization speedups come from running the high-level
--verify tasks in parallel. The --verify tasks are:
   verify the header signature/digest
   verify a single header's dependencies
   verify the file content digests contained in a header
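
As a rough illustration of running the three tasks above in parallel, here is
a minimal sketch using OpenMP sections. The struct hdr_stub type and the
verify_*() functions are hypothetical stand-ins, not RPM's actual API; the real
code operates on opaque headers through accessors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified header record; real RPM headers are opaque. */
struct hdr_stub {
    int sig_ok;    /* 1 if the signature/digest would verify */
    int deps_ok;   /* 1 if all dependencies would be satisfied */
    int files_ok;  /* 1 if all file digests would match */
};

/* Stand-ins for the three --verify tasks; each returns 0 on success. */
static int verify_signature(const struct hdr_stub *h)    { return h->sig_ok ? 0 : 1; }
static int verify_dependencies(const struct hdr_stub *h) { return h->deps_ok ? 0 : 1; }
static int verify_file_digests(const struct hdr_stub *h) { return h->files_ok ? 0 : 1; }

/*
 * Run the three tasks for one header, each in its own OpenMP section.
 * Every section writes a distinct result variable and only reads *h,
 * so no locking is needed. Returns the number of failed tasks.
 */
static int verify_header(const struct hdr_stub *h)
{
    int rc_sig = 0, rc_deps = 0, rc_files = 0;

    assert(h != NULL);

    #pragma omp parallel sections
    {
        #pragma omp section
        rc_sig = verify_signature(h);

        #pragma omp section
        rc_deps = verify_dependencies(h);

        #pragma omp section
        rc_files = verify_file_digests(h);
    }
    return rc_sig + rc_deps + rc_files;
}
```

Built without OpenMP support (e.g. without gcc's -fopenmp), the pragmas are
ignored and the three tasks simply run one after another, which makes it easy
to compare results with and without parallelization.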

The overall speed-up using multiple threads was ~1.5x (from memory; measured
with callgrind). The results were also checked with helgrind/valgrind
to ensure identical results with and without parallelization, and to attempt
to detect raciness. There are still some races when calling rpmtsCheck(), but
in general all the data is read-only while computing assertion values, so the
races don't matter much. The critically important races were fixed (but there
is likely more to do).

The bottleneck to performance turned out to be searching the dependency
namespace table: the linear search was replaced with a binary search.
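
The change from linear to binary search could look roughly like the sketch
below, which uses the standard C bsearch() over a table kept sorted by name.
The table contents, the ids, and the ns_lookup() name are illustrative
assumptions, not the actual RPM namespace table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical namespace table entry; names and ids are illustrative. */
struct ns_entry {
    const char *name;
    int id;
};

/* Must stay sorted by name (strcmp order) for bsearch() to work. */
static const struct ns_entry ns_table[] = {
    { "config", 1 },
    { "exists", 2 },
    { "rpmlib", 3 },
    { "uname",  4 },
    { "user",   5 },
};

/* bsearch() comparator: key is a C string, elem is a table entry. */
static int ns_cmp(const void *key, const void *elem)
{
    const char *name = key;
    const struct ns_entry *e = elem;
    return strcmp(name, e->name);
}

/* Returns the namespace id, or -1 if name is not in the table. */
static int ns_lookup(const char *name)
{
    const struct ns_entry *e;

    assert(name != NULL);
    e = bsearch(name, ns_table,
                sizeof(ns_table) / sizeof(ns_table[0]),
                sizeof(ns_table[0]), ns_cmp);
    return e ? e->id : -1;
}
```

With this sketch, ns_lookup("rpmlib") returns 3 and an unknown name returns -1.
The table is read-only, so concurrent lookups from several threads need no
locking.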

The dependency checking per se isn't really amenable to parallelization:
removing redundant lookups will be a more important speed-up
(but that will need a memoization stage).
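
One possible shape for such a memoization stage is a small cache in front of
the expensive lookup, sketched below. Everything here is an assumption made
for illustration: slow_dep_lookup() is a made-up stand-in for the real
dependency check, and the direct-mapped cache is single-threaded (a threaded
version would need per-thread caches or locking).

```c
#include <assert.h>
#include <string.h>

#define MEMO_SLOTS  64   /* number of cache slots (illustrative) */
#define MEMO_KEYLEN 64   /* longest dependency string that is cached */

struct memo_slot {
    char key[MEMO_KEYLEN];
    int value;
    int used;
};

static struct memo_slot memo[MEMO_SLOTS];
static unsigned long slow_calls;   /* counts uncached lookups */

/* Hypothetical expensive lookup: returns 1 if the dependency is satisfied. */
static int slow_dep_lookup(const char *dep)
{
    slow_calls++;
    return dep[0] != '!';
}

/* Simple string hash (djb2 variant) used to pick a cache slot. */
static unsigned hash_str(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Memoized lookup: repeated queries for the same string hit the cache. */
static int memo_dep_lookup(const char *dep)
{
    struct memo_slot *slot;

    assert(dep != NULL);
    if (strlen(dep) >= MEMO_KEYLEN)
        return slow_dep_lookup(dep);   /* too long to cache */

    slot = &memo[hash_str(dep) % MEMO_SLOTS];
    if (slot->used && strcmp(slot->key, dep) == 0)
        return slot->value;            /* cache hit */

    /* Cache miss: compute, then overwrite whatever occupied the slot. */
    slot->value = slow_dep_lookup(dep);
    strcpy(slot->key, dep);
    slot->used = 1;
    return slot->value;
}
```

Looking up the same dependency string twice in a row calls slow_dep_lookup()
only once; a colliding key simply evicts the previous entry.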

The other, ulterior motivation for looking at --verify parallelization
was to compare the cost of using OpenMP #pragmas against writing
custom code using yarnLocks and mutexes. The portability issues
with OpenMP 2.0/3.0 and gcc vs. other compilers were also briefly
looked at. OpenMP 2.0 #pragmas don't have enough expressiveness
to handle the getters/setters and data hiding throughout RPM code,
so parts of the --verify code needed to be rewritten as for(...) loops
in order to achieve some parallelization. OpenMP 3.0 has a task
pragma that is easier to apply and use in RPM code (imho).
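
To make the difference concrete, here is a minimal sketch of both styles. The
struct node list and check_one() are hypothetical and stand in for RPM's
iterator-style accessors. The OpenMP 2.0 version needs an indexable array and
a counted for(...) loop; the OpenMP 3.0 version can walk the list as-is and
spawn one task per element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list element; stands in for data reached via an iterator. */
struct node {
    int value;
    int result;
    struct node *next;
};

/* Hypothetical per-element check. */
static int check_one(int v) { return v * v; }

/* OpenMP 2.0 style: the work must be expressed as a counted for loop. */
static void check_all_for(struct node **items, int n)
{
    int i;

    assert(items != NULL || n == 0);

    #pragma omp parallel for
    for (i = 0; i < n; i++)
        items[i]->result = check_one(items[i]->value);
}

/* OpenMP 3.0 style: one thread walks the list and creates a task per node. */
static void check_all_tasks(struct node *head)
{
    #pragma omp parallel
    {
        #pragma omp single
        {
            struct node *n;
            for (n = head; n != NULL; n = n->next) {
                /* Each task gets its own copy of the pointer n. */
                #pragma omp task firstprivate(n)
                n->result = check_one(n->value);
            }
        }
        /* Implicit barriers here wait for all tasks to finish. */
    }
}
```

Both functions produce the same results; the task version just avoids
flattening the traversal into an array first.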

Support for OpenMP on Mac OS X will be problematic because of GCC <-> LLVM
issues and because libgomp is essentially frozen at gcc-4.2. Threading
on Mac OS X using GCD may also require a different, third approach
to multithreading in RPM.

Blueprint information

Status: Started
Approver: Jeff Johnson
Priority: Medium
Drafter: Jeff Johnson
Direction: Approved
Assignee: Jeff Johnson
Definition: Review
Series goal: Accepted for 5.4
Implementation: Good progress
Milestone target: None
Started by: Jeff Johnson
