Prototype use of traces for QEMU speed improvements

Registered by Peter Maydell

This blueprint is for prototyping a more significant change to QEMU's internals, one which might produce a greater performance improvement or lay the foundation for future improvements by giving more advanced optimisations scope to work. The issue we're trying to address is that TCG basic blocks are typically very short, because they end at any branch, so there is little opportunity for optimisations to kick in. We therefore want to prototype some sort of 'trace' setup which allows codegen and optimisation to work on a larger chunk of code. For a month's worth of work we'd hope to come out with a prototype suitable for posting upstream as an 'RFC' patchset (for example, we might make any required frontend changes only to the ARM frontend, and backend changes only to the x86 backend). Creating a completely mergeable patchset would be a separate blueprint and would probably take another month.
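
As a rough illustration of the difference between basic blocks and traces, here is a toy model in Python. It is not QEMU code and does not use TCG data structures: the instruction list, the names PROGRAM, PROFILE, basic_block and build_trace, and the "follow the most frequently observed successor" policy are all assumptions made for the sketch. The blueprint does not prescribe any particular trace-formation strategy.

```python
from collections import Counter

# Hypothetical guest program: (opcode, branch target or None).
# The opcodes are illustrative only, not real TCG ops.
PROGRAM = [
    ("add", None),  # 0
    ("cmp", None),  # 1
    ("bne", 5),     # 2: conditional branch to 5
    ("sub", None),  # 3
    ("b", 7),       # 4: unconditional branch to 7
    ("mul", None),  # 5
    ("cmp", None),  # 6
    ("beq", 0),     # 7: conditional branch back to 0
    ("ret", None),  # 8
]

BRANCHES = {"bne", "beq", "b", "ret"}


def basic_block(program, pc):
    """Return the pcs from `pc` up to and including the next branch."""
    block = []
    while pc < len(program):
        block.append(pc)
        if program[pc][0] in BRANCHES:
            break  # a basic block ends at any branch
        pc += 1
    return block


def build_trace(program, start, successor_counts, max_blocks=8):
    """Chain basic blocks along the most frequently observed successor.

    `successor_counts` maps a branch pc to a Counter of the pcs that
    execution was seen to continue at (an assumed profiling format).
    """
    trace, seen, pc = [], set(), start
    for _ in range(max_blocks):
        if pc in seen or pc >= len(program):
            break  # stop when the trace loops back on itself
        seen.add(pc)
        block = basic_block(program, pc)
        trace.extend(block)
        successors = successor_counts.get(block[-1])
        if not successors:
            break  # no profile data for this branch: end the trace
        pc = successors.most_common(1)[0][0]
    return trace


# Assumed profile: the branch at 2 usually goes to 5, the one at 7 to 0.
PROFILE = {
    2: Counter({5: 90, 3: 10}),
    7: Counter({0: 99, 8: 1}),
}

if __name__ == "__main__":
    print("basic block at 0:", basic_block(PROGRAM, 0))
    print("trace from 0:    ", build_trace(PROGRAM, 0, PROFILE))
```

In this toy program the basic block starting at instruction 0 covers three instructions, while the trace built from the same starting point covers six, which gives an optimiser a larger region to work on. A real design would also need side exits for the cases where execution leaves the predicted path.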

Some other people in QEMU upstream are already looking at generic TCG speed improvements (e.g. Aurelien, Kirill), so we need to make sure we cooperate with them.

Blueprint information

Status:
Not started
Approver:
Michael Hope
Priority:
Not
Drafter:
Peter Maydell
Direction:
Needs approval
Assignee:
None
Definition:
Approved
Series goal:
None
Implementation:
Deferred
Milestone target:
backlog

Whiteboard

Work Items

Work items:
Become familiar with QEMU's current codegen approach: TODO
Sketch out a design for adding traces: TODO
Propose upstream, collect feedback: TODO
Implement prototype 1: TODO
Implement prototype 2: TODO
Implement prototype 3: TODO
Implement prototype 4: TODO
Benchmark and instrument to see how effective it is: TODO
Tweaks based on benchmarking results: TODO
Submit RFC patch series upstream: TODO
