Add GPU Passthrough support in the XenAPI driver

Registered by John Garbutt

This blueprint has been superseded. See the newer blueprint "support for PCI passthrough and SR-IOV" for updated plans.

Expose the XenAPI ability to give GPUs to guests in nova

Blueprint information

Status:
Complete
Approver:
Vish Ishaya
Priority:
Low
Drafter:
Citrix OpenStack development team
Direction:
Approved
Assignee:
Mate Lakat
Definition:
Superseded
Series goal:
None
Implementation:
Not started
Milestone target:
None
Completed by
Bob Ball

Whiteboard

This is still a valid approach, but there is little interest in working on this right now - johngarbutt

----
Goals:
- pass a physical GPU to an instance

Known limitations with XenAPI:
- HVM guests only
- Can't hot-plug GPU
- One-to-one mapping with a physical GPU (no sharing between guests)
- No error thrown on overcommit; GPUs are assigned first come, first served at VM boot

Concerns:
- Live migration: is this possible?
- Migration: ensure the scheduler works correctly
- Should we extend this to support scheduling differently between different types of GPU?
- How can we extend the scheduler to support tracking details like # of GPUs?

End user view:
- extra specs + flavor to expose GPU capabilities

Admin view:
- flavor extra specs (sketched after this list)
- host aggregates - indicate which hypervisors can do GPU passthrough
- stretch goal: specify an image that requires a GPU
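
A rough sketch of those two pieces, with made-up names (the xenapi:gpu extra spec key and the gpu aggregate flag are illustrative assumptions, not agreed interfaces):

# Hypothetical extra specs on a GPU flavor, as the scheduler and the
# XenAPI driver would see them in instance_type['extra_specs']:
gpu_flavor_extra_specs = {'xenapi:gpu': '1'}   # number of physical GPUs wanted

# Hypothetical host aggregate metadata marking GPU-capable hypervisors:
gpu_aggregate_metadata = {'gpu': 'true'}

Matching the two could reuse the existing AggregateInstanceExtraSpecsFilter, or a GPU-specific filter like the one sketched further down.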

Doc impact for compute admin guide:
- limitations
- hardware/software requirements
- scheduler options to make the feature work
- flavor creation

old notes
--------------

Concepts:
  - gpu-group
  - vgpu

GPU passthrough is only available for HVM guests
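
Since passthrough is HVM-only, the driver would presumably need a guard; a minimal sketch using the VM record's HVM_boot_policy field, which is empty for PV guests (the helper name is made up):

def _assert_hvm_guest(session, vm_ref):
    # HVM_boot_policy is '' for PV guests and non-empty (e.g. 'BIOS order')
    # for HVM guests, so an empty value means passthrough cannot work.
    if not session.xenapi.VM.get_HVM_boot_policy(vm_ref):
        raise Exception('GPU passthrough requires an HVM guest')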

Each host can have multiple gpu-groups

GPU passthrough xe commands:
xe vgpu-create vm-uuid=<vm-uuid> gpu-group-uuid=<gpu-group-uuid>
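
The same operation through the XenAPI Python bindings might look roughly like this (a sketch only; the classic VGPU.create(VM, GPU_group, device, other_config) signature is assumed, and later XenServer releases add a vGPU type argument):

import XenAPI

vm_uuid = '<vm-uuid>'
gpu_group_uuid = '<gpu-group-uuid>'

session = XenAPI.Session('https://<xenserver-host>')
session.xenapi.login_with_password('root', '<password>')
try:
    vm_ref = session.xenapi.VM.get_by_uuid(vm_uuid)
    group_ref = session.xenapi.GPU_group.get_by_uuid(gpu_group_uuid)
    # Attach one whole physical GPU from the group to the (halted) HVM guest.
    session.xenapi.VGPU.create(vm_ref, group_ref, '0', {})
finally:
    session.xenapi.session.logout()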

find out scheduler specific things
http://docs.openstack.org/developer/nova/devref/filter_scheduler.html#id15

How could we specify that a given instance needs GPU?
- It must be billable -> it should be flavor extra data ("OS-FLV-EXT-DATA:gpu": 1 tells how many GPUs are required); see the scheduler sketch after this list
- Should we put it on the image as well?
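
One possible shape for the scheduler side, sketched against the host filter API of the time (GpuFilter, the xenapi:gpu extra spec and the gpu_free capability key are all assumptions, not agreed names):

from nova.scheduler import filters


class GpuFilter(filters.BaseHostFilter):
    """Only pass hosts that still have a free physical GPU to give away."""

    def host_passes(self, host_state, filter_properties):
        instance_type = filter_properties.get('instance_type') or {}
        extra_specs = instance_type.get('extra_specs') or {}
        requested = int(extra_specs.get('xenapi:gpu', 0))
        if not requested:
            return True   # flavor does not ask for a GPU, any host will do
        caps = getattr(host_state, 'capabilities', None) or {}
        return caps.get('gpu_free', 0) >= requested

Whatever the final shape, the filter would also need to be listed in scheduler_default_filters, which is the "scheduler options" doc-impact item above.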

Compute should periodically report the available GPUs.

_publish_service_capabilities @ manager

Modify nova/virt/xenapi/host.py HostState.update_state

Aggregate metadata should be used to indicate the existence of high-performance GPUs.

Use the name of the GPU as a capability, or vendor:device string?
http://www.pcidatabase.com/ ?
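
For illustration, a vendor:device capability string and how it would split (0x10de is NVIDIA's PCI vendor ID; the device ID shown is just an example, not a specific card):

capability = '10de:11bf'                 # '<vendor>:<device>' in hex
vendor_id, device_id = capability.split(':')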

Add this to xapi

Number of available GPUs = count(xe pgpu-list) - count(xe vgpu-list)   (?)
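
In the driver that count could come straight from the PGPU and VGPU lists on the XenAPI session, along the lines of the sketch below (the stat names match the made-up ones used in the filter sketch above):

def _get_gpu_stats(session):
    """Count physical GPUs and how many are already passed through."""
    # One PGPU record exists per physical GPU; one VGPU record exists per
    # GPU that is already attached to a guest (passthrough is one-to-one).
    total = len(session.xenapi.PGPU.get_all())
    used = len(session.xenapi.VGPU.get_all())
    return {'gpu_total': total, 'gpu_free': total - used}

HostState.update_state could then merge these numbers into the data that _publish_service_capabilities already pushes to the scheduler.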

Work Items

Work items:
compute capabilities: report GPU type and usage (similar to memory): TODO
scheduler: decide when GPU(s) are in use (like memory, without overcommit): TODO
xenapi driver: when a GPU flavor is selected, and there are resources, pass GPU to VM: TODO
? API extension for service list to show hypervisor GPU details, as CPU/RAM are shown currently: TODO
