Allow non-admin users to use hardware offloaded OVS.
When hardware offloaded OVS was implemented, it was intended to be usable by non-admin users simply by setting vnic_type=direct.
This was broken by change I0b5f062bcbf023
as noted in https:/
The original neutron bug boiled down to the fact that you could not simply deploy hardware offloaded OVS and normal SR-IOV on the same host if the neutron mech driver list was
mechanism_drivers = openvswitch,
That is because the OVS mech driver would bind all direct type ports but then fail in port plugging when we booted the VM.
The simple workaround at the time was just to reverse the order of the mech drivers:
mechanism_drivers = sriovnicswitch,
https:/
Instead, it was chosen to require that '{"capabilities": ["switchdev"]}' be present in the binding profile, which requires the user to set it.
That is broken in 3 ways.
First, the binding profile field is defined as providing information from the hypervisor to the network backend, not from the user to the network backend.
Second, that field is admin-only and it is unsafe to allow normal users to write to the binding profile.
Third, neutron just assumes that if that capability is set, the VF is actually in switchdev mode.
That is not true. There is nothing that considers this on the nova side, as
https:/
So while nova does have the NIC feature in the database, switchdev is not one of the capabilities we record, and we never implemented the ability to schedule based on the neutron port capability because we changed direction with the creation of the placement service.
Instead, nova should simply be extended to discover whether a VF's parent PF is in switchdev mode and report that to neutron in the binding:profile.
This will not by itself enable scheduling based on this capability, but it will allow non-admins to use hardware offloaded OVS transparently, as nova will add the capability automatically if the VF supports it.
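The proposed behaviour can be sketched roughly as follows. This is only an illustration, not nova's actual code: the function name add_switchdev_capability and the pf_in_switchdev_mode argument are hypothetical, and the sketch assumes the binding profile is a plain dict using the '{"capabilities": ["switchdev"]}' shape quoted above.

```python
import copy

SWITCHDEV = "switchdev"


def add_switchdev_capability(binding_profile, pf_in_switchdev_mode):
    """Return a copy of the profile with 'switchdev' added when applicable.

    Hypothetical helper: the caller is assumed to have already worked out
    whether the VF's parent PF is in switchdev mode.
    """
    profile = copy.deepcopy(binding_profile or {})
    if not pf_in_switchdev_mode:
        # Legacy SR-IOV: leave the profile untouched.
        return profile
    capabilities = list(profile.get("capabilities", []))
    if SWITCHDEV not in capabilities:
        capabilities.append(SWITCHDEV)
    profile["capabilities"] = capabilities
    return profile
```

Because the hypervisor side fills in the capability, the user only needs to request vnic_type=direct and never writes to the binding profile.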
Blueprint information
- Status: Complete
- Approver: None
- Priority: Undefined
- Drafter: sean mooney
- Direction: Needs approval
- Assignee: None
- Definition: Obsolete
- Series goal: None
- Implementation: Unknown
- Milestone target: None
- Started by:
- Completed by: sean mooney
Whiteboard
Note that libvirt does not provide this info:
sean@cloud:~$ virsh nodedev-dumpxml net_enp34s0f0np
<device>
<name>
<path>
<parent>
<capability type='net'>
<interface>
<address>
<link speed='25000' state='up'/>
<capability type='80203'/>
</capability>
</device>
sean@cloud:~$ virsh nodedev-dumpxml pci_0000_22_00_0
<device>
<name>
<path>
<parent>
<driver>
<name>
</driver>
<capability type='pci'>
<class>
<domain>
<bus>34</bus>
<slot>0</slot>
<function>
<product id='0x101d'>MT2892 Family [ConnectX-6 Dx]</product>
<vendor id='0x15b3'
<capability type='virt_
<iommuGroup number='70'>
<address domain='0x0000' bus='0x22' slot='0x00' function='0x0'/>
</iommuGroup>
<numa node='1'/>
<pci-express>
<link validity='cap' port='0' speed='16' width='8'/>
<link validity='sta' speed='8' width='8'/>
</pci-express>
</capability>
</device>
and it is not provided in the generic offload features we get from ethtool
sean@cloud:~$ ethtool -k enp34s0f0np0
Features for enp34s0f0np0:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-
tx-checksum-ipv6: off [fixed]
tx-checksum-
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-
tcp-segmentatio
tx-tcp-
tx-tcp-
tx-tcp-
tx-tcp6-
generic-
generic-
large-receive-
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-
tx-gre-
tx-gre-
tx-ipxip4-
tx-ipxip6-
tx-udp_
tx-udp_
tx-gso-partial: on
tx-tunnel-
tx-sctp-
tx-esp-
tx-udp-
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: on
tx-vlan-
rx-vlan-
rx-vlan-
l2-fwd-offload: off [fixed]
hw-tc-offload: on
esp-hw-offload: off [fixed]
esp-tx-
rx-udp_
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-
hsr-tag-
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]
We will have to detect this directly from sysfs.
We already do this indirectly in os-vif:
def _get_phys_
"""Get the interface name and return its phys_switch_id
:param ifname: The interface name
:return: The phys_switch_id of the given ifname
"""
phys_
if not os.path.
return None
with open(phys_
return fd.readline(
as phys_switch_id will be None if the PF is not in switchdev mode
def _is_switchdev(
"""Returns True if a netdev has a readable phys_switch_id"""
try:
if phys_switch_id != "" and phys_switch_id is not None:
return True
except (OSError, IOError):
return False
return False
but we can check it directly in sysfs too.
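A minimal, self-contained sketch of such a direct sysfs check is below. It is not the os-vif code quoted above; the function names and the sysfs_net parameter (added so the lookup can be pointed at a fake tree) are my own, and it assumes the attribute lives at /sys/class/net/<ifname>/phys_switch_id and that reading it fails with an OSError, or yields an empty value, when the device is not in switchdev mode.

```python
import os

SYSFS_NET = "/sys/class/net"


def read_phys_switch_id(ifname, sysfs_net=SYSFS_NET):
    """Return the phys_switch_id of ifname, or None if it is unavailable."""
    path = os.path.join(sysfs_net, ifname, "phys_switch_id")
    try:
        with open(path) as fd:
            value = fd.readline().strip()
    except OSError:
        # Missing attribute, or the driver refused the read.
        return None
    return value or None


def is_switchdev(ifname, sysfs_net=SYSFS_NET):
    """Return True if the netdev exposes a non-empty phys_switch_id."""
    return read_phys_switch_id(ifname, sysfs_net) is not None
```

For example, is_switchdev("enp34s0f0np0") would be called with the PF netdev name; whether the PF or its representor is the right device to query would need checking on real hardware.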
Actually, looking at this, hw-tc-offload: on may be what we are looking for. I will need to verify that; if it holds, we might be able to just use it and avoid the sysfs lookup.
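If the ethtool route does turn out to be viable, the check could look something like the sketch below. It only parses text in the "name: on|off [fixed]" format shown in the ethtool -k output above; the function names are hypothetical, and whether hw-tc-offload being on really implies switchdev mode is exactly the unverified assumption.

```python
def parse_ethtool_features(output):
    """Map feature name -> bool from 'ethtool -k' style text."""
    features = {}
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        if not sep or not value.strip():
            # Skips blank lines and the 'Features for <dev>:' header.
            continue
        state = value.split()[0]
        if state in ("on", "off"):
            features[name.strip()] = state == "on"
    return features


def has_hw_tc_offload(output):
    """Return True if the output reports hw-tc-offload as on."""
    return parse_ethtool_features(output).get("hw-tc-offload", False)
```

The text would come from running ethtool -k against the PF netdev; how that command is invoked is left out of the sketch.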
Sigh, so there might also be a libvirt bug, at least on my current host:
https:/
as part of
https:/
we also enhanced libvirt to detect the switchdev capability, but
libvirt 8.0.0 on Ubuntu 22.04 is not reporting any NIC features
for ConnectX-6 Dx cards when they are in switchdev mode.
It works fine for Intel cards in legacy mode:
sean@cloud:~$ virsh nodedev-dumpxml net_enp8s0f0_
<device>
<name>
<path>
<parent>
<capability type='net'>
<interface>
<address>
<link speed='1000' state='up'/>
<feature name='rx'/>
<feature name='tx'/>
<feature name='sg'/>
<feature name='tso'/>
<feature name='gso'/>
<feature name='gro'/>
<feature name='rxvlan'/>
<feature name='txvlan'/>
<feature name='rxhash'/>
<feature name='txudptnl'/>
<capability type='80203'/>
</capability>
</device>
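For comparison, pulling the feature list out of a nodedev dump like the one above could be sketched as follows. The function name is hypothetical, the input must be the complete, well-formed XML from virsh nodedev-dumpxml, and the sketch assumes features appear as <feature name='...'/> children of the <capability type='net'> element, as in the Intel output.

```python
import xml.etree.ElementTree as ET


def net_features(nodedev_xml):
    """Return the set of feature names from a libvirt net nodedev XML."""
    root = ET.fromstring(nodedev_xml)
    cap = root.find("./capability[@type='net']")
    if cap is None:
        return set()
    return {feat.get("name") for feat in cap.findall("feature")}


def reports_switchdev(nodedev_xml):
    """Return True if libvirt lists a 'switchdev' feature for the device."""
    return "switchdev" in net_features(nodedev_xml)
```

On the ConnectX-6 Dx host described above this would return an empty set, which is why it cannot be relied on there.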