Comment 12 for bug 1921880

Markus Schade (lp-markusschade) wrote :

Testing was done on EPYC 7763 and EPYC 7713 systems.

Before: linux 5.4.0-70-generic, libvirt 6.0.0-0ubuntu8.8, qemu 1:4.2-3ubuntu6.14:

# virsh domcapabilities | grep EPYC
      <model fallback='forbid'>EPYC-Rome</model>
      <model usable='yes'>EPYC-Rome</model>
      <model usable='yes'>EPYC-IBPB</model>
      <model usable='yes'>EPYC</model>

After: libvirt 6.0.0-0ubuntu8.9~focalppa1, qemu 1:4.2-3ubuntu6.16~focalppa1:

# virsh domcapabilities | grep EPYC
      <model fallback='forbid'>EPYC-Milan</model>
      <model usable='yes'>EPYC-Rome</model>
      <model usable='no'>EPYC-Milan</model>
      <model usable='yes'>EPYC-IBPB</model>
      <model usable='yes'>EPYC</model>

EPYC-Milan being reported as not usable is expected, given that we are missing:

      <feature policy='disable' name='invpcid'/>
      <feature policy='disable' name='pku'/>
      <feature policy='disable' name='fsrm'/>
      <feature policy='disable' name='svme-addr-chk'/>
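
As an alternative to grep, the usable state of each model can be read from the XML directly. The following is only a minimal sketch: the SAMPLE string is an abbreviated, hand-written stand-in for the "after" output above (assuming the usual domainCapabilities/cpu/mode layout), and custom_model_usability is a hypothetical helper name, not part of libvirt.

```python
import xml.etree.ElementTree as ET

# Abbreviated stand-in for `virsh domcapabilities` on the "after" system.
# Only the EPYC entries quoted above are included; real output is much longer.
SAMPLE = """
<domainCapabilities>
  <cpu>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>EPYC-Milan</model>
    </mode>
    <mode name='custom' supported='yes'>
      <model usable='yes'>EPYC-Rome</model>
      <model usable='no'>EPYC-Milan</model>
      <model usable='yes'>EPYC-IBPB</model>
      <model usable='yes'>EPYC</model>
    </mode>
  </cpu>
</domainCapabilities>
"""


def custom_model_usability(xml_text):
    """Map each custom-mode CPU model name to its 'usable' attribute."""
    root = ET.fromstring(xml_text)
    result = {}
    for model in root.findall("./cpu/mode[@name='custom']/model"):
        result[model.text] = model.get("usable")
    return result


models = custom_model_usability(SAMPLE)
for name, usable in models.items():
    print(f"{name}: usable={usable}")
```

On a real host the XML text would come from the virsh command instead of the SAMPLE string.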

# qemu-system-x86_64 -cpu ? | grep EPYC
x86 EPYC (alias configured by machine type)
x86 EPYC-IBPB (alias of EPYC-v2)
x86 EPYC-Milan (alias configured by machine type)
x86 EPYC-Milan-v1 AMD EPYC-Milan Processor
x86 EPYC-Rome (alias configured by machine type)
x86 EPYC-Rome-v1 AMD EPYC-Rome Processor
x86 EPYC-Rome-v2 AMD EPYC-Rome Processor
x86 EPYC-v1 AMD EPYC Processor
x86 EPYC-v2 AMD EPYC Processor (with IBPB)
x86 EPYC-v3 AMD EPYC Processor

Running a focal instance (5.4.0-70-generic) on the "before" system with the EPYC-Rome type on a Milan CPU results in the following error. This is due to the missing IBRS flag, which is one of the reasons I'd like to see this backported ;-)

unchecked MSR access error: WRMSR to 0x48 (tried to write 0x0000000000000006) at rIP: 0xffffffff89a73594 (native_write_msr+0x4/0x30)
Call Trace:
 ? __switch_to_xtra+0x1ae/0x5e0
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 __switch_to+0x3b0/0x470
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 __schedule+0x2e3/0x740
 preempt_schedule_common+0x18/0x30
 _cond_resched+0x22/0x30
 stop_one_cpu+0x69/0xa0
 ? sched_ttwu_pending+0xe0/0xe0
 sched_exec+0x92/0xc0
 __do_execve_file.isra.0+0x1fc/0x840
 ? strncpy_from_user+0x4c/0x150
 __x64_sys_execve+0x39/0x50
 do_syscall_64+0x57/0x190
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f31e09ef2fb
Code: 41 89 01 eb da 66 2e 0f 1f 84 00 00 00 00 00 f7 d8 64 41 89 01 eb d6 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 3b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 65 4b 10 00 f7 d8 64 89 01 48
RSP: 002b:00007fff1cfd3b48 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
RAX: ffffffffffffffda RBX: 000055713f000370 RCX: 00007f31e09ef2fb
RDX: 000055713f0d5010 RSI: 000055713f069690 RDI: 000055713f006070
RBP: 00007fff1cfd3d50 R08: 000055713f057cd0 R09: 0000000000000000
R10: 000055713efe9980 R11: 0000000000000246 R12: 0000000000000000
R13: 000055713f0d0f50 R14: 0000000000000000 R15: 000055713f069690
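
For reference, MSR 0x48 is the speculation control register (SPEC_CTRL), and the value written in the trace above can be decoded bit by bit. This is a small sketch, assuming the commonly documented bit layout (bit 0 IBRS, bit 1 STIBP, bit 2 SSBD); decode_spec_ctrl is a hypothetical helper name.

```python
# Assumed bit layout of SPEC_CTRL (MSR 0x48): bit 0 IBRS, bit 1 STIBP, bit 2 SSBD.
SPEC_CTRL_BITS = {0: "IBRS", 1: "STIBP", 2: "SSBD"}


def decode_spec_ctrl(value):
    """Return the names of the SPEC_CTRL bits that are set in value."""
    return [name for bit, name in sorted(SPEC_CTRL_BITS.items())
            if value & (1 << bit)]


decoded = decode_spec_ctrl(0x6)
print(decoded)  # ['STIBP', 'SSBD']
```

So the guest kernel tried to enable STIBP and SSBD through a register that the EPYC-Rome model on the "before" system apparently did not expose.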

Live migration of such an instance from a "before" system to an "after" system went without problems.

Starting the same instance (now with the ibrs flag added) resolves the above MSR error, and it pretty much works just like my own build. Live migrating back and forth showed no problems.

Trying to start an instance with EPYC-Milan on my 7713 system results in the following libvirt error:

error: the CPU is incompatible with host CPU: Host CPU does not provide required features: erms, fsrm

The missing fsrm is expected, but the erms flag is missing only on the host with the 7713; it is present on all my 7763 systems. I haven't got a second 7713 to confirm this. I'm not sure whether this is due to it being an early sample or because the flag is not present on all SKUs.

7713:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate sme ssbd mba sev ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca

7763:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca
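
Spotting the difference between two flag lists of this length by eye is error-prone; a set difference does it mechanically. This is a minimal sketch: flag_diff is a hypothetical helper, and the two strings are short excerpts from the lists above rather than the full lists.

```python
def flag_diff(flags_a, flags_b):
    """Return (flags only in a, flags only in b), each as a sorted list."""
    set_a, set_b = set(flags_a.split()), set(flags_b.split())
    return sorted(set_a - set_b), sorted(set_b - set_a)


# Short excerpts around the relevant position; paste the full flag lines
# from /proc/cpuinfo on each host for a complete comparison.
flags_7713 = "fsgsbase bmi1 avx2 smep bmi2 invpcid cqm"
flags_7763 = "fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm"

only_7713, only_7763 = flag_diff(flags_7713, flags_7763)
print("only on 7713:", only_7713)
print("only on 7763:", only_7763)
```

By my reading, running the same comparison on the full lists above would additionally report x2apic, sme and sev as present only in the 7713 list.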

Bypassing libvirt and passing the EPYC-Milan type (plus arch-capabilities=on) directly to qemu results in a working instance on a 7763:

processor : 0
vendor_id : AuthenticAMD
cpu family : 25
model : 1
model name : AMD EPYC-Milan Processor
stepping : 1
microcode : 0x1000065
cpu MHz : 2445.404
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat npt nrip_save umip rdpid arch_capabilities
bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips : 4890.80
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

So pcid, ibrs, ssbd and erms are present, while fsrm, invpcid, pku and svme-addr-chk are missing.