Backport Intel's AVX512 patches on openssl 3.0

Bug #2030784 reported by Simon Chopin
This bug affects 2 people
Affects           Status        Importance  Assigned to  Milestone
openssl (Ubuntu)  Fix Released  Wishlist    Unassigned

Bug Description

https://github.com/openssl/openssl/pull/14908

https://github.com/openssl/openssl/pull/17239

These should provide a nice performance boost on recent CPUs, and the patches are fairly self-contained.


Tobias Heider (tobhe) wrote :

Maybe worth holding back until there is a fix for https://downfall.page/

Simon Chopin (schopin)
tags: added: block-proposed
Simon Chopin (schopin)
Changed in openssl (Ubuntu):
importance: Medium → Wishlist
Simon Chopin (schopin)
Changed in openssl (Ubuntu):
status: Triaged → Fix Committed
Simon Chopin (schopin) wrote :

Since the microcode updates for Downfall have made it to Mantic, I'm removing the block-proposed tag. Please add it back if I missed or misunderstood something :)

tags: removed: block-proposed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openssl - 3.0.10-1ubuntu2

---------------
openssl (3.0.10-1ubuntu2) mantic; urgency=medium

  * d/p/intel/*: cherry-pick AVX512 patches for recent Intel CPUs (LP: #2030784)

 -- Simon Chopin <email address hidden> Tue, 08 Aug 2023 17:51:58 +0200

Changed in openssl (Ubuntu):
status: Fix Committed → Fix Released
Bun K Tan (bktan1) wrote :

Hi @schopin,

The recommended way to test the relevant code paths is to use OpenSSL's capability-bits environment variable, OPENSSL_ia32cap. Notes below:

Ubuntu - OpenSSL OPENSSL_ia32cap Environment Variable
OPENSSL_ia32cap values that disable individual processor feature bits, for testing:
https://www.openssl.org/docs/manmaster/man3/OPENSSL_ia32cap.html

* AES-GCM Relevant Feature Disable

   Disable VAES-NI
   $ export OPENSSL_ia32cap=:~0x20000000000

   Disable VPCLMULQDQ
   $ export OPENSSL_ia32cap=:~0x40000000000

   Disable AES-NI
   $ export OPENSSL_ia32cap=~0x200000000000000

   Disable AESNI + VAESNI
   $ export OPENSSL_ia32cap=~0x200000000000000:~0x20000000000

* RSA 2K/3K/4K Sign Relevant Feature Disable

   Disable AVX512F
   $ export OPENSSL_ia32cap=:~0x10000

   Disable AVX512VL
   $ export OPENSSL_ia32cap=:~0x80000000

   Disable AVX512DQ
   $ export OPENSSL_ia32cap=:~0x20000

   Disable AVX512IFMA
   $ export OPENSSL_ia32cap=:~0x200000

* Unset any previous caps
$ unset OPENSSL_ia32cap
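Each hex value above is a single bit at a fixed position, so it can be recomputed from the CPUID bit number. This assumes the layout documented in the OPENSSL_ia32cap man page linked above: the value before ':' holds CPUID leaf 1 EDX in bits 0-31 and ECX in bits 32-63, and the value after ':' holds CPUID leaf 7 (sub-leaf 0) EBX in bits 0-31 and ECX in bits 32-63. A minimal bash sketch; `cpuid_mask` is a hypothetical helper, not an OpenSSL tool.

```bash
#!/usr/bin/env bash
# Sketch: rebuild the OPENSSL_ia32cap masks above from CPUID bit numbers.
# Assumed layout (per the OPENSSL_ia32cap man page):
#   value before ':' -> CPUID(EAX=1):       EDX in bits 0-31, ECX in bits 32-63
#   value after  ':' -> CPUID(EAX=7,ECX=0): EBX in bits 0-31, ECX in bits 32-63

# cpuid_mask REG BIT : print the hex mask for one feature bit (hypothetical helper).
# REG is "lo" (EDX/EBX, low half) or "hi" (ECX, high half).
cpuid_mask() {
  local reg=$1 bit=$2
  if [ "$reg" = hi ]; then
    bit=$((bit + 32))
  fi
  printf '0x%x\n' $((1 << bit))
}

# Goes in the value before ':' (leaf 1)
aesni=$(cpuid_mask hi 25)        # AES-NI:     ECX bit 25
# Go in the value after ':' (leaf 7, sub-leaf 0)
vaes=$(cpuid_mask hi 9)          # VAES:       ECX bit 9
vpclmulqdq=$(cpuid_mask hi 10)   # VPCLMULQDQ: ECX bit 10
avx512f=$(cpuid_mask lo 16)      # AVX512F:    EBX bit 16
avx512dq=$(cpuid_mask lo 17)     # AVX512DQ:   EBX bit 17
avx512ifma=$(cpuid_mask lo 21)   # AVX512IFMA: EBX bit 21
avx512vl=$(cpuid_mask lo 31)     # AVX512VL:   EBX bit 31

# Same value as the "Disable AESNI + VAESNI" example above.
echo "OPENSSL_ia32cap=~${aesni}:~${vaes}"
```

This prints `OPENSSL_ia32cap=~0x200000000000000:~0x20000000000`. Masks for several features in the same word can be combined with a bitwise OR, e.g. `printf '0x%x\n' $((vaes | vpclmulqdq))` gives `0x60000000000`.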

Examples:
   * AES-128-GCM | AES-256-GCM
      - Baseline - Requires VAES and VPCLMULQDQ features present on ICX or newer platform. This should be the most performant flow.
        $ taskset -c 0 openssl speed -evp aes-128-gcm

      - Disabling VAES or VPCLMULQDQ individually should fall back to the AVX AES-NI flow, with equivalent performance in both cases
        $ OPENSSL_ia32cap=:~0x20000000000 taskset -c 0 openssl speed -evp aes-128-gcm
        $ OPENSSL_ia32cap=:~0x40000000000 taskset -c 0 openssl speed -evp aes-128-gcm

      - Disabling both AES-NI and VAES should fall back to 'C code' performance
        $ OPENSSL_ia32cap=~0x200000000000000:~0x20000000000 taskset -c 0 openssl speed -evp aes-128-gcm

   * RSA 2K/3K/4K Sign Performance
      - Baseline - Requires AVX512F, AVX512VL, AVX512DQ, and AVX512IFMA features on ICX or newer platform. This should be the most performant flow.
        $ taskset -c 0 openssl speed rsa2048 rsa3072 rsa4096

      - Disabling AVX512F, AVX512VL, AVX512DQ, or AVX512IFMA individually should yield equivalent performance; this falls back to the ADOX/ADCX/MULX RSA flow.
        $ OPENSSL_ia32cap=:~0x10000 taskset -c 0 openssl speed rsa2048 rsa3072 rsa4096
        $ OPENSSL_ia32cap=:~0x80000000 taskset -c 0 openssl speed rsa2048 rsa3072 rsa4096
        $ OPENSSL_ia32cap=:~0x20000 taskset -c 0 openssl speed rsa2048 rsa3072 rsa4096
        $ OPENSSL_ia32cap=:~0x200000 taskset -c 0 openssl speed rsa2048 rsa3072 rsa4096
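To collect all of the cases above in one log, the commands can be wrapped in a small script. This is a minimal bash sketch, not part of the original instructions: the masks are copied from the lists above, `run_case` and `run_matrix` are hypothetical names, and setting `DRY_RUN=1` only prints the commands (handy for reviewing the matrix on a machine without the required CPU features).

```bash
#!/usr/bin/env bash
# Sketch: run every OPENSSL_ia32cap case listed above, pinned to CPU 0.
# DRY_RUN=1 prints the commands instead of running them.

# Labels and masks copied from the lists above; "" means baseline (no mask).
gcm_labels=(baseline no-VAES no-VPCLMULQDQ no-AESNI+VAES)
gcm_masks=("" ':~0x20000000000' ':~0x40000000000' '~0x200000000000000:~0x20000000000')
rsa_labels=(baseline no-AVX512F no-AVX512VL no-AVX512DQ no-AVX512IFMA)
rsa_masks=("" ':~0x10000' ':~0x80000000' ':~0x20000' ':~0x200000')

# run_case LABEL MASK CMD... : print a header, then run (or just print) CMD.
run_case() {
  local label=$1 mask=$2
  shift 2
  echo "== ${label} (OPENSSL_ia32cap=${mask:-unset}) =="
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "taskset -c 0 $*"
  elif [ -n "$mask" ]; then
    OPENSSL_ia32cap="$mask" taskset -c 0 "$@"
  else
    # Subshell so a mask exported earlier cannot leak into the baseline run.
    (unset OPENSSL_ia32cap; taskset -c 0 "$@")
  fi
}

# run_matrix : AES-128-GCM cases first, then the RSA sign cases.
run_matrix() {
  local i
  for i in "${!gcm_labels[@]}"; do
    run_case "${gcm_labels[$i]}" "${gcm_masks[$i]}" openssl speed -evp aes-128-gcm
  done
  for i in "${!rsa_labels[@]}"; do
    run_case "${rsa_labels[$i]}" "${rsa_masks[$i]}" openssl speed rsa2048 rsa3072 rsa4096
  done
}
```

After sourcing the script, `run_matrix 2>&1 | tee speed.log` runs the whole matrix, and `DRY_RUN=1 run_matrix` only lists it. Keeping labels and masks in parallel arrays preserves the order of the cases, which an associative array would not guarantee.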

Adrien Nader (adrien) wrote :

Thanks a lot for the tests, that's very appreciated.

I ran that on my laptop (11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz), which quite surprisingly has all these CPU features. The machine was mostly idle and used a dynamic CPU governor, with no thermal throttling at all (if there had been any, it would probably have slowed down the AVX-512 code anyway); the tests are long enough for the CPU governor not to matter much.

============================================================

* AES-128-GCM | AES-256-GCM
 - Baseline - Requires VAES and VPCLMULQDQ features present on ICX or newer platform. This should be the most performant flow.
   (columns: 16, 64, 256, 1024, 8192 and 16384-byte buffers; values in 1000s of bytes per second)
AES-128-GCM 855360.29k 3158479.88k 6093932.91k 8905067.37k 13336828.91k 13788498.58k

 - Disabling VAES or VPCLMULQDQ individually should fall back to the AVX AES-NI flow, with equivalent performance in both cases
AES-128-GCM 785422.85k 1936140.78k 4404423.77k 6481577.18k 7732716.48k 7873213.39k
AES-128-GCM 790775.41k 1942054.64k 4404868.20k 6484287.87k 7711803.10k 7778795.52k

 - Disabling both AES-NI and VAES should fall back to 'C code' performance
AES-128-GCM 150183.11k 167807.25k 598198.71k 662922.19k 681574.40k 678182.91k

* RSA 2K/3K/4K Sign Performance
 - Baseline - Requires AVX512F, AVX512VL, AVX512DQ, and AVX512IFMA features on ICX or newer platform. This should be the most performant flow.
   (columns: sign time, verify time, sign/s, verify/s)
rsa 2048 bits 0.000246s 0.000015s 4057.2 65278.3
rsa 3072 bits 0.000701s 0.000032s 1426.4 31247.7
rsa 4096 bits 0.001434s 0.000055s 697.4 18052.7

 - Disabling AVX512F, AVX512VL, AVX512DQ, or AVX512IFMA individually should yield equivalent performance; this falls back to the ADOX/ADCX/MULX RSA flow.
rsa 2048 bits 0.000523s 0.000015s 1910.4 65748.2
rsa 3072 bits 0.001579s 0.000032s 633.3 31158.1
rsa 4096 bits 0.003529s 0.000055s 283.4 18093.6

rsa 2048 bits 0.000524s 0.000015s 1909.0 66310.8
rsa 3072 bits 0.001577s 0.000032s 634.1 31309.7
rsa 4096 bits 0.003568s 0.000055s 280.2 18120.4

rsa 2048 bits 0.000523s 0.000015s 1913.3 65234.3
rsa 3072 bits 0.001583s 0.000032s 631.7 31094.6
rsa 4096 bits 0.003607s 0.000055s 277.3 18076.8

rsa 2048 bits 0.000524s 0.000015s 1907.6 66299.6
rsa 3072 bits 0.001577s 0.000032s 634.1 31214.4
rsa 4096 bits 0.003586s 0.000055s 278.9 18096.1

============================================================

We see the expected behavior (AFAIU, all features must be available at the same time for the changes to take effect).
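Since the new code paths only take effect when every required feature is present, it can help to check a machine before benchmarking it. A minimal bash sketch, assuming Linux and the /proc/cpuinfo flag spellings `aes`, `vaes`, `vpclmulqdq`, `avx512f`, `avx512vl`, `avx512dq` and `avx512ifma`; the grouping follows the feature lists in the test instructions above, and `missing_flags` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: report which CPU flags needed by the AVX-512 paths are missing.
# Assumes Linux /proc/cpuinfo format ("flags : fpu vme ... aes ...").

gcm_flags=(aes vaes vpclmulqdq)                 # AES-GCM path
rsa_flags=(avx512f avx512vl avx512dq avx512ifma) # RSA sign path

# missing_flags CPUINFO FLAG... : print each FLAG absent from the first
# "flags" line of CPUINFO; return non-zero if any is missing.
missing_flags() {
  local cpuinfo=$1 flag flags rc=0
  shift
  # Pad with spaces so " aes " cannot match inside " vaes ".
  flags=" $(grep -m1 '^flags' "$cpuinfo" | cut -d: -f2) "
  for flag in "$@"; do
    case "$flags" in
      *" $flag "*) ;;
      *) echo "missing: $flag"; rc=1 ;;
    esac
  done
  return "$rc"
}
```

Usage: `missing_flags /proc/cpuinfo "${gcm_flags[@]}" "${rsa_flags[@]}" && echo "all required flags present"`.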

I'm not comparing everything number by number because I don't think we're looking for specific percentages of improvements.

Overall we see speedups of up to ~2.5x (RSA 4096 signing), and large improvements (double-digit percentages) everywhere except the smallest 16-byte AES-GCM buffers (~8-9%).
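The ratios behind this summary can be recomputed from the quoted `openssl speed` lines. A minimal bash/awk sketch; `rsa_sign_speedup` and `gcm_speedup` are hypothetical helpers, and they assume the whitespace-separated layout shown above (sign/s is the 6th field of an `rsa N bits` line, and awk's string-to-number conversion ignores the trailing `k` on throughput values).

```bash
#!/usr/bin/env bash
# Sketch: speedup of the AVX-512 runs over the fallback runs quoted above.

# rsa_sign_speedup FAST_LINE SLOW_LINE : ratio of the sign/s columns.
rsa_sign_speedup() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x); split(b, y)
    printf "rsa%s sign: %.2fx\n", x[2], x[6] / y[6]
  }'
}

# gcm_speedup FAST_LINE SLOW_LINE : ratio for each buffer-size column.
gcm_speedup() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x); split(b, y)
    out = x[1]
    for (i = 2; i <= n; i++) out = out sprintf(" %.2fx", x[i] / y[i])
    print out
  }'
}

rsa_sign_speedup "rsa 2048 bits 0.000246s 0.000015s 4057.2 65278.3" \
                 "rsa 2048 bits 0.000523s 0.000015s 1910.4 65748.2"
# rsa2048 sign: 2.12x
rsa_sign_speedup "rsa 4096 bits 0.001434s 0.000055s 697.4 18052.7" \
                 "rsa 4096 bits 0.003529s 0.000055s 283.4 18093.6"
# rsa4096 sign: 2.46x
gcm_speedup "AES-128-GCM 855360.29k 3158479.88k 6093932.91k 8905067.37k 13336828.91k 13788498.58k" \
            "AES-128-GCM 785422.85k 1936140.78k 4404423.77k 6481577.18k 7732716.48k 7873213.39k"
# AES-128-GCM 1.09x 1.63x 1.38x 1.37x 1.72x 1.75x
```

Only one fallback run per case is compared here; the other runs above differ by about 1-2%, which does not change the picture.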

As a control, I also ran the same tests on Lunar, i.e. without the patches (I acknowledge this is not the same OpenSSL version and that there are other changes, but I do not think this matters here).

============================================================

* AES-128-GCM | AES-256-GCM
 - Baseline - Requires VAES and VPCLMULQDQ features present on ICX or newer platform. This should be the most performant flow.
AES-128-GCM 782474.44k 1938211.66k 4430867.84k 6402298.54k 7685819.33k 7840186.37k

 - ...


Adrien Nader (adrien) wrote :

I tested this patch set on a Zen 4 machine too and saw roughly similar speedups.

And before someone asks: no, I'm not testing that on VIA CPUs!

Adrien Nader (adrien) wrote :

While preparing an update to 3.0.13 for Noble, I started encountering test suite failures.

The cause is the AES patch combined with 3.0.13 (more specifically, with the dupctx patches). The problematic combination looks something like the following:
- AES-GCM-enabled-with-AVX512-vAES-and-vPCLMULQDQ
- make-inability-to-dup-clone-ciphers-an-error
- Add-dupctx-support-to-aead-ciphers
- Fix-a-key-repointing-in-various-ciphers (this is probably only needed to avoid merge conflicts and not a cause of the issue)

This happens both on Intel and AMD systems which have the corresponding CPU features.

I am going to prepare 3.0.13 _without_ the AES patch from here and I will continue to investigate this with upstream's 3.2 (since this is a rare CPU feature, it's possible CI tests don't exercise it).

Adrien Nader (adrien) wrote :

I'm not seeing the issue on 3.2.1. I'm preparing 3.0.13 without the AES patch and will probably deal with it after the feature freeze at the end of the month.

Loïc Minier (lool) wrote :

Dropping the AES patch might cause performance regressions in production, though; would you be able to document the failing tests here?

I wonder whether we would simply have to cherry-pick more from 3.2 for these to pass.

Bun K Tan (bktan1) wrote :

Hello @adrien-n,

I had Dan Zimmerman take a look at this issue, and he found a solution. This is what he shared:

Steps to reproduce:
-------------------
Get the source and apply the patch:
  git clone https://git.launchpad.net/ubuntu/+source/openssl ubuntu_openssl
  cd ubuntu_openssl
  git checkout applied/ubuntu/devel
  git apply 001-vaes_gcm_avx512.patch   # see comment #11 for the attachment
This patch is essentially what is referred to as "[PATCH 2/2] AES-GCM enabled with AVX512 vAES and vPCLMULQDQ."
Build OpenSSL:
  ./config --prefix=/tmp/ubuntu_openssl_install --openssldir=/tmp/ubuntu_openssl_install
  make -j
  make test
Note the AES-GCM test failures.

Steps to resolution:
--------------------
Apply the fix patch:
  make clean
  git apply 002-vaes_gcm_avx512_fix.patch
  make -j
  make test
Note that the AES-GCM tests pass.
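A narrower variant of the reproduction, assuming a tree already built with the first patch applied: run only the AES-GCM-related test recipe twice, once normally and once with VAES and VPCLMULQDQ masked through OPENSSL_ia32cap (`:~0x60000000000` is the bitwise OR of the two single-feature masks listed earlier in this report). If only the unmasked run fails, the new AVX-512 GCM path is the likely culprit. This is a hedged bash sketch: `test_evp` is an assumed recipe name (the report does not say which tests failed), and `maybe_run`/`compare_gcm_tests` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Sketch: compare one test recipe with and without VAES/VPCLMULQDQ.
# Run from the built OpenSSL source tree. DRY_RUN=1 only prints commands.

# maybe_run CMD... : run CMD, or print it (with any OPENSSL_ia32cap) when DRY_RUN=1.
maybe_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${OPENSSL_ia32cap:+OPENSSL_ia32cap=$OPENSSL_ia32cap }$*"
  else
    "$@"
  fi
}

# compare_gcm_tests [RECIPE] : RECIPE defaults to test_evp (an assumption).
compare_gcm_tests() {
  local recipe=${1:-test_evp} full=0 masked=0
  # All CPU features; subshell drops any mask exported earlier.
  (unset OPENSSL_ia32cap; maybe_run make test "TESTS=$recipe") || full=$?
  # VAES (:~0x20000000000) OR VPCLMULQDQ (:~0x40000000000) masked.
  OPENSSL_ia32cap=':~0x60000000000' maybe_run make test "TESTS=$recipe" || masked=$?
  echo "all features: exit $full; VAES+VPCLMULQDQ masked: exit $masked"
}
```

Usage: `compare_gcm_tests` (or `compare_gcm_tests <recipe>` once the failing recipe is known). A non-zero exit only in the "all features" run points at the VAES/VPCLMULQDQ code.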

Solution:
---------
The solution to the failed test cases comes from this merged OpenSSL pull request: "Avoid having another copy of key schedule in PROV_GCM_CTX" by t8m, openssl/openssl PR #22384 (https://github.com/openssl/openssl/pull/22384).
Applying this PR directly fails because OpenSSL 3.0.13 does not support SM4_GCM, so I made the edits by hand and created the patch file 002-vaes_gcm_avx512_fix.patch.

Bun K Tan (bktan1) wrote :
Adrien Nader (adrien) wrote :

Thanks a lot for looking at this. The issue seems fixed on my machine. Several changes are currently being prepared for openssl and, considering the state of the CI queue, I'd rather batch them, but this will definitely go into Noble. Thanks again.
