Ubuntu
openssl package

Bug #2030784
Comment #5

Adrien Nader (adrien) wrote on 2023-12-01 (last edit on 2023-12-01):

Thanks a lot for the tests, that's much appreciated.

I ran that on my laptop (11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz), which quite surprisingly has all these CPU features. The machine was mostly idle, with a dynamic CPU governor but no thermal throttling at all (and if there were any, it would probably slow down the AVX-512 code anyway); the tests are long enough that the CPU governor should not matter much.

============================================================

* AES-128-GCM | AES-256-GCM
- Baseline - Requires VAES and VPCLMULQDQ features present on an ICX or newer platform. This should be the most performant flow.
AES-128-GCM 855360.29k 3158479.88k 6093932.91k 8905067.37k 13336828.91k 13788498.58k

- Individual VAES Disabled and VPCLMULQDQ Disabled should fall back to the AVX AESNI flow and should have equivalent performance
AES-128-GCM 785422.85k 1936140.78k 4404423.77k 6481577.18k 7732716.48k 7873213.39k
AES-128-GCM 790775.41k 1942054.64k 4404868.20k 6484287.87k 7711803.10k 7778795.52k

- AESNI and VAESNI Disabled should fall back to 'C code' performance
AES-128-GCM 150183.11k 167807.25k 598198.71k 662922.19k 681574.40k 678182.91k

* RSA 2K/3K/4K Sign Performance
- Baseline - Requires AVX512F, AVX512VL, AVX512DQ, and AVX512IFMA features on an ICX or newer platform. This should be the most performant flow.
rsa 2048 bits 0.000246s 0.000015s 4057.2 65278.3
rsa 3072 bits 0.000701s 0.000032s 1426.4 31247.7
rsa 4096 bits 0.001434s 0.000055s 697.4 18052.7

- Individual AVX512F, AVX512VL, and AVX512IFMA features should yield equivalent performance. This flow will use the ADOX/ADCX/MULX RSA flow.
rsa 2048 bits 0.000523s 0.000015s 1910.4 65748.2
rsa 3072 bits 0.001579s 0.000032s 633.3 31158.1
rsa 4096 bits 0.003529s 0.000055s 283.4 18093.6

rsa 2048 bits 0.000524s 0.000015s 1909.0 66310.8
rsa 3072 bits 0.001577s 0.000032s 634.1 31309.7
rsa 4096 bits 0.003568s 0.000055s 280.2 18120.4

rsa 2048 bits 0.000523s 0.000015s 1913.3 65234.3
rsa 3072 bits 0.001583s 0.000032s 631.7 31094.6
rsa 4096 bits 0.003607s 0.000055s 277.3 18076.8

rsa 2048 bits 0.000524s 0.000015s 1907.6 66299.6
rsa 3072 bits 0.001577s 0.000032s 634.1 31214.4
rsa 4096 bits 0.003586s 0.000055s 278.9 18096.1

============================================================

We see the expected behavior (AFAIU, all features must be available at the same time for the changes to have any effect).
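As an illustration of that "all features at once" reading, here is a minimal sketch. The feature sets are copied from the test descriptions above; the helper `uses_fast_path` is hypothetical and only models the observed behavior, it is not how OpenSSL's dispatch code is written.

```python
# Feature sets taken from the test descriptions above.
AES_GCM_REQUIRED = frozenset({"VAES", "VPCLMULQDQ"})
RSA_REQUIRED = frozenset({"AVX512F", "AVX512VL", "AVX512DQ", "AVX512IFMA"})


def uses_fast_path(available, required):
    """Hypothetical helper: True only if every required feature is available."""
    return required <= set(available)


all_features = AES_GCM_REQUIRED | RSA_REQUIRED

# Everything present: both optimized flows are eligible.
print(uses_fast_path(all_features, AES_GCM_REQUIRED))  # True
print(uses_fast_path(all_features, RSA_REQUIRED))      # True

# Disabling a single feature is enough to lose the optimized flow.
print(uses_fast_path(all_features - {"VAES"}, AES_GCM_REQUIRED))    # False
print(uses_fast_path(all_features - {"AVX512IFMA"}, RSA_REQUIRED))  # False
```

This matches the results: each "individual feature disabled" run lands on the same numbers as the other fallback runs.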

I'm not comparing everything number by number because I don't think we're looking for specific percentages of improvements.

Overall we see up to a ~2.4x performance improvement, and we always see large improvements (double-digit percentages).
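For anyone who wants to redo the arithmetic, here is a small sketch that computes the ratios from the mantic figures above. The RSA values are the sign/s column; the AES values are the six throughput columns in the order printed (assumed here to be increasing block sizes, since the comment does not show the header row). The fallback rows used are the first run of each fallback group.

```python
# Mantic RSA sign operations per second, copied from the results above.
rsa_baseline = {2048: 4057.2, 3072: 1426.4, 4096: 697.4}
rsa_fallback = {2048: 1910.4, 3072: 633.3, 4096: 283.4}

# Mantic AES-128-GCM throughput, six columns in the order printed.
aes_baseline = [855360.29, 3158479.88, 6093932.91, 8905067.37, 13336828.91, 13788498.58]
aes_fallback = [785422.85, 1936140.78, 4404423.77, 6481577.18, 7732716.48, 7873213.39]


def speedup(fast, slow):
    """Ratio of the optimized figure to the fallback figure."""
    return fast / slow


rsa_speedups = {bits: speedup(rsa_baseline[bits], rsa_fallback[bits]) for bits in rsa_baseline}
aes_speedups = [speedup(f, s) for f, s in zip(aes_baseline, aes_fallback)]

for bits, ratio in rsa_speedups.items():
    print(f"rsa {bits} sign: {ratio:.2f}x")
for column, ratio in enumerate(aes_speedups, start=1):
    print(f"AES-128-GCM column {column}: {ratio:.2f}x")
```

On these figures the RSA signing ratios fall between roughly 2.1x and 2.5x, with the 4096-bit case being the largest.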

As a control I also ran that on lunar, therefore without the patches (I acknowledge this is not the same openssl version and there are also other changes but I do not think this matters here).

============================================================

# AES-128-GCM | AES-256-GCM
- Baseline - Requires VAES and VPCLMULQDQ features present on an ICX or newer platform. This should be the most performant flow.
AES-128-GCM 782474.44k 1938211.66k 4430867.84k 6402298.54k 7685819.33k 7840186.37k

- Individual VAES Disabled and VPCLMULQDQ Disabled should fall back to the AVX AESNI flow and should have equivalent performance
AES-128-GCM 750028.44k 1926234.78k 4365867.67k 6383893.16k 7742842.78k 7843146.41k
AES-128-GCM 786910.34k 1934779.33k 4421411.45k 6389114.88k 7650086.87k 7797479.86k

- AESNI and VAESNI Disabled should fall back to 'C code' performance
AES-128-GCM 147889.72k 167843.85k 599710.04k 663642.45k 679072.96k 680631.91k

# RSA 2K/3K/4K Sign Performance
- Baseline - Requires AVX512F, AVX512VL, AVX512DQ, and AVX512IFMA features on an ICX or newer platform. This should be the most performant flow.
rsa 2048 bits 0.000247s 0.000015s 4050.8 66072.6
rsa 3072 bits 0.001596s 0.000032s 626.5 31144.2
rsa 4096 bits 0.003534s 0.000056s 282.9 18003.6

- Individual AVX512F, AVX512VL, and AVX512IFMA features should yield equivalent performance. This flow will use the ADOX/ADCX/MULX RSA flow.
rsa 2048 bits 0.000528s 0.000015s 1892.3 66008.3
rsa 3072 bits 0.001573s 0.000032s 635.6 31094.2
rsa 4096 bits 0.003534s 0.000055s 282.9 18073.8

rsa 2048 bits 0.000522s 0.000015s 1914.7 65763.4
rsa 3072 bits 0.001575s 0.000032s 635.0 31237.8
rsa 4096 bits 0.003530s 0.000055s 283.2 18093.1

rsa 2048 bits 0.000522s 0.000015s 1917.4 65826.2
rsa 3072 bits 0.001575s 0.000032s 635.0 31177.2
rsa 4096 bits 0.003549s 0.000055s 281.8 18109.9

rsa 2048 bits 0.000522s 0.000015s 1915.1 65760.4
rsa 3072 bits 0.001575s 0.000032s 635.0 31180.2
rsa 4096 bits 0.003538s 0.000055s 282.6 18109.9

============================================================

We can see there is no change with the CPU feature flags, except for the test that disables AESNI, in which case the performance is the same in lunar and mantic. That the CPU feature flags don't change the performance, except in the one aforementioned case, indicates that these patches are responsible for the large performance increase we have seen. We can also see that they don't otherwise degrade performance on this machine.
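A quick way to check the "no change with the flags" observation is to look at the spread between the lunar runs. The sketch below uses the last AES-128-GCM column (excluding the 'C code' run) and the RSA 3072 and 4096 sign/s figures from the lunar results above. The 2048-bit row is left out because its baseline figure is already high on lunar. The helper name `relative_spread` is just for illustration.

```python
# Lunar (unpatched) results copied from the control run above.
lunar_runs = {
    "AES-128-GCM, last column": [7840186.37, 7843146.41, 7797479.86],
    "rsa 3072 sign/s": [626.5, 635.6, 635.0, 635.0, 635.0],
    "rsa 4096 sign/s": [282.9, 282.9, 283.2, 281.8, 282.6],
}


def relative_spread(values):
    """Return (max - min) / min for a list of positive measurements."""
    return (max(values) - min(values)) / min(values)


spreads = {name: relative_spread(values) for name, values in lunar_runs.items()}

for name, spread in spreads.items():
    print(f"{name}: {spread:.1%} spread across runs")
```

All three spreads stay under 2%, which is small next to the roughly 2x differences seen between flag settings on mantic.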
