Discussion:
[strongSwan] GCM/AES-NI Performance
Sydney Meyer
2015-01-18 06:29:53 UTC
Hello everybody,

roughly what level of performance can one expect when using AES in GCM mode with AES-NI?

I am getting about 400 Mbit/s with iperf on Debian 7 (AMD64) with Linux 3.16.7, strongSwan 5.2.1 and "ike=aes128gcm16-aesxcbc-modp1024", "esp=aes128gcm16-modp1024" on an Intel Haswell i3 (4130T).

ipsec statusall:

Status of IKE charon daemon (strongSwan 5.2.1, Linux 3.16.0-0.bpo.4-amd64, x86_64):
uptime: 9 minutes, since Jan 18 07:14:15 2015
malloc: sbrk 540672, mmap 0, used 376912, free 163760
worker threads: 11 of 16 idle, 5/0/0/0 working, job queue: 0/0/0/0, scheduled: 3
loaded plugins: charon aes rc2 sha1 sha2 md5 random nonce x509 revocation constraints pubkey pkcs1 pkcs7 pkcs8 pkcs12 pgp dnskey sshkey pem openssl fips-prf gmp agent xcbc hmac gcm attr kernel-netlink resolve socket-default farp stroke updown eap-identity eap-aka eap-md5 eap-gtc eap-mschapv2 eap-radius eap-tls eap-ttls eap-tnc xauth-generic xauth-eap xauth-pam tnc-tnccs dhcp lookip error-notify certexpire led addrblock unity
Listening IP addresses:
192.168.140.151
192.168.99.1
Connections:
net-net: 192.168.140.151...192.168.140.155 IKEv2
net-net: local: [moon.strongswan.org] uses pre-shared key authentication
net-net: remote: [sun.strongswan.org] uses pre-shared key authentication
net-net: child: 192.168.99.0/24 === 192.168.98.0/24 TUNNEL
Security Associations (1 up, 0 connecting):
net-net[1]: ESTABLISHED 9 minutes ago, 192.168.140.151[moon.strongswan.org]...192.168.140.155[sun.strongswan.org]
net-net[1]: IKEv2 SPIs: 3326c285014ee25b_i* 6e19d113f6333f0b_r, pre-shared key reauthentication in 42 minutes
net-net[1]: IKE proposal: AES_GCM_16_128/PRF_AES128_XCBC/MODP_1024
net-net{1}: INSTALLED, TUNNEL, ESP SPIs: c30be83d_i cf78024c_o
net-net{1}: AES_GCM_16_128, 19485984 bytes_i (374603 pkts, 509s ago), 1102162106 bytes_o (769403 pkts, 519s ago), rekeying in 5 minutes
net-net{1}: 192.168.99.0/24 === 192.168.98.0/24
Sydney Meyer
2015-01-19 04:28:29 UTC
Never mind. The bottleneck doesn't seem to be encryption; more likely it is interrupt (IRQ) handling and/or ESP processing in the kernel.

Anyhow, if someone has numbers or experience with decent hardware, I would appreciate hearing them.

Cheers,

S.
McCaffrey, Timothy M
2015-01-19 20:21:28 UTC
Performance tests I've done indicate somewhere between 25 and 40 megabytes per second per GHz of CPU clock. In other words, for an i3-4130T (max. 2.9 GHz) you would
expect somewhere between 70 and 125 megabytes per second.
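Multiplying the rule of thumb out makes the comparison with the measured 400 Mbit/s explicit. This is a minimal sketch that assumes "megabytes" means 10^6 bytes and that the figure is per second, per GHz, on a single core. Under those assumptions 25 × 2.9 and 40 × 2.9 give about 72-116 MB/s (roughly 580-930 Mbit/s), while 400 Mbit/s is only 50 MB/s.

```python
# Rough single-core IPsec throughput estimate from a per-GHz rule of thumb.
# Assumptions: MB = 10**6 bytes, rate is per second per GHz of one core.

def estimate_mb_per_s(mb_per_ghz: float, clock_ghz: float) -> float:
    """Expected throughput in megabytes per second."""
    return mb_per_ghz * clock_ghz

def mb_to_mbit(mb_per_s: float) -> float:
    """Convert megabytes/s to megabits/s (8 bits per byte)."""
    return mb_per_s * 8

CLOCK_GHZ = 2.9          # i3-4130T clock from the thread
low = estimate_mb_per_s(25, CLOCK_GHZ)
high = estimate_mb_per_s(40, CLOCK_GHZ)
measured_mbit = 400      # iperf result from the original question

print(f"expected: {low:.1f}-{high:.1f} MB/s "
      f"({mb_to_mbit(low):.0f}-{mb_to_mbit(high):.0f} Mbit/s)")
print(f"measured: {measured_mbit / 8:.0f} MB/s ({measured_mbit} Mbit/s)")
```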

1) Do you have hyperthreading on? Turn it off: IPsec processing is pretty serialized, so it is better to have faster CPUs rather than more of them.

2) Use mpstat and top to determine actual CPU usage.
a) If you are pegging a CPU, perhaps some tuning of the NIC may help (IRQ coalescing, etc.).
b) If the CPU is not pegged, then perhaps you are having a TCP issue and need to run iperf
with bigger buffers/window sizes (some tweaking is also possible using ifconfig).
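For point 2b, the usual check is the bandwidth-delay product: a single TCP stream cannot go faster than roughly window / RTT, however fast the crypto is. A minimal sketch follows; the 1 Gbit/s target and the 1 ms RTT through the tunnel are hypothetical values, so substitute your own (e.g. the RTT that ping reports across the tunnel).

```python
# Bandwidth-delay product (BDP): data that must be in flight to fill a link.
# A TCP window smaller than the BDP caps one stream at about window / RTT.

def bdp_bytes(bandwidth_bit_per_s: float, rtt_s: float) -> float:
    """Minimum TCP window (bytes) needed to keep the link busy."""
    return bandwidth_bit_per_s * rtt_s / 8

def window_limited_bit_per_s(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on single-stream TCP throughput for a given window."""
    return window_bytes * 8 / rtt_s

# Hypothetical numbers for illustration only.
TARGET_BIT_PER_S = 1e9   # 1 Gbit/s goal
RTT_S = 0.001            # 1 ms round trip through the tunnel

needed = bdp_bytes(TARGET_BIT_PER_S, RTT_S)           # about 125000 bytes (~122 KiB)
capped = window_limited_bit_per_s(64 * 1024, RTT_S)   # about 524 Mbit/s
print(f"window needed: {needed / 1024:.0f} KiB")
print(f"a 64 KiB window caps one stream at {capped / 1e6:.0f} Mbit/s")
```

If the window-limited rate is close to what iperf reports, a larger window (iperf's -w option) or several parallel streams (-P) are worth trying before blaming the crypto.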

Good luck - Tim
Noel Kuntze
2015-01-19 21:49:41 UTC
Hello Tim, hello Sydney,

As a side note: there is the pcrypt module, a wrapper around the encryption and authentication algorithms
that makes their processing parallelizable across CPUs. Also, using hardware-accelerated algorithms (AES with AES-NI, for example) helps a lot, for obvious reasons.

Regards,
Noel Kuntze

GPG Key ID: 0x63EC6658
Fingerprint: 23CA BB60 2146 05E7 7278 6592 3839 298F 63EC 6658
_______________________________________________
Users mailing list
https://lists.strongswan.org/mailman/listinfo/users
Sydney Meyer
2015-01-20 01:48:33 UTC
@Timothy: Thanks, that sounds reasonable. I read a paper from Intel about improving IPsec performance with AES-NI and GCM in the Linux crypto layer, but they used a multi-queue NIC with Receive Side Scaling (RSS) and a dedicated CPU core per flow on a six-core, HT-capable Xeon. In fact, they showed that enabling HT can actually be beneficial if you have multiple tunnels and you (can) set IRQ CPU affinity. The bottleneck there was definitely kernel ESP processing and network stack handling.
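Setting IRQ CPU affinity, as in the Intel setup, means writing a hexadecimal CPU bitmask to /proc/irq/<N>/smp_affinity, where bit i allows the interrupt to be delivered to CPU i. A small helper to compute that mask is sketched below; the queue-to-CPU layout is hypothetical, and the actual IRQ numbers depend on the NIC. Note that irqbalance, if running, may rewrite these settings, and that on machines with more than 32 CPUs the kernel shows the mask as comma-separated 32-bit groups, which this sketch does not produce.

```python
from typing import Iterable

def affinity_mask(cpus: Iterable[int]) -> str:
    """Return the hex string for /proc/irq/<N>/smp_affinity (bit i = CPU i)."""
    cpu_set = set(cpus)
    if not cpu_set or min(cpu_set) < 0:
        raise ValueError("need at least one non-negative CPU number")
    mask = 0
    for cpu in cpu_set:
        mask |= 1 << cpu  # allow delivery to this CPU
    return format(mask, "x")

# Hypothetical layout: one NIC queue's IRQ pinned to each of four cores.
for queue, cpu in enumerate([0, 1, 2, 3]):
    print(f"queue {queue} -> CPU {cpu}: mask {affinity_mask([cpu])}")
```

As root, a mask computed this way can then be applied with e.g. `echo 4 > /proc/irq/<N>/smp_affinity`, with <N> taken from /proc/interrupts.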

What I probably should have mentioned earlier: I did this quick test on a Xen HVM DomU with a single, pinned CPU core, AES-NI enabled and xen-netfront network devices. So my guess is that the ~400 Mbit/s is about right given the overhead and drawbacks of a virtualized environment (cpu0 hammered by IRQs on a single queue), and your indicated average is what I would also expect from a real-world deployment. Thanks again.
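A Xen HVM guest only benefits from AES-NI if the hypervisor exposes the instructions to it, so a quick sanity check inside the DomU is whether the `aes` flag appears in /proc/cpuinfo (x86 only). A minimal sketch:

```python
import os

def has_aesni(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line in /proc/cpuinfo lists the 'aes' flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Match the whole word 'aes' so that e.g. 'vaes' alone does not count.
        if key.strip() == "flags" and "aes" in value.split():
            return True
    return False

if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print("AES-NI visible to this system:", has_aesni(f.read()))
```

The flag only shows that the instructions are available; whether the kernel actually uses them for ESP also depends on the aesni_intel module being loaded (visible in lsmod).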

@Noel: Yes, I read about this here (https://www.strongswan.org/docs/Steffen_Klassert_Parallelizing_IPsec.pdf). I guess this helps if you are using, e.g., AES in CBC mode with SHA-1, but I am not sure about GCM because of its "authenticated encryption" design (no crypto expert here :). But you have surely read this already. Nonetheless, thanks for the tip.

Cheers,
S.