Arzhel Younsi
2018-05-29 16:05:47 UTC
Hello!
I started to troubleshoot intermittent but large spikes of ICMP "packet too big" messages on our servers running IPsec in transport mode with StrongSwan.
We're tracking that issue "internally" on https://phabricator.wikimedia.org/T195365 with many digressions and real data, but here is a summarized version:
hostA and hostB have IPsec configured such that all traffic between the two hosts is encrypted. Traffic is relatively steady.
At (so far) random times, a packet capture on hostA's loopback shows large spikes of ICMP "packet too big" from and to hostA's interface IP.
The payload (detailed in the phabricator task) says: hostA tried to send a 1516-byte packet to hostB while hostA's interface MTU is 1500.
During that spike of ICMP, running:
"ip -s route get hostB" on hostA shows "mtu 1500".
This mtu mention is absent during "quiet time" (default value?).
The ICMP spike stops before the end of the "cache" countdown. But if the ICMP spike happens again, the "cache" countdown gets re-initialized.
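A minimal sketch, in Python, of how the cached MTU and its countdown could be logged over time so they can be lined up with the packet capture. It assumes the "ip -s route get" output carries tokens of the form "mtu 1500" and "expires 592sec". The helper names parse_route_cache and watch are hypothetical, not part of any existing tool.

```python
import re
import subprocess
import time

# Assumed token formats in `ip -s route get` output, e.g.
# "    cache expires 592sec mtu 1500" (illustrative line, not real data).
MTU_RE = re.compile(r"\bmtu (?:lock )?(\d+)")
EXPIRES_RE = re.compile(r"\bexpires (\d+)sec")


def parse_route_cache(output):
    """Return (mtu, expires_seconds); each is None when the token is absent."""
    mtu = MTU_RE.search(output)
    expires = EXPIRES_RE.search(output)
    return (
        int(mtu.group(1)) if mtu else None,
        int(expires.group(1)) if expires else None,
    )


def watch(dest, interval=1.0):
    """Print a timestamped line each time the cached MTU for dest changes."""
    last = object()  # sentinel so the first observation is always printed
    while True:
        out = subprocess.run(
            ["ip", "-s", "route", "get", dest],
            capture_output=True, text=True, check=True,
        ).stdout
        mtu, expires = parse_route_cache(out)
        if mtu != last:
            print(time.strftime("%H:%M:%S"), "mtu:", mtu, "expires:", expires)
            last = mtu
        time.sleep(interval)
```

Running watch("hostB") on hostA during a capture would show when the "mtu" mention appears and disappears, which may help narrow down what creates the entry.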
Locking the MTU with:
"ip route add hostB via xxx mtu lock 1400" seems to fix the issue.
Our current guess is something along the lines of:
1/ An unknown event (e.g. congestion) triggers MTU probing by the kernel (we have tcp_mtu_probing set to 1)
(As it's all in ipsec, we can't inspect the traffic and see what and how traffic is flowing)
2/ The kernel sets a temporary PMTU value based on the interface (and maybe hostB)
without taking the ESP overhead into consideration
3/ Traffic uses that mtu 1500 to send packets, which then can't get past the interface after being encrypted because they are too big.
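To make steps 2/ and 3/ concrete, here is a rough sketch of the ESP transport-mode size arithmetic. The defaults (8-byte IV, 16-byte ICV, 4-byte alignment) are an assumption corresponding to an AES-GCM style proposal. The cipher actually negotiated on these hosts is not stated here and may differ, and UDP encapsulation (NAT-T) is not accounted for. The function names are hypothetical.

```python
def esp_transport_packet_size(ip_header_len, payload_len,
                              iv_len=8, icv_len=16, align=4):
    """Size of an IP packet after ESP transport-mode encapsulation.

    Layout: IP header | SPI (4) | sequence (4) | IV | payload | padding |
    pad length (1) | next header (1) | ICV.
    Default IV/ICV/alignment values are assumptions, not measured values.
    """
    # payload + padding + the 2 trailer bytes must be a multiple of `align`
    pad = (-(payload_len + 2)) % align
    esp_len = 8 + iv_len + payload_len + pad + 2 + icv_len
    return ip_header_len + esp_len


def max_plain_packet(link_mtu, ip_header_len, **esp):
    """Largest pre-encryption packet that still fits link_mtu after ESP."""
    for payload_len in range(link_mtu - ip_header_len, -1, -1):
        size = esp_transport_packet_size(ip_header_len, payload_len, **esp)
        if size <= link_mtu:
            return ip_header_len + payload_len
    return None


if __name__ == "__main__":
    # A full 1500-byte IPv4 packet (20-byte header) before encryption
    print(esp_transport_packet_size(20, 1480))
    # Largest IPv4 packet that fits a 1500-byte MTU once encrypted
    print(max_plain_packet(1500, 20))
```

Under those assumptions, any packet built against a 1500-byte PMTU would exceed the interface MTU once encrypted, which would be consistent with the guess above. The exact numbers depend on the real proposal in use.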
But as this is still quite speculative, and for Occam's razor's sake, I'd expect a misconfiguration on our side rather than a bug in the kernel/StrongSwan :)
How to figure out what creates that cache entry?
Is our guess plausible?
How to troubleshoot it more?
Any help welcome.
As we have many-to-many IPsec links, I would rather avoid deploying the mtu lock everywhere. It also doesn't help us understand and nail down the root of the issue.
Cheers
--
Arzhel Younsi