We’ve just gotten a new link to the internet at work. After getting online I did some quick tyre-kicking. Since we self-host our git repositories, this included making an SSH connection from an appserver back to the office. I made the usual invocation, but no dice: just a lone blinking caret at my prompt. Over our previous internet connection, this worked just fine.
I immediately suspect things like the ISP blocking ports, or my having forgotten to forward port 22 to the git-hosting machine. I connect with telnet to reduce the variables.
    home ≋ telnet office 22
    Trying 18.104.22.168...
    Connected to office
    Escape character is '^]'.
    SSH-2.0-OpenSSH_6.0p1 Debian-3ubuntu1.2
Huh, well, that’s weird. The connectivity is fine. So the SSH client is probably getting stuck somewhere during handshake?
    home ≋ ssh -v office
    OpenSSH_6.6, OpenSSL 1.0.1f 6 Jan 2014
    debug1: Reading configuration data /home/user/.ssh/config
    # snip
    debug1: SSH2_MSG_KEXINIT sent
    debug1: SSH2_MSG_KEXINIT received
    debug1: kex: server->client aes128-ctr hmac-md5 email@example.com
    debug1: kex: client->server aes128-ctr hmac-md5 firstname.lastname@example.org
    debug1: sending SSH2_MSG_KEX_ECDH_INIT
    debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
… and then nothing, until a timeout. I’m confident the key itself is OK, since I can log in successfully from an in-office machine, but forwarding the SSH agent from that machine to home and trying to come back in leads to the same result. Folk online suggest configuring some ciphers in /etc/ssh/ssh_config, which I don’t think will be related, since this client/server pair has communicated just fine in the past. Setting the ciphers indeed does nothing. What next?
Since I can still get a shell on both machines, I use tcpdump to inspect traffic from both sides during the SSH handshake. The home machine keeps repeating ACK packets to the office machine, but the office machine doesn’t reply.
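For anyone wanting to repeat the capture, here is a minimal sketch of what it might look like. It assumes the interface is eth0 and that the names home and office resolve; ssh_filter is a hypothetical helper of my own, not something tcpdump provides.

```shell
# Build a pcap filter matching SSH traffic to or from one peer.
# ssh_filter is a hypothetical helper, not part of tcpdump.
ssh_filter() {
  printf 'tcp port 22 and host %s\n' "$1"
}

# Print the filter that would be used on the home machine.
ssh_filter office

# To capture (needs root; eth0 is an assumed interface name):
#   sudo tcpdump -n -i eth0 "$(ssh_filter office)"
# Run the same on the office machine with "$(ssh_filter home)",
# then start `ssh -v office` and compare the two captures.
```

Capturing on both ends at once is what makes it possible to tell packets that were never sent from packets that were sent but never arrived.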
The unanswered ACKs suggest that the packets themselves (their size, say) are the problem, rather than the protocol or the handshake. Let’s try changing the MTU?
    office ≋ sudo ip link set eth0 mtu 1492
This is a shot in the dark—but I can now connect. Now to hunt down where the problem occurs.
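To see why shaving 8 bytes off the MTU can matter, here is the arithmetic as a small sketch. It assumes IPv4 and TCP headers of 20 bytes each with no options, which is a simplification (TCP timestamps, if enabled, would take a further 12 bytes). mss_for_mtu is a hypothetical helper name.

```shell
# Largest TCP payload that fits in one packet of the given MTU,
# assuming 20-byte IPv4 and 20-byte TCP headers (no options).
mss_for_mtu() {
  echo $(( $1 - 20 - 20 ))
}

mss_for_mtu 1500   # interface default, prints 1460
mss_for_mtu 1492   # value set on the office machine, prints 1452
```

Lowering the interface MTU should both cap the size of what the office machine sends and reduce the segment size it advertises to its peer, so full-size packets in either direction would fit through a 1492-byte hop.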
    home ≋ tracepath -p22 office
     1?: [LOCALHOST] pmtu 1500
    # snip
    15: office 9.095ms pmtu 1492
And from the other side:
    user@office ≋ tracepath -p22 home
     1?: [LOCALHOST] pmtu 1500
     1: Zero 0.502ms
     1: Zero 0.489ms
     2: Zero 0.465ms pmtu 1492
    # snip
Looks like the shenanigans are happening at Zero, which is a router, but the MTU values are being reported OK. Why aren’t packets being fragmented like they’re designed to be? Well. That’s a question for another day.
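One way to chase this further would be to probe the path with pings that forbid fragmentation. This is only a sketch, assuming IPv4 and the Linux iputils ping (where -M do sets the Don't Fragment bit); icmp_payload_for_mtu and probe are hypothetical helper names.

```shell
# ICMP echo payload size that yields an IPv4 packet of exactly the given MTU:
# subtract the 20-byte IPv4 header and the 8-byte ICMP header.
icmp_payload_for_mtu() {
  echo $(( $1 - 20 - 8 ))
}

# Send three unfragmentable pings sized to fill the given MTU.
# Assumes iputils ping; -M do prohibits fragmentation.
probe() {
  ping -c 3 -M do -s "$(icmp_payload_for_mtu "$2")" "$1"
}

icmp_payload_for_mtu 1500   # prints 1472
icmp_payload_for_mtu 1492   # prints 1464

# Usage (not run here):
#   probe office 1492
#   probe office 1500
```

If the larger probe simply times out rather than coming back with a "fragmentation needed" error, that would hint that the ICMP messages path MTU discovery relies on are being dropped somewhere along the way.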