This article discusses strategies for maintaining long-lived, healthy TCP connections in Go, focusing on the use of TCP Keepalive messages and system-level tuning.
Abstract
The article "Maintaining Healthy TCP Connections in Golang" provides insights into managing long-lived TCP connections, which are crucial for applications that use custom protocols. It emphasizes the importance of TCP Keepalive messages to prevent dead connections due to factors like firewall idle timeouts and network changes. The author explains how to enable keepalives in Go using the net.TCPConn.SetKeepAlive() method and how to adjust system-level parameters such as tcp_keepalive_time, tcp_keepalive_intvl, and tcp_keepalive_probes on Linux systems to ensure timely detection of unhealthy connections. Additionally, the article highlights the use of net.TCPConn.SetKeepAlivePeriod() in Go to override system defaults for keepalive intervals and idle times, which is particularly useful for systems that require quicker detection of dead connections to manage resources efficiently or reduce transactional latency.
Opinions
The author suggests that the default settings for TCP Keepalives may not be suitable for all systems, especially those that need to manage resources efficiently or reduce transactional latency.
Enabling TCP Keepalives is seen as a helpful way to prevent dead connections, with the benefit that only one side of the connection needs to enable this feature.
The article implies that the ability to override system defaults for keepalive parameters in Go using net.TCPConn.SetKeepAlivePeriod() is a critical feature for systems that cannot wait the default 2 hours to detect a dead connection.
There is an underlying opinion that while TCP Keepalives are useful, they are only part of the solution for maintaining healthy TCP connections, with system-level tuning being equally important.
Maintaining Healthy TCP Connections in Golang
While most systems these days integrate over gRPC or HTTP, quite a few applications still speak custom protocols. And many of these custom protocols don't have handy packages like net/http to handle creating and managing TCP connections.
Today's article is for anyone working directly with TCP connections. This article will discuss maintaining healthy TCP sessions over long periods and how to tune our system to keep sessions long-lived.
Healthy Connections
One of the most common problems for applications that manage long-lived TCP connections is keeping those connections healthy. Many factors are working against long-lived sessions. For example, firewalls often allow administrators to set a max idle session time. This idle timer will kill TCP sessions that haven't sent any data for an extended period.
Changes to clients and servers can also cause a session to be disconnected. But while one side might know of the disconnection right away, the other side might not find out until it tries to send a message.
One helpful way of preventing these dead connections is to enable TCP keepalive messages.
TCP Keepalive
Keepalives are a feature of TCP that sends special packets after a period of inactivity. A keepalive packet contains no data but does require a TCP ACK (Acknowledgement) packet to be returned. When the remote host receives the keepalive packet, it acknowledges receipt by sending an ACK packet.
What's helpful with this design is that only one side of the connection needs to enable TCP keepalives. Because a keepalive is an ordinary TCP packet with the ACK flag set, the TCP protocol requires the remote host to respond with an ACK regardless of its own keepalive configuration.
Enabling Keepalives
Enabling keepalives is very simple. With Go, any net.TCPConn can have keepalives enabled by calling the net.TCPConn.SetKeepAlive() method with true as the value.
The below example shows enabling keepalives from the server-side perspective.
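A minimal sketch of the server side, assuming a loopback listener on an ephemeral port. The helper names acceptWithKeepAlive and runServerOnce are hypothetical, and the stand-in client exists only so the program terminates; a real server would accept in a loop and hand each connection to its protocol handler.

```go
package main

import (
	"fmt"
	"log"
	"net"
)

// acceptWithKeepAlive accepts one connection from ln and enables
// TCP keepalives on it before returning it to the caller.
func acceptWithKeepAlive(ln *net.TCPListener) (*net.TCPConn, error) {
	// AcceptTCP returns a *net.TCPConn, which exposes SetKeepAlive.
	conn, err := ln.AcceptTCP()
	if err != nil {
		return nil, err
	}
	if err := conn.SetKeepAlive(true); err != nil {
		conn.Close()
		return nil, fmt.Errorf("enabling keepalives: %w", err)
	}
	return conn, nil
}

// runServerOnce listens on a loopback port, connects a stand-in client,
// and accepts that single connection with keepalives enabled.
func runServerOnce() error {
	// Port 0 asks the kernel for any free port (an assumption for this demo).
	ln, err := net.ListenTCP("tcp", &net.TCPAddr{IP: net.IPv4(127, 0, 0, 1)})
	if err != nil {
		return err
	}
	defer ln.Close()

	// Stand-in client so the accept below has something to return.
	client, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return err
	}
	defer client.Close()

	conn, err := acceptWithKeepAlive(ln)
	if err != nil {
		return err
	}
	defer conn.Close()
	return nil
}

func main() {
	if err := runServerOnce(); err != nil {
		log.Fatal(err)
	}
	log.Println("keepalives enabled on the accepted connection")
}
```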
And setting this from the client-side is the same.
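A comparable sketch for the client side. The helper names dialWithKeepAlive and runClientOnce are hypothetical, and the throwaway loopback listener only stands in for a real server so the example can run by itself.

```go
package main

import (
	"fmt"
	"log"
	"net"
)

// dialWithKeepAlive connects to addr and enables TCP keepalives
// on the resulting connection.
func dialWithKeepAlive(addr string) (*net.TCPConn, error) {
	raddr, err := net.ResolveTCPAddr("tcp", addr)
	if err != nil {
		return nil, err
	}
	// DialTCP returns a *net.TCPConn directly, so no type assertion is needed.
	conn, err := net.DialTCP("tcp", nil, raddr)
	if err != nil {
		return nil, err
	}
	if err := conn.SetKeepAlive(true); err != nil {
		conn.Close()
		return nil, fmt.Errorf("enabling keepalives: %w", err)
	}
	return conn, nil
}

// runClientOnce starts a throwaway loopback listener and dials it.
func runClientOnce() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer ln.Close()

	conn, err := dialWithKeepAlive(ln.Addr().String())
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	if err := runClientOnce(); err != nil {
		log.Fatal(err)
	}
	log.Println("keepalives enabled on the dialed connection")
}
```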
Tuning the System
On Linux, three kernel parameters govern how TCP Keepalives behave.
Keepalive Idle Time
The first is tcp_keepalive_time. This parameter specifies the length of time a connection must be idle before the first keepalive is sent.
Keepalives will keep TCP connections alive and healthy by periodically sending data back and forth. But this is only needed when a TCP session is not frequently sending data. If an issue were to occur on an actively used TCP connection, both the client and server would quickly identify errors due to missing ACK packets or even a RST (Reset) from the other side.
In general, the default value of tcp_keepalive_time is 7200 seconds (2 hours). This default means that, once keepalives are enabled, our TCP connections will start sending them only after the connection has been idle (no packets exchanged) for 2 hours.
This parameter can, of course, be changed via the sysctl command (as shown below).
$ sysctl -w net.ipv4.tcp_keepalive_time=300
To keep this value after reboot, we must define this same setting within the /etc/sysctl.conf file.
Keepalive Interval
The second parameter is tcp_keepalive_intvl. This parameter defines the length of time between keepalive packets once the idle time is reached.
By default, the value is typically set to 75 seconds. This means that by default, when keepalives are enabled, a connection that is idle for 2 hours will start receiving keepalive packets every 75 seconds.
As with the above, we can change this with the sysctl command and within the /etc/sysctl.conf file.
$ sysctl -w net.ipv4.tcp_keepalive_intvl=30
Keepalive Failures
The final parameter is tcp_keepalive_probes. This parameter defines the number of unanswered keepalive packets allowed before the connection is considered dead. An unanswered packet means that even though the system sent a keepalive packet, that packet was not acknowledged with an ACK packet.
In general, the default value is set to 9. With this parameter set, a connection with keepalives enabled that is idle for 2 hours will start receiving keepalive packets every 75 seconds. After nine attempts (11 minutes, 15 seconds), the kernel will notify the application layer of the unhealthy connection.
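The arithmetic above can be sketched in a few lines of Go. The function name detectionTime is hypothetical, and the result is an approximation of the worst case for a fully idle connection: idle time plus one interval per unanswered probe.

```go
package main

import (
	"fmt"
	"time"
)

// detectionTime estimates how long an idle, dead connection can go
// unnoticed: the idle time plus one interval for each unanswered probe.
// All inputs are in seconds, matching the units used by sysctl.
func detectionTime(idleSec, intvlSec, probes int) time.Duration {
	return time.Duration(idleSec+intvlSec*probes) * time.Second
}

func main() {
	// Typical Linux defaults: 7200s idle, 75s interval, 9 probes.
	fmt.Println(detectionTime(7200, 75, 9)) // prints 2h11m15s

	// The tuned values used in the sysctl examples: 300s, 30s, 5 probes.
	fmt.Println(detectionTime(300, 30, 5)) // prints 7m30s
}
```

With the tuned values from the sysctl examples in this article, the estimate drops from a little over two hours to seven and a half minutes.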
Again, we can modify this via the sysctl command and within the /etc/sysctl.conf file.
$ sysctl -w net.ipv4.tcp_keepalive_probes=5
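To confirm which values a Go service will actually inherit, the three parameters can be read back from /proc/sys/net/ipv4 on Linux. This is a rough sketch: the helper names parseSysctlValue and readKeepaliveParam are hypothetical, and on non-Linux systems the files will not exist, so the program just reports them as unavailable.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// procDir is where Linux exposes the IPv4 TCP sysctls as files.
const procDir = "/proc/sys/net/ipv4"

// parseSysctlValue converts raw file contents such as "7200\n" to an int.
func parseSysctlValue(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

// readKeepaliveParam reads one parameter, such as "tcp_keepalive_time",
// from the given directory.
func readKeepaliveParam(dir, name string) (int, error) {
	raw, err := os.ReadFile(filepath.Join(dir, name))
	if err != nil {
		return 0, err
	}
	return parseSysctlValue(string(raw))
}

func main() {
	names := []string{
		"tcp_keepalive_time",
		"tcp_keepalive_intvl",
		"tcp_keepalive_probes",
	}
	for _, name := range names {
		v, err := readKeepaliveParam(procDir, name)
		if err != nil {
			// Expected on systems without a Linux-style /proc.
			fmt.Printf("%s: unavailable (%v)\n", name, err)
			continue
		}
		fmt.Printf("%s = %d\n", name, v)
	}
}
```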
Overriding System Defaults
The tcp_keepalive_probes, tcp_keepalive_intvl, and tcp_keepalive_time parameters govern the default behavior when TCP Keepalives are enabled on a connection.
Setting Frequency of Keepalives with Go
With Go, the frequency of keepalives can be changed from the default using the net.TCPConn.SetKeepAlivePeriod() method.
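A minimal sketch of setting a 30-second keepalive period on a connection. The helper names enableKeepAlive and runPeriodOnce are hypothetical, and the loopback listener is only there so the example runs without an external server.

```go
package main

import (
	"errors"
	"log"
	"net"
	"time"
)

// enableKeepAlive turns keepalives on for conn and sets the keepalive period.
func enableKeepAlive(conn *net.TCPConn, period time.Duration) error {
	if err := conn.SetKeepAlive(true); err != nil {
		return err
	}
	return conn.SetKeepAlivePeriod(period)
}

// runPeriodOnce dials a throwaway loopback listener and applies
// a 30-second keepalive period to the connection.
func runPeriodOnce() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer ln.Close()

	c, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return err
	}
	defer c.Close()

	// net.Dial returns a net.Conn; the keepalive methods live on *net.TCPConn.
	conn, ok := c.(*net.TCPConn)
	if !ok {
		return errors.New("connection is not a *net.TCPConn")
	}
	return enableKeepAlive(conn, 30*time.Second)
}

func main() {
	if err := runPeriodOnce(); err != nil {
		log.Fatal(err)
	}
	log.Println("keepalive period set to 30 seconds")
}
```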
The above example sets the TCP_KEEPINTVL socket option to 30 seconds, overriding the tcp_keepalive_intvl system parameter for this specific connection. In addition to overriding the keepalive interval, this method also changes the idle time, setting the TCP_KEEPIDLE socket option to 30 seconds and overriding the tcp_keepalive_time system parameter. Which socket options the method sets can vary by operating system and Go release, so it is worth confirming against the net package documentation for the version in use.
Summary
As we can see from this article, TCP Keepalives are a great way to keep TCP connections healthy and active for long periods of time. But they do require a bit of adjusting. For many, 2 hours and 11 minutes to identify a "dead" connection is probably OK.
The dead connection may cause no harm beyond consuming a few system resources. But for some systems, such as those with connection limits or those that keep long-lived sessions open to reduce transactional latency, two hours can be a long time.
For these systems, being able to override the system defaults using net.TCPConn.SetKeepAlivePeriod() is a crucial feature.