I think I understand the formal meaning of the option. In some legacy code I'm handling now, the option is used. The customer complains that when it closes the connection from its side, our end responds to its FIN with an RST.

I am not sure I can remove it safely, since I don't understand when it should be used.

Can you please give an example of when the option would be required?

1
  • 1
    You should remove it. It shouldn't be used in production code. The only time I ever saw it used was as the result of an invalid benchmark.
    – user207421
    Commented Aug 24, 2014 at 0:46

7 Answers

245

For my suggestion, please read the last section: “When to use SO_LINGER with timeout 0”.

Before we come to that, a little lecture about:

  • Normal TCP termination
  • TIME_WAIT
  • FIN, ACK and RST

Normal TCP termination

The normal TCP termination sequence looks like this (simplified):

We have two peers: A and B

  1. A calls close()
     • A sends FIN to B
     • A goes into FIN_WAIT_1 state
  2. B receives FIN
     • B sends ACK to A
     • B goes into CLOSE_WAIT state
  3. A receives ACK
     • A goes into FIN_WAIT_2 state
  4. B calls close()
     • B sends FIN to A
     • B goes into LAST_ACK state
  5. A receives FIN
     • A sends ACK to B
     • A goes into TIME_WAIT state
  6. B receives ACK
     • B goes to CLOSED state – i.e. is removed from the socket tables

TIME_WAIT

So the peer that initiates the termination – i.e. calls close() first – will end up in the TIME_WAIT state.

To understand why the TIME_WAIT state is our friend, please read section 2.7 in "UNIX Network Programming" third edition by Stevens et al (page 43).

However, lots of sockets in the TIME_WAIT state on a server can be a problem, as they could eventually prevent new connections from being accepted.

To work around this problem, I have seen many suggesting to set the SO_LINGER socket option with timeout 0 before calling close(). However, this is a bad solution as it causes the TCP connection to be terminated with an error.

Instead, design your application protocol so the connection termination is always initiated from the client side. If the client always knows when it has read all remaining data it can initiate the termination sequence. As an example, a browser knows from the Content-Length HTTP header when it has read all data and can initiate the close. (I know that in HTTP 1.1 it will keep it open for a while for a possible reuse, and then close it.)

If the server needs to close the connection, design the application protocol so the server asks the client to call close().

When to use SO_LINGER with timeout 0

Again, according to "UNIX Network Programming" third edition page 202-203, setting SO_LINGER with timeout 0 prior to calling close() will cause the normal termination sequence not to be initiated.

Instead, the peer setting this option and calling close() will send a RST (connection reset) which indicates an error condition and this is how it will be perceived at the other end. You will typically see errors like "Connection reset by peer".

Therefore, in the normal situation it is a really bad idea to set SO_LINGER with timeout 0 prior to calling close() – from now on called abortive close – in a server application.

However, certain situations warrant doing so anyway:

  • If a client of your server application misbehaves (times out, returns invalid data, etc.) an abortive close makes sense to avoid being stuck in CLOSE_WAIT or ending up in the TIME_WAIT state.
  • If you must restart your server application which currently has thousands of client connections you might consider setting this socket option to avoid thousands of server sockets in TIME_WAIT (when calling close() from the server end) as this might prevent the server from getting available ports for new client connections after being restarted.
  • On page 202 in the aforementioned book it specifically says: "There are certain circumstances which warrant using this feature to send an abortive close. One example is an RS-232 terminal server, which might hang forever in CLOSE_WAIT trying to deliver data to a stuck terminal port, but would properly reset the stuck port if it got an RST to discard the pending data."

I would recommend this long article which I believe gives a very good answer to your question.

3
  • 8
    TIME_WAIT is a friend only when it doesn't start to cause problems: www.greatytc.com/questions/1803566/…
    – Pacerier
    Commented Jan 23, 2016 at 4:23
  • 2
    so what if you are writing a web server? how do you "tell the client to initiate a close"?
    – Shaun Neal
    Commented Mar 10, 2016 at 2:00
  • 3
    @ShaunNeal you obviously don't. But a well written client / browser will initiate the close. If the client is not well-behaved, luckily we have TIME_WAIT assassination to ensure we do not run out of socket descriptors and ephemeral ports.
    – mgd
    Commented Mar 10, 2016 at 7:17
99

The typical reason to set a SO_LINGER timeout of zero is to avoid large numbers of connections sitting in the TIME_WAIT state, tying up all the available resources on a server.

When a TCP connection is closed cleanly, the end that initiated the close ("active close") ends up with the connection sitting in TIME_WAIT for several minutes. So if your protocol is one where the server initiates the connection close, and involves very large numbers of short-lived connections, then it might be susceptible to this problem.

This isn't a good idea, though - TIME_WAIT exists for a reason (to ensure that stray packets from old connections don't interfere with new connections). It's a better idea to redesign your protocol to one where the client initiates the connection close, if possible.

11
  • 4
    I totally agree. I have seen a monitoring application which was initiating many connections (a few thousand short-lived connections every X seconds), and it had problems scaling any bigger (a thousand connections more). I don't know why, but the application was unresponsive. Someone suggested SO_LINGER = true, TIME_WAIT = 0 to free OS resources quickly, and after a short investigation we tried this solution with very good results. TIME_WAIT is no longer a problem for this app.
    – bartosz.r
    Commented Mar 14, 2012 at 15:13
  • 25
    I disagree. An application level protocol sitting on top of TCP should be designed in such a way that the client always initiates the connection close. That way, the TIME_WAIT will sit at the client doing no harm. Remember as it says in "UNIX Network Programming" third edition (Stevens et al) page 203: "The TIME_WAIT state is your friend and is there to help us. Instead of trying to avoid the state, we should understand it (Section 2.7)."
    – mgd
    Commented Oct 26, 2012 at 13:29
  • 8
    What if a client wants to open 4000 connections every 30 seconds (this monitoring application is a client, because it initiates connections)? Yes, we can redesign the application, add some local agents in the infrastructure, change the model to push. But if we already have such an application and it grows, then we can make it work by tuning the linger. You change one param, and you suddenly have a working application, without investing a budget to implement a new architecture.
    – bartosz.r
    Commented Oct 29, 2012 at 11:09
  • 4
    @bartosz.r: I am only saying that using SO_LINGER with timeout 0 should really be a last resort. Again, in "UNIX Network Programming" third edition (Stevens et al) page 203 it also says that you risk data corruption. Consider reading RFC 1337, where you can see why TIME_WAIT is your friend.
    – mgd
    Commented Oct 30, 2012 at 14:54
  • 7
    @caf No, the classic solution would be a connection pool, as seen in every heavy-duty TCP API, for example HTTP 1.1.
    – user207421
    Commented Aug 24, 2014 at 10:12
18

When linger is on but the timeout is zero, the TCP stack doesn't wait for pending data to be sent before closing the connection. Data could be lost as a result, but by setting linger this way you accept that risk and ask for the connection to be reset straight away rather than closed gracefully. This causes an RST to be sent rather than the usual FIN.

Thanks to EJP for his comment, see here for details.

4
  • 1
    I understand this. What I'm asking for is a "realistic" example of when we would want to use a hard reset.
    – dimba
    Commented Sep 21, 2010 at 7:49
  • 5
    Whenever you want to abort a connection; so if your protocol fails validation and you have a client talking rubbish at you all of a sudden, you'd abort the connection with an RST, etc. Commented Sep 21, 2010 at 8:37
  • 6
    You're confusing a zero linger timeout with linger off. Linger off means that close() doesn't block. Linger on with a positive timeout means that close() blocks for up to the timeout. Linger on with a zero timeout causes RST, and this is what the question is about.
    – user207421
    Commented Aug 24, 2014 at 0:45
  • 2
    Yes, you're correct. I'll adjust the answer to correct my terminology. Commented Aug 24, 2014 at 8:59
8

Whether you can remove the linger in your code safely or not depends on the type of your application: is it a "client" (opening TCP connections and actively closing them first) or is it a "server" (listening for a TCP open and closing it after the other side initiated the close)?

If your application has the flavor of a "client" (closing first) AND you initiate and close a huge number of connections to different servers (e.g. when your app is a monitoring app supervising the reachability of a huge number of different servers), your app has the problem that all your client connections get stuck in the TIME_WAIT state. In that case, I would recommend shortening the linger timeout to a value smaller than the default, so the connection still shuts down gracefully but the client connection resources are freed up earlier. I would not set the timeout to 0, as 0 does not shut down gracefully with a FIN but abortively with an RST.

If your application has the flavor of a "client" and has to fetch a huge number of small files from the same server, you should not initiate a new TCP connection per file and end up with a huge number of client connections in TIME_WAIT. Instead, keep the connection open and fetch all the data over the same connection. The linger option can and should be removed.

If your application is a "server" (closing second, as a reaction to the peer's close), your connection is shut down gracefully on close() and resources are freed up, as you don't enter the TIME_WAIT state. Linger should not be used. But if your server app has a supervisory process that detects inactive open connections that have been idling for a long time ("long" is to be defined), you can shut down such an inactive connection from your side, as a kind of error handling, with an abortive shutdown. This is done by setting the linger timeout to 0. close() will then send an RST to the client, telling him that you are angry :-)

1

In servers, you may like to send an RST instead of a FIN when disconnecting misbehaving clients. That skips the FIN_WAIT and subsequent TIME_WAIT socket states in the server, which prevents depleting server resources and hence protects from this kind of denial-of-service attack.

1

I like Maxim's observation that DoS attacks can exhaust server resources. It also happens without an actually malicious adversary.

Some servers have to deal with an 'unintentional DoS attack', which occurs when the client app has a connection-leak bug and keeps creating a new connection for every new command it sends to your server. It then perhaps eventually closes its connections if it hits GC pressure, or perhaps the connections eventually time out.

Another scenario is the 'all clients have the same TCP address' scenario. Then client connections are distinguishable only by port numbers (if they connect to a single server), and if clients start rapidly cycling opening/closing connections for any reason, they can exhaust the (client addr + port, server IP + port) tuple space.

So I think servers may be best advised to switch to the linger-zero strategy when they see a high number of sockets in the TIME_WAIT state; although it doesn't fix the client behavior, it might reduce the impact.

0

The listen socket on a server can use linger with time 0 to be able to bind back to the socket immediately and to reset any clients whose connections have not yet finished connecting. TIME_WAIT is only interesting when you have a multi-path network and can end up with mis-ordered packets, or otherwise are dealing with odd network packet ordering/arrival timing.
