What happens to a socket if the network has degraded


Suppose a simple network model: A has successfully created a TCP connection to B, and they are communicating with each other like this

A <----------> B

I know that if the program on A dies (such as core dump), that will cause a RST packet to B. So any read attempt of B will lead to an EOF, and any write attempt of B will lead to SIGPIPE. Am I right?

If, however, the network has broken down (such as a cable/router failure) on A, what happens to the read/write attempts of B? In my situation, all the sockets have been set to non-blocking. As a result, is it impossible for me to detect a network error?

By the way, I notice that there is a socket option SO_KEEPALIVE which may be useful to me http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/. But I wonder how much the cost will be if I set the probing interval to 2~3 seconds (which by default is 75 seconds)? And it seems the interval configuration is a global one, so is this going to affect all the sockets on the machine?

Final question... Say the network has broken down and any write attempt would cause EPIPE some time later. If, however, instead of trying to write, I put this socket into an epoll device, what will happen then? Will epoll_wait return an EPOLLHUP or EPOLLERR event?

There are numerous other ways a TCP connection can go dead undetected:

  • someone yanks out a network cable in between.
  • the computer at the other end gets nuked.
  • a NAT gateway in between silently drops the connection.
  • the OS at the other end crashes hard.
  • the FIN packets get lost.
  • undetectable errors: a router in between the endpoints may drop packets (including control packets).

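In all of these cases you find out only when you try to write to the socket: the write eventually fails, and a write to a connection that has already been reset raises SIGPIPE, which terminates your program unless the signal is handled or ignored.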

With read() alone you cannot know whether the other side is still alive. That is why SO_KEEPALIVE is useful. Keepalive is non-invasive, and in most cases, if you're in doubt, you can turn it on without the risk of doing something wrong. But do remember that it generates extra network traffic, which can have an impact on routers and firewalls.

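The system-wide keepalive settings do affect every socket on your machine that enables SO_KEEPALIVE (you are correct), although Linux also lets you override them per socket with the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options. Because SO_KEEPALIVE increases traffic and consumes CPU, it is also best to set a SIGPIPE handler if there is any chance the application will ever write to a broken connection.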
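As a rough illustration of the SIGPIPE point, here is a minimal sketch that ignores the signal so that a write to a dead connection reports EPIPE instead of killing the process. The helper names ignore_sigpipe and write_to_closed_peer are made up for this example, and the demo uses a local AF_UNIX socket pair with an already closed peer rather than a real TCP connection.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ignore SIGPIPE process-wide, so a write to a
   broken connection returns -1 with errno == EPIPE instead of
   terminating the program. Returns 0 on success, -1 on failure. */
int ignore_sigpipe(void)
{
    return signal(SIGPIPE, SIG_IGN) == SIG_ERR ? -1 : 0;
}

/* Hypothetical demo: send one byte on a local socket whose peer has
   already been closed and return the errno that was observed
   (0 if the send unexpectedly succeeded, -1 if setup failed).
   Call ignore_sigpipe() first, or the send will raise SIGPIPE. */
int write_to_closed_peer(void)
{
    int sv[2];
    ssize_t n;
    int err;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    close(sv[1]);                      /* the "other side" goes away */
    n = send(sv[0], "x", 1, 0);
    err = (n == -1) ? errno : 0;
    close(sv[0]);
    return err;
}
```

On Linux, passing MSG_NOSIGNAL as the last argument of send() is an alternative that suppresses the signal for that one call only, without changing the process-wide disposition.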

Also, use SO_KEEPALIVE only where it makes sense in the application. It is poor practice to keep it on for the whole duration of a connection (i.e. do use SO_KEEPALIVE when the server works for a long time on a client query).

The right probing interval depends on your application, or rather on your application-layer protocol.

With TCP keepalive enabled you will detect the failure eventually, although with the default settings that can take a couple of hours.
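If the defaults are too slow, keepalive can be tuned for a single socket rather than for the whole machine. The following is only a sketch: enable_keepalive is a hypothetical helper name, and TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific options, so other systems may need different names or may not support per-socket tuning at all.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (Linux-specific): turn on keepalive for one TCP
   socket and override the system-wide defaults for that socket only.
     idle_s     - seconds of silence before the first probe is sent
     interval_s - seconds between unanswered probes
     probes     - unanswered probes before the connection is dropped
   Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_keepalive(int fd, int idle_s, int interval_s, int probes)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) == -1)
        return -1;
    return 0;
}
```

For example, enable_keepalive(fd, 10, 3, 3) would report a silently dead peer after roughly 10 + 3 * 3 = 19 seconds without traffic, at the cost of one small probe every few seconds on an otherwise idle connection.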

Now suppose the network has broken down and, instead of trying to write, you put the socket into an epoll device:

The second argument of epoll_wait:

 n = epoll_wait (efd, events, MAXEVENTS, -1);

is filled in with the event flags for each ready descriptor. It is good practice to check these flags defensively, as follows.

n = epoll_wait (efd, events, MAXEVENTS, -1);
for (i = 0; i < n; i++)
  {
    if ((events[i].events & EPOLLERR) ||
        (events[i].events & EPOLLHUP) ||
        (!(events[i].events & EPOLLIN)))
      {
        /* An error has occurred on this fd, or the socket is not
           ready for reading (why were we notified then?) */
        fprintf (stderr, "epoll error\n");
        close (events[i].data.fd);
        continue;
      }
    else if (sfd == events[i].data.fd)
      {
        /* We have a notification on the listening socket, which
           means one or more incoming connections. */

        /* Do whatever you need here. */
      }
  }

There is also EPOLLRDHUP, which means:
Stream socket peer closed connection, or shut down writing half of connection. (This flag is especially useful for writing simple code to detect peer shutdown when using Edge Triggered monitoring.)
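
To see what EPOLLRDHUP looks like in practice, here is a small sketch. The function name events_after_peer_shutdown is hypothetical, and the demo uses a local AF_UNIX socket pair whose peer shuts down cleanly. That is an assumption made to keep the example self-contained: on a silently broken network no FIN or RST arrives, so epoll_wait reports nothing until keepalive or a retransmission timeout marks the connection as failed.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: register one end of a local socket pair with
   EPOLLIN | EPOLLRDHUP, let the other end shut down its writing half,
   and return the event mask that epoll_wait reports for the watched
   end. Returns 0 if setup fails or nothing is reported within 1 s. */
uint32_t events_after_peer_shutdown(void)
{
    int sv[2];
    int efd;
    struct epoll_event ev, out;
    uint32_t result = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return 0;

    efd = epoll_create1(0);
    if (efd == -1)
      {
        close(sv[0]);
        close(sv[1]);
        return 0;
      }

    ev.events = EPOLLIN | EPOLLRDHUP;   /* EPOLLRDHUP must be requested */
    ev.data.fd = sv[0];
    if (epoll_ctl(efd, EPOLL_CTL_ADD, sv[0], &ev) == 0)
      {
        shutdown(sv[1], SHUT_WR);       /* peer stops writing */
        if (epoll_wait(efd, &out, 1, 1000) == 1)
            result = out.events;
      }

    close(efd);
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Unlike EPOLLERR and EPOLLHUP, which epoll always reports, EPOLLRDHUP is only delivered if it was included in the events mask passed to epoll_ctl.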