When a TCP connection is established, the following scenario occurs:

1. The server must be prepared to accept an incoming connection. This is normally done by calling socket, bind, and listen and is called a passive open.
2. The client issues an active open by calling connect. This causes the client TCP to send a "synchronize" (SYN) segment, which tells the server the client's initial sequence number for the data that the client will send on the connection. Normally, there is no data sent with the SYN; it just contains an IP header, a TCP header, and possible TCP options (which we will talk about shortly).
3. The server must acknowledge (ACK) the client's SYN and the server must also send its own SYN containing the initial sequence number for the data that the server will send on the connection. The server sends its SYN and the ACK of the client's SYN in a single segment.
4. The client must acknowledge the server's SYN.

The minimum number of packets required for this exchange is three; hence, this is called TCP's three-way handshake, as shown in the figure below:

We show the client's initial sequence number as J and the server's initial sequence number as K. The acknowledgment number in an ACK is the next expected sequence number for the end sending the ACK. Since a SYN occupies one byte of the sequence number space, the acknowledgment number in the ACK of each SYN is the initial sequence number plus one. Similarly, the ACK of each FIN is the sequence number of the FIN plus one.

An everyday analogy for establishing a TCP connection is the telephone system [Nemeth 1997]. The socket function is the equivalent of having a telephone to use. bind is telling other people your telephone number so that they can call you. listen is turning on the ringer so that you will hear when an incoming call arrives. connect requires that we know the other person's phone number and dial it. accept is when the person being called answers the phone.
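The numbering rule above can be sketched in a few lines of C. Each SYN consumes one sequence number, so its ACK carries the peer's initial sequence number plus one. This is a minimal illustration, not code from the book: `struct hs_segment` and `handshake_segments` are hypothetical names, and J and K are passed in as the two initial sequence numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one handshake segment (illustration only). */
struct hs_segment {
    bool     syn;      /* SYN flag set? */
    bool     ack;      /* ACK flag set? */
    uint32_t seq;      /* sequence number carried by the segment */
    uint32_t ack_num;  /* acknowledgment number (meaningful only if ack) */
};

/* Fill out[0..2] with the three handshake segments, given the client's
 * initial sequence number j and the server's initial sequence number k. */
void handshake_segments(uint32_t j, uint32_t k, struct hs_segment out[3])
{
    /* 1. client -> server: SYN J */
    out[0] = (struct hs_segment){ .syn = true, .ack = false, .seq = j, .ack_num = 0 };
    /* 2. server -> client: SYN K, ACK J+1 (the client's SYN used one number) */
    out[1] = (struct hs_segment){ .syn = true, .ack = true, .seq = k, .ack_num = j + 1 };
    /* 3. client -> server: ACK K+1 (the server's SYN used one number) */
    out[2] = (struct hs_segment){ .syn = false, .ack = true, .seq = j + 1, .ack_num = k + 1 };

    assert(out[1].ack_num == out[0].seq + 1);  /* ACK of a SYN = ISN + 1 */
}
```

Because the fields are `uint32_t`, J + 1 wraps to 0 when J is 0xFFFFFFFF. This matches TCP's modulo 2^32 sequence space.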
Having the client's identity returned by accept (where the identity is the client's IP address and port number) is similar to having the caller ID feature show the caller's phone number. One difference, however, is that accept returns the client's identity only after the connection has been established, whereas the caller ID feature shows the caller's phone number before we choose whether to answer the phone or not. If the DNS is used (Chapter 11), it provides a service analogous to a telephone book. getaddrinfo is similar to looking up a person's phone number in the phone book. getnameinfo would be the equivalent of having a phone book sorted by telephone numbers that we could search, instead of a book sorted by name.

MSS option. With this option, the TCP sending the SYN announces its maximum segment size, the maximum amount of data that it is willing to accept in each TCP segment, on this connection. The sending TCP uses the receiver's MSS value as the maximum size of a segment that it sends. We will see how to fetch and set this TCP option with the TCP_MAXSEG socket option (Section 7.9).

Window scale option. The maximum window that either TCP can advertise to the other TCP is 65,535, because the corresponding field in the TCP header occupies 16 bits. But high-speed connections, common in today's Internet (45 Mbits/sec and faster, as described in RFC 1323 [Jacobson, Braden, and Borman 1992]), or long delay paths (satellite links) require a larger window to obtain the maximum throughput possible. This newer option specifies that the advertised window in the TCP header must be scaled (left-shifted) by 0 to 14 bits, providing a maximum window of almost one gigabyte (65,535 × 2^14). Both end-systems must support this option for the window scale to be used on a connection. We will see how to affect this option with the SO_RCVBUF socket option (Section 7.5). To provide interoperability with older implementations that do not support this option, the following rules apply.
TCP can send the option with its SYN as part of an active open. But it can scale its windows only if the other end also sends the option with its SYN. Similarly, the server's TCP can send this option only if it receives the option with the client's SYN. This logic assumes that implementations ignore options that they do not understand, which is required and common, but unfortunately, not guaranteed with all implementations.

Timestamp option. This option is needed for high-speed connections to prevent possible data corruption caused by old, delayed, or duplicated segments. Since it is a newer option, it is negotiated similarly to the window scale option. As network programmers, there is nothing we need to worry about with this option.

These common options are supported by most implementations. The latter two are sometimes called the "RFC 1323 options," as that RFC [Jacobson, Braden, and Borman 1992] specifies the options. They are also called the "long fat pipe options," since a network with either a high bandwidth or a long delay is called a long fat pipe. Chapter 24 of TCPv1 contains more details on these options.

Three segments are needed to establish a TCP connection, but four are needed to terminate one:

1. One application calls close first, and we say that this end performs the active close. This end's TCP sends a FIN segment, which means it is finished sending data.
2. The other end that receives the FIN performs the passive close. The received FIN is acknowledged by TCP. The receipt of the FIN is also passed to the application as an end-of-file (after any data that may have already been queued for the application to receive), since the receipt of the FIN means the application will not receive any additional data on the connection.
3. Sometime later, the application that received the end-of-file will close its socket. This causes its TCP to send a FIN.
4. The TCP on the system that receives this final FIN (the end that did the active close) acknowledges the FIN.
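The end-of-file behavior in step 2 can be observed directly over the loopback interface. The sketch below assumes a single process can act as both client and server on 127.0.0.1. `bytes_before_eof` is a hypothetical helper, not a function from the book. It writes a message from the client and then closes the client socket, which makes its TCP send a FIN. It then counts how much data the server side reads before read returns 0.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the number of data bytes the server side read before read()
 * returned 0 (end-of-file caused by the client's FIN), or -1 on error. */
ssize_t bytes_before_eof(const char *msg)
{
    int lfd = -1, cfd = -1, sfd = -1;
    ssize_t total = -1, n;
    char buf[256];
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                  /* kernel picks a free port */

    if ((lfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto out;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto out;
    if (listen(lfd, 1) < 0) goto out;
    if (getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) goto out;

    if ((cfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto out;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto out;
    if ((sfd = accept(lfd, NULL, NULL)) < 0) goto out;

    if (write(cfd, msg, strlen(msg)) != (ssize_t)strlen(msg)) goto out;
    close(cfd);                                /* active close: FIN is sent */
    cfd = -1;

    total = 0;
    while ((n = read(sfd, buf, sizeof(buf))) > 0)  /* queued data arrives first */
        total += n;
    if (n < 0)
        total = -1;                            /* n == 0 is the end-of-file */
out:
    if (sfd >= 0) close(sfd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return total;
}
```

For example, `bytes_before_eof("hello")` should return 5. The queued data is delivered first, and only then does read report end-of-file.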
Since a FIN and an ACK are required in each direction, four segments are normally required. We use the qualifier "normally" because in some scenarios, the FIN in Step 1 is sent with data. Also, the segments in Steps 2 and 3 are both from the end performing the passive close and could be combined into one segment. We show these packets in Figure 2.3.

A FIN occupies one byte of sequence number space just like a SYN. Therefore, the ACK of each FIN is the sequence number of the FIN plus one.

Between Steps 2 and 3 it is possible for data to flow from the end doing the passive close to the end doing the active close. This is called a half-close and we will talk about this in detail with the shutdown function in Section 6.6.

The sending of each FIN occurs when a socket is closed. We indicated that the application calls close for this to happen. Realize, however, that when a Unix process terminates, all of its open descriptors are closed. This happens whether it terminates voluntarily (calling exit or having the main function return) or involuntarily (receiving a signal that terminates the process). Closing those descriptors also causes a FIN to be sent on any TCP connection that is still open.

Although we show the client in Figure 2.3 performing the active close, either end (the client or the server) can perform the active close. Often the client performs the active close, but with some protocols (notably HTTP/1.0), the server performs the active close.

The operation of TCP with regard to connection establishment and connection termination can be described in detail with a state transition diagram, shown below:

There are 11 different states defined for a connection and the rules of TCP dictate the transitions from one state to another, based on the current state and the segment received in that state. For example, if an application performs an active open in the CLOSED state, TCP sends a SYN and the new state is SYN_SENT. If TCP next receives a SYN with an ACK, it sends an ACK and the new state is ESTABLISHED. This final state is where most data transfer occurs.
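To make the transition rules concrete, here is a simplified state machine covering the normal client and server paths through the 11 states. It is a sketch, not a TCP implementation. The event names in `enum tcp_event` and the function `tcp_next` are hypothetical, and the segment sent on each transition appears only as a comment. Simultaneous open, RST handling, and the combined FIN+ACK shortcut out of FIN_WAIT_1 are left out.

```c
#include <assert.h>

/* The 11 TCP states shown in the state transition diagram. */
enum tcp_state {
    CLOSED, LISTEN, SYN_SENT, SYN_RCVD, ESTABLISHED,
    CLOSE_WAIT, LAST_ACK, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT
};

/* Simplified, hypothetical event set (illustration only). */
enum tcp_event {
    APP_ACTIVE_OPEN, APP_PASSIVE_OPEN, APP_CLOSE,
    RCV_SYN, RCV_SYN_ACK, RCV_ACK, RCV_FIN, TIMEOUT_2MSL
};

/* Next state for the transitions listed; any other pair leaves s unchanged. */
enum tcp_state tcp_next(enum tcp_state s, enum tcp_event e)
{
    assert((int)s >= (int)CLOSED && (int)s <= (int)TIME_WAIT);
    switch (s) {
    case CLOSED:
        if (e == APP_ACTIVE_OPEN)  return SYN_SENT;    /* send SYN */
        if (e == APP_PASSIVE_OPEN) return LISTEN;
        break;
    case LISTEN:
        if (e == RCV_SYN)          return SYN_RCVD;    /* send SYN, ACK */
        break;
    case SYN_SENT:
        if (e == RCV_SYN_ACK)      return ESTABLISHED; /* send ACK */
        break;
    case SYN_RCVD:
        if (e == RCV_ACK)          return ESTABLISHED;
        break;
    case ESTABLISHED:
        if (e == APP_CLOSE)        return FIN_WAIT_1;  /* active close: send FIN */
        if (e == RCV_FIN)          return CLOSE_WAIT;  /* passive close: send ACK */
        break;
    case CLOSE_WAIT:
        if (e == APP_CLOSE)        return LAST_ACK;    /* send FIN */
        break;
    case LAST_ACK:
        if (e == RCV_ACK)          return CLOSED;
        break;
    case FIN_WAIT_1:
        if (e == RCV_ACK)          return FIN_WAIT_2;
        if (e == RCV_FIN)          return CLOSING;     /* simultaneous close: send ACK */
        break;
    case FIN_WAIT_2:
        if (e == RCV_FIN)          return TIME_WAIT;   /* send ACK */
        break;
    case CLOSING:
        if (e == RCV_ACK)          return TIME_WAIT;
        break;
    case TIME_WAIT:
        if (e == TIMEOUT_2MSL)     return CLOSED;
        break;
    }
    return s;
}
```

Walking the client path (active open, SYN+ACK received, close, ACK, FIN) ends in TIME_WAIT. This matches the text: the end that performs the active close is the one that enters TIME_WAIT.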
The two arrows leading from the ESTABLISHED state deal with the termination of a connection. If an application calls close before receiving a FIN (an active close), the transition is to the FIN_WAIT_1 state. But if an application receives a FIN while in the ESTABLISHED state (a passive close), the transition is to the CLOSE_WAIT state.

We denote the normal client transitions with a darker solid line and the normal server transitions with a darker dashed line. We also note that there are two transitions that we have not talked about: a simultaneous open (when both ends send SYNs at about the same time and the SYNs cross in the network) and a simultaneous close (when both ends send FINs at the same time). Chapter 18 of TCPv1 contains examples and a discussion of both scenarios, which are possible but rare.

One reason for showing the state transition diagram is to show the 11 TCP states with their names. These states are displayed by netstat, which is a useful tool when debugging client/server applications. We will use netstat to monitor state changes in Chapter 5.

The figure below shows the actual packet exchange that takes place for a complete TCP connection: connection establishment, data transfer, and connection termination. The TCP states through which each endpoint passes are also shown.

[Figure: Packet exchange for a TCP connection]

The client in this example announces an MSS of 536 (indicating that it implements only the minimum reassembly buffer size) and the server announces an MSS of 1,460 (typical for IPv4 on an Ethernet). It is okay for the MSS to be different in each direction (see Exercise 2.5). Once a connection is established, the client forms a request and sends it to the server. We assume this request fits into a single TCP segment (i.e., less than 1,460 bytes given the server's announced MSS). The server processes the request and sends a reply, and we assume that the reply fits in a single segment (less than 536 in this example). We show both data segments as bolder arrows. Notice that the acknowledgment of the client's request is sent with the server's reply.
This is called piggybacking and will normally happen when the time it takes the server to process the request and generate the reply is less than around 200 ms. If the server takes longer, say one second, we would see the acknowledgment followed later by the reply. (The dynamics of TCP data flow are covered in detail in Chapters 19 and 20 of TCPv1.)

We then show the four segments that terminate the connection. Notice that the end that performs the active close (the client in this scenario) enters the TIME_WAIT state. We will discuss this in the next section.

It is important to notice in Figure 2.5 that if the entire purpose of this connection was to send a one-segment request and receive a one-segment reply, there would be eight segments of overhead involved when using TCP. If UDP were used instead, only two packets would be exchanged: the request and the reply. But switching from TCP to UDP removes all the reliability that TCP provides to the application, pushing lots of these details from the transport layer (TCP) to the UDP application. Another important feature provided by TCP is congestion control, which must then be handled by the UDP application. Nevertheless, it is important to understand that many applications are built using UDP because the application exchanges small amounts of data and UDP avoids the overhead of TCP connection establishment and connection termination.

Every implementation of TCP must choose a value for the MSL. The recommended value in RFC 1122 [Braden 1989] is 2 minutes, although Berkeley-derived implementations have traditionally used a value of 30 seconds instead. This means the duration of the TIME_WAIT state is between 1 and 4 minutes. The MSL is the maximum amount of time that any given IP datagram can live in a network. We know this time is bounded because every datagram contains an 8-bit hop limit (the IPv4 TTL field in Figure A.1 and the IPv6 hop limit field in Figure A.2) with a maximum value of 255.
Although this is a hop limit and not a true time limit, the assumption is made that a packet with the maximum hop limit of 255 cannot exist in a network for more than MSL seconds.

The way in which a packet gets "lost" in a network is usually the result of routing anomalies. A router crashes or a link between two routers goes down and it takes the routing protocols seconds or minutes to stabilize and find an alternate path. During that time period, routing loops can occur (router A sends packets to router B, and B sends them back to A) and packets can get caught in these loops. In the meantime, assuming the lost packet is a TCP segment, the sending TCP times out and retransmits the packet, and the retransmitted packet gets to the final destination by some alternate path. But sometime later (up to MSL seconds after the lost packet started on its journey), the routing loop is corrected and the packet that was lost in the loop is sent to the final destination. This original packet is called a lost duplicate or a wandering duplicate. TCP must handle these duplicates.

There are two reasons for the TIME_WAIT state:

1. To implement TCP's full-duplex connection termination reliably
2. To allow old duplicate segments to expire in the network

The first reason can be explained by looking at Figure 2.5 and assuming that the final ACK is lost. The server will resend its final FIN, so the client must maintain state information, allowing it to resend the final ACK. If it did not maintain this information, it would respond with an RST (a different type of TCP segment), which would be interpreted by the server as an error. If TCP is performing all the work necessary to terminate both directions of data flow cleanly for a connection (its full-duplex close), then it must correctly handle the loss of any of these four segments.
This example also shows why the end that performs the active close is the end that remains in the TIME_WAIT state: because that end is the one that might have to retransmit the final ACK.

To understand the second reason for the TIME_WAIT state, assume we have a TCP connection between 12.106.32.254 port 1500 and 206.168.112.219 port 21. This connection is closed and then sometime later, we establish another connection between the same IP addresses and ports: 12.106.32.254 port 1500 and 206.168.112.219 port 21. This latter connection is called an incarnation of the previous connection since the IP addresses and ports are the same. TCP must prevent old duplicates from a connection from reappearing at some later time and being misinterpreted as belonging to a new incarnation of the same connection. To do this, TCP will not initiate a new incarnation of a connection that is currently in the TIME_WAIT state. Since the duration of the TIME_WAIT state is twice the MSL, this allows MSL seconds for a packet in one direction to be lost, and another MSL seconds for the reply to be lost. By enforcing this rule, we are guaranteed that when we successfully establish a TCP connection, all old duplicates from previous incarnations of the connection have expired in the network.

There is an exception to this rule. Berkeley-derived implementations will initiate a new incarnation of a connection that is currently in the TIME_WAIT state if the arriving SYN has a sequence number that is "greater than" the ending sequence number from the previous incarnation. Pages 958-959 of TCPv2 talk about this in more detail. This requires the server to perform the active close, since the TIME_WAIT state must exist on the end that receives the next SYN. This capability is used by the rsh command. RFC 1185 [Jacobson, Braden, and Zhang 1990] talks about some pitfalls in doing this.
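The TIME_WAIT arithmetic and the "greater than" test used by the Berkeley exception can be sketched as below. This sketch rests on several assumptions. `time_wait_seconds` and `syn_may_reopen` are hypothetical helper names. `seq_gt` follows the modulo 2^32 comparison that Berkeley-derived stacks express with their SEQ_GT macro. The cast of the difference to `int32_t` relies on the usual two's-complement conversion, which the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Duration of the TIME_WAIT state for a given MSL: twice the MSL. */
unsigned time_wait_seconds(unsigned msl_seconds)
{
    assert(msl_seconds <= UINT_MAX / 2u);   /* avoid unsigned overflow */
    return 2u * msl_seconds;
}

/* Modulo-2^32 comparison: a is "greater than" b when the signed 32-bit
 * difference a - b is positive, so it still works across wraparound. */
bool seq_gt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Berkeley exception sketch: a SYN for a connection in TIME_WAIT may start
 * a new incarnation only if its sequence number is "greater than" the
 * ending sequence number of the previous incarnation. */
bool syn_may_reopen(uint32_t syn_seq, uint32_t last_seq)
{
    return seq_gt(syn_seq, last_seq);
}
```

With the two MSL values quoted above, `time_wait_seconds(30)` gives 60 and `time_wait_seconds(120)` gives 240. That is the 1-to-4-minute range stated in the text.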
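To understand the connect, accept, and close functions, and to debug TCP applications with the netstat program, we must understand how TCP connections are established and terminated, and we must understand TCP's state transition diagram.
Three-Way Handshake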
[Figure: TCP three-way handshake]
TCP Options
Each SYN can contain TCP options. Commonly used options include the following:
TCP Connection Termination
Figure 2.3. Packets exchanged when a TCP connection is closed.
TCP State Transition Diagram
[Figure: TCP state transition diagram]
Watching the Packets
TIME_WAIT State
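Without a doubt, one of the most misunderstood aspects of TCP with regard to network programming is its TIME_WAIT state. The end that calls close first (the end performing the active close) remains in this state for twice the maximum segment lifetime (MSL).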
1.3 A Simple Daytime Server
This server program provides the service used by the client in the previous section.
1 #include "unp.h"
2 #include <time.h>
3 int
4 main(int argc, char **argv)
5 {
6 int listenfd, connfd;
7 struct sockaddr_in servaddr;
8 char buff[MAXLINE];
9 time_t ticks;
10 listenfd = Socket(AF_INET, SOCK_STREAM, 0);
11 bzero(&servaddr, sizeof(servaddr));
12 servaddr.sin_family = AF_INET;
13 servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
14 servaddr.sin_port = htons(13); /* daytime server */
15 Bind(listenfd, (SA *) &servaddr, sizeof(servaddr));
16 Listen(listenfd, LISTENQ);
17 for ( ; ; ) {
18 connfd = Accept(listenfd, (SA *) NULL, NULL);
19 ticks = time(NULL);
20 snprintf(buff, sizeof(buff), "%.24s\r\n", ctime(&ticks));
21 Write(connfd, buff, strlen(buff));
22 Close(connfd);
23 }
24 }
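11-15 The server fills in an Internet socket address structure with its well-known port (13, the daytime port) and its network interface (IP address). It then binds this address to the socket by calling bind. The IP address is specified as INADDR_ANY, which allows clients to connect through any of the server's network interfaces. A server may have several network cards and therefore several IP addresses; with INADDR_ANY, a client can connect to whichever of those addresses it chooses. Later chapters show how to restrict the server to accepting connections on a single interface.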
16
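By calling listen, the socket is converted into a listening socket. The kernel accepts incoming connections from clients on this socket, but the listening socket itself is never used to exchange data with a client.
The constant LISTENQ is defined in the unp.h header. It specifies the maximum number of client connections that the kernel will queue for this listening descriptor. Clients that connect while the server is busy wait in this queue until the server accepts them. This is discussed in more detail in later chapters.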
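Without the book's wrapper functions and unp.h, the setup in lines 10-16 might look like the sketch below. The function name `open_listener` is hypothetical, and its `backlog` argument stands in for LISTENQ. Binding port 13 normally requires superuser privileges, so the function also accepts port 0. Port 0 asks the kernel for an ephemeral port, and the function reports the port actually bound via getsockname.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket, bind it to the wildcard address and `port`, and make
 * it a listening socket. The port actually bound is stored in *bound_port
 * (host byte order). Returns the listening descriptor, or -1 on error. */
int open_listener(uint16_t port, int backlog, uint16_t *bound_port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd;

    assert(bound_port != NULL);
    if ((fd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* any local interface */
    addr.sin_port = htons(port);               /* network byte order */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0 ||             /* now a listening socket */
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *bound_port = ntohs(addr.sin_port);
    return fd;
}
```

For example, `open_listener(0, 5, &port)` returns a listening descriptor and sets `port` to whatever ephemeral port the kernel chose.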
Accept client connection, send reply
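17-21 Normally, the server process is put to sleep in the call to accept, waiting for a client connection to arrive and be accepted. A TCP connection uses what is called a three-way handshake to establish a connection, and when this handshake completes, accept returns. The return value is a new socket descriptor (connfd), called the connected descriptor, which is used to communicate with the new client. A new descriptor is returned by accept for each client that connects to the server. The infinite loop used throughout the book has this form: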
for ( ; ; ) { . . . }
The current time and date are returned by the library function time, and ctime converts this integer value into a human-readable string such as:
Mon May 26 20:58:40 2003
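One pass through the server's for ( ; ; ) loop (accept, time, ctime, write, close) is sketched below using the plain socket functions. `serve_daytime_once` and `connect_loopback` are hypothetical helpers; the second is only a minimal client for trying the first on 127.0.0.1. Within a single process the order matters. The client's connect completes as soon as the kernel finishes the three-way handshake, even before the server calls accept. So one process can connect first, then serve, then read.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* One iteration of the daytime server loop. Returns 0 on success, -1 on error. */
int serve_daytime_once(int listenfd)
{
    char buff[64];
    time_t ticks;
    size_t len;
    int connfd = accept(listenfd, NULL, NULL);  /* sleeps until a handshake completes */

    if (connfd < 0)
        return -1;
    ticks = time(NULL);
    /* ctime() yields 24 characters plus '\n'; keep the 24, append CR LF */
    snprintf(buff, sizeof(buff), "%.24s\r\n", ctime(&ticks));
    len = strlen(buff);
    if (write(connfd, buff, len) != (ssize_t)len) {
        close(connfd);
        return -1;
    }
    return close(connfd);                       /* begins the FIN exchange */
}

/* Minimal client: connect to 127.0.0.1:port. Returns the socket or -1. */
int connect_loopback(uint16_t port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A real server would call `serve_daytime_once` inside for ( ; ; ), as the listing does. The client then receives 26 bytes: the 24-character date followed by a carriage return and a linefeed.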
22
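The server closes its connection with the client by calling close. This initiates the normal TCP connection termination sequence: a FIN is sent in each direction and each FIN is acknowledged by the other end. The three-way handshake used to establish a TCP connection and the four-packet exchange used to terminate it are discussed in more detail later.
The client and server shown above are protocol-dependent (IPv4). A protocol-independent version that works with both IPv4 and IPv6, based mainly on the getaddrinfo function, is presented later.
One final point: in the socket API calls above, the first letter of each function name is capitalized (Socket, Bind, Listen, Accept, Write, Close). These wrapper functions behave exactly like their lowercase counterparts, except that they also handle errors. If the underlying call fails, the wrapper prints an error message and terminates the process.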
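A capitalized wrapper can be sketched as follows. This illustrates the idea described above rather than reproducing the book's wrapper source. `die` is a hypothetical stand-in for err_sys, used so that the example compiles without unp.h.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for err_sys(): print the message with the errno
 * text on standard error, then terminate the process. */
static void die(const char *msg)
{
    fprintf(stderr, "%s: %s\n", msg, strerror(errno));
    exit(1);
}

/* Same arguments and return value as socket(), but a failure terminates
 * the process instead of returning -1, so the caller needs no error check. */
int Socket(int family, int type, int protocol)
{
    int n;

    if ((n = socket(family, type, protocol)) < 0)
        die("socket error");
    return n;
}
```

With the wrapper, the call site shrinks to `listenfd = Socket(AF_INET, SOCK_STREAM, 0);`, as in line 10 of the server.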
We define our own set of error functions that are used throughout the text to handle error conditions. The reason for using our own error functions is to let us write our error handling with a single line of C code, as in
if (error condition) err_sys (printf format with any number of arguments);
instead of
if (error condition) {
    char buff[2002];
    snprintf(buff, sizeof(buff), printf format with any number of arguments);
    perror(buff);
    exit(1);
}
Our error functions use the variable-length argument list facility from ANSI C. See Section 7.3 of [Kernighan and Ritchie 1988] for additional details.
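As a minimal illustration of that stdarg.h facility, the hypothetical `format_message` below accepts a printf-style format and any number of arguments. It forwards them with vsnprintf, which is the same pattern err_doit follows.

```c
#include <stdarg.h>
#include <stdio.h>

/* Format a message into buf; returns what vsnprintf returns, i.e. the
 * length the complete message would have (it may have been truncated). */
int format_message(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);                 /* ap now refers to the arguments after fmt */
    n = vsnprintf(buf, size, fmt, ap); /* writes at most size bytes, NUL included */
    va_end(ap);
    return n;
}
```

For example, `format_message(b, sizeof b, "%s error %d", "connect", 111)` stores "connect error 111" and returns 17.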
Figure D.3 lists the differences between the various error functions.
If the global integer daemon_proc is nonzero, the message is passed to syslog with the indicated level; otherwise, the error is output to standard error.
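Figure D.3. Summary of our standard error functions.

Function   strerror(errno)?   Terminate?   syslog level
err_dump   yes                abort();     LOG_ERR
err_msg    no                 return;      LOG_INFO
err_quit   no                 exit(1);     LOG_ERR
err_ret    yes                return;      LOG_INFO
err_sys    yes                exit(1);     LOG_ERR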
Figure D.4 shows the first five functions from Figure D.3.
lib/error.c
#include "unp.h"
#include <stdarg.h>     /* ANSI C header file */
#include <syslog.h>     /* for syslog() */

int daemon_proc;        /* set nonzero by daemon_init() */

static void err_doit(int, int, const char *, va_list);

/* Nonfatal error related to system call
 * Print message and return */
void
err_ret(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    err_doit(1, LOG_INFO, fmt, ap);
    va_end(ap);
    return;
}

/* Fatal error related to system call
 * Print message and terminate */
void
err_sys(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    err_doit(1, LOG_ERR, fmt, ap);
    va_end(ap);
    exit(1);
}

/* Fatal error related to system call
 * Print message, dump core, and terminate */
void
err_dump(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    err_doit(1, LOG_ERR, fmt, ap);
    va_end(ap);
    abort();            /* dump core and terminate */
    exit(1);            /* shouldn't get here */
}

/* Nonfatal error unrelated to system call
 * Print message and return */
void
err_msg(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    err_doit(0, LOG_INFO, fmt, ap);
    va_end(ap);
    return;
}

/* Fatal error unrelated to system call
 * Print message and terminate */
void
err_quit(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    err_doit(0, LOG_ERR, fmt, ap);
    va_end(ap);
    exit(1);
}

/* Print message and return to caller
 * Caller specifies "errnoflag" and "level" */
static void
err_doit(int errnoflag, int level, const char *fmt, va_list ap)
{
    int errno_save, n;
    char buf[MAXLINE + 1];

    errno_save = errno;             /* value caller might want printed */
#ifdef HAVE_VSNPRINTF
    vsnprintf(buf, MAXLINE, fmt, ap);   /* safe */
#else
    vsprintf(buf, fmt, ap);             /* not safe */
#endif
    n = strlen(buf);
    if (errnoflag)
        snprintf(buf + n, MAXLINE - n, ": %s", strerror(errno_save));
    strcat(buf, "\n");

    if (daemon_proc) {
        syslog(level, "%s", buf);   /* never pass buf itself as the format */
    } else {
        fflush(stdout);             /* in case stdout and stderr are the same */
        fputs(buf, stderr);
        fflush(stderr);
    }
    return;
}
Figure 1.2. Server handling multiple clients at the same time.
1.2 Example Code and Explanation
1 #include "unp.h"
2 int
3 main(int argc, char **argv)
4 {
5 int sockfd, n;
6 char recvline[MAXLINE + 1];
7 struct sockaddr_in servaddr;
8 if (argc != 2)
9 err_quit("usage: a.out <IPaddress>");
10 if ( (sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
11 err_sys("socket error");
12 bzero(&servaddr, sizeof(servaddr));
13 servaddr.sin_family = AF_INET;
14 servaddr.sin_port = htons(13); /* daytime server */
15 if (inet_pton(AF_INET, argv[1], &servaddr.sin_addr) <= 0)
16 err_quit("inet_pton error for %s", argv[1]);
17 if (connect(sockfd, (SA *) &servaddr, sizeof(servaddr)) < 0)
18 err_sys("connect error");
19 while ( (n = read(sockfd, recvline, MAXLINE)) > 0) {
20 recvline[n] = 0; /* null terminate */
21 if (fputs(recvline, stdout) == EOF)
22 err_sys("fputs error");
23 }
24 if (n < 0)
25 err_sys("read error");
26 exit(0);
27 }
Here unp.h is our own header file (<a class="" title="unp.h file contents" href="/sscchh-2000/archive/2006/05/09/6836.html" target="_blank">view the source</a>). Compiling and running the program above produces the following output:
solaris % a.out 206.168.112.96          our input
Mon May 26 20:58:40 2003                 the program's output
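1 We include our own header, unp.h. It includes numerous system headers that most network programs need and defines various constants that we use (for example, MAXLINE).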
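10-11 The socket function creates an Internet (AF_INET) stream (SOCK_STREAM) socket. It returns a small integer descriptor that identifies the socket in all later function calls (for example, connect and read). Functions whose names begin with err_ are our own error-handling functions, shown in lib/error.c.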
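The listing's next step (lines 12-16) fills in the server's address. The sketch below shows just that step, using the hypothetical name `fill_daytime_addr`. It makes the byte-order conversion explicit and exposes inet_pton's return convention: 1 for a valid address, and 0 for a string that is not a valid dotted-decimal IPv4 address.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Zero the structure, set the family and the daytime port, and convert the
 * dotted-decimal string. Returns inet_pton()'s result (1 on success, 0 if
 * `ip` is not a valid IPv4 dotted-decimal string). */
int fill_daytime_addr(struct sockaddr_in *addr, const char *ip)
{
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = htons(13);                      /* daytime server */
    return inet_pton(AF_INET, ip, &addr->sin_addr);  /* stored in network byte order */
}
```

The client program treats any result of 0 or less as fatal, which is why it tests `inet_pton(...) <= 0`.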
Mon May 26 20:58:40 2003\r\n
Here \r is a carriage return and \n is a linefeed.
Terminate program
26 The exit call terminates the program. Unix always closes all open descriptors when a process terminates, so our TCP socket is now closed.
This is discussed in more detail later.
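The claim that process termination closes every open descriptor can be checked with fork. The helper `read_after_child_exit` is hypothetical, and the sketch assumes fork and the loopback interface are available. A child connects to a listener on 127.0.0.1 and then exits without calling close. The parent's read on the accepted socket then returns 0, the end-of-file produced by the FIN that the child's TCP sends when the kernel closes the descriptor.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns read()'s result on the accepted socket (0 means end-of-file),
 * or -2 if the setup failed. */
ssize_t read_after_child_exit(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    char c;
    ssize_t n = -2;
    pid_t pid = -1;
    int connfd, lfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd < 0)
        return -2;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                 /* kernel picks a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0 ||
        (pid = fork()) < 0) {
        close(lfd);
        return -2;
    }
    if (pid == 0) {                           /* child */
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd >= 0)
            connect(fd, (struct sockaddr *)&addr, sizeof(addr));
        _exit(0);                             /* no close(): the kernel closes fd */
    }
    if ((connfd = accept(lfd, NULL, NULL)) >= 0) {
        n = read(connfd, &c, 1);              /* 0 once the child's FIN arrives */
        close(connfd);
    }
    waitpid(pid, NULL, 0);
    close(lfd);
    return n;
}
```

The child uses `_exit` rather than `exit` so that it does not flush any stdio buffers inherited from the parent.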