??xml version="1.0" encoding="utf-8" standalone="yes"?> System I/O can be blocking, or non-blocking synchronous, or non-blocking asynchronous [1, 2]. Blocking I/O means that the calling system does not return control to the caller until the operation is finished. As a result, the caller is blocked and cannot perform other activities during that time. Most important, the caller thread cannot be reused for other request processing while waiting for the I/O to complete, and becomes a wasted resource during that time. For example, a By contrast, a non-blocking synchronous call returns control to the caller immediately. The caller is not made to wait, and the invoked system immediately returns one of two responses: If the call was executed and the results are ready, then the caller is told of that. Alternatively, the invoked system can tell the caller that the system has no resources (no data in the socket) to perform the requested action. In that case, it is the responsibility of the caller may repeat the call until it succeeds. For example, a In a non-blocking asynchronous call, the calling function returns control to the caller immediately, reporting that the requested action was started. The calling system will execute the caller's request using additional system resources/threads and will notify the caller (by callback for example), when the result is ready for processing. For example, a Windows This article investigates different non-blocking I/O multiplexing mechanisms and proposes a single multi-platform design pattern/solution. We hope that this article will help developers of high performance TCP based servers to choose optimal design solution. We also compare the performance of Java, C# and C++ implementations of proposed and existing solutions. We will exclude the blocking approach from further discussion and comparison at all, as it the least effective approach for scalability and performance. In general, I/O multiplexing mechanisms rely on an event demultiplexor [1, 3], an object that dispatches I/O events from a limited number of sources to the appropriate read/write event handlers. The developer registers interest in specific events and provides event handlers, or callbacks. The event demultiplexor delivers the requested events to the event handlers. Two patterns that involve event demultiplexors are called Reactor and Proactor [1]. The Reactor patterns involve synchronous I/O, whereas the Proactor pattern involves asynchronous I/O. In Reactor, the event demultiplexor waits for events that indicate when a file descriptor or socket is ready for a read or write operation. The demultiplexor passes this event to the appropriate handler, which is responsible for performing the actual read or write. In the Proactor pattern, by contrast, the handler—or the event demultiplexor on behalf of the handler—initiates asynchronous read and write operations. The I/O operation itself is performed by the operating system (OS). The parameters passed to the OS include the addresses of user-defined data buffers from which the OS gets data to write, or to which the OS puts data read. The event demultiplexor waits for events that indicate the completion of the I/O operation, and forwards those events to the appropriate handlers. For example, on Windows a handler could initiate async I/O (overlapped in Microsoft terminology) operations, and the event demultiplexor could wait for IOCompletion events [1]. 
The implementation of this classic asynchronous pattern is based on an asynchronous OS-level API, and we will call this implementation the "system-level" or "true" async, because the application fully relies on the OS to execute the actual I/O.

An example will help you understand the difference between Reactor and Proactor. We will focus on the read operation here, as the write implementation is similar. In Reactor, the handler is notified when the socket is ready and then performs the read itself; in Proactor (true async), the handler initiates the read and is notified when the read has completed, with the data already placed in its buffer.

Current practice

The open-source C++ development framework ACE [1, 3], developed by Douglas Schmidt et al., offers a wide range of platform-independent, low-level concurrency support classes (threading, mutexes, etc.). On the top level it provides two separate groups of classes: implementations of the ACE Reactor and the ACE Proactor. Although both of them are based on platform-independent primitives, these tools offer different interfaces.

The ACE Proactor gives much better performance and robustness on MS Windows, as Windows provides a very efficient async API based on operating-system-level support [4, 5]. Unfortunately, not all operating systems provide full, robust async OS-level support. For instance, many Unix systems do not. Therefore, the ACE Reactor is the preferable solution on UNIX (currently UNIX does not have robust async facilities for sockets). As a result, to achieve the best performance on each system, developers of networked applications need to maintain two separate code bases: an ACE Proactor based solution on Windows and an ACE Reactor based solution for Unix-based systems.

As we mentioned, the true async Proactor pattern requires operating-system-level support. Due to the differing nature of event handler and operating-system interaction, it is difficult to create common, unified external interfaces for both the Reactor and Proactor patterns. That, in turn, makes it hard to create a fully portable development framework and encapsulate the interface and OS-related differences.

Proposed solution

In this section, we will propose a solution to the challenge of designing a portable framework for the Proactor and Reactor I/O patterns. To demonstrate this solution, we will transform a Reactor demultiplexor I/O solution into an emulated async I/O by moving the read/write operations from the event handlers into the demultiplexor (this is the "emulated async" approach).

As we can see, by adding functionality to the demultiplexor I/O pattern, we were able to convert the Reactor pattern to a Proactor pattern. In terms of the amount of work performed, this approach is exactly the same as the Reactor pattern. We simply shifted responsibilities between different actors. There is no performance degradation because the amount of work performed is still the same; the work was simply performed by different actors. Both the standard/classic Reactor and the proposed emulated Proactor go through the same steps (wait for an event, read the data, dispatch to the user handler, process the data); they simply divide those steps differently between the demultiplexor and the handler.

With an operating system that does not provide an async I/O API, this approach allows us to hide the reactive nature of the available socket APIs and to expose a fully proactive async interface. This allows us to create a fully portable, platform-independent solution with a common external interface.

TProactor

The proposed solution (TProactor) was developed and implemented at Terabit P/L [6]. The solution has two alternative implementations, one in C++ and one in Java.
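Before looking at TProactor itself, here is a minimal sketch, in C, of the Reactor-to-emulated-Proactor conversion just described. This is not TProactor code; the names (connection_t, on_read_completed()) are illustrative assumptions. The point is only that the demultiplexor now performs the read itself and hands the handler a completion event with the data, instead of a readiness event.

#include <sys/types.h>
#include <sys/select.h>
#include <unistd.h>

#define CONN_BUF_SIZE 4096

typedef struct connection {
    int  fd;
    char buf[CONN_BUF_SIZE];
} connection_t;

/* User-supplied completion handler: receives data that has ALREADY been read.
   From the handler's point of view this looks like asynchronous I/O. */
extern void on_read_completed(connection_t *c, char *data, ssize_t len);

/* Emulated-async demultiplexor step for one connection:
   wait for readiness internally, perform the read internally,
   then dispatch a "read completed" event to the handler. */
static int emulated_async_read_step(connection_t *c)
{
    fd_set rs;
    FD_ZERO(&rs);
    FD_SET(c->fd, &rs);

    if (select(c->fd + 1, &rs, NULL, NULL, NULL) < 0)   /* wait for event (demultiplexor job) */
        return -1;

    ssize_t n = read(c->fd, c->buf, sizeof(c->buf));    /* read data (now the demultiplexor's job) */
    if (n < 0)
        return -1;

    on_read_completed(c, c->buf, n);                    /* dispatch a completion event, Proactor style */
    return 0;
}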
The C++ version was built using ACE cross-platform low-level primitives and has a common, unified async proactive interface on all platforms.

The main TProactor components are the Engine and WaitStrategy interfaces. Engine manages the async operation lifecycle. WaitStrategy manages concurrency strategies. WaitStrategy depends on Engine and the two always work in pairs. Interfaces between Engine and WaitStrategy are strongly defined.

Engines and waiting strategies are implemented as pluggable class-drivers (for the full list of all implemented Engines and corresponding WaitStrategies, see Appendix I). TProactor is a highly configurable solution. It internally implements three engines (POSIX AIO, SUN AIO and Emulated AIO) and hides six different waiting strategies, based on an asynchronous kernel API (for POSIX this is not efficient right now due to internal POSIX AIO API problems) and the synchronous Unix select(), poll(), /dev/poll (Solaris 5.8+), port_get() (Solaris 5.10), RealTime (RT) signals (Linux 2.4+), epoll (Linux 2.6), and kqueue (FreeBSD) APIs. TProactor conforms to the standard ACE Proactor implementation interface. That makes it possible to develop a single cross-platform solution (POSIX/MS-WINDOWS) with a common (ACE Proactor) interface.

With a set of mutually interchangeable "lego-style" Engines and WaitStrategies, a developer can choose the appropriate internal mechanism (engine and waiting strategy) at run time by setting appropriate configuration parameters. These settings may be specified according to specific requirements, such as the number of connections, scalability, and the targeted OS. If the operating system supports an async API, a developer may use the true async approach; otherwise the user can opt for an emulated async solution built on different sync waiting strategies. All of those strategies are hidden behind the emulated async façade.

For an HTTP server running on Sun Solaris, for example, the /dev/poll- or port_get()-based engine is the most suitable choice, able to serve a huge number of connections; but for another UNIX solution with a limited number of connections but high throughput requirements, a select()-based engine may be a better approach. Such flexibility cannot be achieved with a standard ACE Reactor/Proactor, due to inherent algorithmic problems of the different wait strategies (see Appendix II).

Performance comparison (JAVA versus C++ versus C#).

In terms of performance, our tests show that emulating from reactive to proactive does not impose any overhead—it can be faster, but not slower. According to our test results, TProactor gives on average up to 10-35% better performance (measured in terms of both throughput and response times) than the reactive model in the standard ACE Reactor implementation on various UNIX/Linux platforms. On Windows it gives the same performance as the standard ACE Proactor.

In addition to C++, we also implemented TProactor in Java. As of JDK version 1.4, Java provides only the sync-based approach that is logically similar to C select() [7, 8]. Java TProactor is based on Java's non-blocking facilities (the java.nio packages), logically similar to the C++ TProactor with a waiting strategy based on select().

Figures 1 and 2 chart the transfer rate in bits/sec versus the number of connections. These charts represent comparison results for a simple echo-server built on the standard ACE Reactor using Red Hat Linux 9.0, TProactor C++ and Java (IBM 1.4 JVM) on Microsoft Windows and Red Hat Linux 9.0, and a C# echo-server running on the Windows operating system. Performance of the native AIO APIs is represented by the "Async"-marked curves, emulated AIO (TProactor) by the "AsyncE" curves, and TP_Reactor by the "Synch" curves. All implementations were bombarded by the same client application—a continuous stream of arbitrary fixed-sized messages via N connections. The full set of tests was performed on the same hardware. Tests on different machines proved that relative results are consistent.

User code example

The following is the skeleton of a simple TProactor-based Java echo-server. In a nutshell, the developer only has to implement the two interfaces: OpRead, with a buffer where TProactor puts its read results, and OpWrite, with a buffer from which TProactor takes data. The developer will also need to implement protocol-specific logic by providing the callbacks onReadCompleted() and onWriteCompleted() in the AsynchHandler interface implementation. Those callbacks will be asynchronously called by TProactor on completion of read/write operations and executed on a thread pool provided by TProactor (the developer doesn't need to write his own pool).

class EchoServerProtocol implements AsynchHandler
{
    AsynchChannel achannel = null;
    ByteBuffer buffer = ByteBuffer.allocate(4096);  // read buffer used by start(); declaration assumed, size illustrative

    EchoServerProtocol( Demultiplexor m, SelectableChannel channel ) throws Exception
    {
        this.achannel = new AsynchChannel( m, this, channel );
    }

    public void start() throws Exception
    {
        // called after construction
        System.out.println( Thread.currentThread().getName() + ": EchoServer protocol started" );
        achannel.read( buffer );
    }

    public void onReadCompleted( OpRead opRead ) throws Exception
    {
        if ( opRead.getError() != null )
        {
            // handle error, do clean-up if needed
            System.out.println( "EchoServer::readCompleted: " + opRead.getError().toString() );
            achannel.close();
            return;
        }

        if ( opRead.getBytesCompleted() <= 0 )
        {
            System.out.println( "EchoServer::readCompleted: Peer closed " + opRead.getBytesCompleted() );
            achannel.close();
            return;
        }

        ByteBuffer buffer = opRead.getBuffer();
        achannel.write( buffer );
    }

    public void onWriteCompleted( OpWrite opWrite ) throws Exception
    {
        // logically similar to onReadCompleted
        ...
    }
}

IOHandler is a TProactor base class. AsynchHandler and Multiplexor, among other things, internally execute the wait strategy chosen by the developer.

Conclusion

TProactor provides a common, flexible, and configurable solution for multi-platform, high-performance communications development. All of the problems and complexities mentioned in Appendix II are hidden from the developer. It is clear from the charts that C++ is still the preferable approach for high performance communication solutions, but Java on Linux comes quite close.
However, the overall Java performance was weakened by poor results on Windows. One reason for that may be that the Java 1.4 nio package is based on a select()-style API. It is true that the Java NIO package is a kind of Reactor pattern based on a select()-style API (see [7, 8]). Java NIO allows you to write your own select()-style provider (the equivalent of TProactor waiting strategies). Looking at the Java NIO implementation for Windows (it is enough to examine the import symbols in jdk1.5.0\jre\bin\nio.dll), we can conclude that Java NIO 1.4.2 and 1.5.0 for Windows is based on the WSAEventSelect() API. That is better than select(), but slower than IOCompletionPorts for a significant number of connections. Should the 1.5 version of Java's nio be based on IOCompletionPorts, that should improve performance. If Java NIO used IOCompletionPorts, then a conversion of the Proactor pattern to the Reactor pattern would have to be made inside nio.dll. Although such a conversion is more complicated than the Reactor-to-Proactor conversion, it can be implemented within the frames of the Java NIO interfaces (this is the topic of a next article, but we can provide the algorithm). At this time, no TProactor performance tests were done on JDK 1.5.

Note. All tests for Java are performed on "raw" buffers (java.nio.ByteBuffer) without data processing.

Taking into account the latest activities to develop robust AIO on Linux [9], we can conclude that the Linux kernel API (the io_xxxx set of system calls) should be more scalable in comparison with the POSIX standard, but is still not portable. In that case, a TProactor with a new Engine/WaitStrategy pair based on native Linux AIO can easily be implemented to overcome the portability issues and to cover Linux native AIO with the standard ACE Proactor interface.

Appendix I

Engines and waiting strategies implemented in TProactor:

POSIX_AIO engine (true async, aio_read()/aio_write()), wait strategies:
  aio_suspend() (POSIX-compliant UNIX, not robust)
  waiting for an RT signal (POSIX, not robust)
  callback function (SGI IRIX, Linux, not robust)

SUN_AIO engine (true async, aio_read()/aio_write()), wait strategy:
  aio_wait() (SUN, not robust)

Emulated Async engine (non-blocking read()/write()), wait strategies:
  select() (generic POSIX)
  poll() (mostly all POSIX implementations)
  /dev/poll (SUN)
  Linux RT signals (Linux)
  kqueue (FreeBSD)

Appendix II

All sync waiting strategies can be divided into two groups: those that report readiness only when it changes, and those that report readiness at any time (select(), poll(), /dev/poll). Let us describe some common logical problems for those groups:

Resources

[1] Douglas C. Schmidt, Stephen D. Huston, "C++ Network Programming", 2002, Addison-Wesley, ISBN 0-201-60464-7
[2] W. Richard Stevens, "UNIX Network Programming", vol. 1 and 2, 1999, Prentice Hall, ISBN 0-13-490012-X
[3] Douglas C. Schmidt, Michael Stal, Hans Rohnert, Frank Buschmann, "Pattern-Oriented Software Architecture: Patterns for Concurrent and Networked Objects, Volume 2", Wiley & Sons, NY, 2000
[4] INFO: Socket Overlapped I/O Versus Blocking/Non-blocking Mode. Q181611. Microsoft Knowledge Base Articles.
[5] Microsoft MSDN. I/O Completion Ports. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/fs/i_o_completion_ports.asp
[6] TProactor (ACE compatible Proactor). www.terabit.com.au
[7] JavaDoc java.nio.channels. http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/package-summary.html
[8] JavaDoc java.nio.channels.spi Class SelectorProvider. http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/spi/SelectorProvider.html
[9] Linux AIO development. http://lse.sourceforge.net/io/aio.html and http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-Pulavarty-OLS2003.pdf

See Also: Ian Barile, "I/O Multiplexing & Scalable Socket Servers", February 2004, DDJ
Further reading on event handling: http://www.cs.wustl.edu/~schmidt/ACE-papers.html
The Adaptive Communication Environment: http://www.cs.wustl.edu/~schmidt/ACE.html
Terabit Solutions: http://terabit.com.au/solutions.php

About the authors

Alex Libman has been programming for 15 years. During the past 5 years his main area of interest has been pattern-oriented multi-platform networked programming using C++ and Java. He is a big fan of and contributor to ACE. Vlad Gilbourd works as a computer consultant, but wishes to spend more time listening to jazz :) As a hobby, he started and runs the www.corporatenews.com.au website.

from: http://www.artima.com/articles/io_design_patterns.html (November 25, 2005)
The basic idea of my server framework below:
One connection vs. one thread in a worker thread pool; each worker thread runs completionWorkerRoutine.
An acceptor thread is dedicated to accepting sockets, associating them with the IOCP, and calling WSARecv to post a Recv completion packet to the IOCP.
completionWorkerRoutine has the following responsibilities:
1. Handle the request; when the workers are busy, increase the number of completionWorkerThreads (but not beyond maxThreads), and post the next Recv completion packet to the IOCP.
2. On wait timeout, check whether the pool is idle and how many completionWorkerThreads there currently are; when idle, keep the pool or shrink it down to minThreads.
3. Manage the lifecycle of every accepted socket. Here the system's keepalive probes are used; if you want to implement an application-level "heartbeat" instead, just change QSS_SIO_KEEPALIVE_VALS_TIMEOUT back to the system default (2 hours).
Below, let's walk through IOCP together with the source code:
socketserver.h
#ifndef __Q_SOCKET_SERVER__
#define __Q_SOCKET_SERVER__
#include <winsock2.h>
#include <mstcpip.h>
#define QSS_SIO_KEEPALIVE_VALS_TIMEOUT 30*60*1000
#define QSS_SIO_KEEPALIVE_VALS_INTERVAL 5*1000
#define MAX_THREADS 100
#define MAX_THREADS_MIN 10
#define MIN_WORKER_WAIT_TIMEOUT 20*1000
#define MAX_WORKER_WAIT_TIMEOUT 60*MIN_WORKER_WAIT_TIMEOUT
#define MAX_BUF_SIZE 1024
/* CSocketLifecycleCallback is invoked when an accepted socket is created and when the socket is closed or hits an error */
typedef void (*CSocketLifecycleCallback)(SOCKET cs,int lifecycle);//lifecycle: 0:OnAccepted, -1:OnClose. Note: at OnClose the socket may no longer be usable; it may have been closed abnormally or hit some other error.
/* protocol handler callback */
typedef int (*InternalProtocolHandler)(LPWSAOVERLAPPED overlapped);//return -1:SOCKET_ERROR
typedef struct Q_SOCKET_SERVER SocketServer;
DWORD initializeSocketServer(SocketServer ** ssp,WORD passive,WORD port,CSocketLifecycleCallback cslifecb,InternalProtocolHandler protoHandler,WORD minThreads,WORD maxThreads,long workerWaitTimeout);
DWORD startSocketServer(SocketServer *ss);
DWORD shutdownSocketServer(SocketServer *ss);
#endif
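As a quick usage sketch of the API declared above (my own illustrative example, not part of the original post): a main() that starts the server in non-passive mode on port 9999, with the built-in echo handler selected by passing a NULL protoHandler and a trivial lifecycle callback. The thread-pool and timeout values are arbitrary.

/* example_main.c - illustrative usage of the SocketServer API above */
#include <stdio.h>
#include "socketserver.h"

static void onLifecycle(SOCKET cs, int lifecycle)
{
    printf("socket %d %s\n", (int)cs, lifecycle == 0 ? "accepted" : "closed");
}

int main(void)
{
    SocketServer *ss = NULL;

    /* passive=0: run the accept loop on the calling thread.
       NULL protoHandler selects the internal echo handler.
       10..50 worker threads, 20 s worker wait timeout (arbitrary values). */
    if (!initializeSocketServer(&ss, 0, 9999, onLifecycle, NULL, 10, 50, 20 * 1000))
        return 1;

    if (!startSocketServer(ss))   /* blocks here, accepting connections */
        return 1;

    shutdownSocketServer(ss);
    return 0;
}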
qsocketserver.c (below, QSocketServer is abbreviated as qss, and the corresponding OVERLAPPED structure as qssOl).
#include "socketserver.h"
#include "stdio.h"
typedef struct {
WORD passive;//daemon
WORD port;
WORD minThreads;
WORD maxThreads;
volatile long lifecycleStatus;//0-created,1-starting, 2-running,3-stopping,4-exitKeyPosted,5-stopped
long workerWaitTimeout;//wait timeout
CRITICAL_SECTION QSS_LOCK;
volatile long workerCounter;
volatile long currentBusyWorkers;
volatile long CSocketsCounter;//reference count of accepted sockets
CSocketLifecycleCallback cslifecb;
InternalProtocolHandler protoHandler;
WORD wsaVersion;//=MAKEWORD(2,0);
WSADATA wsData;
SOCKET server_s;
SOCKADDR_IN serv_addr;
HANDLE iocpHandle;
}QSocketServer;
typedef struct {
WSAOVERLAPPED overlapped;
SOCKET client_s;
SOCKADDR_IN client_addr;
WORD optCode;
char buf[MAX_BUF_SIZE];
WSABUF wsaBuf;
DWORD numberOfBytesTransferred;
DWORD flags;
}QSSOverlapped;
DWORD acceptorRoutine(LPVOID);
DWORD completionWorkerRoutine(LPVOID);
static void adjustQSSWorkerLimits(QSocketServer *qss){
/*adjust size and timeout.*/
/*if(qss->maxThreads <= 0) {
qss->maxThreads = MAX_THREADS;
} else if (qss->maxThreads < MAX_THREADS_MIN) {
qss->maxThreads = MAX_THREADS_MIN;
}
if(qss->minThreads > qss->maxThreads) {
qss->minThreads = qss->maxThreads;
}
if(qss->minThreads <= 0) {
if(1 == qss->maxThreads) {
qss->minThreads = 1;
} else {
qss->minThreads = qss->maxThreads/2;
}
}
if(qss->workerWaitTimeout<MIN_WORKER_WAIT_TIMEOUT)
qss->workerWaitTimeout=MIN_WORKER_WAIT_TIMEOUT;
if(qss->workerWaitTimeout>MAX_WORKER_WAIT_TIMEOUT)
qss->workerWaitTimeout=MAX_WORKER_WAIT_TIMEOUT; */
}
typedef struct{
QSocketServer * qss;
HANDLE th;
}QSSWORKER_PARAM;
static WORD addQSSWorker(QSocketServer *qss,WORD addCounter){
WORD res=0;
if(qss->workerCounter<qss->minThreads||(qss->currentBusyWorkers==qss->workerCounter&&qss->workerCounter<qss->maxThreads)){
DWORD threadId;
QSSWORKER_PARAM * pParam=NULL;
int i=0;
EnterCriticalSection(&qss->QSS_LOCK);
if(qss->workerCounter+addCounter<=qss->maxThreads)
for(;i<addCounter;i++)
{
pParam=malloc(sizeof(QSSWORKER_PARAM));
if(pParam){
pParam->th=CreateThread(NULL,0,(LPTHREAD_START_ROUTINE)completionWorkerRoutine,pParam,CREATE_SUSPENDED,&threadId);
pParam->qss=qss;
ResumeThread(pParam->th);
qss->workerCounter++,res++;
}
}
LeaveCriticalSection(&qss->QSS_LOCK);
}
return res;
}
static void SOlogger(const char * msg,SOCKET s,int clearup){
perror(msg);
if(s>0)
closesocket(s);
if(clearup)
WSACleanup();
}
static int _InternalEchoProtocolHandler(LPWSAOVERLAPPED overlapped){
QSSOverlapped *qssOl=(QSSOverlapped *)overlapped;
printf("numOfT:%d,WSARecvd:%s,\n",qssOl->numberOfBytesTransferred,qssOl->buf);
//Sleep(500);
return send(qssOl->client_s,qssOl->buf,qssOl->numberOfBytesTransferred,0);
}
DWORD initializeSocketServer(SocketServer ** ssp,WORD passive,WORD port,CSocketLifecycleCallback cslifecb,InternalProtocolHandler protoHandler,WORD minThreads,WORD maxThreads,long workerWaitTimeout){
QSocketServer * qss=malloc(sizeof(QSocketServer));
qss->passive=passive>0?1:0;
qss->port=port;
qss->minThreads=minThreads;
qss->maxThreads=maxThreads;
qss->workerWaitTimeout=workerWaitTimeout;
qss->wsaVersion=MAKEWORD(2,0);
qss->lifecycleStatus=0;
InitializeCriticalSection(&qss->QSS_LOCK);
qss->workerCounter=0;
qss->currentBusyWorkers=0;
qss->CSocketsCounter=0;
qss->cslifecb=cslifecb,qss->protoHandler=protoHandler;
if(!qss->protoHandler)
qss->protoHandler=_InternalEchoProtocolHandler;
adjustQSSWorkerLimits(qss);
*ssp=(SocketServer *)qss;
return 1;
}
DWORD startSocketServer(SocketServer *ss){
QSocketServer * qss=(QSocketServer *)ss;
if(qss==NULL||InterlockedCompareExchange(&qss->lifecycleStatus,1,0))
return 0;
qss->serv_addr.sin_family=AF_INET;
qss->serv_addr.sin_port=htons(qss->port);
qss->serv_addr.sin_addr.s_addr=INADDR_ANY;//inet_addr("127.0.0.1");
if(WSAStartup(qss->wsaVersion,&qss->wsData)){
/* Side note: when WSAStartup is called it actually starts an extra thread, which exits on its own a little later. Not sure what WSACleanup does... */
SOlogger("WSAStartup failed.\n",0,0);
return 0;
}
qss->server_s=socket(AF_INET,SOCK_STREAM,IPPROTO_IP);
if(qss->server_s==INVALID_SOCKET){
SOlogger("socket failed.\n",0,1);
return 0;
}
if(bind(qss->server_s,(LPSOCKADDR)&qss->serv_addr,sizeof(SOCKADDR_IN))==SOCKET_ERROR){
SOlogger("bind failed.\n",qss->server_s,1);
return 0;
}
if(listen(qss->server_s,SOMAXCONN)==SOCKET_ERROR)/* A word about the backlog: many people don't know what to set it to; I have seen 1, 5, 50 and 100, and some say a large value wastes resources. Passing SOMAXCONN here does not mean Windows will literally use SOMAXCONN; rather, "If set to SOMAXCONN, the underlying service provider responsible for socket s will set the backlog to a maximum reasonable value." In practice, different operating systems support TCP backlog queues differently, so it is better to let the OS decide the value. Servers like Apache use:
#ifndef DEFAULT_LISTENBACKLOG
#define DEFAULT_LISTENBACKLOG 511
#endif
*/
{
SOlogger("listen failed.\n",qss->server_s,1);
return 0;
}
qss->iocpHandle=CreateIoCompletionPort(INVALID_HANDLE_VALUE,NULL,0,/*NumberOfConcurrentThreads-->*/qss->maxThreads);
//initialize worker for completion routine.
addQSSWorker(qss,qss->minThreads);
qss->lifecycleStatus=2;
{
QSSWORKER_PARAM * pParam=malloc(sizeof(QSSWORKER_PARAM));
pParam->qss=qss;
pParam->th=NULL;
if(qss->passive){
DWORD threadId;
pParam->th=CreateThread(NULL,0,(LPTHREAD_START_ROUTINE)acceptorRoutine,pParam,0,&threadId);
}else
return acceptorRoutine(pParam);
}
return 1;
}
DWORD shutdownSocketServer(SocketServer *ss){
QSocketServer * qss=(QSocketServer *)ss;
if(qss==NULL||InterlockedCompareExchange(&qss->lifecycleStatus,3,2)!=2)
return 0;
closesocket(qss->server_s/*listen-socket*/);//..other accepted-sockets associated with the listen-socket will not be closed,except WSACleanup is called..
if(qss->CSocketsCounter==0)
qss->lifecycleStatus=4,PostQueuedCompletionStatus(qss->iocpHandle,0,-1,NULL);
WSACleanup();
return 1;
}
DWORD acceptorRoutine(LPVOID ss){
QSSWORKER_PARAM * pParam=(QSSWORKER_PARAM *)ss;
QSocketServer * qss=pParam->qss;
HANDLE curThread=pParam->th;
QSSOverlapped *qssOl=NULL;
SOCKADDR_IN client_addr;
int client_addr_leng=sizeof(SOCKADDR_IN);
SOCKET cs;
free(pParam);
while(1){
printf("accept starting.....\n");
cs/*Accepted-socket*/=accept(qss->server_s,(LPSOCKADDR)&client_addr,&client_addr_leng);
if(cs==INVALID_SOCKET)
{
printf("accept failed:%d\n",GetLastError());
break;
}else{//SO_KEEPALIVE / SIO_KEEPALIVE_VALS: use the system's keepalive probes ("heartbeat"). On Linux: setsockopt with SOL_TCP: TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT
struct tcp_keepalive alive,aliveOut;
int so_keepalive_opt=1;
DWORD outDW;
if(!setsockopt(cs,SOL_SOCKET,SO_KEEPALIVE,(char *)&so_keepalive_opt,sizeof(so_keepalive_opt))){
alive.onoff=TRUE;
alive.keepalivetime=QSS_SIO_KEEPALIVE_VALS_TIMEOUT;
alive.keepaliveinterval=QSS_SIO_KEEPALIVE_VALS_INTERVAL;
if(WSAIoctl(cs,SIO_KEEPALIVE_VALS,&alive,sizeof(alive),&aliveOut,sizeof(aliveOut),&outDW,NULL,NULL)==SOCKET_ERROR){
printf("WSAIoctl SIO_KEEPALIVE_VALS failed:%d\n",GetLastError());
break;
}
}else{
printf("setsockopt SO_KEEPALIVE failed:%d\n",GetLastError());
break;
}
}
CreateIoCompletionPort((HANDLE)cs,qss->iocpHandle,cs,0);
if(qssOl==NULL){
qssOl=malloc(sizeof(QSSOverlapped));
}
qssOl->client_s=cs;
qssOl->wsaBuf.len=MAX_BUF_SIZE,qssOl->wsaBuf.buf=qssOl->buf,qssOl->numberOfBytesTransferred=0,qssOl->flags=0;//initialize WSABuf.
memset(&qssOl->overlapped,0,sizeof(WSAOVERLAPPED));
{
DWORD lastErr=GetLastError();
int ret=0;
SetLastError(0);
ret=WSARecv(cs,&qssOl->wsaBuf,1,&qssOl->numberOfBytesTransferred,&qssOl->flags,&qssOl->overlapped,NULL);
if(ret==0||(ret==SOCKET_ERROR&&GetLastError()==WSA_IO_PENDING)){
InterlockedIncrement(&qss->CSocketsCounter);//increment the accepted-socket reference count.
if(qss->cslifecb)
qss->cslifecb(cs,0);
qssOl=NULL;
}
if(!GetLastError())
SetLastError(lastErr);
}
printf("accept flags:%d ,cs:%d.\n",GetLastError(),cs);
}//end while.
if(qssOl)
free(qssOl);
if(qss)
shutdownSocketServer((SocketServer *)qss);
if(curThread)
CloseHandle(curThread);
return 1;
}
static int postRecvCompletionPacket(QSSOverlapped * qssOl,int SOErrOccurredCode){
int SOErrOccurred=0;
DWORD lastErr=GetLastError();
SetLastError(0);
//SOCKET_ERROR:-1,WSA_IO_PENDING:997
if(WSARecv(qssOl->client_s,&qssOl->wsaBuf,1,&qssOl->numberOfBytesTransferred,&qssOl->flags,&qssOl->overlapped,NULL)==SOCKET_ERROR
&&GetLastError()!=WSA_IO_PENDING)//this case lastError maybe 64, 10054
{
SOErrOccurred=SOErrOccurredCode;
}
if(!GetLastError())
SetLastError(lastErr);
if(SOErrOccurred)
printf("worker[%d] postRecvCompletionPacket SOErrOccurred=%d,preErr:%d,postedErr:%d\n",GetCurrentThreadId(),SOErrOccurred,lastErr,GetLastError());
return SOErrOccurred;
}
DWORD completionWorkerRoutine(LPVOID ss){
QSSWORKER_PARAM * pParam=(QSSWORKER_PARAM *)ss;
QSocketServer * qss=pParam->qss;
HANDLE curThread=pParam->th;
QSSOverlapped * qssOl=NULL;
DWORD numberOfBytesTransferred=0;
ULONG_PTR completionKey=0;
int postRes=0,handleCode=0,exitCode=0,SOErrOccurred=0;
free(pParam);
while(!exitCode){
SetLastError(0);
if(GetQueuedCompletionStatus(qss->iocpHandle,&numberOfBytesTransferred,&completionKey,(LPOVERLAPPED *)&qssOl,qss->workerWaitTimeout)){
if(completionKey==-1&&qss->lifecycleStatus>=4)
{
printf("worker[%d] completionKey -1:%d \n",GetCurrentThreadId(),GetLastError());
if(qss->workerCounter>1)
PostQueuedCompletionStatus(qss->iocpHandle,0,-1,NULL);
exitCode=1;
break;
}
if(numberOfBytesTransferred>0){
InterlockedIncrement(&qss->currentBusyWorkers);
addQSSWorker(qss,1);
handleCode=qss->protoHandler((LPWSAOVERLAPPED)qssOl);
InterlockedDecrement(&qss->currentBusyWorkers);
if(handleCode>=0){
SOErrOccurred=postRecvCompletionPacket(qssOl,1);
}else
SOErrOccurred=2;
}else{
printf("worker[%d] numberOfBytesTransferred==0 ***** closesocket servS or cs *****,%d,%d ,ol is:%d\n",GetCurrentThreadId(),GetLastError(),completionKey,qssOl==NULL?0:1);
SOErrOccurred=3;
}
}else{ //GetQueuedCompletionStatus rtn FALSE, lastError 64 ,995[timeout worker thread exit.] ,WAIT_TIMEOUT:258
if(qssOl){
SOErrOccurred=postRecvCompletionPacket(qssOl,4);
}else {
printf("worker[%d] GetQueuedCompletionStatus F:%d \n",GetCurrentThreadId(),GetLastError());
if(GetLastError()!=WAIT_TIMEOUT){
exitCode=2;
}else{//wait timeout
if(qss->lifecycleStatus!=4&&qss->currentBusyWorkers==0&&qss->workerCounter>qss->minThreads){
EnterCriticalSection(&qss->QSS_LOCK);
if(qss->lifecycleStatus!=4&&qss->currentBusyWorkers==0&&qss->workerCounter>qss->minThreads){
qss->workerCounter--;//until qss->workerCounter decrease to qss->minThreads
exitCode=3;
}
LeaveCriticalSection(&qss->QSS_LOCK);
}
}
}
}//end GetQueuedCompletionStatus.
if(SOErrOccurred){
if(qss->cslifecb)
qss->cslifecb(qssOl->client_s,-1);
/*if(qssOl)*/{
closesocket(qssOl->client_s);
free(qssOl);
}
if(InterlockedDecrement(&qss->CSocketsCounter)==0&&qss->lifecycleStatus>=3){
//for qss workerSize,PostQueuedCompletionStatus -1
qss->lifecycleStatus=4,PostQueuedCompletionStatus(qss->iocpHandle,0,-1,NULL);
exitCode=4;
}
}
qssOl=NULL,numberOfBytesTransferred=0,completionKey=0,SOErrOccurred=0;//reset for the next loop iteration.
}//end while.
//last to do
if(exitCode!=3){
int clearup=0;
EnterCriticalSection(&qss->QSS_LOCK);
if(!--qss->workerCounter&&qss->lifecycleStatus>=4){//clearup QSS
clearup=1;
}
LeaveCriticalSection(&qss->QSS_LOCK);
if(clearup){
DeleteCriticalSection(&qss->QSS_LOCK);
CloseHandle(qss->iocpHandle);
free(qss);
}
}
CloseHandle(curThread);
return 1;
}
------------------------------------------------------------------------------------------------------------------------
Identifying and handling LastError correctly with IOCP is the tricky part, so pay attention to the structure of the while loop in my completionWorkerRoutine.
The structure is as follows:

while(!exitCode){
    if(completionKey==-1){...break;}
    if(GetQueuedCompletionStatus){/* in this branch, as long as the OVERLAPPED you posted was not NULL, the completion you get here is usable. */
        if(numberOfBytesTransferred>0){
            /* handle the request here, and remember to keep re-posting your OVERLAPPED */
        }else{
            /* here the client or the server may have called closesocket(the socket), but OVERLAPPED is not NULL, as long as what you posted was not NULL! */
        }
    }else{/* in this branch, although GetQueuedCompletionStatus returned FALSE, that does not mean OVERLAPPED is necessarily NULL. In particular, when OVERLAPPED is not NULL, do not assume that a LastError means the current socket is useless or that a fatal error occurred; with lastError 995, for example, the socket may still be perfectly usable and you should not close it. */
        if(OVERLAPPED is not NULL){
            /* in this case just keep posting (re-issue the WSARecv), and check for errors after posting. */
        }else{
        }
    }
    if(socket error occurred){
    }
    prepare for next while.
}

This was written in haste, so errors and omissions are hard to avoid; corrections and comments are welcome, thanks!
There is still room to improve the performance of this model.
from:
http://www.shnenglu.com/adapterofcoms/archive/2010/06/26/118781.aspx
How the TCP three-way handshake works: the initiator sends a packet with SYN=1, ACK=0 to the receiver, requesting a connection; this is the first handshake. If the receiver accepts the connection, it sends back a packet with SYN=1, ACK=1, telling the initiator that it can communicate and asking it to send an acknowledgment; this is the second handshake. Finally, the initiator sends a packet with SYN=0, ACK=1, telling the receiver that the connection has been confirmed; this is the third handshake. After that, the TCP connection is established and communication begins.

*SYN: synchronize flag. The Synchronize Sequence Numbers field is valid. This flag is only valid during the three-way handshake that establishes the TCP connection. It tells the server side of the connection to check the sequence number, which is the initial sequence number of the TCP connection (generally the client's). The TCP sequence number can be viewed as a 32-bit counter ranging from 0 to 4,294,967,295. Every byte exchanged over a TCP connection is sequence-numbered. The sequence number field in the TCP header contains the sequence number of the first byte in the TCP segment.

*ACK: acknowledgment flag. The Acknowledgement Number field is valid. In most cases this flag is set. The acknowledgment number in the TCP header (w+1, Figure-1) is the next expected sequence number, and at the same time indicates that the remote system has successfully received all preceding data.

*RST: reset flag. Used to reset the corresponding TCP connection.

*URG: urgent flag. The urgent pointer field is valid.

*PSH: push flag. When this flag is set, the receiver does not queue the data but passes it to the application as quicklyly as it can. This flag is always set when handling interactive connections such as telnet or rlogin.

*FIN: finish flag. A packet with this flag set is used to end a TCP session, but the corresponding port remains open, ready to receive subsequent data.
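As a small illustration of the flag combinations described above (my own sketch, not from the original post), here is a C helper that classifies a segment's SYN/ACK bits against the three handshake steps. The flag bit values are the standard positions in the TCP header's flags byte.

#include <stdio.h>

/* Standard TCP flag bit positions in the flags byte of the TCP header. */
#define TCP_FIN 0x01
#define TCP_SYN 0x02
#define TCP_RST 0x04
#define TCP_PSH 0x08
#define TCP_ACK 0x10
#define TCP_URG 0x20

/* Classify a segment's flags byte against the three-way handshake steps. */
static const char *handshake_step(unsigned char flags)
{
    int syn = flags & TCP_SYN;
    int ack = flags & TCP_ACK;

    if (syn && !ack)  return "step 1: SYN (connection request)";
    if (syn && ack)   return "step 2: SYN+ACK (request accepted)";
    if (!syn && ack)  return "step 3 or later: ACK";
    return "not part of the handshake";
}

int main(void)
{
    printf("%s\n", handshake_step(TCP_SYN));            /* client -> server */
    printf("%s\n", handshake_step(TCP_SYN | TCP_ACK));  /* server -> client */
    printf("%s\n", handshake_step(TCP_ACK));            /* client -> server */
    return 0;
}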
The TCP protocol itself is reliable, but that does not mean an application that sends data over TCP is automatically reliable. Whether blocking or not, the size that send() reports does not tell you how much data the peer has actually recv()'d.

In blocking mode, the send() call copies the data the application wants to send into the send buffer, sends it, and returns after it is acknowledged. Because of the send buffer, the observable behaviour is: if the free space in the send buffer is larger than the amount requested, send() returns immediately while the data is transmitted to the network; otherwise, send() transmits the part of the data that the buffer cannot hold and waits for the peer's acknowledgment before returning (the receiver acknowledges as soon as the data reaches its receive buffer; it does not have to wait for the application to call recv()).

In non-blocking mode, send() merely copies the data into the protocol stack's buffer. If the available space is not enough, it copies as much as it can and returns the number of bytes copied; if the available space is 0, it returns -1 and sets errno to EAGAIN.

On Linux you can check the system's default send buffer sizes with sysctl -a | grep net.ipv4.tcp_wmem:

net.ipv4.tcp_wmem = 4096 16384 81920

There are three values here. The first is the minimum number of bytes allocated for a socket's send buffer. The second is the default (it is overridden by net.core.wmem_default), up to which the buffer can grow when the system is not heavily loaded. The third is the maximum number of bytes for the send buffer (it is overridden by net.core.wmem_max).

According to actual tests, if you change net.ipv4.tcp_wmem manually the changed values are used; otherwise, by default, the protocol stack usually allocates memory according to net.core.wmem_default and net.core.wmem_max.

An application should adjust the send buffer size in the program according to the characteristics of the application:
socklen_t sendbuflen = 0;
socklen_t len = sizeof(sendbuflen);

getsockopt(clientSocket, SOL_SOCKET, SO_SNDBUF, (void*)&sendbuflen, &len);
printf("default,sendbuf:%d\n", sendbuflen);

sendbuflen = 10240;
setsockopt(clientSocket, SOL_SOCKET, SO_SNDBUF, (void*)&sendbuflen, len);
getsockopt(clientSocket, SOL_SOCKET, SO_SNDBUF, (void*)&sendbuflen, &len);
printf("now,sendbuf:%d\n", sendbuflen);

Note that although the send buffer is set to 10k here, the protocol stack actually doubles it and uses 20k.
------------------- Case analysis ---------------------

In real applications, if the sender sends in non-blocking mode, then because of network congestion or a slow receiver, what typically happens is that the sending application appears to have sent 10k of data, but only 2k has actually been delivered to the peer's buffer, and 8k is still sitting in the local send buffer (not yet sent, or sent but not yet acknowledged by the receiver). At that moment the receiving application can read 2k of data. Suppose the receiving application has called recv() and fetched 1k of data for processing; if, at this instant, one of the following happens, the two sides behave as follows:

A. The sending application, believing it has finished send()ing the 10k, closes the socket:
As the active closer, the sending host's connection enters the FIN_WAIT1 half-closed state (waiting for the peer's ack), and the 8k of data in the send buffer is not discarded; it will still be delivered to the peer. If the receiving application keeps calling recv(), it will receive the remaining 8k (provided it does so before the sender's FIN_WAIT1 state times out) and then be told that the peer socket has closed (recv() returns 0). At that point it should close its own end.

B. The sending application calls send() again with another 8k of data:
If the send buffer is 20k, the free space is 20-8=12k, which is more than the requested 8k, so send() copies the data and returns 8192 immediately.
If the send buffer is 12k, the free space is only 12-8=4k, so send() returns 4096. Seeing that the return value is smaller than the requested size, the application can treat the buffer as full; it must then block (or use select() to wait for the next "socket writable" notification). If the application ignores this and immediately calls send() again, it will get -1 (on Linux, errno=EAGAIN).

C. The receiving application closes the socket after processing the 1k of data:
The receiving host becomes the active closer and its connection enters the FIN_WAIT1 half-closed state (waiting for the peer's ack). The sending application will then get a "socket readable" signal (usually select() reports the socket as readable), but on reading it will find that recv() returns 0; it should then call close() to close the socket (which sends the ack to the peer).
If the sending application does not handle this readable signal and keeps calling send() instead, two cases have to be distinguished. If send() is called after the sender has received the RST flag, send() returns -1 and errno is set to ECONNRESET, meaning the peer connection is gone; it is also said that the process receives a SIGPIPE signal, whose default action is to terminate the process; if that signal is ignored, send() returns -1 with errno set to EPIPE (unverified). If send() is called before the RST flag arrives, it works as usual.
The above applies to non-blocking send(). If send() is a blocking call and happens to be blocked (for example sending one huge buffer that exceeds the send buffer) when the peer socket is closed, send() returns the number of bytes successfully sent; if send() is called again after that, it behaves as described above.

D. The network is cut off at a switch or router:
After processing the 1k of data it already has, the receiving application will keep reading the remaining 1k from its buffer and then see no more data to read; this situation has to be handled by the application, typically by setting a maximum select() wait time and treating the socket as unusable once that time is exceeded.
The sending application will keep trying to send the remaining data to the network but never gets an acknowledgment, so the free space in its send buffer stays at 0; this situation also has to be handled by the application.
If you do not want the application itself to handle these timeout situations, they can also be handled by the TCP protocol itself; see the following sysctl settings:

net.ipv4.tcp_keepalive_intvl
net.ipv4.tcp_keepalive_probes
net.ipv4.tcp_keepalive_time
Original article: http://xufish.blogbus.com/logs/40537344.html
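Case B above is exactly what a robust sender has to code for. Here is a small illustrative C helper (my own sketch, not from the original posts) that sends a whole buffer over a non-blocking socket, handling short writes and EAGAIN by waiting for writability with select():

#include <errno.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>

/* Send exactly len bytes on a non-blocking socket.
   Returns 0 on success, -1 on error. */
static int send_all(int fd, const char *buf, size_t len)
{
    size_t sent = 0;

    while (sent < len) {
        ssize_t n = send(fd, buf + sent, len - sent, 0);
        if (n > 0) {
            sent += (size_t)n;            /* partial send: keep going */
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* Send buffer is full: wait until the socket is writable again. */
            fd_set ws;
            FD_ZERO(&ws);
            FD_SET(fd, &ws);
            if (select(fd + 1, NULL, &ws, NULL, NULL) < 0 && errno != EINTR)
                return -1;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;                     /* interrupted by a signal: retry */
        return -1;                        /* ECONNRESET, EPIPE, ... */
    }
    return 0;
}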
Everyone is familiar with applications of the HTTP protocol, since we browse plenty of things on the web every day, and we all know that HTTP is a fairly simple protocol. Every time I used a download tool such as Thunder and clicked that "download all links with Thunder" option, it felt rather magical.
Later I realized that implementing such download features is actually not hard: just send a request according to the HTTP protocol, then analyse the received data, and if the page contains link markers such as href you can go one level deeper and download those too. The most widely used version of HTTP at the moment is 1.1; to understand it thoroughly, read RFC 2616. I am afraid of RFC documents myself, so go read it on your own ^_^
The source code is as follows:
/******* HTTP client program httpclient.c ************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <unistd.h>
#include <netinet/in.h>
#include <limits.h>
#include <netdb.h>
#include <arpa/inet.h>
#include <ctype.h>
////////////////////////////// httpclient.c start //////////////////////////////////////////
/********************************************
Purpose: find the first matching character, searching from the right-hand end of the string
********************************************/
char * Rstrchr(char * s, char x) {
int i = strlen(s);
if(!(*s)) return 0;
while(s[i-1]) if(strchr(s + (i - 1), x)) return (s + (i - 1)); else i--;
return 0;
}
/********************************************
Purpose: convert a string to all lowercase
********************************************/
void ToLowerCase(char * s) {
while(s && *s) {*s=tolower(*s);s++;}
}
/**************************************************************
Purpose: parse the web host address and port out of the string src, and get the file the user wants to download
***************************************************************/
void GetHost(char * src, char * web, char * file, int * port) {
char * pA;
char * pB;
memset(web,
0, sizeof(web));
memset(file, 0, sizeof(file));
*port = 0;
if(!(*src)) return;
pA = src;
if(!strncmp(pA, "http://", strlen("http://"))) pA = src+strlen("http://");
else if(!strncmp(pA, "https://", strlen("https://"))) pA = src+strlen("https://");
pB = strchr(pA, '/');
if(pB)
{
memcpy(web, pA, strlen(pA) - strlen(pB));
if(pB+1) {
memcpy(file, pB + 1, strlen(pB) - 1);
file[strlen(pB) - 1] = 0;
}
}
else memcpy(web, pA, strlen(pA));
if(pB)
web[strlen(pA) - strlen(pB)] = 0;
else web[strlen(pA)] = 0;
pA = strchr(web, ':');
if(pA)
*port = atoi(pA + 1);
else *port =
80;
}
int main(int
argc, char *argv[])
{
int sockfd;
char buffer[1024];
struct sockaddr_in server_addr;
struct hostent *host;
int portnumber,nbytes;
char host_addr[256];
char host_file[1024];
char local_file[256];
FILE * fp;
char request[1024];
int send,
totalsend;
int i;
char * pt;
if(argc!=2)
{
fprintf(stderr,"Usage:%s web-address\a\n",argv[0]);
exit(1);
}
printf("parameter.1
is: %s\n", argv[1]);
ToLowerCase(argv[1]);/* convert the argument to all lowercase */
printf("lowercase
parameter.1 is: %s\n",
argv[1]);
GetHost(argv[1], host_addr, host_file, &portnumber);/* parse out the host address, port, file name, etc. */
printf("webhost:%s\n", host_addr);
printf("hostfile:%s\n", host_file);
printf("portnumber:%d\n\n", portnumber);
if((host=gethostbyname(host_addr))==NULL)/* resolve the host's IP address */
{
fprintf(stderr,"Gethostname error, %s\n", strerror(errno));
exit(1);
}
/* the client creates the sockfd descriptor */
if((sockfd=socket(AF_INET,SOCK_STREAM,0))==-1)/* create the socket */
{
fprintf(stderr,"Socket Error:%s\a\n",strerror(errno));
exit(1);
}
/* the client fills in the server's address data */
bzero(&server_addr,sizeof(server_addr));
server_addr.sin_family=AF_INET;
server_addr.sin_port=htons(portnumber);
server_addr.sin_addr=*((struct in_addr
*)host->h_addr);
/* the client initiates the connection request */
if(connect(sockfd,(struct sockaddr *)(&server_addr),sizeof(struct sockaddr))==-1)/* connect to the web site */
{
fprintf(stderr,"Connect Error:%s\a\n",strerror(errno));
exit(1);
}
sprintf(request,
"GET /%s HTTP/1.1\r\nAccept:
*/*\r\nAccept-Language: zh-cn\r\n\
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)\r\n\
Host: %s:%d\r\nConnection: Close\r\n\r\n", host_file,
host_addr, portnumber);
printf("%s", request);/*准备requestQ将要发送给L*/
/*取得真实的文件名*/
if(host_file && *host_file)
pt = Rstrchr(host_file, '/');
else pt = 0;
memset(local_file,
0, sizeof(local_file));
if(pt && *pt) {
if((pt
+ 1) && *(pt+1)) strcpy(local_file,
pt + 1);
else memcpy(local_file,
host_file, strlen(host_file)
- 1);
}
else if(host_file
&& *host_file) strcpy(local_file, host_file);
else strcpy(local_file, "index.html");
printf("local
filename to write:%s\n\n",
local_file);
/* send the http request */
send = 0;totalsend
= 0;
nbytes=strlen(request);
while(totalsend <
nbytes) {
send = write(sockfd, request +
totalsend, nbytes - totalsend);
if(send==-1) {printf("send error!%s\n",
strerror(errno));exit(0);}
totalsend+=send;
printf("%d bytes send OK!\n",
totalsend);
}
fp = fopen(local_file, "a");
if(!fp) {
printf("create file error! %s\n", strerror(errno));
return 0;
}
printf("\nThe
following is the response header:\n");
i=0;
/* q接成功了,接收http响应Qresponse */
while((nbytes=read(sockfd,buffer,1))==1)
{
if(i <
4) {
if(buffer[0] == '\r' || buffer[0] == '\n') i++;
else i = 0;
printf("%c", buffer[0]);/*把http头信息打印在屏幕?/
}
else {
fwrite(buffer, 1, 1, fp);/*httpM信息写入文g*/
i++;
if(i%1024
== 0) fflush(fp);/*?K时存盘一?/
}
}
fclose(fp);
/* l束通讯 */
close(sockfd);
exit(0);
}
zj@zj:~/C_pram/practice/http_client$ ls
httpclient httpclient.c
zj@zj:~/C_pram/practice/http_client$ ./httpclient http://www.baidu.com/
parameter.1 is: http://www.baidu.com/
lowercase parameter.1 is: http://www.baidu.com/
webhost:www.baidu.com
hostfile:
portnumber:80

GET / HTTP/1.1
Accept: */*
Accept-Language: zh-cn
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
Host: www.baidu.com:80
Connection: Close

local filename to write:index.html

163 bytes send OK!

The following is the response header:
HTTP/1.1 200 OK
Date: Wed, 29 Oct 2008 10:41:40 GMT
Server: BWS/1.0
Content-Length: 4216
Content-Type: text/html
Cache-Control: private
Expires: Wed, 29 Oct 2008 10:41:40 GMT
Set-Cookie: BAIDUID=A93059C8DDF7F1BC47C10CAF9779030E:FG=1; expires=Wed, 29-Oct-38 10:41:40 GMT; path=/; domain=.baidu.com
P3P: CP=" OTI DSP COR IVA OUR IND COM "
zj@zj:~/C_pram/practice/http_client$ ls
httpclient httpclient.c index.html
If no file name is specified, the site's default home page is downloaded ^_^.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

#define HTTPPORT 80

char* head =
    "GET /u2/76292/ HTTP/1.1\r\n"
    "Accept: */*\r\n"
    "Accept-Language: zh-cn\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; CIBA; TheWorld)\r\n"
    "Host: blog.chinaunix.net\r\n"
    "Connection: Keep-Alive\r\n\r\n";
int connect_URL(char *domain, int port)
{
    int sock;
    struct hostent * host;
    struct sockaddr_in server;

    host = gethostbyname(domain);
    if (host == NULL)
    {
        printf("gethostbyname error\n");
        return -2;
    }
    // printf("HostName: %s\n", host->h_name);
    // printf("IP Address: %s\n", inet_ntoa(*((struct in_addr *)host->h_addr)));

    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0)
    {
        printf("invalid socket\n");
        return -1;
    }
    memset(&server, 0, sizeof(struct sockaddr_in));
    memcpy(&server.sin_addr, host->h_addr_list[0], host->h_length);
    server.sin_family = AF_INET;
    server.sin_port = htons(port);
    return (connect(sock, (struct sockaddr *)&server, sizeof(struct sockaddr)) < 0) ? -1 : sock;
}
int main()
{
    int sock;
    int n;
    char buf[100];
    char *domain = "blog.chinaunix.net";
    FILE *fp;

    fp = fopen("test.txt", "wb");       /* open the output file for writing */
    if(NULL == fp){
        printf("can't open output file!\n");
        return -1;
    }
    sock = connect_URL(domain, HTTPPORT);
    if (sock < 0){
        printf("connect err\n");
        return -1;
    }
    send(sock, head, strlen(head), 0);
    while(1)
    {
        n = recv(sock, buf, sizeof(buf), 0);
        if(n < 1) break;
        fwrite(buf, 1, n, fp);          /* save the received http data */
    }
    fclose(fp);
    close(sock);
    printf("bye!\n");
    return 0;
}
Here I simply save the data to the local hard disk; you can modify and build on this. You can work out the contents of the head request yourself by capturing packets with Wireshark.

The detailed process of an HTTP request
Let's look at everything that happens after we type http://www.mycompany.com:8080/mydir/index.html into a browser.

First, HTTP is an application-layer protocol. A protocol at this layer is just a communication convention: because the two parties need to communicate, they must agree on a specification in advance.

1. Connection. When we enter such a request, a socket connection must first be established. Since a socket is established from an IP address and a port, there is a DNS resolution step beforehand that turns www.mycompany.com into an IP address; if the URL does not contain a port number, the protocol's default port is used.

The DNS process works like this: when the network is configured on our local machine we fill in a DNS server, so the host sends the name to that configured DNS server. If it can resolve the name it returns the IP; otherwise it forwards the resolution request to a higher-level DNS server. The whole DNS system can be seen as a tree, and the request keeps travelling toward the root until a result is obtained. Now that we have the target IP and port number, we can open the socket connection.
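(The two client programs above perform this resolution step with gethostbyname(). As a side note, here is a minimal sketch of my own, not part of the original code, that does the same host-name-to-IP lookup with the more modern getaddrinfo(), which also accepts the port/service in the same call; the host name is just the example used in the text and may not actually resolve.)

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/* Resolve a host name to a dotted-quad IPv4 string; returns 0 on success. */
int resolve_host(const char *hostname, const char *service, char *ipstr, size_t len)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_INET;        /* IPv4 only, to keep the sketch short */
    hints.ai_socktype = SOCK_STREAM;    /* TCP */

    int err = getaddrinfo(hostname, service, &hints, &res);
    if (err != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
        return -1;
    }
    struct sockaddr_in *addr = (struct sockaddr_in *)res->ai_addr;
    inet_ntop(AF_INET, &addr->sin_addr, ipstr, len);
    freeaddrinfo(res);
    return 0;
}

int main(void)
{
    char ip[INET_ADDRSTRLEN];
    /* www.mycompany.com:8080 is just the example URL from the text */
    if (resolve_host("www.mycompany.com", "8080", ip, sizeof(ip)) == 0)
        printf("resolved to %s\n", ip);
    return 0;
}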
2. Request. Once the connection is established, we start sending the request to the web server. The request is generally a GET or POST command (POST is used to pass FORM parameters). The format of a GET command is: GET path/filename HTTP/1.0
The file name identifies the file being accessed, and HTTP/1.0 indicates the HTTP version used by the web browser. Now the GET command can be sent:
GET /mydir/index.html HTTP/1.0
3. Response. The web server receives the request and processes it. It searches its document space for the file index.html in the subdirectory mydir. If the file is found, the web server sends its content to the corresponding web browser.
To inform the browser, the web server first sends some HTTP header information, then the actual content (the HTTP body); the HTTP headers and the HTTP body are separated by a blank line. Common HTTP headers include:
- HTTP/1.0 200 OK — the first line of the web server's response, listing the HTTP version the server is running and the response code. The code "200 OK" means the request completed.
- MIME-Version: 1.0 — indicates the MIME version.
- Content-Type — a very important header; it indicates the MIME type of the HTTP body. For example, Content-Type: text/html indicates that the transmitted data is an HTML document.
- Content-Length — indicates the length of the HTTP body, in bytes.
4. Close the connection. After the response ends, the web browser and the web server must disconnect, so that other web browsers can establish connections with the web server.
Next, let's analyze in detail the path the packets take as they travel through the network.

In the layered structure of a network, the layers depend on each other in a strictly one-way fashion. A "service" is the abstract concept describing the relationship between layers: the set of operations each layer provides to the layer immediately above it. The lower layer is the service provider; the upper layer is the user requesting the service. Services take the form of primitives, such as system calls or library functions. A system call is a service primitive that the operating system kernel provides to network applications or higher-level protocols. Layer n of the network must always provide a more complete service to layer n+1 than layer n-1 does; otherwise layer n has no reason to exist.

The transport layer implements "end-to-end" communication and introduces the notion of inter-process communication across networks; it also has to handle error control, flow control, data ordering (segment ordering), connection management, and so on, and provides different service modes for this. Transport-layer services are usually provided through system calls, in the form of sockets. For a client, to establish a socket connection you call functions such as socket(), bind(), connect(), and then you can send data through send().
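Condensed to its essentials, that client-side call sequence is socket() → connect() → send()/recv() → close(); an explicit bind() is normally unnecessary for a client, because the kernel picks a local port during connect(). Below is a minimal sketch of my own of that sequence (the address 127.0.0.1 and port 80 are placeholders, not from the article):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    /* 1. create the socket */
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) { perror("socket"); return 1; }

    /* 2. connect to the server (bind() is implicit: the kernel picks a source port) */
    struct sockaddr_in server;
    memset(&server, 0, sizeof(server));
    server.sin_family = AF_INET;
    server.sin_port   = htons(80);                       /* placeholder port    */
    server.sin_addr.s_addr = inet_addr("127.0.0.1");     /* placeholder address */
    if (connect(sock, (struct sockaddr *)&server, sizeof(server)) < 0) {
        perror("connect"); close(sock); return 1;
    }

    /* 3. exchange data */
    const char *req = "GET / HTTP/1.0\r\n\r\n";
    send(sock, req, strlen(req), 0);
    char buf[512];
    ssize_t n = recv(sock, buf, sizeof(buf), 0);
    if (n > 0) fwrite(buf, 1, n, stdout);

    /* 4. close the connection */
    close(sock);
    return 0;
}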
Now let's follow the packets as they travel through the network:

Application layer
First, at the application layer, based on the current need and action and in accordance with the application-layer protocol, we determine the content of the data to send. We put that data into a buffer, and it forms the application-layer message, the data.
Transport layer
This data is then sent through the transport layer, for example using the TCP protocol, so it is handed to the transport layer for processing. Here the message is stamped with a transport header, which mainly contains the port numbers and TCP's various control information; this information is directly available, because the port has to be specified through the interface. This forms TCP's unit of transfer, the segment. TCP is an end-to-end protocol. Using this information, such as the sequence number and acknowledgement number in the TCP header, the sender keeps sending and waiting for acknowledgements: after sending a segment it starts a timer, and only when the acknowledgement arrives does it send the next one; if the timer expires without an acknowledgement, it retransmits. On the receiving side, corrupted data is discarded, which likewise makes the sender time out and retransmit. Through the TCP protocol, the generation of the send sequence is controlled and continually adjusted, achieving flow control and data integrity.
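To illustrate just the "send, start a timer, retransmit on timeout" idea described above, here is a simplified stop-and-wait sketch of my own over UDP, using select() as the timer. This is not how TCP itself is implemented inside an application; it only mimics the mechanism (the peer address 127.0.0.1:9000 is a placeholder):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Send one datagram and wait up to 1 second for an acknowledgement;
 * retransmit up to max_retries times if no ACK arrives in time. */
int send_with_retransmit(int sock, const struct sockaddr_in *peer,
                         const char *data, size_t len, int max_retries)
{
    char ack[16];
    for (int attempt = 0; attempt <= max_retries; attempt++) {
        sendto(sock, data, len, 0, (const struct sockaddr *)peer, sizeof(*peer));

        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(sock, &readfds);
        struct timeval timeout = { 1, 0 };      /* the "timer": 1 second */

        /* wait for the ACK or for the timer to expire */
        if (select(sock + 1, &readfds, NULL, NULL, &timeout) > 0) {
            if (recv(sock, ack, sizeof(ack), 0) > 0)
                return 0;                       /* acknowledged: the next segment may be sent */
        }
        /* timeout: fall through and retransmit */
    }
    return -1;                                  /* gave up after max_retries retransmissions */
}

int main(void)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) { perror("socket"); return 1; }

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(9000);                    /* placeholder port    */
    peer.sin_addr.s_addr = inet_addr("127.0.0.1");    /* placeholder address */

    if (send_with_retransmit(sock, &peer, "hello", 5, 3) < 0)
        printf("no ACK after retransmissions, giving up\n");
    close(sock);
    return 0;
}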
Network layer
The segments to be sent are then handed to the network layer, where they are packaged with a network-layer header that contains the source and destination IP addresses; the unit of data sent at this layer is called a packet. The network layer is responsible for transmitting such packets across the network: how they traverse routers and finally reach the destination address. Here, based on the destination IP address, the next-hop router's address has to be looked up. First, on the local machine, the local routing table is consulted; on Windows, running route print shows the current routing table, which has sections such as: Active Routes, Default Route, Persistent Routes.
The whole lookup process goes like this:
(1) From the destination address, derive the destination network number. If it is in the same local network, the packet can be sent directly.
(2) If not, query the routing table to find a route.
(3) If no explicit route is found, the routing table will still contain a default gateway, also called the default route. IP uses the default gateway address to hand the data to the next designated router, so the gateway may itself be a router, or it may only be the gateway through which the internal network sends data to one particular router.
(4) When a router receives the data, it again looks up a route for the destination host or network; if it still finds no route, the packet is sent to that router's default gateway address. The packet also carries a maximum hop count; if that value is exceeded the packet is dropped, which prevents it from being forwarded forever. When a router receives a packet it only looks at the network-layer header, i.e. the destination IP; that is why we say it works at the network layer, and transport-layer data is transparent to it.
If none of the steps above succeed, the datagram cannot be delivered. If the undeliverable datagram originated on the local machine, a "host unreachable" or "network unreachable" error is generally returned to the application that generated it.
Take the routing table of a host running Windows as an example and look at the route lookup process:
======================================================================
Active Routes:
Network Destination Netmask Gateway Interface Metric
0.0.0.0 0.0.0.0 192.168.1.2 192.168.1.101 10
127.0.0.0 255.0.0.0 127.0.0.1 127.0.0.1 1
192.168.1.0 255.255.255.0 192.168.1.101 192.168.1.101 10
192.168.1.101 255.255.255.255 127.0.0.1 127.0.0.1 10
192.168.1.255 255.255.255.255 192.168.1.101 192.168.1.101 10
224.0.0.0 240.0.0.0 192.168.1.101 192.168.1.101 10
255.255.255.255 255.255.255.255 192.168.1.101 192.168.1.101 1
Default Gateway: 192.168.1.2
Network Destination — the destination network segment.
Netmask — the subnet mask.
Gateway — the IP of the next-hop router's entry point. Through interface and gateway, a router defines one hop of the link to the next router; normally interface and gateway are in the same network segment.
Interface — the outgoing IP of this router toward that destination. For a personal PC this is usually its network card, identified by that card's IP address (of course a PC can also have several network cards).
The concept of a gateway is mainly about interaction between different subnets: when hosts A and B in two subnets want to communicate, A first sends the data to its local gateway, the gateway sends it on to the gateway of B's subnet, and that gateway delivers it to B.
Default gateway: when a packet's destination segment is not in your routing records, where should your router send that packet? The gateway of the default route is determined by the default gateway of your connection, i.e. the value we normally configure in the network connection settings.
Usually interface and gateway are inside one subnet. For a router, which may have several interfaces, when a packet arrives it looks for a matching entry by Network Destination; if one is found, interface indicates which interface of the router the packet should leave through, and gateway represents the gateway address of that subnet.
First entry:  0.0.0.0 0.0.0.0 192.168.1.2 192.168.1.101 10
0.0.0.0 stands for the default route. This routing record means: when I receive a packet whose destination segment is not in my routing records, I send that packet through the interface 192.168.1.101 to the address 192.168.1.2; that address is an interface of the next router, so the packet can be handed over to the next router for processing and is no longer my concern. The line quality (Metric) of this record is 10. When several entries match, the one with the smaller Metric value is chosen.
Third entry:  192.168.1.0 255.255.255.0 192.168.1.101 192.168.1.101 10
This is the routing record for a directly connected segment: what the router does with a packet destined for a directly connected segment. In this case the record's interface and gateway are the same. When I receive a packet whose destination segment is 192.168.1.0, I send the packet straight out through the interface 192.168.1.101, because that port is directly connected to the 192.168.1.0 segment. The line quality of this record is 10 (since interface and gateway are the same, the packet is delivered directly to the destination address without being handed to another router).
In general there are just these two cases: either the destination address is in the same subnet as the current router interface or it is not. If it is, send directly and there is no need to forward to a router; otherwise the packet must be forwarded to the next router for further processing.
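A rough sketch of that matching rule in code (my own illustration, reusing the two routing-table rows shown above): a destination matches a row when (destination & Netmask) equals the Network Destination; real stacks prefer the most specific (longest) netmask and use the Metric only to break ties, which is what the comparison below does.

#include <stdio.h>
#include <stdint.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* one row of the routing table shown above */
struct route_entry {
    uint32_t destination;   /* Network Destination */
    uint32_t netmask;       /* Netmask             */
    uint32_t gateway;       /* Gateway             */
    uint32_t interface_ip;  /* Interface           */
    int      metric;        /* Metric              */
};

/* return the best matching entry, or NULL if the table has no match */
const struct route_entry *lookup_route(const struct route_entry *table, int n, uint32_t dest)
{
    const struct route_entry *best = NULL;
    for (int i = 0; i < n; i++) {
        /* a row matches when (dest & Netmask) == Network Destination */
        if ((dest & table[i].netmask) == table[i].destination) {
            if (best == NULL
                || ntohl(table[i].netmask) > ntohl(best->netmask)   /* prefer the more specific route */
                || (table[i].netmask == best->netmask && table[i].metric < best->metric))
                best = &table[i];
        }
    }
    return best;
}

int main(void)
{
    struct route_entry table[] = {
        { inet_addr("0.0.0.0"),     inet_addr("0.0.0.0"),       inet_addr("192.168.1.2"),   inet_addr("192.168.1.101"), 10 },
        { inet_addr("192.168.1.0"), inet_addr("255.255.255.0"), inet_addr("192.168.1.101"), inet_addr("192.168.1.101"), 10 },
    };
    uint32_t dest = inet_addr("192.168.1.55");   /* a host on the directly connected segment */
    const struct route_entry *r = lookup_route(table, 2, dest);
    if (r) {
        struct in_addr gw;
        gw.s_addr = r->gateway;
        printf("next hop gateway: %s, metric %d\n", inet_ntoa(gw), r->metric);
    }
    return 0;
}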
After the next-hop IP address has been found, we still need its MAC address, which goes into the link-layer header as link-layer data. This is where the ARP protocol comes in. The process is as follows: look up the ARP cache (on Windows, run arp -a to see the current ARP cache contents). If it contains the MAC address for that IP, it is returned directly. Otherwise an ARP request is issued; the request contains the source's IP and MAC address as well as the destination IP address, and is broadcast on the local network. Every host checks whether its own IP matches the destination IP in the request; the host that matches replies with its own MAC address and at the same time saves the requester's IP/MAC pair. In this way the MAC address for the target IP is obtained.
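A tiny sketch of the cache-lookup step just described (the table contents are invented for illustration; on a miss a real stack would broadcast the ARP request instead of merely reporting it):

#include <stdio.h>
#include <string.h>

/* one cached ip -> mac binding, as shown by `arp -a` */
struct arp_entry {
    char ip[16];            /* dotted-quad IP address    */
    unsigned char mac[6];   /* corresponding MAC address */
};

/* return the cached MAC for ip, or NULL on a cache miss
 * (on a miss the stack would broadcast an ARP request) */
const unsigned char *arp_lookup(const struct arp_entry *cache, int n, const char *ip)
{
    for (int i = 0; i < n; i++)
        if (strcmp(cache[i].ip, ip) == 0)
            return cache[i].mac;
    return NULL;
}

int main(void)
{
    struct arp_entry cache[] = {
        { "192.168.1.2",   {0x00,0x1a,0x2b,0x3c,0x4d,0x5e} },   /* invented entries */
        { "192.168.1.101", {0x00,0x11,0x22,0x33,0x44,0x55} },
    };
    const unsigned char *mac = arp_lookup(cache, 2, "192.168.1.2");
    if (mac)
        printf("192.168.1.2 is-at %02x:%02x:%02x:%02x:%02x:%02x\n",
               mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    else
        printf("cache miss: would broadcast an ARP request\n");
    return 0;
}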
Link layer
The MAC address and link-layer control information are added to the packet, forming a frame. Under the link-layer protocol, the frame carries out the data transfer between adjacent nodes: establishing the link, controlling the transmission rate, and ensuring data integrity.

Physical layer
The physical line is only responsible for carrying the data, bit by bit, from the host to the next destination.
When the next destination receives the data, it takes it off the physical layer and then unpacks it layer by layer (physical layer → link layer → network layer) and performs the processing described above; then the network layer, link layer and physical layer package the data up again and pass it on toward the next address.
In the process above you can see there is a routing-table lookup, and the construction of that routing table relies on routing algorithms. In other words, routing algorithms are really only used between routers to update and maintain their routing tables; the actual data transfer does not execute the algorithm, it only consults the routing table. This concept is also important, and the commonly used routing algorithms need to be understood. The TCP protocol as a whole is fairly complex, somewhat similar to the link-layer protocols, and it contains some very important mechanisms and concepts that deserve careful study, such as sequence numbers and acknowledgements, flow control, the retransmission mechanism, and the send/receive windows.
Basic TCP/IP model and concepts

Physical layer
Devices: repeaters and hubs. At this layer, data received on one port is forwarded to every port.

Link layer
Protocols: SDLC (Synchronous Data Link Control), etc.
Because of the MAC address table, collisions are largely avoided: a switch knows from the destination MAC address which port the data should be forwarded to, instead of forwarding it to every port the way a hub does. That is why a switch can partition collision domains.
Network layer
Four main protocols:
Internet Protocol (IP): responsible for addressing and routing packets between hosts and networks.
Address Resolution Protocol (ARP): obtains the hardware addresses of hosts on the same physical network.
Internet Control Message Protocol (ICMP): sends messages and reports errors in packet delivery.
Internet Group Management Protocol (IGMP): used by IP hosts to report their multicast group membership to local multicast routers.
Devices at this layer include layer-3 switches and routers.
Transport layer
Two important protocols: TCP and UDP.
The port concept: TCP/UDP uses IP addresses to identify hosts on the network and port numbers to identify application processes; that is, TCP/UDP identifies an application process by the host IP address plus the port number assigned to that process. A port number is a 16-bit unsigned integer; TCP port numbers and UDP port numbers form two independent sequences. Although they are independent, if TCP and UDP both provide some well-known service, the two protocols usually choose the same port number. That is purely for convenience, not a requirement of the protocols themselves. Using port numbers, several processes on one host can use the transport services of TCP/UDP at the same time, and this communication is end to end; its data is carried by IP but is independent of the path the IP datagrams take. In network communication, a triple uniquely identifies an application process globally: (protocol, local address, local port number).
In other words, TCP and UDP can use the same port number.
As you can see, the 5-tuple (protocol, source port, source IP, destination port, destination IP) completely identifies one network connection.
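As a small sketch of that idea (the field names are my own, not from any particular stack), a connection table keyed on this 5-tuple is what lets a host tell apart many flows that all target the same local port:

#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>

/* the 5-tuple that uniquely identifies one network connection */
struct five_tuple {
    uint8_t  protocol;    /* e.g. 6 = TCP, 17 = UDP         */
    uint32_t src_ip;      /* source IP (network byte order) */
    uint16_t src_port;    /* source port                    */
    uint32_t dst_ip;      /* destination IP                 */
    uint16_t dst_port;    /* destination port               */
};

/* two packets belong to the same connection iff their 5-tuples match */
int same_connection(const struct five_tuple *a, const struct five_tuple *b)
{
    return a->protocol == b->protocol &&
           a->src_ip   == b->src_ip   && a->src_port == b->src_port &&
           a->dst_ip   == b->dst_ip   && a->dst_port == b->dst_port;
}

int main(void)
{
    /* invented placeholder addresses: TCP 192.168.1.3:54321 -> 192.168.1.2:80 */
    struct five_tuple c1 = { 6, inet_addr("192.168.1.3"), 54321,
                             inet_addr("192.168.1.2"), 80 };
    struct five_tuple c2 = c1;
    c2.src_port = 54322;    /* same hosts, same service, different client port: a different connection */
    printf("same connection? %s\n", same_connection(&c1, &c2) ? "yes" : "no");
    return 0;
}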
Application layer
Over TCP: Telnet, FTP, SMTP, DNS, HTTP
Over UDP: RIP, NTP (Network Time Protocol) and DNS (DNS also uses TCP), SNMP, TFTP
References:
Viewing the local routing table: http://hi.baidu.com/thusness/blog/item/9c18e5bf33725f0818d81f52.html
Internet transport-layer protocols: http://www.cic.tsinghua.edu.cn/jdx/book6/3.htm (Computer Networks, Xie Xiren)