Comparing Two High-Performance I/O Design Patterns
by Alexander Libman with Vladimir Gilbourd
November 25, 2005 

Summary
This article investigates and compares different design patterns of high performance TCP-based servers. In addition to existing approaches, it proposes a scalable single-codebase, multi-platform solution (with code examples) and describes its fine-tuning on different platforms. It also compares performance of Java, C# and C++ implementations of proposed and existing solutions.

System I/O can be blocking, or non-blocking synchronous, or non-blocking asynchronous [1, 2]. Blocking I/O means that the calling system does not return control to the caller until the operation is finished. As a result, the caller is blocked and cannot perform other activities during that time. Most important, the caller thread cannot be reused for other request processing while waiting for the I/O to complete, and becomes a wasted resource during that time. For example, a read() operation on a socket in blocking mode will not return control if the socket buffer is empty until some data becomes available.
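
As a minimal illustration (a sketch assuming a connected POSIX TCP socket left in its default blocking mode):

/* Blocking read: the calling thread sleeps inside read() until data arrives
   or the peer closes the connection, and can do no other work meanwhile. */
#include <stdio.h>
#include <unistd.h>

ssize_t blocking_read(int sock, char *buf, size_t len)
{
    ssize_t n = read(sock, buf, len);  /* does not return while the socket buffer is empty */
    if (n == 0)
        printf("peer closed the connection\n");
    return n;                          /* bytes read, or -1 on error */
}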

By contrast, a non-blocking synchronous call returns control to the caller immediately. The caller is not made to wait, and the invoked system immediately returns one of two responses: If the call was executed and the results are ready, then the caller is told of that. Alternatively, the invoked system can tell the caller that the system has no resources (no data in the socket) to perform the requested action. In that case, it is the responsibility of the caller to repeat the call until it succeeds. For example, a read() operation on a socket in non-blocking mode may return the number of bytes read or a special return code of -1 with errno set to EWOULDBLOCK/EAGAIN, meaning "not ready; try again later."
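
A minimal sketch of the same read with the socket switched to non-blocking mode (the O_NONBLOCK flag would normally be set once, right after accept(), rather than on every call):

/* Non-blocking synchronous read: the call returns immediately; if no data is
   available it fails with errno set to EWOULDBLOCK/EAGAIN and may be retried. */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

ssize_t nonblocking_read(int sock, char *buf, size_t len)
{
    fcntl(sock, F_SETFL, fcntl(sock, F_GETFL, 0) | O_NONBLOCK);

    ssize_t n = read(sock, buf, len);            /* returns immediately */
    if (n == -1 && (errno == EWOULDBLOCK || errno == EAGAIN)) {
        /* "not ready; try again later" - the caller typically goes back to the
           event demultiplexor instead of spinning on the call */
    }
    return n;                                    /* bytes read, 0 on EOF, -1 otherwise */
}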

In a non-blocking asynchronous call, the calling function returns control to the caller immediately, reporting that the requested action was started. The calling system will execute the caller's request using additional system resources/threads and will notify the caller (by callback for example), when the result is ready for processing. For example, a Windows ReadFile() or POSIX aio_read() API returns immediately and initiates an internal system read operation. Of the three approaches, this non-blocking asynchronous approach offers the best scalability and performance.
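
For example, the initiation side with POSIX aio_read() might look like the following sketch (error handling and the completion-notification set-up are omitted; the completion side is picked up again after the Proactor steps below):

/* Non-blocking asynchronous read: aio_read() only starts the operation and
   returns at once; the OS fills the buffer and signals completion later. */
#include <aio.h>
#include <string.h>
#include <sys/types.h>

int start_async_read(int sock, char *buf, size_t len, struct aiocb *cb)
{
    memset(cb, 0, sizeof(*cb));
    cb->aio_fildes = sock;
    cb->aio_buf    = buf;
    cb->aio_nbytes = len;
    /* cb->aio_sigevent would describe how completion is delivered
       (signal, notification thread, ...) */
    return aio_read(cb);   /* 0 if the request was queued, -1 on error */
}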

This article investigates different non-blocking I/O multiplexing mechanisms and proposes a single multi-platform design pattern/solution. We hope that this article will help developers of high-performance TCP-based servers to choose an optimal design solution. We also compare the performance of Java, C# and C++ implementations of the proposed and existing solutions. We will exclude the blocking approach from further discussion and comparison, as it is the least effective approach for scalability and performance.

Reactor and Proactor: two I/O multiplexing approaches

In general, I/O multiplexing mechanisms rely on an event demultiplexor [1, 3], an object that dispatches I/O events from a limited number of sources to the appropriate read/write event handlers. The developer registers interest in specific events and provides event handlers, or callbacks. The event demultiplexor delivers the requested events to the event handlers.

Two patterns that involve event demultiplexors are called Reactor and Proactor [1]. The Reactor pattern involves synchronous I/O, whereas the Proactor pattern involves asynchronous I/O. In Reactor, the event demultiplexor waits for events that indicate when a file descriptor or socket is ready for a read or write operation. The demultiplexor passes this event to the appropriate handler, which is responsible for performing the actual read or write.

In the Proactor pattern, by contrast, the handler—or the event demultiplexor on behalf of the handler—initiates asynchronous read and write operations. The I/O operation itself is performed by the operating system (OS). The parameters passed to the OS include the addresses of user-defined data buffers from which the OS gets data to write, or to which the OS puts data read. The event demultiplexor waits for events that indicate the completion of the I/O operation, and forwards those events to the appropriate handlers. For example, on Windows a handler could initiate async I/O (overlapped in Microsoft terminology) operations, and the event demultiplexor could wait for IOCompletion events [1]. The implementation of this classic asynchronous pattern is based on an asynchronous OS-level API, and we will call this implementation the "system-level" or "true" async, because the application fully relies on the OS to execute actual I/O.

An example will help you understand the difference between Reactor and Proactor. We will focus on the read operation here, as the write implementation is similar. Here's a read in Reactor (a minimal sketch follows this list of steps):

  • An event handler declares interest in I/O events that indicate readiness for read on a particular socket
  • The event demultiplexor waits for events
  • An event comes in and wakes up the demultiplexor, and the demultiplexor calls the appropriate handler
  • The event handler performs the actual read operation, handles the data read, declares renewed interest in I/O events, and returns control to the dispatcher
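
A minimal select()-based sketch of these steps for a single socket (error handling omitted); note that the demultiplexor only reports readiness and the handler performs the read() itself:

/* Reactor-style loop: select() waits for readiness, the handler does the I/O. */
#include <sys/select.h>

void reactor_loop(int sock, void (*on_readable)(int fd))
{
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(sock, &readfds);                   /* declare interest in "readable" */

        if (select(sock + 1, &readfds, NULL, NULL, NULL) <= 0)
            break;                                /* error or interrupted */

        if (FD_ISSET(sock, &readfds))
            on_readable(sock);                    /* handler calls read() itself */
    }
}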

By comparison, here is a read operation in Proactor (true async); a sketch of the completion-wait step follows the list:

  • A handler initiates an asynchronous read operation (note: the OS must support asynchronous I/O). In this case, the handler does not care about I/O readiness events, but instead registers interest in receiving completion events.
  • The event demultiplexor waits until the operation is completed
  • While the event demultiplexor waits, the OS executes the read operation in a parallel kernel thread, puts data into a user-defined buffer, and notifies the event demultiplexor that the read is complete
  • The event demultiplexor calls the appropriate handler;
  • The event handler handles the data from user defined buffer, starts a new asynchronous operation, and returns control to the event demultiplexor.
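
Continuing the aio_read() sketch above, the completion side might be expressed as follows; aio_suspend() plays the role of the demultiplexor waiting for the completion event:

/* Proactor-style completion wait: by the time the handler runs, the OS has
   already copied the data into the user buffer; the handler only processes it. */
#include <aio.h>
#include <errno.h>
#include <sys/types.h>

ssize_t wait_for_read_completion(struct aiocb *cb)
{
    const struct aiocb *list[1] = { cb };

    while (aio_error(cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);   /* block until the operation completes */

    return aio_return(cb);            /* bytes transferred, or -1 on error */
}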

Current practice

The open-source C++ development framework ACE [1, 3], developed by Douglas Schmidt, et al., offers a wide range of platform-independent, low-level concurrency support classes (threading, mutexes, etc.). At the top level it provides two separate groups of classes: implementations of the ACE Reactor and ACE Proactor. Although both of them are based on platform-independent primitives, these tools offer different interfaces.

The ACE Proactor gives much better performance and robustness on MS Windows, as Windows provides a very efficient async API based on operating-system-level support [4, 5].

Unfortunately, not all operating systems provide full, robust async OS-level support. For instance, many Unix systems do not. Therefore, the ACE Reactor is the preferable solution on UNIX (which currently lacks robust async facilities for sockets). As a result, to achieve the best performance on each system, developers of networked applications need to maintain two separate code bases: an ACE Proactor based solution on Windows and an ACE Reactor based solution for Unix-based systems.

As we mentioned, the true async Proactor pattern requires operating-system-level support. Due to the differing nature of event handler and operating-system interaction, it is difficult to create common, unified external interfaces for both Reactor and Proactor patterns. That, in turn, makes it hard to create a fully portable development framework and encapsulate the interface and OS-related differences.

Proposed solution

In this section, we will propose a solution to the challenge of designing a portable framework for the Proactor and Reactor I/O patterns. To demonstrate this solution, we will transform a Reactor demultiplexor I/O solution into an emulated async I/O by moving the read/write operations from the event handlers into the demultiplexor (this is the "emulated async" approach). The following example illustrates that conversion for a read operation (a minimal sketch follows the list of steps):

  • An event handler declares interest in I/O events (readiness for read) and provides the demultiplexor with information such as the address of a data buffer, or the number of bytes to read.
  • The dispatcher waits for events (for example, on select());
  • When an event arrives, it wakes up the dispatcher. The dispatcher performs a non-blocking read operation (it has all the necessary information to perform this operation) and on completion calls the appropriate handler.
  • The event handler handles the data from the user-defined buffer and declares renewed interest in I/O events, again supplying the data buffer address and the number of bytes to read. The event handler then returns control to the dispatcher.
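
A minimal sketch of one such dispatcher step for a single operation; the op descriptor and callback names (struct async_op, on_completed) are illustrative only and not part of any real framework:

/* Emulated async: the dispatcher owns the non-blocking read; the handler only
   ever sees "read completed" events, as in a true Proactor. */
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

struct async_op {
    int     fd;
    char   *buf;                                  /* buffer supplied by the handler */
    size_t  len;                                  /* number of bytes to read        */
    void  (*on_completed)(struct async_op *op, ssize_t nread);
};

void emulated_proactor_step(struct async_op *op)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(op->fd, &readfds);

    if (select(op->fd + 1, &readfds, NULL, NULL, NULL) > 0) {
        ssize_t n = read(op->fd, op->buf, op->len);   /* dispatcher performs the I/O */
        op->on_completed(op, n);                      /* deliver "Read-Completed"    */
    }
}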

As we can see, by adding functionality to the demultiplexor I/O pattern, we were able to convert the Reactor pattern to a Proactor pattern. In terms of the amount of work performed, this approach is exactly the same as the Reactor pattern: we have simply shifted responsibilities between actors, so there is no performance degradation. The following lists of steps demonstrate that each approach performs an equal amount of work:

Standard/classic Reactor:

  • Step 1) wait for event (Reactor job)
  • Step 2) dispatch "Ready-to-Read" event to user handler (Reactor job)
  • Step 3) read data (user handler job)
  • Step 4) process data (user handler job)

Proposed emulated Proactor:

  • Step 1) wait for event (Proactor job)
  • Step 2) read data (now Proactor job)
  • Step 3) dispatch "Read-Completed" event to user handler (Proactor job)
  • Step 4) process data (user handler job)

With an operating system that does not provide an async I/O API, this approach allows us to hide the reactive nature of available socket APIs and to expose a fully proactive async interface. This allows us to create a fully portable platform-independent solution with a common external interface.

TProactor

The proposed solution (TProactor) was developed and implemented at Terabit P/L [6]. The solution has two alternative implementations, one in C++ and one in Java. The C++ version was built using ACE cross-platform low-level primitives and has a common unified async proactive interface on all platforms.

The main TProactor components are the Engine and WaitStrategy interfaces. Engine manages the async operations lifecycle. WaitStrategy manages concurrency strategies. WaitStrategy depends on Engine and the two always work in pairs. Interfaces between Engine and WaitStrategy are strongly defined.

Engines and waiting strategies are implemented as pluggable class drivers (for the full list of all implemented Engines and corresponding WaitStrategies, see Appendix 1). TProactor is a highly configurable solution. It internally implements three engines (POSIX AIO, SUN AIO and Emulated AIO) and hides six different waiting strategies, based on asynchronous kernel APIs (for POSIX, this is not efficient right now due to internal POSIX AIO API problems) and the synchronous Unix select(), poll(), /dev/poll (Solaris 5.8+), port_get (Solaris 5.10), RealTime (RT) signals (Linux 2.4+), epoll (Linux 2.6), and kqueue (FreeBSD) APIs. TProactor conforms to the standard ACE Proactor implementation interface. That makes it possible to develop a single cross-platform solution (POSIX/MS-Windows) with a common (ACE Proactor) interface.

With a set of mutually interchangeable "lego-style" Engines and WaitStrategies, a developer can choose the appropriate internal mechanism (engine and waiting strategy) at run time by setting appropriate configuration parameters. These settings may be specified according to specific requirements, such as the number of connections, scalability, and the targeted OS. If the operating system supports an async API, a developer may use the true async approach; otherwise the user can opt for emulated async solutions built on different sync waiting strategies. All of those strategies are hidden behind an emulated async façade.
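
The article does not show TProactor's actual interfaces, but purely as an illustration, such run-time pluggability could be sketched in C as a small table of function pointers selected by a configuration string (all names here are hypothetical):

/* Illustrative "lego-style" plug-in interface: every wait strategy exposes the
   same operations, and one is selected by name from the configuration. */
#include <stddef.h>
#include <string.h>

struct wait_strategy {
    const char *name;                            /* e.g. "select", "epoll", "dev/poll" */
    int  (*init)(void *ctx);
    int  (*wait)(void *ctx, int timeout_ms);     /* block for readiness or completion  */
    void (*shutdown)(void *ctx);
};

const struct wait_strategy *choose_strategy(const struct wait_strategy *table[],
                                            size_t count, const char *configured)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(table[i]->name, configured) == 0)
            return table[i];
    return NULL;                                 /* unknown name: report a config error */
}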

For an HTTP server running on Sun Solaris, for example, a /dev/poll- or port_get()-based engine is the most suitable choice, able to serve a huge number of connections; but for another UNIX solution with a limited number of connections but high throughput requirements, a select()-based engine may be a better approach. Such flexibility cannot be achieved with the standard ACE Reactor/Proactor, due to inherent algorithmic problems of the different wait strategies (see Appendix 2).

In terms of performance, our tests show that emulating proactive behavior on top of a reactive model does not impose any overhead: it can be faster, but not slower. According to our test results, TProactor gives up to 10-35% better performance on average (measured in terms of both throughput and response times) than the reactive model in the standard ACE Reactor implementation on various UNIX/Linux platforms. On Windows it gives the same performance as the standard ACE Proactor.

Performance comparison (JAVA versus C++ versus C#).

In addition to C++, we also implemented TProactor in Java. As of JDK version 1.4, Java provides only the sync-based approach, which is logically similar to C select() [7, 8]. The Java TProactor is based on Java's non-blocking facilities (the java.nio packages) and is logically similar to the C++ TProactor with a select()-based waiting strategy.

Figures 1 and 2 chart the transfer rate in bits/sec versus the number of connections. These charts represent comparison results for a simple echo-server built on the standard ACE Reactor (on RedHat Linux 9.0), TProactor C++ and Java (IBM 1.4 JVM) on Microsoft Windows and RedHat Linux 9.0, and a C# echo-server running on Windows. Performance of the native AIO APIs is represented by the "Async"-marked curves; emulated AIO (TProactor) by the "AsyncE" curves; and TP_Reactor by the "Synch" curves. All implementations were bombarded by the same client application: a continuous stream of arbitrary fixed-size messages over N connections.

The full set of tests was performed on the same hardware. Tests on different machines proved that relative results are consistent.

Figure 1. Windows XP/P4 2.6GHz HyperThreading/512 MB RAM.
Figure 2. Linux RedHat 2.4.20-smp/P4 2.6GHz HyperThreading/512 MB RAM.

User code example

The following is the skeleton of a simple TProactor-based Java echo-server. In a nutshell, the developer only has to implement two interfaces: OpRead, with a buffer into which TProactor puts its read results, and OpWrite, with a buffer from which TProactor takes data. The developer will also need to implement protocol-specific logic by providing the onReadCompleted() and onWriteCompleted() callbacks in an AsynchHandler interface implementation. Those callbacks are called asynchronously by TProactor on completion of read/write operations and are executed on a thread pool provided by TProactor (the developer doesn't need to write his own pool).

class EchoServerProtocol implements AsynchHandler
{
    AsynchChannel achannel = null;

    // read buffer handed to TProactor (size chosen arbitrarily for this skeleton)
    ByteBuffer buffer = ByteBuffer.allocate( 1024 );

    EchoServerProtocol( Demultiplexor m, SelectableChannel channel ) throws Exception
    {
        this.achannel = new AsynchChannel( m, this, channel );
    }

    public void start() throws Exception
    {
        // called after construction
        System.out.println( Thread.currentThread().getName() + ": EchoServer protocol started" );
        achannel.read( buffer );
    }

    public void onReadCompleted( OpRead opRead ) throws Exception
    {
        if ( opRead.getError() != null )
        {
            // handle error, do clean-up if needed
            System.out.println( "EchoServer::readCompleted: " + opRead.getError().toString() );
            achannel.close();
            return;
        }

        if ( opRead.getBytesCompleted() <= 0 )
        {
            System.out.println( "EchoServer::readCompleted: Peer closed " + opRead.getBytesCompleted() );
            achannel.close();
            return;
        }

        ByteBuffer buffer = opRead.getBuffer();
        achannel.write( buffer );
    }

    public void onWriteCompleted( OpWrite opWrite ) throws Exception
    {
        // logically similar to onReadCompleted
        ...
    }
}

IOHandler is a TProactor base class. AsynchHandler and Multiplexor, among other things, internally execute the wait strategy chosen by the developer.

Conclusion

TProactor provides a common, flexible, and configurable solution for multi-platform, high-performance communications development. All of the problems and complexities mentioned in Appendix 2 are hidden from the developer.

It is clear from the charts that C++ is still the preferable approach for high-performance communication solutions, but Java on Linux comes quite close. However, the overall Java performance was weakened by poor results on Windows. One reason for that may be that the Java 1.4 nio package is based on a select()-style API. Indeed, the Java NIO package is a kind of Reactor pattern built on a select()-style API (see [7, 8]). Java NIO does allow you to write your own select()-style provider (the equivalent of TProactor's waiting strategies). Looking at the Java NIO implementation for Windows (it is enough to examine the import symbols in jdk1.5.0\jre\bin\nio.dll), we can conclude that Java NIO 1.4.2 and 1.5.0 for Windows are based on the WSAEventSelect() API. That is better than select(), but slower than I/O completion ports for a significant number of connections. If Java NIO were instead based on I/O completion ports, a Proactor-to-Reactor conversion would have to be made inside nio.dll; although such a conversion is more complicated than the Reactor-to-Proactor conversion described above, it can be implemented within the frame of the Java NIO interfaces (this is a topic for a future article, but we can provide the algorithm), and it should improve performance. At this time, no TProactor performance tests have been done on JDK 1.5.

Note. All tests for Java are performed on "raw" buffers (java.nio.ByteBuffer) without data processing.

Taking into account the latest activities to develop robust AIO on Linux [9], we can conclude that the Linux kernel API (the io_xxxx set of system calls) should be more scalable than the POSIX AIO standard, though it is still not portable. In that case, a TProactor with a new Engine/WaitStrategy pair based on native Linux AIO could easily be implemented to overcome the portability issue and to cover Linux native AIO with the standard ACE Proactor interface.
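
For reference, a minimal sketch of this interface through the libaio wrappers (link with -laio); in practice it is mostly used with regular files opened with O_DIRECT, and wiring it into a TProactor engine is beyond the scope of this article:

/* Linux-native AIO: io_submit() queues the read, io_getevents() collects the
   completion - the same initiate/complete split the Proactor pattern relies on. */
#include <libaio.h>
#include <stddef.h>

long native_aio_read(int fd, void *buf, size_t len)
{
    io_context_t ctx = 0;
    struct iocb cb, *cbs[1] = { &cb };
    struct io_event ev;

    if (io_setup(1, &ctx) < 0)              /* create an AIO context */
        return -1;

    io_prep_pread(&cb, fd, buf, len, 0);    /* describe a read at offset 0 */
    if (io_submit(ctx, 1, cbs) != 1) {      /* initiate the operation */
        io_destroy(ctx);
        return -1;
    }

    io_getevents(ctx, 1, 1, &ev, NULL);     /* wait for the completion event */
    io_destroy(ctx);
    return (long)ev.res;                    /* bytes read (negative value = error) */
}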

Appendix I

Engines and waiting strategies implemented in TProactor

 

Engine Type                | Wait Strategies        | Operating System
---------------------------+------------------------+-----------------------------------
POSIX_AIO (true async)     | aio_suspend()          | POSIX-compliant UNIX (not robust)
  aio_read()/aio_write()   | Waiting for RT signal  | POSIX (not robust)
                           | Callback function      | SGI IRIX, LINUX (not robust)
---------------------------+------------------------+-----------------------------------
SUN_AIO (true async)       | aio_wait()             | SUN (not robust)
  aio_read()/aio_write()   |                        |
---------------------------+------------------------+-----------------------------------
Emulated Async             | select()               | generic POSIX
  Non-blocking             | poll()                 | Mostly all POSIX implementations
  read()/write()           | /dev/poll              | SUN
                           | Linux RT signals       | Linux
                           | Kqueue                 | FreeBSD

Appendix II

All sync waiting strategies can be divided into two groups:

  • edge-triggered (e.g., Linux RT signals): readiness is signaled only when the socket changes state (becomes ready);
  • level-triggered (e.g., select(), poll(), /dev/poll): readiness is reported for as long as the socket is ready.

Let us describe some common logical problems for those groups:

  • Edge-triggered group: after executing the I/O operation, the demultiplexing loop can lose the socket-readiness state. Example: the "read" handler did not read the whole chunk of data, so the socket remains ready for read, but the demultiplexor loop will not receive the next notification (for the usual remedy, see the sketch after this list).
  • Level-triggered group: when the demultiplexor loop detects readiness, it starts the user-defined read/write handler. But before it does so, it should remove the socket descriptor from the set of monitored descriptors. Otherwise, the same event can be dispatched twice.
  • Obviously, solving these problems adds extra complexity to development. All of these problems are resolved internally within TProactor, so the developer does not have to worry about them, whereas in the synchronous approach one needs extra effort to resolve them.
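
For the edge-triggered case, the usual remedy is for the read step to drain the socket until EAGAIN before waiting again, as in this sketch (assuming a non-blocking socket):

/* Edge-triggered handling: consume everything on each notification, otherwise
   the leftover data produces no further readiness events. */
#include <errno.h>
#include <unistd.h>

void drain_readable_socket(int fd, void (*consume)(const char *data, ssize_t n))
{
    char buf[4096];

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            consume(buf, n);                 /* hand each chunk to the handler     */
        } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            break;                           /* fully drained; safe to wait again  */
        } else {
            break;                           /* EOF or real error: close elsewhere */
        }
    }
}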

Resources

[1] Douglas C. Schmidt, Stephen D. Huston "C++ Network Programming." 2002, Addison-Wesley ISBN 0-201-60464-7

[2] W. Richard Stevens "UNIX Network Programming" vol. 1 and 2, 1999, Prentice Hall, ISBN 0-13-490012-X

[3] Douglas C. Schmidt, Michael Stal, Hans Rohnert, Frank Buschmann "Pattern-Oriented Software Architecture: Patterns for Concurrent and Networked Objects, Volume 2" Wiley & Sons, NY 2000

[4] INFO: Socket Overlapped I/O Versus Blocking/Non-blocking Mode. Q181611. Microsoft Knowledge Base Articles.

[5] Microsoft MSDN. I/O Completion Ports.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/fs/i_o_completion_ports.asp

[6] TProactor (ACE compatible Proactor).
www.terabit.com.au

[7] JavaDoc java.nio.channels
http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/package-summary.html

[8] JavaDoc Java.nio.channels.spi Class SelectorProvider 
http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/spi/SelectorProvider.html

[9] Linux AIO development 
http://lse.sourceforge.net/io/aio.html, and
http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-Pulavarty-OLS2003.pdf

See Also:

Ian Barile "I/O Multiplexing & Scalable Socket Servers", 2004 February, DDJ 

Further reading on event handling
- http://www.cs.wustl.edu/~schmidt/ACE-papers.html

The Adaptive Communication Environment
http://www.cs.wustl.edu/~schmidt/ACE.html

Terabit Solutions
http://terabit.com.au/solutions.php

About the authors

Alex Libman has been programming for 15 years. During the past 5 years his main area of interest has been pattern-oriented, multi-platform networked programming using C++ and Java. He is a big fan of, and contributor to, ACE.

Vlad Gilbourd works as a computer consultant, but wishes to spend more time listening to jazz :) As a hobby, he started and runs the www.corporatenews.com.au website.



from:
http://www.artima.com/articles/io_design_patterns.html

 



A TCP Server Framework Based on I/O Completion Ports: A Look at IOCP

If you do not post overlapped I/O operations, an I/O completion port can only serve as a queue for you.
On the NumberOfConcurrentThreads parameter of CreateIoCompletionPort:
1. It only takes effect when the second parameter, ExistingCompletionPort, is NULL; it is a limit on the maximum number of concurrently running threads.
2. Does anyone set it to a value larger than the number of CPUs? And not just twice the CPU count, but something like the MAX_THREADS of 100 below, or even larger?
As for choosing this value, MSDN never says it has to be twice the number of CPUs, nor does it drag in arguments about reducing context switches between threads. From the I/O Completion Ports documentation on MSDN: "If your transaction required a lengthy computation, a larger concurrency value will allow more threads to run. Each completion packet may take longer to finish, but more completion packets will be processed at the same time."
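
As a small sketch of the choice being discussed here (passing 0 lets the system default to the number of processors; the value caps how many workers the port lets run concurrently, not how many threads may wait on it):

/* NumberOfConcurrentThreads only matters when the port is being created
   (ExistingCompletionPort == NULL); 0 means "as many as there are processors". */
#include <windows.h>

HANDLE create_iocp(DWORD concurrency)
{
    /* concurrency == 0 : at most one runnable worker per CPU
       concurrency == N : at most N workers released from
                          GetQueuedCompletionStatus() at any one time;
                          additional threads may still block on the port */
    return CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, concurrency);
}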
For struct OVERLAPPED, we usually extend it as follows:
typedef struct {
  WSAOVERLAPPED overlapped; // must be the first member? Yes, it must come first. If you are not sure, try it yourself.
  SOCKET client_s;
  SOCKADDR_IN client_addr;
  WORD optCode;// 1--read, 2--send. Some people define this member and some do not; the argument is about send vs. WSASend and whether mixing sync and async here is necessary. At least the server below does not use it at all.
  char buf[MAX_BUF_SIZE];
  WSABUF wsaBuf;// initialized? Do not forget this!
  DWORD numberOfBytesTransferred;
  DWORD flags;

}QSSOverlapped;// one per connection
The basic ideas of the server framework below are:
One connection vs. one thread in a worker thread pool; each worker thread runs completionWorkerRoutine.
An acceptor thread is dedicated to accepting sockets, associating them with the IOCP, and calling WSARecv to post Recv completion packets to the IOCP.
completionWorkerRoutine has the following responsibilities:
1. Handle requests; when busy, increase the number of completion worker threads (but never beyond maxThreads) and post Recv completion packets to the IOCP.
2. On wait timeout, check whether the pool is idle and how many completion worker threads there are; when idle, hold the pool at, or shrink it toward, minThreads.
3. Manage the lifecycle of every accepted socket. The system's keepalive probes are used here; if you want an application-level "heartbeat", just change QSS_SIO_KEEPALIVE_VALS_TIMEOUT back to the system default of 2 hours.
Now let's walk through IOCP together with the source code:
socketserver.h
#ifndef __Q_SOCKET_SERVER__
#define __Q_SOCKET_SERVER__
#include <winsock2.h>
#include <mstcpip.h>
#define QSS_SIO_KEEPALIVE_VALS_TIMEOUT 30*60*1000
#define QSS_SIO_KEEPALIVE_VALS_INTERVAL 5*1000

#define MAX_THREADS 100
#define MAX_THREADS_MIN  10
#define MIN_WORKER_WAIT_TIMEOUT  20*1000
#define MAX_WORKER_WAIT_TIMEOUT  60*MIN_WORKER_WAIT_TIMEOUT

#define MAX_BUF_SIZE 1024

/* CSocketLifecycleCallback is invoked when an accepted socket is closed or an error occurs */
typedef void (*CSocketLifecycleCallback)(SOCKET cs,int lifecycle);//lifecycle: 0:OnAccepted, -1:OnClose. Note: at OnClose the socket may no longer be usable; it may already have been closed abnormally or hit some other error

/* protocol handler callback */
typedef int (*InternalProtocolHandler)(LPWSAOVERLAPPED overlapped);//return -1:SOCKET_ERROR

typedef struct Q_SOCKET_SERVER SocketServer;
DWORD initializeSocketServer(SocketServer ** ssp,WORD passive,WORD port,CSocketLifecycleCallback cslifecb,InternalProtocolHandler protoHandler,WORD minThreads,WORD maxThreads,long workerWaitTimeout);
DWORD startSocketServer(SocketServer *ss);
DWORD shutdownSocketServer(SocketServer *ss);

#endif
qsocketserver.c (abbreviated "qss" below; the corresponding OVERLAPPED struct is abbreviated "qssOl").
#include "socketserver.h"
#include "stdio.h"
typedef struct {  
  WORD passive;//daemon
  WORD port;
  WORD minThreads;
  WORD maxThreads;
  volatile long lifecycleStatus;//0-created,1-starting, 2-running,3-stopping,4-exitKeyPosted,5-stopped 
  long  workerWaitTimeout;//wait timeout  
  CRITICAL_SECTION QSS_LOCK;
  volatile long workerCounter;
  volatile long currentBusyWorkers;
  volatile long CSocketsCounter;//accepted-socket reference count
  CSocketLifecycleCallback cslifecb;
  InternalProtocolHandler protoHandler;
  WORD wsaVersion;//=MAKEWORD(2,0);
  WSADATA wsData;
  SOCKET server_s;
  SOCKADDR_IN serv_addr;
  HANDLE iocpHandle;
}QSocketServer;

typedef struct {
  WSAOVERLAPPED overlapped;  
  SOCKET client_s;
  SOCKADDR_IN client_addr;
  WORD optCode;
  char buf[MAX_BUF_SIZE];
  WSABUF wsaBuf;
  DWORD numberOfBytesTransferred;
  DWORD flags;
}QSSOverlapped;

DWORD  acceptorRoutine(LPVOID);
DWORD  completionWorkerRoutine(LPVOID);

static void adjustQSSWorkerLimits(QSocketServer *qss){
  /*adjust size and timeout.*/
  /*if(qss->maxThreads <= 0) {
   qss->maxThreads = MAX_THREADS;
        } else if (qss->maxThreads < MAX_THREADS_MIN) {            
         qss->maxThreads = MAX_THREADS_MIN;
        }
        if(qss->minThreads >  qss->maxThreads) {
         qss->minThreads =  qss->maxThreads;
        }
        if(qss->minThreads <= 0) {
            if(1 == qss->maxThreads) {
             qss->minThreads = 1;
            } else {
             qss->minThreads = qss->maxThreads/2;
            }
        }
        
        if(qss->workerWaitTimeout<MIN_WORKER_WAIT_TIMEOUT) 
         qss->workerWaitTimeout=MIN_WORKER_WAIT_TIMEOUT;
        if(qss->workerWaitTimeout>MAX_WORKER_WAIT_TIMEOUT)
         qss->workerWaitTimeout=MAX_WORKER_WAIT_TIMEOUT;        */
}

typedef struct{
 QSocketServer * qss;
 HANDLE th;
}QSSWORKER_PARAM;

static WORD addQSSWorker(QSocketServer *qss,WORD addCounter){
 WORD res=0;
 if(qss->workerCounter<qss->minThreads||(qss->currentBusyWorkers==qss->workerCounter&&qss->workerCounter<qss->maxThreads)){
  DWORD threadId;
  QSSWORKER_PARAM * pParam=NULL;
  int i=0;  
  EnterCriticalSection(&qss->QSS_LOCK);
  if(qss->workerCounter+addCounter<=qss->maxThreads)
   for(;i<addCounter;i++)
   {
    pParam=malloc(sizeof(QSSWORKER_PARAM));
    if(pParam){
     pParam->th=CreateThread(NULL,0,(LPTHREAD_START_ROUTINE)completionWorkerRoutine,pParam,CREATE_SUSPENDED,&threadId);
     pParam->qss=qss;
     ResumeThread(pParam->th);
     qss->workerCounter++,res++; 
    }    
   }  
  LeaveCriticalSection(&qss->QSS_LOCK);
 }  
 return res;
}

static void SOlogger(const char * msg,SOCKET s,int clearup){
 perror(msg);
 if(s>0)
 closesocket(s);
 if(clearup)
 WSACleanup();
}

static int _InternalEchoProtocolHandler(LPWSAOVERLAPPED overlapped){
 QSSOverlapped *qssOl=(QSSOverlapped *)overlapped;
 
 printf("numOfT:%d,WSARecvd:%s,\n",qssOl->numberOfBytesTransferred,qssOl->buf);
 //Sleep(500); 
 return send(qssOl->client_s,qssOl->buf,qssOl->numberOfBytesTransferred,0);
}

DWORD initializeSocketServer(SocketServer ** ssp,WORD passive,WORD port,CSocketLifecycleCallback cslifecb,InternalProtocolHandler protoHandler,WORD minThreads,WORD maxThreads,long workerWaitTimeout){
 QSocketServer * qss=malloc(sizeof(QSocketServer));
 qss->passive=passive>0?1:0;
 qss->port=port;
 qss->minThreads=minThreads;
 qss->maxThreads=maxThreads;
 qss->workerWaitTimeout=workerWaitTimeout;
 qss->wsaVersion=MAKEWORD(2,0); 
 qss->lifecycleStatus=0;
 InitializeCriticalSection(&qss->QSS_LOCK);
 qss->workerCounter=0;
 qss->currentBusyWorkers=0;
 qss->CSocketsCounter=0;
 qss->cslifecb=cslifecb,qss->protoHandler=protoHandler;
 if(!qss->protoHandler)
  qss->protoHandler=_InternalEchoProtocolHandler; 
 adjustQSSWorkerLimits(qss);
 *ssp=(SocketServer *)qss;
 return 1;
}

DWORD startSocketServer(SocketServer *ss){ 
 QSocketServer * qss=(QSocketServer *)ss;
 if(qss==NULL||InterlockedCompareExchange(&qss->lifecycleStatus,1,0))
  return 0; 
 qss->serv_addr.sin_family=AF_INET;
 qss->serv_addr.sin_port=htons(qss->port);
 qss->serv_addr.sin_addr.s_addr=INADDR_ANY;//inet_addr("127.0.0.1");
 if(WSAStartup(qss->wsaVersion,&qss->wsData)){  
  /* A side note: when WSAStartup is called, it actually starts an extra thread, which exits on its own a little later. I wonder what WSACleanup does... */

  SOlogger("WSAStartup failed.\n",0,0);
  return 0;
 }
 qss->server_s=socket(AF_INET,SOCK_STREAM,IPPROTO_IP);
 if(qss->server_s==INVALID_SOCKET){  
  SOlogger("socket failed.\n",0,1);
  return 0;
 }
 if(bind(qss->server_s,(LPSOCKADDR)&qss->serv_addr,sizeof(SOCKADDR_IN))==SOCKET_ERROR){  
  SOlogger("bind failed.\n",qss->server_s,1);
  return 0;
 }
 if(listen(qss->server_s,SOMAXCONN)==SOCKET_ERROR)/* A word about backlog: many people have no idea what to set it to; I have seen 1, 5, 50 and 100. Some say a large value wastes resources. In fact, passing SOMAXCONN here does not mean Windows will literally use SOMAXCONN; rather, "If set to SOMAXCONN, the underlying service provider responsible for socket s will set the backlog to a maximum reasonable value." Also, different operating systems size their TCP backlog queues differently, so it is better to let the OS decide the value. Servers like Apache do this:
#ifndef DEFAULT_LISTENBACKLOG
#define DEFAULT_LISTENBACKLOG 511
#endif
*/
    {        
  SOlogger("listen failed.\n",qss->server_s,1);
        return 0;
    }
 qss->iocpHandle=CreateIoCompletionPort(INVALID_HANDLE_VALUE,NULL,0,/*NumberOfConcurrentThreads-->*/qss->maxThreads);
 //initialize worker for completion routine.
 addQSSWorker(qss,qss->minThreads);  
 qss->lifecycleStatus=2;
 {
  QSSWORKER_PARAM * pParam=malloc(sizeof(QSSWORKER_PARAM));
  pParam->qss=qss;
  pParam->th=NULL;
  if(qss->passive){
   DWORD threadId;
   pParam->th=CreateThread(NULL,0,(LPTHREAD_START_ROUTINE)acceptorRoutine,pParam,0,&threadId); 
  }else
   return acceptorRoutine(pParam);
 }
 return 1;
}

DWORD shutdownSocketServer(SocketServer *ss){
 QSocketServer * qss=(QSocketServer *)ss;
 if(qss==NULL||InterlockedCompareExchange(&qss->lifecycleStatus,3,2)!=2)
  return 0; 
 closesocket(qss->server_s/*listen-socket*/);//..other accepted sockets associated with the listen socket will not be closed unless WSACleanup is called..
 if(qss->CSocketsCounter==0)
  qss->lifecycleStatus=4,PostQueuedCompletionStatus(qss->iocpHandle,0,-1,NULL);
 WSACleanup();  
 return 1;
}

DWORD  acceptorRoutine(LPVOID ss){
 QSSWORKER_PARAM * pParam=(QSSWORKER_PARAM *)ss;
 QSocketServer * qss=pParam->qss;
 HANDLE curThread=pParam->th;
 QSSOverlapped *qssOl=NULL;
 SOCKADDR_IN client_addr;
 int client_addr_leng=sizeof(SOCKADDR_IN);
 SOCKET cs; 
 free(pParam);
 while(1){  
  printf("accept starting.....\n");
  cs/*Accepted-socket*/=accept(qss->server_s,(LPSOCKADDR)&client_addr,&client_addr_leng);
  if(cs==INVALID_SOCKET)
        {
   printf("accept failed:%d\n",GetLastError());   
            break;
        }else{//SO_KEEPALIVE, SIO_KEEPALIVE_VALS: this uses the system's keepalive probes ("heartbeat detection"). On Linux the equivalent is setsockopt with SOL_TCP: TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT
            struct tcp_keepalive alive,aliveOut;
            int so_keepalive_opt=1;
            DWORD outDW;
            if(!setsockopt(cs,SOL_SOCKET,SO_KEEPALIVE,(char *)&so_keepalive_opt,sizeof(so_keepalive_opt))){
               alive.onoff=TRUE;
               alive.keepalivetime=QSS_SIO_KEEPALIVE_VALS_TIMEOUT;
               alive.keepaliveinterval=QSS_SIO_KEEPALIVE_VALS_INTERVAL;
               if(WSAIoctl(cs,SIO_KEEPALIVE_VALS,&alive,sizeof(alive),&aliveOut,sizeof(aliveOut),&outDW,NULL,NULL)==SOCKET_ERROR){
                    printf("WSAIoctl SIO_KEEPALIVE_VALS failed:%d\n",GetLastError());   
                    break;
                }

            }else{
                     printf("setsockopt SO_KEEPALIVE failed:%d\n",GetLastError());   
                     break;
            }  
  }
  
  CreateIoCompletionPort((HANDLE)cs,qss->iocpHandle,cs,0);
  if(qssOl==NULL){
   qssOl=malloc(sizeof(QSSOverlapped));   
  }
  qssOl->client_s=cs;
  qssOl->wsaBuf.len=MAX_BUF_SIZE,qssOl->wsaBuf.buf=qssOl->buf,qssOl->numberOfBytesTransferred=0,qssOl->flags=0;//initialize WSABuf.
  memset(&qssOl->overlapped,0,sizeof(WSAOVERLAPPED));  
  {
   DWORD lastErr=GetLastError();
   int ret=0;
   SetLastError(0);
   ret=WSARecv(cs,&qssOl->wsaBuf,1,&qssOl->numberOfBytesTransferred,&qssOl->flags,&qssOl->overlapped,NULL);
   if(ret==0||(ret==SOCKET_ERROR&&GetLastError()==WSA_IO_PENDING)){
    InterlockedIncrement(&qss->CSocketsCounter);//increment the accepted-socket count
    if(qss->cslifecb)
     qss->cslifecb(cs,0);
    qssOl=NULL;
   }    
   
   if(!GetLastError())
    SetLastError(lastErr);
  }
  
  printf("accept flags:%d ,cs:%d.\n",GetLastError(),cs);
 }//end while.

 if(qssOl)
  free(qssOl);
 if(qss)
  shutdownSocketServer((SocketServer *)qss);
 if(curThread)
  CloseHandle(curThread);

 return 1;
}

static int postRecvCompletionPacket(QSSOverlapped * qssOl,int SOErrOccurredCode){ 
 int SOErrOccurred=0; 
 DWORD lastErr=GetLastError();
 SetLastError(0);
 //SOCKET_ERROR:-1,WSA_IO_PENDING:997
 if(WSARecv(qssOl->client_s,&qssOl->wsaBuf,1,&qssOl->numberOfBytesTransferred,&qssOl->flags,&qssOl->overlapped,NULL)==SOCKET_ERROR
  &&GetLastError()!=WSA_IO_PENDING)//this case lastError maybe 64, 10054 
 {
  SOErrOccurred=SOErrOccurredCode;  
 }      
 if(!GetLastError())
  SetLastError(lastErr); 
 if(SOErrOccurred)
  printf("worker[%d] postRecvCompletionPacket SOErrOccurred=%d,preErr:%d,postedErr:%d\n",GetCurrentThreadId(),SOErrOccurred,lastErr,GetLastError());
 return SOErrOccurred;
}

DWORD  completionWorkerRoutine(LPVOID ss){
 QSSWORKER_PARAM * pParam=(QSSWORKER_PARAM *)ss;
 QSocketServer * qss=pParam->qss;
 HANDLE curThread=pParam->th;
 QSSOverlapped * qssOl=NULL;
 DWORD numberOfBytesTransferred=0;
 ULONG_PTR completionKey=0;
 int postRes=0,handleCode=0,exitCode=0,SOErrOccurred=0; 
 free(pParam);
 while(!exitCode){
  SetLastError(0);
  if(GetQueuedCompletionStatus(qss->iocpHandle,&numberOfBytesTransferred,&completionKey,(LPOVERLAPPED *)&qssOl,qss->workerWaitTimeout)){
   if(completionKey==-1&&qss->lifecycleStatus>=4)
   {
    printf("worker[%d] completionKey -1:%d \n",GetCurrentThreadId(),GetLastError());
    if(qss->workerCounter>1)
     PostQueuedCompletionStatus(qss->iocpHandle,0,-1,NULL);
    exitCode=1;
    break;
   }
   if(numberOfBytesTransferred>0){   
    
    InterlockedIncrement(&qss->currentBusyWorkers);
    addQSSWorker(qss,1);
    handleCode=qss->protoHandler((LPWSAOVERLAPPED)qssOl);    
    InterlockedDecrement(&qss->currentBusyWorkers);    
    
    if(handleCode>=0){
     SOErrOccurred=postRecvCompletionPacket(qssOl,1);
    }else
     SOErrOccurred=2;    
   }else{
    printf("worker[%d] numberOfBytesTransferred==0 ***** closesocket servS or cs *****,%d,%d ,ol is:%d\n",GetCurrentThreadId(),GetLastError(),completionKey,qssOl==NULL?0:1);
    SOErrOccurred=3;     
   }  
  }else{ //GetQueuedCompletionStatus rtn FALSE, lastError 64 ,995[timeout worker thread exit.] ,WAIT_TIMEOUT:258        
   if(qssOl){
    SOErrOccurred=postRecvCompletionPacket(qssOl,4);
   }else {    

    printf("worker[%d] GetQueuedCompletionStatus F:%d \n",GetCurrentThreadId(),GetLastError());
    if(GetLastError()!=WAIT_TIMEOUT){
     exitCode=2;     
    }else{//wait timeout     
     if(qss->lifecycleStatus!=4&&qss->currentBusyWorkers==0&&qss->workerCounter>qss->minThreads){
      EnterCriticalSection(&qss->QSS_LOCK);
      if(qss->lifecycleStatus!=4&&qss->currentBusyWorkers==0&&qss->workerCounter>qss->minThreads){
       qss->workerCounter--;//until qss->workerCounter decrease to qss->minThreads
       exitCode=3;      
      }
      LeaveCriticalSection(&qss->QSS_LOCK);
     }
    }    
   }    
  }//end GetQueuedCompletionStatus.

  if(SOErrOccurred){   
   if(qss->cslifecb)
    qss->cslifecb(qssOl->client_s,-1);
   /*if(qssOl)*/{
    closesocket(qssOl->client_s);
    free(qssOl);
   }
   if(InterlockedDecrement(&qss->CSocketsCounter)==0&&qss->lifecycleStatus>=3){    
    //for qss workerSize,PostQueuedCompletionStatus -1
    qss->lifecycleStatus=4,PostQueuedCompletionStatus(qss->iocpHandle,0,-1,NULL);        
    exitCode=4;
   }
  }
  qssOl=NULL,numberOfBytesTransferred=0,completionKey=0,SOErrOccurred=0;//for net while.
 }//end while.

 //last to do 
 if(exitCode!=3){ 
  int clearup=0;
  EnterCriticalSection(&qss->QSS_LOCK);
  if(!--qss->workerCounter&&qss->lifecycleStatus>=4){//clearup QSS
    clearup=1;
  }
  LeaveCriticalSection(&qss->QSS_LOCK);
  if(clearup){
   DeleteCriticalSection(&qss->QSS_LOCK);
   CloseHandle(qss->iocpHandle);
   free(qss); 
  }
 }
 CloseHandle(curThread);
 return 1;
}
------------------------------------------------------------------------------------------------------------------------
    对于IOCP的LastError的L别和处理是个隄,所以请注意我的completionWorkerRoutine的whilel构,
l构如下:
while(!exitCode){
    if(completionKey==-1){...break;}
    if(GetQueuedCompletionStatus){/*在这个if体中只要你投递的OVERLAPPED is not NULL,那么q里你得到的是?/strong>.*/
        if(numberOfBytesTransferred>0){
               /*在这里handle request,记得要l投递你的OVERLAPPED? */
        }else{
              /*q里可能客户端或服务端closesocket(the socket),但是OVERLAPPED is not NULL,只要你投递的不ؓNULL!*/
        }
    }else{/*在这里的if体中,虽然GetQueuedCompletionStatus return FALSE,但是不代表OVERLAPPED一定ؓNULL.特别是OVERLAPPED is not NULL的情况下,不要以ؓLastError发生?׃表当前的socket无用或发生致命的异常,比如发生lastError:995q种情况下此时的socket有可能是一切正常的可用?你不应该关闭?/strong>.*/
        if(OVERLAPPED is not NULL){
             /*q种情况?请不?7,21l箋投递吧!在投递后再检错?/strong>.*/
        }else{ 

        }
    }
  if(socket error occured){

  }
  prepare for next while.

    行文仓促,隑օ有错误或不之处,希望大家t跃指正评论,谢谢!

    q个模型在性能上还是有改进的空间哦Q?/strong>


from:

http://www.shnenglu.com/adapterofcoms/archive/2010/06/26/118781.aspx



chatler 2010-08-25 20:42 发表评论
]]>
一个基于Event Poll(epoll)的TCP Server Framework,析epollhttp://www.shnenglu.com/beautykingdom/archive/2010/08/25/124730.htmlchatlerchatlerWed, 25 Aug 2010 12:41:00 GMThttp://www.shnenglu.com/beautykingdom/archive/2010/08/25/124730.htmlhttp://www.shnenglu.com/beautykingdom/comments/124730.htmlhttp://www.shnenglu.com/beautykingdom/archive/2010/08/25/124730.html#Feedback0http://www.shnenglu.com/beautykingdom/comments/commentRss/124730.htmlhttp://www.shnenglu.com/beautykingdom/services/trackbacks/124730.html阅读全文

chatler 2010-08-25 20:41 发表评论
]]>
TCP: SYN ACK FIN RST PSH URG 详解http://www.shnenglu.com/beautykingdom/archive/2010/07/16/120546.htmlchatlerchatlerFri, 16 Jul 2010 06:14:00 GMThttp://www.shnenglu.com/beautykingdom/archive/2010/07/16/120546.htmlhttp://www.shnenglu.com/beautykingdom/comments/120546.htmlhttp://www.shnenglu.com/beautykingdom/archive/2010/07/16/120546.html#Feedback0http://www.shnenglu.com/beautykingdom/comments/commentRss/120546.htmlhttp://www.shnenglu.com/beautykingdom/services/trackbacks/120546.html 版权声明Q{载时请以链接Ş式标明文章原始出处和作者信息及本声?/a>
http://xufish.blogbus.com/logs/40536553.html

TCP 的三ơ握?/strong>是怎么q行的了Q发送端发送一个SYN=1QACK=0标志的数据包l接收端Q请求进行连接,q是W一ơ握手;接收端收到请 求ƈ且允许连接的话,׃发送一个SYN=1QACK=1标志的数据包l发送端Q告诉它Q可以通讯了,q且让发送端发送一个确认数据包Q这是第二次握手Q? 最后,发送端发送一个SYN=0QACK=1的数据包l接收端Q告诉它q接已被认Q这是W三ơ握手。之后,一个TCPq接建立Q开始通讯?/p>

*SYNQ同步标?br style="line-height: normal;">同步序列~号(Synchronize Sequence Numbers)栏有效。该标志仅在三次握手建立TCPq接时有效。它提示TCPq接的服务端查序列编P该序列编号ؓTCPq接初始?一般是客户 ?的初始序列编受在q里Q可以把TCP序列~号看作是一个范围从0?Q?94Q?67Q?95?2位计数器。通过TCPq接交换的数据中每一个字 节都l过序列~号。在TCP报头中的序列~号栏包括了TCP分段中第一个字节的序列~号?/p>

*ACKQ确认标?br style="line-height: normal;">认~号(Acknowledgement Number)栏有效。大多数情况下该标志位是|位的。TCP报头内的认~号栏内包含的确认编?w+1QFigure-1)Z一个预期的序列~号Q? 同时提示q端pȝ已经成功接收所有数据?/p>

*RSTQ复位标?br style="line-height: normal;">复位标志有效。用于复位相应的TCPq接?/p>

*URGQ紧急标?br style="line-height: normal;">紧?The urgent pointer) 标志有效。紧急标志置位,

*PSHQ推标志
? 标志|位Ӟ接收端不该数据q行队列处理Q而是可能快数据{由应用处理。在处理 telnet ?rlogin {交互模式的q接Ӟ该标志L|位的?/p>

*FINQ结束标?br style="line-height: normal;">带有该标志置位的数据包用来结束一个TCP回话Q但对应端口仍处于开攄态,准备接收后箋数据?/p>

=============================================================

三次握手Three-way Handshake

一个虚拟连接的建立是通过三次握手来实现的

1. (B) --> [SYN] --> (A)

假如? 务器A和客hB通讯. 当A要和B通信ӞB首先向A发一个SYN (Synchronize) 标记的包Q告诉Ah建立q接.

注意: 一? SYN包就是仅SYN标记设ؓ1的TCP?参见TCP包头Resources). 认识到这点很重要Q只有当A受到B发来的SYN包,才可建立q接Q除此之外别无他法。因此,如果你的防火墙丢弃所有的发往外网接口的SYN包,那么你将? 能让外部MLd建立q接?br style="line-height: normal;">
2. (B) <-- [SYN/ACK] <--(A)

接着QA收到后会发一个对SYN包的认?SYN/ACK)? 去,表示对第一个SYN包的认Qƈl箋握手操作.

注意: SYN/ACK包是仅SYN ?ACK 标记?的包.

3. (B) --> [ACK] --> (A)

B收到SYN/ACK ?B发一个确认包(ACK)Q通知Aq接已徏立。至此,三次握手完成Q一个TCPq接完成

Note: ACK包就是仅ACK 标记设ؓ1的TCP? 需要注意的是当三此握手完成、连接徏立以后,TCPq接的每个包都会讄ACK?br style="line-height: normal;">
q就是ؓ何连接跟t很重要的原因了. 没有q接跟踪,防火墙将无法判断收到的ACK包是否属于一个已l徏立的q接.一般的包过?Ipchains)收到ACK包时,会让它通过(q绝对不是个 好主?. 而当状态型防火墙收到此U包Ӟ它会先在q接表中查找是否属于哪个已徏q接Q否则丢弃该?br style="line-height: normal;">
四次握手Four-way Handshake

四次握手用来关闭已徏 立的TCPq接

1. (B) --> ACK/FIN --> (A)

2. (B) <-- ACK <-- (A)

3. (B) <-- ACK/FIN <-- (A)

4. (B) --> ACK --> (A)

注意: ׃TCPq接是双向连? 因此关闭q接需要在两个方向上做。ACK/FIN ?ACK 和FIN 标记设ؓ1)通常被认为是FIN(l结)?然? ׃q接q没有关? FIN包L打上ACK标记. 没有ACK标记而仅有FIN标记的包不是合法的包Qƈ且通常被认为是恶意?br style="line-height: normal;">
q接复位Resetting a connection

四次握手不是关闭 TCPq接的唯一Ҏ. 有时,如果L需要尽快关闭连?或连接超?端口或主Z可达),RST (Reset)包将被发? 注意在,׃RST包不是TCPq接中的必须部分, 可以只发送RST?即不带ACK标记). 但在正常的TCPq接中RST包可以带ACK认标记

h意RST包是? 以不要收到方认?

无效的TCP标记Invalid TCP Flags

到目前ؓ止,你已l看C SYN, ACK, FIN, 和RST 标记. 另外Q还有PSH (Push) 和URG (Urgent)标记.

最常见的非法组合是SYN/FIN ? 注意:׃ SYN包是用来初始化连接的, 它不可能?FIN和RST标记一起出? q也是一个恶意攻?

׃现在大多数防火墙已知 SYN/FIN ? 别的一些组?例如SYN/FIN/PSH, SYN/FIN/RST, SYN/FIN/RST/PSH。很明显Q当|络中出现这U包Ӟ很你的网l肯定受到攻M?br style="line-height: normal;">
别的已知的非法包有FIN (无ACK标记)?NULL"包。如同早先讨论的Q由于ACK/FIN包的出现是ؓ了关闭一个TCPq接Q那么正常的FIN包L带有 ACK 标记?NULL"包就是没有Q何TCP标记的包(URG,ACK,PSH,RST,SYN,FIN都ؓ0)?br style="line-height: normal;">
到目前ؓ止,正常的网 l活动下QTCP协议栈不可能产生带有上面提到的Q何一U标记组合的TCP包。当你发现这些不正常的包Ӟ肯定有h对你的网l不怀好意?br style="line-height: normal;">
UDP (用户数据包协议User Datagram Protocol)
TCP是面向连? 的,而UDP是非q接的协议。UDP没有Ҏ受进行确认的标记和确认机制。对丢包的处理是在应用层来完成的?or accidental arrival).

此处需要重Ҏ意的事情是:在正常情况下Q当UDP包到达一个关闭的端口Ӟ会返回一个UDP复位包。由于UDP是非面向q接? 因此没有M认信息来确认包是否正确到达目的地。因此如果你的防火墙丢弃UDP包,它会开放所有的UDP端口(?)?br style="line-height: normal;">
׃Internet 上正常情况下一些包被丢弃Q甚x些发往已关闭端?非防火墙?的UDP包将不会到达目的Q它们将q回一个复位UDP包?br style="line-height: normal;">
因ؓq个原因QUDP 端口扫描L不精、不可靠的?br style="line-height: normal;">
看v来大UDP包的片是常见的DOS (Denial of Service)d的常见Ş?(q里有个DOSd的例子,http://grc.com/dos/grcdos.htm ).

ICMP (|间控制消息协议Internet Control Message Protocol)
如同名字一P ICMP用来在主?路由器之间传递控制信息的协议?ICMP包可以包含诊断信?ping, traceroute - 注意目前unixpȝ中的traceroute用UDP包而不是ICMP)Q错误信?|络/L/端口 不可? network/host/port unreachable), 信息(旉戳timestamp, 地址掩码address mask request, etc.)Q或控制信息 (source quench, redirect, etc.) ?br style="line-height: normal;">
你可以在http://www.iana.org/assignments/icmp-parameters? 扑ֈICMP包的cd?br style="line-height: normal;">
管ICMP通常是无害的Q还是有些类型的ICMP信息需要丢弃?br style="line-height: normal;">
Redirect (5), Alternate Host Address (6), Router Advertisement (9) 能用来{发通讯?br style="line-height: normal;">
Echo (8), Timestamp (13) and Address Mask Request (17) 能用来分别判断主机是否v来,本地旉和地址掩码。注意它们是和返回的信息cd有关的。它们自己本w是不能被利用的Q但它们泄露出的信息Ҏ击者是有用 的?br style="line-height: normal;">
ICMP 消息有时也被用来作ؓDOSd的一部分(例如Q洪水ping flood ping,?ping ?呵呵Q有?ping of death)?/p>

包碎片注意A Note About Packet Fragmentation

如果一个包的大超q了TCP的最大段长度MSS (Maximum Segment Size) 或MTU (Maximum Transmission Unit)Q能够把此包发往目的的唯一Ҏ是把此包分片。由于包分片是正常的Q它可以被利用来做恶意的d?br style="line-height: normal;">
因ؓ分片的包的第一? 分片包含一个包_若没有包分片的重l功能,包过滤器不可能检附加的包分片。典型的dTypical attacks involve in overlapping the packet data in which packet header is 典型的攻击Typical attacks involve in overlapping the packet data in which packet header isnormal until is it overwritten with different destination IP (or port) thereby bypassing firewall rules。包分片能作?DOS d的一部分Q它可以crash older IP stacks 或涨死CPUq接能力?br style="line-height: normal;">
Netfilter/Iptables中的q接跟踪代码能自动做分片重组。它仍有qQ可? 受到饱和q接dQ可以把CPU资源耗光?br style="line-height: normal;">
握手阶段Q?br style="line-height: normal;">序号 方向 seq ack
1  A->B 10000 0
2 B->A 20000 10000+1=10001
3 A->B 10001 20000+1=20001
解释Q?br style="line-height: normal;">1QA向B发v q接hQ以一个随机数初始化A的seq,q里假设?0000Q此时ACKQ?

2QB收到A的连接请求后Q也以一个随机数初始化B的seqQ这里假设ؓ20000Q意? 是:你的h我已收到Q我q方的数据流׃q个数开始。B的ACK是A的seq?Q即10000Q?Q?0001

3QA收到B的回? 后,它的seq是它的上个请求的seq?Q即10000Q?Q?0001Q意思也是:你的回复我收CQ我q方的数据流׃q个数开始。A此时的ACK 是B的seq?Q即20000+1=20001


数据传输阶段Q?br style="line-height: normal;">序号  方向      seq ack size
23 A->B 40000 70000 1514
24 B->A 70000 40000+1514-54=41460 54
25 A->B 41460 70000+54-54=70000 1514
26 B->A 70000 41460+1514-54=42920 54
解释Q?br style="line-height: normal;">23:B接收? A发来的seq=40000,ack=70000,size=1514的数据包
24: 于是B向A也发一个数据包Q告诉BQ你的上个包我收C。B的seq׃它收到的数据包的ACK填充QACK是它收到的数据包的SEQ加上数据包的大小 (不包括以太网协议_IP_TCP?Q以证实B发过来的数据全收C?br style="line-height: normal;">25:A 在收到B发过来的ack?1460的数据包Ӟ一看到41460Q正好是它的上个数据包的seq加上包的大小Q就明白Q上ơ发送的数据包已安全到达。于 是它再发一个数据包lB。这个正在发送的数据包的seq也以它收到的数据包的ACK填充QACK׃它收到的数据包的seq(70000)加上包的 size(54)填充,即ack=70000+54-54(全是头长Q没数据??br style="line-height: normal;">
其实在握手和l束时确认号应该是对方序列号?,传输数据时则是对方序列号加上Ҏ携带? 用层数据的长?如果从以太网包返回来计算所加的长度,嫌走弯路了.
另外,如果? Ҏ有数据过?则自q认号不?序列号ؓ上次的序列号加上本次应用层数据发送长?/span>

chatler 2010-07-16 14:14 发表评论
]]>
NAT的缺?/title><link>http://www.shnenglu.com/beautykingdom/archive/2010/07/13/120225.html</link><dc:creator>chatler</dc:creator><author>chatler</author><pubDate>Tue, 13 Jul 2010 07:28:00 GMT</pubDate><guid>http://www.shnenglu.com/beautykingdom/archive/2010/07/13/120225.html</guid><wfw:comment>http://www.shnenglu.com/beautykingdom/comments/120225.html</wfw:comment><comments>http://www.shnenglu.com/beautykingdom/archive/2010/07/13/120225.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.shnenglu.com/beautykingdom/comments/commentRss/120225.html</wfw:commentRss><trackback:ping>http://www.shnenglu.com/beautykingdom/services/trackbacks/120225.html</trackback:ping><description><![CDATA[NAT的优点不必多?它提供了一pd相关技术来实现多个内网用户通过一个公|ip和外部通信,有效的解决了ipv4地址不够用的问题.那么位于NAT? 的用户用私|ip真的和用公|ip一样吗?NAT解决了所有地址转换的相关问题了?<br>下面主要讲一些NAT不支持的斚w,以及所谓的NAT ?~陷".<br><br>一些应用层协议(如TCP和SIP),在它们的应用层数据中需要包含公|IP地址.拿FTP来说?众所周知,FTP是通过 两个不同的连接来传输控制报文和数据报文的.当传输一个文件时,FTP服务器要求通过控制报文得到卛_传输的数据报文的|络层和传输层地址 (IP/PORT).如果q个时候客户主机是在NAT之后?那么服务器端收到的ip/port会是NAT转化前的U网IP地址,从而会D文g传输? ?<br>SIP(Session Initiation Protocol)主要是来控制音频传输?q个协议也面临同L问题.因ؓSIP建立q接?需要用到几个不同的端口来通过RTP传输音频?而且q些 端口以及IP会被~码到音频流?传输l服务器?从而实现后l的通信.<br>如果没有一些特D的技?如STUN),那么NAT是不支持q些协议? q些协议l过NAT也肯定会p|.<br><span style="color: #000166;">Some Application Layer protocols (such as FTP and SIP) send explicit network addresses within their application data. FTP in active mode, for example, uses separate connections for control traffic (commands) and for data traffic (file contents). When requesting a file transfer, the host making the request identifies the corresponding data connection by its network layer and transport layer addresses. If the host making the request lies behind a simple NAT firewall, the translation of the IP address and/or TCP port number makes the information received by the server invalid. The Session Initiation Protocol (SIP) controls Voice over IP (VoIP) communications and suffers the same problem. SIP may use multiple ports to set up a connection and transmit voice stream via RTP. IP addresses and port numbers are encoded in the payload data and must be known prior to the traversal of NATs. Without special techniques, such as STUN, NAT behavior is unpredictable and communications may fail.</span><br><br>? 面讲一些特D的技?来NAT支持q些Ҏ的应用层协议.<br><br>最直观的想法就?既然NAT修改了IP/PROT,那么我们也修改应用层? 据中相应的IP/PORT.应用层网?ALG)(g或Y仉?是q样来解册个问题的.应用层网兌行在讄了NAT的防火墙讑֤?它会更新? 输数据中的IP/PORT.所?应用层网关也必须能够解析应用层协?而且对于每一U协?可能需要不同的应用层网x?<br><span style="color: #000166;">Application Layer Gateway (ALG) software or hardware may correct these problems. An ALG software module running on a NAT firewall device updates any payload data made invalid by address translation. ALGs obviously need to understand the higher-layer protocol that they need to fix, and so each protocol with this problem requires a separate ALG.</span><br><br>另外一个解x问题的办法就是NATIK?此方法主要利用STUN? ICE{协议或者一些和会话控制相关的特有的Ҏ来实?理论上NATIK最好能够同旉用于基于TCP和基于UDP的应?但是ZUDP的应用相Ҏ 较简?更广为流?也更适合兼容一些种cȝNAT做穿?q样,应用层协议在设计的时?必须考虑到可支持NATIK?但一些其他类型的NAT(比如? UNAT)是无论如何也不能做穿透的.<br><span style="color: #000166;">Another possible solution to this problem is to use NAT traversal techniques using protocols such as STUN or ICE or proprietary approaches in a session border controller. NAT traversal is possible in both TCP- and UDP-based applications, but the UDP-based technique is simpler, more widely understood, and more compatible with legacy NATs. 
In either case, the high level protocol must be designed with NAT traversal in mind, and it does not work reliably across symmetric NATs or other poorly-behaved legacy NATs.</span><br><br><br>q有一些方?比如UPnP (Universal Plug and Play) ?Bonjour (NAT-PMP),但是q些Ҏ都需要专门的NAT讑֤.<br><span style="color: #000166;">Other possibilities are UPnP (Universal Plug and Play) or Bonjour (NAT-PMP), but these require the cooperation of the NAT device.</span><br><br><br>大部分传l的客户-服务器协?除了FTP),都不定义3层以上的数据? ?所?也就可以和传l的NAT兼容.实际?在设计应用层协议的时候应量避免涉及?层以上的数据,因ؓq样会它兼容NAT时复杂化.<br><span style="color: #000166;">Most traditional client-server protocols (FTP being the main exception), however, do not send layer 3 contact information and therefore do not require any special treatment by NATs. In fact, avoiding NAT complications is practically a requirement when designing new higher-layer protocols today.</span><br style="color: #000166;"><br><br>NAT也会和利用ipsec加密的一些应用冲H?比如SIP电话,如果有很多SIP电话讑֤? NA(P)T之后,那么在电话利用ipsc加密它们的信h,如果也加密了port信息,那么q就意味着NAPT׃能{换port,只能转换IP.但是 q样׃D回来的数据包都被NAT到同一个客L,从而导致通信p|(不太明白).不过,q个问题有很多方法来解决,比如用TLS.TLS是运行在W四 ?OSI模型)?所以它不包含port信息.也可以在UDP之内来封装ipsec,TISPAN 是用这U方法来实现安全NAT转化?<br><span style="color: #000166;">NATs can also cause problems where IPsec encryption is applied and in cases where multiple devices such as SIP phones are located behind a NAT. Phones which encrypt their signaling with IPsec encapsulate the port information within the IPsec packet meaning that NA(P)T devices cannot access and translate the port. In these cases the NA(P)T devices revert to simple NAT operation. This means that all traffic returning to the NAT will be mapped onto one client causing the service to fail. There are a couple of solutions to this problem, one is to use TLS which operates at level 4 in the OSI Reference Model and therefore does not mask the port number, or to Encapsulate the IPsec within UDP - the latter being the solution chosen by TISPAN to achieve secure NAT traversal.</span><br><br><br>Dan Kaminsky ?008q的时候提出NAPTq会间接的媄响DNS协议的健壮?Z避免DNS服务器缓存中?在NA(p)T防火墙之后的DNS服务器最好不要{? 来自外部的DNSh(UDP)的源端口.而对DNS~存中毒d的应Ҏ施就是所有的DNS服务器用随机的端口来接收DNSh.但如果NA(P)T 使DNSh的源端口也随机化,那么在NA(P)T防火墙后面的DNS服务器还是会崩溃?<br><span style="color: #000166;">The DNS protocol vulnerability announced by Dan Kaminsky on 2008 July 8 is indirectly affected by NAT port mapping. To avoid DNS server cache poisoning, it is highly desirable to not translate UDP source port numbers of outgoing DNS requests from any DNS server which is behind a firewall which implements NAT. The recommended work-around for the DNS vulnerability is to make all caching DNS servers use randomized UDP source ports. If the NAT function de-randomizes the UDP source ports, the DNS server will be made vulnerable.</span><br><br>? 于NAT后的L不能实现真的端对端的通信,也不能用一些和NAT冲突的internat协议.而且从外部发LTCPq接和一些无状态的协议(利用 udp的上层协?也不能正常的q行,除非NAT所在设备通过相关技术支持这些协?一些协议能够利用应用层|关或其他技?来只有一端处于NAT后的 通信双方正常通信.但要是双斚w在NAT后就会失?NAT也和一些隧道协?如ipsec)冲突,因ؓNAT会修改ip或port,从而会使协议的完整 性校验失?<br><span style="color: #000166;">Hosts behind NAT-enabled routers do not have end-to-end connectivity and cannot participate in some Internet protocols. Services that require the initiation of TCP connections from the outside network, or stateless protocols such as those using UDP, can be disrupted. Unless the NAT router makes a specific effort to support such protocols, incoming packets cannot reach their destination. 
Some protocols can accommodate one instance of NAT between participating hosts ("passive mode" FTP, for example), sometimes with the assistance of an application-level gateway (see below), but fail when both systems are separated from the Internet by NAT. Use of NAT also complicates tunneling protocols such as IPsec because NAT modifies values in the headers which interfere with the integrity checks done by IPsec and other tunneling protocols.</span><br><br><br>端对端的q接? internet设计时的一个重要的核心的基本原?而NAT是违背这一原则?但是NAT在设计的时候也充分地考虑Cq些问题.现在Zipv6? NAT已经被广泛关?但许多ipv6架构设计者认为ipv6应该摒弃NAT.<br><span style="color: #000166;">End-to-end connectivity has been a core principle of the Internet, supported for example by the Internet Architecture Board. Current Internet architectural documents observe that NAT is a violation of the End-to-End Principle, but that NAT does have a valid role in careful design. There is considerably more concern with the use of IPv6 NAT, and many IPv6 architects believe IPv6 was intended to remove the need for NAT.</span><br><br><br>׃NAT的连接追t具有短时效?所以在特定的地址转换关系会在一段旉后失? 除非遵守NAT的keep-alive机制,内网L不时的去讉K外部L.q至会造成一些不必要的消?比如消耗手持设备的电量.<br><span style="color: #000166;">Because of the short-lived nature of the stateful translation tables in NAT routers, devices on the internal network lose IP connectivity typically within a very short period of time unless they implement NAT keep-alive mechanisms by frequently accessing outside hosts. This dramatically shortens the power reserves on battery-operated hand-held devices and has thwarted more widespread deployment of such IP-native Internet-enabled devices.</span><br style="color: #000166;"><br><br>一些IPS会直接提供给用户U网IP地址,q样用户必通过IPS? NAT来和外部INTERNET通信.q样,用户实际上没有实现端对端通信,中间加了一个IPS的NAT,q有悖于Internet Architecture Board列出的internal核心基本原则.<br><span style="color: #000166;">Some Internet service providers (ISPs) provide their customers only with "local" IP addresses.[citation needed]Thus, these customers must access services external to the ISP's network through NAT. As a result, the customers cannot achieve true end-to-end connectivity, in violation of the core principles of the Internet as laid out by the Internet Architecture Board.</span><br style="color: #000166;"><br>NAT 最后的一个缺陷就?NAT的推q和使用,解决了ipv4下IP地址不够用的问题,大大的推q了IPV6的发?<br>(说它是优点好?q是~陷? ?)<br><span style="color: #000166;">it is possible that its [NAT] widespread use will significantly delay the need to deploy IPv6</span><br><br>Reference:<br><a target="_blank">Network address translation</a><br><br>from:<br>http://blog.chinaunix.net/u2/86590/showart.php?id=2208148<br><img src ="http://www.shnenglu.com/beautykingdom/aggbug/120225.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.shnenglu.com/beautykingdom/" target="_blank">chatler</a> 2010-07-13 15:28 <a href="http://www.shnenglu.com/beautykingdom/archive/2010/07/13/120225.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Linux下面socket~程的非dTCP 研究http://www.shnenglu.com/beautykingdom/archive/2010/07/07/119615.htmlchatlerchatlerWed, 07 Jul 2010 09:14:00 GMThttp://www.shnenglu.com/beautykingdom/archive/2010/07/07/119615.htmlhttp://www.shnenglu.com/beautykingdom/comments/119615.htmlhttp://www.shnenglu.com/beautykingdom/archive/2010/07/07/119615.html#Feedback0http://www.shnenglu.com/beautykingdom/comments/commentRss/119615.htmlhttp://www.shnenglu.com/beautykingdom/services/trackbacks/119615.html

tcp协议? w是可靠?q不{于应用E序用tcp发送数据就一定是可靠?不管是否d,send发送的大小,q不代表对端recv到多的数据.

?span style="line-height: normal; color: #ff0000;">d模式? send函数的过E是应用程序请求发送的数据拯到发送缓存中发送ƈ得到认后再q回.但由于发送缓存的存在,表现?如果发送缓存大比h发送的? 要?那么send函数立即q回,同时向网l中发送数?否则,send向网l发送缓存中不能容纳的那部分数据,q等待对端确认后再返?接收端只要将 数据收到接收~存?׃认,q不一定要{待应用E序调用recv);

?/span>非阻塞模?/span>?send函数的过E仅仅是数据拷 贝到协议栈的~存已,如果~存区可用空间不?则尽能力的拷?q回成功拯的大?如缓存区可用I间?,则返?1,同时讄errno? EAGAIN.


linux下可?span style="line-height: normal; color: #cc3333;">sysctl -a | grep net.ipv4.tcp_wmem查看pȝ? 认的发送缓存大?

net.ipv4.tcp_wmem = 4096 16384 81920
q? 有三个?W一个值是socket的发送缓存区分配的最字节数,W二个值是默认?该g被net.core.wmem_default覆盖),~存? 在系l负载不重的情况下可以增长到q个?W三个值是发送缓存区I间的最大字节数(该g被net.core.wmem_max覆盖).
Ҏ实际试, 如果手工更改了net.ipv4.tcp_wmem的?则会按更改的值来q行,否则在默认情况下,协议栈通常是按 net.core.wmem_default和net.core.wmem_max的值来分配内存?

应用E序应该Ҏ应用的特性在E序中更改发送缓存大?

socklen_t sendbuflen = 0;
socklen_t len = sizeof(sendbuflen);
getsockopt(clientSocket, SOL_SOCKET, SO_SNDBUF, (void*)&sendbuflen, &len);
printf("default,sendbuf:%d\n", sendbuflen);

sendbuflen = 10240;
setsockopt(clientSocket, SOL_SOCKET, SO_SNDBUF, (void*)&sendbuflen, len);
getsockopt(clientSocket, SOL_SOCKET, SO_SNDBUF, (void*)&sendbuflen, &len);
printf("now,sendbuf:%d\n", sendbuflen);


需要注意的?虽然发送缓存设|? 成了10k,但实际上,协议栈会其扩大1?设ؓ20k.


-------------------? 例分?---------------------


? 实际应用?如果发送端是非d发?׃|络的阻塞或者接收端处理q慢,通常出现的情冉|,发送应用程序看h发送了10k的数?但是只发送了2k? 对端~存?q有8k在本机缓存中(未发送或者未得到接收端的认).那么此时,接收应用E序能够收到的数据ؓ2k.假如接收应用E序调用recv函数? 取了1k的数据在处理,在这个瞬?发生了以下情况之一,双方表现?

A. 发送应用程序认为send完了10k数据,关闭了socket:
? 送主Z为tcp的主动关闭?q接处于FIN_WAIT1的半关闭状?{待Ҏ的ack),q且,发送缓存中?k数据q不清除,依然会发送给? ?如果接收应用E序依然在recv,那么它会收到余下?k数据(q个前题?接收端会在发送端FIN_WAIT1状态超时前收到余下?k数据.), 然后得到一个对端socket被关闭的消息(recvq回0).q时,应该q行关闭.

B. 发送应用程序再ơ调用send发?k的数?
? 如发送缓存的I间?0k,那么发送缓存可用空间ؓ20-8=12k,大于h发送的8k,所以send函数数据做拯?q立卌?192;

? 如发 送缓存的I间?2k,那么此时发送缓存可用空间还?2-8=4k,send()会返?096,应用E序发现q回的值小于请求发送的大小值后,可以? 为缓存区已满,q时必须d(或通过select{待下一ơsocket可写的信?,如果应用E序不理?立即再次调用send,那么会得?1的? 在linux下表Cؓerrno=EAGAIN.


C. 接收应用E序在处理完1k数据?关闭了socket:
? 收主ZZ动关闭?q接处于FIN_WAIT1的半关闭状?{待Ҏ的ack).然后,发送应用程序会收到socket可读的信?通常? select调用q回socket可读),但在d时会发现recv函数q回0,q时应该调用close函数来关闭socket(发送给Ҏack);

? 果发送应用程序没有处理这个可ȝ信号,而是在send,那么q要分两U情冉|考虑,假如是在发送端收到RST标志之后调用send,send返? -1,同时errno设ؓECONNRESET表示对端|络已断开,
但是,也有说法是进E会收到SIGPIPE信号, 该信L默认响应动作是退E?如果忽略该信?那么send是返?1,errno为EPIPE(未证?;如果是在发送端收到RST标志之前,则send像往怸样工?

以上说的是非d? send情况,假如send是阻塞调?q且正好处于d?例如一ơ性发送一个巨大的buf,出了发送缓?,对端socket关闭,那么send? q回成功发送的字节?如果再次调用send,那么会同上一?

D. 交换机或路由器的|络断开:
接收应用E序在处理完已收到的1k数据?会l从~存 取余下的1k数据,然后pCؓ无数据可ȝ现象,q种情况需要应用程序来处理时.一般做法是讑֮一个select{待的最大时?如果出q个旉? 然没有数据可?则认为socket已不可用.

? 送应用程序会不断的将余下的数据发送到|络?但始l得不到认,所以缓存区的可用空间持lؓ0,q种情况也需要应用程序来处理.

如果不由应用E序来处理这U情况超时的情况,也可以通过tcp协议本n来处?具体可以? 看sysctl中?
net.ipv4.tcp_keepalive_intvl
net.ipv4.tcp_keepalive_probes
net.ipv4.tcp_keepalive_time

 原文地址 http://xufish.blogbus.com/logs/40537344.html

from:
http://blog.chinaunix.net/u2/67780/showart_2056353.html


chatler 2010-07-07 17:14 发表评论
]]>
教你用c实现http协议http://www.shnenglu.com/beautykingdom/archive/2010/06/27/118839.htmlchatlerchatlerSun, 27 Jun 2010 15:16:00 GMThttp://www.shnenglu.com/beautykingdom/archive/2010/06/27/118839.htmlhttp://www.shnenglu.com/beautykingdom/comments/118839.htmlhttp://www.shnenglu.com/beautykingdom/archive/2010/06/27/118839.html#Feedback0http://www.shnenglu.com/beautykingdom/comments/commentRss/118839.htmlhttp://www.shnenglu.com/beautykingdom/services/trackbacks/118839.html大家都很熟悉HTTP协议的应用,因ؓ每天都在|络上浏览着不少东西Q也都知道是HTTP协议是相当简单的。每ơ用 thunder之类的下载Y件下载网,当用到那?#8220;用thunder下蝲全部链接”时总觉得很奇?br> 后来xQ其实要实现q些下蝲功能也ƈ不难Q只要按照HTTP协议发送requestQ然后对接收到的数据q行分析Q如果页面上q有href之类的链接指 向标志就可以q行׃层的下蝲了。HTTP协议目前用的最多的?.1 版本Q要全面透彻地搞懂它参考RFC2616文档吧。我是怕rfc文档了的,要看自己ȝ吧^_^
源代码如下:
/******* http客户端程?httpclient.c ************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <unistd.h>
#include <netinet/in.h>
#include <limits.h>
#include <netdb.h>
#include <arpa/inet.h>
#include <ctype.h>

//////////////////////////////httpclient.c 开?//////////////////////////////////////////


/********************************************
功能Q搜索字W串双LW一个匹配字W?br> ********************************************/
char * Rstrchr(char * s, char x) {
  int i = strlen(s);
  if(!(*s)) return 0;
  while(s[i-1]) if(strchr(s + (i - 1), x)) return (s + (i - 1)); else i--;
  return 0;
}

/********************************************
功能Q把字符串{换ؓ全小?br> ********************************************/
void ToLowerCase(char * s) {
  while(s && *s) {*s=tolower(*s);s++;}
}

/**************************************************************
功能Q从字符串src中分析出|站地址和端口,q得到用戯下蝲的文?br> ***************************************************************/
void GetHost(char * src, char * web, char * file, int * port) {
  char * pA;
  char * pB;
  memset(web, 0, sizeof(web));
  memset(file, 0, sizeof(file));
  *port = 0;
  if(!(*src)) return;
  pA = src;
  if(!strncmp(pA, "http://", strlen("http://"))) pA = src+strlen("http://");
  else if(!strncmp(pA, "https://", strlen("https://"))) pA = src+strlen("https://");
  pB = strchr(pA, '/');
  if(pB) {
    memcpy(web, pA, strlen(pA) - strlen(pB));
    if(pB+1) {
      memcpy(file, pB + 1, strlen(pB) - 1);
      file[strlen(pB) - 1] = 0;
    }
  }
  else memcpy(web, pA, strlen(pA));
  if(pB) web[strlen(pA) - strlen(pB)] = 0;
  else web[strlen(pA)] = 0;
  pA = strchr(web, ':');
  if(pA) *port = atoi(pA + 1);
  else *port = 80;
}


int main(int argc, char *argv[])
{
  int sockfd;
  char buffer[1024];
  struct sockaddr_in server_addr;
  struct hostent *host;
  int portnumber,nbytes;
  char host_addr[256];
  char host_file[1024];
  char local_file[256];
  FILE * fp;
  char request[1024];
  int send, totalsend;
  int i;
  char * pt;

  if(argc!=2)
  {
    fprintf(stderr,"Usage:%s web-address\a\n",argv[0]);
    exit(1);
  }
  printf("parameter.1 is: %s\n", argv[1]);
  ToLowerCase(argv[1]);/*参数{换ؓ全小?/
  printf("lowercase parameter.1 is: %s\n", argv[1]);

  GetHost(argv[1], host_addr, host_file, &portnumber);/*分析|址、端口、文件名{?/
  printf("webhost:%s\n", host_addr);
  printf("hostfile:%s\n", host_file);
  printf("portnumber:%d\n\n", portnumber);

  if((host=gethostbyname(host_addr))==NULL)/*取得LIP地址*/
  {
    fprintf(stderr,"Gethostname error, %s\n", strerror(errno));
    exit(1);
  }

  /* 客户E序开始徏?sockfd描述W?*/
  if((sockfd=socket(AF_INET,SOCK_STREAM,0))==-1)/*建立SOCKETq接*/
  {
    fprintf(stderr,"Socket Error:%s\a\n",strerror(errno));
    exit(1);
  }

  /* 客户E序填充服务端的资料 */
  bzero(&server_addr,sizeof(server_addr));
  server_addr.sin_family=AF_INET;
  server_addr.sin_port=htons(portnumber);
  server_addr.sin_addr=*((struct in_addr *)host->h_addr);

  /* 客户E序发vq接h */
  if(connect(sockfd,(struct sockaddr *)(&server_addr),sizeof(struct sockaddr))==-1)/*q接|站*/
  {
    fprintf(stderr,"Connect Error:%s\a\n",strerror(errno));
    exit(1);
  }

  sprintf(request, "GET /%s HTTP/1.1\r\nAccept: */*\r\nAccept-Language: zh-cn\r\n\
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)\r\n\
Host: %s:%d\r\nConnection: Close\r\n\r\n", host_file, host_addr, portnumber);
  printf("%s", request);/*准备requestQ将要发送给L*/

  /*取得真实的文件名*/
  if(host_file && *host_file) pt = Rstrchr(host_file, '/');
  else pt = 0;

  memset(local_file, 0, sizeof(local_file));
  if(pt && *pt) {
    if((pt + 1) && *(pt+1)) strcpy(local_file, pt + 1);
    else memcpy(local_file, host_file, strlen(host_file) - 1);
  }
  else if(host_file && *host_file) strcpy(local_file, host_file);
  else strcpy(local_file, "index.html");
  printf("local filename to write:%s\n\n", local_file);

  /*发送httphrequest*/
  send = 0;totalsend = 0;
  nbytes=strlen(request);
  while(totalsend < nbytes) {
    send = write(sockfd, request + totalsend, nbytes - totalsend);
    if(send==-1) {printf("send error!%s\n", strerror(errno));exit(0);}
    totalsend+=send;
    printf("%d bytes send OK!\n", totalsend);
  }

  fp = fopen(local_file, "a");
  if(!fp) {
    printf("create file error! %s\n", strerror(errno));
    return 0;
  }
  printf("\nThe following is the response header:\n");
  i=0;
  /* q接成功了,接收http响应Qresponse */
  while((nbytes=read(sockfd,buffer,1))==1)
  {
    if(i < 4) {
      if(buffer[0] == '\r' || buffer[0] == '\n') i++;
      else i = 0;
      printf("%c", buffer[0]);/*把http头信息打印在屏幕?/
    }
    else {
      fwrite(buffer, 1, 1, fp);/*httpM信息写入文g*/
      i++;
      if(i%1024 == 0) fflush(fp);/*?K时存盘一?/
    }
  }
  fclose(fp);
  /* l束通讯 */
  close(sockfd);
  exit(0);
}


zj@zj:~/C_pram/practice/http_client$ ls
httpclient  httpclient.c
zj@zj:~/C_pram/practice/http_client$ ./httpclient http://www.baidu.com/
parameter.1 is: http://www.baidu.com/
lowercase parameter.1 is: http://www.baidu.com/
webhost:www.baidu.com
hostfile:
portnumber:80

GET / HTTP/1.1
Accept: */*
Accept-Language: zh-cn
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
Host: www.baidu.com:80
Connection: Close

local filename to write:index.html

163 bytes send OK!

The following is the response header:
HTTP/1.1 200 OK
Date: Wed, 29 Oct 2008 10:41:40 GMT
Server: BWS/1.0
Content-Length: 4216
Content-Type: text/html
Cache-Control: private
Expires: Wed, 29 Oct 2008 10:41:40 GMT
Set-Cookie: BAIDUID=A93059C8DDF7F1BC47C10CAF9779030E:FG=1; expires=Wed, 29-Oct-38 10:41:40 GMT; path=/; domain=.baidu.com
P3P: CP=" OTI DSP COR IVA OUR IND COM "

zj@zj:~/C_pram/practice/http_client$ ls
httpclient  httpclient.c  index.html

不指定文件名字的?默认是下蝲|站默认的首了^_^.

from:
http://blog.chinaunix.net/u2/76292/showart_1353805.html



chatler 2010-06-27 23:16 发表评论
]]>
c语言抓取|页数据http://www.shnenglu.com/beautykingdom/archive/2010/06/27/118838.htmlchatlerchatlerSun, 27 Jun 2010 15:13:00 GMThttp://www.shnenglu.com/beautykingdom/archive/2010/06/27/118838.htmlhttp://www.shnenglu.com/beautykingdom/comments/118838.htmlhttp://www.shnenglu.com/beautykingdom/archive/2010/06/27/118838.html#Feedback0http://www.shnenglu.com/beautykingdom/comments/commentRss/118838.htmlhttp://www.shnenglu.com/beautykingdom/services/trackbacks/118838.html#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

#define HTTPPORT 80


char* head =
     "GET /u2/76292/ HTTP/1.1\r\n"
     "Accept: */*\r\n"
     "Accept-Language: zh-cn\r\n"
     "Accept-Encoding: gzip, deflate\r\n"
     "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; CIBA; TheWorld)\r\n"
     "Host:blog.chinaunix.net\r\n"
     "Connection: Keep-Alive\r\n\r\n";

int connect_URL(char *domain,int port)
{
    int sock;
    struct hostent * host;
    struct sockaddr_in server;
    host = gethostbyname(domain);
    if (host == NULL)
     {
      printf("gethostbyname error\n");
      return -2;
     }
   // printf("HostName: %s\n",host->h_name);

   // printf("IP Address: %s\n",inet_ntoa(*((struct in_addr *)host->h_addr)));

    sock = socket(AF_INET,SOCK_STREAM,0);
    if (sock < 0)
    {
      printf("invalid socket\n");
      return -1;
    }
    memset(&server,0,sizeof(struct sockaddr_in));
    memcpy(&server.sin_addr,host->h_addr_list[0],host->h_length);
    server.sin_family = AF_INET;
    server.sin_port = htons(port);
    return (connect(sock,(struct sockaddr *)&server,sizeof(struct sockaddr)) <0) ? -1 : sock;
}


int main()
{
  int sock;
  char buf[100];
  char *domain = "blog.chinaunix.net";

  
  fp = fopen("test.txt","rb");
  if(NULL == fp){
    printf("can't open stockcode file!\n");
    return -1;
  }
  

    sock = connect_URL(domain,HTTPPORT);
    if (sock <0){
       printf("connetc err\n");
       return -1;
        }

    send(sock,head,strlen(head),0);

    while(1)
    {
      if((recv(sock,buf,100,0))<1)
        break;
      fprintf(fp,"%s",bufp); //save http data

      }
    
    fclose(fp);
    close(sock);
  
  printf("bye!\n");
  return 0;
}

 

我这里是保存数据到本地硬?可以在这个的基础上修?head头的定义可以自己使用wireshark抓包来看

from:
http://blog.chinaunix.net/u2/76292/showart.php?id=2123108



chatler 2010-06-27 23:13 发表评论
]]>
TCP的流量控?http://www.shnenglu.com/beautykingdom/archive/2010/01/08/105213.htmlchatlerchatlerFri, 08 Jan 2010 15:34:00 GMThttp://www.shnenglu.com/beautykingdom/archive/2010/01/08/105213.htmlhttp://www.shnenglu.com/beautykingdom/comments/105213.htmlhttp://www.shnenglu.com/beautykingdom/archive/2010/01/08/105213.html#Feedback0http://www.shnenglu.com/beautykingdom/comments/commentRss/105213.htmlhttp://www.shnenglu.com/beautykingdom/services/trackbacks/105213.html1. 前言
 
TCP是具备流控和可靠q接能力的协议,为防止TCP发生拥塞或ؓ提高传输效率Q在|?br>l发展早期就提出了一些相关的TCP控和优化算法,而且也被RFC2581规定是每?br>TCP实现时要实现的?/div>
 
本文中,为求方便把将“TCP分组D?segment)”都直接称?#8220;?#8221;?/div>
 
2. 慢启?slow start)和拥塞避?Congestion Avoidance)
 
慢启动和拥塞避免是属于TCP发送方必须(MUST)要实现的Q防止TCP发送方向网l传入大量的H发数据造成|络d?/div>

先介l几个相兛_敎ͼ是在通信双方中需要考虑但不在TCP包中体现的一些参敎ͼ

拥塞H口(congestion windowQcwnd)Q是指发送方在接收到Ҏ的ACK认前向允许|络发送的数据量,数据发送后Q拥塞窗口羃;接收到对方的ACK后,拥塞H口相应增加Q拥塞窗口越大,可发送的数据量越大?/strong>拥塞H口初始值的RFC2581中被规定Z过发送方MSS的两倍,而且不能过两个TCP包,在RFC3390中更C初始H口大小的设|方法?/div>

通告H口(advertised windowQrwnd)Q是指接收方所能接收的没来得及发ACK认的数据量Q接收方数据接收后,通告H口~小Q发送ACK后,通告H口相应扩大?/strong>

慢启动阈?slow start threshold, ssthresh)Q用来判断是否要使用慢启动或拥塞避免法来控制流量的一个参敎ͼ也是随通信q程不断变化的?/div>

当cwnd < ssthreshӞ拥塞H口值已l比较小了,表示未经认的数据量增大Q需要启动慢启动法Q当cwnd > ssthreshӞ可发送数据量大,需要启动拥塞避免算法?/div>

拥塞H口cwnd是根据发送的数据量自动减的Q但扩大需要根据对方的接收情况q行扩大Q慢启动和拥塞避免算法都是描q如何扩大该值的?/strong>

在启动慢启动法ӞTCP发送方接收到对方的ACK后拥塞窗口最多每ơ增加一个发送方MSS字节的数|当拥塞窗口超qsshresh后或观察到拥塞才停止法?/div>

启动拥塞避免法Ӟ拥塞H口在一个连接往q时间RTT内增加一个最大TCP包长度的量,一般实现时用以下公式计:
      cwnd += max(SMSS*SMSS/cwnd, 1)            Q?.1)
SMSS为发送方MSS?/div>

TCP发送方到数据包丢失时Q需要调整ssthreshQ一般按下面公式计算Q?/div>
      ssthresh = max (FlightSize / 2, 2*SMSS)    (2.2)
其中FlightSize表示已经发送但q没有被认的数据量?/div>
 
3. 快速重?fast retransmit)和快速恢?fast recovery)

TCP接收Ҏ到错序的TCP包时要发送复制的ACK包回应,提示发送方可能出现|络丢包Q发送方
收到q箋3个重复的ACK包后启动快速重传算法,Ҏ认号快速重传那个可能丢q包而不必等
重传定时器超时后再重传,普通的重传是要{到重传定时器超时还没收到ACK才进行的。这个算
法是TCP发送方应该(SHOULD)实现的,不是必须。TCP发送方q行了快速重传后q入快速恢复阶D?br>Q直到没再接攉复的ACK包?/div>

快速重传和快速恢复具体过EؓQ?br>1. 当收到第3个重复的ACK包时Qssthreh值按公式2.2重新讄Q?/div>
2. 重传丢失的包后,拥塞窗口cwnd讄为sshresh+3*SMSSQh工扩大了拥塞H口Q?/div>
3. 对于每个接收到的重复的ACK包,cwnd相应增加SMSSQ扩大拥塞窗口;
4. 如果新的拥塞H口cwnd值和接收方的通告H口值允许的话,可以l箋发新包;
5. 当收C一个ACK认了新数据Ӟcwnd大小调整为sshreshQ减窗口;Ҏ收方
   来说Q接收到重发的TCP包后p发此ACK认当前接收的数据?/div>
 
4. l论
q些法重点在于保持|络的可靠性和可用性,防止|络d造成的网l崩溃,是相?br>比较保守的?/div>

5. 附录讨论

A? q些法都是针对通信双方的事, 但如果从开发防火墙{中间设备的角度来看,
     中间讑֤有必要考虑q些?
端木: q个...我好象也看不出必要性,因ؓ法的参数都是在双方内部而不在TCP数据?br>      中体?..但应该会让中间设备轻杄Q这个就象在马\开车,q些法是交规
      让你开得规矩点Q交警只兛_你开车的情况Q而不你开的是什么RQ开得好交警
      也轻松。好车可以让你很Ҏ开好,但差车也可以开好?/div>

A? q些法原型提出也很早了, 最早是88q的? 当时|络都处于初U阶D? 有个
     9600bps的猫很牛了, 计算机性能也很? 因此实施q些法q有点用; 但现
     在过了快20q了, 癑օ都快淘汰, 千兆, 万兆|络都快普及? 即PC机的内存
     也都上G?再规矩这U几KU别的数据量有意思么? 好象现在喷气式战斗机都?/div>
     W?代了, 再研I螺旋桨战斗有意思么?
端木: q个...q个p病毒库了, 里面不也有无数的DOS时代的病? 你以后这辈子估计
      都见不着的,但没有哪个防病毒厂商会把q些病毒从库中剔除,库是只增不减的?br>      有这么个东西也是一P正因为^时没用,谁也不注意,知道了就可以吹一吹,
      其拿去唬唬人是很有效的Q?/div>

A? 你真无聊!
端木: You got it! 不无聊干吗写博客?

端木: 搞技术有时候是很悲哀的一件事Q必ȝ扯七大姑八大姨的很多老东西,也就是向?br>      兼容Q到一定程度将成ؓq一步发展的最大障,讲一个从smth看到的不是笑?/div>
      的笑话:

    C铁\的铁轨间距是4英尺8?英寸Q铁轨间距采用了电R轮距的标准,而电车轮?br>的标准则沿袭了马车的轮距标准?
    马R的轮距ؓ何是4英尺8?英寸Q原来,英国的马路辙q的宽度?英尺8?英寸?br>如果马R改用其他寸的轮距,轮子很快׃在英国的老马路上撞坏?
    英国马\的辙q宽度又从何而来Q这可以上溯到古|马时期。整个欧z?包括英国)的老\都是|马Zؓ其军队铺讄Q?英尺8?英寸正是|马战R的宽度?
    |马战R的宽度又是怎么来的Q答案很单,它是牵引一辆战车的两匹马的屁股的d度?
    D子到这里还没有l束。美国航天飞机的火箭助推器也摆脱不了马屁股的U缠———火助推器造好之后要经q铁路运送,而铁路上必然有一些隧道,隧道的宽度又是根据铁轨的宽度而来。代表着端U技的火助推器的宽度,竟然被两匚w的屁股的d度决定了?br>转自Q?br>http://www.shnenglu.com/prayer/archive/2009/04/20/80527.html


chatler 2010-01-08 23:34 发表评论
]]>量控制和拥塞控?/title><link>http://www.shnenglu.com/beautykingdom/archive/2009/12/30/104460.html</link><dc:creator>chatler</dc:creator><author>chatler</author><pubDate>Wed, 30 Dec 2009 08:54:00 GMT</pubDate><guid>http://www.shnenglu.com/beautykingdom/archive/2009/12/30/104460.html</guid><wfw:comment>http://www.shnenglu.com/beautykingdom/comments/104460.html</wfw:comment><comments>http://www.shnenglu.com/beautykingdom/archive/2009/12/30/104460.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.shnenglu.com/beautykingdom/comments/commentRss/104460.html</wfw:commentRss><trackback:ping>http://www.shnenglu.com/beautykingdom/services/trackbacks/104460.html</trackback:ping><description><![CDATA[拥塞QCongestionQ指的是在包交换|络中由于传送的包数目太多,而存贮{发节点的资源有限而造成|络传输性能下降的情c拥塞的一U极端情冉|死锁QDeadlockQ,退出死锁往往需要网l复位操作?<br>量控制QFlow ControlQ指的是在一条通道上控制发送端发送数据的数量及速度使其不超q接收端所能承受的能力Q这个能力主要指接收端接收数据的速率及接收数据缓冲区的大。通常采用停等法或滑动H口法控制流量?<br>量控制是针对端pȝ中资源受限而设|的Q拥塞控制是针对中间节点资源受限而设|的?br><img src ="http://www.shnenglu.com/beautykingdom/aggbug/104460.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.shnenglu.com/beautykingdom/" target="_blank">chatler</a> 2009-12-30 16:54 <a href="http://www.shnenglu.com/beautykingdom/archive/2009/12/30/104460.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>用wget下蝲文g或目录或者是整个|站http://www.shnenglu.com/beautykingdom/archive/2009/12/22/103663.htmlchatlerchatlerMon, 21 Dec 2009 17:04:00 GMThttp://www.shnenglu.com/beautykingdom/archive/2009/12/22/103663.htmlhttp://www.shnenglu.com/beautykingdom/comments/103663.htmlhttp://www.shnenglu.com/beautykingdom/archive/2009/12/22/103663.html#Feedback0http://www.shnenglu.com/beautykingdom/comments/commentRss/103663.htmlhttp://www.shnenglu.com/beautykingdom/services/trackbacks/103663.html具体参数的含义还没有manQ等manq之后再dq来?br>


chatler 2009-12-22 01:04 发表评论
]]>
httph的详l过E?--理解计算机网l?lt;?gt;http://www.shnenglu.com/beautykingdom/archive/2009/10/21/99142.htmlchatlerchatlerWed, 21 Oct 2009 15:05:00 GMThttp://www.shnenglu.com/beautykingdom/archive/2009/10/21/99142.htmlhttp://www.shnenglu.com/beautykingdom/comments/99142.htmlhttp://www.shnenglu.com/beautykingdom/archive/2009/10/21/99142.html#Feedback0http://www.shnenglu.com/beautykingdom/comments/commentRss/99142.htmlhttp://www.shnenglu.com/beautykingdom/services/trackbacks/99142.html 

一个httph的详l过E?/font>

我们来看当我们在览器输?/font>http://www.mycompany.com:8080/mydir/index.html,q后所发生的一切?/font>

首先http是一个应用层的协议,在这个层的协议,只是一U通讯规范Q也是因ؓ双方要进行通讯Q大家要事先U定一个规范?/font>

1.q接 当我们输入这样一个请求时Q首先要建立一个socketq接Q因为socket是通过ip和端口徏立的Q所以之前还有一个DNS解析q程Q把www.mycompany.com变成ipQ如果url里不包含端口P则会使用该协议的默认端口受?/font>

DNS的过E是q样的:首先我们知道我们本地的机器上在配|网l时都会填写DNSQ这h机就会把q个url发给q个配置的DNS服务器,如果能够扑ֈ相应的url则返回其ipQ否则该DNSl将该解析请求发送给上DNSQ整个DNS可以看做是一个树状结构,该请求将一直发送到根直到得到结果。现在已l拥有了目标ip和端口号Q这h们就可以打开socketq接了?/font>

2.h q接成功建立后,开始向web服务器发送请求,q个h一般是GET或POST命oQPOST用于FORM参数的传递)。GET命o的格式ؓQ  GET 路径/文g?HTTP/1.0
文g名指出所讉K的文ӞHTTP/1.0指出Web览器用的HTTP版本。现在可以发送GET命oQ?/font>

GET /mydir/index.html HTTP/1.0Q?/font>

3.应答 web服务器收到这个请求,q行处理。从它的文档I间中搜索子目录mydir的文件index.html。如果找到该文gQWeb服务器把该文件内容传送给相应的Web览器?/font>

Z告知览器,QWeb服务器首先传送一些HTTP头信息,然后传送具体内容(即HTTP体信息)QHTTP头信息和HTTP体信息之间用一个空行分开?br>常用的HTTP头信息有Q?br>  ?HTTP 1.0 200 OK  q是Web服务器应{的W一行,列出服务器正在运行的HTTP版本号和应答代码。代?200 OK"表示h完成?br>  ?MIME_Version:1.0 它指CMIMEcd的版本?br>  ?content_type:cd q个头信息非帔R要,它指CHTTP体信息的MIMEcd。如Qcontent_type:text/html指示传送的数据是HTML文档?br>  ?content_length:长度倹{它指CHTTP体信息的长度Q字节)?/font>


4.关闭q接Q当应答l束后,Web览器与Web服务器必L开Q以保证其它Web览器能够与Web服务器徏立连接?/font>


下面我们具体分析其中的数据包在网l中漫游的经?/font>

在网l分层结构中Q各层之间是严格单向依赖的?#8220;服务”是描q各层之间关pȝ抽象概念Q即|络中各层向紧邻上层提供的一l操作。下层是服务提供者,上层是请求服务的用户。服务的表现形式是原语(primitiveQ,如系l调用或库函数。系l调用是操作pȝ内核向网l应用程序或高层协议提供的服务原语。网l中的n层总要向n+1层提供比n-1层更完备的服务,否则n层就没有存在的h倹{?

传输层实现的?#8220;端到?#8221;通信Q引q网间进E通信概念Q同时也要解军_错控Ӟ量控制Q数据排序(报文排序Q,q接理{问题,为此提供不同的服务方式。通常传输层的服务通过pȝ调用的方式提供,以socket的方式。对于客LQ要惛_立一个socketq接Q需要调用这样一些函数socket() bind() connect(),然后可以通过send()q行数据发送?/font>

现在看数据包在网l中的穿行过E:

应用?/font>

首先我们可以看到在应用层Q根据当前的需求和动作Q结合应用层的协议,有我们确定发送的数据内容Q我们把q些数据攑ֈ一个缓冲区内,然后形成了应用层的报?strong>data?/font>

传输?/font>

q些数据通过传输层发送,比如tcp协议。所以它们会被送到传输层处理,在这里报文打上了传输头的包头Q主要包含端口号Q以及tcp的各U制信息Q这些信息是直接得到的,因ؓ接口中需要指定端口。这样就l成了tcp的数据传送单?strong>segment。tcp是一U端到端的协议,利用q些信息Q比如tcp首部中的序号认序号Q根据这些数字,发送的一方不断的q行发送等待确认,发送一个数据段后,会开启一个计数器Q只有当收到认后才会发送下一个,如果过计数旉仍未收到认则进行重发,在接受端如果收到错误数据Q则其丢弃Q这导致发送端时重发。通过tcp协议Q控制了数据包的发送序列的产生Q不断的调整发送序列,实现控和数据完整?/font>

|络?/font>

然后待发送的数据D送到|络层,在网l层被打包,q样装上了|络层的包头Q包头内部含有源及目的的ip地址Q该层数据发送单位被UCؓpacket。网l层开始负责将q样的数据包在网l上传输Q如何穿q\由器Q最l到辄的地址。在q里Q根据目的ip地址Q就需要查找下一跌\q地址。首先在本机Q要查找本机的\pQ在windows上运行route print可以看到当前\p内容Q有如下几项Q?br>Active Routes Default Route Persistent Route.

整个查找q程是这L:
(1)Ҏ目的地址Q得到目的网l号Q如果处在同一个内|,则可以直接发送?br>(2)如果不是Q则查询路由表,扑ֈ一个\由?br>(3)如果找不到明的路由Q此时在路由表中q会有默认网养I也可UCؓ~省|关QIP用缺省的|关地址一个数据传送给下一个指定的路由器,所以网关也可能是\由器Q也可能只是内网向特定\由器传输数据的网兟?br>(4)路由器收到数据后Q它再次E主机或|络查询路由Q若q未扑ֈ路由Q该数据包将发送到该\由器的缺省网兛_址。而数据包中包含一个最大\p敎ͼ如果过q个xQ就会丢弃数据包Q这样可以防止无限传递。\由器收到数据包后Q只会查看网l层的包Ҏ据,目的ip。所以说它是工作在网l层Q传输层的数据对它来说则是透明的?/font>

如果上面q些步骤都没有成功,那么该数据报׃能被传送。如果不能传送的数据报来自本机,那么一般会向生成数据报的应用程序返回一?#8220;L不可?#8221;?“|络不可?#8221;的错误?/font>

 

以windows下主机的路由表ؓ例,看\q查找q程
======================================================================
Active Routes:
Network Destination            Netmask                      Gateway              Interface                  Metric
0.0.0.0                                 0.0.0.0                       192.168.1.2           192.168.1.101           10
127.0.0.0                             255.0.0.0                   127.0.0.1               127.0.0.1                   1
192.168.1.0                         255.255.255.0           192.168.1.101       192.168.1.101           10
192.168.1.101                     255.255.255.255       127.0.0.1               127.0.0.1                   10
192.168.1.255                     255.255.255.255       192.168.1.101       192.168.1.101           10
 224.0.0.0                            240.0.0.0                   192.168.1.101       192.168.1.101           10
255.255.255.255                 255.255.255.255       192.168.1.101       192.168.1.101           1
Default Gateway:                192.168.1.2

Network Destination 目的|段 
Netmask 子网掩码 
Gateway 下一跌\由器入口的ipQ\由器通过interface和gateway定义一调到下一个\由器的链路,通常情况下,interface和gateway是同一|段的?br>Interface 到达该目的地的本路由器的出口ipQ对于我们的个hpc来说Q通常由机机A的网卡,用该|卡的IP地址标识Q当然一个pc也可以有多个|卡Q?/font>

|关q个概念Q主要用于不同子|间的交互,当两个子|内LA,B要进行通讯Ӟ首先A要将数据发送到它的本地|关Q然后网兛_数据发送给B所在的|关Q然后网兛_发送给B?br>默认|关Q当一个数据包的目的网D不在你的\p录中Q那么,你的路由器该把那个数据包发送到哪里Q缺省\q|关是由你的q接上的default gateway军_的,也就是我们通常在网l连接里配置的那个倹{?/font>

通常interface和gateway处在一个子|内Q对于\由器来说Q因为可能具有不同的interface,当数据包到达ӞҎNetwork DestinationL匚w的条目,如果扑ֈQinterface则指明了应当从该路由器的那个接口出去Qgateway则代表了那个子网的网兛_址?/font>

W一?nbsp;     0.0.0.0   0.0.0.0   192.168.1.2    192.168.1.101   10
0.0.0.0代表了缺省\由。该路由记录的意思是Q当我接收到一个数据包的目的网D不在我的\p录中Q我会将该数据包通过192.168.1.101q个接口发送到192.168.1.2q个地址Q这个地址是下一个\由器的一个接口,q样q个数据包就可以交付l下一个\由器处理Q与我无兟뀂该路由记录的线路质?10。当有多个条目匹配时Q会选择h较小Metric值的那个?/font>

W三?nbsp;     192.168.1.0   255.255.255.0  192.168.1.101   192.168.1.101  10
直联|段的\p录:当\由器收到发往直联|段的数据包时该如何处理Q这U情况,路由记录的interface和gateway是同一个。当我接收到一个数据包的目的网D|192.168.1.0Ӟ我会该数据包通过192.168.1.101q个接口直接发送出去,因ؓq个端口直接q接着192.168.1.0q个|段Q该路由记录的线路质?10 Q因interface和gateway是同一个,表示数据包直接传送给目的地址Q不需要再转给路由器)?/font>

一般就分这两种情况Q目的地址与当前\由器接口是否在同一子网。如果是则直接发送,不需再{l\由器Q否则还需要{发给下一个\由器l箋q行处理?/font>

 

查找C一跳ip地址后,q需要知道它的mac地址Q这个地址要作为链路层数据装进链\层头部。这旉要arp协议Q具体过E是q样的,查找arp~冲Qwindows下运行arp -a可以查看当前arp~冲内容。如果里面含有对应ip的mac地址Q则直接q回。否则需要发生arphQ该h包含源的ip和mac地址Q还有目的地的ip地址Q在|内q行q播Q所有的L会检查自qip与该h中的目的ip是否一P如果刚好对应则返回自qmac地址Q同时将h者的ip mac保存。这样就得到了目标ip的mac地址?/font>

链\?/font>

mac地址及链路层控制信息加到数据包里QŞ?strong>FrameQFrame在链路层协议下,完成了相ȝ节点间的数据传输Q完成连接徏立,控制传输速度Q数据完整?/font>

物理?/font>

物理U\则只负责该数据以bit为单位从L传输C一个目的地?/font>

下一个目的地接受到数据后Q从物理层得到数据然后经q逐层的解??链\??|络层,然后开始上q的处理Q在l网l层 链\?物理层将数据装好l传往下一个地址?/font>

在上面的q程中,可以看到有一个\p查询q程Q而这个\p的徏立则依赖于\q法。也是说\q法实际上只是用来路由器之间更新维护\pQ真正的数据传输q程q不执行q个法Q只查看路由表。这个概念也很重要,需要理解常用的路由法。而整个tcp协议比较复杂Q跟链\层的协议有些怼Q其中有很重要的一些机制或者概念需要认真理解,比如~号与确认,量控制Q重发机Ӟ发送接受窗口?/font>

 

tcp/ip基本模型及概?/font>


物理?/font>

讑֤Q中l器QrepeaterQ?集线器(hubQ。对于这一层来_从一个端口收到数据,会{发到所有端口?/font>


链\?/font>

协议QSDLCQSynchronous Data Link ControlQHDLCQHigh-level Data Link ControlQ?ppp协议独立的链路设备中最常见的当属网卡,|桥也是链\产品。集U器MODEM的某些功能有为属于链路层Q对此还有些争议认ؓ属于物理层设备。除此之外,所有的交换机都需要工作在数据链\层,但仅工作在数据链路层的仅是二层交换机。其他像三层交换机、四层交换机和七层交换机虽然可对应工作在OSI的三层、四层和七层Q但二层功能仍是它们基本的功能?/font>

因ؓ有了MAC地址表,所以才充分避免了冲H,因ؓ交换机通过目的MAC地址知道应该把这个数据{发到哪个端口。而不会像HUB一P会{发到所有滴端口。所以,交换机是可以划分冲突域滴?/font>


|络?/font>

四个主要的协?  
|际协议IPQ负责在L和网l之间寻址和\由数据包?nbsp;   
地址解析协议ARPQ获得同一物理|络中的gL地址?nbsp;   
|际控制消息协议ICMPQ发送消息,q报告有x据包的传送错误?nbsp;   
互联l管理协议IGMPQ被IPL拿来向本地多路广播\由器报告Ll成员?/font>

该层讑֤有三层交换机Q\由器?/font>


传输?/font>

两个重要协议 TCP ?UDP ?/font>

端口概念QTCP/UDP 使用 IP 地址标识|上LQ用端口号来标识应用进E,?TCP/UDP 用主?IP 地址和ؓ应用q程分配的端口号来标识应用进E。端口号?16 位的无符h敎ͼ TCP 的端口号?UDP 的端口号是两个独立的序列。尽相互独立,如果 TCP ?UDP 同时提供某种知名服务Q两个协议通常选择相同的端口号。这Ua是ؓ了用方便,而不是协议本w的要求。利用端口号Q一CZ多个q程可以同时使用 TCP/UDP 提供的传输服务,q且q种通信是端到端的,它的数据?IP 传递,但与 IP 数据报的传递\径无兟뀂网l通信中用一个三元组可以在全局唯一标志一个应用进E:Q协议,本地地址Q本地端口号Q?/font>

也就是说tcp和udp可以使用相同的端口?/font>

可以看到通过(协议,源端口,源ipQ目的端口,目的ip)可以用来完全标识一l网l连接?/font>

应用?/font>

ZtcpQTelnet FTP SMTP DNS HTTP
ZudpQRIP NTPQ网落时间协议)和DNS QDNS也用TCPQSNMP TFTP

 

参考文献:

L本机路由?http://hi.baidu.com/thusness/blog/item/9c18e5bf33725f0818d81f52.html

Internet 传输层协?http://www.cic.tsinghua.edu.cn/jdx/book6/3.htm 计算机网l?谢希?/font>


转自Q?br>http://blog.chinaunix.net/u2/67780/showart_2065190.html

chatler 2009-10-21 23:05 发表评论
]]>TCP三次握手/四次挥手详解<?gt;http://www.shnenglu.com/beautykingdom/archive/2009/10/20/99062.htmlchatlerchatlerTue, 20 Oct 2009 13:15:00 GMThttp://www.shnenglu.com/beautykingdom/archive/2009/10/20/99062.htmlhttp://www.shnenglu.com/beautykingdom/comments/99062.htmlhttp://www.shnenglu.com/beautykingdom/archive/2009/10/20/99062.html#Feedback0http://www.shnenglu.com/beautykingdom/comments/commentRss/99062.htmlhttp://www.shnenglu.com/beautykingdom/services/trackbacks/99062.html1
、徏立连接协议(三次握手Q?/font>
Q?Q客L发送一个带SYN标志的TCP报文到服务器。这是三ơ握手过E中的报??br style="font: normal normal normal 12px/normal song, Verdana; ">Q?Q?服务器端回应客户端的Q这是三ơ握手中的第2个报文,q个报文同时带ACK标志和SYN标志。因此它表示对刚才客LSYN报文的回应;同时又标志SYNl客LQ询问客L是否准备好进行数据通讯?br style="font: normal normal normal 12px/normal song, Verdana; ">Q?Q?客户必须再次回应服务D一个ACK报文Q这是报文段3?br style="font: normal normal normal 12px/normal song, Verdana; ">2
、连接终止协议(四次挥手Q?/font>
   ׃TCPq接是全双工的,因此每个方向都必d独进行关闭。这原则是当一方完成它的数据发送Q务后p发送一个FIN来终止这个方向的q接。收C?FIN只意味着q一方向上没有数据流动,一个TCPq接在收C个FIN后仍能发送数据。首先进行关闭的一方将执行d关闭Q而另一Ҏ行被动关闭?br style="font: normal normal normal 12px/normal song, Verdana; "> Q?Q?TCP客户端发送一个FINQ用来关闭客户到服务器的数据传送(报文D?Q?br style="font: normal normal normal 12px/normal song, Verdana; "> Q?Q?服务器收到这个FINQ它发回一个ACKQ确认序号ؓ收到的序号加1Q报文段5Q。和SYN一P一个FIN占用一个序受?br style="font: normal normal normal 12px/normal song, Verdana; "> Q?Q?服务器关闭客L的连接,发送一个FINl客LQ报文段6Q?br style="font: normal normal normal 12px/normal song, Verdana; "> Q?Q?客户D发回ACK报文认Qƈ确认序可|ؓ收到序号?Q报文段7Q?br style="font: normal normal normal 12px/normal song, Verdana; ">CLOSED: q个没什么好说的了,表示初始状态?br style="font: normal normal normal 12px/normal song, Verdana; ">LISTEN: q个也是非常Ҏ理解的一个状态,表示服务器端的某个SOCKET处于监听状态,可以接受q接了?br style="font: normal normal normal 12px/normal song, Verdana; ">SYN_RCVD: q个状态表C接受到了SYN报文Q在正常情况下,q个状态是服务器端的SOCKET在徏立TCPq接时的三次握手会话q程中的一个中间状态,很短暂,基本 上用netstat你是很难看到q种状态的Q除非你Ҏ写了一个客L试E序Q故意将三次TCP握手q程中最后一个ACK报文不予发送。因此这U状?Ӟ当收到客L的ACK报文后,它会q入到ESTABLISHED状态?br style="font: normal normal normal 12px/normal song, Verdana; ">SYN_SENT: q个状态与SYN_RCVD遥想呼应Q当客户端SOCKET执行CONNECTq接Ӟ它首先发送SYN报文Q因此也随即它会q入CSYN_SENT?态,q等待服务端的发送三ơ握手中的第2个报文。SYN_SENT状态表C客L已发送SYN报文?br style="font: normal normal normal 12px/normal song, Verdana; ">ESTABLISHEDQ这个容易理解了Q表C接已l徏立了?br style="font: normal normal normal 12px/normal song, Verdana; ">FIN_WAIT_1: q个状态要好好解释一下,其实FIN_WAIT_1和FIN_WAIT_2状态的真正含义都是表示{待Ҏ的FIN报文。而这两种状态的区别 是:FIN_WAIT_1状态实际上是当SOCKET在ESTABLISHED状态时Q它想主动关闭连接,向对方发送了FIN报文Q此时该SOCKET?q入到FIN_WAIT_1状态。而当Ҏ回应ACK报文后,则进入到FIN_WAIT_2状态,当然在实际的正常情况下,无论Ҏ何种情况下,都应该马 上回应ACK报文Q所以FIN_WAIT_1状态一般是比较难见到的Q而FIN_WAIT_2状态还有时常常可以用netstat看到?br style="font: normal normal normal 12px/normal song, Verdana; ">FIN_WAIT_2Q上面已l详l解释了q种状态,实际上FIN_WAIT_2状态下的SOCKETQ表C半q接Q也x一方要求closeq接Q但另外q告诉对方,我暂时还有点数据需要传送给你,E后再关闭连接?br style="font: normal normal normal 12px/normal song, Verdana; ">TIME_WAIT: 表示收到了对方的FIN报文Qƈ发送出了ACK报文Q就{?MSL后即可回到CLOSED可用状态了。如果FIN_WAIT_1状态下Q收CҎ同时?FIN标志和ACK标志的报文时Q可以直接进入到TIME_WAIT状态,而无ȝqFIN_WAIT_2状态?br style="font: normal normal normal 12px/normal song, Verdana; ">CLOSING: q种状态比较特D,实际情况中应该是很少见,属于一U比较罕见的例外状态。正常情况下Q当你发送FIN报文后,按理来说是应该先收到Q或同时收到Q对方的 ACK报文Q再收到Ҏ的FIN报文。但是CLOSING状态表CZ发送FIN报文后,q没有收到对方的ACK报文Q反而却也收CҎ的FIN报文。什 么情况下会出现此U情况呢Q其实细想一下,也不隑־出结论:那就是如果双方几乎在同时close一个SOCKET的话Q那么就出现了双方同时发送FIN?文的情况Q也即会出现CLOSING状态,表示双方都正在关闭SOCKETq接?br style="font: normal normal normal 12px/normal song, Verdana; ">CLOSE_WAIT: q种状态的含义其实是表C在{待关闭。怎么理解呢?当对方close一个SOCKET后发送FIN报文l自己,你系l毫无疑问地会回应一个ACK报文l对 方,此时则进入到CLOSE_WAIT状态。接下来呢,实际上你真正需要考虑的事情是察看你是否还有数据发送给ҎQ如果没有的话,那么你也可?closeq个SOCKETQ发送FIN报文l对方,也即关闭q接。所以你在CLOSE_WAIT状态下Q需要完成的事情是等待你d闭连接?br style="font: normal normal normal 12px/normal song, Verdana; ">LAST_ACK: q个状态还是比较容易好理解的,它是被动关闭一方在发送FIN报文后,最后等待对方的ACK报文。当收到ACK报文后,也即可以q入到CLOSED可用状态了?br style="font: normal normal normal 12px/normal song, Verdana; ">最后有2个问题的回答Q我自己分析后的l论Q不一定保?00%正确Q?br style="font: normal normal normal 12px/normal song, Verdana; ">1?Z么徏立连接协议是三次握手Q而关闭连接却是四ơ握手呢Q?br style="font: normal normal normal 12px/normal song, Verdana; ">q?是因为服务端的LISTEN状态下的SOCKET当收到SYN报文的徏q请求后Q它可以把ACK和SYNQACK起应{作用,而SYN起同步作用)攑֜一 个报文里来发送。但关闭q接Ӟ当收到对方的FIN报文通知Ӟ它仅仅表C对Ҏ有数据发送给你了Q但未必你所有的数据都全部发送给Ҏ了,所以你可以?必会马上会关闭SOCKET,也即你可能还需要发送一些数据给Ҏ之后Q再发送FIN报文l对Ҏ表示你同意现在可以关闭连接了Q所以它q里的ACK报文 和FIN报文多数情况下都是分开发送的?br style="font: normal normal normal 12px/normal song, Verdana; ">2?Z么TIME_WAIT状态还需要等2MSL后才能返回到CLOSED状态?
q是因ؓQ?虽然双方都同意关闭连接了Q而且握手?个报文也都协调和发送完毕,按理可以直接回到CLOSED状态(好比从SYN_SEND状态到 ESTABLISH状态那PQ但是因为我们必要假想|络是不可靠的,你无法保证你最后发送的ACK报文会一定被Ҏ收到Q因此对方处?LAST_ACK状态下的SOCKET可能会因时未收到ACK报文Q而重发FIN报文Q所以这个TIME_WAIT状态的作用是用来重发可能丢失?ACK报文?/font>
转自Q?/span>


chatler 2009-10-20 21:15 发表评论
]]>
þ99Ʒþ99| þԭav| һ97ձ˾þۺӰԺ| ȫɫƴɫƬѾþþ| þƬѹۿ| Ʒľþþþþþ| ۺ޾þһƷ| Ʒþþþþ| һձ˾þۺӰ| 99þþþ| ݺɫۺϾþ| ƷþþĻ | þþƷAV鶹| ݹƷþ| þüۺɫۺϰҲȥ | þþþ99ƷƬ| Ʒþһ| һõþۺϺݺAV| ۺϾþþ| þþƷAVӰԺ| һAvëƬþþƷ| þþƷž޾Ʒ| þAV| þþþƷҰ| 99þùѸ| ˾þþƷ| Ʒþ߹ۿ| 99þþþƷ| þþþavר| 2019þþø456| ޹˾þۺ| ľƷþþþùַ | þþƷ鶹ҹҹ| þþþƷ | ȾþֻоƷ| þ޾ƷAV| þۺŷ| ݺ88ۺϾþþþۺ| þþƷAAƬһ| þ㽶߿ۿƷyw| ɫۺϾþĻ|