This post is syndicated from 游戲人生.
When using IOCP, the most important APIs are GetQueuedCompletionStatus, WSARecv, and WSASend. Data I/O and its completion status are obtained through these interfaces and then processed further.
GetQueuedCompletionStatus attempts to dequeue an I/O completion packet from the specified I/O completion port. If there is no completion packet queued, the function waits for a pending I/O operation associated with the completion port to complete.
BOOL WINAPI GetQueuedCompletionStatus(
    __in  HANDLE       CompletionPort,
    __out LPDWORD      lpNumberOfBytes,
    __out PULONG_PTR   lpCompletionKey,
    __out LPOVERLAPPED *lpOverlapped,
    __in  DWORD        dwMilliseconds
);
If the function dequeues a completion packet for a successful I/O operation from the completion port, the return value is nonzero. The function stores information in the variables pointed to by the lpNumberOfBytes, lpCompletionKey, and lpOverlapped parameters.
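Beyond this API's in and out parameters (which the first few lines of the MSDN page already explain), we care more about what the different combinations of return value and out parameters mean, because for various known or unknown reasons our program does not always get a successful return value and valid outputs.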
If *lpOverlapped is NULL and the function does not dequeue a completion packet from the completion port, the return value is zero. The function does not store information in the variables pointed to by the lpNumberOfBytes and lpCompletionKey parameters. To get extended error information, call GetLastError. If the function did not dequeue a completion packet because the wait timed out, GetLastError returns WAIT_TIMEOUT.
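The return-value rules quoted above can be sketched as a small classifier. This is only an illustration: the enum and function names are hypothetical, and WAIT_TIMEOUT is redefined locally (value 258, as in the Windows headers) so that the sketch compiles without <windows.h>. The "failed I/O" case, a zero return with a non-NULL *lpOverlapped, comes from the same MSDN page but is not quoted in this post.

```c
#include <assert.h>
#include <stddef.h>

/* Local copy of WAIT_TIMEOUT so the sketch builds without <windows.h>. */
#define SKETCH_WAIT_TIMEOUT 258UL

/* Hypothetical names for the outcomes of one GetQueuedCompletionStatus call. */
typedef enum {
    GQCS_PACKET_OK,   /* nonzero return: a completion packet was dequeued */
    GQCS_IO_FAILED,   /* zero return, *lpOverlapped != NULL: packet for a failed I/O */
    GQCS_TIMED_OUT,   /* zero return, *lpOverlapped == NULL, WAIT_TIMEOUT */
    GQCS_PORT_ERROR   /* zero return, *lpOverlapped == NULL, any other error */
} gqcs_outcome;

/* ok:         the BOOL returned by GetQueuedCompletionStatus
 * overlapped: the value stored in *lpOverlapped (preset to NULL by the caller)
 * last_error: the value of GetLastError() read right after the call */
gqcs_outcome classify_gqcs(int ok, const void *overlapped, unsigned long last_error)
{
    if (ok)
        return GQCS_PACKET_OK;
    if (overlapped != NULL)
        return GQCS_IO_FAILED;      /* lpNumberOfBytes and lpCompletionKey are valid */
    if (last_error == SKETCH_WAIT_TIMEOUT)
        return GQCS_TIMED_OUT;      /* cannot happen when dwMilliseconds is INFINITE */
    return GQCS_PORT_ERROR;         /* nothing was dequeued */
}

/* Human-readable label for logging. */
const char *gqcs_outcome_name(gqcs_outcome o)
{
    switch (o) {
    case GQCS_PACKET_OK:  return "packet dequeued";
    case GQCS_IO_FAILED:  return "I/O failed";
    case GQCS_TIMED_OUT:  return "wait timed out";
    case GQCS_PORT_ERROR: return "no packet dequeued";
    }
    assert(0 && "unknown outcome");
    return "";
}
```

In a worker thread, the three arguments would come from the actual call: the returned BOOL, the pointer written to lpOverlapped, and GetLastError(). The errors discussed in the rest of this post all arrive through the GQCS_IO_FAILED branch or directly from a failed WSARecv/WSASend call.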
Assume that dwMilliseconds is set to INFINITE.
The common errors in this case are:
WSA_OPERATION_ABORTED (995): Overlapped operation aborted.
The I/O operation has been aborted because of either a thread exit or an application request.
MSDN: An overlapped operation was canceled due to the closure of the socket, or the execution of the SIO_FLUSH command in WSAIoctl. Note that this error is returned by the operating system, so the error number may change in future releases of Windows.
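Cause: this error usually occurs when the socket for a peer is closed with closesocket (or torn down by WSACleanup) while overlapped I/O operations are still pending on it; those pending operations are then aborted.
Fix: for a socket, generally call shutdown first to disable further I/O, then call closesocket to close it.
Severity: minor, easy to handle.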
WSAENOTSOCK (10038): Socket operation on nonsocket.
MSDN: An operation was attempted on something that is not a socket. Either the socket handle parameter did not reference a valid socket, or for select, a member of an fd_set was not valid.
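Cause: an operation was attempted on something that is not a socket.
After a socket has been closed with closesocket, any operation on that now-invalid socket gets this error.
Fix: if several threads operate on the same socket, make sure the I/O operations on it are logically ordered, and do a proper graceful disconnect.
Severity: minor, easy to handle.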
WSAECONNRESET (10054): Connection reset by peer.
An existing connection was forcibly closed by the remote host.
MSDN: An existing connection was forcibly closed by the remote host. This normally results if the peer application on the remote host is suddenly stopped, the host is rebooted, the host or remote network interface is disabled, or the remote host uses a hard close (see setsockopt for more information on the SO_LINGER option on the remote socket). This error may also result if a connection was broken due to keep-alive activity detecting a failure while one or more operations are in progress. Operations that were in progress fail with WSAENETRESET. Subsequent operations fail with WSAECONNRESET.
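Cause: when using WSAAccept, WSARecv, WSASend, and similar interfaces, if the peer application stops abruptly (for the reasons listed above), operations posted on the corresponding socket will fail.
Fix: if the remote host or program died unexpectedly, there is nothing to do but accept it. But if the program is your own and you simply did a hard close, that one is on you. At the very least, once you know this error has occurred, stop posting further operations on that socket or waiting on it.
Severity: minor, easy to handle.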
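One way to act on "stop posting once the error is known" is a per-connection flag that is set when a connection-ending error is seen and checked before every new WSARecv/WSASend. The sketch below is an assumption about how that could look, not code from the server discussed here: conn_state and its functions are hypothetical names, and the three error values are copied from this post and redefined locally so the sketch compiles without the Winsock headers. Treating exactly these three codes as fatal is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Error values as listed in this post, redefined for a header-free sketch. */
#define SKETCH_WSA_OPERATION_ABORTED 995
#define SKETCH_WSAENOTSOCK           10038
#define SKETCH_WSAECONNRESET         10054

/* Hypothetical per-connection bookkeeping. */
typedef struct {
    bool dead;        /* set once a connection-ending error has been seen */
    int  last_error;  /* most recent error recorded while still alive */
} conn_state;

/* Assumption: these errors mean the connection cannot be used any more. */
bool conn_error_is_fatal(int err)
{
    return err == SKETCH_WSA_OPERATION_ABORTED ||
           err == SKETCH_WSAENOTSOCK ||
           err == SKETCH_WSAECONNRESET;
}

/* Record an error reported by a completion or by a failed post. */
void conn_on_io_error(conn_state *c, int err)
{
    assert(c != NULL);
    if (c->dead)
        return;                 /* keep the error that killed the connection */
    c->last_error = err;
    if (conn_error_is_fatal(err))
        c->dead = true;
}

/* Check this before posting another overlapped operation. */
bool conn_may_post(const conn_state *c)
{
    assert(c != NULL);
    return !c->dead;
}
```

The sketch is not thread-safe. If several worker threads handle completions for the same socket, the flag would need to be protected by whatever per-connection lock the server already uses.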
WSAECONNREFUSED (10061): Connection refused.
No connection could be made because the target machine actively refused it.
MSDN: No connection could be made because the target computer actively refused it. This usually results from trying to connect to a service that is inactive on the foreign host—that is, one with no server application running.
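Cause: with connect or WSAConnect, the server is not running or its listen queue is full; with WSAAccept, the client's connection request was rejected by the condition function.
Fix: Call connect or WSAConnect again for the same socket. Wait for the server to start or for its listen queue to free up, or find out why the request was rejected. Are you too ugly, did you not pay enough, or did the server turn down a sky-high salary and go off to start its own business?
Severity: minor, easy to handle.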
WSAENOBUFS (10055): No buffer space available.
An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.
MSDN: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.
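Cause: this is the error I cared about most after going through the error log. The server places explicit limits on message sending and receiving, so a buffer shortage should have been handled long before it could turn into a send/recv failure. This error also almost never appeared in earlier versions. It is the main subject of this post. A buffer shortage during connect or accept is understandable and not very dangerous, but if send/recv causes congestion that keeps feeding on itself, the trouble is serious; at the very least it shows that the earlier validation logic has a gap.
MSDN gives this reason for WSASend failing with the error: The Windows Sockets provider reports a buffer deadlock. The mention of a buffer deadlock points to I/O being posted improperly from multiple threads.
Fix: before sending or receiving a message, check and cap the total number and the total size of pending messages.
Severity: severe.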
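The fix above, capping both the number and the total size of pending messages, can be sketched as a small per-connection budget that is reserved before a send is posted and released when its completion is dequeued. The type and function names are hypothetical and the limits are placeholders; the server's real limits are not given in this post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-connection budget for sends that are still in flight. */
typedef struct {
    size_t max_ops;        /* cap on the number of pending operations */
    size_t max_bytes;      /* cap on the total bytes pending */
    size_t pending_ops;    /* operations posted but not yet completed */
    size_t pending_bytes;  /* bytes posted but not yet completed */
} send_budget;

/* Call before posting a send of len bytes.
 * Returns false if the send would exceed either cap; the caller should then
 * queue the message in its own buffer or drop the connection. */
bool budget_try_reserve(send_budget *b, size_t len)
{
    assert(b != NULL);
    if (b->pending_ops >= b->max_ops)
        return false;
    /* pending_bytes never exceeds max_bytes, so the subtraction cannot wrap. */
    if (len > b->max_bytes - b->pending_bytes)
        return false;
    b->pending_ops++;
    b->pending_bytes += len;
    return true;
}

/* Call when the completion for a reserved send is dequeued, or when posting
 * it failed immediately; len must be the length that was reserved. */
void budget_release(send_budget *b, size_t len)
{
    assert(b != NULL && b->pending_ops > 0 && len <= b->pending_bytes);
    b->pending_ops--;
    b->pending_bytes -= len;
}
```

The sketch is not thread-safe on its own. With several IOCP worker threads, reserve and release would have to run under the connection's lock; otherwise the counters can drift and the caps stop meaning anything.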
This post is based mainly on MSDN.
************* Note *************
Fox has only analyzed the few errors and APIs he cares about, with reference to MSDN, and does not provide additional help.