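This post is mirrored from 游戲人生.
When working with IOCP, the most important APIs are GetQueuedCompletionStatus, WSARecv, and WSASend. Data I/O and its completion status are obtained through these interfaces and then handed on for further processing.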
GetQueuedCompletionStatus attempts to dequeue an I/O completion packet from the specified I/O completion port. If there is no completion packet queued, the function waits for a pending I/O operation associated with the completion port to complete.
BOOL WINAPI GetQueuedCompletionStatus(
    __in  HANDLE CompletionPort,
    __out LPDWORD lpNumberOfBytes,
    __out PULONG_PTR lpCompletionKey,
    __out LPOVERLAPPED *lpOverlapped,
    __in  DWORD dwMilliseconds
);
If the function dequeues a completion packet for a successful I/O operation from the completion port, the return value is nonzero. The function stores information in the variables pointed to by the lpNumberOfBytes, lpCompletionKey, and lpOverlapped parameters.
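Beyond this API's in and out parameters (which the first few lines of the MSDN page already cover), we care more about what the different combinations of return value and out values mean, because for all sorts of known and unknown reasons our programs do not always get the expected return value and out values.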
If *lpOverlapped is NULL and the function does not dequeue a completion packet from the completion port, the return value is zero. The function does not store information in the variables pointed to by the lpNumberOfBytes and lpCompletionKey parameters. To get extended error information, call GetLastError. If the function did not dequeue a completion packet because the wait timed out, GetLastError returns WAIT_TIMEOUT.
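The return/out combinations quoted above can be sketched as a small classifier. This is only an illustration, written in portable C so it compiles without <windows.h>. The names classify_gqcs, gqcs_outcome, and SKETCH_WAIT_TIMEOUT are hypothetical and appear neither in the article nor in the Windows API. The "failed I/O" case (zero return with a non-NULL *lpOverlapped) comes from the same MSDN page but is not quoted in this post, and 258 is the value of WAIT_TIMEOUT in the Windows headers.

```c
#include <stddef.h>

/* Value of WAIT_TIMEOUT in the Windows headers, copied here so the sketch
   builds without <windows.h>. */
#define SKETCH_WAIT_TIMEOUT 258UL

/* Hypothetical names for the outcomes of one GetQueuedCompletionStatus call. */
typedef enum {
    GQCS_IO_SUCCEEDED,  /* nonzero return: packet for a successful I/O        */
    GQCS_IO_FAILED,     /* zero return, *lpOverlapped != NULL: failed I/O     */
    GQCS_TIMED_OUT,     /* zero return, *lpOverlapped == NULL, WAIT_TIMEOUT   */
    GQCS_PORT_ERROR     /* zero return, *lpOverlapped == NULL, any other code */
} gqcs_outcome;

/* ok:         the BOOL returned by GetQueuedCompletionStatus
   overlapped: the pointer stored through lpOverlapped
   last_error: the GetLastError() value read right after the call */
gqcs_outcome classify_gqcs(int ok, const void *overlapped,
                           unsigned long last_error)
{
    if (ok)
        return GQCS_IO_SUCCEEDED;
    if (overlapped != NULL)
        return GQCS_IO_FAILED;      /* last_error describes the failed I/O */
    if (last_error == SKETCH_WAIT_TIMEOUT)
        return GQCS_TIMED_OUT;
    return GQCS_PORT_ERROR;         /* no packet was dequeued at all */
}
```

In a worker thread, the three arguments would come from the real call and from GetLastError(); only the GQCS_IO_FAILED branch carries the per-operation error codes discussed in the rest of this post.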
Assume we pass INFINITE for dwMilliseconds.
The errors commonly seen here are:
WSA_OPERATION_ABORTED (995): Overlapped operation aborted.
The I/O operation was abandoned because of a thread exit or an application request.
MSDN: An overlapped operation was canceled due to the closure of the socket, or the execution of the SIO_FLUSH command in WSAIoctl. Note that this error is returned by the operating system, so the error number may change in future releases of Windows.
Cause analysis: this error usually appears after a socket has been closed with closesocket or torn down by WSACleanup, which aborts the overlapped I/O operations still pending on it.
Solution: for sockets, generally call shutdown first to disable further I/O, and only then call closesocket.
Severity: minor, easy to handle.
WSAENOTSOCK (10038): Socket operation on nonsocket.
MSDN: An operation was attempted on something that is not a socket. Either the socket handle parameter did not reference a valid socket, or for select, a member of an fd_set was not valid.
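Cause analysis: an operation was attempted on something that is not a socket.
Once a socket has been closed with closesocket, any operation on that now-invalid socket gets this error.
Solution: if several threads operate on the same socket, keep the I/O operations on that socket in a well-defined logical order and perform a graceful disconnect.
Severity: minor, easy to handle.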
WSAECONNRESET (10054): Connection reset by peer.
The remote host forcibly closed an existing connection.
MSDN: An existing connection was forcibly closed by the remote host. This normally results if the peer application on the remote host is suddenly stopped, the host is rebooted, the host or remote network interface is disabled, or the remote host uses a hard close (see setsockopt for more information on the SO_LINGER option on the remote socket). This error may also result if a connection was broken due to keep-alive activity detecting a failure while one or more operations are in progress. Operations that were in progress fail with WSAENETRESET. Subsequent operations fail with WSAECONNRESET.
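Cause analysis: when using WSAAccept, WSARecv, WSASend, and similar interfaces, if the peer application stops abruptly (for the reasons listed above), operations posted on the corresponding socket will fail.
Solution: if the remote host or program died unexpectedly, all you can do is leave it to fate. But if you wrote that program and you simply did a hard close, then it is on you. At the very least, you should know that this error has occurred and stop wasting effort posting or waiting on that connection.
Severity: minor, easy to handle.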
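One way to "stop posting or waiting" once such an error has been seen is to keep a per-connection flag and check it before every new WSARecv/WSASend. The sketch below is an assumption about how that bookkeeping could look, not the author's server code. conn_state, conn_on_io_error, and conn_may_post are hypothetical names, the numeric codes are the ones listed in this post, and treating 995 and 10038 as fatal alongside 10054 is a design choice made for the example.

```c
#include <stdbool.h>

/* Error values as listed in this post (normally taken from <winsock2.h>). */
enum {
    ERR_OPERATION_ABORTED = 995,    /* WSA_OPERATION_ABORTED */
    ERR_NOTSOCK           = 10038,  /* WSAENOTSOCK           */
    ERR_CONNRESET         = 10054   /* WSAECONNRESET         */
};

/* Hypothetical per-connection bookkeeping. */
typedef struct {
    bool          dead;        /* set once a fatal error has been observed */
    unsigned long last_error;  /* most recent error code recorded          */
} conn_state;

/* Record the error of a failed completion.
   Returns true if the connection is (now) considered dead. */
bool conn_on_io_error(conn_state *c, unsigned long err)
{
    c->last_error = err;
    if (err == ERR_CONNRESET || err == ERR_NOTSOCK ||
        err == ERR_OPERATION_ABORTED)
        c->dead = true;
    return c->dead;
}

/* Check this before posting another overlapped operation. */
bool conn_may_post(const conn_state *c)
{
    return !c->dead;
}
```

The flag is plain data here. With several IOCP worker threads touching the same connection, it would need a lock or an atomic, which is left out to keep the example short.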
WSAECONNREFUSED (10061): Connection refused.
The connection could not be made because the target machine actively refused it.
MSDN: No connection could be made because the target computer actively refused it. This usually results from trying to connect to a service that is inactive on the foreign host—that is, one with no server application running.
Cause analysis: with connect or WSAConnect, the server is not running or its listen queue is full; with WSAAccept, the client's connection request was rejected by the condition function.
Solution: Call connect or WSAConnect again for the same socket. Wait for the server to start or for its listen queue to free up, or find out why you were rejected. Too ugly? Didn't pay enough? Or did the server turn down a sky-high salary and go off to start its own business?
Severity: minor, easy to handle.
WSAENOBUFS (10055): No buffer space available.
The operation on the socket could not be carried out because the system ran short of buffer space or a queue was full.
MSDN: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.
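Cause analysis: this is the error that worried me most after I went through the error logs. The server enforces explicit limits on sending and receiving messages, so a buffer shortage should have been caught long before a send/recv actually failed. On top of that, this error almost never showed up in earlier versions. It is also the main subject of this post. Running out of buffer space in connect or accept is understandable and not very dangerous, but if send/recv causes congestion that feeds on itself, the trouble is serious; at the very least it means the earlier validation logic has a gap.
The documented reason for a WSASend failure with this code is: The Windows Sockets provider reports a buffer deadlock. The wording is "buffer deadlock", which evidently points to I/O being posted improperly from multiple threads.
Solution: before sending or receiving a message, check and cap both the total number and the total size of pending (outstanding) messages.
Severity: severe.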
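The check on "total number and total size of pending messages" could be sketched as a small budget that is reserved before posting an overlapped send or receive and released when its completion is dequeued. Everything here is illustrative: pending_budget, budget_try_reserve, and budget_release are hypothetical names, the limits are whatever the server chooses, and the sketch is not thread-safe, so a real IOCP server with several worker threads would guard it with a lock or atomics.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap on outstanding overlapped operations for one connection
   (or for the whole server, depending on where the struct lives). */
typedef struct {
    size_t max_ops;    /* maximum number of pending operations */
    size_t max_bytes;  /* maximum total bytes pending          */
    size_t ops;        /* operations currently pending         */
    size_t bytes;      /* bytes currently pending              */
} pending_budget;

/* Call before WSASend/WSARecv. Returns false if posting `len` more bytes
   would exceed either limit; the caller should then queue or drop the
   message instead of posting it. */
bool budget_try_reserve(pending_budget *b, size_t len)
{
    if (b->ops >= b->max_ops)
        return false;
    /* bytes never exceeds max_bytes, so this subtraction cannot wrap */
    if (len > b->max_bytes - b->bytes)
        return false;
    b->ops += 1;
    b->bytes += len;
    return true;
}

/* Call when the completion for a reserved operation is dequeued,
   whether the operation succeeded or failed. */
void budget_release(pending_budget *b, size_t len)
{
    if (b->ops > 0)
        b->ops -= 1;
    b->bytes = (len <= b->bytes) ? b->bytes - len : 0;
}
```

Releasing on failed completions as well as successful ones matters: otherwise a burst of errors would permanently shrink the budget.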
This post is based mainly on MSDN.
************* Note *************
Fox only analyzes, with reference to MSDN, the handful of errors and APIs he cares about, and offers no further help.