epoll使用
在linux的網(wǎng)絡(luò)編程中,很長的時間都在使用select來做事件觸發(fā)。在linux新的內(nèi)核中,有了一種替換它的機(jī)制,就是epoll。
相比于select,epoll最大的好處在于它不會隨著監(jiān)聽fd數(shù)目的增長而降低效率。因?yàn)樵趦?nèi)核中的select實(shí)現(xiàn)中,它是采用輪詢來處理的,輪詢的fd數(shù)目越多,自然耗時越多。并且,在linux/posix_types.h頭文件有這樣的聲明:
#define __FD_SETSIZE 1024
表示select最多同時監(jiān)聽1024個fd,當(dāng)然,可以通過修改頭文件再重編譯內(nèi)核來擴(kuò)大這個數(shù)目,但這似乎并不治本。
epoll的接口非常簡單,一共就三個函數(shù):
1. int epoll_create(int size);
創(chuàng)建一個epoll的句柄,size用來告訴內(nèi)核這個監(jiān)聽的數(shù)目一共有多大。這個參數(shù)不同于select()中的第一個參數(shù),給出最大監(jiān)聽的fd+1的值。需要注意的是,當(dāng)創(chuàng)建好epoll句柄后,它就是會占用一個fd值,在linux下如果查看/proc/進(jìn)程id/fd/,是能夠看到這個fd的,所以在使用完epoll后,必須調(diào)用close()關(guān)閉。
2. int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
epoll的事件注冊函數(shù),它不同與select()是在監(jiān)聽事件時告訴內(nèi)核要監(jiān)聽什么類型的事件,而是在這里先注冊要監(jiān)聽的事件類型。第一個參數(shù)是epoll_create()的返回值,第二個參數(shù)表示動作,用三個宏來表示:
EPOLL_CTL_ADD:注冊新的fd到epfd中;
EPOLL_CTL_MOD:修改已經(jīng)注冊的fd的監(jiān)聽事件;
EPOLL_CTL_DEL:從epfd中刪除一個fd;
第三個參數(shù)是需要監(jiān)聽的fd,第四個參數(shù)是告訴內(nèi)核需要監(jiān)聽什么事,struct epoll_event結(jié)構(gòu)如下:
typedef union epoll_data {
void *ptr;
int fd;
__uint32_t u32;
__uint64_t u64;
} epoll_data_t;
struct epoll_event {
__uint32_t events; /* Epoll events */
epoll_data_t data; /* User data variable */
};
events可以是以下幾個宏的集合:
EPOLLIN :表示對應(yīng)的文件描述符可以讀(包括對端SOCKET正常關(guān)閉);
EPOLLOUT:表示對應(yīng)的文件描述符可以寫;
EPOLLPRI:表示對應(yīng)的文件描述符有緊急的數(shù)據(jù)可讀(這里應(yīng)該表示有帶外數(shù)據(jù)到來);
EPOLLERR:表示對應(yīng)的文件描述符發(fā)生錯誤;
EPOLLHUP:表示對應(yīng)的文件描述符被掛斷;
EPOLLET: 將EPOLL設(shè)為邊緣觸發(fā)(Edge Triggered)模式,這是相對于水平觸發(fā)(Level Triggered)來說的。
EPOLLONESHOT:只監(jiān)聽一次事件,當(dāng)監(jiān)聽完這次事件之后,如果還需要繼續(xù)監(jiān)聽這個socket的話,需要再次把這個socket加入到EPOLL隊(duì)列里
3. int epoll_wait(int epfd, struct epoll_event * events, int maxevents, int timeout);
等待事件的產(chǎn)生,類似于select()調(diào)用。參數(shù)events用來從內(nèi)核得到事件的集合,maxevents告之內(nèi)核這個events有多大,這個 maxevents的值不能大于創(chuàng)建epoll_create()時的size,參數(shù)timeout是超時時間(毫秒,0會立即返回,-1阻塞)。該函數(shù)返回需要處理的事件數(shù)目,如返回0表示已超時。
4、關(guān)于ET、LT兩種工作模式:
可以得出這樣的結(jié)論:
ET模式僅當(dāng)狀態(tài)發(fā)生變化的時候才獲得通知,這里所謂的狀態(tài)的變化并不包括緩沖區(qū)中還有未處理的數(shù)據(jù),也就是說,如果要采用ET模式,需要一直read/write直到出錯為止,很多人反映為什么采用ET模式只接收了一部分?jǐn)?shù)據(jù)就再也得不到通知了,大多因?yàn)檫@樣;而LT模式是只要有數(shù)據(jù)沒有處理就會一直通知下去的.
那么究竟如何來使用epoll呢?其實(shí)非常簡單。
通過在包含一個頭文件#include <sys/epoll.h> 以及幾個簡單的API將可以大大的提高你的網(wǎng)絡(luò)服務(wù)器的支持人數(shù)。
首先通過create_epoll(int maxfds)來創(chuàng)建一個epoll的句柄,其中maxfds為你epoll所支持的最大句柄數(shù)。這個函數(shù)會返回一個新的epoll句柄,之后的所有操作將通過這個句柄來進(jìn)行操作。在用完之后,記得用close()來關(guān)閉這個創(chuàng)建出來的epoll句柄。
之后在你的網(wǎng)絡(luò)主循環(huán)里面,每一幀的調(diào)用epoll_wait(int epfd, epoll_event events, int max events, int timeout)來查詢所有的網(wǎng)絡(luò)接口,看哪一個可以讀,哪一個可以寫了。基本的語法為:
nfds = epoll_wait(kdpfd, events, maxevents, -1);
其中kdpfd為用epoll_create創(chuàng)建之后的句柄,events是一個epoll_event*的指針,當(dāng)epoll_wait這個函數(shù)操作成功之后,epoll_events里面將儲存所有的讀寫事件。max_events是當(dāng)前需要監(jiān)聽的所有socket句柄數(shù)。最后一個timeout是 epoll_wait的超時,為0的時候表示馬上返回,為-1的時候表示一直等下去,直到有事件范圍,為任意正整數(shù)的時候表示等這么長的時間,如果一直沒有事件,則范圍。一般如果網(wǎng)絡(luò)主循環(huán)是單獨(dú)的線程的話,可以用-1來等,這樣可以保證一些效率,如果是和主邏輯在同一個線程的話,則可以用0來保證主循環(huán)的效率。
epoll_wait范圍之后應(yīng)該是一個循環(huán),遍利所有的事件。
幾乎所有的epoll程序都使用下面的框架:
for( ; ; )
{
nfds = epoll_wait(epfd,events,20,500);
for(i=0;i<nfds;++i)
{
if(events[i].data.fd==listenfd) //有新的連接
{
connfd = accept(listenfd,(sockaddr *)&clientaddr, &clilen); //accept這個連接
ev.data.fd=connfd;
ev.events=EPOLLIN|EPOLLET;
epoll_ctl(epfd,EPOLL_CTL_ADD,connfd,&ev); //將新的fd添加到epoll的監(jiān)聽隊(duì)列中
}
else if( events[i].events&EPOLLIN ) //接收到數(shù)據(jù),讀socket
{
n = read(sockfd, line, MAXLINE)) < 0 //讀
ev.data.ptr = md; //md為自定義類型,添加數(shù)據(jù)
ev.events=EPOLLOUT|EPOLLET;
epoll_ctl(epfd,EPOLL_CTL_MOD,sockfd,&ev);//修改標(biāo)識符,等待下一個循環(huán)時發(fā)送數(shù)據(jù),異步處理的精髓
}
else if(events[i].events&EPOLLOUT) //有數(shù)據(jù)待發(fā)送,寫socket
{
struct myepoll_data* md = (myepoll_data*)events[i].data.ptr; //取數(shù)據(jù)
sockfd = md->fd;
send( sockfd, md->ptr, strlen((char*)md->ptr), 0 ); //發(fā)送數(shù)據(jù)
ev.data.fd=sockfd;
ev.events=EPOLLIN|EPOLLET;
epoll_ctl(epfd,EPOLL_CTL_MOD,sockfd,&ev); //修改標(biāo)識符,等待下一個循環(huán)時接收數(shù)據(jù)
}
else
{
//其他的處理
}
}
}
epoll - I/O event notification facility
/* Copyright (C) 2002-2006, 2007 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
02111-1307 USA. */
#ifndef _SYS_EPOLL_H
#define _SYS_EPOLL_H 1
#include <stdint.h>
#include <sys/types.h>
/* Get __sigset_t. */
#include <bits/sigset.h>
#ifndef __sigset_t_defined
# define __sigset_t_defined
typedef __sigset_t sigset_t;
#endif
enum EPOLL_EVENTS
{
EPOLLIN = 0x001,
#define EPOLLIN EPOLLIN
EPOLLPRI = 0x002,
#define EPOLLPRI EPOLLPRI
EPOLLOUT = 0x004,
#define EPOLLOUT EPOLLOUT
EPOLLRDNORM = 0x040,
#define EPOLLRDNORM EPOLLRDNORM
EPOLLRDBAND = 0x080,
#define EPOLLRDBAND EPOLLRDBAND
EPOLLWRNORM = 0x100,
#define EPOLLWRNORM EPOLLWRNORM
EPOLLWRBAND = 0x200,
#define EPOLLWRBAND EPOLLWRBAND
EPOLLMSG = 0x400,
#define EPOLLMSG EPOLLMSG
EPOLLERR = 0x008,
#define EPOLLERR EPOLLERR
EPOLLHUP = 0x010,
#define EPOLLHUP EPOLLHUP
EPOLLRDHUP = 0x2000,
#define EPOLLRDHUP EPOLLRDHUP
EPOLLONESHOT = (1 << 30),
#define EPOLLONESHOT EPOLLONESHOT
EPOLLET = (1 << 31)
#define EPOLLET EPOLLET
};
/* Valid opcodes ( "op" parameter ) to issue to epoll_ctl(). */
#define EPOLL_CTL_ADD 1 /* Add a file decriptor to the interface. */
#define EPOLL_CTL_DEL 2 /* Remove a file decriptor from the interface. */
#define EPOLL_CTL_MOD 3 /* Change file decriptor epoll_event structure. */
typedef union epoll_data
{
void *ptr;
int fd;
uint32_t u32;
uint64_t u64;
} epoll_data_t;
struct epoll_event
{
uint32_t events; /* Epoll events */
epoll_data_t data; /* User data variable */
};
__BEGIN_DECLS
/* Creates an epoll instance. Returns an fd for the new instance.
The "size" parameter is a hint specifying the number of file
descriptors to be associated with the new instance. The fd
returned by epoll_create() should be closed with close(). */
extern int epoll_create (int __size) __THROW;
/* Manipulate an epoll instance "epfd". Returns 0 in case of success,
-1 in case of error ( the "errno" variable will contain the
specific error code ) The "op" parameter is one of the EPOLL_CTL_*
constants defined above. The "fd" parameter is the target of the
operation. The "event" parameter describes which events the caller
is interested in and any associated user data. */
extern int epoll_ctl (int __epfd, int __op, int __fd,
struct epoll_event *__event) __THROW;
/* Wait for events on an epoll instance "epfd". Returns the number of
triggered events returned in "events" buffer. Or -1 in case of
error with the "errno" variable set to the specific error code. The
"events" parameter is a buffer that will contain triggered
events. The "maxevents" is the maximum number of events to be
returned ( usually size of "events" ). The "timeout" parameter
specifies the maximum wait time in milliseconds (-1 == infinite).
This function is a cancellation point and therefore not marked with
__THROW. */
extern int epoll_wait (int __epfd, struct epoll_event *__events,
int __maxevents, int __timeout);
/* Same as epoll_wait, but the thread's signal mask is temporarily
and atomically replaced with the one provided as parameter.
This function is a cancellation point and therefore not marked with
__THROW. */
extern int epoll_pwait (int __epfd, struct epoll_event *__events,
int __maxevents, int __timeout,
__const __sigset_t *__ss);
__END_DECLS
#endif /* sys/epoll.h */
// epoll.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#define MAX_EVENTS 10
#define PORT 8080
//設(shè)置socket連接為非阻塞模式
void setnonblocking(int sockfd) {
int opts;
opts = fcntl(sockfd, F_GETFL);
if(opts < 0) {
perror("fcntl(F_GETFL)\n");
exit(1);
}
opts = (opts | O_NONBLOCK);
if(fcntl(sockfd, F_SETFL, opts) < 0) {
perror("fcntl(F_SETFL)\n");
exit(1);
}
}
int main()
{
struct epoll_event ev, events[MAX_EVENTS];
int addrlen, listenfd, conn_sock, nfds, epfd, fd, i, nread, n;
struct sockaddr_in local, remote;
char buf[BUFSIZ];
//創(chuàng)建listen socket
if( (listenfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
perror("sockfd\n");
exit(1);
}
setnonblocking(listenfd);
bzero(&local, sizeof(local));
local.sin_family = AF_INET;
local.sin_addr.s_addr = htonl(INADDR_ANY);;
local.sin_port = htons(PORT);
if( bind(listenfd, (struct sockaddr *) &local, sizeof(local)) < 0) {
perror("bind\n");
exit(1);
}
listen(listenfd, 20);
epfd = epoll_create(MAX_EVENTS);
pr_debug("listenfd:%d,epfd:%d",listenfd,epfd);
if (epfd == -1) {
perror("epoll_create");
exit(EXIT_FAILURE);
}
ev.events = EPOLLIN;
ev.data.fd = listenfd;
if (epoll_ctl(epfd, EPOLL_CTL_ADD, listenfd, &ev) == -1) {
perror("epoll_ctl: listen_sock");
exit(EXIT_FAILURE);
}
for (;;) {
//nfds = epoll_wait(epfd, events, MAX_EVENTS, -1);
nfds = epoll_wait(epfd, events, MAX_EVENTS, 1000);
pr_debug("nfds:%d",nfds);
if (nfds == -1) {
perror("epoll_pwait");
exit(EXIT_FAILURE);
}
for (i = 0; i < nfds; ++i)
{
fd = events[i].data.fd;
if (fd == listenfd)
{
while ((conn_sock = accept(listenfd,(struct sockaddr *) &remote, (size_t *)&addrlen)) > 0)
{
setnonblocking(conn_sock);
ev.events = EPOLLIN | EPOLLET;
ev.data.fd = conn_sock;
if (epoll_ctl(epfd, EPOLL_CTL_ADD, conn_sock, &ev) == -1)
{
perror("epoll_ctl: add");
exit(EXIT_FAILURE);
}
}
if (conn_sock == -1) {
if (errno != EAGAIN && errno != ECONNABORTED
&& errno != EPROTO && errno != EINTR)
perror("accept");
}
continue;
}
if (events[i].events & EPOLLIN)
{
n = 0;
while ((nread = read(fd, buf + n, BUFSIZ-1)) > 0) {
n += nread;
}
if (nread == -1 && errno != EAGAIN) {
perror("read error");
}
ev.data.fd = fd;
ev.events = events[i].events | EPOLLOUT;
if (epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev) == -1) {
perror("epoll_ctl: mod");
}
}
if (events[i].events & EPOLLOUT)
{
sprintf(buf, "HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\nHello World", 11);
int nwrite, data_size = strlen(buf);
n = data_size;
while (n > 0) {
nwrite = write(fd, buf + data_size - n, n);
if (nwrite < n) {
if (nwrite == -1 && errno != EAGAIN) {
perror("write error");
}
break;
}
n -= nwrite;
}
close(fd);
}
}
}
return 0;
}