Linux Network Programming

I/O multiplexing: select, poll, epoll

1 man select

NAME

select, pselect, FD_CLR, FD_ISSET, FD_SET, FD_ZERO - synchronous I/O multiplexing

SYNOPSIS

/* According to POSIX.1-2001, POSIX.1-2008 */
#include <sys/select.h>

/* According to earlier standards */
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

int select(int nfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);

void FD_CLR(int fd, fd_set *set);
int  FD_ISSET(int fd, fd_set *set);
void FD_SET(int fd, fd_set *set);
void FD_ZERO(fd_set *set);

#include <sys/select.h>

int pselect(int nfds, fd_set *readfds, fd_set *writefds,
            fd_set *exceptfds, const struct timespec *timeout,
            const sigset_t *sigmask);
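To make the select() API above concrete, here is a minimal sketch that watches a single descriptor for readability with a millisecond timeout. It assumes Linux/glibc, and wait_readable is a hypothetical helper name, not part of any library. It shows the usual FD_ZERO / FD_SET / select / FD_ISSET sequence.

```c
#include <sys/select.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds for fd to become readable.
 * Returns 1 if fd is readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv;

    FD_ZERO(&rfds);              /* start from an empty set */
    FD_SET(fd, &rfds);           /* watch fd for readability */

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* nfds must be the highest-numbered fd in any set, plus one */
    int n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n <= 0)
        return n;                /* 0: timeout, -1: error (see errno) */
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}
```

For example, on a freshly created pipe, wait_readable(p[0], 10) returns 0 until something has been written to p[1], and 1 afterwards.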

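2 pselect and select

pselect and select

http://www.cnblogs.com/diegodu/p/3988103.html

pselect and select are essentially the same function, but they differ in three ways:

First, select takes its timeout as a struct timeval (seconds and microseconds), whereas pselect takes a struct timespec (seconds and nanoseconds).

Second, select may update the timeout argument to indicate how much time was left, whereas pselect does not change its timeout argument.

Third, select has no sigmask argument. When pselect's sigmask is NULL, the two behave the same. When sigmask is given, pselect behaves like a select() call wrapped in two mask changes: replace the signal mask with sigmask (saving the old mask), call select(), then restore the old mask. The difference is that pselect performs these steps atomically, so a signal cannot slip in between changing the mask and starting to wait.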
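The third difference is what makes pselect useful. The sketch below shows one possible way to use it, assuming Linux and SIGUSR1 as the signal of interest; install_usr1_handler, pselect_wait and got_usr1 are hypothetical names. SIGUSR1 stays blocked during normal execution and is unblocked only inside pselect, so a signal that arrives just before the wait stays pending and interrupts pselect instead of being lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t got_usr1 = 0;

static void on_usr1(int sig)
{
    (void)sig;
    got_usr1 = 1;                      /* only set a flag in the handler */
}

/* Install the handler and block SIGUSR1; the previous mask is stored
 * in *oldmask so it can be handed to pselect() later. */
int install_usr1_handler(sigset_t *oldmask)
{
    struct sigaction sa;
    sigset_t block;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    return sigprocmask(SIG_BLOCK, &block, oldmask);
}

/* Wait for fd to become readable. The kernel switches to *waitmask
 * (SIGUSR1 unblocked) only for the duration of the wait, atomically. */
int pselect_wait(int fd, const struct timespec *ts, const sigset_t *waitmask)
{
    fd_set rfds;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return pselect(fd + 1, &rfds, NULL, NULL, ts, waitmask);
}
```

If SIGUSR1 is raised after install_usr1_handler() but before pselect_wait(), it stays pending, and pselect_wait() returns -1 with errno set to EINTR at once instead of sleeping for the full timeout. With plain select() and a separate sigprocmask() call, that signal could be delivered just before select() starts waiting, and the wait would then miss it.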

3 select source code

select implementation analysis (2) [compiled]

http://www.cnblogs.com/apprentice89/archive/2013/05/09/3070051.html

My brief analysis of the select() source code

http://blog.csdn.net/bboxhe/article/details/77367896

[Network programming] Analysis of the select flow

http://blog.csdn.net/hsly_support/article/details/8829901

Summary of the differences between select, poll, and epoll [compiled]

http://www.cnblogs.com/Anker/p/3265058.html

Analysis of the poll mechanism (reposted from Wei Dongshan)

http://blog.csdn.net/frankyzhangc/article/details/6692210


SYSCALL_DEFINE5(select, int, n, fd_set __user *, inp, fd_set __user *, outp, fd_set __user *, exp, struct timeval __user *, tvp)

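SYSCALL_DEFINE5 expands to the definition of sys_select.

For each fd, the call chain is fop->poll() => poll_wait() => __pollwait().

1. poll > sys_poll > do_sys_poll > poll_initwait. poll_initwait registers the callback __pollwait, which is the function that actually runs when our driver calls poll_wait.

2. Next, file->f_op->poll is called, i.e. the poll function implemented in our driver.

It calls poll_wait to add the current process to a wait queue, which is also defined by our driver;

it also checks whether the device is ready.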

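4 select flow

4.1 sys_select()

sys_select() copies the timeout argument from user space into the kernel and then calls core_sys_select().

4.2 core_sys_select()

core_sys_select() sets up the stack_fds array and copies the user's read, write, and exception descriptor sets into it. Each file descriptor is represented by one bit, where 1 means the fd is being watched. The results are also stored in stack_fds. It then calls do_select().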

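4.3 do_select()

do_select() initializes some helper variables, registers the callback __pollwait, and then enters a for loop that checks the state of every descriptor being monitored. There are three nested for loops.

The outermost loop scans all descriptors once, sleeps until it is woken up, and then scans again.

The middle loop scans the descriptors one unsigned long at a time, i.e. 32 bits on a 32-bit system (64 bits on a 64-bit system). Checking a whole word at once speeds things up: if the word contains any 1 bit, it holds descriptors that need to be checked.

In that case the innermost loop checks them one by one; otherwise the whole word is skipped. For each descriptor it gets the file struct from the fd, obtains f_op, and calls f_op->poll to ask the driver. If the descriptor is readable, writable, or has an exceptional condition, the result is recorded in the mask and retval is incremented. When f_op->poll finds no event ready, the driver's call to poll_wait ends up in __pollwait, which adds the current process to the device's wait queue; as soon as the device becomes ready, the driver wakes up the processes on that queue.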

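After the two inner loops finish, the end of the outermost loop has three exit conditions: (1) retval is nonzero, (2) the timeout has expired, (3) a signal is pending. Otherwise it calls poll_schedule_timeout() and enters an interruptible sleep; the process is woken up when a descriptor becomes ready. A triggered flag, set to 1 by the wake-up callback, lets the process skip the sleep if a wake-up already arrived before it went to sleep, so no wake-up is lost.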
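The middle-loop trick of skipping a whole word of bits at once can be modelled in user space. This is not kernel code: scan_fds is a hypothetical function, and the ready bitmap stands in for the answers f_op->poll would give. It only illustrates the word-at-a-time scan and how the result bits and retval are produced.

```c
#include <limits.h>

/* bits per unsigned long: 32 on 32-bit systems, 64 on 64-bit ones */
#define BPW ((int)(sizeof(unsigned long) * CHAR_BIT))

/* User-space model of do_select()'s two inner loops.
 * watched: bitmap of fds the caller asked about (1 = watched)
 * ready:   stand-in for what f_op->poll() would report (1 = ready)
 * out:     result bitmap, like the result part of stack_fds
 * Returns the number of ready fds (do_select()'s retval) and stores
 * how many whole words were skipped in *words_skipped. */
int scan_fds(const unsigned long *watched, const unsigned long *ready,
             unsigned long *out, int n, int *words_skipped)
{
    int retval = 0, skipped = 0;

    for (int w = 0; w * BPW < n; w++) {
        unsigned long bits = watched[w];

        out[w] = 0;
        if (bits == 0) {             /* one test skips BPW fds at once */
            skipped++;
            continue;
        }
        for (int j = 0; j < BPW && w * BPW + j < n; j++) {
            unsigned long bit = 1UL << j;

            /* only watched fds are "asked"; ready ones are recorded */
            if ((bits & bit) && (ready[w] & bit)) {
                out[w] |= bit;
                retval++;
            }
        }
    }
    *words_skipped = skipped;
    return retval;
}
```

With watched = {0, bits 3 and 5, 0} and only fd bit 5 of the middle word ready, scan_fds returns 1, sets only that bit in out, and skips the two empty words without looking at their individual bits.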

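5 Drawbacks of select

(1) Every call to select copies the fd sets from user space into the kernel, which is expensive.

(2) To find the ready descriptors, all fds have to be scanned on every call, both in the kernel and in user space, which is expensive.

(3) Because of select's design, the number of descriptors it can watch is small: fd values must be below FD_SETSIZE, which is 1024 by default.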
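Drawbacks (1) and (3) show up directly in user code. In this sketch (watch_two is a hypothetical helper; Linux/glibc assumed), select() overwrites the fd_set in place and clears the bits of descriptors that are not ready, so a program has to rebuild the whole set, and the kernel has to copy it in again, before every call. FD_SETSIZE is the compile-time bound on descriptor values.

```c
#include <sys/select.h>
#include <unistd.h>

/* Check two fds once without blocking. Returns a bitmask of which fds
 * are still set in the fd_set afterwards (bit 0: a, bit 1: b), or -1
 * on error. Bits of fds that were not ready have been cleared. */
int watch_two(int a, int b)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };      /* timeout 0: just poll */

    FD_ZERO(&rfds);
    FD_SET(a, &rfds);
    FD_SET(b, &rfds);
    if (select((a > b ? a : b) + 1, &rfds, NULL, NULL, &tv) < 0)
        return -1;
    return (FD_ISSET(a, &rfds) ? 1 : 0) | (FD_ISSET(b, &rfds) ? 2 : 0);
}
```

With two pipes where only the second has data, watch_two returns 2: the bit for the first descriptor is gone from the set, so a loop that reuses rfds without calling FD_SET again would stop watching it. On glibc, FD_SETSIZE is 1024.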

1 man poll

NAME

poll, ppoll - wait for some event on a file descriptor

SYNOPSIS

#include <poll.h>

int poll(struct pollfd *fds, nfds_t nfds, int timeout);

#define _GNU_SOURCE         /* See feature_test_macros(7) */
#include <signal.h>
#include <poll.h>

int ppoll(struct pollfd *fds, nfds_t nfds,
          const struct timespec *tmo_p, const sigset_t *sigmask);
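For comparison with select, here is a minimal poll() sketch; poll_two is a hypothetical helper and Linux is assumed. The requested events go in events and the kernel reports results in revents, so unlike an fd_set the input array is not destroyed by the call.

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for either of two fds to become readable.
 * Returns poll()'s count; each fd's reported events go to *ra / *rb. */
int poll_two(int a, int b, int timeout_ms, short *ra, short *rb)
{
    struct pollfd fds[2] = {
        { .fd = a, .events = POLLIN },  /* revents starts at 0 */
        { .fd = b, .events = POLLIN },
    };

    int n = poll(fds, 2, timeout_ms);
    *ra = fds[0].revents;               /* events fields are untouched */
    *rb = fds[1].revents;
    return n;
}
```

With two empty pipes, poll_two(p[0], q[0], 10, &ra, &rb) returns 0 after the timeout; once q[1] has been written to, it returns 1 with POLLIN set in rb and ra equal to 0.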

2 poll source code

Analysis of the poll mechanism (reposted from Wei Dongshan)

http://blog.csdn.net/frankyzhangc/article/details/6692210

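3 poll flow

poll is implemented very much like select; the only difference is how the set of fds is described: poll uses an array of struct pollfd instead of select's fd_set.

When an application calls poll, it enters the system call sys_poll, which eventually calls do_poll. do_poll contains an infinite loop in which do_pollfd calls the poll function of the driver behind each entry in fds, so every fd's driver gets scanned. The driver's poll function has two jobs. First, it calls poll_wait to put the process on a wait queue (this is mandatory: to sleep you must be on some wait queue, otherwise how would anyone know where to wake you up?). Second, it determines whether the fd has data to read: if so it returns a nonzero event mask (such as POLLIN), otherwise 0. For every nonzero result, do_poll increments count. do_poll then tests three conditions, if (count || !timeout || signal_pending(current)), and breaks out of the loop immediately if any of them holds. Otherwise it sleeps for up to timeout jiffies (via schedule_timeout). If nothing wakes it up during that time, the timeout is used up, so the loop exits on the next check. If something wakes it up earlier, it can also leave the loop and return right away; for example, an interrupt handler can wake the queue as soon as data arrives, so that poll returns immediately.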
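The count that do_poll() increments counts descriptors, not individual events. The sketch below (poll_once is a hypothetical helper; Linux assumed) polls one end of a UNIX-domain socket pair with timeout 0, which corresponds to the !timeout exit: poll scans once and returns without sleeping. Even when the descriptor is both readable and writable, poll returns 1, with both bits set in revents.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Poll one fd for input and output with timeout 0 (scan once, never
 * sleep). Returns poll()'s count; *revents receives the ready events. */
int poll_once(int fd, short *revents)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLOUT };

    int n = poll(&pfd, 1, 0);
    *revents = pfd.revents;
    return n;
}
```

On a fresh socket pair, poll_once(sv[0], &ev) returns 1 with only POLLOUT set; after one byte is written to sv[1], it still returns 1, now with both POLLIN and POLLOUT set.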

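4 Drawbacks of poll

Similar to select, except that there is no limit on the number of descriptors (only the system limit applies).

(1) Every call to poll copies the whole pollfd array from user space into the kernel, which is expensive.

(2) To find the ready descriptors, the kernel has to scan all fds on every call, which is expensive.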

1 man epoll

man epoll

NAME

epoll - I/O event notification facility

SYNOPSIS

#include <sys/epoll.h>

DESCRIPTION

The epoll API performs a similar task to poll(2): monitoring multiple file descriptors to see if I/O is possible on any of them. The epoll API can be used either as an edge-triggered or a level-triggered interface and scales well to large numbers of watched file descriptors. The following system calls are provided to create and manage an epoll instance:

* epoll_create(2) creates an epoll instance and returns a file descriptor referring to that instance. (The more recent epoll_create1(2) extends the functionality of epoll_create(2).)

* Interest in particular file descriptors is then registered via epoll_ctl(2). The set of file descriptors currently registered on an epoll instance is sometimes called an epoll set.

* epoll_wait(2) waits for I/O events, blocking the calling thread if no events are currently available.
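The three-call sequence above can be sketched as follows. epoll_wait_readable is a hypothetical helper that creates an instance, registers one descriptor, and waits once; it assumes Linux 2.6.27 or later for epoll_create1.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Returns the number of events (0 on timeout, -1 on error); the fd
 * stored in the first event's data field is written to *ready_fd. */
int epoll_wait_readable(int fd, int timeout_ms, int *ready_fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    struct epoll_event out[4];
    int n = -1;

    int epfd = epoll_create1(EPOLL_CLOEXEC);            /* 1. create */
    if (epfd == -1)
        return -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0) { /* 2. register */
        n = epoll_wait(epfd, out, 4, timeout_ms);       /* 3. wait */
        if (n > 0)
            *ready_fd = out[0].data.fd;
    }
    close(epfd);
    return n;
}
```

Creating and closing the epoll instance on every call is only for brevity. Real event loops create the instance once and keep calling epoll_wait() on it, which is where epoll's advantage over select and poll comes from.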

man epoll_create

NAME

epoll_create, epoll_create1 - open an epoll file descriptor

SYNOPSIS

#include <sys/epoll.h>

int epoll_create(int size);

int epoll_create1(int flags);

man epoll_ctl

NAME

epoll_ctl - control interface for an epoll descriptor

SYNOPSIS

#include <sys/epoll.h>

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

DESCRIPTION

This system call performs control operations on the epoll(7) instance referred to by the file descriptor epfd. It requests that the operation op be performed for the target file descriptor, fd.

Valid values for the op argument are:

EPOLL_CTL_ADD

Register the target file descriptor fd on the epoll instance referred to by the file descriptor epfd and associate the event event with the internal file linked to fd.

EPOLL_CTL_MOD

Change the event event associated with the target file descriptor fd.

EPOLL_CTL_DEL

Remove (deregister) the target file descriptor fd from the epoll instance referred to by epfd. The event is ignored and can be NULL (but see BUGS below).

The event argument describes the object linked to the file descriptor fd. The struct epoll_event is defined as:

typedef union epoll_data {
    void    *ptr;
    int      fd;
    uint32_t u32;
    uint64_t u64;
} epoll_data_t;

struct epoll_event {
    uint32_t     events;    /* Epoll events */
    epoll_data_t data;      /* User data variable */
};
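Because epoll_data is a union, a common pattern is to store a pointer to per-connection state in data.ptr instead of the bare fd, so the event loop gets its context back directly from epoll_wait(). In this sketch, struct conn and the helpers watch_conn, next_ready and drop_conn are hypothetical; only the epoll calls are real API.

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-connection state, for illustration only. */
struct conn {
    int fd;
    int id;
};

/* Register c->fd and keep a pointer to its state in data.ptr.
 * data is a union, so use either ptr or fd, not both. */
int watch_conn(int epfd, struct conn *c)
{
    struct epoll_event ev;

    ev.events = EPOLLIN;
    ev.data.ptr = c;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, c->fd, &ev);
}

/* Return the conn attached to one ready event, or NULL on timeout/error. */
struct conn *next_ready(int epfd, int timeout_ms)
{
    struct epoll_event ev;

    if (epoll_wait(epfd, &ev, 1, timeout_ms) != 1)
        return NULL;
    return ev.data.ptr;              /* the context comes straight back */
}

/* Deregister and close. Since Linux 2.6.9 the event argument of
 * EPOLL_CTL_DEL may be NULL. */
int drop_conn(int epfd, struct conn *c)
{
    int r = epoll_ctl(epfd, EPOLL_CTL_DEL, c->fd, NULL);

    close(c->fd);
    return r;
}
```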

The events member is a bit mask composed using the following available event types:

EPOLLIN

The associated file is available for read(2) operations.

EPOLLOUT

The associated file is available for write(2) operations.

EPOLLRDHUP (since Linux 2.6.17)

Stream socket peer closed connection, or shut down writing half of connection. (This flag is especially useful for writing simple code to detect peer shutdown when using Edge Triggered monitoring.)

EPOLLPRI

There is urgent data available for read(2) operations.

EPOLLERR

Error condition happened on the associated file descriptor. epoll_wait(2) will always wait for this event; it is not necessary to set it in events.

EPOLLHUP

Hang up happened on the associated file descriptor. epoll_wait(2) will always wait for this event; it is not necessary to set it in events. Note that when reading from a channel such as a pipe or a stream socket, this event merely indicates that the peer closed its end of the channel. Subsequent reads from the channel will return 0 (end of file) only after all outstanding data in the channel has been consumed.

EPOLLET

Sets the Edge Triggered behavior for the associated file descriptor. The default behavior for epoll is Level Triggered. See epoll(7) for more detailed information about Edge and Level Triggered event distribution architectures.

EPOLLONESHOT (since Linux 2.6.2)

Sets the one-shot behavior for the associated file descriptor. This means that after an event is pulled out with epoll_wait(2) the associated file descriptor is internally disabled and no other events will be reported by the epoll interface. The user must call epoll_ctl() with EPOLL_CTL_MOD to rearm the file descriptor with a new event mask.

EPOLLWAKEUP (since Linux 3.5)

If EPOLLONESHOT and EPOLLET are clear and the process has the CAP_BLOCK_SUSPEND capability, ensure that the system does not enter "suspend" or "hibernate" while this event is pending or being processed. The event is considered as being "processed" from the time when it is returned by a call to epoll_wait(2) until the next call to epoll_wait(2) on the same epoll(7) file descriptor, the closure of that file descriptor, the removal of the event file descriptor with EPOLL_CTL_DEL, or the clearing of EPOLLWAKEUP for the event file descriptor with EPOLL_CTL_MOD. See also BUGS.
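The EPOLLONESHOT rearm cycle described above can be observed with a pipe that keeps unread data. add_oneshot, rearm_oneshot and ready_now are hypothetical helpers; Linux is assumed.

```c
#include <sys/epoll.h>
#include <unistd.h>   /* pipe(), write(), close() for the usage below */

/* Register fd for a single EPOLLIN notification. */
int add_oneshot(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* After an event has been delivered the fd is disabled; EPOLL_CTL_MOD
 * with a new mask re-enables it. */
int rearm_oneshot(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

/* Number of ready events right now, without blocking. */
int ready_now(int epfd)
{
    struct epoll_event ev[8];
    return epoll_wait(epfd, ev, 8, 0);
}
```

With one unread byte in the pipe, ready_now() returns 1, then 0 because the descriptor has been disabled, and 1 again after rearm_oneshot() re-enables it (the byte is still unread).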

man epoll_wait

NAME

epoll_wait, epoll_pwait - wait for an I/O event on an epoll file descriptor

SYNOPSIS

#include <sys/epoll.h>

int epoll_wait(int epfd, struct epoll_event *events,
               int maxevents, int timeout);
int epoll_pwait(int epfd, struct epoll_event *events,
                int maxevents, int timeout,
                const sigset_t *sigmask);

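epoll trigger modes

epoll can operate on file descriptors in two modes: LT (level-triggered) and ET (edge-triggered). LT is the default. The two modes differ as follows:

LT mode: when epoll_wait detects an event on a descriptor and reports it to the application, the application does not have to handle it right away. The next call to epoll_wait will report the event again, as long as the condition still holds (for example, unread data remains).

ET mode: when epoll_wait detects an event on a descriptor and reports it to the application, the application must handle it right away. If it does not, the next call to epoll_wait will not report the event again until a new event occurs (for example, more data arrives).

ET mode greatly reduces how often the same epoll event is reported repeatedly, so it can be more efficient than LT mode. When epoll runs in ET mode, non-blocking sockets must be used, and the application typically keeps reading or writing until the call fails with EAGAIN. This keeps a blocking read or write on one file handle from starving the task of handling the other file descriptors.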
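The LT/ET difference, and the reason ET needs non-blocking descriptors, can be checked with a pipe. In this sketch (second_wait_after_partial_read and drain are hypothetical helpers; Linux assumed), 5 bytes are written, the first event is consumed, and only 1 byte is read before epoll is asked again. Under LT the descriptor is reported again because data remains; under ET it is not. drain() shows the usual ET consumer, which reads on a non-blocking descriptor until read() fails with EAGAIN.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* mode is 0 for LT or EPOLLET for ET. Returns the result of the second
 * non-blocking epoll_wait(): 1 under LT, 0 under ET, -1 on error. */
int second_wait_after_partial_read(uint32_t mode)
{
    int p[2], epfd, n = -1;
    char c;
    struct epoll_event ev, out;

    if (pipe(p) == -1)
        return -1;
    ev.events = EPOLLIN | mode;
    ev.data.fd = p[0];
    epfd = epoll_create1(0);
    if (epfd != -1 &&
        write(p[1], "hello", 5) == 5 &&
        epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        epoll_wait(epfd, &out, 1, 0) == 1 &&  /* first event consumed */
        read(p[0], &c, 1) == 1)               /* 4 bytes remain unread */
        n = epoll_wait(epfd, &out, 1, 0);

    if (epfd != -1)
        close(epfd);
    close(p[0]);
    close(p[1]);
    return n;
}

/* ET-style consumer: make fd non-blocking and read until the buffer is
 * empty (EAGAIN) or EOF. Returns the number of bytes read, -1 on error. */
long drain(int fd)
{
    char buf[256];
    long total = 0;
    ssize_t r;

    int fl = fcntl(fd, F_GETFL);
    if (fl == -1 || fcntl(fd, F_SETFL, fl | O_NONBLOCK) == -1)
        return -1;
    while ((r = read(fd, buf, sizeof buf)) > 0)
        total += r;
    if (r == -1 && errno != EAGAIN && errno != EWOULDBLOCK)
        return -1;
    return total;
}
```

Under ET, the 4 leftover bytes would never be reported again unless more data arrives, which is why an ET handler drains the descriptor completely; with a blocking descriptor, the final read() in that loop would block instead of returning EAGAIN.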

2 epoll source code

Summary of the differences between select, poll, and epoll [compiled]

http://www.cnblogs.com/Anker/p/3265058.html

Analysis of the epoll source implementation [compiled]

http://www.cnblogs.com/apprentice89/p/3234677.html

epoll_ctl -> ep_insert() ep_remove() ep_modify()

sys_epoll_wait() -> ep_poll()

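Does the epoll implementation use shared memory?

https://www.zhihu.com/question/39792257

As for whether epoll uses mmap: judging from the 2.6.32 source, it does not.

Note: epoll does not use mmap; many blog posts get this wrong.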

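3 epoll flow

Unlike poll and select, epoll uses three functions: epoll_create(), epoll_ctl(), and epoll_wait().

epoll_create() creates an epoll handle (a file descriptor).

epoll_ctl() adds, deletes, or modifies the descriptor events being watched.

epoll_wait() waits for events, similar to a select() or poll() call.

First, epoll_create() creates an epoll handle; then epoll_ctl() with EPOLL_CTL_ADD registers the descriptor events to watch. Inside epoll_ctl(), ep_insert() inserts the event item (epitem) into the red-black tree ep->rbr and calls tfile->f_op->poll, which hooks the callback ep_poll_callback() onto the driver's wait queue. When the device becomes ready, the callback adds the ready descriptor to the ready list &ep->rdllist.

Finally, epoll_wait() waits for events. It enters a for loop that checks whether the ready list &ep->rdllist is empty. If it is, no event is ready, so it calls schedule_timeout() to sleep until it is woken up, and then checks again. There are three exit conditions: the ready list is non-empty, the timeout expires, or a signal is pending.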

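4 Advantages of epoll

(1) The set of watched descriptor events is kept in the kernel in a red-black tree, so the whole set does not have to be copied on every call.

(2) Getting the ready descriptors does not require scanning the whole set: the callback puts ready descriptor events directly onto the ready list, and only those are copied back to user space.

(3) There is no limit on the number of watched descriptors other than the system's maximum number of open files (and the per-user limit in /proc/sys/fs/epoll/max_user_watches).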
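Advantages (1) and (2) can be seen from user space: descriptors are registered once with epoll_ctl(), and epoll_wait() hands back only the ready ones. In this sketch, one_of_many is a hypothetical helper, NPIPES (64) is an arbitrary size chosen for the demo, each pipe's index is stored in data.u32, and Linux is assumed.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

#define NPIPES 64   /* arbitrary demo size */

/* Register NPIPES pipe read ends once, make only pipe ready_index
 * readable, and wait. Returns the number of events; the index stored
 * in the first event is written to *which. */
int one_of_many(int ready_index, int *which)
{
    int fds[NPIPES][2];
    struct epoll_event ev, out[NPIPES];
    int made, n = -1;

    int epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;
    for (made = 0; made < NPIPES; made++) {
        if (pipe(fds[made]) == -1)
            break;
        ev.events = EPOLLIN;
        ev.data.u32 = (uint32_t)made;       /* remember the index */
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[made][0], &ev) == -1) {
            close(fds[made][0]);
            close(fds[made][1]);
            break;
        }
    }
    /* registration happened once; the wait only returns ready fds */
    if (made == NPIPES && write(fds[ready_index][1], "x", 1) == 1) {
        n = epoll_wait(epfd, out, NPIPES, 100);
        if (n > 0)
            *which = (int)out[0].data.u32;
    }
    for (int i = 0; i < made; i++) {
        close(fds[i][0]);
        close(fds[i][1]);
    }
    close(epfd);
    return n;
}
```

one_of_many(37, &w) returns 1 with w == 37: the caller receives exactly the one ready descriptor without looking at the other 63, whereas with select or poll it would have to walk the whole set after every call.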
