posts - 297, comments - 15, trackbacks - 0

本文檔的Copyleft歸yfydz所有，使用GPL發(fā)布，可以自由拷貝，轉(zhuǎn)載，轉(zhuǎn)載時(shí)請(qǐng)保持文檔的完整性，嚴(yán)禁用于任何商業(yè)用途。
msn: yfydz_no1@hotmail.com
來(lái)源：http://yfydz.cublog.cn

1. 前言

本文介紹linux內(nèi)核中一些常用的數(shù)據(jù)結(jié)構(gòu)和操作。

2. 雙向鏈表(list)

linux內(nèi)核中的雙向鏈表通過(guò)結(jié)構(gòu) struct list_head來(lái)將各個(gè)節(jié)點(diǎn)連接起來(lái)，此結(jié)構(gòu)會(huì)作為鏈表元素結(jié)構(gòu)中的一個(gè)參數(shù)：

struct list_head {
struct list_head *next, *prev;
};

鏈表頭的初始化，注意，結(jié)構(gòu)中的指針為NULL并不是初始化，而是指向自身才是初始化，如果只是按普通情況下的置為NULL，而不是指向自身，系統(tǒng)會(huì)崩潰，這是一個(gè)容易犯的錯(cuò)誤：

#define LIST_HEAD_INIT(name) { &(name), &(name) }

#define LIST_HEAD(name) \
struct list_head name = LIST_HEAD_INIT(name)

#define INIT_LIST_HEAD(ptr) do { \
(ptr)->next = (ptr); (ptr)->prev = (ptr); \
} while (0)

最常用的鏈表操作：

插入到鏈表頭:
void list_add(struct list_head *new, struct list_head *head);

插入到鏈表尾:
void list_add_tail(struct list_head *new, struct list_head *head);

刪除鏈表節(jié)點(diǎn):
void list_del(struct list_head *entry);

將節(jié)點(diǎn)移動(dòng)到另一鏈表:
void list_move(struct list_head *list, struct list_head *head);

將節(jié)點(diǎn)移動(dòng)到鏈表尾:
void list_move_tail(struct list_head *list,struct list_head *head);

判斷鏈表是否為空，返回1為空，0非空
int list_empty(struct list_head *head);

把兩個(gè)鏈表拼接起來(lái)：
void list_splice(struct list_head *list, struct list_head *head)；

取得節(jié)點(diǎn)指針：
#define list_entry(ptr, type, member) \
((type *)((char *)(ptr)-(unsigned long)(&((type *)0)->member)))

遍歷鏈表中每個(gè)節(jié)點(diǎn)：
#define list_for_each(pos, head) \
for (pos = (head)->next, prefetch(pos->next); pos != (head); \
pos = pos->next, prefetch(pos->next))

逆向循環(huán)鏈表中每個(gè)節(jié)點(diǎn)：
#define list_for_each_prev(pos, head) \
for (pos = (head)->prev, prefetch(pos->prev); pos != (head); \
pos = pos->prev, prefetch(pos->prev))

舉例：

LISH_HEAD(mylist);

struct my_list{
struct list_head list;
int data;
};

static int ini_list(void)
{
struct my_list *p;
int i;
for(i=0; i<100; i++){
p=kmalloc(sizeof(struct my_list), GFP_KERNEL);
list_add(&p->list, &mylist);
}
}

在內(nèi)存中形成如下結(jié)構(gòu)的一個(gè)雙向鏈表：

+---------------------------------------------------------------+
|                                                               |
| mylist         99            98                     0        |
| +----+    +---------+    +---------+           +---------+   |
+->|next|--->|list.next|--->|list.next|--->...--->|list.next|---+
     |----|    |---------|    |---------|           |---------|
+--|prev|<---|list.prev|<---|list.prev|<---...<---|list.prev|<--+
| +----+    |---------|    |---------|           |---------|   |
|            | data   |    | data   |           | data   |   |
|            +---------+    +---------+           +---------+   |
|                                                               |
+---------------------------------------------------------------+

知道了鏈表頭就能遍歷整個(gè)鏈表，如果是用list_add()插入新節(jié)點(diǎn)的話，從鏈表頭的next方向看是一個(gè)堆棧型。

從鏈表中刪除節(jié)點(diǎn)很容易：

static void del_item(struct my_list *p)
{
list_del(&p->list, &mylist);
kfree(p);
}

最重要的宏是list_entry，這個(gè)宏的思路是根據(jù)鏈表元素結(jié)構(gòu)中鏈表頭結(jié)構(gòu)list_head的地址推算出鏈表元素結(jié)構(gòu)的實(shí)際地址：

#define list_entry(ptr, type, member) \
((type *)((char *)(ptr)-(unsigned long)(&((type *)0)->member)))

ptr是鏈表元素結(jié)構(gòu)(如struct my_list)中鏈表頭結(jié)構(gòu)list_head的地址
member是鏈表元素結(jié)構(gòu)(如struct my_list)中鏈表頭結(jié)構(gòu)list_head參數(shù)的名稱
type是鏈表元素結(jié)構(gòu)類型(如struct my_list)

計(jì)算原理是根據(jù)鏈表頭結(jié)構(gòu)list_head的地址減去其在鏈表元素結(jié)構(gòu)中的偏移位置而得到鏈表元素結(jié)構(gòu)的地址。

例如：

static void print_list(void)
{
struct list_head *cur;
struct my_list *p;

list_for_each(cur, &mylist){
p=list_entry(cur, struct my_list, list);
printk("data=%d\n", p->data);
}
}

優(yōu)點(diǎn)：

這樣就可以用相同的數(shù)據(jù)處理方式來(lái)描述所有雙向鏈表，不用再單獨(dú)為各個(gè)鏈表編寫(xiě)各種編輯函數(shù)。

缺點(diǎn)：
1) 鏈表頭中元素置為NULL不是初始化，與普通習(xí)慣不同；
2) 仍然需要單獨(dú)編寫(xiě)各自的刪除整個(gè)鏈表的函數(shù)，不能統(tǒng)一處理，因?yàn)椴荒鼙ＷC所有鏈表元素結(jié)構(gòu)中鏈表頭結(jié)構(gòu)list_head的偏移地址都是相同的，當(dāng)然如果把鏈表頭結(jié)構(gòu)list_head都作為鏈表元素結(jié)構(gòu)的第一個(gè)參數(shù)，就可以用統(tǒng)一的刪除整個(gè)鏈表的函數(shù)。

3. HASH表

HASH表適用于不需要對(duì)整個(gè)空間元素進(jìn)行排序，而是只需要能快速找到某個(gè)元素的場(chǎng)合，是一種以空間換時(shí)間的方法，本質(zhì)也是線性表，但由一個(gè)大的線性表拆分為了多個(gè)小線性表，由于只需要查找小表，因此搜索速度就會(huì)線性查整個(gè)大表提高很多，理想情況下，有多少個(gè)小線性表，搜索速度就提高了多少倍，通常把小線性表的表頭綜合為一個(gè)數(shù)組，大小就是HASH表的數(shù)量。

HASH表速度的關(guān)鍵是HASH函數(shù)的設(shè)計(jì)，HASH函數(shù)根據(jù)每個(gè)元素中固定的參數(shù)進(jìn)行計(jì)算，算出一個(gè)不大于HASH表數(shù)量的索引值，表示該元素需要放在該索引號(hào)對(duì)應(yīng)的那個(gè)表中，對(duì)于固定的參數(shù)，計(jì)算結(jié)果始終是固定的，但對(duì)于不同的參數(shù)值，希望計(jì)算出來(lái)的結(jié)果能盡可能地平均到每個(gè)索引值，HASH函數(shù)計(jì)算得越平均，表示每個(gè)小表中元素的數(shù)量都會(huì)差不多，這樣搜索性能將越好。HASH函數(shù)也要盡可能的簡(jiǎn)單，以減少計(jì)算時(shí)間，常用的算法是將參數(shù)累加求模，在include/linux/jhash.h中已經(jīng)定義了一些HASH計(jì)算函數(shù)，可直接使用。

HASH表在路由cache表，狀態(tài)連接表等處用得很多。

舉例，連接跟蹤中根據(jù)tuple值計(jì)算HASH：

// net/ipv4/netfilter/ip_conntrack_core.c

u_int32_t
hash_conntrack(const struct ip_conntrack_tuple *tuple)
{
#if 0
dump_tuple(tuple);
#endif
return (jhash_3words(tuple->src.ip,
                      (tuple->dst.ip ^ tuple->dst.protonum),
                      (tuple->src.u.all | (tuple->dst.u.all << 16)),
                      ip_conntrack_hash_rnd) % ip_conntrack_htable_size);
}

// include/linux/jhash.h
static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
{
a += JHASH_GOLDEN_RATIO;
b += JHASH_GOLDEN_RATIO;
c += initval;

__jhash_mix(a, b, c);

return c;
}

4. 定時(shí)器(timer)

linux內(nèi)核定時(shí)器由以下結(jié)構(gòu)描述：

/* include/linux/timer.h */
struct timer_list {
struct list_head list;
unsigned long expires;
unsigned long data;
void (*function)(unsigned long);
};

list：timer鏈表
expires：到期時(shí)間
function：到期函數(shù)，時(shí)間到期時(shí)調(diào)用的函數(shù)
data：傳給到期函數(shù)的數(shù)據(jù)，實(shí)際應(yīng)用中通常是一個(gè)指針轉(zhuǎn)化而來(lái)，該指針指向一個(gè)結(jié)構(gòu)

timer的操作：

增加timer，將timer掛接到系統(tǒng)的timer鏈表：
extern void add_timer(struct timer_list * timer);

刪除timer，將timer從系統(tǒng)timer鏈表中拆除：
extern int del_timer(struct timer_list * timer);
(del_timer()函數(shù)可能會(huì)失敗，這是因?yàn)樵搕imer本來(lái)已經(jīng)不在系統(tǒng)timer鏈表中了，也就是已經(jīng)刪除過(guò)了)

對(duì)于SMP系統(tǒng)，刪除timer最好使用下面的函數(shù)來(lái)防止沖突：
extern int del_timer_sync(struct timer_list * timer);

修改timer，修改timer的到期時(shí)間：
int mod_timer(struct timer_list *timer, unsigned long expires);

通常用法：
struct timer_list通常作為數(shù)據(jù)結(jié)構(gòu)中的一個(gè)參數(shù)，在初始化結(jié)構(gòu)的時(shí)候初始化timer，表示到期時(shí)要進(jìn)行的操作，實(shí)現(xiàn)定時(shí)動(dòng)作，通常更多的是作為超時(shí)處理的，timer函數(shù)作為超時(shí)時(shí)的資源釋放函數(shù)。注意：如果超時(shí)了運(yùn)行超時(shí)函數(shù)，此時(shí)系統(tǒng)是處在時(shí)鐘中斷的bottom half里的，不能進(jìn)行很復(fù)雜的操作，如果要完成一些復(fù)雜操作，如到期后的數(shù)據(jù)發(fā)送，不能直接在到期函數(shù)中處理，而是應(yīng)該在到期函數(shù)中發(fā)個(gè)信號(hào)給特定內(nèi)核線程轉(zhuǎn)到top half進(jìn)行處理。

為判斷時(shí)間的先后，內(nèi)核中定義了以下宏來(lái)判斷：

#define time_after(a,b) ((long)(b) - (long)(a) < 0)
#define time_before(a,b) time_after(b,a)

#define time_after_eq(a,b) ((long)(a) - (long)(b) >= 0)
#define time_before_eq(a,b) time_after_eq(b,a)

這里用到了一個(gè)技巧，由于linux中的時(shí)間是無(wú)符號(hào)數(shù)，這里先將其轉(zhuǎn)換為有符號(hào)數(shù)后再判斷，就能解決時(shí)間回繞問(wèn)題，當(dāng)然只是一次回繞，回繞兩次當(dāng)然是判斷不出來(lái)的，具體可自己實(shí)驗(yàn)體會(huì)。

5. 內(nèi)核線程(kernel_thread)

內(nèi)核中新線程的建立可以用kernel_thread函數(shù)實(shí)現(xiàn)，該函數(shù)在kernel/fork.c中定義：

long kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)

fn：內(nèi)核線程主函數(shù)；
arg：線程主函數(shù)的參數(shù)；
flags：建立線程的標(biāo)志；

內(nèi)核線程函數(shù)通常都調(diào)用daemonize()進(jìn)行后臺(tái)化作為一個(gè)獨(dú)立的線程運(yùn)行，然后設(shè)置線程的一些參數(shù)，如名稱，信號(hào)處理等，這也不是必須的，然后就進(jìn)入一個(gè)死循環(huán)，這是線程的主體部分，這個(gè)循環(huán)不能一直在運(yùn)行，否則系統(tǒng)就死在這了，或者是某種事件驅(qū)動(dòng)的，在事件到來(lái)前是睡眠的，事件到來(lái)后喚醒進(jìn)行操作，操作完后繼續(xù)睡眠；或者是定時(shí)睡眠，醒后操作完再睡眠；或者加入等待隊(duì)列通過(guò)schedule()調(diào)度獲得執(zhí)行時(shí)間。總之是不能一直占著 CPU。

以下是內(nèi)核線程的一個(gè)實(shí)例，取自kernel/context.c:

int start_context_thread(void)
{
static struct completion startup __initdata = COMPLETION_INITIALIZER(startup);

kernel_thread(context_thread, &startup, CLONE_FS | CLONE_FILES);
wait_for_completion(&startup);
return 0;
}

static int context_thread(void *startup)
{
struct task_struct *curtask = current;
DECLARE_WAITQUEUE(wait, curtask);
struct k_sigaction sa;

daemonize();
strcpy(curtask->comm, "keventd");
keventd_running = 1;
keventd_task = curtask;

spin_lock_irq(&curtask->sigmask_lock);
siginitsetinv(&curtask->blocked, sigmask(SIGCHLD));
recalc_sigpending(curtask);
spin_unlock_irq(&curtask->sigmask_lock);

complete((struct completion *)startup);

/* Install a handler so SIGCLD is delivered */
sa.sa.sa_handler = SIG_IGN;
sa.sa.sa_flags = 0;
siginitset(&sa.sa.sa_mask, sigmask(SIGCHLD));
do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);

/*
* If one of the functions on a task queue re-adds itself
* to the task queue we call schedule() in state TASK_RUNNING
*/
for (;;) {
  set_task_state(curtask, TASK_INTERRUPTIBLE);
  add_wait_queue(&context_task_wq, &wait);
  if (TQ_ACTIVE(tq_context))
   set_task_state(curtask, TASK_RUNNING);
  schedule();
  remove_wait_queue(&context_task_wq, &wait);
  run_task_queue(&tq_context);
  wake_up(&context_task_done);
  if (signal_pending(curtask)) {
   while (waitpid(-1, (unsigned int *)0, __WALL|WNOHANG) > 0)
    ;
   spin_lock_irq(&curtask->sigmask_lock);
   flush_signals(curtask);
   recalc_sigpending(curtask);
   spin_unlock_irq(&curtask->sigmask_lock);
  }
}
}

6. 結(jié)構(gòu)地址

在C中，結(jié)構(gòu)地址和結(jié)構(gòu)中第一個(gè)元素的地址是相同的，因此在linux內(nèi)核中經(jīng)常出現(xiàn)使用結(jié)構(gòu)第一個(gè)元素的地址來(lái)表示結(jié)構(gòu)地址的情況，在讀代碼時(shí)要注意這一點(diǎn)，這和list_entry宏的意思一樣。

如：
struct my_struct{
int a;
int b;
}c;

if(&c == &c.a){ // always true
...
}

from:

http://blog.chinaunix.net/u/12313/showart_109612.html

posted on 2010-04-01 19:59 chatler 閱讀(339) 評(píng)論(0) 編輯收藏引用所屬分類: linux kernel

只有注冊(cè)用戶登錄后才能發(fā)表評(píng)論。
【推薦】100%開(kāi)源！大型工業(yè)跨平臺(tái)軟件C++源碼提供，建模，組態(tài)！

相關(guān)文章: ubuntu下編譯內(nèi)核 <轉(zhuǎn)>how to start a kernel thread The Linux Kernel Module Programming Guide A Beast of a Different Nature HOWTO compile kernel modules for the kernel 2.6 Linux內(nèi)核中的一些基本編程操作如何在Linux內(nèi)核中寫(xiě)文件 What is the difference between user level threads and kernel level threads? Linux 內(nèi)核筆記2 – 進(jìn)程調(diào)度 linux塊設(shè)備，字符設(shè)備

網(wǎng)站導(dǎo)航: 博客園 IT新聞 BlogJava 博問(wèn) Chat2DB 管理

2010年4月

日

一

二

三

四

五

六

常用鏈接

留言簿(10)

隨筆分類(307)

隨筆檔案(297)

algorithm

andytan
algorithm, linux, os, network,etc
EXACT STRING MATCHING ALGORITHMS
httperf -- a web perf test tool
Java多線程
編程夜未眠
布薩空間
結(jié)構(gòu)之法
沈一峰 google技術(shù)博客
小兵的窩

Books_Free_Online

Book Fire Center

C++

Bjarne Stroustrup's C++ Style and Technique FAQ
boyplayee column
C Plus Plus
CPP Reference
LearnC++Website
Welcome to Bjarne Stroustrup's homepage!

database

mydear Database
mysql高性能筆記

Linux

獨(dú)孤閣

Linux shell

linux
飛翔

linux socket

linux socket programming
sock programming

misce

cloudward
感覺(jué)這個(gè)博客還是不錯(cuò)，雖然做的東西和我不大相關(guān)，覺(jué)得看看還是有好處的

network

nginx

OSS

Google Android
Android is a software stack for mobile devices that includes an operating system, middleware and key applications. This early look at the Android SDK provides the tools and APIs necessary to begin developing applications on the Android platform using the Java programming language.
os161 file list

overall

linux related
linux_overall
loop_in_nodes
tiaot
Ubuntu Zone
陳皓專欄
享受編程的樂(lè)趣