
chatler — Tue, 13 Sep 2011 16:02:00 GMT

Run make menuconfig first and select the CPU model; otherwise, when you reboot after installing the kernel, you may get an error such as "cpu unsupported". The commands are:
sudo make-kpkg --initrd --append-to-version=dell1400 kernel_image kernel_headers
sudo dpkg -i linux-image-*.deb

references:
1.http://forum.ubuntu.org.cn/viewtopic.php?t=134404


how to start a kernel thread

chatler — Tue, 22 Mar 2011 13:08:00 GMT

Linux Kernel Threads in Device Drivers
Purpose
This example shows how to create and stop a kernel thread.
The driver is implemented as a loadable module. In the init_module() routine five kernel threads are created. These kernel threads sleep for one second, wake up, print a message, and fall asleep again. On unload of the module (cleanup_module()), the kernel threads are killed.
The example has been tested with Linux kernel 2.4.2 on Intel (uniprocessor only) and Alpha platforms (COMPAQ Personal Workstation 500au (uniprocessor), DS20 and ES40 (SMP)).
A version for the 2.2 kernel can be found here. Note: depending on the context of the creator of the threads, the new threads may inherit properties from the parent that you do not want them to have. The new version avoids this by having keventd create the threads. The 2.2 kernel does not have a keventd, so this approach cannot be implemented there.

Functions in example
start_kthread: creates a new kernel thread. Can be called from any process context but not from interrupt context. The function blocks until the thread has started.
stop_kthread: stops the thread. Can be called from any process context except that of the thread to be terminated. Cannot be called from interrupt context. The function blocks until the thread has terminated.
init_kthread: sets up the environment of the new thread. Must be called from within the created thread.
exit_kthread: must be called by the thread to be terminated on exit.
Creation of new Thread
A new thread is created with kernel_thread(). The thread inherits properties from its parent. To make sure that we do not get any weird properties, we let keventd create the new thread.
The new thread is created with start_kthread(). It uses a semaphore to block until the new thread is running. A down() blocks the start_kthread() routine until the corresponding up() call in init_kthread() is executed.
The new thread must call init_kthread() in order to let the creator continue.
Stop of new Thread
stop_kthread() sets a flag that the thread uses to determine whether to die or not and sends a SIGKILL to the thread. This signal causes the thread to be woken up. On wakeup it checks the flag and then terminates itself by calling exit_kthread() and returning from the thread function. Using a semaphore, stop_kthread() blocks until the thread has terminated.
Initialization of new Thread
Within the newly created thread, init_kthread() needs to be called. This function sets a signal mask, initialises a wait queue and the termination flag, and sets a new name for the thread. With an up() call it notifies the creator that the setup is done.
Exit of new Thread
When the thread receives the notification to terminate itself, it calls the exit_kthread() function. It notifies the stop_kthread() function that it has terminated with an up() call.
The new Thread itself
The new thread is implemented in the example_thread() function. It runs an endless loop (for(;;)). In the loop it falls asleep with the interruptible_sleep_on_timeout() function. It comes out of this function either when the timeout expires or when a signal got caught.
The "work" in the thread is to print out a message with printk.
Kernel Versions
The example has been tested on 2.4.2.
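The start/stop handshake described above does not depend on kernel internals, so it can be tried out in user space. The following is only a sketch under stated assumptions, not the author's code: POSIX threads and semaphores stand in for kernel_thread() and struct semaphore, a second semaphore (wake) stands in for the wait queue plus SIGKILL, and all worker_* names are hypothetical.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for kthread_t. */
struct worker {
    pthread_t tid;
    sem_t startstop_sem;    /* same role as startstop_sem in kthread.h */
    sem_t wake;             /* stands in for the wait queue and the signal */
    atomic_int terminate;   /* termination request flag */
    int wakeups;            /* how often the loop body did "work" */
    void (*function)(struct worker *);
};

/* Role of init_kthread(): called by the new thread, releases the creator. */
static void worker_init(struct worker *w)
{
    sem_post(&w->startstop_sem);
}

/* Role of exit_kthread(): called by the dying thread, releases the stopper. */
static void worker_exit(struct worker *w)
{
    sem_post(&w->startstop_sem);
}

static void *worker_launcher(void *data)
{
    struct worker *w = data;

    w->function(w);
    return NULL;
}

/* Role of start_kthread(): returns only after worker_init() has run. */
int worker_start(void (*func)(struct worker *), struct worker *w)
{
    sem_init(&w->startstop_sem, 0, 0);  /* starts "locked" */
    sem_init(&w->wake, 0, 0);
    atomic_init(&w->terminate, 0);      /* done here, before the thread exists */
    w->wakeups = 0;
    w->function = func;
    if (pthread_create(&w->tid, NULL, worker_launcher, w) != 0)
        return -1;
    sem_wait(&w->startstop_sem);
    return 0;
}

/* Role of stop_kthread(): returns only after worker_exit() has run. */
void worker_stop(struct worker *w)
{
    atomic_store(&w->terminate, 1);
    sem_post(&w->wake);                 /* wake the sleeper, like the SIGKILL */
    sem_wait(&w->startstop_sem);
    pthread_join(w->tid, NULL);
    sem_destroy(&w->wake);
    sem_destroy(&w->startstop_sem);
}

/* Thread body: sleep, check the flag, work, repeat (no timeout in this sketch). */
void example_worker(struct worker *w)
{
    worker_init(w);
    for (;;) {
        sem_wait(&w->wake);
        if (atomic_load(&w->terminate))
            break;
        w->wakeups++;
    }
    worker_exit(w);
}
```

Calling worker_start(example_worker, &w) followed by worker_stop(&w) walks through the same four steps as the driver: create, wait for setup, request termination, wait for exit.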
Example Device Driver Code
The example consists of four files: kthread.h, kthread.c, thread_drv.c and a Makefile
kthread.h
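#ifndef _KTHREAD_H
#define _KTHREAD_H
#include <linux/config.h>
#include <linux/version.h>

#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/tqueue.h>
#include <linux/wait.h>

#include <asm/unistd.h>
#include <asm/semaphore.h>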

/* a structure to store all information we need
for our thread */
typedef struct kthread_struct
{
/* private data */

/* Linux task structure of thread */
struct task_struct *thread;
/* Task queue needed to launch thread */
struct tq_struct tq;
/* function to be started as thread */
void (*function) (struct kthread_struct *kthread);
/* semaphore needed on start and stop of thread. */
struct semaphore startstop_sem;

/* public data */

/* queue thread is waiting on. Gets initialized by
init_kthread, can be used by thread itself.
*/
wait_queue_head_t queue;
/* flag to tell thread whether to die or not.
When the thread receives a signal, it must check
the value of terminate and call exit_kthread and terminate
if set.
*/
int terminate;
/* additional data to pass to kernel thread */
void *arg;
} kthread_t;

/* prototypes */

/* start new kthread (called by creator) */
void start_kthread(void (*func)(kthread_t *), kthread_t *kthread);

/* stop a running thread (called by "killer") */
void stop_kthread(kthread_t *kthread);

/* setup thread environment (called by new thread) */
void init_kthread(kthread_t *kthread, char *name);

/* cleanup thread environment (called by thread upon receiving termination signal) */
void exit_kthread(kthread_t *kthread);

#endif

kthread.c
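#include <linux/config.h>
#include <linux/version.h>

#if defined(MODVERSIONS)
#include <linux/modversions.h>
#endif
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/tqueue.h>
#include <linux/wait.h>
#include <linux/signal.h>

#include <asm/semaphore.h>
#include <linux/smp_lock.h>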

#include "kthread.h"

/* private functions */
static void kthread_launcher(void *data)
{
kthread_t *kthread = data;
kernel_thread((int (*)(void *))kthread->function, (void *)kthread, 0);

}

/* public functions */

/* create a new kernel thread. Called by the creator. */
void start_kthread(void (*func)(kthread_t *), kthread_t *kthread)
{
/* initialize the semaphore:
we start with the semaphore locked. The new kernel
thread will setup its stuff and unlock it. This
control flow (the one that creates the thread) blocks
in the down operation below until the thread has reached
the up() operation.
*/
init_MUTEX_LOCKED(&kthread->startstop_sem);

/* store the function to be executed in the data passed to
the launcher */
kthread->function=func;

/* create the new thread by running a task through keventd */

/* initialize the task queue structure */
kthread->tq.sync = 0;
INIT_LIST_HEAD(&kthread->tq.list);
kthread->tq.routine = kthread_launcher;
kthread->tq.data = kthread;

/* and schedule it for execution */
schedule_task(&kthread->tq);

/* wait till it has reached the init_kthread routine */
down(&kthread->startstop_sem);

}

/* stop a kernel thread. Called by the removing instance */
void stop_kthread(kthread_t *kthread)
{
if (kthread->thread == NULL)
{
printk("stop_kthread: killing non existing thread!\n");
return;
}

/* this function needs to be protected with the big
kernel lock (lock_kernel()). The lock must be
grabbed before changing the terminate
flag and released after the down() call. */
lock_kernel();

/* initialize the semaphore. We lock it here, the
exit_kthread call of the thread to be terminated
will unlock it. As soon as we see the semaphore
unlocked, we know that the thread has exited.
*/
init_MUTEX_LOCKED(&kthread->startstop_sem);

/* We need to do a memory barrier here to be sure that
the flags are visible on all CPUs.
*/
mb();

/* set flag to request thread termination */
kthread->terminate = 1;

/* We need to do a memory barrier here to be sure that
the flags are visible on all CPUs.
*/
mb();
kill_proc(kthread->thread->pid, SIGKILL, 1);

/* block till thread terminated */
down(&kthread->startstop_sem);

/* release the big kernel lock */
unlock_kernel();

/* now we are sure the thread is in zombie state. We
notify keventd to clean the process up.
*/
kill_proc(2, SIGCHLD, 1);

}

/* initialize new created thread. Called by the new thread. */
void init_kthread(kthread_t *kthread, char *name)
{
/* lock the kernel. A new kernel thread starts without
the big kernel lock, regardless of the lock state
of the creator (the lock level is *not* inherited)
*/
lock_kernel();

/* fill in thread structure */
kthread->thread = current;

/* set signal mask to what we want to respond to */
siginitsetinv(&current->blocked, sigmask(SIGKILL)|sigmask(SIGINT)|sigmask(SIGTERM));

/* initialise wait queue */
init_waitqueue_head(&kthread->queue);

/* initialise termination flag */
kthread->terminate = 0;

/* set name of this process (max 15 chars + 0 !);
use an explicit "%s" so the name is never treated as a format string */
sprintf(current->comm, "%s", name);

/* let others run */
unlock_kernel();

/* tell the creator that we are ready and let him continue */
up(&kthread->startstop_sem);

}

/* cleanup of thread. Called by the exiting thread. */
void exit_kthread(kthread_t *kthread)
{
/* we are terminating */

/* lock the kernel, the exit will unlock it */
lock_kernel();
kthread->thread = NULL;
mb();

/* notify the stop_kthread() routine that we are terminating. */
up(&kthread->startstop_sem);
/* the kernel_thread that called clone() does a do_exit here. */

/* there is no race here between execution of the "killer" and real termination
of the thread (race window between up and do_exit), since both the
thread and the "killer" function are running with the kernel lock held.
The kernel lock will be freed after the thread exited, so the code
is really not executed anymore as soon as the unload functions gets
the kernel lock back.
The init process may not have made the cleanup of the process here,
but the cleanup can be done safely with the module unloaded.
*/

}

thread_drv.c
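#include <linux/config.h>
#include <linux/version.h>

#include <linux/module.h>
#if defined(MODVERSIONS)
#include <linux/modversions.h>
#endif

#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/errno.h>
#include <linux/sched.h>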

#include "kthread.h"

#define NTHREADS 5

/* the variable that contains the thread data */
kthread_t example[NTHREADS];

/* prototype for the example thread */
static void example_thread(kthread_t *kthread);

/* load the module */
int init_module(void)
{
int i;

/* create new kernel threads */
for (i=0; i<NTHREADS; i++)
start_kthread(example_thread, &example[i]);

return(0);
}

/* remove the module */
void cleanup_module(void)
{
int i;

/* terminate the kernel threads */
for (i=0; i<NTHREADS; i++)
stop_kthread(&example[i]);

return;
}

/* this is the thread function that we are executing */
static void example_thread(kthread_t *kthread)
{
/* setup the thread environment */
init_kthread(kthread, "example thread");

printk("hi, here is the kernel thread\n");

/* an endless loop in which we are doing our work */
for(;;)
{
/* fall asleep for one second */
interruptible_sleep_on_timeout(&kthread->queue, HZ);

/* We need to do a memory barrier here to be sure that
the flags are visible on all CPUs.
*/
mb();

/* here we are back from sleep, either due to the timeout
(one second), or because we caught a signal.
*/
if (kthread->terminate)
{
/* we received a request to terminate ourself */
break;
}

/* this is normal work to do */
printk("example thread: thread woke up\n");
}
/* here we go only in case of termination of the thread */

/* cleanup the thread, leave */
exit_kthread(kthread);

/* returning from the thread here calls the exit functions */
}

Makefile
# set to your kernel tree
KERNEL = /usr/src/linux

# get the Linux architecture. Needed to find proper include file for CFLAGS
ARCH=$(shell uname -m | sed -e s/i.86/i386/ -e s/sun4u/sparc64/ -e s/arm.*/arm/ -e s/sa110/arm/)
# set default flags to compile module
CFLAGS = -D__KERNEL__ -DMODULE -I$(KERNEL)/include
CFLAGS+= -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -fno-strict-aliasing

all: thread_mod.o

# get configuration of kernel
include $(KERNEL)/.config
# modify CFLAGS with architecture specific flags
include $(KERNEL)/arch/${ARCH}/Makefile

# enable the module versions, if configured in kernel source tree
ifdef CONFIG_MODVERSIONS
CFLAGS+= -DMODVERSIONS -include $(KERNEL)/include/linux/modversions.h
endif
# enable SMP, if configured in kernel source tree
ifdef CONFIG_SMP
CFLAGS+= -D__SMP__
endif

# note: we compile the driver object files and then
# link them into the module. With just one object file
# this step would not be needed; the object file produced
# by gcc could be loaded directly.
# link the thread driver module
thread_mod.o: thread_drv.o kthread.o
	ld -r -o thread_mod.o thread_drv.o kthread.o
# compile the kthread object file
kthread.o: kthread.c kthread.h
	gcc $(CFLAGS) -c kthread.c
# compile the thread driver
thread_drv.o: thread_drv.c kthread.h
	gcc $(CFLAGS) -c thread_drv.c

clean:
	rm -f *.o

Bugs
The code assumes that keventd is running with PID 2.
Comments, Corrections
Please send comments, corrections etc. to the address below.

from:
http://www.linuxforum.net/forum/showflat.php?Cat=&Board=linuxK&Number=282973&page=15&view=collapsed&sb=5&o=all


The Linux Kernel Module Programming Guide

chatler — Mon, 29 Nov 2010 04:03:00 GMT

Summary: The Linux Kernel Module Programming Guide. Peter Jay Salzman, Michael Burian, Ori Pomerantz. Copyright © 2001 Peter Jay Salzman. 2007-05-18 ver 2.6.4. The Linux Kernel Module Programming Guide is a...


A Beast of a Different Nature

chatler — Sat, 22 May 2010 13:09:00 GMT

Linux Kernel Development, Chapter 2: Getting Started with the Kernel

A Beast of a Different Nature

The kernel has several differences compared to normal user-space applications that, although not making it necessarily harder to program than user-space, certainly provide unique challenges to kernel development.

These differences make the kernel a beast of a different nature. Some of the usual rules are bent; other rules are entirely new. Although some of the differences are obvious (we all know the kernel can do anything it wants), others are not so obvious. The most important of these differences are

The kernel does not have access to the C library.
The kernel is coded in GNU C.
The kernel lacks memory protection like user-space.
The kernel cannot easily use floating point.
The kernel has a small fixed-size stack.
Because the kernel has asynchronous interrupts, is preemptive, and supports SMP, synchronization and concurrency are major concerns within the kernel.
Portability is important.

Let's briefly look at each of these issues because all kernel development must keep them in mind.

No libc

Unlike a user-space application, the kernel is not linked against the standard C library (or any other library, for that matter). There are multiple reasons for this, including some chicken-and-the-egg situations, but the primary reason is speed and size. The full C library, or even a decent subset of it, is too large and too inefficient for the kernel.

Do not fret: Many of the usual libc functions have been implemented inside the kernel. For example, the common string manipulation functions are in lib/string.c. Just include <linux/string.h> and have at them.

Header Files

When I talk about header files here, or elsewhere in this book, I am referring to the kernel header files that are part of the kernel source tree. Kernel source files cannot include outside headers, just as they cannot use outside libraries.

Of the missing functions, the most familiar is printf(). The kernel does not have access to printf(), but it does have access to printk(). The printk() function copies the formatted string into the kernel log buffer, which is normally read by the syslog program. Usage is similar to printf():

printk("Hello world! A string: %s and an integer: %d\n", a_string, an_integer);

One notable difference between printf() and printk() is that printk() allows you to specify a priority flag. This flag is used by syslogd(8) to decide where to display kernel messages. Here is an example of these priorities:

printk(KERN_ERR "this is an error!\n");

We will use printk() throughout this book. Later chapters have more information on printk().

GNU C

Like any self-respecting Unix kernel, the Linux kernel is programmed in C. Perhaps surprisingly, the kernel is not programmed in strict ANSI C. Instead, where applicable, the kernel developers make use of various language extensions available in gcc (the GNU Compiler Collection, which contains the C compiler used to compile the kernel and most everything else written in C on a Linux system).

The kernel developers use both ISO C99^[1] and GNU C extensions to the C language. These changes wed the Linux kernel to gcc, although recently other compilers, such as the Intel C compiler, have sufficiently supported enough gcc features that they too can compile the Linux kernel. The ISO C99 extensions that the kernel uses are nothing special and, because C99 is an official revision of the C language, are slowly cropping up in a lot of other code. The more interesting, and perhaps unfamiliar, deviations from standard ANSI C are those provided by GNU C. Let's look at some of the more interesting extensions that may show up in kernel code.

^[1] ISO C99 is the latest major revision to the ISO C standard. C99 adds numerous enhancements to the previous major revision, ISO C90, including named structure initializers and a complex type, the latter of which you cannot use safely from within the kernel.

Inline Functions

GNU C supports inline functions. An inline function is, as its name suggests, inserted inline into each function call site. This eliminates the overhead of function invocation and return (register saving and restore), and allows for potentially more optimization because the compiler can optimize the caller and the called function together. As a downside (nothing in life is free), code size increases because the contents of the function are copied to all the callers, which increases memory consumption and instruction cache footprint. Kernel developers use inline functions for small time-critical functions. Making large functions inline, especially those that are used more than once or are not time critical, is frowned upon by the kernel developers.

An inline function is declared when the keywords static and inline are used as part of the function definition. For example:

static inline void dog(unsigned long tail_size)

The function declaration must precede any usage, or else the compiler cannot make the function inline. Common practice is to place inline functions in header files. Because they are marked static, an exported function is not created. If an inline function is used by only one file, it can instead be placed toward the top of just that file.

In the kernel, using inline functions is preferred over complicated macros for reasons of type safety.
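As an illustration of why inline functions are the safer choice, the sketch below compares a macro with a static inline function. It is a user-space example with made-up names (SQUARE_MACRO, square, next_value), not code from the kernel. The macro pastes its argument in twice and accepts any type, while the inline function evaluates its argument once and has a checked int parameter.

```c
/* Macro version: the argument expression is pasted in twice, untyped. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Inline version: the argument is evaluated once and checked as an int. */
static inline int square(int x)
{
    return x * x;
}

/* Helper with a side effect, to make double evaluation visible. */
static int next_value(int *counter)
{
    return ++(*counter);
}

/* Returns how often next_value() ran when its result was squared by the macro. */
int calls_with_macro(void)
{
    int counter = 0;

    (void)SQUARE_MACRO(next_value(&counter));
    return counter;
}

/* Returns how often next_value() ran when its result was squared by the function. */
int calls_with_inline(void)
{
    int counter = 0;

    (void)square(next_value(&counter));
    return counter;
}
```

Here calls_with_macro() reports two calls of the helper and calls_with_inline() reports one, which is the kind of surprise complicated macros tend to hide.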

Inline Assembly

The gcc C compiler enables the embedding of assembly instructions in otherwise normal C functions. This feature, of course, is used in only those parts of the kernel that are unique to a given system architecture.

The asm() compiler directive is used to inline assembly code.

The Linux kernel is programmed in a mixture of C and assembly, with assembly relegated to low-level architecture and fast path code. The vast majority of kernel code is programmed in straight C.

Branch Annotation

The gcc C compiler has a built-in directive that optimizes conditional branches as either very likely taken or very unlikely taken. The compiler uses the directive to appropriately optimize the branch. The kernel wraps the directive in very easy-to-use macros, likely() and unlikely().

For example, consider an if statement such as the following:

if (foo) {
        /* ... */
}

To mark this branch as very unlikely taken (that is, likely not taken):

/* we predict foo is nearly always zero ... */
if (unlikely(foo)) {
        /* ... */
}

Conversely, to mark a branch as very likely taken:

/* we predict foo is nearly always nonzero ... */
if (likely(foo)) {
        /* ... */
}

You should use these directives only when the branch direction is overwhelmingly known a priori or when you want to optimize a specific case at the cost of the other case. This is an important point: These directives result in a performance boost when the branch is correctly predicted, but a performance loss when the branch is mispredicted. A very common usage for unlikely() and likely() is error conditions. As one might expect, unlikely() finds much more use in the kernel because if statements tend to indicate a special case.
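Outside the kernel tree, the same annotations can be tried with GCC or Clang. The sketch below assumes the macros are defined on top of the __builtin_expect builtin, which is how the kernel headers define them; the function sum_array and its error convention are made up for this example.

```c
#include <stddef.h>

/* Assumed definitions on top of the GCC/Clang builtin __builtin_expect. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Sums an array; the NULL check is the rarely taken error path. */
long sum_array(const int *values, size_t count, int *err)
{
    long total = 0;
    size_t i;

    if (unlikely(values == NULL)) {
        *err = -1;      /* error case: annotated as very unlikely */
        return 0;
    }
    for (i = 0; i < count; i++)
        total += values[i];
    *err = 0;
    return total;
}
```

The annotation changes only how the compiler lays out the code, never the result: sum_array() returns the same values with or without the macros.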

No Memory Protection

When a user-space application attempts an illegal memory access, the kernel can trap the error, send SIGSEGV, and kill the process. If the kernel attempts an illegal memory access, however, the results are less controlled. (After all, who is going to look after the kernel?) Memory violations in the kernel result in an oops, which is a major kernel error. It should go without saying that you must not illegally access memory, such as dereferencing a NULL pointer, but within the kernel, the stakes are much higher!

Additionally, kernel memory is not pageable. Therefore, every byte of memory you consume is one less byte of available physical memory. Keep that in mind next time you have to add one more feature to the kernel!

No (Easy) Use of Floating Point

When a user-space process uses floating-point instructions, the kernel manages the transition from integer to floating point mode. What the kernel has to do when using floating-point instructions varies by architecture, but the kernel normally catches a trap and does something in response.

Unlike user-space, the kernel does not have the luxury of seamless support for floating point because it cannot trap itself. Using floating point inside the kernel requires manually saving and restoring the floating point registers, among possible other chores. The short answer is: Don't do it; no floating point in the kernel.
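When a fractional value is needed anyway, one common way around floating point is scaled integer (fixed-point) arithmetic. The sketch below is not from the book: it assumes an unsigned 16.16 format, and the ufix16_* names are hypothetical.

```c
#include <stdint.h>

/* Unsigned 16.16 fixed point: upper 16 bits integer part, lower 16 bits fraction. */
typedef uint32_t ufix16_t;

#define UFIX16_SHIFT 16
#define UFIX16_ONE   ((ufix16_t)1 << UFIX16_SHIFT)

static inline ufix16_t ufix16_from_int(uint32_t v)
{
    return v << UFIX16_SHIFT;
}

static inline uint32_t ufix16_to_int(ufix16_t v)
{
    return v >> UFIX16_SHIFT;   /* drops the fractional part */
}

static inline ufix16_t ufix16_mul(ufix16_t a, ufix16_t b)
{
    /* widen to 64 bits so the intermediate product cannot overflow */
    return (ufix16_t)(((uint64_t)a * b) >> UFIX16_SHIFT);
}

/* Scale a value by 1.5 using integer instructions only. */
ufix16_t scale_by_one_and_a_half(uint32_t value)
{
    ufix16_t factor = UFIX16_ONE + UFIX16_ONE / 2;  /* 1.5 in 16.16 */

    return ufix16_mul(ufix16_from_int(value), factor);
}
```

The result stays in fixed-point form; ufix16_to_int() truncates it to an integer, and the low 16 bits hold the fraction.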

Small, Fixed-Size Stack

User-space can get away with statically allocating tons of variables on the stack, including huge structures and many-element arrays. This behavior is legal because user-space has a large stack that can grow in size dynamically (developers of older, less intelligent operating systems, say, DOS, might recall a time when even user-space had a fixed-sized stack).

The kernel stack is neither large nor dynamic; it is small and fixed in size. The exact size of the kernel's stack varies by architecture. On x86, the stack size is configurable at compile-time and can be either 4 or 8KB. Historically, the kernel stack is two pages, which generally implies that it is 8KB on 32-bit architectures and 16KB on 64-bit architectures; this size is fixed and absolute. Each process receives its own stack.

The kernel stack is discussed in much greater detail in later chapters.

Synchronization and Concurrency

The kernel is susceptible to race conditions. Unlike a single-threaded user-space application, a number of properties of the kernel allow for concurrent access of shared resources and thus require synchronization to prevent races. Specifically,

Linux is a preemptive multi-tasking operating system. Processes are scheduled and rescheduled at the whim of the kernel's process scheduler. The kernel must synchronize between these tasks.
The Linux kernel supports multiprocessing. Therefore, without proper protection, kernel code executing on two or more processors can access the same resource.
Interrupts occur asynchronously with respect to the currently executing code. Therefore, without proper protection, an interrupt can occur in the midst of accessing a shared resource and the interrupt handler can then access the same resource.
The Linux kernel is preemptive. Therefore, without protection, kernel code can be preempted in favor of different code that then accesses the same resource.

Typical solutions to race conditions include spinlocks and semaphores.

Later chapters provide a thorough discussion of synchronization and concurrency.

Portability Is Important

Although user-space applications do not have to aim for portability, Linux is a portable operating system and should remain one. This means that architecture-independent C code must correctly compile and run on a wide range of systems, and that architecture-dependent code must be properly segregated in system-specific directories in the kernel source tree.

A handful of rules, such as remain endian neutral, be 64-bit clean, do not assume the word or page size, and so on, go a long way. Portability is discussed in extreme depth in a later chapter.
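To make "endian neutral" concrete, the sketch below reads and writes a 32-bit big-endian value one byte at a time, so it behaves the same on little-endian and big-endian hosts. The function names load_be32 and store_be32 are hypothetical, and this is a plain C illustration rather than the kernel's own byte-order helpers.

```c
#include <stdint.h>

/* Read a 32-bit big-endian value from a byte buffer on any host. */
uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a 32-bit value to a byte buffer in big-endian order on any host. */
void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}
```

Because the code never casts the buffer to a uint32_t pointer, it depends neither on the host byte order nor on the alignment of the buffer.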


HOWTO compile kernel modules for the kernel 2.6

chatler — Wed, 14 Apr 2010 15:00:00 GMT

If you want to compile the sum-module (source mirrored below), follow these steps: 
Create the Makefile in the same directory as sum-module.c:
obj-m	:= sum-module.o
KDIR	:= /lib/modules/$(shell uname -r)/build
PWD	:= $(shell pwd)
default:
	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
Now run
 make
... and sum-module.ko is built.
If you get something like this
# make
make: Nothing to be done for `default'.
you need to install the kernel source and compile the kernel first (run "make" at least until all the "HOSTCC scripts/" steps are done; this configures your kernel and allows external module compilation).
Make sure /lib/modules/$(shell uname -r)/build points to your build directory (most likely /usr/src/linux...).
Another reason for the above error can be that your browser converted the TAB before $(MAKE) to spaces.

Make sure there is a TAB before $(MAKE).

Install it with install.sh: 
#!/bin/sh
install -m 644 sum-module.ko /lib/modules/`uname -r`/kernel/drivers/sum-module.ko
/sbin/depmod -a
(adjust the /lib/modules path according to your needs) 
Now run
# modprobe sum-module 

Or if you don't want to install the module, do this: 
# insmod ./sum-module.ko
..and if your system doesn't freeze you've done it right ;-) 
 
For kernel 2.4, the Makefile would look like this: 
TARGET       := modulename
INCLUDE    := -I/lib/modules/`uname -r`/build/include
CFLAGS      := -O2 -Wall -DMODULE -D__KERNEL__ -DLINUX
CC  := gcc
${TARGET}.o: ${TARGET}.c
	$(CC) $(CFLAGS) ${INCLUDE} -c ${TARGET}.c
 (not yet tested) 
sum-module source from: http://www.win.tue.nl/~aeb/linux/lk/lk-9.html
/*
 * sum-module.c 
# modprobe sum-module.o
# ls -l /proc/arith
total 0
dr-xr-xr-x    2 root     root            0 Sep 30 12:40 .
dr-xr-xr-x   89 root     root            0 Sep 30 12:39 ..
-r--r--r--    1 root     root            0 Sep 30 12:40 sum
# cat /proc/arith/sum
0
# echo 7 > /proc/arith/sum
# echo 5 > /proc/arith/sum
# echo 13 > /proc/arith/sum
# cat /proc/arith/sum
25
# rmmod sum-module
# ls -l /proc/arith
ls: /proc/arith: No such file or directory
# 
*/
#include <linux/module.h>
#include <linux/init.h>
#include <linux/proc_fs.h>
#include <asm/uaccess.h>

static unsigned long long sum;
static int show_sum(char *buffer, char **start, off_t offset, int length) {
        int size;
        size = sprintf(buffer, "%llu\n", sum);  /* sum is unsigned long long */
        *start = buffer + offset;
        size -= offset;
        return (size > length) ? length : (size > 0) ? size : 0;
}
/* Expect decimal number of at most 9 digits followed by '\n' */
static int add_to_sum(struct file *file, const char *buffer,
                      unsigned long count, void *data) 

{
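        unsigned long val = 0;
        char buf[11];           /* 9 digits + '\n' + terminating NUL */
        char *endp;

        if (count >= sizeof(buf))
                return -EINVAL;
        if (copy_from_user(buf, buffer, count))
                return -EFAULT;
        buf[count] = '\0';      /* simple_strtoul() needs a terminated string */
        val = simple_strtoul(buf, &endp, 10);
        if (*endp != '\n')
                return -EINVAL;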


        sum += val;     /* mod 2^64 */
        return count;
}
 
static int __init sum_init(void) {
        struct proc_dir_entry *proc_arith;
        struct proc_dir_entry *proc_arith_sum;
        proc_arith = proc_mkdir("arith", 0);
        if (!proc_arith) {
                printk (KERN_ERR "cannot create /proc/arith\n");
                return -ENOMEM;
        }
        proc_arith_sum = create_proc_info_entry("arith/sum", 0, 0, show_sum);
        if (!proc_arith_sum) {
                printk (KERN_ERR "cannot create /proc/arith/sum\n");
                remove_proc_entry("arith", 0);
                return -ENOMEM;
        }
        proc_arith_sum->write_proc = add_to_sum;
        return 0;
}
 
static void __exit sum_exit(void) {
        remove_proc_entry("arith/sum", 0);
        remove_proc_entry("arith", 0);
}
module_init(sum_init);
module_exit(sum_exit);
MODULE_LICENSE("GPL");
 
from:

http://www.captain.at/programming/kernel-2.6/

http://blog.ednchina.com/fafen/267973/message.aspx


chatler — Thu, 01 Apr 2010 11:59:00 GMT

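The copyleft of this document belongs to yfydz. It is released under the GPL and may be freely copied and reposted; when reposting, please keep the document intact. Any commercial use is strictly prohibited.
msn: yfydz_no1@hotmail.com
Source: http://yfydz.cublog.cn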

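1. Preface

This article introduces some commonly used data structures and operations in the Linux kernel.

2. Doubly linked list (list)

The Linux kernel's doubly linked list connects its nodes through struct list_head, which is embedded as a member in the list element structure: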

struct list_head {
struct list_head *next, *prev;
};

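Initializing the list head. Note: setting the pointers in the structure to NULL is not initialization; they must point to the structure itself. If they are set to NULL in the usual way instead of pointing to the head itself, the system will crash. This is an easy mistake to make: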

#define LIST_HEAD_INIT(name) { &(name), &(name) }

#define LIST_HEAD(name) \
struct list_head name = LIST_HEAD_INIT(name)

#define INIT_LIST_HEAD(ptr) do { \
(ptr)->next = (ptr); (ptr)->prev = (ptr); \
} while (0)

The most commonly used list operations:

Insert at the head of the list:
void list_add(struct list_head *new, struct list_head *head);

Insert at the tail of the list:
void list_add_tail(struct list_head *new, struct list_head *head);

Delete a node from the list:
void list_del(struct list_head *entry);

Move a node to another list:
void list_move(struct list_head *list, struct list_head *head);

Move a node to the tail of a list:
void list_move_tail(struct list_head *list,struct list_head *head);

Test whether a list is empty; returns 1 if empty, 0 if not:
int list_empty(struct list_head *head);

Join two lists:
void list_splice(struct list_head *list, struct list_head *head);

Get a pointer to the containing node:
#define list_entry(ptr, type, member) \
((type *)((char *)(ptr)-(unsigned long)(&((type *)0)->member)))

Iterate over every node in the list:
#define list_for_each(pos, head) \
for (pos = (head)->next, prefetch(pos->next); pos != (head); \
pos = pos->next, prefetch(pos->next))

Iterate over every node in reverse:
#define list_for_each_prev(pos, head) \
for (pos = (head)->prev, prefetch(pos->prev); pos != (head); \
pos = pos->prev, prefetch(pos->prev))

Example:

LIST_HEAD(mylist);

struct my_list{
struct list_head list;
int data;
};

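static int ini_list(void)
{
struct my_list *p;
int i;
for(i=0; i<100; i++){
p=kmalloc(sizeof(struct my_list), GFP_KERNEL);
if (!p)
return -ENOMEM; /* allocation failed; nodes added so far stay on the list */
p->data = i; /* number the nodes 0..99 */
list_add(&p->list, &mylist);
}
return 0;
}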

This forms a doubly linked list in memory with the following structure:

+---------------------------------------------------------------+
|                                                               |
|  mylist        99             98                     0        |
|  +----+    +---------+    +---------+           +---------+   |
+->|next|--->|list.next|--->|list.next|--->...--->|list.next|---+
   |----|    |---------|    |---------|           |---------|
+--|prev|<---|list.prev|<---|list.prev|<---...<---|list.prev|<--+
|  +----+    |---------|    |---------|           |---------|   |
|            |  data   |    |  data   |           |  data   |   |
|            +---------+    +---------+           +---------+   |
|                                                               |
+---------------------------------------------------------------+

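Knowing the list head is enough to traverse the whole list. If new nodes are inserted with list_add(), the list behaves like a stack when viewed from the head in the next direction.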
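The stack-like order can be checked in user space. The sketch below re-implements just enough of the list API to run outside the kernel; it assumes a simplified list_entry built on offsetof, and the helpers list_insert, list_collect and fill_and_collect are hypothetical names that exist only for this example.

```c
#include <stddef.h>

/* User-space copies of the kernel definitions, simplified. */
struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Link new_node between two known neighbours. */
static void list_insert(struct list_head *new_node,
                        struct list_head *prev, struct list_head *next)
{
    next->prev = new_node;
    new_node->next = next;
    new_node->prev = prev;
    prev->next = new_node;
}

static void list_add(struct list_head *new_node, struct list_head *head)
{
    list_insert(new_node, head, head->next);    /* right after the head */
}

static void list_add_tail(struct list_head *new_node, struct list_head *head)
{
    list_insert(new_node, head->prev, head);    /* right before the head */
}

struct my_list {
    struct list_head list;
    int data;
};

/* Copies the data values into out[] in next-direction order; returns the count. */
static int list_collect(struct list_head *head, int *out, int max)
{
    struct list_head *pos;
    int n = 0;

    for (pos = head->next; pos != head && n < max; pos = pos->next)
        out[n++] = list_entry(pos, struct my_list, list)->data;
    return n;
}

/* Inserts 0..count-1 with list_add() or list_add_tail() and reports the order. */
int fill_and_collect(int use_tail, struct my_list *nodes, int count, int *out)
{
    struct list_head head = LIST_HEAD_INIT(head);
    int i;

    for (i = 0; i < count; i++) {
        nodes[i].data = i;
        if (use_tail)
            list_add_tail(&nodes[i].list, &head);
        else
            list_add(&nodes[i].list, &head);
    }
    return list_collect(&head, out, count);
}
```

With list_add() the most recently inserted node comes first (last in, first out); with list_add_tail() the nodes come back in insertion order, like a queue.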

Deleting a node from the list is easy:

static void del_item(struct my_list *p)
{
list_del(&p->list);
kfree(p);
}

The most important macro is list_entry. Its idea is to derive the actual address of the list element structure from the address of the list_head embedded in it:

#define list_entry(ptr, type, member) \
((type *)((char *)(ptr)-(unsigned long)(&((type *)0)->member)))

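ptr is the address of the list_head embedded in the list element structure (e.g. struct my_list)
member is the name of the list_head member within the list element structure (e.g. struct my_list)
type is the type of the list element structure (e.g. struct my_list)

The address of the element structure is obtained by subtracting the offset of the list_head member within the element structure from the address of that list_head.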
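The offset arithmetic can be verified in user space. In the sketch below the macro uses the standard offsetof instead of the null-pointer cast shown above, which is an assumption made for portability; struct my_item and the two helper functions are hypothetical. The list_head is deliberately not the first member, so the offset is nonzero.

```c
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Same idea as the kernel macro, written with the standard offsetof. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* The list_head is deliberately NOT the first member. */
struct my_item {
    int data;
    struct list_head list;
};

/* Recover the enclosing item from a pointer to its embedded list_head. */
struct my_item *item_from_link(struct list_head *link)
{
    return list_entry(link, struct my_item, list);
}

/* Byte distance that list_entry subtracts for struct my_item. */
size_t my_item_link_offset(void)
{
    return offsetof(struct my_item, list);
}
```

Given an item it, item_from_link(&it.list) returns &it, because the address of it.list minus my_item_link_offset() bytes is the start of the structure.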

For example:

static void print_list(void)
{
struct list_head *cur;
struct my_list *p;

list_for_each(cur, &mylist){
p=list_entry(cur, struct my_list, list);
printk("data=%d\n", p->data);
}
}

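Advantages:

All doubly linked lists can be described and handled in the same way, without writing separate editing functions for each list.

Disadvantages:
1) Setting the pointers in the list head to NULL does not initialize it, which differs from the usual habit;
2) A separate function for deleting the whole list still has to be written for each list type; this cannot be handled uniformly, because the offset of the list_head member is not guaranteed to be the same in every element structure. Of course, if list_head is always the first member of the element structure, a single function can delete any whole list.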

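3. Hash tables

Hash tables suit cases where the elements do not need to be kept sorted and the only requirement is to find a given element quickly. They trade space for time. A hash table is still a linear list at heart, but one large list is split into many small ones. Because a lookup only searches one small list, it is much faster than a linear search through the whole large list; ideally the search is as many times faster as there are small lists. The heads of the small lists are usually gathered into an array whose size is the number of buckets in the hash table.

The speed of a hash table depends on the design of the hash function. From fixed fields of each element, the hash function computes an index smaller than the number of buckets, and the element is stored in the list for that index. The same input always gives the same result, while different inputs should be spread as evenly as possible over all indexes. The more even the distribution, the closer the small lists are in length and the better the lookup performance. The hash function should also be as simple as possible to keep the computation cheap; a common approach is to add up the input fields and take the result modulo the table size. include/linux/jhash.h already defines several hash functions that can be used directly.

Hash tables are used heavily in places such as the routing cache and the connection-tracking state table.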
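To make the "many small lists behind one array" idea concrete, here is a small userspace sketch. It is not kernel code: NBUCKETS, struct item, hash_key(), hash_insert() and hash_find() are hypothetical names, and the hash is the simple "add the fields, then take the modulus" scheme mentioned above rather than jhash.

```c
#include <stddef.h>

#define NBUCKETS 8           /* number of small lists (assumed size) */

struct item {
    unsigned int key;
    int value;
    struct item *next;       /* next item in the same bucket */
};

static struct item *buckets[NBUCKETS];   /* array of small-list heads */

/* Toy hash: add the four bytes of the key, reduce modulo the bucket count. */
static unsigned int hash_key(unsigned int key)
{
    unsigned int sum = (key & 0xff) + ((key >> 8) & 0xff) +
                       ((key >> 16) & 0xff) + ((key >> 24) & 0xff);
    return sum % NBUCKETS;
}

/* Push the item onto the front of its bucket's list. */
static void hash_insert(struct item *it)
{
    unsigned int idx = hash_key(it->key);

    it->next = buckets[idx];
    buckets[idx] = it;
}

/* Search only the one small list the key hashes to. */
static struct item *hash_find(unsigned int key)
{
    struct item *it;

    for (it = buckets[hash_key(key)]; it != NULL; it = it->next)
        if (it->key == key)
            return it;
    return NULL;
}
```

Two keys that hash to the same index simply share a bucket, which is why an even spread matters: the cost of hash_find() is the length of one bucket, not of the whole table.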

Example: connection tracking computes the hash from the tuple:

// net/ipv4/netfilter/ip_conntrack_core.c

u_int32_t
hash_conntrack(const struct ip_conntrack_tuple *tuple)
{
#if 0
dump_tuple(tuple);
#endif
return (jhash_3words(tuple->src.ip,
                      (tuple->dst.ip ^ tuple->dst.protonum),
                      (tuple->src.u.all | (tuple->dst.u.all << 16)),
                      ip_conntrack_hash_rnd) % ip_conntrack_htable_size);
}

// include/linux/jhash.h
static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
{
a += JHASH_GOLDEN_RATIO;
b += JHASH_GOLDEN_RATIO;
c += initval;

__jhash_mix(a, b, c);

return c;
}

4. Timers (timer)

A Linux kernel timer is described by the following structure:

/* include/linux/timer.h */
struct timer_list {
struct list_head list;
unsigned long expires;
unsigned long data;
void (*function)(unsigned long);
};

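list: links the timer into the timer list
expires: expiry time
function: expiry function, called when the timer expires
data: argument passed to the expiry function; in practice usually a pointer cast to unsigned long, pointing to some structure

Timer operations:

Add a timer, linking it into the system timer list:
extern void add_timer(struct timer_list * timer);

Delete a timer, unlinking it from the system timer list:
extern int del_timer(struct timer_list * timer);
(del_timer() can return 0, which means the timer was no longer on the system timer list, i.e. it had already expired or been removed)

On SMP systems it is best to delete a timer with the following function to avoid races:
extern int del_timer_sync(struct timer_list * timer);

Modify a timer, changing its expiry time:
int mod_timer(struct timer_list *timer, unsigned long expires);

Typical usage:
A struct timer_list is usually a member of a larger data structure and is initialized together with that structure to describe what must happen on expiry. Besides timed actions, timers are most often used for timeouts, with the timer function releasing resources when the timeout fires. Note: the timeout function runs in the bottom half of the clock interrupt, so it must not do anything complicated. Complex work, such as sending data after expiry, should not be done directly in the expiry function; instead the expiry function should signal a dedicated kernel thread, which then does the work in process context.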

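To compare two times, the kernel defines the following macros:

#define time_after(a,b) ((long)(b) - (long)(a) < 0)
#define time_before(a,b) time_after(b,a)

#define time_after_eq(a,b) ((long)(a) - (long)(b) >= 0)
#define time_before_eq(a,b) time_after_eq(b,a)

There is a trick here. Time values in Linux are unsigned, so the macros convert them to signed values before comparing, which handles wraparound of the counter. This only copes with a single wrap: the comparison is reliable while the two times are less than half the counter range apart, and a counter that has wrapped twice cannot be detected. It is worth experimenting with this yourself.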
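The wraparound behaviour is easy to check in userspace. The sketch below rewrites the two basic comparisons as functions; as an assumption of this sketch, the subtraction is done on the unsigned values and only the difference is reinterpreted as signed, which avoids signed overflow and gives the same answer as the macros on the usual two's-complement machines.

```c
#include <limits.h>

/* True if time a is after time b, even when the counter has wrapped once. */
static int time_after(unsigned long a, unsigned long b)
{
    /* Unsigned subtraction wraps by definition; the sign of the
     * difference tells which value is "later". */
    return (long)(b - a) < 0;
}

/* True if time a is before time b. */
static int time_before(unsigned long a, unsigned long b)
{
    return time_after(b, a);
}
```

A plain `a > b` would get the wrapped case wrong: a timestamp taken 10 ticks after `ULONG_MAX - 5` is numerically tiny, yet time_after() still reports it as the later one.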

5. Kernel threads (kernel_thread)

A new thread can be created inside the kernel with the kernel_thread function, which is defined in kernel/fork.c:

long kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)

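fn: main function of the kernel thread;
arg: argument passed to the main function;
flags: flags used when creating the thread;

A kernel thread function usually calls daemonize() to detach itself and run in the background as an independent thread, then sets a few thread attributes such as its name and signal handling (none of this is mandatory), and then enters an endless loop, which is the body of the thread. This loop must not spin continuously, or the system will hang there. It is either event driven, sleeping until an event arrives, waking to handle it and going back to sleep; or it sleeps on a timer, does its work after waking and sleeps again; or it joins a wait queue and gets CPU time through schedule(). In short, it must not hog the CPU.

The following kernel thread example is taken from kernel/context.c: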

int start_context_thread(void)
{
static struct completion startup __initdata = COMPLETION_INITIALIZER(startup);

kernel_thread(context_thread, &startup, CLONE_FS | CLONE_FILES);
wait_for_completion(&startup);
return 0;
}

static int context_thread(void *startup)
{
struct task_struct *curtask = current;
DECLARE_WAITQUEUE(wait, curtask);
struct k_sigaction sa;

daemonize();
strcpy(curtask->comm, "keventd");
keventd_running = 1;
keventd_task = curtask;

spin_lock_irq(&curtask->sigmask_lock);
siginitsetinv(&curtask->blocked, sigmask(SIGCHLD));
recalc_sigpending(curtask);
spin_unlock_irq(&curtask->sigmask_lock);

complete((struct completion *)startup);

/* Install a handler so SIGCLD is delivered */
sa.sa.sa_handler = SIG_IGN;
sa.sa.sa_flags = 0;
siginitset(&sa.sa.sa_mask, sigmask(SIGCHLD));
do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);

/*
* If one of the functions on a task queue re-adds itself
* to the task queue we call schedule() in state TASK_RUNNING
*/
for (;;) {
  set_task_state(curtask, TASK_INTERRUPTIBLE);
  add_wait_queue(&context_task_wq, &wait);
  if (TQ_ACTIVE(tq_context))
   set_task_state(curtask, TASK_RUNNING);
  schedule();
  remove_wait_queue(&context_task_wq, &wait);
  run_task_queue(&tq_context);
  wake_up(&context_task_done);
  if (signal_pending(curtask)) {
   while (waitpid(-1, (unsigned int *)0, __WALL|WNOHANG) > 0)
    ;
   spin_lock_irq(&curtask->sigmask_lock);
   flush_signals(curtask);
   recalc_sigpending(curtask);
   spin_unlock_irq(&curtask->sigmask_lock);
  }
}
}

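6. Structure addresses

In C, the address of a structure is the same as the address of its first member. The Linux kernel therefore often uses the address of a structure's first member to stand for the address of the structure itself; keep this in mind when reading the code. It is the same idea as the list_entry macro.

For example: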
struct my_struct{
int a;
int b;
}c;

if ((void *)&c == (void *)&c.a) { // always true
...
}
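The link to list_entry can be shown with a short userspace sketch. The two helper functions are hypothetical names introduced here: for the first member the offset is zero, so a plain cast recovers the structure, while any other member needs its offset subtracted.

```c
#include <stddef.h>

struct my_struct {
    int a;   /* first member: shares the structure's address */
    int b;
};

/* Offset of the first member is 0, so a cast is enough. */
static struct my_struct *from_first_member(int *first)
{
    return (struct my_struct *)first;
}

/* For any other member, step back by its offset (the list_entry idea). */
static struct my_struct *from_member_b(int *pb)
{
    return (struct my_struct *)((char *)pb - offsetof(struct my_struct, b));
}
```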

from:

http://blog.chinaunix.net/u/12313/showart_109612.html

Posted by chatler, 2010-04-01 19:59

How to write a file from inside the Linux kernel

chatler — Sat, 27 Feb 2010 03:02:00 GMT

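/* Headers required by the calls below, for the 2.6-era kernels this example targets */
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/err.h>
#include <linux/string.h>
#include <asm/uaccess.h>

#define MY_FILE "/root/LogFile"

char buf[128];
struct file *file = NULL;

static int __init init(void)
{
        mm_segment_t old_fs;
        int err;

        printk("Hello, I'm the module that intends to write messages to file.\n");

        if (file == NULL)
                file = filp_open(MY_FILE, O_RDWR | O_APPEND | O_CREAT, 0644);
        if (IS_ERR(file)) {
                err = PTR_ERR(file);
                printk("error occurred while opening file %s, exiting...\n", MY_FILE);
                file = NULL;    /* never leave an error pointer for fini() to close */
                return err;
        }

        sprintf(buf, "%s", "The Messages.");

        /* write() expects a user-space buffer; widen the address limit so
         * that it accepts the kernel buffer, then restore the old limit. */
        old_fs = get_fs();
        set_fs(KERNEL_DS);
        /* write only the message, not the whole 128-byte buffer */
        file->f_op->write(file, (char *)buf, strlen(buf), &file->f_pos);
        set_fs(old_fs);

        return 0;
}

static void __exit fini(void)
{
        if (file != NULL)
                filp_close(file, NULL);
}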

module_init(init);
module_exit(fini);
MODULE_LICENSE("GPL");

from:
http://blog.csdn.net/coofive/archive/2006/05/07/712028.aspx

Posted by chatler, 2010-02-27 11:02

What is the difference between user level threads and kernel level threads?

chatler — Sat, 27 Feb 2010 02:00:00 GMT

A kernel thread, sometimes called an LWP (lightweight process), is created and scheduled by the kernel. Kernel threads are often more expensive to create than user threads, and the system calls that create kernel threads directly are very platform specific.

A user thread is normally created by a threading library, and scheduling is managed by the threading library itself (which runs in user mode). All user threads belong to the process that created them. The advantage of user threads is that they are portable.

The major difference shows up on multiprocessor systems: user threads managed entirely by the threading library cannot run in parallel on different CPUs, although they run fine on uniprocessor systems. Since kernel threads use the kernel scheduler, different kernel threads can run on different CPUs.

Systems implement threading in different ways:

A many-to-one threading model maps many user threads onto one kernel thread; the kernel thread can be thought of as the main process.

A one-to-one threading model maps each user thread directly to one kernel thread; this model allows parallel processing on multiprocessor systems. Each kernel thread can be thought of as a VP (virtual processor) that is managed by the scheduler.

from:
http://blog.csdn.net/jicheng687/archive/2009/09/08/4527676.aspx

Posted by chatler, 2010-02-27 10:00

Linux kernel notes 2: process scheduling

chatler — Mon, 15 Feb 2010 07:30:00 GMT

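Linux kernel notes: process scheduling
1 Preface
2 Scheduling algorithm
2.1.1 Common concepts
2.1.2 Relevant fields of the process data structure
2.1.3 How the scheduling algorithm works
2.1.4 Related functions
3 Running the scheduler
3.1.1 Direct invocation
3.1.2 Deferred invocation
3.1.3 Related functions
4 Process scheduling diagram
5 Scheduling on SMP systems
6 Questions and answers
7 References

1 Preface
This article is licensed under the GNU Free Documentation License. See http://www.gnu.org/copyleft/fdl.html for the full text of the license. Under the terms of the GNU Free Documentation License you may freely distribute or publish this article, but please keep it intact.
Comments and corrections are welcome. My email is shisheng_liu@yahoo.com.cn.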

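2 Scheduling algorithm
2.1.1 Common concepts
2.1.1.1 Timer interrupt
Implemented with the 8254 programmable interval timer. The frequency of the timer interrupt is defined by HZ, and the interval between two interrupts is called a tick.
2.1.1.2 CPU tick
A unit for counting time inside the computer: the interval between two clock interrupts.
2.1.1.3 HZ
The frequency of the clock interrupt. On i386 machines HZ is defined as 100, so a clock interrupt occurs every 10 ms.
2.1.1.4 CPU epoch
Scheduling works in CPU epochs. Within one CPU epoch (scheduling period), every runnable process runs until it has used up its current timeslice; then the counter values of all processes are recalculated and a new CPU epoch begins.
2.1.1.5 Process classes
From the scheduler's point of view, processes fall into two classes: real-time processes and normal processes. A real-time process always takes precedence over a normal process. The policy member of the process data structure says which class a process belongs to, and sched_setscheduler()/sched_getscheduler() provide the user-level programming interface for controlling a process's scheduling policy.
2.1.2 Relevant fields of the process data structure
2.1.2.1 1) nice
The priority of the process, which determines how much CPU time it gets. Values range from -20 (highest priority) to 19 (lowest priority).
2.1.2.2 2) counter
The number of CPU ticks left in the process's timeslice. The initial value is derived from the process's nice value. At every clock interrupt, i.e. once per CPU tick, the counter of the current process is decremented by 1. When the counter reaches 0, the timeslice of the current process is used up and the system reschedules to decide which process runs next.
2.1.2.3 3) need_resched
A flag. It is checked on return from interrupts and system calls. A value of 1 requests that the scheduler be run; this usually happens when the process's timeslice is used up or when an I/O event forces preemption of the current process.
2.1.2.4 4) policy
The scheduling policy of the process. SCHED_RR or SCHED_FIFO means the process is a real-time process; otherwise (SCHED_OTHER) it is a normal process.
2.1.3 How the scheduling algorithm works
Linux uses a fairly simple scheduling algorithm that has proven to work well in practice. At scheduling time, a weight (goodness) is computed for every runnable process, and the process with the highest weight gets to run. If the highest weight found is 0, the current CPU epoch is considered finished: the counter values of all processes are recalculated and a new CPU epoch starts. The core of the algorithm is the goodness calculation, which works as follows:
If the process waiting to be scheduled is a real-time process, its goodness is 1000 plus its own priority, while the goodness of a normal process is far below 1000. This guarantees that real-time processes always run before normal ones.
If the remaining counter of the process is 0, it is considered to have used up its timeslice for this epoch, and goodness returns 0.
In all other cases goodness is computed with the following formula:
goodness = counter + 20 - nice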
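The three rules above fit in a few lines of C. This is a simplified sketch rather than the kernel's goodness(): struct task, the enum values and pick_next() are hypothetical stand-ins, and the real 2.4 function also adds small bonuses (for example for staying on the same CPU or sharing an address space) that are left out here.

```c
#include <stddef.h>

/* Hypothetical, stripped-down task descriptor: only the fields the text uses. */
enum policy { POLICY_OTHER, POLICY_FIFO, POLICY_RR };

struct task {
    enum policy policy;
    int rt_priority;   /* meaningful for real-time tasks only */
    int counter;       /* ticks left in the current timeslice */
    int nice;          /* -20 (highest priority) .. 19 (lowest) */
};

/* Weight of a runnable task, following the three rules in the text. */
static int goodness(const struct task *p)
{
    if (p->policy != POLICY_OTHER)
        return 1000 + p->rt_priority;   /* real-time tasks always win */
    if (p->counter == 0)
        return 0;                       /* timeslice used up this epoch */
    return p->counter + 20 - p->nice;
}

/* Pick the task with the highest weight, as schedule() does. */
static const struct task *pick_next(const struct task *tasks, int n)
{
    const struct task *best = NULL;
    int best_weight = -1;
    int i;

    for (i = 0; i < n; i++) {
        int w = goodness(&tasks[i]);
        if (w > best_weight) {
            best_weight = w;
            best = &tasks[i];
        }
    }
    return best;
}
```

If pick_next() returns a task whose goodness is 0, every runnable normal task has exhausted its counter, which is the condition the text describes for starting a new epoch.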
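2.1.4 Related functions
1) schedule() in kernel/sched.c
The main scheduling function; selects the process to run.
2) goodness() in kernel/sched.c
Called by schedule(); computes the weight of a process.
3 Running the scheduler
The scheduler can be activated in two ways: direct invocation and deferred invocation.
3.1.1 Direct invocation
When the current process is ready to give up the CPU voluntarily, it calls the scheduler, schedule(), directly and hands the CPU to another process.
There are two reasons for current to give up the CPU voluntarily. In the first, current needs to sleep (block) until a resource it needs becomes available; its state is set to TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE and the process goes to sleep after calling schedule(). In the second, the process sets the SCHED_YIELD scheduling policy and then calls schedule(); it gives up the CPU only briefly and competes for the CPU again the next time schedule() is called.
3.1.2 Deferred invocation
The scheduler is activated at some later moment by setting the need_resched flag of the current process. As mentioned above, the need_resched flag is checked on return from an interrupt, exception or system call, and the scheduler is activated if the flag is not 0. For example, when a clock interrupt occurs and the interrupt handler finds that the current process has used up its timeslice, it sets the need_resched flag of the current process. Another example: when an I/O interrupt occurs and the interrupt handler finds that a process is waiting for this I/O event, it changes the state of the waiting process to runnable and sets the need_resched flag of the current process. As soon as the interrupt handler finishes, the system reschedules, and the newly runnable process is very likely to get the CPU, which keeps the system responsive to I/O events.
3.1.3 Related functions
1) wake_up_common() in kernel/sched.c
Wakes up the processes on an I/O wait queue; it calls try_to_wake_up(), reschedule_idle() and other functions in turn to request that processes be rescheduled.
2) do_timer() in kernel/timer.c
The timer interrupt routine. It decrements the counter of the current process and, if the counter is used up, sets the process's need_resched field to request rescheduling.
3) ret_from_intr/sys_call/exception in arch/i386/entry.S
Entry points in assembly language. This code runs on every return from an interrupt, exception or system call; it checks the need_resched field of the current process and, if it is not 0, calls schedule() to reschedule.

4 Process scheduling diagram
Linux process scheduling is shown in Figure 1.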

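6 Questions and answers
Q. How long is the scheduling timeslice on a current system?
A. Compared with 2.2.x kernels, the timeslice in kernel 2.4.x is shorter. For the highest-priority processes the timeslice is about 100 ms, for a process at the default priority it is 60 ms, and for the lowest-priority processes it is 10 ms.

Q. How does Linux ensure a reasonably fast response to I/O events? Does the response time depend on the length of the scheduling timeslice?
A. When an I/O event occurs, the corresponding interrupt handler is activated. When it finds that a process is waiting for this I/O event, it wakes the waiting process and sets the need_resched flag of the currently running process. When the interrupt handler returns, the scheduler is activated and the process that was waiting for the I/O event (very probably) gets to run, which ensures a relatively fast response to I/O events (on the order of milliseconds).
As this shows, when an I/O event occurs the process that handles it preempts the current process, so the response time does not depend on the length of the scheduling timeslice.

Q. How do high-priority (nice) and low-priority processes differ in execution? For example, what is the difference between a process with nice -19 (high priority) and one with nice 19 (lowest priority)?
A. The absolute amount of CPU time a process gets depends on its initial counter value. The initial counter is computed (sched.c in kernel 2.4.14) as follows:
p->counter = (p->counter >> 1) + (((20 - p->nice) >> 2) + 1);
From this formula, a standard process (p->nice is 0) gets an initial counter of 6, i.e. a timeslice of 60 ms.
A high-priority process (nice is -19) gets an initial counter of 10, i.e. a timeslice of 100 ms.
The lowest-priority process (nice is 19) gets an initial counter of 1, i.e. a timeslice of 10 ms.
In conclusion, the high-priority process gets 10 times the execution time of the lowest-priority process and close to twice that of a normal-priority process. Of course, these figures hold only when the processes do no I/O at all. With I/O, a process is often forced to sleep while waiting for the I/O to complete, and the CPU time it really uses is hard to compare.
Each time the counter is recalculated, half of the remaining value is added to the new counter. This bonus only benefits processes that had not used up their counter when the recalculation took place, such as those that gave up the CPU voluntarily through SCHED_YIELD. Their counter grows after the recalculation, but it stays below twice the base timeslice, so it never goes much beyond 20.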
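The numbers in this answer can be reproduced with a short sketch. The two functions below are hypothetical helpers that mirror the formula quoted above; with HZ = 100, one tick corresponds to 10 ms.

```c
/* Base timeslice in ticks for a nice value, per the formula in the text. */
static int nice_to_ticks(int nice)
{
    return ((20 - nice) >> 2) + 1;
}

/* One recalculation step: half of the leftover counter plus the base timeslice. */
static int recalc_counter(int counter, int nice)
{
    return (counter >> 1) + nice_to_ticks(nice);
}
```

Starting from an exhausted counter, nice 0 yields 6 ticks (60 ms), nice -19 yields 10 ticks (100 ms) and nice 19 yields 1 tick (10 ms). A process that never consumes its counter sees it grow at each recalculation but level off just below twice the base value, which is the bound mentioned above.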

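Scheduling on SMP systems

7 References
1) Linux kernel source code, version 2.4.14
The real code is always the most accurate and detailed source of information. Thanks to Linus Torvalds, Alan Cox and the other Linux developers for their hard work.
2) Daniel P. Bovet & Marco Cesati
Understanding the Linux Kernel, ISBN: 0-596-00002-2, O'Reilly, 2001
Chinese translation: 《深入理解Linux内核》, translated by Chen Lijun et al., ISBN: 7-5083-0719-4, China Electric Power Press, 2001
The most thorough of the books devoted to the structure of the Linux kernel, with fairly deep code analysis; based on the 2.2 kernel.
3) W. Richard Stevens
Advanced Programming in the UNIX Environment, Chinese translation 《UNIX环境高级编程》 by You Jinyuan, ISBN: 7-111-07579-X, China Machine Press, 2000
The bible of UNIX programming and one of the books every programmer should have at hand. It is a useful reference for all UNIX developers, whatever their level. The quality of the translation is also unusually high.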

from:
http://www.linuxforum.net/forum/gshowflat.php?Cat=&Board=linuxK&Number=294463&page=16&view=collapsed&sb=5&o=all&fpart=

Posted by chatler, 2010-02-15 15:30

Linux block devices and character devices

chatler — Thu, 28 Jan 2010 07:00:00 GMT

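The distinction between character devices and block devices belongs to the operating system's device access layer and is not necessarily tied to the characteristics of the physical device.

Below the device access layer sits the driver, so any access method the driver offers can be used. If the driver supports stream access, the device can be accessed that way; if it also supports block access, you can use whichever you prefer. A typical case is a raw disk device, which supports both.

Block device: a random-access device with a defined structure. Reads and writes are done in blocks. It uses buffers to hold data temporarily; when conditions are right, data is written from the cache to the device in one go, or read from the device into the buffer in one go. Examples are disks and file systems.

Character device: a sequential data-stream device. Reads and writes are done character by character, and the characters form a continuous stream. It has no buffer, so reads and writes happen in real time. Examples are terminals and tape drives.

Devices that allow random (not necessarily sequential) access to fixed-size chunks of data are called block devices, and the chunks are called blocks. The most common block device is the hard disk, but there are many others, such as floppy drives, CD-ROM drives and flash memory. Note that all of these are used by mounting a file system on them, which is the usual way of accessing a block device.

The other basic device type is the character device. Character devices are accessed in order, as a stream of characters; serial ports and keyboards are character devices. If a hardware device is accessed as a character stream, it should be classed as a character device; conversely, if a device is accessed randomly (out of order), it is a block device.

The fundamental difference between the two types is whether they can be accessed randomly, in other words whether access can jump freely from one position to another. Take the keyboard as an example. It provides a stream of data: when you type the string "fox", the keyboard driver returns a stream of those three characters in exactly the order they were typed. Having the keyboard driver read the string out of order, or read some other characters, would make no sense. The keyboard is therefore a typical character device: it provides the stream of characters the user types. Reading from the keyboard yields a character stream, first "f", then "o", then "x", and finally end of file (EOF). When nobody is typing, the stream is empty. A hard disk is quite different. The disk driver may be asked to read an arbitrary block and then move on to read some other block, and the blocks read need not be adjacent on the disk. A hard disk can thus be accessed randomly rather than as a stream, so it is clearly a block device.

The kernel manages block devices much more carefully than character devices; there are many more issues to consider and much more work to do. A character device only needs to control one position, the current position, whereas a block device must be able to move back and forth between different regions of the medium. In practice the kernel does not need a dedicated subsystem to manage character devices, but block devices do require a dedicated subsystem. This is not only because block devices are far more complex than character devices; more importantly, block devices are very sensitive to performance. Every bit of extra efficiency squeezed out of the hard disk improves the performance of the whole system, far more than doubling the throughput of the keyboard would. In addition, as we will see, the complexity of block devices leaves a lot of room for such optimization.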

from:

http://os.51cto.com/art/200909/151133.htm

Posted by chatler, 2010-01-28 15:00

Linux kernel module management commands

chatler — Thu, 10 Dec 2009 14:48:00 GMT

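1. lsmod: list loaded kernel modules
lsmod lists the names, sizes and other details of the modules currently loaded in the system. You can also look at /proc/modules, which shows the same information.

2. modinfo: show module information
modinfo shows information about a module; reading it helps determine what the module is for.

3. modprobe: load a module together with the modules it depends on
The most common use of modprobe is loading modules. When a kernel module is loaded, the modules it depends on are loaded as well. modprobe can also list all the modules available for the kernel and remove modules. The commonly used forms and options are:
modprobe [-v] [-V] [-C config-file] [-n] [-i] [-q] [-o ] [parameters...]
modprobe -r [-n] [-i] [-v] ...
modprobe -l -t [ -a ...]
The contents of /etc/modprobe.conf look like this:
alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptspi
The last column is the module name and the middle column is its alias. If you know the name of a module, how do you find its alias? Use the following commands:
#modprobe -c (shows the aliases of all modules)
#modprobe -c module-name | grep module-name
modprobe -l lists all the modules available for the kernel, both loaded and not loaded. With modprobe -l you can find the module you need and then load it. The list that modprobe -l reads comes from the /lib/modules/`uname -r` directory, where uname -r gives the kernel version.
Note: module names are given without a suffix, whereas the modules shown by modprobe -l all carry a .ko or .o suffix.

4. rmmod: remove a loaded module

5. depmod: build the list of module dependencies
This tool builds the list of dependencies between modules. Only a few of its options matter. Current Linux distributions use 2.6.x kernels, which resolve dependencies automatically, so it is enough to know that the command exists. Modules do depend on one another. Suppose we want to drive a USB hard disk. There are currently two drivers: one is udev, which is in the kernel but not yet very stable; the other is the usb-storage driver, which depends on the scsi modules, so to use usb-storage the scsi modules must be built and installed too.
Another example: SATA disks appear in Linux as /dev/sd*, such as /dev/sda, /dev/sdb and so on. To drive a SATA disk, SATA support must be selected in the kernel, either built as a module or built into the kernel, and IDE and SCSI support must be selected as well.
The original description of depmod reads: depmod: program to generate modules.dep and map files.
[root@localhost beinan]# depmod -a
Note: builds the dependencies for all modules under /lib/modules/<kernel version> and writes them to modules.dep.
[root@localhost beinan]# depmod -e
Note: reports unresolved symbols, i.e. modules that cannot be used as they stand.
[root@localhost beinan]# depmod -n
Note: computes the dependencies of all modules but only prints them (Write the dependency file on stdout only).
Note: modules.dep is located in the /lib/modules/<kernel version> directory.

6. insmod: load a module
insmod is similar to modprobe but less capable. modprobe does not need the path of the module file or its .o or .ko suffix, whereas insmod needs the full path of the module file, including its suffix (modulefile.o or modulefile.ko).

7. Configuration files related to module loading
The module configuration file is modules.conf or modprobe.conf. The modules to be loaded automatically at boot are usually listed in a configuration file; most Linux distributions have /etc/modules.conf or /etc/modprobe.conf. On Fedora Core 4.0, for example, the file that controls automatic module loading at boot is /etc/modprobe.conf. This file usually contains module loading commands or alias definitions. For example, modules.conf may contain a line like this:
alias eth0 8139too

from:
http://blog.chinaunix.net/u2/76292/showart.php?id=2090623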

Posted by chatler, 2009-12-10 22:48

chatler — Sat, 28 Nov 2009 12:08:00 GMT

The sendfile function

#include <sys/sendfile.h>

ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

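sendfile() copies data between two file descriptors. The copy is done inside the kernel, which is why it is called "zero copy". sendfile() is much more efficient than a read() and write() pair, because read() and write() have to copy the data through a user-space buffer.

Parameters:

out_fd is a file descriptor that is already open for writing (write);

in_fd is a file descriptor that is already open for reading (read);

offset points to the offset in in_fd at which sendfile() starts reading. Zero means read from the start of the file; any other value means read from that offset. When sendfile() returns, it has advanced *offset past the last byte read (the previous offset plus the return value), so a loop can pass the same variable to the next call.

count is the number of bytes to copy between the two descriptors.

Return value:

On success, sendfile() returns the number of bytes written to out_fd. On error it returns -1 and sets errno accordingly.

EAGAIN Non-blocking I/O was selected with O_NONBLOCK and the write would block.

EBADF The input or output file descriptor is not open.

EFAULT Bad address.

EINVAL A descriptor is not valid or is locked, or an mmap()-like operation is not available for in_fd.

EIO An unspecified error occurred while reading from in_fd.

ENOMEM Not enough memory to read from in_fd.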
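Because sendfile() may transfer fewer bytes than requested, callers normally loop until everything has been sent. The following is a minimal sketch under stated assumptions: send_all() is a hypothetical helper name, only EINTR is retried, and a non-blocking socket that returns EAGAIN is treated as an error here (a real server would wait for the socket to become writable instead).

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/sendfile.h>

/* Send 'count' bytes of in_fd, starting at 'start', to out_fd.
 * Returns 0 when everything was sent, -1 otherwise (errno set by sendfile). */
static int send_all(int out_fd, int in_fd, off_t start, size_t count)
{
    off_t offset = start;   /* sendfile() advances this itself */

    while (count > 0) {
        ssize_t n = sendfile(out_fd, in_fd, &offset, count);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: just retry */
            return -1;
        }
        if (n == 0)
            break;          /* in_fd ended before 'count' bytes were sent */
        count -= (size_t)n;
    }
    return count == 0 ? 0 : -1;
}
```

Since the kernel updates offset on every call, the loop never has to add the return value to it by hand; it only tracks how many bytes are still outstanding.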

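------------------------------------------------------------------------------

To speed up the file transfer module of an existing system and reduce its resource usage, I ran a performance test of sendfile(). The test failed to show an improvement, but I used sendfile() in the module anyway. Here is a record of that unsuccessful tuning experiment.

Test platform: client and server are both P4 machines with IDE disks; Fedora 5; 100 Mbit LAN.

The receiving program is as follows: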

FILE *fp = fopen(FILENAME,"wb");

  while((len = recv(sockfd, buff, sizeof(buff), 0)) > 0)
  {
      fwrite(buff, 1, len, fp);
  }
  fclose(fp);

A. Sender, traditional approach:

fd = open(FILENAME, O_RDONLY);
  while((len =read(fd, buff, sizeof(buff))) >0)
  {
       send(sockfd, buff, len ,0);
  }
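  close(fd);

The partition on my disk was created with a block size of 4096, so I set the size of buff to 4096 bytes to read from disk as efficiently as possible. In testing, however, setting it to 1024 or 8192 made no difference to the transfer speed.

File size: 9M; time: 0.71 - 0.76 s
File size: 32M; time: 2.64 - 2.68 s
File size: 64M; time: 5.36 - 5.43 s

B. Sender, using sendfile():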

off_t offset = 0;
stat(FILENAME, &filestat);

  fd = open(FILENAME, O_RDONLY);
  sendfile(sockfd, fd, &offset, filestat.st_size);
  close(fd);

File size: 9M; time: 0.71 - 1.08 s
File size: 32M; time: 2.66 - 2.74 s
File size: 64M; time: 5.43 - 6.64 s

The speed seems to have dropped slightly. Following the sendfile man page, I made this call before using the function:

int no = 1;
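printf("%d\n", setsockopt(sockfd, IPPROTO_TCP, TCP_CORK, (char*)&no, sizeof(int)) );

File size: 9M; time: 0.72 - 0.75 s
File size: 32M; time: 2.66 - 2.68 s
File size: 64M; time: 5.38 - 5.60 s

With this, the speed appears to match the traditional approach. In every setup, an ethereal capture showed that the payload of each TCP packet was usually at most 1448 bytes.

So my test did not bear out the claim that copying application-layer data twice is very expensive. If we accept that whatever exists has its reasons, I think sendfile() only shows its advantage in two situations, neither of which I had an environment to test:
1. File servers or HTTP servers with a large number of concurrent connections;
2. Embedded systems that are short of memory.

Also, much of what is written online about the TCP_CORK socket option is out of date. The man page has long stated that this option can be combined with TCP_NODELAY. However, setting TCP_NODELAY forces pending output to be sent immediately, whether or not TCP_CORK is set.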

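----------------------------------------------------------------------

Supplement:

TCP_NODELAY and TCP_CORK essentially control the "Nagling" of packets, where Nagling means using the Nagle algorithm to assemble small packets into larger frames. John Nagle is the inventor of the Nagle algorithm, which is named after him. He first used this method in 1984 to tackle network congestion at Ford Aerospace (see IETF RFC 896 for details). The problem he addressed is the small-packet problem, often discussed together with silly window syndrome: a typical terminal application sends one packet per keystroke, and such a packet typically carries one byte of payload and a 40-byte header, an overhead of 4000% that can easily congest a network. Nagling later became a standard and was quickly implemented across the Internet. It is now the default, but in some situations it is desirable to turn it off.

Now suppose an application issues a request to send a small block of data. We can either send the data immediately or wait for more data and send it all at once. Sending immediately greatly benefits interactive and client/server applications. For example, when we send a short request and wait for a large response, the overhead is low compared with the total amount of data transferred, and the response arrives sooner if the request is sent at once. This behaviour is obtained by setting the TCP_NODELAY option on the socket, which disables the Nagle algorithm.

In the other case we want to wait until the amount of data reaches its maximum and then send everything across the network at once. This style of transfer benefits bulk data communication; a file server is the typical application. The Nagle algorithm causes problems in this situation. However, if you are sending a large amount of data, you can set the TCP_CORK option to disable Nagling, in a way that is exactly the opposite of TCP_NODELAY (in this older description, TCP_CORK and TCP_NODELAY are mutually exclusive). Let us look at how it works in detail.

Suppose an application uses sendfile() to transfer a large amount of data. Application protocols usually require some information that describes the data to be sent first; this is the header. The header is typically small, and TCP_NODELAY is set on the socket. The packet with the header is transmitted immediately and, in some cases (depending on an internal packet counter), an acknowledgement is requested from the peer once that packet has been received. As a result, the transfer of the bulk data is delayed and unnecessary network traffic is generated.

If, however, we set the TCP_CORK option on the socket (which can be compared to putting a cork in a pipe), the packet with the header is filled up with bulk data, and all the data is sent out automatically in packets sized as needed. When the transfer is complete, it is best to clear the TCP_CORK option, "pulling the cork" from the connection so that any partial frame can go out. This is just as important as "corking" the connection in the first place.

In short, if you know you can send several pieces of data together (for example the headers and body of an HTTP response), we recommend setting the TCP_CORK option so that there is no delay between them. This greatly benefits the performance of WWW, FTP and file servers, and it also simplifies your work.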
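The cork, send, uncork sequence described above can be sketched as follows. This is an illustration under stated assumptions, not a production routine: set_cork() and send_header_and_file() are hypothetical helper names, sockfd is assumed to be a connected TCP socket, and any short write is simply treated as a failure rather than retried.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Turn TCP_CORK on (1) or off (0) for a TCP socket. Returns 0 on success. */
static int set_cork(int sockfd, int on)
{
    return setsockopt(sockfd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/* Send a small header followed by a file body as one corked burst. */
static int send_header_and_file(int sockfd, const char *hdr, size_t hdr_len,
                                int file_fd, size_t file_len)
{
    off_t offset = 0;
    int rc = -1;

    if (set_cork(sockfd, 1) < 0)
        return -1;

    /* Header and body are queued together while the cork is in place. */
    if (send(sockfd, hdr, hdr_len, 0) == (ssize_t)hdr_len &&
        sendfile(sockfd, file_fd, &offset, file_len) == (ssize_t)file_len)
        rc = 0;

    /* Always pull the cork so that any partial frame is flushed. */
    if (set_cork(sockfd, 0) < 0)
        rc = -1;
    return rc;
}
```

Clearing the option on every exit path matters: a socket left corked may hold back the final partial frame for a while instead of sending it as soon as the data is complete.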
Reposted from:
http://blog.chinaunix.net/u2/76292/showart.php?id=2105375

Posted by chatler, 2009-11-28 20:08
