I. Livelock

If transaction T1 has locked data item R and transaction T2 then requests a lock on R, T2 waits. T3 also requests a lock on R; if, when T1 releases its lock, the system grants T3's request first, T2 keeps waiting. If later requesters keep being granted the lock ahead of it, T2 may wait forever. This is livelock.

http://goog-perftools.sourceforge.net/doc/tcmalloc.html

TCMalloc : Thread-Caching Malloc

Sanjay Ghemawat, Paul Menage <opensource@google.com>

Motivation

TCMalloc is faster than the glibc 2.3 malloc (available as a separate library called ptmalloc2) and other mallocs that I have tested. ptmalloc2 takes approximately 300 nanoseconds to execute a malloc/free pair on a 2.8 GHz P4 (for small objects). The TCMalloc implementation takes approximately 50 nanoseconds for the same operation pair. Speed is important for a malloc implementation because if malloc is not fast enough, application writers are inclined to write their own custom free lists on top of malloc. This can lead to extra complexity, and more memory usage unless the application writer is very careful to appropriately size the free lists and scavenge idle objects out of the free list.

TCMalloc also reduces lock contention for multi-threaded programs. For small objects, there is virtually zero contention. For large objects, TCMalloc tries to use fine grained and efficient spinlocks. ptmalloc2 also reduces lock contention by using per-thread arenas, but there is a big problem with ptmalloc2's use of per-thread arenas: in ptmalloc2, memory can never move from one arena to another. This can lead to huge amounts of wasted space. For example, in one Google application, the first phase would allocate approximately 300MB of memory for its data structures. When the first phase finished, a second phase would be started in the same address space. If this second phase was assigned a different arena than the one used by the first phase, this phase would not reuse any of the memory left after the first phase and would add another 300MB to the address space. Similar memory blowup problems were also noticed in other applications.

Another benefit of TCMalloc is space-efficient representation of small objects. For example, N 8-byte objects can be allocated while using space approximately 8N * 1.01 bytes, i.e., a one-percent space overhead. ptmalloc2 uses a four-byte header for each object and (I think) rounds up the size to a multiple of 8 bytes and ends up using 16N bytes.

Usage

To use TCMalloc, just link tcmalloc into your application via the "-ltcmalloc" linker flag. You can use tcmalloc in applications you didn't compile yourself, by using LD_PRELOAD:

$ LD_PRELOAD="/usr/lib/libtcmalloc.so"

LD_PRELOAD is tricky, and we don't necessarily recommend this mode of usage. TCMalloc includes a heap checker and heap profiler as well. If you'd rather link in a version of TCMalloc that does not include the heap profiler and checker (perhaps to reduce binary size for a static binary), you can link in libtcmalloc_minimal instead.

Overview

TCMalloc assigns each thread a thread-local cache. Small allocations are satisfied from the thread-local cache. Objects are moved from central data structures into a thread-local cache as needed, and periodic garbage collections are used to migrate memory back from a thread-local cache into the central data structures.

TCMalloc treats objects with size <= 32K ("small" objects) differently from larger objects. Large objects are allocated directly from the central heap using a page-level allocator (a page is a 4K aligned region of memory). I.e., a large object is always page-aligned and occupies an integral number of pages. A run of pages can be carved up into a sequence of small objects, each equally sized. For example a run of one page (4K) can be carved up into 32 objects of size 128 bytes each.

Small Object Allocation

Each small object size maps to one of approximately 170 allocatable size-classes. For example, all allocations in the range 961 to 1024 bytes are rounded up to 1024. The size-classes are spaced so that small sizes are separated by 8 bytes, larger sizes by 16 bytes, even larger sizes by 32 bytes, and so forth. The maximal spacing (for sizes >= ~2K) is 256 bytes.

A thread cache contains a singly linked list of free objects per size-class. If the free list is empty: (1) We fetch a bunch of objects from a central free list for this size-class (the central free list is shared by all threads). (2) Place them in the thread-local free list. (3) Return one of the newly fetched objects to the application.

If the central free list is also empty: (1) We allocate a run of pages from the central page allocator. (2) Split the run into a set of objects of this size-class. (3) Place the new objects on the central free list. (4) As before, move some of these objects to the thread-local free list.
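To make the rounding concrete, here is a minimal sketch in C of a size-class mapping with this shape. The cutoffs are invented for illustration; TCMalloc's real class boundaries are generated differently:

#include <stddef.h>

/* Round a small request up to its size-class: tighter spacing for small
   sizes, wider spacing for larger ones (boundaries here are made up). */
static size_t round_to_size_class(size_t n)
{
    if (n <= 128)  return (n + 7)   & ~(size_t)7;    /* 8-byte spacing */
    if (n <= 256)  return (n + 15)  & ~(size_t)15;   /* 16-byte spacing */
    if (n <= 512)  return (n + 31)  & ~(size_t)31;   /* 32-byte spacing */
    if (n <= 1024) return (n + 63)  & ~(size_t)63;   /* 64-byte spacing */
    if (n <= 2048) return (n + 127) & ~(size_t)127;  /* 128-byte spacing */
    return (n + 255) & ~(size_t)255;                 /* 256-byte spacing for >= ~2K */
}

With these made-up cutoffs, every request from 961 to 1024 bytes rounds to 1024, matching the example above.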
Large Object Allocation
A large object size (> 32K) is rounded up to a page size (4K) and is handled by a central page heap. The central page heap is again an array of free lists. For k < 256, the kth entry is a free list of runs that consist of k pages. The 256th entry is a free list of runs that have length >= 256 pages.

An allocation for k pages is satisfied by looking in the kth free list. If that free list is empty, we look in the next free list, and so forth. Eventually, we look in the last free list if necessary. If that fails, we fetch memory from the system (using sbrk, mmap, or by mapping in portions of /dev/mem).

If an allocation for k pages is satisfied by a run of pages of length > k, the remainder of the run is re-inserted back into the appropriate free list in the page heap.

Spans

The heap managed by TCMalloc consists of a set of pages. A run of contiguous pages is represented by a Span object. A span can either be allocated, or free. If free, the span is one of the entries in a page heap linked-list. If allocated, it is either a large object that has been handed off to the application, or a run of pages that have been split up into a sequence of small objects. If split into small objects, the size-class of the objects is recorded in the span.

A central array indexed by page number can be used to find the span to which a page belongs. For example, span a below occupies 2 pages, span b occupies 1 page, span c occupies 5 pages and span d occupies 3 pages. [figure omitted]
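The central array itself can be sketched like this (a hypothetical layout assuming 4K pages and a 32-bit address space, so the page number is just the address shifted right by 12; a real implementation might use a radix tree instead of one flat array):

#include <stdint.h>

struct Span;                            /* represents a run of contiguous pages */
static struct Span *span_map[1 << 20];  /* one slot per 4K page in a 32-bit space */

/* every page covered by a span holds a pointer back to that span */
static struct Span *span_of(const void *ptr)
{
    uintptr_t page = (uintptr_t)ptr >> 12;  /* 4K pages: page number = addr / 4096 */
    return span_map[page];
}

The deallocation path described next uses exactly this kind of lookup to get from a freed pointer back to its span.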
Deallocation
When an object is deallocated, we compute its page number and look it up in the central array to find the corresponding span object. The span tells us whether or not the object is small, and its size-class if it is small. If the object is small, we insert it into the appropriate free list in the current thread's thread cache. If the thread cache now exceeds a predetermined size (2MB by default), we run a garbage collector that moves unused objects from the thread cache into central free lists.
If the object is large, the span tells us the range of pages covered by the object. Suppose this range is [p,q]. We also lookup the spans for pages p-1 and q+1. If either of these neighboring spans are free, we coalesce them with the [p,q] span. The resulting span is inserted into the appropriate free list in the page heap.

Central Free Lists for Small Objects
As mentioned before, we keep a central free list for each size-class. Each central free list is organized as a two-level data structure: a set of spans, and a linked list of free objects per span.

An object is allocated from a central free list by removing the first entry from the linked list of some span. (If all spans have empty linked lists, a suitably sized span is first allocated from the central page heap.) An object is returned to a central free list by adding it to the linked list of its containing span. If the linked list length now equals the total number of small objects in the span, this span is now completely free and is returned to the page heap.

Garbage Collection of Thread Caches
A thread cache is garbage collected when the combined size of all objects in the cache exceeds 2MB. The garbage collection threshold is automatically decreased as the number of threads increases so that we don't waste an inordinate amount of memory in a program with lots of threads.
We walk over all free lists in the cache and move some number of objects from the free list to the corresponding central list. The number of objects to be moved from a free list is determined using a per-list low-water-mark L. L records the minimum length of the list since the last garbage collection. Note that we could have shortened the list by L objects at the last garbage collection without requiring any extra accesses to the central list. We use this past history as a predictor of future accesses and move L/2 objects from the thread cache free list to the corresponding central free list. This algorithm has the nice property that if a thread stops using a particular size, all objects of that size will quickly move from the thread cache to the central free list where they can be used by other threads.
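A minimal sketch of that scavenging rule in C, with invented names (lowater plays the role of the low-water-mark L, and release_to_central() is an assumed helper that returns one object to the central free list):

#include <stddef.h>

struct ThreadFreeList {
    void *head;     /* singly linked free list; each object stores the next pointer */
    int   length;   /* current number of objects on the list */
    int   lowater;  /* minimum length seen since the last garbage collection */
};

extern void release_to_central(void *obj);  /* assumed helper */

static void scavenge(struct ThreadFreeList *list)
{
    int move = list->lowater / 2;           /* move L/2 objects, per the rule above */
    while (move-- > 0 && list->head != NULL) {
        void *obj = list->head;
        list->head = *(void **)obj;         /* pop the first free object */
        list->length--;
        release_to_central(obj);
    }
    list->lowater = list->length;           /* start tracking afresh for next time */
}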
Performance Notes

PTMalloc2 unittest

The PTMalloc2 package (now part of glibc) contains a unittest program t-test1.c. This forks a number of threads and performs a series of allocations and deallocations in each thread; the threads do not communicate other than by synchronization in the memory allocator.

t-test1 (included in google-perftools/tests/tcmalloc, and compiled as ptmalloc_unittest1) was run with a varying number of threads (1-20) and maximum allocation sizes (64 bytes - 32 Kbytes). These tests were run on a 2.4GHz dual Xeon system with hyper-threading enabled, using Linux glibc-2.3.2 from RedHat 9, with one million operations per thread in each test. In each case, the test was run once normally, and once with LD_PRELOAD=libtcmalloc.so.

The graphs below show the performance of TCMalloc vs PTMalloc2 for several different metrics. Firstly, total operations (millions) per elapsed second vs max allocation size, for varying numbers of threads. The raw data used to generate these graphs (the output of the "time" utility) is available in t-test1.times.txt.

[performance graphs omitted]

Next, operations (millions) per second of CPU time vs number of threads, for max allocation size 64 bytes - 128 Kbytes.

[performance graphs omitted]
Here we see again that TCMalloc is both more consistent and more efficient than PTMalloc2. For max allocation sizes <32K, TCMalloc typically achieves ~2-2.5 million ops per second of CPU time with a large number of threads, whereas PTMalloc achieves generally 0.5-1 million ops per second of CPU time, with a lot of cases achieving much less than this figure. Above 32K max allocation size, TCMalloc drops to 1-1.5 million ops per second of CPU time, and PTMalloc drops almost to zero for large numbers of threads (i.e. with PTMalloc, lots of CPU time is being burned spinning waiting for locks in the heavily multi-threaded case).
Caveats

For some systems, TCMalloc may not work correctly with applications that aren't linked against libpthread.so (or the equivalent on your OS). It should work on Linux using glibc 2.3, but other OS/libc combinations have not been tested.
TCMalloc may be somewhat more memory hungry than other mallocs, though it tends not to have the huge blowups that can happen with other mallocs. In particular, at startup TCMalloc allocates approximately 6 MB of memory. It would be easy to roll a specialized version that trades a little bit of speed for more space efficiency.
TCMalloc currently does not return any memory to the system.
Don't try to load TCMalloc into a running binary (e.g., using JNI in Java programs). The binary will have allocated some objects using the system malloc, and may try to pass them to TCMalloc for deallocation. TCMalloc will not be able to handle such objects.
The DMA controller is another pair of chips on your motherboard (usually Intel 8237A-5 chips) that allows you (the programmer) to offload data transfers between I/O boards and memory. DMA stands for 'Direct Memory Access'.
DMA can work: memory->I/O, I/O->memory. The memory->memory transfer doesn't work. It doesn't matter, because ISA DMA is slow as hell and thus is unusable for that. Furthermore, using DMA for zeroing out memory would massacre the contents of memory caches.
What about caches and DMA? The L1 and L2 caches work absolutely transparently. When DMA writes to memory, the caches automatically load or at least invalidate the data going into memory. When DMA reads memory, the caches supply any unwritten bytes, so the peripheral receives the new values rather than stale ones.
There are signals DREQ and DACK for each channel: when a peripheral wants a byte/word moved into memory, it raises its DREQ line; once the controller owns the bus, DACK is generated and the data is transferred from the peripheral into memory.
In the other direction, everything is the same, but first the byte/word is fetched from the memory and then DACK is generated and the peripheral takes the data.
The DMA controller has only an 8-bit address counter inside. There is an external ALS573 counter for each chip, which makes it look to the programmer as if the DMA controller had 16 bits of address counter per channel. There are 8 more bits of address per channel in the so-called page register in the LS612, which unfortunately does not increment the way the bits in the ALS573 do. All these 24 bits together can address 16,777,216 distinct addresses.
Recapitulation: for each channel, independently, you see 16 bits of auto-incrementing counter, and 8 bits of page register which doesn't increment.
The difference between 16-bit and 8-bit DMA channels is that the address bits for 16-bit channels are wired one bit to the left on the address bus, so every address is 2 times bigger; the lowest bit is always 0. The highest bit of the page register would fall into bit 24, which does not exist on ISA, so it is left unconnected. For 16-bit channels, the bus control logic is wired so that every single DMA transfer generates a 16-bit cycle, i.e., the ISA device puts 16 bits onto the bus at a time. I don't know what happens if you use a 16-bit DMA channel with an XT peripheral. I guess it could work, but only more slowly.
8-bit DMA: increments by 1, cycles inside 65536 bytes, addresses 16MB, moves 8 bits at a time.

16-bit DMA: increments by 2, goes only over even addresses, cycles inside 131072 bytes, addresses 16MB, moves 16 bits at a time. It uses a 16-bit ISA I/O cycle, so it takes fewer ticks to make one move than the 8-bit DMA.
An example of DMA usage would be the Sound Blaster's ability to play samples in the background. The CPU sets up the sound card and the DMA. When the DMA is told to 'go', it simply shovels the data from RAM to the card. Since this is done off-CPU, the CPU can do other things while the data is being transferred.
Enough basics. Here's how you program the DMA chip.
When you want to start a DMA transfer, you need to know several things:
Restriction #1 is rather easy to get around. Simply transfer the first block, and when the transfer is done, send the next block.
For those of you not familiar with pages, I'll try to explain.
Picture the first 16MB region of memory in your system. It is divided into 256 pages of 64K or 128 pages of 128K. Every page starts at a multiple of 65536 or 131072. They are numbered from 0 to 255 or from 0 to 127.
In plain English, the page is the highest 8 bits or 7 bits of the absolute 24 bit address of our memory location. The offset is the lower 16 or 17 bits of the absolute 24 bit address.
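In C terms, the split described here looks like the following (a tiny illustrative helper for 8-bit channels, not part of any real API):

#include <stdint.h>

/* split a 24-bit physical address into DMA page and offset (8-bit channel) */
static void split_dma_address(uint32_t phys, uint8_t *page, uint16_t *offset)
{
    *page   = (uint8_t)(phys >> 16);  /* highest 8 bits -> page register value */
    *offset = (uint16_t)phys;         /* lowest 16 bits -> address counter value */
}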
Now that we know where our data is, we need to find the length.
The DMA has a little quirk regarding length. The true length transferred is actually the value you send plus 1: if you send a zero length to the DMA, it transfers one byte or word, whereas if you send 0xFFFF, it transfers 64K or 128K. I guess they made it this way because it would be pretty senseless to program the DMA to do nothing (a length of zero), and doing it this way allows a full 64K or 128K span of data to be transferred.
Now that you know what to send to the DMA, how do you actually start it? This enters us into the different DMA channels.
The following chart describes each channel and its corresponding port numbers:
DMA Channel | Page | Address | Count |
0 | 87h | 0h | 1h |
1 | 83h | 2h | 3h |
2 | 81h | 4h | 5h |
3 | 82h | 6h | 7h |
4 | 8Fh | C0h | C2h |
5 | 8Bh | C4h | C6h |
6 | 89h | C8h | CAh |
7 | 8Ah | CCh | CEh |
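Putting the chart together with the earlier rules (mask the channel, clear the flip-flop, set the mode, then write offset, page, and count), here is a rough C sketch of starting a single-cycle transfer on channel 1. It is an illustration only: outb(port, value) stands in for your environment's port-write routine (e.g., outportb() on DOS compilers), and mode byte 49h means single-cycle, memory-to-peripheral, channel 1:

#include <stdint.h>

extern void outb(uint16_t port, uint8_t value);  /* assumed port-write helper */

static void start_dma_channel1(uint32_t phys, uint16_t length)
{
    uint16_t count = length - 1;      /* the controller moves count+1 bytes */

    outb(0x0A, 0x05);                 /* mask channel 1 (4 | channel) while programming */
    outb(0x0C, 0x00);                 /* clear the byte-pointer flip-flop */
    outb(0x0B, 0x49);                 /* mode: single cycle, memory -> I/O, channel 1 */

    outb(0x02, phys & 0xFF);          /* address (offset) low byte -- port from the chart */
    outb(0x02, (phys >> 8) & 0xFF);   /* address (offset) high byte */
    outb(0x83, (phys >> 16) & 0xFF);  /* page register for channel 1 */

    outb(0x03, count & 0xFF);         /* count low byte */
    outb(0x03, (count >> 8) & 0xFF);  /* count high byte */

    outb(0x0A, 0x01);                 /* unmask channel 1 -- the transfer may now run */
}

Remember from the page discussion above that the buffer must not cross a 64K page boundary.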
DMA 4 doesn't exist as a usable channel: it is used to cascade the two 8237A chips. When the first 8237A wants to do a transfer, it issues HRQ to the second chip's DRQ 4. The second chip thinks a DMA 4 transfer is requested, so it answers on the line wired to the first chip's HLDA. The first chip performs its own DMA 0-3, then tells the second chip "OK, my DMA 4 is complete", and the second chip knows the bus is free again. If this mechanism did not work, the two chips could peck at each other on the bus and the PC would screw up. :+)
The memory management facilities of the IA-32 architecture are divided into two parts: segmentation and paging. Segmentation provides a mechanism of isolating individual code, data, and stack modules so that multiple programs (or tasks) can run on the same processor without interfering with one another. Paging provides a mechanism for implementing a conventional demand-paged, virtual-memory system where sections of a program’s execution environment are mapped into physical memory as needed. Paging can also be used to provide isolation between multiple tasks. When operating in protected mode, some form of segmentation must be used. There is no mode bit to disable segmentation. The use of paging, however, is optional.
These two mechanisms (segmentation and paging) can be configured to support simple single-program (or single-task) systems, multitasking systems, or multiple-processor systems that use shared memory.
As shown in Figure 3-1, segmentation provides a mechanism for dividing the processor’s addressable memory space (called the linear address space) into smaller protected address spaces called segments. Segments can be used to hold the code, data, and stack for a program or to hold system data structures (such as a TSS or LDT). If more than one program (or task) is running on a processor, each program can be assigned its own set of segments. The processor then enforces the boundaries between these segments and insures that one program does not interfere with the execution of another program by writing into the other program’s segments.
The segmentation mechanism also allows typing of segments so that the operations that may be performed on a particular type of segment can be restricted.
All the segments in a system are contained in the processor’s linear address space. To locate a byte in a particular segment, a logical address (also called a far pointer) must be provided. A logical address consists of a segment selector and an offset. The segment selector is a unique identifier for a segment. Among other things it provides an offset into a descriptor table (such as the global descriptor table, GDT) to a data structure called a segment descriptor. Each segment has a segment descriptor, which specifies the size of the segment, the access rights and privilege level for the segment, the segment type, and the location of the first byte of the segment in the linear address space (called the base address of the segment). The offset part of the logical address is added to the base address for the segment to locate a byte within the segment. The base address plus the offset thus forms a linear address in the processor’s linear address space.
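A toy model of that computation in C (simplified stand-in structures, not the real descriptor-table layout):

#include <stdint.h>

struct segment_descriptor {
    uint32_t base;   /* location of the segment's first byte in linear space */
    uint32_t limit;  /* size of the segment */
};

/* logical address = (selector, offset); the selector indexes a descriptor table */
static uint32_t logical_to_linear(const struct segment_descriptor *table,
                                  unsigned selector_index, uint32_t offset)
{
    const struct segment_descriptor *d = &table[selector_index];
    /* a real processor would also check the limit, type, and privilege level */
    return d->base + offset;
}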
If paging is not used, the linear address space of the processor is mapped directly into the physical address space of processor. The physical address space is defined as the range of addresses that the processor can generate on its address bus.
Because multitasking computing systems commonly define a linear address space much larger than it is economically feasible to contain all at once in physical memory, some method of “virtualizing” the linear address space is needed. This virtualization of the linear address space is handled through the processor’s paging mechanism.
Paging supports a “virtual memory” environment where a large linear address space is simulated with a small amount of physical memory (RAM and ROM) and some disk storage. When using paging, each segment is divided into pages (typically 4 KBytes each in size), which are stored either in physical memory or on the disk. The operating system or executive maintains a page directory and a set of page tables to keep track of the pages. When a program (or task) attempts to access an address location in the linear address space, the processor uses the page directory and page tables to translate the linear address into a physical address and then performs the requested operation (read or write) on the memory location. If the page being accessed is not currently in physical memory, the processor interrupts execution of the program (by generating a page-fault exception). The operating system or executive then reads the page into physical memory from the disk and continues executing the program.
(Annotation from the original Chinese poster, kemin: high-level source code specifies symbolic addresses, which are already virtual; once virtual memory exists, even a fixed address written in assembly is virtual. The open questions: how is this virtual store managed, how does the processor determine that a page is not in physical memory, and what exactly happens around reading a page back in — none of which is covered above.)
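A sketch of the translation just described — splitting a 32-bit linear address under classic 4-KByte paging (the bit positions are architectural; the helper itself is illustrative):

#include <stdint.h>
#include <stdio.h>

static void split_linear_address(uint32_t linear)
{
    uint32_t dir    = (linear >> 22) & 0x3FF;  /* bits 31-22: page-directory index */
    uint32_t table  = (linear >> 12) & 0x3FF;  /* bits 21-12: page-table index */
    uint32_t offset =  linear        & 0xFFF;  /* bits 11-0 : byte within the page */

    /* the processor uses dir to pick a page-directory entry, follows it to a
       page table, uses table to pick the page frame, then adds offset */
    printf("PDE %u, PTE %u, offset 0x%03X\n",
           (unsigned)dir, (unsigned)table, (unsigned)offset);
}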
When paging is implemented properly in the operating-system or executive, the swapping of pages between physical memory and the disk is transparent to the correct execution of a program. Even programs written for 16-bit IA-32 processors can be paged (transparently) when they are run in virtual-8086 mode.
from:
http://blog.csdn.net/keminlau/archive/2008/10/19/3090337.aspx
A process (also sometimes referred to as a task) is an executing (i.e., running) instance of a program. In Linux, threads are lightweight processes that can run in parallel and share an address space (i.e., a range of memory locations) and other resources with their parent processes (i.e., the processes that created them).
A context is the contents of a CPU's registers and program counter at any point in time. A register is a small amount of very fast memory inside of a CPU (as opposed to the slower RAM main memory outside of the CPU) that is used to speed the execution of computer programs by providing quick access to commonly used values, generally those in the midst of a calculation. A program counter is a specialized register that indicates the position of the CPU in its instruction sequence and which holds either the address of the instruction being executed or the address of the next instruction to be executed, depending on the specific system.
Context switching can be described in slightly more detail as the kernel (i.e., the core of the operating system) performing the following activities with regard to processes (including threads) on the CPU: (1) suspending the progression of one process and storing the CPU's state (i.e., the context) for that process somewhere in memory, (2) retrieving the context of the next process from memory and restoring it in the CPU's registers and (3) returning to the location indicated by the program counter (i.e., returning to the line of code at which the process was interrupted) in order to resume the process.
A context switch is sometimes described as the kernel suspending execution of one process on the CPU and resuming execution of some other process that had previously been suspended. Although this wording can help clarify the concept, it can be confusing in itself because a process is, by definition, an executing instance of a program. Thus the wording "suspending progression of a process" might be preferable.
Context Switches and Mode Switches
Context switches can occur only in kernel mode. Kernel mode is a privileged mode of the CPU in which only the kernel runs and which provides access to all memory locations and all other system resources. Other programs, including applications, initially operate in user mode, but they can run portions of the kernel code via system calls. A system call is a request in a Unix-like operating system by an active process (i.e., a process currently progressing in the CPU) for a service performed by the kernel, such as input/output (I/O) or process creation (i.e., creation of a new process). I/O can be defined as any movement of information to or from the combination of the CPU and main memory (i.e. RAM), that is, communication between this combination and the computer's users (e.g., via the keyboard or mouse), its storage devices (e.g., disk or tape drives), or other computers.
The existence of these two modes in Unix-like operating systems means that a similar, but simpler, operation is necessary when a system call causes the CPU to shift to kernel mode. This is referred to as a mode switch rather than a context switch, because it does not change the current process.
Context switching is an essential feature of multitasking operating systems. A multitasking operating system is one in which multiple processes execute on a single CPU seemingly simultaneously and without interfering with each other. This illusion of concurrency is achieved by means of context switches that are occurring in rapid succession (tens or hundreds of times per second). These context switches occur as a result of processes voluntarily relinquishing their time in the CPU or as a result of the scheduler making the switch when a process has used up its CPU time slice.

A context switch can also occur as a result of a hardware interrupt, which is a signal from a hardware device (such as a keyboard, mouse, modem or system clock) to the kernel that an event (e.g., a key press, mouse movement or arrival of data from a network connection) has occurred.

Intel 80386 and higher CPUs contain hardware support for context switches. However, most modern operating systems perform software context switching, which can be used on any CPU, rather than hardware context switching, in an attempt to obtain improved performance. Software context switching was first implemented in Linux for Intel-compatible processors with the 2.4 kernel.
One major advantage claimed for software context switching is that, whereas the hardware mechanism saves almost all of the CPU state, software can be more selective and save only that portion that actually needs to be saved and reloaded. However, there is some question as to how important this really is in increasing the efficiency of context switching. Its advocates also claim that software context switching allows for the possibility of improving the switching code, thereby further enhancing efficiency, and that it permits better control over the validity of the data that is being loaded.
The Cost of Context Switching
Context switching is generally computationally intensive. That is, it requires considerable processor time, which can be on the order of nanoseconds for each of the tens or hundreds of switches per second. Thus, context switching represents a substantial cost to the system in terms of CPU time and can, in fact, be the most costly operation on an operating system.
Consequently, a major focus in the design of operating systems has been to avoid unnecessary context switching to the extent possible. However, this has not been easy to accomplish in practice. In fact, although the cost of context switching has been declining when measured in terms of the absolute amount of CPU time consumed, this appears to be due mainly to increases in CPU clock speeds rather than to improvements in the efficiency of context switching itself.
One of the many advantages claimed for Linux as compared with other operating systems, including some other Unix-like systems, is its extremely low cost of context switching and mode switching.
鏍堬細(xì)鏄釜綰跨▼鐙湁鐨勶紝淇濆瓨鍏惰繍琛岀姸鎬佸拰灞閮ㄨ嚜鍔ㄥ彉閲忕殑銆傛爤鍦ㄧ嚎紼嬪紑濮嬬殑鏃跺欏垵濮嬪寲錛屾瘡涓嚎紼嬬殑鏍堜簰鐩哥嫭绔嬶紝鍥犳錛屾爤鏄thread safe鐨勩傛瘡涓跡銆錛嬶紜瀵硅薄鐨勬暟鎹垚鍛樹篃瀛樺湪鍦ㄦ爤涓紝姣忎釜鍑芥暟閮芥湁鑷繁鐨勬爤錛屾爤琚敤鏉ュ湪鍑芥暟涔嬮棿浼犻掑弬鏁般傛搷浣滅郴緇熷湪鍒囨崲綰跨▼鐨勬椂鍊欎細(xì)鑷姩鐨勫垏鎹㈡爤錛屽氨鏄垏鎹€錛籌汲錛忥譏錛籌及瀵勫瓨鍣ㄣ傛爤絀洪棿涓嶉渶瑕佸湪楂樼駭璇█閲岄潰鏄懼紡鐨勫垎閰嶅拰閲婃斁銆?/p>
鍫嗗拰鏍堢殑鍖哄埆
涓銆侀澶囩煡璇嗏旂▼搴忕殑鍐呭瓨鍒嗛厤
涓涓敱c/C++緙栬瘧鐨勭▼搴忓崰鐢ㄧ殑鍐呭瓨鍒嗕負(fù)浠ヤ笅鍑犱釜閮ㄥ垎錛?br>1銆佹爤鍖猴紙stack錛夆?鐢辯紪璇戝櫒鑷姩鍒嗛厤閲婃斁錛屽瓨鏀懼嚱鏁扮殑鍙傛暟鍊鹼紝灞閮ㄥ彉閲忕殑鍊肩瓑銆傚叾鎿嶄綔鏂瑰紡綾諱技浜庢暟鎹粨鏋勪腑鐨勬爤銆?/p>
2銆佸爢鍖猴紙heap錛?鈥?涓鑸敱紼嬪簭鍛樺垎閰嶉噴鏀撅紝鑻ョ▼搴忓憳涓嶉噴鏀撅紝紼嬪簭緇撴潫鏃跺彲鑳界敱O(jiān)S鍥炴敹銆傛敞鎰忓畠涓庢暟鎹粨鏋勪腑鐨勫爢鏄袱鍥炰簨錛屽垎閰嶆柟寮忓掓槸綾諱技浜庨摼琛ㄣ?/p>
3銆佸叏灞鍖猴紙闈?rùn)鎬佸尯錛夛紙static錛夆旓紝鍏ㄥ眬鍙橀噺鍜岄潤(rùn)鎬佸彉閲忕殑瀛樺偍鏄斁鍦ㄤ竴鍧楃殑錛屽垵濮嬪寲鐨勫叏灞鍙橀噺鍜岄潤(rùn)鎬佸彉閲忓湪涓鍧楀尯鍩燂紝鏈垵濮嬪寲鐨勫叏灞鍙橀噺鍜屾湭鍒濆鍖栫殑闈?rùn)鎬佸彉閲忓湪鐩擱偦鐨勫彟涓鍧楀尯鍩熴?- 紼嬪簭緇撴潫鍚庣敱緋葷粺閲婃斁銆?/p>
4銆佹枃瀛楀父閲忓尯 鈥斿父閲忓瓧絎︿覆灝辨槸鏀懼湪榪欓噷鐨勩傜▼搴忕粨鏉熷悗鐢辯郴緇熼噴鏀俱?/p>
5銆佺▼搴忎唬鐮佸尯鈥斿瓨鏀懼嚱鏁頒綋鐨勪簩榪涘埗浠g爜銆?/p>
II. Example program

//main.cpp
int a = 0;                // global initialized area
char *p1;                 // global uninitialized area
main()
{
    int b;                // stack
    char s[] = "abc";     // stack
    char *p2;             // stack
    char *p3 = "123456";  // "123456\0" is in the constant area; p3 itself is on the stack
    static int c = 0;     // global (static) initialized area
    p1 = (char *)malloc(10);
    p2 = (char *)malloc(20);
    // the 10- and 20-byte regions just allocated lie on the heap
    strcpy(p1, "123456"); // "123456\0" is placed in the constant area; the compiler
                          // may merge it with the "123456" that p3 points to
}
III. Heap and stack in theory

2.1 How they are requested

stack:
Allocated automatically by the system. For example, declaring a local variable int b; in a function makes the system automatically reserve space for b on the stack.

heap:
The programmer must request it explicitly and state its size: in C with the malloc function, e.g. p1 = (char *)malloc(10); in C++ with the new operator, e.g. p2 = new char[10];
Note, however, that p1 and p2 themselves are on the stack.

2.2 How the system responds to a request

Stack: as long as the stack's remaining space is larger than the requested space, the system provides the memory; otherwise an exception is raised reporting stack overflow.

Heap: first, be aware that the operating system keeps a linked list recording free memory addresses. When the system receives a program's request, it walks this list looking for the first heap node whose space is larger than the requested space, removes that node from the free list, and assigns the node's space to the program. In addition, on most systems the size of this allocation is recorded at the block's first address, so that the delete statement in the code can release the block correctly. Also, since the node found is not necessarily exactly the requested size, the system automatically puts the surplus part back onto the free list.
2.3 Limits on request size

Stack: under Windows, the stack is a data structure that grows toward lower addresses and occupies a contiguous region of memory. That is, the address of the stack top and the maximum stack size are fixed in advance by the system. Under Windows, the stack size is 2M (or possibly 1M; it is a constant determined at compile time); if the requested space exceeds the stack's remaining space, overflow is reported. So the space obtainable from the stack is fairly small.

Heap: the heap is a data structure that grows toward higher addresses and occupies a discontinuous region of memory. This is because the system records the free memory addresses in a linked list, which is naturally discontinuous, and the list is traversed from lower addresses toward higher ones. The heap's size is limited by the effective virtual memory in the computer system. So the heap obtains space more flexibly, and can obtain more of it.

2.4 Comparison of allocation efficiency:

The stack is allocated automatically by the system, so it is fast, but the programmer has no control over it.
The heap is memory allocated by new; it is generally slower and prone to memory fragmentation, but it is the most convenient to use.

Also, under Windows, the best method is to allocate memory with VirtualAlloc: it works neither on the heap nor on the stack, but reserves a block of memory directly in the process's address space. It is the least convenient to use, but it is fast and also the most flexible.

2.5 What the heap and the stack store

Stack: during a function call, the first thing pushed is the address of the next instruction after the call in the calling function (the next executable statement after the call statement), then the function's parameters (in most C compilers the parameters are pushed right to left), and then the function's local variables. Note that static variables are not pushed.
When this function call ends, the local variables are popped first, then the parameters; finally the stack top pointer points at the address stored first, namely the next instruction in the calling function, and the program continues running from that point.

Heap: generally, one byte at the head of a heap block is used to store the block's size. The concrete contents of the heap are arranged by the programmer.

2.6 Comparison of access efficiency
char s1[] = "aaaaaaaaaaaaaaa";
char *s2 = "bbbbbbbbbbbbbbbbb";
aaaaaaaaaaa is assigned at run time, while bbbbbbbbbbb is determined at compile time;
but in later accesses, the array on the stack is faster than the string the pointer points to (e.g., on the heap). For example:

void main()
{
char a = 1;
char c[] = "1234567890";
char *p ="1234567890";
a = c[1];
a = p[1];
return;
}
The corresponding assembly code:
10: a = c[1];
00401067 8A 4D F1 mov cl,byte ptr [ebp-0Fh]
0040106A 88 4D FC mov byte ptr [ebp-4],cl
11: a = p[1];
0040106D 8B 55 EC mov edx,dword ptr [ebp-14h]
00401070 8A 42 01 mov al,byte ptr [edx+1]
00401073 88 45 FC mov byte ptr [ebp-4],al
The first form reads the string element straight into register cl, while the second must first load the pointer value into edx and then fetch the character through edx, which is clearly slower.

2.7 Summary:

The difference between heap and stack can be seen through the following analogy:

Using the stack is like eating at a restaurant: you only order (make the request), pay, and eat (use it); when you are full you leave, without bothering with preparation such as washing and chopping the vegetables, or cleanup such as washing the dishes and scrubbing the pot. Its advantage is speed, but the freedom is small.

Using the heap is like cooking your favorite dishes yourself: it is more trouble, but it matches your own taste, and the freedom is large.
Below is another article, which sums things up better than the one above:

The connection and difference between heap and stack

On BBS forums, the distinction between heap and stack seems to be an eternal topic; evidently beginners are often confused about it, so I decided to take it on first.

First, an example:
void f() { int* p=new int[5]; }
This one short line involves both the heap and the stack. Seeing new, we should immediately think: a block of heap memory has been allocated. And the pointer p? It occupies a block of stack memory. So the statement means: a pointer p, pointing at a block of heap memory, is stored in stack memory. The program first determines the size to allocate on the heap, then calls operator new to allocate it; the first address of the block is returned and put on the stack. Under VC6 the assembly code is as follows:
00401028 push 14h
0040102A call operator new (00401060)
0040102F add esp,4
00401032 mov dword ptr [ebp-8],eax
00401035 mov eax,dword ptr [ebp-8]
00401038 mov dword ptr [ebp-4],eax
Here, for simplicity, we did not release the memory. How should it be released then? delete p? No, wrong: it should be delete []p, which tells the compiler "I am deleting an array", so VC6 will use the corresponding cookie information to do the memory-release work.

OK, back to our topic: what exactly is the difference between heap and stack?

The main differences are the following:

1. They are managed differently;
2. Their space sizes differ;
3. Whether they produce fragmentation differs;
4. Their growth directions differ;
5. Their allocation methods differ;
6. Their allocation efficiency differs.

Management: the stack is managed automatically by the compiler, needing no manual control from us; for the heap, the release work is controlled by the programmer, which easily produces memory leaks.

Space size: generally speaking, on a 32-bit system heap memory can reach 4G of space, so from this angle there is almost no limit on heap memory. The stack, however, normally has a definite size; for example, under VC6 the default stack size is 1M (I think; I don't remember exactly). Of course, we can change it:

Open the project and follow the menus: Project->Setting->Link, choose Output under Category, then set the maximum stack value and commit under Reserve.

Note: the minimum reserve value is 4 bytes. commit is kept in the page file of virtual memory; setting it large makes the stack reserve a large value, which may increase memory overhead and startup time.

Fragmentation: for the heap, frequent new/delete is bound to make the memory space discontinuous, producing lots of fragments and lowering program efficiency. For the stack this problem does not exist, because the stack is a first-in, last-out queue: entries correspond one to one so strictly that a memory block can never pop out of the middle of the stack — before it is popped, the later-pushed stack contents above it have already been popped. For details, consult a data structures text; we will not discuss it item by item here.

Growth direction: the heap grows upward, toward increasing memory addresses; the stack grows downward, toward decreasing memory addresses.

Allocation method: the heap is always allocated dynamically; there is no statically allocated heap. The stack has two allocation modes: static allocation and dynamic allocation. Static allocation is done by the compiler, for example the allocation of local variables. Dynamic allocation is done by the alloca function, but the stack's dynamic allocation is unlike the heap's: it is released by the compiler, with no manual work needed from us.

Allocation efficiency: the stack is a data structure provided by the machine; the computer supports the stack at the hardware level, with a dedicated register holding the stack's address and dedicated instructions executing push and pop, which makes the stack quite efficient. The heap is provided by the C/C++ library, and its mechanism is very complex: to allocate a block of memory, the library function searches heap memory for usable space of sufficient size according to a certain algorithm (for the specific algorithms see a data structures/operating systems text); if there is no space of sufficient size (perhaps because there is too much memory fragmentation), it may invoke system facilities to enlarge the memory of the program's data segment, gaining a chance to obtain enough memory, and then it returns. Clearly, the heap's efficiency is much lower than the stack's.

From this we can see that, compared with the stack, the heap easily produces large amounts of memory fragmentation because of heavy new/delete use; with no dedicated system support it is very inefficient; and since it may trigger switches between user mode and kernel mode, requesting memory becomes even more expensive. So the stack is the most widely used thing in a program: even function calls are completed with the stack, where the parameters, return address, EBP, and local variables of a call are all stored the stack way. Therefore, we recommend using the stack as much as possible, rather than the heap.

Although the stack has all these advantages, it is not as flexible as the heap; when a large amount of memory must be allocated, the heap is still the better choice.

Whether heap or stack, out-of-bounds access must be prevented (unless you overrun deliberately), because the result is either a program crash or destruction of the program's heap and stack structures, producing unexpected results. Even if none of this happens while your program runs, stay careful: it may crash at any moment, and debugging then is quite difficult :) Oh, and one more thing: if someone lumps the two together as "堆栈" (heap-stack), they mean the stack, not the heap. Clear now?
from:
http://blog.chinaunix.net/u2/76292/showart_1327414.html
http://hi.baidu.com/54wangjun/blog/item/d1b4a74424d5934f510ffedd.html
II. The interrupt method

The processor's high speed versus the low speed of input/output devices is a contradiction, and an important problem that device management must solve. To raise overall efficiency, it is necessary to reduce the CPU's direct involvement in data transfers under programmed (direct-control) I/O.

Under interrupt-driven I/O, data transfer between the CPU and an I/O device proceeds in the following steps:

(1) When a process needs data, it issues an instruction to start the input/output device preparing the data;

(2) After issuing the start instruction, the process gives up the processor and waits for the related I/O operation to complete. Meanwhile, the process scheduler dispatches another ready process to use the processor.

(3) When the I/O operation completes, the I/O device controller sends an interrupt signal to the processor over the interrupt request line. On receiving it, the processor transfers to the pre-installed interrupt handler, which does the appropriate processing for the data transfer.

(4) The process that obtained its data enters the ready state. At some later moment, the process scheduler will select it to continue working.

Pros and cons of the interrupt method

Interrupt-driven I/O raises processor utilization, and it supports multiprogramming and parallel operation of I/O devices.

Still, the interrupt method has some problems. First, modern computer systems are usually configured with a wide variety of input/output devices. If all these I/O devices operate in parallel via interrupt handling, the sharp growth in the number of interrupts can leave the CPU unable to respond to them, and data can be lost.

Second, if the I/O controller's data buffer is fairly small, an interrupt occurs whenever the buffer fills. Then, during a data transfer, interrupts happen often, and this eats up a great deal of CPU processing time.
III. Direct memory access (DMA)

Direct memory access means that data is transferred in whole blocks directly between memory and an I/O device.

Technical features of DMA

DMA has two technical features: first, direct transfer; second, block transfer.

Direct transfer means that while a block of data moves between memory and the I/O device, no intermediate CPU intervention is needed: the CPU only issues a "transfer block of data" command to the device when the process starts, and then learns through an interrupt whether the process has ended and whether the next operation is ready.

How DMA works

(1) When a process asks a device for input, the CPU loads the starting memory address for the input data and the number of bytes to transfer into the DMA controller's memory address register and transfer byte counter, respectively.
(2) The process that issued the transfer request enters the waiting state; the CPU instruction stream being executed is suspended for the moment. The process scheduler dispatches another process to occupy the CPU.
(3) The input device keeps stealing CPU work cycles, continuously writing the data from its data buffer register into memory, until all the requested bytes have been transferred.
(4) When the DMA controller has transferred all the bytes, it raises an interrupt signal over the interrupt request line. On receiving the interrupt signal, the CPU enters the interrupt handler for follow-up processing.
(5) After interrupt handling ends, the CPU returns to the interrupted process, or switches to a new process context, and continues execution.
In computer programming, a callback is executable code that is passed as an argument to other code. It allows a lower-level software layer to call a subroutine (or function) defined in a higher-level layer.
However, while technically accurate, this might not be the most illustrative explanation. Think of it as an "In case of fire, break glass" subroutine. Many computer programs tend to be written such that they expect a certain set of possibilities at any given moment. If "Thing That Was Expected", then "Do something", otherwise, "Do something else." is a common theme. However, there are many situations in which events (such as fire) could happen at any time. Rather than checking for them at each possible step ("Thing that was expected OR Things are on fire"), it is easier to have a system which detects a number of events, and will call the appropriate function upon said event (this also keeps us from having to write programs like "Thing that was expected OR Things are on fire OR Nuclear meltdown OR alien invasion OR the dead rising from the grave OR...etc., etc.) Instead, a callback routine is a sort of insurance policy. If zombies attack, call this function. If the user moves their mouse over an icon, call HighlightIcon, and so forth.
Usually, there is a framework in which a series of events (some condition being met) causes the running framework (be it a generic library or something unique to the program) to call a registered chunk of code through some pre-registered function reference (typically, a handle or a function pointer). The events may be anything from user input (such as mouse or keyboard input), to network activity (callbacks are frequently used as message handlers for new network sessions), to an internal operating system event (such as a POSIX-style signal). The concept is to develop a piece of code that can be registered within some framework (be it a GUI toolkit, network library, etc.) and that will serve as the handler for the condition stated at registration. How the flow of control is passed between the underlying framework and the registered callback function is specific to the framework itself.
To understand the motivation for using callbacks, consider the problem of a network server. At any given point in time, it may have an internal state machine that is currently at a point in which it is dealing with one very specific communication session, not necessarily expecting new participants. As a host, it could be dealing with all the name exchange and handshakes and pleasantries, but no real way of dealing with the next dinner party guest that walks through the door. One way to deal with this is for this server to live by a state machine in which it rejects new connections until the current one is dealt with...not very robust (What if the other end goes away unexpectedly?) and not very scalable (Would you really want to make other clients wait (or more likely, keep retrying to connect) until it's their turn?) Instead, it's easier to have some sort of management process that spins off a new thread (or process) to deal with the new connection. Rather than writing programs that keep dealing with all of the possible resource contention problems that could come of this, or all of the details involved in socket code (your desired platform may be more straight-forward than others, but one of your design goals may be cross-platform compatibility), many have opted to use more generic frameworks that will handle such details in exchange for providing a reference such that the underlying framework can call it if the registered event occurs.
The following code in C demonstrates the use of callbacks for the specific case of dealing with a POSIX-style signal (in this case SIGUSR1).
#include <stdio.h>
#include <signal.h>

/* the callback: invoked by the system when SIGUSR1 arrives */
void sig(int signum)
{
    printf("Received signal number %d!\n", signum);
}

int main(int argc, char *argv[])
{
    signal(SIGUSR1, sig);  /* register the callback for SIGUSR1 */
    while (1) {}
    return 0;
}
The while loop will keep this example from doing anything interesting, but it will give you plenty of time to send a signal to this process. (If you're on a unix-like system, try a "kill -USR1 <pid>" to the process ID associated with this sample program. No matter how or when you send it, the callback should respond.)
The form of a callback varies among programming languages.
Callback functions are also frequently used as a means to handle exceptions arising within the low level function, as a way to enable side-effects in response to some condition, or as a way to gather operational statistics in the course of a larger computation. Interrupt handlers in an operating system respond to hardware conditions, signal handlers of a process are triggered by the operating system, and event handlers process the asynchronous input a program receives.
A pure callback function is one which is purely functional (always returns the same value given the same inputs) and free of observable side-effects. Some uses of callbacks require pure callback functions to operate correctly.
A special case of a callback is called a predicate callback, or just predicate for short. This is a pure callback function which accepts a single input value and returns a Boolean value. These types of callbacks are useful for filtering collections of values by some condition.
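As an illustration, a predicate callback in C can drive a generic filter such as the sketch below (illustrative code, not a standard library facility):

#include <stdbool.h>
#include <stddef.h>

typedef bool (*predicate)(int value);  /* the predicate callback type */

static bool is_even(int value) { return value % 2 == 0; }

/* keep only the values for which pred() returns true; returns the new count */
static size_t filter(int *values, size_t count, predicate pred)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; i++)
        if (pred(values[i]))
            values[kept++] = values[i];
    return kept;
}

/* usage: size_t n = filter(numbers, 8, is_even); the first n entries remain */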
In concurrent programming a critical section is a piece of code that accesses a shared resource (data structure or device) that must not be concurrently accessed by more than one thread of execution. A critical section will usually terminate in fixed time, and a thread, task or process will only have to wait a fixed time to enter it (i.e. bounded waiting). Some synchronization mechanism is required at the entry and exit of the critical section to ensure exclusive use, for example a semaphore.
By carefully controlling which variables are modified inside and outside the critical section (usually, by accessing important state only from within), concurrent access to that state is prevented. A critical section is typically used when a multithreaded program must update multiple related variables without a separate thread making conflicting changes to that data. In a related situation, a critical section may be used to ensure a shared resource, for example a printer, can only be accessed by one process at a time.
How critical sections are implemented varies among operating systems.
The simplest method is to prevent any change of processor control inside the critical section. On uni-processor systems, this can be done by disabling interrupts on entry into the critical section, avoiding system calls that can cause a context switch while inside the section and restoring interrupts to their previous state on exit. Any thread of execution entering any critical section anywhere in the system will, with this implementation, prevent any other thread, including an interrupt, from getting the CPU and therefore from entering any other critical section or, indeed, any code whatsoever, until the original thread leaves its critical section.
This brute-force approach can be improved upon by using semaphores. To enter a critical section, a thread must obtain a semaphore, which it releases on leaving the section. Other threads are prevented from entering the critical section at the same time as the original thread, but are free to gain control of the CPU and execute other code, including other critical sections that are protected by different semaphores.
Some confusion exists in the literature about the relationship between different critical sections in the same program. In general, a resource that must be protected from concurrent access may be accessed by several pieces of code. Each piece must be guarded by a common semaphore. Is each piece now a critical section or are all the pieces guarded by the same semaphore in aggregate a single critical section? This confusion is evident in definitions of a critical section such as "... a piece of code that can only be executed by one process or thread at a time". This only works if all access to a protected resource is contained in one "piece of code", which requires either the definition of a piece of code or the code itself to be somewhat contrived.
Application-level critical sections reside in the memory range of the process and are usually modifiable by the process itself. This is called a user-space object because the program run by the user (as opposed to the kernel) can modify and interact with the object. However the functions called may jump to kernel-space code to register the user-space object with the kernel.
Example Code For Critical Sections with POSIX pthread library
/* Sample C/C++, Unix/Linux */
#include <pthread.h>
/* This is the critical section object (statically allocated). */
static pthread_mutex_t cs_mutex = PTHREAD_MUTEX_INITIALIZER;
void f()
{
/* Enter the critical section -- other threads are locked out */
pthread_mutex_lock( &cs_mutex );
/* Do some thread-safe processing! */
/*Leave the critical section -- other threads can now pthread_mutex_lock() */
pthread_mutex_unlock( &cs_mutex );
}
Example Code For Critical Sections with Win32 API
/* Sample C/C++, Windows, link to kernel32.dll */
#include <windows.h>

static CRITICAL_SECTION cs; /* This is the critical section object -- once initialized,
                               it cannot be moved in memory */
/* If you program in OOP, declare this in your class */

void f()
{
    /* Enter the critical section -- other threads are locked out */
    EnterCriticalSection(&cs);

    /* Do some thread-safe processing! */

    /* Leave the critical section -- other threads can now EnterCriticalSection() */
    LeaveCriticalSection(&cs);
}

int main()
{
    /* Initialize the critical section before entering multi-threaded context. */
    InitializeCriticalSection(&cs);

    f();

    /* Release system object when all finished -- usually at the end of the cleanup code */
    DeleteCriticalSection(&cs);
    return 0;
}
Note that on Windows NT (not 9x/ME), the function TryEnterCriticalSection() can be used to attempt to enter the critical section. This function returns immediately so that the thread can do other things if it fails to enter the critical section (usually due to another thread having locked it). With the pthreads library, the equivalent function is pthread_mutex_trylock(). Note that the use of a CriticalSection is not the same as a Win32 Mutex, which is an object used for inter-process synchronization. A Win32 CriticalSection is for intra-process synchronization (and is much faster as far as lock times), however it cannot be shared across processes.
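A small sketch of the pthreads "try" variant mentioned above: attempt the lock and, if another thread holds it, do something else instead of blocking:

#include <pthread.h>

static pthread_mutex_t try_mutex = PTHREAD_MUTEX_INITIALIZER;

void poll_critical_section(void)
{
    if (pthread_mutex_trylock(&try_mutex) == 0) {
        /* We own the lock -- do the thread-safe processing. */
        pthread_mutex_unlock(&try_mutex);
    } else {
        /* trylock returned immediately -- do other work and retry later. */
    }
}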
Typically, critical sections prevent process and thread migration between processors and the preemption of processes and threads by interrupts and other processes and threads.
Critical sections often allow nesting. Nesting allows multiple critical sections to be entered and exited at little cost.
If the scheduler interrupts the current process or thread in a critical section, the scheduler will either allow the process or thread to run to completion of the critical section, or it will schedule the process or thread for another complete quantum. The scheduler will not migrate the process or thread to another processor, and it will not schedule another process or thread to run while the current process or thread is in a critical section.
Similarly, if an interrupt occurs in a critical section, the interrupt's information is recorded for future processing, and execution is returned to the process or thread in the critical section. Once the critical section is exited, and in some cases the scheduled quantum completes, the pending interrupt will be executed.
Since critical sections may execute only on the processor on which they are entered, synchronization is only required within the executing processor. This allows critical sections to be entered and exited at almost zero cost. No interprocessor synchronization is required, only instruction stream synchronization. Most processors provide the required amount of synchronization by the simple act of interrupting the current execution state. This allows critical sections in most cases to be nothing more than a per processor count of critical sections entered.
Performance enhancements include executing pending interrupts at the exit of all critical sections and allowing the scheduler to run at the exit of all critical sections. Furthermore, pending interrupts may be transferred to other processors for execution.
Critical sections should not be used as a long-lived locking primitive. They should be short enough that the critical section will be entered, executed, and exited without any interrupts occurring, neither from hardware nor from the scheduler.
Kernel Level Critical Sections are the base of the software lockout issue.
Critical Section documentation on the MSDN Library homepage: http://msdn2.microsoft.com/en-us/library/ms682530.aspx
1.3 What is the main advantage of multiprogramming?
Answer: Multiprogramming makes efficient use of the CPU by overlapping the demands for the CPU and its I/O devices from various users. It attempts to increase CPU utilization by always having something for the CPU to execute.
1.5 In a multiprogramming and time-sharing environment, several users share the system simultaneously. This situation can result in various security problems.
a. What are two such problems?
b. Can we ensure the same degree of security in a time-shared machine as we have in a
dedicated machine? Explain your answer.
Answer:
a. Stealing or copying one’s programs or data; using system resources (CPU, memory, disk space, peripherals) without proper accounting.
b. Probably not, since any protection scheme devised by humans can inevitably be broken by a human, and the more complex the scheme, the more difficult it is to feel
confident of its correct implementation.
1.9 Describe the differences between symmetric and asymmetric multiprocessing. What are three advantages and one disadvantage of multiprocessor systems?
Answer: Symmetric multiprocessing treats all processors as equals, and I/O can be processed on any CPU. Asymmetric multiprocessing has one master CPU and the remainder of the CPUs are slaves. The master distributes tasks among the slaves, and I/O is usually done by the master only. Multiprocessors can save money by not duplicating power supplies, housings, and peripherals. They can execute programs more quickly and can have increased reliability. They are also more complex in both hardware and software than uniprocessor systems.
1.10 What is the main difficulty that a programmer must overcome in writing an operating system for a real-time environment?
Answer: The main difficulty is keeping the operating system within the fixed time constraints of a real-time system. If the system does not complete a task in a certain time frame, it may cause a breakdown of the entire system it is running on. Therefore when writing an operating system for a real-time system, the writer must be sure that his scheduling schemes don't allow response time to exceed the time constraint.
2.1 Prefetching is a method of overlapping the I/O of a job with that job’s own computation.
The idea is simple. After a read operation completes and the job is about to start operating on the data, the input device is instructed to begin the next read immediately. The CPU and input device are then both busy. With luck, by the time the job is ready for the next data item, the input device will have finished reading that data item. The CPU can then begin processing the newly read data, while the input device starts to read the following data.
A similar idea can be used for output. In this case, the job creates data that are put into a buffer until an output device can accept them. Compare the prefetching scheme with the spooling scheme, where the CPU overlaps the input of one job with the computation and output of other jobs.
Answer: Prefetching is a user-based activity, while spooling is a system-based activity.
Spooling is a much more effective way of overlapping I/O and CPU operations.
2.3 What are the differences between a trap and an interrupt? What is the use of each function?
An interrupt is a hardware-generated change-of-flow within the system. An interrupt handler is summoned to deal with the cause of the interrupt; control is then returned to the interrupted context and instruction.
A trap is a software-generated interrupt.
An interrupt can be used to signal the completion of an I/O operation, obviating the need for device polling.
A trap can be used to call operating system routines or to catch arithmetic errors.
From the standpoint of real-time data acquisition, we need to start a thread that receives Message Middleware messages and then does whatever processing the scenario requires. The thread-creation code is shown below:
// for compilers which have it, we should use C RTL function for thread
// creation instead of Win32 API one because otherwise we will have memory
// leaks if the thread uses C RTL (and most threads do)
#if defined(__VISUALC__) || \
(defined(__BORLANDC__) && (__BORLANDC__ >= 0x500)) || \
(defined(__GNUG__) && defined(__MSVCRT__))
typedef unsigned (__stdcall *RtlThreadStart)(void *);
m_hThread = (HANDLE)_beginthreadex(NULL, 0,
(RtlThreadStart)
wxThreadInternal::WinThreadStart,
thread, CREATE_SUSPENDED,
(unsigned int *)&m_tid);
#else // compiler doesn't have _beginthreadex
m_hThread = ::CreateThread
(
NULL, // default security
0, // default stack size
(LPTHREAD_START_ROUTINE) // thread entry point
wxThreadInternal::WinThreadStart, // the function that runs under thread
(LPVOID)thread, // parameter
CREATE_SUSPENDED, // flags
&m_tid // [out] thread id
);
#endif // _beginthreadex/CreateThread
Note: there should be a function definition before these lines, e.g.:
DWORD wxThreadInternal::WinThreadStart(wxThread *thread)