I. Livelock

If transaction T1 has locked data item R and transaction T2 then requests a lock on R, T2 waits. T3 also requests a lock on R; if, when T1 releases its lock, the system grants T3's request first, T2 keeps waiting. If later requesters keep being granted the lock ahead of it, T2 may wait forever. This is livelock.

http://goog-perftools.sourceforge.net/doc/tcmalloc.html

TCMalloc : Thread-Caching Malloc

Sanjay Ghemawat, Paul Menage <opensource@google.com>

Motivation

TCMalloc is faster than the glibc 2.3 malloc (available as a separate library called ptmalloc2) and other mallocs that I have tested. ptmalloc2 takes approximately 300 nanoseconds to execute a malloc/free pair on a 2.8 GHz P4 (for small objects). The TCMalloc implementation takes approximately 50 nanoseconds for the same operation pair. Speed is important for a malloc implementation because if malloc is not fast enough, application writers are inclined to write their own custom free lists on top of malloc. This can lead to extra complexity, and more memory usage unless the application writer is very careful to appropriately size the free lists and scavenge idle objects out of the free list.

TCMalloc also reduces lock contention for multi-threaded programs. For small objects, there is virtually zero contention. For large objects, TCMalloc tries to use fine grained and efficient spinlocks. ptmalloc2 also reduces lock contention by using per-thread arenas, but there is a big problem with ptmalloc2's use of per-thread arenas: in ptmalloc2, memory can never move from one arena to another. This can lead to huge amounts of wasted space. For example, in one Google application, the first phase would allocate approximately 300MB of memory for its data structures. When the first phase finished, a second phase would be started in the same address space. If this second phase was assigned a different arena than the one used by the first phase, this phase would not reuse any of the memory left after the first phase and would add another 300MB to the address space. Similar memory blowup problems were also noticed in other applications.

Another benefit of TCMalloc is space-efficient representation of small objects. For example, N 8-byte objects can be allocated while using space approximately 8N * 1.01 bytes, i.e., a one-percent space overhead. ptmalloc2 uses a four-byte header for each object and (I think) rounds up the size to a multiple of 8 bytes and ends up using 16N bytes.

Usage

To use TCMalloc, just link tcmalloc into your application via the "-ltcmalloc" linker flag. You can use tcmalloc in applications you didn't compile yourself, by using LD_PRELOAD:

$ LD_PRELOAD="/usr/lib/libtcmalloc.so"

LD_PRELOAD is tricky, and we don't necessarily recommend this mode of usage. TCMalloc includes a heap checker and heap profiler as well. If you'd rather link in a version of TCMalloc that does not include the heap profiler and checker (perhaps to reduce binary size for a static binary), you can link in libtcmalloc_minimal instead.

Overview

TCMalloc assigns each thread a thread-local cache. Small allocations are satisfied from the thread-local cache. Objects are moved from central data structures into a thread-local cache as needed, and periodic garbage collections are used to migrate memory back from a thread-local cache into the central data structures.

TCMalloc treats objects with size <= 32K ("small" objects) differently from larger objects. Large objects are allocated directly from the central heap using a page-level allocator (a page is a 4K aligned region of memory). I.e., a large object is always page-aligned and occupies an integral number of pages. A run of pages can be carved up into a sequence of small objects, each equally sized. For example a run of one page (4K) can be carved up into 32 objects of size 128 bytes each.

Small Object Allocation

Each small object size maps to one of approximately 170 allocatable size-classes. For example, all allocations in the range 961 to 1024 bytes are rounded up to 1024. The size-classes are spaced so that small sizes are separated by 8 bytes, larger sizes by 16 bytes, even larger sizes by 32 bytes, and so forth. The maximal spacing (for sizes >= ~2K) is 256 bytes.

A thread cache contains a singly linked list of free objects per size-class. If the free list is empty: (1) We fetch a bunch of objects from a central free list for this size-class (the central free list is shared by all threads). (2) Place them in the thread-local free list. (3) Return one of the newly fetched objects to the application.

If the central free list is also empty: (1) We allocate a run of pages from the central page allocator. (2) Split the run into a set of objects of this size-class. (3) Place the new objects on the central free list. (4) As before, move some of these objects to the thread-local free list.
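To make the rounding concrete, here is a minimal sketch in C of a size-class mapping with this shape. The cutoffs are invented for illustration; TCMalloc's real class boundaries are generated differently:

#include <stddef.h>

/* Round a small request up to its size-class: tighter spacing for small
   sizes, wider spacing for larger ones (boundaries here are made up). */
static size_t round_to_size_class(size_t n)
{
    if (n <= 128)  return (n + 7)   & ~(size_t)7;    /* 8-byte spacing */
    if (n <= 256)  return (n + 15)  & ~(size_t)15;   /* 16-byte spacing */
    if (n <= 512)  return (n + 31)  & ~(size_t)31;   /* 32-byte spacing */
    if (n <= 1024) return (n + 63)  & ~(size_t)63;   /* 64-byte spacing */
    if (n <= 2048) return (n + 127) & ~(size_t)127;  /* 128-byte spacing */
    return (n + 255) & ~(size_t)255;                 /* 256-byte spacing for >= ~2K */
}

With these made-up cutoffs, every request from 961 to 1024 bytes rounds to 1024, matching the example above.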
Large Object Allocation
A large object size (> 32K) is rounded up to a page size (4K) and is handled by a central page heap. The central page heap is again an array of free lists. For k < 256, the kth entry is a free list of runs that consist of k pages. The 256th entry is a free list of runs that have length >= 256 pages.

An allocation for k pages is satisfied by looking in the kth free list. If that free list is empty, we look in the next free list, and so forth. Eventually, we look in the last free list if necessary. If that fails, we fetch memory from the system (using sbrk, mmap, or by mapping in portions of /dev/mem).

If an allocation for k pages is satisfied by a run of pages of length > k, the remainder of the run is re-inserted back into the appropriate free list in the page heap.

Spans

The heap managed by TCMalloc consists of a set of pages. A run of contiguous pages is represented by a Span object. A span can either be allocated, or free. If free, the span is one of the entries in a page heap linked-list. If allocated, it is either a large object that has been handed off to the application, or a run of pages that have been split up into a sequence of small objects. If split into small objects, the size-class of the objects is recorded in the span.

A central array indexed by page number can be used to find the span to which a page belongs. For example, span a below occupies 2 pages, span b occupies 1 page, span c occupies 5 pages and span d occupies 3 pages. [figure omitted]
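The central array itself can be sketched like this (a hypothetical layout assuming 4K pages and a 32-bit address space, so the page number is just the address shifted right by 12; a real implementation might use a radix tree instead of one flat array):

#include <stdint.h>

struct Span;                            /* represents a run of contiguous pages */
static struct Span *span_map[1 << 20];  /* one slot per 4K page in a 32-bit space */

/* every page covered by a span holds a pointer back to that span */
static struct Span *span_of(const void *ptr)
{
    uintptr_t page = (uintptr_t)ptr >> 12;  /* 4K pages: page number = addr / 4096 */
    return span_map[page];
}

The deallocation path described next uses exactly this kind of lookup to get from a freed pointer back to its span.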
Deallocation
When an object is deallocated, we compute its page number and look it up in the central array to find the corresponding span object. The span tells us whether or not the object is small, and its size-class if it is small. If the object is small, we insert it into the appropriate free list in the current thread's thread cache. If the thread cache now exceeds a predetermined size (2MB by default), we run a garbage collector that moves unused objects from the thread cache into central free lists.
If the object is large, the span tells us the range of pages covered by the object. Suppose this range is [p,q]. We also lookup the spans for pages p-1 and q+1. If either of these neighboring spans are free, we coalesce them with the [p,q] span. The resulting span is inserted into the appropriate free list in the page heap.

Central Free Lists for Small Objects
As mentioned before, we keep a central free list for each size-class. Each central free list is organized as a two-level data structure: a set of spans, and a linked list of free objects per span.

An object is allocated from a central free list by removing the first entry from the linked list of some span. (If all spans have empty linked lists, a suitably sized span is first allocated from the central page heap.) An object is returned to a central free list by adding it to the linked list of its containing span. If the linked list length now equals the total number of small objects in the span, this span is now completely free and is returned to the page heap.

Garbage Collection of Thread Caches
A thread cache is garbage collected when the combined size of all objects in the cache exceeds 2MB. The garbage collection threshold is automatically decreased as the number of threads increases so that we don't waste an inordinate amount of memory in a program with lots of threads.
We walk over all free lists in the cache and move some number of objects from the free list to the corresponding central list. The number of objects to be moved from a free list is determined using a per-list low-water-mark L. L records the minimum length of the list since the last garbage collection. Note that we could have shortened the list by L objects at the last garbage collection without requiring any extra accesses to the central list. We use this past history as a predictor of future accesses and move L/2 objects from the thread cache free list to the corresponding central free list. This algorithm has the nice property that if a thread stops using a particular size, all objects of that size will quickly move from the thread cache to the central free list where they can be used by other threads.
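A minimal sketch of that scavenging rule in C, with invented names (lowater plays the role of the low-water-mark L, and release_to_central() is an assumed helper that returns one object to the central free list):

#include <stddef.h>

struct ThreadFreeList {
    void *head;     /* singly linked free list; each object stores the next pointer */
    int   length;   /* current number of objects on the list */
    int   lowater;  /* minimum length seen since the last garbage collection */
};

extern void release_to_central(void *obj);  /* assumed helper */

static void scavenge(struct ThreadFreeList *list)
{
    int move = list->lowater / 2;           /* move L/2 objects, per the rule above */
    while (move-- > 0 && list->head != NULL) {
        void *obj = list->head;
        list->head = *(void **)obj;         /* pop the first free object */
        list->length--;
        release_to_central(obj);
    }
    list->lowater = list->length;           /* start tracking afresh for next time */
}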
Performance Notes

PTMalloc2 unittest

The PTMalloc2 package (now part of glibc) contains a unittest program t-test1.c. This forks a number of threads and performs a series of allocations and deallocations in each thread; the threads do not communicate other than by synchronization in the memory allocator.

t-test1 (included in google-perftools/tests/tcmalloc, and compiled as ptmalloc_unittest1) was run with a varying number of threads (1-20) and maximum allocation sizes (64 bytes - 32 Kbytes). These tests were run on a 2.4GHz dual Xeon system with hyper-threading enabled, using Linux glibc-2.3.2 from RedHat 9, with one million operations per thread in each test. In each case, the test was run once normally, and once with LD_PRELOAD=libtcmalloc.so.

The graphs below show the performance of TCMalloc vs PTMalloc2 for several different metrics. Firstly, total operations (millions) per elapsed second vs max allocation size, for varying numbers of threads. The raw data used to generate these graphs (the output of the "time" utility) is available in t-test1.times.txt.

[performance graphs omitted]

Next, operations (millions) per second of CPU time vs number of threads, for max allocation size 64 bytes - 128 Kbytes.

[performance graphs omitted]
Here we see again that TCMalloc is both more consistent and more efficient than PTMalloc2. For max allocation sizes <32K, TCMalloc typically achieves ~2-2.5 million ops per second of CPU time with a large number of threads, whereas PTMalloc achieves generally 0.5-1 million ops per second of CPU time, with a lot of cases achieving much less than this figure. Above 32K max allocation size, TCMalloc drops to 1-1.5 million ops per second of CPU time, and PTMalloc drops almost to zero for large numbers of threads (i.e. with PTMalloc, lots of CPU time is being burned spinning waiting for locks in the heavily multi-threaded case).
Caveats

For some systems, TCMalloc may not work correctly with applications that aren't linked against libpthread.so (or the equivalent on your OS). It should work on Linux using glibc 2.3, but other OS/libc combinations have not been tested.
TCMalloc may be somewhat more memory hungry than other mallocs, though it tends not to have the huge blowups that can happen with other mallocs. In particular, at startup TCMalloc allocates approximately 6 MB of memory. It would be easy to roll a specialized version that trades a little bit of speed for more space efficiency.
TCMalloc currently does not return any memory to the system.
Don't try to load TCMalloc into a running binary (e.g., using JNI in Java programs). The binary will have allocated some objects using the system malloc, and may try to pass them to TCMalloc for deallocation. TCMalloc will not be able to handle such objects.
The DMA controller is another pair of chips on your motherboard (usually Intel 8237A-5 chips) that allows you (the programmer) to offload data transfers between I/O boards and memory. DMA stands for 'Direct Memory Access'.
DMA can work: memory->I/O, I/O->memory. The memory->memory transfer doesn't work. It doesn't matter, because ISA DMA is slow as hell and thus is unusable for that. Furthermore, using DMA for zeroing out memory would massacre the contents of memory caches.
What about caches and DMA? The L1 and L2 caches work absolutely transparently. When DMA writes to memory, the caches automatically load or at least invalidate the data going into memory. When DMA reads memory, the caches supply any unwritten bytes, so the peripheral receives the new values rather than stale ones.
There are signals DREQ and DACK for each channel: when a peripheral wants a byte/word moved into memory, it raises its DREQ line; once the controller owns the bus, DACK is generated and the data is transferred from the peripheral into memory.
In the other direction, everything is the same, but first the byte/word is fetched from the memory and then DACK is generated and the peripheral takes the data.
The DMA controller has only an 8-bit address counter inside. There is an external ALS573 counter for each chip, which makes it look to the programmer as if the DMA controller had 16 bits of address counter per channel. There are 8 more bits of address per channel in the so-called page register in the LS612, which unfortunately does not increment the way the bits in the ALS573 do. All these 24 bits together can address 16,777,216 distinct addresses.
Recapitulation: for each channel, independently, you see 16 bits of auto-incrementing counter, and 8 bits of page register which doesn't increment.
The difference between 16-bit and 8-bit DMA channels is that the address bits for 16-bit channels are wired one bit to the left on the address bus, so every address is 2 times bigger; the lowest bit is always 0. The highest bit of the page register would fall into bit 24, which does not exist on ISA, so it is left unconnected. For 16-bit channels, the bus control logic is wired so that every single DMA transfer generates a 16-bit cycle, i.e., the ISA device puts 16 bits onto the bus at a time. I don't know what happens if you use a 16-bit DMA channel with an XT peripheral. I guess it could work, but only more slowly.
8-bit DMA: increments by 1, cycles inside 65536 bytes, addresses 16MB, moves 8 bits at a time.

16-bit DMA: increments by 2, goes only over even addresses, cycles inside 131072 bytes, addresses 16MB, moves 16 bits at a time. It uses a 16-bit ISA I/O cycle, so it takes fewer ticks to make one move than the 8-bit DMA.
An example of DMA usage would be the Sound Blaster's ability to play samples in the background. The CPU sets up the sound card and the DMA. When the DMA is told to 'go', it simply shovels the data from RAM to the card. Since this is done off-CPU, the CPU can do other things while the data is being transferred.
Enough basics. Here's how you program the DMA chip.
When you want to start a DMA transfer, you need to know several things:
Restriction #1 is rather easy to get around. Simply transfer the first block, and when the transfer is done, send the next block.
For those of you not familiar with pages, I'll try to explain.
Picture the first 16MB region of memory in your system. It is divided into 256 pages of 64K or 128 pages of 128K. Every page starts at a multiple of 65536 or 131072. They are numbered from 0 to 255 or from 0 to 127.
In plain English, the page is the highest 8 bits or 7 bits of the absolute 24 bit address of our memory location. The offset is the lower 16 or 17 bits of the absolute 24 bit address.
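In C terms, the split described here looks like the following (a tiny illustrative helper for 8-bit channels, not part of any real API):

#include <stdint.h>

/* split a 24-bit physical address into DMA page and offset (8-bit channel) */
static void split_dma_address(uint32_t phys, uint8_t *page, uint16_t *offset)
{
    *page   = (uint8_t)(phys >> 16);  /* highest 8 bits -> page register value */
    *offset = (uint16_t)phys;         /* lowest 16 bits -> address counter value */
}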
Now that we know where our data is, we need to find the length.
The DMA has a little quirk regarding length. The true length transferred is actually the value you send plus 1: if you send a zero length to the DMA, it transfers one byte or word, whereas if you send 0xFFFF, it transfers 64K or 128K. I guess they made it this way because it would be pretty senseless to program the DMA to do nothing (a length of zero), and doing it this way allows a full 64K or 128K span of data to be transferred.
Now that you know what to send to the DMA, how do you actually start it? This enters us into the different DMA channels.
The following chart describes each channel and its corresponding port numbers:
DMA Channel | Page | Address | Count |
0 | 87h | 0h | 1h |
1 | 83h | 2h | 3h |
2 | 81h | 4h | 5h |
3 | 82h | 6h | 7h |
4 | 8Fh | C0h | C2h |
5 | 8Bh | C4h | C6h |
6 | 89h | C8h | CAh |
7 | 8Ah | CCh | CEh |
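Putting the chart together with the earlier rules (mask the channel, clear the flip-flop, set the mode, then write offset, page, and count), here is a rough C sketch of starting a single-cycle transfer on channel 1. It is an illustration only: outb(port, value) stands in for your environment's port-write routine (e.g., outportb() on DOS compilers), and mode byte 49h means single-cycle, memory-to-peripheral, channel 1:

#include <stdint.h>

extern void outb(uint16_t port, uint8_t value);  /* assumed port-write helper */

static void start_dma_channel1(uint32_t phys, uint16_t length)
{
    uint16_t count = length - 1;      /* the controller moves count+1 bytes */

    outb(0x0A, 0x05);                 /* mask channel 1 (4 | channel) while programming */
    outb(0x0C, 0x00);                 /* clear the byte-pointer flip-flop */
    outb(0x0B, 0x49);                 /* mode: single cycle, memory -> I/O, channel 1 */

    outb(0x02, phys & 0xFF);          /* address (offset) low byte -- port from the chart */
    outb(0x02, (phys >> 8) & 0xFF);   /* address (offset) high byte */
    outb(0x83, (phys >> 16) & 0xFF);  /* page register for channel 1 */

    outb(0x03, count & 0xFF);         /* count low byte */
    outb(0x03, (count >> 8) & 0xFF);  /* count high byte */

    outb(0x0A, 0x01);                 /* unmask channel 1 -- the transfer may now run */
}

Remember from the page discussion above that the buffer must not cross a 64K page boundary.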
DMA 4 doesn't exist as a usable channel: it is used to cascade the two 8237A chips. When the first 8237A wants to do a transfer, it issues HRQ to the second chip's DRQ 4. The second chip thinks a DMA 4 transfer is requested, so it answers on the line wired to the first chip's HLDA. The first chip performs its own DMA 0-3, then tells the second chip "OK, my DMA 4 is complete", and the second chip knows the bus is free again. If this mechanism did not work, the two chips could peck at each other on the bus and the PC would screw up. :+)
The memory management facilities of the IA-32 architecture are divided into two parts: segmentation and paging. Segmentation provides a mechanism of isolating individual code, data, and stack modules so that multiple programs (or tasks) can run on the same processor without interfering with one another. Paging provides a mechanism for implementing a conventional demand-paged, virtual-memory system where sections of a program’s execution environment are mapped into physical memory as needed. Paging can also be used to provide isolation between multiple tasks. When operating in protected mode, some form of segmentation must be used. There is no mode bit to disable segmentation. The use of paging, however, is optional.
These two mechanisms (segmentation and paging) can be configured to support simple single-program (or single-task) systems, multitasking systems, or multiple-processor systems that use shared memory.
As shown in Figure 3-1, segmentation provides a mechanism for dividing the processor’s addressable memory space (called the linear address space) into smaller protected address spaces called segments. Segments can be used to hold the code, data, and stack for a program or to hold system data structures (such as a TSS or LDT). If more than one program (or task) is running on a processor, each program can be assigned its own set of segments. The processor then enforces the boundaries between these segments and insures that one program does not interfere with the execution of another program by writing into the other program’s segments.
The segmentation mechanism also allows typing of segments so that the operations that may be performed on a particular type of segment can be restricted.
All the segments in a system are contained in the processor’s linear address space. To locate a byte in a particular segment, a logical address (also called a far pointer) must be provided. A logical address consists of a segment selector and an offset. The segment selector is a unique identifier for a segment. Among other things it provides an offset into a descriptor table (such as the global descriptor table, GDT) to a data structure called a segment descriptor. Each segment has a segment descriptor, which specifies the size of the segment, the access rights and privilege level for the segment, the segment type, and the location of the first byte of the segment in the linear address space (called the base address of the segment). The offset part of the logical address is added to the base address for the segment to locate a byte within the segment. The base address plus the offset thus forms a linear address in the processor’s linear address space.
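A toy model of that computation in C (simplified stand-in structures, not the real descriptor-table layout):

#include <stdint.h>

struct segment_descriptor {
    uint32_t base;   /* location of the segment's first byte in linear space */
    uint32_t limit;  /* size of the segment */
};

/* logical address = (selector, offset); the selector indexes a descriptor table */
static uint32_t logical_to_linear(const struct segment_descriptor *table,
                                  unsigned selector_index, uint32_t offset)
{
    const struct segment_descriptor *d = &table[selector_index];
    /* a real processor would also check the limit, type, and privilege level */
    return d->base + offset;
}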
If paging is not used, the linear address space of the processor is mapped directly into the physical address space of processor. The physical address space is defined as the range of addresses that the processor can generate on its address bus.
Because multitasking computing systems commonly define a linear address space much larger than it is economically feasible to contain all at once in physical memory, some method of “virtualizing” the linear address space is needed. This virtualization of the linear address space is handled through the processor’s paging mechanism.
Paging supports a “virtual memory” environment where a large linear address space is simulated with a small amount of physical memory (RAM and ROM) and some disk storage. When using paging, each segment is divided into pages (typically 4 KBytes each in size), which are stored either in physical memory or on the disk. The operating system or executive maintains a page directory and a set of page tables to keep track of the pages. When a program (or task) attempts to access an address location in the linear address space, the processor uses the page directory and page tables to translate the linear address into a physical address and then performs the requested operation (read or write) on the memory location. If the page being accessed is not currently in physical memory, the processor interrupts execution of the program (by generating a page-fault exception). The operating system or executive then reads the page into physical memory from the disk and continues executing the program.
(Annotation from the original Chinese poster, kemin: high-level source code specifies symbolic addresses, which are already virtual; once virtual memory exists, even a fixed address written in assembly is virtual. The open questions: how is this virtual store managed, how does the processor determine that a page is not in physical memory, and what exactly happens around reading a page back in — none of which is covered above.)
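A sketch of the translation just described — splitting a 32-bit linear address under classic 4-KByte paging (the bit positions are architectural; the helper itself is illustrative):

#include <stdint.h>
#include <stdio.h>

static void split_linear_address(uint32_t linear)
{
    uint32_t dir    = (linear >> 22) & 0x3FF;  /* bits 31-22: page-directory index */
    uint32_t table  = (linear >> 12) & 0x3FF;  /* bits 21-12: page-table index */
    uint32_t offset =  linear        & 0xFFF;  /* bits 11-0 : byte within the page */

    /* the processor uses dir to pick a page-directory entry, follows it to a
       page table, uses table to pick the page frame, then adds offset */
    printf("PDE %u, PTE %u, offset 0x%03X\n",
           (unsigned)dir, (unsigned)table, (unsigned)offset);
}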
When paging is implemented properly in the operating-system or executive, the swapping of pages between physical memory and the disk is transparent to the correct execution of a program. Even programs written for 16-bit IA-32 processors can be paged (transparently) when they are run in virtual-8086 mode.
from:
http://blog.csdn.net/keminlau/archive/2008/10/19/3090337.aspx
A process (also sometimes referred to as a task) is an executing (i.e., running) instance of a program. In Linux, threads are lightweight processes that can run in parallel and share an address space (i.e., a range of memory locations) and other resources with their parent processes (i.e., the processes that created them).
A context is the contents of a CPU's registers and program counter at any point in time. A register is a small amount of very fast memory inside of a CPU (as opposed to the slower RAM main memory outside of the CPU) that is used to speed the execution of computer programs by providing quick access to commonly used values, generally those in the midst of a calculation. A program counter is a specialized register that indicates the position of the CPU in its instruction sequence and which holds either the address of the instruction being executed or the address of the next instruction to be executed, depending on the specific system.
Context switching can be described in slightly more detail as the kernel (i.e., the core of the operating system) performing the following activities with regard to processes (including threads) on the CPU: (1) suspending the progression of one process and storing the CPU's state (i.e., the context) for that process somewhere in memory, (2) retrieving the context of the next process from memory and restoring it in the CPU's registers and (3) returning to the location indicated by the program counter (i.e., returning to the line of code at which the process was interrupted) in order to resume the process.
A context switch is sometimes described as the kernel suspending execution of one process on the CPU and resuming execution of some other process that had previously been suspended. Although this wording can help clarify the concept, it can be confusing in itself because a process is, by definition, an executing instance of a program. Thus the wording "suspending progression of a process" might be preferable.
Context Switches and Mode Switches
Context switches can occur only in kernel mode. Kernel mode is a privileged mode of the CPU in which only the kernel runs and which provides access to all memory locations and all other system resources. Other programs, including applications, initially operate in user mode, but they can run portions of the kernel code via system calls. A system call is a request in a Unix-like operating system by an active process (i.e., a process currently progressing in the CPU) for a service performed by the kernel, such as input/output (I/O) or process creation (i.e., creation of a new process). I/O can be defined as any movement of information to or from the combination of the CPU and main memory (i.e. RAM), that is, communication between this combination and the computer's users (e.g., via the keyboard or mouse), its storage devices (e.g., disk or tape drives), or other computers.
The existence of these two modes in Unix-like operating systems means that a similar, but simpler, operation is necessary when a system call causes the CPU to shift to kernel mode. This is referred to as a mode switch rather than a context switch, because it does not change the current process.
Context switching is an essential feature of multitasking operating systems. A multitasking operating system is one in which multiple processes execute on a single CPU seemingly simultaneously and without interfering with each other. This illusion of concurrency is achieved by means of context switches that are occurring in rapid succession (tens or hundreds of times per second). These context switches occur as a result of processes voluntarily relinquishing their time in the CPU or as a result of the scheduler making the switch when a process has used up its CPU time slice.

A context switch can also occur as a result of a hardware interrupt, which is a signal from a hardware device (such as a keyboard, mouse, modem or system clock) to the kernel that an event (e.g., a key press, mouse movement or arrival of data from a network connection) has occurred.

Intel 80386 and higher CPUs contain hardware support for context switches. However, most modern operating systems perform software context switching, which can be used on any CPU, rather than hardware context switching, in an attempt to obtain improved performance. Software context switching was first implemented in Linux for Intel-compatible processors with the 2.4 kernel.
One major advantage claimed for software context switching is that, whereas the hardware mechanism saves almost all of the CPU state, software can be more selective and save only that portion that actually needs to be saved and reloaded. However, there is some question as to how important this really is in increasing the efficiency of context switching. Its advocates also claim that software context switching allows for the possibility of improving the switching code, thereby further enhancing efficiency, and that it permits better control over the validity of the data that is being loaded.
The Cost of Context Switching
Context switching is generally computationally intensive. That is, it requires considerable processor time, which can be on the order of nanoseconds for each of the tens or hundreds of switches per second. Thus, context switching represents a substantial cost to the system in terms of CPU time and can, in fact, be the most costly operation on an operating system.
Consequently, a major focus in the design of operating systems has been to avoid unnecessary context switching to the extent possible. However, this has not been easy to accomplish in practice. In fact, although the cost of context switching has been declining when measured in terms of the absolute amount of CPU time consumed, this appears to be due mainly to increases in CPU clock speeds rather than to improvements in the efficiency of context switching itself.
One of the many advantages claimed for Linux as compared with other operating systems, including some other Unix-like systems, is its extremely low cost of context switching and mode switching.
鏍堬細(xì)鏄釜綰跨▼鐙湁鐨勶紝淇濆瓨鍏惰繍琛岀姸鎬佸拰灞閮ㄨ嚜鍔ㄥ彉閲忕殑銆傛爤鍦ㄧ嚎紼嬪紑濮嬬殑鏃跺欏垵濮嬪寲錛屾瘡涓嚎紼嬬殑鏍堜簰鐩哥嫭绔嬶紝鍥犳錛屾爤鏄thread safe鐨勩傛瘡涓跡銆錛嬶紜瀵硅薄鐨勬暟鎹垚鍛樹篃瀛樺湪鍦ㄦ爤涓紝姣忎釜鍑芥暟閮芥湁鑷繁鐨勬爤錛屾爤琚敤鏉ュ湪鍑芥暟涔嬮棿浼犻掑弬鏁般傛搷浣滅郴緇熷湪鍒囨崲綰跨▼鐨勬椂鍊欎細(xì)鑷姩鐨勫垏鎹㈡爤錛屽氨鏄垏鎹€錛籌汲錛忥譏錛籌及瀵勫瓨鍣ㄣ傛爤絀洪棿涓嶉渶瑕佸湪楂樼駭璇█閲岄潰鏄懼紡鐨勫垎閰嶅拰閲婃斁銆?/p>
鍫嗗拰鏍堢殑鍖哄埆
涓銆侀澶囩煡璇嗏旂▼搴忕殑鍐呭瓨鍒嗛厤
涓涓敱c/C++緙栬瘧鐨勭▼搴忓崰鐢ㄧ殑鍐呭瓨鍒嗕負(fù)浠ヤ笅鍑犱釜閮ㄥ垎錛?br>1銆佹爤鍖猴紙stack錛夆?鐢辯紪璇戝櫒鑷姩鍒嗛厤閲婃斁錛屽瓨鏀懼嚱鏁扮殑鍙傛暟鍊鹼紝灞閮ㄥ彉閲忕殑鍊肩瓑銆傚叾鎿嶄綔鏂瑰紡綾諱技浜庢暟鎹粨鏋勪腑鐨勬爤銆?/p>
2銆佸爢鍖猴紙heap錛?鈥?涓鑸敱紼嬪簭鍛樺垎閰嶉噴鏀撅紝鑻ョ▼搴忓憳涓嶉噴鏀撅紝紼嬪簭緇撴潫鏃跺彲鑳界敱O(jiān)S鍥炴敹銆傛敞鎰忓畠涓庢暟鎹粨鏋勪腑鐨勫爢鏄袱鍥炰簨錛屽垎閰嶆柟寮忓掓槸綾諱技浜庨摼琛ㄣ?/p>
3銆佸叏灞鍖猴紙闈?rùn)鎬佸尯錛夛紙static錛夆旓紝鍏ㄥ眬鍙橀噺鍜岄潤(rùn)鎬佸彉閲忕殑瀛樺偍鏄斁鍦ㄤ竴鍧楃殑錛屽垵濮嬪寲鐨勫叏灞鍙橀噺鍜岄潤(rùn)鎬佸彉閲忓湪涓鍧楀尯鍩燂紝鏈垵濮嬪寲鐨勫叏灞鍙橀噺鍜屾湭鍒濆鍖栫殑闈?rùn)鎬佸彉閲忓湪鐩擱偦鐨勫彟涓鍧楀尯鍩熴?- 紼嬪簭緇撴潫鍚庣敱緋葷粺閲婃斁銆?/p>
4銆佹枃瀛楀父閲忓尯 鈥斿父閲忓瓧絎︿覆灝辨槸鏀懼湪榪欓噷鐨勩傜▼搴忕粨鏉熷悗鐢辯郴緇熼噴鏀俱?/p>
5銆佺▼搴忎唬鐮佸尯鈥斿瓨鏀懼嚱鏁頒綋鐨勪簩榪涘埗浠g爜銆?/p>
II. Example program

//main.cpp
int a = 0;                // global initialized area
char *p1;                 // global uninitialized area
main()
{
    int b;                // stack
    char s[] = "abc";     // stack
    char *p2;             // stack
    char *p3 = "123456";  // "123456\0" is in the constant area; p3 itself is on the stack
    static int c = 0;     // global (static) initialized area
    p1 = (char *)malloc(10);
    p2 = (char *)malloc(20);
    // the 10- and 20-byte regions just allocated lie on the heap
    strcpy(p1, "123456"); // "123456\0" is placed in the constant area; the compiler
                          // may merge it with the "123456" that p3 points to
}
III. Heap and stack in theory

2.1 How they are requested

stack:
Allocated automatically by the system. For example, declaring a local variable int b; in a function makes the system automatically reserve space for b on the stack.

heap:
The programmer must request it explicitly and state its size: in C with the malloc function, e.g. p1 = (char *)malloc(10); in C++ with the new operator, e.g. p2 = new char[10];
Note, however, that p1 and p2 themselves are on the stack.

2.2 How the system responds to a request

Stack: as long as the stack's remaining space is larger than the requested space, the system provides the memory; otherwise an exception is raised reporting stack overflow.

Heap: first, be aware that the operating system keeps a linked list recording free memory addresses. When the system receives a program's request, it walks this list looking for the first heap node whose space is larger than the requested space, removes that node from the free list, and assigns the node's space to the program. In addition, on most systems the size of this allocation is recorded at the block's first address, so that the delete statement in the code can release the block correctly. Also, since the node found is not necessarily exactly the requested size, the system automatically puts the surplus part back onto the free list.
2.3 Limits on request size

Stack: under Windows, the stack is a data structure that grows toward lower addresses and occupies a contiguous region of memory. That is, the address of the stack top and the maximum stack size are fixed in advance by the system. Under Windows, the stack size is 2M (or possibly 1M; it is a constant determined at compile time); if the requested space exceeds the stack's remaining space, overflow is reported. So the space obtainable from the stack is fairly small.

Heap: the heap is a data structure that grows toward higher addresses and occupies a discontinuous region of memory. This is because the system records the free memory addresses in a linked list, which is naturally discontinuous, and the list is traversed from lower addresses toward higher ones. The heap's size is limited by the effective virtual memory in the computer system. So the heap obtains space more flexibly, and can obtain more of it.

2.4 Comparison of allocation efficiency:

The stack is allocated automatically by the system, so it is fast, but the programmer has no control over it.
The heap is memory allocated by new; it is generally slower and prone to memory fragmentation, but it is the most convenient to use.

Also, under Windows, the best method is to allocate memory with VirtualAlloc: it works neither on the heap nor on the stack, but reserves a block of memory directly in the process's address space. It is the least convenient to use, but it is fast and also the most flexible.

2.5 What the heap and the stack store

Stack: during a function call, the first thing pushed is the address of the next instruction after the call in the calling function (the next executable statement after the call statement), then the function's parameters (in most C compilers the parameters are pushed right to left), and then the function's local variables. Note that static variables are not pushed.
When this function call ends, the local variables are popped first, then the parameters; finally the stack top pointer points at the address stored first, namely the next instruction in the calling function, and the program continues running from that point.

Heap: generally, one byte at the head of a heap block is used to store the block's size. The concrete contents of the heap are arranged by the programmer.

2.6 Comparison of access efficiency
char s1[] = "aaaaaaaaaaaaaaa";
char *s2 = "bbbbbbbbbbbbbbbbb";
aaaaaaaaaaa is assigned at run time, while bbbbbbbbbbb is determined at compile time;
but in later accesses, the array on the stack is faster than the string the pointer points to (e.g., on the heap). For example:

void main()
{
char a = 1;
char c[] = "1234567890";
char *p ="1234567890";
a = c[1];
a = p[1];
return;
}
The corresponding assembly code:
10: a = c[1];
00401067 8A 4D F1 mov cl,byte ptr [ebp-0Fh]
0040106A 88 4D FC mov byte ptr [ebp-4],cl
11: a = p[1];
0040106D 8B 55 EC mov edx,dword ptr [ebp-14h]
00401070 8A 42 01 mov al,byte ptr [edx+1]
00401073 88 45 FC mov byte ptr [ebp-4],al
The first form reads the string element straight into register cl, while the second must first load the pointer value into edx and then fetch the character through edx, which is clearly slower.

2.7 Summary:

The difference between heap and stack can be seen through the following analogy:

Using the stack is like eating at a restaurant: you only order (make the request), pay, and eat (use it); when you are full you leave, without bothering with preparation such as washing and chopping the vegetables, or cleanup such as washing the dishes and scrubbing the pot. Its advantage is speed, but the freedom is small.

Using the heap is like cooking your favorite dishes yourself: it is more trouble, but it matches your own taste, and the freedom is large.
Below is another article, which sums things up better than the one above:

The connection and difference between heap and stack

On BBS forums, the distinction between heap and stack seems to be an eternal topic; evidently beginners are often confused about it, so I decided to take it on first.

First, an example:
void f() { int* p=new int[5]; }
This one short line involves both the heap and the stack. Seeing new, we should immediately think: a block of heap memory has been allocated. And the pointer p? It occupies a block of stack memory. So the statement means: a pointer p, pointing at a block of heap memory, is stored in stack memory. The program first determines the size to allocate on the heap, then calls operator new to allocate it; the first address of the block is returned and put on the stack. Under VC6 the assembly code is as follows:
00401028 push 14h
0040102A call operator new (00401060)
0040102F add esp,4
00401032 mov dword ptr [ebp-8],eax
00401035 mov eax,dword ptr [ebp-8]
00401038 mov dword ptr [ebp-4],eax
Here, for simplicity, we did not release the memory. How should it be released then? delete p? No, wrong: it should be delete []p, which tells the compiler "I am deleting an array", so VC6 will use the corresponding cookie information to do the memory-release work.

OK, back to our topic: what exactly is the difference between heap and stack?

The main differences are the following:

1. They are managed differently;
2. Their space sizes differ;
3. Whether they produce fragmentation differs;
4. Their growth directions differ;
5. Their allocation methods differ;
6. Their allocation efficiency differs.

Management: the stack is managed automatically by the compiler, needing no manual control from us; for the heap, the release work is controlled by the programmer, which easily produces memory leaks.

Space size: generally speaking, on a 32-bit system heap memory can reach 4G of space, so from this angle there is almost no limit on heap memory. The stack, however, normally has a definite size; for example, under VC6 the default stack size is 1M (I think; I don't remember exactly). Of course, we can change it:

Open the project and follow the menus: Project->Setting->Link, choose Output under Category, then set the maximum stack value and commit under Reserve.

Note: the minimum reserve value is 4 bytes. commit is kept in the page file of virtual memory; setting it large makes the stack reserve a large value, which may increase memory overhead and startup time.

Fragmentation: for the heap, frequent new/delete is bound to make the memory space discontinuous, producing lots of fragments and lowering program efficiency. For the stack this problem does not exist, because the stack is a first-in, last-out queue: entries correspond one to one so strictly that a memory block can never pop out of the middle of the stack — before it is popped, the later-pushed stack contents above it have already been popped. For details, consult a data structures text; we will not discuss it item by item here.

Growth direction: the heap grows upward, toward increasing memory addresses; the stack grows downward, toward decreasing memory addresses.

Allocation method: the heap is always allocated dynamically; there is no statically allocated heap. The stack has two allocation modes: static allocation and dynamic allocation. Static allocation is done by the compiler, for example the allocation of local variables. Dynamic allocation is done by the alloca function, but the stack's dynamic allocation is unlike the heap's: it is released by the compiler, with no manual work needed from us.

Allocation efficiency: the stack is a data structure provided by the machine; the computer supports the stack at the hardware level, with a dedicated register holding the stack's address and dedicated instructions executing push and pop, which makes the stack quite efficient. The heap is provided by the C/C++ library, and its mechanism is very complex: to allocate a block of memory, the library function searches heap memory for usable space of sufficient size according to a certain algorithm (for the specific algorithms see a data structures/operating systems text); if there is no space of sufficient size (perhaps because there is too much memory fragmentation), it may invoke system facilities to enlarge the memory of the program's data segment, gaining a chance to obtain enough memory, and then it returns. Clearly, the heap's efficiency is much lower than the stack's.

From this we can see that, compared with the stack, the heap easily produces large amounts of memory fragmentation because of heavy new/delete use; with no dedicated system support it is very inefficient; and since it may trigger switches between user mode and kernel mode, requesting memory becomes even more expensive. So the stack is the most widely used thing in a program: even function calls are completed with the stack, where the parameters, return address, EBP, and local variables of a call are all stored the stack way. Therefore, we recommend using the stack as much as possible, rather than the heap.

Although the stack has all these advantages, it is not as flexible as the heap; when a large amount of memory must be allocated, the heap is still the better choice.

Whether heap or stack, out-of-bounds access must be prevented (unless you overrun deliberately), because the result is either a program crash or destruction of the program's heap and stack structures, producing unexpected results. Even if none of this happens while your program runs, stay careful: it may crash at any moment, and debugging then is quite difficult :) Oh, and one more thing: if someone lumps the two together as "堆栈" (heap-stack), they mean the stack, not the heap. Clear now?
from:
http://blog.chinaunix.net/u2/76292/showart_1327414.html
http://hi.baidu.com/54wangjun/blog/item/d1b4a74424d5934f510ffedd.html
II. The interrupt method

The processor's high speed versus the low speed of input/output devices is a contradiction, and an important problem that device management must solve. To raise overall efficiency, it is necessary to reduce the CPU's direct involvement in data transfers under programmed (direct-control) I/O.

Under interrupt-driven I/O, data transfer between the CPU and an I/O device proceeds in the following steps:

(1) When a process needs data, it issues an instruction to start the input/output device preparing the data;

(2) After issuing the start instruction, the process gives up the processor and waits for the related I/O operation to complete. Meanwhile, the process scheduler dispatches another ready process to use the processor.

(3) When the I/O operation completes, the I/O device controller sends an interrupt signal to the processor over the interrupt request line. On receiving it, the processor transfers to the pre-installed interrupt handler, which does the appropriate processing for the data transfer.

(4) The process that obtained its data enters the ready state. At some later moment, the process scheduler will select it to continue working.

Pros and cons of the interrupt method

Interrupt-driven I/O raises processor utilization, and it supports multiprogramming and parallel operation of I/O devices.

Still, the interrupt method has some problems. First, modern computer systems are usually configured with a wide variety of input/output devices. If all these I/O devices operate in parallel via interrupt handling, the sharp growth in the number of interrupts can leave the CPU unable to respond to them, and data can be lost.

Second, if the I/O controller's data buffer is fairly small, an interrupt occurs whenever the buffer fills. Then, during a data transfer, interrupts happen often, and this eats up a great deal of CPU processing time.
III. Direct memory access (DMA)

Direct memory access means that data is transferred in whole blocks directly between memory and an I/O device.

Technical features of DMA

DMA has two technical features: first, direct transfer; second, block transfer.

Direct transfer means that while a block of data moves between memory and the I/O device, no intermediate CPU intervention is needed: the CPU only issues a "transfer block of data" command to the device when the process starts, and then learns through an interrupt whether the process has ended and whether the next operation is ready.

How DMA works

(1) When a process asks a device for input, the CPU loads the starting memory address for the input data and the number of bytes to transfer into the DMA controller's memory address register and transfer byte counter, respectively.
(2) The process that issued the transfer request enters the waiting state; the CPU instruction stream being executed is suspended for the moment. The process scheduler dispatches another process to occupy the CPU.
(3) The input device keeps stealing CPU work cycles, continuously writing the data from its data buffer register into memory, until all the requested bytes have been transferred.
(4) When the DMA controller has transferred all the bytes, it raises an interrupt signal over the interrupt request line. On receiving the interrupt signal, the CPU enters the interrupt handler for follow-up processing.
(5) After interrupt handling ends, the CPU returns to the interrupted process, or switches to a new process context, and continues execution.
In computer programming, a callback is executable code that is passed as an argument to other code. It allows a lower-level software layer to call a subroutine (or function) defined in a higher-level layer.
However, while technically accurate, this might not be the most illustrative explanation. Think of it as an "In case of fire, break glass" subroutine. Many computer programs tend to be written such that they expect a certain set of possibilities at any given moment. If "Thing That Was Expected", then "Do something", otherwise, "Do something else." is a common theme. However, there are many situations in which events (such as fire) could happen at any time. Rather than checking for them at each possible step ("Thing that was expected OR Things are on fire"), it is easier to have a system which detects a number of events, and will call the appropriate function upon said event (this also keeps us from having to write programs like "Thing that was expected OR Things are on fire OR Nuclear meltdown OR alien invasion OR the dead rising from the grave OR...etc., etc.) Instead, a callback routine is a sort of insurance policy. If zombies attack, call this function. If the user moves their mouse over an icon, call HighlightIcon, and so forth.
Usually, there is a framework in which a series of events (some condition being met) causes the running framework (be it a generic library or something unique to the program) to call a registered chunk of code through some pre-registered function reference (typically, a handle or a function pointer). The events may be anything from user input (such as mouse or keyboard input), to network activity (callbacks are frequently used as message handlers for new network sessions), to an internal operating system event (such as a POSIX-style signal). The concept is to develop a piece of code that can be registered within some framework (be it a GUI toolkit, network library, etc.) and that will serve as the handler for the condition stated at registration. How the flow of control is passed between the underlying framework and the registered callback function is specific to the framework itself.
To understand the motivation for using callbacks, consider the problem of a network server. At any given point in time, it may have an internal state machine that is currently at a point in which it is dealing with one very specific communication session, not necessarily expecting new participants. As a host, it could be dealing with all the name exchange and handshakes and pleasantries, but no real way of dealing with the next dinner party guest that walks through the door. One way to deal with this is for this server to live by a state machine in which it rejects new connections until the current one is dealt with...not very robust (What if the other end goes away unexpectedly?) and not very scalable (Would you really want to make other clients wait (or more likely, keep retrying to connect) until it's their turn?) Instead, it's easier to have some sort of management process that spins off a new thread (or process) to deal with the new connection. Rather than writing programs that keep dealing with all of the possible resource contention problems that could come of this, or all of the details involved in socket code (your desired platform may be more straight-forward than others, but one of your design goals may be cross-platform compatibility), many have opted to use more generic frameworks that will handle such details in exchange for providing a reference such that the underlying framework can call it if the registered event occurs.
The following code in C demonstrates the use of callbacks for the specific case of dealing with a POSIX-style signal (in this case SIGUSR1).
#include <stdio.h>
#include <signal.h>

/* the callback: invoked by the system when SIGUSR1 arrives */
void sig(int signum)
{
    printf("Received signal number %d!\n", signum);
}

int main(int argc, char *argv[])
{
    signal(SIGUSR1, sig);  /* register the callback for SIGUSR1 */
    while (1) {}
    return 0;
}
The while loop will keep this example from doing anything interesting, but it will give you plenty of time to send a signal to this process. (If you're on a unix-like system, try a "kill -USR1 <pid>" to the process ID associated with this sample program. No matter how or when you send it, the callback should respond.)
The form of a callback varies among programming languages.
Callback functions are also frequently used as a means to handle exceptions arising within the low level function, as a way to enable side-effects in response to some condition, or as a way to gather operational statistics in the course of a larger computation. Interrupt handlers in an operating system respond to hardware conditions, signal handlers of a process are triggered by the operating system, and event handlers process the asynchronous input a program receives.
A pure callback function is one which is purely functional (always returns the same value given the same inputs) and free of observable side-effects. Some uses of callbacks require pure callback functions to operate correctly.
A special case of a callback is called a predicate callback, or just predicate for short. This is a pure callback function which accepts a single input value and returns a Boolean value. These types of callbacks are useful for filtering collections of values by some condition.
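As an illustration, a predicate callback in C can drive a generic filter such as the sketch below (illustrative code, not a standard library facility):

#include <stdbool.h>
#include <stddef.h>

typedef bool (*predicate)(int value);  /* the predicate callback type */

static bool is_even(int value) { return value % 2 == 0; }

/* keep only the values for which pred() returns true; returns the new count */
static size_t filter(int *values, size_t count, predicate pred)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; i++)
        if (pred(values[i]))
            values[kept++] = values[i];
    return kept;
}

/* usage: size_t n = filter(numbers, 8, is_even); the first n entries remain */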
In concurrent programming a critical section is a piece of code that accesses a shared resource (data structure or device) that must not be concurrently accessed by more than one thread of execution. A critical section will usually terminate in fixed time, and a thread, task or process will only have to wait a fixed time to enter it (i.e. bounded waiting). Some synchronization mechanism is required at the entry and exit of the critical section to ensure exclusive use, for example a semaphore.
By carefully controlling which variables are modified inside and outside the critical section (usually, by accessing important state only from within), concurrent access to that state is prevented. A critical section is typically used when a multithreaded program must update multiple related variables without a separate thread making conflicting changes to that data. In a related situation, a critical section may be used to ensure a shared resource, for example a printer, can only be accessed by one process at a time.
How critical sections are implemented varies among operating systems.
The simplest method is to prevent any change of processor control inside the critical section. On uni-processor systems, this can be done by disabling interrupts on entry into the critical section, avoiding system calls that can cause a context switch while inside the section and restoring interrupts to their previous state on exit. Any thread of execution entering any critical section anywhere in the system will, with this implementation, prevent any other thread, including an interrupt, from getting the CPU and therefore from entering any other critical section or, indeed, any code whatsoever, until the original thread leaves its critical section.
This brute-force approach can be improved upon by using semaphores. To enter a critical section, a thread must obtain a semaphore, which it releases on leaving the section. Other threads are prevented from entering the critical section at the same time as the original thread, but are free to gain control of the CPU and execute other code, including other critical sections that are protected by different semaphores.
Some confusion exists in the literature about the relationship between different critical sections in the same program. In general, a resource that must be protected from concurrent access may be accessed by several pieces of code. Each piece must be guarded by a common semaphore. Is each piece now a critical section or are all the pieces guarded by the same semaphore in aggregate a single critical section? This confusion is evident in definitions of a critical section such as "... a piece of code that can only be executed by one process or thread at a time". This only works if all access to a protected resource is contained in one "piece of code", which requires either the definition of a piece of code or the code itself to be somewhat contrived.
Application-level critical sections reside in the memory range of the process and are usually modifiable by the process itself. This is called a user-space object because the program run by the user (as opposed to the kernel) can modify and interact with the object. However the functions called may jump to kernel-space code to register the user-space object with the kernel.
Example Code For Critical Sections with POSIX pthread library
/* Sample C/C++, Unix/Linux */
#include <pthread.h>
/* This is the critical section object (statically allocated). */
static pthread_mutex_t cs_mutex = PTHREAD_MUTEX_INITIALIZER;
void f()
{
/* Enter the critical section -- other threads are locked out */
pthread_mutex_lock( &cs_mutex );
/* Do some thread-safe processing! */
/*Leave the critical section -- other threads can now pthread_mutex_lock() */
pthread_mutex_unlock( &cs_mutex );
}
Example Code For Critical Sections with Win32 API
/* Sample C/C++, Windows, link to kernel32.dll */
#include <windows.h>

static CRITICAL_SECTION cs; /* This is the critical section object -- once initialized,
                               it cannot be moved in memory */
/* If you program in OOP, declare this in your class */

void f()
{
    /* Enter the critical section -- other threads are locked out */
    EnterCriticalSection(&cs);

    /* Do some thread-safe processing! */

    /* Leave the critical section -- other threads can now EnterCriticalSection() */
    LeaveCriticalSection(&cs);
}

int main()
{
    /* Initialize the critical section before entering multi-threaded context. */
    InitializeCriticalSection(&cs);

    f();

    /* Release system object when all finished -- usually at the end of the cleanup code */
    DeleteCriticalSection(&cs);
    return 0;
}
Note that on Windows NT (not 9x/ME), the function TryEnterCriticalSection() can be used to attempt to enter the critical section. This function returns immediately so that the thread can do other things if it fails to enter the critical section (usually due to another thread having locked it). With the pthreads library, the equivalent function is pthread_mutex_trylock(). Note that the use of a CriticalSection is not the same as a Win32 Mutex, which is an object used for inter-process synchronization. A Win32 CriticalSection is for intra-process synchronization (and is much faster as far as lock times), however it cannot be shared across processes.
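A small sketch of the pthreads "try" variant mentioned above: attempt the lock and, if another thread holds it, do something else instead of blocking:

#include <pthread.h>

static pthread_mutex_t try_mutex = PTHREAD_MUTEX_INITIALIZER;

void poll_critical_section(void)
{
    if (pthread_mutex_trylock(&try_mutex) == 0) {
        /* We own the lock -- do the thread-safe processing. */
        pthread_mutex_unlock(&try_mutex);
    } else {
        /* trylock returned immediately -- do other work and retry later. */
    }
}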
Typically, critical sections prevent process and thread migration between processors and the preemption of processes and threads by interrupts and other processes and threads.
Critical sections often allow nesting. Nesting allows multiple critical sections to be entered and exited at little cost.
If the scheduler interrupts the current process or thread in a critical section, the scheduler will either allow the process or thread to run to completion of the critical section, or it will schedule the process or thread for another complete quantum. The scheduler will not migrate the process or thread to another processor, and it will not schedule another process or thread to run while the current process or thread is in a critical section.
Similarly, if an interrupt occurs in a critical section, the interrupt's information is recorded for future processing, and execution is returned to the process or thread in the critical section. Once the critical section is exited, and in some cases the scheduled quantum completes, the pending interrupt will be executed.
Since critical sections may execute only on the processor on which they are entered, synchronization is only required within the executing processor. This allows critical sections to be entered and exited at almost zero cost. No interprocessor synchronization is required, only instruction stream synchronization. Most processors provide the required amount of synchronization by the simple act of interrupting the current execution state. This allows critical sections in most cases to be nothing more than a per processor count of critical sections entered.
Performance enhancements include executing pending interrupts at the exit of all critical sections and allowing the scheduler to run at the exit of all critical sections. Furthermore, pending interrupts may be transferred to other processors for execution.
Critical sections should not be used as a long-lived locking primitive. They should be short enough that the critical section will be entered, executed, and exited without any interrupts occurring, neither from hardware nor from the scheduler.
Kernel Level Critical Sections are the base of the software lockout issue.
Critical Section documentation on the MSDN Library homepage: http://msdn2.microsoft.com/en-us/library/ms682530.aspx
1.3 What is the main advantage of multiprogramming?
Answer: Multiprogramming makes efficient use of the CPU by overlapping the demands for the CPU and its I/O devices from various users. It attempts to increase CPU utilization by always having something for the CPU to execute.
1.5 In a multiprogramming and time-sharing environment, several users share the system simultaneously. This situation can result in various security problems.
a. What are two such problems?
b. Can we ensure the same degree of security in a time-shared machine as we have in a
dedicated machine? Explain your answer.
Answer:
a. Stealing or copying one’s programs or data; using system resources (CPU, memory, disk space, peripherals) without proper accounting.
b. Probably not, since any protection scheme devised by humans can inevitably be broken by a human, and the more complex the scheme, the more difficult it is to feel
confident of its correct implementation.
1.9 Describe the differences between symmetric and asymmetric multiprocessing. What are three advantages and one disadvantage of multiprocessor systems?
Answer: Symmetric multiprocessing treats all processors as equals, and I/O can be processed on any CPU. Asymmetric multiprocessing has one master CPU and the remainder of the CPUs are slaves. The master distributes tasks among the slaves, and I/O is usually done by the master only. Multiprocessors can save money by not duplicating power supplies, housings, and peripherals. They can execute programs more quickly and can have increased reliability. They are also more complex in both hardware and software than uniprocessor systems.
1.10 What is the main difficulty that a programmer must overcome in writing an operating system for a real-time environment?
Answer: The main difficulty is keeping the operating system within the fixed time constraints of a real-time system. If the system does not complete a task in a certain time frame, it may cause a breakdown of the entire system it is running on. Therefore when writing an operating system for a real-time system, the writer must be sure that his scheduling schemes don't allow response time to exceed the time constraint.
2.1 Prefetching is a method of overlapping the I/O of a job with that job’s own computation.
The idea is simple. After a read operation completes and the job is about to start operating on the data, the input device is instructed to begin the next read immediately. The CPU and input device are then both busy. With luck, by the time the job is ready for the next data item, the input device will have finished reading that data item. The CPU can then begin processing the newly read data, while the input device starts to read the following data.
A similar idea can be used for output. In this case, the job creates data that are put into a buffer until an output device can accept them. Compare the prefetching scheme with the spooling scheme, where the CPU overlaps the input of one job with the computation and output of other jobs.
Answer: Prefetching is a user-based activity, while spooling is a system-based activity.
Spooling is a much more effective way of overlapping I/O and CPU operations.
2.3 What are the differences between a trap and an interrupt? What is the use of each function?
An interrupt is a hardware-generated change-of-flow within the system. An interrupt handler is summoned to deal with the cause of the interrupt; control is then returned to the interrupted context and instruction.
A trap is a software-generated interrupt.
An interrupt can be used to signal the completion of an I/O operation, obviating the need for device polling.
A trap can be used to call operating system routines or to catch arithmetic errors.
From the standpoint of real-time data acquisition, we need to start a thread that receives Message Middleware messages and then does whatever processing the scenario requires. The thread-creation code is shown below:
// for compilers which have it, we should use C RTL function for thread
// creation instead of Win32 API one because otherwise we will have memory
// leaks if the thread uses C RTL (and most threads do)
#if defined(__VISUALC__) || \
(defined(__BORLANDC__) && (__BORLANDC__ >= 0x500)) || \
(defined(__GNUG__) && defined(__MSVCRT__))
typedef unsigned (__stdcall *RtlThreadStart)(void *);
m_hThread = (HANDLE)_beginthreadex(NULL, 0,
(RtlThreadStart)
wxThreadInternal::WinThreadStart,
thread, CREATE_SUSPENDED,
(unsigned int *)&m_tid);
#else // compiler doesn't have _beginthreadex
m_hThread = ::CreateThread
(
NULL, // default security
0, // default stack size
(LPTHREAD_START_ROUTINE) // thread entry point
wxThreadInternal::WinThreadStart, // the function that runs under thread
(LPVOID)thread, // parameter
CREATE_SUSPENDED, // flags
&m_tid // [out] thread id
);
#endif // _beginthreadex/CreateThread
Note: there should be a function definition before these lines, e.g.:
DWORD wxThreadInternal::WinThreadStart(wxThread *thread)