青青草原综合久久大伊人导航_色综合久久天天综合_日日噜噜夜夜狠狠久久丁香五月_热久久这里只有精品

posts - 319, comments - 22, trackbacks - 0, articles - 11
  C++博客 :: 首頁 :: 新隨筆 :: 聯系 :: 聚合  :: 管理

Parallel Programming Essentials via the Intel TBB

Posted on 2011-09-15 07:05 RTY 閱讀(657) 評論(0)  編輯 收藏 引用 所屬分類: TBB需要注意的
http://www.codeproject.com/KB/Parallel_Programming/Threading_Blocks.aspx

Introduction

The Intel TBB was released by Intel during their movement to enhance system performance by cores, rather than by Gigahertz. It was meant for their use, but they released a trial download and an open source edition. It will scale to function as a powerful tool for those going into parallel programming and multi-core programming. Note that some powerful sample code can be downloaded from http://www.threadingbuildingblocks.org/.

Preface

Sometimes, the terms concurrency and parallelism are used interchangeably. But a multithreaded application can have threads execute simultaneously while residing on one physical chip. Parallelism, however, is achieved only when those concurrent threads are executing on separate cores of a multicore chip. When we write an algorithm to use concurrency, our most important design choice should be to figure out how to partition the original problem into individual sub-parts. Because most programs spend a lot of their execution time running loops over an iteration range, we should standardize the use of the Intel TBB. For example, loops which use C++ STL iterators loop over the contents of a collection of data. This iteration space must be determined before we can parallelize those loops. IntelTBB is a C++ template-based library for loop-level parallelism that concentrates on defining tasks rather than explicit threads. The components of TBB include generic parallel algorithms, concurrent containers, low-level synchronization primitives, and a task scheduler. At the time of this writing, TBB 3.0 is the most recent version. Programmers usingTBB can parallelize the execution of loop iterations by treating chunks of iterations as tasks and allowing the TBBtask scheduler to determine the task sizes, number of threads to use assignment of tasks to those threads, and how those threads are scheduled for execution. The task scheduler will give precedence to tasks that have been most recently in a core with the idea of making best use of the cache that likely contains the task’s data. When thinking tasks as opposed to threads, we should be concerned about performance, load balance between different sub-problems, data sharing, control, and data dependencies among the sub-problems.

The task scheduler utilizes a task-stealing mechanism to load balance the execution. Furthermore, breaking a program into separate functional blocks and assigning a separate thread to each block is a solution that usually does not scale well because, typically, the number of functional blocks is fixed. In contrast, Threading Building Blocks emphasizes data-parallel programming, enabling multiple threads to work most efficiently together. Data-parallel programming scales well to larger numbers of processors by dividing a data set into smaller pieces. Breaking apart a function into smaller bits of work will limit scalability and possibly render a performance hit, as the number of methods in a program is rarely dynamic. Taking a data parallelism approach resolves that issue by breaking the iteration space into independent units of work. Any thread that uses an algorithm template from the library or the task scheduler must have an initialized tbb::task_scheduler_init object. A thread may have more than one of these objects initialized at a time. The task scheduler shuts down when all task_scheduler_init objects terminate. By default, the constructor for task_scheduler_init does the initialization and the destructor does the termination. Thus, declaring a task_scheduler_init in main( ), as in the example shown below, both starts and shuts down the scheduler.

#include "tbb/task_scheduler_init.h"
using namespace tbb;
int main( ) {
task_scheduler_init init;
...
return 0;
}

The parallel_for template parallelizes tasks that are contained within a for loop. The template requires two parameters: a range type over which to iterate and a body type that executes iterations over the range or a subrange. The range class must define a copy constructor and a destructor, the methods is_empty() (which returns TRUE if the range is empty) and is_divisible() (which returns TRUE if the range can be split), and a splitting constructor (to divide the range in half). A partitioner class object can be used to heuristically find the smallest number of iterations that should be assigned. The TBB library contains two predefined range types:blocked_range and blocked_range2D. These ranges are used for single- and two dimensional ranges, respectively. The body class must define a copy constructor and a destructor as well as the operator(). Theoperator() will contain a copy of the original serial loop that has been modified to run over a subrange of values that come from the range type. The parallel_reduce template will iterate over a range and combine partial results computed by each task into a final (reduction) value. The range type for parallel_reduce has the same requirements as parallel_for. The body type needs a splitting constructor and a join method. The splitting constructor in the body copies read-only data required to run the loop body and to assign the identity element of the reduction operation that initializes the reduction variable. The join method combines partial results of tasks based on the reduction operation being used. So having that, let’s look at an example.

A Simple Example of Parallel Sum

The parallel sum operation can work with any associative and commutative combining operation. For example, the operations of multiplication, maximum, minimum, and some logical operations would all be appropriate. In a serial program, the commutative property of the operation is not important, since the order with which the serial execution combines the elements is the same each and every time. Nevertheless, to have any chance of performing the computation concurrently, we rely on this property of the operation, since concurrent execution does not impose a fixed schedule on the order of independent computations.

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <ctype.h>
#include "parallel_reduce.h"
#include "blocked_range.h"
#include "task_scheduler_init.h"
#include "tick_count.h"

using namespace tbb;

// Uncomment the line below to enable the auto_partitioner
#define AUTO_GRAIN

struct Sum {
    float value;
    Sum() : value(0) {}
    Sum( Sum& s, split ) {value = 0;}
    void operator()( const blocked_range<float*>& range ) {
        float temp = value;
        for( float* a=range.begin(); a!=range.end(); ++a ) {
            temp += *a;
        }
        value = temp;
    }
    void join( Sum& rhs ) {value += rhs.value;}
};

float ParallelSum( float array[], size_t n ) {
    Sum total;
#ifndef AUTO_GRAIN
    parallel_reduce( blocked_range<float*>( array, array+n, 1000 ), 
                     total );
#else /* AUTO_GRAIN */
    parallel_reduce( blocked_range<float*>( array, array+n ), 
                     total, auto_partitioner() );
#endif /* AUTO_GRAIN */
    return total.value;
}

//! Problem size
const int N = 1000000;

//! Number of threads to use.
static int MaxThreads = 4;

//! If true, print out bits of the array
static bool VerboseFlag = false;

//! Parse the command line.
static void ParseCommandLine( int argc, char* argv[] ) {
    int i = 1;
    if( i<argc )="=0" verboseflag="true;"><argc>
	<argc maxthreads="strtol(argv[i++],0,0);" i="0;">< N; ++i ) {
      input[i] = (float)(rand() % 10);
    }

    if( VerboseFlag ) {
      printf(" Input: ");
      for ( size_t i = 0; i < 7; ++i ) {
        printf("%7.2f ", input[i]);
      }
      printf("...\n");
    }

    // Try different numbers of threads
    for( int p=1; p<=MaxThreads; ++p ) {

        task_scheduler_init init(p);

        tick_count t0 = tick_count::now();
        float output = ParallelSum(input, N);
        tick_count t1 = tick_count::now();
        
        if( VerboseFlag ) {
          printf("Output: %7.2f\n", output);
        }
        printf("%2d threads: %.3f msec\n", p, (t1-t0).seconds()*1000);
    }
    return 0;
}

This Intel code was compiled via the Intel C++ Professional Compiler, a compiler that when installed, integrates into Microsoft Visual Studio 2008 or 2010. Once the installation is completed, then it is time to install the performance library Intel TBB 3.0. The included documentation will step you through configuring the environment in order for the compiler to reference the include header files and the Lib files. Notice the results:

 1 threads: 6.513 msec
 2 threads: 3.333 msec
 3 threads: 3.496 msec
 4 threads: 3.345 msec 

It turns out that executing 4 threads in parallel via the Intel TBB takes the time that the execution of one thread does. Another performance gain is how the Intel TBB library manages to keep things thread-safe. After ascertaining that the body is safe to run in parallel, then the simplest approach to parallelizing the loop would be to divide the size of the loop (the upper range – lowest range – assuming that the iterations are in ascending order) by the number of processors. This approach however, is best left to the task scheduler. A simple division process could easily lead to an inefficient use those processors.

Now consider a composite integer, composite meaning that the integer is either odd or even. We know that a prime number is divisible by itself and one. But any composite integer can be broken down into a factor of primes. Therefore, calculating and then locating the number of primes in a range of integers would involve a continual computational procedure. Examine this code below:

#include <cctype>
#include <cstdlib>
#include <cstdio>
#include <cstring>
#include <math.h>
#include <cassert>
#include "parallel_reduce.h"
#include "task_scheduler_init.h"
#include "tick_count.h"

using namespace std;
using namespace tbb;

typedef unsigned long Number;

static bool PrintPrimes = false;

static Number GrainSize = 1000;

class Multiples {
    inline Number strike( Number start, 
                          Number limit, 
                          Number stride ) {
        
bool* is_composite = my_is_composite;

assert( stride>=2 );

for(start =0;start < limit; start+="stride" i+="2" +="m&1;" i="3;"
 n_factor="0;"
 my_factor="new" my_striker="new" 
my_is_composite="new"
 m="Number(sqrt(double(n)));" 
is_composite[start]="true;" k="0;"
is_composite="my_is_composite;" my_factor[n_factor++]="i;" 
my_striker[n_factor]="strike(" 

As we would expect, the higher degree of parallelism we go, the quicker the computation outputs its results:

#primes from [2..100000000] = 5761455 (1.10 sec with serial code)
#primes from [2..100000000] = 5761455 (1.10 sec with 1-way parallelism)
#primes from [2..100000000] = 5761455 (0.59 sec with 2-way parallelism)
#primes from [2..100000000] = 5761455 (0.57 sec with 3-way parallelism)
#primes from [2..100000000] = 5761455 (0.60 sec with 4-way parallelism)

Where Are We Now?

Up to this point, we have been noting that sometimes part of a computation involves computing the summation of a large set of values. If multiple processors are available, instead of adding the values together sequentially, the set can be partitioned and the summations of the subsets computed simultaneously, each on a different processor. The partial sums are then combined to get the final answer. Thus, using multiple processors to compute in parallel may allow us to obtain a solution sooner. Also, if each processor has its own memory, partitioning the data between the processors may allow larger problems to be handled than could be handled on a single processor. So if we wanted to form a set of integers that start at 1, and goes from 1/1 +n.. we could add them but the sequence would not end at the number 2. The sequence would be an infinite set of fractional values that approach to 2 as a limit, never actually equaling 2. But the more values that we add, the smaller they become.

Here is a simple seismic wave simulation (wave propagation) based on parallel_for and blocked_range. The main body of the program steps through the simulation of a seismic wave in a core loop that sets the impulse from the source of the disturbance, does the two tough computations of stress updates and velocities, and finally cleans the edges of the simulation. Seismic waves are waves of sound that travel through the core of the earth, perhaps because of the result of an earthquake or explosion. Because sound waves form a chain of waves (similar to how a train gets louder upon arrival; upon departure, the sound minimizes until it cannot be heard. That is the end of the chain of sound waves). In earlier times, scientists would blow some dynamite. With a device (similar in principle to a seismograph) that would track the directions of the sound waves. One direction meant diffraction off of a denser medium, and they learned by following those waves until diffraction would cause a change in direction. After a while, they would that one type of change in direction on the graph, mine that area, and perhaps would find salt. Sometimes they would even find oil, but their intentions were to find gold. Similarly, when a particle has been displaced, some force displaced it. It has moved in a vector path, meaning the displacement is measured by both the direction and magnitude. Tracing the path, and then plotting each point of that path on a grid would reveal where the change in direction took place.

I introduce this program because it espouses an underlying concept used in DNA sequencing, etc. in computer animation, rendering is the step where information from the animation files, such as lighting, textures, and shading, is applied to 3D models to generate the 2D image that makes up a frame of the film. Parallel computing is essential to generate the needed number of frames (24 per second) for a feature-length film. Now notice the rippling effect in the image below:

wave.jpg

Seismic Wave Propagation

//#define _CRT_SECURE_NO_DEPRECATE
#define VIDEO_WINMAIN_ARGS
#include "../../common/gui/video.h"
#include <cstring>
#include <cstdlib>
#include <math.h
#include <cctype>
#include <cstdio>
#include <cassert>
#include "tbb/task_scheduler_init.h"
#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"
#include "tbb/tick_count.h"

using namespace std;

#ifdef _MSC_VER
// warning C4068: unknown pragma
#pragma warning(disable: 4068)
#endif

#define DEFAULT_NUMBER_OF_FRAMES 100
int number_of_frames = -1;
const size_t MAX_WIDTH = 1024;
const size_t MAX_HEIGHT = 512;

int UniverseHeight=MAX_HEIGHT;
int UniverseWidth=MAX_WIDTH;

typedef float value;

//! Velocity at each grid point
static value V[MAX_HEIGHT][MAX_WIDTH];

//! Horizontal stress
static value S[MAX_HEIGHT][MAX_WIDTH];

//! Vertical stress
static value T[MAX_HEIGHT][MAX_WIDTH];

//! Coefficient related to modulus
static value M[MAX_HEIGHT][MAX_WIDTH];

//! Coefficient related to lightness
static value L[MAX_HEIGHT][MAX_WIDTH];

//! Damping coefficients
static value D[MAX_HEIGHT][MAX_WIDTH];

static tbb::affinity_partitioner Affinity;

enum MaterialType {
    WATER=0,
    SANDSTONE=1,
    SHALE=2
};

//! Values are MaterialType, cast to an unsigned char to save space.
static unsigned char Material[MAX_HEIGHT][MAX_WIDTH];

static const colorcomp_t MaterialColor[4][3] = { // BGR
    {96,0,0},     // WATER
    {0,48,48},    // SANDSTONE
    {32,32,23}    // SHALE
};

static const int DamperSize = 32;

static const int ColorMapSize = 1024;
static color_t ColorMap[4][ColorMapSize];

static int PulseTime = 100;
static int PulseCounter;
static int PulseX = UniverseWidth/3;
static int PulseY = UniverseHeight/4;

static bool InitIsParallel = true;
const char *titles[2] = {"Seismic Simulation: Serial", "Seismic Simulation: Parallel"};
int threads_low = 0, threads_high = tbb::task_scheduler_init::automatic;

static void UpdatePulse() {
    if( PulseCounter>0 ) {
        value t = (PulseCounter-PulseTime/2)*0.05f;
        V[PulseY][PulseX] += 64*sqrt(M[PulseY][PulseX])*exp(-t*t);
        --PulseCounter;
    }
}

static void SerialUpdateStress() {
    drawing_area drawing(0, 0, UniverseWidth, UniverseHeight);
    for( int i=1; i<universeheight-1; j="1;">
	<universewidth-1; +="M[i][j]*(V[i][j+1]-V[i][j]);" 
	index="(int)(V[i][j]*(ColorMapSize/2))"><0 ) index = 0;
            if( index>=ColorMapSize ) index = ColorMapSize-1;
            color_t* c = ColorMap[Material[i][j]];
            drawing.put_pixel(c[index]);
        }
    }
}

struct UpdateStressBody {
    void operator()( const tbb::blocked_range<int>& range ) const {
        drawing_area drawing(0, range.begin(), 
UniverseWidth, range.end()-range.begin());
        int i_end = range.end();
        for( int y = 0, i=range.begin(); i!=i_end; ++i,y++ ) {
            drawing.set_pos(1, y);
#pragma ivdep
            for( int j=1; j<universewidth-1; 
		+="M[i][j]*(V[i][j+1]-V[i][j]);" 
		index="(int)(V[i][j]*(ColorMapSize/2))"><0 ) index = 0;
                if( index>=ColorMapSize ) index = ColorMapSize-1;
                color_t* c = ColorMap[Material[i][j]];
                drawing.put_pixel(c[index]);
            }
        }
    }
};

static void ParallelUpdateStress() {
    tbb::parallel_for( tbb::blocked_range<int>( 1, UniverseHeight-1 ), 
// Index space for loop
                       UpdateStressBody(),                            
 // Body of loop
                       Affinity );                                     
// Affinity hint
}

static void SerialUpdateVelocity() {
    for( int i=1; i<universeheight-1; j="1;">
		<universewidth-1; v[i][j]="D[i][j]*(V[i][j]"><int>& range ) const {
        int i_end = range.end();
        for( int i=range.begin(); i!=i_end; ++i ) 
#pragma ivdep
            for( int j=1; j<universewidth-1; 
		v[i][j]="D[i][j]*(V[i][j]"><int>( 1, 
		UniverseHeight-1 ), // Index space for loop
                       UpdateVelocityBody(),                           

                       Affinity );                                    
 
}

void SerialUpdateUniverse() {
    UpdatePulse();
    SerialUpdateStress();
    SerialUpdateVelocity();
}

void ParallelUpdateUniverse() {
    UpdatePulse();
    ParallelUpdateStress();
    ParallelUpdateVelocity();
}

class seismic_video : public video
{
    void on_mouse(int x, int y, int key) {
        if(key == 1 && PulseCounter == 0) {
            PulseCounter = PulseTime;
            PulseX = x; PulseY = y;
        }
    }
    void on_key(int key) {
        key &= 0xff;
        if(char(key) == ' ') InitIsParallel = !InitIsParallel;
        else if(char(key) == 'p') InitIsParallel = true;
        else if(char(key) == 's') InitIsParallel = false;
        else if(char(key) == 'e') updating = true;
        else if(char(key) == 'd') updating = false;
        else if(key == 27) running = false;
        title = InitIsParallel?titles[1]:titles[0];
    }
    void on_process() {
        tbb::task_scheduler_init Init(threads_high);
        do {
            if( InitIsParallel )
                ParallelUpdateUniverse();
            else
                SerialUpdateUniverse();
            if( number_of_frames > 0 ) --number_of_frames;
        } while(next_frame() && number_of_frames);
    }
} video;

void InitializeUniverse() {
    PulseCounter = PulseTime;
    // Initialize V, S, and T to slightly non-zero values,
 in order to avoid denormal waves.
    for( int i=0; i<universeheight; j="0;"><universewidth; 
		t[i][j]="S[i][j]" i="1;"><universeheight-1; j="1;">
		<universewidth-1; x="float(j-UniverseWidth/2)/
		(UniverseWidth/2);" t="(value)i/UniverseHeight;" 
		d[i][j]="1.0;"><0.3f ) {
                m = WATER;
                M[i][j] = 0.125;
                L[i][j] = 0.125;
            } else if( fabs(t-0.7+0.2*exp(-8*x*x)+0.025*x)<=0.1 ) {
                m = SHALE;
                M[i][j] = 0.5;
                L[i][j] = 0.6;
            } else {
                m = SANDSTONE;
                M[i][j] = 0.3;
                L[i][j] = 0.4;
            } 
            Material[i][j] = m;
        }
    }
    value scale = 2.0f/ColorMapSize;
    for( int k=0; k<4; ++k ) {
        for( int i=0; i<colormapsize; t="(i-ColorMapSize/2)*scale;" r="t">0 ? t : 0;
            value b = t<0 ? -t : 0;
            value g = 0.5f*fabs(t);
            memcpy(c, MaterialColor[k], sizeof(c));
            c[2] = colorcomp_t(r*(255-c[2])+c[2]);
            c[1] = colorcomp_t(g*(255-c[1])+c[1]);
            c[0] = colorcomp_t(b*(255-c[0])+c[0]);
            ColorMap[k][i] = video.get_color(c[2], c[1], c[0]);
        }
    }
    // Set damping coefficients around border to reduce reflections from boundaries.
    value d = 1.0;
    for( int k=DamperSize-1; k>0; --k ) {
        d *= 1-1.0f/(DamperSize*DamperSize);
        for( int j=1; j<universewidth-1; *="d;" i="1;">
		<universeheight-1; *="d;"> 1 && isdigit(argv[1][0])) {
        char* end; threads_high = threads_low = (int)strtol(argv[1],&end,0);
        switch( *end ) {
            case ':': threads_high = (int)strtol(end+1,0,0); break;
            case '\0': break;
            default: printf("unexpected character = %c\n",*end);
        }
    }
    if (argc > 2 && isdigit(argv[2][0])){
        number_of_frames = (int)strtol(argv[2],0,0);
    }
    // video layer init
    video.title = InitIsParallel?titles[1]:titles[0];
#ifdef _WINDOWS
    #define MAX_LOADSTRING 100
    TCHAR szWindowClass[MAX_LOADSTRING];    // the main window class name
    LoadStringA(video::win_hInstance, 
    IDC_SEISMICSIMULATION, szWindowClass, MAX_LOADSTRING);
    LRESULT CALLBACK WndProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam);
    WNDCLASSEX wcex; memset(&wcex, 0, sizeof(wcex));
    wcex.lpfnWndProc    = (WNDPROC)WndProc;
    wcex.hIcon          = 
     LoadIcon(video::win_hInstance, MAKEINTRESOURCE(IDI_SEISMICSIMULATION));
    wcex.hCursor        = LoadCursor(NULL, IDC_ARROW);
    wcex.hbrBackground  = (HBRUSH)(COLOR_WINDOW+1);
    wcex.lpszMenuName   = LPCTSTR(IDC_SEISMICSIMULATION);
    wcex.lpszClassName  = szWindowClass;
    wcex.hIconSm        = LoadIcon(video::win_hInstance, MAKEINTRESOURCE(IDI_SMALL));
    video.win_set_class(wcex); // ascii convention here
    video.win_load_accelerators(IDC_SEISMICSIMULATION);
#endif
    if(video.init_window(UniverseWidth, UniverseHeight)) {
        video.calc_fps = true;
        video.threaded = threads_low > 0;
        // video is ok, init universe
        InitializeUniverse();
        // main loop
        video.main_loop();
    }
    else if(video.init_console()) {
        // do console mode
        if(number_of_frames <= 0) number_of_frames = DEFAULT_NUMBER_OF_FRAMES;
        if(threads_high == tbb::task_scheduler_init::automatic) threads_high = 4;
        if(threads_high < threads_low) threads_high = threads_low;
        for( int p = threads_low; p <= threads_high; ++p ) {
            InitializeUniverse();
            tbb::task_scheduler_init init(tbb::task_scheduler_init::deferred);
            if( p > 0 )
                init.initialize( p );
            tbb::tick_count t0 = tbb::tick_count::now();
            if( p > 0 )
                for( int i=0; i<number_of_frames; i="0;">
		<number_of_frames; t1="tbb::tick_count::now();"> 0 ) 
                printf(" with %d way parallelism\n",p);
            else
                printf(" with serial version\n"); 
        }
    }
    video.terminate();
    return 0;
}

#ifdef _WINDOWS
LRESULT CALLBACK About(HWND hDlg, UINT message, WPARAM wParam, LPARAM lParam)
{
    switch (message)
    {
    case WM_INITDIALOG: return TRUE;
    case WM_COMMAND:
        if (LOWORD(wParam) == IDOK || LOWORD(wParam) == IDCANCEL) {
            EndDialog(hDlg, LOWORD(wParam));
            return TRUE;
        }
        break;
    }
    return FALSE;
}

LRESULT CALLBACK WndProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
{
    int wmId, wmEvent;
    switch (message) {
    case WM_COMMAND:
        wmId    = LOWORD(wParam); 
        wmEvent = HIWORD(wParam); 
        // Parse the menu selections:
        switch (wmId)
        {
        case IDM_ABOUT:
            DialogBox(video::win_hInstance, MAKEINTRESOURCE(IDD_ABOUTBOX), hWnd, 
            (DLGPROC)About);
            break;
        case IDM_EXIT:
            PostQuitMessage(0);
            break;
        case ID_FILE_PARALLEL:
            if( !InitIsParallel ) {
                InitIsParallel = true;
                video.title = titles[1];
            }
            break;
        case ID_FILE_SERIAL:
            if( InitIsParallel ) {
                InitIsParallel = false;
                video.title = titles[0];
            }
            break;
        case ID_FILE_ENABLEGUI:
            video.updating = true;
            break;
        case ID_FILE_DISABLEGUI:
            video.updating = false;
            break;
        default:
            return DefWindowProc(hWnd, message, wParam, lParam);
        }
        break;
    default:
        return DefWindowProc(hWnd, message, wParam, lParam);
    }
    return 0;
}

#endif

Admittedly, the concept of wave propagation is difficult enough with adding both a serial and a parallel approach. But the number of professional areas that use parallel programming is astounding. This is why choosing a powerful threading library is essential. Examine the output. The image takes a few minutes to complete:

a.jpg

A disturbance occurs in the earth or some elastic medium and triggers a chain, or sequence of waves, waves that traveling with velocity and direction. Acceleration is not really an accurate measure. Now examine the image after around 30 seconds later. The frames second pretty much remain in the same range:

b.jpg

The Ray Tracer Example

The MSDN Parallel Computing Center offers a download of parallel programming examples, both in native code and managed code. The Ray Tracer examples has become famous because Ray tracing is fundamentally a “delightfully parallel problem”. Each individual pixel in the image is generated by firing an imaginary ray of light, examining the color of that ray as it bounces off of and through objects in the scene, and storing the resulting color. Every pixel is thus independent of every other pixel, allowing them all to be processed in parallel. The example can be downloaded fromhttp://code.msdn.microsoft.com/ParExtSamples. Now if we consider some underlying principles that involve medical imaging, then we could take a look at an X-Ray. An x-ray has the shortest wave length of any of the light waves that there are. An x-ray’s frequency is higher than ultra-violet waves. So when shot from an x-ray tube, the sudden burst captures solid body images. Nowadays they use gamma rays. But in any event, the resolution is always poor.

To solve this problem, models are designed to show how radiation propagates through the body. Randomly selected points within the body are assumed to emit radiation (usually a gamma ray), and the trajectory of each ray is followed. As a particle (ray) passes through the body, it is attenuated by the different organs it traverses, continuing until the particle leaves the body and hits a camera model, thereby defining a full trajectory. To create a statistically significant simulation, thousands, if not millions, of trajectories are followed. This problem can be parallelized in two ways. Because each trajectory is independent, it is possible to parallelize the application by associating each trajectory with a task or data parallelization.

So, at this point, we will conclude with a simple example of a 2-D Ray Tracer, a tea cup. The code executes by having a data file that contains the image modeling of an image. Then we execute the program and note quicker the frames per second calculations are performed, as well as the improved resolution. The code is downloadable at the top section of this article. The reader should download a trial evaluation of the Intel C++ Compiler. Once installed, you’ll see how it integrates with Visual Studio 2008 as a built-in. Next, install the Intel TBB beta 3.0 version as your threading library. After that, extract the files into a newly-made folder within your projects directory folder . But we are going to use the “nmake” utility to build the files and their executables. This will be made more clear shortly.

tea.jpg

The command that provided the above image is:

C>\..\tachyon.tbb.exe teacup.dat

To build this executable, and other comparison executable that runs with the files held in the dat directory, then build a directory to unzip the downloadable dat.zip files. Then set the environmental variables because we are going to use “nmake” from VS Studio 2008. We first start by configuring the Intel compilers path, and then run a batch script to set the environment. Afterwards, we do the same for the Microsoft VC++ compiler and run a similar batch script to set the environment:

set PATH=%PATH%;.;c:\Program Files\Intel\compiler\11.1\065\bin\ia32
iclvars_ia32.bat
set PATH=%PATH%;.;C:\Program Files\Microsoft Visual Studio 9.0\VC\bin
vcvars32.bat
(now run nmake)
nmake.exe 

The build will show balls appearing after serial and parallel executions. Thus, the files are balls.datteapot.dat, etc. Try and copy and then paste those .dat files into the prior directory where nmake built the executables.

References

  • Intel Threading Building Blocks, by James Reiner
  • Patterns of Parallel Programming by Stephen Toub

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

logicchild

Other
Pref. Trust
United States United States

Member
I started electronics training at age 33. I began studying microprocessor technology in an RF communications oriented program. I am 43 years old now. I have studied C code, opcode (mainly x86 and AT+T) for around 3 years in order to learn how to recognize viral code and the use of procedural languages. I am currently learning C# and the other virtual runtime system languages. I guess I started with the egg rather than the chicken. My past work would indicate that my primary strength is in applied mathematics.


青青草原综合久久大伊人导航_色综合久久天天综合_日日噜噜夜夜狠狠久久丁香五月_热久久这里只有精品
  • <ins id="pjuwb"></ins>
    <blockquote id="pjuwb"><pre id="pjuwb"></pre></blockquote>
    <noscript id="pjuwb"></noscript>
          <sup id="pjuwb"><pre id="pjuwb"></pre></sup>
            <dd id="pjuwb"></dd>
            <abbr id="pjuwb"></abbr>
            免费中文日韩| 欧美怡红院视频| 亚洲国产精品一区二区第一页| 欧美亚洲免费| 在线观看国产成人av片| 欧美电影在线免费观看网站| 欧美1区2区| 一区二区精品| 亚洲欧美日韩精品在线| 国产一区二区中文字幕免费看| 久久综合国产精品| 欧美1区2区视频| 亚洲女同同性videoxma| 欧美一区观看| 日韩午夜电影av| 欧美区亚洲区| 亚洲免费不卡| 亚洲欧洲99久久| 亚洲欧洲在线视频| 亚洲欧美日韩国产精品| 亚洲国产裸拍裸体视频在线观看乱了中文 | 亚洲一区二区高清视频| 国内成人精品2018免费看| 欧美激情一区二区久久久| 欧美视频导航| 免费欧美高清视频| 欧美日韩一级大片网址| 久久久久久亚洲精品杨幂换脸 | 可以看av的网站久久看| 亚洲视频在线一区| 久久日韩精品| 亚洲欧美区自拍先锋| 久久五月天婷婷| 亚洲欧美变态国产另类| 老司机成人网| 久久国产精品99国产精| 欧美日韩国产综合视频在线观看| 欧美一区二区播放| 欧美视频一区二区三区四区| 欧美成人午夜激情视频| 国产欧美一区二区三区视频| 亚洲精品欧美日韩专区| 亚洲大胆女人| 欧美在线观看网址综合| 午夜精品99久久免费| 欧美噜噜久久久xxx| 欧美成人在线免费观看| 国内一区二区在线视频观看| 中文精品一区二区三区| 亚洲美女毛片| 欧美jizz19性欧美| 美女免费视频一区| 影音先锋久久| 久久久亚洲成人| 久久理论片午夜琪琪电影网| 国产精自产拍久久久久久| 在线亚洲免费| 亚洲一区二区在线免费观看视频| 欧美激情欧美激情在线五月| 欧美成年人视频网站| 在线日韩中文字幕| 久久综合久久美利坚合众国| 毛片基地黄久久久久久天堂| 国产永久精品大片wwwapp| 午夜精品亚洲| 欧美在线视频观看| 国产日韩一级二级三级| 午夜视频一区二区| 久久精品国产免费观看| 国产综合久久久久影院| 欧美一级日韩一级| 玖玖精品视频| 日韩午夜中文字幕| 欧美日韩在线大尺度| 国产精品99久久久久久有的能看| 亚洲在线成人精品| 国产欧美日韩视频在线观看| 亚洲一区二区欧美日韩| 蜜桃av一区二区在线观看| 亚洲电影第三页| 亚洲美女尤物影院| 欧美视频在线播放| 亚洲欧美三级伦理| 欧美成人免费一级人片100| 亚洲精品中文字幕有码专区| 欧美精品性视频| 亚洲女同同性videoxma| 久久永久免费| 亚洲精品一区二区在线观看| 欧美调教视频| 久久电影一区| 亚洲三级影片| 久久久久五月天| 亚洲免费成人av| 国产亚洲精品久久久久婷婷瑜伽| 久久影院午夜片一区| 99国产精品久久| 免费看精品久久片| 亚洲一级二级在线| 黑人一区二区三区四区五区| 牛牛国产精品| 欧美伊人影院| 日韩午夜三级在线| 农夫在线精品视频免费观看| 亚洲四色影视在线观看| 激情国产一区| 国产精品视频久久久| 欧美成人精品一区| 亚洲欧美日韩精品久久久久| 亚洲国产美女久久久久| 欧美在线一二三| 亚洲视频一起| 亚洲人体大胆视频| 国产在线观看91精品一区| 欧美另类综合| 欧美成人精品一区| 久久久久久网站| 午夜一区在线| 亚洲一区二区免费视频| 亚洲三级免费观看| 欧美二区在线播放| 久久久久国产精品一区二区| 亚洲免费综合| 亚洲网站在线播放| 99精品热视频| 亚洲三级电影全部在线观看高清| 国产主播一区二区三区四区| 国产精品久久国产精品99gif| 欧美精品一区二区三区视频| 免费在线观看日韩欧美| 久久久99国产精品免费| 欧美一区二区三区免费在线看| 一本久道久久久| 亚洲三级免费电影| 亚洲乱码一区二区| 亚洲精品日韩在线| 亚洲精品在线电影| 亚洲人成小说网站色在线| 欧美成人免费一级人片100| 麻豆成人在线| 欧美国产精品久久| 亚洲大胆人体视频| 亚洲国产精品久久久久秋霞不卡| 免费视频一区| 亚洲高清不卡在线| 亚洲人成高清| 一区二区三区视频在线观看| 一卡二卡3卡四卡高清精品视频| 亚洲另类春色国产| 在线视频亚洲欧美| 亚洲欧美国产三级| 欧美一区午夜视频在线观看| 久久精品国产亚洲一区二区三区 | 欧美r片在线| 午夜精品免费在线| 亚洲国产精品va在线看黑人| 亚洲二区在线观看| 亚洲开发第一视频在线播放| 一区二区三区四区五区精品| 亚洲欧美日本精品| 久久久国产精彩视频美女艺术照福利 | 欧美人体xx| 国产精品美女午夜av| 国产日韩欧美在线播放不卡| 韩国av一区二区三区四区| 亚洲精品1234| 亚洲一区二区三区高清| 欧美自拍偷拍| 欧美激情在线狂野欧美精品| 一区二区欧美精品| 久久精品国产一区二区电影| 欧美激情一区二区在线| 国产精品久久久久久久久久久久久久| 国产精品久久久久aaaa| 狠狠久久亚洲欧美| 中文欧美在线视频| 久久久久久久综合日本| 亚洲国产专区校园欧美| 亚洲一区二区网站| 老司机久久99久久精品播放免费 | 免费欧美在线视频| 国产精品都在这里| 亚洲国产精品欧美一二99| 欧美一区二区| 亚洲精品免费看| 久久久免费精品视频| 欧美日韩高清区| 激情综合电影网| 亚洲一区二区免费看| 欧美sm视频| 香蕉成人伊视频在线观看| 欧美日韩hd| 亚洲国产精选| 久久免费视频网站| 制服丝袜激情欧洲亚洲| 欧美成人一区二区三区片免费| 国产欧美一区二区精品忘忧草| 一本到高清视频免费精品| 免费久久精品视频| 欧美一区二区成人6969|