
OpenGL Performance Optimization (Repost)

SIGGRAPH '97

Course 24: OpenGL and Window System Integration

OpenGL Performance Optimization






1. Hardware vs. Software

OpenGL may be implemented by any combination of hardware and software. At the high-end, hardware may implement virtually all of OpenGL while at the low-end, OpenGL may be implemented entirely in software. In between are combination software/hardware implementations. More money buys more hardware and better performance.

Entry-level workstation hardware and recent PC 3-D hardware typically implement point, line, and polygon rasterization in hardware but implement floating point transformations, lighting, and clipping in software. This is a good strategy since the bottleneck in 3-D rendering is usually rasterization and modern CPUs have sufficient floating point performance to handle the transformation stage.

OpenGL developers must remember that their application may be used on a wide variety of OpenGL implementations. Therefore one should consider using all possible optimizations, even those which have little return on the development system, since other systems may benefit greatly.

From this point of view it may seem wise to develop your application on a low-end system. There is a pitfall, however: some operations which are cheap in software may be expensive in hardware. The moral is: test your application on a variety of systems to be sure the performance is dependable.



2. Application Organization

At first glance it may seem that the performance of interactive OpenGL applications is dominated by the performance of OpenGL itself. This may be true in some circumstances but be aware that the organization of the application is also significant.

2.1 High Level Organization

Multiprocessing

Some graphical applications have a substantial computational component other than 3-D rendering. Virtual reality applications must compute object interactions and collisions. Scientific visualization programs must compute analysis functions and graphical representations of data.

One should consider multiprocessing in these situations. By assigning rendering and computation to different threads they may be executed in parallel on multiprocessor computers.

For many applications, supporting multiprocessing is just a matter of partitioning the render and compute operations into separate threads which share common data structures and coordinate with synchronization primitives.

SGI's Performer is an example of a high level toolkit designed for this purpose.
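As a minimal sketch of this partitioning, assuming POSIX threads are available (the course notes do not prescribe a threading API), a compute thread can publish results into a shared structure that the render thread reads under a mutex. The names shared_state, compute_thread, and run_simulation are hypothetical, and the render step is reduced to taking a snapshot where a real program would issue OpenGL calls.

```c
#include <pthread.h>

/* Hypothetical shared data: the compute thread writes, the render thread reads. */
struct shared_state {
    pthread_mutex_t lock;
    float position;     /* result of the "simulation" */
    int   steps_done;   /* simulation steps published so far */
    int   steps_total;  /* steps the compute thread will run */
};

/* Compute thread: advances the simulation and publishes each step. */
static void *compute_thread(void *arg)
{
    struct shared_state *s = arg;
    for (int i = 0; i < s->steps_total; i++) {
        pthread_mutex_lock(&s->lock);
        s->position += 0.5f;
        s->steps_done++;
        pthread_mutex_unlock(&s->lock);
    }
    return NULL;
}

/* Runs compute and "render" concurrently; returns the last position seen by
 * the render loop, or -1.0f if the thread could not be created. */
float run_simulation(int steps)
{
    struct shared_state s;
    pthread_t tid;
    float snapshot = 0.0f;
    int done = 0;

    pthread_mutex_init(&s.lock, NULL);
    s.position = 0.0f;
    s.steps_done = 0;
    s.steps_total = steps;

    if (pthread_create(&tid, NULL, compute_thread, &s) != 0) {
        pthread_mutex_destroy(&s.lock);
        return -1.0f;
    }

    /* Render loop: take a consistent snapshot, then draw it (drawing omitted). */
    while (!done) {
        pthread_mutex_lock(&s.lock);
        snapshot = s.position;
        done = (s.steps_done == s.steps_total);
        pthread_mutex_unlock(&s.lock);
    }

    pthread_join(tid, NULL);
    pthread_mutex_destroy(&s.lock);
    return snapshot;
}
```

Holding the lock only long enough to copy a snapshot keeps the render thread from stalling the computation while it draws.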

Image quality vs. performance

In general, one wants high-speed animation and high-quality images in an OpenGL application. If you can't have both at once a reasonable compromise may be to render at low complexity during animation and high complexity for static images.

Complexity may refer to the geometric or rendering attributes of a database. Here are a few examples.

  • During interactive rotation (i.e. mouse button held down) render a reduced-polygon model. When drawing a static image draw the full polygon model.
  • During animation, disable dithering, smooth shading, and/or texturing. Enable them for the static image.
  • If texturing is required, use GL_NEAREST sampling and glHint( GL_PERSPECTIVE_CORRECTION_HINT, GL_FASTEST ).
  • During animation, disable antialiasing. Enable antialiasing for the static image.
  • Use coarser NURBS/evaluator tessellation during animation. Use glPolygonMode( GL_FRONT_AND_BACK, GL_LINE ) to inspect tessellation granularity and reduce if possible.

Level of detail management and culling

Objects which are distant from the viewer may be rendered with a reduced complexity model. This strategy reduces the demands on all stages of the graphics pipeline. Toolkits such as Inventor and Performer support this feature automatically.

Objects which are entirely outside of the field of view may be culled. This type of high level cull testing can be done efficiently with bounding boxes or spheres and can have a major impact on performance. Again, toolkits such as Inventor and Performer have this feature.
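A bounding-sphere cull test can be sketched as follows. This is an illustrative example, not the method used by Inventor or Performer: it assumes the application has already computed the planes of the view volume with unit-length normals pointing inward, and the names plane, sphere, and sphere_is_culled are hypothetical.

```c
/* A plane a*x + b*y + c*z + d = 0 whose unit-length normal (a,b,c) points
 * toward the inside of the view volume (assumed to be supplied by the caller). */
struct plane  { float a, b, c, d; };

/* Bounding sphere of an object, in the same coordinate space as the planes. */
struct sphere { float x, y, z, radius; };

/* Returns 1 if the sphere lies entirely outside at least one plane, so the
 * object can be skipped; returns 0 if it may be visible. */
int sphere_is_culled(const struct sphere *s, const struct plane *planes, int nplanes)
{
    for (int i = 0; i < nplanes; i++) {
        /* signed distance from the sphere center to the plane */
        float dist = planes[i].a * s->x + planes[i].b * s->y
                   + planes[i].c * s->z + planes[i].d;
        if (dist < -s->radius)
            return 1;
    }
    return 0;
}
```

The test is conservative: a sphere that straddles a plane is reported as possibly visible, so nothing that should be drawn is dropped.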

2.2 Low Level Organization

The objects which are rendered with OpenGL have to be stored in some sort of data structure. Some data structures are more efficient than others with respect to how quickly they can be rendered.

Basically, one wants data structures which can be traversed quickly and passed to the graphics library in an efficient manner. For example, suppose we need to render a triangle strip. The data structure which stores the list of vertices may be implemented with a linked list or an array. Clearly the array can be traversed more quickly than a linked list. The way in which a vertex is stored in the data structure is also significant. High performance hardware can process vertices specified by a pointer more quickly than those specified by three separate parameters.

An Example

Suppose we're writing an application which involves drawing a road map. One of the components of the database is a list of cities specified with a latitude, longitude and name. The data structure describing a city may be:
    struct city {
        float latitude, longitude;  /* city location */
        char *name;                 /* city's name */
        int large_flag;             /* 0 = small, 1 = large */
    };
A list of cities may be stored as an array of city structs.

Our first attempt at rendering this information may be:

    void draw_cities( int n, struct city citylist[] )
    {
        int i;
        for (i=0; i < n; i++) {
            if (citylist[i].large_flag) {
                glPointSize( 4.0 );
            }
            else {
                glPointSize( 2.0 );
            }
            glBegin( GL_POINTS );
            glVertex2f( citylist[i].longitude, citylist[i].latitude );
            glEnd();
            glRasterPos2f( citylist[i].longitude, citylist[i].latitude );
            glCallLists( strlen(citylist[i].name),
                         GL_BYTE,
                         citylist[i].name );
        }
    }
This is a poor implementation for a number of reasons:
  • glPointSize is called for every loop iteration.
  • only one point is drawn between glBegin and glEnd
  • the vertices aren't being specified in the most efficient manner
Here's a better implementation:
    void draw_cities( int n, struct city citylist[] )
    {
        int i;

        /* draw small dots first */
        glPointSize( 2.0 );
        glBegin( GL_POINTS );
        for (i=0; i < n; i++) {
            if (citylist[i].large_flag==0) {
                glVertex2f( citylist[i].longitude, citylist[i].latitude );
            }
        }
        glEnd();

        /* draw large dots second */
        glPointSize( 4.0 );
        glBegin( GL_POINTS );
        for (i=0; i < n; i++) {
            if (citylist[i].large_flag==1) {
                glVertex2f( citylist[i].longitude, citylist[i].latitude );
            }
        }
        glEnd();

        /* draw city labels third */
        for (i=0; i < n; i++) {
            glRasterPos2f( citylist[i].longitude, citylist[i].latitude );
            glCallLists( strlen(citylist[i].name),
                         GL_BYTE,
                         citylist[i].name );
        }
    }
In this implementation we're only calling glPointSize twice and we're maximizing the number of vertices specified between glBegin and glEnd.

We can still do better, however. If we redesign the data structures used to represent the city information we can improve the efficiency of drawing the city points. For example:

    struct city_list {
        int num_cities;     /* how many cities in the list */
        float *position;    /* pointer to lat/lon coordinates */
        char **name;        /* pointer to city names */
        float size;         /* size of city points */
    };
Now cities of different sizes are stored in separate lists. Positions are stored sequentially in a dynamically allocated array. By reorganizing the data structures we've eliminated the need for a conditional inside the glBegin/glEnd loops. Also, we can render a list of cities using the GL_EXT_vertex_array extension if available, or at least use a more efficient version of glVertex and glRasterPos.
    /* indicates if server can do GL_EXT_vertex_array: */
    GLboolean varray_available;

    void draw_cities( struct city_list *list )
    {
        int i;
        /* fall back to glBegin/glEnd unless vertex arrays are used below */
        GLboolean use_begin_end = GL_TRUE;

        /* draw the points */
        glPointSize( list->size );
    #ifdef GL_EXT_vertex_array
        if (varray_available) {
            glVertexPointerEXT( 2, GL_FLOAT, 0, list->num_cities, list->position );
            glDrawArraysEXT( GL_POINTS, 0, list->num_cities );
            use_begin_end = GL_FALSE;
        }
    #endif
        if (use_begin_end) {
            glBegin( GL_POINTS );
            for (i=0; i < list->num_cities; i++) {
                glVertex2fv( &list->position[i*2] );
            }
            glEnd();
        }

        /* draw city labels */
        for (i=0; i < list->num_cities; i++) {
            glRasterPos2fv( &list->position[i*2] );
            glCallLists( strlen(list->name[i]),
                         GL_BYTE, list->name[i] );
        }
    }
As this example shows, it's better to know something about efficient rendering techniques before designing the data structures. In many cases one has to find a compromise between data structures optimized for rendering and those optimized for clarity and convenience.
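If the application already holds its data as an array of city structs, the per-size lists can be built once, outside the rendering path. The following is a minimal sketch; build_city_list is a hypothetical helper, not part of the original notes. It assumes positions are packed as (longitude, latitude) pairs so that they match the x, y order used by the glVertex2f calls above, and it shares the name pointers rather than copying the strings.

```c
#include <stdlib.h>

struct city {
    float latitude, longitude;  /* city location */
    char *name;                 /* city's name */
    int large_flag;             /* 0 = small, 1 = large */
};

struct city_list {
    int num_cities;     /* how many cities in the list */
    float *position;    /* packed (longitude, latitude) pairs: an assumption */
    char **name;        /* pointers to city names (shared, not copied) */
    float size;         /* size of city points */
};

/* Hypothetical helper: collects the cities whose large_flag matches `large`.
 * Returns 0 on success, -1 on allocation failure. The caller frees
 * out->position and out->name. */
int build_city_list(const struct city *cities, int n, int large,
                    float point_size, struct city_list *out)
{
    size_t capacity = (n > 0) ? (size_t)n : 1;
    int count = 0;

    out->num_cities = 0;
    out->size = point_size;
    out->position = malloc(sizeof(float) * 2 * capacity);
    out->name = malloc(sizeof(char *) * capacity);
    if (!out->position || !out->name) {
        free(out->position);
        free(out->name);
        out->position = NULL;
        out->name = NULL;
        return -1;
    }

    for (int i = 0; i < n; i++) {
        if ((cities[i].large_flag != 0) == (large != 0)) {
            out->position[2 * count]     = cities[i].longitude;
            out->position[2 * count + 1] = cities[i].latitude;
            out->name[count] = cities[i].name;
            count++;
        }
    }
    out->num_cities = count;
    return 0;
}
```

Calling it once with large = 0 and once with large = 1 yields the two lists that draw_cities expects.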

In the following sections the techniques for maximizing performance, as seen above, are explained.



3. OpenGL Optimization

There are many ways to improve OpenGL performance. The impact of any single optimization can vary a great deal depending on the OpenGL implementation. Interestingly, items which have a large impact on software renderers may have no effect on hardware renderers, and vice versa! For example, smooth shading can be expensive in software but free in hardware, while glGet* can be cheap in software but expensive in hardware.

After each of the following techniques look for a bracketed list of symbols which relates the significance of the optimization to your OpenGL system:

  • H - beneficial for high-end hardware
  • L - beneficial for low-end hardware
  • S - beneficial for software implementations
  • all - probably beneficial for all implementations

3.1 Traversal

Traversal is the sending of data to the graphics system. Specifically, we want to minimize the time taken to specify primitives to OpenGL.
Use connected primitives
Connected primitives such as GL_LINE_STRIP, GL_LINE_LOOP, GL_TRIANGLE_STRIP, GL_TRIANGLE_FAN, and GL_QUAD_STRIP require fewer vertices to describe an object than individual line, triangle, or polygon primitives. This reduces data transfer and transformation workload. [all]
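The saving is easy to quantify for triangles. The two hypothetical helpers below simply count the vertices that must be sent: n independent triangles need 3n vertices, while a single strip or fan covering the same n triangles needs n + 2.

```c
/* Vertices sent for n triangles drawn as independent GL_TRIANGLES. */
int independent_triangle_vertices(int n)
{
    return n > 0 ? 3 * n : 0;
}

/* Vertices sent for n triangles drawn as one GL_TRIANGLE_STRIP or
 * GL_TRIANGLE_FAN: the first triangle costs three vertices, every further
 * triangle costs one. */
int connected_triangle_vertices(int n)
{
    return n > 0 ? n + 2 : 0;
}
```

For 100 triangles that is 300 vertices against 102, so long strips approach a threefold reduction in vertex traffic and transformation work.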
Use the vertex array extension
On some architectures function calls are somewhat expensive so replacing many glVertex/glColor/glNormal calls with the vertex array mechanism may be very beneficial. [all]
Store vertex data in consecutive memory locations
When maximum performance is needed on high-end systems it's good to store vertex data in contiguous memory to maximize throughput of data from host memory to the graphics subsystem. [H,L]
Use the vector versions of glVertex, glColor, glNormal and glTexCoord
The glVertex, glColor, etc. functions which take a pointer to their arguments such as glVertex3fv(v) may be much faster than those which take individual arguments such as glVertex3f(x,y,z) on systems with DMA-driven graphics hardware. [H,L]
Reduce quantity of primitives
Be careful not to render primitives which are over-tessellated. Experiment with the GLU primitives, for example, to determine the best compromise of image quality vs. tessellation level. Textured objects in particular may still be rendered effectively with low geometric complexity. [all]
Display lists
Use display lists to encapsulate frequently drawn objects. Display list data may be stored in the graphics subsystem rather than host memory thereby eliminating host-to-graphics data movement. Display lists are also very beneficial when rendering remotely. [all]
Don't specify unneeded per-vertex information
If lighting is disabled don't call glNormal. If texturing is disabled don't call glTexCoord, etc.
Minimize code between glBegin/glEnd
For maximum performance on high-end systems it's extremely important to send vertex data to the graphics system as fast as possible. Avoid extraneous code between glBegin/glEnd.

Example:

    glBegin( GL_TRIANGLE_STRIP );
    for (i=0; i < n; i++) {
        if (lighting) {
            glNormal3fv( norm[i] );
        }
        glVertex3fv( vert[i] );
    }
    glEnd();

This is a very bad construct. The following is much better:

    if (lighting) {
        glBegin( GL_TRIANGLE_STRIP );
        for (i=0; i < n; i++) {
            glNormal3fv( norm[i] );
            glVertex3fv( vert[i] );
        }
        glEnd();
    }
    else {
        glBegin( GL_TRIANGLE_STRIP );
        for (i=0; i < n; i++) {
            glVertex3fv( vert[i] );
        }
        glEnd();
    }
Also consider manually unrolling important rendering loops to maximize the function call rate.
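A manually unrolled loop might look like the sketch below. To keep the example self-contained it does not call OpenGL directly: vertex_fn is a hypothetical callback type with the same shape as glVertex3fv (a pointer to three floats), and record_vertex is a demonstration stand-in that only records what it was given. Whether unrolling by four helps is an assumption to verify by measurement on the target system.

```c
/* Stand-in for a per-vertex call; glVertex3fv takes the same kind of argument. */
typedef void (*vertex_fn)(const float *v);

/* Sends n vertices (3 consecutive floats each), four per loop iteration. */
void send_vertices_unrolled(const float *vert, int n, vertex_fn emit)
{
    int i = 0;

    /* main body: four calls per iteration, one loop test per four vertices */
    for (; i + 4 <= n; i += 4) {
        emit(vert + 3 * i);
        emit(vert + 3 * (i + 1));
        emit(vert + 3 * (i + 2));
        emit(vert + 3 * (i + 3));
    }

    /* remainder: up to three leftover vertices */
    for (; i < n; i++)
        emit(vert + 3 * i);
}

/* Demonstration emitter: counts vertices and remembers the last x coordinate. */
static int   sent_count;
static float last_x;

static void record_vertex(const float *v)
{
    sent_count++;
    last_x = v[0];
}
```

In a real renderer the call would sit between glBegin and glEnd, with the OpenGL vertex function taking the place of record_vertex.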

3.2 Transformation

Transformation includes the transformation of vertices from glVertex to window coordinates, clipping and lighting.

Lighting
  • Avoid using positional lights, i.e. light positions should be of the form (x,y,z,0) [L,S]
  • Avoid using spotlights. [all]
  • Avoid using two-sided lighting. [all]
  • Avoid using negative material and light color coefficients [S]
  • Avoid using the local viewer lighting model. [L,S]
  • Avoid frequent changes to the GL_SHININESS material parameter. [L,S]
  • Some OpenGL implementations are optimized for the case of a single light source.
  • Consider pre-lighting complex objects before rendering, à la radiosity. You can get the effect of lighting by specifying vertex colors instead of vertex normals. [S]
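Pre-lighting can be sketched as a one-time pass that converts each vertex normal into a color. The example below is a deliberately simplified assumption: it computes only a diffuse term for a single directional light and ignores ambient, specular, and light color, so it is not a reproduction of the full OpenGL lighting equation. The function name prelight_vertex is hypothetical.

```c
/* Computes a diffuse-only color for one vertex.
 *   normal:   unit-length vertex normal
 *   to_light: unit-length direction from the surface toward a directional light
 *   material: diffuse RGB reflectance
 * Result: color_out = material * max(0, normal . to_light) */
void prelight_vertex(const float normal[3], const float to_light[3],
                     const float material[3], float color_out[3])
{
    float ndotl = normal[0] * to_light[0]
                + normal[1] * to_light[1]
                + normal[2] * to_light[2];

    if (ndotl < 0.0f)
        ndotl = 0.0f;   /* surface faces away from the light */

    for (int i = 0; i < 3; i++)
        color_out[i] = material[i] * ndotl;
}
```

The resulting colors are then passed per vertex with lighting disabled, which is valid only while the object and light do not move relative to each other.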
Two sided lighting
If you want both the front and back of polygons shaded the same try using two light sources instead of two-sided lighting. Position the two light sources on opposite sides of your object. That way, a polygon will always be lit correctly whether it's back or front facing. [L,S]
Disable normal vector normalization when not needed
glEnable/Disable(GL_NORMALIZE) controls whether normal vectors are scaled to unit length before lighting. If you do not use glScale you may be able to disable normalization without ill effects. Normalization is disabled by default. [L,S]
Use connected primitives
Connected primitives such as GL_LINE_STRIP, GL_LINE_LOOP, GL_TRIANGLE_STRIP, GL_TRIANGLE_FAN, and GL_QUAD_STRIP decrease traversal and transformation load.
glRect usage
If you have to draw many rectangles consider using glBegin(GL_QUADS) ... glEnd() instead. [all]

3.3 Rasterization

Rasterization is the process of generating the pixels which represent points, lines, polygons, bitmaps and the writing of those pixels to the frame buffer. Rasterization is often the bottleneck in software implementations of OpenGL.
Disable smooth shading when not needed
Smooth shading is enabled by default. Flat shading doesn't require interpolation of the four color components and is usually faster than smooth shading in software implementations. Hardware may perform flat and smooth-shaded rendering at the same rate though there's at least one case in which smooth shading is faster than flat shading (E&S Freedom). [S]
Disable depth testing when not needed
Background objects, for example, can be drawn without depth testing if they're drawn first. Foreground objects can be drawn without depth testing if they're drawn last. [L,S]
Disable dithering when not needed
This is easy to forget when developing on a high-end machine. Disabling dithering can make a big difference in software implementations of OpenGL on lower-end machines with 8 or 12-bit color buffers. Dithering is enabled by default. [S]
Use back-face culling whenever possible.
If you're drawing closed polyhedra or other objects for which back facing polygons aren't visible there's probably no point in drawing those polygons. [all]
The GL_SGI_cull_vertex extension
SGI's Cosmo GL supports a new culling extension which looks at vertex normals to try to improve the speed of culling.
Avoid extra fragment operations
Stenciling, blending, stippling, alpha testing and logic ops can all take extra time during rasterization. Be sure to disable the operations which aren't needed. [all]
Reduce the window size or screen resolution
A simple way to reduce rasterization time is to reduce the number of pixels drawn. If a smaller window or reduced display resolution are acceptable it's an easy way to improve rasterization speed. [L,S]

3.4 Texturing

Texture mapping is usually an expensive operation in both hardware and software. Only high-end graphics hardware can offer free to low-cost texturing. In any case there are several ways to maximize texture mapping performance.
Use efficient image formats
The GL_UNSIGNED_BYTE component format is typically the fastest for specifying texture images. Experiment with the internal texture formats offered by the GL_EXT_texture extension. Some formats are faster than others on some systems (16-bit texels on the Reality Engine, for example). [all]
Encapsulate texture maps in texture objects or display lists
This is especially important if you use several texture maps. By putting textures into display lists or texture objects the graphics system can manage their storage and minimize data movement between the client and graphics subsystem. [all]
Use smaller texture maps
Smaller images can be moved from host to texture memory faster than large images. More small texture can be stored simultaneously in texture memory, reducing texture memory swapping. [all]
Use simpler sampling functions
Experiment with the minification and magnification texture filters to determine which performs best while giving acceptable results. Generally, GL_NEAREST is fastest and GL_LINEAR is second fastest. [all]
Use the same sampling function for minification and magnification
If both the minification and magnification filters are GL_NEAREST or GL_LINEAR then there's no reason OpenGL has to compute the lambda value which determines whether to use minification or magnification sampling for each fragment. Avoiding the lambda calculation can be a good performance improvement.
Use a simpler texture environment function
Some texture environment modes may be faster than others. For example, the GL_DECAL or GL_REPLACE_EXT functions for 3 component textures are a simple assignment of texel samples to fragments while GL_MODULATE multiplies texel samples with incoming fragment colors. [S,L]
Combine small textures
If you are using several small textures consider tiling them together as a larger texture and modify your texture coordinates to address the subtexture you want. This technique can eliminate texture bindings.
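The coordinate adjustment is a scale and offset. The sketch below assumes the combined texture is a regular grid of equal-sized tiles; atlas_texcoord and its parameters are hypothetical names.

```c
/* Maps a coordinate (s, t) in [0,1] within one tile to the corresponding
 * coordinate in a combined texture laid out as cols x rows equal tiles.
 * col and row select the tile, counted from 0. */
void atlas_texcoord(int col, int row, int cols, int rows,
                    float s, float t, float *s_out, float *t_out)
{
    *s_out = ((float)col + s) / (float)cols;
    *t_out = ((float)row + t) / (float)rows;
}
```

Two caveats apply: texture coordinates can no longer repeat within a tile, since GL_REPEAT wraps the whole combined image, and GL_LINEAR filtering may pull in texels from a neighboring tile at tile borders.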
Use glHint(GL_PERSPECTIVE_CORRECTION_HINT, GL_FASTEST)
This hint can improve the speed of texturing when perspective- correct texture coordinate interpolation isn't needed, such as when using a glOrtho() projection.
Animated textures
If you want to use an animated texture, perhaps live video textures, don't use glTexImage2D to repeatedly change the texture. Use glTexSubImage2D or glCopyTexSubImage2D. These functions are standard in OpenGL 1.1 and available as extensions to 1.0.

3.5 Clearing

Clearing the color, depth, stencil and accumulation buffers can be time consuming, especially when it has to be done in software. There are a few tricks which can help.
Use glClear carefully [all]
Clear all relevant color buffers with one glClear.

Wrong:

    glClear( GL_COLOR_BUFFER_BIT );
    if (stenciling) {
        glClear( GL_STENCIL_BUFFER_BIT );
    }
Right:

    if (stenciling) {
        glClear( GL_COLOR_BUFFER_BIT | GL_STENCIL_BUFFER_BIT );
    }
    else {
        glClear( GL_COLOR_BUFFER_BIT );
    }
Disable dithering
Disable dithering before clearing the color buffer. Visually, the difference between dithered and undithered clears is usually negligible.
Use scissoring to clear a smaller area
If you don't need to clear the whole buffer use glScissor() to restrict clearing to a smaller area. [L].
Don't clear the color buffer at all
If the scene you're drawing opaquely covers the entire window there is no reason to clear the color buffer.
Eliminate depth buffer clearing
If the scene you're drawing covers the entire window there is a trick which lets you omit the depth buffer clear. The idea is to only use half the depth buffer range for each frame and alternate between using GL_LESS and GL_GREATER as the depth test function.

Example:

    int EvenFlag;

    /* Call this once during initialization and whenever the window
     * is resized.
     */
    void init_depth_buffer( void )
    {
        glClearDepth( 1.0 );
        glClear( GL_DEPTH_BUFFER_BIT );
        glDepthRange( 0.0, 0.5 );
        glDepthFunc( GL_LESS );
        EvenFlag = 1;
    }

    /* Your drawing function */
    void display_func( void )
    {
        if (EvenFlag) {
            glDepthFunc( GL_LESS );
            glDepthRange( 0.0, 0.5 );
        }
        else {
            glDepthFunc( GL_GREATER );
            glDepthRange( 1.0, 0.5 );
        }
        EvenFlag = !EvenFlag;
        /* draw your scene */
    }
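To see why the stale depth values never interfere, it helps to simulate a single pixel. The sketch below is an illustration only and makes no OpenGL calls: window_depth applies the standard depth range mapping for a normalized device depth in [-1, 1], and depth_test_alternating (a hypothetical name) applies the alternating range and comparison from the example above.

```c
/* Window-space depth for a normalized device depth z_ndc in [-1, 1] under
 * glDepthRange(n, f): z_w = n + (f - n) * (z_ndc + 1) / 2. */
float window_depth(float z_ndc, float n, float f)
{
    return n + (f - n) * (z_ndc + 1.0f) * 0.5f;
}

/* Simulates the depth test at one pixel for a fragment drawn in frame number
 * `frame` (0, 1, 2, ...). Even frames use range [0.0, 0.5] with a "less than"
 * test; odd frames use range [1.0, 0.5] with a "greater than" test.
 * Returns 1 and updates *stored if the fragment passes, 0 otherwise. */
int depth_test_alternating(int frame, float z_ndc, float *stored)
{
    int even = (frame % 2 == 0);
    float z = even ? window_depth(z_ndc, 0.0f, 0.5f)
                   : window_depth(z_ndc, 1.0f, 0.5f);
    int pass = even ? (z < *stored) : (z > *stored);

    if (pass)
        *stored = z;
    return pass;
}
```

Every value written in an even frame is at most 0.5 and every value written in an odd frame is at least 0.5, so the first fragment of a new frame passes against whatever the previous frame left behind, while within a frame the nearest fragment still wins. The price is one bit of depth precision.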
Avoid glClearDepth( d ) where d!=1.0
Some software implementations may have optimized paths for clearing the depth buffer to 1.0. [S]

3.6 Miscellaneous

Avoid "round-trip" calls
Calls such as glGetFloatv, glGetIntegerv, glIsEnabled, glGetError, glGetString require a slow, round trip transaction between the application and renderer. Especially avoid them in your main rendering code.

Note that software implementations of OpenGL may actually perform these operations faster than hardware systems. If you're developing on a low-end system be aware of this fact. [H,L]

Avoid glPushAttrib
If only a few pieces of state need to be saved and restored it's often faster to maintain the information in the client program. glPushAttrib( GL_ALL_ATTRIB_BITS ) in particular can be very expensive on hardware systems. This call may be faster in software implementations than in hardware. [H,L]
Check for GL errors during development
During development call glGetError inside your rendering/event loop to catch errors. GL errors raised during rendering can slow down rendering speed. Remove the glGetError call for production code since it's a "round trip" command and can cause delays. [all]
Use glColorMaterial instead of glMaterial
If you need to change a material property on a per vertex basis, glColorMaterial may be faster than glMaterial. [all]
glDrawPixels
  • glDrawPixels often performs best with GL_UNSIGNED_BYTE color components [all]
  • Disable all unnecessary raster operations before calling glDrawPixels. [all]
  • Use the GL_EXT_abgr extension to specify color components in alpha, blue, green, red order on systems which were designed for IRIS GL. [H,L].
Avoid using viewports which are larger than the window
Software implementations may have to do additional clipping in this situation. [S]
Alpha planes
Don't allocate alpha planes in the color buffer if you don't need them. Specifically, they are not needed for transparency effects. Systems without hardware alpha planes may have to resort to a slow software implementation. [L,S]
Accumulation, stencil, overlay planes
Do not allocate accumulation, stencil or overlay planes if they are not needed. [all]
Be aware of the depth buffer's depth
Your OpenGL may support several different sizes of depth buffers, 16 and 24-bit for example. Shallower depth buffers may be faster than deep buffers both for software and hardware implementations. However, the precision of a 16-bit depth buffer may not be sufficient for some applications. [L,S]
Transparency may be implemented with stippling instead of blending
If you need simple transparent objects consider using polygon stippling instead of alpha blending. Stippling is typically faster and may actually look better in some situations. [L,S]
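A stipple pattern for roughly 50% transparency can be generated as below. The polygon stipple mask is a 32 x 32 bit pattern stored in 128 bytes; make_half_stipple is a hypothetical helper that fills it with a one-pixel checkerboard.

```c
/* Fills a 32x32-bit stipple mask (128 bytes, 4 bytes per row) with a
 * checkerboard that keeps every other pixel, alternating between rows. */
void make_half_stipple(unsigned char mask[128])
{
    for (int row = 0; row < 32; row++) {
        /* 0xAA = 10101010, 0x55 = 01010101 */
        unsigned char bits = (row & 1) ? 0x55 : 0xAA;
        for (int byte = 0; byte < 4; byte++)
            mask[row * 4 + byte] = bits;
    }
}
```

The mask would then be passed to glPolygonStipple with GL_POLYGON_STIPPLE enabled while the transparent object is drawn.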
Group state changes together
Try to minimize the number of GL state changes in your code. When GL state is changed, internal state may have to be recomputed, introducing delays. [all]
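One common way to group state changes is to sort the objects to be drawn by the state they need, so that objects sharing state are drawn back to back. The sketch below assumes drawing order does not otherwise matter (for example, opaque depth-tested objects). The draw_item record and its fields are hypothetical stand-ins for whatever state the application tracks.

```c
#include <stdlib.h>

/* Hypothetical draw record: texture_id and lighting stand for GL state
 * that must be set before the object is drawn. */
struct draw_item {
    unsigned texture_id;
    int lighting;       /* 0 or 1 */
    int object_id;
};

/* Orders items by texture first, then by lighting flag. */
static int compare_state(const void *pa, const void *pb)
{
    const struct draw_item *a = pa;
    const struct draw_item *b = pb;

    if (a->texture_id != b->texture_id)
        return a->texture_id < b->texture_id ? -1 : 1;
    return a->lighting - b->lighting;
}

/* Sorts items so that items with equal state are adjacent. */
void sort_by_state(struct draw_item *items, size_t n)
{
    qsort(items, n, sizeof items[0], compare_state);
}

/* Counts how often state would have to change when drawing in array order. */
int count_state_changes(const struct draw_item *items, size_t n)
{
    int changes = 0;
    for (size_t i = 1; i < n; i++) {
        if (items[i].texture_id != items[i - 1].texture_id ||
            items[i].lighting != items[i - 1].lighting)
            changes++;
    }
    return changes;
}
```

Comparing count_state_changes before and after sorting gives a quick estimate of how much redundant state traffic the sort removes.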
Avoid using glPolygonMode
If you need to draw many polygon outlines or vertex points use glBegin with GL_POINTS, GL_LINES, GL_LINE_LOOP or GL_LINE_STRIP instead as it can be much faster. [all]

3.7 Window System Integration

Minimize calls to the make current call
The glXMakeCurrent call, for example, can be expensive on hardware systems because the context switch may involve moving a large amount of data in and out of the hardware.
Visual / pixel format performance
Some X visuals or pixel formats may be faster than others. On PCs for example, 24-bit color buffers may be slower to read/write than 12 or 8-bit buffers. There is often a tradeoff between performance and quality of frame buffer configurations. 12-bit color may not look as nice as 24-bit color. A 16-bit depth buffer won't have the precision of a 24-bit depth buffer.

The GLX_EXT_visual_rating extension can help you select visuals based on performance or quality. GLX 1.2's visual caveat attribute can tell you if a visual has a performance penalty associated with it.

It may be worthwhile to experiment with different visuals to determine if there's any advantage of one over another.

Avoid mixing OpenGL rendering with native rendering
OpenGL allows both itself and the native window system to render into the same window. For this to be done correctly synchronization is needed. The GLX glXWaitX and glXWaitGL functions serve this purpose.

Synchronization hurts performance. Therefore, if you need to render with both OpenGL and native window system calls try to group the rendering calls to minimize synchronization.

For example, if you're drawing a 3-D scene with OpenGL and displaying text with X, draw all the 3-D elements first, call glXWaitGL to synchronize, then call all the X drawing functions.

Don't redraw more than necessary
Be sure that you're not redrawing your scene unnecessarily. For example, expose/repaint events may come in batches describing separate regions of the window which must be redrawn. Since one usually redraws the whole window image with OpenGL you only need to respond to one expose/repaint event. In the case of X, look at the count field of the XExposeEvent structure. Only redraw when it is zero.

Also, when responding to mouse motion events you should skip extra motion events in the input queue. Otherwise, if you try to process every motion event and redraw your scene there will be a noticeable delay between mouse input and screen updates.

It can be a good idea to put a print statement in your redraw and event loop function so you know exactly what messages are causing your scene to be redrawn, and when.
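Both rules can be sketched independently of any particular window system. In the example below the event record and its type codes are hypothetical stand-ins for the window system's own event structure; only the count field is modeled on XExposeEvent as described above.

```c
/* Hypothetical event record standing in for a window system event. */
enum event_type { EV_MOTION, EV_BUTTON, EV_EXPOSE };

struct event {
    enum event_type type;
    int x, y;     /* pointer position for motion events */
    int count;    /* for expose events: number of expose events still to come */
};

/* Given that queue[start] is a motion event, returns the index of the last
 * event in the run of consecutive motion events beginning there. Only that
 * event needs to be processed; the earlier ones can be skipped. */
int last_motion_in_run(const struct event *queue, int n, int start)
{
    int i = start;
    while (i + 1 < n && queue[i + 1].type == EV_MOTION)
        i++;
    return i;
}

/* Returns 1 if an expose event should trigger a redraw: only the final
 * event of a batch, whose count is zero. */
int expose_needs_redraw(const struct event *e)
{
    return e->type == EV_EXPOSE && e->count == 0;
}
```

Stopping the scan at the first non-motion event preserves the relative order of button presses and pointer movement.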

SwapBuffer calls and graphics pipe blocking
On systems with 3-D graphics hardware the SwapBuffers call is synchronized to the monitor's vertical retrace. Input to the OpenGL command queue may be blocked until the buffer swap has completed. Therefore, don't put more OpenGL calls immediately after SwapBuffers. Instead, put application computation instructions which can overlap with the buffer swap delay.

3.8 Mesa-specific

Mesa is a free library which implements most of the OpenGL API in a compatible manner. Since it is a software library, performance depends a great deal on the host computer. There are several Mesa-specific features to be aware of which can affect performance.

Double buffering
The X driver supports two back color buffer implementations: Pixmaps and XImages. The MESA_BACK_BUFFER environment variable controls which is used. Which of the two is faster depends on the nature of your rendering. Experiment.
X Visuals
As described above, some X visuals can be rendered into more quickly than others. The MESA_RGB_VISUAL environment variable can be used to determine the quickest visual by experimentation.
Depth buffers
Mesa may use a 16 or 32-bit depth buffer as specified in the src/config.h configuration file. 16-bit depth buffers are faster but may not offer the precision needed for all applications.
Flat-shaded primitives
If one is drawing a number of flat-shaded primitives all of the same color the glColor command should be put before the glBegin call.

Don't do this:

    glBegin(...);
    glColor(...);
    glVertex(...);
    ...
    glEnd();

Do this:

    glColor(...);
    glBegin(...);
    glVertex(...);
    ...
    glEnd();
glColor*() commands
The glColor[34]ub[v] functions are the fastest versions of the glColor command.
Avoid double precision valued functions
Mesa does all internal floating point computations in single precision floating point. API functions which take double precision floating point values must convert them to single precision. This can be expensive in the case of glVertex, glNormal, etc.


4. Evaluation and Tuning

To maximize the performance of an OpenGL application one must be able to evaluate it to learn what is limiting its speed. Because of the hardware involved it's not sufficient to use ordinary profiling tools. Several different aspects of the graphics system must be evaluated.

Performance evaluation is a large subject and only the basics are covered here. For more information see "OpenGL on Silicon Graphics Systems".

4.1 Pipeline tuning

The graphics system can be divided into three subsystems for the purpose of performance evaluation:
  • CPU subsystem - application code which drives the graphics subsystem
  • Geometry subsystem - transformation of vertices, lighting, and clipping
  • Rasterization subsystem - drawing filled polygons, line segments and per-pixel processing
At any given time, one of these stages will be the bottleneck. The bottleneck must be reduced to improve performance. The strategy is to isolate each subsystem in turn and evaluate changes in performance. For example, by decreasing the workload of the CPU subsystem one can determine if the CPU or graphics system is limiting performance.

4.1.1 CPU subsystem

To isolate the CPU subsystem one must reduce the graphics workload while preserving the application's execution characteristics. A simple way to do this is to replace glVertex and glNormal calls with glColor calls. If performance does not improve then the CPU stage is the bottleneck.

4.1.2 Geometry subsystem

To isolate the geometry subsystem one wants to reduce the number of primitives processed, or reduce the transformation work per primitive while producing the same number of pixels during rasterization. This can be done by replacing many small polygons with fewer large ones or by simply disabling lighting or clipping. If performance increases then your application is bound by geometry/transformation speed.

4.1.3 Rasterization subsystem

A simple way to reduce the rasterization workload is to make your window smaller. Another way to reduce rasterization work is to disable per-pixel processing such as texturing, blending, or depth testing. If performance increases, your program is fill limited.

After bottlenecks have been identified the techniques outlined in section 3 can be applied. The process of identifying and reducing bottlenecks should be repeated until no further improvements can be made or your minimum performance threshold has been met.
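The three experiments in sections 4.1.1 to 4.1.3 can be combined into a simple decision procedure. The sketch below is one possible reading of those steps, not a tool from the course notes: the function name, the enum, and the min_gain threshold (the fractional speedup treated as significant) are all assumptions, and the frame times are expected to come from the application's own measurements.

```c
enum bottleneck { BOUND_CPU, BOUND_GEOMETRY, BOUND_FILL, BOUND_UNKNOWN };

/* All times are frame times in milliseconds:
 *   baseline     - unmodified application
 *   reduced_gfx  - glVertex/glNormal calls replaced by glColor (4.1.1)
 *   reduced_geom - lighting disabled or fewer, larger polygons (4.1.2)
 *   reduced_fill - smaller window or per-pixel operations disabled (4.1.3)
 * min_gain is the fractional improvement counted as significant, e.g. 0.10. */
enum bottleneck classify_bottleneck(double baseline, double reduced_gfx,
                                    double reduced_geom, double reduced_fill,
                                    double min_gain)
{
    double threshold = baseline * (1.0 - min_gain);

    /* Removing graphics work did not help: the application code is the limit. */
    if (reduced_gfx > threshold)
        return BOUND_CPU;

    /* Less transformation work helped: geometry bound. */
    if (reduced_geom <= threshold)
        return BOUND_GEOMETRY;

    /* Fewer pixels or less per-pixel work helped: fill bound. */
    if (reduced_fill <= threshold)
        return BOUND_FILL;

    return BOUND_UNKNOWN;
}
```

Because removing one bottleneck exposes the next, the measurements should be repeated after every optimization.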

4.2 Double buffering

For smooth animation one must maintain a high, constant frame rate. Double buffering has an important effect on this. Suppose your application needs to render at 60Hz but is only getting 30Hz. It's a mistake to think that you must reduce rendering time by 50% to achieve 60Hz. The reason is that the swap-buffers operation is synchronized to occur during the display's vertical retrace period (at 60Hz for example). It may be that your application is taking only a tiny bit too long to meet the 1/60 second rendering time limit for 60Hz.

Measure the performance of rendering in single buffer mode to determine how far you really are from your target frame rate.
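The quantization can be worked through numerically. The sketch below assumes that each frame is held until the next vertical retrace, so a frame always occupies a whole number of refresh periods; displayed_frame_rate is a hypothetical helper and the render time is taken in microseconds to keep the arithmetic in integers.

```c
/* Frame rate actually displayed when buffer swaps wait for vertical retrace.
 *   render_us  - time to render one frame, in microseconds
 *   refresh_hz - display refresh rate in Hz
 * A frame occupies a whole number of refresh periods, rounded up. */
double displayed_frame_rate(long render_us, int refresh_hz)
{
    /* periods = ceil(render_us * refresh_hz / 1000000), at least 1 */
    long periods = (render_us * refresh_hz + 999999L) / 1000000L;

    if (periods < 1)
        periods = 1;
    return (double)refresh_hz / (double)periods;
}
```

At 60Hz a 16 ms frame is displayed at 60 frames per second, but a 17 ms frame misses the retrace and drops to 30. Shaving a little more than 0.3 ms off that frame restores the full rate.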

4.3 Test on several implementations

The performance of OpenGL implementations varies a lot. One should measure performance and test OpenGL applications on several different systems to be sure there are no unexpected problems.


Posted on 2009-08-25 06:05 by RedLight. Category: 3D Rendering Techniques