Immersive environments are currently a widely studied topic. To give the viewer a more involving experience, such environments surround the user with either warped screens or multiple screens. This creates an illusion of surrounding space.
In the case of multiple screens, each display, or facet, surrounding the user must be visually and temporally coherent with adjacent displays. The engine that generates the image is responsible for making sure that each facet is within the same timeframe of the others. In addition, the images should line up at the edges with nearby facets. The engine resolves this issue.
The mechanical aspects of actually displaying the images contain other problems. Surrounding the user with LCDs or similar devices created so that they are seamlessly adjacent to one another is a great challenge. It severely limits the types of immersive environments possible. In addition, it is prohibitively expensive. A much more practical solution is to use back projected displays created with projectors. Projectors are readily available and constructing screens for back projection displays is much more economical.
However, even with extremely careful construction and alignments, it is very difficult to achieve the required alignment between facets. To alleviate and remove this problem, special image compensation features can be added to the engine.
To compensate for alignment and distortion, the projected images are warped. There are different methods for generating the actual warping, but the basic principle stays the same.
The frame is generated for each facet by the engine. Once the frame is drawn, the information from the frame buffer is captured into a texture. Then, this texture is applied to a set of polygons that have altered vertices. These polygons are then rendered to the actual frame buffer and appear on the screen or projector; therefore, the image is rendered twice. The first time the frame for each facet is rendered with all the geometry, shading, and lighting calculations, and the second time, that result is re-rendered on polygonal mesh.
Moving the vertices of this “mesh” allows for the user to stretch, squash, and tweak the projection of the frame without touching the physical hardware such as the location of the screens or the projectors. There are other techniques of doing warping; however, because OpenGL’s support of textures is one of the faster methods, the focus will primarily be on textured polygonal meshes.
This paper describes previous and current image correction approaches implemented in 3DEngine and Guppy3D.
The
initial 3DEngine immersive environment included features that allowed for an
image to be warped, flipped, rotated, and scaled to better fit a display
facet. Originally, only scale, shear,
flip, and rotate operations were supported.
While that did a great job of getting the image “in the right area” it
still lacked the fine-tuning that would allow for coherent spatial transitions
between facets.
To achieve that coherency, an additional level of compensation was added. This method superimposed a rectangular grid over each facet (Figure 1). The spacing of the grid lines could be controlled by the user through an input file. The points on the perimeter could be interactively moved by the user by entering a special mode while running the engine.
This approach did a good job of achieving visual coherency between facets. In particular, moving the corners of the image to the corners of the display area took care of the majority of the perspective distortion. The interior points along the edges could then be moved to line up the areas in the middle of the facet edges. By repeating this procedure on each facet, a good visual result could be achieved.
Moving
each control vertex had a different effect on various points. There was a dependency set up between the
vertices along each edge. The corner
vertex would control all the vertices on its adjacent edges. Then, the vertices in the middle would
affect only those vertices that are between them and the corners. In addition, the effect of moving a vertex
carried a falloff as it went further into the facet (Figure 2). For example, when a vertex was moved, it
greatly affected the area right around it, but did nothing to the opposite side
of the facet.
While this method achieved the desired result, it did have a number of drawbacks. The controls were unintuitive; the method only worked with quadrilateral facets, and adjusting the number of control points and their locations on each edge sometimes produced undesirable or unexpected behavior. For example, if a central point along an edge were defined, it did not affect both sides of that edge.
The control scheme for image compensation was based solely on using the keyboard and expected the user to follow a predetermined set of steps. Since only one vertex can be controlled at a time, switching between vertices was difficult. The application first starts at a corner vertex. By pressing a key, the user can traverse to the next vertex. The traversal order would go through the corners first, then by closest-to-the-middle vertex on each side.
The idea is that the user should first move the corner vertices, then the other ones. However, this placed a number of restrictions on how the user wanted to proceed. Hunting down the desired vertex took time. Although the selected vertex was colored, it did not stand out very well. In addition, it is very common for the image to fall out of the range of the physical facet, further confounding the problem.
Moving each control vertex was also a challenge. The arrow keys were used to move the vertex, but since each facet had its own axis, the up arrow would not correspond to what the user would consider up. This forced the user to readapt to a new coordinate system for each facet. So pressing up would often move the vertex down and to the left or in another unpredictable direction.
Since the grid was designed as a quad-patch, this approach fails under non-quadrilateral facets. Triangular, pentagonal, and hexagonal facets could not be used in this compensation scheme, preventing such shapes as a soccer ball for the immersive display.
Finally, the behavior for odd number of control vertices on each edge as well as placing a vertex directly in the middle was not fully tested. This leads to erroneous or unexpected behavior when the user wishes for a midpoint on an edge.
The next iteration of 3DEngine was named Guppy3D. To improve the image compensation, the code has been completely reworked in an attempt to maintain the strengths of the previous approach while adding benefits and minimizing weaknesses.
The
Fan (Spider-web) Approach
The first approach implemented for image compensation was conceptually a simpler model. The user can still set the number of control points on each edge as well as the spacing. The program then finds the centroid of the facet, and superimposes a fan over the image. Each control point except the center can be manipulated by the user (Figure 3).
A mouse interface was added for manipulating the control points. The user was then able to click on any of the vertices and drag them to the desired location. A large cross-hair was drawn over the selected vertex to clearly distinguish it from others.
A similar dependency scheme was used for this approach as well. The corners controlled all adjacent edges while each vertex on a side would affect only those that are between it and a corner. A vertex directly in the middle affected all of the vertices on that edge.
Strengths
There are several important characteristics of this method that were an improvement over the previous one. Firstly, a more natural and intuitive control scheme using the mouse allows the user to deal with the compensation much faster and easier. The design allows the compensation to scale to any n-side polygon (Figure 4). Finally, control with various numbers of points and placements were fully tested.
The mouse feature was perhaps the strongest point of this method. Originally, the mouse motion was in the local coordinate system of the facet. However, using that interface proved very difficult. An effective correction that allowed the user to rotate the local axis of the mouse was implemented. Once the orientation was lined up with the “user up” direction, the mouse interface becomes incredibly powerful, fast, and easy to use.
Although the design was simple, it was capable of dealing with image compensation problems by providing a visually coherent immersive environment.

This approach, however, had some drawbacks as well. Although it was successful at lining up facets together, it had the potential to introduce artifacts if the user was not careful. Such artifacts damage the visual coherency. The code was also written for functionality, not speed. Therefore, potential speedups could be achieved.
The primary issue with this approach dealt with the centroid. Locating the center of a facet turned out to be an interesting problem. While an approach such as finding the intersection of diagonals in a quad could give the centroid, the method fails in other n-sided cases. Other potential methods could use smallest fitting circle, average of all corners, or recursive subdivision of the polygon.
When the centroid was found, however, the nature of the layout presented issues as well. Since all the lines led to the center, that area accumulated the most distortion. A pinch could be seen in the center of the facet. In addition, textural distortion could be noticed along the lines if the vertices were moved significantly (Figure 5).
While there were many beneficial changes in the second iteration, deficiencies in visual quality demanded a new version. This new version is currently in progress, but is nearing completion. It is a much more involved n-sided solution to the image compensation issue that attempts to combine the benefits of the previous two approaches.
The quad-patch (3DEngine approach) had the best visual results because of the falloff and its ability to focus the distortion towards the corners. The fan approach was simple, n-sided, and had a powerful mouse interface. To combine the two of these together, a type of n-grid was developed that could be applied to any n-sided polygon. It is expected to work well with 3 – 6 sided facets and while it will work with higher numbers, its effectiveness is difficult to foresee.

Once again, the user can specify how many control points lie on an edge as well as and spacing. Once that information, as well as the number of edges, is known, the grid can be generated. It is created by drawing a line from an edge point to its corresponding edge point on the n+2 edge (Figure 6). Such a connection allows a single corner to influence a significant number of edges. Note that when this approach is used on a 4-sided facet, the result is similar to a quad-patch. However, it does not have the limit of working on just a 4-sided polygon.
Unlike the fan approach, this method needs more information to build the actual structure that will hold the image-warping information. Since there are interior vertices, these must be located and stored. In addition, constructing a polygon array from these vertices can also be an interesting challenge, especially since having various numbers of points along each edge and different numbers of possible edges create different grids.
The grid is built from the outside. Since the locations of the corners are known, the points along the edge can be easily determined from the percentages given as input by user-defined files or defaults. These points are added to a vertex array that keeps track of all the vertices in the grid. A half-edge data structure is used to store connectivity information about the vertices.
Each vertex can contain any number of incoming half-edges and outgoing-half edges. Each half edge holds one source vertex and one incident vertex, in addition to an index to a face. No face information is stored when the vertex and half-edge arrays are generated.
After the edge vertices and relative half-edges are recorded, the interior edges and points are found. An interior vertex is found by picking a line and traveling along it, finding the intersections of other lines. Half-edge information is found for each of those vertices by connecting the previous and newfound vertices by half-edges.
This
information is enough to create a list of polygons that will serve as the
deformable mesh onto which the frame will be texture-mapped. To get the polygons, first, a half-edge is
selected. A new polygon is created and
assigned to this edge. Then, all the
half-edges going out of the incident vertex are compared to the initial
half-edge. The algorithm looks for an
edge that is closest to the initial half-edge pointing to the right. That edge is assigned the same polygon. The incident vertex of the new edge is
checked to see if it is the same as the source vertex of the first edge. If it is not, then the polygon is still
incomplete. The edges from the new
incident vertex are then compared to the previous edge, and so forth. When an incident vertex matches the original
source vertex, the polygon is complete (Figure 7).
The algorithm then finds the next half-edge that does not have a polygon assigned and repeats the process. Special consideration must be made when comparing edges since a pair of vertices shares two half-edges. Therefore, the closest half-edge cannot be “backwards pointing.”
When the algorithm is done, the following information is available: all the vertices, all the edges between the vertices, the polygons of the mesh, the vertices and edges that make up all the polygons.
Since the user can only directly control the outer edge vertices, a control scheme must be established between the edge, corner, and interior vertices. To achieve that, each vertex is assigned a priority, with zero being the highest and larger positive numbers specifying lower priority. A higher priority vertex affects nearby lower priority vertices. So, corners are assigned 0, midpoints on the edges are assigned 1, the next farthest from the midpoint are 2, 3, and so on. If no midpoint is present, the two middle vertices are each assigned 1.
Interior
points are assigned a priority based on what lines determine their
intersection. A line has the same
priority as the points that it originates and ends at. Note that a 1 will always connect with a 1,
a 2 with a 2, and so forth. Same
priority intersections generate a point of the same priority. An intersection of differing priority lines
results in a vertex priority of the higher line. This ensures that lower priority vertices do not affect higher
priority vertices. An alternate method
of prioritizing intersections is that if two lines have the same priority, the
intersection has one higher priority.
In that manner, if both lines have priority 2, the intersection will
have priority of 1.
With such a scheme, the control works in the following manner: a user moves a vertex. If it is a corner vertex, it affects both of its edges. If the vertex is midpoint, it affects the vertices on that line. If the vertex is a lower priority vertex, the movement behaves in a similar manner up to a higher priority vertex.
This approach seems to integrate the best parts of the two previous approaches. It behaves like a quad-patch in the 4-sided case, which delivers good visual results. It also generalizes n-sided facets. In addition, this approach has a powerful and intuitive mouse interface.
Repositioning the corners of a facet takes care of most of the perspective distortion. The control of residual distortion with this scheme is localized to the corners, where the majority of projected distortion occurs.
The applicability of this approach to n-sided polyhedron is a major strength. However, it is difficult to test without first building facets that are not 4-sided. Some minor modifications to the control might be necessary to get the warping to work properly.
The major drawback of this approach is the complexity. The difficulty in designing the scheme to work with n-sided facets is taking care of some special cases that occur in particular configurations. Getting used to the control and achieving the desired result might take some time.
Since the N-Grid approach has not yet been fully implemented, it is difficult to foresee the exact problems that might occur. Since the method was designed to minimize the issues of the previous methods, it is likely that the some new peculiarities will arise.
One
problem that quickly becomes apparent is OpenGL’s method of rendering
polygons. For example, if we are using
the quad-grid approach and we specify each of the individual polygons as
GL_QUADS, by default, OpenGL will tessellate each quad into a two triangles. When rendered, the triangles might be distorted,
but the texture will not stretch properly.
Instead of interpolating across a quad as expected, it will interpolate
across one triangle, and then another (Figure 9).
This can be corrected in several ways. One manner is to represent each quad as a 2x2 Bezier patch and use OpenGL to render out those surfaces. This corrects the problem, but is of course limited to quads.
Another powerful alternative is to use the graphics processing unit (GPU) to do all the texturing. With this approach, we are not limited to OpenGL’s methods of texturing and can gain other benefits at the same time. By converting the grid information to a 2-dimensional displacement map, we can keep track of the texture offsets and use those when sampling the generated frame. If we can determine a displacement map for a method, we can effective use the method to warp the image.
In this approach, we precompute the displacement map ahead of time and store it in a texture map. When rendering, we use the GPU to find an offset based on this displacement map and sample the source texture using the uv coordinates + the offset.
Since we are primarily using quads in our display geometry, our efforts were focused on making a displacement map for a set of quadrilateral polygons. There is some information we already know, such as the offset at the control verities and the interior points. Each point stores a uv coordinate when it is initialized. After being moved, the point will have a new uv coordinate. To get the offset, we simply subtract the initial uv value from the current.
With this information, we can build a bilinear mapping from the current configuration to the initial configuration. Using the four corners of each quad, we define an st coordinate system. Each pixel inside the quad maps back to a pixel in the original quad with the same st value. We can use this st value to figure out the original uv coordinates of that pixel and use current uv coordinates to find the offset (Figure 10). Doing the same for all the quads of the control grid, we obtain a displacement map for the entire screen.


In order to use this offset information, it should be stored in a texture map that can be accessed by a shader. We use three channels of information to store the offset. The red channel stores the u displacement, the green channel stores the v displacement, and the blue channel stores alpha information. The blue channel is really used to store a zero or a one, revealing if that pixel is inside the grid or not.
Originally, a 24bit color map was used, with 8 bits per channel. The red and green channels were set up so that a value of .5 meant no displacement and values of 0 and 1 meant full negative and full positive displacements respectively. The offsets of course had to be converted to this format for storage and converted back in the shader for use on the actual texture map.
This scheme, however, quickly proved to be ineffective. 8 bits for displacement gives only 256 unique values. The displacement map would often look “stair-cased” with clear borders where a value would switch from one step to another. The non-linearity of the distortion turns out to be another major problem. In one place, the displacement might be significant in some places and nearly non-existent in others. Problems would especially occur when the grid was adjusted just slightly. The displacement map would have difficulty registering this minor adjustment due to lack of precision. Even filtering is not useful in such a case.
Instead of using a byte for each channel, we can get much higher precision using a float. With 32 bits of precision per channel, we can directly store the offset as a positive or a negative number without having to convert it in the shader. 32 bits also gives us enough precision for the very small offsets we get when slightly adjusting the grid. The blue channel still stores a 0 or 1 to determine if that pixel is inside or outside the grid.
To achieve more efficiency and space conservation, we can also use half-floats, or 16 bit floating point numbers. This storage format of a floating point number contains 5 exponent bits, 10 mantissa bits, and a sign bit. For our purposes, the 16 bit float contains enough precision to specify the necessary offsets.
When rendering, we store the display into a texture map as with the other methods. However, to perform the actual warping, we simply render a single quad that fills the screen and pass in the source texture and the displacement map texture to the shader. The shader first performs a texture operation to get the displacement at the uv coordinates. It then adds that displacement to the uv coordinates and performs another texture operation on the source. This combination gives the proper texel.
There is a significant drawback of using floating-point textures, however. The current generation of graphics cards does not support filtering on floating point textures; therefore, the resulting image is subject to artifacts. If using a displacement map whose dimensions are smaller than the screen size, severe aliasing can occur. Blurring and fuzziness appear even if using higher dimension displacement maps.
In order to correct this problem, the shader itself must perform the necessary texture filtering. This is only necessary to do for the displacement map as OpenGL can handle texture filtering on the source image, since it is stored in standard 8 bits per channel format. To perform filtering, the width and height of the texture map must also be passed in. The shader can figure out where the sample falls between texels using this information. The cost of this filtering, however, increases the number of texture operations from 2 to 5. Now, the shader has to sample the displacement map 4 times (the 4 texels around the sample point) and once again to sample the source. The 4 offset values are bilinearly interpolated based on their distances from the sample point.
We can actually avoid a number of problems presented in the previous section using rectangular textures. Using NVIDIA’s TEXTURE_RECTANGLE_NV extensions, we can specify our screen texture and our displacement textures to be the exact size of the screen.
We get several benefits right away. First, no space is wasted on oversized textures. For example, in order to save a 1024x768 display frame, we need a 1024x1024 square power-of-two texture. We would be wasting 262,144 pixels in the texture map. In addition, we need to introduce scaling factors to deal with only part of the image if we want to conform to 0-to-1 scales.
This becomes especially important for floating point textures, as the size of each channel is significantly larger (16 or 32 bits) than the standard 8-bit channels. Using a rectangular texture for the displacement map saves us 3 megabytes of texture data for a 32-bit 1024x768 resolution texture map.
More significantly, having a 1-to-1 mapping between the screen pixels and the displacement texels virtually eliminates the need to filter the displacement map. For each pixel on the screen, we can find an exact displacement value. This value does not need to be blended with its neighbors as it was computed for the center of the pixel. Then, we can add the displacement to our uv coordinates, and get our target texel. The texture map that holds the frame data can still be filtered using built-in OpenGL functionality.
While using Bezier surfaces to render out quad-grids allows us to bypass OpenGL’s tessellation problems, using the GPU to perform the warping frees us completely. Any approach that can generate a displacement map can be used with the GPU. The results are fairly fast as only a single quad is passed to perform the render while providing the data to render each quad with a Bezier patch is more time consuming.
Since we are working directly with texels in the shader, we can apply other corrections that were not directly possible with the previous approaches. For example, we can perform hue shifts to color-correct the image. Since the operation is performed on the GPU, the speed penalty is very low.
It is also possible to pass in additional corrective maps. For example, a projector might have a bright spot in the middle of its image. We can pass in a map that adjusts the intensity of the texels in the center of the image but does not affect those on the outer edges.
The frame rate loss due to the warping is practically negligible in heavy scenes. Essentially, after the initial one-time displacement map generation, we only need to pass in one more textured quad to OpenGL. Each pixel of the quad has a single floating point access operation and a single normal texture map access with filtering operation. These can be performed very quickly on the GPU. When testing the warping, we have experienced < 2% frame rate drop.
The technology required to perform this operation is fairly sophisticated. Video cards produced prior to 2002 have GPU texture support, but only allow parallel texture operations. Unfortunately, in order for the filtering to work, we need serial texture access. In other words, the results of one texture operation are directly used as the input to another texture operation. On a parallel texture access card, all texture operations are performed at the same time, so that behavior is prohibited.
There is a one-time cost to generate the displacement map. Depending on the speed of the machine and the size of the displacement map, this operation can take anywhere between a fraction of a second to well over a minute (in debug configurations). The typical wait time for an optimized compile for a 1024x1024 displacement map was 2 seconds.
Storing the displacement data also becomes a significant task. For example, a 1024x1024 displacement map with 3 channels at 32 bits per channel takes up 100663296 bits or 12.2 megabytes. That is a lot of data to pass to a video card. However, using rectangular textures, we save a quarter of that memory and gain faster render times.