ENDS489 Course Notes - Fall 2000
Section 1-8

Neighborhood Operations

(Based on material from Digital Imaging: Theory and Applications, H. E. Burdick, McGraw-Hill, 1997)
(last update 9/4/00)

Neighborhood operations are those that combine a small area or neighborhood of pixels to generate an output pixel.

The most important neighborhood operator is convolution. To convolve something means to roll together. In digital imagery, this means that a local area of pixels is combined in various ways to achieve some desired result. Almost as important as convolution is the process of sampling. Many neighborhood operators result in subpixel addressing, which means that data values that exist between the discrete pixels of a digital image must be derived. Different methods for sampling allow this to occur. The applications of neighborhood operators are many, ranging from digital filters to techniques for sharpening, transforming, and warping images.

When implementing the point operations, it is possible to perform a given function and, if desired, save the resulting pixels in the same memory buffer, thereby destroying the original input pixels. For point operations this is allowable, because once an input pixel has been processed its original value is no longer needed. This is not possible with neighborhood operators because, even after an output pixel has been calculated, the corresponding input pixel at that location is included in other neighborhoods. Therefore no input pixels can be overwritten until all relevant output pixels have been calculated.

While point operation programs generally contain two nested loops (one for the horizontal dimension and one for the vertical), neighborhood operation programs usually contain four nested loops - two for the horizontal and vertical processing of the image, and two for the horizontal and vertical processing of the neighborhood.

Convolution

One of the most powerful techniques in all of image processing is convolution. It takes many forms and can be used to perform many functions. The general convolution equation, shown below, is computationally intensive.

$$C_{x,y} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} P_{i,j}\,M_{i,j}}{\sum_{i=1}^{m}\sum_{j=1}^{n} M_{i,j}}$$

This same operation shown in a pictorial fashion follows.

At the heart of this operation is the convolution mask or kernel, shown as M in these figures, which has individual mask elements labeled Mij. It is an array of numbers that has, just like a digital image, a horizontal and vertical dimension, with size usually denoted as m x n. Each element in the mask is multiplied by the corresponding pixel value in the neighborhood of the input image, P, that has elements Px,y. The results of all the multiplications are added together, and this summation is divided by the summation of the mask values. This denominator is known as the weight of the mask. The result of the division is a pixel, Cx,y, in the convolved image, C.
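The operation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from the textbook: the function name `convolve`, the list-of-rows image representation, copying border pixels through unchanged, and skipping the division when the mask weight is zero are all choices made for this example. Note the four nested loops and the separate output buffer.

```python
def convolve(image, mask):
    """Convolve a grayscale image (a list of rows) with an odd-sized mask.

    Assumption: border pixels whose neighborhood would extend past the
    image are copied through unchanged; the notes do not specify this.
    """
    height, width = len(image), len(image[0])
    n, m = len(mask), len(mask[0])            # mask rows, mask columns
    half_n, half_m = n // 2, m // 2
    weight = sum(sum(row) for row in mask)    # denominator of the equation

    # Write into a separate buffer: each input pixel also belongs to other
    # neighborhoods, so it must not be overwritten while still needed.
    output = [row[:] for row in image]

    for y in range(half_n, height - half_n):          # image rows
        for x in range(half_m, width - half_m):       # image columns
            total = 0
            for j in range(n):                        # mask rows
                for i in range(m):                    # mask columns
                    pixel = image[y + j - half_n][x + i - half_m]
                    total += pixel * mask[j][i]
            # Assumption: skip the division when the weight is zero.
            output[y][x] = total / weight if weight != 0 else total
    return output


image = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 100, 10],
    [10, 10, 10, 10],
]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
result = convolve(image, box)
print(result[1][1])  # average of the 3 x 3 neighborhood centered at (1, 1)
```

A production implementation would normally choose an explicit border policy (padding, mirroring, or wrapping) instead of copying the border pixels.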

Convolution requires a lot of computational power. To calculate a pixel for a given mask of size m x n, m * n multiplications, m * n - 1 additions, and one division are required. So to perform a 3 x 3 convolution on a 1024 x 1024 color image (a minimal convolution on an average-size image), roughly 27 million multiplications, 24 million additions, and 3 million divisions are performed, treating the image as about one million pixels in each of three color channels. For more substantial convolutions, such as 5 x 5 or 8 x 8, on larger images the amount of computation required becomes very large indeed.
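The operation counts can be checked with a few lines of arithmetic. The sketch below assumes three color channels and that every pixel, including those on the border, is processed; the function name `convolution_cost` is made up for this example. The exact totals come out slightly higher than the rounded figures because 1024 x 1024 is a little more than one million.

```python
def convolution_cost(width, height, channels, m, n):
    """Return (multiplications, additions, divisions) for one full pass."""
    outputs = width * height * channels      # one output value per channel
    return (outputs * m * n, outputs * (m * n - 1), outputs)


mults, adds, divs = convolution_cost(1024, 1024, 3, 3, 3)
print(mults, adds, divs)
```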

Image Spatial Frequencies

In signal processing, the frequency of a signal is a measure of how often its amplitude cycles from 0 through both opposing peaks and back to 0, expressed per unit of time. In digital imagery, frequency is determined by the raising or lowering of pixel intensities in the spatial directions.

An image is said to have a low frequency if the change in intensity from one pixel to the next is small. An image has a high frequency if the change in intensity between adjacent pixels is large. High-frequency images tend to have a lot of detail and sharp edges. Low-frequency images tend to be soft or fuzzy with little fine detail. Note that spatial frequencies occur within an image at any given angle, not just along the horizontal or vertical axes.

Low-Pass and High-Pass Filters

Some of the most common applications that use convolution are digital filters. As noted earlier, images can be thought of as signals that have detectable frequencies. In an image of a bright white picket fence against a dark foliage background, for example, the transitions between the edges of the pickets and the foliage produce high frequencies: the pixel values change from high to low values very quickly. Conversely, if color changes slowly across a large area, such as a sky at sunset, that image has a low frequency. Digital signal processing techniques can be applied to images to accentuate or diminish their characteristic frequencies. These digital filters are performed via convolution using different kernels.

The following shows a 3 x 3 kernel for performing a low-pass filter operation. This is a simple kernel: each element has a value of 1. All pixels in the input neighborhood contribute an equal amount of their intensity to the convolved output pixel. In other words, the output pixel is just the simple average of the input neighborhood pixels.
 
1 1 1
1 1 1
1 1 1

In regions of low spatial frequency (where a neighborhood's pixel values are about the same), the output pixel is nearly identical to the input pixels. Hence the name low-pass, implying that low-frequency areas are unchanged. High-frequency regions, though, will experience the same averaging of pixels, which tends to eliminate the rapid changes from dark to light. The following example image illustrates the frequency response of this filter, indicating that low frequencies are permitted to pass through unchanged but high frequencies are rejected. By diminishing high frequencies, which define the sharp edges, the image becomes blurred. Applying a low-pass filter also has the effect of eliminating noise from an image, such as film grain in a scanned image, since noise is nothing more than very localized high frequencies. Unfortunately, using this method to eliminate grain also causes loss of sharp edge definition, which is usually unacceptable.
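The behavior described above can be seen on a tiny test image. The following is only a sketch: the helper name `mean_3x3` and the sample pixel values are invented for illustration. A neighborhood in a flat region returns the same value it started with, while neighborhoods that straddle a sharp edge are pulled toward the average, turning a hard step into a ramp.

```python
def mean_3x3(image, x, y):
    """Average of the 3 x 3 neighborhood centered on column x, row y."""
    total = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            total += image[y + dy][x + dx]
    return total / 9


# Left half dark, right half bright: a vertical edge between columns 2 and 3.
row = [0, 0, 0, 90, 90, 90]
image = [row[:] for _ in range(3)]

flat = mean_3x3(image, 1, 1)         # neighborhood entirely in the dark region
edge_dark = mean_3x3(image, 2, 1)    # neighborhood straddles the edge
edge_bright = mean_3x3(image, 3, 1)  # neighborhood straddles the edge
print(flat, edge_dark, edge_bright)
```

Along the middle row, the original step 0, 0, 90, 90 becomes the ramp 0, 30, 60, 90, which is the blurring of edges described in the text.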
 
Low-Pass Filter Response Curve    

Since low-pass filtering can be accomplished using convolution, it follows that high-pass filters also exist. A common high-pass kernel is shown below.
 
-1 -1 -1
-1 9 -1
-1 -1 -1

The 9 in the central mask location means that the corresponding input pixel will add a far greater amount of its value to the output pixel than the surrounding pixels, which have values of -1. The frequency response of this kernel is shown, along with the effect it has on an image. The high frequencies, or edges, of the image are highlighted, while the low frequencies are diminished. The visual impact of this is to make the image appear sharpened.
 
High-Pass Filter Response Curve    

It is possible to graduate the amount of sharpening incorporated into the original image. Once the high-pass convolution is performed on an image, the sharpened image can be merged or composited with the original using the alpha filtering technique described previously. The result is a final image that is sharpened by an amount proportional to the percentage of the merging.
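A rough sketch of graduated sharpening for a single pixel follows. It assumes that the merge is a simple linear mix of the original and sharpened values; the names `apply_3x3`, `blend`, and `amount` are invented for this example, and clamping of the result to the displayable range is left out for brevity.

```python
SHARPEN = [[-1, -1, -1],
           [-1,  9, -1],
           [-1, -1, -1]]


def apply_3x3(image, mask, x, y):
    """Weighted sum of the 3 x 3 neighborhood at (x, y) over the mask weight."""
    weight = sum(sum(row) for row in mask)    # 1 for the kernel above
    total = sum(image[y + j - 1][x + i - 1] * mask[j][i]
                for j in range(3) for i in range(3))
    return total / weight


def blend(original, sharpened, amount):
    """Linear mix: amount=0 keeps the original, amount=1 the sharpened value."""
    return (1 - amount) * original + amount * sharpened


image = [[50, 50, 50],
         [50, 60, 50],
         [50, 50, 50]]
sharp = apply_3x3(image, SHARPEN, 1, 1)   # fully sharpened center pixel
half = blend(image[1][1], sharp, 0.5)     # 50 percent merge
print(sharp, half)
```

The center pixel, 10 levels above its surroundings, is pushed to 140 by the full filter and to 100 by the 50 percent merge.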

Edge Detection Filters

Often there is a need to detect or enhance just the edges that appear within an image. There are ways to do this using convolution. One common method is to use a Laplacian edge enhancement filter, whose kernel is as follows.
 
-1 -1 -1
-1 8 -1
-1 -1 -1

Note that, unlike the convolution kernels previously discussed, the weight of the Laplacian operator is equal to zero. When pondered for a moment, this makes sense for edge detection. Suppose that this kernel, or mask, is passed over an image that has the same value at every pixel. The resulting image will be black, because the sum of each pixel times its mask value will yield zero. It also should be obvious that the final step of the convolution equation, dividing by the weight, must be ignored because dividing by zero is undefined. Also note how similar this kernel is to the high-pass filter discussed earlier; only the center value is different, being 8 instead of 9. This indicates that edge detection looks for high frequencies in images. It also should indicate just how powerful the convolution operation is, because completely different results are obtained by changing just one number by the smallest possible increment.
 

Neighborhoods of constant intensity will become zero while those having a high frequency will yield pixels of positive or negative values. Because images usually have pixels with positive values, a decision must be made about how to handle negatives. As usual, the decision depends on the application, but one of three methods usually is selected. First, the resultant image can be assumed to consist of signed values. This may be most appropriate for machine vision applications in which another program interprets the results of the edge detection operation. If this method is chosen, the range of pixel values for, say, an 8-bit image, is no longer 0 to 255. It is now -128 through +127.

The second method involves simply setting any negative values to zero and working only with positive values. This might provide the best viewing results because it will highlight edges as white pixels on a black background.

The third method involves offsetting the calculated value so that any negative pixels are moved into the positive range. For an 8-bit image, this is done by adding 128, or half the possible intensity range, to every calculated pixel. This results in an image that is gray wherever intensities are not changing, with bright and dark pixels appearing where edges occur.
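The three ways of handling negative results can be compared side by side. This is a sketch only: the function names are invented, and limiting each result to the stated range by clamping is an assumption, since the notes do not say what happens to values that fall outside it.

```python
LAPLACIAN = [[-1, -1, -1],
             [-1,  8, -1],
             [-1, -1, -1]]


def laplacian_at(image, x, y):
    """Raw response at (x, y); there is no division because the weight is 0."""
    return sum(image[y + j - 1][x + i - 1] * LAPLACIAN[j][i]
               for j in range(3) for i in range(3))


def as_signed(value):
    """Method 1: keep the sign, limited to the signed 8-bit range."""
    return max(-128, min(127, value))


def clip_negative(value):
    """Method 2: negative values become 0 (black)."""
    return max(0, min(255, value))


def offset_gray(value):
    """Method 3: add 128 so that a zero response maps to mid-gray."""
    return max(0, min(255, value + 128))


flat = [[40, 40, 40]] + [[40, 40, 40]] + [[40, 40, 40]]
edge = [[10, 10, 20], [10, 10, 20], [10, 10, 20]]

for name, img in (("flat", flat), ("edge", edge)):
    r = laplacian_at(img, 1, 1)
    print(name, r, as_signed(r), clip_negative(r), offset_gray(r))
```

The flat neighborhood gives a response of 0 (mid-gray 128 under the third method), while the neighborhood next to the brighter column gives -30, which the three methods store as -30, 0, and 98 respectively.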

Many other edge detection filters can be implemented, and all have benefits and drawbacks. Some are variations on the standard Laplacian operator. Note that filters can be constructed to detect edges in specific directions. Due to their limited size, these kernels can only detect edges along the horizontal, vertical, and diagonals. With larger masks it is possible to detect edges at any given angle. This can be useful in applications such as satellite photo reconnaissance, where it is desirable to isolate roadways (which are generally straight for a given distance), or high-energy physics, where barely discernible traces of subatomic particles are captured when collisions occur.

Embossed Image

Edge detection is closely related to high-pass filtering. It is possible to combine these two operations in a single convolution kernel that will produce the effect of embossing. In this situation, edges are detected, but since the weight of the kernel is 1 (not 0), the bulk of the image content is retained. Because one side of the kernel contains positive numbers and the other contains negative numbers, however, any intensity gradients that correlate to this direction will be enhanced, any that are opposite to it are diminished, and any that occur perpendicular to the direction of interest are ignored.

Two other types of directional edge detection methods are Sobel and Prewitt filters. Like the Laplacian operator, they detect edges, or gradients, in images, and differ only in the numbers that are used in the convolution masks. Unlike the omnidirectional Laplacian filter, these filters can only detect gradients in a single direction. To create an omnidirectional effect, the directional images are merged or added together.
 
 
-1 0 1
-1 0 1
-1 0 1
X component Prewitt filter kernel
 
-1 -2 -1
0 0 0
1 2 1
Y direction Sobel filter kernel
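The merging of directional results can be sketched as follows. Two assumptions are made here: the X direction Sobel kernel is taken to be the transpose of the Y direction kernel shown above, and the two directional responses are merged by adding their absolute values, which is one common choice among several. The function names are invented for this example.

```python
SOBEL_Y = [[-1, -2, -1],
           [ 0,  0,  0],
           [ 1,  2,  1]]
# Assumption: the X kernel is the transpose of the Y kernel.
SOBEL_X = [list(column) for column in zip(*SOBEL_Y)]


def response(image, mask, x, y):
    """Weighted sum of the 3 x 3 neighborhood at (x, y); mask weight is 0."""
    return sum(image[y + j - 1][x + i - 1] * mask[j][i]
               for j in range(3) for i in range(3))


def edge_strength(image, x, y):
    """Merge the two directional responses by adding their magnitudes."""
    gx = response(image, SOBEL_X, x, y)
    gy = response(image, SOBEL_Y, x, y)
    return abs(gx) + abs(gy)


vertical_edge = [[0, 0, 100] for _ in range(3)]
horizontal_edge = [[0, 0, 0], [0, 0, 0], [100, 100, 100]]

print(response(vertical_edge, SOBEL_X, 1, 1),
      response(vertical_edge, SOBEL_Y, 1, 1))
print(edge_strength(vertical_edge, 1, 1),
      edge_strength(horizontal_edge, 1, 1))
```

Each kernel responds only to the edge orientation it was built for, while the merged value responds equally to both.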

Object Correlation

Filters based on convolutions need not be limited to detecting frequencies within an image. It is possible to use convolution to detect geometric shaped objects within an image. If we are asked to find all of the diamonds in an image, it is a simple matter for us to identify each one visually and point to it. If we ask a computer to do this, we must make the request in a different fashion.

The convolution kernels we have used so far have had some underpinning based on mathematics. But convolution masks can be created that correlate to specific object shapes, such as that shown below. By performing the convolution operation with this mask, the resulting image is black everywhere except where the objects of interest occur, and these are identified as bright points. Note that objects that are similar in shape to the items of interest will produce some correlation, but with less intensity.

This technique can be applied in various application areas to generate useful results. For example, a machine vision system might need to find all objects in its field of view that have a specific shape. Medical images from a microscope might show higher numbers of cells that have a unique deformity when diseased. Satellite images could be used to monitor activity at hostile airfields by counting the number of aircraft on the tarmac.

It is important to note that object correlation using convolution can become very computationally expensive, since the process usually involves large convolution kernels. It would not be unreasonable to have correlation masks that are 20 x 20 or 50 x 50 pixels in size. Convolution operations of this size require many calculations.

Nonlinear Filters

Thus far, all of the filters discussed have been linear: each is a summation of weighted pixel intensities, which then is divided by a constant value, or weight. Filters that modify their operation as the data elements change also can be constructed. Filters of this class are known as nonlinear filters, also referred to as rank or statistical filters.

Consider the 3 x 3-pixel neighborhood illustrated in Fig. 5.13. Each pixel can be ordered, or ranked, based on its intensity compared to the others in the group. Nonlinear filters use this statistical information to select the final output pixel. The median filter will choose the middle value to become the output pixel, while the minimum and maximum filters choose the smallest and greatest values, respectively. The results can be rather unexpected.

The median filter can be used to remove high-frequency noise from an image, but can introduce unwanted artifacts, especially in color images where the three planes of information are somewhat independent and the filter works on them accordingly. The minimum filter, also known as an erosion filter, will cause bright objects in an image to become thinner, while the maximum (or dilation) filter will cause bright areas to grow larger. Erosion and dilation are important morphological functions that will be discussed further in the next chapter.

Some of the harsh effects of applying these filters can be limited by not selecting the pixel value with a specific rank, but by taking the average of a few adjacent values to generate the output pixel.

Averaging three pixels usually is sufficient, because doing more than that will have a blurring effect on the image. In fact, if all pixels in the group were averaged, the results would be identical to the low-pass filter discussed earlier.
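The rank filters above can be sketched for a single 3 x 3 neighborhood. The function names and the sample values are invented for this illustration; `averaged_median` shows the softened variant that averages a few values around the middle rank.

```python
def ranked_neighborhood(image, x, y):
    """Return the nine values of the 3 x 3 neighborhood, sorted ascending."""
    values = [image[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return sorted(values)


def minimum_filter(ranked):
    return ranked[0]                   # erosion: smallest value


def median_filter(ranked):
    return ranked[len(ranked) // 2]    # middle-ranked value


def maximum_filter(ranked):
    return ranked[-1]                  # dilation: greatest value


def averaged_median(ranked, count=3):
    """Average `count` values centered on the median rank (count is odd)."""
    mid = len(ranked) // 2
    half = count // 2
    chosen = ranked[mid - half: mid + half + 1]
    return sum(chosen) / len(chosen)


# A smooth neighborhood with one very bright noise pixel in the center.
image = [[12, 14, 13],
         [11, 255, 15],
         [10, 16, 17]]
ranked = ranked_neighborhood(image, 1, 1)
print(minimum_filter(ranked), median_filter(ranked), maximum_filter(ranked))
print(averaged_median(ranked), averaged_median(ranked, count=9))
```

The median discards the noise pixel entirely, whereas averaging all nine values (equivalent to the low-pass filter) lets the outlier pull the result upward.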

Sharpening

A high-pass filter was used earlier to sharpen an image. This method has disadvantages in that it can produce severe alterations in an image and result in undesirable visual artifacts. These effects can be minimized by merging the convolved image with the original, but the artifacts remain. A more sophisticated technique for sharpening an image is through a multistep process called unsharp masking.

This technique not only produces a sharpened image, but is controllable with processing parameters. The process can be summarized in the following equation:

Sx,y = (c/(2c-1))*Ax,y - ((1-c)/(2c-1))*A'x,y

In this equation, A is the original image. A' is the original image that has been processed with a low-pass filter, and c is a weighting constant used to produce the resultant sharpened image, S. Unsharp masking is an interesting filter because it combines the use of a low-pass filter, which is a neighborhood operator, with multiplication and division, which are point operators. Upon examination, the equation for unsharp masking is similar to that for the alpha filter presented in an earlier section. But instead of merging two different scenes with a mask to create a composite image, this process combines two versions of the same image via a constant.

As with many image processing techniques, unsharp masking does not actually sharpen the image, but increases the apparent sharpness by modifying the contrast between bright and dark areas. This subtle change causes the boundaries between dark and bright areas to be accentuated which, in turn, causes our vision system to perceive a sharper image.

The low-pass filter applied to the image is the same one shown earlier. The degree of sharpening increases as the size of the convolution becomes greater. In other words, a 9 x 9 low-pass filter will generate more sharpening than a 3 x 3 filter. The weighting constant, c, should have a value of between 0.6 and 0.85; a lower value for this constant will produce more sharpening.
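A per-pixel sketch of the unsharp masking equation is given below. The function name `unsharp_pixel` and the sample values are illustrative, and the check that c is greater than 0.5 is an added safeguard against a zero or negative denominator rather than something stated in the notes. Because the two weights always sum to 1, a pixel in a flat region (where A and A' are equal) is left unchanged.

```python
def unsharp_pixel(a, a_blurred, c):
    """S = c/(2c-1) * A - (1-c)/(2c-1) * A' for one pixel."""
    if not 0.5 < c <= 1.0:
        # Assumption: restrict c so that the denominator 2c-1 is positive.
        raise ValueError("c must be greater than 0.5 and at most 1.0")
    denom = 2 * c - 1
    return (c / denom) * a - ((1 - c) / denom) * a_blurred


# A pixel of 120 whose low-pass (blurred) value is 100.
mild = unsharp_pixel(120, 100, 0.75)
strong = unsharp_pixel(120, 100, 0.6)
flat = unsharp_pixel(100, 100, 0.6)
print(mild, strong, flat)
```

With c = 0.75 the pixel rises from 120 to 130; with c = 0.6 it rises to about 160, which matches the statement that lower values of c produce more sharpening.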

Degraining

Degraining is another specialized operation that often must be applied to digital images that originate from scanned photographic film. All film has grain, which causes small bright and dark spots to appear in digital images. Many schemes have been devised to eliminate this grain, some of which have already been discussed. The technique discussed here generates good results with minimal deterioration of image quality.

The technique is known as pseudomedian filtering and, as the name suggests, it is a form of the median filtering discussed earlier. Recall that a median filter interrogates and ranks all pixels in a neighborhood, then selects the median value to become the new output pixel intensity. Unfortunately, this is also a form of low-pass filtering, so sharp detail may be lost during the process. The degraining algorithm considers a pixel group, such as a 5 x 5 pixel neighborhood, to be made up of a vertical and a horizontal vector. The pixels along the horizontal line are labeled a through e, and those along the vertical are f through i. The center pixel, c, is included in both sequences. These two one-dimensional sequences can be combined to form a single sequence, written as: (a,b,c,d,e,f,g,c,h,i)

Pseudomedian Filter Pixel Neighborhood

This sequence can be divided into subsequences of three adjacent pixels, namely, (a,b,c), (b,c,d), (c,d,e), and so forth, until the final subsequence, (g,h,i). The minimum value of each of these subsequences can be found, which creates a new sequence. Finally, the maximum value of this new sequence of minima can be found. This is known as the maximin operator and is defined as:

maximin = max[(min(a,b,c),min(b,c,d), min(c,d,e), ... , min(g,h,i))]

When applied to an image, this operator will remove isolated bright areas of noise. Conversely, the minimax operator, which is the minimum value of the maximum of the subsequences, has the form:

minimax = min[(max(a,b,c), max(b,c,d), max(c,d,e), ... , max(g,h,i))]

The minimax operator will remove dark noise from an image. To remove both bright and dark noise, one operator must be applied to an image, followed by the other. The order of the application does not matter. While this technique can reduce grain without being overly destructive, it does have limits and might not solve all grain problems in scanned images. It does tend to maintain overall sharpness, something that previously discussed techniques do not. This example uses a 5 x 5 pixel group, but smaller and larger pixel groups can be used. The larger the pixel group, generally the more grain that is removed, but the cost of more grain reduction is also more action as a low-pass filter. In addition, because the operation is performed on a plus-sign-shaped array of pixels rather than on a square area, some unwanted vertical or horizontal attributes of the image are enhanced or diminished.
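The two operators can be sketched for a single one-dimensional sequence. To keep the example small it is applied only to the five horizontal pixels a through e; how the horizontal and vertical sequences are combined follows the description in the text and is not reproduced here. The function names and sample values are invented for illustration.

```python
def windows(sequence, size=3):
    """All runs of `size` adjacent values in a one-dimensional sequence."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]


def maximin(sequence):
    """Maximum of the window minima: suppresses isolated bright values."""
    return max(min(w) for w in windows(sequence))


def minimax(sequence):
    """Minimum of the window maxima: suppresses isolated dark values."""
    return min(max(w) for w in windows(sequence))


bright_grain = [20, 22, 200, 21, 23]   # center pixel c is a bright speck
dark_grain = [20, 22, 0, 21, 23]       # center pixel c is a dark speck
step_edge = [10, 10, 90, 90, 90]       # a genuine edge, center on bright side

print(maximin(bright_grain), minimax(dark_grain), maximin(step_edge))
```

The bright speck is replaced by 21 and the dark speck by 22, while the center of the genuine edge keeps its value of 90, which illustrates why the technique tends to preserve sharpness.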

Sampling

All neighborhood operators mentioned so far in this chapter have been based on convolution or ranking. There is another set of operators, called geometric transformations. The most common of these functions are image rotation, scaling, and translation. Others include perspective and nonlinear warping. Before these can be presented, it is necessary to grasp the underlying concept of sampling on which they are all based.

Up until now, pixels have been presented as squares that are adjacent to one another. Another way to visualize them is as points of light that are separated from one another by a discrete distance.

Consider what happens when an image is reduced in size by some arbitrary amount. Obviously, one could think of each pixel as getting smaller, or, as the distance between pinpoints becoming less. Unfortunately, image display devices do not work this way: The resolution of a device is fixed and a pixel is a pixel. An even more abstract concept is the digital image itself. A 1024 x 1024 image has a defined spatial resolution, and just wanting to think of it as half the size, or 512 x 512, does not make it so.

Image reduction

If we wished to reduce the size of the larger image by one half, we simply could ignore every other pixel in the horizontal and vertical dimensions, producing a new and smaller image. This new image has only one-quarter the number of pixels as the original. This is called a geometric transformation, or in this example, a scale operation. Since we selectively discarded or kept pixels of the old image to form the new, smaller image, we sampled the data of the original image in some predetermined method (in this case by ignoring every other pixel).
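Ignoring every other pixel takes only one line in Python. In this sketch the function name `halve` is invented, and keeping the first pixel of each pair (rather than the second) is an arbitrary choice.

```python
def halve(image):
    """Keep every other pixel in both dimensions, starting with the first."""
    return [row[::2] for row in image[::2]]


image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
small = halve(image)
print(small)
```

The 4 x 4 image of 16 pixels becomes a 2 x 2 image of 4 pixels, one-quarter of the original count.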

Forward and Inverse Transforms

The transformation of pixel locations from the original, or source, image into resultant, or destination, image is called mapping. A frame of reference for mapping must be defined. There are two ways to look at mapping. In the first, each pixel in the source image is transformed to its new location in the destination, called a forward transform. Alternatively, each pixel in the destination image is transformed to find where it was in the source image, which is an inverse transform. Either method can be used, but in forward transforms care must be used to make sure that all pixels in the destination image get filled in, or 'holes' will appear. Inverse transforms never result in holes, because every destination pixel is addressed and filled in. This, then, is the method that will be discussed. Most of the time, geometric transformations result in mappings that land between pixels. This phenomenon leads to the last concept that must be discussed, which is sampling an image in a sub-pixel fashion.

There are basically two ways of doing this: nearest neighbor sampling and interpolated sampling. Suppose that an inverse transform determines the location of a required pixel in a source image to be between four actual pixels Px,y, Px+1,y, Px,y+1, and Px+1,y+1, that have intensity values of a, b, c, and d, respectively.

No pixel exists at this location, so the output pixel must be created from the four that do exist. The easiest way to do this is with nearest neighbor sampling. This method simply selects the actual pixel that is closest to the desired location. If the fractional portion of the desired location is less than 0.5, select the preceding pixel; if it is greater, select the next pixel. This decision is performed for both the horizontal and vertical dimension, resulting in the pixel closest to the desired location being selected. For example, the pixel at location Px+1,y might be selected, and the output pixel would have the value b.

Nearest Neighbor Sampling
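Nearest neighbor sampling can be sketched as below. The function name `sample_nearest` is invented; the sketch assumes the address lies inside the image, and it rounds a fraction of exactly 0.5 upward, a tie that the description leaves unspecified.

```python
import math


def sample_nearest(image, xs, ys):
    """Return the pixel closest to the floating-point address (xs, ys)."""
    # floor(v + 0.5) picks the preceding pixel for fractions below 0.5 and
    # the next pixel otherwise (assumption: a tie of exactly 0.5 rounds up).
    x = math.floor(xs + 0.5)
    y = math.floor(ys + 0.5)
    return image[y][x]


# Four neighboring pixels with intensities a, b, c, d.
a, b, c, d = 10, 20, 30, 40
image = [[a, b],
         [c, d]]
print(sample_nearest(image, 0.7, 0.2))  # closest to P(x+1, y), so value b
```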

Nearest neighbor sampling is fast, but it can result in jagged edges and loss of apparent resolution. A much better method is interpolated sampling, which means to use proportional amounts of the surrounding four-pixel neighborhood in order to arrive at the output value. There are many ways to perform interpolation between these values, based on various mathematical techniques such as a quadratic or cubic curve or spline. A common method that produces acceptable results is called bilinear interpolation.

If dx is the sub-pixel distance from pixel Px,y to the desired location in the horizontal dimension, then the intensity at that position lies the same proportion of the way from a to b. This interpolated intensity is called ab. Similarly, applying the same distance to the difference of intensities c and d gives the interpolated intensity cd. The vertical sub-pixel distance dy is then applied in the same way to the difference between the interpolated intensities ab and cd, so that the final bilinear interpolation intensity value, v, can be calculated and becomes the value of the output pixel.

Bilinear Interpolation Sampling
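The bilinear calculation can be written out directly from the description. This is a sketch with an invented function name, `sample_bilinear`; it assumes non-negative addresses and that the pixels at x+1 and y+1 exist, so a real implementation would need a rule for the last row and column.

```python
def sample_bilinear(image, xs, ys):
    """Interpolate the intensity at the floating-point address (xs, ys)."""
    x, y = int(xs), int(ys)        # upper-left pixel P(x, y)
    dx, dy = xs - x, ys - y        # sub-pixel distances, each in [0, 1)
    a = image[y][x]
    b = image[y][x + 1]
    c = image[y + 1][x]
    d = image[y + 1][x + 1]
    ab = a + dx * (b - a)          # horizontal interpolation on the top row
    cd = c + dx * (d - c)          # horizontal interpolation on the bottom row
    return ab + dy * (cd - ab)     # vertical interpolation between ab and cd


image = [[10, 20],
         [30, 40]]
print(sample_bilinear(image, 0.5, 0.5))
print(sample_bilinear(image, 0.25, 0.5))
```

At the exact center of the four pixels the result is their plain average, 25; moving a quarter pixel from the left column instead gives 22.5.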

Rotate, Scale, Translate

The most common forms of geometric transformations are rotate, scale, and translate, which also are known as affine transforms because lines that are straight and parallel in the source image remain so in the destination image. These transformations can be applied individually or in unison. If the application is designed to perform them at the same time, care must be used concerning the order in which they are done. The order of the individual transformations often will be referred to by the first letters in the action, namely RST for rotate, scale, and translate. If the order is scale, then rotate followed by translate, the abbreviation would be SRT. As an exercise, you can prove to yourself that the order of the transformations will affect the resultant image drastically.

Rotation has two operational properties that must be defined. First, the direction of the rotation must be decided. Usually, a positive angle means a counterclockwise rotation and a negative value means clockwise. The point of rotation also must be decided. Rotation usually occurs either around the center, or around the origin (upper-left corner) of the image. For rotation around the origin, the inverse transform for each point is given by the following equations:

Xs = [Xd * cos(a)] - [Yd * sin(a)]

Ys = [Xd * sin(a)] + [Yd * cos(a)]

In these equations, sin and cos are the sine and cosine functions of the angle of rotation, a. The address for each pixel in the output, or destination, image is Xd, Yd and the source address is Xs,Ys. Remember, these calculations produce floating-point source addresses, which are then used for sub-pixel sampling of the source image. If rotation about the center of the image is desired, use the following equations:

Xs = [Xd * cos(a)] - [Yd * sin(a)] + [Xc * (1-cos(a))] + [Yc * sin(a)]

Ys = [Xd * sin(a)] + [Yd * cos(a)] - [Xc * sin(a)] + [Yc * (1-cos(a))]

Here, Xc, Yc, is the center address of the source image.
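The rotation equations translate directly into code. In this sketch the function name `rotate_source_address` is invented and the angle is assumed to be given in degrees; leaving the center at its default of (0, 0) reduces the calculation to the rotation about the origin.

```python
import math


def rotate_source_address(xd, yd, angle_deg, xc=0.0, yc=0.0):
    """Inverse transform: source address for destination pixel (xd, yd)."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    xs = xd * cos_a - yd * sin_a + xc * (1 - cos_a) + yc * sin_a
    ys = xd * sin_a + yd * cos_a - xc * sin_a + yc * (1 - cos_a)
    return xs, ys


# Rotation about the origin by 90 degrees.
print(rotate_source_address(1, 0, 90))
# Rotation about the center (50, 50): the center maps to itself.
print(rotate_source_address(50, 50, 30, xc=50, yc=50))
```

The returned addresses are floating-point values and would then be passed to a sub-pixel sampling routine.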

Scaling is a function that either enlarges an image, referred to as scaling up, or shrinks it, called scaling down. The inverse transform equations for scaling about the origin are as follows:

     Xs = Xd/Sx

     Ys = Yd/Sy

Again, Xs, Ys and Xd, Yd are source and destination addresses and Sx and Sy are the scale factors. Scale factors that are greater than 1.0 will enlarge the source image, while factors of less than 1 will shrink an image. Though scaling in the horizontal and vertical dimensions are independent of one another, the scale factors usually are set to the same value to maintain proper pixel aspect ratio. Sometimes different scale factors for the two dimensions are needed to correct for input devices that do not create square pixels. The inverse transform equations for scaling about the center of the source image are:

Xs = Xd/Sx - [(Xc/Sx) - Xc]

Ys = Yd/Sy - [(Yc/Sy) - Yc]
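A short sketch of the scaling equations follows, with an invented function name, `scale_source_address`. As with rotation, a center of (0, 0) gives scaling about the origin.

```python
def scale_source_address(xd, yd, sx, sy, xc=0.0, yc=0.0):
    """Inverse transform: source address for destination pixel (xd, yd)."""
    xs = xd / sx - (xc / sx - xc)
    ys = yd / sy - (yc / sy - yc)
    return xs, ys


# Enlarge by 2 about the center (50, 50).
print(scale_source_address(50, 50, 2, 2, xc=50, yc=50))  # center stays put
print(scale_source_address(60, 50, 2, 2, xc=50, yc=50))  # 10 right of center
```

A destination pixel 10 pixels to the right of the center is fetched from only 5 pixels to the right of the center in the source, which is what enlarging by a factor of 2 requires.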

There is a limit to how much an image can be enlarged before it becomes unusable. Remember that any group of scaled-up pixels is created from only four source pixels, and there is only so much information that exists. Surprisingly, shrinking an image can create unforeseen problems that do not appear when scaling up. The equations defined above operate perfectly well if the scale factor is not less than 0.5 or, in other words, if the resolution is reduced by no more than half. For reductions greater than this, certain source pixels will not be included at all in the transformation, resulting in overly bright or overly dark spots appearing in the destination image. This is especially true in images that have high frequencies. Some edges may be completely skipped in the transformation process.

To alleviate this anomaly, it may be prudent to pre-compute a series of images in which each member is exactly one-half the resolution of the previous one. Then, during scaling operations, the application can select the appropriate prescaled image that will require nothing smaller than a 0.5 scale factor to achieve the final scaled-down image.
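Selecting the prescaled image can be sketched as a simple loop. The function name `choose_prescaled_level` and the convention that level n means the image reduced by a factor of 2 to the power n are assumptions made for this example.

```python
def choose_prescaled_level(scale):
    """Return (level, residual_scale) for a requested scale-down factor.

    Level n is the prescaled image reduced by 2**n; the residual scale is
    what still has to be applied to it, and is never smaller than 0.5.
    """
    if not 0 < scale <= 1:
        raise ValueError("scale must be in the range (0, 1]")
    level = 0
    while scale < 0.5:
        scale *= 2      # step down to the next half-resolution image
        level += 1
    return level, scale


print(choose_prescaled_level(0.125))
print(choose_prescaled_level(0.3))
```

A request for a scale factor of 0.125 is served from the image that has already been halved twice, with a remaining factor of exactly 0.5.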

Translation of an image is the easiest of the geometric transformations. The inverse transform equations are as follows:

Xs = Xd - Tx

Ys = Yd - Ty

The source and destination addresses are Xs, Ys and Xd, Yd, respectively, and the horizontal and vertical translation values are Tx and Ty. This operation might seem trivial, but keep in mind that translation offsets of a fraction of a pixel are possible, allowing sub-pixel image movement that is crucial for many applications. This function can result in attempts to access nonexistent source addresses.

Polynomial Warp

An interesting and powerful geometric transformation is warping. All of the transformations above can be generalized as polynomial functions, which can be expressed by the equations:

x' = a0 + a1y + a2x + a3xy

y' = b0 + b1y + b2x + b3xy

Note that these are identical to the rotation, scale, and translate equations stated above, except that the sine and cosine functions, scale factors, and other constants have been replaced by coefficients, labeled a0 through a3 and b0 through b3, and that the earlier equations have no cross term, xy. This is known as a first-order polynomial, since no term contains a power of x or y greater than 1. A generalized first-order polynomial warp equation can be generated if four points in the source image are mapped to desired locations in the output image. These points, labeled d1, d2, d3, and d4, are called control points. If the corners of the source image are used as the control points, the coefficients in the polynomial equation become:

a0=x1   a1=x2-x1  a2=x4-x1  a3=x1-x2+x3-x4

b0=y1   b1=y2-y1  b2=y4-y1  b3=y1-y2+y3-y4
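The corner-based coefficients can be checked with a small sketch. Several assumptions are made that the notes do not state: source coordinates are normalized so that the image occupies the unit square, and the corners are numbered so that corner 1 is at (x=0, y=0), corner 2 at (0, 1), corner 3 at (1, 1), and corner 4 at (1, 0). That ordering is the one that makes the coefficient formulas reproduce the control points when a1 multiplies y and a2 multiplies x, as in the first-order equations above. The function names and the sample control points are invented.

```python
def warp_coefficients(p1, p2, p3, p4):
    """Coefficients (c0, c1, c2, c3) for c0 + c1*y + c2*x + c3*x*y."""
    return (p1, p2 - p1, p4 - p1, p1 - p2 + p3 - p4)


def warp_point(x, y, a, b):
    """Map normalized source coordinates (x, y) to output coordinates."""
    xw = a[0] + a[1] * y + a[2] * x + a[3] * x * y
    yw = b[0] + b[1] * y + b[2] * x + b[3] * x * y
    return xw, yw


# Hypothetical output locations for the four corner control points.
d1, d2, d3, d4 = (10, 10), (20, 110), (120, 130), (100, 0)
a = warp_coefficients(d1[0], d2[0], d3[0], d4[0])
b = warp_coefficients(d1[1], d2[1], d3[1], d4[1])

for corner in ((0, 0), (0, 1), (1, 1), (1, 0)):
    print(corner, warp_point(corner[0], corner[1], a, b))
```

Each corner of the unit square lands exactly on its control point, and interior points are placed by the cross term in between.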

By defining where the control points are mapped, the image thus can be rotated, scaled, or translated in any fashion. But, the warping does not need to be an affine transformation. Perspective can be added to the image by defining different control points for mapping.

Higher order warping polynomials can be used. The equations for a second order polynomial are as follows:

X' = a0 + a1x + a2y + a3xy + a4x^2 + a5y^2

Y' = b0 + b1x + b2y + b3xy + b4x^2 + b5y^2

In addition to the terms in the first-order polynomial, there are now terms that include x^2 and y^2. More control points are needed to define these polynomials, but by doing so the image can be made to bend and twist as if it were made out of a pliable material. This technique, sometimes known as rubbersheeting, can be used for many applications, such as correcting for spherical aberration of optical lens systems, or warping satellite images into Mercator or other map projections. Second-order polynomials usually are sufficient for most image processing applications; third-, fourth-, or higher-order warps are sometimes necessary.

Another example of using higher-ordered polynomial warping is to implement the function known as morphing. Morphing has become commonplace in music videos, television commercials, and many other products of the entertainment industry where digital imagery is used. It is the method which makes one object appear to change into another. The process is accomplished in multiple steps.

The first step is to warp one image so that its control points map to similar points of the second object and, conversely, define control points of the second object that map to the first. From these control points, an animated sequence (or a series of digital images) of both objects is created that incrementally warps each "normal" image into the desired "warped" image. The animation sequence for the desired final image is in reverse order, because we want to end with a "normal" image of the second object.

Finally, the images of the sequences are blended together, via alpha filtering, using a slightly different percentage of one object to the other, so that the first object appears to become the second. In other words, the first merged image in the sequence uses 100 percent of the first object and 0 percent of the second object. By the middle of the sequence, the image is generated with 50 percent of both objects. Also note that in the middle of the morph the blended image is a combination of both warped images. The last image in the sequence uses 0 percent of the first object and 100 percent of the second to complete the transformation. Naturally, the effect does not have visual impact unless the sequence is seen in motion.
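The blending percentages across the sequence can be sketched as follows. A linear schedule is assumed here purely for illustration, and the function names are invented; as the text goes on to explain, real morphs often use more elaborate schedules and per-feature masks.

```python
def dissolve_weights(frame_count):
    """Return (first_weight, second_weight) for each frame of the morph."""
    if frame_count < 2:
        raise ValueError("a morph needs at least two frames")
    weights = []
    for k in range(frame_count):
        t = k / (frame_count - 1)      # 0.0 on the first frame, 1.0 on the last
        weights.append((1 - t, t))
    return weights


def blend_pixel(first, second, weights):
    """Mix corresponding pixels of the two warped images for one frame."""
    return weights[0] * first + weights[1] * second


schedule = dissolve_weights(5)
print(schedule)
print([blend_pixel(200, 100, w) for w in schedule])
```

With five frames the first object contributes 100, 75, 50, 25, and 0 percent in turn, so a pixel that is 200 in the first warped image and 100 in the second fades evenly from 200 to 100.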

Morphed Image Sequence

Complex morphs are unique in their implementation and many times require specialized manipulation to give the desired effect. For example, instead of a simple merge or cross-dissolve between images of the two sequences, certain features may dissolve before others. If two human faces are being morphed, perhaps the merging of the eyes and nose will precede the dissolving of the mouth and ears. This requires generating specialized merging masks for each frame of the sequence.

Similarly, the transitional warped images might not be simple linear interpolations between the starting and ending locations of the control points. There might be curves or even hesitations in the speed of the warps that combine to give the desired effect. No matter how complex the implementation, morphing is always a combination of polynomial warping and compositing or alpha filtering.