# depth discontinuities by pixel-to-pixel stereo tomasi/papers/tomasi/ ¢ depth...

Post on 23-Feb-2020

1 views

Embed Size (px)

TRANSCRIPT

International Journal of Computer Vision 35(3), 269–293 (1999) c© 1999 Kluwer Academic Publishers. Manufactured in The Netherlands.

Depth Discontinuities by Pixel-to-Pixel Stereo

STAN BIRCHFIELD AND CARLO TOMASI Department of Computer Science, Stanford University, Stanford, CA 94305

birchfield@cs.stanford.edu

tomasi@cs.stanford.edu

Received June 17, 1997; Revised August 2, 1999

Abstract. An algorithm to detect depth discontinuities from a stereo pair of images is presented. The algorithm matches individual pixels in corresponding scanline pairs, while allowing occluded pixels to remain unmatched, then propagates the information between scanlines by means of a fast postprocessor. The algorithm handles large untextured regions, uses a measure of pixel dissimilarity that is insensitive to image sampling, and prunes bad search nodes to increase the speed of dynamic programming. The computation is relatively fast, taking about 600 nanoseconds per pixel per disparity on a personal computer. Approximate disparity maps and precise depth discontinuities (along both horizontal and vertical boundaries) are shown for several stereo image pairs containing textured, untextured, fronto-parallel, and slanted objects in indoor and outdoor scenes.

Keywords: stereo matching, depth discontinuities, dynamic programming, untextured scenes, image sampling

1. Introduction

Cartoon artists have known the perceptual importance of depth discontinuities for a long time. To create the illusion of depth, they paint characters and background on different layers of acetate, being careful to ensure a crisp delineation of foreground objects. Similarly, in human stereo vision, depth discontinuities are vividly perceived and help to carve out distinct objects as well as to elucidate the distance relations between them.

Depth discontinuities play a fundamental role in im- age understanding, as acknowledged by the insightful “layers” representation of image sequences (Wang and Adelson, 1994). For data-reduction, depth disconti- nuities are similar to intensity edges in preserving the “interesting” aspects of an image with much less data (Attneave, 1954; Malik and Perona, 1990), and they serve a purpose similar to segmentation because they tend to carve out the distinct objects in a scene (Chen and Lin, 1997; Gamble, 1989; Little and Gillett, 1990; Poggio et al., 1988). Depth discontinuities are more powerful than intensity edges, however, because they

truly tend to outline the contours of objects rather than changes in pigmentation or illumination, and unlike the elusive problem of segmentation, depth discontinuities are well-defined.

In this paper we present a method for detecting depth discontinuities from a stereo pair of images that first computes a dense disparity map and then labels those points that exhibit a significant change in disparity. (A threshold is unfortunately inherent in any problem of estimating discontinuities from a discrete function.)

Many traditional stereo algorithms, such as those based on correlation windows, tend to blur depth dis- continuities. Our algorithm, on the other hand, uses neither windows nor preprocessing of the intensities, and matches the individual pixels in one image with those in the other image. As a result, we preserve sharp changes in disparity while introducing few false discontinuities, with far less computation than would be required if disparities were computed to subpixel resolution (which of course would be necessary for the more common goal of accurate scene reconstr- uction). Thus, by sacrificing accurate disparities our

270 Birchfield and Tomasi

algorithm is able to quickly compute crisp and accurate discontinuities.

Like several previous algorithms (Belhumeur and Mumford, 1992; Cox et al., 1996; Geiger et al., 1995; Intille and Bobick, 1994), our approach first matches different epipolar scanline pairs independently, and de- tects occlusions simultaneously with a disparity map, using a form of dynamic programming. Then, a post- processor propagates information between the scan- lines to refine the disparity map, from which the depth discontinuities are detected.

Our approach contains four novelties: (1) a method for handling large untextured regions which present a challenge to many existing stereo algorithms; (2) a measure of pixel dissimilarity that overcomes the image sampling problem; (3) a technique to reduce dramat- ically the running time of dynamic programming by pruning unlikely search nodes; and (4) a postprocessor that quickly propagates disparities between scanlines to produce a cleaner disparity map. The combination of avoiding subpixel resolution, pruning bad nodes, and fast postprocessing results in an efficient algorithm that takes 600 nanoseconds per pixel per disparity on a per- sonal computer, making it a candidate for real-time implementation. We demonstrate the algorithm’s per- formance in a wide range of scenarios: indoor and out- door scenes, textured and untextured objects, textured and untextured backgrounds, curved and planar sur- faces, specular and matte surfaces, and fronto-parallel and slanted surfaces.

The paper is organized as follows. After briefly reviewing closely related work in Section 3, we de- scribe our algorithm for matching two scanlines inde- pendently in Section 3. The following three sections then explain in more detail the three novelties of the matching process: a method for handling untextured regions, a pixel dissimilarity measure that is insensi- tive to sampling, and a pruning strategy for improving the speed over standard dynamic programming. The discussion of our algorithm ends in Section 7 with a presentation of the postprocessor. Sections 8 and 9 con- tain, respectively, experimental results on several pairs of images and a thorough demonstration of the impor- tance of our dissimilarity measure. The final section contains some of our conclusions.

2. Previous Work

Some researchers have explored the possibility of detecting depth discontinuities from stereo images directly, by purely local means. Little and Gillett (1990)

propose two algorithms, the first of which matches pix- els in the two images while keeping track of the pixels that lie in the forbidden zone1 of at least one match, for all possible disparities. From these pixels, which are basically the occluded pixels, the depth discontinu- ities are inferred. In the second algorithm, a search is conducted for each pixel in one image to find a corre- sponding match in the other image, and a depth discon- tinuity is declared if there is more than one good match. This second approach is similar to the mixture-motion methods aimed at detecting motion discontinuities by identifying small regions that contain more than a sin- gle motion (Black and Anandan, 1990; Spoerri and Ullman, 1987).

Toh and Forrest (1990) describe a local method in which the stereo cameras are assumed to be fixated on an intensity edge roughly perpendicular to the baseline: A depth discontinuity is declared if either the left or the right sides of the boundary do not match. This work was extended by Wixson (1993), whose algorithm de- tects and links near-vertical edges in untextured im- ages, then matches the edges from the two images. Once correspondence is established, the left and right regions of the edges are examined, and if one of them does not match well, then a depth discontinuity is de- clared. Since the region size depends on the amount of texture in the region, Wixson’s extension is more global than the other methods.

There are two main limitations with these tech- niques. First, the local methods require texture through- out the images because depth discontinuities cannot be detected locally in the absence of texture. Secondly, the identification of occlusions with depth discontinuities is not correct. Algorithms which make this connection are not able to detect depth discontinuities that lie along boundaries parallel to the baseline, because these dis- continuities do not give rise to occlusion. Horizontal boundaries abound in man-made scenes and cannot be ignored.

As mentioned before, another promising approach is to first compute a dense disparity map, and then to de- tect the sharp changes in disparity. With the exception of the cooperative algorithm of Marr and Poggio (1976) which was tested only on random-dot stereograms, early stereo algorithms (Baker and Binford, 1981; Ohta and Kanade, 1985; Grimson, 1985) tended to smooth the disparities across the depth discontinuities, mainly due to their reliance upon simple interpolation schemes that did not allow for sharp changes in disparity. Un- fortunately, the popular technique of window-based correlation has the same problem, and the solution of

Depth Discontinuities by Pixel-to-Pixel Stereo 271

iteratively reducing the window size based on the amount of disparity variation within the window is computationally expensive (Jones and Malik, 1992; Kanade and Okutomi, 1994).

More recent stereo algorithms incorporate the phe- nomena of occlusions and depth discontinuities at an early stage and are therefore capable, with varying de- grees of success, of preserving sharp changes in dispar- ity (Belhumeur and Mumford, 1992; Cox et al., 1996; Fua, 1991; Geiger et al., 1995; Intille and Bobick, 1994; Jones and Malik, 1992; Luo and Burkhardt, 1995). Our approach builds on this line of work, resulting in crisper, more accurate depth discontinu- ities on a wider range of images, and with much less computation.

3. Matching Scanlines

We pose the stereo correspondence problem as the problem of matching pixels in one scanline to pixels in the corresponding scanline in the other