dp: life's footsteps, bits of progress...

Cam、DSP、FPGA、PM、Life、More ...

 
 
 


 
 

Stereo Vision Depth Estimation Accuracy  

2015-08-14 10:51:45 | Category: Default


I am doing research in stereo vision, and in this question I am interested in the accuracy of depth estimation. It depends on several factors, such as:

  • proper stereo calibration (extraction of rotation, translation, and distortion parameters),
  • image resolution,
  • camera and lens quality (less distortion, proper color capture),
  • matching features between the two images.

Let's say we have cameras and lenses that are not low-cost (no cheap webcams, etc.).

My question is: what depth estimation accuracy can we achieve in this field? Does anyone know of a real stereo vision system that achieves a stated accuracy? Can we achieve 1 mm depth estimation accuracy?

My question also concerns systems implemented in OpenCV. What accuracy did you manage to achieve?

asked Mar 31 '14 at 14:41
marol
Maybe have a look at "A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms" by Seitz et al.: nyx-www.informatik.uni-bremen.de/1007/1/seitz_cvpr06.pdf. From their evaluation they see accuracy of about 1 mm. Although they don't treat traditional binocular reconstruction, they plan to publish studies about that at some later point. –  Micka Mar 31 '14 at 14:51
   
Thanks for the response. This is not exactly what I need: in my case I would rather like to know whether there is a working solution in industry and what its accuracy is. Anyway, I will look more into 3D reconstruction; maybe it will help me with the topic. –  marol Mar 31 '14 at 17:09

I would add that using color is a bad idea even with expensive cameras; just use the gradient of gray intensity. Some producers of high-end stereo cameras (for example, Point Grey) used to rely on color and then switched to gray. Also consider bias and variance as the two components of stereo matching error. This is important since correlation stereo, for example, with a large correlation window averages depth (i.e. models the world as a bunch of fronto-parallel patches), reducing the variance while increasing the bias, and vice versa. So there is always a trade-off.

More than the factors you mentioned above, the accuracy of your stereo will depend on the specifics of the algorithm. It is up to the algorithm to validate depth (an important step after stereo estimation) and gracefully patch the holes in textureless areas. For example, consider back-and-forth validation (matching R to L should produce the same candidates as matching L to R), blob noise removal (the non-Gaussian noise typical of stereo matching, removed with a connected-components algorithm), texture validation (invalidate depth in areas with weak texture), and uniqueness validation (having a uni-modal matching score without strong second and third candidates; this is typically a shortcut to back-and-forth validation), etc. The accuracy will also depend on sensor noise and the sensor's dynamic range.
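As a minimal sketch of the back-and-forth (left-right) validation step described above, in plain NumPy. The disparity-map convention, the function name, and the tolerance are illustrative assumptions, not taken from any particular library:

```python
import numpy as np

def lr_consistency_mask(disp_L, disp_R, max_diff=1.0):
    """Invalidate pixels where matching L->R and R->L disagree.

    Convention (an assumption for this sketch): disp_L[y, x] means pixel
    (x, y) in the left image matches (x - disp_L[y, x], y) in the right.
    """
    h, w = disp_L.shape
    xs = np.tile(np.arange(w), (h, 1))
    # Follow the left disparity into the right image (clip to stay in bounds).
    xr = np.clip((xs - disp_L).astype(int), 0, w - 1)
    disp_back = disp_R[np.arange(h)[:, None], xr]
    # A consistent match reports (nearly) the same disparity from both sides.
    return np.abs(disp_L - disp_back) <= max_diff
```

Pixels where the mask is False would be invalidated and later patched, e.g. by the blob-removal and texture checks mentioned above.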

Finally, you have to consider accuracy as a function of depth, since d = f·B/z, where B is the baseline between the cameras, f is the focal length in pixels, and z is the distance along the optical axis. Thus there is a strong dependence of accuracy on the baseline and the distance.
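That dependence can be made concrete with first-order error propagation: from z = f·B/d, a disparity error Δd maps to a depth error Δz ≈ z²·Δd/(f·B), so depth error grows quadratically with distance. A numeric sketch, where the focal length, baseline, and matching-noise values are made-up illustrative numbers:

```python
def depth_error(z, f_px, baseline, disp_err):
    """First-order depth uncertainty at distance z.

    z and baseline share the same length unit; f_px is the focal length
    in pixels and disp_err is the matching noise in pixels.
    Derived from z = f*B/d:  |dz/dd| = f*B/d**2 = z**2/(f*B).
    """
    return z**2 * disp_err / (f_px * baseline)

# Assumed rig: 1000 px focal length, 10 cm baseline, 0.25 px matching noise.
for z_m in (0.5, 1.0, 2.0):
    dz_mm = depth_error(z_m, 1000.0, 0.10, 0.25) * 1000.0
    print(f"z = {z_m:.1f} m  ->  depth error ~ {dz_mm:.1f} mm")
```

With these (assumed) numbers, millimetre-level accuracy holds only below roughly 1 m, which is consistent with the short-range behaviour of consumer depth cameras.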

Kinect will provide 1 mm accuracy (bias) with quite large variance up to 1 m or so; beyond that, accuracy drops off sharply. Kinect also has a dead zone up to about 50 cm, since there is insufficient overlap of the two cameras at close distance. And yes, Kinect is a stereo camera where one of the cameras is simulated by an IR projector.

I am sure that with probabilistic stereo, such as Belief Propagation on Markov Random Fields, one can achieve higher accuracy. But those methods assume strong priors about the smoothness of object surfaces or particular surface orientations. See this for example, page 14.

answered Mar 31 '14 at 17:45
Vlad
   
Thanks for the helpful advice. It will take some time to go through the information you provided, and I am wondering about the following case: let's say the object I measure distance to is so contrasty that the stereo matching problem almost disappears. Then we don't have to create depth maps, so we only focus on calibration and rectification, finally applying the depth formula z = f·B/d. I'm wondering about that case. On the other hand, the Kinect example is promising, although in my case I have up to 60 cm of space. – marol Apr 1 '14 at 8:38
As much as I normally find Vlad's answers quite on target, there are some inaccuracies in the answer above. Specifically: (a) The reconstruction accuracy per se is entirely and exclusively dependent on the properties of the optics and the accuracy of calibration. The matching algorithm only selects which 3D points one triangulates, not how accurately they are triangulated. In general, one can reconstruct more accurately than one can match. (b) Using color is an excellent idea if you can control it, e.g. using a structured-light projector. (c) Kinect: structured light != stereo. –  Francesco Callari Apr 1 '14 at 12:10
   
Sorry, but you are wrong on all points, Francesco. In matching, the algorithm is the key, since it often makes erroneous matches that affect reconstruction accuracy. All the other problems, like optics, are easy to fix. People write papers comparing the accuracy of different algorithms. Using color is a terrible idea, since it is noisier than gray because color pixels get less light; just look at the history of stereo cameras. Kinect is not related to structured light, though it is often called that: it uses IR light to create texture, but overall acts as a stereo camera with one camera being a projector. –  Vlad Apr 1 '14 at 15:53
   
Briefly (see answer below): (a) grossly wrong matches affect only bias; otherwise I agree. (a2) People write papers (and patents) on what you call "easy to fix", too. (b) Color: I did say "if you can control it", right? If you have to use an OTS color sensor, the light distribution on the pixels is already non-uniform. Then throwing away color is silly, especially if you can control the lighting (think of a randomly colored pattern). (c) You are free to use the terms you like, but in the CV literature stereo == multiple cameras AND unknown lighting+geometry, whereas Kinect uses knowledge of the light pattern. –  Francesco Callari Apr 1 '14 at 18:20

Q. Anyone knows a real stereo vision system that works with some accuracy? Can we achieve 1 mm depth estimation accuracy?

Yes, you definitely can achieve 1 mm (and much better) depth estimation accuracy with a stereo rig (heck, you can do stereo recon with a pair of microscopes). Stereo-based industrial part inspection systems with accuracies in the 0.1 mm range are in routine use, and have been since the early 1990s at least. To be clear, by "stereo-based" I mean a 3D reconstruction system using 2 or more geometrically separated sensors, where the 3D location of a point is inferred by triangulating matched images of the 3D point in the sensors. Such a system may use structured light projectors to help with the image matching; however, unlike a proper "structured light-based 3D reconstruction system", it does not rely on a calibrated geometry for the light projector itself.

However, most (likely, all) such stereo systems designed for high accuracy use either some form of structured lighting, or some prior information about the geometry of the reconstructed shapes (or a combination of both), in order to tightly constrain the matching of points to be triangulated. The reason is that, generally speaking, one can triangulate more accurately than they can match, so matching accuracy is the limiting factor for reconstruction accuracy.

One intuitive way to see why this is the case is to look at the simple form of the stereo reconstruction equation: z = f b / d. Here "f" (focal length) and "b" (baseline) summarize the properties of the rig and are estimated by calibration, whereas "d" (disparity) expresses the match of the two images of the same 3D point.

Now, crucially, the calibration parameters are "global" ones: they are estimated from many measurements taken over the field of view and depth range of interest. Therefore, assuming the calibration procedure is unbiased and the system is approximately time-invariant, the errors in the individual measurements are averaged out in the parameter estimates. So it is possible, by taking lots of measurements and by tightly controlling the rig's optics, geometry, and environment (including vibrations, temperature and humidity changes, etc.), to estimate the calibration parameters very accurately: unbiased estimates whose uncertainty is of the order of the sensor's resolution or better, so that the effect of their residual inaccuracies can be neglected within a known volume of space where the rig operates.
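The averaging argument can be illustrated with a toy Monte Carlo simulation (pure statistics, not an actual calibration pipeline; the focal length and noise level below are made-up):

```python
import numpy as np

rng = np.random.default_rng(0)
f_true = 1000.0   # assumed true focal length, in pixels
sigma = 2.0       # assumed per-measurement noise, in pixels

def calibrate(n_meas):
    """Average n unbiased, noisy measurements of one 'global' parameter."""
    return (f_true + rng.normal(0.0, sigma, n_meas)).mean()

# The spread of the averaged estimate shrinks like sigma / sqrt(n):
spread_10 = np.std([calibrate(10) for _ in range(2000)])
spread_1000 = np.std([calibrate(1000) for _ in range(2000)])
print(spread_10, spread_1000)  # roughly sigma/sqrt(10) vs sigma/sqrt(1000)
```

A point-wise disparity estimate gets no such averaging, which is exactly why matching, not calibration, ends up as the limiting factor.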

However, disparities are point-wise estimates: one states that point p in the left image matches (maybe) point q in the right image, and any error in (q - p) appears in z scaled by f b. It's a one-shot thing. Worse, the estimation of disparity is, in all nontrivial cases, affected by the (a priori unknown) geometry and surface properties of the object being analyzed, and by their interaction with the lighting. These conspire, through whatever matching algorithm one uses, to reduce the practical reconstruction accuracy one can achieve. Structured lighting helps here because it reduces matching uncertainty: the basic idea is to project sharp, well-focused edges on the object that can be found and matched (often with subpixel accuracy) in the images. There is a plethora of structured light methods, so I won't go into any detail here. But I note that this is an area where using color can help a lot.
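The subpixel matching mentioned above is commonly done by fitting a parabola through the matching score at the best integer position and its two neighbours. A generic sketch, not tied to any particular matcher; the example score curve is made-up:

```python
import numpy as np

def subpixel_peak(scores):
    """Refine the argmax of a 1-D matching-score curve by parabola fitting.

    Returns the integer argmax plus a fractional correction in (-0.5, 0.5).
    """
    i = int(np.argmax(scores))
    if i == 0 or i == len(scores) - 1:
        return float(i)  # no neighbour on one side; cannot refine
    l, c, r = scores[i - 1], scores[i], scores[i + 1]
    denom = l - 2.0 * c + r
    if denom == 0.0:
        return float(i)  # flat top; fall back to the integer position
    return i + 0.5 * (l - r) / denom

# A score curve that peaks between samples 3 and 4:
scores = np.array([0.1, 0.3, 0.6, 0.9, 0.85, 0.4, 0.2])
print(subpixel_peak(scores))  # a little past 3, toward 4
```

For a score curve that is exactly parabolic near the peak this recovers the true maximum; real curves are only approximately parabolic, which is one of the residual error sources in "d".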

So, what you can achieve in practice depends, as usual, on how much money you are willing to spend (better optics, lower-noise sensor, rigid materials and design for the rig's mechanics, controlled lighting), and on how well you understand and can constrain your particular reconstruction problem.

answered Apr 1 '14 at 13:11
Francesco Callari
   
I think there is a typo in your equation z = f d / b; it should be z = f b / d. –  AldurDisciple Apr 29 '14 at 15:48
   
Thanks, I fixed it –  Francesco Callari Apr 29 '14 at 17:56

If you want to know a bit more about the accuracy of the approaches, take a look at this site; although it is no longer very active, the results are pretty much state of the art. Take into account that a couple of the papers presented there went on to create companies. What do you mean by a real stereo vision system? If you mean commercial, there aren't many; most commercial reconstruction systems work with structured light or directly with scanners. This is because (and you missed one important factor in your list) texture is a key factor for accuracy (or, even before that, correctness): a white wall cannot be reconstructed by a stereo system unless texture or structured light is added. Nevertheless, in my own experience, systems that involve variational matching can be very accurate (subpixel accuracy in image space), which is generally not achieved by probabilistic approaches. One last remark: the distance between the cameras also matters for accuracy. Very close cameras will find a lot of correct matches, and quickly, but the accuracy will be low; more distant cameras will find fewer matches and will probably take longer, but the results can be more accurate; there is an optimal conic region defined in many books. After all this blabla, I can tell you that with OpenCV, one of the best things you can do is an initial camera calibration, then use Brox's optical flow to find matches and reconstruct.
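The calibrate-match-reconstruct pipeline suggested at the end bottoms out in triangulation. Here is a minimal linear (DLT) triangulation in plain NumPy; the camera matrices and baseline are made-up toy values, and in OpenCV the equivalent step is cv2.triangulatePoints:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point.

    P1, P2 are 3x4 camera projection matrices; x1, x2 are the matched
    pixel coordinates (u, v) of the same 3D point in the two images.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector belonging
    # to the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # back to inhomogeneous coordinates

# Assumed toy rig: identical intrinsics, 10 cm baseline along x.
K = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])
```

With noise-free matches this recovers a 3D point essentially exactly; with real matches the disparity noise dominates, as discussed in the other answers.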

answered Apr 1 '14 at 8:39
paghdv
   
Yes, I meant exactly commercial systems. I'm also thinking about using structured light to make the stereo matching problem less hard. I will have a look at variational matching to see if it can help me solve the problem. Thanks for the response. –  marol Apr 1 '14 at 8:46
   
Also, to give more on the state of the art: vimeo.com/channels/465969 (the guy is doing 3D reconstruction). –  marol Apr 1 '14 at 8:56
   
This is more advanced, since illumination is part of the model (use of BRDFs), but it has to be done with studio captures under controlled illumination. The quality is superb, though. –  paghdv Apr 1 '14 at 10:14
 
 
 
 
 
 
 
 
 
 
 
 
 
 


© 1997-2016 NetEase