Although the principle of stereo vision is simple, the stereo matching process is usually computationally intensive, which limits the speed of the entire stereo vision system. To address this problem, researchers introduced the concept of the region of interest (ROI). Once the ROI has been extracted, the algorithm can focus only on the target area inside the ROI and ignore the background. In Ref. [3], the ROI is pre-defined at the center of the image, and the pixel intensity in the ROI is used to define the threshold that segments the target from the background. Reference [4] also sets the ROI at the center of the image and places the target there so that the background can be ignored. However, in practical applications, it is difficult to ensure that the target is always at the center of the image. In Ref. [5], the researchers therefore proposed a method to extract the ROI automatically, but this method is limited to weld inspection on metal plates. Reference [6] proposed an ROI extraction algorithm based on target detection, which can detect target objects with different poses and positions in a complex background. In Ref. [7], a target detection algorithm using the histogram of oriented gradients (HOG) feature is applied to recognize objects and estimate their pose for a service robot. These studies show that using target detection to extract the ROI is a more general approach. However, most target detection algorithms perform the detection over the full image, which is itself computationally intensive. We introduce object tracking into the stereo vision system to perform the ROI extraction, since object tracking usually has better real-time performance than object detection.
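To illustrate why restricting stereo matching to an ROI reduces the computational load, the following is a minimal sketch only, not the matching method of this paper or of the cited references. It assumes rectified grayscale images, a per-pixel absolute-difference cost with winner-takes-all selection, and a hypothetical function name `match_in_roi` with an ROI given as `(x, y, w, h)`; all of these are illustrative choices.

```python
import numpy as np

def match_in_roi(left, right, roi, max_disp=8):
    """Winner-takes-all disparity search restricted to roi = (x, y, w, h).

    Hypothetical helper for illustration: uses a per-pixel absolute
    difference as the matching cost and assumes rectified images.
    Returns the disparity map of the ROI and the number of cost
    evaluations that were needed.
    """
    x, y, w, h = roi
    patch = left[y:y + h, x:x + w].astype(np.float64)
    costs = []
    for d in range(max_disp):
        xs = x - d  # a left pixel at column c matches the right pixel at c - d
        if xs < 0:
            break  # candidate window would leave the image
        cand = right[y:y + h, xs:xs + w].astype(np.float64)
        costs.append(np.abs(patch - cand))
    cost_volume = np.stack(costs)  # shape: (disparities, h, w)
    return cost_volume.argmin(axis=0), cost_volume.size

# Synthetic example: the left view is the right view shifted by 3 pixels.
rng = np.random.default_rng(0)
right_img = rng.random((40, 64))
left_img = np.roll(right_img, 3, axis=1)

roi = (20, 10, 16, 12)  # assumed ROI, e.g. supplied by a tracker
disparity, roi_evals = match_in_roi(left_img, right_img, roi)
full_evals = 8 * left_img.size  # cost evaluations for the full image
```

In this sketch the number of cost evaluations scales with the ROI area rather than the image area, so a 16 × 12 ROI in a 64 × 40 image needs roughly 7.5% of the full-image evaluations for the same disparity range.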