#+title: Using Top Down Vision to Improve the Performance of a Background Subtractor
#+author: Robert McIntyre
#+email: rlm@mit.edu
#+description: Using Top Down Vision to Improve the Performance of a Background Subtractor.
#+keywords: computer vision, AI, clojure, java, C, programming, background subtraction, human detection
#+SETUPFILE: ../../aurellem/org/setup.org
#+INCLUDE: ../../aurellem/org/level-0.org
#+babel: :mkdirp yes :noweb yes :exports both
* What is background subtraction?
Background subtraction is a computer vision technique that takes a
stream of images and tries to separate the "foreground" elements of a
scene from its "background". It is commonly used with fixed security
cameras to report on interesting events. Ultimately, the point of
background subtraction is to find interesting things in a scene.
* =bgslibrary= is a collection of background subtraction algorithms.
[[http://en.wikipedia.org/wiki/Background_subtraction][This Wikipedia article]] goes into greater detail about background
subtraction, and mentions =bgslibrary=, which contains about 29
pixel-based background subtraction algorithms.
* Some common problems with background subtraction algorithms
- missing center :: If a foreground object has a uniformly colored
     interior, then the inside of the object can be classified as
     background, since its pixels may not change much even though the
     object is moving.
- misclassification :: If an object doesn't move fast enough, it
     might be absorbed into the background.
- ghosting :: When an object starts to move after being still for a
     long time, it leaves a hole in the background that is
     interpreted as part of the foreground.
- high spatial frequency :: Many algorithms have problems with things
     like grates and radiators, which change quickly from white to
     black along a particular spatial direction. This problem comes
     from embedded assumptions about the statistical distribution of
     pixels in the world.
These errors are a result of background subtraction algorithms working
solely at the pixel level: they have no concept of coherent objects,
or of temporal or spatial continuity at the object level. Below are
example videos of several =bgslibrary= algorithms running on the same
input video. As you watch them, look for the errors above. When you
spot one, focus on just those pixels and ask whether you, without the
benefit of context and common sense, would make the same error as the
algorithm.
#+begin_html
#+end_html
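The misclassification and ghosting failures are baked into the
arithmetic of adaptive pixel models. As a plain-C++ illustration (a
single tracked pixel with made-up numbers, not code from
=bgslibrary=), an exponential running average never stops learning, so
a foreground value that holds still is steadily absorbed:

```cpp
#include <cassert>
#include <cmath>

// A single pixel modeled by an exponential running average, the core
// of many adaptive background subtractors. alpha is the learning rate.
struct AdaptivePixel {
    double background;  // current background estimate
    double alpha;       // learning rate

    bool is_foreground(double value, double threshold) {
        bool fg = std::fabs(value - background) > threshold;
        // the model keeps learning unconditionally, so a stationary
        // foreground value is gradually absorbed into the background
        background = (1 - alpha) * background + alpha * value;
        return fg;
    }
};
```

Run it forward and the object "wins": after a handful of identical
frames the pixel is classified as background, which is exactly the
absorption seen in the videos below. Ghosting is the mirror image:
once the object is absorbed, its departure leaves the stale value in
the model, and the newly uncovered pixels read as foreground.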
* Some Selected Background Subtraction Algorithms from =bgslibrary=
I'm running a very slightly modified version of =bgslibrary= that has
been instrumented to generate these videos; otherwise it is the same
as the publicly available version.
You can find the code to each of these algorithms at
http://code.google.com/p/bgslibrary/.
** FrameDifferenceBGS
This is the simplest BGS algorithm: it simply differences each frame
against the previous one. Note how only the edges of the people are
considered part of the foreground.
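Why only the edges? A sketch in plain C++ (a one-dimensional strip of
pixels with values of my own choosing) makes it concrete: differencing
consecutive frames only flags pixels that changed, and a uniformly
bright object that shifts by one pixel only changes at its leading and
trailing edges.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Frame differencing: a pixel is foreground when its value changed
// since the previous frame by more than a threshold.
std::vector<bool> frame_difference(const std::vector<int>& prev,
                                   const std::vector<int>& curr,
                                   int threshold) {
    std::vector<bool> changed(curr.size());
    for (size_t i = 0; i < curr.size(); i++)
        changed[i] = std::abs(curr[i] - prev[i]) > threshold;
    return changed;
}
```

This is also the "missing center" failure in miniature: the object's
uniform interior produces no difference at all.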
#+begin_html
#+end_html
** DPGrimsonGMMBGS
This is from Professor Grimson at MIT ([[http://www.ai.mit.edu/projects/vsam/Publications/stauffer_cvpr98_track.pdf][paper link]]). Notice how the
inside of the man gets absorbed into the background every time he
stops. Also notice the problems with the radiator in the background
and its high spatial frequency.
#+begin_html
#+end_html
** AdaptiveBackgroundLearning
This gives good examples of "ghosting": the man leaves an afterimage
in the foreground every time he moves after stopping for a while.
#+begin_html
#+end_html
** LBFuzzyAdaptiveSOM
Of all the algorithms in =bgslibrary=, this complicated algorithm
based on self-organizing maps was, in my opinion, the best. It uses
the first 30 or so frames to train its self-organizing maps (note the
great improvement in the segmentation of the man after the first
second of video). Notice how it still has trouble with the high
spatial frequency of the radiator in the center.
#+begin_html
#+end_html
* Using top-down vision to improve background subtraction
I use a human detector made by Co57 to determine the locations of
people of interest. The Co57 system gives bounding boxes around the
people it detects, but these bounding boxes can sometimes have errors.
The idea here is to use the areas that are *not* in any bounding box
to quickly build an accurate model of the background. As long as a
pixel is not in a bounding box, its background model continuously
updates to accommodate new values. When the pixel enters a bounding
box, this updating stops. The pixel is considered part of the
background only if it continues to match the values recorded while it
was outside a bounding box; if its value changes while inside a
bounding box, it is considered part of the foreground.
I call this algorithm the /bbbgs/ algorithm, because it combines
Bounding Boxes with BackGround Subtraction.
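Stripped of OpenCV details, the per-pixel rule can be distilled into a
short sketch (plain C++, simplified from the full implementation
below; the struct and names here are mine, not part of the real code):

```cpp
#include <cassert>
#include <cmath>
#include <deque>

// One pixel under the bbbgs rule: values are recorded only while the
// pixel is outside every bounding box; the pixel counts as background
// whenever it matches one of those recorded values within a threshold.
struct BbbgsPixel {
    std::deque<double> history;  // values seen outside bounding boxes
    size_t depth = 4;            // how many past values to keep
    double threshold = 5.0;      // maximum distance to count as a match

    bool is_foreground(double value, bool in_bounding_box) {
        bool matches = false;
        for (double h : history)
            if (std::fabs(value - h) < threshold) { matches = true; break; }
        if (!in_bounding_box) {
            // outside a bounding box the model keeps updating
            history.push_back(value);
            if (history.size() > depth) history.pop_front();
        }
        return !matches;
    }
};
```

The key design choice is that the history is frozen inside bounding
boxes, so a person standing still cannot teach the model their own
colors.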
* Output of =bbbgs=
** Wide Bounding Boxes
The =bbbgs= algorithm works a little better if the raw bounding boxes
are widened by a factor of 1.8.
#+begin_src clojure
(process-capture-26
 (File. "/home/r/proj/bbbgs/render/1.8/")
 {:hist 4 :thresh 10 :bb-scale-width 1.8
  :bb-scale-height 1})
#+end_src
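The widening is done about each box's center, so the detection stays
centered while the margins grow; the Clojure driver below performs
this arithmetic. As a hypothetical C++ restatement of that math (the
struct and helper are mine, for illustration only):

```cpp
#include <cassert>

struct Box { int left, top, width, height; };

// Widen a bounding box about its center by independent width/height
// scale factors, clamping the top-left corner to the image origin.
Box scale_box(Box b, double scale_w, double scale_h) {
    double center_x = b.left + b.width / 2.0;
    double center_y = b.top + b.height / 2.0;
    double w = b.width * scale_w;
    double h = b.height * scale_h;
    Box out;
    out.left = (int)(center_x - w / 2.0);
    out.top  = (int)(center_y - h / 2.0);
    if (out.left < 0) out.left = 0;
    if (out.top < 0)  out.top = 0;
    out.width  = (int)w;
    out.height = (int)h;
    return out;
}
```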
#+begin_html
#+end_html
** Tight Bounding Boxes
These are the raw bounding boxes returned from Co57.
#+begin_src clojure
(process-capture-26
 (File. "/home/r/proj/bbbgs/render/1/")
 {:hist 4 :thresh 10 :bb-scale-width 1
  :bb-scale-height 1})
#+end_src
#+begin_html
#+end_html
Notice how this system effectively low-pass filters the bounding
boxes, both temporally and spatially, and how even persistently
incorrect bounding boxes do not affect the foreground/background
separation very much. The method tolerates too-large bounding boxes
much better than too-small ones. Also notice that the algorithm gives
no information about a region that is always inside a bounding box. I
find this preferable to getting it wrong and putting the girl in the
background, as all the other algorithms do.
With more accurate bounding boxes, this method would be almost
perfect.
** =bbbgs= algorithm
Here is the C++ code that implements the =bbbgs= algorithm using
OpenCV.
#+begin_src cpp
#include <opencv2/opencv.hpp>
#include <iostream>
#include "bb.h"
#include "org_aurellem_genesis_Bbbgs.h"
using namespace cv;
using namespace std;
Mat image;
// eventually will be able to set this.
float fps = 30;
int history_depth = 1;
void set_history_depth(int depth){
  history_depth = depth;
}
int height;
int width;
int history_available = 0;
int current_history_entry = 0;
Mat bounding_box_mask;
Mat foreground_mat;
Mat foreground_mask;
Mat background_mask;
Mat background_mat;
Mat temp_float_image;
Mat LAB_image;
Mat history;
Mat history_index;
Mat history_rgb;
Mat history_rgb_index;
// set height and width on first frame, and our "history matrix"
// which holds the last few images and is used to derive the
// background image.
void bg_init(int _width, int _height){
  height = _height;
  width = _width;
  int sz[] = {height, width, history_depth};
  // 9001,9001,9001 is an impossible LAB color
  history = Mat(3, sz, CV_32FC3, Scalar::all(9001));
  history_index = Mat::zeros(height, width, CV_16U);
  history_rgb = Mat(3, sz, CV_8UC3, Scalar::all(120));
  history_rgb_index = Mat::zeros(height, width, CV_16U);
  background_mask = Mat::zeros(height, width, CV_8U);
  foreground_mask = Mat::ones(height, width, CV_8U);
  bounding_box_mask = Mat::zeros(height, width, CV_8U);
  foreground_mat = Mat::zeros(height, width, CV_8UC3);
  background_mat = Mat::zeros(height, width, CV_8UC3);
}
void add_bounding_box(int left, int top, int _width, int _height){
  int row, col;
  cout << "adding bounding box!\n";
  for (row = top; row < top + _height; row++){
    for (col = left; col < left + _width; col++){
      bounding_box_mask.at<uchar>(row,col) = 1;
    }
  }
}
void convert_input(Mat input){
  input.convertTo(temp_float_image, CV_32FC3);
  temp_float_image *= 1./255;
  cvtColor(temp_float_image, LAB_image, CV_BGR2Luv);
}
int ROW_TEST = 200;
int COL_TEST = 300;
// for each pixel in the LAB image, add it to the history if it's not
// in a bounding box. Update history_index accordingly.
void record_history(){
  int row, col;
  for (row = 0; row < height; row++){
    for (col = 0; col < width; col++){
      if (bounding_box_mask.at<uchar>(row,col) == 0){
        int index = history_index.at<ushort>(row,col);
        history.at<Vec3f>(row,col,index) = LAB_image.at<Vec3f>(row,col);
        history_index.at<ushort>(row,col) =
          ((index + 1) % history_depth);
        history_rgb.at<Vec3b>(row,col,index) = image.at<Vec3b>(row,col);
        history_rgb_index.at<ushort>(row,col) =
          ((index + 1) % history_depth);
        if ((row == ROW_TEST) && (col == COL_TEST)){
          cout << "registering ";
          cout << LAB_image.at<Vec3f>(row,col);
          cout << "\n";
        }
      }
    }
  }
}
float bg_threshold = 1;
void set_threshold(float thresh){
  bg_threshold = thresh;
}
// if any element of the pixel's history matches the current value,
// mark the pixel as background; otherwise it is foreground.
void calculate_masks(){
  Vec3f current, background;
  int row, col, index;
  Vec3b zero = Vec3b(0,0,0);
  for (row = 0; row < height; row++){
    for (col = 0; col < width; col++){
      current = LAB_image.at<Vec3f>(row,col);
      for (index = 0; index < history_depth; index++){
        background = history.at<Vec3f>(row,col,index);
        if ((row == ROW_TEST) && (col == COL_TEST)){
          cout << "current ";
          cout << current;
          cout << "\nhistory ";
          cout << background;
          cout << "\n";
          cout << "norm difference = ";
          cout << norm(current - background);
          cout << "\n";
        }
        if (norm(current - background) < bg_threshold){
          if ((row == ROW_TEST) && (col == COL_TEST)){
            cout << "pixel is in BACKGROUND\n";
          }
          background_mat.at<Vec3b>(row,col) = image.at<Vec3b>(row,col);
          foreground_mat.at<Vec3b>(row,col) = zero;
          goto scan;
        }
        else {
          if ((row == ROW_TEST) && (col == COL_TEST)){
            cout << "pixel is in FOREGROUND\n";
          }
          background_mat.at<Vec3b>(row,col) =
            history_rgb.at<Vec3b>(row,col,index);
          foreground_mat.at<Vec3b>(row,col) = image.at<Vec3b>(row,col);
        }
      }
      scan: ;
    }
  }
}
Mat bounding_box_mask_image(){
  return bounding_box_mask * 255;
}

void process_image(Mat input){
  image = input;
  convert_input(input);
  record_history();
  calculate_masks();
  // reset the bounding_box_mask now that it has done its job.
  bounding_box_mask = Mat::zeros(height, width, CV_8U);
}

Mat foreground_image(){
  return foreground_mat;
}

Mat background_image(){
  return background_mat;
}
#+end_src
** some debug code
#+begin_src cpp
void print_image(Mat image){
  int type = image.type();
  cout << "Type : ";
  if (CV_32FC3 == type){
    cout << "CV_32FC3\n";
    // print the first channel of a few sample pixels
    printf("Some Values: %f %f %f %f\n",
           image.at<Vec3f>(3,3)[0],
           image.at<Vec3f>(90,8)[0],
           image.at<Vec3f>(100,100)[0],
           image.at<Vec3f>(200,200)[0]);
  }
  else if (CV_8UC3 == type){
    cout << "CV_8UC3\n";
    printf("Some Values: %d %d %d %d\n",
           image.at<Vec3b>(3,3)[0],
           image.at<Vec3b>(90,8)[0],
           image.at<Vec3b>(100,100)[0],
           image.at<Vec3b>(200,200)[0]);
  }
  else {
    cout << type;
    cout << " -- unknown\n";
  }
  printf("rows: %d\ncols: %d\nchannels : %d\n",
         image.rows, image.cols, image.channels());
}
#+end_src
** bbbgs =C++= API
#+begin_src cpp
#include <opencv2/core/core.hpp>
void process_image(cv::Mat image);
void add_bounding_box(int left, int top, int width, int height);
void bg_init(int width, int height);
cv::Mat foreground_image();
cv::Mat background_image();
cv::Mat bounding_box_mask_image();
void print_image(cv::Mat image);
#+end_src
** JNI interface
#+begin_src java
package org.aurellem.genesis;

import org.opencv.core.Mat;

public class Bbbgs {
  public Bbbgs(){}

  public static void printImage(Mat image){
    n_printImage(image.nativeObj);
  }
  private static native void n_printImage(long matAddr);

  public static void processImage(Mat image){
    n_processImage(image.nativeObj);
  }
  private static native void n_processImage(long matAddr);

  public static native void
    addBoundingBox(int left, int top, int width, int height);

  public static native void bgInit(int width, int height);

  public static Mat foregroundImage(){
    return new Mat(n_foregroundImage());
  }
  public static Mat backgroundImage(){
    return new Mat(n_backgroundImage());
  }
  public static Mat boundingBoxImage(){
    return new Mat(n_boundingBoxImage());
  }
  private static native long n_foregroundImage();
  private static native long n_backgroundImage();
  private static native long n_boundingBoxImage();

  public static native void setThreshold(float thresh);
  public static native void setHistoryDepth(int depth);
}
#+end_src
#+begin_src cpp
//////////////////////////////////////////////////
/////// JNI Stuff //////////
//////////////////////////////////////////////////
/*
* Class: org_aurellem_genesis_Bbbgs
* Method: n_printImage
* Signature: (J)V
*/
JNIEXPORT void JNICALL Java_org_aurellem_genesis_Bbbgs_n_1printImage
(JNIEnv* env, jclass clazz, jlong mat_addr){
Mat* mat = (Mat*) mat_addr;
print_image(*mat);
}
/*
* Class: org_aurellem_genesis_Bbbgs
* Method: n_processImage
* Signature: (J)V
*/
JNIEXPORT void JNICALL Java_org_aurellem_genesis_Bbbgs_n_1processImage
(JNIEnv* env, jclass clazz, jlong mat_addr){
Mat* mat = (Mat*) mat_addr;
process_image(*mat);
}
/*
* Class: org_aurellem_genesis_Bbbgs
* Method: addBoundingBox
* Signature: (IIII)V
*/
JNIEXPORT void JNICALL Java_org_aurellem_genesis_Bbbgs_addBoundingBox
(JNIEnv* env, jclass clazz, jint left, jint top, jint width, jint height){
add_bounding_box(left, top, width, height);
}
/*
* Class: org_aurellem_genesis_Bbbgs
* Method: bgInit
* Signature: (II)V
*/
JNIEXPORT void JNICALL Java_org_aurellem_genesis_Bbbgs_bgInit
(JNIEnv* env, jclass clazz, jint width, jint height){
bg_init(width, height);
}
/*
* Class: org_aurellem_genesis_Bbbgs
* Method: n_foregroundImage
* Signature: ()J
*/
JNIEXPORT jlong JNICALL Java_org_aurellem_genesis_Bbbgs_n_1foregroundImage
(JNIEnv* env, jclass clazz){
Mat foreground = foreground_image();
return (jlong) new Mat(foreground);
}
/*
* Class: org_aurellem_genesis_Bbbgs
* Method: n_backgroundImage
* Signature: ()J
*/
JNIEXPORT jlong JNICALL Java_org_aurellem_genesis_Bbbgs_n_1backgroundImage
(JNIEnv* env, jclass clazz){
Mat background = background_image();
return (jlong) new Mat(background);
}
/*
* Class: org_aurellem_genesis_Bbbgs
* Method: n_boundingBoxImage
* Signature: ()J
*/
JNIEXPORT jlong JNICALL Java_org_aurellem_genesis_Bbbgs_n_1boundingBoxImage
(JNIEnv* env, jclass clazz){
Mat bb = bounding_box_mask_image();
return (jlong) new Mat(bb);
}
/*
* Class: org_aurellem_genesis_Bbbgs
* Method: setThreshold
* Signature: (F)V
*/
JNIEXPORT void JNICALL Java_org_aurellem_genesis_Bbbgs_setThreshold
(JNIEnv* env, jclass clazz, jfloat thresh){
set_threshold(thresh);
}
/*
* Class: org_aurellem_genesis_Bbbgs
* Method: setHistoryDepth
* Signature: (I)V
*/
JNIEXPORT void JNICALL Java_org_aurellem_genesis_Bbbgs_setHistoryDepth
(JNIEnv* env, jclass clazz, jint new_depth){
set_history_depth(new_depth);
}
#+end_src
** clojure driver
#+begin_src clojure
(ns org.aurellem.bbbgs)

(import org.opencv.core.Core)
(import org.opencv.core.Mat)
(import org.opencv.core.MatOfByte)
(import org.opencv.video.BackgroundSubtractorMOG)
(import org.opencv.highgui.Highgui)
(import javax.imageio.ImageIO)
(import java.io.ByteArrayInputStream)
(import java.io.File)
(import org.aurellem.genesis.Bbbgs)

(use 'cortex.util)
(use 'cortex.sense)
(use 'org.aurellem.co57-cache)
(defn load-opencv []
  (clojure.lang.RT/loadLibrary "opencv_java246"))

(defn load-bbbgs []
  (clojure.lang.RT/loadLibrary "bb+bgs"))

(defn mat->bufferedImage [mat]
  (let [byte-matrix (MatOfByte.)]
    (Highgui/imencode ".jpg" mat byte-matrix)
    (ImageIO/read
     (ByteArrayInputStream.
      (.toArray byte-matrix)))))

(extend-type org.opencv.core.Mat
  Viewable
  (view [mat]
    (view (mat->bufferedImage mat))))

(defn run-test []
  (load-opencv)
  (load-bbbgs)
  (Bbbgs/bgInit 640 480)
  (Bbbgs/addBoundingBox 100 100 200 300)
  (Bbbgs/processImage (test-image))
  (view (Bbbgs/foregroundImage)))
(defn add-bounding-box
  [title bb-scale-width bb-scale-height frame-num]
  (dorun
   (map
    (fn [bb]
      (let [width  (Math/abs (- (:left bb) (:right bb)))
            height (Math/abs (- (:top bb) (:bottom bb)))
            center-x (+ (:left bb) (/ width 2))
            center-y (+ (:top bb) (/ height 2))
            scaled-top  (max 0 (- center-y (* bb-scale-height (/ height 2))))
            scaled-left (max 0 (- center-x (* bb-scale-width (/ width 2))))]
        (Bbbgs/addBoundingBox
         scaled-left scaled-top
         (min (* bb-scale-width width) 640)
         (min (* bb-scale-height height) 480))))
    (bounding-boxes title frame-num))))

(defn frame->mat [title frame-num]
  (let [input-image-name
        (format
         "/home/r/proj/genesis/movies/capture-26/capture%07d.png"
         frame-num)]
    (Highgui/imread input-image-name)))
(defn process-capture-26-frame
  [title display-input display-foreground display-background
   display-bb bb-scale-width bb-scale-height frame-num]
  (let [input-image (frame->mat title frame-num)]
    (add-bounding-box title bb-scale-width bb-scale-height frame-num)
    (display-bb (mat->bufferedImage (Bbbgs/boundingBoxImage)))
    (Bbbgs/processImage input-image)
    (display-input (mat->bufferedImage input-image))
    (display-foreground (mat->bufferedImage (Bbbgs/foregroundImage)))
    (display-background (mat->bufferedImage (Bbbgs/backgroundImage)))))

(defn prime-video [title start end bb-scale-width bb-scale-height]
  (dorun
   (for [frame-num (reverse (range start end))]
     (do
       (add-bounding-box title bb-scale-width bb-scale-height frame-num)
       (Bbbgs/processImage (frame->mat title frame-num))))))
(defn process-capture-26
  [base
   {hist :hist thresh :thresh
    bb-scale-width :bb-scale-width
    bb-scale-height :bb-scale-height}]
  (let [title "capture-26.mp4"
        input (view-image nil "input image")
        bounding-box
        (view-image (if base (File. base "bounding-box")) "bounding-box")
        foreground
        (view-image (if base (File. base "foreground")) "foreground")
        background
        (view-image (if base (File. base "background")) "background")]
    (load-opencv)
    (load-bbbgs)
    (Bbbgs/setHistoryDepth (int hist))
    (Bbbgs/bgInit 640 480)
    (Bbbgs/setThreshold (float thresh))
    ;;(prime-video title 50 100 bb-scale-width bb-scale-height)
    (dorun
     (for [frame (range 30 2000)]
       (process-capture-26-frame
        title input foreground background
        bounding-box bb-scale-width bb-scale-height frame)))))

(defn reset []
  (Bbbgs/bgInit 640 480))

(defn test-bbbgs [base]
  (process-capture-26
   base
   {:hist 4 :thresh 10 :bb-scale-width 1
    :bb-scale-height 1}))
#+end_src
* More Background Subtraction Algorithms from =bgslibrary=.
** DPAdaptiveMedianBGS
#+begin_html
#+end_html
** DPEigenbackgroundBGS
#+begin_html
#+end_html
** DPMeanBGS
#+begin_html
#+end_html
** DPPratiMediodBGS
#+begin_html
#+end_html
** DPTextureBGS
#+begin_html
#+end_html
** DPWrenGABGS
#+begin_html
#+end_html
** DPZivkovicAGMMBGS
#+begin_html
#+end_html
** LBAdaptiveSOM
#+begin_html
#+end_html
** LBFuzzyGaussian
#+begin_html
#+end_html
** LBMixtureOfGaussians
#+begin_html
#+end_html
** LBSimpleGaussian
#+begin_html
#+end_html
** LbpMrf
#+begin_html
#+end_html
** MultiLayerBGS
#+begin_html
#+end_html
** PBAS
#+begin_html
#+end_html
** StaticFrameDifferenceBGS
#+begin_html
#+end_html
** T2FGMM_UM
#+begin_html
#+end_html
** T2FGMM_UV
#+begin_html
#+end_html
** T2FMRF_UM
#+begin_html
#+end_html
** T2FMRF_UV
#+begin_html
#+end_html
** VuMeter
#+begin_html
#+end_html
** WeightedMovingMeanBGS
#+begin_html
#+end_html
** WeightedMovingVarianceBGS
#+begin_html
#+end_html
** MixtureOfGaussianV2BGS
#+begin_html
#+end_html
* Generate videos
An example of how to use ffmpeg, xargs, and find together to render
multiple videos.
#+begin_src sh
#!/bin/sh
BASE_DIR=output
find $BASE_DIR -maxdepth 2 -mindepth 2 -type d -print0 | \
xargs -L 1 -t -0 -I'{}' \
ffmpeg -y -framerate 30 -i {}/%07d.png -b:v 9000K -c:v libtheora -r 30 {}.ogg
find $BASE_DIR -maxdepth 1 -mindepth 1 -type d -print0 | \
xargs -L 1 -t -0 -I'{}' \
ffmpeg -y -i {}/foreground.ogg -vf \
'pad=iw:2*ih [top]; movie={}/background.ogg [bottom]; [top][bottom] overlay=0:main_h/2' \
-b:v 9000k -c:v libtheora {}.ogg
find $BASE_DIR -iname "*.ogg" -size 0 -delete
find $BASE_DIR -maxdepth 1 -mindepth 1 -type d -print0 | \
xargs -L 1 -t -0 -I'{}' \
cp -n {}/foreground.ogg {}.ogg
#+end_src
#+begin_src sh
ffmpeg -i foreground.ogg -vf \
"pad=iw:3*ih [top];"\
"movie=./bounding-box.ogg [middle];"\
"[top][middle] overlay=0:main_h/3 [upper];"\
"movie=./background.ogg [bottom]; [upper][bottom]"\
" overlay=0:main_h*2/3"\
-b:v 9000k -c:v libtheora bbbgs.ogg
#+end_src
* Source Listing
- [[../src/bb+bgs][bb.cpp]]
- [[../clojure/src/org/aurellem/bbbgs.clj][bbbgs.clj]]
- [[http://hg.aurellem.com][source-repository]]