#+title: Using Top Down Vision to Improve the Performance of a Background Subtractor
#+author: Robert McIntyre
#+email: rlm@mit.edu
#+description: Using Top Down Vision to Improve the Performance of a Background Subtractor.
#+keywords: computer vision, AI, clojure, java, C, programming, background subtraction, human detection
#+SETUPFILE: ../../aurellem/org/setup.org
#+INCLUDE: ../../aurellem/org/level-0.org
#+babel: :mkdirp yes :noweb yes :exports both
* What is background subtraction?
Background subtraction is a computer vision technique that takes a
stream of images and tries to separate the "foreground" elements of a
scene from its "background". It is commonly used with fixed security
cameras to report on interesting events. Ultimately, the point of
background subtraction is to find interesting things in a scene.
* =bgslibrary= is a collection of background subtraction algorithms.
[[http://en.wikipedia.org/wiki/Background_subtraction][This Wikipedia article]] goes into greater detail about background
subtraction, and mentions =bgslibrary=, which contains about 29
pixel-based background subtraction algorithms.
* Some common problems with background subtraction algorithms
- missing center :: If a foreground object has a uniformly colored
     interior, then the inside of the object can be classified as
     background, since its pixels may not change much even though the
     object is moving.
- misclassification :: If an object doesn't move fast enough, it
     might be absorbed into the background.
- ghosting :: When an object starts to move after being still for a
     long time, it leaves a hole in the background that is
     interpreted as part of the foreground.
- high spatial frequency :: Many algorithms have problems with things
     like grates and radiators, which change quickly from white to
     black along a particular spatial direction. This problem comes
     from embedded assumptions about the statistical distribution of
     pixels in the world.
These errors are a result of background subtraction algorithms working
solely at the pixel level: they have no concept of coherent objects,
or of temporal or spatial continuity at the object level. Below are
example videos of several =bgslibrary= algorithms running on the same
input video. As you watch them, look for the errors above. When you
spot one, focus on just those pixels and ask whether you, without the
benefit of context and common sense, would make the same error as the
algorithm.
#+begin_html
#+end_html
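The misclassification and ghosting failures are baked into the
arithmetic of adaptive pixel models. As a plain-C++ illustration (a
single tracked pixel with made-up numbers, not code from
=bgslibrary=), an exponential running average never stops learning, so
a foreground value that holds still is steadily absorbed:

```cpp
#include <cassert>
#include <cmath>

// A single pixel modeled by an exponential running average, the core
// of many adaptive background subtractors. alpha is the learning rate.
struct AdaptivePixel {
    double background;  // current background estimate
    double alpha;       // learning rate

    bool is_foreground(double value, double threshold) {
        bool fg = std::fabs(value - background) > threshold;
        // the model keeps learning unconditionally, so a stationary
        // foreground value is gradually absorbed into the background
        background = (1 - alpha) * background + alpha * value;
        return fg;
    }
};
```

Run it forward and the object "wins": after a handful of identical
frames the pixel is classified as background, which is exactly the
absorption seen in the videos below. Ghosting is the mirror image:
once the object is absorbed, its departure leaves the stale value in
the model, and the newly uncovered pixels read as foreground.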
* Some Selected Background Subtraction Algorithms from =bgslibrary=
I'm running a very slightly modified version of =bgslibrary= that has
been instrumented to generate these videos; otherwise it is the same
as the publicly available version.
You can find the code to each of these algorithms at
http://code.google.com/p/bgslibrary/.
** FrameDifferenceBGS
This is the simplest BGS algorithm: it simply differences each frame
against the previous one. Note how only the edges of the people are
considered part of the foreground.
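Why only the edges? A sketch in plain C++ (a one-dimensional strip of
pixels with values of my own choosing) makes it concrete: differencing
consecutive frames only flags pixels that changed, and a uniformly
bright object that shifts by one pixel only changes at its leading and
trailing edges.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Frame differencing: a pixel is foreground when its value changed
// since the previous frame by more than a threshold.
std::vector<bool> frame_difference(const std::vector<int>& prev,
                                   const std::vector<int>& curr,
                                   int threshold) {
    std::vector<bool> changed(curr.size());
    for (size_t i = 0; i < curr.size(); i++)
        changed[i] = std::abs(curr[i] - prev[i]) > threshold;
    return changed;
}
```

This is also the "missing center" failure in miniature: the object's
uniform interior produces no difference at all.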
#+begin_html
#+end_html
** DPGrimsonGMMBGS
This is from Professor Grimson at MIT ([[http://www.ai.mit.edu/projects/vsam/Publications/stauffer_cvpr98_track.pdf][paper link]]). Notice how the
inside of the man gets absorbed into the background every time he
stops. Also notice the problems with the radiator in the background
and its high spatial frequency.
#+begin_html
#+end_html
** AdaptiveBackgroundLearning
This gives good examples of "ghosting": the man leaves an afterimage
in the foreground every time he moves after stopping for a while.
#+begin_html
#+end_html
** LBFuzzyAdaptiveSOM
Of all the algorithms in =bgslibrary=, this complicated algorithm
based on self-organizing maps was, in my opinion, the best. It uses
the first 30 or so frames to train its self-organizing maps (note the
great improvement in the segmentation of the man after the first
second of video). Notice how it still has trouble with the high
spatial frequency of the radiator in the center.
#+begin_html
#+end_html
* Using top-down vision to improve background subtraction
I use a human detector made by Co57 to determine the locations of
people of interest. The Co57 system gives bounding boxes around the
people it detects, but these bounding boxes can sometimes have errors.
The idea here is to use the areas that are *not* in any bounding box
to quickly build an accurate model of the background. As long as a
pixel is not in a bounding box, its background model continuously
updates to accommodate new values. When the pixel enters a bounding
box, this updating stops. The pixel is considered part of the
background only if it continues to match the values recorded while it
was outside a bounding box; if its value changes while inside a
bounding box, it is considered part of the foreground.
I call this algorithm the /bbbgs/ algorithm, because it combines
Bounding Boxes with BackGround Subtraction.
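Stripped of OpenCV details, the per-pixel rule can be distilled into a
short sketch (plain C++, simplified from the full implementation
below; the struct and names here are mine, not part of the real code):

```cpp
#include <cassert>
#include <cmath>
#include <deque>

// One pixel under the bbbgs rule: values are recorded only while the
// pixel is outside every bounding box; the pixel counts as background
// whenever it matches one of those recorded values within a threshold.
struct BbbgsPixel {
    std::deque<double> history;  // values seen outside bounding boxes
    size_t depth = 4;            // how many past values to keep
    double threshold = 5.0;      // maximum distance to count as a match

    bool is_foreground(double value, bool in_bounding_box) {
        bool matches = false;
        for (double h : history)
            if (std::fabs(value - h) < threshold) { matches = true; break; }
        if (!in_bounding_box) {
            // outside a bounding box the model keeps updating
            history.push_back(value);
            if (history.size() > depth) history.pop_front();
        }
        return !matches;
    }
};
```

The key design choice is that the history is frozen inside bounding
boxes, so a person standing still cannot teach the model their own
colors.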
* Output of =bbbgs=
** Wide Bounding Boxes
The =bbbgs= algorithm works a little better if the raw bounding boxes
are widened by a factor of 1.8.
#+begin_src clojure
(process-capture-26
 (File. "/home/r/proj/bbbgs/render/1.8/")
 {:hist 4 :thresh 10 :bb-scale-width 1.8
  :bb-scale-height 1})
#+end_src
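The widening is done about each box's center, so the detection stays
centered while the margins grow; the Clojure driver below performs
this arithmetic. As a hypothetical C++ restatement of that math (the
struct and helper are mine, for illustration only):

```cpp
#include <cassert>

struct Box { int left, top, width, height; };

// Widen a bounding box about its center by independent width/height
// scale factors, clamping the top-left corner to the image origin.
Box scale_box(Box b, double scale_w, double scale_h) {
    double center_x = b.left + b.width / 2.0;
    double center_y = b.top + b.height / 2.0;
    double w = b.width * scale_w;
    double h = b.height * scale_h;
    Box out;
    out.left = (int)(center_x - w / 2.0);
    out.top  = (int)(center_y - h / 2.0);
    if (out.left < 0) out.left = 0;
    if (out.top < 0)  out.top = 0;
    out.width  = (int)w;
    out.height = (int)h;
    return out;
}
```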
#+begin_html
#+end_html
** Tight Bounding Boxes
These are the raw bounding boxes returned from Co57.
#+begin_src clojure
(process-capture-26
 (File. "/home/r/proj/bbbgs/render/1/")
 {:hist 4 :thresh 10 :bb-scale-width 1
  :bb-scale-height 1})
#+end_src
#+begin_html
#+end_html
Notice how this system effectively low-pass filters the bounding
boxes, both temporally and spatially, and how even persistently
incorrect bounding boxes do not affect the foreground/background
separation very much. The method tolerates too-large bounding boxes
much better than too-small ones. Also notice that the algorithm gives
no information about a region that is always inside a bounding box. I
find this preferable to getting it wrong and putting the girl in the
background, as all the other algorithms do.
With more accurate bounding boxes, this method would be almost
perfect.
** =bbbgs= algorithm
Here is the C++ code that implements the =bbbgs= algorithm using
OpenCV.
#+begin_src cpp
#include <opencv2/opencv.hpp>
#include <iostream>
#include "bb.h"
#include "org_aurellem_genesis_Bbbgs.h"
using namespace cv;
using namespace std;
Mat image;
// eventually will be able to set this.
float fps = 30;
int history_depth = 1;
void set_history_depth(int depth){
  history_depth = depth;
}
int height;
int width;
int history_available = 0;
int current_history_entry = 0;
Mat bounding_box_mask;
Mat foreground_mat;
Mat foreground_mask;
Mat background_mask;
Mat background_mat;
Mat temp_float_image;
Mat LAB_image;
Mat history;
Mat history_index;
Mat history_rgb;
Mat history_rgb_index;
// set height and width on first frame, and our "history matrix"
// which holds the last few images and is used to derive the
// background image.
void bg_init(int _width, int _height){
  height = _height;
  width = _width;
  int sz[] = {height, width, history_depth};
  // 9001,9001,9001 is an impossible LAB color
  history = Mat(3, sz, CV_32FC3, Scalar::all(9001));
  history_index = Mat::zeros(height, width, CV_16U);
  history_rgb = Mat(3, sz, CV_8UC3, Scalar::all(120));
  history_rgb_index = Mat::zeros(height, width, CV_16U);
  background_mask = Mat::zeros(height, width, CV_8U);
  foreground_mask = Mat::ones(height, width, CV_8U);
  bounding_box_mask = Mat::zeros(height, width, CV_8U);
  foreground_mat = Mat::zeros(height, width, CV_8UC3);
  background_mat = Mat::zeros(height, width, CV_8UC3);
}
void add_bounding_box(int left, int top, int _width, int _height){
  int row, col;
  cout << "adding bounding box!\n";
  for (row = top; row < top + _height; row++){
    for (col = left; col < left + _width; col++){
      bounding_box_mask.at<uchar>(row,col) = 1;
    }
  }
}
void convert_input(Mat input){
  input.convertTo(temp_float_image, CV_32FC3);
  temp_float_image *= 1./255;
  cvtColor(temp_float_image, LAB_image, CV_BGR2Luv);
}
int ROW_TEST = 200;
int COL_TEST = 300;
// for each pixel in the LAB image, add it to the history if it's not
// in a bounding box. Update history_index accordingly.
void record_history(){
  int row, col;
  for (row = 0; row < height; row++){
    for (col = 0; col < width; col++){
      if (bounding_box_mask.at<uchar>(row,col) == 0){
        int index = history_index.at<ushort>(row,col);
        history.at<Vec3f>(row,col,index) = LAB_image.at<Vec3f>(row,col);
        history_index.at<ushort>(row,col) =
          ((index + 1) % history_depth);
        history_rgb.at<Vec3b>(row,col,index) = image.at<Vec3b>(row,col);
        history_rgb_index.at<ushort>(row,col) =
          ((index + 1) % history_depth);
        if ((row == ROW_TEST) && (col == COL_TEST)){
          cout << "registering ";
          cout << LAB_image.at<Vec3f>(row,col);
          cout << "\n";
        }
      }
    }
  }
}
float bg_threshold = 1;
void set_threshold(float thresh){
  bg_threshold = thresh;
}
// if any element of the pixel's history matches the current value,
// mark the pixel as background; otherwise it is foreground.
void calculate_masks(){
  Vec3f current, background;
  int row, col, index;
  Vec3b zero = Vec3b(0,0,0);
  for (row = 0; row < height; row++){
    for (col = 0; col < width; col++){
      current = LAB_image.at<Vec3f>(row,col);
      for (index = 0; index < history_depth; index++){
        background = history.at<Vec3f>(row,col,index);
        if ((row == ROW_TEST) && (col == COL_TEST)){
          cout << "current ";
          cout << current;
          cout << "\nhistory ";
          cout << background;
          cout << "\n";
          cout << "norm difference = ";
          cout << norm(current - background);
          cout << "\n";
        }
        if (norm(current - background) < bg_threshold){
          if ((row == ROW_TEST) && (col == COL_TEST)){
            cout << "pixel is in BACKGROUND\n";
          }
          background_mat.at<Vec3b>(row,col) = image.at<Vec3b>(row,col);
          foreground_mat.at<Vec3b>(row,col) = zero;
          goto scan;
        }
        else {
          if ((row == ROW_TEST) && (col == COL_TEST)){
            cout << "pixel is in FOREGROUND\n";
          }
          background_mat.at<Vec3b>(row,col) =
            history_rgb.at<Vec3b>(row,col,index);
          foreground_mat.at<Vec3b>(row,col) = image.at<Vec3b>(row,col);
        }
      }
      scan: ;
    }
  }
}
Mat bounding_box_mask_image(){
  return bounding_box_mask * 255;
}

void process_image(Mat input){
  image = input;
  convert_input(input);
  record_history();
  calculate_masks();
  // reset the bounding_box_mask now that it has done its job.
  bounding_box_mask = Mat::zeros(height, width, CV_8U);
}

Mat foreground_image(){
  return foreground_mat;
}

Mat background_image(){
  return background_mat;
}
#+end_src
** some debug code
#+begin_src cpp
void print_image(Mat image){
  int type = image.type();
  cout << "Type : ";
  if (CV_32FC3 == type){
    cout << "CV_32FC3\n";
    // print the first channel of a few sample pixels
    printf("Some Values: %f %f %f %f\n",
           image.at<Vec3f>(3,3)[0],
           image.at<Vec3f>(90,8)[0],
           image.at<Vec3f>(100,100)[0],
           image.at<Vec3f>(200,200)[0]);
  }
  else if (CV_8UC3 == type){
    cout << "CV_8UC3\n";
    printf("Some Values: %d %d %d %d\n",
           image.at<Vec3b>(3,3)[0],
           image.at<Vec3b>(90,8)[0],
           image.at<Vec3b>(100,100)[0],
           image.at<Vec3b>(200,200)[0]);
  }
  else {
    cout << type;
    cout << " -- unknown\n";
  }
  printf("rows: %d\ncols: %d\nchannels : %d\n",
         image.rows, image.cols, image.channels());
}
#+end_src
** bbbgs =C++= API
#+begin_src cpp
#include <opencv2/core/core.hpp>
void process_image(cv::Mat image);
void add_bounding_box(int left, int top, int width, int height);
void bg_init(int width, int height);
cv::Mat foreground_image();
cv::Mat background_image();
cv::Mat bounding_box_mask_image();
void print_image(cv::Mat image);
#+end_src
** JNI interface
#+begin_src java
package org.aurellem.genesis;

import org.opencv.core.Mat;

public class Bbbgs {
  public Bbbgs(){}

  public static void printImage(Mat image){
    n_printImage(image.nativeObj);
  }
  private static native void n_printImage(long matAddr);

  public static void processImage(Mat image){
    n_processImage(image.nativeObj);
  }
  private static native void n_processImage(long matAddr);

  public static native void
    addBoundingBox(int left, int top, int width, int height);

  public static native void bgInit(int width, int height);

  public static Mat foregroundImage(){
    return new Mat(n_foregroundImage());
  }
  public static Mat backgroundImage(){
    return new Mat(n_backgroundImage());
  }
  public static Mat boundingBoxImage(){
    return new Mat(n_boundingBoxImage());
  }
  private static native long n_foregroundImage();
  private static native long n_backgroundImage();
  private static native long n_boundingBoxImage();

  public static native void setThreshold(float thresh);
  public static native void setHistoryDepth(int depth);
}
#+end_src
#+begin_src cpp
//////////////////////////////////////////////////
/////// JNI Stuff //////////
//////////////////////////////////////////////////
/*
* Class: org_aurellem_genesis_Bbbgs
* Method: n_printImage
* Signature: (J)V
*/
JNIEXPORT void JNICALL Java_org_aurellem_genesis_Bbbgs_n_1printImage
(JNIEnv* env, jclass clazz, jlong mat_addr){
Mat* mat = (Mat*) mat_addr;
print_image(*mat);
}
/*
* Class: org_aurellem_genesis_Bbbgs
* Method: n_processImage
* Signature: (J)V
*/
JNIEXPORT void JNICALL Java_org_aurellem_genesis_Bbbgs_n_1processImage
(JNIEnv* env, jclass clazz, jlong mat_addr){
Mat* mat = (Mat*) mat_addr;
process_image(*mat);
}
/*
* Class: org_aurellem_genesis_Bbbgs
* Method: addBoundingBox
* Signature: (IIII)V
*/
JNIEXPORT void JNICALL Java_org_aurellem_genesis_Bbbgs_addBoundingBox
(JNIEnv* env, jclass clazz, jint left, jint top, jint width, jint height){
add_bounding_box(left, top, width, height);
}
/*
* Class: org_aurellem_genesis_Bbbgs
* Method: bgInit
* Signature: (II)V
*/
JNIEXPORT void JNICALL Java_org_aurellem_genesis_Bbbgs_bgInit
(JNIEnv* env, jclass clazz, jint width, jint height){
bg_init(width, height);
}
/*
* Class: org_aurellem_genesis_Bbbgs
* Method: n_foregroundImage
* Signature: ()J
*/
JNIEXPORT jlong JNICALL Java_org_aurellem_genesis_Bbbgs_n_1foregroundImage
(JNIEnv* env, jclass clazz){
Mat foreground = foreground_image();
return (jlong) new Mat(foreground);
}
/*
* Class: org_aurellem_genesis_Bbbgs
* Method: n_backgroundImage
* Signature: ()J
*/
JNIEXPORT jlong JNICALL Java_org_aurellem_genesis_Bbbgs_n_1backgroundImage
(JNIEnv* env, jclass clazz){
Mat background = background_image();
return (jlong) new Mat(background);
}
/*
* Class: org_aurellem_genesis_Bbbgs
* Method: n_boundingBoxImage
* Signature: ()J
*/
JNIEXPORT jlong JNICALL Java_org_aurellem_genesis_Bbbgs_n_1boundingBoxImage
(JNIEnv* env, jclass clazz){
Mat bb = bounding_box_mask_image();
return (jlong) new Mat(bb);
}
/*
* Class: org_aurellem_genesis_Bbbgs
* Method: setThreshold
* Signature: (F)V
*/
JNIEXPORT void JNICALL Java_org_aurellem_genesis_Bbbgs_setThreshold
(JNIEnv* env, jclass clazz, jfloat thresh){
set_threshold(thresh);
}
/*
* Class: org_aurellem_genesis_Bbbgs
* Method: setHistoryDepth
* Signature: (I)V
*/
JNIEXPORT void JNICALL Java_org_aurellem_genesis_Bbbgs_setHistoryDepth
(JNIEnv* env, jclass clazz, jint new_depth){
set_history_depth(new_depth);
}
#+end_src
** clojure driver
#+begin_src clojure
(ns org.aurellem.bbbgs)

(import org.opencv.core.Core)
(import org.opencv.core.Mat)
(import org.opencv.core.MatOfByte)
(import org.opencv.video.BackgroundSubtractorMOG)
(import org.opencv.highgui.Highgui)
(import javax.imageio.ImageIO)
(import java.io.ByteArrayInputStream)
(import java.io.File)
(import org.aurellem.genesis.Bbbgs)

(use 'cortex.util)
(use 'cortex.sense)
(use 'org.aurellem.co57-cache)
(defn load-opencv []
  (clojure.lang.RT/loadLibrary "opencv_java246"))

(defn load-bbbgs []
  (clojure.lang.RT/loadLibrary "bb+bgs"))

(defn mat->bufferedImage [mat]
  (let [byte-matrix (MatOfByte.)]
    (Highgui/imencode ".jpg" mat byte-matrix)
    (ImageIO/read
     (ByteArrayInputStream.
      (.toArray byte-matrix)))))

(extend-type org.opencv.core.Mat
  Viewable
  (view [mat]
    (view (mat->bufferedImage mat))))

(defn run-test []
  (load-opencv)
  (load-bbbgs)
  (Bbbgs/bgInit 640 480)
  (Bbbgs/addBoundingBox 100 100 200 300)
  (Bbbgs/processImage (test-image))
  (view (Bbbgs/foregroundImage)))
(defn add-bounding-box
  [title bb-scale-width bb-scale-height frame-num]
  (dorun
   (map
    (fn [bb]
      (let [width  (Math/abs (- (:left bb) (:right bb)))
            height (Math/abs (- (:top bb) (:bottom bb)))
            center-x (+ (:left bb) (/ width 2))
            center-y (+ (:top bb) (/ height 2))
            scaled-top  (max 0 (- center-y (* bb-scale-height (/ height 2))))
            scaled-left (max 0 (- center-x (* bb-scale-width (/ width 2))))]
        (Bbbgs/addBoundingBox
         scaled-left scaled-top
         (min (* bb-scale-width width) 640)
         (min (* bb-scale-height height) 480))))
    (bounding-boxes title frame-num))))

(defn frame->mat [title frame-num]
  (let [input-image-name
        (format
         "/home/r/proj/genesis/movies/capture-26/capture%07d.png"
         frame-num)]
    (Highgui/imread input-image-name)))
(defn process-capture-26-frame
  [title display-input display-foreground display-background
   display-bb bb-scale-width bb-scale-height frame-num]
  (let [input-image (frame->mat title frame-num)]
    (add-bounding-box title bb-scale-width bb-scale-height frame-num)
    (display-bb (mat->bufferedImage (Bbbgs/boundingBoxImage)))
    (Bbbgs/processImage input-image)
    (display-input (mat->bufferedImage input-image))
    (display-foreground (mat->bufferedImage (Bbbgs/foregroundImage)))
    (display-background (mat->bufferedImage (Bbbgs/backgroundImage)))))

(defn prime-video [title start end bb-scale-width bb-scale-height]
  (dorun
   (for [frame-num (reverse (range start end))]
     (do
       (add-bounding-box title bb-scale-width bb-scale-height frame-num)
       (Bbbgs/processImage (frame->mat title frame-num))))))
(defn process-capture-26
  [base
   {hist :hist thresh :thresh
    bb-scale-width :bb-scale-width
    bb-scale-height :bb-scale-height}]
  (let [title "capture-26.mp4"
        input (view-image nil "input image")
        bounding-box
        (view-image (if base (File. base "bounding-box")) "bounding-box")
        foreground
        (view-image (if base (File. base "foreground")) "foreground")
        background
        (view-image (if base (File. base "background")) "background")]
    (load-opencv)
    (load-bbbgs)
    (Bbbgs/setHistoryDepth (int hist))
    (Bbbgs/bgInit 640 480)
    (Bbbgs/setThreshold (float thresh))
    ;;(prime-video title 50 100 bb-scale-width bb-scale-height)
    (dorun
     (for [frame (range 30 2000)]
       (process-capture-26-frame
        title input foreground background
        bounding-box bb-scale-width bb-scale-height frame)))))

(defn reset []
  (Bbbgs/bgInit 640 480))

(defn test-bbbgs [base]
  (process-capture-26
   base
   {:hist 4 :thresh 10 :bb-scale-width 1
    :bb-scale-height 1}))
#+end_src
* More Background Subtraction Algorithms from =bgslibrary=.
** DPAdaptiveMedianBGS
#+begin_html
#+end_html
** DPEigenbackgroundBGS
#+begin_html
#+end_html
** DPMeanBGS
#+begin_html
#+end_html
** DPPratiMediodBGS
#+begin_html
#+end_html
** DPTextureBGS
#+begin_html
#+end_html
** DPWrenGABGS
#+begin_html
#+end_html
** DPZivkovicAGMMBGS
#+begin_html
#+end_html
** LBAdaptiveSOM
#+begin_html
#+end_html
** LBFuzzyGaussian
#+begin_html
#+end_html
** LBMixtureOfGaussians
#+begin_html
#+end_html
** LBSimpleGaussian
#+begin_html
#+end_html
** LbpMrf
#+begin_html
#+end_html
** MultiLayerBGS
#+begin_html
#+end_html
** PBAS
#+begin_html
#+end_html
** StaticFrameDifferenceBGS
#+begin_html
#+end_html
** T2FGMM_UM
#+begin_html
#+end_html
** T2FGMM_UV
#+begin_html
#+end_html
** T2FMRF_UM
#+begin_html
#+end_html
** T2FMRF_UV
#+begin_html
#+end_html
** VuMeter
#+begin_html
#+end_html
** WeightedMovingMeanBGS
#+begin_html
#+end_html
** WeightedMovingVarianceBGS
#+begin_html
#+end_html
** MixtureOfGaussianV2BGS
#+begin_html
#+end_html
* Generate videos
An example of how to use ffmpeg, xargs, and find together to render
multiple videos.
#+begin_src sh
#!/bin/sh
BASE_DIR=output
find $BASE_DIR -maxdepth 2 -mindepth 2 -type d -print0 | \
xargs -L 1 -t -0 -I'{}' \
ffmpeg -y -framerate 30 -i {}/%07d.png -b:v 9000K -c:v libtheora -r 30 {}.ogg
find $BASE_DIR -maxdepth 1 -mindepth 1 -type d -print0 | \
xargs -L 1 -t -0 -I'{}' \
ffmpeg -y -i {}/foreground.ogg -vf \
'pad=iw:2*ih [top]; movie={}/background.ogg [bottom]; [top][bottom] overlay=0:main_h/2' \
-b:v 9000k -c:v libtheora {}.ogg
find $BASE_DIR -iname "*.ogg" -size 0 -delete
find $BASE_DIR -maxdepth 1 -mindepth 1 -type d -print0 | \
xargs -L 1 -t -0 -I'{}' \
cp -n {}/foreground.ogg {}.ogg
#+end_src
#+begin_src sh
ffmpeg -i foreground.ogg -vf \
"pad=iw:3*ih [top];"\
"movie=./bounding-box.ogg [middle];"\
"[top][middle] overlay=0:main_h/3 [upper];"\
"movie=./background.ogg [bottom]; [upper][bottom]"\
" overlay=0:main_h*2/3"\
-b:v 9000k -c:v libtheora bbbgs.ogg
#+end_src
* Source Listing
- [[../src/bb+bgs][bb.cpp]]
- [[../clojure/src/org/aurellem/bbbgs.clj][bbbgs.clj]]
- [[http://hg.aurellem.com][source-repository]]