Priming Deep Pedestrian Detection with Geometric Context
IEEE International Conference on Robotics and Automation (ICRA)
We investigate the role of geometric context in deep neural networks to build pedestrian detectors that are more robust to occlusion. Despite their demonstrated successes, deep object detectors under-perform in crowded scenes with heavy intra-category occlusion. One brute-force solution is to collect a large number of labeled training samples under occlusion, but the combinatorial growth in labeling effort makes this unaffordable. We argue that a promising and complementary direction is to bring in geometric context to modulate feature learning in a DNN. We identify that an effective way to leverage geometric context is to induce it in two steps: through early fusion, by guiding region proposal generation to focus on occluded regions, and through late fusion, by penalizing misalignments of bounding boxes in both 2D and 3D. Our experiments on multiple state-of-the-art DNN detectors and detection benchmarks clearly demonstrate that the proposed method outperforms strong baselines by an average of 5%.
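As a rough illustration of the late-fusion step described above, the sketch below combines a 2D box-misalignment term (one minus IoU in the image plane) with a 3D box regression term into a single penalty. The function name `late_fusion_penalty`, the axis-aligned 3D box parameterization, and the weights `w2d`/`w3d` are assumptions made for illustration only; the abstract does not specify the exact formulation used in the paper.

```python
import torch
import torch.nn.functional as F


def late_fusion_penalty(pred_2d, gt_2d, pred_3d, gt_3d, w2d=1.0, w3d=1.0):
    """Hypothetical penalty for 2D and 3D bounding-box misalignment.

    pred_2d, gt_2d: (N, 4) tensors of [x1, y1, x2, y2] image-plane boxes.
    pred_3d, gt_3d: (N, 6) tensors of [x, y, z, w, h, l] camera-frame boxes
                    (axis-aligned here for simplicity).
    """
    # 2D term: 1 - IoU between predicted and ground-truth image boxes.
    ix1 = torch.max(pred_2d[:, 0], gt_2d[:, 0])
    iy1 = torch.max(pred_2d[:, 1], gt_2d[:, 1])
    ix2 = torch.min(pred_2d[:, 2], gt_2d[:, 2])
    iy2 = torch.min(pred_2d[:, 3], gt_2d[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred_2d[:, 2] - pred_2d[:, 0]) * (pred_2d[:, 3] - pred_2d[:, 1])
    area_g = (gt_2d[:, 2] - gt_2d[:, 0]) * (gt_2d[:, 3] - gt_2d[:, 1])
    iou_2d = inter / (area_p + area_g - inter + 1e-6)
    loss_2d = (1.0 - iou_2d).mean()

    # 3D term: smooth-L1 distance between 3D box centers and dimensions.
    loss_3d = F.smooth_l1_loss(pred_3d, gt_3d)

    # Weighted sum of the 2D and 3D misalignment penalties.
    return w2d * loss_2d + w3d * loss_3d
```

In practice such a term would be added to the detector's standard classification and regression losses, so that proposals whose 2D boxes are well localized but whose implied 3D placement is implausible are still penalized.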