Learning Block Group Sparse Representation Combined with Convolutional Neural Networks for RGB-D Object Recognition
College of Information, South China Agricultural University, Guangzhou 510642, China
Abstract RGB-D (Red, Green, Blue and Depth) cameras are novel sensing systems that can improve image
recognition in computer vision by providing high-quality color and depth information. In this paper, we
propose a model that learns feature representations by combining Convolutional Neural Networks (CNN)
with Block Group Sparse Coding (BGSC). First, a CNN extracts low-level features directly from raw
RGB-D images using an unsupervised algorithm. Then, BGSC derives a higher-level feature
representation for classification by incorporating both the group structure of the low-level features and
the block structure of the dictionary in the subsequent learning process. Experimental results on a
household RGB-D object dataset show that, with a linear predictive classifier, the CNN-BGSC approach
achieves higher accuracy than Convolutional and Recursive Neural Networks (CNN-RNN), Group Sparse
Coding (GSC), and Sparse Representation-based Classification (SRC).
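
The second stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the objective, the proximal-gradient solver, the function name `block_group_sparse_code`, and the parameters `lam` and `n_iter` are all assumptions chosen for illustration. The sketch codes a group of feature vectors (the columns of `X`) jointly over a dictionary `D` whose atoms are partitioned into blocks, so that the whole group selects the same few dictionary blocks.

```python
import numpy as np

def block_group_sparse_code(X, D, blocks, lam=0.1, n_iter=200):
    """Illustrative solver (assumed objective, not taken from the paper):
    min_A 0.5 * ||X - D A||_F^2 + lam * sum_b ||A[b, :]||_F
    X: (d, n) group of feature vectors, D: (d, K) dictionary,
    blocks: list of index arrays partitioning the K atoms.
    """
    A = np.zeros((D.shape[1], X.shape[1]))
    # Step size 1/L, with L the Lipschitz constant of the smooth term's gradient.
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)
    for _ in range(n_iter):
        # Gradient step on the reconstruction error.
        A = A - step * (D.T @ (D @ A - X))
        # Block soft-thresholding: shrink all rows of a block jointly,
        # across every feature vector in the group.
        for idx in blocks:
            norm = np.linalg.norm(A[idx, :])
            scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            A[idx, :] *= scale
    return A

# Synthetic usage: the group is generated from the second block only.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 12))
D /= np.linalg.norm(D, axis=0)  # unit-norm atoms
blocks = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
A_true = np.zeros((12, 5))
A_true[4:8, :] = rng.standard_normal((4, 5))
X = D @ A_true

A = block_group_sparse_code(X, D, blocks, lam=0.1)
# Per-block energy of the codes; a classifier could use such codes as features.
block_energy = [np.linalg.norm(A[idx, :]) for idx in blocks]
print(block_energy)
```

In this toy setting the recovered codes concentrate on the block that generated the data, which is the behavior a block-structured dictionary is meant to encourage. In a full pipeline the columns of `X` would be CNN features rather than synthetic vectors.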
Cite this article:
Shuqin Tu, Yueju Xue, Jinfeng Wang, et al. Learning Block Group Sparse Representation Combined with Convolutional Neural Networks for RGB-D Object Recognition [J].
Journal of Fiber Bioengineering and Informatics, 2014, 7(4): 603-613.