Learning Block Group Sparse Representation Combined with Convolutional Neural Networks for RGB-D Object Recognition

doi:10.3993/jfbi12201413


Abstract: RGB-D (Red, Green, Blue and Depth) cameras are novel sensing systems that can improve image recognition in computer vision by providing high-quality color and depth information. In this paper, we propose a model that combines Convolutional Neural Networks (CNN) with Block Group Sparse Coding (BGSC) for feature representation. First, the CNN extracts low-level features directly from raw RGB-D images using an unsupervised algorithm. Then, BGSC produces a higher-level feature representation for classification by incorporating both the group structure of the low-level features and the block structure of the dictionary in the subsequent learning process. Experimental results on a household RGB-D object dataset show that, with a linear predictive classifier, the CNN-BGSC approach achieves higher accuracy than Convolutional and Recursive Neural Networks (CNN-RNN), Group Sparse Coding (GSC), and Sparse Representation-based Classification (SRC).
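As a rough illustration of the BGSC step only, the sketch below encodes a group of low-level feature vectors against a block-structured dictionary with a proximal gradient loop. It assumes the objective 0.5*||X - D A||_F^2 + lam * sum_b ||A_[b]||_F, where A_[b] holds the coefficient rows of dictionary block b for the whole group. This formulation, the solver, and the names block_group_shrink, bgsc_objective, bgsc_encode, blocks, and lam are illustrative assumptions; the abstract does not state the paper's exact objective or optimization procedure, and dictionary learning is not shown.

```python
import numpy as np


def block_group_shrink(A, blocks, tau):
    """Shrink each block of rows of A toward zero by tau in Frobenius norm.

    Blocks whose norm is at most tau are set to zero, so a whole dictionary
    block is switched off for the entire group of samples at once.
    """
    out = np.zeros_like(A)
    for idx in blocks:
        norm = np.linalg.norm(A[idx])  # Frobenius norm of the block's rows
        if norm > tau:
            out[idx] = (1.0 - tau / norm) * A[idx]
    return out


def bgsc_objective(X, D, A, blocks, lam):
    """Assumed objective: reconstruction error plus block-group penalty."""
    fit = 0.5 * np.linalg.norm(X - D @ A) ** 2
    penalty = sum(np.linalg.norm(A[idx]) for idx in blocks)
    return fit + lam * penalty


def bgsc_encode(X, D, blocks, lam=0.1, n_iter=200):
    """Encode a group X (d x n) over dictionary D (d x k) by proximal gradient.

    blocks is a list of integer index arrays that partition the k atoms.
    """
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ A - X)  # gradient of the reconstruction term
        A = block_group_shrink(A - step * grad, blocks, step * lam)
    return A


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.standard_normal((8, 6))            # stand-in dictionary
    X = rng.standard_normal((8, 3))            # stand-in group of features
    blocks = [np.arange(0, 3), np.arange(3, 6)]
    A = bgsc_encode(X, D, blocks, lam=0.5)
    print(A.shape, bgsc_objective(X, D, A, blocks, 0.5))
```

In a pipeline like the one described, X would hold the CNN features of one image or object instance, and the resulting codes A (or a pooled summary of them) would be passed to the linear classifier; that wiring is likewise an assumption here.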

Key words: RGB-D; Convolutional Neural Networks; Block Group Sparse Coding; Classification; Recognition; Feature Learning Methods

Authors: Shuqin Tu, Yueju Xue, Jinfeng Wang, Xiaolin Huang, Xiao Zhang

Cite this article:

Shuqin Tu, Yueju Xue, Jinfeng Wang, et al. Learning Block Group Sparse Representation Combined with Convolutional Neural Networks for RGB-D Object Recognition [J]. Journal of Fiber Bioengineering and Informatics, 2014, 7(4): 603-613.
