  • [Paper Review] Multimodal Deep Learning for Robust RGB-D Object Recognition
    An Expert of Sorts/Vision for Robotics 2016. 3. 3. 20:59

    A. Eitel, J. T. Springenberg, L. Spinello, M. Riedmiller, and W. Burgard
    Multimodal Deep Learning for Robust RGB-D Object Recognition
    In Proc. of the IEEE Int. Conf. on Intelligent Robots and Systems (IROS), Hamburg, Germany, 2015. 
    [ bib | .pdf ]
    Abstract—Robust object recognition is a crucial ingredient of many, if not all, real-world robotics applications. This paper leverages recent progress on Convolutional Neural Networks (CNNs) and proposes a novel RGB-D architecture for object recognition. Our architecture is composed of two separate CNN processing streams – one for each modality – which are consecutively combined with a late fusion network. We focus on learning with imperfect sensor data, a typical problem in real-world robotics tasks. For accurate learning, we introduce a multi-stage training methodology and two crucial ingredients for handling depth data with CNNs. The first, an effective encoding of depth information for CNNs that enables learning without the need for large depth datasets. The second, a data augmentation scheme for robust learning with depth images by corrupting them with realistic noise patterns. We present state-of-the-art results on the RGB-D object dataset [15] and show recognition in challenging RGB-D real-world noisy settings.

    Architecture: two CNNs (one per modality, RGB and depth) -> fully connected fusion layer -> classifier
                          Fine-tuned area -> after the fully connected layer (the CNN streams themselves are not fine-tuned)
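    As a minimal sketch of this two-stream late-fusion idea (my own illustration, not the authors' code; I substitute torchvision's pretrained AlexNet for the paper's CaffeNet-style streams, and assume the 51 classes of the RGB-D object dataset as the output size):

```python
import torch
import torch.nn as nn
from torchvision import models

class LateFusionRGBD(nn.Module):
    """Two-stream CNN (RGB + colorized depth) with late fusion.

    A sketch only: the paper uses CaffeNet streams; torchvision's
    pretrained AlexNet stands in here. Only the fusion layers train,
    matching the note above that the CNN part is not fine-tuned.
    """

    def __init__(self, num_classes=51):  # 51 classes in the RGB-D object dataset
        super().__init__()
        self.rgb_stream = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
        self.depth_stream = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
        for p in list(self.rgb_stream.parameters()) + list(self.depth_stream.parameters()):
            p.requires_grad = False  # freeze both pretrained streams
        # Late fusion: concatenate the two 4096-d fc7 features, then a
        # trainable fully connected fusion layer and classifier on top.
        self.fusion = nn.Sequential(
            nn.Linear(4096 * 2, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096, num_classes),
        )

    def _fc7(self, stream, x):
        # Run one stream up to its fc7 activation (classifier[:6] is
        # dropout -> fc6 -> relu -> dropout -> fc7 -> relu).
        x = stream.features(x)
        x = stream.avgpool(x)
        x = torch.flatten(x, 1)
        return stream.classifier[:6](x)

    def forward(self, rgb, depth3):  # depth3: colorized 3-channel depth
        fused = torch.cat([self._fc7(self.rgb_stream, rgb),
                           self._fc7(self.depth_stream, depth3)], dim=1)
        return self.fusion(fused)
```

    Freezing both streams and training only the fusion stage is exactly the "fine-tune after the fully connected layer" split noted above.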


    They also suggest a pre-processing method that transforms the single-channel depth image into a three-channel image like RGB,
    so that the same CNN structure as for RGB data can be reused. In the final results, surface normals gave the best performance, but the suggested approach (a color mapping from red = near to blue = far) has computational advantages.
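    A rough sketch of that colorization step (my own reading of it, not the paper's exact code; I assume a depth value of 0 marks a missing measurement and use matplotlib's jet colormap):

```python
import numpy as np
from matplotlib import colormaps

def colorize_depth(depth, near_is_red=True):
    """Map a 1-channel depth image to 3 channels with a jet colormap.

    Sketch of the encoding described above: normalize valid depth to
    [0, 1] and spread it over the jet colormap so a pretrained RGB CNN
    sees a 3-channel input. Assumes depth value 0 means "no reading".
    """
    d = depth.astype(np.float32)
    valid = d > 0
    lo, hi = d[valid].min(), d[valid].max()
    d = np.where(valid, (d - lo) / (hi - lo + 1e-8), 0.0)
    if near_is_red:
        d = np.where(valid, 1.0 - d, 0.0)   # jet maps high values to red
    rgb = colormaps["jet"](d)[..., :3]      # drop the alpha channel
    return (rgb * 255).astype(np.uint8)
```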


    Another suggestion is fitting the image size so that an ImageNet-based CNN structure can also be used
    for object recognition: simple warping by directly rescaling the original image can be detrimental to recognition performance (especially for depth images, it can distort the shapes and aspect ratios of objects). So they fill the undefined area by extending the longer side's edges, keeping the correct ratio of the object; see the sketch after the figure note below.

    [Figure] Region 1: areas filled by the proposed method (they look like stretched edges).
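    A small sketch of that fill, under my assumptions (OpenCV's replicate-border padding stands in for the authors' exact fill; out_size=256 matches typical ImageNet pre-processing):

```python
import cv2

def scale_with_edge_fill(img, out_size=256):
    """Resize to a square CNN input without warping the aspect ratio.

    Scales the longer side to `out_size`, then pads the shorter side
    by replicating the image's own border pixels, which produces the
    "stretched edges" visible as region 1 in the figure above.
    """
    h, w = img.shape[:2]
    if w >= h:
        new_w, new_h = out_size, max(1, round(h * out_size / w))
    else:
        new_h, new_w = out_size, max(1, round(w * out_size / h))
    img = cv2.resize(img, (new_w, new_h))
    pad_v, pad_h = out_size - new_h, out_size - new_w
    top, left = pad_v // 2, pad_h // 2
    # BORDER_REPLICATE repeats the outermost rows/columns outward.
    return cv2.copyMakeBorder(img, top, pad_v - top, left, pad_h - left,
                              cv2.BORDER_REPLICATE)
```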
     

         

    As expected, combining RGB and depth data yields improved accuracy.

    They also suggest noise-reduction approaches (measuring the specific sensor's noise in advance and filtering it out), but this had a problem with small objects: the noise filtering removed not only the noise but also the object itself! A hypothetical sketch of that failure mode follows.
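    To make the failure mode concrete, here is a hypothetical speckle filter (my stand-in, not the paper's actual pipeline): a morphological opening removes isolated noisy depth blobs, but any real object smaller than the kernel disappears with them.

```python
import numpy as np
import cv2

def filter_depth_speckle(depth, kernel_size=5):
    """Hypothetical speckle filter illustrating the failure mode above.

    A morphological opening on the validity mask removes small
    isolated depth blobs (typical sensor speckle), but any object
    smaller than the kernel is wiped out along with the noise.
    """
    mask = (depth > 0).astype(np.uint8)            # 0 = missing depth
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    return np.where(opened > 0, depth, 0)          # drop filtered pixels
```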

    Surface normals gave better performance, but their computational cost needs to be taken into account.
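    For reference, a minimal sketch of one way to encode surface normals from depth (my own illustration via depth gradients; the paper may compute them differently), which hints at why this costs more per frame than a simple colormap:

```python
import numpy as np

def depth_to_normals(depth):
    """Rough surface-normal encoding from a depth image (a sketch).

    Depth gradients approximate the local surface orientation; the
    normalized (-dz/dx, -dz/dy, 1) vectors give a 3-channel image.
    This extra per-frame computation is the cost mentioned above.
    """
    dz_dy, dz_dx = np.gradient(depth.astype(np.float32))
    normals = np.dstack([-dz_dx, -dz_dy, np.ones_like(dz_dx)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True) + 1e-8
    # Map components from [-1, 1] to [0, 255] to feed the RGB stream.
    return ((normals + 1.0) * 0.5 * 255).astype(np.uint8)
```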


    Comment: Starting deep learning from a pre-trained model is the easy way in, because training and evaluating a model from scratch takes a very long time. That is why most applications of deep learning try to adapt their sensor data to a pre-trained model's structure, even when this produces rather meaningless data along the way.
    They also usually rely on large datasets annotated in advance by other research institutes. That is why the deep learning approach is hard to adapt to new settings, and why there are few online/robot-learning studies using deep learning. My idea is that a robot could generate this kind of RGB-D dataset by itself from a single human annotation. I need to learn and research more about this.

    Featured Refs
    1. Related to RGB-D datasets
       - K. Lai, L. Bo, X. Ren, and D. Fox, "A large-scale hierarchical multi-view RGB-D object dataset," in Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2011.
       - K. Lai, L. Bo, X. Ren, and D. Fox, "Detection-based object labeling in 3D scenes," in Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2012.
    - I. Lenz, H. Lee, and A. Saxena, "Deep learning for detecting robotic grasps," in Int. Journal of Robotics Research (IJRR), 2015.
    - M. Schwarz, H. Schulz, and S. Behnke, "RGB-D object recognition and pose estimation based on pre-trained convolutional neural network features," in Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2015.
    - J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" in Advances in Neural Information Processing Systems (NIPS), 2014.

