  • [Paper Review] Multimodal Deep Learning for Robust RGB-D Object Recognition
    An Expert of Sorts/Vision for Robotics 2016. 3. 3. 20:59

    A. Eitel, J. T. Springenberg, L. Spinello, M. Riedmiller, and W. Burgard
    Multimodal Deep Learning for Robust RGB-D Object Recognition
    In Proc. of the IEEE Int. Conf. on Intelligent Robots and Systems (IROS), Hamburg, Germany, 2015. 
    [ bib | .pdf ]
    Abstract—Robust object recognition is a crucial ingredient of many, if not all, real-world robotics applications. This paper leverages recent progress on Convolutional Neural Networks (CNNs) and proposes a novel RGB-D architecture for object recognition. Our architecture is composed of two separate CNN processing streams – one for each modality – which are consecutively combined with a late fusion network. We focus on learning with imperfect sensor data, a typical problem in real-world robotics tasks. For accurate learning, we introduce a multi-stage training methodology and two crucial ingredients for handling depth data with CNNs. The first, an effective encoding of depth information for CNNs that enables learning without the need for large depth datasets. The second, a data augmentation scheme for robust learning with depth images by corrupting them with realistic noise patterns. We present state-of-the-art results on the RGB-D object dataset [15] and show recognition in challenging RGB-D real-world noisy settings.

    Architecture: two CNNs (one per modality, RGB and depth) -> fully connected fusion layer -> classifier
                          Fine-tuned area -> after the fully connected layer (the CNN streams themselves are not fine-tuned)
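    As a minimal sketch of this two-stream late-fusion idea (my own illustration, not the authors' code; I substitute torchvision's pretrained AlexNet for the paper's CaffeNet-style streams, and assume the 51 classes of the RGB-D object dataset as the output size):

```python
import torch
import torch.nn as nn
from torchvision import models

class LateFusionRGBD(nn.Module):
    """Two-stream CNN (RGB + colorized depth) with late fusion.

    A sketch only: the paper uses CaffeNet streams; torchvision's
    pretrained AlexNet stands in here. Only the fusion layers train,
    matching the note above that the CNN part is not fine-tuned.
    """

    def __init__(self, num_classes=51):  # 51 classes in the RGB-D object dataset
        super().__init__()
        self.rgb_stream = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
        self.depth_stream = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
        for p in list(self.rgb_stream.parameters()) + list(self.depth_stream.parameters()):
            p.requires_grad = False  # freeze both pretrained streams
        # Late fusion: concatenate the two 4096-d fc7 features, then a
        # trainable fully connected fusion layer and classifier on top.
        self.fusion = nn.Sequential(
            nn.Linear(4096 * 2, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096, num_classes),
        )

    def _fc7(self, stream, x):
        # Run one stream up to its fc7 activation (classifier[:6] is
        # dropout -> fc6 -> relu -> dropout -> fc7 -> relu).
        x = stream.features(x)
        x = stream.avgpool(x)
        x = torch.flatten(x, 1)
        return stream.classifier[:6](x)

    def forward(self, rgb, depth3):  # depth3: colorized 3-channel depth
        fused = torch.cat([self._fc7(self.rgb_stream, rgb),
                           self._fc7(self.depth_stream, depth3)], dim=1)
        return self.fusion(fused)
```

    Freezing both streams and training only the fusion stage is exactly the "fine-tune after the fully connected layer" split noted above.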


    They also suggest a pre-processing method that transforms the single-channel depth image into a three-channel image like RGB,
    so that the same CNN structure as for RGB data can be reused. In the final results, surface normals gave the best performance, but the suggested approach (a color mapping from red = near to blue = far) has computational advantages.
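    A rough sketch of that colorization step (my own reading of it, not the paper's exact code; I assume a depth value of 0 marks a missing measurement and use matplotlib's jet colormap):

```python
import numpy as np
from matplotlib import colormaps

def colorize_depth(depth, near_is_red=True):
    """Map a 1-channel depth image to 3 channels with a jet colormap.

    Sketch of the encoding described above: normalize valid depth to
    [0, 1] and spread it over the jet colormap so a pretrained RGB CNN
    sees a 3-channel input. Assumes depth value 0 means "no reading".
    """
    d = depth.astype(np.float32)
    valid = d > 0
    lo, hi = d[valid].min(), d[valid].max()
    d = np.where(valid, (d - lo) / (hi - lo + 1e-8), 0.0)
    if near_is_red:
        d = np.where(valid, 1.0 - d, 0.0)   # jet maps high values to red
    rgb = colormaps["jet"](d)[..., :3]      # drop the alpha channel
    return (rgb * 255).astype(np.uint8)
```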


    Another suggestion is fitting the image size so that an ImageNet-based CNN structure can also be used
    for object recognition: simple warping by directly rescaling the original image can be detrimental to recognition performance (especially for depth images, it can distort the shapes and aspect ratios of objects). So they fill the undefined area by extending the longer side's edges, keeping the correct ratio of the object; see the sketch after the figure note below.

    [Figure] Region 1: areas filled by the proposed method (they look like stretched edges).
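    A small sketch of that fill, under my assumptions (OpenCV's replicate-border padding stands in for the authors' exact fill; out_size=256 matches typical ImageNet pre-processing):

```python
import cv2

def scale_with_edge_fill(img, out_size=256):
    """Resize to a square CNN input without warping the aspect ratio.

    Scales the longer side to `out_size`, then pads the shorter side
    by replicating the image's own border pixels, which produces the
    "stretched edges" visible as region 1 in the figure above.
    """
    h, w = img.shape[:2]
    if w >= h:
        new_w, new_h = out_size, max(1, round(h * out_size / w))
    else:
        new_h, new_w = out_size, max(1, round(w * out_size / h))
    img = cv2.resize(img, (new_w, new_h))
    pad_v, pad_h = out_size - new_h, out_size - new_w
    top, left = pad_v // 2, pad_h // 2
    # BORDER_REPLICATE repeats the outermost rows/columns outward.
    return cv2.copyMakeBorder(img, top, pad_v - top, left, pad_h - left,
                              cv2.BORDER_REPLICATE)
```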
     

         

    As expected, combining RGB and depth data yields improved accuracy.

    They also suggest noise-reduction approaches (measuring the specific sensor's noise in advance and filtering it out), but this had a problem with small objects: the noise filtering removed not only the noise but also the object itself! A hypothetical sketch of that failure mode follows.
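    To make the failure mode concrete, here is a hypothetical speckle filter (my stand-in, not the paper's actual pipeline): a morphological opening removes isolated noisy depth blobs, but any real object smaller than the kernel disappears with them.

```python
import numpy as np
import cv2

def filter_depth_speckle(depth, kernel_size=5):
    """Hypothetical speckle filter illustrating the failure mode above.

    A morphological opening on the validity mask removes small
    isolated depth blobs (typical sensor speckle), but any object
    smaller than the kernel is wiped out along with the noise.
    """
    mask = (depth > 0).astype(np.uint8)            # 0 = missing depth
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    return np.where(opened > 0, depth, 0)          # drop filtered pixels
```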

    Surface normals gave better performance, but their computational cost needs to be taken into account.
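    For reference, a minimal sketch of one way to encode surface normals from depth (my own illustration via depth gradients; the paper may compute them differently), which hints at why this costs more per frame than a simple colormap:

```python
import numpy as np

def depth_to_normals(depth):
    """Rough surface-normal encoding from a depth image (a sketch).

    Depth gradients approximate the local surface orientation; the
    normalized (-dz/dx, -dz/dy, 1) vectors give a 3-channel image.
    This extra per-frame computation is the cost mentioned above.
    """
    dz_dy, dz_dx = np.gradient(depth.astype(np.float32))
    normals = np.dstack([-dz_dx, -dz_dy, np.ones_like(dz_dx)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True) + 1e-8
    # Map components from [-1, 1] to [0, 255] to feed the RGB stream.
    return ((normals + 1.0) * 0.5 * 255).astype(np.uint8)
```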


    Comment: Starting deep learning from a pre-trained model is the easy way in, because training and evaluating a model from scratch takes a very long time. That is why most applications of deep learning try to adapt their sensor data to a pre-trained model's structure, even when this produces rather meaningless data along the way.
    They also usually rely on large datasets annotated in advance by other research institutes. That is why the deep learning approach is hard to adapt to new settings, and why there are few online/robot-learning studies using deep learning. My idea is that a robot could generate this kind of RGB-D dataset by itself from a single human annotation. I need to learn and research more about this.

    Featured Refs
    1. Related to RGB-D datasets
       - K. Lai, L. Bo, X. Ren, and D. Fox, "A large-scale hierarchical multi-view RGB-D object dataset," in Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2011.
       - K. Lai, L. Bo, X. Ren, and D. Fox, "Detection-based object labeling in 3D scenes," in Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2012.
    - I. Lenz, H. Lee, and A. Saxena, "Deep learning for detecting robotic grasps," in Int. Journal of Robotics Research (IJRR), 2015.
    - M. Schwarz, H. Schulz, and S. Behnke, "RGB-D object recognition and pose estimation based on pre-trained convolutional neural network features," in Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2015.
    - J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" in Advances in Neural Information Processing Systems (NIPS), 2014.

