From Vision to Grasping: Adapting Visual Networks

by Rebecca Allday, Simon Hadfield and Richard Bowden

Abstract:

Grasping is one of the oldest problems in robotics and is still considered challenging, especially when grasping unknown objects with unknown 3D shape. We focus on exploiting recent advances in computer vision recognition systems. Object classification problems tend to have much larger datasets to train from and far fewer practical constraints on model size and training speed. In this paper we investigate how to adapt Convolutional Neural Networks (CNNs), traditionally used for image classification, for planar robotic grasping. We consider the differences between the two problems and how a network can be adjusted to account for them. Positional information is far more important in robotics than in generic image classification tasks, where max pooling layers are used to improve translation invariance. By using a more appropriate network structure we obtain improved accuracy while simultaneously improving run times and reducing memory consumption, with model size reduced by up to 69%.
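
The remark about max pooling and positional information can be illustrated with a small sketch. This is not the paper's code or network: it is a toy example in plain Python, and the helper names `max_pool_2x2` and `single_pixel_image` are hypothetical, chosen only for illustration. It assumes non-overlapping 2x2 pooling with stride 2 and shows that a one-pixel shift inside a pooling window leaves the pooled output unchanged, which helps classification but discards the fine position a grasp predictor would need.

```python
def max_pool_2x2(image):
    """Non-overlapping 2x2 max pooling (stride 2) over a 2D list of numbers."""
    rows, cols = len(image), len(image[0])
    return [
        [
            max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def single_pixel_image(row, col, size=4):
    """Return a size x size image of zeros with a single 1 at (row, col)."""
    return [[1 if (r, c) == (row, col) else 0 for c in range(size)] for r in range(size)]


# Two inputs that differ only by a one-pixel diagonal shift of the active pixel.
image_a = single_pixel_image(0, 0)
image_b = single_pixel_image(1, 1)

pooled_a = max_pool_2x2(image_a)
pooled_b = max_pool_2x2(image_b)

# Both positions fall in the same pooling window, so the outputs are identical:
# the exact location of the feature can no longer be recovered after pooling.
print(pooled_a == pooled_b)

# A shift that crosses a window boundary is still visible, but only at the
# coarser resolution of the pooled grid.
pooled_c = max_pool_2x2(single_pixel_image(0, 2))
print(pooled_a == pooled_c)
```

In this toy setting the pooled map only records which 2x2 window contained the feature, so any downstream layer sees position at half the input resolution.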

Reference:

From Vision to Grasping: Adapting Visual Networks (Rebecca Allday, Simon Hadfield and Richard Bowden), In Proceedings, Towards Autonomous Robotic Systems conference (TAROS), Springer, 2017.

Bibtex Entry:

@InProceedings{Allday17,
  Title                    = {From Vision to Grasping: Adapting Visual Networks},
  Author                   = {Rebecca Allday and Simon Hadfield and Richard Bowden},
  Booktitle                = {Proceedings, Towards Autonomous Robotic Systems conference (TAROS)},
  Year                     = {2017},

  Address                  = {Guildford, UK},
  Month                    = {19 -- 21 } # jul,
%  Pages                    = { -- },
  Publisher                = {Springer},

  Abstract                 = {Grasping is one of the oldest problems in robotics and is still considered challenging, especially when grasping unknown objects with unknown 3D shape. We focus on exploiting recent advances in computer vision recognition systems. Object classification problems tend to have much larger datasets to train from and far fewer practical constraints on model size and training speed. In this paper we investigate how to adapt Convolutional Neural Networks (CNNs), traditionally used for image classification, for planar robotic grasping. We consider the differences between the two problems and how a network can be adjusted to account for them. Positional information is far more important in robotics than in generic image classification tasks, where max pooling layers are used to improve translation invariance. By using a more appropriate network structure we obtain improved accuracy while simultaneously improving run times and reducing memory consumption, with model size reduced by up to 69\%.},
  Comment                  = {<a href="http://personalpages.surrey.ac.uk/s.hadfield/slides/Allday17.pptx">Slides</a>, <a href="http://personalpages.surrey.ac.uk/s.hadfield/posters/Allday17.tif">Poster</a>},
  Crossref                 = {TAROS17},
%  Doi                      = {},
%  Gsid                     = {},
  Keywords                 = {Robotic Grasping; Machine Learning; CNNs; SqueezeNet; AlexNet},
  Url                      = {http://personalpages.surrey.ac.uk/s.hadfield/papers/Allday17.pdf}
}