by Simon Hadfield and Richard Bowden
Abstract:
Action recognition in unconstrained situations is a difficult task, suffering from massive intra-class variations. It is made even more challenging when complex 3D actions are projected down to the image plane, losing a great deal of information. The recent emergence of 3D data, both in broadcast content, and commercial depth sensors, provides the possibility to overcome this issue. This paper presents a new dataset, for benchmarking action recognition algorithms in natural environments, while making use of 3D information. The dataset contains around 650 video clips, across 14 classes. In addition, two state of the art action recognition algorithms are extended to make use of the 3D data, and five new interest point detection strategies are also proposed, that extend to the 3D data. Our evaluation compares all 4 feature descriptors, using 7 different types of interest point, over a variety of threshold levels, for the Hollywood 3D dataset. We make the dataset including stereo video, estimated depth maps and all code required to reproduce the benchmark results, available to the wider community.
Reference:
Hollywood 3D: Recognizing Actions in 3D Natural Scenes (Simon Hadfield and Richard Bowden), In Proceedings, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2013. (Poster, Dataset and Code)
Bibtex Entry:
@InProceedings{Hadfield13,
Title = {Hollywood 3{D}: Recognizing Actions in 3{D} Natural Scenes},
Author = {Simon Hadfield and Richard Bowden},
Booktitle = {Proceedings, IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
Year = {2013},
Address = {Portland, Oregon},
Month = {23 -- 28 } # jun,
Organization = {IEEE},
Pages = {3398 -- 3405},
Publisher = {IEEE},
Abstract = {Action recognition in unconstrained situations is a difficult task, suffering from massive intra-class variations. It is made even more challenging when complex 3D actions are projected down to the image plane, losing a great deal of information. The recent emergence of 3D data, both in broadcast content, and commercial depth sensors, provides the possibility to overcome this issue. This paper presents a new dataset, for benchmarking action recognition algorithms in natural environments, while making use of 3D information. The dataset contains around 650 video clips, across 14 classes. In addition, two state of the art action recognition algorithms are extended to make use of the 3D data, and five new interest point detection strategies are also proposed, that extend to the 3D data. Our evaluation compares all 4 feature descriptors, using 7 different types of interest point, over a variety of threshold levels, for the Hollywood 3D dataset. We make the dataset including stereo video, estimated depth maps and all code required to reproduce the benchmark results, available to the wider community.},
Comment = {<a href="http://personalpages.surrey.ac.uk/s.hadfield/posters/Hollywood%203D.tif">Poster</a>, <a href="http://cvssp.org/Hollywood3D/">Dataset and Code</a>},
Crossref = {CVPR13},
Doi = {10.1109/CVPR.2013.436},
File = {Hadfield13.pdf:Hadfield13.pdf:PDF},
Gsid = {13792529108743713674},
Keywords = {Action Recognition, 3D, Hollywood, Movies, dataset, Interest Points, RMD, 4D, 3.5D, Harris, Hessian},
Timestamp = {2013.04.12},
Url = {http://personalpages.surrey.ac.uk/s.hadfield/papers/Hollywood%203D.pdf}
}