This is the course page for the 2019 edition of Object Recognition in Images and Video for the PhD in Smart Computing offered by the Universities of Florence, Pisa, and Siena.
Lecture 1: 10/05/2019 (Introduction)
Location: Aula 110 Santa Marta @ 10:15
In this first lecture I will introduce the basic problem of object recognition with some history of the field, an overview of the basic techniques and tools we will employ, and an introduction to the First Big Breakthrough that gave birth to modern object recognition – the Bag of Visual Words model. In this lecture we will trace the development of the Bag-of-Words (BoW) model through the first decade of the 21st century. We will see how advances in pooling (e.g. spatial pyramids) and feature coding (e.g. sparse coding and Fisher vectors) led to steady and significant progress in object recognition performance. We will also look at the related problem of object detection and see how descriptors like HOG and representations like Deformable Part Models (DPMs) led to significant advances in object localization as well.
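As a concrete preview of the pipeline described above, here is a minimal NumPy sketch of the BoW idea: cluster local descriptors into a visual vocabulary, then represent an image as a histogram of visual-word occurrences. The random vectors, the vocabulary size, and the toy k-means routine are illustrative stand-ins, not the implementations used in the readings:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, K, iters=10):
    """A few Lloyd iterations: enough to illustrate codebook learning."""
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(K):
            members = X[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return centers

# 1. Local descriptors pooled from training images (random vectors stand in
#    for real 128-D SIFT descriptors).
train_descriptors = rng.normal(size=(500, 128))

# 2. Learn a visual vocabulary (codebook) by clustering the descriptors.
K = 32
codebook = kmeans(train_descriptors, K)

# 3. Encode one image: assign each of its descriptors to the nearest visual
#    word, then build a normalized occurrence histogram.
image_descriptors = rng.normal(size=(200, 128))
dists = ((image_descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
words = dists.argmin(axis=1)
hist = np.bincount(words, minlength=K).astype(float)
hist /= hist.sum()  # the BoW representation of the image
```

Spatial pyramids and the coding methods in the readings refine steps 2 and 3; the overall structure stays the same.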
Required Reading
Visual Categorization with Bags of Keypoints, Gabriella Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski, Cédric Bray. In: European Conference on Computer Vision (ECCV), 2004.
Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, S Lazebnik, C Schmid, J Ponce. In: Computer Vision and Pattern Recognition (CVPR), 2006.
Improving the fisher kernel for large-scale image classification, F Perronnin, J Sánchez, T Mensink. In: European Conference on Computer Vision, 2010.
Locality-constrained linear coding for image classification, J Wang, J Yang, K Yu, F Lv, T Huang, Y Gong. In: Computer Vision and Pattern Recognition (CVPR), 2010.
Recommended Reading
Content-based image retrieval at the end of the early years, Smeulders, A. W., Worring, M., Santini, S., Gupta, A., and Jain, R. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000.
Chapter 1 of Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, David Marr, W. H. Freeman, 1982.
Distinctive Image Features from Scale-Invariant Keypoints, David G. Lowe. In: International Journal of Computer Vision, 2004.
Object detection with discriminatively trained part-based models, PF Felzenszwalb, RB Girshick, D McAllester, D Ramanan. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010.
Visual word ambiguity, JC Van Gemert, CJ Veenman, AWM Smeulders, JM Geusebroek. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010.
The devil is in the details: an evaluation of recent feature encoding methods. K Chatfield, VS Lempitsky, A Vedaldi, A Zisserman. In: British Machine Vision Conference, 2011.
Lecture 2: 17/05/2019 (The Shot Heard ‘Round the World)
Location: Aula 110 Santa Marta @ 10:15
In this lecture we will look at the revolutionary breakthrough that occurred in 2012: the re-introduction of neural networks into the modern discussion on object recognition. We will study some of the classic and contemporary models of Convolutional Neural Networks (CNNs) that continue to revolutionize the field. We will also look at extensions of these models to the detection problem.
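To fix ideas before the lecture, here is a minimal NumPy sketch of the building blocks CNNs stack: convolution (cross-correlation, as in deep learning frameworks), a nonlinearity, and max pooling. The image, kernel, and sizes are arbitrary toy values, not taken from any of the papers below:

```python
import numpy as np

def conv2d(x, w):
    """Valid 2-D cross-correlation of a single-channel image x with kernel w."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + kh, j:j + kw] * w).sum()
    return out

def relu(x):
    """Elementwise rectified linear unit."""
    return np.maximum(x, 0)

def max_pool(x, s=2):
    """Non-overlapping s-by-s max pooling (input cropped to a multiple of s)."""
    H, W = x.shape
    H, W = H - H % s, W - W % s
    return x[:H, :W].reshape(H // s, s, W // s, s).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))
feature_map = max_pool(relu(conv2d(image, kernel)))  # 8x8 -> 6x6 -> 3x3
```

Real networks apply many such kernels per layer, over multiple channels, with learned weights; this sketch only shows the shape of one layer's computation.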
Required Reading
ImageNet Classification with Deep Convolutional Neural Networks. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. In: Proceedings of NIPS, 2012.
Very Deep Convolutional Networks for Large-Scale Image Recognition. Karen Simonyan and Andrew Zisserman. In: arXiv preprint arXiv:1409.1556, 2014.
Deep Residual Learning for Image Recognition. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. In: Proceedings of CVPR, 2016.
Fast R-CNN. R. Girshick. In: Proceedings of ICCV, 2015.
Recommended Reading
Gradient-based learning applied to document recognition. Y. LeCun, L. Bottou, Y. Bengio, and P Haffner. In: Proceedings of the IEEE, 1998.
Return of the Devil in the Details: Delving Deep into Convolutional Nets. Ken Chatfield, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. In: Proceedings of BMVC, 2014.
Lecture 3: 24/05/2019 (The State-of-the-art)
Location: Aula 110 Santa Marta @ 10:15
In this final lecture we will leverage what we have learned about the historical development of modern object detection to study some state-of-the-art topics in object recognition. We will see the state-of-the-art detector YOLO, how to convert a CNN into a fully-convolutional network for segmentation, how CNNs can be used to learn generative models of image distributions, and how to (partially) mitigate the need for massive amounts of data via self-supervision.
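One of the topics above, converting a CNN into a fully-convolutional network, rests on a simple observation: a fully connected layer over an h-by-w feature patch is equivalent to a convolution with an h-by-w kernel, so the same weights can slide over larger inputs to produce dense score maps. A toy NumPy demonstration (all shapes and weights here are illustrative, not from the FCN paper):

```python
import numpy as np

def conv_valid(x, w):
    """Valid 2-D cross-correlation of image x with kernel w."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + kh, j:j + kw] * w).sum()
    return out

rng = np.random.default_rng(0)

# A "fully connected" classifier head for 4x4 feature patches:
# one weight row per class, over the 16 flattened inputs.
W_fc = W_conv_source = rng.normal(size=(3, 16))

# The same weights reinterpreted as three 4x4 convolution kernels.
W_conv = W_fc.reshape(3, 4, 4)

# On a single 4x4 patch the two forms give identical scores.
patch = rng.normal(size=(4, 4))
fc_scores = W_fc @ patch.ravel()
conv_scores = np.array([conv_valid(patch, k)[0, 0] for k in W_conv])

# On a larger input the convolutional form yields a spatial map of scores,
# one per location: the basis of dense, fully-convolutional prediction.
big = rng.normal(size=(10, 10))
score_maps = np.stack([conv_valid(big, k) for k in W_conv])  # shape (3, 7, 7)
```

FCNs add learned upsampling on top of such score maps to recover full-resolution segmentations.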
Required Reading
You only look once: Unified, real-time object detection. J Redmon, S Divvala, R Girshick, A Farhadi. In: Proceedings of CVPR, 2016.
Fully convolutional networks for semantic segmentation. E Shelhamer, J Long, T Darrell. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
Unsupervised representation learning with deep convolutional generative adversarial networks. A Radford, L Metz, S Chintala. In: arXiv preprint arXiv:1511.06434, 2015.
Exploiting Unlabeled Data in CNNs by Self-supervised Learning to Rank. X. Liu, J. van de Weijer, A. D. Bagdanov. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
Recommended Reading
Anything that catches your fancy from CVPR, NIPS, ICCV, ECCV, BMVC, ICLR.
Lecture 4: 31/05/2019 (Object Recognition in Video)
Location: Aula 110 Santa Marta @ 10:15
TBD
Final Examination
There will be a final, oral examination for this course. This exam will consist of a 20-minute, reading-group style presentation on a paper selected from a recent edition of a major computer vision conference. Papers from CVPR, ECCV, ICCV, BMVC, NIPS, etc., are all fair game. Please confer with me before preparing the presentation for your final examination.
These course presentations will be scheduled approximately 3-4 weeks after the end of the course.