Leveraging deep learning representation for search-based image annotation

Published in 2017 Artificial Intelligence and Signal Processing Conference (AISP), 2017

Image annotation aims to assign tags to an image such that these tags provide a textual description of the image content. Search-based methods extract relevant tags for an image based on the tags of its nearest neighbor images in the training set. In these methods, the similarity of two images is determined by the distance between their feature vectors, so it is essential to extract informative feature vectors from images. In this paper, we propose a framework that utilizes deep learning to obtain a visual representation of images. We apply different architectures of convolutional neural networks (CNN) to the input image and obtain a single feature vector that is a rich representation of the visual content of the image. In this way, we eliminate the need for the multiple feature vectors used in state-of-the-art annotation methods. We also integrate our feature extractors with a nearest neighbors approach to obtain relevant tags for an image. Our experiments on standard image annotation datasets (including Corel5k, ESP Game, and IAPR) demonstrate that our approach achieves higher precision, recall, and F1 than state-of-the-art methods such as 2PKNN, TagProp, and NMF-KNN.
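The pipeline described above can be summarized in two steps: a pretrained CNN produces a single feature vector per image, and tags are transferred from the k nearest training images. The following is a minimal illustrative sketch, not the paper's implementation: it assumes a torchvision ResNet-50 as the feature extractor (the paper compares several CNN architectures), and the helper names `build_feature_extractor`, `extract_feature`, and `annotate` are hypothetical.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from collections import Counter

def build_feature_extractor():
    # Assumption: a pretrained ResNet-50 stands in for the CNNs evaluated in the paper.
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    model.fc = torch.nn.Identity()  # keep the 2048-d pooled activations as the descriptor
    model.eval()
    return model

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_feature(model, image_path):
    # Single feature vector representing the visual content of one image.
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    return model(img).squeeze(0)

def annotate(query_vec, train_vecs, train_tags, k=5, n_tags=5):
    # Rank training images by distance to the query and transfer the most
    # frequent tags among the k nearest neighbors (a simple voting scheme,
    # not necessarily the exact transfer rule used in the paper).
    dists = torch.cdist(query_vec.unsqueeze(0), train_vecs).squeeze(0)
    nearest = torch.topk(dists, k, largest=False).indices
    votes = Counter(tag for i in nearest.tolist() for tag in train_tags[i])
    return [tag for tag, _ in votes.most_common(n_tags)]
```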

Recommended citation: Kashani, M. M., & Amiri, S. H. (2017, October). "Leveraging deep learning representation for search-based image annotation." In 2017 Artificial Intelligence and Signal Processing Conference (AISP) (pp. 156-161). IEEE. https://ieeexplore.ieee.org/abstract/document/8324073