Automatically generating captions for images is a key deep learning task because it combines the two worlds of language and vision. This pairing of two modalities makes it one of the primary problems in computer vision. A deep learning model for image captioning should be able to identify the objects present in the image and also generate natural-language text that expresses the relationships between those objects and their actions. There are a few datasets for this problem. The best known is an extension of the COCO dataset, which was covered in Chapter 4, Object Detection.
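
To make the data side of the task concrete, the following is a minimal sketch of how image and caption pairs can be assembled from annotations laid out in the style of the COCO captions files, where an `images` list holds file names and an `annotations` list holds captions keyed by `image_id`. The sample dictionary, the file names, and the `group_captions` helper are assumptions made up for illustration; a real annotation file would be loaded from disk with `json.load` instead.

```python
from collections import defaultdict

# Hand-written sample that mimics the layout of a COCO-style captions file.
# The file names and captions are invented for illustration only.
sample = {
    "images": [
        {"id": 1, "file_name": "img_0001.jpg"},
        {"id": 2, "file_name": "img_0002.jpg"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "caption": "A dog runs across a field."},
        {"id": 11, "image_id": 1, "caption": "A brown dog playing on grass."},
        {"id": 12, "image_id": 2, "caption": "Two people ride bicycles."},
    ],
}


def group_captions(data):
    """Map each image file name to the list of captions written for it."""
    id_to_file = {img["id"]: img["file_name"] for img in data["images"]}
    captions = defaultdict(list)
    for ann in data["annotations"]:
        file_name = id_to_file[ann["image_id"]]
        captions[file_name].append(ann["caption"].strip())
    return dict(captions)


pairs = group_captions(sample)
for file_name, texts in pairs.items():
    print(file_name, texts)
```

Each image typically carries several reference captions, so a mapping from file name to a list of captions is a convenient starting point before any tokenization or feature extraction.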

Deep Learning for Computer Vision
Overview of this book
Deep learning has shown its power in several application areas of Artificial Intelligence, especially in Computer Vision. Computer Vision is the science of understanding and manipulating images, and it has wide application in areas such as robotics and automation. This book shows you, with practical examples, how to develop Computer Vision applications by leveraging the power of deep learning.
In this book, you will learn different techniques related to object classification, object detection, image segmentation, captioning, image generation, face analysis, and more. You will also explore their applications using popular Python libraries such as TensorFlow and Keras. This book will help you master state-of-the-art deep learning algorithms and their implementation.
Table of Contents (12 chapters)
Preface
Getting Started
Image Classification
Image Retrieval
Object Detection
Semantic Segmentation
Similarity Learning
Image Captioning
Generative Models
Video Classification
Deployment
Other Books You May Enjoy