Unlike humans, most machine learning systems have so far been designed for single, specific tasks. Directly applying a trained model to a different dataset typically yields poor results, especially if the data samples do not share the same semantic content (for instance, MNIST digit images versus ImageNet photographs) or the same image quality and distribution (for instance, a dataset of smartphone pictures versus a dataset of high-quality pictures). Because CNNs are trained to extract and interpret specific features, their performance degrades when the feature distribution changes. Some adaptation is therefore necessary before a network can be applied to a new task.
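The effect of a change in feature distribution can be illustrated with a toy experiment. The sketch below is not a CNN: it swaps in a nearest-centroid classifier on synthetic one-dimensional features, and every name and value in it (`make_samples`, the class means, the shift of 4.0) is an assumption chosen for illustration only. It fits the classifier on one distribution, then evaluates it on fresh data from the same distribution and on data whose features have all been shifted.

```python
import random


def make_samples(rng, n_per_class, shift=0.0):
    """Draw 1-D features for two classes centred at -2 and +2, plus an optional shift."""
    samples = []
    for label, mean in ((0, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            samples.append((rng.gauss(mean + shift, 0.5), label))
    return samples


def fit_centroids(samples):
    """Compute the mean feature value of each class."""
    centroids = {}
    for label in (0, 1):
        values = [x for x, y in samples if y == label]
        centroids[label] = sum(values) / len(values)
    return centroids


def accuracy(centroids, samples):
    """Return the fraction of samples whose nearest centroid matches their label."""
    correct = 0
    for x, y in samples:
        predicted = min(centroids, key=lambda c: abs(x - centroids[c]))
        if predicted == y:
            correct += 1
    return correct / len(samples)


rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_samples(rng, 200)
same_dist = make_samples(rng, 200)           # same distribution as training
shifted = make_samples(rng, 200, shift=4.0)  # hypothetical shifted distribution

centroids = fit_centroids(train)
source_accuracy = accuracy(centroids, same_dist)
shifted_accuracy = accuracy(centroids, shifted)
print(f"same distribution: {source_accuracy:.2f}")
print(f"shifted distribution: {shifted_accuracy:.2f}")
```

The classifier itself is unchanged between the two evaluations; only the input distribution differs. Under these assumptions, the shifted samples of one class land where the classifier expects the other class, which is the kind of mismatch the paragraph above describes.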
Solutions have been investigated for decades. In 1998, Sebastian Thrun and Lorien Pratt edited Learning to Learn, a book compiling the prevalent research strands on the topic. More recently, in their Deep Learning book (MIT Press, page 534; http://www.deeplearningbook.org/contents/representation.html), Ian Goodfellow...