Inspired from the animal visual cortex, a convolutional neural network (CNN) is primarily used for, and is the de facto standard for, image processing. The core concept of the convolutional layer is the presence of kernels (or filters) that learn to differentiate between the features of an image. A kernel is usually a much shorter matrix than the image matrix and is passed over the entire image in a sliding-window fashion, producing a dot product of the kernel with the corresponding slice of matrix from the image to be processed. The dot product allows the program to identify the features in the image.
Consider the following image vector:
[[10, 10, 10, 0, 0, 0],
[10, 10, 10, 0, 0, 0],
[10, 10, 10, 0, 0, 0],
[0, 0, 0, 10, 10, 10],
[0, 0, 0, 10, 10, 10],
[0, 0, 0, 10, 10, 10]]
The preceding matrix corresponds to an image that looks like this:

On applying a filter to detect horizontal edges, the filter is defined by the following matrix:
[[1, 1, 1],
[0, 0, 0],
[-1, -1, -1]]
The output matrix produced after the convolution of the original image with the filter is as follows:
[[ 0, 0, 0, 0],
[ 30, 10, -10, -30],
[ 30, 10, -10, -30],
[ 0, 0, 0, 0]]
There are no edges detected in the upper half or lower half of the image. On moving toward the vertical middle of the image from the left edge, a clear horizontal edge is found. On moving further right, two unclear instances of a horizontal edge are found before another clear instance of a horizontal edge. However, the clear horizontal edge found now is in the opposite color as the previous one.
Thus, by simple convolutions, it is possible to uncover patterns in the image files. CNNs also use several other concepts, such as pooling.
It is possible to understand pooling from the following screenshot:

In the simplest terms, pooling is the method of consolidating several image pixels into a single pixel. The pooling method used in the preceding screenshot is known as max pooling, wherein only the largest value from the selected sliding-window kernel is kept in the resultant matrix. This greatly simplifies the image and helps to train filters that are generic and not exclusive to a single image.