Inception networks demonstrated that going deeper is a valid strategy in image classification, as well as in other recognition tasks. Encouraged by this, researchers kept increasing the depth of their networks to tackle more and more complex tasks. However, the question "Is learning better networks as easy as stacking more layers?", asked in the preamble of the paper by He et al., is justified.
We already know that the deeper a network gets, the harder it becomes to train. Besides the vanishing/exploding gradient problems (which other solutions already address), He et al. pointed out another problem that deeper CNNs face: performance degradation. It all started with a simple observation: the accuracy of CNNs does not increase linearly with the addition of new layers. Instead, a degradation problem appears as the network's depth increases; accuracy saturates and then degrades. Even the training loss starts increasing when too many layers are negligently stacked, so the problem cannot be explained by overfitting alone.
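As an illustration, the degradation effect can be reproduced by comparing two "plain" stacks of convolutional layers that differ only in depth (He et al. contrasted 20-layer and 56-layer plain networks on CIFAR-10). The following Keras sketch is only illustrative; the `build_plain_cnn` helper, its filter counts, and the input shape are assumptions, not the exact architectures from the paper:

```python
from tensorflow.keras import layers, models

def build_plain_cnn(depth, num_classes=10):
    """Builds a 'plain' CNN: a straight stack of 3x3 convolutions with no
    shortcut connections, followed by a small classification head."""
    inputs = layers.Input(shape=(32, 32, 3))           # e.g. CIFAR-10 images
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    for _ in range(depth - 2):                          # remaining conv layers
        x = layers.Conv2D(16, 3, padding="same", activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)

# Two networks that differ only in depth, roughly 20 vs 56 weight layers.
shallow_net = build_plain_cnn(depth=20)
deep_net = build_plain_cnn(depth=56)

# Training both on the same data exposes the degradation problem: the deeper
# plain network ends up with a *higher* training error than the shallow one.
```

Since the deeper model's hypothesis space contains the shallower one (the extra layers could in principle learn the identity), its higher training error points to an optimization difficulty rather than a lack of capacity, which is the observation that motivates residual connections.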