Work on memorization and variance of gradients (VoG) shows that hard examples are learnt later in training, and that learning rates impact what is learnt.
At face value, deep neural network pruning appears to promise you can (almost) have it all — remove the majority of weights with minimal degradation to top-1 accuracy. In this work, we explore this trade-off by asking whether certain classes are disproportionately impacted.
We find that pruning is better described as "selective brain damage" -- performance on a tiny subset of classes and images is cannibalized in order to preserve overall performance. The interesting part is what makes certain images more likely to be forgotten...