Hello there, thanks for visiting this page. My blog has moved to hrishikamath.com. Won't put effort into even making this page look nice for you :P.
Mitigating Zero Training Loss in Neural Networks and Its Effects
In this post I would like to explain why zero training loss in neural networks is bad and how we could mitigate it with a technique called flooding. Note that this is not exactly the same as overfitting.
Deep neural networks (DNNs) are powerful function approximators that can model interactions between variables in ways that were not possible before. However, given a network that is powerful enough, or larger than required, we could memorize anything, literally anything. As shown by this paper ...., neural networks are capable of memorizing random numbers.

Most deep learning problems use gradient-based optimization. However, not every gradient-based optimization problem is a deep learning problem. In deep learning, we train neural networks to learn the patterns of a given problem rather than just find optimal parameters for it. In a cat/dog classifier, for example, an image recognition algorithm might learn to recognize dogs by their ears and face. The ability of a DNN to classify data outside the training data is called generalization. It is ideally evaluated by measuring how accurately the DNN performs on data outside the training data (data it has not seen during optimization). So the DNN has to achieve a very low error on the training data (very close to zero), but that alone is not enough: both the training error and the test error have to be very low.

Several regularizers have been used in the past to mitigate this kind of memorization, such as dropout, L1/L2 regularization, and data augmentation. I would like to elaborate on another regularizer, one that helps prevent zero training error. It is quite common for DNNs to be very large compared to the dataset (called overparameterization), which at times allows them to attain zero training error. But isn't zero training error good? Not so much, since it can lead to very high-confidence predictions, which can be problematic from a privacy perspective because they could allow users to reconstruct input data via black-box attacks during inference (insert link).
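To make the idea of flooding a bit more concrete, here is a minimal sketch in plain Python. It assumes the usual definition of flooding from the paper by Ishida et al. (2020), which this post has not spelled out: the training loss J is replaced by |J - b| + b, where b is a small positive "flood level". The toy dataset, the 1-D logistic model, and the names `flooded_loss`, `mean_loss`, `mean_loss_grad`, and `train` are all made up for illustration and are not from any library.

```python
import math

# Hypothetical toy data: 1-D inputs with labels in {-1, +1}, linearly separable,
# so an unregularized model can push its training loss toward zero.
DATA = [(-2.0, -1), (-1.0, -1), (1.0, 1), (2.0, 1)]

def flooded_loss(loss, flood_level):
    """Flooding objective |loss - b| + b: unchanged above b, mirrored below b."""
    return abs(loss - flood_level) + flood_level

def mean_loss(w, data):
    """Average logistic loss log(1 + exp(-y * w * x)) of the model score w * x."""
    return sum(math.log1p(math.exp(-y * w * x)) for x, y in data) / len(data)

def mean_loss_grad(w, data):
    """Derivative of mean_loss with respect to w."""
    return sum(-y * x / (1.0 + math.exp(y * w * x)) for x, y in data) / len(data)

def train(data, flood_level, steps=2000, lr=0.5):
    """Gradient descent on the flooded loss; flood_level=0 is ordinary training."""
    w = 0.0
    for _ in range(steps):
        loss = mean_loss(w, data)
        # d/dw (|loss - b| + b) = sign(loss - b) * d(loss)/dw, so below the
        # flood level the update turns into gradient ascent on the plain loss.
        sign = 1.0 if loss >= flood_level else -1.0
        w -= lr * sign * mean_loss_grad(w, data)
    return w, mean_loss(w, data)

w_plain, loss_plain = train(DATA, flood_level=0.0)
w_flood, loss_flood = train(DATA, flood_level=0.1)
print("no flooding  : w = %.3f, training loss = %.4f" % (w_plain, loss_plain))
print("flooding 0.1 : w = %.3f, training loss = %.4f" % (w_flood, loss_flood))
```

Without flooding the weight keeps growing and the training loss keeps shrinking toward zero, which is exactly the overconfident regime described above. With a flood level of 0.1 the training loss hovers around 0.1 instead. The flood level is a hyperparameter, and the value here was picked arbitrarily for the toy problem.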