Mitigating Zero Training Loss in Neural Networks and Its Effects

In this post I would like to explain why zero training loss in neural networks is harmful and how we can mitigate it with a technique called flooding. Note that this is not exactly the same problem as overfitting.

Deep Neural Networks (DNNs) are powerful function approximators that can capture interactions among variables in ways earlier models could not. However, given a network that is powerful enough, or larger than the problem requires, we can memorize anything, literally anything. As shown by this paper ...., neural networks are capable of memorizing even random labels.

Most deep learning problems use gradient-based optimization, but not every gradient optimization problem is a deep learning problem. In deep learning, we train neural networks to learn the patterns underlying a problem rather than merely to find optimal parameters. In a cats/dogs classifier, for example, an image recognition model should learn to recognize dogs by their ears and faces. The ability of a DNN to classify data outside the training set is called generalization, and it is ideally evaluated by measuring how accurately the DNN performs on data it has not seen during optimization. The DNN will typically have to achieve a very low training error (very close to zero), but generalization requires that both the training and test errors be low. Several well-established techniques, known as regularizers, mitigate this gap, such as dropout, L2/L1 regularization, and data augmentation. Here I would like to elaborate on another regularizer, one that specifically prevents zero training error.

It is quite common for DNNs to be very large relative to the dataset (called overparameterization), which at times allows them to attain exactly zero training error. But isn't zero training error good? Not really: it can lead to extremely confident predictions, which is problematic from a privacy perspective, since it can allow an adversary to reconstruct input data via black-box attacks during inference (insert link).
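Flooding itself is a one-line change to the training objective: instead of minimizing the loss J directly, one minimizes |J − b| + b for a small constant b called the flood level. A minimal sketch of the idea (the flood level 0.02 here is an arbitrary illustrative value, not a recommendation):

```python
def flood(loss, b=0.02):
    # Flooding: minimize |loss - b| + b instead of loss.
    # Above the flood level b the gradient is unchanged (normal descent);
    # below b the sign of the gradient flips, so optimization ascends back
    # toward b and the training loss hovers around b instead of hitting zero.
    return abs(loss - b) + b

# Above the flood level: identical to the original loss.
print(flood(0.50))  # 0.5
# At or below the flood level: reflected back above b, never reaching zero.
print(flood(0.00))  # 0.04
```

In a typical PyTorch training loop the same expression works directly on the scalar loss tensor (`abs` dispatches to the tensor's `__abs__`), so only the line just before `loss.backward()` changes.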
