Oral

Neural Redshift: Random Networks are not Random Functions

Damien Teney · Armand Nicolicioiu · Valentin Hartmann · Ehsan Abbasnejad

Summit Flex Hall AB Oral #1
Orals 2B: Deep learning architectures and techniques
Wed 19 Jun 1 p.m. — 1:18 p.m. PDT

Abstract:

Context. Our understanding of the generalization capabilities of neural networks (NNs) is incomplete. The prevailing explanation is based on implicit biases of gradient descent (GD) but it cannot account for recent findings of the capabilities of models found by gradient-free methods nor the "simplicity bias" observed even in untrained networks. This study seeks the source of inherent properties of NNs.

Findings. To characterize inductive biases provided by architectures independently from GD, we examine networks of random weights and show that they do not correspond to random functions. We characterize the functions implemented by various architectures using decompositions in Fourier and polynomial bases and compressed representations. Even simple MLPs have strong inductive biases: uniform sampling in parameter space yields a strongly biased sampling of functions in frequency, order, and compressibility. Popular components including ReLUs, residual connections, and normalizations induce a bias toward the lower end of these measures, accounting for the "simplicity bias" frequently attributed to (S)GD. We also show that transformer-based sequence models inherit similar properties from their building blocks.

Implications. We provide a fresh explanation for the success of deep learning compatible with recent observations, complementing those based on gradient-based optimization. This also points at future avenues for controlling the solutions implemented in trained models.
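
The experiment described under Findings can be illustrated with a minimal sketch. It is not the authors' code. It assumes a scalar input on [-1, 1], a small MLP whose weights and biases are drawn uniformly from a fan-in-scaled interval, and a spectrum-weighted mean frequency as a crude stand-in for the Fourier-based complexity measure. The function names, layer widths, `scale` parameter, and the choice of ReLU versus tanh are all hypothetical choices made for illustration.

```python
import numpy as np


def sample_random_mlp(rng, widths, activation, scale=1.0):
    """Sample an untrained MLP with uniformly distributed weights.

    Assumption: weights and biases are uniform in [-b, b] with
    b = scale / sqrt(fan_in). The last entry of `widths` should be 1.
    """
    params = []
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        bound = scale / np.sqrt(fan_in)
        W = rng.uniform(-bound, bound, size=(fan_in, fan_out))
        b = rng.uniform(-bound, bound, size=fan_out)
        params.append((W, b))

    def forward(x):
        h = x
        for W, b in params[:-1]:
            h = activation(h @ W + b)
        W, b = params[-1]
        return (h @ W + b)[:, 0]  # scalar output per input point

    return forward


def mean_frequency(values):
    """Spectrum-weighted mean frequency index of a 1D signal (mean removed)."""
    spectrum = np.abs(np.fft.rfft(values - values.mean()))
    total = spectrum.sum()
    if total == 0.0:  # constant function: no frequency content
        return 0.0
    freqs = np.arange(spectrum.size)
    return float((freqs * spectrum).sum() / total)


def compare_activations(seed=0, n_networks=20, n_points=256):
    """Average the mean frequency over random networks, per activation."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, n_points, endpoint=False)[:, None]
    activations = {"relu": lambda z: np.maximum(z, 0.0), "tanh": np.tanh}
    results = {}
    for name, act in activations.items():
        scores = [
            mean_frequency(sample_random_mlp(rng, [1, 64, 64, 1], act)(x))
            for _ in range(n_networks)
        ]
        results[name] = float(np.mean(scores))
    return results
```

Calling `compare_activations()` returns one averaged score per activation. The discrete Fourier transform treats the sampled interval as periodic, so the score is only a rough proxy. Varying `scale` or the depth in `widths` is one way to explore how the architecture, rather than training, shapes the sampled functions.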
