3:30pm - 4:30pm | Biostat Seminar: IMPLICIT BIAS OF GRADIENT DESCENT FOR MEAN SQUARED ERROR REGRESSION WITH WIDE NEURAL NETWORKS

Date: 
Wednesday, January 20, 2021

Guido Montufar
Assistant Professor
Department of Mathematics and Statistics
UCLA

Wednesday, January 20, 2021
3:30pm – 4:30pm (PST), Zoom
https://ucla.zoom.us/j/95460365266?pwd=bEFCT2NBVE51RTVBTFQxa29WbnhqUT09
Meeting ID: 954 6036 5266
Passcode: 943207

We investigate gradient descent training of wide neural networks and the corresponding
implicit bias in function space. For 1D regression, we show that the solution of training a
width-n shallow ReLU network is within n^(-1/2) of the function which fits the training
data and whose difference from initialization has the smallest 2-norm of the second derivative
weighted by 1/ζ. The curvature penalty function 1/ζ is expressed in terms of the probability
distribution used to initialize the network parameters, and we compute it explicitly for
various common initialization procedures. For instance, asymmetric initialization with a
uniform distribution yields a constant curvature penalty, and hence the solution function is
the natural cubic spline interpolant of the training data. While
similar results have been obtained in previous works, our analysis clarifies important
details and allows us to obtain significant generalizations, in particular to multivariate
regression and to different activation functions. Moreover, we show that
the training trajectories are captured by trajectories of spatially adaptive smoothing
splines with decreasing regularization strength. This is joint work with Hui Jin.
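
To convey the flavor of the result, here is a minimal numerical sketch (not part of the talk).
It trains a wide shallow ReLU network on a toy 1D dataset by full-batch gradient descent,
compares the learned function with the natural cubic spline interpolant of the data, and
compares early-stopped snapshots with smoothing splines of varying regularization strength.
For simplicity only the output weights are trained (a random-feature simplification of the
lazy-training regime analyzed in the talk), the network output is zero at initialization, and
the dataset, width, kink distribution, learning rate, step counts, and regularization grid are
all illustrative choices; make_smoothing_spline requires SciPy 1.10 or later.

import numpy as np
from scipy.interpolate import CubicSpline, make_smoothing_spline

rng = np.random.default_rng(0)

# Toy 1D training data (inputs strictly increasing).
x_train = np.array([-0.8, -0.5, -0.1, 0.3, 0.6, 0.9])
y_train = np.array([0.2, -0.4, 0.3, 0.1, -0.3, 0.4])
m = len(x_train)

# Wide shallow ReLU network, f(x) = (1/sqrt(n)) * sum_k c_k * relu(w_k x + b_k),
# with slopes +-1, kink positions uniform on [-2, 2] (constant density over the
# data range), only the output weights c trained, and f identically zero at init.
n = 5000
w = rng.choice([-1.0, 1.0], size=n)
b = rng.uniform(-2.0, 2.0, size=n)
c = np.zeros(n)

def features(x):
    # ReLU feature map, shape (len(x), n).
    return np.maximum(np.outer(x, w) + b, 0.0) / np.sqrt(n)

phi = features(x_train)        # fixed, since only c is trained
x_grid = np.linspace(x_train[0], x_train[-1], 171)
phi_grid = features(x_grid)

# Full-batch gradient descent on the mean squared error, saving a few snapshots.
lr, snapshots = 1.0, {}
for step in range(1, 100_001):
    resid = phi @ c - y_train
    c -= lr * (phi.T @ resid) / m
    if step in (500, 5_000, 100_000):
        snapshots[step] = phi_grid @ c

# Late in training the network should approximate the natural cubic spline
# interpolant of the data (constant curvature penalty for uniform kinks).
interp = CubicSpline(x_train, y_train, bc_type="natural")
gap = np.max(np.abs(snapshots[100_000] - interp(x_grid)))
print(f"max |trained network - natural cubic spline| on the data range: {gap:.3f}")

# Along the trajectory, each snapshot should be matched best by a smoothing
# spline whose regularization strength lam decreases as training proceeds.
lams = np.logspace(-6, 0, 40)
for step, f_step in sorted(snapshots.items()):
    gaps = [np.max(np.abs(make_smoothing_spline(x_train, y_train, lam=float(l))(x_grid) - f_step))
            for l in lams]
    k = int(np.argmin(gaps))
    print(f"step {step:>6}: best-matching smoothing spline lam ~ {lams[k]:.1e} "
          f"(max gap {gaps[k]:.3f})")

With the 1/sqrt(n) output scaling used here, increasing the width n should shrink the gap to the
cubic spline roughly at the n^(-1/2) rate stated in the abstract, up to the simplifications noted
above.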