Generalization performance with non-constant stepsizes


Supervisors: Glineur, François
Faculty: Ecole polytechnique de Louvain
Degree label: Master [120]: ingénieur civil en mathématiques appliquées, à finalité spécialisée (civil engineer in applied mathematics, specialized focus)
Abstract: The exponential growth of data has reinforced the importance of supervised learning, where models are trained to generalize well to unseen data. While improving training accuracy is crucial, it does not by itself ensure good generalization. In 2022, Beugnot et al. showed theoretically that large learning rates in gradient descent can improve generalization. This thesis extends their result in two directions: (1) a theoretical analysis using an alternative stopping criterion based on the gradient norm rather than the function value, which is more practical in real-world applications; and (2) an empirical study of the impact of non-constant learning rates on generalization in kernel ridge regression. Our findings suggest that non-constant learning-rate schedules can provide generalization benefits beyond those achieved with constant learning rates. These contributions aim to bridge the gap between optimization performance and generalization behavior in supervised learning.
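The two ingredients combined in the abstract, a gradient-norm stopping criterion and a non-constant stepsize schedule applied to kernel ridge regression, can be illustrated with a minimal NumPy sketch. It is not the thesis's experimental setup: the Gaussian kernel and its bandwidth, the regularization parameter lam, the sin target, the tolerance and the hypothetical `alternating_schedule` are all assumptions made for illustration. Gradient descent is run on the quadratic G(α) = ½ αᵀ(K + nλI)α − yᵀα, whose unique minimizer α* = (K + nλI)⁻¹y is the kernel ridge regression solution, and it stops once ‖∇G(α_t)‖ ≤ tol. Unlike a function-value criterion G(α_t) − G(α*) ≤ ε, this test can be evaluated without knowing the optimal value.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the 1-D point sets a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))


def constant_schedule(t, L):
    """Constant stepsize 1/L (the classical safe choice for an L-smooth objective)."""
    return 1.0 / L


def alternating_schedule(t, L):
    """Hypothetical non-constant schedule: alternate a long step 1.9/L with 1/L.

    Both values lie in (0, 2/L), so every step still contracts the gradient
    norm of the strongly convex quadratic used below.
    """
    return 1.9 / L if t % 2 == 0 else 1.0 / L


def krr_gradient_descent(K, y, lam, schedule, tol=1e-6, max_iter=10_000):
    """Gradient descent on G(alpha) = 0.5 * alpha^T A alpha - y^T alpha, A = K + n*lam*I.

    The unique minimizer alpha* = A^{-1} y is the kernel ridge regression solution.
    Iterations stop as soon as ||grad G(alpha_t)|| <= tol, a criterion that can be
    checked without knowing the optimal value G(alpha*).
    Returns (alpha, number_of_iterations, converged).
    """
    n = K.shape[0]
    A = K + n * lam * np.eye(n)
    L = np.linalg.eigvalsh(A)[-1]  # largest eigenvalue = smoothness constant of G
    alpha = np.zeros(n)
    for t in range(max_iter):
        grad = A @ alpha - y
        if np.linalg.norm(grad) <= tol:  # gradient-norm stopping criterion
            return alpha, t, True
        alpha = alpha - schedule(t, L) * grad
    final_grad_norm = np.linalg.norm(A @ alpha - y)
    return alpha, max_iter, bool(final_grad_norm <= tol)


# Illustrative data only: noiseless samples of sin on [-3, 3].
x_train = np.linspace(-3.0, 3.0, 40)
y_train = np.sin(x_train)
x_test = np.linspace(-2.9, 2.9, 25)
y_test = np.sin(x_test)

K = rbf_kernel(x_train, x_train)
K_test = rbf_kernel(x_test, x_train)  # k(x_test_i, x_train_j), used for prediction
lam = 1e-2

for name, schedule in [("constant", constant_schedule), ("alternating", alternating_schedule)]:
    alpha, iters, converged = krr_gradient_descent(K, y_train, lam, schedule)
    test_mse = np.mean((K_test @ alpha - y_test) ** 2)
    print(f"{name}: {iters} iterations, converged={converged}, test MSE={test_mse:.2e}")
```

Both schedules keep every stepsize inside (0, 2/L), so each iteration contracts the gradient norm on this strongly convex quadratic. The printed test error only shows how one might compare schedules on held-out points; it is not a result reported in the thesis.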