Physics for AI Workshop

This three-day workshop will be held in Oxford from 19 to 21 March, focusing on the interplay between Machine Learning (ML) and Quantum Field Theory (QFT).

Venue: Mathematical Institute, Andrew Wiles Building, Room L4

This event is funded through the Laboratory for AI Security Research (LASR) partnership.

To register, please complete this form.

Organisers: Andrei Constantin, Ard Louis, Andre Lukas, and Shivaji Sondhi

Wednesday, 19 March

Thursday, 20 March

Friday, 21 March

Lecture 1 (Jim Halverson): Field Theory for ML

This lecture begins with an elementary introduction to machine learning and to why field-theoretic language and structure naturally emerge when considering the statistics and dynamics of neural networks. Moving on to detailed results on their statistics, I will explain when and why neural networks behave as generalised free fields (Gaussian processes) in certain large-N limits, and how 1/N corrections yield non-Gaussian processes that turn on interactions in the field theory. Finally, I will explain why certain two-point functions (e.g., the neural tangent kernel) govern the gradient descent dynamics of certain neural networks, an associated shortcoming from the learning perspective, and different scaling limits that exhibit feature learning.
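The large-N statement above lends itself to a quick numerical check. The sketch below is illustrative only (the tanh architecture, widths, and kurtosis diagnostic are my own assumptions, not material from the lecture): it samples a one-hidden-layer network over random initialisations and shows the non-Gaussianity of its output shrinking as the width N grows, consistent with the Gaussian-process limit and its 1/N corrections.

import numpy as np

# Illustrative sketch: outputs of a random one-hidden-layer network over many
# initialisations become Gaussian as the width N grows (excess kurtosis -> 0).
def random_net_output(x, width, rng):
    # f(x) = (1/sqrt(N)) * sum_i v_i * tanh(w_i * x + b_i), all parameters iid N(0,1)
    w = rng.standard_normal(width)
    b = rng.standard_normal(width)
    v = rng.standard_normal(width)
    return v @ np.tanh(np.outer(w, x) + b[:, None]) / np.sqrt(width)

rng = np.random.default_rng(0)
x = np.array([-1.0, 0.0, 1.0])                      # a few test inputs
for width in (4, 64, 1024):
    f = np.stack([random_net_output(x, width, rng) for _ in range(5000)])
    f0 = f[:, 0] - f[:, 0].mean()
    excess_kurtosis = np.mean(f0**4) / np.var(f0)**2 - 3.0   # 0 for a Gaussian
    print(f"N = {width:5d}   excess kurtosis of f(x_0) ~ {excess_kurtosis:+.3f}")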

Lecture 2 (Dimitrios Bachtis): Disorder, phase transitions, and quantum field-theoretic machine learning

Recent advances at the intersection of physics and computer science have further solidified the cross-fertilization between these two research fields. In this lecture, we will first establish an intuitive connection between prototypical disordered systems and a class of Hamiltonian-based neural networks. We will discuss how condensed matter systems, disordered models, discretized quantum field theories, and probabilistic neural networks can be unified under a common mathematical framework, namely that of Markov random fields. An explicit derivation of a φ⁴ neural network and a φ⁴ multi-agent system will then be followed by applications of quantum field-theoretic and statistical-mechanical machine learning algorithms. Relations between probabilistic machine learning and mathematical aspects of quantum field theory will be briefly discussed. Finally, we will review symmetry-breaking second-order phase transitions which describe the learning process of neural networks, and we will conclude by highlighting emergent universality classes of probabilistic machine learning algorithms.
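For orientation, one standard lattice discretisation of the φ⁴ theory (conventions and couplings vary; this is a schematic choice of mine, not necessarily the one used in the lecture) makes the Markov-random-field structure explicit, since the Boltzmann weight factorises over single sites and nearest-neighbour pairs:

\[
S[\phi] \;=\; -\,\kappa \sum_{\langle x,y \rangle} \phi_x \phi_y \;+\; \sum_x \Big[(1-2\lambda)\,\phi_x^2 + \lambda\,\phi_x^4\Big],
\qquad
p(\phi) \;=\; \frac{e^{-S[\phi]}}{Z}.
\]

Each site is then conditionally independent of the rest of the lattice given its neighbours, which is precisely the Markov property exploited by the probabilistic neural networks discussed above.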

Talk 1 (Daniel Kunin): An Introduction to the Learning Dynamics of Neural Networks: Conservation Laws, Implicit Biases, and Feature Learning

The success of neural networks is often attributed to their ability to extract task-relevant features from data through training, yet a precise understanding of this process remains elusive. In this talk, I will explore the learning dynamics of neural networks, focusing on when and how they learn features. I will first review foundational work on linear neural networks, showing how the parameter initialization scale determines the degree of feature learning -- large scales result in a "kernel regime" that stays close to initialization, while small scales lead to an "active regime" traversing between saddle points. This analysis will leverage how continuous symmetries in the model's architecture lead to conserved quantities under gradient flow -- a result analogous to Noether's theorem. I will also discuss how these insights can be derived through a mirror flow analysis in function space, a common strategy used to derive implicit bias results in the literature. In the last part of my talk, I will present ongoing work studying the learning dynamics of neural networks trained to perform modular addition. Through this minimal model we will see how neural networks initialized in an active regime incrementally learn functions of increasing complexity, leading to a simplicity bias that aids generalization.
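As a minimal illustration of the symmetry/conservation-law statement (a toy sketch under my own assumptions, not material from the talk): the two-layer linear model f(x) = w2·w1·x is invariant under the rescaling w1 → a·w1, w2 → w2/a, and the corresponding quantity w1² - w2² is conserved exactly by gradient flow, hence approximately conserved by small-step gradient descent.

import numpy as np

# Toy sketch: train f(x) = w2 * (w1 * x) on data generated by y = 0.7 * x and
# watch the Noether-like quantity w1^2 - w2^2 stay (approximately) constant.
rng = np.random.default_rng(1)
x = rng.standard_normal(32)
y = 0.7 * x
w1, w2 = 1.5, 0.1                       # deliberately unbalanced initialisation
lr = 1e-3
for step in range(10001):
    err = w2 * w1 * x - y
    grad_p = np.mean(2 * err * x)       # d(MSE)/d(w1*w2)
    g1, g2 = grad_p * w2, grad_p * w1   # chain rule to each layer
    w1, w2 = w1 - lr * g1, w2 - lr * g2
    if step % 2500 == 0:
        print(f"step {step:5d}  loss = {np.mean(err**2):.5f}  w1^2 - w2^2 = {w1**2 - w2**2:.5f}")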

Talk 2 (Zohar Ringel): A unified field theory approach to feature learning and generalization

One of the main merits of field theory is its role as a common language for reasoning about physical systems. In this talk, I'll illustrate how it may play a similar role in deep learning. In the first part, we'll set up a general field theory formulation of Bayesian Neural Networks or Langevin-trained DNNs at equilibrium. The aim is to reproduce various known results within this unifying perspective using standard mean-field approaches -- for instance, the reduction, in certain limits, to a Gaussian Process (GP). Away from those limits, we'll discuss how two types of mean-field approximations of the interaction/non-linear terms can explain some of the mysteries of deep learning: specifically, DNNs' ability to generalize well despite having infinite expressibility, and DNNs' ability to learn functions with better sample-complexity scaling than their GP counterparts. In the second part of the talk, I'll review Neural Scaling Laws and how these invite a certain Wilsonian renormalization group approach, in which one integrates out the high-energy/unlearnable modes according to the Gaussian-Process terms to induce an RG flow. Interestingly, this approach suggests some degree of universality of large-scale models.
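Schematically (in one common convention; details differ across the literature the talk draws on), the equilibrium distribution of a Bayesian or Langevin-trained network defines a field theory over the network function f,

\[
P[f] \;\propto\; e^{-S[f]},
\qquad
S[f] \;=\; \tfrac{1}{2}\!\int\! dx\,dx'\, f(x)\,K^{-1}(x,x')\,f(x')
\;+\; \tfrac{1}{2T}\sum_{\mu=1}^{P}\big(f(x_\mu)-y_\mu\big)^2
\;+\; \mathcal{O}(1/N)\ \text{interactions},
\]

where the first term is the Gaussian-process (free) part with kernel K, the second is the data term, and the finite-width corrections supply the interaction terms that the mean-field treatments and the Wilsonian RG picture mentioned above act upon.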

 

Lecture 3 (Gert Aarts): Diffusion models and stochastic quantisation

Diffusion models are currently one of the leading generative AI approaches for image generation used by e.g. DALL-E and Stable Diffusion. A formulation familiar to physicists uses stochastic differential equations. We review this formulation and relate it to stochastic quantisation in quantum field theory. We demonstrate the approach for scalar fields, generating configurations on a two-dimensional lattice, and end with some speculation on further applications in physics.
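Schematically, the correspondence the lecture builds on can be summarised as follows (my own shorthand). Stochastic quantisation generates field configurations with a Langevin equation in a fictitious time τ,

\[
\partial_\tau \phi(x,\tau) \;=\; -\,\frac{\delta S[\phi]}{\delta \phi(x,\tau)} \;+\; \eta(x,\tau),
\qquad
\langle \eta(x,\tau)\,\eta(x',\tau') \rangle \;=\; 2\,\delta(x-x')\,\delta(\tau-\tau'),
\]

whose stationary distribution is proportional to e^{-S[φ]}, while a score-based diffusion model generates samples by integrating the reverse-time SDE

\[
dx \;=\; \big[f(x,t) - g^{2}(t)\,\nabla_x \log p_t(x)\big]\,dt \;+\; g(t)\,d\bar{W},
\]

so the learned score ∇_x log p_t plays the role of the drift -δS/δφ.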

Lecture 4 (Boris Hanin): Deep and wide MLPs: L/N as effective depth at fixed dataset size

Neural networks are often studied analytically through scaling limits: regimes in which structural network parameters such as depth, width, and number of training datapoints diverge. In many such scaling limits, tools and ideas from statistical and theoretical physics (e.g. 1/N expansions, random matrix theory techniques, mean-field particle systems, diagrammatic approaches, etc.) help to reveal the nature of inference. I will survey several such approaches and emphasise a range of open questions. Specifically, in the first lecture, I will discuss how deep and wide fully connected networks can be viewed as an effective field theory, with the depth-to-width ratio playing the role of a cutoff. This is based on joint work with Dan Roberts and Sho Yaida.
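As a rough pointer to the effective-theory picture (schematic; the precise statements are in the joint work cited above), the leading finite-width effect in such networks is that connected higher-point correlators of preactivations are suppressed by the depth-to-width ratio,

\[
\frac{\big\langle z\,z\,z\,z \big\rangle_{\mathrm{connected}}}{\big\langle z\,z \big\rangle^{2}} \;\sim\; \mathcal{O}\!\left(\frac{L}{N}\right),
\]

so L/N plays the role of an effective coupling controlling departures from the Gaussian, infinite-width limit.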

Talk 3 (Maurice Weiler): Equivariant and Coordinate Independent Convolutional Networks

Equivariance imposes symmetry constraints on the connectivity of neural networks. This talk investigates the case of equivariant networks for fields of feature vectors on Euclidean spaces or other Riemannian manifolds. Equivariance is shown to lead to requirements for 1) spatial (convolutional) weight sharing, and 2) symmetry constraints on the shared weights themselves. We investigate the symmetry constraints imposed on convolution kernels and discuss how they can be solved and implemented. A gauge-theoretic formulation of equivariant CNNs shows that these models are not only equivariant under global transformations, but under more general local gauge transformations as well.
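In one standard formulation (stated here schematically), the constraint on the shared convolution kernel k mapping an input feature field of type ρ_in to an output field of type ρ_out reads

\[
k(g \cdot x) \;=\; \rho_{\mathrm{out}}(g)\; k(x)\; \rho_{\mathrm{in}}(g)^{-1}
\qquad \text{for all } g \in G,
\]

so building equivariant architectures amounts to parameterising the space of kernels satisfying this constraint, which is the "solved and implemented" step referred to above.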

Talk 4 (Ard Louis): Deep learning, generalisation and bias towards simple functions

 

Lecture 5 (Boris Hanin): Deep and wide MLPs: LP/N as effective depth at growing dataset size

In the second lecture, I will continue to look at deep fully-connected networks. But this time, we'll consider learning in the regime where the number of training samples grows with the network width and depth. In this setting, I will describe joint work with Alexander Zlokapa that develops novel diagrammatic approaches for computing Gibbs measures coming from deep shaped MLPs. 

Lecture 6 (Jim Halverson): ML for Field Theory

In this lecture I take the perspective converse to that of my first lecture: that neural networks may be utilized for field theory. I will explain how the mathematical data essential to any neural network construction yields a new way to define a field theory via an associated partition function. I will then explore when and why cherished principles from field theory emerge in this context, including the origin of interactions, conformal symmetry, locality, and unitarity.
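Schematically, the construction referred to here treats the network function as the field: a parameter distribution P(θ) and an architecture φ_θ(x) define correlation functions and a partition function by averaging over parameters,

\[
G^{(n)}(x_1,\dots,x_n) \;=\; \mathbb{E}_{\theta \sim P(\theta)}\big[\phi_\theta(x_1)\cdots\phi_\theta(x_n)\big],
\qquad
Z[J] \;=\; \mathbb{E}_{\theta \sim P(\theta)}\Big[\exp\!\Big(\textstyle\int d^{d}x\; J(x)\,\phi_\theta(x)\Big)\Big],
\]

with questions such as locality, conformal symmetry, and unitarity then posed directly of these parameter-space averages.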

Talk 5 (Gert Aarts): Stochastic gradient descent and random matrix theory

Stochastic gradient descent (SGD) is the workhorse of many machine learning algorithms. Here we apply concepts from Dyson Brownian motion and random matrix theory to describe stochastic weight matrix dynamics. We derive the linear scaling rule between the learning rate and the batch size, and identify universal and non-universal aspects. We test our hypothesis in the (near-)solvable case of the Gaussian Restricted Boltzmann Machine and show empirical results in more involved systems.
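The linear scaling rule mentioned above can be motivated with a standard heuristic (schematic, and not specific to this talk's derivation): writing the minibatch gradient as the full gradient plus noise whose covariance scales as 1/B, the SGD update

\[
\theta_{t+1} \;=\; \theta_t \;-\; \eta\,\nabla L(\theta_t) \;+\; \eta\,\epsilon_t,
\qquad
\mathrm{Cov}(\epsilon_t) \;=\; \frac{\Sigma(\theta_t)}{B},
\]

resembles a discretised Langevin process with effective temperature proportional to η/B, so rescaling the learning rate η in proportion to the batch size B keeps the stochastic dynamics comparable.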

Talk 6 

 

To be uploaded.

 

Dinner venue: Mansfield College, Mansfield Road

Schedule:

  • Drinks: 18:00
  • Dinner: 18:30