Theoryful Machine Learning
in the Chemical Sciences

Prof. Tyler R. Josephson

ATOMS Lab: AI & Theory-Oriented Molecular Science
Chemical, Biochemical & Environmental Engineering, UMBC

1:00-2:00 pm, 5 February 2021
Online via Webex

Modern machine learning (ML) algorithms have achieved remarkable success in “theoryless” problems such as image recognition and natural language processing. When these algorithms are applied in “theoryful” domains like the physical sciences, they frequently benefit from incorporating domain knowledge into the ML architecture, whether by enforcing constraints or symmetries or by interpreting neural networks as physical systems.

The chemical sciences have many “theoryful” ML problems. In this talk, I will discuss three projects in which we leverage background theory when designing and adapting ML algorithms. In the first project, we use classical thermodynamics to derive a method for characterizing mixture properties in molecular simulations, and we show that multiple linear regression (with no bias term) is the formally correct, thermodynamically consistent model for fitting and predicting these properties. We recently developed an alternative proof from statistical thermodynamics that gives the same result, and we provide evidence that nonlinear methods offer no improvement in performance. In the second project, we perform high-throughput molecular simulations of adsorption (in which molecules from a gas or liquid stick to the surface or within the pores of a material) and analyze the results using neural networks. We derive a correspondence between theories of multicomponent adsorption and the self-attention mechanism of the transformer architecture, and we show that this theory-inspired architecture generalizes better than a multilayer perceptron.
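One standard thermodynamic argument for dropping the bias term: an extensive property such as enthalpy is a first-order homogeneous function of the molecule counts, so Euler's theorem gives H = Σ_i N_i h̄_i, where h̄_i = (∂H/∂N_i) at constant T, p, and N_j≠i is the partial molar enthalpy. There is no constant term. The sketch below fits such a model by ordinary least squares with no intercept, treating the h̄_i as constant over the sampled boxes. It is a minimal illustration under that assumption, not necessarily the derivation presented in the talk; the molecule counts and the "true" enthalpy values are made up, and `fit_no_intercept` is a hypothetical helper.

```python
from typing import List


def fit_no_intercept(X: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares with NO intercept: solve (X^T X) b = X^T y."""
    k = len(X[0])
    # Build the normal equations A b = c.
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    c = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= factor * A[col][j]
            c[r] -= factor * c[col]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


# Hypothetical "simulation" data: molecule counts (N_A, N_B) in each box and the
# total enthalpy generated from assumed, composition-independent partial molar
# enthalpies (i.e., an ideal mixture), so the fit should recover them exactly.
H_A_TRUE, H_B_TRUE = -40.0, -25.0  # hypothetical values, kJ/mol
counts = [[80.0, 20.0], [60.0, 40.0], [40.0, 60.0], [20.0, 80.0], [50.0, 70.0]]
enthalpy = [na * H_A_TRUE + nb * H_B_TRUE for na, nb in counts]

h_A, h_B = fit_no_intercept(counts, enthalpy)
print(f"h_A = {h_A:.2f}, h_B = {h_B:.2f}")  # prints h_A = -40.00, h_B = -25.00
```

The fitted coefficients are estimates of the partial molar enthalpies. With real simulation data, the h̄_i vary with composition, so the constant-coefficient assumption is only an approximation over the sampled compositions.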

In the final project, I will share work on symbolic regression, done in collaboration with the Mathematics of AI department at IBM. In symbolic regression, given a data set, a search through some “space of possible equations” identifies accurate, parsimonious equations that humans can easily inspect. We formulate symbolic regression as a mixed-integer nonlinear programming (MINLP) problem and use MINLP solvers to systematically search over multiple functional forms at once, instead of relying on the genetic algorithms used in traditional approaches. I will also discuss future approaches to integrating symbolic regression with chemical theory.
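To make searching a “space of possible equations” concrete, here is a toy sketch. It enumerates every one- and two-term model built from a small library of building blocks, fits each model's coefficients by least squares, and ranks the models by squared error plus a complexity penalty. This is brute-force enumeration, not the MINLP formulation described above. The basis functions, complexity scores, and `penalty` weight are all illustrative assumptions.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical building-block library: name -> (function, complexity score).
# Toy brute-force search, NOT an MINLP formulation.
BASIS: Dict[str, Tuple[Callable[[float], float], int]] = {
    "x": (lambda x: x, 1),
    "x^2": (lambda x: x * x, 2),
    "sqrt(x)": (math.sqrt, 2),
    "log(x)": (math.log, 2),
    "exp(x)": (math.exp, 2),
    "1/x": (lambda x: 1.0 / x, 2),
}


def fit_form(names: Tuple[str, ...], xs: List[float],
             ys: List[float]) -> Optional[Tuple[List[float], float]]:
    """Least-squares coefficients for y = sum_k c_k * f_k(x) (one or two terms).

    Returns (coefficients, sum of squared errors), or None if singular.
    """
    cols = [[BASIS[n][0](x) for x in xs] for n in names]
    if len(cols) == 1:
        f = cols[0]
        coeffs = [sum(a * y for a, y in zip(f, ys)) / sum(a * a for a in f)]
    else:
        f, g = cols
        # 2x2 normal equations, solved with Cramer's rule.
        a11 = sum(a * a for a in f)
        a12 = sum(a * b for a, b in zip(f, g))
        a22 = sum(b * b for b in g)
        r1 = sum(a * y for a, y in zip(f, ys))
        r2 = sum(b * y for b, y in zip(g, ys))
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12 * a11 * a22:
            return None
        coeffs = [(r1 * a22 - a12 * r2) / det, (a11 * r2 - a12 * r1) / det]
    preds = [sum(c * col[i] for c, col in zip(coeffs, cols)) for i in range(len(xs))]
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    return coeffs, sse


def search(xs: List[float], ys: List[float], penalty: float = 0.01):
    """Enumerate all one- and two-term forms; score = SSE + penalty * complexity."""
    forms = [(n,) for n in BASIS] + list(combinations(BASIS, 2))
    best = None
    for names in forms:
        fit = fit_form(names, xs, ys)
        if fit is None:
            continue
        coeffs, sse = fit
        # Complexity: sum of building-block scores, plus 1 per "+" operator.
        complexity = sum(BASIS[n][1] for n in names) + (len(names) - 1)
        score = sse + penalty * complexity
        if best is None or score < best[0]:
            best = (score, names, coeffs)
    return best


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [3.0 * x * x for x in xs]  # synthetic data from y = 3 x^2
    score, names, coeffs = search(xs, ys)
    print(" + ".join(f"{c:.3g}*{n}" for c, n in zip(coeffs, names)))  # prints 3*x^2
```

Two-term forms that include x^2 fit these data equally well, but they carry a larger complexity penalty, so the parsimonious single-term model wins. Exhaustive enumeration grows combinatorially with the size of the library and the allowed expression depth, which is why more systematic search strategies matter for realistic problems.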


Tyler R. Josephson is an Assistant Professor in the Department of Chemical, Biochemical, and Environmental Engineering at the University of Maryland, Baltimore County. He received his B.S. in Chemical Engineering from the University of Minnesota in 2011 and his Ph.D. in Chemical Engineering from the University of Delaware in 2017, after which he was a postdoctoral associate in the University of Minnesota Chemistry Department. Prof. Josephson uses multi-scale modeling and machine learning to study catalysis, solvation, adsorption, and phase equilibria. In his free time, he loves learning new things, thinking about deep topics (such as science and philosophy), and playing the piano.