CBE - Solving the Structure Identification Problem with Machine Learning
The structure identification problem consists of determining the chemical structure of a species from indirect pieces of information such as spectra or property measurements. Structure identification is a bottleneck in many research, industrial, and regulatory contexts, and the absence of a general solution has enormous societal costs. For example, consider the importance of identifying impurities in plastics, foodstuffs, or pharmaceuticals. Or consider the mechanistic information that could be gleaned from inexpensively determining degradation products, minor products of chemical reactions, or natural products produced by organisms.
The state of the art for structure identification is still manual expert interpretation of spectra. This project seeks to change that by using machine learning to reason more reliably and comprehensively from commonly available pieces of information. The ideal candidate will be eager to learn, have a deep passion for puzzles, and be excited about applying machine learning in a chemical context. The technical skills learned through participation include training machine learning models (transformer and U-Net architectures to start), working with large datasets, understanding some common physics-based modeling approaches, and using GPU resources. Student(s) will work with senior graduate students and Prof. Savoie and will be expected to work at a level that merits authorship on an eventual publication. A two-semester commitment is required.
The Savoie Research Group develops new methods for predicting, simulating, and designing organic materials. Working with this group will give you broad experience beyond just machine learning.