Nutmeg and SPICE: Models and data for biomolecular machine learning

Peter Eastman, Benjamin P. Pritchard, John D. Chodera, Thomas E. Markland
Journal of Chemical Theory and Computation 20:8583, 2024 [DOI] [preprint]

We present a significant expansion of the SPICE dataset, a large-scale quantum chemical dataset for training machine learning potentials, and show how it can be used to build extremely accurate potentials.

SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials

Eastman P, Behara PK, Dotson DL, Galvelis R, Herr JE, Horton JT, Mao Y, Chodera JD, Pritchard BP, Wang Y, De Fabritiis G, and Markland TE
Scientific Data 10:11, 2023 [DOI]

To remedy the lack of large, open quantum chemical datasets for training accurate general machine learning potentials and molecular mechanics force fields for drug-like small molecules and biomolecules, we produce the open SPICE dataset and show how it can be used to build extremely accurate machine learning potentials.