This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Similarity Metric for Data Optimization and Efficient Training of Reactive Machine Learning Force Fields for Hydrocarbon Radiolysis
2025 · 3 authors · 2 citations
Abstract
Radiolysis is a common approach to sterilize polymers, chemically modify them for upcycling, and accelerate their decomposition for recycling purposes. Reactive molecular dynamics (MD) simulations provide a powerful tool to generate atomic-level trajectories of the reactive processes and quantify radiolytic chemical degradation pathways. For this, machine learning (ML) surrogate models for reactive force fields with quantum mechanical accuracy are now widely used, which require ML training data sets that can provide information on atomic environments for target chemical systems. However, radiolysis chemistry can be highly complex and diverse, which poses significant challenges for generating training data to parametrize ML models. In this regard, we developed a method for optimizing the training data set using a cosine similarity metric to help guide training set selection for radiolysis of polyethylene, a model hydrocarbon polymer, as well as to enhance the transferability of our reactive ML force field (MLFF) to a variety of molecular and polymeric systems. Our approach performs atom-by-atom comparisons between local atomic environments to pinpoint important data points associated with rare and localized events, such as radiolysis damage within structures. We apply this approach to train the Chebyshev Interaction Model for Efficient Simulation (ChIMES) MLFF model, which expresses the atomic interaction potentials in terms of linear combinations of many-body Chebyshev polynomials. We first show that our method can reduce our training set size by ∼70% while improving overall accuracy compared to more standard MD model fitting approaches. We then validate our optimum model against diverse hydrocarbon simulation data, including simple alkanes and systems with unsaturated carbon bonds, over a wide range of thermodynamic conditions. 
Finally, we use our ChIMES model to perform MD simulations of radiolytic damage in large-scale systems, which helps avoid system size effects. Overall, our approach yields an MD force field that retains most of the accuracy of the underlying quantum method while improving computational efficiency by many orders of magnitude. Our efforts will have an impact on future hydrocarbon polymer radiolysis studies, where the chemical details of the polymer-radiation interactions can have a strong effect on the resulting products observed in experiments.
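The abstract does not spell out the descriptors or the selection rule behind the cosine similarity metric, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes each MD frame is already represented by one descriptor vector per atom (the descriptor itself, the function names, and the 0.95 threshold are all hypothetical), and it keeps a frame only if at least one of its atoms sees a local environment unlike anything already in the training set.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between rows of a (n, d) and b (m, d)."""
    # Guard against zero-length descriptors to avoid division by zero.
    a_norm = np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b_norm = np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    return (a / a_norm) @ (b / b_norm).T


def select_frames(frames, threshold=0.95):
    """Greedy novelty filter over frames (illustrative assumption).

    frames: list of (n_atoms, d) arrays of per-atom descriptors.
    A frame is kept when at least one of its atoms has a best cosine
    similarity to all previously kept atoms that falls below `threshold`.
    Returns the indices of the kept frames.
    """
    selected = [0]        # seed the training set with the first frame
    pool = frames[0]      # descriptors of every atom kept so far
    for idx in range(1, len(frames)):
        sims = cosine_similarity_matrix(frames[idx], pool)
        best_match = sims.max(axis=1)  # per atom: closest kept environment
        if (best_match < threshold).any():
            selected.append(idx)
            pool = np.vstack([pool, frames[idx]])
    return selected


# Toy data: two "bulk-like" frames and two frames with one unusual atom.
bulk = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])
damaged = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
frames = [bulk, bulk * 2.0, damaged, damaged]

kept = select_frames(frames)
print(kept)  # [0, 2]
```

In this toy case the rescaled bulk frame and the repeated damaged frame are dropped because every atom in them already has a near-identical counterpart in the pool, while the first frame containing the unusual atom is retained. Comparing atom by atom, rather than averaging over a whole frame, is what lets a single rare environment decide whether a frame is kept.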