Adaptive Pruning for Increased Robustness and Reduced Computational Overhead in Gaussian Process Accelerated Saddle Point Searches

Rohit Goswami, Hannes Jónsson

2025-10-08


Summary

This paper focuses on making a technique called Gaussian process (GP) regression more efficient and reliable for finding "saddle points" on the energy surfaces of chemical systems. Saddle points are important for understanding how reactions happen, but locating them can take a great deal of computer time.

What's the problem?

GP regression speeds up the search for these saddle points by building a statistical model of the energy landscape, so that fewer expensive calculations of the energy and atomic forces are needed. It has two main drawbacks, however. First, tuning the GP model's hyperparameters can be computationally expensive, and the cost of updating the model grows rapidly as more data is added. Second, if the search ventures too far from the regions the GP model has already learned, it can fail to find the correct saddle point.
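Both drawbacks can be seen in a minimal 1-D GP regression sketch in Python with NumPy. This is a toy, not the authors' model: it fits energies only (the paper's setting also uses the derivatives of the energy with respect to atomic coordinates), it uses a squared-exponential kernel with fixed, hand-picked hyperparameters, and the names `se_kernel` and `gp_predict` are illustrative. Every fit needs a Cholesky factorization whose cost grows cubically with the number of observations. Far from the data, the prediction falls back to the prior, which is where the model stops being trustworthy.

```python
import numpy as np


def se_kernel(a, b, signal_var=1.0, length=0.5):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (diff / length) ** 2)


def gp_predict(x_train, y_train, x_test, noise=1e-6, **hyper):
    """Posterior mean and variance of a zero-mean GP fitted to energies only."""
    K = se_kernel(x_train, x_train, **hyper) + noise * np.eye(len(x_train))
    # The Cholesky factorization costs O(N^3) for N observations and must be
    # redone for every trial set of hyperparameters during optimization.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = se_kernel(x_train, x_test, **hyper)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(se_kernel(x_test, x_test, **hyper)) - np.sum(v**2, axis=0)
    return mean, var


# Toy 1-D "energy" E(x) = sin(3x), sampled only on [0, 1].
x_obs = np.linspace(0.0, 1.0, 6)
e_obs = np.sin(3 * x_obs)
x_query = np.array([0.5, 5.0])  # inside vs. far outside the sampled region
mean, var = gp_predict(x_obs, e_obs, x_query)
print(mean, var)
```

Near the data (x = 0.5) the predicted variance is close to zero. Far away (x = 5.0) the mean reverts to the prior mean of zero and the variance to the prior signal variance of 1, so the surrogate carries no information there.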

What's the solution?

The researchers addressed both problems. To keep the model fast, they used a geometry-aware "optimal transport" distance (a sum of Wasserstein-1 distances computed separately for each atom type) in farthest-point sampling to prune the training data. This keeps only a fixed number of geometrically diverse configurations, so GP updates do not become steadily slower as more data is added. To make the search more robust, they used a distance metric that does not depend on the order in which atoms are listed, which provides a reliable trust radius for stopping the search early. They also added a logarithmic barrier penalty that limits the growth of the model's signal variance.
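The pruning step can be sketched as greedy farthest-point sampling under a per-atom-type Wasserstein-1 distance. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes every configuration contains the same atoms with the same type labels. Each atom type is treated as a uniform point cloud, so the optimal transport plan reduces to a one-to-one matching, solved here with SciPy's `linear_sum_assignment`. No rotational or translational alignment is applied. The helper names `w1_distance` and `farthest_point_subset` and the choice of starting configuration are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def w1_distance(pos_a, pos_b, types):
    """Sum over atom types of the Wasserstein-1 distance between the atoms of
    that type in two configurations (arrays of shape (n_atoms, 3)).

    Assumes both configurations list the same atoms with the same `types`
    labels. With uniform weights on equally sized point sets the optimal
    transport plan is a one-to-one matching, found by the Hungarian algorithm.
    No rotational or translational alignment is applied.
    """
    total = 0.0
    for t in np.unique(types):
        mask = types == t
        cost = cdist(pos_a[mask], pos_b[mask])  # distances between same-type atoms
        rows, cols = linear_sum_assignment(cost)  # cheapest one-to-one matching
        total += cost[rows, cols].mean()  # W1 with weight 1/n_t per atom
    return total


def farthest_point_subset(configs, types, k, seed=0):
    """Greedy farthest-point sampling: indices of up to k configurations that
    are mutually far apart under w1_distance, starting from index `seed`."""
    selected = [seed]
    # Distance from each configuration to its nearest already-selected one.
    min_dist = np.array([w1_distance(c, configs[seed], types) for c in configs])
    min_dist[seed] = -np.inf  # never pick the same configuration twice
    while len(selected) < min(k, len(configs)):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        d_new = np.array([w1_distance(c, configs[nxt], types) for c in configs])
        min_dist = np.minimum(min_dist, d_new)
        min_dist[nxt] = -np.inf
    return selected


rng = np.random.default_rng(0)
types = np.array(["C", "H", "H", "H", "H"])
base = rng.normal(size=(5, 3))
# Twenty perturbed copies of one toy molecule, with increasing displacement.
configs = [base + rng.normal(scale=0.05 * (i + 1), size=base.shape) for i in range(20)]
keep = farthest_point_subset(configs, types, k=5)
print(keep)
```

The matching is recomputed for each pair, so swapping two atoms of the same type leaves the distance at zero. Each selection step computes one distance per stored configuration, so the subset size `k` bounds the training set, and therefore the cost of every GP update, regardless of how many configurations have been evaluated.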

Why does it matter?

These improvements make GP regression a much more reliable and scalable tool for studying chemical reactions and other systems where finding saddle points is crucial. The changes cut the mean computational time to less than half on a set of 238 challenging configurations from a previously published data set of chemical reactions. This shows that GP regression is now practical for more demanding problems in which each calculation of the energy and forces is very expensive.

Abstract

Gaussian process (GP) regression provides a strategy for accelerating saddle point searches on high-dimensional energy surfaces by reducing the number of times the energy and its derivatives with respect to atomic coordinates need to be evaluated. The computational overhead in the hyperparameter optimization can, however, be large and make the approach inefficient. Failures can also occur if the search ventures too far into regions that are not represented well enough by the GP model. Here, these challenges are resolved by using geometry-aware optimal transport measures and an active pruning strategy that uses a summation over Wasserstein-1 distances for each atom type in farthest-point sampling. This strategy selects a fixed-size subset of geometrically diverse configurations to avoid the rapidly increasing cost of GP updates as more observations are made. Stability is enhanced by a permutation-invariant metric that provides a reliable trust radius for early stopping and by a logarithmic barrier penalty for the growth of the signal variance. These physically motivated algorithmic changes prove their efficacy by reducing the mean computational time to less than half on a set of 238 challenging configurations from a previously published data set of chemical reactions. With these improvements, the GP approach is established as a robust and scalable algorithm for accelerating saddle point searches when the evaluation of the energy and atomic forces requires significant computational effort.
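The two stability measures in the abstract can be illustrated with a small Python sketch, but the abstract does not give their exact form, so everything here is an assumption. The barrier is taken to be `-mu * log(max_var - signal_var)` added to the hyperparameter objective, where `max_var` and `mu` are hypothetical settings. The trust-radius test accepts any distance callable, where the paper would use its permutation-invariant metric. The names `log_barrier`, `penalized_objective`, `within_trust_radius`, and the toy objective `runaway` are illustrative only.

```python
import numpy as np


def log_barrier(signal_var, max_var, mu=1.0):
    """Hypothetical log-barrier penalty -mu * log(max_var - signal_var): it
    changes slowly far below max_var and rises to +inf as signal_var nears it."""
    if signal_var >= max_var:
        return np.inf
    return -mu * np.log(max_var - signal_var)


def penalized_objective(signal_var, base_objective, max_var, mu=1.0):
    """Hyperparameter objective (e.g. a negative log marginal likelihood)
    plus the barrier on the signal variance."""
    return base_objective(signal_var) + log_barrier(signal_var, max_var, mu)


def within_trust_radius(candidate, training_configs, distance, radius):
    """Early-stopping test: True while `candidate` lies within `radius` of its
    nearest training configuration under the supplied `distance` metric."""
    return min(distance(candidate, c) for c in training_configs) <= radius


def runaway(signal_var):
    """Toy objective that keeps decreasing as the signal variance grows."""
    return -np.log(signal_var)


grid = np.linspace(0.01, 9.99, 999)
values = [penalized_objective(s, runaway, max_var=10.0, mu=1.0) for s in grid]
best = grid[int(np.argmin(values))]
# Setting the derivative -1/s + mu/(max_var - s) to zero gives
# s* = max_var / (1 + mu), i.e. 5.0 for these settings.
print(best)
```

Without the barrier, the toy objective would push the signal variance upward indefinitely. With it, the minimum sits at a finite value set by `max_var` and `mu`. In a search loop, a `False` result from `within_trust_radius` would trigger the early stop.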
