The ability to learn rapidly from high-dimensional data and make reliable predictions about the future of a given system is crucial in many contexts. Examples include a fly avoiding predators and the retina processing terabytes of data almost instantaneously to guide complex human actions. In this talk we draw parallels between such tasks and the efficient sampling of complex molecules with hundreds of thousands of atoms. Such sampling is critical for predictive computer simulations in condensed matter physics and biophysics, including but not limited to problems such as crystal nucleation, protein loop movement and drug unbinding. To this end we take the Predictive Information Bottleneck (PIB) and long short-term memory (LSTM) frameworks from artificial intelligence (AI) and reformulate them for sampling biomolecular structure and dynamics, especially when the sampling is plagued by rare events. We demonstrate the methods on several test cases, where we calculate dissociation pathways and their timescales, which are much longer than milliseconds. These include ligand dissociation from the protein lysozyme and from flexible RNA. We will also discuss some generic challenges and proposed solutions regarding the reliability, interpretability and extrapolative power of AI when used in molecular simulations.
1. Wang, Y., Ribeiro, J.M.L. & Tiwary, P. Past–future information bottleneck for sampling molecular reaction coordinate simultaneously with thermodynamics and kinetics. Nat Commun 10, 3573 (2019). https://doi.org/10.1038/s41467-019-11405-4
2. Tsai, S.T., Kuo, E.J. & Tiwary, P. Learning molecular dynamics with simple language model built upon long short-term memory neural network. Nat Commun 11, 5115 (2020). https://doi.org/10.1038/s41467-020-18959-8