The computational prediction of atomistic structure is a long-standing problem in physics, chemistry, materials, and biology. Conventionally, force-fields or ab initio methods determine structure through energy minimization, which is either approximate or computationally demanding. This accuracy/cost trade-off prohibits the generation of synthetic big data sets accounting for chemical space with atomistic detail. Exploiting implicit correlations among relaxed structures in training data sets, our machine learning model Graph-To-Structure (G2S) generalizes across compound space in order to infer interatomic distances for out-of-sample compounds, effectively enabling the direct reconstruction of coordinates, and thereby bypassing the conventional energy optimization task. The numerical evidence collected includes 3D coordinate predictions for organic molecules, transition states, and crystalline solids. G2S improves systematically with training set size, reaching mean absolute interatomic distance prediction errors of less than 0.2 Å for fewer than eight thousand training structures, on par with or better than conventional structure generators. Applicability tests of G2S include successful predictions for systems which typically require manual intervention, improved initial guesses for subsequent conventional ab initio based relaxation, and input generation for subsequent use of structure-based quantum machine learning models.
The prediction of three-dimensional (3D) structures from a molecular graph is a universal challenge relevant to many branches of the natural sciences. Elemental information and 3D coordinates of all atoms define a system's electronic Hamiltonian, and thereby all related observables, which can be estimated as expectation values of approximate solutions to the electronic Schrödinger equation. Energy and force estimates are frequently used to relax the atomic positions on the potential energy surface in order to locate structural minima [1, 2]. The many degrees of freedom and various levels of theory for describing potential energy surfaces make structure predictions challenging. The problem is aggravated by the combinatorially large number of possible conformational isomers (cf. Often, only low-energy conformations are desired, e.g., as practically relevant starting configurations for a chemical reaction [4], or as binding poses in computational drug design [5], requiring conformational scans to identify or rank the most promising representative candidate geometries. While feasible for few and small systems, conformational scans of larger subsets of chemical compound space remain computationally prohibitive.
State-of-the-art approaches for generating 3D molecular structures, e.g., ETKDG [6] and Gen3D [7], are very efficient yet carry significant bias, since they are based on mathematically rigid functional forms, empirical parameters, and knowledge-based heuristic rules, and do not directly improve as training data set sizes increase. While applicable to known and well-behaved regions of chemical compound space, these methods lack generality and are inherently limited when it comes to more challenging systems, such as carbene molecules or transition states (TS). Recent generative machine learning developments might hold promise since they can produce structural candidates to solve inverse molecular design problems [8, 9, 10, 11, 12]. Unfortunately, however, they have not yet been used to tackle the 3D structure prediction problem, to the best of our knowledge.
To address the 3D structure prediction problem with modern supervised learning, we introduce the Graph To Structure (G2S) model. While any other regressor, such as deep neural networks and the like, might work just as well, we rely for simplicity on kernel ridge regression (KRR) for G2S in order to predict all elements in the pairwise distance matrix of a single atomic configuration of an out-of-sample molecule or solid. From the pairwise distance matrix, atomic coordinates can easily be recreated. As query input, G2S requires only bond-network and stoichiometry-based information (see Fig. By exploiting correlations among data sets free of conformational isomers (restriction to constitutional and compositional isomers only is necessary to avoid ambiguity), G2S learns the direct mapping from the chemical graph to the structural minimum that had been recorded in the training data set (which is assumed to have been generated in a consistent way), thereby bypassing the computationally demanding process of energy-based conformational search and relaxation. We have evaluated G2S on QM structures of thousands of constitutional isomers, singlet state carbenes, E2/SN2 transition states (TS), and elpasolite crystals.
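The two steps described above, regressing the elements of the pairwise distance matrix with KRR and recreating coordinates from that matrix, can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the hyperparameters `sigma` and `lam`, the generic feature vectors standing in for a graph-based representation, and the use of classical multidimensional scaling for the reconstruction are all assumptions made for illustration, and every function name is hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def krr_fit(X_train, Y_train, sigma=1.0, lam=1e-8):
    """Solve (K + lam*I) alpha = Y for the regression weights.

    Each column of Y_train is assumed to hold one element of the
    pairwise distance matrix, so all elements share one kernel matrix.
    """
    K = gaussian_kernel(X_train, X_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X_train)), Y_train)


def krr_predict(X_query, X_train, alpha, sigma=1.0):
    """Predict the distance-matrix elements for the query representations."""
    return gaussian_kernel(X_query, X_train, sigma) @ alpha


def vector_to_distance_matrix(d, n):
    """Unpack the n*(n-1)/2 upper-triangle distances into a symmetric matrix."""
    D = np.zeros((n, n))
    D[np.triu_indices(n, k=1)] = d
    return D + D.T


def coords_from_distances(D, dim=3):
    """Classical multidimensional scaling (an assumed reconstruction method).

    Returns coordinates that reproduce D up to rotation, translation,
    and reflection, provided D is (close to) a Euclidean distance matrix.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    G = -0.5 * J @ (D**2) @ J                # Gram matrix of centered coordinates
    w, V = np.linalg.eigh(G)                 # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:dim]          # indices of the dim largest eigenvalues
    return V[:, top] * np.sqrt(np.clip(w[top], 0.0, None))


def pairwise_distances(X):
    """Full Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=-1)
```

In use, one would fit `krr_fit` on representations and flattened distance matrices of the training structures, call `krr_predict` for a query compound, unpack the prediction with `vector_to_distance_matrix`, and pass it to `coords_from_distances`. Because a predicted matrix is generally only approximately Euclidean, negative eigenvalues are clipped to zero, and the recovered geometry carries no information about absolute orientation or handedness.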