Accurately predicting chemical reactions is essential for driving innovation
in synthetic chemistry, with broad applications in medicine, manufacturing, and
agriculture. At the same time, reaction prediction is a complex problem that
can be both time-consuming and resource-intensive for chemists to solve. Deep
learning methods offer an appealing solution by enabling high-throughput
reaction prediction. However, many existing models are trained on the US Patent
Office dataset and treat reactions as overall transformations: mapping
reactants directly to products with limited interpretability or mechanistic
insight. To address this, we introduce PMechRP (Polar Mechanistic Reaction
Predictor), a system that trains machine learning models on the PMechDB
dataset, which represents reactions as polar elementary steps that capture
electron flow and mechanistic detail. To further expand model coverage and
improve generalization, we augment PMechDB with a diverse set of
combinatorially generated reactions. We train and compare a range of machine
learning models, including transformer-based, graph-based, and two-step Siamese
architectures. Our best-performing approach is a hybrid model that combines an
ensemble of five Chemformer models with a two-step Siamese framework, leveraging
the accuracy of transformer architectures while using the two-step network's
predictions to filter out “alchemical” products. For evaluation, we use a test
split of the PMechDB dataset and additionally curate a human benchmark dataset
consisting of complete mechanistic pathways extracted from an organic chemistry
textbook. Our hybrid model achieves a top-10 accuracy of 94.9% on the PMechDB
test set and a target recovery rate of 84.9% on the pathway dataset.
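The abstract does not say how the ensemble members' scores are combined or how the two-step network's output becomes a filter. The sketch below is an illustration under stated assumptions, not PMechRP's implementation: each ensemble member is assumed to return a mapping from candidate product string to probability, scores are assumed to be averaged across members, and the two-step Siamese network is abstracted as a hypothetical set of products it considers plausible. It also shows the standard top-k accuracy metric (the fraction of test reactions whose true product appears among the top k ranked candidates). All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def aggregate_ensemble(member_predictions: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Average each candidate product's probability across ensemble members.

    Assumption: simple averaging; a candidate missing from a member's
    predictions contributes 0 for that member.
    """
    totals: Dict[str, float] = defaultdict(float)
    for preds in member_predictions:
        for product, prob in preds.items():
            totals[product] += prob
    n = len(member_predictions)
    return {product: total / n for product, total in totals.items()}


def hybrid_rank(
    member_predictions: Sequence[Dict[str, float]],
    plausible_products: Set[str],
    k: int = 10,
) -> List[str]:
    """Rank ensemble candidates, dropping any product the filter rejects.

    `plausible_products` is a hypothetical stand-in for the two-step
    Siamese network's predictions.
    """
    scores = aggregate_ensemble(member_predictions)
    kept = {p: s for p, s in scores.items() if p in plausible_products}
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(kept, key=lambda p: (-kept[p], p))
    return ranked[:k]


def top_k_accuracy(
    ranked_lists: Sequence[List[str]], targets: Sequence[str], k: int = 10
) -> float:
    """Fraction of reactions whose true product is among the top k candidates."""
    hits = sum(1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k])
    return hits / len(targets)


if __name__ == "__main__":
    # Toy two-member ensemble; "X" plays the role of an "alchemical" product.
    members = [
        {"A": 0.6, "B": 0.3, "X": 0.1},
        {"A": 0.5, "X": 0.4, "B": 0.1},
    ]
    ranked = hybrid_rank(members, plausible_products={"A", "B"})
    print(ranked)
    print(top_k_accuracy([ranked], ["B"], k=1), top_k_accuracy([ranked], ["B"], k=10))
```

Without the filter, the averaged scores would rank the toy candidates A, X, B; the filter removes X, so the plausible product B moves up into second place. Real product strings would be canonical SMILES, which must be canonicalized consistently before the membership and accuracy checks.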
2504.15539v1