Reaction Yields Prediction Task Overview
Definition: The vast majority of small-molecule drugs are synthesized through chemical reactions. Many factors during a reaction can lead to suboptimal conversion of reactants into products, i.e. a low yield. Formally, the yield is the percentage of the reactants successfully converted to the target product. This learning task aims to predict the yield of a single given chemical reaction.
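As a worked example of this definition, percent yield is conventionally computed relative to the theoretical amount of product allowed by the limiting reactant. The sketch below assumes amounts are given in moles and that the stoichiometric ratio (moles of product per mole of limiting reactant) is known; `percent_yield` is a hypothetical helper for illustration, not part of TDC.

```python
def percent_yield(product_mol: float, limiting_reactant_mol: float,
                  product_per_reactant: float = 1.0) -> float:
    """Return the reaction yield in percent.

    Hypothetical helper: the theoretical amount of product is the limiting
    reactant (in moles) multiplied by the stoichiometric ratio.
    """
    theoretical_mol = limiting_reactant_mol * product_per_reactant
    if theoretical_mol <= 0:
        raise ValueError("theoretical product amount must be positive")
    return 100.0 * product_mol / theoretical_mol


# 0.8 mol of product from 1.0 mol of limiting reactant in a 1:1 reaction
print(percent_yield(0.8, 1.0))
```

The yield Y that models are asked to predict is a number on this 0-100 scale.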
Impact: To maximize the synthesis efficiency of products of interest, an accurate prediction of the reaction yield could help chemists plan ahead and switch to alternative reaction routes, thereby avoiding hours of work and wasted materials in wet-lab experiments and reducing the number of attempts.
Generalization: The models are expected to extrapolate to unseen reactions with diverse chemical structures and reaction types.
Product: Small-molecule.
Pipeline: Manufacturing - Synthesis planning.
Buchwald-Hartwig
Dataset Description: Ahneman et al. performed high-throughput experiments on Pd-catalysed Buchwald–Hartwig C–N cross-coupling reactions, measuring the yield of each reaction.
Task Description: Given a reaction X, described by its set of reactants and products, predict its yield Y.
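One common way to encode an input X is a reaction SMILES string: reactant SMILES are joined with '.', and reactants are separated from products by '>>' (optional agents such as catalysts or bases can sit between the two '>' characters). The sketch below assumes X is available as plain lists of SMILES strings; the exact format in which TDC stores reactions may differ, and `to_reaction_smiles` is a hypothetical helper.

```python
from typing import Sequence


def to_reaction_smiles(reactants: Sequence[str], products: Sequence[str],
                       agents: Sequence[str] = ()) -> str:
    """Join SMILES into 'reactants>agents>products' (agents may be empty)."""
    if not reactants or not products:
        raise ValueError("a reaction needs at least one reactant and one product")
    return f"{'.'.join(reactants)}>{'.'.join(agents)}>{'.'.join(products)}"


# Illustrative C-N coupling: bromobenzene + aniline -> diphenylamine
print(to_reaction_smiles(["Brc1ccccc1", "Nc1ccccc1"], ["c1ccc(Nc2ccccc2)cc1"]))
```

With no agents the middle field is empty, so the string collapses to the familiar `reactants>>products` form.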
Dataset Statistics: 55,370 reactions.
Dataset Split: Random Split
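A random split shuffles the reactions and partitions them into training, validation and test sets. The sketch below is plain Python and is not TDC's own implementation; the 70/10/20 fractions and seed 42 are assumptions that, to our knowledge, mirror the defaults of TDC's `get_split`.

```python
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def random_split(items: Sequence[T], frac: Sequence[float] = (0.7, 0.1, 0.2),
                 seed: int = 42) -> Dict[str, List[T]]:
    """Shuffle items reproducibly and cut them into train/valid/test lists."""
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG leaves global state untouched
    n_train = int(frac[0] * len(shuffled))
    n_valid = int(frac[1] * len(shuffled))
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],  # remainder goes to test
    }
```

Because every reaction is equally likely to land in any subset, a random split mostly measures interpolation; it does not by itself guarantee that test reactions are structurally unlike training reactions.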
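from tdc.single_pred import Yields
# Load the Buchwald-Hartwig reaction yields dataset
data = Yields(name='Buchwald-Hartwig')
# Random split (the default) into 'train', 'valid' and 'test' DataFrames
split = data.get_split()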
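Before training a model, it helps to know how well a trivial predictor does. The sketch below scores a constant predictor that always outputs the mean training yield, using mean absolute error (MAE). It is written against plain lists; `mean_baseline_mae` is a hypothetical helper, and taking the labels from a `Y` column of each split DataFrame is an assumption about TDC's column naming rather than something this page states.

```python
from statistics import mean
from typing import Sequence


def mean_baseline_mae(train_y: Sequence[float], test_y: Sequence[float]) -> float:
    """MAE on test_y of a model that always predicts the mean training yield."""
    prediction = mean(train_y)  # the only "parameter" learned from training data
    return mean(abs(y - prediction) for y in test_y)
```

Assuming that column name, a call would look like `mean_baseline_mae(list(split['train']['Y']), list(split['test']['Y']))`; any learned model should beat this number to be useful.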
References:
[1] Sandfort et al. “A structure-based platform for predicting chemical reactivity.” Chem (2020).
Dataset License: CC BY 4.0.
USPTO
Dataset Description: TDC parses reaction yields from the full USPTO (United States Patent and Trademark Office) dataset.
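Patent text tends to report yields as free-form strings, so some normalization is needed before they can serve as regression labels. TDC's actual parsing rules are not described on this page; the sketch below is a hypothetical illustration (the regular expression and the 0-100 filter are assumptions) that extracts the first percentage from strings such as '85%' or '72.5 %'.

```python
import re
from typing import Optional

# Hypothetical pattern: a number, optionally with decimals, followed by '%'.
_PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")


def parse_yield(text: str) -> Optional[float]:
    """Return the first percentage in text, or None if absent or outside 0-100."""
    match = _PERCENT.search(text)
    if match is None:
        return None
    value = float(match.group(1))
    return value if 0.0 <= value <= 100.0 else None
```

Returning None rather than raising lets a parser skip unusable records in a large patent corpus without stopping.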
Task Description: Given a reaction X, described by its set of reactants and products, predict its yield Y.
Dataset Statistics: 853,638 reactions.
Dataset Split: Random Split
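from tdc.single_pred import Yields
# Load the USPTO reaction yields dataset
data = Yields(name='USPTO')
# Random split (the default) into 'train', 'valid' and 'test' DataFrames
split = data.get_split()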
References:
Dataset License: CC0