Home to the Chemical Reaction Database

The Chemical Reaction Database (CRD) is a collection of chemical reactions drawn from the scientific and patent literature. It is a work in progress; for now the emphasis is on organic reactions.

The database includes search options for catalysts and ligands. All data are normalized, with calculated ratios for each reaction component.
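How the ratios are calculated is not described here. As a minimal sketch, assuming a ratio means molar equivalents relative to the limiting reactant, the normalization could look like the following. The `Component` class, the `molar_ratios` function and the example amounts are hypothetical and are not taken from the database.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One reaction component (hypothetical structure, for illustration only)."""
    name: str
    role: str    # e.g. "reactant", "catalyst", "ligand", "reagent", "solvent"
    mmol: float  # amount in millimoles


def molar_ratios(components, reference_role="reactant"):
    """Return {name: equivalents} relative to the limiting component.

    Assumption: the limiting component is the one with the smallest
    positive amount among those whose role equals reference_role.
    """
    refs = [c.mmol for c in components if c.role == reference_role and c.mmol > 0]
    if not refs:
        raise ValueError("no reference component with a positive amount")
    limiting = min(refs)
    return {c.name: round(c.mmol / limiting, 4) for c in components}


# Illustrative amounts for a generic cross-coupling (invented numbers).
reaction = [
    Component("aryl bromide", "reactant", 1.0),
    Component("boronic acid", "reactant", 1.5),
    Component("Pd(PPh3)4", "catalyst", 0.05),
    Component("K2CO3", "reagent", 2.0),
]
ratios = molar_ratios(reaction)
```

With ratios stored this way, a catalyst loading of 0.05 equivalents reads directly as 5 mol%, which makes catalyst and ligand searches comparable across records that report different scales.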

The database currently holds over 1.37 million reaction records, over 1.5 million compounds and 396 reaction types (827K reactions have been assigned a type). The virtual stockroom has 1922 common and less common reagents and 90 solvents.

2024: added a dataset covering USPTO 2023 only (137K entries), including reagents and solvents. 10.6084/m9.figshare.22491730.v1

2025: added the full dataset (1.37M entries), including reagents and solvents. m9.figshare.28230053.v1

New datasets

Example organic reaction

The blog

Recently in the blog: The year in review and alkyne protiodesilylation by the numbers

Organic reactions by year

Organic reactions in the database by year

Main datasets

Currently the database contains four main datasets. The first is the USPTO dataset (1976-2016) as compiled by Daniel Lowe, with additional data enhancement. The second is also mined from USPTO (2017 to present), using custom programming with the aid of Oscar4 software or ChatGPT and the Opsin service. The third is derived from the academic literature (anything with a DOI) and progresses at a snail's pace because it is manual labour, with occasional use of Decimer and the Clipboard-To-SMILES Converter. The fourth is the CJHIF dataset (academic literature), which totals 3.2 million records, of which only a fraction has been included thus far. Additional SMILES-to-IUPAC conversion is done with STOUT, reaction images are drawn with SmilesDrawer, and reaction types are calculated with RDKit.
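The tools named above all work on SMILES strings. As a rough sketch, assuming records are exchanged as reaction SMILES in the common `reactants>agents>products` form (the storage format of this database is not stated here), a record can be split into roles with plain Python. The function name `split_reaction_smiles` and the example string are illustrative only.

```python
def split_reaction_smiles(rxn):
    """Split a reaction SMILES into reactants, agents and products.

    Assumptions: the string has the form "reactants>agents>products",
    and any trailing annotation after the first space is ignored.
    Note that splitting on "." also separates the ions of a salt
    (e.g. "[Na+].[Cl-]"), so regrouping them would need extra information.
    """
    core = rxn.strip().split(" ")[0]
    parts = core.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")
    reactants, agents, products = (
        [s for s in part.split(".") if s] for part in parts
    )
    return {"reactants": reactants, "agents": agents, "products": products}


# Fischer esterification: acetic acid + ethanol, sulfuric acid as agent,
# giving ethyl acetate and water.
example = "CC(=O)O.OCC>OS(=O)(=O)O>CC(=O)OCC.O"
roles = split_reaction_smiles(example)
```

A cheminformatics toolkit such as RDKit parses and validates the individual SMILES; this sketch only shows the role separation that makes reagents and solvents addressable per record.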