Meta-science · bioRxiv 2018–2025 · 72,644 pairs

How reliably do preprint claims survive peer review?

We compared the abstract claims of every matchable bioRxiv preprint with its peer-reviewed publication, using a large language model to label each pair at the level of the scientific claim.

Hao Yin · Ruslan Rust

72,644 preprint–publication pairs
89.9% claims unchanged or minor
10.2% primary claims majorly changed
2.0× more cautious than confident

The findings

Interactive versions of every figure in the paper — content change, hedging, fields, claim-type transitions, and the drivers of revision.

Browse the dataset

Search and filter all 72,644 pairs by author, institution, field, and more. Open any one to see the preprint and published claims side by side with the model's reasoning.

In brief

Preprints now disseminate a large share of biomedical research before peer review, and are often regarded as unverified. We compiled every bioRxiv preprint posted between 2018 and 2025 that we could match by DOI to a peer-reviewed version, yielding 72,644 pairs, and used Claude Sonnet 4.6 to parse each abstract pair into one primary and two secondary claims, classifying content change and hedging shift.

Most central claims changed little: 39.9% unchanged, 50.0% minor, 10.2% major. Hedging shifts were uncommon and asymmetric — twice as many claims became more cautious as more confident. Major revisions were more frequent after long peer review and declined over the study period. Papers never posted as preprints were retracted at roughly twice the rate of those that were. The move from preprint to publication leaves the central claims of most biomedical abstracts intact.

Key findings

Claims are stable. Nine in ten primary claims come through peer review unchanged or with only minor changes.
Wording softens. When language shifts, it turns cautious twice as often as confident (8.4% vs 4.2%).
Revision tracks review. Major change rises from 7.0% after fast review to 14.1% after the slowest.
Revision is falling over time. The share of major revisions dropped from 17.0% (2019) to 5.7% (2024).
Preprints aren't riskier. Papers never posted as preprints were retracted roughly twice as often as those that were.
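
The headline figures can be cross-checked against one another with a few lines of arithmetic. The sketch below uses only percentages quoted on this page; the variable names are ours, and the small overshoot past 100% is what one would expect from adding percentages that were each rounded to one decimal place.

```python
# Percentages as reported on this page (primary claims, content change).
unchanged, minor, major = 39.9, 50.0, 10.2
# Percentages as reported on this page (hedging shift).
more_cautious, more_confident = 8.4, 4.2

# Share of primary claims that were unchanged or only minorly changed.
stable_share = round(unchanged + minor, 1)

# Sum of the three reported categories; may differ slightly from 100
# because each input was already rounded to one decimal place.
category_total = round(unchanged + minor + major, 1)

# How many times more often wording turned cautious than confident.
caution_ratio = round(more_cautious / more_confident, 1)

print(f"unchanged or minor: {stable_share}%")
print(f"sum of categories:  {category_total}%")
print(f"cautious/confident: {caution_ratio}x")
```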

How the dataset was built

Corpus

Every bioRxiv preprint posted 2018–2025 that could be matched by DOI to its peer-reviewed publication. Only English abstracts of ≥100 characters were included, and only the first preprint version was used. The result is 72,644 pairs across 3,442 journals and 25 fields.
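
As a rough illustration, the inclusion rules above could be expressed as a filter like the one below. This is a minimal sketch, not the pipeline used for the paper: the `PreprintRecord` fields, the `is_eligible` and `build_corpus` helpers, and the choice to strip whitespace before measuring length are all assumptions made for the example, and only a single abstract per record is checked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PreprintRecord:
    # All field names are hypothetical, chosen for illustration only.
    preprint_doi: str
    version: int                  # 1 = first posted version
    abstract: str
    language: str                 # e.g. "en"
    published_doi: Optional[str]  # None when no peer-reviewed match exists


MIN_ABSTRACT_CHARS = 100  # threshold stated in the corpus description


def is_eligible(rec: PreprintRecord) -> bool:
    """Apply the stated inclusion rules to a single record."""
    return (
        rec.published_doi is not None        # matched by DOI
        and rec.version == 1                 # first preprint version only
        and rec.language == "en"             # English abstracts
        and len(rec.abstract.strip()) >= MIN_ABSTRACT_CHARS
    )


def build_corpus(records: list[PreprintRecord]) -> list[PreprintRecord]:
    """Keep eligible records, at most one per preprint DOI."""
    seen: set[str] = set()
    corpus: list[PreprintRecord] = []
    for rec in records:
        if is_eligible(rec) and rec.preprint_doi not in seen:
            seen.add(rec.preprint_doi)
            corpus.append(rec)
    return corpus
```

Keeping the rules in one predicate makes each exclusion easy to count separately, which is useful when reporting how many records each criterion removed.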

Labelling

Claude Sonnet 4.6 (temperature 0, locked v7.1 codebook) parsed each abstract pair into one primary and two secondary claims, then classified content change and hedging shift.
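
One way to picture the output of this step is as a small record per abstract pair. The schema below is a hypothetical sketch, not the v7.1 codebook: the class names, the enum values (in particular `NO_SHIFT`), and the assumption that both labels are attached to each individual claim are ours.

```python
from dataclasses import dataclass
from enum import Enum


class ContentChange(Enum):
    UNCHANGED = "unchanged"
    MINOR = "minor"
    MAJOR = "major"


class HedgingShift(Enum):
    MORE_CAUTIOUS = "more_cautious"
    NO_SHIFT = "no_shift"            # label name assumed for illustration
    MORE_CONFIDENT = "more_confident"


@dataclass(frozen=True)
class ClaimLabel:
    preprint_claim: str              # claim as worded in the preprint abstract
    published_claim: str             # same claim as worded after peer review
    content_change: ContentChange
    hedging_shift: HedgingShift


@dataclass(frozen=True)
class LabelledPair:
    primary: ClaimLabel
    secondary: tuple[ClaimLabel, ...]

    def __post_init__(self) -> None:
        # The labelling description specifies one primary and two secondary claims.
        if len(self.secondary) != 2:
            raise ValueError("expected exactly two secondary claims")
```

Validating the claim count at construction time means a malformed model response fails loudly instead of silently entering the dataset.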

Validation

On 120 stratified pairs, model–expert agreement (κ = 0.63–0.66) was comparable to expert–expert agreement (κ = 0.60); replicate model runs agreed at κ = 0.75.
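
For readers unfamiliar with κ, the sketch below computes unweighted Cohen's kappa for two raters from scratch. It assumes the unweighted two-rater form; the page does not say which variant was used, so a weighted kappa over the ordered labels would give different values. The function name and the convention for the degenerate case are ours.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(a)

    # Observed agreement: fraction of items given identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined (0/0), and returning 1.0 is a convention chosen here.
        return 1.0
    return (observed - expected) / (1.0 - expected)


model = ["unchanged", "unchanged", "minor", "minor"]
expert = ["unchanged", "unchanged", "minor", "unchanged"]
print(round(cohen_kappa(model, expert), 2))
```

In this toy example the raters agree on 3 of 4 items (0.75) against a chance level of 0.5, so κ is 0.5: kappa reports agreement beyond what the label frequencies alone would produce.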