Evidence Rating: How Supplement AI Scores Paper Quality

Supplement AI's Evidence Rating is a paper-level research quality signal.

In short: it helps estimate how much trust to place in a paper's reported outcomes and conclusions.

Current coverage

As of June 14, 2026, the Supplement AI research database contains 255,602 supplement-related papers. Of those, 52,773 open-access papers have evaluated Evidence Ratings.

The database is updated in batches. Once papers are ingested, open-access papers move through Evidence Rating and deeper evidence evaluation on a rolling basis, so these counts will change over time and may lag the newest literature.

We prioritize rating the study types that most directly affect supplement evidence analysis: human intervention studies and systematic reviews or meta-analyses.

Why paper quality matters

Research papers are not equally reliable or equally comparable.

A paper can study the right supplement and still be less reliable if the design or reporting is weak. Evidence frameworks such as GRADE and Cochrane explicitly treat issues like risk of bias, imprecision, indirectness, inconsistency, and publication bias as reasons to reduce confidence in a body of evidence [1].

Evidence Rating is a comparative quality signal for the papers Supplement AI has evaluated.

What the rating evaluates

Evidence Rating starts by classifying the paper type, because study design is part of the evidence.

A randomized human trial, an observational study, a systematic review, an animal study, a case report, a protocol, and a narrative review should not all be judged by the same checklist. Each study type has different strengths, limitations, and failure modes.

Supplement AI classifies papers into the following categories:

Interventional human studies
Systematic reviews and meta-analyses
Observational human studies
Narrative reviews
Case reports or case series
Protocols
Preclinical animal or in vitro studies
Other or unclassified publications

The rating then applies study-type-specific criteria inspired by established evidence-review and reporting frameworks, including Cochrane, GRADE, NIH/NHLBI study quality tools, PRISMA, ARRIVE, and CARE [1-6].
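To make the classification step concrete, here is a minimal sketch in Python. The eight categories come from the list above. The pairing of each category with a framework is our illustrative assumption based on what each framework was designed for, not a description of Supplement AI's internal mapping, and the names `StudyType`, `FRAMEWORKS`, and `frameworks_for` are hypothetical.

```python
from enum import Enum


class StudyType(Enum):
    """The eight paper categories listed above."""
    INTERVENTIONAL_HUMAN = "Interventional human studies"
    SYSTEMATIC_REVIEW = "Systematic reviews and meta-analyses"
    OBSERVATIONAL_HUMAN = "Observational human studies"
    NARRATIVE_REVIEW = "Narrative reviews"
    CASE_REPORT = "Case reports or case series"
    PROTOCOL = "Protocols"
    PRECLINICAL = "Preclinical animal or in vitro studies"
    OTHER = "Other or unclassified publications"


# Assumed pairing of study type to guiding frameworks (illustrative only).
# Categories without an entry fall back to an empty list.
FRAMEWORKS = {
    StudyType.INTERVENTIONAL_HUMAN: ["Cochrane RoB 2", "GRADE"],
    StudyType.SYSTEMATIC_REVIEW: ["PRISMA 2020", "GRADE"],
    StudyType.OBSERVATIONAL_HUMAN: ["NIH/NHLBI study quality tools"],
    StudyType.CASE_REPORT: ["CARE"],
    StudyType.PRECLINICAL: ["ARRIVE 2.0"],  # ARRIVE covers animal research
}


def frameworks_for(study_type: StudyType) -> list[str]:
    """Return the frameworks assumed to inform a study type's checklist."""
    return FRAMEWORKS.get(study_type, [])


print(frameworks_for(StudyType.CASE_REPORT))
```

Keeping the category as an explicit type means a checklist cannot be applied without first deciding what kind of paper is being rated.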

Methodology and transparency criteria

The largest part of Evidence Rating comes from paper methodology and reporting transparency.

The exact criteria depend on study type, but we ask questions like:

Was the study design appropriate for the type of evidence being reported?
Were the supplement, dose, duration, and comparison group described clearly enough to interpret the result?
Were the outcomes meaningful, validated, and relevant to the supplement's claimed effect?
Were randomization, blinding, dropout handling, and adverse events addressed when applicable?
For reviews, were the search strategy, inclusion criteria, study-quality assessment, and synthesis methods transparent?
For observational or preclinical studies, were exposure measurement, confounding, model relevance, and replication details handled clearly?
Were funding sources, conflicts of interest, and potential sources of bias disclosed?
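Questions like these can be thought of as a checklist in which some items apply only to certain study types. The sketch below shows one simple way to turn such a checklist into a number: the share of applicable criteria that are met. It is an illustration under stated assumptions, not Supplement AI's scoring code. The criterion names, the equal weighting, and the function `checklist_score` are all hypothetical.

```python
from typing import Optional


def checklist_score(answers: dict[str, Optional[bool]]) -> float:
    """Share of applicable criteria met, from 0.0 to 1.0.

    True = criterion met, False = not met or not reported,
    None = not applicable to this study type (skipped).
    """
    applicable = [met for met in answers.values() if met is not None]
    if not applicable:
        raise ValueError("no applicable criteria")
    return sum(applicable) / len(applicable)


# Hypothetical answers for a randomized human trial.
trial = {
    "design_appropriate": True,
    "dose_and_duration_described": True,
    "validated_outcomes": True,
    "randomization_reported": True,
    "blinding_reported": False,
    "search_strategy_transparent": None,  # only applies to reviews
    "funding_and_conflicts_disclosed": True,
}
print(f"{checklist_score(trial):.2f}")  # 5 of 6 applicable criteria -> 0.83
```

Skipping non-applicable items keeps a trial from being penalized for lacking a search strategy, and a review from being penalized for lacking randomization.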

Risk of bias and conflicts of interest

Evidence Rating also incorporates more specific bias signals.

The methodology criteria ask whether funding and conflicts of interest are disclosed, but we also use separate systems for deeper assessment of risk of bias and conflicts of interest.

Risk of bias is especially important for randomized trials, where design choices such as random sequence generation, allocation concealment, blinding, missing outcome data, and outcome assessment can change how much trust to place in the result [2].

Conflict of interest does not automatically invalidate a study. Industry funding can support useful research. But financial ties, sponsor involvement, and undisclosed conflicts can affect how results are designed, interpreted, or presented, so they are treated as relevant evidence-quality signals rather than ignored.

Journal and citation modifiers

Evidence Rating also uses journal quality and citation influence as light modifiers.

Journal quartile and citation influence provide useful context, but neither is decisive on its own. A paper in a strong journal is not automatically reliable, a highly cited paper is not automatically methodologically strong, and a less-cited paper is not automatically weak. Citation influence is evaluated relative to other supplement-related papers in the Supplement AI database.

This follows the broader principle that research metrics should support evaluation, not replace assessment of the paper itself [7].
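One common way to express a citation count relative to a reference set is a percentile rank. The sketch below uses a mid-rank percentile so that tied counts share the same position. Treat it as an assumption about what "relative to other papers" could look like in practice: the article does not specify the statistic Supplement AI uses, and the function name `citation_percentile` and the sample counts are hypothetical.

```python
from bisect import bisect_left, bisect_right


def citation_percentile(citations: int, database_counts: list[int]) -> float:
    """Mid-rank percentile (0-100) of a citation count within a reference set."""
    if not database_counts:
        raise ValueError("reference set is empty")
    ordered = sorted(database_counts)
    below = bisect_left(ordered, citations)           # counts strictly lower
    ties = bisect_right(ordered, citations) - below   # counts exactly equal
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical citation counts for eight papers in the reference set.
counts = [0, 2, 2, 5, 10, 40, 120, 300]
print(citation_percentile(40, counts))  # 5 below, 1 tie, 8 total -> 68.75
```

A relative measure like this compares a paper with its peers in the same literature rather than against raw counts, which vary widely between fields.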

The 100-point scale

Evidence Ratings are reported on a 100-point scale.

A high rating does not prove the outcomes of a study are true. It means the paper itself appears more reliable as a source of evidence. A low rating does not prove the paper is false. It means the study has limitations, missing information, weaker design, higher bias concern, or lower transparency that should reduce confidence.
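To picture how the components described in this article could fit onto a 100-point scale, consider the sketch below. The weights (70, 20, 5, 5) are invented for illustration only. They reflect the stated ordering, with methodology and transparency as the largest part and journal and citation signals as light modifiers, but they are not Supplement AI's actual weights, and `evidence_rating_sketch` is a hypothetical function.

```python
def evidence_rating_sketch(
    methodology: float,  # 0-1 share of methodology/transparency criteria met
    bias_signal: float,  # 0-1, where 1 means lowest bias and conflict concern
    journal: float,      # 0-1 journal-quality signal
    citation: float,     # 0-1 relative citation influence
) -> int:
    """Combine four signals into a 0-100 score using made-up weights."""
    inputs = {
        "methodology": methodology,
        "bias_signal": bias_signal,
        "journal": journal,
        "citation": citation,
    }
    for name, value in inputs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # Hypothetical weights that sum to 100; not the real rating formula.
    weights = {"methodology": 70, "bias_signal": 20, "journal": 5, "citation": 5}
    total = sum(weights[name] * inputs[name] for name in weights)
    return round(total)


# Strong methods, little journal or citation support.
print(evidence_rating_sketch(0.9, 0.5, 0.0, 0.0))  # 63 + 10 -> 73
# Weak methods, top journal and heavily cited.
print(evidence_rating_sketch(0.4, 0.5, 1.0, 1.0))  # 28 + 10 + 5 + 5 -> 48
```

With weights of this shape, a prestigious venue or a large citation count cannot lift a weakly designed paper above a well-conducted one, which is the behavior the light-modifier approach is meant to produce.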

How to use Evidence Rating

Use Evidence Rating as a research-quality signal, not as the final answer.

A high-rated paper deserves more confidence than a weak or poorly reported paper, but it still has to be interpreted in context: what supplement was studied, what outcome was measured, what dose was used, who was studied, what it was compared against, and whether other studies agree.

References

[1] Schünemann, H. J., Higgins, J. P. T., Vist, G. E., Glasziou, P., Akl, E. A., Skoetz, N., & Guyatt, G. H. (2024). Chapter 14: Completing "Summary of findings" tables and grading the certainty of the evidence. In J. P. T. Higgins, J. Thomas, J. Chandler, M. Cumpston, T. Li, M. J. Page, et al. (Eds.), Cochrane Handbook for Systematic Reviews of Interventions (Version 6.5). Cochrane. https://www.cochrane.org/authors/handbooks-and-manuals/handbook/current/chapter-14
[2] Sterne, J. A. C., Savović, J., Page, M. J., Elbers, R. G., Blencowe, N. S., Boutron, I., Cates, C. J., Cheng, H.-Y., Corbett, M. S., Eldridge, S. M., Hernán, M. A., Hopewell, S., Hróbjartsson, A., Junqueira, D. R., Jüni, P., Kirkham, J. J., Lasserson, T., Li, T., McAleenan, A., et al. (2019). RoB 2: A revised tool for assessing risk of bias in randomised trials. BMJ, 366, l4898. https://doi.org/10.1136/bmj.l4898
[3] National Heart, Lung, and Blood Institute. (n.d.). Study Quality Assessment Tools. National Institutes of Health. Retrieved June 14, 2026, from https://www.nhlbi.nih.gov/health-topics/study-quality-assessment-tools
[4] Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., Shamseer, L., Tetzlaff, J. M., Akl, E. A., Brennan, S. E., Chou, R., Glanville, J., Grimshaw, J. M., Hróbjartsson, A., Lalu, M. M., Li, T., Loder, E. W., Mayo-Wilson, E., McDonald, S., et al. (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ, 372, n71. https://doi.org/10.1136/bmj.n71
[5] Percie du Sert, N., Hurst, V., Ahluwalia, A., Alam, S., Avey, M. T., Baker, M., Browne, W. J., Clark, A., Cuthill, I. C., Dirnagl, U., Emerson, M., et al. (2020). The ARRIVE guidelines 2.0: Updated guidelines for reporting animal research. PLOS Biology, 18(7), e3000410. https://doi.org/10.1371/journal.pbio.3000410
[6] Gagnier, J. J., Kienle, G., Altman, D. G., Moher, D., Sox, H., Riley, D., & the CARE Group. (2013). The CARE guidelines: Consensus-based clinical case reporting guideline development. Journal of Medical Case Reports, 7, 223. https://doi.org/10.1186/1752-1947-7-223
[7] American Society for Cell Biology. (2012). San Francisco Declaration on Research Assessment. https://sfdora.org/read/