

Systematic Reviews: Generative AI and systematic reviews

More on AI and systematic review searches

A recent journal article evaluating the effectiveness of generative AI tools in systematic reviews and other forms of evidence synthesis found that 

recall (the percentage of relevant studies found) ranged from 4% to 32%, with an average of 13%. This means GenAI tools missed between 68% and 96% of the relevant studies that human searchers had identified.
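
To make the recall figure concrete: recall is the number of relevant studies a search retrieves divided by the total number of relevant studies that exist for the question. As an illustrative example (the numbers are hypothetical, but use the study's 13% average), if human searchers had identified 100 relevant studies for a review and a GenAI tool retrieved only 13 of them, its recall would be 13 ÷ 100 = 13%, and the remaining 87 studies (87% of the evidence) would be missed.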

Overall, the researchers concluded that AI should not be used in searching for systematic reviews, and that for other components of the reviews,

the current evidence does not support GenAI use in evidence synthesis without human involvement or oversight.

Source: Clark J, Barton B, Albarqouni L, et al. Generative artificial intelligence use in evidence synthesis: A systematic review. Research Synthesis Methods. Published online 2025:1-19. doi:10.1017/rsm.2025.16

Can I use generative AI to write my systematic review search?

Flowchart: 'Is it safe to use ChatGPT for your task?' The flowchart concludes that the answer is 'yes' only if it does not matter whether the output is true, if you have the subject expertise to recognise an inaccurate output, and if you are prepared to take legal responsibility for any consequences arising from an incorrect output.

University of Toronto Libraries have written an extensive, detailed guide to the risks and possibilities of using generative AI tools such as ChatGPT to write a systematic review search strategy. We recommend you read this guide in full, but focus in particular on its final conclusions.

In this section of the guide, University of Toronto Libraries share a journal article in which the authors used ChatGPT to generate a PubMed advanced search strategy (Wang S, Scells H, Zuccon G, Koopman B. Can ChatGPT write a good Boolean query for systematic review literature search? Published online February 2023:1-19. https://doi.org/10.1145/nnnnnnn.nnnnnnn).

Look at the search strategy that ChatGPT generated: 

(((differentiated thyroid cancer[MeSH] OR "differentiated thyroid"[All Fields] OR "thyroid carcinoma"[All Fields] OR "papillary microcarcinoma"[All Fields]) AND (prevalence[All Fields] OR incidence[MeSH] OR "etiology of"[All Fields] OR "risk factors"[All Fields] OR gender[All Fields] OR hormonal[All Fields] OR "nodular goiter"[All Fields] OR "Hashimoto’s thyroiditis"[MeSH] OR malignancy[MeSH] OR "concomitant lesion"[All Fields] OR tumor[All Fields] OR infiltrate[All Fields] OR fibrosis[All Fields] OR "early stages of development"[All Fields] OR frequency[All Fields])) AND (autopsy[MeSH] OR surgical[All Fields] OR material[All Fields] OR series[All Fields] OR specimens[All Fields] OR cases[All Fields]))
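
If you are not familiar with PubMed's advanced search syntax, a brief (simplified) reminder may help you assess this query: it combines three blocks of OR'd terms with AND, so a record must match at least one term from each block to be retrieved. A term tagged [MeSH] is searched against the controlled Medical Subject Headings that indexers assign to each record, while a term tagged [All Fields] is matched anywhere in the record, including fields such as journal names and author affiliations.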

Ask yourself the following questions:

  1. Can you identify the errors in this search?
  2. Can you explain what the search is doing in PubMed?
  3. Would you trust the results of this query to find all the relevant studies for your question (i.e. recall/sensitivity) while limiting the number of irrelevant studies (precision)? (A worked example of these measures follows this list.)
  4. Would you be able to translate this query into additional databases?
  5. Do you trust ChatGPT to do your data collection?
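
As an illustration of the measures in question 3 (again with hypothetical numbers): if 100 relevant studies exist for your question and a search retrieves 2,000 records of which 40 are relevant, its recall (sensitivity) is 40 ÷ 100 = 40%, while its precision is 40 ÷ 2,000 = 2%; the other 1,960 records would still have to be screened by hand. Systematic review searches aim for very high recall while keeping precision workable.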

If your answer is not 'yes' to all five of these questions, we would advise against relying solely on a generative AI tool to create your systematic review search strategy.
