Reimagining medical AI with the most powerful large multimodal foundational model designed to excel in radiology

Introducing Harrison.rad.1, the most capable radiology foundational model today*

Authors: Dr Suneeta Mall, Head of AI Engineering, Harrison.ai; Dr Jarrel Seah, Director of Clinical AI, Harrison.ai

Advancing what is possible in healthcare through AI

Global healthcare is currently facing several challenges: increasing imaging volumes, a growing number of images that radiologists must review per case, a shortage of medical professionals, and a high psychological burden on existing staff [1]. Foundational models, or multimodal large language models (LLMs), could help mitigate these challenges.

Multimodal LLMs advance the scope and capabilities of artificial intelligence and deep learning. And yet, even as they have made inroads into everyday life since the introduction of OpenAI’s ChatGPT and similar models, their adoption in a critical and highly regulated space like healthcare remains limited.

Current regulatory frameworks for medical devices accommodate only proprietary, purpose-built AI models that are capable of pre-defined tasks within stated intended use cases.

Neither these frameworks nor the AI medical devices approved under them support continuous learning or generalisation to areas in which a model has not been trained. These limitations are not without reason: they are designed to ensure quality and safety, while mitigating and preventing risks for individuals and societies, including potential misuse.

In contrast, multimodal LLMs can be used in ways that transcend their original application, increasing their potential to impact and scale global healthcare.

They are open-domain models, capable of tasks that may not have been specified during training—similar to the way clinicians may draw conclusions based on their knowledge and experience when they encounter new conditions.

Evaluating such models for use in radiology presents new challenges. We need to move to a paradigm where we test them not only on their abilities to recognise individual pathologies but also on their radiology interpretation skills in general.

Harrison.rad.1 – designed to excel in radiology tasks

Harrison.rad.1 is a radiology-specific multimodal LLM by Harrison.ai that has been trained to excel in radiology tasks. It is a dialogue-based model, which accepts interleaved text and visual inputs and generates both structured and unstructured text outputs. Factual correctness and clinical accuracy are the model’s key priorities.
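
To make the input format concrete, the sketch below assembles a single dialogue turn that interleaves clinical context, two image views, and a request for both structured and unstructured output. The message schema shown is an illustrative assumption for this post, not a published Harrison.rad.1 API.

```python
# A minimal sketch of one interleaved dialogue turn. The message schema
# is a hypothetical illustration, not a published Harrison.rad.1 API.
from pathlib import Path

def build_turn(frontal: Path, lateral: Path) -> list[dict]:
    """Interleave clinical context, two views, and an output request."""
    return [
        {"type": "text", "text": "58-year-old with productive cough and fever."},
        {"type": "image", "bytes": frontal.read_bytes(), "label": "PA view"},
        {"type": "image", "bytes": lateral.read_bytes(), "label": "Lateral view"},
        {"type": "text", "text": "List findings as structured JSON, then summarise in prose."},
    ]
```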

Unlike general-purpose generative AI models such as the GPT or Gemini family models, Harrison.rad.1 has been trained on millions of DICOM images from radiology studies and radiology reports across all X-ray modalities and trained to reason over radiology images and text. This extensive training on real-world, diverse and anonymised patient data enables Harrison.rad.1 to excel at radiology tasks.
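
The training pipeline itself is proprietary, but the generic first step of any DICOM-based pipeline resembles the sketch below: reading a radiograph’s pixel data with the open-source pydicom library and applying the stored rescale. This is illustrative only, not Harrison.ai’s actual ingestion code.

```python
# Generic first step of a DICOM ingestion pipeline: read the pixel data
# and apply the modality rescale. Illustrative only; not Harrison.ai's
# actual (proprietary) pipeline.
import numpy as np
import pydicom

def load_xray(path: str) -> np.ndarray:
    ds = pydicom.dcmread(path)
    pixels = ds.pixel_array.astype(np.float32)
    # Apply rescale slope/intercept when present (common in CR/DX images).
    slope = float(getattr(ds, "RescaleSlope", 1.0))
    intercept = float(getattr(ds, "RescaleIntercept", 0.0))
    return pixels * slope + intercept
```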

“Harrison.rad.1 is a significant technological leap towards our end goal of creating dramatically more capacity in radiology. We are making our model available to select collaborators to help accelerate research into validation methods and the responsible integration of this technology into clinical practice.”

Dr Aengus Tran, CEO & Co-Founder, Harrison.ai

Broad capabilities of Harrison.rad.1

Harrison.rad.1 is designed to uphold the stringent standards of safety and accuracy that apply in medicine. To this end, Harrison.rad.1 can be considered a closed-source specialist model, accepting text and image input and returning text output related to radiology – its field of expertise by design.

It is capable of:

  • processing multiple views per study, including X-rays of all body parts and OPG (orthopantomogram)
  • detecting and characterising radiological findings in X-ray images
  • localising and outlining radiological findings in X-ray images (a response sketch follows this list)
  • generating reports based on X-ray images
  • incorporating clinical history and patient context in its response
  • comparing X-ray images longitudinally across different time points
  • engaging in open-ended chat about X-ray images and general radiology knowledge
  • searching content and retrieving documents
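
To make the localisation capability concrete, the sketch below shows one plausible shape for a structured finding with a bounding-box outline. The field names are illustrative assumptions, not Harrison.rad.1’s published output schema.

```python
# One plausible shape for a structured localisation result. The field
# names are illustrative assumptions, not a published output schema.
from dataclasses import dataclass

@dataclass
class Finding:
    label: str                       # e.g. "right lower lobe opacity"
    confidence: float                # model certainty in [0.0, 1.0]
    bbox: tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels

finding = Finding(
    label="right lower lobe opacity",
    confidence=0.91,
    bbox=(612, 840, 958, 1190),
)
print(finding)
```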

In addition, embedded safety measures and rules limit misuse and output errors. To enhance safety, the model will only (see the illustrative scope check below):

  • respond to questions related to radiology and medicine in general
  • accept X-ray images as visual input
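
One way to picture such a guardrail is as a scope check applied before the model answers. The keyword filter below is a deliberately crude toy that illustrates the idea; it is not Harrison.ai’s actual safety implementation.

```python
# Toy scope check illustrating the guardrail concept. This is a crude
# stand-in, not Harrison.ai's actual safety implementation.
RADIOLOGY_TERMS = {"x-ray", "radiograph", "fracture", "opacity", "pneumothorax"}

def in_scope(question: str) -> bool:
    """Crude keyword test for whether a question is radiology-related."""
    q = question.lower()
    return any(term in q for term in RADIOLOGY_TERMS)

def guarded_answer(question: str) -> str:
    if not in_scope(question):
        return "I can only answer questions related to radiology and medicine."
    return "<model response>"  # placeholder for the actual model call

print(guarded_answer("What does this chest X-ray show?"))
```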

Quantitative performance of Harrison.rad.1 compared to other models

Benchmarking against available Visual Question Answering (VQA) benchmarks

Evaluating the performance of multimodal foundational models is challenging, but it is a prerequisite for regulatory clearance and adoption by both medical professionals and patients. Several benchmarks have been introduced to evaluate and compare the performance of multimodal foundational models on medical tasks. One of the most widely used is the VQA-RAD benchmark, a dataset of clinically generated visual questions and answers about radiology images [2].

Filtered for plain radiographs, Harrison.rad.1 achieves 82% accuracy on closed questions, outperforming the other generalist and specialist models available to date (Table 1).
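
Closed questions in VQA-RAD have constrained (typically yes/no) answers, so accuracy reduces to exact matching after light normalisation. The sketch below shows that metric under the assumption that each record already carries a question-type flag and a model prediction; the exact evaluation code will be published at the link below.

```python
# Minimal closed-question accuracy for a VQA-style benchmark. Assumes
# each record carries the ground truth, the model's prediction, and a
# flag marking closed (e.g. yes/no) questions.
def normalise(ans: str) -> str:
    return ans.strip().lower().rstrip(".")

def closed_accuracy(records: list[dict]) -> float:
    closed = [r for r in records if r["question_type"] == "closed"]
    hits = sum(normalise(r["prediction"]) == normalise(r["answer"]) for r in closed)
    return hits / len(closed)

records = [
    {"question_type": "closed", "answer": "Yes", "prediction": "yes."},
    {"question_type": "closed", "answer": "No", "prediction": "yes"},
]
print(closed_accuracy(records))  # 0.5
```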

The code and methodology used to reach this conclusion will be made available at https://harrison-ai.github.io/radbench/.

Table 1. Performance of Harrison.rad.1 versus other models on VQA-RAD, filtered for plain radiographs. (*Specialist medical foundational models.)

Benchmarking against Harrison.ai’s RadBench dataset – our open-source Visual Question Answering (VQA) benchmark

To ensure a comprehensive evaluation of Harrison.rad.1’s clinical capabilities, we have curated a novel Visual Question Answering (VQA) dataset, termed RadBench. Harrison.rad.1 achieves an accuracy of 73% (F1 score 74%) on closed questions, outperforming the other generalist and specialist models available to date (Table 2).
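
For yes/no closed questions, F1 can be computed by treating "yes" as the positive class; the exact variant used for RadBench will be detailed in the released methodology. A minimal sketch with scikit-learn (assumed installed):

```python
# Accuracy and F1 on yes/no closed questions, treating "yes" as the
# positive class. Labels here are illustrative; scikit-learn assumed.
from sklearn.metrics import accuracy_score, f1_score

truth = ["yes", "no", "yes", "yes", "no"]
preds = ["yes", "no", "no", "yes", "yes"]

print("accuracy:", accuracy_score(truth, preds))       # 0.6
print("f1:", f1_score(truth, preds, pos_label="yes"))  # ~0.67
```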

This dataset, along with the evaluation code and methodology, will be made available at https://harrison-ai.github.io/radbench/.

Table 2. Performance of Harrison.rad.1 versus other models on RadBench, filtered for plain radiographs. (*Specialist medical foundational models.)

While the above benchmarking results are encouraging, demonstrating that Harrison.rad.1 outperforms other foundational models currently available, they require further validation through extensive research and clinical trials.


Benchmarking against humans in standard radiology examinations

Human beings are considered the gold standard when it comes to interpreting radiology images. They are also capable of generalising to unforeseen situations and pathologies. As medical specialists, they must undergo rigorous and thorough evaluations, pass demanding examinations, and continue professional development throughout their careers. While it is difficult to replicate the formative and social components of this evaluation, some elements of the qualifying exams for radiologists can be applied to multimodal LLMs.

One such exam is the Fellowship of the Royal College of Radiologists (FRCR) examination. We used a component of this examination, the FRCR 2B Rapids, to evaluate the model’s performance [3]. While the actual examinations are kept confidential to prevent leakage, practice examinations are available online. Our FRCR evaluation dataset comprises 70 unique FRCR practice examination sheets that have never been used in the development of Harrison.rad.1. We have sourced this dataset from a third party to ensure fairness in our evaluation process.

When evaluated on the FRCR 2B Rapids exams, a notoriously difficult test in which radiologists must interpret 30 X-rays in 35 minutes (Figure 3):

  • Harrison.rad.1 achieved an impressive average score of 51.4 out of 60 (85.67%), the highest score reported for any AI to date [5,6].
  • Other foundational models, including OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet and Google’s Gemini 1.5 Pro, scored below 30 (<50%), often at or below the level expected from random guessing.
  • Our model’s average score is higher than the 50.64 (84.4%) averaged by radiologists who reattempted the FRCR exam within one year of passing [6].
  • Harrison.rad.1’s performance is competitive with that of average FRCR participants (Figure 3).

Figure 3. Average multimodal LLM scores on the FRCR 2B Rapids. To pass the test, radiologists need to score 54 or higher.

Note: The FRCR 2B Rapids comprises 30 plain radiographs to be reported within 35 minutes, with each film scored from 0 to 2 for identification and classification, giving a maximum score of 60. The pass mark is 90% (a score of 54).
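
For concreteness, the sketch below tallies a Rapids sheet under the simplified rubric in the note above (up to 2 points per film for identification and classification, pass at 54 of 60). Real marking schemes, particularly for normal films, can differ, so treat this as an approximation.

```python
# Illustrative tally for an FRCR 2B Rapids-style sheet: 30 films, up to
# 2 points each (identification + classification), pass mark 54/60.
# Simplified from the note above; real marking schemes may differ.
from dataclasses import dataclass

@dataclass
class Case:
    identified: bool  # correctly called normal vs abnormal
    classified: bool  # correctly characterised the abnormality

def rapids_score(cases: list[Case]) -> tuple[int, bool]:
    assert len(cases) == 30, "a Rapids sheet has 30 films"
    total = sum(int(c.identified) + int(c.classified) for c in cases)
    return total, total >= 54

sheet = [Case(True, True)] * 26 + [Case(True, False)] * 4
score, passed = rapids_score(sheet)
print(score, passed)  # 56 True
```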

We acknowledge that our FRCR evaluation is constrained to the 2B Rapids component only, focussing mainly on clinical diagnosis. The other parts of the FRCR examination, i.e. Parts 1, 2A and the 2B orals, which also cover medical imaging physics, techniques, and anatomy, could not be considered in our evaluation at this time. These excluded examinations require knowledge beyond plain-film X-ray, extending into techniques such as fluoroscopy, angiography, computed tomography (CT), ultrasound imaging, radionuclide imaging and magnetic resonance imaging (MRI) that Harrison.rad.1 is not yet trained for. These exams are also designed to assess human radiologists and identify common human pitfalls, not to evaluate AI. Overall, the results do indicate the suitability of Harrison.rad.1 for radiology tasks, highlighting its specialised nature; however, further evaluation is needed.


Transforming the future of global healthcare – today

The full scope of applications, benefits, risks, and challenges of multimodal LLMs in radiology and healthcare is being actively researched. In the meantime, several non-clinical and clinical applications have emerged as potential uses of a radiology-specialist large foundational model:

  • Non-medical use cases
  • Patient-care centred use cases
Working towards a new paradigm to advance healthcare – jointly

With Harrison.rad.1, we are introducing a radiology-specific, large multimodal foundational model designed to excel in radiology tasks with a high level of factual correctness and clinical accuracy, as demonstrated in several benchmark tests against other existing models and human radiologists.

Technological milestones throughout history have taught us that what might seem drastically novel at first can truly change our lives for the better, individually and globally. Embracing the challenge of adopting a new technological paradigm and integrating it into daily life cannot be achieved by a single person, organisation, or entity. It will require a joint effort between the medical community and other technology and policy stakeholders, especially the public, to determine the appropriate use of large, multimodal foundational models in healthcare and responsibly transform healthcare together.

We are giving select researchers and industry professionals access to Harrison.rad.1 as part of an open invitation to join forces in shaping the endless capabilities of generative AI for healthcare, helping improve patient care and global health in a safe and responsible manner.

Want to get early access to Harrison.rad.1?

Places are limited but we’ll put you on the waitlist for potential early access to the model.

* As of 20th July 2024

References:

  1. Global Radiology: 6 key challenges & how AI can help [Internet]. Harrison.ai; 2023 [cited 2024 Aug 1]. Available from: https://harrison.ai/news/global-radiology-6-key-challenges-how-ai-can-help/
  2. Lau JJ, Gayen S, Ben Abacha A, Demner-Fushman D. A dataset of clinically generated visual questions and answers about radiology images. Sci Data. 2018;5:180251. Available from: https://doi.org/10.1038/sdata.2018.251
  3. FRCR Part 2B (Radiology) – CR2B | The Royal College of Radiologists [Internet]. Rcr.ac.uk. 2024 [cited 2024 Aug 7]. Available from: https://www.rcr.ac.uk/exams-training/rcr-exams/clinical-radiology-exams/frcr-part-2b-radiology-cr2b/
  4. Radiopaedia.org Rapids | Radiopaedia.org [Internet]. Radiopaedia. [cited 2024 Aug 1]. Available from: https://radiopaedia.org/courses/Rapids
  5. Shelmerdine SC, Martin H, Shirodkar K, Shamshuddin S, Weir-McCall JR. Can artificial intelligence pass the Fellowship of the Royal College of Radiologists examination? Multi-reader diagnostic accuracy study. BMJ [Internet]. 2022 Dec 21;379:e072826. Available from: https://www.bmj.com/content/379/bmj-2022-072826
  6. Hawtin KE, Williams HRT, McKnight L, Booth TC. Performance in the FRCR (UK) Part 2B examination: analysis of factors associated with success. Clinical Radiology [Internet]. 2014;69(7). Available from: https://www.sciencedirect.com/science/article/abs/pii/S000992601400107X
Credits:

We gratefully acknowledge the open-source contributions from PyTorch, HuggingFace, MetaAI, and MistralAI. Their projects have been instrumental in the development of this work.

