The Panda Challenge

How organizing a competition and involving the public improves the development of AI algorithms.


Even though the name of this paper suggests pandas were involved, it was really a histopathology competition to develop the best possible AI algorithm for detecting and grading prostate cancer. PANDA, with a bit of ingenuity, stands for "Prostate cANcer graDe Assessment". The research team from Radboud UMC and Karolinska Institutet identified a few interesting problems they wanted to solve.

The problems

Pathologists grade prostate biopsies according to the Gleason scale. Because grading relies on visual judgement and is done by different pathologists, the diagnoses vary from one pathologist to another, just as in other medical specialities.

Gleason grading is a system used to evaluate the prognosis of men with prostate cancer, based on the microscopic appearance of cancer cells in biopsy samples. It scores the patterns of cells from 1 to 5, with higher numbers indicating more aggressive cancer. The scores for the two most prevalent patterns are added to give a Gleason score.
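The arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not code from the paper. The function names are made up for this example, and the mapping from Gleason score to ISUP grade group is not covered in this summary; it is the commonly used five-tier grouping, added here as an assumption. In modern biopsy practice patterns 1 and 2 are rarely assigned, so the mapping is only meaningful for patterns 3 to 5.

```python
def gleason_score(primary: int, secondary: int) -> int:
    """Add the two most prevalent Gleason patterns (each from 1 to 5)."""
    for pattern in (primary, secondary):
        if not 1 <= pattern <= 5:
            raise ValueError(f"Gleason pattern must be 1-5, got {pattern}")
    return primary + secondary


def isup_grade_group(primary: int, secondary: int) -> int:
    """Map a pair of patterns to an ISUP grade group (1 to 5).

    Assumption: the usual grouping, where 3+4 and 4+3 both give a
    score of 7 but fall into different groups.
    """
    score = gleason_score(primary, secondary)
    if score <= 6:
        return 1
    if score == 7:
        # The primary (most prevalent) pattern decides between group 2 and 3.
        return 2 if primary == 3 else 3
    if score == 8:
        return 4
    return 5  # scores 9 and 10


if __name__ == "__main__":
    for primary, secondary in [(3, 3), (3, 4), (4, 3), (4, 4), (4, 5)]:
        score = gleason_score(primary, secondary)
        group = isup_grade_group(primary, secondary)
        print(f"{primary}+{secondary} = {score}, grade group {group}")
```

Note how 3+4 and 4+3 share the same score but not the same grade group, which is one reason the order of the two patterns matters.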

A possible solution is implementing AI algorithms to improve the accuracy and speed of grading. However, they're often developed and validated by the same researchers, which prevents them from being implemented into clinical practice. See the problem here?

One of the ways to develop them is through competitions, which markedly increase the speed of development, but the resulting algorithms are frequently not independently validated or reproduced.

Internal Validation is a process within a study to assess the accuracy of a predictive model or algorithm using the data on which it was developed. External Validation is the evaluation of a predictive model or algorithm's performance on a completely independent dataset, not used during the model's development.
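To make the difference concrete, here is a toy sketch in Python. Everything in it is hypothetical: the "model" is a single threshold on one made-up feature, and the data are invented for illustration. It has nothing to do with the actual PANDA algorithms; it only shows why a score measured on the development data can look better than the score on an independent dataset.

```python
from typing import Callable, List, Sequence, Tuple

Case = Tuple[float, int]  # (hypothetical feature value, reference label 0 or 1)


def accuracy(predict: Callable[[float], int], cases: Sequence[Case]) -> float:
    """Fraction of cases where the prediction matches the reference label."""
    correct = sum(1 for value, label in cases if predict(value) == label)
    return correct / len(cases)


def fit_threshold(cases: Sequence[Case]) -> float:
    """Pick the cut-off that gives the best accuracy on the given cases."""
    candidates = sorted(value for value, _ in cases)
    return max(
        candidates,
        key=lambda t: accuracy(lambda value: int(value >= t), cases),
    )


# Invented data: one set used for development, one from "another hospital".
development: List[Case] = [(0.1, 0), (0.3, 0), (0.4, 0), (0.6, 1), (0.8, 1), (0.9, 1)]
external: List[Case] = [(0.2, 0), (0.5, 0), (0.55, 1), (0.7, 1), (0.65, 0)]

threshold = fit_threshold(development)


def model(value: float) -> int:
    return int(value >= threshold)


internal_acc = accuracy(model, development)  # internal validation
external_acc = accuracy(model, external)     # external validation

if __name__ == "__main__":
    print(f"threshold={threshold}, internal={internal_acc:.2f}, external={external_acc:.2f}")
```

In this toy case the model is perfect on the data it was developed on and noticeably worse on the independent set, which is exactly the gap external validation is meant to expose.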

The competition

For those reasons, the research team released almost 11,000 images of prostate biopsies to the public and recruited more than 1,000 teams from around the world. It was one of the largest such competitions to date. In total, the algorithms made more than 32 million predictions, which showcases how useful it is to involve the community. Pathologist-level diagnoses were achieved in just 10 days.

Algorithm development and validation

During the development phase, the teams trained their algorithms on the released images and checked their performance against a tuning set held by the researchers. Afterward, the research team chose the best-performing algorithms to enter the reproduction and validation phases.

In essence, they used 3 sets of images with reference standards prepared by experienced (uro)pathologists. And in the case of the US validation set, they even used immunohistochemistry to confirm the diagnosis. Additionally, they also asked some international and US pathologists to diagnose the same images as the AI did and compared their diagnostic accuracy.

Reference standard is the best available method for diagnosing a condition against which new tests or algorithms are compared. It is assumed to be the most accurate method to determine the presence or absence of the disease.
Source: Nature Medicine
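Comparing an algorithm (or a pathologist) against the reference standard comes down to measuring agreement on ordered grades. One common measure for this is quadratically weighted Cohen's kappa, which the PANDA paper also reports. The sketch below is a generic, dependency-free implementation and not the authors' code. The function name, the default of six grades (0 for benign plus grade groups 1 to 5) and the handling of the degenerate case are assumptions made for this example.

```python
from typing import Sequence


def quadratic_weighted_kappa(
    reference: Sequence[int],
    predicted: Sequence[int],
    n_grades: int = 6,  # assumption: grades 0 (benign) to 5
) -> float:
    """Agreement between two gradings; 1.0 is perfect, 0.0 is chance level."""
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(reference)
    # observed[i][j]: cases graded i by the reference and j by the predictor.
    observed = [[0] * n_grades for _ in range(n_grades)]
    for ref, pred in zip(reference, predicted):
        observed[ref][pred] += 1

    ref_totals = [sum(row) for row in observed]
    pred_totals = [sum(observed[i][j] for i in range(n_grades)) for j in range(n_grades)]

    disagreement = 0.0  # weighted disagreement actually observed
    chance = 0.0        # weighted disagreement expected by chance
    for i in range(n_grades):
        for j in range(n_grades):
            # Penalty grows with the squared distance between the grades.
            weight = (i - j) ** 2 / (n_grades - 1) ** 2
            disagreement += weight * observed[i][j]
            chance += weight * ref_totals[i] * pred_totals[j] / n

    if chance == 0:
        # Both graders put every case in the same single grade.
        # Treated here as perfect agreement (a choice made for this sketch).
        return 1.0
    return 1.0 - disagreement / chance


if __name__ == "__main__":
    reference_grades = [0, 1, 2, 3, 4, 5, 2, 3]
    algorithm_grades = [0, 1, 2, 3, 5, 5, 1, 3]
    print(round(quadratic_weighted_kappa(reference_grades, algorithm_grades), 3))
```

Because of the quadratic weights, confusing grade 2 with grade 3 is penalized far less than confusing grade 1 with grade 5, which matches the intuition that small grading differences matter less clinically than large ones.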


The findings were astonishing. These community-developed AI algorithms could identify and classify tumours as well as pathologists. In some cases, their accuracy even exceeded that of the pathologists. They also outperformed algorithms from previous similar papers and held up just as well on unseen data (the external validation sets).


On the other hand, this paper showcases that it's not all about the algorithms. In clinical practice, doctors examine multiple biopsies from the same patient and also take the clinical picture into account, whereas the algorithms in the paper assessed each biopsy on its own.

Also, the training was limited to adenocarcinoma, which means we cannot assess the performance of the algorithms on other types of cancer. Furthermore, the reference standards were set by a single pathologist, and the evaluation was based on data from countries with predominantly white populations.

At the time of writing, the paper has 172 citations, 9 of which are classed as highly influential. Data by Semantic Scholar.
Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge - Nature Medicine
Through a community-driven competition, the PANDA challenge provides a curated diverse dataset and a catalog of models for prostate cancer pathology, and represents a blueprint for evaluating AI algorithms in digital pathology.

This summary is based on research from the article Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge by Bulten, W., Kartasalo, K., Chen, PH.C. et al. in Nature Medicine. The original work is freely available and licensed under the Creative Commons Attribution 4.0 International License. Adaptations and summaries are provided by Medical Notes and are not endorsed by the original authors or publisher.