Therapixel at the DREAM Digital Mammography Challenge
It has been three years since Therapixel won the DREAM Digital Mammography Challenge, the largest international competition ever held on improving breast cancer detection with AI. It brought together 126 participating teams from 44 countries, a dataset of 640k mammograms from 80k women, and one goal: to develop a predictive algorithm that would improve breast cancer detection and reduce the false-positive rate. In this interview, Yaroslav Nikulin, Senior Research Engineer at Therapixel and team leader at the DREAM Digital Mammography Challenge, describes what it was like.
Let’s start with the problem you solved. What was the challenge about?
In this challenge, participants needed to implement a model able to identify signs of breast cancer in high-resolution mammography images. Professional radiologists routinely do this on the huge amounts of data generated by hospitals. It is a difficult visual problem: cancer has many different faces and appears as different anomalies, lesions in otherwise healthy tissue. But the real challenge is distinguishing malignant lesions from benign ones: radiologists have developed their own visual language to describe the subtle texture differences that help them assess the degree of malignancy. Because of the difficulty of the task and the huge workload, errors are fairly common, and AI can help here.
What motivated you to join the competition?
As a young researcher charmed by breakthroughs in AI, I was convinced that there was huge potential for Deep Learning in radiology: so many complex data problems we could finally solve. When I met Olivier Clatz (CEO of Therapixel at that time) and Pierre Fillard (CTO), I was happy to learn that they shared the same vision and were already working on a research project for lung cancer detection. I joined the company, and my first task was to participate in the DREAM Mammography Challenge: Olivier and Pierre were following the main events and decided that participating in the top 2 medical imaging challenges of the year would be a good start for Therapixel. And it was: we ranked 5th in the Kaggle Data Science Bowl (lung cancer) and 1st in the DREAM Challenge.
When you began to work on this Challenge, which parts of the project already existed?
Well, simply speaking, I started from scratch. Of course, there were already some great libraries and tools for rapid prototyping of Deep Neural Networks, as well as examples, but none of the code was ready to use. However, I have to say that Therapixel has been working with medical data since its very beginning, so a significant amount of skill and knowledge, both low-level and high-level, had already been accumulated. Just as important, Therapixel has always had connections with healthcare professionals such as surgeons and radiologists. In particular, Dr. Antoine Iannessi gave me a great kickstart in understanding mammography images.
What was the most difficult aspect of this project?
Building a high-performing machine learning model involves a lot of different components. While implementing a model, you constantly face choices: which architecture should I use? What hyperparameters do I choose for every stage of the model’s pipeline? How do I even formalize some aspects and ideas? And the real problem is that each idea takes time to implement, debug, and test, and then to gather and visualize statistics. Time and computational power are always limited, sometimes severely. So I’d say that, in general, the most difficult part was to prioritize properly and concentrate efforts on the most impactful ideas, following your intuition and general understanding. If you want a precise everyday example, I can say that debugging a DL model is difficult: you have no programming errors and your model iterates, but it learns nothing because of some subtle problem with the data or the gradient flow. This can take quite some time to debug.
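As an illustration of the kind of check that helps when a model iterates but learns nothing, here is a minimal sketch in Python with numpy. It is not the code used in the Challenge: the function names `gradient_report` and `data_report`, the threshold `tiny`, and the toy arrays are all assumptions made for this example. The idea is simply to look at per-layer gradients and at basic data statistics before suspecting anything more exotic.

```python
import numpy as np

def gradient_report(named_grads, tiny=1e-8):
    """Classify each layer's gradient as 'ok', 'vanishing' or 'non-finite'.

    named_grads: mapping of (hypothetical) layer names to gradient arrays.
    tiny: assumed threshold on the gradient norm below which we call it vanishing.
    """
    report = {}
    for name, grad in named_grads.items():
        grad = np.asarray(grad, dtype=float)
        if not np.all(np.isfinite(grad)):
            report[name] = "non-finite"   # NaN or inf somewhere in the gradient
        elif np.linalg.norm(grad) < tiny:
            report[name] = "vanishing"    # no learning signal reaches this layer
        else:
            report[name] = "ok"
    return report

def data_report(inputs, labels):
    """Return a list of basic data problems that can silently stop learning."""
    inputs = np.asarray(inputs, dtype=float)
    labels = np.asarray(labels)
    problems = []
    if not np.all(np.isfinite(inputs)):
        problems.append("inputs contain NaN or inf")
    if inputs.std() == 0:
        problems.append("inputs are constant")
    if np.unique(labels).size < 2:
        problems.append("only one class present in labels")
    return problems

# Toy gradients standing in for what a framework would return after one step.
toy_grads = {
    "conv1": np.zeros((3, 3)),
    "dense": np.array([0.2, -0.1]),
    "out": np.array([np.nan, 1.0]),
}
grad_status = gradient_report(toy_grads)

# Toy batch with two deliberate problems: constant inputs and a single class.
data_problems = data_report(np.ones((4, 2)), [0, 0, 0, 0])
```

Running such checks on one batch after every change to the pipeline is cheap compared with an overnight training run, which is why a sketch like this is usually worth having at hand.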
How long did it take to develop your model?
I joined Therapixel in November, and final submissions to the Challenge were made at the end of April, so approximately 6 months. I usually planned to implement some new ideas during the day and run the model’s training during the night, with more runs or longer runs during the weekend. Time management is crucial when working with Deep Learning.
What were the tools you used?
I used Python as my primary programming language, with some standard scientific libraries such as numpy, scikit, and matplotlib. For the Deep Learning backend I used TensorFlow; I believe I started with a rather early version like 0.8.0. Also, to get more and better-annotated data, I used DDSM, an old dataset saved in a homemade image format. In order to use it I had to update and compile some old C code. It is funny how quite old geek things sometimes turn out to be extremely important for a project.
What results have you achieved?
IBM (one of the Challenge organizers) said in its press release that we improved the state of the art in terms of specificity/sensitivity by 5%. This is significant, since we closed a big part of the performance gap between human professionals and AI models for this problem. Some of the organizers were really amazed that, in the limited setting of this Challenge, the best submission broke the high bar of 90% AUROC (an important metric for this problem). I’d like to believe that, more generally, our results helped demonstrate that recent AI breakthroughs can and should be transferred to complex medical data. Yes, it can sometimes take considerable R&D effort, but we did unlock some more useful tasks that were previously a human-only prerogative.
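For readers unfamiliar with the metrics mentioned here, the following is a small Python sketch of how AUROC, sensitivity, and specificity can be computed from labels and model scores. It uses the standard rank-based definition of AUROC (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one). The function names, the threshold, and the toy numbers are assumptions for illustration only; they are not Challenge data or the Challenge's scoring code.

```python
import numpy as np

def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative example")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (pos.size * neg.size)

def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    labels = np.asarray(labels, dtype=bool)
    predicted = np.asarray(scores, dtype=float) >= threshold
    true_pos = np.sum(predicted & labels)
    true_neg = np.sum(~predicted & ~labels)
    return true_pos / labels.sum(), true_neg / (~labels).sum()

# Toy example: 1 = cancer, 0 = no cancer (made-up values).
toy_labels = [0, 0, 0, 1, 0, 1, 1, 0]
toy_scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.7, 0.8, 0.05]

toy_auroc = auroc(toy_labels, toy_scores)  # 14 of 15 pairs ranked correctly
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.35)
```

AUROC summarizes performance over all possible thresholds, while sensitivity and specificity describe one operating point, which is why both kinds of numbers appear when results like these are reported.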
Who were your competitors?
I believe the level of the participants was really high. Some scientific labs participated, such as the New York University team and Professor Yuanfang Guan from the University of Michigan, as well as several young high-tech startups, such as DeepHealth and Lunit. There were some independent researchers too, such as Dezso Ribli, a final-year Ph.D. student from Eötvös Loránd University in Hungary. It was an open worldwide competition, and people joined from all over the world. I want to thank the organizers of this Challenge again: they did a huge job to allow all of us to work with great data. They are shaping a better future for collaborative international science.
How has MammoScreen changed since?
It has changed so much that it is even difficult to describe. The version that won the DREAM Challenge was a research draft. We added 4 new model families that work together to provide significantly better performance and, most importantly, to precisely delineate the suspicious lesions and assign each of them an individual malignancy probability. We performed hundreds if not thousands of experiments to better understand each component and test them individually. Our final release ensemble passed a clinical study where it demonstrated autonomous performance on par with best practices in radiology and showed that AI-augmented radiologists work even better than AI or humans separately. In short, it evolved into a serious, validated industrial solution connected to our own secure medical data cloud and easy to deploy in any hospital on the planet.
Started from scratch more than 3 years ago, MammoScreen is today a finished product. Already deployed in numerous European and US hospitals, the software helps radiologists with different levels of experience detect breast cancers earlier and more accurately.
Book your demo of MammoScreen here to learn how it may benefit your radiology practice.
* compared to the state of the art