According to the World Health Organization, breast cancer is the most commonly diagnosed cancer and is the leading cause of cancer deaths among women worldwide. On average, a woman is diagnosed with breast cancer every two minutes and one woman dies of it every 13 minutes worldwide. In 2019, an estimated 268,600 new cases of invasive breast cancer are expected to be diagnosed in women in the U.S. alone.
Since 1989, early detection and diagnosis have increased treatment success and survival rates. Screening is generally conducted by self-examination or clinical breast palpation, followed by mammography or ultrasound imaging, which typically identifies lesions or lumps that could be cancerous. Finally, a breast tissue biopsy and histopathological analysis conclusively ascertain the presence, type, grade and malignancy of cancer.
Pathologists typically use a light microscope to manually identify various cellular markers. Such visual diagnosis, however, is tedious and subjective, and average diagnostic concordance among pathologists is relatively low.
The recent introduction of slide scanners that digitize the biopsy into multi-resolution images, along with advances in deep learning methods, has ushered in new possibilities for computer-aided diagnosis of breast cancer. Artificial intelligence (AI), machine learning (ML) and computer vision can potentially automate several steps, helping to make diagnosis more accurate, reliable, efficient and cost-effective.
We have developed a deep-learning-based breast cancer grading approach aimed at automating the error-prone pre-diagnostic steps pathologists perform manually, enabling them to make faster and more accurate diagnoses.
AI approach for computer-assisted diagnosis
As noted, one of our goals is to automate the error-prone pre-diagnostic grading steps that pathologists perform manually. We have applied deep convolutional neural networks (ConvNets) to localization and segmentation tasks; pathologists can use the results for further quantitative analysis and grading of biopsy tissue. Deep networks require large training data sets, however, and the publicly available breast cancer data sets are small. To offset this scarcity of training data, we use data augmentation and transfer learning techniques.
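To make the idea of data augmentation concrete, here is a minimal sketch. It assumes one common choice for histology patches, namely flips and 90-degree rotations, since tissue has no canonical orientation. The article does not say which augmentations were actually used, and the function name `dihedral_augment` and the random stand-in patch are purely illustrative. Transfer learning is not shown.

```python
import numpy as np


def dihedral_augment(patch):
    """Return the 8 rotation/flip variants of an H x W x C image patch.

    Illustrative assumption: the article does not specify its augmentations.
    """
    variants = []
    for k in range(4):
        # Rotate by k * 90 degrees in the image plane (axes 0 and 1).
        rotated = np.rot90(patch, k, axes=(0, 1))
        variants.append(rotated)
        # Mirror the rotated patch left-to-right.
        variants.append(np.flip(rotated, axis=1))
    return variants


# Random stand-in for a 64 x 64 RGB tissue patch (not real data).
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

augmented = dihedral_augment(patch)
print(len(augmented))  # 8 variants from a single labeled patch
```

Each labeled patch yields eight training samples that share its label, which is one inexpensive way to stretch a small data set.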
Tumor localization
Histopathology slide images can be several gigapixels in size. Processing such large images is computationally expensive, so common practice is to identify the regions of interest on a slide before performing more detailed analysis. “Localization” refers to identifying these regions. For pathologists, this is a tedious and time-consuming undertaking (see Figure 1).
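The general idea of narrowing a very large slide down to candidate regions can be sketched as follows. This is not the ConvNet-based localization described in this article. It is a much simpler background filter, under the assumption that a low-resolution thumbnail of the slide is available and that background pixels are near-white. The function name `candidate_tiles`, the tile size and both thresholds are hypothetical values chosen for illustration.

```python
import numpy as np


def candidate_tiles(thumbnail, tile=32, white_level=220, min_tissue=0.5):
    """Return (row, col) offsets of tiles that are mostly non-background.

    Assumption: pixels with mean intensity below `white_level` are tissue.
    """
    gray = thumbnail.mean(axis=2)   # average the RGB channels
    tissue = gray < white_level     # boolean tissue mask
    height, width = tissue.shape
    keep = []
    for r in range(0, height - tile + 1, tile):
        for c in range(0, width - tile + 1, tile):
            # Fraction of tissue pixels inside this tile.
            if tissue[r:r + tile, c:c + tile].mean() >= min_tissue:
                keep.append((r, c))
    return keep


# Synthetic thumbnail: white background with one darker "tissue" square.
thumb = np.full((128, 128, 3), 255, dtype=np.uint8)
thumb[32:96, 32:96] = (180, 120, 160)

tiles = candidate_tiles(thumb)
print(tiles)  # [(32, 32), (32, 64), (64, 32), (64, 64)]
```

Only the four tiles covering the synthetic tissue square are kept, so any more expensive analysis would run on a quarter of the thumbnail area. In practice a filter like this would only be a first pass; a trained model would still be needed to decide which tissue regions are actually of diagnostic interest.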