Review

Multimodal artificial intelligence in ovarian cancer pathology: from image analysis to precision oncology

...

Abstract

Ovarian cancer (OC) remains a lethal gynaecologic malignancy characterised by late diagnosis, heterogeneity and limited stratification. While artificial intelligence (AI) has shown promise in pathology, single-modal approaches often fail to capture the interplay between morphological, molecular and clinical data. Multimodal AI (MMAI) integrates histopathological images, genomics, proteomics, radiology and clinical variables to generate holistic tumour profiles. This review synthesises recent MMAI advances in OC through three themes: methodological foundations, fusion strategies and clinical applications. Methodologically, MMAI leverages deep learning-based feature extraction and fusion architectures to align heterogeneous data. Clinically, MMAI enhances diagnostic accuracy, predicts therapy response (eg, platinum agents, poly (ADP-ribose) polymerase inhibitors), improves prognostic stratification and infers molecular features from histology. Challenges in data heterogeneity, interpretability and clinical integration persist. Future directions include self-supervised learning, federated training and spatial multi-omics to improve generalisability and biological insight. By providing a more comprehensive view of tumour biology, MMAI optimises therapeutic strategies and advances managing OC as a chronic disease.

Introduction

Ovarian cancer (OC) mortality is driven by late detection, histopathological ambiguity and molecular diversity.1 2 This stems from three challenges: late-stage diagnosis, inter-tumoral and intratumoral heterogeneity, and inadequate stratification.3 4 Histopathology, the diagnostic gold standard, faces inter-observer variability and spatial-temporal heterogeneity that limit molecular characterisation.5 These dilemmas underscore the need for integrative frameworks synthesising morphological, molecular and clinical data.6

Traditional pathological evaluation relies on manual interpretation of haematoxylin and eosin (H&E) slides.7 Molecular tests (eg, BRCA sequencing, homologous recombination deficiency (HRD) score) are constrained by high cost, long turnaround time and tissue requirements.8 9 Artificial intelligence (AI) promised to address gaps via computational models,10 with single-modal models achieving moderate success in tasks like tumour segmentation (75%–85% accuracy).11 However, unimodal approaches often degrade in external validation due to batch effects and fail to model complex biological nexuses.12 13 Consequently, the field shifted toward multimodal AI (MMAI), which integrates imaging, omics and clinical data through advanced fusion architectures.

MMAI enhances OC diagnostics through classification, molecular profiling and treatment response prediction (figure 1).14 15 Yet, technical and operational barriers impede translation.16 17 While several reviews summarise AI in cancer pathology, few address the integration of histopathology with multi-omics specifically in OC. Unlike the existing literature, this review focuses specifically on OC, a disease with high spatial and temporal heterogeneity, and examines in depth the strategies and challenges of integrating histopathology with other data modalities. We not only summarise general MMAI approaches but also analyse how histopathological and omics data can be integrated to address the unique clinical challenges of OC. By evaluating the clinical validation evidence for these integrated models across the whole course of OC diagnosis and treatment, we aim to provide a clear roadmap for translating this technology into a precision medicine tool for OC.

Figure 1

Multimodal AI in ovarian cancer pathology: methodological foundations and clinical applications. H&E, haematoxylin and eosin; WSI, whole slide image; IHC, immunohistochemistry; mpMRI, multiparametric magnetic resonance imaging; UNet, U-Net (U-shaped convolutional neural network); Mask R-CNN, Mask Region-based Convolutional Neural Network; T2WI, T2-weighted imaging; CE-T1WI, contrast-enhanced T1-weighted imaging; DWI, diffusion-weighted imaging; ADC, apparent diffusion coefficient; PARPi, poly (ADP-ribose) polymerase inhibitor; CA125, cancer antigen 125; HE4, human epididymis protein 4; LC-MS/MS, liquid chromatography-tandem mass spectrometry; RNA, ribonucleic acid; WES, whole-exome sequencing; BRCA, breast cancer gene; HRD, homologous recombination deficiency; HGSOC, high-grade serous ovarian carcinoma; PSPC, primary serous peritoneal carcinoma; PFS, progression-free survival; DL, deep learning.

To address these gaps, this narrative review is structured to: (1) outline the methodological foundations of MMAI systems for OC; (2) analyse and compare multimodal fusion strategies and their applications in OC pathology; and (3) critically appraise the clinical validation and utility of MMAI in diagnosis, treatment prediction and prognosis. By bridging technical principles with clinical validation evidence, this review aims to accelerate MMAI’s transition from computational novelty to an indispensable oncology tool, ultimately transforming OC from a silent killer to a manageable chronic condition.

Methodological foundations of multimodal AI

The efficacy of MMAI systems hinges on rigorous methodological frameworks spanning data acquisition, feature representation learning and fusion architecture design. Each component must address OC’s unique biological complexity, from intratumoral heterogeneity to sparse molecular data, while ensuring clinical deployability. This section delineates these foundational pillars with emphasis on OC-specific technical adaptations (figure 2).

Figure 2

Methodological workflow of multimodal AI in ovarian cancer digital pathology. AI, artificial intelligence; CNN, convolutional neural network; VGG16, Visual Geometry Group 16-layer network; NASNet, Neural Architecture Search Network; MIL, multiple instance learning; ViTs, vision transformers; HRR, homologous recombination repair; LASSO, least absolute shrinkage and selection operator; GB, gradient boosting; MMD-VAE, maximum mean discrepancy variational autoencoder; ssGSEA, single-sample gene set enrichment analysis.

Multimodal data acquisition and preprocessing

From our synthesis of MMAI-related studies in OC pathology, we group the input data into four core modalities, each of which poses unique collection and standardisation challenges.

Pathological imaging data

Pathological imaging, primarily H&E-stained and immunohistochemistry (IHC) whole slide images (WSIs), directly visualises tumour morphology, stroma and immune infiltration.18 19 WSIs are acquired at 5× to 40× magnification, generating gigapixel images.20 IHC targets specific biomarkers (eg, Ki67) with colour deconvolution isolating signals.21 22 Preprocessing addresses technical variability: (1) stain normalisation (eg, Macenko’s algorithm) reduces batch effects23–26; (2) patch extraction divides WSIs into 256×256–512×512 pixel tiles to balance computational cost and representativeness20; (3) tumour segmentation using UNet++ or Mask R-CNN achieves Dice similarity coefficient >90%.27 28
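Two of the preprocessing steps above lend themselves to a short illustration: tiling a WSI into fixed-size patches and scoring a segmentation with the Dice similarity coefficient. The following is a minimal sketch under stated assumptions, not the pipeline of any cited study; the function names, the image dimensions and the toy masks are hypothetical, and a real WSI would be read with a dedicated slide library rather than represented as nested lists.

```python
def tile_coordinates(width, height, tile=256, stride=None):
    """Top-left (x, y) corners of tiles that fit fully inside the image.

    With stride equal to the tile size (the default) tiles do not overlap.
    """
    stride = stride or tile
    return [(x, y)
            for y in range(0, height - tile + 1, stride)
            for x in range(0, width - tile + 1, stride)]


def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks (nested lists)."""
    inter = total = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            total += (1 if p else 0) + (1 if t else 0)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * inter / total


# Hypothetical 1024 x 768 pixel region cut into 256 x 256 tiles.
tiles = tile_coordinates(1024, 768, tile=256)

# Hypothetical predicted and reference tumour masks.
pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
score = dice_coefficient(pred, truth)  # 2 * 2 / (3 + 3) = 0.667
```

A smaller stride would produce overlapping tiles, which trades extra computation for denser coverage of tissue boundaries.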

Omics data

Omics data provide molecular insights linking morphology to mechanisms.29 30 Common modalities include: (1) genomics: detects BRCA1/2 mutations, CCNE1 (cyclin E1) amplification and HRD signatures via whole-exome sequencing (WES) or targeted sequencing31–33; (2) transcriptomics: quantifies gene expression, revealing pathways like epithelial-mesenchymal transition27 28 34 35; (3) proteomics: liquid chromatography-tandem mass spectrometry (LC-MS/MS) measures protein abundance (eg, 64-protein high-grade serous ovarian cancer (HGSOC) signature).26 36

Radiological data

Radiological data (CE-CT (contrast-enhanced computed tomography), MRI, DWI) provide macroscopic insights. Multiparametric MRI (T2-weighted imaging (T2WI), contrast-enhanced T1WI (CE-T1WI), diffusion-weighted imaging (DWI) and apparent diffusion coefficient (ADC)) characterises tumour habitats.37 Preprocessing includes: (1) image registration for multi-sequence alignment; (2) habitat clustering via k-means on pyradiomics features; (3) feature selection (t-test, least absolute shrinkage and selection operator (LASSO)).37 38
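Habitat clustering with k-means, mentioned above, can be sketched as follows. This is an illustrative toy under loud assumptions: each voxel is described by just two hypothetical normalised features (a T2WI intensity and an ADC value), the centroids are initialised from the first k points for reproducibility, and the data are invented. Published pipelines cluster much richer radiomic feature vectors.

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; centroids start at the first k points (toy choice)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: mean of each cluster (an empty cluster keeps its centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical voxels: (normalised T2WI intensity, normalised ADC).
voxels = [(0.90, 0.20), (0.10, 0.80), (0.85, 0.25),
          (0.15, 0.90), (0.80, 0.15), (0.20, 0.85)]
habitat_labels, habitat_centres = kmeans(voxels, k=2)
```

Each resulting label marks a candidate habitat, from which per-habitat features can then be extracted and passed to feature selection.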

Clinical data

Clinical data include demographics, International Federation of Gynecology and Obstetrics (FIGO) stage, treatment history and lab results (CA125 (cancer antigen 125), HE4 (human epididymis protein 4)).39–41 These data are typically stored in electronic health records (EHRs) and require preprocessing to address missing values (eg, multiple imputation) and categorical variables (eg, one-hot encoding).39 40 Clinical context is critical: eg, Xiong et al identified total bile acids as an independent recurrence risk factor,40 and Bi et al showed platinum resistance (HR=6.756) strongly predicts OS.38
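The two preprocessing operations named above, handling missing values and encoding categorical variables, can be illustrated with a minimal sketch. Note the assumptions: the code uses simple mean imputation as a stand-in (multiple imputation, as cited above, draws several plausible values and pools the results), and the CA125 values and FIGO stages are invented for illustration.

```python
def impute_mean(values):
    """Replace None with the mean of the observed values (single imputation)."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def one_hot(values):
    """Encode a categorical column as 0/1 indicators, categories sorted alphabetically."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


ca125 = [35.0, None, 410.0, 125.0]       # hypothetical CA125 values (U/mL)
figo_stage = ["III", "I", "IV", "III"]   # hypothetical FIGO stages

ca125_filled = impute_mean(ca125)        # the missing value becomes 190.0
categories, stage_encoded = one_hot(figo_stage)
```

Because FIGO stage is ordinal, an ordinal encoding could also be defensible; one-hot encoding is shown here only because it is the example given in the text.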

Data standardisation and quality control

A critical, yet often underemphasised, step in MMAI is the rigorous standardisation and quality control of multimodal data, especially in multi-centre studies.42 This involves establishing protocols for consistent data acquisition (eg, scanner settings, sequencing depth), implementing batch correction algorithms (eg, ComBat for omics data) and performing thorough quality assessments (eg, slide-level focus quality for WSIs, RNA integrity number for transcriptomics).43 In addition, generative adversarial networks (GANs) for image style transfer are emerging as a powerful tool for reducing differences in H&E staining between institutions.27 A ‘bridge study’ design, in which the model is fine-tuned on a small number of samples from the target centre during external validation, is another effective strategy for handling shifts in data distribution.25 These measures are essential to minimise technical confounding and ensure that the integrated models learn biologically and clinically relevant signals rather than institutional artefacts.44
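To make the idea of batch correction concrete, the sketch below centres and scales a single feature within each site. This is deliberately much simpler than ComBat, which additionally shrinks the per-batch estimates with an empirical Bayes step and can preserve biological covariates; the function name and the example values are hypothetical.

```python
from statistics import mean, pstdev


def standardise_per_batch(values, batches):
    """Centre and scale one feature separately within each batch (site)."""
    out = [0.0] * len(values)
    for b in set(batches):
        idx = [i for i, x in enumerate(batches) if x == b]
        batch_values = [values[i] for i in idx]
        mu = mean(batch_values)
        sd = pstdev(batch_values) or 1.0  # avoid dividing by zero for constant batches
        for i in idx:
            out[i] = (values[i] - mu) / sd
    return out


# Hypothetical expression values: site B reads about 100 units higher than site A.
expression = [10.0, 12.0, 14.0, 110.0, 112.0, 114.0]
site = ["A", "A", "A", "B", "B", "B"]
corrected = standardise_per_batch(expression, site)
```

After correction the site offset disappears, which is the desired effect only if the sites are biologically comparable; when case mix differs between centres, naive per-batch scaling can also remove real signal.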

Core technical frameworks for multimodal AI

The technical pipeline of MMAI consists of three sequential steps: feature extraction (converting raw data into meaningful representations), modal fusion (integrating features from different modes) and task-specific prediction (eg, classification, survival analysis). This section focuses on the first two steps, as they are the foundation of effective multimodal integration.

Feature extraction: From raw data to representational features

Feature extraction transforms raw data into discriminative representations. As summarised in table 1, approaches include texture features (GLCM (gray-level co-occurrence matrix), LBP (local binary pattern)), morphological features (nuclear/glandular metrics) and colour-based features.45–50 For instance, GLCM quantifies spatial texture but is sensitive to noise, whereas LBP offers efficiency and robustness.45 46 Morphological features link directly to grading but rely on segmentation accuracy.47 48
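As a hedged illustration of the two texture descriptors above, the sketch below computes a GLCM contrast value and a basic LBP code on a tiny quantised image. The offset (horizontally adjacent pixels), the clockwise bit ordering and the 3 x 3 example are assumptions chosen for brevity; image-analysis libraries offer multi-offset, rotation-invariant variants.

```python
def glcm(image, levels):
    """Normalised co-occurrence matrix for horizontally adjacent pixel pairs."""
    counts = [[0] * levels for _ in range(levels)]
    for row in image:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in r] for r in counts]


def glcm_contrast(p):
    """Contrast = sum over (i, j) of p(i, j) * (i - j)^2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def lbp_code(patch):
    """8-neighbour LBP of the centre of a 3x3 patch, bits taken clockwise from top-left."""
    c = patch[1][1]
    ring = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
            patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    return sum(1 << k for k, v in enumerate(ring) if v >= c)


# Hypothetical image quantised to three grey levels (0, 1, 2).
image = [[0, 0, 1],
         [1, 2, 2],
         [0, 1, 2]]
contrast = glcm_contrast(glcm(image, levels=3))  # 4/6 = 0.667
code = lbp_code(image)                           # bits 3 and 4 set -> 24
```

The LBP code depends only on comparisons with the centre pixel, which is one reason the descriptor is comparatively insensitive to monotonic intensity changes.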

Table 1

Comparison of key feature extraction approaches for histopathological images

Deep learning (DL) models, notably CNNs and vision transformers (ViTs), dominate feature extraction.51 52 Pretrained CNNs (eg, ResNet-18) extract nuclear morphology and glandular structures.31 Multiple instance learning (MIL) methods such as clustering-constrained attention multiple instance learning (CLAM) aggregate patch-level features into slide-level representations via attention.25 26 32 Notably, Zhong et al and Nero et al showed that WSI features extracted with the CLAM framework can be used not only for prognostic prediction but also to infer BRCA mutation status, embodying the ‘image as biomarker’ concept of inferring molecular features from morphological phenotypes.25 33 ViTs model spatial relationships via self-attention, better capturing tumour-stroma interfaces.37 Models such as CTransPath, pretrained on pan-cancer WSIs, outperform CNNs in treatment response prediction.26

Small-sample learning and cross-centre calibration are key technical adaptations to OC sample scarcity and data heterogeneity. For example, transfer learning (eg, fine-tuning ImageNet-pretrained models) and contrastive learning (eg, SimCLR) have been shown to mitigate insufficient data.28 53 In addition, meta-learning, an emerging strategy that adapts quickly to new tasks by ‘learning how to learn’, has shown great potential for developing diagnostic models for rare subtypes, such as low-grade serous carcinoma (LGSOC), from only a small number of samples.54 Although its application in OC pathology is still exploratory, its ability to cope with sample scarcity makes it an important research direction for accurate diagnostic tools for rare subtypes such as LGSOC.
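The attention-based aggregation that MIL methods use to turn patch features into a slide-level representation can be sketched in a few lines. This is a simplified illustration, not the CLAM implementation: CLAM learns a gated attention network, whereas here the scoring vector `w` is a fixed, hypothetical stand-in for learned parameters and the patch features are invented.

```python
import math


def attention_pool(patch_features, w):
    """Softmax-weighted average of patch feature vectors.

    Each patch h gets the score w . h; scores are turned into attention
    weights with a numerically stable softmax.
    """
    scores = [sum(wj * hj for wj, hj in zip(w, h)) for h in patch_features]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    attention = [e / norm for e in exps]
    dim = len(patch_features[0])
    slide = [sum(a * h[d] for a, h in zip(attention, patch_features))
             for d in range(dim)]
    return attention, slide


# Three hypothetical 2-dimensional patch embeddings from one slide.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [2.0, 0.0]  # hypothetical scoring vector that favours the first feature
attention, slide_vector = attention_pool(patches, w)
```

The attention weights double as an interpretability signal: mapping them back to patch coordinates yields the heatmaps that highlight which slide regions drove the prediction.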

Omic feature extraction employs statistical and DL approaches. LASSO and gradient boosting select discriminative features,55 while single-sample gene set enrichment analysis (ssGSEA) reduces transcriptomic dimensionality.34 For instance, the maximum mean discrepancy variational autoencoder (MMD-VAE) proposed by Hira et al integrates copy number variation (CNV), messenger RNA (mRNA) and methylation data by learning a joint latent representation across omics layers; it achieved better accuracy than traditional methods such as principal component analysis (PCA) in the HGSOC molecular typing task, demonstrating the potential of DL for unsupervised multimodal feature fusion.56 Radiomic features (first-order, shape, texture) are extracted via pyradiomics.37 38 Habitat clustering on multiparametric magnetic resonance imaging (mpMRI) segments macroscopic heterogeneity.37 Clinical tabular data undergo imputation, encoding and feature selection (Cox regression).39 40

Multimodal fusion architectures

Modal fusion is the core of MMAI, aiming to integrate heterogeneous features into a unified representation that outperforms unimodal features. Fusion architectures are categorised into three types based on the stage of integration: early fusion (feature-level fusion), late fusion (decision-level fusion) and hybrid fusion (combining early and late fusion). Table 2 compares these architectures with applications from OC studies.

Table 2

Comparison of multimodal fusion architectures in ovarian cancer AI

Early fusion integrates multimodal data into a shared representation before prediction, often by projecting heterogeneous features (eg, high-dimensional images and low-dimensional clinical variables) into a common latent space. Common techniques include: (1) feature concatenation, combining feature vectors from different modalities. For instance, Wu et al35 concatenated ResNet-50-extracted pathology features with clinical features, improving the area under the curve (AUC) for 5-year overall survival (OS) prediction from 0.589 (pathology-only) to 0.745; (2) cross-modal attention, dynamically weighting features by modelling inter-modal interactions. Bi et al38 applied cross-attention to clinical, MRI and pathology features, emphasising clinically relevant features (eg, platinum resistance, MRI habitats) and achieving a C-index of 0.836 for OS prediction in external validation, outperforming prediction-level fusion (PLF; 0.820); (3) pathway-guided attention, incorporating biological knowledge to guide fusion. Kilim et al developed SurvPath, an early fusion model that uses HRR pathway information to weight proteomics and pathology features, improving platinum response prediction (AUC=0.761 vs unguided fusion AUC=0.723).26
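The simplest of the early fusion techniques above, feature concatenation, can be sketched as follows. The per-modality L2 normalisation is an assumption added here so that a long image embedding does not dominate a short clinical vector purely by scale; the cited studies may have normalised differently, and the feature values are invented.

```python
import math


def l2_normalise(v):
    """Scale a vector to unit Euclidean length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def early_fusion(*modalities):
    """Normalise each modality's feature vector, then concatenate them."""
    fused = []
    for features in modalities:
        fused.extend(l2_normalise(features))
    return fused


pathology = [3.0, 4.0, 0.0, 0.0]   # hypothetical image embedding
clinical = [0.0, 2.0]              # hypothetical encoded clinical variables
fused = early_fusion(pathology, clinical)
# fused has 4 + 2 = 6 entries and would feed a single downstream predictor
```

Concatenation leaves it to the downstream model to discover interactions between modalities, which is the gap that cross-modal and pathway-guided attention aim to close.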

Late fusion combines predictions from unimodal models after task-specific inference. It is simpler to implement than early fusion, as it bypasses feature alignment, but may overlook inter-modal synergies. Common approaches include: (1) averaging, taking the mean of unimodal predictions. Yu et al53 averaged predictions from CNN-based pathology models (VGGNet, Visual Geometry Group Network) and transcriptomic models to improve platinum-free interval (PFI) prediction, achieving an AUC of 0.78 vs a pathology-only AUC of 0.71; (2) weighted averaging, assigning higher weights to more accurate unimodal models. Bi et al38 fused clinical, MRI and pathology predictions using weights proportional to their unimodal AUC values (0.4, 0.3 and 0.3, respectively), achieving an AUC of 0.765 vs 0.732 for unweighted averaging; (3) Kronecker product fusion, multiplying feature matrices from different modalities to capture interactions. Kilim et al employed PorpoiseMMF, a late fusion framework that applies Kronecker product fusion to pathology and proteomics predictions, achieving an AUC of 0.752 for platinum response prediction, outperforming proteomics-only models (AUC=0.642).
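The three late fusion operations described above reduce to a few lines of arithmetic. In this hedged sketch the unimodal predictions are invented, the weights 0.4, 0.3 and 0.3 are those quoted in the text, and the leading 1.0 in each Kronecker input is an assumed convention that keeps the unimodal terms alongside the pairwise products.

```python
def average(preds):
    """Unweighted mean of unimodal predictions."""
    return sum(preds) / len(preds)


def weighted_average(preds, weights):
    """Mean of unimodal predictions weighted by per-modality weights."""
    return sum(p * w for p, w in zip(preds, weights)) / sum(weights)


def kronecker(u, v):
    """Kronecker product of two vectors: every pairwise product u_i * v_j."""
    return [a * b for a in u for b in v]


# Hypothetical predicted probabilities from clinical, MRI and pathology models.
preds = [0.80, 0.60, 0.70]
mean_pred = average(preds)                                # 0.70
weighted_pred = weighted_average(preds, [0.4, 0.3, 0.3])  # 0.71

# Hypothetical pathology and proteomics outputs, each prefixed with 1.0.
interaction = kronecker([1.0, 0.8], [1.0, 0.5])           # [1.0, 0.5, 0.8, 0.4]
```

The Kronecker output grows with the product of the input lengths, so in practice it is applied to compact prediction or embedding vectors rather than raw features.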

Hybrid fusion combines early and late fusion to exploit the advantages of both. For example, Bi et al developed FoMu, a hybrid model that first uses early fusion (cross-modal attention) to integrate clinical and MRI features, then fuses the result with pathology predictions via late fusion (weighted averaging).38 This model achieved the highest C-index (0.836) for OS prediction in HGSOC, as early fusion captured clinical-radiological synergies, and late fusion preserved pathology-specific information.

Multimodal fusion strategies in OC

Building on the general MMAI approach, specific fusion strategies in OC pathology studies are designed to address the unique biological and clinical challenges of the disease. These strategies leverage the complementary nature of different modalities to solve key clinical questions (eg, platinum response prediction, HRD detection) and overcome technical barriers such as heterogeneous data integration and model interpretability.

Modality complementarity: design principles for OC pathology

The biological basis for integrating pathology with genomics is that gene mutations (eg, in BRCA) alter protein function and ultimately manifest as morphological changes in cells and tissues. The clinical pain point is the high cost and long turnaround time of HRD testing. The complementary advantage is that H&E can serve as an inexpensive and rapid ‘phenotypic surrogate’ for HRD in pre-screening. For example, DeepHRD, developed by Bergstrom et al, learns the complex morphological features associated with HRD to infer genomic HRD status and predict platinum response, and identified patients with HRD more efficiently than models relying on limited genomic information alone.32 Similarly, the PathoRiCH model of Ahn et al, also based only on H&E images, predicted the response to platinum therapy by identifying morphological features such as intratumoral lymphocyte infiltration; transcriptomic analysis was then used to reveal the biological basis of its predictions.28

The biological basis for combining pathology with proteomics is that protein expression and modification directly reflect signalling pathway activity and affect cell behaviour and tissue structure. The clinical pain point is the limited predictive value of single protein markers. The complementary advantage is that the proteome captures the molecular state (‘what is happening’) while the pathological image provides the spatial context (‘how’), and together they reveal mechanisms of treatment resistance. For example, Kilim et al fused H&E images with proteomic data in the SurvPath and PorpoiseMMF models, which significantly improved the accuracy of platinum response prediction and revealed key resistance-related pathways.26

The biological basis for integrating pathology with clinical and radiomic data is that macroscopic imaging (eg, MRI) reflects overall tumour burden and invasiveness, microscopic pathology (H&E) reveals cellular and microenvironmental characteristics, and clinical parameters describe host status. The clinical pain point is the imprecision of existing prognostic stratification tools. The complementary advantage is that integrating all three captures the factors affecting prognosis from the macroscopic to the cellular level. For example, the FoMu model developed by Bi et al fused clinical, MRI and pathological data to achieve the best prediction of overall survival and progression-free survival in patients with HGSOC, with a C-index significantly higher than that of unimodal models.37

Key technical breakthroughs in ovarian cancer multimodal AI

Recent MMAI advances address data heterogeneity, interpretability and generalisability.26 32 38 55 Feature normalisation55 and co-attention mechanisms26 help equalise multimodal contributions. Interpretability is achieved through attention visualisation, which highlights the regions within WSIs most relevant to predictions, as demonstrated by Ueda et al34 in the application of HGSOC subtyping and by Yu et al53 in studies predicting platinum response; through biological pathway analysis, as exemplified by Ahn et al,28 who integrated PathoRiCH model predictions with transcriptomic data to reveal immune pathways associated with treatment response; and through feature importance methods such as SHapley Additive exPlanations (SHAP),40 which help identify key clinical and biochemical indicators influencing poly (ADP-ribose) polymerase inhibitor (PARPi) efficacy. These tools increase clinician trust.28 38 To improve generalisability, transfer learning32 and multi-centre training38 are employed.
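The idea behind SHAP-style feature importance can be shown with exact Shapley values on a toy model. Everything in this sketch is hypothetical: the risk function, its coefficients and the three binary inputs are invented for illustration and are not taken from any cited study. The SHAP library approximates these quantities efficiently for real models, whereas brute-force enumeration of feature orderings is feasible only for a handful of features.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]          # switch feature i from baseline to its value
            value = model(current)
            phi[i] += value - previous
            previous = value
    return [p / len(orders) for p in phi]


def toy_risk(features):
    """Hypothetical risk score with one interaction term (illustration only)."""
    brca, bile_acids, ki67 = features
    return 0.5 * brca + 0.2 * bile_acids + 0.1 * brca * ki67


phi = shapley_values(toy_risk, x=[1, 1, 1], baseline=[0, 0, 0])
# phi is approximately [0.55, 0.20, 0.05]; the interaction is split evenly
# between the two features involved, and the values sum to 0.8.
```

The attributions always sum to the difference between the prediction and the baseline prediction, which is the property that makes them readable as per-feature contributions.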

Clinical applications of multimodal AI in OC pathology

MMAI has significantly advanced the clinical management of OC, offering enhanced capabilities in diagnosis, treatment response prediction, prognostic stratification and molecular feature inference. By integrating diverse data types, such as histopathological images, genomic and proteomic profiles, radiological imaging and clinical variables, MMAI provides a more comprehensive and accurate toolset for addressing the heterogeneity and complexity of OC. Table 3 summarises the MMAI models that have been applied in pathological research on OC. This section highlights key clinical applications supported by robust empirical evidence from recent studies.

Table 3

Summary of the application of multimodal AI in pathological research of ovarian cancer

Tumour diagnosis and subtyping

Accurate distinction between benign and malignant ovarian tumours and precise classification of histological subtypes are crucial for determining appropriate treatment strategies. Traditional diagnostic methods, which rely heavily on histopathological examination and serum biomarkers, often suffer from inter-observer variability and limited specificity. MMAI approaches have markedly improved diagnostic accuracy by combining morphological, molecular and clinical data.

For benign-malignant differentiation, Vijayarajan et al integrated 49 clinical-imaging features, achieving 99.47% accuracy on 349 patients.41 The model captured complementary information: clinical markers provided quantitative risk assessment, while imaging identified morphological abnormalities such as irregular tumour borders and nuclear pleomorphism.

In addressing rare and challenging subtypes such as peritoneal serous papillary carcinoma (PSPC), frequently misclassified as epithelial ovarian cancer (EOC), Wang et al used DL models (Improved_InceptionV3_MS and Improved_MIL_RNN) trained on H&E WSIs and mismatch repair (MMR) protein IHC. Their models achieved an accuracy of 97.2% in distinguishing PSPC from EOC, with nearly perfect sensitivity and specificity using MSH2 and MSH6 markers.22 This high performance is clinically significant given the distinct surgical and management strategies required for PSPC. The differential diagnosis between LGSOC and HGSOC is similarly challenging, as is that of other rare subtypes. MMAI combined with multi-omics data is expected to substantially improve the diagnosis of such rare subtypes, but relevant studies remain few, making this an important direction for future work.

For pathological subtype classification, several MMAI frameworks have been developed to categorise OC into its major subtypes (eg, HGSOC, clear cell, endometrioid, mucinous). Ueda et al introduced a pipeline combining NASNet-A-Large-based tile-level pattern recognition with decision-tree aggregation to classify HGSOC into four subtypes: mesenchymal transition (MT), immune reactive (IR), papillo glandular (PG) and solid proliferative (SP). The model demonstrated high accuracy (mean 0.910–0.933 across cohorts) and identified the MT subtype as an independent prognostic factor for poor OS.34 In another study, Klein et al used matrix-assisted laser desorption/ionization imaging mass spectrometry (MALDI-IMS) to extract proteomic features from tissue microarrays and trained a convolutional neural network (CNN) to discriminate between five histotypes, achieving an overall accuracy of 85%, surpassing conventional histopathological assessment.36 As an intrinsically multimodal technology, MALDI-IMS captures molecular spectra while preserving the spatial structure of tissue, achieving in situ fusion of morphology and proteomics and offering a new perspective on histological typing.

Beyond accuracy, the clinical adoption of MMAI for diagnosis relies on interpretability. Attention maps and feature importance analyses (eg, SHAP) help pathologists understand why a tumour is classified as malignant or a specific subtype, fostering trust and facilitating integration into diagnostic workflows.57 58

Treatment response prediction

Predicting response to platinum-based chemotherapy and PARP inhibitors (PARPis) is essential. MMAI improves on traditional biomarkers by integrating multimodal data capturing morphological and molecular determinants of sensitivity.

In platinum response prediction, Kilim et al combined H&E WSIs and proteomic data using pathway-guided attention (SurvPath), achieving AUCs up to 0.835.26 Ahn et al designed PathoRiCH, significantly stratifying patients by PFI.28 Although the DeepHRD model based on H&E images was slightly less accurate in predicting HRD status (AUC 0.81) than the standard genomic HRD assay (AUC >0.9), its cost was reduced by approximately 80%, with turnaround time reduced from weeks to minutes. This makes it an attractive prescreening tool for optimising the allocation of healthcare resources by narrowing the pool of patients who require expensive genomic testing.32 Similarly, the model of Nero et al had moderate performance (AUC ~0.7) but provided critical information as a supplementary diagnostic tool when tissue samples were insufficient or sequencing failed.33
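The AUC values quoted above have a direct ranking interpretation: the AUC is the probability that a randomly chosen responder receives a higher model score than a randomly chosen non-responder. The sketch below computes it by pairwise comparison; the labels and scores are invented, and the quadratic loop is meant for illustration rather than large cohorts.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for lab, s in zip(labels, scores) if lab == 1]
    negatives = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical cohort: 1 = platinum-sensitive, 0 = platinum-resistant.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of 9 pairs ranked correctly = 0.889
```

Because the AUC depends only on ranks, it says nothing about calibration, so a model with a good AUC may still need recalibration before its scores are used as probabilities.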

For predicting PARPi efficacy, Xiong et al built a LightGBM model incorporating clinical (eg, BRCA status, PARPi type), pathological (IHC markers: Ki67, p53) and biochemical data (eg, CA19-9, total bile acids). The model achieved AUCs of 0.79 (primary) and 0.72 (recurrent) in internal validation, with favourable generalisability in external cohorts. SHAP analysis identified BRCA/HRD status and bile acids as top contributors, aligning with known clinical predictors.40 Wang et al also demonstrated that deep learning models trained on H&E and MMR IHC could predict bevacizumab efficacy with high accuracy (100% mean sensitivity/specificity with MSH2 (MutS homolog 2)), suggesting potential applicability to combination therapies.22

Prognostic stratification

MMAI enhances prognostic stratification by integrating diverse data modalities to predict OS and progression-free survival (PFS).

In OS prediction, Bi et al developed FoMu, a foundation model-driven approach integrating clinical, MRI and pathology data. It attained a C-index of 0.836 in external validation cohort A, but its performance dropped to 0.78 to 0.82 in cohorts B and C, which lacked the pathological modality.38 This highlights the model’s strong dependence on complete input modalities and the risk of degraded generalisation when data are missing in real-world clinical settings. Similarly, Yang et al developed the ovarian cancer digital pathology index (OCDPI) using graph deep learning, with HRs of 1.916–2.796.59 Notably, transcriptome analysis in this study revealed biological pathways (eg, angiogenesis, epithelial-mesenchymal transition) enriched in the high-score regions of OCDPI, linking the morphological phenotype with underlying molecular mechanisms and providing a basis for biological interpretation of the model. In external validation, the AUC of the model decreased from 0.93 in internal validation to 0.70 in The Cancer Genome Atlas dataset, suggesting overfitting to the source domain due to differences in tissue section preparation, scanners and staining protocols. Other approaches combining WSIs and RNA-seq data identified high-risk pathways, providing mechanistic insights.25

For PFS prediction, Wu et al used an attention-based deep survival network leveraging H&E image features and clinical data to stratify patients into risk groups with significantly different PFS (log-rank p=0.00845). The model also revealed associations between risk scores and drug sensitivity, suggesting potential therapeutic alternatives.35 Desbois et al integrated digital pathology (CD8+ T cell density) and transcriptomics to classify tumour immune microenvironments into ‘infiltrated,’ ‘excluded,’ and ‘desert’ phenotypes, finding that ‘excluded’ tumours had the worst PFS due to stromal activation and impaired antigen presentation.21

The choice of fusion strategy (early, late, hybrid) significantly impacts prognostic performance. In general, early and hybrid fusion tend to achieve higher C-indices by modelling inter-modal interactions, as seen in the FoMu model.38 Late fusion, while simpler, may suffice when modalities provide independent predictive signals or when computational constraints are a priority.
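Since the C-index is the headline metric for the prognostic models in this section, a minimal sketch of Harrell's concordance index may help in reading those numbers. The follow-up times, event indicators and risk scores below are invented; survival-analysis libraries provide tested implementations that also handle tied event times more carefully.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when patient i had the event before
    patient j's follow-up time. It is concordant when the model gave
    patient i the higher risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        for j in range(len(times)):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical cohort: follow-up in months, event 1 = death, 0 = censored.
times = [5, 10, 15, 20]
events = [1, 0, 1, 1]
risks = [0.9, 0.4, 0.1, 0.2]
c_index = concordance_index(times, events, risks)  # 3 of 4 comparable pairs = 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so differences such as 0.820 vs 0.836 reflect a modest change in the share of correctly ordered patient pairs.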

Molecular feature prediction

MMAI models predict molecular alterations directly from H&E slides, reducing reliance on costly assays.

For BRCA mutation prediction, Zeng et al integrated H&E WSIs with multi-omics, achieving AUCs of 0.952 and 0.912 for BRCA1/2.60 Nero et al used CLAM-based image-only models with modest performance but low-cost screening potential.33 Despite their limited performance, such models are valuable as pre-screening tools that identify, from large cohorts, BRCA wild-type patients likely to carry HRD-related phenotypes, thereby optimising the use of limited genomic testing resources and reflecting the ‘image-first’ tiered diagnostic concept in multimodal strategies. This provides further evidence that image-based models can complement genetic testing, especially in resource-limited settings. For microsatellite instability (MSI) status, Wang et al trained deep learning models on H&E and MMR IHC data, achieving accuracies up to 96% in external validation.22 Multi-omics approaches have also been applied to molecular subtyping, with one study reporting AUCs >0.91 for all four HGSOC subtypes and demonstrating improved prognostic stratification compared with single-omics models.60

Dynamic treatment monitoring and combination therapy optimisation

Beyond static predictions at baseline, emerging MMAI applications are leveraging longitudinal data to monitor therapeutic response dynamically and guide adaptive treatment strategies. For instance, Xiong et al explored the use of multimodal data (clinical, pathological, biochemical) to predict PARPi efficacy over time, hinting at the potential for dynamic risk stratification.40 Integrating serial imaging (eg, CT or MRI scans during neoadjuvant chemotherapy) with repeated biopsies or circulating tumour DNA (ctDNA) analysis could enable real-time prediction of emerging platinum resistance, allowing for timely adjustment of treatment regimens. Furthermore, MMAI models can be extended to optimise combination therapies, such as identifying patients most likely to benefit from PARPi plus anti-angiogenic agents based on baseline and on-treatment histomolecular features. Future models incorporating time-series data will be crucial for realising the full potential of precision oncology in OC.

Challenges and future directions

Despite significant progress, MMAI in OC pathology faces critical challenges that must be addressed to realise its clinical potential. Imaging data vary across institutions due to staining protocols and scanners, complicating generalisation.32 59 In OC, heterogeneity arises not only from staining differences between institutions but also from the spatial heterogeneity of HRD status and the marked molecular and morphological differences between histological subtypes (such as clear cell carcinoma and HGSOC), and MMAI models must be robust to all of these sources. Combined with small training sets, this heterogeneity increases the risk of overfitting, so multi-centre external validation is critical for revealing the true generalisability of a model. Core needle biopsies, which are commonly used in the diagnosis of OC, yield little tissue yet still produce very large WSIs; lightweight, efficient models suited to this small-sample, large-image setting are therefore key to real-time clinical analysis. Furthermore, the substantial computational resources required for training and deploying large multimodal models pose practical barriers to real-time clinical application in resource-constrained settings.61 62 Model ‘black-box’ behaviour remains a barrier. Attention heatmaps and SHAP highlight key features but do not by themselves supply a biological rationale.40 53 In OC, the question is not just ‘which region does the model focus on’ but what that focus means biologically, for example: “Is a particular nuclear atypia feature that the model focuses on associated with the morphological presentation of a BRCA1/2-mutant or CCNE1-amplified tumour?” Integrating model output with spatial transcriptomics is key to achieving such an interpretation.
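
The role of external validation in exposing overfitting can be illustrated by comparing discrimination on an internal cohort with that on an independent centre. The sketch below is a minimal, dependency-free illustration under stated assumptions: the `auc` helper uses the standard pairwise (Mann-Whitney) definition of the area under the ROC curve, and both cohorts are made-up toy data rather than results from any cited study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Counts the fraction of (positive, negative) pairs in which the positive
    case receives the higher score; ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical cohorts: same model, different centres (toy values).
internal = {"scores": [0.9, 0.8, 0.7, 0.2, 0.3, 0.1], "labels": [1, 1, 1, 0, 0, 0]}
external = {"scores": [0.9, 0.4, 0.6, 0.5, 0.3, 0.7], "labels": [1, 1, 1, 0, 0, 0]}

auc_internal = auc(internal["scores"], internal["labels"])
auc_external = auc(external["scores"], external["labels"])
print(f"internal AUC = {auc_internal:.3f}, external AUC = {auc_external:.3f}")
```

A large gap between the two values is the kind of signal that single-centre evaluation cannot reveal; real studies would also report confidence intervals and calibration.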

Clinical translation barriers include evolving regulatory frameworks, limited prospective validation and workflow integration challenges.33 39 41 Ethical concerns regarding data privacy and cohort bias must be addressed.37 56 OC diagnosis and treatment follow divergent pathways (eg, primary cytoreductive surgery vs neoadjuvant chemotherapy), so future MMAI models must be specifically designed and validated for each pathway before they can be truly integrated into clinical decision-making.

Future directions include self-supervised learning to reduce annotation reliance,28 and federated learning, in which models are trained across institutions without data sharing. Federated learning could address privacy concerns while expanding sample size, particularly for rare subtypes like LGSOC,36 and may also help models cope with inter-institutional heterogeneity, making it well suited to OC research that depends on multi-centre collaboration. Spatial multi-omics promises to refine tumour-immune interaction understanding.21 27 Dynamic models incorporating longitudinal data could better predict recurrence.40
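
To make the federated idea concrete, the following minimal sketch shows one common aggregation step, a sample-size-weighted average of locally trained parameters (in the style of federated averaging). It is an assumption made for illustration, not the protocol of any cited study: the function name, the three hypothetical sites and their parameter vectors are all invented. Only model parameters and cohort sizes leave each site; patient-level data do not.

```python
def federated_average(site_params, site_sizes):
    """Combine per-site model parameters into one global parameter vector.

    site_params: list of equal-length parameter lists, one per institution,
                 each obtained by training locally on that site's data.
    site_sizes:  number of training samples at each institution, used as
                 the averaging weight.
    """
    if not site_params or len(site_params) != len(site_sizes):
        raise ValueError("need one sample count per site")
    n_params = len(site_params[0])
    if any(len(p) != n_params for p in site_params):
        raise ValueError("all sites must share the same model shape")

    total = sum(site_sizes)
    return [
        sum(p[j] * n for p, n in zip(site_params, site_sizes)) / total
        for j in range(n_params)
    ]


# Three hypothetical hospitals with different cohort sizes (toy values).
site_params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
site_sizes = [100, 300, 600]

global_params = federated_average(site_params, site_sizes)
print(global_params)
```

Weighting by cohort size means that a small centre contributing a rare subtype has little influence on the global model, so alternative weighting schemes may be worth considering when rare subtypes are the target.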

Prospective, multi-centre clinical trials, designed with frameworks like PROBAST-AI (Prediction model Risk Of Bias Assessment Tool for AI), are essential to establish clinical validity and utility.63 Emerging applications include AI-driven therapeutic development: by linking multimodal features to drug sensitivity (eg, IC50 values for cisplatin),35 models could identify novel targets or repurpose existing drugs. Furthermore, addressing ethical and fairness considerations by ensuring model training on diverse, representative populations is crucial to prevent algorithmic bias and ensure equitable benefits across different patient demographics.64 Finally, prospective validation should extend to the integration of MMAI into early screening programmes,41 which remains essential for establishing its utility and improving patient outcomes.

Conclusion

MMAI represents a paradigm shift in OC pathology, integrating histopathological images, multi-omics and clinical information. Computational advances have enabled finer clinical stratification, with the aim of individualising treatment and improving survival. Challenges regarding data heterogeneity, interpretability and clinical integration remain. Emerging technologies like self-supervised and federated learning offer pathways forward. As MMAI evolves, rigorous prospective validation and seamless integration into clinical workflows will be paramount. MMAI is poised to advance precision oncology, ultimately transitioning OC toward a manageable chronic condition.
