The Limits of Transparency: Racial Disparities in Breast Cancer Diagnosis Using an Intrinsically Explainable AI Model

Research Article

  • Kolade Folami *

Monroe University, Bronx, New York, United States.

*Corresponding Author: Kolade Folami, Monroe University, Bronx, New York, United States.

Citation: Kolade Folami (2026). The Limits of Transparency: Racial Disparities in Breast Cancer Diagnosis Using an Intrinsically Explainable AI Model. Journal of Cancer Management and Research, BioRes Scientia publishers. 3(1):1-12. DOI: 10.59657/2996-4563.brs.26.024

Copyright: © 2026 Kolade Folami. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Received: January 23, 2026 | Accepted: February 05, 2026 | Published: February 13, 2026

Abstract

Transparency is often presented as a core ethical safeguard in the use of artificial intelligence for healthcare decision-making. Intrinsically explainable models, such as logistic regression, are widely assumed to promote fairness because their logic is visible and interpretable by design. This study interrogates that assumption by examining racial disparities in self-reported breast cancer diagnosis using a transparent, glass-box modelling approach. Drawing on pooled National Health and Nutrition Examination Survey data from 2011 to 2018, survey-weighted logistic regression was used to assess breast cancer diagnosis among non-Hispanic Black and non-Hispanic White women aged 40 years and older. Guided by Fundamental Cause Theory, the analysis evaluated whether transparency alone is sufficient to address equity concerns in clinical prediction. Black women exhibited significantly lower odds of reporting a breast cancer diagnosis despite higher burdens of obesity, diabetes, and hypertension. Adjustment for socioeconomic factors, particularly educational attainment, substantially attenuated the racial disparity, while cardiometabolic factors did not. Education emerged as the strongest and most consistent predictor of diagnosis across all models. Interaction analyses showed no evidence that transparent race-specific effects improved explanatory power. These findings suggest that intrinsically explainable models can faithfully reflect structural inequities embedded in healthcare access and detection without correcting them. While the model was fully transparent, its explanations described who is diagnosed rather than who is diseased. The study demonstrates the limits of transparency as a proxy for fairness and highlights the need to align explainable AI with equity-conscious frameworks when deploying clinical decision support systems.


Keywords: explainable artificial intelligence; breast cancer diagnosis; racial disparities; transparency; fundamental cause theory; health equity

Introduction

In the discourse on intrinsic interpretability and post-hoc explainability within explainable artificial intelligence [XAI] studies, transparency stands as an ethical ideal. However, there have been concerns about the explainability of AI in healthcare, given the high stakes involved in healthcare decision-making and the need to strengthen the confidence of both clinicians and patients [1]. The regulatory ecosystem for AI in healthcare is still evolving, with different frameworks addressing different aspects of the technology. It has been noted that AI algorithms inherently present quality-evaluation difficulties for outside regulators because, among other things, their predictive accuracy depends on the training datasets and model parameters [2]. This opacity of internal workings to external entities is why AI models and methods are often characterized as “black boxes”. It has also been pointed out that several AI methods are poorly reported, with a tendency to overinflate their predictive abilities through spin and hype [3].

Broadly speaking, XAI has been justified on the grounds that it provides explanations for a system’s output, creates a basis for responding to control vulnerabilities, identifies uncertainties, facilitates system improvement through an understanding of how outputs are generated, and enables discovery-oriented explanations of novel facts, among others [4].

This study examines the use of an AI model that has been identified as intrinsically explainable, and thus interpretable, for the prediction of breast cancer among African American women. The core issue is whether the transparency of an intrinsically explainable model translates into fairness in predicting breast cancer among African American women. It is noteworthy that breast cancer has been identified as the most common cancer affecting women globally, with around 2.3 million new cases in 2022 [5]. Intrinsically interpretable statistical learning models, such as logistic regression, have been identified as a category of transparent artificial intelligence deployed in clinical decision support. This study therefore evaluates explainability at the level where transparency, or “the glass-box conditionality,” is strongest.

Research Objectives

The primary objective of this study is to evaluate racial disparities in self-reported breast cancer diagnosis among African American women in the US using NHANES data, employing logistic regression as an explainable AI methodology. The secondary objectives are:

  1. To examine how structural determinants affect diagnosis odds among Black women.
  2. To demonstrate whether logistic regression, as an intrinsically explainable AI, can guarantee fairness in clinical decision support. 

Review of Literature

The debate on the properties of explainability in Artificial Intelligence models is not settled. Furthermore, there is a tendency to conflate interpretability with explainability, while others identify differences between the constructs, even though one (explainability) flows from the other (interpretability). Farah et al. [6] noted that while explainability progresses from interpretability, the lack of consensus on how interpretability should be evaluated, combined with the fact that an AI’s specificity and use case determine whether explainability is necessary, should be factored into the discourse. Consequently, it is suggested that the predictive performance of the model should trump explainability where the two are mutually exclusive or where explainability increases costs [6]. Such costs of explainability and interpretability may include increased regulatory assessment and accountability requirements for stakeholders regarding decisions made by AI-based medical devices. Hence, for Farah et al. [6], an acceptable standard of explainability is context-dependent and based on the inherent risks embedded in clinical situations.

The downsides and unforeseen consequences of explainability in AI are explored by Abedin [4]. There, explainability is viewed in terms of a model’s internal dynamics that facilitate human interpretability. Not unlike Farah et al. [6], Abedin [4] also noted that explainability depends on algorithm typology, while emphasizing the problematic complexity of XAI. Furthermore, the lack of generalizability of explainability components and constructs is identified as an impediment to standardization. Thus, it was suggested that there is no universal approach to implementing explainability in AI, owing to system environment requirements as well as the internal dynamics of algorithms [4].

The conceptual difference between explainable AI and interpretable AI is the concern of Sreeja et al. [7]. The work noted that explainable AI is largely concerned with major deep learning-based models and the extent to which they can explain their decision-making process, while interpretable AI is concerned with the linkages between input variables, predictions, and the rationale behind the model’s output [7]. Resultantly, explainability is linked to deep-learning models, while interpretability is attributed to many machine learning models due to their intrinsic properties. However, given Farah et al. [6]’s earlier cited stance that the possibility of explainability stems from interpretability, a clear-cut attribution of interpretability and explainability to different algorithmic typologies may be unrealistic. Indeed, it has been suggested that explainability can be achieved via both inherently interpretable algorithms and black-box algorithms augmented with a layer of explanation algorithms [8]. The value of explainability to clinical decision support systems, as argued by Amann et al. [8], depends on algorithm specifics and on the level of validation as determined by the environmental contexts in which explainable algorithms are implemented, the role allocated to them in the decision-making process, and the core user groups. Not unlike the previous studies already referred to, Amann et al. [8] observed that the role of explainability cannot be settled theoretically and that, at the clinical level, there should be an individualized assessment of explainability requirements.

Explainability discourse has been anchored on the argument of transparency and the algorithmic trust valued by clinicians. Ploug et al. [9] noted that the aim of explainability is transparency, but that pursuing it may trade off against performance optimality. Therefore, it is argued that a balance be struck between the transparency ideal and system performance, given the perceived potential conflict between AI explainability/transparency and the system’s performance [9]. For Markus et al. [10], interpretability and fidelity are necessary for explainability, but the lack of homogeneity of constructs and ill-defined terminologies remains the bane of the explainability discourse in health AI. Furthermore, it is suggested that despite the promise of increased trustworthiness from explainable modelling, the benefits of explainability are not well understood or established [10]. Consequently, it is submitted that a task model intrinsically interpretable to humans, in the form of explainable modelling, is the ideal for satisfying the requirements of transparency and trustworthiness.

Arguments against explainability requirements can be found in Kawamleh [11], McCradden and Stedman [12], and McCoy et al. [13]. For Kawamleh [11], the argument that the opacity of the decision-making process of AI applications justifies explainability has weak foundations when juxtaposed with the fact that, in clinical settings, human experts are also ‘black boxes’ whose diagnostic decision-making processes remain inscrutable and yet meet the legal requirements for patients’ informed consent. Thus, from an ethical standpoint, the requirement of informed consent is not breached by the opacity of specific AI applications, and therefore, the requirement for explainability or explainable modelling is not necessary for medical AI applications [11]. McCradden and Stedman [12] suggested that explanations in AI are not sufficient and may not be necessary for clinical decision-making due to the dual aim of care and legal defensibility. Hence, for both explainability typologies, that is, inherent interpretability of models and post-hoc explainability, it is argued that explanations as a standalone construct should not be the determinant of AI applications in clinical decision-making because explainability focuses on the tool and not the patient. Instead, emphasis should be on reasonableness. For McCoy et al. [13], the value of explainability in machine learning for healthcare, though helpful for advancing performance and trust, should not be elevated above performance. Instead, it is suggested that as algorithmic systems become increasingly inexplicable, there is a need for the development of empirically grounded, robust standards of evaluation that will enhance both trustworthiness and optimal performance.

In addition to the above, complex and contradictory explanations have been identified as capable of undermining trust in clinical decision-support AI. In contrast, trust in AI is not seen as entirely beneficial due to the potential fallibility of its recommendations [14]. 

Explainable AI in Breast Cancer Diagnosis

The studies on the application of explainable AI in breast cancer diagnosis mostly emphasize the potential of explainable AI to enhance diagnosis and early detection, while simultaneously underscoring present drawbacks, like the lack of standardization of the components or expectations of explainability. Shifa et al. [15] systematically reviewed the application of explainable AI techniques in breast cancer screening. The review showed that the challenges of mammography, like interpretational variability among radiologists and the tendency toward false positives, motivated the adoption of explainable AI techniques in breast cancer screening. However, the potential impact of XAI techniques in breast cancer screening is negatively affected by the small number of validated metrics or rubrics for evaluating quality, clinical reliability, and the impact of AI-assisted decision-making. Specifically, heavy reliance on qualitative assessment, owing to the absence of integration of XAI outputs into radiologists’ decision-making, and the unavailability of standardized metrics for computing clinical relevance combine to keep XAI in breast cancer screening at suboptimal levels.

Increased complexity of AI models has been associated with lower interpretability, leading clinicians to distrust models in both the prognosis and diagnosis of breast cancer [16]. Consequently, clinicopathological data were suggested as an explainability approach for the prediction of breast cancer metastasis, with the understanding that AI models are usually biased towards the majority class within the datasets. Ansari et al. [17] also identified the trade-off between interpretability and model complexity as a challenge to trustworthiness. However, the work also showed that post-hoc explainability tools like Gradient-weighted Class Activation Mapping [Grad-CAM] hold particular promise for communicating visual interpretability for imaging data, while others like Local Interpretable Model-agnostic Explanations [LIME] and SHapley Additive exPlanations [SHAP] are reported to be more effective for explaining outputs from structured genomic and clinical data [17]. Despite the demonstrated promise of these post-hoc explainability modules, Ansari et al. [17] are quick to point out that current XAI models have not been rigorously evaluated for effectiveness in breast cancer diagnosis, and that there is a gap between real-world clinical validation and use cases derived from academic studies that largely rely on prototypical datasets.

Deep-learning-based AI systems have been described as showing significant promise for breast cancer detection [5,18]. However, while Diaz et al. [18] saw the models as capable of enhancing detection and eliminating observer variability, with areas of improvement limited to the creation of standardized guidelines and the implementation of trustworthy AI practices to ensure fairness, transparency, and robustness, Bae and Ham [5] observed that deep-learning-based AI in breast cancer diagnostics produces outputs without lesion localization or visual explanations that are clinically meaningful. Specifically, Bae and Ham [5] found that a heatmap is the most common type of explanation in most deep-learning models. The shortcoming of this type of explanation means that AI models in breast cancer diagnosis need to be enhanced with a process that explains the specific contributions to the overall decision-making process.

Perhaps the shortcomings of current applications of AI to breast cancer diagnosis, for instance, the lack of reproducibility of models across different clinical environments, the absence of evidentiary standards, concerns about the technology, trustworthiness, post-adoption uncertainty, and ethico-legal concerns [19], may have prompted the call for hybridity in deploying XAI for breast cancer diagnosis. Zou and Miao [20] suggest a hybrid deep-learning framework that combines Grad-CAM with different pretrained Convolutional Neural Network (CNN) architectures. The proposed hybrid integration seeks to increase performance via feature representation and classification, while Grad-CAM addresses the ubiquitous black-box issue by localizing multiple lesions, thereby promoting transparency and clinical acceptance. Contrastingly, Oberije et al. [21] proposed a workflow design framework that encapsulates clinical evidence from whole populations, granulated with stratified analysis across the diverse subgroups within the population, and combined with AI reading of observations as a second layer to the standard double-reading by two human readers. The work observed that AI-human double-reading did not exacerbate performance disparities, while human double-reading showed discrepancies in outcome metrics for different subgroups within the population.

Statement of Problem

The literature review has shown that the integration of AI into clinical decision-making processes in general, and breast cancer diagnosis in particular, is premised on the promise of increased diagnostic accuracy and process efficiency. Hence, ethical and practical adoption of AI systems in breast cancer diagnosis requires explainability, meaning the ability of human users and decision-makers to understand and trust an AI’s decision-making process and output. Despite this, the current literature reveals a lack of consensus on the components of explainability, the accepted standards of evaluation, and cost-benefit justification in clinical settings. Specifically, the literature identifies the following unresolved issues:

  1. Conceptual and terminological confusion, in the form of a lack of distinction between explainability and interpretability, with some studies viewing one construct as following from the other, while others tie them to either deep learning or machine learning models. Accordingly, machine learning-derived models like linear and logistic regression are viewed as intrinsically interpretable and transparent in comparison to deep-learning models, which usually need post-hoc explainability modules.
  2. Context dependency and a lack of standards, in that the rationales for explainability are context-dependent. Furthermore, clinical risk, algorithmic typology, and user-group specificity are key contributors to variability in explainability constructs. Resultantly, there are no validated and standardized measures for evaluating clinical reliability in breast cancer diagnosis.
  3. A performance-explainability trade-off, owing to the inherent tension between explainability requirements and the necessity of high predictive performance of models.
  4. The failure to provide meaningful explanations that integrate into radiologists' workflows, due to the qualitative nature of current XAI methods. As a result, the gap between academic prototypes and clinical validation in breast cancer diagnosis remains unbridged.

Stemming from the above, the studies reviewed place higher explainability requirements on deep-learning models than on machine learning models in clinical practice and breast cancer diagnostics. This is due to the intrinsic interpretability of machine learning models and their overall identification with inherent transparency and, therefore, explainability. There is a need to examine the impact of intrinsically transparent and interpretable models on predicting breast cancer among minority women in a nationally representative sample, and their tendency to reinforce pre-existing societal realities embedded in the available datasets.

Theoretical Framework: Fundamental Cause Theory

This study draws on social science paradigms emphasizing the structural origins of health inequality, including Fundamental Cause Theory. Epidemiological studies have identified social, ecological, and population-level risk factors as the major causes of diseases. However, Fundamental Cause Theory [FCT] takes this further by challenging individual-level proximate causes and elevating the paramountcy of socio-economic status and social support as likely fundamental causes of diseases. Proposed by Link and Phelan in a 1995 paper [22], the dominant argument of FCT is, firstly, that individual-based risk factors need to be contextualized through an examination of what puts people at risk, and secondly, that social factors such as socio-economic status and social support become fundamental causes because they embody access to crucial resources and thereby affect multiple disease outcomes [22]. These disease outcomes are mediated through several mechanisms, so the association with disease persists regardless of changes in intervening mechanisms.

In further development of the FCT, Alicia R. Riley argued that even though FCT shows how several social inequalities produce health inequality, ubiquitous causes conjoin with fundamental causes, and both are intersected by social stratification [23]. Riley’s position stems from the understanding that, within the FCT framework, social inequality in access to flexible socio-economic resources such as wealth, income, education, and racial/class privilege is the driver of population health inequalities. As such, individuals endowed with high status can deploy their resources to avoid disease, find treatment, and adopt healthy lifestyles. For Riley, given the persistence of social inequalities in health despite increasing medical innovation and breakthroughs leading to the elimination of diseases, most scholars have constructed FCT with a kind of inevitability [23]. The assumption is that the durability and stability of social stratification, and not biological determinism, are the explanatory variables of health inequalities.

However, by conceptualizing social stratification as a system of exposures that straddles both ubiquitous and fundamental causes, Riley treats it as both a dynamic and an ecological trait. Specifically, ubiquitous causes are seen as the mass influences that are uniformly present in a population, in contrast to individual risk factors, and they explain the population distribution of diseases and their incidence rate [23]. Therefore, given the positioning of stratification systems at the intersection of ubiquitous causes and fundamental causes, they can reveal both the factors behind population incidence rates and the causes of health inequality. In this way, stratification systems can be thought of as both ubiquitous causes and fundamental causes, while fundamental causes can be reframed as non-static population-level systems of exposure.

Materials and Methods

Study Design and Data Source: This study employed a cross-sectional analytic design using pooled data from the National Health and Nutrition Examination Survey (NHANES) across four continuous survey cycles (2011–2012, 2013–2014, 2015–2016, and 2017–2018). NHANES is a nationally representative survey of the noninstitutionalized U.S. civilian population conducted by the National Center for Health Statistics using a complex, multistage probability sampling design that incorporates stratification, clustering, and oversampling of selected populations. Publicly available demographic, medical conditions, and examination data files were merged by the unique respondent identifier (SEQN). Analyses followed NHANES analytic guidelines for pooled cycles and survey-weighted inference.
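For concreteness, the pooling step can be sketched in R. This is a minimal sketch rather than the exact analysis code: it assumes the nhanesA package's nhanes() downloader and the cycle-suffixed public table names (DEMO_G through DEMO_J, and so on), and that numeric response codes are returned (recent nhanesA versions may return translated labels instead).

```r
# Minimal sketch: download four NHANES cycles and merge files by SEQN.
library(nhanesA)
library(dplyr)

cycles <- c("G", "H", "I", "J")  # suffixes for 2011-2012 through 2017-2018

pooled <- bind_rows(lapply(cycles, function(s) {
  demo <- nhanes(paste0("DEMO_", s))  # demographics and survey design variables
  mcq  <- nhanes(paste0("MCQ_",  s))  # medical conditions questionnaire
  bmx  <- nhanes(paste0("BMX_",  s))  # body measures (BMI)
  diq  <- nhanes(paste0("DIQ_",  s))  # diabetes questionnaire
  bpq  <- nhanes(paste0("BPQ_",  s))  # blood pressure questionnaire
  demo |>
    left_join(mcq, by = "SEQN") |>
    left_join(bmx, by = "SEQN") |>
    left_join(diq, by = "SEQN") |>
    left_join(bpq, by = "SEQN")
}))
```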

Study Population

The analytic sample was restricted to respondents who met the following criteria:

  • Female sex (RIAGENDR = 2)
  • Aged 40 years or older
  • Self-identified as non-Hispanic Black or non-Hispanic White

Race/ethnicity was derived from the RIDRETH1 variable in NHANES. Participants identifying with other racial or ethnic groups or with missing race data were excluded to allow a focused Black–White comparison. After exclusions, the final analytic sample represented U.S. non-Hispanic Black and non-Hispanic White women aged 40 years and older.

The analytic sample consisted of women aged ≥40 years from pooled NHANES 2011–2018 cycles who self-identified as non-Hispanic Black or non-Hispanic White (unweighted n = 4,722; Black n = 1,812; White n = 2,910). After applying the 8-year Mobile Examination Center [MEC] weights, the sample represented approximately 62.7 million U.S. women, including 8.84 million Black women and 53.88 million White women. The outcome was a self-reported history of breast cancer diagnosis, constructed using NHANES Medical Conditions Questionnaire items indicating any cancer diagnosis and specification of breast cancer.
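The restriction step can be sketched as follows, assuming the standard NHANES demographic codes (RIAGENDR 2 = female; RIDRETH1 3 = non-Hispanic White, 4 = non-Hispanic Black) and the `pooled` data frame from the sketch above:

```r
# Minimal sketch: restrict to the analytic sample and define the exposure.
analytic <- pooled |>
  filter(
    RIAGENDR == 2,           # women
    RIDAGEYR >= 40,          # aged 40 years or older
    RIDRETH1 %in% c(3, 4)    # non-Hispanic White (3) or non-Hispanic Black (4)
  ) |>
  mutate(black = as.integer(RIDRETH1 == 4))  # 1 = Black, 0 = White (reference)
```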

Outcome Measure: Breast Cancer Diagnosis

The primary outcome was self-reported lifetime breast cancer diagnosis, derived from the NHANES Medical Conditions Questionnaire (MCQ).

First, respondents who reported ever being told by a health professional that they had cancer were identified using MCQ220. Among these respondents, breast cancer diagnosis was defined as reporting “Breast” cancer in any of the following variables:

  • MCQ230A (first cancer)
  • MCQ230B (second cancer)
  • MCQ230C (third cancer)
  • MCQ230D (more than three cancers)

A binary outcome variable was constructed (breast_cancer_dx), coded as 1 for respondents reporting breast cancer and 0 otherwise. This approach captures lifetime prevalence while aligning with NHANES’ questionnaire structure.
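A sketch of this outcome construction follows, assuming the `analytic` data frame above and that the MCQ230 items carry NHANES's numeric cancer-type codes (breast cancer is code 14 in the public codebook; translated files may instead return the label "Breast"):

```r
# Minimal sketch: binary lifetime breast cancer diagnosis indicator.
breast_code <- 14  # assumed MCQ230x numeric code for breast cancer

analytic <- analytic |>
  mutate(
    breast_cancer_dx = as.integer(
      MCQ220 %in% 1 &                       # ever told they had cancer
        (MCQ230A %in% breast_code | MCQ230B %in% breast_code |
         MCQ230C %in% breast_code | MCQ230D %in% breast_code)
    )                                       # 1 = breast cancer, 0 otherwise
  )
```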

Primary Exposure: Race

Race was modelled as a binary exposure variable:

  • Non-Hispanic Black women coded as 1 (black = 1)
  • Non-Hispanic White women coded as 0 (reference category)

Covariates

Covariates were selected on the basis of theoretical relevance and consistent availability across survey cycles; a recoding sketch follows the covariate lists below.

Demographic Covariate

  • Age (RIDAGEYR), modelled as a continuous variable in years

Socioeconomic Covariates

  • Poverty–Income Ratio (PIR) (INDFMPIR), modelled as a continuous variable
  • Educational attainment, derived from DMDEDUC2 (adults ≥20 years), categorized as:
    • <9th grade
    • 9–11th grade
    • High school/GED
    • Some college/Associate degree
    • College graduate

Clinical Covariates

  • Diabetes status, derived from DIQ010 (yes/no)
  • Hypertension status, derived from BPQ020 (yes/no)
  • Body Mass Index (BMI), obtained from examination data (BMXBMI), modelled as a continuous variable
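As referenced above, the covariate recoding can be sketched as follows, assuming the standard NHANES response codes (DMDEDUC2 categories 1–5; DIQ010 and BPQ020 with 1 = yes, 2 = no). Borderline, refused, and don't-know responses are set to missing in this sketch:

```r
# Minimal sketch: derive education, diabetes, and hypertension covariates.
analytic <- analytic |>
  mutate(
    education = factor(DMDEDUC2, levels = 1:5,
                       labels = c("<9th grade", "9-11th grade",
                                  "High school/GED", "Some college/AA",
                                  "College graduate")),
    diabetes = case_when(DIQ010 == 1 ~ 1L,          # yes
                         DIQ010 == 2 ~ 0L,          # no
                         TRUE        ~ NA_integer_),  # borderline/refused/don't know
    hypertension = case_when(BPQ020 == 1 ~ 1L,
                             BPQ020 == 2 ~ 0L,
                             TRUE        ~ NA_integer_)
  )
```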

Survey Design and Weighting

To account for the complex NHANES sampling design and pooled survey cycles, eight-year examination weights were calculated by dividing the 2-year MEC weights (WTMEC2YR) by four. All analyses incorporated primary sampling units (SDMVPSU), stratification variables (SDMVSTRA), and adjusted survey weights. 
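A sketch of the weighting and design specification, following the NHANES guidance described above (public-file variable names; `analytic` as constructed in the preceding sketches):

```r
# Minimal sketch: 8-year weights and the survey design object.
library(survey)

analytic$wt8yr <- analytic$WTMEC2YR / 4  # four pooled 2-year cycles

des <- svydesign(
  ids     = ~SDMVPSU,    # primary sampling units
  strata  = ~SDMVSTRA,   # stratification variables
  weights = ~wt8yr,      # adjusted 8-year weights
  nest    = TRUE,        # PSUs nested within strata
  data    = analytic
)
```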

Software

All analyses were conducted using RStudio with R (version 4.x). Key packages included survey, dplyr, janitor, broom, and nhanesA. Survey design objects were specified using the survey package in R to ensure nationally representative estimates and valid variance estimation.

Results

Descriptive Statistical Analysis

Survey-weighted descriptive statistics were calculated to characterize the study population overall and by race. Continuous variables were summarized with means and standard errors, while categorical variables were summarized with weighted proportions. Design-based t-tests were used to assess racial differences in continuous variables, while Rao–Scott adjusted chi-square tests were used for categorical variables. These analyses evaluated racial differences in demographic, socioeconomic, and clinical characteristics, as well as breast cancer prevalence.
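These computations can be sketched with the survey package, assuming the design object `des` and derived variables from the Methods sketches:

```r
# Minimal sketch: weighted means by race, a design-based t-test,
# and a Rao-Scott adjusted chi-square test.
svyby(~RIDAGEYR + INDFMPIR + BMXBMI, ~black, des, svymean, na.rm = TRUE)

svyttest(RIDAGEYR ~ black, des)    # continuous variable by race

svychisq(~education + black, des)  # Rao-Scott adjusted test for categories
```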

Table 1:  Survey-Weighted Characteristics of Women Aged ≥40 Years by Race

| Characteristic | Black | White | Design-based test |
|---|---|---|---|
| Weighted population | 8,836,442 | 53,875,528 | |
| Age, mean (SE), years | 56.46 (0.41) | 59.62 (0.26) | t = 8.47, p < 0.001 |
| Poverty–income ratio, mean (SE) | 2.38 (0.08) | 3.35 (0.07) | t = 10.34, p < 0.001 |
| BMI, mean (SE), kg/m² | 32.93 (0.26) | 29.53 (0.22) | t = −13.14, p < 0.001 |
| Breast cancer diagnosis, % (SE) | 2.89 (0.40) | 5.48 (0.49) | F = 18.97, p < 0.001 |
| Diabetes, % (SE) | 21.02 (1.07) | 11.80 (0.67) | F = 67.36, p < 0.001 |
| Hypertension, % (SE) | 58.58 (1.45) | 43.64 (1.08) | F = 96.91, p < 0.001 |

Source: NHANES 2011–2018

Table 2: Education Distribution (within-race proportions)

| Education level | Black | White |
|---|---|---|
| <9th grade | 4.33% | 1.83% |
| 9–11th grade | 13.38% | 7.25% |
| High school/GED | 24.31% | 23.12% |
| Some college/AA | 34.66% | 34.22% |
| College graduate | 23.32% | 33.59% |

Rao–Scott χ²: F = 16.03, p < 0.001

Tables 1 and 2 show that Black women were younger, had a lower socioeconomic position, higher BMI, and a substantially higher prevalence of diabetes and hypertension than White women. Despite this higher cardiometabolic burden, Black women reported approximately half the prevalence of breast cancer diagnosis compared with White women.

Logistic Regression Analyses

All models accounted for NHANES complex survey design using pooled MEC weights. All multivariable analyses used survey-weighted logistic regression (svyglm) with a quasibinomial family. Logistic regression was intentionally selected as an intrinsically explainable modeling approach, allowing direct interpretation of coefficients, odds ratios, and interaction terms without reliance on post hoc explainability approaches. This modeling strategy is in consonance with contemporary definitions of transparent or glass-box AI, in which model logic and process are fully observable and interpretable by design.
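A sketch of this nested modelling strategy, covering the three models reported below (assuming `des` and the derived variables from the Methods sketches):

```r
# Minimal sketch: survey-weighted logistic regression (quasibinomial family).
m1 <- svyglm(breast_cancer_dx ~ black + RIDAGEYR,
             design = des, family = quasibinomial())

m2 <- svyglm(breast_cancer_dx ~ black + RIDAGEYR + INDFMPIR + education,
             design = des, family = quasibinomial())

m3 <- svyglm(breast_cancer_dx ~ black + RIDAGEYR + INDFMPIR + education +
               diabetes + hypertension + BMXBMI,
             design = des, family = quasibinomial())

exp(cbind(OR = coef(m3), confint(m3)))  # odds ratios with 95% CIs
```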

Model 1: Baseline Transparency Model

The first model estimated the association between race and breast cancer diagnosis, adjusting only for age.

Predictors

  • Race (categorical: Black vs White)
  • Age (continuous)

Table 3: Model 1 – Age-Adjusted Logistic Regression

Outcome: Self-reported breast cancer diagnosis

| Predictor | OR | 95% CI | p-value |
|---|---|---|---|
| Black (vs White) | 0.59 | 0.43–0.81 | 0.002 |
| Age (per year) | 1.06 | 1.04–1.07 | <0.001 |

Source: NHANES 2011–2018

Interpretation: After adjusting for age, Black women had 41% lower odds of reporting a breast cancer diagnosis. 
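This percentage is the standard conversion of an odds ratio below one into a reduction in odds: (1 − 0.59) × 100% = 41%.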

Model 2: Socioeconomic Adjustment Model

The second model evaluated whether racial disparities persisted after adjustment for socioeconomic position.

Predictors

  • Race
  • Age
  • Poverty–income ratio
  • Educational attainment

Table 4: Model 2 – Adjustment for Socioeconomic Status and Education

| Predictor | OR | 95% CI | p-value |
|---|---|---|---|
| Black (vs White) | 0.73 | 0.52–1.02 | 0.070 |
| Age (per year) | 1.06 | 1.05–1.08 | <0.001 |
| Poverty–income ratio | 1.06 | 0.92–1.22 | 0.397 |
| Education: 9–11th grade | 2.09 | 0.88–4.95 | 0.101 |
| Education: HS/GED | 3.54 | 1.70–7.40 | 0.001 |
| Education: Some college/AA | 3.26 | 1.56–6.80 | 0.003 |
| Education: College graduate | 4.89 | 2.25–10.70 | <0.001 |

Source: NHANES 2011–2018

Interpretation: Adjustment for education substantially attenuated the racial disparity. Educational attainment showed a strong positive gradient with diagnosis, independent of income.

Model 3: Full Clinical Adjustment Model

The third model further adjusted for clinical risk factors.

Predictors

  • Race
  • Age
  • Poverty–income ratio
  • Educational attainment
  • Diabetes
  • Hypertension
  • BMI

Table 5: Model 3 – Full Adjustment Including Cardiometabolic Factors

| Predictor | OR | 95% CI | p-value |
|---|---|---|---|
| Black (vs White) | 0.73 | 0.50–1.06 | 0.103 |
| Age (per year) | 1.07 | 1.05–1.08 | <0.001 |
| Poverty–income ratio | 1.08 | 0.93–1.26 | 0.318 |
| Diabetes | 1.52 | 0.91–2.52 | 0.115 |
| Hypertension | 0.97 | 0.65–1.45 | 0.889 |
| BMI (per kg/m²) | 1.00 | 0.97–1.02 | 0.762 |
| Education: HS/GED | 4.08 | 2.00–8.31 | <0.001 |
| Education: Some college/AA | 3.50 | 1.75–7.01 | <0.001 |
| Education: College graduate | 5.66 | 2.60–12.30 | <0.001 |

Interpretation: Cardiometabolic conditions and BMI did not explain the racial disparity. Education remained the dominant predictor of reported diagnosis.

Model 4: Interaction Models (Limits of Transparency)

To examine whether transparent interaction terms enhanced explanatory power, three interaction models were estimated separately:

  • Model 4a: Race × Educational Attainment
  • Model 4b: Race × Diabetes
  • Model 4c: Race × BMI

Each interaction model retained all main effects included in Model 3. These models assessed whether allowing race-specific slopes improved the explanation of breast cancer diagnosis disparities.
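These race-specific slopes can be sketched as interaction terms added to the Model 3 specification (same assumptions as the earlier sketches):

```r
# Minimal sketch: effect-modification models retaining Model 3 main effects.
m4a <- svyglm(breast_cancer_dx ~ black * education + RIDAGEYR + INDFMPIR +
                diabetes + hypertension + BMXBMI,
              design = des, family = quasibinomial())

m4b <- svyglm(breast_cancer_dx ~ black * diabetes + RIDAGEYR + INDFMPIR +
                education + hypertension + BMXBMI,
              design = des, family = quasibinomial())

m4c <- svyglm(breast_cancer_dx ~ black * BMXBMI + RIDAGEYR + INDFMPIR +
                education + diabetes + hypertension,
              design = des, family = quasibinomial())

summary(m4a)  # interaction coefficients test race-specific slopes
```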

Table 6: Interaction Analyses (Effect Modification Tests)

| Interaction | OR | 95% CI | p-value |
|---|---|---|---|
| Black × Education (all levels) | | | >0.25 (all) |
| Black × Diabetes | 1.15 | 0.50–2.66 | 0.739 |
| Black × BMI | 0.99 | 0.94–1.05 | 0.791 |

Interpretation: There was no statistical evidence of effect modification by education, diabetes, or BMI. Associations were directionally similar across racial groups.

Discussion

In a nationally representative sample of U.S. women aged ≥40 years, Black women exhibited a markedly lower prevalence and lower odds of self-reported breast cancer diagnosis than White women, despite a higher prevalence of obesity, diabetes, and hypertension. This disparity was substantially attenuated after adjustment for educational attainment, while cardiometabolic factors did not explain the difference. Education emerged as the most consistent predictor of breast cancer diagnosis across all models.

These findings do not suggest a protective biological effect of Black race. Rather, they are consistent with differential diagnosis and detection across social strata. Indeed, biological and genetic factors have been shown to influence breast cancer outcomes, and Black women are usually at a disadvantage with respect to these factors, often recording higher mortality as a result [24].

Indeed, Jatoi et al. [25] submitted that Black women have a higher likelihood than White women of lacking adequate health insurance, thereby curtailing their access to screening. Furthermore, Wilkerson et al. [26] noted that several studies in the United States have established that a combination of racial and socio-economic disparities exists in the use of breast cancer screening, to the extent that Black and Hispanic women had lower odds of using mammographic screening when compared to Caucasian women. The present findings therefore reinforce these previous studies.

Higher educational attainment is suggestive of greater engagement with preventive care, diagnostic services, and the continuity of healthcare interactions that increase the probability of receiving and reporting a diagnosis. This is in line with Wilkerson et al.'s [26] finding that 80.4% of women with a college degree had up-to-date breast cancer screening, compared with 63% of those with less than a high school diploma.

The coexistence of higher disease risk factors but lower diagnosis prevalence among Black women points to structural inequities in cancer detection rather than true differences in underlying disease burden. The findings reinforced the prevailing social patterns and inequalities. 

Explainable AI Implications

This study illustrates a critical issue for explainable and transparent predictive modeling in healthcare: even fully interpretable models can encode structural bias when trained on socially patterned outcomes.

Logistic regression coefficients clearly indicate that education is a dominant predictor. However, this “explanation” reflects who is diagnosed, not necessarily who has the disease. Thus, transparency alone does not guarantee fairness. If deployed in clinical decision support, such models risk reinforcing underdiagnosis among marginalized groups while appearing methodologically sound.

Diagnosis does not explain the overall diseased population but largely acts as a pointer to those who have been diagnosed. Importantly, Wilkerson et al.'s [26] observation that the breast cancer incidence rate is higher among Black women under 45 years, and that Black, American Indian, Alaska Native, and Hispanic women below age 50 are often diagnosed at later stages, has two significant implications. Firstly, there will be a disproportionate and negative impact on racial and ethnic minorities due to stifled access to care. Secondly, and relevant to this study, a significant at-risk population is not included in the dataset, as the sample consists only of women who are at least 40 years old. This further reinforces the position that diagnosis does not explain the overall diseased population. The logistic regression's transparency in clearly identifying education as a dominant predictor of breast cancer diagnosis among Black women reflects only the diagnosed, not the diseased, and therefore does not guarantee fairness in clinical decision support.

Theoretical Implications

The encoding and reinforcing of racial disparities in breast cancer diagnosis by the intrinsically explainable logistic regression in this study largely supports the postulation of Fundamental Cause Theory. Reflecting on the issues that gave rise to the FCT, its originators, Link and Phelan, retrospectively observed that the poor members of society, those disadvantaged via racial and class hierarchies, and those living in the stigmatized half of an us-versus-them divide, often live in unsalutary health conditions and die younger than their counterparts [22]. This reflection led to their observation that the existence of health inequalities across time and space, and the patterns of social disadvantage experienced, raise the question of why a variety of health inequalities regularly emerge, given that the major risk factors and diseases that accounted for inequalities in earlier times have been largely eradicated in the developed world.

Persistent racial disparities despite full covariate adjustment and explicit interaction terms were interpreted as evidence of the limits of transparency in intrinsically explainable models, highlighting the distinction between interpretability and equity. This position reinforces Alicia R. Riley's reframing of the FCT as a system of exposure originating from deep-seated stratification systems that often enable those at the advantaged tier of the social hierarchy to deploy flexible resources like education to avoid disease or seek treatment, while those at the disadvantaged tier are exposed to increased health inequalities [23].

Conclusion

This study demonstrates that transparency, while ethically appealing and methodologically valuable, has clear limits when detached from broader questions of equity and structural context. Using an intrinsically explainable logistic regression model, this study showed that racial disparities in breast cancer diagnosis among U.S. women persist not because of opaque model logic, but because the outcomes being modelled are themselves shaped by long-standing social inequalities in education, access, and healthcare engagement. Although the model offers clear and interpretable explanations, those explanations largely reflect patterns of diagnosis rather than true disease burden, thereby risking the reinforcement of underdiagnosis among marginalized groups. These findings revealed that interpretability alone does not guarantee fairness in clinical decision support and that transparent models can still reproduce structural bias if underlying data encode unequal access to care. Aligning with Fundamental Cause Theory, the results of this present study highlight the need to move beyond transparency as a standalone ethical safeguard and toward AI governance frameworks that explicitly address social stratification, data representativeness, and equity-driven validation in healthcare applications.

Strengths and Limitations of the Study

Strengths

A major strength of this study lies in its use of nationally representative NHANES data pooled across multiple survey cycles. This provides robust population-level inference and ensures that the findings are not confined to a single region, healthcare system, or clinical cohort. The ability to generalize results to U.S. women aged 40 years and older strengthens both the epidemiological relevance and the policy implications of the study.

Equally important is the rigorous handling of NHANES’ complex survey design. By explicitly accounting for stratification, clustering, and survey weights, the analyses produce valid variance estimates and unbiased population parameters. This methodological rigour is particularly critical when examining racial disparities, where failure to incorporate survey design can lead to misleading conclusions.

The study’s deliberate choice of logistic regression as an intrinsically explainable model represents another key strength. Unlike black-box approaches, the modelling strategy allows full transparency of model logic, coefficient interpretation, and decision pathways. Odds ratios, confidence intervals, and interaction terms are directly interpretable, making the analytic process accessible to clinicians, policymakers, and ethicists alike. This aligns closely with the ethical motivation of the study, which interrogates transparency as a core ideal in explainable AI.

Furthermore, the explicit testing of effect modification strengthens the analytic depth of the work. By formally evaluating race-by-covariate interactions, the study moves beyond descriptive disparity assessment to interrogate whether transparent models meaningfully capture heterogeneous effects across racial groups. The absence of significant interactions is itself an informative finding, highlighting the limits of transparency in resolving structural inequities.

This study contributes to public health informatics by demonstrating that transparency in predictive modeling is not a safeguard against inequity when model outcomes reflect structurally unequal access to diagnosis. The findings challenge the prevailing assumption that explainability alone promotes fairness in clinical decision support and underscore the need for equity-aware validation frameworks.

Limitations

Despite these strengths, several limitations should be acknowledged. First, the cross-sectional nature of NHANES data precludes causal inference. The associations observed reflect patterns of diagnosis at a single point in time rather than disease development or progression. As such, temporal relationships between social determinants, clinical risk factors, and breast cancer outcomes cannot be established.

Similarly, breast cancer diagnosis was based on self-reported data. Although self-reported cancer history in NHANES is reasonably reliable, misclassification remains possible. Differential recall or reporting by race, education, or access to healthcare may further bias estimates and contribute to observed disparities.

Additionally, the study lacked direct measures of screening utilization, such as mammography history, screening frequency, or access to diagnostic follow-up. This limitation constrains the ability to empirically distinguish between true differences in disease burden and disparities driven by differential detection. As a result, education and socioeconomic position likely operate as proxies for healthcare access and screening engagement rather than direct biological risk factors.

Furthermore, the outcome reflects diagnosis rather than incidence. Diagnosis captures only those who have interacted successfully with the healthcare system and received confirmation of the disease. It does not represent the underlying pool of undiagnosed or preclinical cases. This distinction is central to the study’s core argument: even fully transparent and interpretable models may faithfully explain patterns of diagnosis while simultaneously obscuring inequities in disease detection. In this sense, the limitation is not merely methodological but substantively reinforces the study’s conclusion about the limits of transparency in explainable AI.

References