<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Exp. Biol. Med.</journal-id>
<journal-title>Experimental Biology and Medicine</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Exp. Biol. Med.</abbrev-journal-title>
<issn pub-type="epub">1535-3699</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">10341</article-id>
<article-id pub-id-type="doi">10.3389/ebm.2024.10341</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Experimental Biology and Medicine</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Leveraging AI to improve disease screening among American Indians: insights from the Strong Heart Study</article-title>
<alt-title alt-title-type="left-running-head">Rogers et al.</alt-title>
<alt-title alt-title-type="right-running-head">
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/ebm.2024.10341">10.3389/ebm.2024.10341</ext-link>
</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Rogers</surname>
<given-names>Paul</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1294828/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>McCall</surname>
<given-names>Thomas</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/2819306/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zhang</surname>
<given-names>Ying</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Reese</surname>
<given-names>Jessica</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wang</surname>
<given-names>Dong</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Tong</surname>
<given-names>Weida</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>National Center for Toxicological Research</institution>, <institution>Division of Bioinformatics and Biostatistics</institution>, <institution>U.S. Food and Drug Administration</institution>, <addr-line>Jefferson</addr-line>, <addr-line>AR</addr-line>, <country>United States</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>Department of Data Science and Data Analytics</institution>, <institution>Arkansas State University</institution>, <addr-line>Jonesboro</addr-line>, <addr-line>AR</addr-line>, <country>United States</country>
</aff>
<aff id="aff3">
<sup>3</sup>
<institution>University of Oklahoma Health Sciences Center</institution>, <institution>Department of Biostatistics and Epidemiology</institution>, <addr-line>Oklahoma City</addr-line>, <addr-line>OK</addr-line>, <country>United States</country>
</aff>
<author-notes>
<corresp id="c001">&#x2a;Correspondence: Paul Rogers, <email>paul.rogers@fda.hhs.gov</email>
</corresp>
</author-notes>
<pub-date pub-type="epub">
<day>08</day>
<month>01</month>
<year>2025</year>
</pub-date>
<pub-date pub-type="collection">
<year>2024</year>
</pub-date>
<volume>249</volume>
<elocation-id>10341</elocation-id>
<history>
<date date-type="received">
<day>09</day>
<month>08</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>16</day>
<month>12</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2025 Rogers, McCall, Zhang, Reese, Wang and Tong.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Rogers, McCall, Zhang, Reese, Wang and Tong</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>The performance of a screening test for disease is measured through sensitivity and specificity, which indicate how well the test discriminates between those with and without the condition. Typically, high values for both measures are desired. Research projects into the health of the American Indian frequently develop machine learning algorithms as predictors of conditions in this population. In essence, these models serve as <italic>in silico</italic> screening tests for disease. A screening test&#x2019;s sensitivity and specificity values, typically determined during the development of the test, describe its performance at the population level and are not affected by the prevalence of disease. A screening test&#x2019;s positive predictive value (PPV), in contrast, depends on the prevalence of the outcome. As the number of artificial intelligence and machine learning models for predicting disease outcomes grows, it is crucial to understand whether the PPVs of these <italic>in silico</italic> methods suffer as those of traditional screening tests do when the outcome has low prevalence. The Strong Heart Study (SHS) is an epidemiological study of the American Indian whose data have been used in predictive models for health outcomes. We used data from the SHS, focusing on the samples taken during Phases V and VI. Logistic regression, artificial neural network, and random forest models were used as <italic>in silico</italic> screening tests within the SHS group. Their sensitivity, specificity, and PPV were assessed for health outcomes of varying prevalence among the SHS subjects. Although accuracy and specificity remained high in these <italic>in silico</italic> screening tests, sensitivity and PPV declined as the outcome became rarer. Machine learning models used as <italic>in silico</italic> screening tests are subject to the same drawbacks as traditional screening tests when the outcome to be predicted is of low prevalence.</p>
</abstract>
<kwd-group>
<kwd>artificial intelligence</kwd>
<kwd>machine learning</kwd>
<kwd>screening test</kwd>
<kwd>American Indian</kwd>
<kwd>low prevalence</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Impact statement</title>
<p>Artificial Intelligence (AI) and Machine Learning (ML) techniques are increasingly integrated into screening and diagnostic models to pinpoint individuals at risk of specific diseases or medical conditions. However, with the rise in popularity of AI and ML, the literature (and internet) is flooded with reports on computer-based prediction and screening tests, often focusing more on showcasing the technique than on discussing its screening and diagnostic performance. In particular, there is a proliferation of algorithms created for minority groups, including the American Indian. A motivating factor in creating an <italic>in silico</italic> screening exam for American Indians is that this population, as a whole, experiences a greater burden of comorbidities, including diabetes mellitus, obesity, cancer, cardiovascular disease, and other chronic health conditions, than the rest of the U.S. population. This report evaluates such AI algorithms for the American Indian as screening tests, focusing on their performance when the outcome has low prevalence.</p>
</sec>
<sec id="s2">
<title>Introduction</title>
<p>Artificial Intelligence (AI) and Machine Learning (ML) techniques are increasingly integrated into screening and diagnostic models to pinpoint individuals at risk for specific diseases or medical conditions [<xref ref-type="bibr" rid="B1">1</xref>]. However, with AI&#x2019;s and ML&#x2019;s rise in popularity, the literature (and the internet) is flooded with reports on computer-based prediction and screening tests, often focused more on showcasing techniques than on discussing their screening and diagnostic performance. Advances in computer processing speed, increasing numbers of data scientists, low- to no-cost programming libraries, and availability of larger healthcare data sets have driven the proliferation of AI algorithms [<xref ref-type="bibr" rid="B2">2</xref>]. Kumar et al. have listed a sampling of prediction algorithms and data sets, including those for outcomes in Alzheimer&#x2019;s disease, cancer, diabetes, chronic heart disease, tuberculosis, stroke, hypertension, skin disease, and liver disease, among others [<xref ref-type="bibr" rid="B3">3</xref>]. Notwithstanding the proliferation of algorithms, AI is positioned to considerably enhance the accuracy and efficiency of screening tests. Specifically, ML algorithms can be trained on extensive data sets to discern patterns and make predictions based on those patterns. Rapid expansion of AI technology in health screening, coupled with enhanced computing power, underscores the necessity of evaluating the performance and quality of these algorithms [<xref ref-type="bibr" rid="B4">4</xref>].</p>
<p>The U.S. Government Accountability Office conducted a technology assessment noting that AI and ML offer advantages in analyzing underserved populations [<xref ref-type="bibr" rid="B5">5</xref>]. However, one challenge of utilizing AI in epidemiology pertains to the underrepresentation or absence of minority groups within these algorithms&#x2019; training data sets [<xref ref-type="bibr" rid="B6">6</xref>]. Also, screening test performance may vary in minority populations due to their differences in disease prevalence from non-minority populations.</p>
<p>Many AI and ML methods for predicting disease in non-minority populations are recalibrated for minority groups. For example, an ML algorithm for mortality prediction based on chronic disease was recalibrated for the population of South Korea; this adjusted index showed a greater mortality prediction than the original algorithm [<xref ref-type="bibr" rid="B7">7</xref>]. Another effort adjusted this mortality prediction algorithm using hospital discharge abstracts from six countries [<xref ref-type="bibr" rid="B8">8</xref>].</p>
<p>In terms of minority status, American Indians are sometimes referred to as the &#x201c;minority of the minority&#x201d; or the &#x201c;invisible minority,&#x201d; given their small population, cultural identity, languages, and histories that set them apart from other groups. Focusing on AI and ML can offer advantages in analyzing these underserved populations, who, like American Indians, bear a greater burden of certain health conditions [<xref ref-type="bibr" rid="B9">9</xref>&#x2013;<xref ref-type="bibr" rid="B11">11</xref>]. This study focused on <italic>in silico</italic> AI and ML screening tests explicitly designed for the American Indian population.</p>
<p>The number of <italic>in silico</italic> diagnostic and screening tests has grown exponentially over the last decade, with many of these utilizing data sets based on American Indians. Our study serves as a reminder that <italic>in silico</italic> screening tests, even when classified as AI or ML algorithms, are still subject to the same limitations related to disease prevalence as those of their laboratory-based counterparts.</p>
<sec id="s2-1">
<title>Popularization of Pima Indian data</title>
<p>Several research articles in the public domain report on AI and ML algorithms for diabetes classification in the Pima Indian population. A contributing factor is the availability of numerous Pima Indian data sets provided to the AI community through platforms like Kaggle, a popular resource for AI and ML algorithm developers [<xref ref-type="bibr" rid="B12">12</xref>]. However, these studies often overlooked the differences in disease prevalence among different populations and the potential consequences of applying algorithms trained specifically on one population to another.</p>
<p>Examples of ML algorithms for diabetes classification in Pima Indians sampled from the literature include Support Vector Machines, Radial Basis Function, Kernel Support Vector Machines, K-Nearest Neighbor, Artificial Neural Networks, Fuzzy Support Vector Machine, Na&#xef;ve Bayes Classifier, J48 Decision Tree, and a Random Forest Classifier [<xref ref-type="bibr" rid="B13">13</xref>&#x2013;<xref ref-type="bibr" rid="B15">15</xref>]. Some of the articles in this sample failed to recognize the high prevalence of diabetes among the Pima Indians and the impact of disease prevalence on screening test performance, and tended to focus solely on the methods used to perform the classifications [<xref ref-type="bibr" rid="B14">14</xref>].</p>
</sec>
<sec id="s2-2">
<title>The Strong Heart Study models</title>
<p>The Strong Heart Study (SHS) has played a significant role in identifying risk factors and patterns related to cardiovascular disease (CVD) in American Indian communities. It included 12 tribes located in Oklahoma, Arizona and the Dakotas. Statistical models developed using SHS data have informed interventions and public health policies targeting CVD. SHS data were also used in developing ML models and risk-based calculators addressing hypertension, diabetes, and coronary heart disease (CHD) [<xref ref-type="bibr" rid="B16">16</xref>&#x2013;<xref ref-type="bibr" rid="B18">18</xref>].</p>
</sec>
<sec id="s2-3">
<title>AI and ML risks with American Indian data sets</title>
<p>Potential risks are also associated with using ML in American Indian contexts. One notable concern involves the risk of ML algorithms perpetuating biases and stereotypes about American Indian communities. Specifically, algorithms trained on data sets that reinforce biases and stereotypes about American Indians could inadvertently foster further inequities against a population group frequently underrepresented in AI/ML training data sets for applications such as virtual screening tests. A lack of training data can also result in inaccuracies; a recent example involved an image recognition application identifying an American Indian in native dress as a bird [<xref ref-type="bibr" rid="B19">19</xref>]. To mitigate such risks, ML researchers and developers need to collaborate closely with American Indian communities to ensure their technologies are developed ethically and respectfully. Collaborations could entail establishing research partnerships with American Indian communities, involving community members in designing and developing ML models, and ensuring that models are built using unbiased and culturally sensitive data sets, like those of the SHS.</p>
</sec>
<sec id="s2-4">
<title>Traditional screening test performance</title>
<p>Traditional screening test performance is typically based on a gold standard in which an individual&#x2019;s true disease status is known to establish the test&#x2019;s sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The test&#x2019;s sensitivity and specificity inform its effectiveness in identifying the proportion of people in the population with and without the condition of interest [<xref ref-type="bibr" rid="B20">20</xref>]. Sensitivity is the ability of the test to correctly identify those with the condition, while specificity is the ability of the test to correctly identify those without it. Sensitivity can be calculated from the column of those truly positive for the condition, while specificity is derived from the column of those truly negative for the condition in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Calculations of sensitivity, specificity, PPV, and NPV. Screening tests usually have their performance metrics determined via a gold standard. The numbers of true positives and negatives are represented by TP and TN, respectively. Likewise, the numbers of false positives and negatives are represented by FP and FN, respectively.</p>
</caption>
<graphic xlink:href="ebm-249-10341-g001.tif"/>
</fig>
<p>Among these metrics, the PPV holds clinical significance for both healthcare providers and patients. The PPV is a conditional probability that the tested individual has the disease, given that they tested positive. A high PPV indicates effective identification of individuals with the tested condition, guiding further testing, diagnosis, and treatment decisions. The PPV is calculated from the row in <xref ref-type="fig" rid="F1">Figure 1</xref> that represents those subjects who tested positive for disease.</p>
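<p>The column-wise and row-wise calculations described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard definitions of the four metrics; the names <italic>ConfusionCounts</italic>, <italic>ratio</italic>, and <italic>screening_metrics</italic> are hypothetical, and the cell counts are invented purely to show the arithmetic.</p>
<preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cell counts of a 2x2 table: test result versus gold-standard status."""
    tp: int  # tested positive, truly positive
    fp: int  # tested positive, truly negative
    fn: int  # tested negative, truly positive
    tn: int  # tested negative, truly negative


def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def screening_metrics(c):
    """Compute the standard screening metrics from the four cell counts."""
    return {
        # Column of the truly positive: TP / (TP + FN)
        "sensitivity": ratio(c.tp, c.tp + c.fn),
        # Column of the truly negative: TN / (TN + FP)
        "specificity": ratio(c.tn, c.tn + c.fp),
        # Row of those who tested positive: TP / (TP + FP)
        "ppv": ratio(c.tp, c.tp + c.fp),
        # Row of those who tested negative: TN / (TN + FN)
        "npv": ratio(c.tn, c.tn + c.fn),
        "accuracy": ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = ConfusionCounts(tp=45, fp=30, fn=15, tn=210)
metrics = screening_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```
]]></preformat>
<p>Returning None for an empty denominator is a design choice of this sketch; it keeps an undefined PPV (no positive test results) distinct from a PPV of zero.</p>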
<p>As disease prevalence decreases, screening test performance decreases, particularly with respect to the PPV. Because false positives (FP) make up a growing share of all positive results, the PPV can fall even while accuracy and sensitivity remain high, giving a false impression of a well-performing test. For example, in a population of 1,000 people with a disease prevalence of 40%, a test with a sensitivity of 90% and specificity of 80% yields 360 true positives and 120 false positives, for a PPV of 75%. If the disease prevalence is lowered to 10% in this same population, the same test yields 90 true positives and 180 false positives, and the PPV drops to 33.33%; hence, prevalence dominates in screening for rare diseases [<xref ref-type="bibr" rid="B21">21</xref>]. Therefore, healthcare providers should consider prevalence when interpreting the PPV for further testing and treatment decisions.</p>
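<p>The worked example above follows directly from Bayes&#x2019; rule. The sketch below reproduces the two PPV values quoted in the text; the function name <italic>ppv_from_prevalence</italic> is hypothetical, and the third prevalence of 4% is an added illustration (roughly the PAD prevalence in this study), not a value from the example.</p>
<preformat><![CDATA[
```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """PPV via Bayes' rule: P(disease | positive test).

    All arguments are proportions between 0 and 1. Returns None when the
    test never returns a positive result (PPV is then undefined).
    """
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    positives = true_positive_rate + false_positive_rate
    if positives == 0:
        return None
    return true_positive_rate / positives


# Sensitivity 90% and specificity 80%, as in the worked example.
# 0.40 and 0.10 come from the text; 0.04 is an extra, illustrative value.
for prevalence in (0.40, 0.10, 0.04):
    ppv = ppv_from_prevalence(0.90, 0.80, prevalence)
    print(f"prevalence = {prevalence:.0%}, PPV = {ppv:.2%}")
```
]]></preformat>
<p>Sensitivity and specificity are held fixed here, so any change in the printed PPV is attributable to prevalence alone.</p>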
<p>While sensitivity and specificity provide information about test performance across populations, PPV is often more relevant in clinical practice. It helps physicians assess the likelihood of disease presence after a positive test result, especially in populations with low disease prevalence. If the disease outcome becomes increasingly rare, the algorithm will likely always predict the absence of disease, leading to high accuracy but poor PPV [<xref ref-type="bibr" rid="B22">22</xref>].</p>
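<p>The degenerate case mentioned above, a classifier that always predicts the absence of disease, can be made concrete with a short sketch. The function name <italic>always_negative_metrics</italic> is hypothetical; the counts used in the example (94 PAD cases among 2,468 participants) are those reported in Table 1 of this article.</p>
<preformat><![CDATA[
```python
def always_negative_metrics(n_total, n_cases):
    """Metrics for a 'test' that labels every subject as disease-free.

    With no positive predictions, TP = FP = 0, FN = n_cases and
    TN = n_total - n_cases.
    """
    tn = n_total - n_cases
    fn = n_cases
    return {
        "accuracy": tn / n_total,              # (TP + TN) / N with TP = 0
        "specificity": 1.0 if tn else None,    # TN / (TN + FP) with FP = 0
        "sensitivity": 0.0 if fn else None,    # TP / (TP + FN) with TP = 0
        "ppv": None,                           # TP + FP = 0, so undefined
        "npv": tn / n_total,                   # TN / (TN + FN)
    }


# Counts from Table 1: 94 participants with PAD out of 2,468.
result = always_negative_metrics(n_total=2468, n_cases=94)
print(f"accuracy = {result['accuracy']:.2%}, sensitivity = {result['sensitivity']:.2%}")
```
]]></preformat>
<p>In this sketch the accuracy equals one minus the prevalence, which is why a rare outcome can yield a high accuracy even when no case is detected.</p>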
<p>This study aimed to develop and evaluate three popular and commonly used AI and ML techniques as <italic>in silico</italic> screening tools for predicting three chronic conditions with differing prevalences in the SHS population: peripheral artery disease (PAD), hypertension, and type 2 diabetes. Specifically, we predicted the disease outcome using epidemiological data with methods including artificial neural networks (ANNs), random forest (RF), and logistic regression (LR). Unlike their traditional laboratory-based counterparts, these <italic>in silico</italic> tests do not have pre-determined sensitivity or specificity; rigorous testing has not been performed using a gold standard to establish these values. Our simulations provided a glimpse of the sensitivity, specificity, and PPV of these <italic>in silico</italic> screening tests, as these values changed in response to differing disease prevalences. We hypothesized that these <italic>in silico</italic> screening tools tailored to the American Indian population would show reduced performance as disease prevalence declines, regardless of the AI or ML method.</p>
<p>This research serves as a reminder that the limitations of screening tests regarding disease prevalence still apply, whether those tests are <italic>in silico</italic> AI or ML algorithms or traditional screening tools.</p>
</sec>
</sec>
<sec sec-type="materials|methods" id="s3">
<title>Materials and methods</title>
<p>LR, ANNs, and RFs are well-known methods for creating <italic>in silico</italic> screening tests. While RFs operate as nonlinear models, LR assumes a linear relationship between the predictors and the log-odds of the outcome. ANNs present a more intricate approach, often featuring multiple layers, an arrangement commonly known as deep learning. AI and ML can potentially enhance screening for various medical conditions by illuminating linear and nonlinear data relationships. Nonetheless, it is crucial to acknowledge that the application of AI in medical research and screening exams is still nascent, and concerns over AI algorithms&#x2019; accuracy and reliability linger.</p>
<sec id="s3-1">
<title>Longitudinal epidemiological SHS data set</title>
<p>The SHS began in 1988 as a multi-center, population-based longitudinal study of cardiovascular disease (CVD) and its risk factors among American Indians. The study had three components: a clinical examination, a personal interview, and an ongoing mortality and morbidity survey [<xref ref-type="bibr" rid="B23">23</xref>]. Participants from 12 different American Indian tribes were recruited from Arizona, Oklahoma, and the Dakotas, aided by volunteers from each community who promoted participation [<xref ref-type="bibr" rid="B24">24</xref>].</p>
<p>Phase II of the SHS, examining changes in risk factors for CVD in the original cohort, occurred between 1993 and 1995. The Strong Heart Family Study (SHFS), launched in Phase III (1998&#x2013;1999), investigated genetic determinants of cardiovascular disease and extended recruitment to the original cohort&#x2019;s family members aged 18 years and older [<xref ref-type="bibr" rid="B25">25</xref>]. Phase IV (2001&#x2013;2003) involved surveillance of the original cohort plus 90 families to continue the study of genetic markers for CVD [<xref ref-type="bibr" rid="B26">26</xref>]. The Phase V exam (2006&#x2013;2009) continued the SHFS, which began in Phase III; all participants from Phases III and IV were invited to participate in examinations conducted at local Indian Health Service hospitals, clinics, or tribal community facilities [<xref ref-type="bibr" rid="B27">27</xref>]. In Phase VI (2014&#x2013;2018), all surviving participants were invited to complete a medical questionnaire, and the morbidity and mortality surveillance continued.</p>
<p>Physicians on the SHS Morbidity and Mortality (M &#x26; M) review committee examined the types of health-related events requiring hospital treatment and, when death occurred, the subsequent causes of mortality. Two of these physicians independently reviewed fatal events for cause, with the results reconciled by a third physician. In addition, one physician reviewed the medical records of study participants&#x2019; non-fatal events to verify specific diagnoses (e.g., stroke). This surveillance occurred yearly for both the original cohort and family cohort participants.</p>
<p>The available 2,468 Phase V SHS participants were divided into a training (development) cohort (80%) and a testing cohort (20%). The training cohort was used to generate the model weights, while the testing cohort was used to assess the algorithm&#x2019;s quality.</p>
<p>A 1-year time-to-event data set for this study was constructed from the examination date in Phase V. The M &#x26; M results and all Phases of SHS data cumulatively provided information on the subjects&#x2019; medical conditions and mortality outcomes. Basic descriptive demographic statistics by gender, age, and comorbidity, including the numbers and percentages for binary variables, are listed in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Age, gender, and medical condition of SHS Phase V participants.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Medical condition</th>
<th align="center">All</th>
<th align="center">Male</th>
<th align="center">Female</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">N (%)</td>
<td align="left">2,468</td>
<td align="left">977 (39.59)</td>
<td align="left">1,491 (60.41)</td>
</tr>
<tr>
<td colspan="4" align="left">Age (years)</td>
</tr>
<tr>
<td align="left">&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;Mean (SD)</td>
<td align="left">45.55 (16.41)</td>
<td align="left">43.74 (16.00)</td>
<td align="left">46.73 (16.58)</td>
</tr>
<tr>
<td align="left">&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;Median</td>
<td align="left">44.40</td>
<td align="left">42.70</td>
<td align="left">45.70</td>
</tr>
<tr>
<td align="left">Hypertension (%)</td>
<td align="left">948 (38.41)</td>
<td align="left">402 (42.41)</td>
<td align="left">546 (57.59)</td>
</tr>
<tr>
<td align="left">Diabetes (%)</td>
<td align="left">631 (25.56)</td>
<td align="left">240 (38.03)</td>
<td align="left">391 (61.97)</td>
</tr>
<tr>
<td align="left">PAD (%)</td>
<td align="left">94 (3.81)</td>
<td align="left">32 (34.04)</td>
<td align="left">62 (65.96)</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Data labels for hypertension and diabetes already existed within the SHS Phase V data set, but the data set did not include a specific label for PAD. The data included the participants&#x2019; right and left ankle-brachial indexes (ABIs), which were used to define the presence or absence of PAD. This study used a resting ABI of less than 0.90 on the right side, the left side, or both, similar to that in Virane et al., to indicate a PAD diagnosis. Participants were coded as either 1 or 0 for the presence or absence of PAD, respectively [<xref ref-type="bibr" rid="B28">28</xref>, <xref ref-type="bibr" rid="B29">29</xref>].</p>
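<p>The PAD coding rule described above can be expressed as a small function. This is a sketch, not the authors&#x2019; code: the names <italic>ABI_THRESHOLD</italic> and <italic>pad_label</italic> are hypothetical, and the handling of missing ABI readings (ignored when one side is present, None when both are absent) is an assumption, since the article does not describe it.</p>
<preformat><![CDATA[
```python
import math

# Resting ABI cutoff stated in the text; values below it indicate PAD.
ABI_THRESHOLD = 0.90


def pad_label(abi_right, abi_left, threshold=ABI_THRESHOLD):
    """Return 1 if either available resting ABI is below the threshold, else 0.

    Assumption (not from the article): a missing reading (None or NaN) is
    ignored, and None is returned when both readings are missing.
    """
    readings = [
        value
        for value in (abi_right, abi_left)
        if value is not None and not math.isnan(value)
    ]
    if not readings:
        return None
    return int(any(value < threshold for value in readings))


# Hypothetical participants: (right ABI, left ABI)
for right, left in [(0.85, 1.10), (1.00, 0.95), (0.90, 0.90), (None, 0.70)]:
    print(right, left, pad_label(right, left))
```
]]></preformat>
<p>The comparison is strict, so an ABI of exactly 0.90 is coded as 0, matching the wording "less than 0.90" in the text.</p>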
<p>LR, ANN, and RF were then used to model the PAD, hypertension, and diabetes target features. Each model was run for 100 unique iterations of splitting the data, training, and producing metrics from the test set. The metrics tracked were accuracy, specificity, sensitivity, PPV, and NPV, each averaged over the 100 iterations for each model type.</p>
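<p>The repeated split-train-evaluate procedure described above can be sketched with the Python standard library alone. Everything here is an assumption made for illustration: the data are synthetic, the "model" is a simple score cutoff standing in for LR, ANN, or RF, and all function names are hypothetical. Only the 80/20 split, the 100 iterations, and the five averaged metrics come from the text.</p>
<preformat><![CDATA[
```python
import random
import statistics


def make_synthetic_cohort(n, prevalence, rng):
    """Hypothetical data: one risk score per subject, shifted upward for cases."""
    cohort = []
    for _ in range(n):
        label = 1 if rng.random() < prevalence else 0
        score = rng.gauss(1.0 if label else 0.0, 1.0)
        cohort.append((score, label))
    return cohort


def split_80_20(rows, rng):
    """Shuffle a copy of the rows and return (training 80%, testing 20%)."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def fit_cutoff(train):
    """Stand-in 'model': the grid cutoff with the most correct training calls."""
    grid = [i / 10 for i in range(-20, 51)]

    def n_correct(cutoff):
        return sum((score >= cutoff) == (label == 1) for score, label in train)

    return max(grid, key=n_correct)


def evaluate(cutoff, test):
    """Return the five metrics on the test set; None marks an undefined ratio."""
    tp = fp = fn = tn = 0
    for score, label in test:
        predicted_positive = score >= cutoff
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "specificity": ratio(tn, tn + fp),
        "sensitivity": ratio(tp, tp + fn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def repeated_evaluation(rows, iterations=100, seed=0):
    """Average each metric over repeated random 80/20 splits.

    Iterations in which a metric is undefined are skipped for that metric.
    """
    rng = random.Random(seed)
    collected = {}
    for _ in range(iterations):
        train, test = split_80_20(rows, rng)
        for name, value in evaluate(fit_cutoff(train), test).items():
            if value is not None:
                collected.setdefault(name, []).append(value)
    return {name: statistics.mean(values) for name, values in collected.items()}


cohort = make_synthetic_cohort(500, 0.25, random.Random(42))
averages = repeated_evaluation(cohort, iterations=100, seed=1)
for name, value in averages.items():
    print(f"{name}: {value:.2%}")
```
]]></preformat>
<p>Skipping undefined values is one possible convention; reporting them as zero, as some libraries do by default, would give a different average for rare outcomes.</p>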
<p>SAS version 9.4 was used to assemble the Phases of the SHS into a single data set, while Python version 3.9.7 was used to script the LR, RF, and ANN models.</p>
</sec>
</sec>
<sec sec-type="results" id="s4">
<title>Results</title>
<p>Numbers of SHS participants reporting hypertension, diabetes, and positivity for PAD are reported in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
<p>Females comprised over 60% of the Phase V study participants. In addition, women accounted for the majority of participants with hypertension, diabetes, and PAD (57.59%, 61.97%, and 65.96% of cases, respectively).</p>
<p>
<xref ref-type="fig" rid="F2">Figures 2</xref>&#x2013;<xref ref-type="fig" rid="F4">4</xref> show each model&#x2019;s accuracy, sensitivity, specificity, NPV, and PPV and reflect similar performance patterns among all models. The PPV and sensitivity measures seem to suffer the most as the outcome prevalence declines, which is what is typically observed for a traditional laboratory-based screening test. PPV and sensitivity decline for all models but remain parallel for LR and appear to converge within the ANN and RF models. As PPV and sensitivity decline for the RF model, they converge to zero at an outcome prevalence of 4% (PAD). Specific numerical values for each model metric are recorded in <xref ref-type="table" rid="T2">Table 2</xref> for all three chronic conditions.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Screening test diagnostics for logistic regression. <inline-graphic xlink:href="ebm-249-10341-fx1.tif"/>Accuracy <inline-graphic xlink:href="ebm-249-10341-fx2.tif"/>Specificity <inline-graphic xlink:href="ebm-249-10341-fx3.tif"/>Sensitivity <inline-graphic xlink:href="ebm-249-10341-fx4.tif"/>PPV <inline-graphic xlink:href="ebm-249-10341-fx5.tif"/>NPV.</p>
</caption>
<graphic xlink:href="ebm-249-10341-g002.tif"/>
</fig>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Screening test diagnostics for artificial neural networks. <inline-graphic xlink:href="ebm-249-10341-fx1.tif"/>Accuracy <inline-graphic xlink:href="ebm-249-10341-fx2.tif"/>Specificity <inline-graphic xlink:href="ebm-249-10341-fx3.tif"/>Sensitivity <inline-graphic xlink:href="ebm-249-10341-fx4.tif"/>PPV <inline-graphic xlink:href="ebm-249-10341-fx5.tif"/>NPV.</p>
</caption>
<graphic xlink:href="ebm-249-10341-g003.tif"/>
</fig>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Screening test diagnostics for random forest. <inline-graphic xlink:href="ebm-249-10341-fx1.tif"/>Accuracy <inline-graphic xlink:href="ebm-249-10341-fx2.tif"/>Specificity <inline-graphic xlink:href="ebm-249-10341-fx3.tif"/>Sensitivity <inline-graphic xlink:href="ebm-249-10341-fx4.tif"/>PPV <inline-graphic xlink:href="ebm-249-10341-fx5.tif"/>NPV.</p>
</caption>
<graphic xlink:href="ebm-249-10341-g004.tif"/>
</fig>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Summary of screening test diagnostics (%) by method and chronic condition. Logistic regression, artificial neural network, and random forest are represented by LR, ANN, and RF, respectively.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th rowspan="3" align="left">Metric</th>
<th colspan="9" align="center">Model and chronic condition</th>
</tr>
<tr>
<th colspan="3" align="center">Hypertension</th>
<th colspan="3" align="center">Diabetes</th>
<th colspan="3" align="center">PAD</th>
</tr>
<tr>
<th align="left">LR</th>
<th align="left">ANN</th>
<th align="left">RF</th>
<th align="left">LR</th>
<th align="left">ANN</th>
<th align="left">RF</th>
<th align="left">LR</th>
<th align="left">ANN</th>
<th align="left">RF</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Accuracy</td>
<td align="left">78.99</td>
<td align="left">76.00</td>
<td align="left">79.35</td>
<td align="left">81.49</td>
<td align="left">77.66</td>
<td align="left">82.18</td>
<td align="left">95.57</td>
<td align="left">94.56</td>
<td align="left">95.99</td>
</tr>
<tr>
<td align="left">Specificity</td>
<td align="left">87.52</td>
<td align="left">84.21</td>
<td align="left">89.24</td>
<td align="left">92.28</td>
<td align="left">89.92</td>
<td align="left">95.31</td>
<td align="left">99.47</td>
<td align="left">98.44</td>
<td align="left">100.00</td>
</tr>
<tr>
<td align="left">Sensitivity</td>
<td align="left">64.79</td>
<td align="left">62.41</td>
<td align="left">62.88</td>
<td align="left">44.59</td>
<td align="left">36.06</td>
<td align="left">37.26</td>
<td align="left">2.40</td>
<td align="left">2.07</td>
<td align="left">0.00</td>
</tr>
<tr>
<td align="left">PPV</td>
<td align="left">75.61</td>
<td align="left">73.62</td>
<td align="left">77.75</td>
<td align="left">62.88</td>
<td align="left">61.77</td>
<td align="left">70.13</td>
<td align="left">18.06</td>
<td align="left">9.04</td>
<td align="left">0.00</td>
</tr>
<tr>
<td align="left">NPV</td>
<td align="left">80.63</td>
<td align="left">79.87</td>
<td align="left">80.12</td>
<td align="left">85.08</td>
<td align="left">83.52</td>
<td align="left">83.88</td>
<td align="left">96.06</td>
<td align="left">96.00</td>
<td align="left">95.99</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>All three models reported accuracy and specificity values that increased as the condition&#x2019;s prevalence declined. These two measures were roughly 95% or higher for PAD, regardless of the model selected. Conversely, sensitivity and PPV decreased as the prevalence declined, largely due to the increased number of false positives. Although still poor, the PPV for PAD was greatest for the LR model at 18%, compared with 9% and 0% for the ANN and RF models, respectively.</p>
<p>The formulas for sensitivity and PPV in <xref ref-type="fig" rid="F1">Figure 1</xref> give insight into the effect of false and true positives on these two metrics. Traditional laboratory screening tests&#x2019; performance metrics are usually determined via a gold standard. As the prevalence of the condition declined, so did the number of true positives, while that of false positives increased, driving down both the sensitivity and PPV. Accuracy and specificity remained high as the true negatives grew, inflating these metrics.</p>
</sec>
<sec sec-type="discussion" id="s5">
<title>Discussion</title>
<p>LR, ANNs, and RFs are popular methods in the burgeoning world of AI and ML. Although these methods are quite different from one another, we can see that their performance metric trends are similar in screening for disease outcomes with varying prevalences. These performance metrics give the developers of these methods an idea of how a specific <italic>in silico</italic> screening method will perform in the population it was designed to serve based on the prevalence of the outcome.</p>
<p>Although these algorithms may have high predictive power, as measured in terms of predictive accuracy, some are criticized for lacking any causal reasoning [<xref ref-type="bibr" rid="B30">30</xref>]. For example, ANNs may give reliable predictions for the end users; however, these end users do not know how the algorithm came to a particular conclusion. Thus, they are &#x201c;black boxes&#x201d; contributing little to understanding a condition&#x2019;s cause.</p>
<p>Regardless of the method used, the PPV declined in parallel with the overall prevalence of the condition. Whatever its type, an <italic>in silico</italic> modeling approach is subject to the same limitations as traditional lab-based screening tests, an important factor to remember as online screening tests become more widespread. This study is a reminder that <italic>in silico</italic> AI and ML screening tests are not &#x201c;magic bullets&#x201d;: their performance is still limited by the prevalence of the disease in the populations they are intended to serve.</p>
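<p>This dependence on prevalence follows from Bayes&#x2019; theorem. As a sketch, using symbols introduced here for illustration only (<italic>Se</italic> for sensitivity, <italic>Sp</italic> for specificity, and <italic>p</italic> for prevalence, each expressed as a proportion), the PPV of any screening test can be written as:</p>
<disp-formula>
<tex-math>\mathrm{PPV} = \frac{Se \cdot p}{Se \cdot p + (1 - Sp)\,(1 - p)}</tex-math>
</disp-formula>
<p>For fixed <italic>Se</italic> and <italic>Sp</italic>, the true-positive term shrinks and the false-positive term grows as <italic>p</italic> falls. A hypothetical test with <italic>Se</italic> = 0.90 and <italic>Sp</italic> = 0.95, for example, would have a PPV of about 0.89 at <italic>p</italic> = 0.30 but only about 0.43 at <italic>p</italic> = 0.04. These values are illustrative and are not results from this study.</p>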
</sec>
</body>
<back>
<sec sec-type="author-contributions" id="s6">
<title>Author contributions</title>
<p>TM performed the modeling and Python coding, while PR completed the statistical analysis and initial manuscript writing. DW and WT contributed to the structure and content of the manuscript. YZ and JR described the Strong Heart Study design and contributed to the creation of the manuscript. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec sec-type="disclaimer" id="s7">
<title>Author disclaimer</title>
<p>This manuscript reflects the views of its authors and does not necessarily reflect those of the U.S. Food and Drug Administration. Any mention of commercial products is for clarification only and is not intended as approval, endorsement, or recommendation.</p>
</sec>
<sec sec-type="data-availability" id="s8">
<title>Data availability</title>
<p>Publicly available datasets were analyzed in this study. These data can be found here: <ext-link ext-link-type="uri" xlink:href="https://strongheartstudy.org/">https://strongheartstudy.org/</ext-link>.</p>
</sec>
<sec sec-type="ethics-statement" id="s9">
<title>Ethics statement</title>
<p>This project was approved by the University of Oklahoma Health Sciences Center Institutional Review Board along with the Strong Heart Study Publications and Presentations Committee (SHS700). In addition, the National Center for Toxicological Research Institutional Review Board approved the project and its publication.</p>
</sec>
<sec sec-type="funding-information" id="s10">
<title>Funding</title>
<p>The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. The Strong Heart Study has been funded in whole or in part with federal funds from the National Heart, Lung, and Blood Institute, National Institutes of Health, Department of Health and Human Services, under contract numbers 75N92019D00027, 75N92019D00028, 75N92019D00029, and 75N92019D00030. The study was previously supported by research grants R01HL109315, R01HL109301, R01HL109284, R01HL109282, and R01HL109319 and by cooperative agreements U01HL41642, U01HL41652, U01HL41654, U01HL65520, and U01HL65521.</p>
</sec>
<sec sec-type="COI-statement" id="s11">
<title>Conflict of interest</title>
<p>The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Davenport</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Kalakota</surname>
<given-names>R</given-names>
</name>
</person-group>. <article-title>The potential for artificial intelligence in healthcare</article-title>. <source>Future Healthc J</source> (<year>2019</year>) <volume>6</volume>:<fpage>94</fpage>&#x2013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.7861/futurehosp.6-2-94</pub-id>
</citation>
</ref>
<ref id="B2">
<label>2.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Bohr</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Memarzadeh</surname>
<given-names>K</given-names>
</name>
</person-group>. <article-title>The rise of artificial intelligence in healthcare applications</article-title>. In: <person-group person-group-type="editor">
<name>
<surname>Bohr</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Memarzadeh</surname>
<given-names>K</given-names>
</name>
</person-group>, editors <source>Artificial intelligence in healthcare</source>. <publisher-name>Academic Press</publisher-name> (<year>2020</year>). p. <fpage>i</fpage>&#x2013;<lpage>iii</lpage>.</citation>
</ref>
<ref id="B3">
<label>3.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kumar</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Koul</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Singla</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Ijaz</surname>
<given-names>MF</given-names>
</name>
</person-group>. <article-title>Artificial intelligence in disease diagnosis: a systematic literature review, synthesizing framework and future research agenda</article-title>. <source>J Ambient Intelligence Humanized Comput</source> (<year>2022</year>) <volume>14</volume>:<fpage>8459</fpage>&#x2013;<lpage>86</lpage>. <pub-id pub-id-type="doi">10.1007/s12652-021-03612-z</pub-id>
</citation>
</ref>
<ref id="B4">
<label>4.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chalkidou</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Shokraneh</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Kijauskaite</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Taylor-Phillips</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Halligan</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Wilkinson</surname>
<given-names>L</given-names>
</name>
<etal/>
</person-group> <article-title>Recommendations for the development and use of imaging test sets to investigate the test performance of artificial intelligence in health screening</article-title>. <source>Lancet Digital Health</source> (<year>2022</year>) <volume>4</volume>:<fpage>e899</fpage>&#x2013;<lpage>e905</lpage>. <pub-id pub-id-type="doi">10.1016/S2589-7500(22)00186-8</pub-id>
</citation>
</ref>
<ref id="B5">
<label>5.</label>
<citation citation-type="web">
<collab>U.S. Government Accountability Office</collab>. <article-title>Artificial intelligence in health care: benefits and challenges of machine learning technologies for medical diagnostics</article-title> (<year>2024</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="https://www.gao.gov/products/gao-22-104629">https://www.gao.gov/products/gao-22-104629</ext-link> (Accessed July 1, 2024)</comment>.</citation>
</ref>
<ref id="B6">
<label>6.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sung</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hopper</surname>
<given-names>JL</given-names>
</name>
</person-group>. <article-title>Co-evolution of epidemiology and artificial intelligence: challenges and opportunities</article-title>. <source>Int J Epidemiol</source> (<year>2023</year>) <volume>52</volume>:<fpage>969</fpage>&#x2013;<lpage>73</lpage>. <pub-id pub-id-type="doi">10.1093/ije/dyad089</pub-id>
</citation>
</ref>
<ref id="B7">
<label>7.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Choi</surname>
<given-names>JS</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>YC</given-names>
</name>
<name>
<surname>Lim</surname>
<given-names>YH</given-names>
</name>
<name>
<surname>Bae</surname>
<given-names>HJ</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>DK</given-names>
</name>
<etal/>
</person-group> <article-title>Recalibration and validation of the Charlson comorbidity index in an Asian population: the national health insurance service-national sample cohort study</article-title>. <source>Sci Rep</source> (<year>2020</year>) <volume>10</volume>:<fpage>13715</fpage>. <pub-id pub-id-type="doi">10.1038/s41598-020-70624-8</pub-id>
</citation>
</ref>
<ref id="B8">
<label>8.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Quan</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Couris</surname>
<given-names>CM</given-names>
</name>
<name>
<surname>Fushimi</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Graham</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Hider</surname>
<given-names>P</given-names>
</name>
<etal/>
</person-group> <article-title>Updating and validating the Charlson comorbidity index and score for risk adjustment in hospital discharge abstracts using data from 6 countries</article-title>. <source>Am J Epidemiol</source> (<year>2011</year>) <volume>173</volume>:<fpage>676</fpage>&#x2013;<lpage>82</lpage>. <pub-id pub-id-type="doi">10.1093/aje/kwq433</pub-id>
</citation>
</ref>
<ref id="B9">
<label>9.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Benjamin</surname>
<given-names>EJ</given-names>
</name>
<name>
<surname>Muntner</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Alonso</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Bittencourt</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Callaway</surname>
<given-names>CW</given-names>
</name>
<name>
<surname>Carson</surname>
<given-names>AP</given-names>
</name>
<etal/>
</person-group> <article-title>Heart disease and stroke statistics-2019 update: a report from the American Heart Association</article-title>. <source>Circulation</source> (<year>2019</year>) <volume>139</volume>:<fpage>e56</fpage>&#x2013;<lpage>e528</lpage>. <pub-id pub-id-type="doi">10.1161/CIR.0000000000000659</pub-id>
</citation>
</ref>
<ref id="B10">
<label>10.</label>
<citation citation-type="web">
<collab>Centers for Disease Control and Prevention</collab>. <article-title>CDC and Indian country working together</article-title> (<year>2024</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="https://stacks.cdc.gov/view/cdc/44668">https://stacks.cdc.gov/view/cdc/44668</ext-link> (Accessed July 1, 2024)</comment>.</citation>
</ref>
<ref id="B11">
<label>11.</label>
<citation citation-type="web">
<collab>Indian Health Service</collab>. <article-title>Disparities</article-title> (<year>2024</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="https://www.ihs.gov/newsroom/factsheets/disparities/">https://www.ihs.gov/newsroom/factsheets/disparities/</ext-link> (Accessed July 1, 2024)</comment>.</citation>
</ref>
<ref id="B12">
<label>12.</label>
<citation citation-type="web">
<collab>Kaggle</collab>. <article-title>Kaggle: your home for data science</article-title> (<year>2023</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="https://www.kaggle.com/">https://www.kaggle.com/</ext-link> (Accessed April 28, 2023)</comment>.</citation>
</ref>
<ref id="B13">
<label>13.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chang</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Bailey</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>QA</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>Z</given-names>
</name>
</person-group>. <article-title>Pima Indians diabetes mellitus classification based on machine learning (ML) algorithms</article-title>. <source>Neural Comput Appl</source> (<year>2022</year>) <volume>35</volume>:<fpage>16157</fpage>&#x2013;<lpage>73</lpage>. <pub-id pub-id-type="doi">10.1007/s00521-022-07049-z</pub-id>
</citation>
</ref>
<ref id="B14">
<label>14.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kaur</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Kumari</surname>
<given-names>V</given-names>
</name>
</person-group>. <article-title>Predictive modelling and analytics for diabetes using a machine learning approach</article-title>. <source>Appl Comput Inform</source> (<year>2022</year>) <volume>18</volume>:<fpage>90</fpage>&#x2013;<lpage>100</lpage>. <pub-id pub-id-type="doi">10.1016/j.aci.2018.12.004</pub-id>
</citation>
</ref>
<ref id="B15">
<label>15.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lukmanto</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Irwansyah</surname>
<given-names>E</given-names>
</name>
</person-group>. <article-title>The early detection of diabetes mellitus (DM) using Fuzzy hierarchical model</article-title>. <source>Proced Comput Sci</source> (<year>2015</year>) <volume>59</volume>:<fpage>312</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1016/j.procs.2015.07.571</pub-id>
</citation>
</ref>
<ref id="B16">
<label>16.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lee</surname>
<given-names>ET</given-names>
</name>
<name>
<surname>Howard</surname>
<given-names>BV</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Welty</surname>
<given-names>TK</given-names>
</name>
<name>
<surname>Galloway</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Best</surname>
<given-names>LG</given-names>
</name>
<etal/>
</person-group> <article-title>Prediction of coronary heart disease in a population with high prevalence of diabetes and albuminuria: the Strong Heart Study</article-title>. <source>Circulation</source> (<year>2006</year>) <volume>113</volume>:<fpage>2897</fpage>&#x2013;<lpage>905</lpage>. <pub-id pub-id-type="doi">10.1161/CIRCULATIONAHA.105.593178</pub-id>
</citation>
</ref>
<ref id="B17">
<label>17.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>ET</given-names>
</name>
<name>
<surname>Fabsitz</surname>
<given-names>RR</given-names>
</name>
<name>
<surname>Devereux</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Best</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Welty</surname>
<given-names>TK</given-names>
</name>
<etal/>
</person-group> <article-title>A longitudinal study of hypertension risk factors and their relation to cardiovascular disease: the Strong Heart Study</article-title>. <source>Hypertension</source> (<year>2006</year>) <volume>47</volume>:<fpage>403</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1161/01.HYP.0000200710.29498.80</pub-id>
</citation>
</ref>
<ref id="B18">
<label>18.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>ET</given-names>
</name>
<name>
<surname>Howard</surname>
<given-names>BV</given-names>
</name>
<name>
<surname>Fabsitz</surname>
<given-names>RR</given-names>
</name>
<name>
<surname>Devereux</surname>
<given-names>RB</given-names>
</name>
<name>
<surname>Welty</surname>
<given-names>TK</given-names>
</name>
</person-group>. <article-title>Fasting plasma glucose and hemoglobin A1c in identifying and predicting diabetes: the Strong Heart Study</article-title>. <source>Diabetes Care</source> (<year>2011</year>) <volume>34</volume>:<fpage>363</fpage>&#x2013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.2337/dc10-1680</pub-id>
</citation>
</ref>
<ref id="B19">
<label>19.</label>
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Cipolle</surname>
<given-names>AV</given-names>
</name>
</person-group>. <article-title>How Native Americans are trying to debug A.I.&#x2019;s biases</article-title> (<year>2024</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="https://www.nytimes.com/2022/03/22/technology/ai-data-indigenous-ivow.html">https://www.nytimes.com/2022/03/22/technology/ai-data-indigenous-ivow.html</ext-link> (Accessed July 1, 2024)</comment>.</citation>
</ref>
<ref id="B20">
<label>20.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Gordis</surname>
<given-names>L</given-names>
</name>
</person-group>. <source>Epidemiology</source>. <publisher-loc>Philadelphia</publisher-loc>: <publisher-name>Elsevier Saunders</publisher-name> (<year>2004</year>). p. <fpage>335</fpage>.</citation>
</ref>
<ref id="B21">
<label>21.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>van Belle</surname>
<given-names>G</given-names>
</name>
</person-group>. <source>Statistical rules of thumb</source>. <edition>2nd ed.</edition> <publisher-loc>Hoboken</publisher-loc>: <publisher-name>John Wiley &amp; Sons</publisher-name> (<year>2008</year>). p. <fpage>272</fpage>.</citation>
</ref>
<ref id="B22">
<label>22.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Wan</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lv</surname>
<given-names>Z</given-names>
</name>
</person-group>. <article-title>Popular deep learning algorithms for disease prediction: a review</article-title>. <source>Cluster Comput</source> (<year>2023</year>) <volume>26</volume>:<fpage>1231</fpage>&#x2013;<lpage>51</lpage>. <pub-id pub-id-type="doi">10.1007/s10586-022-03707-y</pub-id>
</citation>
</ref>
<ref id="B23">
<label>23.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lee</surname>
<given-names>ET</given-names>
</name>
<name>
<surname>Welty</surname>
<given-names>TK</given-names>
</name>
<name>
<surname>Fabsitz</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Cowan</surname>
<given-names>LD</given-names>
</name>
<name>
<surname>Le</surname>
<given-names>NA</given-names>
</name>
<name>
<surname>Oopik</surname>
<given-names>AJ</given-names>
</name>
<etal/>
</person-group> <article-title>The Strong Heart Study. A study of cardiovascular disease in American Indians: design and methods</article-title>. <source>Am J Epidemiol</source> (<year>1990</year>) <volume>132</volume>:<fpage>1141</fpage>&#x2013;<lpage>55</lpage>. <pub-id pub-id-type="doi">10.1093/oxfordjournals.aje.a115757</pub-id>
</citation>
</ref>
<ref id="B24">
<label>24.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stoddart</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Jarvis</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Blake</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Fabsitz</surname>
<given-names>RR</given-names>
</name>
<name>
<surname>Howard</surname>
<given-names>BV</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>ET</given-names>
</name>
<etal/>
</person-group> <article-title>Recruitment of American Indians in epidemiologic research: the Strong Heart Study</article-title>. <source>Am Indian Alsk Native Ment Health Res</source> (<year>2000</year>) <volume>9</volume>:<fpage>20</fpage>&#x2013;<lpage>37</lpage>. <pub-id pub-id-type="doi">10.5820/aian.0903.2000.20</pub-id>
</citation>
</ref>
<ref id="B25">
<label>25.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>North</surname>
<given-names>KE</given-names>
</name>
<name>
<surname>Howard</surname>
<given-names>BV</given-names>
</name>
<name>
<surname>Welty</surname>
<given-names>TK</given-names>
</name>
<name>
<surname>Best</surname>
<given-names>LG</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>ET</given-names>
</name>
<name>
<surname>Yeh</surname>
<given-names>JL</given-names>
</name>
<etal/>
</person-group> <article-title>Genetic and environmental contributions to cardiovascular disease risk in American Indians: the Strong Heart Family Study</article-title>. <source>Am J Epidemiol</source> (<year>2003</year>) <volume>157</volume>:<fpage>303</fpage>&#x2013;<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1093/aje/kwf208</pub-id>
</citation>
</ref>
<ref id="B26">
<label>26.</label>
<citation citation-type="web">
<collab>Strong Heart Study</collab>. <article-title>Strong heart study phase IV operations manual</article-title> (<year>2024</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="https://strongheartstudy.org/portals/1288/Assets/documents/manuals/Phase%20IV%20Operations%20Manual.pdf?ver=2017-11-15-134610-080">https://strongheartstudy.org/portals/1288/Assets/documents/manuals/Phase%20IV%20Operations%20Manual.pdf?ver&#x3d;2017-11-15-134610-080</ext-link> (Accessed July 1, 2024)</comment>.</citation>
</ref>
<ref id="B27">
<label>27.</label>
<citation citation-type="web">
<collab>Strong Heart Study</collab>. <article-title>Strong heart study phase V operations manual</article-title> (<year>2024</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="https://strongheartstudy.org/portals/1288/Assets/documents/manuals/Phase%20V%20Operations%20Manual.pdf?ver=2017-11-15-134617-657">https://strongheartstudy.org/portals/1288/Assets/documents/manuals/Phase%20V%20Operations%20Manual.pdf?ver&#x3d;2017-11-15-134617-657</ext-link> (Accessed July 1, 2024)</comment>.</citation>
</ref>
<ref id="B28">
<label>28.</label>
<citation citation-type="web">
<collab>American Heart Association</collab>. <article-title>How is PAD diagnosed</article-title> (<year>2024</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="https://www.heart.org/en/health-topics/peripheral-artery-disease/diagnosing-pad">https://www.heart.org/en/health-topics/peripheral-artery-disease/diagnosing-pad</ext-link> (Accessed July 1, 2024)</comment>.</citation>
</ref>
<ref id="B29">
<label>29.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Virani</surname>
<given-names>SS</given-names>
</name>
<name>
<surname>Alonso</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Benjamin</surname>
<given-names>EJ</given-names>
</name>
<name>
<surname>Bittencourt</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Callaway</surname>
<given-names>CW</given-names>
</name>
<name>
<surname>Carson</surname>
<given-names>AP</given-names>
</name>
<etal/>
</person-group> <article-title>Heart disease and stroke statistics-2020 update: a report from the American Heart Association</article-title>. <source>Circulation</source> (<year>2020</year>) <volume>141</volume>:<fpage>e139</fpage>&#x2013;<lpage>e596</lpage>. <pub-id pub-id-type="doi">10.1161/CIR.0000000000000757</pub-id>
</citation>
</ref>
<ref id="B30">
<label>30.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Pearl</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Mackenzie</surname>
<given-names>D</given-names>
</name>
</person-group>. <source>The book of why: the new science of cause and effect</source>. <publisher-name>Basic Books, Inc.</publisher-name> (<year>2018</year>). p. <fpage>432</fpage>.</citation>
</ref>
</ref-list>
</back>
</article>