<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Exp. Biol. Med.</journal-id>
<journal-title>Experimental Biology and Medicine</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Exp. Biol. Med.</abbrev-journal-title>
<issn pub-type="epub">1535-3699</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">10374</article-id>
<article-id pub-id-type="doi">10.3389/ebm.2025.10374</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Experimental Biology and Medicine</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>A refined set of RxNorm drug names for enhancing unstructured data analysis in drug safety surveillance</article-title>
<alt-title alt-title-type="left-running-head">Guo et al.</alt-title>
<alt-title alt-title-type="right-running-head">
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/ebm.2025.10374">10.3389/ebm.2025.10374</ext-link>
</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Guo</surname>
<given-names>Wenjing</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/3046789/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Dong</surname>
<given-names>Fan</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/2083053/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Liu</surname>
<given-names>Jie</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/2009453/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Aslam</surname>
<given-names>Aasma</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Patterson</surname>
<given-names>Tucker A.</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/155125/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Hong</surname>
<given-names>Huixiao</given-names>
</name>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/472005/overview"/>
</contrib>
</contrib-group>
<aff>
<institution>National Center for Toxicological Research</institution>, <institution>U.S. Food and Drug Administration</institution>, <addr-line>Jefferson</addr-line>, <addr-line>AR</addr-line>, <country>United States</country>
</aff>
<author-notes>
<corresp id="c001">&#x2a;Correspondence: Huixiao Hong, <email>huixiao.hong@fda.hhs.gov</email>
</corresp>
</author-notes>
<pub-date pub-type="epub">
<day>02</day>
<month>05</month>
<year>2025</year>
</pub-date>
<pub-date pub-type="collection">
<year>2025</year>
</pub-date>
<volume>250</volume>
<elocation-id>10374</elocation-id>
<history>
<date date-type="received">
<day>09</day>
<month>09</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>22</day>
<month>04</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2025 Guo, Dong, Liu, Aslam, Patterson and Hong.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Guo, Dong, Liu, Aslam, Patterson and Hong</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Adverse drug events are harms associated with drug use, whether the drug is used correctly or incorrectly. Identifying adverse drug events is vital in pharmacovigilance to safeguard public health. Drug safety surveillance can be performed using unstructured data. A comprehensive and accurate list of drug names is essential for effective identification of adverse drug events. While there are numerous sources for drug names, RxNorm is widely recognized as a leading resource. However, its effectiveness for unstructured data analysis in drug safety surveillance has not been thoroughly assessed. To address this, we evaluated the drug names in RxNorm for their suitability in unstructured data analysis and developed a refined set of drug names. Initially, we removed duplicates, the names exceeding 199 characters, and those that only describe administrative details. Drug names with four or fewer characters were analyzed using 18,000 drug-related PubMed abstracts to remove names which rarely appear in unstructured data. The remaining names, which ranged from five to 199 characters, were further refined to exclude those that could lead to inaccurate drug counts in unstructured data analysis. We compared the efficiency and accuracy of the refined set with the original RxNorm set by testing both on the 18,000 drug-related PubMed abstracts. The results showed a decrease in both computational cost and the number of false drug names identified. Further analysis of the removed names revealed that most originated from only one of the 14 sources. Our findings suggest that the refined set can enhance drug identification in unstructured data analysis, thereby improving pharmacovigilance.</p>
</abstract>
<kwd-group>
<kwd>adverse drug events</kwd>
<kwd>pharmacovigilance</kwd>
<kwd>natural language processing</kwd>
<kwd>database</kwd>
<kwd>RxNorm</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Impact statement</title>
<p>Adverse drug events are a significant concern for public health, necessitating accurate detection in drug safety surveillance. While unstructured data is a valuable source for identifying adverse drug events, effective analysis depends on a comprehensive and accurate list of drug names. Although RxNorm is recognized for providing standardized drug names, its effectiveness in unstructured data analysis remains unassessed. Our research refined the list of RxNorm drug names to improve its suitability for unstructured data analysis. By removing duplicates, excessively long names, false names, and replaceable names, we created a more accurate and efficient list of drug names. Testing this refined set on drug-related PubMed abstracts revealed improved accuracy and reduced computational costs compared to the original RxNorm list. This refined list of drug names enables more accurate monitoring of adverse drug events, providing a valuable tool for improving drug safety surveillance and protecting public health.</p>
</sec>
<sec sec-type="intro" id="s2">
<title>Introduction</title>
<p>Adverse drug events (ADEs) are harmful responses to medications that pose significant risks to patients, with millions of deaths and hospitalizations annually [<xref ref-type="bibr" rid="B1">1</xref>]. Effective monitoring of ADEs through drug safety surveillance is crucial for protecting public health. Drug safety surveillance begins in clinical trials, where new drugs are rigorously tested for safety and efficacy. However, clinical trials are limited by short exposure periods and the size and diversity of the tested population [<xref ref-type="bibr" rid="B2">2</xref>]. Therefore, post-market drug safety surveillance is crucial to identify potential ADEs in a large population, particularly for drugs repurposed to treat COVID-19. For example, originally developed for the treatment of hepatitis C, Remdesivir was later evaluated for antiviral activity against other viruses and, in 2020, received FDA approval for the treatment of COVID-19. Traditionally, post-market surveillance relies on spontaneous adverse event reporting systems [<xref ref-type="bibr" rid="B3">3</xref>, <xref ref-type="bibr" rid="B4">4</xref>]. In the United States, the Food and Drug Administration&#x2019;s Adverse Event Reporting System (FAERS) [<xref ref-type="bibr" rid="B5">5</xref>] collects adverse event reports, medication error reports, and product quality complaints from various sources, including the MedWatch program. FAERS has been widely used to investigate drug safety issues [<xref ref-type="bibr" rid="B6">6</xref>&#x2013;<xref ref-type="bibr" rid="B9">9</xref>]. However, FAERS relies on voluntary reporting, which can result in underreporting and delays in identifying ADEs. In recent years, unstructured text data has become a valuable source for investigating ADEs.</p>
<p>To effectively analyze unstructured data for drug safety surveillance, it is important to identify drugs and associated ADEs. One challenge for identifying drugs in unstructured data is different names used for the same drugs. The active ingredient, generic names, trade names, brand names, and even street names can be used to indicate the same drug in unstructured text. Using acetaminophen, a commonly used analgesic, as an example, Tylenol, Paracetamol, Panadol, Anacin, Feverall, Mapap, Ofirmev, Tempra, and APAP (the abbreviation for its chemical name, N-acetyl-para-aminophenol) are names used for the same drug in unstructured documents. The use of various names for the same drugs in unstructured data complicates accurate identification of drugs, making the standardization and normalization of drug names essential.</p>
<p>Various methods have been used in the standardization and normalization of drug names, including dictionary-based methods [<xref ref-type="bibr" rid="B10">10</xref>], rule-based systems [<xref ref-type="bibr" rid="B11">11</xref>&#x2013;<xref ref-type="bibr" rid="B16">16</xref>], advanced machine learning models [<xref ref-type="bibr" rid="B17">17</xref>&#x2013;<xref ref-type="bibr" rid="B20">20</xref>], and hybrid approaches [<xref ref-type="bibr" rid="B19">19</xref>]. Dictionary-based methods use comprehensive drug dictionaries built from various sources to identify drug names [<xref ref-type="bibr" rid="B10">10</xref>]. In these methods, a comprehensive dictionary like RxNorm is essential to ensure accurate recognition of complex or less common drug names [<xref ref-type="bibr" rid="B21">21</xref>].</p>
<p>Rule-based systems, on the other hand, rely on predefined patterns or contextual rules to identify drug names. These rules can be either composition-based, focusing on systematic naming conventions, or context-based, extracting names based on surrounding text features [<xref ref-type="bibr" rid="B22">22</xref>, <xref ref-type="bibr" rid="B23">23</xref>]. Despite the rigidity and extensive manual effort required to develop and maintain these rules and dictionaries&#x2014;especially given the evolving nature of language and the introduction of new terminology&#x2014;both dictionary and rule-based methods remain crucial for establishing a baseline of accurate drug identification.</p>
<p>To enhance the matching and normalization processes, similarity algorithms such as Levenshtein distance [<xref ref-type="bibr" rid="B24">24</xref>], cosine similarity [<xref ref-type="bibr" rid="B25">25</xref>], and Jaccard index [<xref ref-type="bibr" rid="B25">25</xref>] can be used. These techniques measure the similarity between drug names and help link various names of the same drug to a standard drug name [<xref ref-type="bibr" rid="B26">26</xref>, <xref ref-type="bibr" rid="B27">27</xref>], further improving the accuracy of drug name standardization.</p>
<p>With the increasing availability of annotated datasets, machine learning-based models have gained significant popularity in this field [<xref ref-type="bibr" rid="B10">10</xref>, <xref ref-type="bibr" rid="B17">17</xref>&#x2013;<xref ref-type="bibr" rid="B20">20</xref>, <xref ref-type="bibr" rid="B28">28</xref>]. Notable techniques such as Conditional Random Fields (CRF) [<xref ref-type="bibr" rid="B29">29</xref>], Hidden Markov Models (HMM), Recurrent Neural Networks (RNN) [<xref ref-type="bibr" rid="B30">30</xref>], Bi-directional Long Short-Term Memory CRF (BI-LSTM-CRF) [<xref ref-type="bibr" rid="B31">31</xref>&#x2013;<xref ref-type="bibr" rid="B33">33</xref>], and Bidirectional Encoder Representations from Transformers (BERT) [<xref ref-type="bibr" rid="B15">15</xref>] have been employed for drug name identification and normalization. These models leverage various features, including domain-specific attributes and word representation features, to improve accuracy.</p>
<p>Hybrid approaches have also emerged, integrating multiple methods to capitalize on the strengths of different models while mitigating their weaknesses [<xref ref-type="bibr" rid="B19">19</xref>]. For example, a semi-supervised machine learning technique known as feature coupling generalization was applied to refine a drug name dictionary, which was constructed from sources such as DrugBank and PubMed, to enhance drug name recognition in unstructured textual data [<xref ref-type="bibr" rid="B19">19</xref>].</p>
<p>To create a drug name dictionary, different names for the same drug are linked to a standardized name. A comprehensive dictionary is essential for accurate drug identification and normalization. RxNorm [<xref ref-type="bibr" rid="B34">34</xref>], a standardized vocabulary developed by the National Library of Medicine (NLM), plays a key role in these processes. RxNorm compiles drug names from 13 different sources and further standardizes them under its own unique terminology, RxNorm, bringing the total to 14 distinct sources, enabling consistent linkage of various drug names across different databases. The integration of RxNorm with both rule-based and machine learning approaches enhances the identification and normalization of drug names.</p>
<p>Although RxNorm is widely used in clinical settings, such as electronic health records and clinical decision support systems [<xref ref-type="bibr" rid="B35">35</xref>], it faces several limitations when analyzing unstructured data. One significant issue is the extensive variability in the length of drug names within RxNorm, which can range from one to over 2000 characters. These extremely short or long names are seldom found in unstructured text. Moreover, RxNorm includes distinct entries for various drug formats and dosages, which are typically omitted when discussing experience with drugs in unstructured text. Even when such details are mentioned, they are often inconsistent and incomplete.</p>
<p>Additionally, RxNorm&#x2019;s approach of combining drug names with specific dosages as separate entries can lead to multiple hits for the same drug in a single text. For example, &#x201c;Acetaminophen&#x201d; and &#x201c;Acetaminophen 325&#xa0;mg&#x201d; are distinct entries in RxNorm. If both terms are included in a drug name dictionary, a sentence like &#x201c;Acetaminophen 325&#xa0;mg caused my mom&#x2019;s liver injury&#x201d; could lead to two matches, one for &#x201c;Acetaminophen&#x201d; and another for &#x201c;Acetaminophen 325&#xa0;mg&#x201d;, resulting in redundant counts of the adverse event. These complexities underscore the need for a refined set of drug names to improve the accuracy and efficiency of drug identification in unstructured data.</p>
<p>The purpose of this study is to develop an enhanced set of drug names from RxNorm, specifically tailored for identifying drug names in unstructured data for drug safety surveillance. By refining the existing drug names in RxNorm, this study aims to address current limitations and improve the accuracy and efficiency of drug identification in unstructured data.</p>
</sec>
<sec sec-type="materials|methods" id="s3">
<title>Materials and methods</title>
<sec id="s3-1">
<title>Study design</title>
<p>The workflow for generating this refined set and assessing its accuracy and efficiency is depicted in <xref ref-type="fig" rid="F1">Figure 1</xref>. Initially, a comprehensive list of drug names was downloaded from the RxNorm database. This was followed by a systematic process of removing duplicates, incorrect names, and names that could potentially cause inaccurate counts in unstructured data analysis. Drug names were then grouped into three categories by length, each filtered separately: those with four or fewer characters, those with five to 199 characters, and those with 200 or more characters.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Study overview. The flowchart illustrates the procedures used to generate and evaluate a refined set of drug names from RxNorm, including extraction of drug names from the RxNorm website, removal of duplicates, filtering false names, discarding names that likely lead to redundant occurrence counts in unstructured data analysis, and evaluating accuracy and efficiency of the refined set.</p>
</caption>
<graphic xlink:href="ebm-250-10374-g001.tif"/>
</fig>
</sec>
<sec id="s3-2">
<title>Data sources</title>
<p>The RxNorm file released on July 3, 2023 (RxNorm_full_07032023.zip) was downloaded from the RxNorm repository [<xref ref-type="bibr" rid="B36">36</xref>]. The &#x201c;RXNCONSO.RRF&#x201d; file within this package was used to extract drug names. Specifically, drug names were obtained from the &#x201c;STR&#x201d; (string) column, while their corresponding types were identified from the &#x201c;TTY&#x201d; (term type) column, which includes categories such as brand name, synonym, and others.</p>
<p>To ensure relevance, name types not associated with specific drugs were excluded based on the guidelines provided in the RxNorm technical documentation [<xref ref-type="bibr" rid="B37">37</xref>]. For instance, terms like dose form, dose form group, and special category&#x2014;which describe routes of administration rather than specific drugs&#x2014;were removed. The source of each drug name is indicated in the &#x201c;SAB&#x201d; (source abbreviation) column: ATC (Anatomical Therapeutic Chemical Classification System), CVX (Vaccines Administered), DB (DrugBank), GS (Gold Standard Drug Database), MMSL (Micromedex RED BOOK), MMX (Micromedex), MSH (Medical Subject Headings), MTHCMS (CMS Formulary Reference File), MTHSPL (FDA Structured Product Labeling), NDDF (First Databank), RXNORM (RxNorm itself), SNOMED (SNOMED Clinical Terms), USP (United States Pharmacopeia), and VANDF (Veterans Health Administration National Drug File).</p>
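<p>As an illustration of this extraction step, the following minimal Python sketch reads a pipe-delimited RXNCONSO.RRF stream, skips the dose form (DF), dose form group (DFG), and special category (SC) term types, and collapses duplicate strings while recording their sources. The column positions (SAB at index 11, TTY at 12, STR at 14), the case-insensitive de-duplication, and the helper names extract_names and make_row are assumptions made for this example rather than details reported for the study, and the three sample rows are invented.</p>
<preformat>
```python
import csv
import io

# 0-based column positions in RXNCONSO.RRF. These are assumed from the
# RxNorm technical documentation; verify them against the release in use.
SAB_COL, TTY_COL, STR_COL = 11, 12, 14

# Term types that describe dose forms or categories, not specific drugs.
EXCLUDED_TTY = {"DF", "DFG", "SC"}


def extract_names(rrf_file):
    """Return {lowercased name: set of sources}, skipping excluded term types."""
    sources_by_name = {}
    reader = csv.reader(rrf_file, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row[TTY_COL] in EXCLUDED_TTY:
            continue
        # Assumption: duplicates are detected case-insensitively.
        name = row[STR_COL].strip().lower()
        sources_by_name.setdefault(name, set()).add(row[SAB_COL])
    return sources_by_name


def make_row(sab, tty, name):
    """Build one invented row in the 18-field, pipe-terminated layout."""
    fields = [""] * 18
    fields[SAB_COL], fields[TTY_COL], fields[STR_COL] = sab, tty, name
    return "|".join(fields) + "|\n"


sample = io.StringIO(
    make_row("RXNORM", "IN", "Acetaminophen")
    + make_row("MSH", "IN", "acetaminophen")
    + make_row("RXNORM", "DF", "Oral Tablet")
)
names = extract_names(sample)
print(names)
```
</preformat>
<p>Keeping the set of sources for each name makes it straightforward to ask later whether a removed name was contributed by only one source.</p>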
<p>To evaluate the extracted drug names, a dataset of 18,000 drug-related PubMed abstracts was prepared. These abstracts were retrieved by searching PubMed using the keyword &#x201c;drug&#x201d; via the Entrez Programming Utilities [<xref ref-type="bibr" rid="B38">38</xref>] (E-utilities) developed by the National Center for Biotechnology Information (NCBI). To comply with NCBI guidelines, we designated an email address for Entrez queries. On 22 May 2024, we generated a search query using the keyword &#x201c;drug&#x201d; without imposing any timeframe restrictions, ensuring the retrieval of all available abstracts up to that date. Entrez was used to retrieve 20,000 PubMed abstract IDs matching this query. Due to the limitation on the number of abstracts that can be fetched in a single request, we retrieved the IDs in two batches, with each batch containing 10,000 IDs. Abstracts were fetched and output for each batch. Although 20,000 IDs were obtained, only 18,520 abstracts were successfully retrieved because some entries were missing. Ultimately, we used the first 18,000 abstracts, choosing this round number to simplify subsequent calculations.</p>
</sec>
<sec id="s3-3">
<title>Refinement of RxNorm drug names</title>
<p>The first step is to remove duplicates and exclude drug names that are not associated with specific drugs. This includes eliminating terms that describe dose form, dose form group, and special category&#x2014;such as &#x201c;oral tablet,&#x201d; &#x201c;chewable product,&#x201d; and &#x201c;medical supplies&#x201d;&#x2014;since these are not linked to particular drugs and should, therefore, be excluded. Brand and generic drug names were retained to ensure comprehensive drug identification. For example, both Daytrana (patch) and Ritalin (oral tablet) were included as brand names for methylphenidate. This approach ensures that drug identification focuses on the medication itself while preventing redundant counts based on formulation differences. However, we recognize that ADEs can sometimes be associated with the delivery method rather than the active ingredient. For instance, systemic methylphenidate may be linked to behavioral effects like aggression, while transdermal formulations such as Daytrana may cause localized reactions like rash.</p>
<p>For drug names with four or fewer characters, such as APAP (Acetaminophen), ASA (Aspirin), and HCTZ (Hydrochlorothiazide), their frequency of use in unstructured data was tested in 18,000 drug-related PubMed abstracts to remove those that would rarely appear in drug-related documents. Drug names that were not found in these abstracts were considered rare and removed. We used the &#x201c;en_core_web_sm&#x201d; model from the spaCy [<xref ref-type="bibr" rid="B39">39</xref>] natural language processing (NLP) library to identify and count occurrences of these drug names within the abstracts. Each abstract was tokenized, and both tokens and drug names were converted to lowercase for consistency. We then compared each token against the list of drug names, recording an occurrence whenever a match was found. Drug names with zero occurrences were excluded from the final list.</p>
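<p>The token-level counting described above can be sketched in a few lines of Python. To keep the example self-contained, a simple regular-expression tokenizer stands in for spaCy&#x2019;s tokenizer, so token boundaries may differ from those in the study; the function name count_short_names and the two sample abstracts are hypothetical.</p>
<preformat>
```python
import re
from collections import Counter


def count_short_names(short_names, abstracts):
    """Count case-insensitive, whole-token matches of each short name."""
    wanted = {name.lower() for name in short_names}
    # Start every name at zero so unseen names remain visible in the result.
    counts = Counter({name: 0 for name in wanted})
    for text in abstracts:
        # Stand-in tokenizer (assumption): runs of letters and digits.
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            if token in wanted:
                counts[token] += 1
    return counts


abstracts = [
    "APAP overdose is a leading cause of acute liver failure.",
    "Low-dose ASA and APAP were compared in this trial.",
]
counts = count_short_names(["APAP", "ASA", "HCTZ"], abstracts)

# Names with zero occurrences would be dropped from the dictionary.
kept = sorted(name for name, n in counts.items() if n > 0)
removed = sorted(name for name, n in counts.items() if n == 0)
print(kept, removed)
```
</preformat>
<p>Matching whole tokens rather than substrings matters for short names: a substring search for &#x201c;asa&#x201d; would also fire on words such as &#x201c;nasal&#x201d;.</p>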
<p>For drug names with five to 199 characters, we examined their potential to produce redundant occurrences in unstructured data analysis. If a drug name contains another drug name, leading to redundant counts, it was discarded. To identify distinct drug names that overlap with discarded names but not with other distinct names, we split each drug name into words using Python&#x2019;s &#x201c;re.split&#x201d; function (Python version 3.11.7 in Anaconda). The names were then sorted by word count. We checked whether the words in a drug name contained all the words of another name. If a drug name contained all the words of any other name, it was removed. Drug names with 200 or more characters were removed entirely, as they are unlikely to appear in real-world unstructured texts.</p>
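<p>One way to realize this word-containment filter is sketched below. It is a simplified quadratic-time illustration, not the code used in the study: the splitting pattern, the tie-breaking order among names with the same word count, and the function names words_of and drop_containing_names are all assumptions. Because names are processed from fewest to most words, a name only needs to be checked against names already kept.</p>
<preformat>
```python
import re


def words_of(name):
    """Split a name into a set of lowercase words on non-alphanumeric runs."""
    return frozenset(w for w in re.split(r"[^a-z0-9]+", name.lower()) if w)


def drop_containing_names(names):
    """Keep names whose words do not include every word of a kept name."""
    kept = []  # (name, word set) pairs, fewest words first
    ordered = sorted(set(names), key=lambda n: (len(words_of(n)), n))
    for name in ordered:
        words = words_of(name)
        if not words:
            continue
        if any(kept_words.issubset(words) for _, kept_words in kept):
            continue  # contains another name, so it could be double counted
        kept.append((name, words))
    return [name for name, _ in kept]


names = [
    "Acetaminophen",
    "Acetaminophen 325 MG Oral Tablet",
    "Ibuprofen",
    "Acetaminophen / Ibuprofen",
]
print(drop_containing_names(names))
```
</preformat>
<p>In this sketch, word order is ignored, so a name is treated as containing another whenever all of the shorter name&#x2019;s words appear anywhere in it. For a dictionary of several hundred thousand names, an inverted index from words to names would likely be needed to avoid the pairwise comparison.</p>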
</sec>
<sec id="s3-4">
<title>Assessment of the refined set</title>
<p>To evaluate the efficiency and accuracy of the refined set of drug names in unstructured data analysis, we conducted drug identification on the 18,000 drug-related PubMed abstracts. The refined and original drug names were converted to lowercase and tokenized using the &#x201c;en_core_web_sm&#x201d; model in spaCy. These tokenized drug names were used to create matching patterns, which were added to spaCy&#x2019;s PhraseMatcher. Each abstract was tokenized, and the PhraseMatcher compared each sequence of tokens against the created matching patterns. When a match was found, the drug name was recorded.</p>
<p>Efficiency was measured by comparing the computational time required for both the refined and original RxNorm drug name sets. Accuracy was calculated as the ratio of drug names identified within the abstracts to the total number of drug names, for both the refined and original sets.</p>
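<p>The two measures defined above can be expressed compactly. In the following sketch, accuracy is the fraction of dictionary names found at least once in the corpus, and efficiency is wall-clock time for the matching loop. The matcher regex_find is a hypothetical regular-expression stand-in for spaCy&#x2019;s PhraseMatcher, and the small name set and abstracts are invented for illustration.</p>
<preformat>
```python
import re
import time


def regex_find(name_set, text):
    """Hypothetical stand-in matcher: whole-word, case-insensitive lookup."""
    lowered = text.lower()
    return {
        name
        for name in name_set
        if re.search(r"\b" + re.escape(name) + r"\b", lowered)
    }


def evaluate(name_set, abstracts, find_names):
    """Return (accuracy, seconds) for one dictionary over a corpus."""
    start = time.perf_counter()
    found = set()
    for text in abstracts:
        found.update(find_names(name_set, text))
    seconds = time.perf_counter() - start
    # Accuracy as defined in the text: names identified / total names.
    accuracy = len(found) / len(name_set) if name_set else 0.0
    return accuracy, seconds


refined = {"acetaminophen", "ibuprofen", "warfarin", "hctz"}
abstracts = [
    "Acetaminophen and ibuprofen were compared.",
    "Ibuprofen was well tolerated.",
]
accuracy, seconds = evaluate(refined, abstracts, regex_find)
print(accuracy)
```
</preformat>
<p>Running the same function on the original and refined dictionaries with the same abstracts gives directly comparable accuracy and timing figures.</p>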
</sec>
</sec>
<sec sec-type="results" id="s4">
<title>Results</title>
<sec id="s4-1">
<title>Refinement of drug names</title>
<p>
<xref ref-type="table" rid="T1">Table 1</xref> summarizes the percentage of drug names removed at each stage of the refinement process, relative to the 1,143,201 names originally retrieved, giving an overview of the impact of our filtering criteria.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Summary of removed names for each drug name type.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Name type</th>
<th align="center">Percentage of names removed (%)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">Duplicates</td>
<td align="center">23.61</td>
</tr>
<tr>
<td align="center">Non-drug Names</td>
<td align="center">0.09</td>
</tr>
<tr>
<td align="center">Drug Names with fewer than 5 characters</td>
<td align="center">0.06</td>
</tr>
<tr>
<td align="center">Drug Names with 5-199 characters</td>
<td align="center">65.78</td>
</tr>
<tr>
<td align="center">Drug Names with &#x2265;200 characters</td>
<td align="center">1.53</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4-2">
<title>Download and processing of drug names</title>
<p>To refine the drug names in RxNorm, we downloaded the RxNorm file released on July 3, 2023, from the RxNorm website [<xref ref-type="bibr" rid="B36">36</xref>]. The &#x201c;RXNCONSO.RRF&#x201d; file in the downloaded zipped files was used to obtain drug names and other related information, with drug names stored in the &#x201c;STR&#x201d; column. A total of 1,143,201 drug names were retrieved, from which 269,931 duplicates were identified and removed. Then, we examined the types of the retained drug names to remove those not containing specific drug information. According to the RxNorm technical documentation [<xref ref-type="bibr" rid="B37">37</xref>], three term types (DF, DFG, SC) pertain to administrative details rather than specific drugs. We removed 1,009 drug names belonging to these categories.</p>
</sec>
<sec id="s4-3">
<title>Drug names with four or fewer characters</title>
<p>We used 18,000 drug-related PubMed abstracts to evaluate the occurrence of drug names with four or fewer characters. Out of 1,260 drug names, 687 had zero occurrences and were discarded. The occurrences of the remaining 573 drug names within the abstracts are provided in <xref ref-type="sec" rid="s12">Supplementary Table S1</xref>.</p>
<p>We further analyzed the sources of the 687 discarded names. Our analysis showed that the majority (557) originated from a single source among the 14 in RxNorm, suggesting that drug names from a single source are unlikely to appear in unstructured drug-related texts. This result is not surprising, as these names lack corroboration from other sources. We also examined the source distribution of these 557 single-source names. As shown in <xref ref-type="fig" rid="F2">Figure 2A</xref>, DrugBank had the highest number (289), followed by SNOMEDCT_US (84) and MSH (84). In total, DrugBank, SNOMEDCT_US, and MSH contained 628, 250, and 233 drug names with four or fewer characters, respectively. This indicates that approximately 46%, 34%, and 36% of such names from DrugBank, SNOMEDCT_US, and MSH were excluded. In contrast, sources like NDDF and MTHSPL had fewer names of this length and a lower removal rate, with only 1 out of 60 from NDDF and 6 out of 62 from MTHSPL being removed.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Source distribution of the removed drug names that only originate from a single source for names with four or fewer characters <bold>(A)</bold>, names with five to 199 characters <bold>(B)</bold>, and names with 200 or more characters <bold>(C)</bold>. The y-axes give number of names and x-axes depict name sources. Abbreviations: ATC (Anatomical Therapeutic Chemical Classification System), CVX (Vaccines Administered), DB (DrugBank), GS (Gold Standard Drug Database), MMSL (Micromedex RED BOOK), MMX (Micromedex), MSH (Medical Subject Headings), MTHCMS (CMS Formulary Reference File), MTHSPL (FDA Structured Product Labeling), NDDF (First Databank), RXNORM (RxNorm itself), SNOMED (SNOMED Clinical Terms), USP (United States Pharmacopeia), and VANDF (Veterans Health Administration National Drug File).</p>
</caption>
<graphic xlink:href="ebm-250-10374-g002.tif"/>
</fig>
</sec>
<sec id="s4-4">
<title>Drug names with five to 199 characters</title>
<p>For drug names with five to 199 characters, we excluded those that could lead to redundant occurrence counts in unstructured data analysis. For example, using both original drug names &#x201c;Acetaminophen&#x201d; and &#x201c;Acetaminophen 325&#xa0;MG Oral Tablet&#x201d; to identify adverse events for drugs in the text &#x201c;my brother had headache after take acetaminophen 325&#xa0;MG tablet&#x201d; might lead to two counts for the adverse event &#x201c;headache&#x201d; when only one should be recorded. Therefore, drug names that contain other names were removed, while distinct names without overlaps were retained. Out of 853,472 names with five to 199 characters, 101,491 were distinct names and were retained, whereas 751,981 names, which contain other names, were removed.</p>
<p>A large portion of the removed names (730,113 out of 751,981) originated from only one of the 14 sources in RxNorm. The source distribution of these removed single-sourced names is shown in <xref ref-type="fig" rid="F2">Figure 2B</xref>. Most of these drug names came from RxNorm, followed by MTHSPL, SNOMEDCT_US, NDDF, and MMSL. Specifically, RxNorm, MTHSPL, SNOMEDCT_US, NDDF, and MMSL provided 279,465, 121,035, 108,421, 99,054, and 91,270 drug names with five to 199 characters, respectively. The removal rates for these names are notably high: 87.8% for RxNorm, 85.7% for MTHSPL, 80.4% for SNOMEDCT_US, 69.7% for NDDF, and 71.9% for MMSL. In contrast, only 16.4% (5,098 out of 31,041) of the names with five to 199 characters from DrugBank were removed.</p>
</sec>
<sec id="s4-5">
<title>Drug names with 200 or more characters</title>
<p>Drug names with 200 or more characters are rarely used in unstructured data and, therefore, were excluded. A total of 17,529 such drug names were found in RxNorm and excluded. All these names originated from a single source, with the source distribution depicted in <xref ref-type="fig" rid="F2">Figure 2C</xref>.</p>
</sec>
<sec id="s4-6">
<title>Evaluation of the refined drug names set</title>
<p>The refined set of drug names includes 573 names with four or fewer characters and 101,491 names with five to 199 characters. We analyzed the distribution of drug name lengths between the refined set and the original RxNorm set. As shown in <xref ref-type="fig" rid="F3">Figure 3</xref>, longer drug names were less likely to be retained in the refined set. This suggests that longer drug names are more prone to generating redundant occurrence counts in unstructured data analysis compared to shorter drug names and were thus discarded.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Comparison of name length between the refined set and the original RxNorm set. The y-axis shows the number of drug names, and the x-axis indicates name length. Name lengths were color coded in red for the refined sets and in blue for the original RxNorm set.</p>
</caption>
<graphic xlink:href="ebm-250-10374-g003.tif"/>
</fig>
<p>To evaluate the efficiency and accuracy of the refined set of drug names, we used 18,000 drug-related PubMed abstracts. Our results revealed that 3,065 names from the refined set were identified in the abstracts, with lengths ranging from 1 to 46 characters. When we evaluated the original RxNorm set using the same abstracts, we found 4,471 names with lengths ranging from 1 to 66 characters. The additional 1,406 names that RxNorm identified in the abstracts were either false drug names or names likely leading to redundant occurrence counts in unstructured data analysis. These names were excluded from the refined set, with the majority originating from DrugBank and SNOMEDCT_US, as shown in <xref ref-type="fig" rid="F4">Figure 4</xref>. Our results reveal that the refined set of drug names improved drug identification accuracy in analyzing unstructured texts compared to the original RxNorm set.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Sources of original RxNorm drug names that were excluded from the refined set but identified in the PubMed abstracts. The y-axis represents the number of drug names and the x-axis shows the sources. Abbreviations: ATC (Anatomical Therapeutic Chemical Classification System), CVX (Vaccines Administered), DB (DrugBank), GS (Gold Standard Drug Database), MMSL (Micromedex RED BOOK), MMX (Micromedex), MSH (Medical Subject Headings), MTHCMS (CMS Formulary Reference File), MTHSPL (FDA Structured Product Labeling), NDDF (First Databank), RXNORM (RxNorm itself), SNOMED (SNOMED Clinical Terms), USP (United States Pharmacopeia), and VANDF (Veterans Health Administration National Drug File).</p>
</caption>
<graphic xlink:href="ebm-250-10374-g004.tif"/>
</fig>
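<p>The comparison between the two name sets can be sketched as a dictionary lookup followed by a set difference. The matching procedure is not detailed in this section, so the sketch below assumes case-insensitive whole-word matching; the function <monospace>names_found</monospace> and all names and abstracts in it are hypothetical toy data.</p>
<preformat preformat-type="code"><![CDATA[
```python
import re
from typing import Iterable, Set


def names_found(names: Iterable[str], abstracts: Iterable[str]) -> Set[str]:
    """Return the names that occur at least once, as whole words, in any abstract."""
    texts = [a.lower() for a in abstracts]
    found = set()
    for name in names:
        # Assumption: case-insensitive match bounded by non-word characters.
        pattern = re.compile(r"(?<!\w)" + re.escape(name.lower()) + r"(?!\w)")
        if any(pattern.search(t) for t in texts):
            found.add(name)
    return found


# Hypothetical toy data, for illustration only.
original = {"aspirin", "water", "ibuprofen"}
refined = {"aspirin", "ibuprofen"}
abstracts = ["Aspirin was dissolved in water.", "No drug was given."]

hits_original = names_found(original, abstracts)
hits_refined = names_found(refined, abstracts)
extra = hits_original - hits_refined  # names matched only by the original set
print(sorted(extra))
```
]]></preformat>
<p>In this toy example the only extra match is a common word that is not used as a drug name in context, which is the kind of false identification the refined set is intended to avoid.</p>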
<p>The efficiency of the refined set of drug names was measured as the computational time required to analyze the abstracts. The analysis took 1,910&#xa0;s with the refined set and 6,301&#xa0;s with the original RxNorm set, which is more than three times longer. These results demonstrate a substantial improvement in efficiency when analyzing unstructured data, making the refined set more suitable for real-time drug safety surveillance.</p>
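<p>The two reported run times correspond to a speed-up of roughly 3.3-fold. The following minimal sketch shows one way such a comparison could be timed and summarized; the helper names <monospace>timed</monospace> and <monospace>speedup</monospace> are hypothetical, and the timing method used in the study is not specified here.</p>
<preformat preformat-type="code"><![CDATA[
```python
import time
from typing import Callable, Tuple


def timed(func: Callable[[], object]) -> Tuple[object, float]:
    """Run func once and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


def speedup(baseline_seconds: float, refined_seconds: float) -> float:
    """Ratio of baseline run time to refined run time."""
    return baseline_seconds / refined_seconds


# Run times reported in the text: 6,301 s (original set) and 1,910 s (refined set).
ratio = speedup(6301, 1910)
print(round(ratio, 2))
```
]]></preformat>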
</sec>
</sec>
<sec sec-type="discussion" id="s5">
<title>Discussion</title>
<p>Artificial intelligence is increasingly playing a critical role in evaluating drug safety and chemical toxicity. By harnessing machine learning algorithms and computational models, artificial intelligence can predict adverse effects, identify toxic compounds, and improve pharmacovigilance efforts. There are two main types of data involved: structured and unstructured. Due to their distinct formats and organization, machine learning techniques are applied differently to each. Structured data is well-organized and easily interpretable by machines, making it a natural fit for a wide range of safety assessments and toxicity endpoints [<xref ref-type="bibr" rid="B40">40</xref>&#x2013;<xref ref-type="bibr" rid="B53">53</xref>]. In contrast, unstructured data lacks a predefined format, which makes it more challenging to process and analyze. To effectively apply machine learning techniques, such as natural language processing and recurrent neural networks, to unstructured data in pharmacovigilance, a reliable and comprehensive set of drug names is essential.</p>
<p>In this study, we generated a refined set of drug names from RxNorm to improve the accuracy and efficiency of drug identification in unstructured data. The original RxNorm set contained duplicates, non-specific drug names, and names that were either too long or too short, which hindered effective drug identification in unstructured data. Our objective was to exclude such names from analysis of unstructured texts. The refined set was evaluated using 18,000 drug-related PubMed abstracts, demonstrating enhanced accuracy and efficiency in drug identification, thereby potentially improving drug safety surveillance through unstructured data analysis.</p>
<p>Single-sourced drug names, which originate from only one of the 14 sources in RxNorm, are generally less reliable than names corroborated by multiple sources. These single-sourced names tend to cause incorrect identification or generate redundant occurrence counts when analyzing unstructured data, affecting both the accuracy and efficiency of drug identification. Our results revealed that the majority of the removed names were single-sourced, highlighting the importance of utilizing drug names validated by multiple sources.</p>
<p>Furthermore, most of the removed single-sourced names originated from FDA Structured Product Labeling, RxNorm, and SNOMEDCT_US. These sources serve distinct roles in drug information management. FDA Structured Product Labeling provides comprehensive regulatory drug details, including dosage, formulation, and safety information, to ensure clarity and reduce medication errors. RxNorm standardizes drug names by linking ingredients, strengths, and dosage forms, facilitating interoperability across electronic health systems. SNOMED CT, on the other hand, is primarily used for clinical documentation and coding within electronic health records.</p>
<p>RxNorm integrates drug names from multiple external sources; however, not all names from contributing databases are necessarily included. Furthermore, many drug names appear in multiple sources within RxNorm, potentially leading to redundant listings. To mitigate this, our analysis systematically identified and removed duplicate drug names contributed by multiple sources, ensuring that each unique drug name was counted only once. While these structured resources are essential for clinical and regulatory use, their detailed naming conventions can complicate drug identification in unstructured data. Refining these names is crucial to enhance their applicability in text-based analyses.</p>
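<p>Counting each name once and flagging single-sourced names can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the input is assumed to be (name, source) pairs, names differing only in letter case or surrounding whitespace are assumed to be duplicates, and the function <monospace>sources_by_name</monospace> and the example rows are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def sources_by_name(rows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each normalized name to the set of sources that contribute it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, source in rows:
        # Assumption: names differing only in case or outer whitespace are duplicates.
        index[name.strip().lower()].add(source)
    return dict(index)


# Hypothetical (name, source) pairs, for illustration only.
rows = [
    ("Aspirin", "RXNORM"),
    ("aspirin", "MSH"),
    ("aspirin", "DB"),
    ("Acetylsalicylic acid 81 MG Oral Tablet", "MTHSPL"),
]
index = sources_by_name(rows)
unique_names = sorted(index)  # each name counted once
single_sourced = sorted(n for n, s in index.items() if len(s) == 1)
print(len(unique_names), single_sourced)
```
]]></preformat>
<p>Keeping the set of contributing sources for each name, rather than discarding it during deduplication, makes it possible to distinguish single-sourced names from names corroborated by several sources.</p>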
<p>On the other hand, sources such as DrugBank and MSH showed varying levels of reliability across drug name lengths. For drug names with four or fewer characters, DrugBank had a relatively high removal rate of 46%, indicating that many of these names are unlikely to appear in unstructured data. However, the removal rate for DrugBank drug names with five to 199 characters dropped to 16.4%, suggesting that these names are more reliable in unstructured data analysis. Similarly, MSH had a high removal rate of 36% for names with four or fewer characters and a lower rate of 24% for names with five to 199 characters. Our results suggest that short names from DrugBank and MSH require more caution than their longer names when used in unstructured data analysis for drug safety surveillance.</p>
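<p>A removal rate of this kind is the percentage of a source's names in one length class that were removed during refinement. The short sketch below makes the calculation explicit; the function <monospace>removal_rate</monospace> is hypothetical, and the counts in the example were chosen only to reproduce a 46% rate and are not the study's data.</p>
<preformat preformat-type="code"><![CDATA[
```python
def removal_rate(n_original: int, n_removed: int) -> float:
    """Percentage of a source's names in one length class that were removed."""
    if n_original <= 0:
        raise ValueError("n_original must be positive")
    return 100.0 * n_removed / n_original


# Hypothetical counts chosen only to reproduce a 46% rate; not the study's data.
print(removal_rate(n_original=200, n_removed=92))
```
]]></preformat>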
<p>Despite the improvements in accuracy and efficiency demonstrated by the refined set, some limitations should be noted. First, our refined set of drug names is not error-free for unstructured data analysis, and some unsuitable names may persist. For example, short drug names in the refined set might include common words that, depending on the context, do not refer to drugs. Second, as RxNorm is primarily composed of professionally used names, it may not capture the street names or slang used in non-professional documents. Third, because RxNorm is updated monthly, the refined set must be regenerated regularly to remain accurate and relevant. Finally, our evaluation was limited to 18,000 drug-related PubMed abstracts. We retrieved abstracts containing the keyword &#x201c;drug&#x201d; to maximize the inclusion of abstracts that explicitly mention specific drug names, but these abstracts may not represent other unstructured real-world data. Alternative terms such as &#x201c;medications&#x201d; or &#x201c;pharmacologic&#x201d; were not used, as they are often associated with broader discussions of treatment strategies, pharmacological mechanisms, or drug classes rather than individual drug names. Additionally, a composite search incorporating all relevant MeSH terms was not conducted, so that retrieval remained consistent with prior studies that employed keyword-based retrieval for drug-related text analysis.</p>
<p>Further efforts are needed to enhance the refined set. One such effort involves evaluating the set more comprehensively using diverse unstructured data. Additionally, the refined set could be improved by integrating advanced algorithms and machine learning techniques. Machine learning algorithms, particularly those involving similarity measurements, could be trained to recognize and link synonymous drug names, thereby improving accuracy. Natural language processing models such as BERT could also be employed to better capture the context in which drug names appear, further enhancing accuracy. Finally, developing automated processes for updating the drug names in the dataset is crucial. As RxNorm updates its dataset monthly, maintaining the refined set through an automated update process will ensure its continued reliability for unstructured data mining in drug safety surveillance.</p>
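<p>As a simple illustration of a similarity measurement, the sketch below scores pairs of names with the matching-block ratio from Python's standard <monospace>difflib</monospace> module and proposes pairs above a cut-off as candidate links. This is a stand-in chosen for brevity, not a method used in this study: the 0.85 threshold and the function names are hypothetical, and string similarity of this kind can only surface spelling variants, not synonyms with unrelated spellings such as brand and generic names.</p>
<preformat preformat-type="code"><![CDATA[
```python
from difflib import SequenceMatcher
from typing import List, Tuple


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] from difflib's matching-block ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_links(names: List[str], threshold: float = 0.85) -> List[Tuple[str, str]]:
    """Return name pairs whose similarity meets the (hypothetical) threshold."""
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(a, b) >= threshold:
                pairs.append((a, b))
    return pairs


# Hypothetical spelling variants, for illustration only.
print(candidate_links(["acetaminophen", "acetaminophene", "ibuprofen"]))
```
]]></preformat>
<p>Candidate pairs produced this way would still need review, or a context-aware model, before being treated as the same drug.</p>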
</sec>
<sec sec-type="conclusion" id="s6">
<title>Conclusion</title>
<p>The development of the refined set of drug names from RxNorm has shown significant improvements in the accuracy and efficiency of drug identification in unstructured data. This refined dataset could be valuable for extracting drug-related information from unstructured data, thereby supporting more effective monitoring and management of drug safety through unstructured data analysis. Our study also highlights the importance of addressing the limitations of existing drug names when used for unstructured data mining, particularly in the context of drug safety surveillance.</p>
</sec>
</body>
<back>
<sec sec-type="author-contributions" id="s7">
<title>Author contributions</title>
<p>WG and HH designed the work. WG, FD, JL, and AA conducted data analysis. WG and HH wrote the first draft. TP revised the manuscript. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec sec-type="disclaimer" id="s8">
<title>Author disclaimer</title>
<p>This article reflects the views of the authors and does not necessarily reflect those of the U.S. Food and Drug Administration.</p>
</sec>
<sec id="s9">
<title>Data availability</title>
<p>The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.</p>
</sec>
<sec sec-type="funding-information" id="s10">
<title>Funding</title>
<p>The author(s) declare that financial support was received for the research and/or publication of this article. This study was funded by the US Food and Drug Administration (FDA). This research was supported in part by an appointment to the Research Participation Program at the National Center for Toxicological Research (AA), administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and the U.S. Food and Drug Administration.</p>
</sec>
<sec sec-type="COI-statement" id="s11">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="s12">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.ebm-journal.org/articles/10.3389/ebm.2025.10374/full#supplementary-material">https://www.ebm-journal.org/articles/10.3389/ebm.2025.10374/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="DataSheet1.csv" id="SM1" mimetype="application/csv" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Classen</surname>
<given-names>DC</given-names>
</name>
<name>
<surname>Pestotnik</surname>
<given-names>SL</given-names>
</name>
<name>
<surname>Evans</surname>
<given-names>RS</given-names>
</name>
<name>
<surname>Lloyd</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Burke</surname>
<given-names>JP</given-names>
</name>
</person-group>. <article-title>Adverse drug events in hospitalized patients. Excess length of stay, extra costs, and attributable mortality</article-title>. <source>JAMA</source> (<year>1997</year>) <volume>277</volume>:<fpage>301</fpage>&#x2013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1001/jama.1997.03540280039031</pub-id>
</citation>
</ref>
<ref id="B2">
<label>2.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Harpaz</surname>
<given-names>R</given-names>
</name>
<name>
<surname>DuMouchel</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Shah</surname>
<given-names>NH</given-names>
</name>
<name>
<surname>Madigan</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Ryan</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Friedman</surname>
<given-names>C</given-names>
</name>
</person-group>. <article-title>Novel data-mining methodologies for adverse drug event discovery and analysis</article-title>. <source>Clin Pharmacol Ther</source> (<year>2012</year>) <volume>91</volume>:<fpage>1010</fpage>&#x2013;<lpage>21</lpage>. <pub-id pub-id-type="doi">10.1038/clpt.2012.50</pub-id>
</citation>
</ref>
<ref id="B3">
<label>3.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Alomar</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Tawfiq</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Hassan</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Palaian</surname>
<given-names>S</given-names>
</name>
</person-group>. <article-title>Post marketing surveillance of suspected adverse drug reactions through spontaneous reporting: current status, challenges and the future</article-title>. <source>Ther Adv Drug Saf</source> (<year>2020</year>) <volume>11</volume>:<fpage>2042098620938595</fpage>. <pub-id pub-id-type="doi">10.1177/2042098620938595</pub-id>
</citation>
</ref>
<ref id="B4">
<label>4.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Waller</surname>
<given-names>PC</given-names>
</name>
</person-group>. <article-title>Making the most of spontaneous adverse drug reaction reporting</article-title>. <source>Basic and Clin Pharmacol and Toxicol</source> (<year>2006</year>) <volume>98</volume>:<fpage>320</fpage>&#x2013;<lpage>3</lpage>. <pub-id pub-id-type="doi">10.1111/j.1742-7843.2006.pto_286.x</pub-id>
</citation>
</ref>
<ref id="B5">
<label>5.</label>
<citation citation-type="web">
<collab>U.S. Food and Drug Administration</collab>. <article-title>Questions and answers on FDA&#x2019;s adverse event reporting system (FAERS)</article-title>. <comment>Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.fda.gov/drugs/surveillance/questions-and-answers-fdas-adverse-event-reporting-system-faers#:%7E:text=What%20is%20FAERS%3F,that%20were%20submitted%20to%20FDA">https://www.fda.gov/drugs/surveillance/questions-and-answers-fdas-adverse-event-reporting-system-faers&#x23;:~:text&#x3d;What%20is%20FAERS%3F,that%20were%20submitted%20to%20FDA</ext-link> (Accessed January 8, 2024).</comment>
</citation>
</ref>
<ref id="B6">
<label>6.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guo</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Pan</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Sakkiah</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ji</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Yavas</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>Y</given-names>
</name>
<etal/>
</person-group> <article-title>Informing selection of drugs for COVID-19 treatment through adverse events analysis</article-title>. <source>Scientific Rep</source> (<year>2021</year>) <volume>11</volume>:<fpage>14022</fpage>. <pub-id pub-id-type="doi">10.1038/s41598-021-93500-5</pub-id>
</citation>
</ref>
<ref id="B7">
<label>7.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Q</given-names>
</name>
<etal/>
</person-group> <article-title>Safety assessment of Yasmin: real-world adverse event analysis using the FAERS database</article-title>. <source>Eur J Obstet and Gynecol Reprod Biol</source> (<year>2024</year>) <volume>301</volume>:<fpage>12</fpage>&#x2013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1016/j.ejogrb.2024.07.048</pub-id>
</citation>
</ref>
<ref id="B8">
<label>8.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y</given-names>
</name>
</person-group>. <article-title>A real-world data analysis of acetylsalicylic acid in FDA Adverse Event Reporting System (FAERS) database</article-title>. <source>Expert Opin Drug Metab and Toxicol</source> (<year>2023</year>) <volume>19</volume>:<fpage>381</fpage>&#x2013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1080/17425255.2023.2235267</pub-id>
</citation>
</ref>
<ref id="B9">
<label>9.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Le</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Hong</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Ge</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Francis</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Lyn-Cook</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Hwang</surname>
<given-names>YT</given-names>
</name>
<etal/>
</person-group> <article-title>A systematic analysis and data mining of opioid-related adverse events submitted to the FAERS database</article-title>. <source>Exp Biol Med (Maywood)</source> (<year>2023</year>) <volume>248</volume>:<fpage>1944</fpage>&#x2013;<lpage>51</lpage>. <pub-id pub-id-type="doi">10.1177/15353702231211860</pub-id>
</citation>
</ref>
<ref id="B10">
<label>10.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hettne</surname>
<given-names>KM</given-names>
</name>
<name>
<surname>Stierum</surname>
<given-names>RH</given-names>
</name>
<name>
<surname>Schuemie</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Hendriksen</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Schijvenaars</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Mulligen</surname>
<given-names>EM</given-names>
</name>
<etal/>
</person-group> <article-title>A dictionary to identify small molecules and drugs in free text</article-title>. <source>Bioinformatics</source> (<year>2009</year>) <volume>25</volume>:<fpage>2983</fpage>&#x2013;<lpage>91</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btp535</pub-id>
</citation>
</ref>
<ref id="B11">
<label>11.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sohn</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Clark</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Halgrim</surname>
<given-names>SR</given-names>
</name>
<name>
<surname>Murphy</surname>
<given-names>SP</given-names>
</name>
<name>
<surname>Chute</surname>
<given-names>CG</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>H</given-names>
</name>
</person-group>. <article-title>MedXN: an open source medication extraction and normalization tool for clinical text</article-title>. <source>J Am Med Inform Assoc</source> (<year>2014</year>) <volume>21</volume>:<fpage>858</fpage>&#x2013;<lpage>65</lpage>. <pub-id pub-id-type="doi">10.1136/amiajnl-2013-002190</pub-id>
</citation>
</ref>
<ref id="B12">
<label>12.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Stenner</surname>
<given-names>SP</given-names>
</name>
<name>
<surname>Doan</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>KB</given-names>
</name>
<name>
<surname>Waitman</surname>
<given-names>LR</given-names>
</name>
<name>
<surname>Denny</surname>
<given-names>JC</given-names>
</name>
</person-group>. <article-title>MedEx: a medication information extraction system for clinical narratives</article-title>. <source>J Am Med Inform Assoc</source> (<year>2010</year>) <volume>17</volume>:<fpage>19</fpage>&#x2013;<lpage>24</lpage>. <pub-id pub-id-type="doi">10.1197/jamia.m3378</pub-id>
</citation>
</ref>
<ref id="B13">
<label>13.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ermshaus</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Piechotta</surname>
<given-names>M</given-names>
</name>
<name>
<surname>R&#xfc;ter</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Keilholz</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Leser</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Benary</surname>
<given-names>M</given-names>
</name>
</person-group>. <article-title>preon: fast and accurate entity normalization for drug names and cancer types in precision oncology</article-title>. <source>Bioinformatics</source> (<year>2024</year>) <volume>40</volume>:<fpage>btae085</fpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btae085</pub-id>
</citation>
</ref>
<ref id="B14">
<label>14.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fung</surname>
<given-names>KW</given-names>
</name>
<name>
<surname>Bodenreider</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Aronson</surname>
<given-names>AR</given-names>
</name>
<name>
<surname>Hole</surname>
<given-names>WT</given-names>
</name>
<name>
<surname>Srinivasan</surname>
<given-names>S</given-names>
</name>
</person-group>. <article-title>Combining lexical and semantic methods of inter-terminology mapping using the UMLS</article-title>. <source>Stud Health Technol Inform</source> (<year>2007</year>) <volume>129</volume>:<fpage>605</fpage>&#x2013;<lpage>9</lpage>.</citation>
</ref>
<ref id="B15">
<label>15.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Miftahutdinov</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Kadurin</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kudrin</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Tutubalina</surname>
<given-names>E</given-names>
</name>
</person-group>. <article-title>Medical concept normalization in clinical trials with drug and disease representation learning</article-title>. <source>Bioinformatics</source> (<year>2021</year>) <volume>37</volume>:<fpage>3856</fpage>&#x2013;<lpage>64</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btab474</pub-id>
</citation>
</ref>
<ref id="B16">
<label>16.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vasilakes</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Fan</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Rizvi</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Bompelli</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Bodenreider</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>R</given-names>
</name>
</person-group>. <article-title>Normalizing dietary supplement product names using the RxNorm model</article-title>. <source>Stud Health Technol Inform</source> (<year>2019</year>) <volume>264</volume>:<fpage>408</fpage>&#x2013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.3233/SHTI190253</pub-id>
</citation>
</ref>
<ref id="B17">
<label>17.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Patrick</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>M</given-names>
</name>
</person-group>. <article-title>High accuracy information extraction of medication information from clinical notes: 2009 i2b2 medication extraction challenge</article-title>. <source>J Am Med Inform Assoc</source> (<year>2010</year>) <volume>17</volume>:<fpage>524</fpage>&#x2013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1136/jamia.2010.003939</pub-id>
</citation>
</ref>
<ref id="B18">
<label>18.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chapman</surname>
<given-names>AB</given-names>
</name>
<name>
<surname>Peterson</surname>
<given-names>KS</given-names>
</name>
<name>
<surname>Alba</surname>
<given-names>PR</given-names>
</name>
<name>
<surname>DuVall</surname>
<given-names>SL</given-names>
</name>
<name>
<surname>Patterson</surname>
<given-names>OV</given-names>
</name>
</person-group>. <article-title>Detecting adverse drug events with rapidly trained classification models</article-title>. <source>Drug Saf</source> (<year>2019</year>) <volume>42</volume>:<fpage>147</fpage>&#x2013;<lpage>56</lpage>. <pub-id pub-id-type="doi">10.1007/s40264-018-0763-y</pub-id>
</citation>
</ref>
<ref id="B19">
<label>19.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>He</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
</person-group>. <article-title>Drug name recognition in biomedical texts: a machine-learning-based method</article-title>. <source>Drug Discov Today</source> (<year>2014</year>) <volume>19</volume>:<fpage>610</fpage>&#x2013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1016/j.drudis.2013.10.006</pub-id>
</citation>
</ref>
<ref id="B20">
<label>20.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sampathkumar</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>XW</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>B</given-names>
</name>
</person-group>. <article-title>Mining adverse drug reactions from online healthcare forums using hidden Markov model</article-title>. <source>BMC Med Inform Decis Mak</source> (<year>2014</year>) <volume>14</volume>:<fpage>91</fpage>. <pub-id pub-id-type="doi">10.1186/1472-6947-14-91</pub-id>
</citation>
</ref>
<ref id="B21">
<label>21.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Le</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Harris</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Fang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Lyn-Cook</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Hong</surname>
<given-names>H</given-names>
</name>
<etal/>
</person-group> <article-title>RxNorm for drug name normalization: a case study of prescription opioids in the FDA adverse events reporting system</article-title>. <source>Front Bioinformatics</source> (<year>2023</year>) <volume>3</volume>:<fpage>1328613</fpage>. <pub-id pub-id-type="doi">10.3389/fbinf.2023.1328613</pub-id>
</citation>
</ref>
<ref id="B22">
<label>22.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hamon</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Grabar</surname>
<given-names>N</given-names>
</name>
</person-group>. <article-title>Linguistic approach for identification of medication names and related information in clinical narratives</article-title>. <source>J Am Med Inform Assoc</source> (<year>2010</year>) <volume>17</volume>:<fpage>549</fpage>&#x2013;<lpage>54</lpage>. <pub-id pub-id-type="doi">10.1136/jamia.2010.004036</pub-id>
</citation>
</ref>
<ref id="B23">
<label>23.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Segura-Bedmar</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Mart&#xed;nez</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Segura-Bedmar</surname>
<given-names>M</given-names>
</name>
</person-group>. <article-title>Drug name recognition and classification in biomedical texts: a case study outlining approaches underpinning automated systems</article-title>. <source>Drug Discov Today</source> (<year>2008</year>) <volume>13</volume>:<fpage>816</fpage>&#x2013;<lpage>23</lpage>. <pub-id pub-id-type="doi">10.1016/j.drudis.2008.06.001</pub-id>
</citation>
</ref>
<ref id="B24">
<label>24.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Levenshtein</surname>
<given-names>VI</given-names>
</name>
</person-group>. <article-title>Binary codes capable of correcting deletions, insertions, and reversals</article-title>. <source>Soviet Phys Doklady</source> (<year>1965</year>) <volume>10</volume>:<fpage>707</fpage>&#x2013;<lpage>10</lpage>.</citation>
</ref>
<ref id="B25">
<label>25.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Tan</surname>
<given-names>P-N</given-names>
</name>
<name>
<surname>Steinbach</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>V</given-names>
</name>
</person-group>. <source>Introduction to data mining</source>. <edition>1st ed</edition>. <publisher-name>Addison-Wesley Longman Publishing Co., Inc.</publisher-name> (<year>2005</year>).</citation>
</ref>
<ref id="B26">
<label>26.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Ho</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>J-MS</given-names>
</name>
</person-group>. <article-title>Extracting medication information from unstructured public health data: a demonstration on data from population-based and tertiary-based samples</article-title>. <source>BMC Med Res Methodol</source> (<year>2020</year>) <volume>20</volume>:<fpage>258</fpage>. <pub-id pub-id-type="doi">10.1186/s12874-020-01131-7</pub-id>
</citation>
</ref>
<ref id="B27">
<label>27.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Peters</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Kapusnik-Uner</surname>
<given-names>JE</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Bodenreider</surname>
<given-names>O</given-names>
</name>
</person-group>. <article-title>An approximate matching method for clinical drug names</article-title>. <source>AMIA Annu Symp Proc</source> (<year>2011</year>) <volume>2011</volume>:<fpage>1117</fpage>&#x2013;<lpage>26</lpage>.</citation>
</ref>
<ref id="B28">
<label>28.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rockt&#xe4;schel</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Weidlich</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Leser</surname>
<given-names>U</given-names>
</name>
</person-group>. <article-title>ChemSpot: a hybrid system for chemical named entity recognition</article-title>. <source>Bioinformatics</source> (<year>2012</year>) <volume>28</volume>:<fpage>1633</fpage>&#x2013;<lpage>40</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bts183</pub-id>
</citation>
</ref>
<ref id="B29">
<label>29.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Lafferty</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>McCallum</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Pereira</surname>
<given-names>FCN</given-names>
</name>
</person-group>. <article-title>Conditional random fields: probabilistic models for segmenting and labeling sequence data</article-title>. In: <source>Proceedings of the eighteenth international conference on machine learning</source>. <publisher-loc>San Francisco, CA</publisher-loc>: <publisher-name>Morgan Kaufmann Publishers Inc.</publisher-name> (<year>2001</year>). p. <fpage>282</fpage>&#x2013;<lpage>9</lpage>.</citation>
</ref>
<ref id="B30">
<label>30.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Z</given-names>
</name>
<etal/>
</person-group> <article-title>Entity recognition from clinical texts via recurrent neural network</article-title>. <source>BMC Med Inform Decis Mak</source> (<year>2017</year>) <volume>17</volume>:<fpage>67</fpage>. <pub-id pub-id-type="doi">10.1186/s12911-017-0468-7</pub-id>
</citation>
</ref>
<ref id="B31">
<label>31.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jagannatha</surname>
<given-names>AN</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>H</given-names>
</name>
</person-group>. <article-title>Structured prediction models for RNN based sequence labeling in clinical text</article-title>. <source>Proc Conf Empir Methods Nat Lang Process</source> (<year>2016</year>) <volume>2016</volume>:<fpage>856</fpage>&#x2013;<lpage>65</lpage>. <pub-id pub-id-type="doi">10.18653/v1/d16-1082</pub-id>
</citation>
</ref>
<ref id="B32">
<label>32.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Habibi</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Weber</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Neves</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Wiegandt</surname>
<given-names>DL</given-names>
</name>
<name>
<surname>Leser</surname>
<given-names>U</given-names>
</name>
</person-group>. <article-title>Deep learning with word embeddings improves biomedical named entity recognition</article-title>. <source>Bioinformatics</source> (<year>2017</year>) <volume>33</volume>:<fpage>i37</fpage>&#x2013;<lpage>i48</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btx228</pub-id>
</citation>
</ref>
<ref id="B33">
<label>33.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wei</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Ji</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Du</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>J</given-names>
</name>
<etal/>
</person-group> <article-title>A study of deep learning approaches for medication and adverse drug event extraction from clinical text</article-title>. <source>J Am Med Inform Assoc</source> (<year>2020</year>) <volume>27</volume>:<fpage>13</fpage>&#x2013;<lpage>21</lpage>. <pub-id pub-id-type="doi">10.1093/jamia/ocz063</pub-id>
</citation>
</ref>
<ref id="B34">
<label>34.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nelson</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Kilbourne</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Powell</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Moore</surname>
<given-names>R</given-names>
</name>
</person-group>. <article-title>Normalized names for clinical drugs: RxNorm at 6 years</article-title>. <source>J Am Med Inform Assoc</source> (<year>2011</year>) <volume>18</volume>:<fpage>441</fpage>&#x2013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1136/amiajnl-2011-000116</pub-id>
</citation>
</ref>
<ref id="B35">
<label>35.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Freimuth</surname>
<given-names>RR</given-names>
</name>
<name>
<surname>Wix</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Siska</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Chute</surname>
<given-names>CG</given-names>
</name>
</person-group>. <article-title>Evaluation of RxNorm for medication clinical decision support</article-title>. <source>AMIA Annu Symp Proc</source> (<year>2014</year>) <volume>2014</volume>:<fpage>554</fpage>&#x2013;<lpage>63</lpage>.</citation>
</ref>
<ref id="B36">
<label>36.</label>
<citation citation-type="web">
<collab>RxNorm</collab>. <article-title>RxNorm files</article-title> (<year>2024</year>). <comment>Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html">https://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html</ext-link> (Accessed January 8, 2024)</comment>.</citation>
</ref>
<ref id="B37">
<label>37.</label>
<citation citation-type="web">
<collab>RxNorm</collab>. <article-title>RxNorm technical documentation</article-title> (<year>2024</year>). <comment>Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.nlm.nih.gov/research/umls/rxnorm/docs/techdoc.html">https://www.nlm.nih.gov/research/umls/rxnorm/docs/techdoc.html</ext-link> (Accessed January 8, 2024)</comment>.</citation>
</ref>
<ref id="B38">
<label>38.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Sayers</surname>
<given-names>E</given-names>
</name>
</person-group>. <article-title>The E-utilities in-depth: parameters, syntax and more</article-title>. In: <source>Entrez Programming Utilities Help</source>. <publisher-loc>Bethesda (MD)</publisher-loc>: <publisher-name>National Center for Biotechnology Information (US)</publisher-name> (<year>2009</year>). <comment>Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/books/NBK25499/">https://www.ncbi.nlm.nih.gov/books/NBK25499/</ext-link> (Accessed November 30, 2022)</comment>.</citation>
</ref>
<ref id="B39">
<label>39.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Honnibal</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>M</given-names>
</name>
</person-group>. <source>An improved non-monotonic transition system for dependency parsing</source>. <publisher-loc>Lisbon, Portugal</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name> (<year>2015</year>). p. <fpage>1373</fpage>&#x2013;<lpage>8</lpage>.</citation>
</ref>
<ref id="B40">
<label>40.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Xia</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Patterson</surname>
<given-names>TA</given-names>
</name>
<name>
<surname>Hong</surname>
<given-names>H</given-names>
</name>
</person-group>. <article-title>Fingerprinting interactions between proteins and ligands for facilitating machine learning in drug discovery</article-title>. <source>Biomolecules</source> (<year>2024</year>) <volume>14</volume>:<fpage>72</fpage>. <pub-id pub-id-type="doi">10.3390/biom14010072</pub-id>
</citation>
</ref>
<ref id="B41">
<label>41.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Khan</surname>
<given-names>MKH</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Ge</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>C</given-names>
</name>
<etal/>
</person-group> <article-title>Machine learning and deep learning approaches for enhanced prediction of hERG blockade: a comprehensive QSAR modeling study</article-title>. <source>Expert Opin Drug Metab Toxicol</source> (<year>2024</year>) <volume>20</volume>:<fpage>665</fpage>&#x2013;<lpage>84</lpage>. <pub-id pub-id-type="doi">10.1080/17425255.2024.2377593</pub-id>
</citation>
</ref>
<ref id="B42">
<label>42.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tang</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Hong</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>F</given-names>
</name>
</person-group>. <article-title>Computational nanotoxicology models for environmental risk assessment of engineered nanomaterials</article-title>. <source>Nanomaterials</source> (<year>2024</year>) <volume>14</volume>:<fpage>155</fpage>. <pub-id pub-id-type="doi">10.3390/nano14020155</pub-id>
</citation>
</ref>
<ref id="B43">
<label>43.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guo</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Hong</surname>
<given-names>H</given-names>
</name>
</person-group>. <article-title>Unlocking the potential of AI: machine learning and deep learning models for predicting carcinogenicity of chemicals</article-title>. <source>J Environ Sci Health C</source> (<year>2024</year>) <volume>43</volume>:<fpage>23</fpage>&#x2013;<lpage>50</lpage>. <pub-id pub-id-type="doi">10.1080/26896583.2024.2396731</pub-id>
</citation>
</ref>
<ref id="B44">
<label>44.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Song</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Hong</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Gong</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Deng</surname>
<given-names>H-W</given-names>
</name>
<etal/>
</person-group> <article-title>Deep learning methods for omics data imputation</article-title>. <source>Biology</source> (<year>2023</year>) <volume>12</volume>:<fpage>1313</fpage>. <pub-id pub-id-type="doi">10.3390/biology12101313</pub-id>
</citation>
</ref>
<ref id="B45">
<label>45.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khan</surname>
<given-names>MKH</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Patterson</surname>
<given-names>TA</given-names>
</name>
<etal/>
</person-group> <article-title>Machine learning and deep learning for brain tumor MRI image segmentation</article-title>. <source>Exp Biol Med (Maywood)</source> (<year>2023</year>) <volume>248</volume>:<fpage>1974</fpage>&#x2013;<lpage>92</lpage>. <pub-id pub-id-type="doi">10.1177/15353702231214259</pub-id>
</citation>
</ref>
<ref id="B46">
<label>46.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guo</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Song</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Khan</surname>
<given-names>MKH</given-names>
</name>
<etal/>
</person-group> <article-title>Review of machine learning and deep learning models for toxicity prediction</article-title>. <source>Exp Biol Med (Maywood)</source> (<year>2023</year>) <volume>248</volume>:<fpage>1952</fpage>&#x2013;<lpage>73</lpage>. <pub-id pub-id-type="doi">10.1177/15353702231209421</pub-id>
</citation>
</ref>
<ref id="B47">
<label>47.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Khan</surname>
<given-names>MKH</given-names>
</name>
<name>
<surname>Ge</surname>
<given-names>W</given-names>
</name>
<etal/>
</person-group> <article-title>Developing a SARS-CoV-2 main protease binding prediction random forest model for drug repurposing for COVID-19 treatment</article-title>. <source>Exp Biol Med (Maywood)</source> (<year>2023</year>) <volume>248</volume>:<fpage>1927</fpage>&#x2013;<lpage>36</lpage>. <pub-id pub-id-type="doi">10.1177/15353702231209413</pub-id>
</citation>
</ref>
<ref id="B48">
<label>48.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ji</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Wood</surname>
<given-names>EL</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Sakkiah</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>X</given-names>
</name>
<etal/>
</person-group> <article-title>Machine learning models for predicting cytotoxicity of nanomaterials</article-title>. <source>Chem Res Toxicol</source> (<year>2022</year>) <volume>35</volume>:<fpage>125</fpage>&#x2013;<lpage>39</lpage>. <pub-id pub-id-type="doi">10.1021/acs.chemrestox.1c00310</pub-id>
</citation>
</ref>
<ref id="B49">
<label>49.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Sakkiah</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ji</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Yavas</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Zou</surname>
<given-names>W</given-names>
</name>
<etal/>
</person-group> <article-title>Machine learning models for predicting liver toxicity</article-title>. <source>Methods Mol Biol</source> (<year>2022</year>) <volume>2425</volume>:<fpage>393</fpage>&#x2013;<lpage>415</lpage>. <pub-id pub-id-type="doi">10.1007/978-1-0716-1960-5_15</pub-id>
</citation>
</ref>
<ref id="B50">
<label>50.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Aungst</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Fitzpatrick</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Patterson</surname>
<given-names>TA</given-names>
</name>
<etal/>
</person-group> <article-title>Machine learning models for rat multigeneration reproductive toxicity prediction</article-title>. <source>Front Pharmacol</source> (<year>2022</year>) <volume>13</volume>:<fpage>1018226</fpage>. <pub-id pub-id-type="doi">10.3389/fphar.2022.1018226</pub-id>
</citation>
</ref>
<ref id="B51">
<label>51.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guo</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Das</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ge</surname>
<given-names>W</given-names>
</name>
<etal/>
</person-group> <article-title>Deep learning models for predicting gas adsorption capacity of nanomaterials</article-title>. <source>Nanomaterials</source> (<year>2022</year>) <volume>12</volume>:<fpage>3376</fpage>. <pub-id pub-id-type="doi">10.3390/nano12193376</pub-id>
</citation>
</ref>
<ref id="B52">
<label>52.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Idakwo</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Thangapandian</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Luttrell</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>Z</given-names>
</name>
<etal/>
</person-group> <article-title>Structure&#x2013;activity relationship-based chemical classification of highly imbalanced Tox21 datasets</article-title>. <source>J Cheminform</source> (<year>2020</year>) <volume>12</volume>:<fpage>66</fpage>. <pub-id pub-id-type="doi">10.1186/s13321-020-00468-x</pub-id>
</citation>
</ref>
<ref id="B53">
<label>53.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tan</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Hong</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Benfenati</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Giesy</surname>
<given-names>JP</given-names>
</name>
<name>
<surname>Gini</surname>
<given-names>GC</given-names>
</name>
<etal/>
</person-group> <article-title>Structures of endocrine-disrupting chemicals determine binding to and activation of the estrogen receptor &#x3b1; and androgen receptor</article-title>. <source>Environ Sci Technol</source> (<year>2020</year>) <volume>54</volume>:<fpage>11424</fpage>&#x2013;<lpage>33</lpage>. <pub-id pub-id-type="doi">10.1021/acs.est.0c02639</pub-id>
</citation>
</ref>
</ref-list>
</back>
</article>