Decoding Complexity: A Comprehensive Guide to Non-Target Analysis Data Interpretation with High-Resolution Mass Spectrometry

Lucas Price, Dec 02, 2025


Abstract

This article provides a comprehensive framework for interpreting complex datasets generated by high-resolution mass spectrometry (HRMS) in non-targeted analysis (NTA). Covering foundational principles, methodological workflows, optimization strategies, and validation protocols, we address critical challenges including uncertainty management, data processing techniques, machine learning integration, and quantitative interpretation. Designed for researchers and analytical professionals, this guide synthesizes current best practices to enhance confidence in chemical identification, support robust study design, and facilitate the transition of NTA from research tool to decision-support application in biomedical and environmental health contexts.

Fundamental Principles and Core Concepts of Non-Targeted Analysis

In the fields of environmental monitoring, food safety, and pharmaceutical development, the ability to comprehensively characterize complex chemical mixtures is paramount. Traditional targeted analytical methods have long been the gold standard for quantifying specific, predefined analytes. However, the expanding universe of chemical substances, including numerous emerging environmental contaminants (EECs) and non-intentionally added substances (NIAS), has revealed the limitations of targeted approaches [1] [2]. These challenges have propelled the adoption of non-targeted analysis (NTA), a powerful paradigm that enables the detection and identification of unknown or unexpected chemicals without prior knowledge of their presence [3] [4].

NTA, often used as a blanket term that encompasses both suspect screening and true non-targeted analysis, represents a fundamental shift in analytical strategy [3]. This in-depth technical guide delineates the core principles, methodological workflows, and key differentiators of NTA from traditional targeted approaches, providing researchers and drug development professionals with a framework for selecting and implementing appropriate analytical strategies for their specific applications.

Core Definitions and Conceptual Differentiation

Targeted Analysis: The Conventional Paradigm

Targeted analysis is a quantitative analytical method designed to detect and measure specific, predefined analytes with a high degree of confidence [5]. This approach relies on the availability of authentic chemical standards for each target compound to optimize detection parameters, establish retention times, and generate calibration curves for accurate quantification [3]. The performance of targeted methods is evaluated using well-established metrics including selectivity (ability to differentiate the target analyte from interferents), sensitivity (limit of detection and quantification), accuracy (closeness to the true value), and precision (reproducibility across measurements) [5]. Targeted methods are ideal for regulatory compliance monitoring, routine quantification of known contaminants, and any application where the chemical targets are well-defined and reference standards are available.

Non-Targeted Analysis: The Exploratory Paradigm

Non-targeted analysis (NTA), also referred to as non-target screening or untargeted screening, is broadly defined as "the characterization of the chemical composition of any given sample without the use of a priori knowledge regarding the sample's chemical content" [3]. Unlike targeted methods, NTA does not focus on specific predefined analytes but aims to comprehensively detect a wide range of chemicals present in a sample [4]. The resulting detections may be used to classify samples based on their entire chemical profile, and subsequent analyses may focus on identifying individual chemicals of interest [3].

A related approach, suspect screening analysis (SSA), occupies a middle ground between targeted and true non-targeted analysis. SSA involves the identification of chemicals by comparison to a predefined list or library containing known chemicals of interest, essentially narrowing the scope of the investigation to compounds of suspected relevance [3]. In practical usage, the term "NTA" is often applied as a blanket term that encompasses both suspect screening and true non-targeted analysis, particularly when workflows incorporate elements of both approaches [3].

Table 1: Fundamental Differences Between Targeted, Suspect Screening, and Non-Targeted Analysis

| Aspect | Targeted Analysis | Suspect Screening Analysis (SSA) | Non-Targeted Analysis (NTA) |
| --- | --- | --- | --- |
| Objective | Quantify specific, predefined analytes | Identify suspected chemicals from a predefined list | Comprehensively characterize sample chemical composition |
| Prior Knowledge Requirement | Complete (reference standards required) | Partial (suspect list required) | None |
| Scope of Analysis | Narrow (limited to target analytes) | Moderate (limited to suspect list) | Broad (theoretically unlimited) |
| Quantitative Capability | Fully quantitative | Semi-quantitative or qualitative | Primarily qualitative |
| Standard Dependence | Dependent on authentic standards | Not dependent, but standards improve confidence | Not dependent |
| Primary Application | Regulatory compliance, routine monitoring | Chemical forensics, hypothesis testing | Discovery, exploratory research, hazard identification |

The Analytical Spectrum: A Conceptual Workflow

The relationship between targeted, suspect screening, and non-targeted approaches can be visualized as a spectrum of analytical strategies with varying levels of prior knowledge requirements and chemical scope. The following diagram illustrates the conceptual workflow and relationship between these approaches:

Sample → Data Acquisition (HRMS)
Data Acquisition → Targeted Analysis (predefined targets) → Known Target Compounds
Data Acquisition → Suspect Screening (library matching) → Suspect List
Data Acquisition → Non-Targeted Analysis (comprehensive detection) → Unknown Compound Discovery

Methodological Workflows and Technical Implementation

Instrumentation and Analytical Platforms

The implementation of NTA relies heavily on high-resolution mass spectrometry (HRMS) platforms, which provide the mass accuracy and resolving power necessary to distinguish between thousands of chemical features in complex samples [1] [5]. Common HRMS instruments used in NTA include quadrupole time-of-flight (QTOF) and Orbitrap mass spectrometers, often coupled with separation techniques such as liquid chromatography (LC) or gas chromatography (GC) [6] [2]. The emergence of multidimensional separation techniques, including two-dimensional chromatography (LC×LC or GC×GC) and high-resolution ion mobility spectrometry (HRIMS), has further enhanced the peak capacity and separation power available for NTA, enabling more comprehensive analysis of complex mixtures [6].

The data acquisition modes commonly employed in NTA include data-dependent acquisition (DDA), which selects the most abundant ions for fragmentation, and data-independent acquisition (DIA), which fragments all ions within predefined mass windows [2]. Both approaches generate MS/MS spectral data that are crucial for compound identification, with each offering distinct advantages in coverage and reproducibility.

A comprehensive NTA study involves multiple interconnected steps, from initial study design to final data interpretation. The following diagram illustrates a generalized NTA workflow, highlighting key stages and decision points:

Study Design & Sample Preparation → Data Acquisition (LC/GC-HRMS) → Feature Detection & Peak Picking
Feature Detection → Suspect Screening → Compound Identification
Feature Detection → True NTA → Compound Identification
Compound Identification → Prioritization → Data Interpretation & Reporting

Critical Methodological Differentiators

Sample Preparation and Study Design

Targeted methods employ optimized sample preparation techniques specifically tailored to the physicochemical properties of the target analytes, aiming to maximize extraction efficiency and minimize matrix effects for those specific compounds [3]. In contrast, NTA utilizes generic sample preparation protocols designed to extract a broad range of chemicals with diverse properties, inevitably introducing biases toward certain compound classes while potentially missing others [3] [2]. The study design in NTA must intentionally incorporate quality assurance and quality control (QA/QC) approaches, including procedural blanks, quality control materials, and internal standards, to enable performance assessment after data acquisition and analysis is complete [3].

Data Processing and Compound Identification

Data processing in targeted analysis is relatively straightforward, focusing on quantifying specific precursor-product ion transitions for each target analyte [5]. NTA, however, generates complex, high-dimensional datasets that require sophisticated data processing pipelines for feature detection, peak alignment, and molecular formula assignment [1]. The identification process in NTA follows confidence levels based on the available evidence, ranging from level 1 (confirmed structure with authentic standard) to level 5 (exact mass of interest) [7] [3].

Compound identification in NTA relies on multiple lines of evidence, including:

  • Accurate mass measurements for elemental composition assignment
  • MS/MS fragmentation patterns for structural elucidation
  • Retention time and chromatographic behavior
  • Collision cross-section (CCS) values when ion mobility spectrometry is employed [6]
  • Isotopic patterns for element identification

The integration of computational tools and spectral libraries is essential for NTA data interpretation. Resources such as the NIST Mass Spectral Library, NORMAN Suspect List Exchange, and various open-source spectral databases support compound identification by providing reference spectra for comparison [7] [8].

Quantification Approaches

While targeted analysis provides absolute quantification using authentic standards and calibration curves, NTA typically offers semi-quantitative estimates based on assumed response factors or class-based calibration [4]. Recent advancements in quantitative non-targeted analysis (qNTA) aim to address this limitation by developing approaches for more accurate concentration estimation without reference standards for every compound [1] [4].
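As a minimal sketch of class-based semi-quantification, a response factor from a calibrated surrogate of the same chemical class can be applied to an unknown's peak area. The compound data and numbers below are illustrative assumptions, not real calibration values, and real qNTA workflows propagate far more uncertainty than this.

```python
# Sketch of semi-quantitative estimation via a surrogate response factor,
# as used in qNTA when no authentic standard exists for a detected compound.
# All peak areas and concentrations here are illustrative, not measured data.

def response_factor(peak_areas, concentrations):
    """Average response factor (area per unit concentration) from a calibrated surrogate."""
    return sum(a / c for a, c in zip(peak_areas, concentrations)) / len(peak_areas)

# Calibration data for a surrogate standard of the same chemical class (hypothetical)
surrogate_rf = response_factor(
    peak_areas=[1.2e5, 2.5e5, 5.1e5],
    concentrations=[10.0, 20.0, 40.0],  # ng/mL
)

# Semi-quantitative estimate for an unknown assumed to share the surrogate's response
unknown_area = 3.4e5
estimated_conc = unknown_area / surrogate_rf
print(f"Estimated concentration: {estimated_conc:.1f} ng/mL")
```

The key (and often dominant) assumption is that the unknown ionizes like its surrogate; reported qNTA uncertainties of an order of magnitude or more stem largely from this step.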

Performance Assessment and Quality Assurance

Performance Metrics for Targeted vs. Non-Targeted Analysis

The criteria for assessing method performance differ significantly between targeted and non-targeted approaches. The table below summarizes key performance metrics and their application in each paradigm:

Table 2: Performance Assessment in Targeted vs. Non-Targeted Analysis

| Performance Metric | Targeted Analysis Application | Non-Targeted Analysis Application |
| --- | --- | --- |
| Selectivity | Ability to distinguish target analyte from interferents using unique ion transitions | Chemical space coverage, isomeric resolution, specificity of identification |
| Sensitivity | Limit of detection (LOD) and quantification (LOQ) for specific analytes | Feature detection rate, minimum identifiable concentration across chemical space |
| Accuracy | Agreement with true value using certified reference materials | Structural identification correctness, database matching accuracy |
| Precision | Reproducibility of quantitative results across replicates | Consistency of feature detection and identification across replicates |
| Uncertainty | Well-defined confidence intervals for concentrations | Multiple sources: feature detection, compound identification, quantification estimation |

Quality Assurance in Non-Targeted Analysis

Quality assurance in NTA presents unique challenges due to the absence of reference standards for most detected compounds [5]. Best practices include:

  • Implementing comprehensive QA/QC protocols throughout the analytical workflow [3]
  • Using internal standards covering diverse chemical classes to monitor system performance
  • Incorporating blank samples to identify and filter contamination
  • Assessing replicate consistency to ensure detection reliability
  • Applying confidence levels for compound identification to communicate uncertainty [7] [5]
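Two of these checks, blank filtering and replicate consistency, can be sketched as a simple feature filter. The fold-over-blank and minimum-detection thresholds below are illustrative assumptions, not recommended values.

```python
# Minimal sketch of two common NTA QC filters:
# (1) blank subtraction: keep features whose mean sample intensity exceeds a
#     k-fold multiple of the procedural blank, and
# (2) replicate consistency: require detection in at least n replicate injections.
# Thresholds here are illustrative only.

def passes_qc(sample_intensities, blank_intensity, fold_over_blank=3.0, min_detects=2):
    detects = [i for i in sample_intensities if i > 0]
    if len(detects) < min_detects:
        return False  # not reproducibly detected across replicates
    mean_intensity = sum(detects) / len(detects)
    return mean_intensity > fold_over_blank * max(blank_intensity, 1.0)

# Feature detected in 3 of 3 replicates, well above the blank -> retained
print(passes_qc([5.0e4, 4.8e4, 5.3e4], blank_intensity=1.0e4))  # True
# Feature barely above the blank, seen in only 1 replicate -> filtered out
print(passes_qc([1.2e4, 0.0, 0.0], blank_intensity=1.0e4))      # False
```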

Initiatives such as the Best Practices for Non-Targeted Analysis (BP4NTA) have developed reporting frameworks and quality control metrics to improve the reliability and comparability of NTA results across different laboratories and studies [7] [3].

Advanced Applications and Future Directions

Prioritization Strategies for NTA Data Interpretation

The large number of features detected in NTA studies (often thousands per sample) creates a bottleneck in data interpretation, necessitating effective prioritization strategies to focus resources on the most relevant compounds [9]. Recent approaches include:

  • Effect-directed prioritization: Integrating biological response data with chemical analysis to focus on bioactive compounds [9]
  • Prediction-based prioritization: Using in silico tools to predict concentrations and toxicities for risk-based ranking [9]
  • Chemistry-driven prioritization: Applying chemical intelligence (mass defect filtering, homologous series detection) to identify compounds of interest [9]
  • Process-driven prioritization: Leveraging spatial, temporal, or technical processes to highlight relevant features [9]
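Chemistry-driven prioritization is concrete enough to sketch: Kendrick mass defect (KMD) analysis rescales masses so that members of a homologous series differing by a repeat unit (here CF2, as in PFAS screening) share the same mass defect. The m/z values below are illustrative, not measured data.

```python
# Sketch of chemistry-driven prioritization via Kendrick mass defect (KMD).
# Homologues differing by a repeat unit (CF2 here) share the same KMD,
# so grouping features by KMD highlights candidate homologous series.
# The feature m/z list below is illustrative, not measured data.

CF2 = 49.99681  # exact mass of the CF2 repeat unit (nominal mass 50)

def kendrick_mass_defect(mz, repeat_mass=CF2, repeat_nominal=50):
    kendrick_mass = mz * repeat_nominal / repeat_mass
    return round(kendrick_mass) - kendrick_mass

# Hypothetical feature list: three CF2 homologues plus one unrelated mass
features = [412.9664, 462.9632, 512.9600, 431.1234]
groups = {}
for mz in features:
    kmd = round(kendrick_mass_defect(mz), 3)  # bin KMD to 3 decimals
    groups.setdefault(kmd, []).append(mz)

# Any KMD bin with 2+ members is a candidate homologous series to prioritize
series = [mzs for mzs in groups.values() if len(mzs) >= 2]
print(series)
```

The three CF2-spaced masses fall into one KMD bin while the unrelated mass does not, which is exactly the filtering behavior used to surface homologous series for manual review.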

The US Environmental Protection Agency's INTERPRET NTA tool exemplifies efforts to streamline NTA data review by integrating chemical metadata, predicted spectra, and hazard information to support defensible prioritization of chemical candidates [8].

Machine Learning and Artificial Intelligence in NTA

Recent advancements in machine learning (ML) and artificial intelligence (AI) are addressing key challenges in NTA, including:

  • Structure identification: ML models are being developed to improve the accuracy of compound identification from MS/MS spectra [1]
  • Toxicity prediction: Approaches like MSFragTox leverage MS/MS fragmentation data to directly predict toxicity endpoints, bridging analytical data and hazard assessment [10]
  • Workflow optimization: Computational tools are enhancing various stages of the NTA workflow, from feature detection to compound annotation [1]

These developments are gradually transforming NTA from a purely exploratory tool toward a more robust approach capable of supporting chemical risk assessment and regulatory decision-making [1] [8].

Table 3: Key Research Reagent Solutions for Non-Targeted Analysis

| Resource Category | Examples | Function and Application |
| --- | --- | --- |
| Spectral Libraries | NIST Mass Spectral Library, MassBank, mzCloud | Reference spectra for compound identification via spectral matching |
| Suspect Lists | NORMAN Suspect List Exchange, EPA's CompTox Chemicals Dashboard | Predefined lists of potential contaminants for suspect screening |
| Data Processing Tools | MS-DIAL, XCMS, OpenMS | Feature detection, peak alignment, and data preprocessing |
| Quantitative Prediction | MS2Quant | Concentration prediction from MS/MS spectra without standards |
| Toxicity Prediction | MS2Tox, MSFragTox | Toxicity estimation from fragmentation patterns |
| Identification Tools | CSI:FingerID, SIRIUS, CFM-ID | In silico fragmentation and compound structure elucidation |
| Data Integration Platforms | INTERPRET NTA (EPA) | Tools for reviewing, interpreting, and reporting NTA data quality |

Non-targeted analysis represents a paradigm shift in analytical chemistry, moving from hypothesis-driven targeted approaches to discovery-oriented comprehensive characterization. While targeted methods remain essential for precise quantification of known analytes, NTA provides unparalleled capability for detecting unknown and unexpected compounds across diverse sample matrices [4] [2]. The key differentiators between these approaches extend beyond technical implementation to encompass fundamental differences in philosophy, application, and performance assessment.

The ongoing development of standardized practices, advanced computational tools, and harmonized reporting frameworks is addressing current limitations in NTA, particularly regarding compound identification confidence and quantitative capability [5]. As these advancements mature, the integration of targeted and non-targeted approaches within unified analytical workflows offers the most powerful strategy for comprehensive chemical characterization, combining the quantitative rigor of targeted methods with the expansive scope of non-targeted discovery [3] [9].

For researchers and drug development professionals, understanding these complementary analytical paradigms is crucial for selecting appropriate methodologies to address specific research questions, whether the goal is precise quantification of defined targets or exploratory investigation of complex chemical mixtures. As NTA continues to evolve, its integration with emerging technologies like machine learning and high-resolution ion mobility spectrometry promises to further enhance its capabilities, ultimately strengthening environmental monitoring, pharmaceutical development, and public health protection.

Essential HRMS Concepts for Effective NTA Data Interpretation

Non-targeted analysis (NTA) using high-resolution mass spectrometry (HRMS) represents a paradigm shift in analytical chemistry, moving from the detection of predefined analytes to the comprehensive investigation of all detectable chemical species in a sample [11]. This approach is particularly crucial for addressing emerging environmental contaminants (EECs) such as pharmaceuticals, pesticides, and industrial chemicals that pose significant challenges for detection and identification due to their structural diversity and lack of analytical standards [1]. Unlike traditional targeted methods that screen for a specific list of known compounds, NTA focuses on assigning structures or formulae to unknown signals in HRMS data, making it an indispensable tool for discovering novel contaminants, characterizing complex mixtures, and responding to unknown chemical releases [12] [13]. The versatility of HRMS-based NTA allows it to be applied to virtually any sample medium, including air, water, sediment, soil, food, consumer products, and biological specimens, providing researchers with a powerful capability for chemical discovery and exposure characterization [14].

Core HRMS Principles for NTA

Mass Resolution and Accuracy

The exceptional resolution and mass accuracy of HRMS instruments form the foundational principle enabling effective NTA. High mass resolution allows the instrument to distinguish between ions with subtle mass differences, which is critical for separating compounds in complex mixtures and reducing false positives from isobaric interferences [11]. Modern HRMS systems, including Time of Flight (TOF), Orbitrap, and Fourier Transform Ion Cyclotron Resonance (FT-ICR) instruments, achieve resolution powers ranging from tens of thousands to several million, enabling precise separation of ions with minute mass differences [11]. Mass accuracy, typically measured in parts per million (ppm), determines how closely the measured mass-to-charge ratio (m/z) aligns with the theoretical value. For NTA applications, mass accuracy within 3-5 ppm is generally required for confident molecular formula assignment, with higher accuracy significantly reducing the number of candidate formulas [13].
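As a minimal sketch of this mass-accuracy criterion (using caffeine's [M+H]+ ion as an illustrative example), the ppm error calculation and tolerance filter look like:

```python
# Sketch of a mass-accuracy check: ppm error between measured and theoretical
# m/z, plus the simple tolerance filter used when assigning molecular formulas.

def ppm_error(measured_mz, theoretical_mz):
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

# Caffeine [M+H]+: theoretical monoisotopic m/z 195.0877
theoretical = 195.0877
measured = 195.0881  # illustrative measured value

err = ppm_error(measured, theoretical)
print(f"Mass error: {err:.1f} ppm")
print(f"Within 5 ppm tolerance: {abs(err) <= 5.0}")
```

A measurement 0.0004 Da high at this mass corresponds to roughly 2 ppm, comfortably inside the 3-5 ppm window cited above for confident formula assignment.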

The combination of high resolution and accuracy allows for the determination of elemental compositions with high confidence, a capability that is fundamental for identifying unknown compounds when reference standards are unavailable [15]. This is particularly valuable for emerging contaminants like per- and poly-fluoroalkyl substances (PFAS), where nontarget HRMS methods have led to the discovery of more than 750 PFASs belonging to more than 130 diverse classes in environmental samples, biofluids, and commercial products [15].

Chromatographic Separation Considerations

Effective NTA requires the integration of high-performance chromatographic separation with HRMS detection to manage sample complexity and reduce ion suppression. Liquid chromatography (LC), particularly reversed-phase liquid chromatography (RPLC), coupled with HRMS has emerged as a prominent methodology for analyzing complex mixtures [16]. The chromatographic step separates compounds based on their chemical properties before they enter the mass spectrometer, reducing matrix effects and allowing for the detection of co-eluting isomers that would be indistinguishable by MS alone.

The retention time (RT) of a compound provides valuable supplementary information for identification. While not a direct HRMS parameter, RT can be predicted from chemical structure using quantitative structure-retention relationship (QSRR) models and used as an additional filter to increase identification confidence [16]. Recent advancements have focused on developing calibrant-free predicted retention time indices (RTIs) through machine learning models to enhance identification probability without the need for extensive calibration standards [16]. For high-quality data, laboratories should monitor critical chromatographic parameters, including resolution, peak shape, and retention time reproducibility across samples; in one reported QC scheme, more than 94% of compounds showed less than 20% relative standard deviation in peak height [13].
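The replicate-precision criterion above can be checked with a short script; the peak heights below are illustrative, not real QC data.

```python
# Sketch of a replicate-precision QC check: percent relative standard
# deviation (%RSD) of peak heights across QC injections, flagging compounds
# above a 20% threshold. All values are illustrative.

import statistics

def percent_rsd(values):
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical peak heights for two compounds across five QC injections
qc_heights = {
    "compound_A": [1.00e5, 1.05e5, 0.98e5, 1.02e5, 1.01e5],  # stable
    "compound_B": [1.00e5, 1.60e5, 0.70e5, 1.30e5, 0.90e5],  # variable
}

for name, heights in qc_heights.items():
    rsd = percent_rsd(heights)
    print(f"{name}: {rsd:.1f}% RSD -> {'pass' if rsd < 20 else 'flag'}")
```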

Tandem Mass Spectrometry (MS/MS) for Structural Elucidation

Tandem mass spectrometry (MS/MS or MSn) provides structural information critical for confident compound identification in NTA [1]. In MS/MS mode, precursor ions are isolated and fragmented through collisions with gas molecules, producing fragment ions that reveal structural characteristics of the original molecule [13]. The resulting fragmentation patterns serve as molecular fingerprints that can be matched against experimental or in silico reference spectra.

The acquisition of MS/MS data in NTA can follow either data-dependent acquisition (DDA) or data-independent acquisition (DIA) approaches. DDA selects the most abundant ions from the survey scan for fragmentation, providing clean spectra but potentially missing lower-abundance compounds. DIA fragments all ions within predefined mass windows, ensuring comprehensive coverage but producing more complex spectra that require advanced computational deconvolution [14]. For unknown identification, MS/MS spectra are searched against spectral libraries using similarity matching algorithms, though library coverage remains a challenge, with molecular networking and in silico fragmentation prediction serving as complementary strategies [13].
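The core of spectral library matching is a similarity score between a query and a reference spectrum, most commonly a cosine score over matched fragments. The sketch below pairs fragments by exact binned m/z only; production algorithms add m/z tolerance windows, intensity weighting, and noise filtering, and the spectra shown are hypothetical.

```python
# Minimal sketch of MS/MS spectral matching via cosine similarity.
# Spectra are dicts of {fragment m/z: intensity}; fragments are paired by
# binning m/z, then the cosine of the two intensity vectors is taken.

import math

def cosine_similarity(spec_a, spec_b, bin_width=0.01):
    def binned(spec):
        out = {}
        for mz, inten in spec.items():
            key = round(mz / bin_width)
            out[key] = out.get(key, 0.0) + inten
        return out

    a, b = binned(spec_a), binned(spec_b)
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical query vs. reference spectrum (fragment m/z: relative intensity)
query = {91.0542: 100.0, 119.0491: 45.0, 147.0441: 20.0}
reference = {91.0542: 95.0, 119.0491: 50.0, 65.0386: 10.0}
score = cosine_similarity(query, reference)
print(f"Cosine score: {score:.3f}")
```

Scores near 1.0 indicate closely matching fragmentation patterns; most library search tools report such a score (often rescaled to 0-1000) alongside the number of matched fragments.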

Table 1: Key HRMS Instrument Types and Their Characteristics in NTA

| Instrument Type | Typical Resolution | Mass Accuracy (ppm) | Strengths in NTA | Limitations |
| --- | --- | --- | --- | --- |
| Time of Flight (TOF) | 40,000-100,000 | 1-5 | Fast acquisition speed, wide dynamic range | Requires frequent calibration for high accuracy |
| Orbitrap | 100,000-500,000+ | 1-3 | High resolution and accuracy, stable mass calibration | Slower acquisition at highest resolutions |
| FT-ICR | 1,000,000+ | <1 | Ultra-high resolution, exceptional mass accuracy | High cost, complex operation, limited accessibility |

NTA Workflow and Data Interpretation

Comprehensive NTA Workflow

The end-to-end NTA workflow encompasses multiple stages from sample preparation to final reporting, with each stage requiring careful execution to ensure data quality and interpretability. The workflow can be visualized as a connected process of sequential stages:

Sample Preparation & QC → HRMS Data Acquisition → Feature Detection & Peak Picking → Compound Annotation & Identification → Prioritization & Confirmation → Data Interpretation & Reporting

The initial stage involves sample preparation and quality control, where implementing robust quality assurance/quality control (QA/QC) procedures is essential for generating trustworthy data [13]. This includes using non-targeted standard quality control (NTS/QC) mixtures containing compounds covering a wide range of physicochemical properties to monitor critical data quality parameters such as mass accuracy (typically within 3 ppm), isotopic ratio accuracy, and peak height reproducibility [13]. HRMS data acquisition follows, employing either data-dependent or data-independent approaches to comprehensively capture the chemical composition of samples.

Feature detection and peak picking algorithms then process the raw HRMS data to detect chromatographic peaks and extract relevant information including m/z values, retention times, and intensities [14]. The subsequent compound annotation and identification phase represents the core interpretive challenge, where multiple lines of evidence are combined to assign chemical structures to detected features [1]. Finally, prioritization and confirmation steps help focus efforts on the most relevant compounds, followed by comprehensive data interpretation and reporting to translate findings into actionable insights [17].

Data Processing and Feature Identification

Data processing in NTA converts raw instrument data into chemically meaningful information through a multi-step procedure. Feature detection algorithms identify chromatographic peaks and assemble related ions (isotopes, adducts, and fragments) into features representing unique chemical entities [14]. This step requires careful parameter optimization to balance sensitivity (detecting true features) and specificity (avoiding false positives from noise).
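A deliberately simplified sketch of this step, assuming a single extracted-ion trace, is a local-maximum search above a signal-to-noise threshold. Real peak pickers (e.g., in XCMS or MS-DIAL) additionally model peak shape and width; the trace and thresholds below are illustrative.

```python
# Toy sketch of feature detection on one extracted-ion chromatogram:
# a point is a peak if it is a local maximum and exceeds snr * noise_level.
# Real algorithms also fit peak shape/width; values here are illustrative.

def pick_peaks(intensities, noise_level=100.0, snr=3.0):
    peaks = []
    for i in range(1, len(intensities) - 1):
        y = intensities[i]
        if (y > intensities[i - 1] and y > intensities[i + 1]
                and y >= snr * noise_level):
            peaks.append(i)  # index along the retention-time axis
    return peaks

trace = [50, 80, 120, 600, 1500, 700, 130, 90, 310, 250, 60]
print(pick_peaks(trace))  # → [4, 8]
```

The sensitivity/specificity trade-off mentioned above lives in `noise_level` and `snr`: lowering them recovers smaller true peaks but admits more noise spikes as false features.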

Once features are detected, compound annotation begins with molecular formula assignment based on accurate mass measurements and isotopic patterns. The number of possible molecular formulas increases exponentially with mass and decreasing mass accuracy, highlighting the critical importance of high mass accuracy in HRMS instruments [11]. For example, a mass measurement of 300.1000 Da with 5 ppm accuracy could correspond to dozens of plausible molecular formulas, while the same mass with 1 ppm accuracy might yield only a few possibilities.
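The effect of mass accuracy on formula ambiguity can be illustrated by brute force: counting CHNO compositions whose monoisotopic mass falls within a ppm window of 300.1000 Da. The element ranges are arbitrary assumptions and no valence or isotope-pattern rules are applied, so the absolute counts are only illustrative of the trend.

```python
# Brute-force illustration of how mass accuracy constrains formula assignment:
# count CHNO formulas within a ppm window of 300.1000 Da. Element ranges are
# arbitrary and no valence/isotope rules are applied, so counts are illustrative.

MASSES = {"C": 12.0, "H": 1.0078250319, "N": 14.0030740052, "O": 15.9949146221}
TARGET = 300.1000

def count_candidates(tol_ppm):
    window = TARGET * tol_ppm / 1e6  # tolerance in Da
    count = 0
    for c in range(1, 26):
        for h in range(0, 51):
            for n in range(0, 7):
                for o in range(0, 11):
                    mass = (c * MASSES["C"] + h * MASSES["H"]
                            + n * MASSES["N"] + o * MASSES["O"])
                    if abs(mass - TARGET) <= window:
                        count += 1
    return count

c5 = count_candidates(5)
c1 = count_candidates(1)
print("Candidates at 5 ppm:", c5)
print("Candidates at 1 ppm:", c1)
```

Tightening the tolerance shrinks the candidate list monotonically, which is the practical payoff of sub-ppm instruments; adding ring-and-double-bond and isotope-pattern constraints shrinks it further.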

Structural elucidation leverages MS/MS fragmentation data through spectral matching against reference databases. The Universal Library Search Algorithm (ULSA) and similar tools compare experimental spectra to reference libraries, generating matching scores that indicate similarity [16]. However, library coverage remains incomplete, necessitating complementary approaches including molecular networking which groups compounds based on spectral similarity to identify structurally related compounds, and in silico fragmentation which predicts MS/MS spectra for candidate structures to expand identification capabilities beyond library coverage [13].

Confidence Assessment and Identification Levels

A critical aspect of NTA data interpretation is establishing confidence levels for compound identifications, as NTA data are inherently less certain than targeted data [14]. The scientific community has developed reporting standards that categorize identifications into different confidence levels based on the supporting evidence:

Table 2: Confidence Levels for Compound Identification in NTA

| Confidence Level | Required Evidence | Typical Uncertainty | Reporting Considerations |
| --- | --- | --- | --- |
| Level 1: Confirmed Structure | Match to reference standard using RT and MS/MS spectrum | Minimal | Considered definitive identification |
| Level 2: Probable Structure | Library spectrum match or in silico evidence | Moderate | Structure is plausible but not confirmed |
| Level 3: Tentative Candidate | Diagnostic evidence (e.g., class-specific fragments) | High | Class may be known but not exact structure |
| Level 4: Unknown Feature | Molecular formula or m/z only | Very High | Insufficient evidence for structural assignment |

This framework acknowledges that NTA results are inherently uncertain: a chemical reported as present may actually be absent (the detection may correspond to an isomer or an incorrect identification), and a chemical reported as absent may be present but missed during data processing [14]. This uncertainty necessitates careful reporting and interpretation of NTA results, with clear communication of confidence levels to stakeholders.
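The evidence-to-level mapping in Table 2 can be sketched as a simple decision function; the boolean field names are illustrative, not a standardized schema.

```python
# Sketch of mapping available evidence to the four identification confidence
# levels in Table 2 (a simplified reading of Schymanski-style schemes;
# argument names are illustrative).

def confidence_level(standard_match, library_match, diagnostic_fragments):
    if standard_match:
        return 1  # confirmed structure (RT + MS/MS match to authentic standard)
    if library_match:
        return 2  # probable structure (library or in silico evidence)
    if diagnostic_fragments:
        return 3  # tentative candidate (class-level diagnostic evidence)
    return 4      # unknown feature (molecular formula or m/z only)

# A library hit without standard confirmation lands at Level 2
print(confidence_level(False, True, True))  # → 2
```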

Advanced Applications of Machine Learning in NTA

Machine Learning-Enhanced Identification

Machine learning (ML) approaches are revolutionizing NTA by enhancing identification confidence and streamlining data interpretation workflows. ML models can leverage multiple data dimensions including MS/MS spectra, retention time information, and molecular descriptors to improve the discrimination between true positive and false positive identifications [1] [16]. One innovative approach introduces class probability of true positives (P(TP)) as a metric that leverages data from MS/MS spectra and calibrant-free predicted retention time indices through multiple ML models to enhance identification probability [16].

A demonstrated implementation involves three sequential ML models: first, a molecular fingerprint-based regression model that correlates molecular fingerprints to retention time indices; second, a cumulative neutral loss-based regression model that predicts expected RTI values using experimental MS/MS spectra; and finally, a binary classification model that integrates information from both retention and m/z domains to calculate P(TP) for each matched reference spectrum [16]. This approach has shown significant improvements in identification probability, with reported increases of 54.5%, 52.1%, and 46.7% for pesticides spiked in blank, 10× diluted, and 100× diluted tea matrices, respectively, compared to library matching alone [16].

ML Workflow for Enhanced Identification

The application of machine learning in NTA follows a structured workflow that integrates traditional analytical data with computational predictions:

Input Data (MS/MS spectra, RT, m/z) → Model 1: MF-to-RTI Regression
Input Data → Model 2: CNL-to-RTI Regression
Models 1 & 2 → Model 3: Binary Classification → P(TP) Calculation → Identification Probability Enhancement

The workflow begins with input data collection including MS/MS spectra, retention times, and m/z values. The first model (MF-to-RTI) uses random forest regression to correlate molecular fingerprints to true RTI values of calibrants, trained on diverse chemical structures to ensure coverage of structural diversity [16]. The second model (CNL-to-RTI) employs cumulative neutral loss masses as features to predict expected RTI values using experimental MS/MS spectra from known compounds, leveraging the discriminative power of fragmentation patterns [16].

The third model (binary classification) integrates features from both RTI and m/z domains, including RTI error between values derived from the first two models, monoisotopic mass, and parameters obtained from spectral matching algorithms [16]. This model calculates the probability of true positive (P(TP)) for each matched reference spectrum, with larger RTI errors indicating true negative spectral matches while smaller errors correlate with true positive matches [16]. The final output provides an enhanced identification probability that incorporates multiple dimensions of evidence, significantly improving upon traditional spectral matching alone.
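The published approach trains ML models on real data; as a purely illustrative toy, the integration step can be caricatured as a logistic combination of RTI error and spectral match score. The weights below are made up and carry no empirical meaning.

```python
# Toy illustration of combining retention and spectral evidence into a single
# probability-like score, in the spirit of the P(TP) metric described above.
# The real method trains classification models; these logistic weights are
# invented for illustration only.

import math

def p_tp(rti_error, match_score, w_err=-0.05, w_score=6.0, bias=-2.0):
    """Logistic score: small RTI error plus a high spectral match -> closer to 1."""
    z = bias + w_err * abs(rti_error) + w_score * match_score
    return 1.0 / (1.0 + math.exp(-z))

# Candidate with a small RTI error and strong spectral match
print(f"{p_tp(rti_error=10.0, match_score=0.9):.2f}")
# Candidate with a large RTI error and weak match
print(f"{p_tp(rti_error=120.0, match_score=0.5):.2f}")
```

The qualitative behavior mirrors the text: large RTI errors push a candidate toward a true-negative call even when its spectral match alone looks plausible.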

Performance Assessment and Quality Assurance

NTA Performance Metrics

Assessing and communicating the performance of NTA methods presents unique challenges compared to targeted analyses. While targeted methods rely on well-established performance metrics for selectivity, sensitivity, accuracy, and precision, defining analogous metrics for NTA requires consideration of different study objectives [14]. Performance assessment in NTA typically focuses on three primary objectives: sample classification (distinguishing sample groups based on chemical patterns), chemical identification (confidently assigning structures to detected features), and chemical quantitation (estimating concentrations without reference standards) [14].

For chemical identification, performance can be evaluated using metrics derived from the confusion matrix, including recall (ability to correctly identify present compounds), precision (ability to avoid false identifications), and overall accuracy [14]. However, these metrics require knowledge of ground truth, which is often unavailable in true non-targeted applications. Alternative approaches include using identification confidence levels and reporting the distribution of identifications across these levels, or employing benchmark compounds with known presence/absence to characterize method performance [14].
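As a minimal illustration, the confusion-matrix metrics can be computed directly from benchmark results; the counts below are hypothetical:

```python
def identification_metrics(tp, fp, fn, tn):
    """Recall, precision, and accuracy from a benchmark confusion matrix."""
    recall = tp / (tp + fn)        # fraction of present compounds found
    precision = tp / (tp + fp)     # fraction of reported IDs that are correct
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return recall, precision, accuracy

# Hypothetical benchmark: 40 spiked compounds, 32 correctly identified,
# 8 missed, and 4 false identifications among 60 absent compounds.
r, p, a = identification_metrics(tp=32, fp=4, fn=8, tn=56)
```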

For quantitative NTA (qNTA), performance assessment becomes even more challenging due to the lack of reference standards for most compounds. Performance can be estimated using a set of chemical standards that represent different chemical classes, with metrics including accuracy (closeness to true concentration), precision (reproducibility across replicates), and linear dynamic range [14]. However, these metrics are necessarily limited to the available standards and may not represent performance for all detected compounds.

Quality Assurance and Control Procedures

Implementing robust quality assurance and control (QA/QC) procedures is essential for generating reliable NTA data. Recommended practices include:

  • System Suitability Testing: Using standardized quality control mixtures containing compounds covering a wide range of physicochemical properties to verify instrument performance before sample analysis [13].
  • Blank Analysis: Regularly analyzing procedural blanks to identify and subtract background contamination and carryover effects.
  • Pooled Quality Control Samples: Injecting pooled samples throughout the analytical sequence to monitor instrument stability and performance drift over time.
  • Reference Standard Controls: Including known reference compounds at various concentrations to validate identification and quantification capabilities.

Data quality parameters should be continuously monitored, including mass accuracy (typically within 3-5 ppm), retention time stability, intensity reproducibility, and chromatographic peak shape [13]. Any deviations beyond predefined thresholds should trigger investigation and potentially re-analysis of affected samples. These QA/QC measures help ensure that the complex data generated in NTA studies is trustworthy and fit for its intended purpose, whether for exploratory research or decision-support applications.
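Mass-accuracy monitoring reduces to a simple ppm calculation. The sketch below checks a measured m/z against the 5 ppm end of the stated tolerance, using the caffeine [M+H]+ ion purely as an example:

```python
def ppm_error(measured_mz, theoretical_mz):
    # Relative mass error in parts per million.
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

# Caffeine [M+H]+ has a theoretical m/z of 195.0877.
err = ppm_error(195.0882, 195.0877)
within_tolerance = abs(err) <= 5.0   # trigger investigation if this is False
```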

The Scientist's Toolkit: Essential Research Reagents and Materials

Table 3: Essential Research Reagents and Materials for HRMS-Based NTA

| Reagent/Material | Function in NTA Workflow | Key Considerations |
| --- | --- | --- |
| NTS/QC Mixture | Quality control for data quality assessment | Should contain 80+ compounds covering diverse physicochemical properties (molecular weights 126–1100 Da, log Kow -8 to 8.5) [13] |
| HRMS Instrument | High-resolution mass measurement | Orbitrap, TOF, or FT-ICR systems with resolution >50,000 and mass accuracy <5 ppm [11] |
| Chromatography System | Compound separation before MS detection | UHPLC systems with C18 columns most common; method should resolve early eluting polar analytes [13] |
| Spectral Libraries | Reference for MS/MS spectrum matching | Combine commercial, public, and in-house libraries; recognize limitations in coverage [16] |
| Molecular Networking Tools | Grouping related compounds by spectral similarity | Identifies molecular families when reference spectra are unavailable [13] |
| Retention Time Prediction Models | Providing additional evidence for compound identification | Machine learning models trained on diverse chemical sets improve transferability [16] |
| In Silico Fragmentation Tools | Predicting MS/MS spectra for candidate structures | Expands identification beyond library coverage; domain of applicability is crucial [13] |

Effective interpretation of NTA data requires a solid understanding of core HRMS concepts including mass resolution, accuracy, chromatographic separation principles, and fragmentation pattern analysis. The integration of machine learning approaches with traditional analytical techniques significantly enhances identification confidence by leveraging multiple dimensions of chemical information. As the field continues to evolve, standardized performance assessment methods and robust quality control procedures will be essential for generating reliable, reproducible data that can support environmental monitoring, public health protection, and regulatory decision-making. By mastering these essential HRMS concepts and maintaining critical assessment of data quality and uncertainty, researchers can fully leverage the powerful capabilities of non-targeted analysis to discover and characterize novel chemicals in complex samples.

Understanding the NTA Chemical Space and Detectable Coverage

Non-targeted analysis (NTA) using high-resolution mass spectrometry (HRMS) represents a paradigm shift in analytical chemistry, enabling the detection and identification of unknown chemicals without a priori knowledge of sample composition. This technical guide explores the fundamental concept of "chemical space" in NTA, defining the theoretical and practical boundaries of what is detectable and identifiable within a given analytical workflow. We examine the key methodological parameters that define the "detectable space," present standardized workflows for chemical space mapping, and discuss advanced computational tools that enhance NTA capabilities. Understanding and communicating the coverage and limitations of NTA methods is critical for advancing environmental monitoring, exposomics research, and drug development applications, ultimately supporting more reliable and reproducible chemical exposure assessments.

The concept of "chemical space" in non-targeted analysis refers to the multidimensional domain of chemical properties that characterizes the constituents of a sample [18]. In principle, NTA can detect and identify a broad range of organic compounds across diverse matrices, but in practice, no single method can cover the entirety of the chemical universe, which encompasses >10⁶⁰ possible organic compounds [18]. Instead, each NTA method accesses specific domains of chemical space through a combination of sample preparation, instrumental analysis, and data processing techniques [18] [19].

The fundamental challenge in NTA lies in determining whether the non-detection of an analyte indicates its true absence above a detection limit or represents a false negative resulting from workflow limitations [18]. To address this, researchers have proposed mapping the "detectable space" of NTA methods—the region of chemical space where compounds are amenable to detection and identification given specific methodological constraints [18] [19]. This conceptual framework allows for better communication of method capabilities, more accurate interpretation of results, and direct comparison between different NTA workflows [18] [20].

Defining the Detectable and Identifiable Chemical Space

Theoretical Framework

In NTA methodology, chemical space is partitioned into three distinct regions [18]:

  • The detectable space: Compounds amenable to detection using applied methods for sampling, preparation, and data acquisition
  • The identifiable space: Compounds within the detectable space that can be confidently identified based on available spectral evidence and database matching
  • The non-detectable space: Compounds not detectable or identifiable using the selected methods

The relationship between these spaces is hierarchical; the identifiable space is necessarily a subset of the detectable space, as identification requires additional confirmatory data beyond mere detection [18].
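This hierarchy can be expressed as a simple set relationship; the compound names below are purely illustrative:

```python
# Toy partition of a small candidate set into the three regions.
universe = {"atrazine", "PFOA", "caffeine", "n-hexane", "squalene"}
detectable = {"atrazine", "PFOA", "caffeine"}   # amenable to the chosen method
identifiable = {"atrazine", "caffeine"}         # detectable AND library-matched
non_detectable = universe - detectable          # everything the method misses
```

Any compound that fails the subset relation `identifiable <= detectable` would indicate an inconsistency in the workflow definition, since identification presupposes detection.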

Key Parameters Defining Detectable Space

Eight fundamental analytical parameters predominantly influence the region of chemical space accessible by an NTA method [18] [19]:

Table 1: Key Parameters Defining the Detectable Space in NTA

| Parameter Category | Specific Factors | Impact on Chemical Space |
| --- | --- | --- |
| Sample Preparation | Sample matrix type, extraction solvent, extract pH, extraction/cleanup media, elution buffers | Determines which compounds are extracted from the matrix and prepared for analysis |
| Instrument Platform | LC-MS, GC-MS, ion mobility | Defines the physicochemical properties of amenable compounds (e.g., polarity, volatility) |
| Ionization Technique | Ionization type (ESI, APCI, EI), ionization mode (positive/negative) | Influences which compounds can be effectively ionized for detection |
| Separation | Chromatographic method, retention time | Separates complex mixtures into individual components |

These parameters work in concert to define the "method applicability domain" [18]. For example, LC-MS with electrospray ionization (ESI) is more amenable to polar, water-soluble compounds, while GC-MS with electron ionization (EI) better detects volatile, non-polar compounds [19]. The selection of extraction solvents and pH further narrows the chemical space by determining which compounds are efficiently recovered from specific sample matrices [18].

Experimental Workflows for Chemical Space Characterization

Comprehensive NTA Workflow

A generalized NTA workflow encompasses multiple stages from sample preparation to compound identification, each step influencing the final detectable space [21]:

[Diagram: generalized NTA workflow. Sample Collection & Preparation → HRMS Data Acquisition → Feature Detection & Grouping → Data Refinement & Prioritization → Compound Annotation & Identification.]

This workflow illustrates the comprehensive process for NTA, where each stage progressively refines the detectable chemical space [21]. Sample preparation methods (e.g., extraction solvents, pH adjustment, clean-up media) determine initial compound recovery [18]. Data acquisition parameters (chromatography, ionization, mass analysis) further constrain detectable compounds based on their physicochemical properties [18] [19]. Finally, data processing approaches (feature detection, annotation algorithms) influence which detected features are ultimately identified [21].

Chemical Space Mapping Methodology

The proposed Chemical Space Tool (ChemSpaceTool) would implement a systematic approach to define the detectable space of any NTA method through sequential filtering [18]:

[Diagram: sequential filtering of the full chemical universe (>10⁶⁰ compounds) through sample matrix, extraction solvent, extraction pH, extraction/cleanup media, elution buffer, instrument platform, ionization type, and ionization mode filters, yielding the Amenable Compound List (ACL).]

This conceptual framework parses down the vast chemical universe into an Amenable Compound List (ACL) specific to a given methodology [18]. Each filtering step corresponds to a key methodological decision that excludes compounds outside the operable parameters. The resulting ACL represents the plausible detectable space and can be used both prospectively (to guide method selection) and retrospectively (to filter identification candidates) [18].
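A minimal sketch of such sequential filtering is shown below. The property thresholds, property values, and three-compound candidate list are invented stand-ins for a real chemical database and method description:

```python
# Each predicate mimics one methodological filter in the sequence above;
# all thresholds are illustrative only.
candidates = [
    {"name": "PFOA",     "log_kow": 4.8,  "mass": 414.1, "esi_neg": True},
    {"name": "caffeine", "log_kow": -0.1, "mass": 194.2, "esi_neg": False},
    {"name": "squalene", "log_kow": 10.7, "mass": 410.7, "esi_neg": False},
]

filters = [
    lambda c: -2.0 <= c["log_kow"] <= 8.0,   # extraction-recovery window
    lambda c: 100.0 <= c["mass"] <= 1200.0,  # acquisition scan-range filter
    lambda c: c["esi_neg"],                  # ionization-mode filter (ESI-)
]

acl = candidates
for keep in filters:
    acl = [c for c in acl if keep(c)]
# `acl` now holds the Amenable Compound List for this hypothetical method
```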

Computational Tools for Chemical Space Analysis

Software Platforms for NTA

Various software tools have been developed to implement NTA workflows, each with distinct capabilities and limitations:

Table 2: Software Tools for Non-Targeted Analysis

| Software Tool | Type | Key Features | Applications |
| --- | --- | --- | --- |
| patRoon | Open-source (R) | Comprehensive workflow, environmental focus, combines multiple algorithms | Environmental monitoring, suspect screening [21] |
| Compound Discoverer | Commercial | Vendor-specific optimizations, user-friendly interface | General NTA, metabolomics [19] |
| MetFrag | Open-source | In silico fragmentation, compound identification | Structure elucidation [21] |
| SIRIUS/CSI:FingerID | Open-source | Molecular formula identification, structure database searching | Unknown compound identification [21] |
| XCMS | Open-source | Feature detection, peak alignment, statistical analysis | Metabolomics, exposomics [21] |
| MZmine | Open-source | Modular pipeline, visualization, vendor-neutral | General NTA, metabolomics [21] |

These tools help researchers navigate the complex data generated in NTA studies. Open-source platforms like patRoon provide tailored functionality for environmental applications and allow integration of various algorithms [21]. Commercial software often offers more user-friendly interfaces and vendor-specific optimizations but may limit data transparency and sharing [19].

Machine Learning in NTA

Machine learning (ML) approaches are increasingly applied throughout NTA workflows to enhance chemical space characterization [1]. ML algorithms improve compound identification through better prediction of retention times, collision cross-section values, and mass fragmentation patterns [1] [22]. These models can also prioritize features for identification based on likelihood of detection and potential toxicological concern [1]. Furthermore, ML techniques enable quantitative structure-property relationship modeling to predict a compound's amenability to specific analytical methods based on its physicochemical properties [1].

Research Reagent Solutions for NTA Workflows

Table 3: Essential Materials and Tools for NTA Experiments

| Category | Specific Examples | Function in NTA Workflow |
| --- | --- | --- |
| Extraction Media | HLB (hydrophilic-lipophilic balance) sorbents, C18 silica, ion-exchange resins | Isolate and concentrate analytes from complex matrices [18] |
| Chromatography Columns | Reverse-phase C18, HILIC, GC capillary columns | Separate complex mixtures into individual components [18] |
| Ionization Sources | Electrospray ionization (ESI), atmospheric pressure chemical ionization (APCI), electron ionization (EI) | Generate gas-phase ions for mass analysis [19] |
| Mass Analyzers | Orbitrap, time-of-flight (TOF), quadrupole | Provide high-resolution mass measurements for molecular formula assignment [22] [19] |
| Data Processing Tools | patRoon, XCMS, MZmine, Compound Discoverer | Extract features, align chromatograms, and annotate compounds [21] [19] |
| Chemical Databases | PubChem, CompTox, NIST, mzCloud | Support compound identification through spectral matching [21] |

Applications and Analytical Platforms in NTA

The detectable chemical space varies significantly across different analytical platforms and application areas. A review of NTA literature revealed that 51% of studies use only LC-HRMS, 32% use only GC-HRMS, and 16% use both platforms to expand chemical coverage [19]. This distribution reflects the complementary nature of these techniques, with LC-HRMS better suited for polar, thermally labile compounds and GC-HRMS more appropriate for volatile, non-polar analytes [19].

In environmental applications, the frequently detected chemical classes reflect both environmental prevalence and methodological biases [19]. Per- and polyfluoroalkyl substances (PFAS) and pharmaceuticals predominate in water samples due to their polarity and ionization efficiency in LC-ESI-MS [19]. Pesticides and polyaromatic hydrocarbons are more common in soil and sediment analyses, while flame retardants and plasticizers feature prominently in dust and consumer product studies [19]. In human biospecimens, plasticizers, pesticides, and halogenated compounds are frequently detected, reflecting exposure patterns and analytical considerations [19].

Characterizing the chemical space and detectable coverage of NTA methods is fundamental to producing reliable, interpretable, and comparable results across studies. The proposed frameworks for chemical space mapping, including the ChemSpaceTool concept, provide systematic approaches to define method boundaries and communicate capabilities [18]. As NTA continues to evolve, integration of machine learning approaches [1], development of open-source computational platforms [21], and community-wide standardization efforts [20] will enhance our ability to navigate the chemical exposome. Understanding the detectable space of NTA methods enables researchers to make informed decisions about method selection, appropriately interpret negative findings, and advance toward more comprehensive chemical exposure assessment.

Non-targeted analysis (NTA) using high-resolution mass spectrometry (HRMS) represents a paradigm shift from traditional targeted methods, enabling researchers to characterize the chemical composition of complex samples without a priori knowledge of their content [19] [20]. This discovery-based approach generates immensely complex datasets that require sophisticated data processing and interpretation strategies. The fundamental challenge lies in accurately reducing raw instrumental data into meaningful chemical information, a process that hinges on precise terminology and standardized workflows [23] [20].

Unlike targeted analysis, which focuses on specific predefined chemicals, NTA aims to capture a broader "chemical space" – the conceptual collection of all possible chemicals within a sample, limited only by methodological choices [19] [3]. Within this framework, three critical concepts form the foundation of data interpretation: features (the raw observations from instrumentation), annotations (the attribution of chemical characteristics to these features), and identifications (the confident assignment of a specific chemical structure) [23].

This technical guide establishes precise definitions, methodologies, and confidence assessment protocols for these core terminologies, providing researchers with a standardized framework for NTA data interpretation within drug development and related chemical research fields.

Defining the Fundamental Terminology

The Building Block: Molecular Features

In NTA, a feature represents the primary signal entity detected during data processing. Specifically, a feature is defined as a set of grouped, associated m/z-retention time pairs (mz@RT) that represent MS1 components for an individual compound, which may include isotopologues, adducts, and in-source product ion m/z peaks [23] [20]. Where no such associations exist, a feature may simply be a single mz@RT pair. This grouping is crucial as it distinguishes signals arising from a single chemical entity from the thousands of individual data points collected by the HRMS instrument [23]. The process of feature formation involves several computational steps that transform raw mass spectral data into these defined chemical signals, which subsequently serve as the substrates for all further annotation and identification efforts.
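One way to represent such a grouped feature in code is a small container for the principal mz@RT pair plus its associated isotopologue and adduct signals. This is a sketch, not the data model of any particular tool, and the values are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class Feature:
    """A grouped set of mz@RT signals attributed to one compound (sketch)."""
    rt: float                     # retention time, min
    monoisotopic_mz: float        # principal MS1 peak
    isotopologues: list = field(default_factory=list)  # (m/z, rel. intensity)
    adducts: dict = field(default_factory=dict)        # label -> m/z

# Hypothetical feature; where no grouping exists, the feature would carry
# only the single mz@RT pair.
f = Feature(rt=6.42, monoisotopic_mz=195.0877)
f.isotopologues.append((196.0910, 0.08))   # 13C isotopologue, ~8% relative
f.adducts["[M+Na]+"] = 217.0696
```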

The Interpretive Step: Annotations

Annotation represents the next critical step in NTA data interpretation, defined as the attribution of one or more properties or molecular characteristics to an MS1 feature or its components (such as isotopologues or adducts), or to MS/MS product ions [23]. It is essential to recognize that annotations provide evidence but do not typically constitute sufficient proof to confidently identify a single compound. Examples of common annotations include: designation of an observed mz@RT as a specific adduct (e.g., [M+H]+, [M+Na]+), assignment of a molecular formula to a feature or an MS/MS product ion, and assignment of a suggested substructure to an MS/MS product ion [23]. Annotations thus represent hypothetical assignments that require further evidence before progressing to confident identification.

The Confirmatory Stage: Identifications

Identification constitutes the highest level of confidence in NTA workflows, occurring when the annotated components, features, and/or product ions collectively provide sufficient evidence to attribute a specific compound to the detected feature, within a stated identification scope or confidence level [23]. The key distinction from annotation lies in the evidentiary standard – identification requires multiple lines of concordant evidence that collectively point to a single chemical structure with high confidence. This evidence hierarchy typically includes retention time matching with authentic standards, spectral library matching, and consistent fragmentation patterns, among other confirmatory data [23] [14].

Table 1: Comparative Definitions of Core NTA Terminology

| Term | Formal Definition | Key Characteristics | Examples |
| --- | --- | --- | --- |
| Feature | A set of grouped mz@RT pairs representing MS1 components for an individual compound [23] [20] | Represents raw instrumental observations; grouped signals from the same compound; foundation for further analysis | Set of m/z peaks for isotopologues; group of adduct signals; single mz@RT where no grouping exists |
| Annotation | Attribution of properties/molecular characteristics to features or their components [23] | Provides evidence but not conclusive; represents hypothetical assignment; multiple annotations possible per feature | Adduct designation ([M+H]+); molecular formula assignment; substructure assignment from MS/MS |
| Identification | Confident attribution of a specific compound to a detected feature [23] | Requires sufficient evidence; states confidence level/scope; represents conclusive assignment | Match to authentic standard (RT, MS/MS); Level 1 confidence identification; definitive structure elucidation |

Experimental Workflows and Data Processing

From Raw Data to Features: Processing Steps

The transformation of raw HRMS data into meaningful features follows a multi-step computational workflow that intentionally reduces data complexity while preserving chemically relevant information. This data processing segment encompasses all steps that transform raw data into meaningful information prior to annotation and identification efforts, with inputs being raw or converted data files and outputs being lists of features in each sample with associated chromatography, MS, and MS/MS data [23]. As detailed in Table 2, these steps include both fundamental signal processing and advanced grouping algorithms that collectively enable the detection and definition of molecular features.

Table 2: Key Data Processing Steps in NTA Workflows

| Processing Step | Description | Purpose | Common Algorithms/Software |
| --- | --- | --- | --- |
| Initial m/z Detection | Selection of unique mz@RT pairs from raw data | Identify potential signals of interest | Vendor software, MZmine, MS-DIAL |
| Retention Time Alignment | Modifies RTs within a dataset based on representative compounds | Correct for chromatographic shift between runs | Cross-sample correlation algorithms |
| Isotopologue Grouping | Groups mz@RT pairs representing isotopologues of same compound | Reduce redundancy; link related signals | Isotope pattern recognition |
| Adduct Grouping | Groups mz@RT pairs representing adducts of same compound | Consolidate signals from same chemical entity | Adduct rule application |
| Between-Sample Alignment | Comparison of features across multiple samples | Identify same feature in different samples | MZ/RT window matching |
| Gap-Filling | Detection of features missed during initial selection | Improve comprehensiveness of feature detection | Recursive peak extraction |
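Between-sample alignment by m/z and RT window matching can be sketched as a pairwise comparison; the tolerances below are illustrative defaults rather than recommended settings:

```python
def match_features(a, b, mz_tol_ppm=5.0, rt_tol=0.2):
    """Match (m/z, RT) features between two samples within ppm and RT windows."""
    matches = []
    for i, (mz1, rt1) in enumerate(a):
        for j, (mz2, rt2) in enumerate(b):
            ppm = abs(mz1 - mz2) / mz1 * 1e6
            if ppm <= mz_tol_ppm and abs(rt1 - rt2) <= rt_tol:
                matches.append((i, j))   # same feature seen in both samples
    return matches

sample_a = [(195.0877, 6.42), (301.1410, 9.80)]
sample_b = [(195.0879, 6.50), (450.2000, 3.10)]
pairs = match_features(sample_a, sample_b)
```

Production tools replace this quadratic scan with indexed lookups and add intensity correlation, but the windowed-tolerance logic is the core of the step.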

Workflow Visualization: From Raw Data to Confident Identification

The following diagram illustrates the complete NTA workflow, highlighting the critical pathway from raw data acquisition through feature detection, annotation, and finally to confident identification, with key decision points and confidence assessment stages.

[Diagram: complete NTA workflow. Raw HRMS data pass through format conversion, peak picking & alignment, and feature grouping to yield a feature table; detected features receive MS1 annotations (formula, adduct) and MS/MS annotations (fragmentation); identification proceeds via spectral library matching, retention time matching, and authentic standard comparison, feeding a final confidence assessment.]

Quality Assurance and Method Validation

Robust quality assurance procedures are essential throughout the NTA workflow to ensure reliable results. Unlike targeted analyses with well-established validation frameworks, NTA requires specialized approaches to performance assessment [14]. Key considerations include: implementing blank samples to identify and subtract contamination; using quality control spikes to monitor instrument performance and feature detection rates; employing pooled quality control samples to assess reproducibility; and utilizing internal standards to evaluate extraction efficiency and matrix effects [23] [14]. For performance assessment, promising approaches include using the confusion matrix for qualitative study outputs (sample classification and chemical identification) and adapting estimation procedures from targeted methods for quantitative applications, with consideration for additional sources of uncontrolled experimental error [14]. These procedures help address the inherent uncertainties in NTA, where false positives (reporting a chemical present when it is actually absent) and false negatives (failing to detect a present chemical) can significantly impact data interpretation and subsequent decision-making [14].
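The blank-comparison step mentioned above often reduces to a simple intensity-ratio rule in practice; the 3× threshold used here is a common convention but is shown only as an illustration:

```python
def passes_blank_filter(sample_intensity, blank_intensity, ratio=3.0):
    """Keep a feature only if its sample intensity exceeds `ratio` times the
    corresponding procedural-blank intensity (illustrative threshold)."""
    return sample_intensity > ratio * blank_intensity

keep = passes_blank_filter(1.2e6, 1.0e5)   # 12x above blank: retained
drop = passes_blank_filter(2.5e5, 1.0e5)   # only 2.5x above blank: removed
```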

Confidence Assessment and Reporting Standards

Establishing Identification Confidence Levels

Confidence in chemical identification exists on a spectrum, and standardized levels have been established to communicate the degree of certainty unambiguously. The highest confidence (Level 1) requires confirmation with an authentic chemical standard analyzed under identical experimental conditions, providing matching retention time and MS/MS spectrum [23] [14]. Level 2 identification, considered probable structure, requires compelling evidence such as a library spectrum match without retention time confirmation or characteristic fragmentation patterns indicative of a specific compound class [14]. Level 3 confidence, representing tentative candidates, applies when a molecular formula can be unambiguously determined but insufficient evidence exists for structural elucidation [23]. For annotations without molecular formula assignment (Level 4), the chemical identity remains unknown but distinguishable based on spectral data [14]. This tiered confidence framework enables researchers to appropriately communicate the certainty of their findings and prevents overinterpretation of insufficient data.
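The tiered scheme can be sketched as a decision function that maps the available evidence to a level. This is a deliberate simplification of the criteria described above; real schemes weigh additional evidence types:

```python
def confidence_level(standard_rt_match, ms2_library_match, formula_assigned):
    """Assign a tiered identification confidence level (simplified sketch)."""
    if standard_rt_match and ms2_library_match:
        return 1   # confirmed against an authentic standard (RT + MS/MS)
    if ms2_library_match:
        return 2   # probable structure via library match without RT
    if formula_assigned:
        return 3   # tentative candidate with unambiguous molecular formula
    return 4       # unknown, but distinguishable from other features
```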

Reporting Standards and Metadata Requirements

Comprehensive reporting of experimental details is essential for interpreting NTA results and assessing their reliability. The Benchmarking and Publications for Non-Targeted Analysis (BP4NTA) Working Group has established guidelines to improve transparency and reproducibility across NTA studies [20]. Critical reporting elements include: complete description of sample preparation procedures; detailed instrumentation parameters and data acquisition methods; comprehensive documentation of data processing steps and software parameters with version information; clear description of annotation and identification criteria including scoring thresholds; and full disclosure of quality assurance/quality control measures and results [23] [20]. For suspect screening analyses, researchers must report the specific suspect list used, its source, version, and size, while for true NTA, the approaches for unknown compound identification should be thoroughly documented [3]. These reporting standards enable proper evaluation of study limitations, facilitate inter-laboratory comparisons, and support the growing integration of NTA data into regulatory decision-making frameworks.

Computational Tools for NTA Data Processing

The computational demands of NTA necessitate specialized software tools for data processing, annotation, and identification. Both commercial and open-source options are available, each with distinct strengths and applications. As evidenced by literature reviews, vendor-specific software (such as Thermo Compound Discoverer and Agilent MassHunter) is currently used in most studies (approximately 57%), while open-source platforms (including MZmine, MS-DIAL, and Cardinal) offer flexible, customizable alternatives [19] [24]. The selection of appropriate software depends on multiple factors including instrumental platform, sample type, study objectives, and computational resources. As shown in Table 3, these tools encompass the complete NTA workflow from raw data processing to final identification.

Table 3: Essential Research Reagents and Computational Tools for NTA

| Tool Category | Specific Examples | Primary Function | Application in NTA Workflow |
| --- | --- | --- | --- |
| Data Conversion Tools | Proteowizard MSConvert, Reifycs Abf Converter | Convert proprietary vendor files to open formats | Pre-processing: enables cross-platform data analysis |
| Commercial Processing Software | Compound Discoverer, MassHunter | Comprehensive workflow management | Data processing: feature detection, alignment, annotation |
| Open-Source Processing Platforms | MZmine, MS-DIAL, XCMS | Flexible data processing and analysis | Data processing: customizable workflows for feature detection |
| Spectral Libraries | NIST, mzCloud, MassBank | Reference spectra for compound matching | Annotation & identification: spectral comparison and matching |
| Visualization Tools | QUIMBI, Cardinal, SCiLS | Interactive data exploration and visualization | Data interpretation: spatial distribution, spectral analysis |
| In Silico Prediction Tools | CFM-ID, MetFrag, SIRIUS | Predict fragmentation spectra and structures | Annotation: generate hypotheses for unknown compounds |

Chemical Standards and Reference Materials

While NTA aims to detect unknowns, reference standards remain essential for method validation, retention time calibration, and confident identification. Key categories include: isotope-labeled internal standards for quality control and semi-quantification; chemical class-specific standards for evaluating method performance for particular compound classes; retention time index standards for chromatographic alignment; and authentic chemical standards for definitive confirmation of identifications (Level 1) [14]. The strategic use of these materials throughout the NTA workflow, from initial method development to final confirmation, significantly enhances data quality and reliability.

The precise definitions and standardized application of the terms feature, annotation, and identification form the critical foundation for rigorous non-targeted analysis using high-resolution mass spectrometry. Features represent the fundamental observations detected through data processing; annotations provide interpretive hypotheses about chemical characteristics; and identifications constitute confident assignments of specific chemical structures based on sufficient evidence. As NTA methodologies continue to evolve and expand into new application areas including drug development, environmental monitoring, and exposomics research, consistent terminology and reporting standards become increasingly vital for scientific communication and data interoperability [19] [20] [14]. The frameworks, workflows, and confidence assessments presented in this technical guide provide researchers with a standardized approach for NTA data interpretation that promotes transparency, reproducibility, and appropriate confidence in analytical results. Through the continued adoption and refinement of these standards by the scientific community, NTA will increasingly deliver on its potential to comprehensively characterize complex chemical mixtures and uncover previously unrecognized chemical exposures and transformations.

Inherent Uncertainties in NTA and Strategies for Management

Non-targeted analysis (NTA) using high-resolution mass spectrometry (HRMS) has emerged as a powerful discovery tool for identifying unknown and unexpected chemicals across diverse sample matrices, from environmental samples to biological specimens [19]. Unlike targeted methods that provide definitive, quantitative data for predefined analytes, NTA generates information-rich data with inherent uncertainties that prevent its widespread acceptance by decision-makers [14]. If an analyst reports that a chemical is present in a sample, it may actually be absent; if reported absent, it may be present; and if a concentration is reported, the true value could be orders of magnitude different [14]. This technical guide examines the fundamental sources of uncertainty in NTA workflows and presents structured strategies for their management, enabling researchers to communicate data reliability effectively within chemical exposure and drug development research.

Uncertainty in NTA stems from multiple interconnected sources throughout the analytical workflow. These can be categorized into three primary domains: analytical, data processing, and identification uncertainties.

Analytical Uncertainties

The initial stage of NTA introduces significant uncertainties through sample preparation and instrumental analysis. The "detectable chemical space" – the subset of chemicals ultimately observed – is heavily influenced by eight key analytical considerations: sample matrix type, extraction solvent, pH, extraction/cleanup media, elution buffers, instrument platform, ionization type, and ionization mode [19]. For instance, the choice between liquid chromatography (LC) and gas chromatography (GC) platforms fundamentally alters detectable chemicals, with LC being more amenable to water-soluble compounds with polar functional groups, while GC better captures non-polar, volatile compounds [19]. This methodological bias represents a fundamental uncertainty in comprehensive exposome characterization.

Table 1: Analytical Techniques and Their Influence on Detectable Chemical Space

| Analytical Technique | Chemical Space Bias | Common Ionization Modes | Frequency of Use in NTA Studies |
|---|---|---|---|
| LC-HRMS | Polar, water-soluble compounds | ESI+, ESI-, APCI | 51% (LC only); 16% (combined with GC) |
| GC-HRMS | Non-polar, volatile compounds | EI, CI | 32% (GC only); 16% (combined with LC) |
| Direct Injection HRMS | No chromatographic separation | Various | 1% |

Data Processing and Computational Uncertainties

The transformation of raw HRMS data into meaningful chemical information introduces computational uncertainties through peak picking, feature alignment, and database searching. Most NTA studies (approximately 57 of 76 reviewed studies) rely on vendor software (e.g., Thermo Compound Discoverer, Agilent MassHunter), while only a minority use open-source platforms (e.g., MZmine, MS-DIAL) [19]. This software dependency creates consistency challenges, as different algorithms employ distinct parameters and confidence metrics. The feature detection process must distinguish true chemical signals from instrumental noise, with variations in sensitivity thresholds directly affecting the reported chemical space. Similarly, molecular formula assignment from accurate mass measurements carries uncertainty, particularly for compounds with complex isotopic patterns or unusual elemental compositions.
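The effect of a sensitivity threshold on the reported feature list can be sketched in a few lines. This is an illustrative Python snippet, not any vendor's algorithm; the (m/z, intensity, noise) tuple layout and the threshold of 3 are assumptions for demonstration only.

```python
# Minimal sketch: filtering candidate features by a signal-to-noise
# threshold. The chosen threshold directly shapes the reported
# chemical space, which is one source of processing uncertainty.

def filter_features(features, sn_threshold=3.0):
    """Keep features whose intensity/noise ratio meets the threshold."""
    kept = []
    for mz, intensity, noise in features:
        if noise > 0 and intensity / noise >= sn_threshold:
            kept.append((mz, intensity, noise))
    return kept

features = [
    (301.1410, 15000.0, 1200.0),  # S/N = 12.5 -> retained
    (402.2205, 2500.0, 1100.0),   # S/N ~ 2.3 -> rejected as noise
]
print(filter_features(features))
```

Raising `sn_threshold` from 3 to 10 would shrink the feature list further, illustrating why consistent, reported thresholds matter for cross-study comparability.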

Identification and Confidence Assessment Uncertainties

The most significant uncertainty in NTA lies in compound identification, where multiple orthogonal criteria are needed to establish confidence. The absence of universal standards for unknown identification means that, unlike targeted methods, NTA cannot provide unambiguous links between detected features and specific chemical structures [14]. Research indicates substantial variability in identification confidence, with many studies relying on level 3 identifications (probable structures based on spectral library matching without reference standards) rather than level 1 (confirmed with reference standards) [19]. This uncertainty is compounded by the presence of isomeric compounds that may share virtually identical mass spectra but possess different toxicological properties, potentially leading to misidentification in exposure assessment.

Strategic Framework for Uncertainty Management

Effective uncertainty management requires a systematic approach addressing each stage of the NTA workflow. The following strategies provide a structured framework for enhancing reliability in NTA data interpretation.

Experimental Design and Analytical Standardization

Implementing rigorous quality assurance/quality control (QA/QC) protocols forms the foundation for uncertainty management. Blank samples (method, procedural, and instrumental) must be incorporated to identify contamination, while pooled quality control samples (PQC) assess system stability and aid in distinguishing true features from artifacts [14]. For quantitative estimations, standard addition methods with internal standards spanning diverse physicochemical properties help account for matrix effects. Analytical standardization should also include reference materials when available, with established criteria for retention time stability, mass accuracy (< 5 ppm error), and signal intensity variation (< 30% RSD in QC samples) [14].
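The two numeric acceptance criteria quoted above (mass error < 5 ppm, intensity variation < 30% RSD in QC samples) can be expressed as a simple check. A minimal Python sketch, assuming a feature is summarized by its measured m/z and its intensities across pooled-QC injections; the function name and data layout are illustrative.

```python
# Hedged sketch: applying the QA/QC acceptance criteria above to a
# single feature. Real pipelines would also check retention time
# stability and blank contamination.
import statistics

def passes_qc(measured_mz, theoretical_mz, qc_intensities,
              max_ppm=5.0, max_rsd=30.0):
    """Return True if the feature meets mass-accuracy and RSD criteria."""
    ppm_error = abs(measured_mz - theoretical_mz) / theoretical_mz * 1e6
    mean = statistics.mean(qc_intensities)
    rsd = (statistics.stdev(qc_intensities) / mean * 100
           if mean else float("inf"))
    return ppm_error < max_ppm and rsd < max_rsd

# A feature with ~2.3 ppm error and ~2.7% RSD across five QC injections:
print(passes_qc(301.1417, 301.1410, [9800, 10100, 10500, 9900, 10200]))
```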

Table 2: Confidence Levels for Chemical Identification in NTA

| Confidence Level | Identification Criteria | Required Data | Uncertainty Level |
|---|---|---|---|
| Level 1: Confirmed Structure | Match to authentic standard | Retention time, MS/MS spectrum, accurate mass | Low |
| Level 2: Probable Structure | Library spectrum match or diagnostic evidence | MS/MS spectrum, accurate mass | Medium |
| Level 3: Tentative Candidate | Library match without MS/MS or in silico prediction | Molecular formula, possible structure | High |
| Level 4: Unknown Feature | No structural information | Accurate mass, retention time | Highest |

Data Analysis and Computational Harmonization

Managing computational uncertainties requires transparent reporting of all processing parameters and implementation of standardized confidence frameworks. The Schymanski scale for confidence assessment provides a harmonized approach for communicating identification certainty [14]. For peak picking, signal-to-noise thresholds should be optimized and consistently applied, with manual verification of high-priority features. Molecular formula assignment should incorporate multiple scoring algorithms that consider isotopic patterns, elemental probability, and ring double bond equivalents. Implementing open-source computational workflows enhances reproducibility and allows cross-validation between different software platforms, addressing a critical gap in current NTA research [19].
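One of the scoring inputs mentioned above, ring double bond equivalents (RDBE), is straightforward to compute from a candidate elemental composition. A minimal sketch; real formula-assignment software combines this with isotopic-pattern matching and element-probability scores.

```python
# Sketch of one formula-plausibility check: ring and double bond
# equivalents (RDBE) for a candidate CcHhNnOoSs formula.
# RDBE = C - H/2 + N/2 + 1; negative values flag implausible
# assignments, since a molecule cannot have fewer than zero rings
# plus double bonds.

def rdbe(c, h, n=0, o=0, s=0, halogens=0):
    """RDBE for a neutral formula; halogens count like hydrogen,
    while O and S do not change the value."""
    return c - (h + halogens) / 2 + n / 2 + 1

print(rdbe(c=9, h=8, o=4))  # aspirin, C9H8O4 -> 6.0
print(rdbe(c=6, h=6))       # benzene, C6H6 -> 4.0
```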

Uncertainty Quantification and Communication

Translating NTA results into actionable information requires clear quantification and communication of uncertainties. Confusion matrices can assess qualitative performance (sample classification, chemical identification) by comparing true positives, false positives, true negatives, and false negatives [14]. For quantitative applications, traditional accuracy and precision metrics should be applied with consideration for additional sources of uncontrolled experimental error [14]. Stakeholder communication should include clear confidence statements using standardized terminology, distinguishing between identified, annotated, and unknown features, with explicit acknowledgment of methodological limitations, such as coverage biases introduced by platform selection (LC vs. GC) and ionization techniques.

Experimental Protocols for Uncertainty Assessment

Protocol for Confidence Tier Assignment

This protocol establishes a standardized approach for assigning confidence levels to chemical identifications in NTA studies, adapted from community-established guidelines.

  • Level 1 Identification (Confirmed Structure)

    • Acquire MS/MS spectrum of feature of interest
    • Analyze authentic reference standard using identical chromatographic and MS conditions
    • Confirm match of retention time (RT tolerance ± 0.1 min)
    • Verify MS/MS spectral similarity (dot product score > 0.8)
    • Record accurate mass measurement (mass error < 5 ppm)
  • Level 2 Identification (Probable Structure)

    • Obtain high-resolution MS/MS spectrum
    • Search against mass spectral databases (e.g., mzCloud, MassBank, NIST)
    • Require minimum dot product score of 0.7 for spectral match
    • Verify molecular formula from accurate mass (error < 5 ppm)
    • Assess biological/environmental plausibility of proposed structure
  • Level 3 Identification (Tentative Candidate)

    • Determine molecular formula from accurate mass (error < 5 ppm)
    • Search compound databases (e.g., PubChem, ChemSpider) using molecular formula
    • Prioritize candidates based on likelihood of occurrence
    • Report as tentative structure with appropriate uncertainty qualifications
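The tiering rules above can be condensed into a small decision function. This is an illustrative Python sketch using the protocol's own thresholds (RT tolerance ± 0.1 min, dot product > 0.8 against a standard, ≥ 0.7 against a library, mass error < 5 ppm); the feature dictionary layout is an assumption, not a standard format.

```python
# Hedged sketch of confidence-tier assignment. Missing evidence is
# treated as failing the corresponding criterion.

def assign_confidence(feature):
    """Map available evidence for one feature to a confidence level."""
    if feature.get("ppm_error", float("inf")) >= 5.0:
        return "Level 4: Unknown Feature"
    # Level 1: authentic standard match (RT and MS/MS agreement)
    if (feature.get("rt_delta_min", float("inf")) <= 0.1
            and feature.get("std_dot_product", 0.0) > 0.8):
        return "Level 1: Confirmed Structure"
    # Level 2: library spectral match without a standard
    if feature.get("library_dot_product", 0.0) >= 0.7:
        return "Level 2: Probable Structure"
    # Level 3: formula-based candidate(s) only
    if feature.get("candidate_structures"):
        return "Level 3: Tentative Candidate"
    return "Level 4: Unknown Feature"

print(assign_confidence({"ppm_error": 2.1, "library_dot_product": 0.85}))
```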
Protocol for Performance Assessment Using Spiked Samples

This procedure quantifies method performance characteristics using chemically spiked samples to establish uncertainty metrics.

  • Sample Preparation

    • Select 20-50 representative compounds spanning diverse physicochemical properties
    • Prepare calibration curves in representative sample matrix across 3-5 orders of magnitude
    • Spike samples at low, medium, and high concentrations (n=5 each)
    • Include unspiked controls (n=5) to assess background levels
  • Data Acquisition and Processing

    • Analyze samples in randomized order to minimize batch effects
    • Process data using standardized NTA workflow
    • Record detection frequency for each spiked compound
    • Calculate accuracy as (measured concentration/true concentration) × 100%
    • Determine precision as relative standard deviation (RSD) of replicate measurements
  • Performance Metric Calculation

    • For qualitative assessment: Calculate false positive rate (FPR) and false negative rate (FNR)
    • For semi-quantitative assessment: Determine linearity (R²) and relative error (RE%)
    • Establish limit of identification (LOI) as lowest concentration with ≥95% detection rate
    • Document all parameters for comparison across laboratories and studies
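The metrics defined in this protocol map directly to short calculations. A hedged Python sketch; the function names and example numbers are illustrative, and real studies would propagate additional uncertainty terms.

```python
# Sketch of the protocol's performance metrics: accuracy as
# measured/true x 100%, precision as %RSD, qualitative FPR/FNR,
# and LOI as the lowest spike level with >= 95% detection rate.
import statistics

def accuracy_pct(measured, true):
    return measured / true * 100.0

def rsd_pct(values):
    return statistics.stdev(values) / statistics.mean(values) * 100.0

def error_rates(tp, fp, tn, fn):
    """False positive and false negative rates from counts."""
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr

def limit_of_identification(detection_rate_by_conc):
    """Lowest concentration whose detection rate reaches 0.95."""
    passing = [c for c, r in detection_rate_by_conc.items() if r >= 0.95]
    return min(passing) if passing else None

print(round(accuracy_pct(9.2, 10.0), 1))                      # 92.0
print(limit_of_identification({1: 0.60, 5: 0.96, 10: 1.00}))  # 5
```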

Visualization of NTA Uncertainty Management

[Workflow diagram: uncertainty propagates sequentially from sample preparation (extraction efficiency, matrix effects) through instrumental analysis (ionization efficiency, mass accuracy), data processing (peak picking, alignment), and compound identification (spectral matching, retention time prediction) to quantification (response factors, calibration).]

NTA Uncertainty Sources and Relationships: This diagram illustrates the sequential nature of uncertainty propagation throughout the NTA workflow, from sample preparation to final quantification, highlighting specific uncertainty contributors at each stage.

[Workflow diagram: study design → sample collection and preparation → instrumental analysis → data processing → compound identification → confidence assessment → result reporting, with quality control checkpoints (blanks and spikes, reference materials, QC samples), standardization steps (parameter documentation, open-source workflows), and confidence frameworks with uncertainty metrics at the reporting stage.]

NTA Uncertainty Management Workflow: This workflow diagram outlines key stages in managing NTA uncertainties, highlighting quality control checkpoints, high-uncertainty stages requiring special attention, and critical reporting phases, with mitigation strategies at each step.

Essential Research Reagent Solutions

Table 3: Key Research Reagents for NTA Uncertainty Management

| Reagent / Material | Function in NTA Workflow | Uncertainty Addressed |
|---|---|---|
| Authentic Analytical Standards | Confirmation of compound identity | Identification uncertainty |
| Stable Isotope-Labeled Internal Standards | Correction for matrix effects and recovery | Quantitative uncertainty |
| Reference Materials (NIST, EPA) | Method validation and benchmarking | Method performance uncertainty |
| Quality Control Pooled Samples | Monitoring instrumental performance | Analytical variability |
| Blank Samples (Method, Procedural) | Contamination identification | False positive uncertainty |
| Retention Time Index Standards | Retention time alignment and prediction | Chromatographic variability |
| Ionization Efficiency Calibrants | Response factor estimation | Semi-quantification uncertainty |

Uncertainty management in non-targeted analysis represents both a fundamental challenge and critical opportunity for advancing exposure science and drug development research. The inherent uncertainties throughout the NTA workflow – from analytical biases affecting detectable chemical spaces to computational limitations in compound identification – necessitate systematic approaches to quality assurance and validation [14] [19]. By implementing the structured frameworks, experimental protocols, and uncertainty quantification strategies outlined in this guide, researchers can enhance the reliability and interpretability of NTA data. The advancing standardization of confidence assessment frameworks and growing availability of open-source computational tools promise to reduce current limitations, ultimately supporting the transition of NTA from a research tool to a methodology capable of informing regulatory decisions and public health protection. As the field progresses, transparent communication of uncertainties will remain essential for appropriate interpretation and utilization of NTA results by diverse stakeholders across scientific disciplines.

Non-targeted analysis (NTA) using high-resolution mass spectrometry (HRMS) has emerged as a powerful analytical approach for detecting and identifying unknown and unexpected compounds across diverse sample matrices, including environmental, biological, and food samples [14]. Unlike targeted methods that focus on specific predefined analytes, NTA generates global chemical information, providing a comprehensive characterization of sample composition [14]. This capability makes NTA particularly valuable for discovering novel chemical stressors, retrospectively assessing past exposures through archived samples, and classifying samples based on their chemical profiles [14]. However, this information-rich data presents significant challenges in evaluation and interpretation, necessitating specialized experimental designs and performance assessment approaches distinct from traditional targeted methods.

The inherent uncertainty in NTA data represents a fundamental characteristic that differentiates it from targeted analysis [14]. When analysts report chemical presence through NTA, the compound may actually be absent due to misidentification as an isomer; conversely, reported absence may reflect failed detection rather than true absence [14]. Similarly, sample classification models may lack repeatability over time or transferability between instruments, and concentration estimates often lack confidence intervals, potentially deviating from true values by orders of magnitude [14]. These uncertainties have limited broader adoption of NTA data by decision-makers, creating an urgent need for methods that accurately measure and communicate uncertainty extent and implications for specific use cases. This article defines and explores the three primary NTA study objectives—sample classification, chemical identification, and chemical quantitation—that structure most NTA projects and yield results most useful for stakeholders [14].

Defining the Three Primary NTA Study Objectives

Sample Classification

Sample classification represents a fundamental NTA objective focused on distinguishing samples into categories based on their overall chemical profiles rather than individual compounds [14]. This approach utilizes the entire chemical fingerprint detected by HRMS to differentiate samples according to source, biological response, temporal changes, or spatial distribution [14]. In practical applications, sample classification enables researchers to identify patterns indicative of environmental contamination sources, disease states in biological specimens, or geographical origins of food products. The performance of qualitative studies emphasizing sample classification can be assessed using confusion matrices, though with recognized challenges and limitations [14]. This objective is particularly valuable when specific chemical markers remain unknown but differential patterns can still distinguish sample classes effectively.

Chemical Identification

Chemical identification constitutes a core NTA objective aimed at discovering and characterizing unknown or unexpected chemicals present in samples [14]. This process involves detecting chemical features in HRMS data and subsequently determining their molecular structures through various confirmation strategies [14]. The confidence levels for presumed identifications vary significantly based on available information, including molecular formula determination, fragmentation spectra matching, retention time consistency, isotopic distribution analysis, and library comparison [14]. Within chemical identification, researchers often distinguish between suspect screening analysis (SSA) and true non-targeted analysis [3]. SSA focuses on identifying chemicals through comparison with predefined libraries of known compounds of interest, effectively narrowing the study scope [3]. In contrast, true NTA aims to identify chemicals without a priori knowledge, including compounds not represented in established databases [3].

Chemical Quantitation

Chemical quantitation represents the most challenging NTA objective, seeking to estimate concentrations of identified chemicals without prior method optimization for specific analytes [14]. While targeted analytical methods provide precise quantitative data with well-defined confidence intervals for predefined chemicals, NTA quantitation carries greater uncertainty [14]. Quantitative NTA (qNTA) approaches typically employ estimation procedures adapted from targeted methods but must account for additional sources of uncontrolled experimental error [14]. Recent advancements incorporate machine learning models to improve quantification methods, leveraging computational approaches to overcome limitations in traditional calibration techniques [1]. Despite these innovations, quantitative results from NTA should be interpreted with appropriate caution, recognizing that true concentrations may differ significantly from reported values without proper validation [14].
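The basic surrogate-calibration idea behind many qNTA estimates can be illustrated in a few lines: assume the unknown shares a response factor with a structurally similar reference standard. All names and numbers below are hypothetical, and the equal-response assumption is precisely the source of the large uncertainty discussed above.

```python
# Illustrative sketch of surrogate calibration for semi-quantitation.
# Real qNTA methods report wide error bounds around such estimates,
# because ionization efficiency can differ greatly between compounds.

def surrogate_concentration(unknown_area, surrogate_area, surrogate_conc):
    """Estimate concentration assuming the unknown and the surrogate
    standard share the same instrument response factor."""
    response_factor = surrogate_area / surrogate_conc  # area per unit conc
    return unknown_area / response_factor

# Hypothetical surrogate standard: 5.0 ng/mL gives a peak area of 120000.
est = surrogate_concentration(unknown_area=48000,
                              surrogate_area=120000,
                              surrogate_conc=5.0)
print(est)  # 2.0 (ng/mL) under the equal-response assumption
```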

Table 1: Comparison of Primary NTA Study Objectives

| Study Objective | Primary Focus | Key Outputs | Performance Assessment Approaches | Common Applications |
|---|---|---|---|---|
| Sample Classification | Overall chemical patterns | Sample categories, differentiation models | Confusion matrix; model repeatability and transferability | Source tracking, disease diagnosis, product authentication |
| Chemical Identification | Individual unknown compounds | Molecular structures, compound identities | Confidence levels based on supporting data (MS/MS, retention time, etc.) | Discovery of novel contaminants, metabolite identification |
| Chemical Quantitation | Concentration estimation | Semi-quantitative or quantitative concentrations | Estimation procedures with error consideration; machine learning validation | Exposure assessment, risk characterization |

Performance Assessment Metrics for NTA Objectives

Performance Metrics for Sample Classification and Chemical Identification

Qualitative NTA performance, encompassing sample classification and chemical identification, requires specialized assessment approaches distinct from traditional targeted analysis [14]. For sample classification, the confusion matrix provides a foundational framework for evaluating model performance by comparing predicted versus actual class assignments across sample categories [14]. This matrix enables calculation of standard classification metrics including accuracy, precision, recall, and F1-score, though specific challenges emerge when applying these metrics to NTA data, particularly regarding class imbalance and uncertain ground truth [14].
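The standard classification metrics derived from a confusion matrix can be computed as follows. A minimal binary-classification sketch; multi-class NTA models generalize these metrics per class, and the counts used here are hypothetical.

```python
# Accuracy, precision, recall, and F1-score from a binary
# confusion matrix (true/false positives and negatives).

def classification_metrics(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical study: 40 true positives, 10 false alarms,
# 45 true negatives, 5 missed samples.
acc, prec, rec, f1 = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```

Class imbalance, a challenge noted above, shows up directly here: with few positive samples, accuracy can stay high while recall collapses, which is why all four metrics are worth reporting.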

For chemical identification, confidence ranking systems represent the primary performance assessment approach, categorizing identifications based on the strength of supporting evidence [14]. The highest confidence levels typically require matching retention times and fragmentation spectra with authentic analytical standards analyzed under identical conditions [14]. Lower confidence levels may rely on library spectrum matching, in silico fragmentation prediction, or molecular formula assignment alone [14]. The evolving nature of confidence frameworks for chemical identification reflects ongoing community efforts to standardize reporting and communicate uncertainty effectively to stakeholders [14].
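Spectral library matching ultimately reduces to a similarity score between two peak lists; a common choice is the dot-product (cosine) score. This simplified Python sketch bins peaks to nominal m/z and omits the intensity weighting and m/z tolerances that production search engines apply.

```python
# Simplified spectral dot-product (cosine) similarity between a
# query MS/MS spectrum and a library spectrum, both represented
# as {nominal m/z bin: intensity} dictionaries.
import math

def cosine_score(spec_a, spec_b):
    """Cosine similarity in [0, 1]; 1.0 means identical direction."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[m] * spec_b[m] for m in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

query = {91: 100.0, 119: 45.0, 134: 20.0}    # hypothetical fragments
library = {91: 95.0, 119: 50.0, 134: 15.0}
print(round(cosine_score(query, library), 3))
```

A score near 1.0 would support a Level 2 assignment under the thresholds discussed earlier, while a mid-range score warrants only tentative annotation.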

Performance Metrics for Chemical Quantitation

Quantitative performance assessment in NTA adapts established metrics from targeted analysis while acknowledging additional uncertainty sources [14]. Key metrics include accuracy (closeness to true values), precision (measurement reproducibility), and sensitivity (detection limits), though each requires careful interpretation in NTA contexts [14]. Accuracy evaluation proves particularly challenging without authentic standards for all identified compounds, often necessitating surrogate approaches using structurally similar compounds or standard addition methods [25]. Precision assessment must account for additional variability sources throughout the non-targeted workflow, including feature detection consistency and alignment reliability across sample batches [14] [25].

Sensitivity characterization in NTA extends beyond traditional limit of detection (LOD) calculations to include the concept of feature detectability across chemical space [14]. This broader perspective acknowledges that detection capabilities vary significantly across chemical classes and concentrations in non-targeted approaches [14]. Recent research has developed specific quality control guidelines to assure reliable quantitative NTA data, evaluating method specificity, precision, accuracy, and reproducibility in terms of peak area and retention time variability, true positive identification rates, and intraday/interday variations [25].

Table 2: Performance Assessment Metrics for NTA Study Objectives

| Performance Aspect | Targeted Analysis Approach | NTA Adaptation | Key Challenges in NTA |
|---|---|---|---|
| Selectivity/Specificity | Ability to distinguish target analyte from interferents [14] | Feature detection specificity, chromatographic separation, mass resolution [14] [25] | Unknown interferents, isobaric compounds, matrix effects |
| Sensitivity | Limit of detection (LOD) for specific analytes [14] | Feature detectability across chemical space, chemical coverage [14] | Variable detection capabilities across chemical classes |
| Accuracy | Agreement between measured and true values [14] | Identification confidence; quantitative agreement when standards available [14] [25] | Lack of authentic standards for most compounds |
| Precision | Reproducibility of measurements [14] | Feature detection repeatability, retention time stability, alignment consistency [14] [25] | Additional variability sources in non-optimized workflows |

Experimental Design and Methodologies

Defining Study Scope and Objectives

Comprehensive experimental design precedes successful NTA implementation, requiring clear definition of study objectives and analytical scope [3]. Researchers must deliberately determine whether their approach emphasizes targeted analysis, suspect screening, non-targeted analysis, or integrated combinations thereof [3]. This intentional scope definition directly influences subsequent methodological choices across the analytical workflow, from sample preparation to data interpretation [3]. Practical NTA applications frequently combine targeted, suspect screening, and non-targeted approaches within unified workflows [3]. For example, data acquisition may operate comprehensively to maximize collected information, while data analysis sequentially applies suspect screening against defined chemical lists followed by true non-targeted identification efforts for remaining features [3]. Such integrated approaches efficiently leverage methodological strengths while acknowledging practical constraints.

Sample Preparation and Quality Control

Sample preparation strategies for NTA require careful consideration to balance comprehensive chemical coverage with practical analytical constraints [26]. For complex matrices like biological fluids or environmental samples, preparation often incorporates divide-and-conquer strategies that reduce sample complexity into manageable fractions [26]. These approaches may include abundant protein depletion in plasma samples, fractionation through chromatographic techniques, and enzymatic digestion for bottom-up proteomic approaches [26]. Quality assurance and quality control (QA/QC) implementation provides critical foundation for reliable NTA results, requiring intentional incorporation throughout study design [3]. Essential QA/QC elements include procedural blanks, replicate samples, reference materials, and internal standards that monitor analytical performance across sample batches [3] [25]. Specific quality control guidelines proposed for NTA methodologies evaluate method specificity, precision, accuracy and reproducibility using standardized approaches assessing peak area and retention time variability, true positive identification rates, and intraday/interday variations [25].

Data Acquisition and Instrumentation

High-resolution mass spectrometry represents the foundational analytical platform for NTA, with different mass analyzers offering complementary capabilities [14] [26]. Common HRMS platforms include time-of-flight (TOF), Fourier transform ion cyclotron resonance (FT-ICR), and Orbitrap instruments, each providing the mass accuracy and resolution essential for confident molecular formula assignment [26]. Data acquisition strategies significantly influence chemical space coverage, with electrospray ionization (ESI) and atmospheric pressure chemical ionization (APCI) offering complementary selectivity for different compound classes [25]. Tandem mass spectrometry (MS/MS or MSn) fragmentation data provides critical structural information for compound identification, with acquisition methods ranging from data-dependent acquisition to unbiased fragmentation approaches [14] [1]. Advanced instrumental configurations combining multiple separation dimensions with high-resolution mass analysis further enhance chemical coverage and detection capabilities for complex samples [26].

[Workflow diagram: Sample Preparation & QA/QC → HRMS Data Acquisition → Feature Detection & Alignment → Statistical Analysis & Prioritization → Compound Identification → Semi-Quantitation → Data Interpretation.]

NTA Experimental Workflow: This diagram illustrates the generalized workflow for non-targeted analysis, from sample preparation through data interpretation.

Advanced Applications and Future Perspectives

Machine Learning in NTA

Machine learning (ML) and artificial intelligence (AI) applications represent the most significant advancement in NTA methodologies, offering transformative potential across all study objectives [1]. ML algorithms enhance sample classification through improved pattern recognition capabilities, enabling more accurate sample categorization based on complex chemical fingerprints [1]. For chemical identification, ML approaches facilitate structure annotation through improved in silico fragmentation prediction and spectral similarity assessment, significantly expanding identifiable chemical space beyond reference libraries [1]. In quantitative applications, ML models enable more accurate concentration predictions without authentic standards by leveraging chemical structure-property relationships [1]. These computational advancements also improve toxicity prediction capabilities through quantitative structure-activity relationship (QSAR) modeling, directly supporting risk assessment frameworks [1]. Current research focuses on refining ML tools for complex mixture analysis, improving inter-laboratory validation, and further integrating computational models into environmental risk assessment paradigms [1].

Interlaboratory Studies and Method Validation

Interlaboratory studies and method validation initiatives address critical needs for standardization and reproducibility in NTA [25]. The Environmental Protection Agency's Non-Targeted Analysis Collaborative Trial (ENTACT) represents a prominent example, evaluating performance across multiple laboratories analyzing identical complex mixtures [25]. Such studies reveal substantial variability in reported identifications and concentrations, highlighting the urgent need for standardized protocols and performance benchmarks [14] [25]. Validation approaches for NTA methods continue evolving, with recent proposals advocating for standardized quality control metrics assessing accuracy, precision, selectivity, and reproducibility using defined reference materials [25]. These community-wide efforts aim to establish fit-for-purpose criteria enabling broader acceptance of NTA data in regulatory decision-making contexts [14].

The Scientist's Toolkit: Essential Research Reagents and Materials

Table 3: Essential Research Reagents and Materials for NTA Studies

| Tool/Reagent | Function | Application Context |
| --- | --- | --- |
| High-Resolution Mass Spectrometer | Provides accurate mass measurements for elemental composition assignment [14] [26] | All NTA study objectives; different analyzers (TOF, FT-ICR, Orbitrap) offer complementary capabilities [26] |
| Chromatography Systems | Separate complex mixtures to reduce ionization suppression and enable isomer differentiation [26] | All NTA study objectives; LC-MS most common, with GC-MS expanding coverage for volatile compounds |
| Authentic Analytical Standards | Confirm compound identities and enable quantitative calibration [14] | Method development/validation, identification confidence assessment, quantitation |
| Stable Isotope-Labeled Internal Standards | Monitor analytical performance, correct for matrix effects, enable quantitative accuracy [14] | Quality control, quantitative NTA, method performance assessment |
| Sample Preparation Materials | Extract, clean up, and concentrate analytes from complex matrices [26] | All NTA applications; specific materials (SPE cartridges, extraction solvents) determine chemical space coverage |
| Compound Databases & Spectral Libraries | Support chemical identification through mass, retention time, and fragmentation matching [14] [3] | Suspect screening, compound identification; examples include NIST, MassBank, mzCloud |
| Data Processing Software | Convert raw data to features, align across samples, perform statistical analyses [14] [1] | All NTA study objectives; both commercial and open-source platforms available |
| In Silico Fragmentation Tools | Predict MS/MS spectra for structural elucidation of unknowns without standards [1] | Compound identification, particularly for true NTA without library matches |

[Diagram] MS1 feature detection → molecular formula assignment → suspect screening. A confident suspect match proceeds to identification confirmation, supported by three lines of evidence: retention time match, MS/MS spectrum match, and authentic standard comparison. With no confident match, the feature proceeds to unknown identification.

NTA Identification Confidence: This diagram outlines the decision process for compound identification in NTA, from initial feature detection through confirmation.

Advanced Workflows and Cutting-Edge Applications in NTA

Non-targeted analysis (NTA) using high-resolution mass spectrometry (HRMS) generates complex, information-rich datasets that are impossible to interpret manually without sophisticated computational approaches [23]. The primary challenge in NTA research lies not merely in detecting chemical signals, but in developing robust computational methods to extract meaningful environmental information from the vast chemical datasets generated by HRMS instruments [27]. A critical, yet often poorly defined aspect of this process involves the terminology used to describe detections that are not yet annotated or identified. Establishing consistent definitions is essential for improving reproducibility and readability across NTA studies from different research groups [23].

Fundamental terminology used throughout this pipeline includes: m/z-retention time pair (mz@RT), defined as a unique pairing of mass-to-charge ratio and retention time values; and Feature, which represents a set of mz@RT pairs that form a grouping of associated MS1 components (e.g., isotopologue, adduct, and in-source product ion m/z peaks), represented as a tensor of observed retention time, monoisotopic mass, and intensity [23]. The comprehensive data processing pipeline transforms raw instrument data into chemically meaningful features through a structured sequence of computational steps that will be detailed in this guide.
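
These definitions map naturally onto simple data structures. The sketch below is an illustrative (not standardized) Python representation of an mz@RT pair and a Feature; the class and field names are our own, not drawn from any NTA software:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class MzRtPair:
    """A unique m/z-retention time pairing (mz@RT)."""
    mz: float        # mass-to-charge ratio
    rt: float        # retention time in minutes
    intensity: float

@dataclass
class Feature:
    """A grouping of associated MS1 components (isotopologues, adducts,
    in-source fragments) summarized by RT, monoisotopic mass, and intensity."""
    monoisotopic_mass: float
    rt: float
    members: list = field(default_factory=list)  # constituent MzRtPair objects

    @property
    def intensity(self) -> float:
        # total abundance across all grouped MS1 components
        return sum(p.intensity for p in self.members)

# Example: a protonated compound plus its 13C isotopologue peak
pairs = [MzRtPair(mz=285.0790, rt=6.42, intensity=1.2e6),
         MzRtPair(mz=286.0823, rt=6.42, intensity=1.8e5)]
feat = Feature(monoisotopic_mass=284.0717, rt=6.42, members=pairs)
```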

The complete NTA workflow encompasses four critical stages that transform environmental samples into actionable insights about contamination sources. While this guide focuses primarily on Stage III (Data Processing), understanding the entire context is essential for proper implementation. Stage I involves sample treatment and extraction, requiring careful optimization to balance selectivity and sensitivity through techniques such as solid phase extraction (SPE), Soxhlet extraction, gel permeation chromatography (GPC), and pressurized liquid extraction (PLE) [27]. Stage II covers data generation and acquisition through HRMS platforms including quadrupole time-of-flight (Q-TOF) and Orbitrap systems coupled with liquid or gas chromatographic separation (LC/GC), which generate the complex datasets essential for NTA [27].

Stage III represents the ML-oriented data processing and analysis phase, where the transition from raw HRMS data to interpretable patterns occurs through sequential computational steps including data preprocessing, dimensionality reduction, and statistical analysis [27]. Stage IV focuses on result validation through a three-tiered approach incorporating analytical confidence verification using certified reference materials, model generalizability assessment on independent external datasets, and environmental plausibility checks correlating model predictions with contextual data [27]. The following diagram illustrates the complete workflow and the interrelationships between each stage:

[Diagram] Stage I (Sample Treatment & Extraction): sample collection → sample preparation → extraction optimization → purification techniques → matrix interference removal. Stage II (Data Generation & Acquisition): HRMS platform operation → chromatographic separation → quality control samples → centroiding & peak detection. Stage III (ML-Oriented Data Processing): data preprocessing → feature detection & alignment → dimensionality reduction → statistical analysis & ML. Stage IV (Result Validation): reference material verification → external dataset testing → environmental plausibility → model performance assessment, culminating in actionable environmental insights.

Data Processing: From Raw Data to Features

Data Format Conversion

The initial stage of data processing involves converting proprietary instrument data files into open, accessible formats that subsequent algorithms can read. This conversion must occur before any intentional data interpretation or processing can take place [23]. Although format conversion is not intended to remove any data, losses can nonetheless occur. Researchers should evaluate their conversion steps and associated settings for such losses, for example by comparing multiple conversion platforms and screening for known compounds in the downstream processing steps [23].

Table 1: Data Format Conversion Methods

| Method Step | Description | Common Tools/Formats |
| --- | --- | --- |
| File conversion to open-source format | Changes the raw data file format (e.g., .d, .raw, .wiff) to a different file format | mzML, mzXML, ANDI-MS (netCDF), ABF; tools: ProteoWizard MSConvert, Reifycs Abf Converter |
| Mass spectrum centroiding | For mass spectra collected in profile/continuous mode, centroiding reduces the individual m/z peaks to a single peak (a centroid) | Not necessary for mass spectra collected in centroid mode |
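
The centroiding step can be illustrated with a toy local-maximum reduction: each profile peak collapses to one intensity-weighted centroid. This is a deliberately simplified stand-in for vendor peak-picking algorithms; the function name and three-point window are assumptions for illustration only:

```python
def centroid_profile(mz, intensity, min_intensity=0.0):
    """Reduce a profile-mode spectrum to centroids.

    Each local maximum above min_intensity becomes one centroid whose m/z is
    the intensity-weighted mean of the maximum and its two neighbors.
    A simplified stand-in for vendor peak-picking algorithms.
    """
    centroids = []
    for i in range(1, len(mz) - 1):
        if intensity[i] >= intensity[i - 1] and intensity[i] > intensity[i + 1] \
                and intensity[i] > min_intensity:
            window = range(i - 1, i + 2)
            total = sum(intensity[j] for j in window)
            c_mz = sum(mz[j] * intensity[j] for j in window) / total
            centroids.append((c_mz, total))
    return centroids

# Four profile points describing one symmetric peak centered at m/z 200.00
mz = [199.99, 200.00, 200.01, 200.02]
inten = [500.0, 1000.0, 500.0, 10.0]
centroids = centroid_profile(mz, inten)
```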

Core Data Processing Steps

Data processing relies on user-defined settings and thresholds to reduce the raw or converted analytical data to meaningful information, such as a list of features with relative abundance information and accompanying LC data and MS & MS/MS spectra [23]. As data is intentionally reduced during processing, researchers should consider evaluating the impact of various settings through the use of QC spikes and/or QC samples [23]. The terminology differences across both open-source and proprietary software platforms present a significant challenge, as different software tools may use the same term to describe different steps. Until these discrepancies are resolved through communication and consensus among software developers, NTA researchers should carefully read software user manuals to ensure they correctly interpret the purpose of each step in the workflow [23].

Table 2: Detailed Data Processing Steps

| Processing Step | Description | Key Parameters |
| --- | --- | --- |
| Initial m/z detection | Selection of unique mz@RT pairs | Sensitivity threshold, mass accuracy |
| Retention time alignment | Modifies retention times within a single dataset based on representative compounds | Alignment algorithm, reference compounds |
| Shoulder peaks filtering | Removes noise signals known as "shoulder peaks" (relevant to Fourier transform MS instruments) | Peak width, symmetry thresholds |
| Signal thresholding | Removes signal (m/z values) below a designated abundance threshold | Absolute value or signal-to-noise ratio |
| Chromatogram smoothing | Reduces the noise of a selected chromatogram | Smoothing algorithm, window size |
| Spectral deconvolution | Removal of undesired m/z peaks within a mass spectrum | Deconvolution algorithm, peak model |
| Isotopologue grouping | Grouping of unique mz@RT pairs that represent isotopologues of the same compound | Mass difference, isotopic pattern |
| Adduct grouping | Grouping of unique mz@RT pairs that represent adducts of the same compound | Common adduct masses, retention time tolerance |
| Between-sample alignment | Comparison of detected features in multiple samples to determine if the same feature was detected | m/z and RT windows, variance thresholds |
| Gap-filling | Detection of features that were missed during initial m/z selection | Lower thresholds, peak prediction algorithms |
| Feature filtering | Filtering detected features based on retention time or m/z range | m/z range, RT range, intensity thresholds |
| Duplicate feature removal | Removing duplicate features based on designated m/z and RT windows | m/z and RT tolerance windows |
| Replicate filter | Evaluating feature detection frequency across analytical replicates | Replicate frequency threshold |
| Abundance thresholding and/or blank comparison | Applying absolute or relative abundance thresholds | Blank subtraction methods, fold-change thresholds |
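
As a concrete illustration of the isotopologue-grouping step above, the sketch below clusters co-eluting m/z peaks separated by multiples of the ¹³C mass difference. The tolerances, the singly-charged assumption, and the greedy strategy are illustrative choices, not a prescribed algorithm:

```python
C13_DELTA = 1.003355  # 13C-12C mass difference in Da

def group_isotopologues(pairs, mz_tol=0.005, rt_tol=0.05):
    """Group co-eluting (mz, rt, intensity) tuples into isotopologue clusters.

    Pairs are sorted by m/z; a pair joins an existing group when its m/z sits
    ~n * 1.003355 Da above the group's monoisotopic member (singly charged
    assumed) and its RT matches within rt_tol. Illustrative only.
    """
    groups = []
    for mz, rt, inten in sorted(pairs):
        placed = False
        for g in groups:
            base_mz, base_rt = g[0][0], g[0][1]
            n = round((mz - base_mz) / C13_DELTA)
            if n >= 1 and abs((mz - base_mz) - n * C13_DELTA) <= mz_tol \
                    and abs(rt - base_rt) <= rt_tol:
                g.append((mz, rt, inten))
                placed = True
                break
        if not placed:
            groups.append([(mz, rt, inten)])
    return groups

peaks = [(301.1410, 5.20, 8.0e5),   # monoisotopic peak
         (302.1444, 5.21, 1.4e5),   # +1 (13C) isotopologue
         (310.2000, 5.20, 3.0e5)]   # unrelated co-eluting peak
groups = group_isotopologues(peaks)
```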

The data processing workflow involves both sequential and parallel operations that transform raw data into curated features ready for statistical analysis and machine learning. The following diagram illustrates the logical flow and relationships between these critical processing steps:

[Diagram] Raw HRMS data → data format conversion → initial m/z detection → retention time alignment → signal thresholding → chromatogram smoothing → shoulder peaks filtering → spectral deconvolution → isotopologue grouping and adduct grouping (in parallel) → between-sample alignment → gap-filling → feature filtering → duplicate feature removal → replicate filter → abundance thresholding → curated feature table.

Machine Learning-Oriented Data Processing

Data Preprocessing for ML Applications

A typical output from the data generation and acquisition stage is a peak table that records the intensities of detected signals [27]. Preprocessing this data, including tasks such as harmonizing the dataset and minimizing noise, is necessary for ensuring data quality and consistency prior to machine learning applications. A high-quality preprocessing workflow is critical for enhancing the reliability and robustness of subsequent machine learning outcomes [27].

Variations in mass spectrometry data may arise due to differences in analytical platforms or acquisition dates, making data alignment essential to ensure the comparability of chemical features across all samples. This alignment process mainly includes three key steps: retention time correction, mass-to-charge ratio recalibration, and peak matching [27]. Retention time correction compensates for slight shifts in retention times caused by variations in chromatographic conditions, while m/z recalibration standardizes mass accuracy across different batches. Peak matching algorithms align identical chemical features detected across different batches, facilitating accurate compound identification and cross-sample comparison [27].
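
The peak-matching step can be sketched as a simple tolerance search over ppm mass error and RT windows; production alignment tools (e.g., in XCMS) use more sophisticated global algorithms, so treat this as a minimal illustration with assumed tolerances:

```python
def match_features(batch_a, batch_b, ppm_tol=10.0, rt_tol=0.2):
    """Greedily match (mz, rt) features between two acquisition batches.

    A feature in batch_b matches one in batch_a when their m/z values agree
    within ppm_tol parts-per-million and their RTs within rt_tol minutes.
    Returns matched index pairs (i, j). Simplified illustration only.
    """
    matches, used = [], set()
    for i, (mz_a, rt_a) in enumerate(batch_a):
        for j, (mz_b, rt_b) in enumerate(batch_b):
            if j in used:
                continue
            ppm = abs(mz_a - mz_b) / mz_a * 1e6
            if ppm <= ppm_tol and abs(rt_a - rt_b) <= rt_tol:
                matches.append((i, j))
                used.add(j)
                break
    return matches

a = [(285.0790, 6.40), (413.2662, 9.10)]   # batch 1 features
b = [(285.0792, 6.45), (350.1000, 4.00)]   # batch 2 features
m = match_features(a, b)
```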

Dimensionality Reduction and Feature Selection

Exploratory ML-oriented data processing identifies significant features via univariate statistics (t-tests, Analysis of Variance [ANOVA]) and prioritizes compounds with large fold changes [27]. Dimensionality reduction techniques like principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) simplify high-dimensional data, while clustering methods (hierarchical cluster analysis [HCA], k-means clustering) group samples by chemical similarity [27]. Supervised ML models, including Random Forest and Support Vector Classifier, are subsequently trained on labeled datasets to classify contamination sources. Feature selection algorithms (e.g., recursive feature elimination) refine input variables, optimizing model accuracy and interpretability [27].
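
A minimal sketch of PCA on a feature-intensity matrix, implemented via SVD of the mean-centered data with NumPy; real workflows would typically use a library implementation such as sklearn.decomposition.PCA, so this is only a conceptual illustration:

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project a samples-by-features matrix onto its leading principal
    components via SVD of the mean-centered data. A minimal stand-in for
    library PCA implementations."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T   # scores: samples x n_components

# Toy feature-intensity matrix: 4 samples x 3 features, two chemical profiles
X = np.array([[10.0, 2.0, 0.5],
              [11.0, 2.1, 0.4],
              [ 2.0, 9.0, 5.0],
              [ 2.5, 8.5, 5.2]])
scores = pca_scores(X, n_components=2)
```

The first principal component separates the two sample groups, which is the behavior exploited when PCA score plots are used to visualize NTA sample groupings.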

Experimental Protocols and Methodologies

Data Processing Protocol for HRMS-Based NTA

Objective: To process raw HRMS data into a curated feature table suitable for statistical analysis and machine learning applications.

Materials and Equipment:

  • Raw HRMS data files in proprietary format (.d, .raw, .wiff, etc.)
  • Computing workstation with sufficient processing power and memory
  • Data conversion software (e.g., Proteowizard MSConvert, Reifycs Abf Converter)
  • Data processing software (e.g., XCMS, MS-DIAL, OpenMS)

Procedure:

  • Data Format Conversion:
    • Convert raw data files to open-source formats (mzML, mzXML) using MSConvert
    • Apply centroiding to profile data if necessary
    • Verify conversion integrity through file size checks and spot comparisons
  • Feature Detection and Processing:

    • Set initial m/z detection parameters: mass accuracy threshold (typically 5-10 ppm), intensity threshold (S/N ratio ≥ 3)
    • Perform retention time alignment using statistical algorithms or reference compounds
    • Apply shoulder peak filtering for FT-MS data with appropriate peak width parameters
    • Execute signal thresholding to remove low-abundance noise
    • Implement chromatogram smoothing using Savitzky-Golay or moving average algorithms
    • Conduct spectral deconvolution to separate co-eluting compounds
  • Feature Grouping and Alignment:

    • Group isotopologues using mass difference thresholds (¹³C: 1.003355 Da, ¹⁵N: 0.997035 Da)
    • Identify and group common adducts ([M+H]⁺, [M+Na]⁺, [M-H]⁻, etc.)
    • Perform between-sample alignment with appropriate m/z (5-10 ppm) and RT (0.1-0.3 min) windows
    • Execute gap-filling to recover missed features with relaxed parameters
  • Feature Curation:

    • Apply feature filtering based on analytical constraints (m/z range: 50-2000, RT range: method-dependent)
    • Remove duplicate features using strict m/z and RT tolerance
    • Implement replicate filtering to retain features detected in ≥70% of technical replicates
    • Apply abundance thresholding based on blank samples (typically 5-fold above blank)
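
The curation thresholds above (detection in ≥70% of technical replicates, abundance at least 5-fold above blank) can be sketched as a simple filter. The input layout and function name are illustrative assumptions, not a fixed software interface:

```python
def curate_features(features, min_rep_frac=0.7, blank_fold=5.0):
    """Retain features detected in >= min_rep_frac of technical replicates
    whose mean abundance exceeds blank_fold times the blank abundance.

    `features` maps feature_id -> {"replicates": [intensity or None, ...],
    "blank": intensity}; None marks a replicate where the feature was missed.
    """
    kept = {}
    for fid, rec in features.items():
        reps = rec["replicates"]
        detected = [v for v in reps if v is not None]
        if len(detected) / len(reps) < min_rep_frac:
            continue  # fails the replicate-frequency filter
        mean_int = sum(detected) / len(detected)
        if mean_int < blank_fold * rec["blank"]:
            continue  # fails the blank comparison
        kept[fid] = mean_int
    return kept

demo = {
    "F1": {"replicates": [1.0e5, 1.1e5, 0.9e5], "blank": 1.0e4},  # passes both
    "F2": {"replicates": [5.0e4, None, None],   "blank": 1.0e3},  # 33% detection
    "F3": {"replicates": [2.0e4, 2.2e4, 1.8e4], "blank": 1.0e4},  # only 2x blank
}
kept = curate_features(demo)
```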

Quality Control:

  • Process quality control samples (pooled samples, reference standards) alongside experimental samples
  • Monitor feature detection consistency across QC replicates (RSD < 30%)
  • Verify process integrity through known internal standards

The Scientist's Toolkit: Essential Research Reagents and Materials

Table 3: Essential Research Reagents and Computational Tools for NTA

Category Item/Software Function/Application
Sample Preparation Solid Phase Extraction (SPE) cartridges Compound enrichment and clean-up
Multi-sorbent strategies (Oasis HLB, ISOLUTE ENV+) Broad-spectrum extraction coverage
QuEChERS kits Rapid sample preparation for complex matrices
Reference Materials Certified Reference Materials (CRMs) Method validation and quality control
Internal standards (isotope-labeled compounds) Retention time alignment and quantification
Batch-specific quality control samples Monitoring instrumental performance
Data Conversion Proteowizard MSConvert Conversion of proprietary formats to open formats
Reifycs Abf Converter Alternative conversion tool for ABF format
mzML, mzXML formats Standardized open data formats
Data Processing XCMS, MS-DIAL, OpenMS Feature detection, alignment, and processing
Computational algorithms Isotopologue grouping, adduct identification
Statistical packages (R, Python) Data analysis and visualization
Advanced Analysis PCA, t-SNE algorithms Dimensionality reduction and pattern recognition
Random Forest, SVC classifiers Supervised machine learning for source identification
Feature selection algorithms Identification of statistically significant features

Statistical and Chemometric Analysis for Pattern Recognition

The integration of statistical and chemometric analysis, particularly through machine learning (ML), is revolutionizing pattern recognition in non-targeted analysis (NTA) using high-resolution mass spectrometry (HRMS). This paradigm shift enables researchers to translate complex HRMS datasets into actionable insights for contaminant source identification and chemical exposure assessment. This guide provides a detailed technical framework for ML-oriented NTA, covering core workflows, algorithmic selection, performance validation, and essential research tools to advance data interpretation in environmental and pharmaceutical research.

The transformation of raw HRMS data into interpretable patterns for source identification follows a systematic, multi-stage workflow. This process is critical for managing the high-dimensionality of NTA data, where the number of detected chemical features often far exceeds the number of samples. A structured approach ensures that the chemical signals are accurately processed, modeled, and validated to support environmentally actionable decisions [27]. The workflow can be conceptualized in four primary stages: (i) sample treatment and extraction, (ii) data generation and acquisition, (iii) ML-oriented data processing and analysis, and (iv) result validation [27]. The following diagram illustrates the logical sequence and key components of this integrated workflow.

[Diagram] NTA HRMS data analysis workflow. Stage (i): sample treatment and extraction (SPE, GPC, QuEChERS). Stage (ii): data generation and acquisition on HRMS platforms (Q-TOF, Orbitrap), with quality control samples, yielding a feature-intensity matrix. Stage (iii): ML-oriented data processing and analysis (data preprocessing, pattern recognition). Stage (iv): result validation (analytical confidence, model generalizability, environmental plausibility).

Core Analytical Workflow and Methodologies

Stage (i): Sample Treatment and Extraction

The goal of this initial stage is to achieve a broad and sensitive extraction of compounds while minimizing matrix interference, thereby laying a reliable foundation for subsequent ML analysis [27].

  • Detailed Protocol: Solid Phase Extraction (SPE)

    • Objective: To concentrate analytes and purify samples.
    • Procedure: Condition the sorbent (e.g., Oasis HLB) with 5-10 mL of methanol followed by 5-10 mL of reagent water or a buffer at the sample pH. Load the prepared sample onto the conditioned cartridge at a flow rate of 5-10 mL/min. After loading, dry the cartridge under vacuum or with nitrogen for 10-30 minutes to remove residual water. Elute analytes using 5-10 mL of an organic solvent (e.g., methanol, acetonitrile, or a mixture with acetone). Evaporate the eluent to near dryness under a gentle stream of nitrogen and reconstitute in an injection solvent compatible with the chromatographic system [27].
    • Critical Consideration: The inherent selectivity of SPE for certain physicochemical properties (e.g., polarity) can limit broad-spectrum coverage. To mitigate this, a multi-sorbent strategy is recommended, combining sorbents like Oasis HLB with ISOLUTE ENV+, Strata WAX, and WCX [27].
  • Detailed Protocol: QuEChERS (Quick, Easy, Cheap, Effective, Rugged, and Safe)

    • Objective: To provide an efficient, broad-range extraction suitable for high-throughput analysis.
    • Procedure: Weigh 10-15 g of sample into a 50 mL centrifuge tube. Add an internal standard if required. For water-containing samples, add salts for partitioning (e.g., 4 g MgSO₄, 1 g NaCl, 1 g sodium citrate, 0.5 g disodium citrate sesquihydrate). Shake vigorously for 1 minute. Centrifuge at >3000 RCF for 5 minutes. Transfer an aliquot of the extract (e.g., 1 mL) to a dispersive-SPE (d-SPE) tube containing cleanup sorbents (e.g., 150 mg MgSO₄, 25 mg PSA, 25 mg C18). Vortex for 30-60 seconds. Centrifuge and filter the supernatant for analysis [27].
Stage (ii): Data Generation and Acquisition

HRMS platforms, such as quadrupole time-of-flight (Q-TOF) and Orbitrap systems, generate the complex datasets required for NTA [27]. The output of this stage is a structured feature-intensity matrix, where rows represent samples and columns correspond to aligned chemical features (defined by mass-to-charge ratio m/z and retention time). This matrix is the fundamental input for all subsequent chemometric and ML analyses [27].

Stage (iii): ML-Oriented Data Processing and Analysis

This stage represents the core of pattern recognition, where statistical and ML models are applied to extract meaningful patterns from the feature-intensity matrix.

  • Data Preprocessing Protocol: Preprocessing is critical to ensure data quality and model reliability.

    • Data Alignment: Correct for retention time drift and align m/z features across samples using algorithms within software like XCMS [27].
    • Missing Value Imputation: Replace missing values using methods like k-nearest neighbors (KNN) imputation. A typical protocol uses KNN imputation in R (e.g., impute::impute.knn from the Bioconductor impute package), with k=10 neighbors, to estimate missing values based on feature similarity across samples.
    • Normalization: Apply total ion current (TIC) normalization to correct for sample-to-sample variation in overall signal intensity. Calculate the normalized intensity for each feature in a sample as: (Feature Intensity / Total Sum of Intensities in Sample) * 1,000,000, resulting in parts-per-million (ppm) scaled data.
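
The TIC normalization formula above translates directly into code; a minimal sketch (the function name is ours):

```python
def tic_normalize(sample_intensities, scale=1_000_000):
    """Total ion current (TIC) normalization: each feature intensity is
    divided by the sample's summed intensity and scaled to parts-per-million,
    exactly as in the formula described above."""
    total = sum(sample_intensities)
    return [v / total * scale for v in sample_intensities]

# One sample's feature intensities before and after TIC normalization
row = [250.0, 750.0, 1000.0]
norm = tic_normalize(row)
```

After normalization each sample's intensities sum to the same total, removing sample-to-sample variation in overall signal.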
  • Exploratory Data Analysis and Pattern Recognition: This phase involves both unsupervised and supervised learning.

    • Protocol: Unsupervised Clustering with k-Means
      • Objective: To group samples based on intrinsic chemical similarity without prior labels.
      • Procedure: Standardize the feature data (e.g., z-scores). Use the elbow method or silhouette analysis on a subset of data to determine the optimal number of clusters, k. Apply the k-means algorithm (e.g., using sklearn.cluster.KMeans in Python) to the preprocessed and normalized feature-intensity matrix. Visualize the resulting clusters using a PCA score plot.
    • Protocol: Supervised Classification with Random Forest
      • Objective: To build a predictive model for classifying samples into known source categories.
      • Procedure: Assign class labels to samples (e.g., "industrial," "agricultural") based on prior knowledge. Split the feature-intensity data into training (70-80%) and testing (20-30%) sets. Train a Random Forest classifier (e.g., using sklearn.ensemble.RandomForestClassifier with n_estimators=100) on the training set. Validate model performance on the held-out test set by calculating accuracy, precision, and recall.
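
The supervised classification protocol above can be sketched end-to-end with scikit-learn on synthetic data; the two-source structure below (a mean shift on the first five features) is an invented stand-in for a real labeled feature-intensity matrix:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic feature-intensity matrix: 60 samples x 20 features, two source
# classes whose first five features differ in mean intensity
X = rng.normal(size=(60, 20))
y = np.array([0] * 30 + [1] * 30)   # e.g., 0 = industrial, 1 = agricultural
X[y == 1, :5] += 2.0

# 75/25 stratified split, as in the protocol above
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))

# Feature importances rank the features that discriminate between sources
top = np.argsort(clf.feature_importances_)[::-1][:5]
```

The importance ranking is what makes Random Forest attractive for NTA: it points back to the chemical features driving the classification.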
Stage (iv): Result Validation

A tiered validation strategy is essential to ensure the reliability of ML-NTA outputs [27].

  • Analytical Confidence: Verify compound identities using certified reference materials (CRMs) or matches to spectral libraries [27].
  • Model Generalizability: Assess the classifier's performance on independent, external datasets. Use cross-validation techniques (e.g., 10-fold cross-validation) during model training to evaluate and mitigate the risk of overfitting [27].
  • Environmental Plausibility: Correlate model predictions with contextual field data, such as geospatial proximity to known emission sources or the presence of source-specific chemical markers, to ensure predictions are environmentally meaningful [27].

Performance Metrics for NTA and Machine Learning

Evaluating the performance of both the qualitative and quantitative outputs of an NTA study is crucial for assessing its reliability. The table below summarizes key metrics adapted from targeted analysis and their application in NTA, addressing objectives like sample classification and chemical identification [14].

Table 1: Performance Assessment Metrics for NTA Studies

| Metric | Definition in Targeted Analysis | Application in NTA | Considerations & Challenges |
| --- | --- | --- | --- |
| Selectivity | A method's ability to differentiate a unique chemical from interferents [14]. | The confidence in correctly identifying a chemical structure among isomers or similar compounds. | NTA identifications are probabilistic. A reported compound may be an isomer, leading to false positives [14]. |
| Sensitivity (LOD) | The lowest concentration at which a chemical can be reliably detected [14]. | The minimum response or concentration at which a feature can be detected and reliably annotated. | Defining a universal Limit of Detection (LOD) is complex due to varying ion efficiencies. "Not detected" does not guarantee absence [14]. |
| Accuracy | Closeness of agreement between a reported concentration and a true value [14]. | For classification: model accuracy in predicting the correct source category. For quantitation: agreement of semi-quantitative estimates with true concentrations. | Quantitative NTA estimates can have high uncertainty, with true concentrations potentially orders of magnitude different [28]. |
| Precision | The consistency of reported values across repeated measurements [14]. | The reproducibility of feature detection and sample classification across analytical replicates or different model runs. | Model predictions may not be fully repeatable over time or transferable between different HRMS instruments [14]. |
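
The classification-oriented metrics in the table reduce to simple ratios of confusion-matrix counts; a minimal sketch under the standard definitions:

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts.

    In an NTA classification context: tp = samples correctly assigned to a
    source class, fp = samples wrongly assigned to it, fn = class members
    missed, tn = correct rejections.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical evaluation of a source classifier on 100 held-out samples
m = classification_metrics(tp=40, fp=5, fn=10, tn=45)
```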

Machine Learning Algorithm Selection Guide

The choice of ML algorithm depends heavily on the specific research objective, data structure, and the need for interpretability. The following diagram outlines a decision pathway for selecting an appropriate algorithm for pattern recognition in NTA data.

[Diagram] ML algorithm selection guide for NTA. Start by defining the research goal. If samples are labeled with known categories (e.g., source), use supervised learning; otherwise use unsupervised learning. For supervised problems where interpretability is critical, candidates include Random Forest (high accuracy, provides feature importance), Logistic Regression (simple, highly interpretable), and the Support Vector Classifier (effective in high dimensions); where interpretability is not required, deep neural networks offer high accuracy at the cost of black-box behavior. For unsupervised problems, use k-means clustering to find natural groups of similar samples, or principal component analysis (PCA) to reduce dimensionality and visualize data structure and variance.

Table 2: Key Machine Learning Algorithms for NTA Pattern Recognition

| Algorithm | Type | Primary Use Case in NTA | Strengths | Technical References |
| --- | --- | --- | --- | --- |
| Principal Component Analysis (PCA) | Unsupervised, dimensionality reduction | Exploring data structure, identifying outliers, visualizing sample groupings [27]. | Simplifies complex data, reveals major trends without sample labels. | [27] |
| k-Means Clustering | Unsupervised, clustering | Grouping samples with similar chemical profiles to hypothesize common sources [27]. | Simple and efficient for finding intrinsic patterns in unlabeled data. | [27] |
| Random Forest (RF) | Supervised, classification | Classifying samples into predefined source categories with high accuracy [27]. | Robust to overfitting, provides feature importance rankings for interpretability. | [27] |
| Support Vector Classifier (SVC) | Supervised, classification | Effective for binary classification tasks (e.g., contaminant vs. control) in high-dimensional spaces [27]. | Performs well with complex, non-linear decision boundaries. | [27] |
| Logistic Regression (LR) | Supervised, classification | A baseline model for classification; useful when model interpretability is paramount [27]. | Highly interpretable, outputs probabilities for class membership. | [27] |
| Partial Least Squares Discriminant Analysis (PLS-DA) | Supervised, classification | Identifying source-specific indicator compounds through variable importance metrics [27]. | Powerful for finding features that best discriminate between known classes. | [27] |

The Scientist's Toolkit: Essential Research Reagents and Materials

Table 3: Essential Research Reagent Solutions for NTA Workflows

| Item | Function / Application | Technical Notes |
| --- | --- | --- |
| Oasis HLB SPE Sorbent | Broad-spectrum extraction of polar and non-polar analytes from water samples [27]. | Often used in multi-sorbent strategies with WAX/WCX for comprehensive coverage [27]. |
| QuEChERS Extraction Kits | Rapid, efficient sample preparation for solid and complex matrices (e.g., soil, food, biological tissues) [27]. | Reduces solvent usage and processing time; ideal for large-scale environmental studies [27]. |
| ISOLUTE ENV+ / Strata WAX/WCX | Mixed-mode or ion-exchange sorbents used in multi-sorbent SPE to target a wider range of compound classes, particularly ionic species like PFAS [27]. | Expands the "detectable space" beyond what single-sorbent SPE can achieve [27] [19]. |
| Certified Reference Materials (CRMs) | Critical for tiered validation; used to confirm compound identities and support quantitative estimates [27] [14]. | Essential for establishing analytical confidence in Level 1 identifications [27]. |
| Quality Control (QC) Samples (e.g., procedural blanks, pooled QC samples) | Monitor instrument stability, evaluate background contamination, and assess data quality throughout acquisition [27]. | Batch-specific QC samples are a fundamental part of data integrity assurance in Stage (ii) [27]. |

Non-targeted analysis (NTA) using high-resolution mass spectrometry (HRMS) is a powerful, discovery-based approach for identifying unknown and suspected chemicals in complex samples without a priori knowledge of their presence [29] [19]. The interpretation of NTA data requires careful annotation of chemical features and systematic assessment of identification confidence. Unlike targeted methods that quantify predefined analytes, NTA attempts to characterize thousands of chemicals simultaneously, inevitably introducing varying degrees of uncertainty regarding chemical structures and concentrations [29]. Communicating this uncertainty consistently is crucial for the scientific acceptance and regulatory use of NTA data.

The fundamental challenge in NTA lies in moving from an observed instrumental signal to a confident chemical identification. This process involves gathering multiple lines of evidence to support or reject potential structures. To address this need, the scientific community has developed confidence frameworks that standardize how identification certainty is communicated [30] [31]. These frameworks enable researchers, reviewers, and end-users to properly evaluate the reliability of reported identifications and facilitate more meaningful comparisons between studies and laboratories.

Established Confidence Frameworks

General Confidence Framework for Small Organic Molecules

The foundational framework for communicating identification confidence in NTA was established by Schymanski et al. (2014) and has been widely adopted across environmental chemistry, metabolomics, and exposomics research [31]. This framework defines multiple confidence levels ranging from confirmed structure (Level 1) to exact mass of interest (Level 5). The general principles of this framework have been specialized for particular chemical classes and applications, including per- and polyfluoroalkyl substances (PFAS) [30] [31].

Specialized PFAS Confidence Framework

PFAS present specific identification challenges due to their complex isomerism, existence in homologous series, and limited availability of reference standards. A specialized confidence framework has been developed to address these nuances (Table 1) [31]. This PFAS-specific scale maintains the same overall structure as the general framework but incorporates criteria particularly relevant to fluorinated compounds, such as the detection of homologous series and characteristic mass defect ranges.

Table 1: Confidence Levels and Criteria for PFAS Identification via HRMS

Confidence Level | Level Name | Required Criteria | Additional Supporting Evidence
Level 1a | Confirmed by reference standard | Match to analytical standard for MS/MS spectrum, retention time, and accurate mass | Isotope pattern confirmation in matrix-matched sample
Level 1b | Indistinguishable from reference standard | Match to standard but existence of indistinguishable isomers | Distinction of branched/linear isomers when possible
Level 2a | Probable structure by library spectrum | Match to library MS/MS spectrum without reference standard | Sufficient diagnostic fragments to rule out isomers
Level 2b | Probable structure by diagnostic evidence | ≥3 diagnostic MS/MS fragments supporting specific headgroup | Retention time consistency; homologue detection
Level 2c | Probable structure by homologue evidence | ≥2 homologues identified at Level 2a or higher | Characteristic fragmentation pattern within series
Level 3 | Tentative candidate | Exact match to suspect list; limited spectral evidence | Class-specific mass defect; isotope pattern
Level 4 | Unequivocal molecular formula | Assigned molecular formula based on accurate mass | Elemental composition constraints
Level 5 | Exact mass of interest | Mass anomaly or suspect list match | Insufficient evidence for structure or formula

This PFAS-specific confidence framework clarifies distinctions between isomeric forms and provides guidance on leveraging homologous series for identification. For example, branched and linear PFAS isomers that are chromatographically resolvable may be reported with higher confidence than those that co-elute [31]. The framework emphasizes that the position of isomerization matters: isomerization in the headgroup typically creates a distinct PFAS, while branching in the tail may not, reflecting conventional regulatory practices.

Experimental Workflows for Confidence Ranking

End-to-End NTA Workflow

Establishing identification confidence requires executing a systematic analytical workflow from sample preparation to data interpretation. The following diagram illustrates the complete NTA process with key decision points for confidence assessment:

[Workflow diagram] Sample Collection & Preparation → HRMS Data Acquisition → Feature Detection & Alignment → MS/MS Fragmentation → Database Searching → Confidence Level Assignment → Result Reporting. At the confidence-assignment step, decision points route each feature: if a reference standard is available, Level 1a (confirmed by standard); else, if a library MS/MS match exists, Level 2a (probable by library); else, if ≥3 diagnostic fragments or homologue evidence are present, Level 2b/2c (probable by evidence); else, if a molecular formula is assignable, Level 4; otherwise Level 5 (exact mass only).

NTA Confidence Assessment Workflow

Sample Preparation and Data Acquisition

The initial stages of the NTA workflow significantly impact the quality of final identifications. Sample preparation methods must balance comprehensiveness with selectivity:

  • Generic Extraction Approaches: Solid phase extraction (SPE) using materials capable of multiple interactions (e.g., ion exchange, van der Waals forces) broadens the range of recoverable compounds [32]. For liquid samples, direct injection is preferred when concentrations permit, while vacuum-assisted evaporation or freeze-drying concentrates more dilute samples.
  • Chromatographic Separation: Reversed-phase liquid chromatography (LC) with C18 columns and generic gradients (e.g., 0-100% methanol) accommodates a wide polarity range [32]. Gas chromatography (GC) with temperature programming (e.g., 40-300°C) complements LC by extending coverage to more volatile, non-polar compounds.
  • Mass Spectrometry Acquisition: High-resolution mass spectrometers enable simultaneous full-scan data acquisition with high mass accuracy (≤ 5 ppm) and resolution (≥ 20,000) [32]. Data-dependent acquisition automatically triggers MS/MS fragmentation for the most abundant ions, while data-independent acquisition provides fragmentation data for all detected ions.
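The ≤ 5 ppm mass-accuracy criterion above reduces to a simple calculation that recurs throughout NTA data processing. A minimal sketch (the PFOA [M−H]⁻ m/z value is used only as an illustrative example):

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

def within_tolerance(measured_mz: float, theoretical_mz: float,
                     tol_ppm: float = 5.0) -> bool:
    """True if the measured m/z falls within the stated ppm tolerance."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm

# Illustrative values: PFOA [M-H]- has a theoretical m/z near 412.9664
print(ppm_error(412.9672, 412.9664))      # sub-2-ppm error
print(within_tolerance(412.9672, 412.9664))
```

The same helper is reused implicitly in suspect screening and formula assignment, where the tolerance window directly controls the number of candidate matches.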

Data Processing and Feature Prioritization

Following data acquisition, raw instrument files undergo extensive processing:

  • Feature Detection: Algorithms detect chromatographic peaks and deconvolute complex signals into individual chemical features characterized by mass-to-charge ratio (m/z), retention time, and intensity [20].
  • Feature Annotation: Detected features are annotated with potential molecular formulas using accurate mass measurements and isotope pattern matching [19].
  • Suspect Screening: Annotated features are compared against suspect chemical lists (which can contain thousands of entries) using exact mass matching with narrow mass tolerance (typically < 5 ppm) [19].
  • Prioritization: Statistical analysis and trend detection (e.g., homologous series, significant concentration changes between sample groups) prioritize features for identification efforts [33].
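The suspect-screening step above (exact-mass matching against a list within a narrow ppm tolerance) can be sketched in a few lines; feature and suspect values here are hypothetical:

```python
def screen_suspects(features, suspect_list, tol_ppm=5.0):
    """Match detected features (m/z, retention time) against suspect-list m/z values."""
    hits = []
    for f_mz, rt in features:
        for name, s_mz in suspect_list:
            if abs(f_mz - s_mz) / s_mz * 1e6 <= tol_ppm:
                hits.append((name, f_mz, rt))
    return hits

# Hypothetical detected features (m/z, retention time in min) and suspects
features = [(412.9665, 8.2), (250.1234, 3.1)]
suspects = [("PFOA [M-H]-", 412.9664), ("PFOS [M-H]-", 498.9302)]
print(screen_suspects(features, suspects))
```

Production suspect lists contain thousands of entries, so real implementations sort by m/z and use binary search rather than this nested loop, but the matching criterion is the same.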

Essential Research Tools and Reagents

Successful implementation of NTA confidence frameworks requires specific analytical resources and computational tools. The following table details essential components of the NTA research toolkit:

Table 2: Essential Research Toolkit for NTA Confidence Assessment

Tool Category | Specific Examples | Function in Confidence Assessment
Reference Standards | Analytical-grade certified standards | Level 1 confirmation via retention time and spectrum matching [31]
Mass Spectral Libraries | NIST, MassBank, mzCloud | Level 2a identification through MS/MS spectrum matching [32] [31]
Suspect Lists | EPA PFAS, NORMAN, CEUR | Level 3 candidate identification via exact mass matching [33]
Data Processing Software | Compound Discoverer, MZmine, MS-DIAL | Feature detection, alignment, and formula prediction [19]
Quantitative Structure-Activity Relationship (QSAR) | ChemSpace, EPI Suite | Predicting chemical properties and chromatographic behavior [32]
Quality Control Materials | INTERPRET NTA, Standard Reference Materials | QA/QC procedures for method validation and performance tracking [33]

Reference standards represent the most critical resource for achieving the highest confidence levels (Level 1), yet they exist for only a small fraction of potential environmental contaminants [31]. This limitation has driven development of technical approaches that maximize information obtained from available standards, such as read-across methods within homologous series and prediction of retention time behavior based on chemical structure.

Methodologies for Establishing Identification Confidence

Level 1 Confirmation Protocols

Achieving Level 1 confidence requires analytical reference standards analyzed under conditions identical to those used for samples:

  • Reference Standard Qualification: High-quality standards must be fully characterized for isomer-specific identity and purity using orthogonal analytical approaches [31]. These typically come from recognized manufacturers or are synthesized in-house with comprehensive characterization.
  • Matrix-Matched Analysis: Standards are analyzed in matrices matching samples to account for matrix effects on retention time and ionization efficiency [31].
  • Multi-Parameter Matching: Confirmation requires matching of (1) exact mass (± 5 ppm), (2) isotope pattern (similar fit score), (3) retention time (± 0.1 min), and (4) MS/MS spectrum (significant spectral match) [31].
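The four multi-parameter criteria can be encoded as a simple rule check. A sketch assuming spectral-match and isotope-fit scores are expressed on a 0–1 scale; the 0.7 score thresholds are illustrative assumptions, not values prescribed by the framework:

```python
def level1_confirmed(sample, standard,
                     mass_tol_ppm=5.0, rt_tol_min=0.1,
                     min_ms2_score=0.7, min_isotope_fit=0.7):
    """Check the four Level 1 confirmation criteria against a reference standard."""
    mass_ok = abs(sample["mz"] - standard["mz"]) / standard["mz"] * 1e6 <= mass_tol_ppm
    iso_ok = sample["isotope_fit"] >= min_isotope_fit      # isotope pattern fit
    rt_ok = abs(sample["rt"] - standard["rt"]) <= rt_tol_min
    ms2_ok = sample["ms2_score"] >= min_ms2_score          # MS/MS spectral match
    return mass_ok and iso_ok and rt_ok and ms2_ok

standard = {"mz": 412.9664, "rt": 8.20}
sample = {"mz": 412.9668, "rt": 8.25, "isotope_fit": 0.95, "ms2_score": 0.88}
print(level1_confirmed(sample, standard))
```

In practice each criterion would be reported individually as well, since a near-miss on one parameter (e.g., retention time) is diagnostically informative on its own.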

Level 2 Identification Protocols

When reference standards are unavailable, Level 2 identification relies on spectral interpretation and diagnostic evidence:

  • MS/MS Spectral Interpretation: Fragmentation patterns are interpreted to identify diagnostic fragments supporting proposed structures. For PFAS, this includes characteristic fragments like CF3⁺ (m/z 68.9952), C2F5⁺ (m/z 118.9921), and C3F7⁺ (m/z 168.9889) [31].
  • Diagnostic Evidence Requirements: Level 2b identification requires at least three diagnostic fragments that collectively support a specific headgroup structure and distinguish it from isomeric possibilities [31].
  • Homologous Series Analysis: Detection of multiple members of a homologous series (differing by CF2 units, Δm/z 49.9968) provides supporting evidence for Level 2c identifications, with retention time following predictable patterns [31].
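The CF2 spacing (Δm/z 49.9968) can be searched for directly in a feature list. A minimal sketch using illustrative perfluorocarboxylate [M−H]⁻ values:

```python
CF2 = 49.99681  # exact mass of one CF2 unit (spacing within a PFAS homologous series)

def cf2_partners(seed_mz, mz_list, max_units=10, tol_da=0.002):
    """Return m/z values separated from seed_mz by 1..max_units CF2 units."""
    partners = []
    for mz in mz_list:
        for n in range(1, max_units + 1):
            if abs(mz - (seed_mz + n * CF2)) <= tol_da:
                partners.append(mz)
    return partners

# Illustrative PFCA homologues (approximate [M-H]- of PFHpA, PFOA, PFNA) plus noise
mzs = [362.9696, 412.9664, 462.9632, 250.1234]
print(cf2_partners(362.9696, mzs))
```

Finding two or more such partners, combined with a retention-time trend that increases monotonically along the series, provides the Level 2c homologue evidence described above.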

Lower Confidence Level Methodologies

When spectral data is insufficient for structural proposals, lower confidence assignments are appropriate:

  • Molecular Formula Assignment (Level 4): Elemental compositions are assigned using accurate mass measurements with mass error ≤ 5 ppm and isotope pattern fitting [31]. Software tools apply heuristic rules to constrain possible formulas based on chemical plausibility.
  • Exact Mass of Interest (Level 5): Features are flagged based on mass defect filtering (e.g., PFAS typically show negative mass defects between -0.2 to -0.6) or presence in suspect lists without sufficient evidence for structural annotation [30].
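Mass-defect filtering can likewise be sketched in a few lines. Here the defect is taken as the exact m/z minus the nearest-integer nominal mass, and the default window is the −0.2 to −0.6 range cited above; the feature m/z values are synthetic:

```python
def mass_defect(mz: float) -> float:
    """Mass defect: exact m/z minus nearest-integer nominal mass."""
    return mz - round(mz)

def defect_filter(mz_list, low=-0.6, high=-0.2):
    """Keep features whose mass defect falls inside [low, high]."""
    return [mz for mz in mz_list if low <= mass_defect(mz) <= high]

# Synthetic feature masses for illustration
print(defect_filter([612.75, 412.9664, 508.58]))
```

Note that mass-defect conventions vary between tools (nearest-integer vs. Kendrick-normalized defects), so the window must be chosen to match the convention in use.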

Community Initiatives and Harmonization Efforts

Significant efforts are underway to harmonize confidence reporting practices across the NTA community:

  • The Benchmarking and Publications for Non-Targeted Analysis (BP4NTA) working group has developed consensus definitions for NTA terms and created reporting standards to improve reproducibility [20].
  • The NORMAN Network has published extensive guidance on suspect and non-target screening in environmental monitoring, including quality assurance procedures [32].
  • EPA's Enabling Non-Targeted Analysis for PFAS (ENTAiLS) toolkit provides methodology guidance specifically for PFAS identification, representing a "living document" that incorporates best practices [34].

These initiatives collectively address the need for standardized quality assurance/quality control (QA/QC) frameworks, shared compound databases and libraries, and clear linkages between identification confidence and potential decision contexts [33]. Continued community adoption of these harmonized approaches will strengthen the reliability and acceptance of NTA data across scientific, regulatory, and public health domains.

Machine Learning Integration for Enhanced Pattern Recognition and Source Tracking

Non-targeted analysis (NTA) using high-resolution mass spectrometry (HRMS) has emerged as a powerful approach for discovering unknown chemical contaminants in environmental, food, and clinical samples. Unlike targeted analysis, which focuses on predefined compounds, NTA aims to comprehensively characterize sample chemical composition without a priori knowledge of its content [3]. The principal challenge in contemporary NTA has shifted from detection capability to interpretation of the vast, complex datasets generated by HRMS instruments [27]. Machine learning (ML) has expanded NTA's potential by providing pattern recognition capabilities that uncover latent structure within high-dimensional data, making it particularly well suited for contamination source identification and tracking [27] [35].

ML integration addresses critical gaps in traditional NTA workflows, particularly the inability to disentangle complex source signatures using conventional statistical methods. While early NTA data interpretation relied on univariate analysis and unsupervised clustering, these approaches often prioritize abundance over diagnostic chemical patterns, potentially overlooking low-concentration but high-risk contaminants and failing to account for source-specific chemical interactions [27]. ML-enhanced NTA represents a paradigm shift, enabling researchers to translate molecular features into environmentally actionable parameters through systematic computational frameworks [27] [1].

ML-NTA Workflow Architecture

The integration of ML and NTA for contaminant source tracking follows a systematic four-stage workflow that transforms raw instrumental data into attributable contamination sources. This structured approach ensures analytical rigor while maximizing the extraction of meaningful environmental information from complex HRMS datasets [27].

Stage 1: Sample Treatment and Extraction

Sample preparation requires careful optimization to balance selectivity and sensitivity, necessitating a compromise between removing interfering components and preserving as many compounds as possible with adequate sensitivity [27]. Key considerations include:

  • Purification Techniques: Solid phase extraction (SPE) is widely employed but its inherent selectivity for certain physicochemical properties limits broad-spectrum coverage. Multi-sorbent strategies (e.g., combining Oasis HLB with ISOLUTE ENV+, Strata WAX and WCX) can achieve broader-range extractions [27].
  • Green Extraction Techniques: Methods including QuEChERS, microwave-assisted extraction (MAE), and supercritical fluid extraction (SFE) improve efficiency by reducing solvent usage and processing time, particularly beneficial for large-scale environmental samples [27].
  • Matrix Interference Management: Comprehensive analyte recovery while minimizing matrix interference establishes the critical foundation for downstream ML analysis. Without proper sample cleanup, matrix effects can obscure the chemical signals essential for accurate pattern recognition [27].

Stage 2: Data Generation and Acquisition

HRMS platforms, including quadrupole time-of-flight (Q-TOF) and Orbitrap systems, generate complex datasets essential for NTA [27]. Orbitrap systems generally show lower retention time drift than some Q-TOF instruments due to coupling with high-performance liquid chromatography systems, though their higher mass accuracy often necessitates more stringent alignment procedures [27]. Critical data processing steps include:

  • Post-Acquisition Processing: Centroiding, extracted ion chromatogram (EIC/XIC) analysis, peak detection, alignment, and componentization to group related spectral features into molecular entities [27].
  • Quality Assurance: Confidence-level assignments and batch-specific quality control samples ensure data integrity throughout acquisition [27].
  • Data Structure: The output is a structured feature-intensity matrix, with rows representing samples and columns corresponding to aligned chemical features, that serves as the foundation for ML-driven analysis [27].

Stage 3: ML-Oriented Data Processing and Analysis

The transition from raw HRMS data to interpretable patterns involves sequential computational steps that leverage machine learning capabilities:

  • Initial Preprocessing: Addresses data quality through noise filtering, missing value imputation, and normalization to mitigate batch effects [27].
  • Exploratory Analysis: Identifies significant features via univariate statistics and prioritizes compounds with large fold changes [27].
  • Dimensionality Reduction: Techniques like principal component analysis and t-distributed stochastic neighbor embedding simplify high-dimensional data [27].
  • Clustering Methods: Hierarchical cluster analysis and k-means clustering group samples by chemical similarity [27].
  • Supervised ML Models: Random Forest and Support Vector Classifier are trained on labeled datasets to classify contamination sources, with feature selection algorithms refining input variables to optimize model accuracy and interpretability [27].
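The sequence above (scaling, dimensionality reduction, supervised classification) maps directly onto a scikit-learn pipeline. A sketch on a synthetic feature-intensity matrix with two simulated sources; all data here are fabricated for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic matrix: 40 samples x 200 aligned features; two simulated sources
# differ in the intensity of the first 20 features
X = rng.lognormal(mean=2.0, sigma=1.0, size=(40, 200))
y = np.array([0] * 20 + [1] * 20)
X[y == 1, :20] *= 5.0  # source-specific intensity shift

model = make_pipeline(StandardScaler(),
                      PCA(n_components=10),
                      RandomForestClassifier(n_estimators=200, random_state=0))
model.fit(X, y)
print(model.score(X, y))  # training accuracy on the synthetic data
```

On real data the matrix would come from Stage 2, and log transformation plus batch-effect correction would precede this pipeline.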

Stage 4: Result Validation

Validation ensures reliability of ML-NTA outputs through a three-tiered approach [27]:

  • Analytical Confidence: Verified using certified reference materials or spectral library matches to confirm compound identities [27].
  • Model Generalizability: Assessed by validating classifiers on independent external datasets with cross-validation techniques to evaluate overfitting risks [27].
  • Environmental Plausibility: Correlates model predictions with contextual data, such as geospatial proximity to emission sources or known source-specific chemical markers [27] [12].

Table 1: Machine Learning Algorithms for NTA Applications

Algorithm Category | Specific Algorithms | NTA Application | Performance Examples
Supervised Learning | Random Forest, Support Vector Classifier, Logistic Regression, Decision Trees | Classification of contamination sources, quantitative structure-retention relationship modeling | 85.5-99.5% balanced accuracy for PFAS source classification [27]
Unsupervised Learning | k-means, Hierarchical Cluster Analysis, Principal Component Analysis | Sample clustering, dimensionality reduction, pattern discovery | Grouping samples by chemical similarity without prior labels [27] [36]
Deep Learning | Neural Networks, TensorFlow, PyTorch | Peak alignment, feature extraction, spectrum-structure relationship modeling | Automated feature extraction reducing manual operations [35]
Ensemble Methods | Random Forest, Model stacking | Improving prediction accuracy and robustness | Enhanced compound identification probability [16]

The following workflow diagram illustrates the complete ML-NTA pipeline and the iterative validation process:

[Workflow diagram] Stage 1: Sample Collection → Extraction (SPE, QuEChERS) → Purification & Concentration. Stage 2: HRMS Analysis (Orbitrap, Q-TOF) → Chromatographic Separation → Peak Detection & Alignment. Stage 3: Data Preprocessing (Normalization, Imputation) → Feature Extraction & Prioritization → Dimensionality Reduction (PCA, t-SNE) → ML Modeling (Classification, Clustering). Stage 4: Analytical Confidence (Reference Materials) → Model Generalizability (External Datasets) → Environmental Plausibility (Contextual Data), with validation results fed back to Stage 3 for model refinement.

Experimental Protocols for ML-NTA Implementation

Protocol 1: Contaminant Source Tracking and Apportionment

Objective: To identify and apportion contamination to specific sources using chemical fingerprinting and ML classification [27] [36].

Materials and Instruments:

  • High-resolution mass spectrometer (e.g., LC-QTOF or Orbitrap)
  • Chromatographic separation system
  • Python environment with scikit-learn, Pandas, and Mass-Suite packages [36]

Procedure:

  • Sample Collection and Preparation: Collect environmental samples from potential contamination sources and receptors. Perform solid-phase extraction using multi-sorbent strategies to maximize chemical coverage [27].
  • HRMS Analysis: Analyze samples using reversed-phase liquid chromatography coupled to HRMS in data-dependent acquisition mode. Include quality control samples (pooled quality control, procedural blanks) throughout the sequence [27].
  • Feature Extraction and Alignment: Process raw data using open-source platforms (e.g., XCMS, MZmine, or Mass-Suite) to detect chromatographic peaks and align features across samples. Export feature intensity table with m/z, retention time, and intensity values [36].
  • Data Preprocessing:
    • Apply k-nearest neighbors imputation for missing values
    • Perform total ion current or probabilistic quotient normalization
    • Log-transform intensity data to reduce heteroscedasticity
  • Feature Selection: Apply recursive feature elimination or variable importance measures to identify source-discriminatory compounds [27].
  • Model Training: Implement Random Forest or Support Vector Classifier using source categories as labels. Optimize hyperparameters via grid search with cross-validation [27] [36].
  • Model Validation: Apply trained model to independent samples. Assess accuracy using confusion matrices and receiver operating characteristic curves [27].
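The model-training and validation steps above (grid search with cross-validation, then confusion-matrix evaluation on held-out samples) can be sketched as follows, again on fabricated data standing in for a real feature-intensity matrix:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(42)
# Synthetic matrix: 60 samples x 100 features, two contamination sources
X = rng.lognormal(2.0, 1.0, size=(60, 100))
y = np.array([0] * 30 + [1] * 30)
X[y == 1, :15] *= 4.0  # source-discriminatory features

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)
# Hyperparameter grid search with 3-fold cross-validation
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    {"n_estimators": [50, 100], "max_depth": [None, 5]},
                    cv=3)
grid.fit(X_tr, y_tr)
cm = confusion_matrix(y_te, grid.predict(X_te))
print(grid.best_params_)
print(cm)  # rows: true source, columns: predicted source
```

ROC curves (via `sklearn.metrics.roc_curve`) would complete the assessment described in the protocol; they are omitted here for brevity.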

Expected Outcomes: Source classification balanced accuracy of 85.5-99.5% has been demonstrated for PFAS source tracking [27].

Protocol 2: Enhanced Compound Identification Probability

Objective: To increase confidence in compound identifications by integrating spectral matching with predicted retention time indices using machine learning [16].

Materials and Instruments:

  • LC-HRMS system capable of MS/MS fragmentation
  • Reference spectral libraries
  • Three machine learning models (molecular fingerprint-to-RTI, cumulative neutral loss-to-RTI, and binary classification) [16]

Procedure:

  • Data Acquisition: Acquire MS/MS spectra for compounds of interest under standardized chromatographic conditions.
  • Molecular Fingerprint-Based RTI Prediction: Train Random Forest regression model using 790 preselected molecular fingerprints to predict retention time indices for 4,713 calibrants across comparable scales [16].
  • Cumulative Neutral Loss-Based RTI Prediction: Develop second Random Forest model using cumulative neutral loss masses as features to predict RTI values from experimental MS/MS spectra [16].
  • Spectral Matching: Perform universal library search algorithm matching between query and reference MS/MS spectra [16].
  • True Positive Classification: Implement k-nearest neighbors binary classification model incorporating RTI error and spectral matching features to compute probability of true positive identification [16].
  • Identification Probability Calculation: Determine final identification probability by averaging probabilities across multiple matched spectra [16].

Expected Outcomes: This approach has demonstrated 54.5%, 52.1%, and 46.7% increases in identification probability for pesticides in blank, 10× diluted, and 100× diluted tea matrices, respectively, compared to library matching alone [16].
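The final classification step of this protocol, a k-nearest neighbors model over RTI-error and spectral-match features whose probabilities are averaged across matched spectra, can be sketched with toy numbers (a real model would be trained on annotated matches and use scaled features):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training set: [absolute RTI error, spectral match score] -> label
X = np.array([[5, 0.95], [8, 0.90], [10, 0.85], [12, 0.92],
              [60, 0.40], [80, 0.30], [90, 0.55], [70, 0.35]])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # 1 = true positive identification

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Probabilities for three matched reference spectra of one candidate,
# averaged to obtain the final identification probability
matches = np.array([[9, 0.88], [11, 0.91], [14, 0.80]])
id_prob = clf.predict_proba(matches)[:, 1].mean()
print(id_prob)
```

The averaging across matched spectra mirrors the last step of the procedure; the training data and neighbor count here are illustrative assumptions.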

Table 2: Performance Metrics for ML-Enhanced NTA Applications

Application Area | ML Technique | Performance Metrics | Reference
PFAS Source Tracking | Random Forest, SVC, Logistic Regression | 85.5-99.5% balanced accuracy for source classification | [27]
Compound Identification | k-Nearest Neighbors with RTI integration | 54.5% average increase in identification probability | [16]
Feature Extraction | Mass-Suite algorithms | 99.5% feature extraction accuracy | [36]
Inter-platform Transferability | Random Forest RTI prediction | R² = 0.96 (training), 0.88 (testing) for RTI correlation | [16]

Advanced Data Mining and Source Tracking

Machine learning enables sophisticated data mining approaches that extend beyond basic compound identification in NTA workflows. The Mass-Suite package exemplifies this advancement with specialized modules for unsupervised clustering and source tracking that leverage ML algorithms to extract meaningful patterns from complex HRMS datasets [36].

Unsupervised Clustering for Pattern Discovery

Unsupervised ML approaches including hierarchical clustering analysis and k-means clustering enable pattern discovery in NTA data without prior knowledge of sample groupings [36]. These techniques:

  • Group samples based on chemical similarity metrics
  • Identify co-varying chemical features that may share common sources
  • Reveal latent patterns that might be overlooked in supervised approaches
  • Provide insights for hypothesis generation in exploratory studies
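As a concrete example of the first point, k-means applied to a synthetic feature matrix containing two chemically distinct sample groups recovers the grouping without any labels:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Two synthetic sample groups with distinct chemical profiles (50 features)
group_a = rng.normal(10, 1, size=(10, 50))
group_b = rng.normal(14, 1, size=(10, 50))
X = np.vstack([group_a, group_b])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # samples 0-9 and 10-19 fall into separate clusters
```

Hierarchical clustering (`scipy.cluster.hierarchy` or `sklearn.cluster.AgglomerativeClustering`) would additionally yield a dendrogram showing how sample similarity is nested.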

Source Fingerprinting and Apportionment

The source tracking function within Mass-Suite represents a cutting-edge application of ML in NTA, moving beyond classification to quantitative source apportionment [36]. This approach:

  • Establishes source "fingerprints" comprising hundreds to thousands of chemical features
  • Uses both identified and unknown features that collectively provide source specificity
  • Enables complex mixture quantitation through chemical fingerprinting
  • Is more robust to dilution and transformation processes compared to single-marker approaches

The following diagram illustrates the relationship between different ML approaches and their applications in the NTA workflow:

[Relationship diagram] Unsupervised learning (clustering, dimensionality reduction) enables pattern discovery and sample grouping and generates hypotheses for supervised learning (classification, regression), which in turn powers source tracking and apportionment; deep learning (neural networks) adds modeling capacity and enhances compound identification confidence. Pattern discovery generates chemical fingerprints, source tracking enables source classification, and together these outcomes produce increased identification probability.

Table 3: Essential Research Reagents and Computational Tools for ML-NTA

Tool Category | Specific Tools/Resources | Function in ML-NTA Workflow | Key Features
HRMS Instrumentation | Q-TOF, Orbitrap systems | High-resolution data acquisition for comprehensive chemical analysis | High mass accuracy, resolution; tandem MS capabilities [27]
Open-Source Software | Mass-Suite, XCMS, MZmine, PatRoon | Data processing, feature detection, alignment | Flexible workflows, machine learning integration [36]
Spectral Libraries | NIST, MassBank, in-house databases | Compound identification via spectral matching | Reference spectra for annotation confidence [3] [16]
Machine Learning Frameworks | scikit-learn, TensorFlow, PyTorch | Predictive modeling, pattern recognition | Pre-built algorithms, neural network architectures [35] [36]
Chemical Databases | DSSTox, PubChem, ChEMBL | Structural information, metadata | Chemical properties, hazard data for prioritization [8]
Specialized ML-NTA Tools | INTERPRET NTA | Data quality assessment, chemical prioritization | Integrates metadata, spectral similarity, hazard scoring [8]

Future Perspectives and Challenges

Despite significant advances, several challenges remain in the full operationalization of ML-assisted NTA for environmental decision-making. The most critical gap lies in the absence of systematic frameworks bridging raw NTA data to environmentally actionable parameters [27]. Key areas for future development include:

Model Interpretability and Transparency

Complex models like deep neural networks can achieve high classification accuracy, but their black-box nature limits transparency and hinders the ability to provide chemically plausible attribution rationale required for regulatory actions [27]. Future research should focus on:

  • Developing explainable AI approaches for ML-NTA
  • Creating model interpretation frameworks that provide chemical insights
  • Establishing confidence metrics for model predictions

Validation Strategies

Current validation strategies in ML-assisted NTA studies remain fragmented and overly reliant on laboratory-based tests, which may underperform in real-world conditions involving field-validated source-receptor relationships [27]. Enhanced validation should include:

  • Tiered validation integrating reference material verification, external dataset testing, and environmental plausibility assessments [27]
  • Standardized performance metrics across studies
  • Interlaboratory comparisons for method validation

Integration with Environmental Risk Assessment

ML-assisted NTA shows promise for enhancing risk assessment frameworks through improved contaminant identification and hazard evaluation [1]. Future applications should focus on:

  • Linking chemical fingerprints to toxicity endpoints
  • Developing predictive models for mixture effects
  • Integrating NTA data into quantitative risk assessment

As ML-NTA methodologies continue to mature, they hold tremendous potential for transforming how we monitor, assess, and manage chemical contaminants in the environment, ultimately contributing to more effective environmental protection and public health safeguards [27] [1].

Quantitative non-targeted analysis (qNTA) represents a significant advancement in the field of analytical chemistry, serving as an essential tool for characterizing emerging contaminants in environmental, biological, and product-based samples. While traditional non-targeted analysis (NTA) focuses primarily on chemical identification, qNTA extends this capability by producing quantitative chemical concentration estimates. These estimates provide crucial data that can inform provisional risk-based decisions and prioritize targets for follow-up analysis, effectively bridging the gap between compound discovery and risk assessment [37].

The fundamental difference between NTA and qNTA lies in their analytical outputs. Suspect screening aims to identify known compounds suspected to be present, NTA works to discover and identify completely unknown chemicals, and qNTA adds the critical layer of concentration estimation for these identified unknowns. This quantitative dimension enables researchers to answer not just "what is present?" but also "how much is there?", an essential question for meaningful risk assessment and regulatory decision-making [37].

Core Principles and Workflow of qNTA

The Role of Surrogate-Based Calibration

Many common qNTA and "semi-quantitative" approaches rely on surrogate chemicals for calibration and model predictions. The selection of appropriate surrogates is therefore critical for generating accurate concentration estimates. Historically, surrogates have often been chosen based on a combination of intuition and/or availability rather than rational, structure-based selection. This limitation has constrained the degree to which qNTA can be objectively, mathematically assessed and improved [37].

Recent research has systematically assessed the extent to which chemical structure should inform the selection of qNTA surrogates using datasets from liquid chromatography high-resolution mass spectrometry (LC-HRMS) experiments. This work involves calculating a chemical space embedding using available LC-HRMS training data and 2D molecular descriptors deemed important to electrospray ionization efficiency. By implementing multiple structure-based surrogate selection strategies and comparing them to random selection using qNTA metrics for accuracy, uncertainty, and reliability, researchers have demonstrated that qNTA models can significantly benefit from rational surrogate selection strategies [37].
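The leverage step can be sketched with the classical regression hat-matrix definition: analytes far from the surrogates' descriptor space receive high leverage and are flagged as poorly covered. The 2D descriptors below are illustrative stand-ins, not the published embedding.

```python
import numpy as np

def leverage(train_descriptors, query_descriptors):
    """Leverage of query chemicals within the space spanned by the training
    (surrogate) set: diag(Q (X^T X)^+ Q^T) after column standardization."""
    X = np.asarray(train_descriptors, dtype=float)
    Q = np.asarray(query_descriptors, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0)      # standardize with training stats
    Xs, Qs = (X - mu) / sd, (Q - mu) / sd
    XtX_inv = np.linalg.pinv(Xs.T @ Xs)
    return np.einsum("ij,jk,ik->i", Qs, XtX_inv, Qs)

rng = np.random.default_rng(0)
train = rng.normal(size=(50, 2))            # hypothetical 2D descriptor embedding
inside = leverage(train, [[0.0, 0.0]])[0]   # analyte near the training center
outside = leverage(train, [[6.0, 6.0]])[0]  # analyte far outside training space
print(inside < outside)  # high leverage flags poorly covered analytes
```

In practice the embedding would use descriptors tied to electrospray ionization efficiency, and the leverage values would inform which surrogates' calibration curves are trustworthy for a given analyte.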

Experimental Workflow for qNTA

The following diagram illustrates the comprehensive workflow for quantitative non-targeted analysis, from sample preparation to final risk assessment:

Sample Preparation & Extraction → LC-HRMS Analysis → Data Processing & Feature Detection → Compound Identification & Structural Elucidation → Surrogate Selection & Calibration → Concentration Estimation → Risk Assessment & Priority Ranking

This workflow begins with sample preparation and extraction, followed by analysis using liquid chromatography high-resolution mass spectrometry (LC-HRMS). The resulting data undergoes processing and feature detection before compound identification and structural elucidation. The quantitative phase involves careful surrogate selection and calibration, leading to concentration estimation and culminating in risk assessment and priority ranking of identified compounds [37].

Experimental Protocols and Methodologies

Sample Preparation and LC-HRMS Analysis

Proper sample preparation is crucial for successful qNTA. Protocols vary depending on sample matrix but generally include:

  • Sample Extraction: Utilize appropriate solvents and extraction techniques based on the chemical properties of expected analytes and sample matrix.
  • Clean-up Procedures: Implement solid-phase extraction (SPE) or other clean-up methods to remove matrix interferents that may affect ionization efficiency.
  • Concentration Adjustment: Concentrate or dilute samples to ensure analyte levels fall within the instrumental detection range.

For LC-HRMS analysis, the following parameters should be optimized:

  • Chromatographic Separation: Employ gradient elution with reverse-phase columns (typically C18) to separate complex mixtures.
  • Mass Resolution: Ensure high-resolution capabilities (typically >25,000) for accurate mass measurement and formula assignment.
  • Ionization Mode: Utilize both positive and negative electrospray ionization (ESI) modes to maximize compound coverage.
  • Mass Accuracy: Maintain mass accuracy within 5 ppm for reliable molecular formula assignment.
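The 5 ppm mass accuracy criterion reduces to a simple relative-error check; a minimal sketch (the caffeine [M+H]+ value is a standard reference mass, the function itself is generic):

```python
def ppm_error(measured_mz, theoretical_mz):
    """Mass error in parts per million between measured and theoretical m/z."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

def within_tolerance(measured_mz, theoretical_mz, tol_ppm=5.0):
    """True if the measurement supports the formula at the stated tolerance."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm

# Caffeine [M+H]+, monoisotopic m/z ≈ 195.0877
print(within_tolerance(195.0879, 195.0877))  # ~1 ppm: acceptable
print(within_tolerance(195.0899, 195.0877))  # ~11 ppm: reject the formula
```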

Structure-Based Surrogate Selection Protocol

Rational surrogate selection represents a significant advancement in qNTA methodology. The following protocol outlines the structure-based approach:

  • Chemical Space Embedding Calculation:

    • Compile training data for known chemicals (n=385 chemicals as demonstrated in recent research)
    • Calculate 2D molecular descriptors relevant to electrospray ionization efficiency
    • Create embedded chemical space representing the structural diversity
  • Leverage Calculation:

    • Analyze measured analytes (n=533 chemicals as in EPA's ENTACT study)
    • Calculate leverage of each analyte within the embedded chemical space
    • Identify representative structures covering the chemical domain
  • Surrogate Selection Strategies:

    • Implement multiple structure-based selection methods
    • Compare with random selection approaches
    • Evaluate using qNTA metrics for accuracy, uncertainty, and reliability
  • Coverage Assessment:

    • Apply the "leveraged averaged representative distance" (LARD) metric
    • Quantify coverage of qNTA surrogates within the defined chemical space
    • Optimize surrogate set to maximize chemical space coverage [37]
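The published LARD metric is defined in [37]; as a rough illustration of what a distance-based coverage score measures, the sketch below scores a surrogate set by the mean distance from each analyte to its nearest surrogate in descriptor space. This is explicitly not the LARD formula, only a simple stand-in.

```python
import numpy as np

def coverage_score(surrogates, analytes):
    """Illustrative coverage score: mean distance from each analyte to its
    nearest surrogate in descriptor space (lower = better coverage).
    Not the published LARD formula; a simple distance-based stand-in."""
    S, A = np.asarray(surrogates, float), np.asarray(analytes, float)
    d = np.linalg.norm(A[:, None, :] - S[None, :, :], axis=2)  # analyte x surrogate
    return float(d.min(axis=1).mean())

rng = np.random.default_rng(1)
analytes = rng.normal(size=(100, 2))        # hypothetical measured analytes
spread = rng.normal(size=(10, 2))           # surrogates spanning the space
clumped = rng.normal(size=(10, 2)) * 0.05   # surrogates clumped near the origin
print(coverage_score(spread, analytes) < coverage_score(clumped, analytes))
```

A well-spread surrogate set scores better (lower) than a clumped one of the same size, which is the intuition behind optimizing surrogate sets for chemical space coverage.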

Essential Research Reagents and Materials

The following table details key research reagent solutions and essential materials used in qNTA experiments:

Reagent/Material | Function in qNTA | Specification Notes
LC-MS Grade Solvents | Mobile phase preparation; sample reconstitution | Acetonitrile, methanol, water with <1 ppm additives
Surrogate Standards | Calibration and response factor estimation | Preferably structure-informed selection from chemical space
Internal Standards | System performance monitoring; retention time correction | Stable isotope-labeled compounds covering various chemical classes
SPE Cartridges | Sample clean-up and concentration | Various chemistries (C18, HLB, etc.) based on application
Reference Mass Compounds | Mass axis calibration during HRMS analysis | Compounds providing precise mass locks in positive/negative modes

Quantitative Approaches and Data Analysis

Key Quantitative Data and Performance Metrics

The table below summarizes quantitative data and performance metrics for qNTA methods:

Quantitative Metric | Target Range | Application in Risk Assessment
Concentration Estimate Accuracy | Typically within ±50% of true value | Determines reliability for risk-based decisions
Chemical Space Coverage (LARD metric) | Higher values indicate better coverage | Ensures representative quantification across diverse structures
Response Factor Variability | Lower variability improves quantification | Impacts uncertainty of concentration estimates
Limit of Quantification (LOQ) | Compound-dependent; lower is better | Determines lowest measurable level for risk screening
Surrogate Selection Efficiency | Structure-based vs. random comparison | Informs optimal approach for specific applications

Recent research has demonstrated that qNTA models benefit significantly from rational surrogate selection strategies. Interestingly, studies have also shown that a large enough random surrogate sample can perform as well as a smaller, chemically informed surrogate sample. This finding provides important practical guidance for researchers designing qNTA studies, suggesting that when sufficient surrogates are available, random selection may be adequate, but when working with limited surrogates, structure-based selection becomes crucial [37].
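This trade-off can be illustrated with a toy simulation (entirely synthetic descriptors and response factors, not the published models): each analyte borrows the response factor of its nearest surrogate in a one-dimensional descriptor space, and the estimation error shrinks as the random surrogate sample grows large enough to cover the space.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical 1D chemical space: response factor varies smoothly with a descriptor
pool_desc = rng.uniform(0.0, 10.0, size=100)     # descriptors of available surrogates
analyte_desc = rng.uniform(0.0, 10.0, size=400)  # descriptors of measured analytes

def rf(d):
    """Toy structure-dependent response factor (entirely synthetic)."""
    return np.sin(d) + 2.0

def nearest_surrogate_error(surrogate_desc):
    """Mean absolute RF error when each analyte borrows the RF of its
    nearest surrogate in descriptor space."""
    idx = np.abs(analyte_desc[:, None] - surrogate_desc[None, :]).argmin(axis=1)
    return float(np.mean(np.abs(rf(surrogate_desc[idx]) - rf(analyte_desc))))

e_small = nearest_surrogate_error(rng.choice(pool_desc, size=5, replace=False))
e_large = nearest_surrogate_error(rng.choice(pool_desc, size=50, replace=False))
print(e_large < e_small)  # many random surrogates cover the space well
```

With only five random surrogates, large descriptor gaps inflate the error; fifty random surrogates blanket the space and perform well without any structure-informed selection, mirroring the finding above.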

Structure-Based vs. Random Surrogate Selection

The following diagram illustrates the comparative effectiveness of structure-based versus random surrogate selection strategies in qNTA:

  • Structure-based selection → chemical space embedding → leverage calculation (ENTACT data) → optimal performance (accurate quantification)
  • Random selection with a large sample size → optimal performance (accurate quantification)
  • Random selection with a small sample size → suboptimal performance (higher uncertainty)

This diagram highlights the key finding that both structure-based selection and random selection with large sample sizes can achieve optimal performance, while small random sample sizes typically yield suboptimal results with higher uncertainty [37].

Advanced Applications and Future Directions

The application of high-resolution mass spectrometry continues to advance qNTA capabilities. Recent innovations in sample pretreatment and analysis have significantly reduced the unknown chemical space, while machine learning models have been increasingly incorporated into HRMS data mining workflows [38].

Effect-directed analysis represents another promising approach that aids in the discovery of toxic fractions, though the identification of specific toxicity drivers remains a challenge. As these methodologies mature, qNTA is poised to become an increasingly powerful tool for comprehensive chemical characterization and risk-based prioritization in complex environmental, biological, and product-based samples [38].

The integration of structure-based surrogate selection with advanced HRMS instrumentation and data processing algorithms will continue to enhance the accuracy, reliability, and application scope of qNTA methodologies. This progression promises to strengthen the bridge between compound discovery and meaningful risk assessment in increasingly complex sample matrices.

Non-targeted analysis (NTA) using high-resolution mass spectrometry (HRMS) represents a paradigm shift in analytical chemistry, enabling comprehensive characterization of complex samples without a priori knowledge of their chemical composition [14] [20]. This discovery-based approach has become indispensable across environmental monitoring, exposomics, and clinical biomarker discovery, where researchers face the challenge of identifying unknown or unexpected chemicals that may significantly impact ecosystem and human health [39] [19]. Unlike traditional targeted methods that quantify specific predefined analytes, NTA generates global chemical information, allowing researchers to detect novel chemical stressors, retrospectively screen archived samples, and classify samples based on chemical profiles [14] [39].

The versatility of HRMS platforms makes NTA amenable to virtually any sample medium, including air, water, soil, food, consumer products, and biological specimens [14] [19]. As momentum builds to integrate NTA into chemical monitoring and regulatory decision-making frameworks, standardized approaches for assessing and communicating method performance have become increasingly critical [14] [20]. This technical guide explores the foundational principles, current applications, methodological considerations, and future directions of NTA across three key domains, providing researchers with practical frameworks for implementing these powerful approaches in their own work.

Fundamental Principles and Terminology

Core NTA Concepts and Definitions

Harmonized terminology is essential for accurate communication of NTA methods and results. The Benchmarking and Publications for Non-Targeted Analysis (BP4NTA) Working Group has established consensus definitions for key terms spanning all aspects of NTA workflows [20]:

  • Non-Targeted Analysis (NTA): A discovery-based approach for detecting and identifying organic chemicals without a priori knowledge of the sample composition. NTA encompasses both suspect screening analysis (SSA) and unknown compound analysis [20] [19].
  • Feature: A set of grouped, associated m/z-retention time pairs (mz@RTs) that represent a set of MS1 components for an individual compound (e.g., an individual compound and associated isotopologue, adduct, and in-source product ion m/z peaks) or a single mz@RT if no such associations exist [23] [20].
  • Annotation: The attribution of one or more properties or molecular characteristics to an MS1 feature or MS/MS product ion without conclusive structural identification [23].
  • Identification: The case where annotated components, features, and/or product ions provide sufficient evidence to attribute a specific compound to a detected feature at a stated confidence level [23].

Performance Assessment in NTA

Evaluating NTA method performance presents distinct challenges compared to targeted analyses due to inherent uncertainties in detecting and identifying unknown compounds [14]. In contrast to targeted methods where performance metrics like accuracy, precision, sensitivity, and selectivity are well-established, NTA requires specialized assessment approaches:

  • Qualitative Performance: For sample classification and chemical identification, performance can be assessed using confusion matrices, though with limitations due to uncertain ground truth [14].
  • Quantitative Performance: Assessment adapts targeted method estimation procedures with consideration for additional sources of uncontrolled experimental error [14].
  • Uncertainty Considerations: If an analyst reports a chemical as present, it may actually be absent (e.g., an isomer or a misidentification); if reported as absent, it may actually be present; and reported concentrations may lack confidence intervals, with true values potentially differing by orders of magnitude [14].

Table 1: Comparison of Targeted Analysis and Non-Targeted Analysis

Characteristic | Targeted Analysis | Non-Targeted Analysis
Objective | Quantify predefined analytes | Discover unknown/unsuspected chemicals
Identification Confidence | High (reference standards) | Variable (level system)
Performance Metrics | Well-established (accuracy, precision, LOD/LOQ) | Evolving frameworks
Chemical Coverage | Limited (dozens to hundreds) | Extensive (thousands of features)
Standardization | Mature protocols | Under development

Environmental Monitoring Applications

Water Analysis

NTA has revolutionized water quality assessment by enabling comprehensive detection of chemical contaminants, including transformation products and newly synthesized compounds that escape conventional targeted methods [19]. The chemical space captured in water samples predominantly includes per- and polyfluoroalkyl substances (PFAS), pharmaceuticals, pesticides, and their transformation products [19]. Successful NTA of water matrices requires careful consideration of sample preparation to concentrate low-abundance contaminants while minimizing interferences [39].

A systematic review of NTA applications found that in water studies, 51% used only LC-HRMS, 32% used only GC-HRMS, and 16% used both platforms to expand chemical coverage [19]. This multi-platform approach is crucial since LC-HRMS better captures polar, water-soluble compounds, while GC-HRMS excels for non-polar, volatile compounds [19]. Effect-directed analysis (EDA) combined with NTA has proven particularly valuable for identifying toxicity drivers in complex water samples, with studies demonstrating that NTA explains a median of 34% of observed toxicity compared to just 13% for targeted analysis alone [40].

Soil, Sediment, and Air Monitoring

In soil and sediment analysis, NTA frequently detects pesticides and polyaromatic hydrocarbons (PAHs), while air monitoring focuses on volatile and semi-volatile organic compounds (VOCs/SVOCs) [19]. The choice of ionization techniques significantly impacts the detectable chemical space:

  • LC-HRMS studies typically use electrospray ionization (ESI), with many (43%) employing both positive and negative modes to broaden compound coverage [19].
  • GC-HRMS studies predominantly use electron ionization (EI), sometimes complemented by chemical ionization (CI) to enhance molecular ion information [19].

A critical challenge in environmental NTA is the selective exclusion of polar, highly polar, and ionic compounds when using reverse-phase liquid chromatography (RPLC), which remains overrepresented in HRMS-NTA methods [40]. This analytical bias means many potentially significant environmental contaminants may be overlooked in standard monitoring campaigns.

Experimental Protocol: Water Contaminant Identification

Materials and Methods:

  • Sample Collection: Grab samples (1L) in amber glass bottles, acid-washed, with sodium thiosulfate added to quench residual chlorine [19].
  • Sample Extraction: Solid-phase extraction (SPE) using hydrophilic-lipophilic balanced (HLB) cartridges for broad-spectrum retention [19].
  • Instrumentation: LC-HRMS system with Q-Orbitrap mass analyzer; GC-HRMS system with time-of-flight (TOF) mass analyzer [40] [19].
  • Chromatography: For LC, C18 column with water/acetonitrile gradient (both with 0.1% formic acid); for GC, mid-polarity column (e.g., DB-35ms) with temperature ramping [19].
  • Data Acquisition: Full-scan MS (m/z 50-1500) with data-dependent MS/MS for most intense ions [23].

Data Processing Workflow:

  • Feature Detection: Using MZmine, OpenMS, or vendor software [23].
  • Retention Time Alignment: Correct minor shifts between runs [23].
  • Adduct and Isotope Grouping: Associate related ions [23].
  • Suspect Screening: Against custom and public databases (e.g., NORMAN, CompTox) [20].
  • Unknown Identification: Using in silico fragmentation tools and spectral networking [41].
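At its core, the suspect screening step reduces to exact-mass matching within a ppm tolerance; a minimal sketch, where the three-compound suspect list and the [M+H]+ adduct assumption are hypothetical stand-ins for NORMAN or CompTox lookups:

```python
# Hypothetical suspect list: name -> neutral monoisotopic mass (Da)
SUSPECTS = {
    "caffeine": 194.0804,
    "atrazine": 215.0938,
    "carbamazepine": 236.0950,
}
PROTON = 1.007276  # proton mass, for [M+H]+ adducts

def screen_feature(mz, tol_ppm=5.0, adduct_shift=PROTON):
    """Match a detected MS1 feature m/z against the suspect list ([M+H]+)."""
    hits = []
    for name, mass in SUSPECTS.items():
        theo = mass + adduct_shift
        if abs(mz - theo) / theo * 1e6 <= tol_ppm:
            hits.append(name)
    return hits

print(screen_feature(195.0876))  # ['caffeine']
print(screen_feature(300.1234))  # [] - no suspect match at 5 ppm
```

Real workflows additionally score isotope patterns, retention time plausibility, and MS/MS evidence before treating a mass match as an annotation.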

Sample Collection → Sample Preparation (Extraction/Cleanup) → Instrumental Analysis (LC/GC-HRMS) → Data Processing (Feature Detection, Alignment) → Annotation & Identification → Data Visualization & Interpretation → Final Report. Quality control steps (blank samples, QC spike compounds, pooled QC samples) accompany the preparation, analysis, and processing stages.

Exposomics and Human Biomonitoring

Mapping the Chemical Exposome

The exposome encompasses all non-genetic exposures individuals experience throughout life, constituting a critical determinant of health [39]. HRMS-based exposomics aims to comprehensively profile small-molecule exposure agents (molecular weight ≤1000 Da), their transformation products, and associated biomolecules in human matrices [39]. This approach represents a fundamental shift from hypothesis-driven, quantitation-centric targeted analyses toward data-driven, hypothesis-generating chemical exposome-wide profiling [39].

Recent studies have demonstrated that all-cause mortality is driven more by the exposome than the genome, highlighting the critical importance of comprehensive exposure assessment [39]. The chemical space of the human exposome is vast, with global inventories cataloging over 350,000 compounds and mixtures in commercial production, approximately triple previous estimates [39]. Surprisingly, about 120,000 substances remain inconclusively identified due to corporate confidentiality, creating significant gaps in exposure knowledge [39].

Analytical Challenges in Human Exposomics

Human exposomics faces unique analytical challenges distinct from other applications:

  • Dynamic Range: Exposome chemicals in blood span up to 11 orders of magnitude, with pollutants typically 1,000 times lower in abundance than compounds from food, drug, and endogenous origins [39].
  • Matrix Complexity: High-abundance endogenous molecules (e.g., phospholipids in plasma) can interfere with detection of low-abundance xenobiotics, requiring specialized sample cleanup [39].
  • Analytical Coverage Trade-offs: Balancing coverage, throughput, and sensitivity is particularly challenging given the sporadic occurrences, structural diversity, and wide-ranging physicochemical properties of environmental chemicals in biological matrices [39].

Table 2: Chemical Classes Frequently Detected in Human Exposomics Studies

Chemical Class | Detection Frequency | Major Sources | Analytical Platform
Plasticizers | High | Food packaging, consumer products | LC-ESI(+/-)
Pesticides | High | Diet, residential applications | LC-ESI(+), GC-EI
Halogenated Compounds | Medium | Flame retardants, industrial processes | GC-EI, LC-ESI(-)
Pharmaceuticals | Medium | Medication use | LC-ESI(+)
PFAS | Medium | Stain-resistant coatings, firefighting foam | LC-ESI(-)
Personal Care Products | Medium | Cosmetics, hygiene products | LC-ESI(+)

Experimental Protocol: Serum Exposome Profiling

Sample Preparation:

  • Protein Precipitation: 100μL serum with 300μL cold acetonitrile, vortex, centrifuge [39].
  • Phospholipid Removal: HybridSPE-Precipitation plates to reduce ion suppression [39].
  • Concentration: Nitrogen evaporation at 40°C, reconstitution in initial mobile phase [39].

LC-HRMS Analysis:

  • Chromatography: Reverse-phase C18 column (2.1×100mm, 1.7μm) with water/methanol gradient, both with 5mM ammonium acetate [39].
  • Ionization: Dual electrospray ionization (ESI+ and ESI-) [19].
  • Mass Analysis: Q-Exactive Orbitrap with resolution ≥70,000 at m/z 200 [39].
  • Data Acquisition: Full scan (m/z 100-1500) with Top-5 data-dependent MS/MS [39].

Data Processing and Annotation:

  • Feature Detection: Using XCMS or Compound Discoverer with 5ppm mass tolerance [23].
  • Blank Subtraction: Remove features present in procedural blanks [23].
  • Database Searching: Against hierarchical databases (MassBank, HMDB, CompTox) [39].
  • Confidence Scoring: Apply Schymanski et al. confidence levels for identification [20].
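The blank subtraction step is commonly implemented as a sample-to-blank intensity ratio filter. A minimal sketch, where the 3-fold ratio and the m/z and RT matching tolerances are illustrative choices, not a standardized protocol:

```python
def blank_subtract(sample_features, blank_features, min_ratio=3.0,
                   mz_tol=0.005, rt_tol=0.2):
    """Keep a sample feature only if no matching blank feature exists, or the
    sample intensity exceeds the blank intensity by min_ratio.
    Features are (mz, rt, intensity) tuples; tolerances are hypothetical."""
    kept = []
    for mz, rt, inten in sample_features:
        blank_inten = max(
            (bi for bmz, brt, bi in blank_features
             if abs(bmz - mz) <= mz_tol and abs(brt - rt) <= rt_tol),
            default=0.0,
        )
        if blank_inten == 0.0 or inten / blank_inten >= min_ratio:
            kept.append((mz, rt, inten))
    return kept

sample = [(195.0877, 5.2, 9.0e5), (149.0233, 3.1, 2.0e4)]
blank = [(149.0233, 3.1, 1.5e4)]   # plasticizer-like background ion
kept = blank_subtract(sample, blank)
print(kept)  # only the 195.0877 feature survives the ratio filter
```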

Clinical Biomarker Discovery

Metabolic Phenotyping and Biomarker Identification

While published NTA applications to date have focused primarily on environmental monitoring and exposomics, NTA principles in clinical biomarker discovery share fundamental methodologies with exposomics research. The integration of NTA into clinical studies enables unprecedented discovery of metabolic signatures associated with disease states, treatment response, and environmental exposures [39]. The transdisciplinary field of exposomics provides a framework for discovering environmental drivers of disease through unbiased, scalable analytical approaches [39].

The transition from biomarker discovery to clinical validation requires careful consideration of analytical performance and standardization. As noted in NTA research, meaningful evaluation of study performance is predicated on harmonized terminology and clear guidance about best practices for analysis and reporting results [20]. This is particularly crucial in clinical applications where findings may inform diagnostic or therapeutic decisions.

Methodological Considerations for Clinical Applications

Clinical biomarker discovery using NTA requires special attention to several methodological aspects:

  • Sample Cohort Design: Appropriate sample size with balanced case-control groups and inclusion of relevant covariates [14].
  • Pre-analytical Variables: Standardization of collection, processing, and storage protocols to minimize technical variation [39].
  • Quality Control: Incorporation of pooled quality control samples, technical replicates, and reference standards throughout the analytical workflow [20].
  • Batch Effects: Implementation of randomization schemes and statistical correction to account for instrumental drift [39].

Data Processing, Visualization, and Interpretation

NTA Data Processing Workflows

Effective data processing is essential for transforming raw HRMS data into meaningful chemical information. A typical NTA data processing workflow includes three primary segments [23]:

  • Data Processing: Transforming raw data into a list of features with associated abundance information, including steps such as:

    • Retention time alignment
    • Signal thresholding
    • Isotopologue and adduct grouping
    • Between-sample alignment
    • Gap-filling [23]
  • Statistical and Chemometric Analysis: Identifying trends, clusters, and relationships between samples and/or detections [23].

  • Annotation and Identification: Attributing molecular characteristics or specific compound identities to detected features [23].
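The between-sample alignment step above can be sketched as greedy tolerance-based grouping of (m/z, RT) pairs; production tools such as MZmine and XCMS use considerably more robust clustering, so the tolerances and strategy here are purely illustrative.

```python
def align_features(runs, mz_tol=0.005, rt_tol=0.2):
    """Group features across runs when both m/z and RT fall within tolerance.
    `runs` is a list of per-sample feature lists of (mz, rt) pairs.
    Greedy single-pass sketch, not a production alignment algorithm."""
    groups = []  # each group: {"mz": ..., "rt": ..., "members": [(run, mz, rt)]}
    for run_idx, features in enumerate(runs):
        for mz, rt in features:
            for g in groups:
                if abs(g["mz"] - mz) <= mz_tol and abs(g["rt"] - rt) <= rt_tol:
                    g["members"].append((run_idx, mz, rt))
                    break
            else:
                groups.append({"mz": mz, "rt": rt,
                               "members": [(run_idx, mz, rt)]})
    return groups

runs = [[(195.0877, 5.20)], [(195.0879, 5.25)], [(301.1410, 8.00)]]
groups = align_features(runs)
print(len(groups))  # 2: the near-identical features merge, the third stands alone
```

Gap-filling would then revisit the raw data of runs missing a member of each group, distinguishing a true absence from a peak-picking miss.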

A significant challenge in NTA data processing is the terminology differences across software platforms, where different tools may use the same term to describe different steps [23]. This underscores the importance of detailed methodological reporting to ensure reproducibility.

Effective Data Visualization Strategies

Data visualization plays a crucial role throughout the NTA workflow, providing core components for data inspection, evaluation, and sharing [41]. Effective visualization strategies include:

  • Perceptually Uniform Colormaps: Scientifically derived colormaps like cividis with perceptually linear color gradients are preferred over traditional rainbow colormaps (e.g., jet), which can misrepresent data and challenge viewers with color vision deficiencies [42] [43].
  • Volcano Plots: Displaying statistical significance versus magnitude of change for differential analysis [41].
  • Cluster Heatmaps: Visualizing patterns and relationships in complex datasets [41].
  • Spectral Networks: Organizing and showcasing relationships between MS/MS spectra [41].
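The coordinates a volcano plot displays can be computed directly from per-feature fold changes and p-values; the thresholds below are common but hypothetical choices, and rendering the points (e.g., colored with a perceptually uniform colormap such as cividis via matplotlib) would follow.

```python
import math

def volcano_points(fold_changes, p_values, fc_cut=2.0, p_cut=0.05):
    """Compute volcano-plot coordinates (log2 fold change, -log10 p) and flag
    features passing both the magnitude and significance thresholds."""
    points = []
    for fc, p in zip(fold_changes, p_values):
        x = math.log2(fc)              # magnitude of change
        y = -math.log10(p)             # statistical significance
        significant = abs(x) >= math.log2(fc_cut) and p <= p_cut
        points.append((x, y, significant))
    return points

# Hypothetical feature-level results: (case/control intensity ratio, p-value)
pts = volcano_points([4.0, 1.1, 0.2], [0.001, 0.4, 0.01])
print([sig for _, _, sig in pts])  # [True, False, True]
```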

Visualizations extend human cognitive abilities by translating data to a more accessible visual channel, particularly important for abstract data components like multi-dimensional chromatographic outputs or MS/MS spectral data [41].

Table 3: Essential Research Reagent Solutions for NTA Workflows

Reagent/Category | Function | Application Examples
HLB SPE Cartridges | Broad-spectrum extraction of organic compounds | Water analysis, serum extract clean-up
HybridSPE-Precipitation Plates | Phospholipid removal from biological samples | Serum/plasma exposome analysis
Stable Isotope-Labeled Standards | Quality control, retention time calibration | Internal standards for performance monitoring
QC Pooled Samples | Monitoring instrumental performance | Inter-batch normalization
Reference Standard Mixtures | MS/MS spectral library generation | Compound identification verification
Mobile Phase Additives | Modifying chromatography and ionization | Formic acid, ammonium acetate buffers

Analytical Gaps and Future Perspectives

Current Limitations in NTA

Despite significant advances, NTA still faces several critical limitations that constrain its application:

  • Spectral Library Gaps: Severe limitations exist for liquid chromatography data compared to gas chromatography, with insufficient spectral library databases for confident identification [40] [19].
  • Analytical Coverage Biases: Overrepresentation of RPLC in HRMS-NTA contributes to selective exclusion of polar, highly polar, and ionic compounds [40].
  • Software Accessibility: Vendor software predominates in NTA research (used in 57 of 95 reviewed studies), with limited adoption of open-source alternatives [19].
  • Standardization Challenges: Lack of universally accepted performance metrics and QA/QC guidelines impedes inter-laboratory comparisons and method validation [14] [20].

Emerging Solutions and Future Directions

Promising developments are addressing current NTA limitations:

  • Multi-platform Approaches: Combining LC-HRMS and GC-HRMS expands detectable chemical space, though currently employed in only 16% of studies [19].
  • Advanced Identification Tools: In silico fragmentation prediction and retention time prediction software are partially alleviating spectral library limitations [40].
  • Community Harmonization Efforts: Initiatives like the BP4NTA Working Group are developing consensus definitions, reporting standards, and performance assessment frameworks [20].
  • Expanded Chemical Space Mapping: Tools like ChemSpaceTool, PubChemLite for Exposomics, and the CompTox Chemicals Dashboard are making the search space of exposome compounds more accessible and actionable [39].

  • Spectral library gaps → in silico prediction tools
  • Analytical coverage biases → multi-platform approaches
  • Standardization challenges → community harmonization efforts
  • Software accessibility → open-source development

Non-targeted analysis using high-resolution mass spectrometry has emerged as a transformative approach across environmental monitoring, exposomics, and clinical biomarker discovery. By enabling comprehensive characterization of complex samples without a priori knowledge of chemical content, NTA provides unprecedented capabilities for discovering unknown environmental contaminants, mapping the human exposome, and identifying novel metabolic signatures of disease. As the field continues to mature, ongoing efforts in method harmonization, performance assessment standardization, and expanded chemical space coverage will be essential for realizing the full potential of NTA in protecting human health and the environment. The integration of advanced data visualization strategies, multi-platform analytical approaches, and community-wide collaboration frameworks will further enhance the impact and applicability of NTA across diverse scientific disciplines.

Overcoming Challenges and Optimizing NTA Data Quality

Common Data Quality Issues and Diagnostic Approaches

Non-targeted analysis (NTA) using high-resolution mass spectrometry (HRMS) has emerged as a powerful approach for identifying unknown and unexpected chemical compounds in complex samples. Unlike targeted methods that focus on predefined analytes, NTA generates comprehensive chemical profiles without prior knowledge of sample composition [12] [14]. This capability makes NTA particularly valuable for environmental monitoring, pharmaceutical development, and emergency response scenarios where unknown chemical releases may occur [12].

However, the interpretive power of NTA depends entirely on data quality. Various technical challenges can compromise results, leading to false positives, missed detections, and erroneous quantitations [14]. This technical guide examines common data quality issues in HRMS-based NTA and provides diagnostic approaches to address them, framed within the broader context of NTA data interpretation research.

Common Data Quality Issues in HRMS-Based NTA

The complex, multi-step workflow of HRMS-based NTA introduces numerous potential sources of error that can affect data quality and interpretation.

Chemical Noise and Matrix Interference

Complex sample matrices can obscure chemical signals through ion suppression or enhancement, particularly in biological and environmental samples [44]. Matrix effects alter ionization efficiency, leading to inaccurate compound quantification and detection. Additionally, chemical noise from solvents, contaminants, and sample processing materials can generate false features or mask true signals of interest [45].

Instrument Performance Variability

HRMS instruments exhibit performance fluctuations that affect data quality. Mass accuracy drift, retention time shifting, and intensity variations can occur due to environmental changes, calibration status, or instrument aging [14] [44]. Without proper monitoring and correction, these variations reduce confidence in compound identification and quantification across multiple analytical batches.

Data Processing Artifacts

Feature detection algorithms may generate false positives through incorrect peak picking, alignment errors, or adduct misassignment [27]. Conversely, these algorithms can also produce false negatives by missing low-abundance features or failing to separate co-eluting compounds. Inconsistent data processing parameters across samples or batches introduces additional variability that complicates result interpretation [14].

Identification Uncertainties

Without authentic standards, compound identification relies on spectral matching and in-silico fragmentation prediction, both of which have inherent limitations [14]. Isomeric compounds often produce similar fragmentation patterns, leading to ambiguous annotations. Database incompleteness further exacerbates this issue, as unknown compounds cannot be matched to reference spectra [1].

Quantitative Inaccuracies

NTA methods typically provide semi-quantitative estimates rather than precise concentration measurements [14]. Response factors vary significantly across chemical classes, making accurate quantification without compound-specific standards challenging. This limitation is particularly problematic for risk assessment applications where precise concentration data is essential [14].

Table 1: Common Data Quality Issues and Their Impacts on NTA Results

Quality Issue | Primary Causes | Impact on Data Interpretation
Chemical Noise | Matrix effects, contaminant ions, solvent impurities | Reduced signal-to-noise ratio, false feature detection
Mass Accuracy Drift | Instrument calibration status, environmental fluctuations | Incorrect molecular formula assignment, reduced identification confidence
Retention Time Shifts | Chromatographic system variability, column aging | Misalignment across samples, incorrect peak matching
Feature Misannotation | Incorrect adduct assignment, isotope pattern misidentification | Wrong molecular formula assignment, structural misidentification
Spectral Library Gaps | Limited reference databases, absent authentic standards | Reduced identification rates, uncertain compound annotation

Diagnostic Approaches for Data Quality Assessment

Implementing systematic quality assessment protocols is essential for identifying and mitigating data quality issues in NTA workflows.

Quality Control Samples and Reference Materials

Incorporating quality control (QC) samples throughout the analytical batch enables continuous monitoring of system performance [44]. Pooled QC samples, prepared by combining small aliquots of all study samples, assess overall system stability. Process blanks identify contamination sources, while spiked samples with internal standards monitor extraction efficiency and matrix effects [44].

Data Quality Metrics Tracking

Monitoring specific technical parameters throughout analysis provides quantitative assessment of data quality. Key metrics include mass error (typically < 5 ppm for Orbitrap instruments), retention time stability (relative standard deviation < 2%), and peak intensity variance across replicate injections [44]. Establishing acceptance thresholds for these metrics ensures consistent data quality across batches.
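Both tracked metrics reduce to simple calculations. A minimal sketch of the two computations, with hypothetical function names (the caffeine example value is only illustrative):

```python
# Illustrative helpers for the two metrics named above: signed mass error in
# ppm, and relative standard deviation (RSD) for retention time or intensity.

def mass_error_ppm(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

def rsd_percent(values: list[float]) -> float:
    """Relative standard deviation (sample SD / mean) as a percentage."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return sd / mean * 100

# Example: caffeine [M+H]+ has a theoretical m/z of 195.0877
print(mass_error_ppm(195.0881, 195.0877))      # ~2.05 ppm, within the 5 ppm limit
print(rsd_percent([5.02, 5.05, 5.01, 5.04]))   # RT RSD in %, well under 2%
```

Acceptance thresholds (e.g., 5 ppm, 2% RSD) would then be checked against these values for each batch.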

Confidence Level Assignment for Identifications

Implementing a standardized confidence framework for compound identification clarifies uncertainty levels. The Schymanski et al. (2014) scale is widely adopted, ranging from Level 1 (confirmed structure with reference standard) to Level 5 (exact mass of interest but no structural information) [27]. This tiered system communicates identification certainty to stakeholders and supports appropriate data interpretation.

Multidimensional Performance Assessment

A comprehensive evaluation of NTA performance should address four key aspects: quality, boundary, accuracy, and precision [44]. Quality assessment verifies adherence to QA/QC protocols. Boundary evaluation defines the chemical space covered by the method. Accuracy measurement compares results to known values, while precision assessment examines repeatability and reproducibility [44].

Table 2: Key Performance Metrics for NTA Data Quality Assessment

| Performance Aspect | Assessment Method | Recommended Frequency |
| --- | --- | --- |
| Mass Accuracy | Analysis of reference compounds with known m/z values | Each analytical batch |
| Retention Time Stability | Monitoring of internal reference compounds | Throughout analytical sequence |
| Signal Intensity Reproducibility | Relative standard deviation of QC sample features | Every 10-12 samples |
| Feature Detection Consistency | Comparison of features detected in replicate analyses | Each sample type |
| Blank Contamination | Analysis of process blanks with study samples | Each extraction batch |

Experimental Protocols for Data Quality Evaluation

Protocol 1: System Suitability Testing

Purpose: Verify instrument performance before sample analysis.

Materials: Reference standard mixture containing compounds spanning relevant chemical space.

Procedure:

  • Prepare reference standard at appropriate concentration in mobile phase.
  • Inject reference standard at beginning of analytical sequence.
  • Evaluate mass accuracy (< 5 ppm error), retention time stability (< 2% RSD), and signal intensity (> 10^4 counts for base peak).
  • Repeat analysis every 10-12 samples to monitor performance drift.

Acceptance Criteria: Mass error ≤ 5 ppm, retention time RSD ≤ 2%, intensity RSD ≤ 15% for reference compounds [44].

Protocol 2: Feature Quality Assessment

Purpose: Distinguish real chemical features from artifacts.

Materials: Study samples, process blanks, pooled QC samples.

Procedure:

  • Process all data using consistent parameters (mass tolerance, retention time window, intensity threshold).
  • Compare features detected in study samples against process blanks.
  • Filter features present in blanks (blank subtraction) or occurring in single samples (rare features).
  • Assess feature reproducibility in replicate injections and pooled QCs.

Acceptance Criteria: Features must be detected in ≥ 2/3 replicate injections with intensity RSD < 30% [44].

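The replicate-based acceptance rule can be sketched as a simple filter. Feature IDs, intensities, and the data layout below are illustrative:

```python
# Sketch of the acceptance rule above: keep a feature only if it is detected in
# at least 2 of 3 replicate injections and its intensity RSD is below 30%.
# `features` maps feature IDs to per-replicate intensities, with None marking a
# non-detect (all names and values are illustrative).

def passes_replicate_filter(intensities, min_detects=2, max_rsd=30.0):
    detected = [i for i in intensities if i is not None and i > 0]
    if len(detected) < min_detects:
        return False
    mean = sum(detected) / len(detected)
    sd = (sum((i - mean) ** 2 for i in detected) / (len(detected) - 1)) ** 0.5
    return (sd / mean * 100) < max_rsd

features = {
    "F001": [1.2e5, 1.1e5, 1.3e5],   # reproducible -> keep
    "F002": [8.0e3, None, None],     # single detect -> reject
    "F003": [5.0e4, 1.0e5, 4.0e5],   # RSD far above 30% -> reject
}
kept = [fid for fid, ints in features.items() if passes_replicate_filter(ints)]
print(kept)  # ['F001']
```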
Protocol 3: Identification Confidence Assessment

Purpose: Assign confidence levels to compound annotations.

Materials: HRMS/MS data, spectral databases (e.g., NIST, MassBank), computational tools.

Procedure:

  • Level 1: Match retention time and MS/MS spectrum to authentic standard analyzed under identical conditions.
  • Level 2: Match MS/MS spectrum to library spectrum without reference standard.
  • Level 3: Propose candidate structure based on diagnostic evidence (e.g., fragmentation pathways).
  • Level 4: Assign molecular formula based on accurate mass and isotope pattern.
  • Level 5: Exact mass only, no structural information [27].

Acceptance Criteria: Report all identifications with appropriate confidence level designation.
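The tiered logic above lends itself to a small decision helper. The boolean evidence flags are hypothetical inputs that upstream processing would supply:

```python
# Minimal decision helper mirroring the five Schymanski levels listed above.
# Each argument is a hypothetical evidence flag; the strongest available
# evidence determines the reported level.

def assign_confidence_level(has_exact_mass, has_formula=False,
                            has_candidate_structure=False,
                            library_ms2_match=False,
                            standard_rt_ms2_match=False):
    if standard_rt_ms2_match:
        return 1   # confirmed structure (reference standard match)
    if library_ms2_match:
        return 2   # probable structure (library spectrum match)
    if has_candidate_structure:
        return 3   # tentative candidate(s)
    if has_formula:
        return 4   # unequivocal molecular formula
    if has_exact_mass:
        return 5   # exact mass of interest only
    raise ValueError("no usable evidence for this feature")

print(assign_confidence_level(True, has_formula=True, library_ms2_match=True))  # 2
```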

[Workflow diagram] Start NTA data quality assessment → mass spectrometry data acquisition → system suitability evaluation (fail: return to acquisition; pass: continue) → data processing & feature detection → feature quality filtering (reject: reprocess; accept: continue) → compound identification & annotation → confidence level assignment (Levels 4-5: revisit identification; Levels 1-3: proceed) → quality-assured NTA results.

NTA Data Quality Assessment Workflow

The Scientist's Toolkit: Essential Research Reagent Solutions

Table 3: Key Research Reagents and Materials for NTA Quality Assurance

| Reagent/Material | Function | Application Example |
| --- | --- | --- |
| Quality Control Reference Standards | Monitor instrument performance and data quality | System suitability testing with known compounds |
| Internal Standard Mixture | Correct for matrix effects and injection variability | Isotopically-labeled compounds spiked into all samples |
| Solid Phase Extraction Cartridges | Concentrate analytes and remove matrix interferents | Sample preparation for trace analysis (HLB, WAX, WCX) |
| Retention Index Calibration Mix | Standardize retention time alignment across samples | Hydrocarbon series for GC-HRMS or homologous series for LC-HRMS |
| Spectral Libraries | Support compound identification through spectrum matching | NIST, MassBank, mzCloud databases |
| Certified Reference Materials | Validate method accuracy and performance | EPA methods, NIST standard reference materials |

Advanced Approaches: Machine Learning and Prioritization Strategies

Emerging computational approaches enhance data quality in NTA workflows. Machine learning algorithms improve compound identification accuracy by recognizing complex patterns in HRMS data [1] [27]. Prioritization strategies help focus resources on the most relevant chemical features, addressing the data overload challenge in NTA [45] [9].

Seven key prioritization strategies have been identified: (1) target and suspect screening using reference databases; (2) data quality filtering to remove artifacts; (3) chemistry-driven prioritization focusing on specific compound classes; (4) process-driven prioritization using spatial/temporal comparisons; (5) effect-directed prioritization linking features to biological effects; (6) prediction-based prioritization using quantitative structure-property relationships; and (7) pixel- or tile-based analysis for complex chromatographic data [45] [9].
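Conceptually, these strategies act as a funnel: each stage removes features that fail its criterion, shrinking thousands of detections to a reviewable shortlist. A deliberately simple sketch, treating each strategy as a predicate applied in sequence (feature records, field names, and thresholds are illustrative placeholders, not a real implementation):

```python
# Prioritization as a funnel of sequential filters: each stage is a named
# predicate; features failing any stage are dropped. All data are illustrative.

def prioritize(features, stages):
    """Apply prioritization stages in order, logging the shrinking count."""
    for name, keep in stages:
        features = [f for f in features if keep(f)]
        print(f"{name}: {len(features)} features remain")
    return features

features = [
    {"mz": 230.0964, "suspect_hit": True,  "blank_ratio": 12.0, "halogenated": True},
    {"mz": 301.1410, "suspect_hit": False, "blank_ratio": 1.5,  "halogenated": False},
    {"mz": 413.2660, "suspect_hit": True,  "blank_ratio": 0.9,  "halogenated": True},
]
stages = [
    ("suspect screening",      lambda f: f["suspect_hit"]),
    ("quality filter (blank)", lambda f: f["blank_ratio"] > 3),
    ("chemistry-driven",       lambda f: f["halogenated"]),
]
shortlist = prioritize(features, stages)
print(shortlist)  # only the first feature survives all three stages
```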

[Workflow diagram] Thousands of MS features → target/suspect screening → data quality filtering → chemistry-driven prioritization → process-driven prioritization → effect-directed prioritization → prediction-based prioritization → prioritized feature shortlist.

Prioritization Strategy Workflow

Data quality is fundamental to generating reliable, interpretable results in HRMS-based non-targeted analysis. By understanding common quality issues and implementing systematic diagnostic approaches, researchers can improve the accuracy and reproducibility of NTA studies. Integrating robust quality assurance protocols, standardized confidence frameworks, and advanced computational approaches addresses the inherent challenges of NTA and supports its transition from exploratory research to regulatory applications. As the field evolves, continued development of standardized performance metrics and validation procedures will further enhance the reliability and interpretability of NTA data.

Optimizing Data Processing Parameters and Thresholds

Non-targeted analysis (NTA) using high-resolution mass spectrometry (HRMS) has emerged as a powerful approach for detecting unknown and unexpected compounds in complex samples, filling critical data gaps not easily addressed by targeted methods [5]. The principal challenge of contemporary NTA lies not in detection itself, but in developing computational methods to extract meaningful environmental information from the vast chemical datasets generated by HRMS instruments [27]. A typical LC-HRMS dataset comprises a series of high-resolution mass spectra collected over time, resulting in abstract feature triplets consisting of retention time (rt), mass-to-charge ratio (m/z), and intensity (I) for each detected substance [46]. The fundamental goal of data processing is to transform this raw, complex data into a structured list of chemically relevant features—groupings of associated MS1 components like isotopologues and adducts—which can then be used for statistical analysis, annotation, and identification [23].

The data processing workflow is critical yet challenging, involving numerous user-defined parameters with poorly understood interactions [46]. Variations in these parameters can significantly impact the final results, making optimization essential for reliable outcomes. Unlike targeted analyses, where performance metrics like selectivity, sensitivity, accuracy, and precision are well-defined, NTA methods lack standardized performance assessment procedures, creating a barrier to broader adoption and confident interpretation of results [5]. This guide addresses these challenges by providing a detailed framework for optimizing data processing parameters and thresholds, ensuring researchers can generate high-quality, reproducible data suitable for their specific research objectives, whether sample classification, chemical identification, or quantitative estimation.

Core Data Processing Workflow and Algorithms

The transition from raw HRMS data to interpretable chemical information follows a structured workflow involving sequential computational steps. This process intentionally reduces the data, transforming raw spectral information into aligned features ready for statistical analysis and chemical interpretation [27] [23]. Each stage employs specific algorithms with associated parameters that require careful optimization to balance sensitivity, specificity, and computational efficiency.

Data Format Conversion and Centroiding

Before core processing begins, data often requires format conversion from proprietary vendor formats (.raw, .d, .wiff) to open standards like mzML or mzXML [23]. This step, while not intentionally interpretive, may cause data loss if not carefully evaluated. Subsequently, centroiding is typically the first user-applied processing step for data collected in profile mode. This process reduces the number of data points by a factor of 10–150 by converting high-resolution mass peak profiles into single centroids defined by an m/z value and intensity [46].

Table 1: Common Centroiding Algorithms and Their Characteristics

| Algorithm | Underlying Principle | m/z Determination | Key Considerations |
| --- | --- | --- | --- |
| Continuous Wavelet Transform (CWT) | Uses wavelet transforms for peak detection [46] | Local maximum analysis of the scalogram (measured value) [46] | Faster but may have larger m/z errors compared to interpolation methods [46] |
| Full Width at Half Maximum (FWHM) | Identifies peaks based on width at half height [46] | Interpolation of the center within the peak profile's FWHM range [46] | An "exact mass" method; provides interpolated m/z values [46] |
| Savitzky-Golay Derivative | Detects zero-crossings of the first-order derivative [46] | Interpolation between data points [46] | Improves m/z accuracy; based on well-established peak detection principles [46] |
| Non-Linear Regression (e.g., Cent2Prof) | Fits a Gaussian peak model to the profile data [46] | Regression coefficients from the model fit [46] | Retains peak width information; computationally intensive [46] |
| Linearized Regression | Linearizes the Gaussian function via log-transform [46] | Faster linear regression coefficients [46] | Significant time savings (factor 100-1000) vs. non-linear regression [46] |

The choice of centroiding algorithm influences mass accuracy, a critical factor for subsequent compound identification. Vendor-specific algorithms are often used, but open alternatives like CWT and FWHM are widely implemented in common tools like MzMine and msConvert [46]. Furthermore, centroiding involves an inherent trade-off: while it achieves crucial data compression, it also results in information loss, such as peak width details that convey mass accuracy and precision. Advanced methods like Cent2Prof aim to mitigate this by retaining peak width as a regression coefficient [46].
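The linearized-regression idea from Table 1 can be made concrete: taking the logarithm of a Gaussian peak profile turns it into a parabola in m/z, so an ordinary least-squares parabola fit recovers the centroid from the vertex and the peak width from the curvature. The sketch below uses synthetic profile data and is only an illustration of the principle, not any specific tool's implementation:

```python
import numpy as np

# Linearized Gaussian centroiding: for I(m) = A*exp(-(m-mu)^2 / (2*sigma^2)),
# log(I) is a parabola c2*m^2 + c1*m + c0 with vertex mu = -c1/(2*c2) and
# width sigma = sqrt(-1/(2*c2)). A plain polynomial least-squares fit suffices.

def centroid_linearized(mz, intensity):
    c2, c1, _ = np.polyfit(mz, np.log(intensity), 2)
    mu = -c1 / (2 * c2)                 # parabola vertex = centroid m/z
    sigma = np.sqrt(-1 / (2 * c2))      # peak width from curvature
    return mu, sigma

# Synthetic profile-mode points around a peak at m/z 195.0877
mu_true, sigma_true = 195.0877, 0.002
mz = np.linspace(195.082, 195.094, 13)
intensity = 1e6 * np.exp(-(mz - mu_true) ** 2 / (2 * sigma_true ** 2))

mu, sigma = centroid_linearized(mz, intensity)
print(mu, sigma)  # recovers ~195.0877 and ~0.002
```

Because the fit is linear in its coefficients, it avoids the iterative optimization of a full non-linear Gaussian fit while still retaining the peak-width information that plain local-maximum centroiding discards.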

Chromatographic Peak Detection and Alignment

Following centroiding, the processed data undergoes chromatographic peak detection. This step identifies features in the chromatographic dimension by analyzing extracted ion chromatograms (XICs). A key parameter here is signal thresholding, which removes signals below a designated abundance threshold (absolute value) or a signal-to-noise (S/N) ratio [23]. Setting this threshold is critical; too high a value risks missing low-abundance but chemically significant features, while too low a value drastically increases noise and false positives. Other relevant steps include chromatogram smoothing to reduce noise and shoulder peaks filtering, particularly for data from Fourier transform MS instruments [23].
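The S/N thresholding trade-off described above can be illustrated on a toy XIC. The noise estimator and cut-off rule here are deliberately simple placeholders (a robust median/MAD baseline), not a reproduction of any particular tool's algorithm:

```python
import statistics

# Illustrative signal-to-noise thresholding of an extracted ion chromatogram
# (XIC): noise is estimated from the median absolute deviation around a median
# baseline, and points below sn_min * noise above baseline are zeroed out.

def threshold_xic(intensities, sn_min=3.0):
    baseline = statistics.median(intensities)
    deviations = [abs(i - baseline) for i in intensities]
    noise = statistics.median(deviations) or 1.0   # robust MAD-style estimate
    return [i if (i - baseline) / noise >= sn_min else 0.0 for i in intensities]

xic = [40, 55, 48, 52, 60, 900, 2500, 4100, 2300, 800, 47, 51]
print(threshold_xic(xic))  # only the chromatographic peak around index 5-9 survives
```

Raising `sn_min` would start clipping the peak's tails (false negatives); lowering it would let baseline fluctuations through (false positives), which is exactly the balance the text describes.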

After peak detection, retention time (RT) alignment corrects for minor shifts in chromatographic retention times across different samples within a batch. This is essential for ensuring that the same chemical feature is correctly matched across all samples. Alignment can be based on user-selected or algorithm-selected representative compounds [23]. It is important to note that different mass spectrometers exhibit different retention time stability; for instance, Orbitrap systems coupled with high-performance liquid chromatography often show lower retention time drift than some Q-TOF systems, which may influence alignment stringency [27].

Feature Grouping and Gap Filling

A single compound generates multiple signals in HRMS, including various adducts, isotopologues, and in-source fragments. The next processing steps group these related signals into a single "feature" representing the molecular entity.

  • Isotopologue Grouping: This groups signals from different isotopic forms of the same molecule [23]. Parameters define the expected mass differences and relative abundances of isotopes (e.g., for Carbon-13).
  • Adduct Grouping: This groups signals from different ionized forms of the same molecule (e.g., [M+H]+, [M+Na]+) [23]. Parameters specify a list of potential adducts and their mass differences.
  • Duplicate Feature Removal: This step filters out duplicate features based on designated m/z and RT windows [23].

Gap filling is a crucial recursive step that attempts to detect features missed during initial peak picking, often by using slightly lower intensity or S/N thresholds than the initial settings [23]. This helps correct for instances where a compound is present in multiple samples but only detected above the threshold in some.

Between-Sample Alignment and Filtering

The final stages of processing prepare the feature list for statistical analysis. Between-sample alignment compares detected features across all samples in the study, grouping them based on allowed variances in m/z and RT [23]. This creates a consolidated feature list across the entire dataset.

Following alignment, several filtering steps are applied to enhance data quality:

  • Replicate Filtering: This evaluates feature frequency across analytical replicates. Features not meeting a designated replicate frequency threshold (e.g., detected in 2 out of 3 replicates) are set to non-detect [23]. This improves data robustness.
  • Blank Comparison/Abundance Thresholding: Features are filtered based on their abundance compared to procedural or solvent blanks. A common approach is to remove features with an intensity in the sample only a few times higher (e.g., 3-5x) than in the blank [23].
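The blank-comparison rule can be sketched as a fold-change filter. Feature IDs, intensities, and the 3x threshold below are illustrative:

```python
# Sketch of blank subtraction: retain a feature only when its sample intensity
# exceeds the corresponding blank intensity by a fold-change threshold.

def blank_filter(sample_int, blank_int, fold=3.0):
    blank = max(blank_int, 1.0)   # avoid division by zero for blank non-detects
    return sample_int / blank >= fold

observations = [
    ("F010", 9.0e4, 1.0e4),  # 9x blank  -> keep
    ("F011", 2.5e4, 1.2e4),  # ~2x blank -> remove (likely background)
    ("F012", 5.0e3, 0.0),    # absent in blank -> keep
]
kept = [fid for fid, s, b in observations if blank_filter(s, b)]
print(kept)  # ['F010', 'F012']
```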

The final output is a feature-intensity matrix, where rows represent samples and columns correspond to aligned chemical features, serving as the foundation for all subsequent statistical and chemometric analyses [27].

[Workflow diagram: NTA data processing] Raw HRMS data (profile mode) → centroiding (CWT, FWHM, etc.) → chromatographic peak detection → retention time alignment → feature grouping (isotopologues, adducts) → gap filling (recursive extraction) → between-sample alignment → data filtering (replicate, blank) → feature-intensity matrix.

Parameter Optimization and Experimental Protocols

Optimizing data processing parameters is not a one-time task but an iterative process essential for ensuring data quality. The following protocols provide methodologies for systematically evaluating and optimizing key parameters.

Protocol for Assessing Peak Detection Sensitivity

Objective: To determine the optimal balance between sensitivity and selectivity for chromatographic peak detection parameters (intensity threshold and S/N ratio) and gap filling.

Methodology:

  • Spiked Standard Preparation: Prepare a set of calibration standards containing a mixture of known compounds covering a range of physicochemical properties and concentrations, including concentrations near the expected limit of detection.
  • Data Acquisition and Processing: Analyze these standards using your HRMS method. Process the resulting data multiple times, varying only the intensity threshold or S/N ratio for initial peak detection.
  • Performance Evaluation: For each processing parameter set, calculate:
    • True Positives (TP): Number of spiked compounds correctly detected.
    • False Positives (FP): Number of features detected in the blank or not corresponding to a spiked standard.
    • False Negatives (FN): Number of spiked compounds not detected.
  • Parameter Selection: Plot the number of TP, FP, and FN against the parameter value (e.g., intensity threshold). The optimal parameter is often at the point where the TP curve is high and the FP curve begins to rise sharply, indicating maximum true detections with minimal false positives. Repeat this process for gap filling parameters.
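The performance-evaluation step of this protocol can be automated by matching detected features to the spiked list within m/z (ppm) and retention-time tolerances. The matching logic and all values below are an illustrative sketch:

```python
# Count spiked standards recovered (TP), spiked standards missed (FN), and
# detected features with no spiked match (FP), using m/z and RT windows.
# Tolerances and feature lists are illustrative.

def evaluate_detection(detected, spiked, mz_ppm=10.0, rt_tol=0.2):
    tp = fn = 0
    matched = set()
    for s_mz, s_rt in spiked:
        hit = None
        for j, (d_mz, d_rt) in enumerate(detected):
            if j in matched:
                continue
            if abs(d_mz - s_mz) / s_mz * 1e6 <= mz_ppm and abs(d_rt - s_rt) <= rt_tol:
                hit = j
                break
        if hit is None:
            fn += 1           # spiked compound not detected
        else:
            tp += 1
            matched.add(hit)
    fp = len(detected) - len(matched)   # detections with no spiked counterpart
    return tp, fp, fn

spiked   = [(195.0877, 5.10), (230.0964, 7.85)]
detected = [(195.0878, 5.12), (301.1410, 9.40)]
print(evaluate_detection(detected, spiked))  # (1, 1, 1)
```

Running this across a grid of intensity thresholds yields the TP/FP/FN curves the protocol asks you to plot.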
Protocol for Evaluating Alignment Accuracy

Objective: To optimize parameters for retention time alignment and between-sample alignment (m/z and RT tolerance windows).

Methodology:

  • QC Sample Preparation: Use a pooled quality control (QC) sample analyzed multiple times throughout the batch.
  • Data Processing with Variation: Process the dataset multiple times, systematically varying the allowed m/z (e.g., 5, 10, 15 ppm) and RT (e.g., 0.1, 0.2, 0.3 min) tolerances for alignment.
  • Accuracy Assessment: For each parameter combination, assess alignment accuracy by measuring the feature intensity relative standard deviation (RSD) across the QC injections. Poor alignment will result in high RSDs for many features because the same chemical signal is incorrectly split into multiple features. The optimal tolerance windows minimize the number of features with high RSD in the QCs while avoiding the merging of distinct features (which would also increase RSDs).
  • Chemical Verification: Manually inspect the alignment of several known features to confirm automated assessments.
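The accuracy-assessment step can be quantified by computing per-feature RSD across the pooled-QC injections for each candidate tolerance setting: a split feature shows up as multiple columns with inflated RSDs. A minimal sketch with illustrative data and a 30% cut-off:

```python
import statistics

# For each alignment parameter set, compute per-feature intensity RSD across
# the QC injections and report the fraction of high-RSD features. Data and the
# 30% cut-off are illustrative.

def fraction_high_rsd(qc_matrix, cutoff=30.0):
    """qc_matrix: list of features, each a list of intensities across QC runs."""
    high = 0
    for intensities in qc_matrix:
        rsd = statistics.stdev(intensities) / statistics.mean(intensities) * 100
        high += rsd > cutoff
    return high / len(qc_matrix)

well_aligned  = [[1.0e5, 1.05e5, 0.98e5], [4.1e4, 3.9e4, 4.0e4]]
badly_aligned = [[1.0e5, 1.0e3, 0.98e5], [4.1e4, 3.9e4, 2.0e3]]  # split features
print(fraction_high_rsd(well_aligned), fraction_high_rsd(badly_aligned))
```

The parameter combination minimizing this fraction, without merging genuinely distinct features, is the one to select.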
Performance Assessment Using Confusion Matrix

For qualitative NTA objectives like sample classification or chemical identification, performance can be assessed using a confusion matrix, adapted from traditional metrics [5].

Table 2: Performance Assessment for Qualitative NTA Using a Confusion Matrix

| Performance Metric | Calculation | Interpretation in NTA Context |
| --- | --- | --- |
| Accuracy | (TP + TN) / (TP + FP + FN + TN) | Overall correctness of sample classification or feature detection |
| Precision | TP / (TP + FP) | Proportion of detected/classified features that are correct (reliability) |
| Recall (Sensitivity) | TP / (TP + FN) | Proportion of true features that were successfully detected (completeness) |
| False Discovery Rate (FDR) | FP / (TP + FP) | Proportion of detected features that are incorrect |

Application: This framework can be used to evaluate a data processing workflow's performance against a ground-truth dataset, such as the spiked standard set from the peak detection sensitivity protocol above. It helps quantify the trade-offs inherent in parameter selection.
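The four metrics in Table 2 follow directly from the confusion-matrix counts; the counts below are illustrative:

```python
# Compute the Table 2 metrics from confusion-matrix counts (illustrative data).

def nta_metrics(tp, fp, fn, tn):
    return {
        "accuracy":  (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall":    tp / (tp + fn),
        "fdr":       fp / (tp + fp),
    }

m = nta_metrics(tp=80, fp=20, fn=10, tn=90)
print(m)  # accuracy 0.85, precision 0.8, recall ~0.889, FDR 0.2
```

Note that precision and FDR are complements (precision + FDR = 1), so reporting both is redundant but often done for readability.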

The Scientist's Toolkit: Essential Research Reagents and Materials

A robust NTA study relies on more than just optimized software parameters. Several physical reagents and reference materials are essential for quality control, method validation, and performance assessment throughout the data processing workflow.

Table 3: Essential Research Reagents and Materials for NTA

| Item | Function in NTA Workflow |
| --- | --- |
| Certified Reference Materials (CRMs) | Used to verify analytical confidence and confirm compound identities during the validation stage [27]. |
| Internal Standard Mixtures | Injected into every sample to monitor instrument performance, correct for signal drift, and aid in retention time alignment [27] [5]. |
| Quality Control (QC) Samples | A pooled sample from all study samples or a standard reference material analyzed repeatedly throughout the batch. Used to monitor system stability, evaluate feature reproducibility (e.g., via RSD), and for signal correction [27] [5]. |
| Procedural & Solvent Blanks | Samples taken through the entire sample preparation and analysis process without the sample matrix. Critical for identifying and filtering out background contamination and instrumental artifacts during data filtering [23]. |
| Spiked Standard Mixtures | Custom mixtures of known compounds, used to assess method performance metrics like detection limits, accuracy, and precision for the data processing workflow [5]. |

[Workflow diagram: parameter optimization methodology] Define optimization goal → prepare validation set (spiked standards, QCs) → process data multiple times varying the target parameter → assess performance (confusion matrix, QC RSD) → analyze the sensitivity vs. specificity trade-off → select optimal parameter.

Machine Learning Integration and Advanced Validation

Machine learning (ML) is redefining the potential of NTA by identifying latent patterns within high-dimensional data, making it particularly well-suited for tasks like contamination source identification [27]. The integration of ML necessitates additional considerations for data processing.

ML-Oriented Data Preprocessing

Prior to ML analysis, the feature-intensity matrix requires specific preprocessing to ensure data quality and model robustness [27]. This includes:

  • Missing Value Imputation: Techniques like k-nearest neighbors (k-NN) are used to estimate missing values, which may arise from compounds being present in some samples but below the detection threshold in others [27].
  • Normalization: Methods like Total Ion Current (TIC) normalization are applied to correct for overall variations in sample concentration and instrument response [27].
  • Data Scaling: Features often require scaling (e.g., mean-centering, unit variance) to prevent models from being biased by the arbitrary magnitude of ion intensities.
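These three steps can be sketched on a small feature-intensity matrix. A column-mean imputation stands in for k-NN here to keep the example dependency-light (in practice a k-NN imputer such as scikit-learn's KNNImputer would be used); all data are illustrative:

```python
import numpy as np

# Minimal sketch of the three ML preprocessing steps on a feature-intensity
# matrix X (rows = samples, columns = features). Column-mean imputation is a
# stand-in for k-NN imputation; values are illustrative.

X = np.array([
    [1.0e5, 2.0e4, np.nan],
    [8.0e4, 1.5e4, 3.0e3],
    [1.2e5, np.nan, 4.0e3],
])

# 1. Missing-value imputation (column mean as a simple stand-in for k-NN)
col_means = np.nanmean(X, axis=0)
X = np.where(np.isnan(X), col_means, X)

# 2. TIC normalization: scale each sample so its intensities sum to 1
X = X / X.sum(axis=1, keepdims=True)

# 3. Autoscaling: mean-center each feature and scale to unit variance
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

print(np.round(X, 2))  # every feature column now has mean 0 and unit variance
```

The order matters: imputation must precede normalization (missing values break row sums), and scaling comes last so it operates on the normalized intensities.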
Tiered Validation Strategy for ML-NTA

Given the complexity of ML-NTA workflows, a robust, multi-tiered validation strategy is crucial for ensuring reliable results [27].

  • Analytical Confidence Verification: Compound identities suggested by the ML model should be verified using certified reference materials (CRMs) or spectral library matches where possible [27].
  • Model Generalizability Assessment: Classifiers should be validated on independent external datasets not used during model training. Cross-validation techniques (e.g., 10-fold) are essential to evaluate overfitting risks [27].
  • Environmental Plausibility Checks: Model predictions must be correlated with contextual data, such as geospatial proximity to known emission sources or the presence of source-specific chemical markers. This bridges analytical rigor with real-world relevance [27].

Optimizing data processing parameters and thresholds is a critical, multi-faceted endeavor in non-targeted analysis. From the initial centroiding of raw data to the final alignment and filtering of features, each step contains user-defined parameters that directly impact the quality, reliability, and interpretability of the results. As the field moves towards greater integration with machine learning and aims to provide actionable insights for environmental and health decision-making, the importance of systematic parameter optimization and rigorous, tiered validation cannot be overstated. By adopting the structured protocols and frameworks outlined in this guide—such as using spiked standards for sensitivity assessment, QC samples for alignment evaluation, and confusion matrices for performance quantification—researchers can advance the field of NTA, improve the comparability of results across different studies, and enhance the translation of complex HRMS data into meaningful scientific knowledge.

Managing Complex Matrices and Interferences

In the realm of non-target analysis (NTA) using high-resolution mass spectrometry (HRMS), complex matrices represent one of the most significant challenges for accurate data interpretation and compound identification. Matrix effects (MEs)—defined as the combined influence of all sample components other than the analyte on the measurement of quantity—can substantially alter ionization efficiency in the mass spectrometer source when interference species co-elute with target analytes [47]. These effects manifest primarily as ion suppression or, less frequently, ion enhancement, directly impacting method reproducibility, linearity, selectivity, accuracy, and sensitivity during validation [47]. For NTA, which aims to identify unknown or suspected environmental contaminants, pharmaceuticals, and transformation products without analytical standards, these interferences pose particular difficulties [1] [27]. The inherent variability of real-world samples, such as urban runoff, biological fluids, or environmental extracts, introduces unpredictable matrix components that can obscure the detection of low-abundance compounds and complicate the translation of raw HRMS data into actionable environmental insights [48] [27].

The following diagram illustrates the core challenge of matrix effects in the LC-ESI-MS process and the fundamental strategies to manage them:

[Diagram] Matrix components (salts, phospholipids, organic matter) and the target analyte co-elute in LC → ion suppression/enhancement in the ESI source → altered MS signal. Management strategies split into compensation approaches (internal standards, matrix-matched calibration, standard addition) and minimization approaches (sample clean-up, chromatographic optimization, dilution).

Diagram: Matrix effects occur when sample components co-elute with analytes in LC-ESI-MS, leading to ion suppression/enhancement. Management follows compensation or minimization strategies.

Classification and Mechanisms of Interferences

Interferences in mass spectrometry can be systematically categorized into two primary classes: spectroscopic and nonspectroscopic interferences, each with distinct mechanisms and impacts on analytical results [49].

Spectroscopic Interferences

Spectroscopic interferences contribute directly to a specific analyte signal by sharing the same mass-to-charge ratio (m/z) as the analyte ion [49]. These are further subdivided into three types:

  • Isobaric Interferences: Occur when different elements or molecules have isotopes with the same nominal mass (e.g., ¹⁰⁰Mo and ¹⁰⁰Ru) [49]. These can typically be avoided by selecting an alternate analyte isotope without isobaric overlap.
  • Doubly Charged Interferences: Formed when elements with low second ionization potentials produce doubly charged ions (e.g., ¹³⁶Ba²⁺ interfering with ⁶⁸Zn⁺) [49]. The mass spectrometer measures mass-to-charge ratio, so these ions appear at half their actual mass.
  • Polyatomic Interferences: Arise from molecular ions formed from plasma gases, solvent components, or sample matrix [49]. These are particularly problematic in complex matrices and may require collision/reaction cell technology or mathematical corrections for mitigation.
Nonspectroscopic Interferences

Nonspectroscopic interferences, often termed matrix effects, do not create new signals but rather alter the response of analytes [49]. These include:

  • Sample Transport and Nebulization Effects: Physical properties of the sample (viscosity, surface tension, volatility) that change transport efficiency to the plasma [49].
  • Ionization Suppression: Easily ionized elements in high concentrations suppress ionization of elements with higher ionization potentials [49].
  • Space-Charge Effects: High concentrations of high-mass ions repel low-mass ions from the positive ion beam in the ion optic region, preferentially suppressing low-mass ions [49].

In electrospray ionization (ESI)—the most common ionization technique for LC-MS applications—matrix effects primarily occur in the liquid phase during droplet formation and charge transfer processes [47]. Co-eluting matrix components can compete for available charge, alter droplet formation dynamics, or impede the efficient transfer of ions into the gas phase [47].

Strategic Framework for Managing Matrix Effects

Two complementary paradigms exist for addressing matrix effects: compensation and minimization. The choice between these approaches depends on sensitivity requirements, availability of blank matrices, and the specific analytical context [47].

Compensation Approaches

Compensation strategies acknowledge the presence of matrix effects and employ techniques to correct for their influence on quantitative results:

  • Internal Standardization: The use of isotopically labeled internal standards (identical in chemical behavior to analytes but distinguishable by mass) represents the gold standard for compensation [48] [47]. For NTA where analyte identities are unknown, the Individual Sample-Matched Internal Standard (IS-MIS) strategy has demonstrated superior performance, achieving <20% RSD for 80% of features compared to 70% with conventional internal standard matching [48].
  • Matrix-Matched Calibration: Preparing calibration standards in a blank matrix that matches the sample composition corrects for consistent matrix effects [47]. This approach requires access to appropriate blank matrices, which may be unavailable for certain sample types.
  • Standard Addition: Adding known quantities of analytes to the sample itself provides a perfect matrix match but is time-consuming as it requires preparing and measuring multiple spikes for every sample [49].
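The compensation idea behind internal standardization can be illustrated with a minimal sketch: each feature's intensity is corrected by the response deviation of a nearby isotope-labeled standard. The retention-time matching rule, field names, and example values below are simplifying assumptions for illustration, not the published IS-MIS implementation.

```python
# Minimal sketch of internal-standard normalization for NTA features.
# Assumption: each feature is corrected by the isotope-labeled standard
# nearest in retention time (a simplification of the IS-MIS strategy).

def normalize_features(features, standards):
    """features: list of dicts with 'mz', 'rt', 'intensity'.
    standards: list of dicts with 'rt', 'response', 'expected_response'.
    Returns features with matrix-corrected intensities."""
    corrected = []
    for f in features:
        # pick the internal standard closest in retention time
        nearest = min(standards, key=lambda s: abs(s["rt"] - f["rt"]))
        # correction factor: how strongly the matrix altered the IS response
        factor = nearest["expected_response"] / nearest["response"]
        corrected.append({**f, "intensity": f["intensity"] * factor})
    return corrected

# Hypothetical feature suppressed to 80% of its true response
features = [{"mz": 230.1, "rt": 5.2, "intensity": 8000.0}]
standards = [{"rt": 5.0, "response": 400.0, "expected_response": 500.0},
             {"rt": 9.0, "response": 480.0, "expected_response": 500.0}]
print(normalize_features(features, standards)[0]["intensity"])  # 10000.0
```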
Minimization Approaches

Minimization strategies aim to reduce the presence of interfering matrix components before they reach the mass spectrometer:

  • Sample Clean-up and Extraction: Selective extraction techniques, including solid-phase extraction (SPE), liquid-liquid extraction (LLE), and QuEChERS, remove interfering components while preserving analytes of interest [50] [27]. Multi-sorbent SPE strategies (e.g., combining Oasis HLB with ISOLUTE ENV+) provide broader coverage of compound classes [27].
  • Chromatographic Optimization: Improving separation through ultra-high performance liquid chromatography (UHPLC), optimized gradients, or specialized stationary phases reduces co-elution of analytes and matrix components [50] [51].
  • Sample Dilution: Diluting samples decreases the concentration of both analytes and matrix components, potentially reducing interferences to acceptable levels while maintaining adequate sensitivity [48] [47].

Table 1: Quantitative Comparison of Matrix Effect Management Strategies

Strategy | Relative Efficiency | Implementation Complexity | Cost Considerations | Best-Suited Applications
Internal Standard (IS-MIS) | High (<20% RSD for 80% of features) [48] | High (requires method development) | Moderate (cost of labeled standards) | Heterogeneous samples, quantitative NTA [48]
Sample Dilution | Variable (median suppression 0-67% at REF 50) [48] | Low (simple to implement) | Low (minimal additional resources) | Initial approach, less complex matrices [48]
Solid-Phase Extraction | Moderate to High (depends on selectivity) | Moderate (method optimization needed) | Moderate (cartridge costs) | Complex matrices, need for pre-concentration [50] [27]
Chromatographic Optimization | Moderate (reduces co-elution) | High (method redevelopment) | High (instrumentation, columns) | All applications, particularly targeted methods [50]

The decision framework for selecting the optimal strategy involves assessing sensitivity requirements and blank matrix availability, as shown below:

  • Start: assess matrix effects → Is sensitivity crucial?
  • Yes → Minimize MEs: optimize MS parameters, improve chromatography, implement sample clean-up.
  • No → Is a blank matrix available?
  • Blank matrix available → Compensate for MEs: use isotope-labeled standards or matrix-matched calibration.
  • No blank matrix → Fall back on standard addition, background subtraction, or surrogate matrices.

Diagram: Decision framework for managing matrix effects based on sensitivity requirements and blank matrix availability [47].

Experimental Protocols for Matrix Effect Assessment

Rigorous assessment of matrix effects is essential during method development and validation. Several established protocols provide qualitative and quantitative evaluation of matrix effects.

Post-Column Infusion Method

This qualitative approach identifies retention time zones most susceptible to ion enhancement or suppression [47].

Protocol:

  • Inject a blank sample extract through the LC-MS system.
  • Simultaneously infuse a standard solution of the analyte post-column via a T-piece connection.
  • Monitor the analyte signal throughout the chromatographic run.
  • Signal suppression or enhancement in specific regions indicates matrix effects [47].

Considerations: This method provides spatial information about matrix effects but only qualitative results. It is less efficient for highly diluted samples and can be laborious for multi-analyte methods [47].

Post-Extraction Spiking Method

This quantitative method compares analyte response in neat solution to response when spiked into a blank matrix [47].

Protocol:

  • Prepare a neat standard solution at a known concentration.
  • Spike the same concentration of analyte into a blank matrix sample after extraction.
  • Compare the responses of both solutions.
  • Calculate the matrix effect (ME) using the formula:
    ME (%) = [(Response of post-spiked sample) / (Response of neat standard) - 1] × 100
    Negative values indicate suppression; positive values indicate enhancement [47].

Considerations: This method requires access to blank matrix and provides quantitative data at a single concentration level [47].
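The post-extraction spiking calculation is simple enough to encode directly; the function below mirrors the ME (%) formula above (the example responses are hypothetical):

```python
def matrix_effect_percent(post_spiked_response, neat_response):
    """ME (%) = [(post-spiked response / neat response) - 1] x 100
    Negative values indicate ion suppression; positive values, enhancement."""
    return (post_spiked_response / neat_response - 1.0) * 100.0

# Example: matrix suppresses the analyte signal to 70% of the neat response
print(round(matrix_effect_percent(70_000, 100_000), 1))  # -30.0
```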

Slope Ratio Analysis

A semi-quantitative approach that evaluates matrix effects across a concentration range [47].

Protocol:

  • Prepare matrix-matched calibration standards at multiple concentration levels.
  • Prepare solvent-based standards at identical concentrations.
  • Compare the slopes of the two calibration curves.
  • Calculate the slope ratio to assess the extent of matrix effects [47].

Considerations: This approach provides information across the calibration range but requires more extensive preparation [47].
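The slope comparison can be sketched with a plain least-squares fit; the concentration levels and responses below are invented purely for illustration:

```python
def slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

concs   = [1, 5, 10, 50, 100]            # spiked concentration levels
solvent = [c * 1000 for c in concs]      # responses of solvent-based standards
matrix  = [c * 800 for c in concs]       # responses of matrix-matched standards

# Slope ratio < 1 indicates suppression; > 1 indicates enhancement
ratio = slope(concs, matrix) / slope(concs, solvent)
print(round(ratio, 3))  # ~0.8, i.e. roughly 20% suppression across the range
```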

Advanced Approaches: Machine Learning and Quantitative NTA

Recent advances in machine learning (ML) and quantitative non-targeted analysis (qNTA) are revolutionizing how complex matrices and interferences are managed in HRMS data interpretation.

Machine Learning-Assisted NTA

ML algorithms excel at identifying latent patterns in high-dimensional HRMS data, making them particularly suited for contaminant source identification in complex environmental samples [27]. The systematic workflow for ML-assisted NTA encompasses four key stages:

  • Sample Treatment and Extraction: Balancing selectivity and sensitivity through techniques such as multi-sorbent SPE and green extraction methods [27].
  • Data Generation and Acquisition: HRMS platforms (Q-TOF, Orbitrap) generate complex datasets with isotopic patterns, fragmentation signatures, and structural features [27].
  • ML-Oriented Data Processing: Including data preprocessing (noise filtering, missing value imputation, normalization), dimensionality reduction (PCA, t-SNE), and pattern recognition through supervised/unsupervised learning [27].
  • Result Validation: A tiered approach incorporating reference material verification, external dataset testing, and environmental plausibility assessments [27].

ML classifiers such as Support Vector Classifier (SVC), Logistic Regression (LR), and Random Forest (RF) have demonstrated balanced accuracy ranging from 85.5% to 99.5% for source identification across different environmental samples [27].
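Balanced accuracy, the metric cited for these classifiers, is simply the mean per-class recall, which guards against inflated scores when source classes are imbalanced. A minimal sketch (the labels and predictions are hypothetical, and no specific classifier is implied):

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; robust when class sizes are unequal."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)

# Hypothetical source labels for PFAS-containing samples (unequal classes)
truth = ["landfill"] * 8 + ["AFFF"] * 2
pred  = ["landfill"] * 8 + ["AFFF", "landfill"]
print(balanced_accuracy(truth, pred))  # 0.75 = (8/8 + 1/2) / 2
```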

Quantitative Non-Targeted Analysis (qNTA)

Traditional NTA has primarily focused on compound identification, but recent efforts have established frameworks for deriving quantitative estimates from NTA measurements [28]. qNTA bridges the gap between contaminant discovery and risk characterization by providing concentration estimates essential for risk assessment [28]. Key considerations include:

  • Accounting for estimation uncertainty, particularly regarding experimental recovery rates.
  • Integrating NTA estimates with available hazard metrics for provisional safety evaluations.
  • Coupling qNTA data with high-throughput data streams and predictive models to support risk-based decisions [28].

Table 2: Essential Research Reagent Solutions for Managing Matrix Effects

Reagent/Material | Function | Application Examples | Technical Considerations
Isotopically Labeled Internal Standards | Compensation for analyte loss and matrix effects during sample preparation and analysis | IS-MIS normalization for urban runoff samples [48] | Match chemical properties and retention times with target analytes; limited availability for unknown compounds
Multi-Sorbent SPE Cartridges | Broad-spectrum extraction and clean-up | Oasis HLB + ISOLUTE ENV+ for comprehensive contaminant screening [27] | Different sorbents target specific compound classes; combination provides wider coverage
UHPLC Columns (C18, HILIC) | High-resolution chromatographic separation | BEH C18 column for urban runoff analysis [48] | Sub-2 μm particles provide superior separation efficiency; requires high-pressure systems
QuEChERS Kits | Rapid sample preparation and clean-up | Food, environmental, and biological samples [27] | Combines extraction and partitioning salts with dispersive SPE for efficient matrix removal
Matrix-Matched Calibration Standards | Compensation for consistent matrix effects | Pharmaceutical and bioanalytical applications [47] | Requires access to appropriate blank matrices; challenging for unique sample types

Effective management of complex matrices and interferences is fundamental to generating reliable, actionable data from non-target analysis using high-resolution mass spectrometry. A systematic approach—incorporating appropriate assessment methods, strategic application of compensation and minimization techniques, and leveraging advanced computational approaches—enables researchers to overcome the challenges posed by complex sample matrices. As ML-assisted NTA and quantitative frameworks continue to evolve, they promise to further bridge the gap between analytical capability and environmentally meaningful decision-making, ultimately supporting more effective chemical risk assessment and management across pharmaceutical, environmental, and public health domains.

Strategies for Handling False Positives and Annotation Ambiguities

Non-targeted analysis (NTA) using high-resolution mass spectrometry (HRMS) has emerged as a powerful approach for detecting unknown and unexpected compounds in complex sample matrices, enabling applications from environmental monitoring to drug discovery [27] [14]. Unlike targeted methods that provide unambiguous results for predefined chemicals, NTA generates information-rich data with inherent uncertainties that complicate interpretation and validation [14]. If an analyst reports that a chemical is present in a sample, it may actually be absent (the detected feature may belong to an isomer, or the identification may simply be incorrect) [14]. Conversely, if a chemical is reported as absent, it may actually be present but missed during data processing [14]. These fundamental uncertainties create critical challenges for reliable data interpretation, necessitating robust strategies for managing false positives and annotation ambiguities throughout the NTA workflow.

The core of this challenge lies in the analytical gap between detection and confident identification. While HRMS instruments can detect thousands of features in a single sample, compound identification remains a major bottleneck, requiring sophisticated prioritization and validation strategies to focus identification efforts where they matter most [45] [52]. This technical guide examines systematic approaches for handling false positives and annotation ambiguities within the context of NTA-HRMS data interpretation, providing researchers with validated methodologies to enhance data reliability and support confident decision-making.

Foundational Prioritization Strategies for NTA Data

Effective management of false positives begins with strategic prioritization that filters out unreliable signals and focuses attention on the most chemically relevant features. Research by Zweigle et al. (2025) outlines seven complementary prioritization strategies that can be integrated into a comprehensive NTA workflow [45] [52] [9]. These strategies operate at different stages of the analytical process, collectively enabling stepwise reduction from thousands of detected features to a manageable number of high-confidence candidates worthy of further investigation.

Table 1: Seven Core Prioritization Strategies for NTA Workflows

Strategy | Primary Function | Key Techniques | Impact on False Positives
Target & Suspect Screening (P1) | Filters known compounds | Library matching, predefined databases | Reduces false structural assignments
Data Quality Filtering (P2) | Removes analytical artifacts | Blank subtraction, replicate consistency, peak shape assessment | Eliminates instrument-derived false signals
Chemistry-Driven Prioritization (P3) | Identifies specific compound classes | Mass defect filtering, homologue series, halogenation patterns | Focuses on chemically plausible features
Process-Driven Prioritization (P4) | Highlights process-relevant features | Spatial/temporal comparisons, correlation analysis | Identifies environmentally relevant compounds
Effect-Directed Prioritization (P5) | Links features to biological activity | Bioassay integration, virtual EDA (vEDA) | Prioritizes toxicologically significant compounds
Prediction-Based Prioritization (P6) | Estimates risk and concentration | MS2Quant, MS2Tox, QSPR models | Ranks by potential environmental impact
Pixel/Tile-Based Analysis (P7) | Localizes regions of interest | Chromatographic image analysis, variance mapping | Identifies significant regions before peak detection

The power of these prioritization strategies emerges from their integration rather than individual application. For example, an initial dataset containing thousands of features might be reduced through P1 (target and suspect screening) to several hundred candidates. Application of P2 (data quality filtering) then removes artifacts and unreliable signals, while P3 (chemistry-driven prioritization) focuses attention on compound classes of specific interest, such as halogenated substances [45] [9]. Subsequent application of process-driven (P4) and effect-directed (P5) prioritization can further refine the list to dozens of features linked to specific environmental processes or biological effects. Finally, prediction-based prioritization (P6) enables risk-based ranking, resulting in a focused shortlist of less than ten high-priority compounds deserving of comprehensive identification and confirmation [9].
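The stepwise reduction described above can be sketched as a chain of simple filters. The suspect list, blank masses, tolerance, and mass-defect window below are all illustrative placeholders, not recommended values:

```python
def stepwise_prioritize(features, suspect_masses, blank_mzs, tol=0.005):
    """Sequential feature reduction; filters and thresholds are illustrative.
    features: list of dicts, each with an accurate mass under 'mz'."""
    # P1 (suspect screening): keep features matching a suspect-list mass
    stage = [f for f in features
             if any(abs(f["mz"] - m) <= tol for m in suspect_masses)]
    # P2 (data quality): drop features also detected in procedural blanks
    stage = [f for f in stage
             if not any(abs(f["mz"] - b) <= tol for b in blank_mzs)]
    # P3 (chemistry-driven): keep features in a hypothetical mass-defect
    # window typical of polyhalogenated compounds
    return [f for f in stage if -0.12 <= f["mz"] - round(f["mz"]) <= 0.05]

features = [{"mz": 498.930}, {"mz": 231.102}, {"mz": 413.966}]
suspects = [498.930, 231.101, 413.966]   # hypothetical suspect list
blanks   = [231.101]                     # masses seen in procedural blanks
prioritized = stepwise_prioritize(features, suspects, blanks)
print([f["mz"] for f in prioritized])  # [498.93, 413.966]
```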

Computational and Annotation Strategies

Advanced Spectral Annotation and Matching

Mass spectral matching forms the foundation of compound annotation in NTA workflows, but traditional approaches suffer from significant limitations in accuracy and coverage. While library matching can identify compounds using authentic reference standards (Metabolomics Standards Initiative [MSI] level 1 identification), approximately 95% of measured spectra lack corresponding reference entries in databases, creating substantial annotation gaps [53]. This limitation has spurred development of more flexible spectral matching approaches that can tolerate analytical differences between experimental conditions and reference databases, though these introduce new challenges in distinguishing correct from incorrect matches [53].

Molecular networking and network annotation propagation (NAP) represent significant advances in computational metabolomics that address annotation ambiguities by grouping molecules of likely high chemical similarity based on their MS/MS spectra [53]. This approach allows propagation of chemical identities from confidently annotated molecules to structurally related unknowns within the same spectral network, effectively expanding annotation coverage across chemically related compound families. As noted in recent reviews, "identified (or annotated) molecules allow the propagation and use of this chemical identity to improve the annotation of other unidentified or unannotated members of this metabolite group or molecular family" [53]. This strategy is particularly valuable for characterizing novel transformation products or metabolic derivatives that share core structural elements with known compounds.

Machine Learning and Prediction-Based Approaches

Machine learning (ML) approaches are redefining the potential of NTA by identifying latent patterns within high-dimensional data that traditional statistical methods often miss. ML classifiers such as Support Vector Classifier (SVC), Logistic Regression (LR), and Random Forest (RF) have demonstrated impressive performance in source tracking applications, with balanced accuracy ranging from 85.5% to 99.5% for classifying per- and polyfluoroalkyl substances (PFAS) across different contamination sources [27]. These pattern recognition capabilities make ML particularly valuable for discriminating between true signals and false positives based on subtle spectral and chromatographic features that may not be apparent through manual inspection.

Prediction-based prioritization represents another powerful approach for managing annotation uncertainties, using quantitative structure-property relationships (QSPR) and machine learning models to estimate risk parameters even when complete structural identification remains unresolved [45] [9]. Tools such as MS2Quant predict concentrations directly from MS/MS spectra, while MS2Tox estimates toxicity parameters (e.g., LC50) from fragmentation patterns [9]. These predictive approaches enable calculation of risk quotients (PEC/PNEC - Predicted Environmental Concentration vs. Predicted No Effect Concentration), providing a scientifically defensible basis for prioritizing features with potential high environmental or health impacts despite incomplete identification [9].

Integrated Computational Workflow

The integration of diverse computational strategies into a cohesive workflow significantly enhances the reliability of NTA annotations. The INTERPRET NTA platform developed by the US Environmental Protection Agency exemplifies this integrated approach, combining chemical metadata from the AMOS database, predicted spectra for approximately 1.2 million chemical substances from the DSSTox database, and hazard values from the Cheminformatics Hazard Module (CHM) to support defensible review and interpretation of NTA results [8]. Validation studies demonstrated that known chemicals showed higher values for metadata, MS2, and hazard scores in 99.0%, 80.5%, and 92.0% of cases, respectively, compared to false positives, providing multiple orthogonal metrics for distinguishing reliable annotations from ambiguous assignments [8].

Workflow summary: raw HRMS data undergo preprocessing (peak picking, alignment, noise filtering) to produce a feature-intensity matrix. Features then pass through spectral annotation (library matching, in-silico prediction) and multi-strategy prioritization (Table 1) to yield a prioritized candidate list. False positive reduction (cross-hit analysis, orthogonal validation, statistical filtering) and orthogonal confirmation (retention time prediction, fragmentation analysis, reference standards) follow, with iterative refinement feeding back into prioritization and annotation before confident identifications are reported.

Diagram 1: Integrated computational workflow for managing false positives and annotation ambiguities in NTA studies. The workflow progresses through three major phases, with iterative refinement mechanisms to improve annotation confidence.

Experimental Protocols for False Positive Reduction

Cross-Hit Analysis for Background Interference Reduction

Background interference represents a significant source of false positives in mass spectrometry-based screening, particularly in high-throughput applications. Traditional approaches that limit background analysis to a narrow window of adjacent samples (e.g., "5 wells before and after" in plate-based assays) may miss distributed interference patterns, leading to false positive identification. A case study using a 250,000-compound library across 96 wells demonstrated that expanding the cross-hit analysis window to encompass the entire plate reduced confirmed hits by 33%, indicating that nearly one-third of initial "hits" were actually false positives resulting from recurring background noise [54].

Table 2: Experimental Protocol for Comprehensive Cross-Hit Analysis

Step | Procedure | Parameters | Quality Control
1. Sample Preparation | Distribute samples across plate wells with randomized controls | 96- or 384-well format, randomized control placement | Document plate layout and control positions
2. Data Acquisition | Acquire MS data using standardized instrument methods | Consistent ionization settings across all wells | Include system suitability tests
3. Cross-Hit Analysis | Perform experiment-wide background interference assessment | Expand search window to entire plate, not just adjacent wells | Automated analysis using tools like Virscidian's Analytical Studio
4. Hit Confirmation | Apply consistent thresholding across all samples | Statistical significance above plate-wide background | Manual review of ambiguous signals
5. Validation | Confirm true hits with orthogonal techniques | Retention time alignment, MS/MS fragmentation | Compare to reference standards when available

This protocol emphasizes the critical importance of plate-wide background assessment rather than localized interference evaluation. Automated cross-hit analysis tools dramatically improve reliability by detecting recurring background peaks and applying consistent thresholds objectively across the entire dataset [54]. Implementation of this expanded search window approach conserves valuable resources by reducing false leads and increasing confidence in final hit lists.
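The plate-wide (rather than adjacent-well) background assessment at the heart of this protocol can be sketched as a recurrence filter: any candidate mass appearing in too many wells across the whole plate is treated as background. The 25% recurrence cutoff and binning tolerance are hypothetical, and no specific vendor tool is implied:

```python
from collections import Counter

def plate_wide_hits(well_masses, recurrence_cutoff=0.25, tol=0.01):
    """Reject candidate masses that recur across the whole plate.
    well_masses: {well_id: [candidate m/z values]}.
    recurrence_cutoff and tol are illustrative, not recommended values."""
    n_wells = len(well_masses)
    counts = Counter()
    for masses in well_masses.values():
        # bin each mass by tolerance; count each bin once per well
        for binned in set(round(m / tol) for m in masses):
            counts[binned] += 1
    background = {b for b, c in counts.items()
                  if c / n_wells > recurrence_cutoff}
    return {well: [m for m in masses if round(m / tol) not in background]
            for well, masses in well_masses.items()}

# Hypothetical plate: m/z 301.14 recurs in six wells (background noise),
# while m/z 452.20 appears once (a plausible true hit)
wells = {f"A{i}": [301.14] for i in range(1, 7)}
wells.update({"A7": [452.20], "A8": []})
filtered = plate_wide_hits(wells)
print(filtered["A7"], filtered["A1"])  # [452.2] []
```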

Molecular Ion Targeting for Structurally Similar Compounds

Structurally similar compounds with nearly identical fragmentation patterns present particular challenges for accurate identification, as traditional spectral matching algorithms may struggle to distinguish between closely related analogs. This problem is especially pronounced in forensic and pharmaceutical applications where families of compounds share core structural elements. For example, in samples containing only MDMA, it is common for MDA to be falsely reported as present because both molecules fragment to produce similar core structures with nearly identical higher energy fragmentation profiles [55].

The experimental protocol for addressing this specific false positive mechanism involves targeted verification of molecular ions alongside conventional spectral matching. Research demonstrates that while MDA and MDMA produce nearly identical fragment profiles at higher cone voltages (e.g., 35 V), they are clearly distinguished by their molecular ions in the lower-energy function (m/z 180 for MDA vs. m/z 194 for MDMA) [55]. By implementing a molecular ion target filter that gates identification on the presence of the expected molecular ion, the false positive rate for structurally similar analogs can be significantly reduced.

Implementation Protocol:

  • Acquire multi-function MS data using increasing collision energies (e.g., 15V, 25V, 35V, 50V) to capture both molecular ions and fragment profiles
  • Perform conventional spectral matching across all acquisition functions to identify potential compounds
  • Apply molecular ion verification by confirming the presence of expected molecular ions in the lowest energy function
  • Utilize statistical validation through techniques like Principal Component Analysis (PCA) to visualize separation based on molecular ion differences
  • Implement target ion filtering to automatically reject identifications lacking the appropriate molecular ion confirmation

This approach was experimentally validated using mixtures of MDA and MDMA across concentration ranges from 100% MDA to 100% MDMA. While standard non-target processing produced false positives for MDA in MDMA-only samples, implementation of the molecular ion target method eliminated these false identifications while correctly confirming the presence of MDA when the m/z 180 mass was detected [55].
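The molecular ion target filter can be sketched as a simple gate on the low-energy function, using the m/z 180 vs. 194 distinction reported for MDA and MDMA; the mass tolerance and the observed masses below are illustrative assumptions:

```python
def confirm_identification(candidate, low_energy_mzs, tol=0.5):
    """Gate a fragment-based identification on its molecular ion.
    candidate: dict with 'name' and 'molecular_ion' (expected m/z).
    low_energy_mzs: masses observed in the lowest collision-energy function.
    The 0.5 Da tolerance is an illustrative placeholder."""
    return any(abs(mz - candidate["molecular_ion"]) <= tol
               for mz in low_energy_mzs)

mdma = {"name": "MDMA", "molecular_ion": 194.0}
mda  = {"name": "MDA",  "molecular_ion": 180.0}
observed = [194.0, 163.1, 105.0]   # hypothetical MDMA-only sample

print(confirm_identification(mdma, observed))  # True
print(confirm_identification(mda, observed))   # False: reject the MDA call
```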

Tiered Validation Strategy for NTA Workflows

A comprehensive, tiered validation strategy is essential for establishing confidence in NTA results, particularly when supporting regulatory decisions or health-related assessments. This approach integrates multiple orthogonal validation measures throughout the analytical workflow, progressing from basic analytical confirmation to environmental plausibility assessments [27].

Table 3: Tiered Validation Protocol for NTA Studies

Validation Tier | Assessment Methods | Acceptance Criteria | Documentation
Analytical Confidence | Certified reference materials (CRMs), spectral library matches, retention time prediction | MSI level 1-3 identification based on available standards | Spectral similarity scores, retention time deviations
Model Performance | Cross-validation (e.g., 10-fold), external dataset testing, balanced accuracy metrics | Accuracy >80%, precision appropriate to application | Confusion matrices, performance metrics, overfitting assessment
Environmental Plausibility | Geospatial correlation, known source signatures, chemical fate principles | Consistent with known transport/transformation pathways | Correlation with contextual data, source-receptor relationships

This tiered approach bridges analytical rigor with real-world relevance, ensuring results are both chemically accurate and environmentally meaningful [27]. For ML-based NTA applications, particular emphasis should be placed on model interpretability, with strategies such as feature importance analysis and rational attribution provided to overcome the "black-box" limitations of complex algorithms like deep neural networks [27].

Successful implementation of false positive reduction strategies requires access to specialized computational tools, databases, and analytical resources. The following table summarizes key resources mentioned in the literature that support various aspects of NTA workflow optimization and validation.

Table 4: Essential Research Reagents and Computational Resources for NTA

Resource Category | Specific Tools/Databases | Primary Function | Application in False Positive Reduction
Spectral Databases | NORMAN Suspect List Exchange, US EPA AMOS, PubChemLite | Reference spectra for suspect screening | Provides validated benchmarks for annotation
Chemical Structure Databases | US EPA DSSTox (~1.2 million substances) | Chemical structures and properties | Supports structure-based prediction and prioritization
Hazard Assessment Tools | US EPA Cheminformatics Hazard Module (CHM) | Hazard value calculation | Enables risk-based prioritization of features
Data Processing Platforms | INTERPRET NTA, XCMS, Analytical Studio | Automated data processing and analysis | Reduces manual review errors and subjective bias
Prediction Tools | MS2Quant, MS2Tox | Concentration and toxicity prediction | Supports risk-based prioritization without full identification
Statistical Analysis Software | AnalyzerPro XD, various R/Python packages | Multivariate statistical analysis | Identifies patterns distinguishing true signals from artifacts

These resources collectively enable the implementation of the integrated strategies discussed throughout this guide. Platforms such as INTERPRET NTA are particularly valuable as they combine multiple functionalities—accessing chemical metadata from AMOS, retrieving predicted spectra from DSSTox, and obtaining hazard values from CHM—within a unified interface that supports defensible review and reporting of NTA results [8].

The expanding chemical landscape facing environmental and pharmaceutical researchers necessitates robust, systematic approaches for managing the uncertainties inherent in non-targeted analysis. By implementing the integrated prioritization strategies, computational workflows, and experimental protocols outlined in this technical guide, researchers can significantly enhance the reliability of their NTA results while efficiently focusing resources on the most chemically and toxicologically significant findings. The continued development and standardization of these approaches, particularly through improved benchmarking of computational tools and validation of machine learning applications, will further strengthen the translation of NTA data from exploratory research to actionable environmental and health decisions [53] [27] [14]. As the field progresses, emphasis should remain on creating transparent, defensible workflows that explicitly address uncertainty quantification and provide stakeholders with clear understanding of result limitations and appropriate applications.

Critical QA/QC Measures Throughout the NTA Workflow

Non-targeted analysis (NTA) using high-resolution mass spectrometry (HRMS) represents a powerful, discovery-focused approach for characterizing unknown chemicals in complex samples without a priori knowledge of the sample's chemical content [3]. Unlike traditional targeted methods, NTA aims to capture a broader chemical space, making robust quality assurance and quality control (QA/QC) measures imperative to ensure data quality and consistency [56]. The fundamental challenge in NTA lies in minimizing the risk of losing potential substances of interest (false negatives) while maintaining confidence in identified compounds [56]. Given the relative novelty of comprehensive NTA workflows and the absence of universal benchmarks for "good" performance, researchers must implement specialized QA/QC approaches throughout the entire analytical process [44] [57].

The quality assurance framework for NTA encompasses all practices, benchmarks, and assessments that ensure the reliability of non-targeted analysis [44]. This includes defining the chemical space (the physicochemical property space spanned by detectable and identifiable chemicals), assessing accuracy (closeness of agreement between results and known true values), and determining precision (closeness of agreement between replicated results) [44] [3]. This technical guide details the critical QA/QC measures required throughout the NTA workflow, providing researchers and drug development professionals with a structured approach to quality management in non-targeted applications.

QA/QC Framework and Core Concepts

A systematic QA/QC framework for NTA evaluates performance across two primary domains: data acquisition and data processing/analysis [44]. Within these domains, four key aspects should be assessed: quality, boundary, accuracy, and precision. The following table defines these core concepts as adapted from IUPAC 2019 guidelines for NTA performance assessment [44].

Table 1: Core QA/QC Performance Aspects in Non-Targeted Analysis

Performance Aspect | Definition in NTA Context | Key Assessment Questions
Quality | The QA/QC practices, benchmarks, and assessments for the non-targeted analysis, including adherence to protocols and QC benchmarks [44]. | Were stated QA/QC protocols followed? Were deviations documented and their implications discussed? [44]
Boundary | Describes the chemical and analytical space of the non-targeted analysis, including limitations of sample prep, instrumentation, and data processing [44]. | Did authors discuss how methods impacted the observable chemical space (e.g., extraction recoveries, ionization efficiency)? [44]
Accuracy | The closeness of agreement between NTA results (e.g., mass error, identification) and known true values [44]. | Did authors report method performance in correctly classifying samples or identifying known chemicals? [44]
Precision | The closeness of agreement between results when components of the experiment are replicated, including repeatability and reproducibility [44]. | Did authors communicate the repeatability/reproducibility of key measures across replicates, samples, or batches? [44]

QA/QC in Sample Preparation and Study Design

Defining Study Objectives and Chemical Space

The foundation of effective QA/QC begins with careful study design that clearly defines objectives and scope with respect to targeted analysis, suspect screening analysis (SSA), and true NTA [3]. Suspect screening analysis identifies chemicals by comparison to a predefined list or library, thereby narrowing the study's scope, while true NTA attempts characterization without a predefined library [3]. The chemical space—the physicochemical property space spanned by detectable and identifiable chemicals—is fundamentally shaped by methodological choices made during study design [3] [19]. Key analytical considerations that influence the detectable chemical space include: (1) sample matrix type, (2) extraction solvent and pH, (3) extraction/cleanup media, (4) elution buffers, (5) instrument platform, (6) chromatography conditions, (7) ionization type, and (8) ionization mode [19].

Sample Preparation QA/QC Protocols

Sample preparation requires careful optimization to balance selectivity and sensitivity, aiming to remove interfering components while preserving as many compounds as possible with adequate sensitivity [27]. The following protocols are essential for QA/QC during sample preparation:

  • Use of Balanced Extraction Techniques: Implement purification techniques such as solid-phase extraction (SPE), with broader-range extractions achieved using multi-sorbent strategies (e.g., combining Oasis HLB with ISOLUTE ENV+, Strata WAX, and WCX) [27]. Green techniques like QuEChERS, microwave-assisted extraction (MAE), and supercritical fluid extraction (SFE) can improve efficiency while maintaining comprehensive analyte recovery [27].
  • Quality Control Spikes and Samples: Incorporate QC spikes and samples during study design to enable subsequent performance assessments [44] [3]. This includes analyzing standardized samples with known chemical compositions to evaluate extraction efficiency and method performance [44].
  • Blank Analysis: Implement procedural blanks to assess carryover from samples to blanks and identify potential contamination sources [44]. Features detected in blanks should be filtered from sample data to avoid false positives [23].
  • Replication and Randomization: Adhere to predefined randomization and replication protocols to account for analytical variability and batch effects [44]. This includes assessing analytical batch or other time/storage-related effects on sample integrity [44].
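The blank-analysis step above can be sketched in code. This is a minimal illustration, assuming a hypothetical data layout (feature IDs mapped to abundances) and an illustrative three-fold threshold; real workflows apply this logic within their processing software.

```python
# Minimal sketch of procedural-blank filtering (hypothetical data layout):
# drop features whose sample abundance does not exceed a fold-change
# threshold relative to the blank abundance. Names and threshold are illustrative.

def filter_blank_features(sample_features, blank_features, fold_threshold=3.0):
    """Keep sample features whose abundance exceeds fold_threshold x blank abundance."""
    kept = {}
    for feature_id, abundance in sample_features.items():
        blank_abundance = blank_features.get(feature_id, 0.0)
        if abundance > fold_threshold * blank_abundance:
            kept[feature_id] = abundance
    return kept

sample = {"F001": 9.0e4, "F002": 1.2e3, "F003": 5.0e5}
blanks = {"F001": 1.0e4, "F002": 1.0e3}        # F003 absent from blanks
print(filter_blank_features(sample, blanks))   # F002 fails the 3x rule
```

The fold-change threshold is a common convention, but the appropriate value depends on blank variability and study goals.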

QA/QC During Data Acquisition

Data acquisition QA/QC covers performance with respect to objectives & scope, sample information & preparation, chromatography, and mass spectrometry [44]. Assessments typically rely on data from QC spikes and samples, though certain aspects can also be evaluated with real samples containing unknown chemical constituents [44].

Table 2: Data Acquisition QA/QC Measures and Assessment Protocols

Analytical Domain QA/QC Measures Recommended Assessment Protocols
Chromatography Retention time stability, separation efficiency, polarity range of detected compounds [44]. Monitor deviation of retention time (RT) from expected RT for QC spikes; assess separation of isomeric compounds of interest [44].
Mass Spectrometry Mass accuracy, observed matrix effects, ionization efficiency, mass error range [44]. Evaluate observed mass error range for QC spikes; list monoisotopic masses of known chemicals and describe deviations of observed accurate masses from known values [44].
System Performance Signal intensity stability, carryover assessment, instrument detection limits [44] [56]. Analyze QC reference materials at regular intervals; monitor signal drift over time; implement cleaning procedures to minimize carryover [44] [56].
Overall Precision Variability (repeatability/reproducibility) across replicates, samples, or batches [44]. Calculate relative standard deviation (RSD), standard deviation (SD), or coefficient of variation (CV) for mass error, RT, and peak intensity across replicate analyses [44].
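The precision calculations in the last table row can be expressed directly. The sketch below computes RSD (equivalently, CV) for replicate QC-spike measurements; the replicate values are illustrative.

```python
# Sketch of replicate-precision metrics for a QC spike (illustrative values):
# relative standard deviation (RSD, equivalently CV) of peak intensity and
# retention time across replicate injections.
import statistics

def rsd_percent(values):
    """Relative standard deviation (%) = 100 * sample SD / mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

replicate_intensities = [1.02e6, 0.97e6, 1.05e6, 0.99e6, 1.01e6]
replicate_rts_min = [6.41, 6.43, 6.40, 6.42, 6.41]

print(f"intensity RSD: {rsd_percent(replicate_intensities):.1f}%")
print(f"RT RSD: {rsd_percent(replicate_rts_min):.2f}%")
```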

The data acquisition process for HRMS-based NTA typically uses data-dependent acquisition (DDA; termed information-dependent acquisition, IDA, on some vendor platforms), where the mass analyzer performs accurate mass scans of precursor ions and selects the most abundant ions for successive MS/MS analysis [58]. This cycle repeats throughout the chromatographic run, generating data files containing all precursor ion scans and dependent product ion scans for subsequent analysis [58].

[Workflow diagram: Sample Preparation & Study Design → Data Acquisition → Data Processing & Analysis → Results Validation & Reporting, with QA/QC measures feeding each stage — blank analysis, QC spikes & samples, and replication & randomization into sample preparation; retention time stability, mass accuracy assessment, and matrix effects evaluation into data acquisition; feature alignment & gap filling and confusion matrix analysis into data processing; certified reference materials and environmental plausibility checks into results validation.]

Diagram 1: Comprehensive QA/QC Framework for NTA Workflow. This diagram illustrates the integrated quality assurance and quality control measures throughout the non-targeted analysis pipeline, from sample preparation to final validation.

QA/QC for Data Processing and Analysis

Data processing and analysis QA/QC covers performance with respect to data processing, statistical & chemometric analysis, and annotation & identification [23]. These assessments rely on data from QC spikes and samples, and in some cases, can be evaluated with real samples containing unknown chemical constituents [44].

Data Processing QA/QC

Data processing transforms raw data into meaningful information through steps that intentionally reduce data complexity [23]. Key QA/QC measures for data processing include:

  • Feature Extraction and Alignment: Utilize open-source platforms (e.g., XCMS, MZmine, SIRIUS, MS-DIAL, enviMass, PatRoon) or vendor software (e.g., Thermo Compound Discoverer, Agilent MassHunter) with parameters optimized for environmental samples rather than -omics data [59]. Implement retention time correction, mass-to-charge ratio (m/z) recalibration, and peak matching algorithms to align identical chemical features across different batches [27].
  • Componentization: Group associated MS1 components (isotopologues, adducts) into features represented as tensors of observed retention time, monoisotopic mass, and intensity [23]. Apply shoulder peaks filtering for Fourier transform MS data to remove noise signals [23].
  • Feature Filtering: Implement replicate filters to evaluate feature detection frequency across analytical or extraction replicates, setting non-detects to zero abundance or a minimum threshold for features that do not meet replicate frequency thresholds [23]. Apply abundance thresholding and blank comparison to remove artifacts [23].
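The replicate-filter rule described above can be sketched as follows. The data layout (feature IDs mapped to per-replicate abundances) and the two-thirds detection fraction are illustrative assumptions.

```python
# Sketch of a replicate filter (hypothetical layout): a feature is retained
# only if detected (nonzero abundance) in at least min_fraction of replicates;
# otherwise its abundances are set to zero, per the thresholding described above.

def replicate_filter(replicate_abundances, min_fraction=2/3):
    """replicate_abundances: feature_id -> list of abundances (0 = non-detect)."""
    filtered = {}
    for feature_id, values in replicate_abundances.items():
        detect_fraction = sum(v > 0 for v in values) / len(values)
        if detect_fraction >= min_fraction:
            filtered[feature_id] = values
        else:
            filtered[feature_id] = [0.0] * len(values)
    return filtered

features = {"F010": [5.1e4, 4.8e4, 5.3e4],   # 3/3 detects: kept
            "F011": [2.0e3, 0.0, 0.0]}       # 1/3 detects: zeroed
print(replicate_filter(features))
```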

Statistical and Chemometric Analysis QA/QC

Statistical and chemometric analyses aid summarization, evaluation, and interpretation of processed data [23]. QA/QC measures include:

  • Data Preprocessing: Address data quality through noise filtering, missing value imputation (e.g., k-nearest neighbors), and normalization (e.g., total ion current normalization) to mitigate batch effects [27].
  • Dimensionality Reduction and Pattern Recognition: Apply principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and hierarchical cluster analysis (HCA) to simplify high-dimensional data and group samples by chemical similarity [27] [59].
  • Machine Learning Validation: When using supervised ML models (e.g., Random Forest, Support Vector Classifier), implement cross-validation techniques (e.g., 10-fold) and validate classifiers on independent external datasets to evaluate overfitting risks [27].
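The k-fold cross-validation pattern mentioned above can be sketched with the standard library alone. The "classifier" here is a deliberately toy threshold model standing in for Random Forest or SVC; the fold-splitting and held-out-scoring logic is the part that transfers.

```python
# Minimal sketch of k-fold cross-validation (stdlib only, hypothetical model):
# shuffle sample indices, split into k folds, fit on k-1 folds, score on the
# held-out fold, and collect per-fold accuracies.
import random

def k_fold_scores(samples, labels, fit, score, k=10, seed=0):
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        train = [i for i in indices if i not in held_out]
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        scores.append(score(model, [samples[i] for i in held_out],
                            [labels[i] for i in held_out]))
    return scores

# Toy 1-D threshold "classifier" standing in for a real model:
fit = lambda xs, ys: sum(xs) / len(xs)                      # threshold = mean
score = lambda thr, xs, ys: sum((x > thr) == y for x, y in zip(xs, ys)) / len(xs)
xs = [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4, 0.15, 1.25]
ys = [False, False, False, False, True, True, True, True, False, True]
scores = k_fold_scores(xs, ys, fit, score, k=5)
print(f"mean CV accuracy: {sum(scores)/len(scores):.2f}")
```

In practice this would be paired with external-dataset validation, since cross-validation alone can still overstate performance when the whole dataset shares batch effects.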

Annotation and Identification QA/QC

Annotation attributes properties or molecular characteristics to MS1 features or MS/MS product ions, while identification provides enough evidence to attribute a specific compound to a detected feature [23]. QA/QC measures include:

  • Confidence Level Assignment: Implement confidence-level assignments (Level 1-5) for identifications, with Level 1 representing confirmed structure with authentic standard and Level 5 representing exact mass of interest without structural information [27] [19].
  • Confusion Matrix Analysis: Use confusion matrices to evaluate method performance in correctly classifying samples or identifying known chemicals, calculating performance measures such as accuracy, precision, recall, and F1-score [44].
  • Database and Library Management: Document the chemical space (e.g., Kow, ionizability) and information available (e.g., MS/MS spectra) in selected libraries/databases, acknowledging constraints to chemical space introduced by data analysis approaches [44].

Table 3: Data Processing and Analysis QA/QC Performance Assessment

Processing Stage QA/QC Assessment Performance Metrics
Data Processing Quality Results of QC checks throughout data processing workflow [44]. Detection of QC features/compounds; alignment of features across technical replicates; filtering of blank compounds [44].
Data Analysis Boundary Description of capabilities of data processing and analysis methods [44]. Chemical space of selected library/database; information available in libraries; estimated limits of detection/identification [44].
Annotation Accuracy Ability to correctly classify samples or identify known chemicals [44]. Performance calculations from confusion matrix for samples with known classification or known compounds in QC spikes [44].
Identification Precision Repeatability/reproducibility of performance for QC samples analyzed multiple times [44]. Consistency of correct identification across replicates, samples, or batches; performance measures from confusion matrix [44].

Validation and Reporting Frameworks

Tiered Validation Strategy

A robust, tiered validation strategy ensures the reliability of NTA outputs through multiple verification layers [27]:

  • Analytical Confidence Verification: Use certified reference materials (CRMs) or spectral library matches to confirm compound identities, applying confidence-level assignments (Level 1-5) [27].
  • Model Generalizability Assessment: Validate classifiers on independent external datasets, complemented by cross-validation techniques to evaluate overfitting risks [27].
  • Environmental Plausibility Checks: Correlate model predictions with contextual data, such as geospatial proximity to emission sources or known source-specific chemical markers [27].

Comprehensive Reporting Standards

The NTA Study Reporting Tool (SRT) provides an interdisciplinary framework for comprehensive methods and results reporting, organized by study chronology with scored sections based on reporting quality [57]. Key reporting elements include:

  • Transparent Methodology: Report all software programs/algorithms, method steps, thresholds, and settings used in the data processing workflow [23]. Document data format conversion steps and any potential data losses [23].
  • QA/QC Metric Documentation: Communicate adherence to or deviation from stated QA/QC protocols and benchmarks, discussing implications of any deviations [44].
  • Chemical Space Description: Discuss impacts of sample preparation, chromatographic, and mass spectrometry methods on the observable chemical space, reporting observed limits of detection/identification where assessed [44].

[Validation diagram: NTA Results & Outputs feed three validation tiers — Analytical Confidence Verification (certified reference materials, spectral library matches), Model Generalizability Assessment (cross-validation techniques, external dataset testing), and Environmental Plausibility Checks (geospatial analysis, source-specific chemical markers) — all converging on Validated NTA Results.]

Diagram 2: Tiered Validation Strategy for NTA Results. This diagram outlines the multi-layered approach required to validate non-targeted analysis outputs, incorporating analytical, statistical, and environmental plausibility assessments.

Essential Research Reagents and Software Tools

Implementation of robust QA/QC measures requires specific research reagents and software tools. The following table details essential resources for NTA workflows.

Table 4: Essential Research Reagent Solutions for NTA QA/QC

Reagent/Software Category Specific Examples Function in QA/QC
QC Spikes & Reference Materials Certified Reference Materials (CRMs), Isotope-labeled internal standards, Performance evaluation standards [44] [27] Verify analytical accuracy, monitor instrument performance, assess matrix effects, enable quantification [44] [27]
Sample Preparation Media Multi-sorbent SPE cartridges (Oasis HLB, ISOLUTE ENV+, Strata WAX/WCX), QuEChERS kits [27] Ensure comprehensive analyte recovery, minimize matrix interference, maintain broad chemical coverage [27]
Data Processing Software Open-source: XCMS, MZmine, SIRIUS, MS-DIAL, PatRoon, InSpectra [59] Perform feature detection, alignment, and annotation; enable transparent and customizable processing workflows [59]
Commercial Data Analysis Platforms Thermo Compound Discoverer, Agilent MassHunter [19] [59] Provide integrated workflows for data processing, statistical analysis, and database searching [19] [59]
Spectral Libraries & Databases NIST Mass Spectral Library, mzCloud, MassBank, in-house MS/MS databases [19] [59] Enable compound identification and confirmation through spectral matching [19] [59]

Implementing critical QA/QC measures throughout the NTA workflow is essential for producing reliable, reproducible results in non-targeted analysis using high-resolution mass spectrometry. While universal benchmarks for NTA performance remain elusive, researchers should always conduct performance self-assessments and transparently report findings using shared terminology [44]. The integrated framework presented in this guide—spanning study design, sample preparation, data acquisition, data processing, and validation—provides a structured approach to quality management in NTA studies. As the field continues to evolve, widespread adoption of comprehensive QA/QC protocols and reporting standards will enhance the scientific rigor necessary for utilizing NTA study data in regulatory decision-making and risk assessment contexts [57]. Future advancements will likely focus on establishing more harmonized guidelines, improving QA/QC measures for quantitative NTA, and integrating artificial intelligence/machine learning tools into quality assessment workflows [59].

Performance Assessment Metrics for Qualitative and Quantitative NTA

Non-targeted analysis (NTA) using high-resolution mass spectrometry (HRMS) has emerged as a powerful approach for comprehensively characterizing unknown and unexpected chemicals in complex samples. Unlike targeted methods that focus on predefined analytes, NTA aims to detect and identify a broad range of chemical compounds without prior knowledge of sample composition [5]. This capability makes NTA particularly valuable for discovering emerging contaminants, characterizing complex mixtures, and identifying chemical signatures in environmental, biological, and product samples. However, the very nature of NTA—its openness to detecting the "unknown"—presents significant challenges for assessing and communicating method performance [5].

The establishment of robust performance assessment metrics is critical for advancing NTA from a research technique to a reliable analytical approach that can support regulatory decisions and risk assessments. Without standardized performance metrics, it remains difficult to compare results across different laboratories, instruments, and methods, or to determine whether NTA data are fit for specific purposes [33] [5]. Performance assessment in NTA must address both qualitative aspects (chemical identification and sample classification) and quantitative aspects (concentration estimation), each with distinct challenges and requirements for metrics [44] [5]. This technical guide provides a comprehensive overview of current frameworks, metrics, and experimental approaches for assessing both qualitative and quantitative NTA performance, contextualized within the broader field of HRMS data interpretation research.

Qualitative NTA Performance Metrics

Confidence in Chemical Identification

Qualitative NTA performance primarily concerns the accuracy and reliability of chemical identifications. Unlike targeted analysis where identifications are confirmed using reference standards, NTA often relies on tiered confidence levels for reporting identifications when authentic standards are unavailable. The community has established a framework that classifies identifications into five confidence levels [27]:

  • Level 1: Confirmed Structure - Verified by reference standard
  • Level 2: Probable Structure - Evidence from library spectrum match
  • Level 3: Tentative Candidate - Diagnostic evidence for chemical class
  • Level 4: Unequivocal Molecular Formula - Formula supported by diagnostic evidence
  • Level 5: Exact Mass - m/z value only

The distribution of identifications across these confidence levels serves as a key qualitative performance metric, with higher proportions of Level 1-2 identifications indicating better performance.
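This distribution metric amounts to a simple tally. The sketch below summarizes a hypothetical set of annotation confidence levels and reports the Level 1-2 share; the annotation list is illustrative.

```python
# Sketch: summarizing the confidence-level distribution of annotations
# (hypothetical results) as a qualitative performance indicator.
from collections import Counter

annotations = ["Level 1", "Level 2", "Level 2", "Level 3", "Level 5",
               "Level 2", "Level 1", "Level 4", "Level 3", "Level 2"]
counts = Counter(annotations)
high_confidence = counts["Level 1"] + counts["Level 2"]
print(f"Level 1-2 share: {100 * high_confidence / len(annotations):.0f}%")
```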

Performance Assessment Using Confusion Matrix

For qualitative NTA studies focused on sample classification or chemical detection, performance can be assessed using a confusion matrix approach [44] [5]. This approach evaluates a method's ability to correctly classify samples or identify known chemicals in quality control samples. The confusion matrix enables calculation of several key performance metrics:

Table 1: Performance Metrics Derived from Confusion Matrix Analysis

Metric Calculation Interpretation
Accuracy (TP + TN) / (TP + TN + FP + FN) Overall correctness of classifications/identifications
Precision TP / (TP + FP) Reliability of positive findings
Recall (Sensitivity) TP / (TP + FN) Ability to detect true positives
Specificity TN / (TN + FP) Ability to exclude true negatives
F1-Score 2 × (Precision × Recall) / (Precision + Recall) Balanced measure of precision and recall

TP = True Positive; TN = True Negative; FP = False Positive; FN = False Negative

These metrics are particularly valuable for assessing performance in studies involving sample classification (e.g., distinguishing contaminated vs. clean samples) or for evaluating detection capabilities using spiked quality control samples with known chemical composition [5].
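The formulas in Table 1 translate directly into code. The confusion-matrix counts below are illustrative values for a spiked-QC detection exercise.

```python
# The Table 1 formulas computed from confusion-matrix counts
# (illustrative counts; TP/TN/FP/FN as defined in the text).

def confusion_metrics(tp, tn, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }

metrics = confusion_metrics(tp=45, tn=40, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```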

Data Quality Metrics

Assessment of data acquisition quality provides fundamental metrics for qualitative NTA performance. These metrics evaluate the technical performance of the instrumental analysis and help identify potential issues that could affect chemical identifications [44]:

  • Mass Accuracy: Deviation between measured and theoretical m/z values, typically reported in parts per million (ppm)
  • Retention Time Stability: Precision of retention time measurements across replicates and batches
  • Peak Intensity Precision: Repeatability of peak area or height measurements
  • Carryover Assessment: Evaluation of contamination between samples
  • Blank Contamination: Assessment of interfering compounds present in procedural blanks

These metrics are typically evaluated using quality control samples analyzed throughout the analytical sequence and compared against pre-established benchmarks where available [44].
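Mass accuracy in ppm, the first metric listed above, is computed as follows. Both the theoretical m/z and the measured values here are hypothetical numbers chosen for illustration.

```python
# Sketch: mass accuracy in ppm for a QC compound. All m/z values below
# are hypothetical, chosen only to illustrate the calculation.

def mass_error_ppm(measured_mz, theoretical_mz):
    return 1e6 * (measured_mz - theoretical_mz) / theoretical_mz

theoretical = 498.9302  # hypothetical theoretical m/z of a QC spike ion
for measured in (498.9295, 498.9306, 498.9311):
    print(f"{measured:.4f}: {mass_error_ppm(measured, theoretical):+.2f} ppm")
```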

Quantitative NTA (qNTA) Performance Metrics

Framework for qNTA Performance Assessment

Quantitative non-targeted analysis (qNTA) aims to provide concentration estimates for chemicals detected in NTA without requiring compound-specific calibration [60]. This represents a significant advancement beyond purely qualitative applications, but introduces additional challenges for performance assessment. A recently proposed framework for qNTA performance evaluation focuses on three key aspects: accuracy, uncertainty, and reliability [60].

Table 2: Core Performance Metrics for Quantitative NTA

Performance Aspect Definition Calculation Approach
Accuracy Closeness of agreement between estimated and true concentration Ratio of predicted to true concentration or relative error
Uncertainty Range within which the true value is expected to lie 95% inverse confidence intervals from bootstrap approaches
Reliability Proportion of cases where confidence intervals contain true values Percentage of predictions where true concentration falls within confidence bounds

This framework recognizes that qNTA approaches inherently exhibit higher uncertainty compared to targeted methods, and provides standardized metrics for communicating this uncertainty to stakeholders [60].

Performance Comparison Between Targeted and qNTA Approaches

Research comparing quantitative performance across methodological approaches reveals important patterns in qNTA capabilities. A recent study examining PFAS quantification found that the most generalizable qNTA approach (using "global" surrogates) showed decreased accuracy by a factor of ~4, increased uncertainty by a factor of ~1000, and decreased reliability by ~5% on average compared to a benchmark targeted approach using matched calibration curves and internal standard correction [60].

These performance differences highlight both the current limitations and potential utility of qNTA. While qNTA cannot match the performance of optimized targeted methods for specific compounds, it provides valuable semi-quantitative estimates that support preliminary risk assessments and prioritization decisions, particularly for chemicals lacking reference standards [60].

Surrogate Selection Strategies for qNTA

The accuracy of qNTA concentration estimates depends heavily on the selection of appropriate calibration surrogates. Different surrogate selection strategies yield different performance characteristics [60]:

  • Global Surrogates: Using all available calibration chemicals; provides greater generality but higher uncertainty
  • Expert-Selected Surrogates: Using a limited set of surrogates chosen based on chemical similarity; can improve accuracy but may reduce reliability
  • Model-Based Prediction: Using predicted ionization efficiency based on chemical structure; requires validation with representative surrogates

The choice among these approaches involves trade-offs between accuracy, uncertainty, and applicability domains that must be considered when designing qNTA studies and interpreting their results [60].
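The global-surrogate strategy can be sketched numerically: apply the spread of surrogate response factors to the unknown's peak area to obtain a central estimate and crude bounds. The response factors and peak area below are illustrative; published qNTA workflows derive the interval from bootstrap percentiles rather than the raw extremes used here.

```python
# Sketch of a "global surrogate" qNTA estimate (illustrative response factors):
# the unknown's concentration range follows from applying the distribution of
# surrogate response factors (RF = peak area per ng/mL) to its peak area.
import statistics

surrogate_rfs = [2.1e4, 5.5e4, 8.0e3, 3.3e4, 1.2e5, 6.7e4, 4.4e4]  # area per ng/mL
unknown_area = 2.6e6

estimates = sorted(unknown_area / rf for rf in surrogate_rfs)
median_est = statistics.median(estimates)
# Crude bounds from the extreme surrogates; a bootstrap percentile interval
# would be used in practice, as described in the text.
print(f"estimate: {median_est:.1f} ng/mL "
      f"(range {estimates[0]:.1f}-{estimates[-1]:.1f} ng/mL)")
```

The wide range relative to the median illustrates why global surrogates maximize generality at the cost of uncertainty.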

Experimental Protocols for Performance Assessment

QC Spike Preparation and Analysis

A fundamental approach for assessing both qualitative and quantitative NTA performance involves analysis of quality control (QC) samples containing known chemicals at known concentrations [44] [5]. These samples are typically analyzed at regular intervals throughout the analytical sequence to monitor performance over time. The recommended protocol includes:

  • Select a diverse set of reference compounds representing different chemical classes, physicochemical properties, and concentration levels relevant to the study
  • Prepare QC samples in a clean matrix similar to study samples, with multiple concentration levels spanning the expected range
  • Analyze QC samples at the beginning of the sequence and after every 5-10 study samples
  • Include method blanks to assess background contamination and detection limits
  • Calculate performance metrics for each QC compound, including detection frequency, mass error, retention time stability, and quantitative accuracy (if applicable)

This approach provides direct assessment of method capabilities and limitations for specific chemical classes and concentration ranges [44].

Interlaboratory Comparison Studies

Interlaboratory comparisons provide critical data on reproducibility and standardization of NTA methods [61]. While formal interlaboratory studies for NTA are still emerging, existing models from related fields provide guidance:

  • Distribute standardized samples to multiple participating laboratories
  • Provide core protocols while allowing laboratories to also apply their own methods
  • Collect and analyze data using standardized metrics and reporting formats
  • Evaluate both consensus results and between-laboratory variability

Such studies help identify major sources of variability and establish performance benchmarks for the community [61].

Data Processing and Analysis QA/QC

Performance assessment must extend to data processing and analysis steps, which can introduce significant variability in NTA results [44]. Recommended approaches include:

  • Process QC samples alongside study samples using identical parameters
  • Evaluate feature detection sensitivity using known spiked compounds
  • Assess alignment accuracy for retention time and m/z across samples
  • Verify compound identification against expected results in QC samples
  • Document all processing parameters and software versions

These measures help ensure that data processing steps do not inadvertently introduce errors or biases that affect study conclusions [44].

Workflow Visualization for NTA Performance Assessment

[Workflow diagram: Define Study Objectives → Sample Preparation with QC Spikes → HRMS Data Acquisition → Data Processing & Feature Detection, branching into Qualitative Assessment (confidence level distribution; identification accuracy via confusion matrix; mass accuracy and RT stability) and Quantitative Assessment (accuracy, predicted vs. true; uncertainty, confidence intervals; reliability, coverage probability), both converging on a Performance Metrics Report.]

NTA Performance Assessment Workflow

This workflow illustrates the integrated approach to assessing both qualitative and quantitative NTA performance, highlighting the key metrics at each stage and their relationship to overall study objectives.

qNTA Experimental Design and Metrics

[Methodology diagram: Calibration Surrogate Selection branches into global surrogates (all available chemicals), expert-selected surrogates (chemically similar), and model-based prediction (ionization efficiency); each feeds Sample Analysis & Response Factor Calculation, then Performance Metrics Calculation and comparison against targeted analysis with matched calibration (typical outcomes: ~4× lower accuracy, ~1000× higher uncertainty, ~5% lower reliability).]

qNTA Methodology and Performance

This diagram illustrates the experimental approaches for quantitative NTA and typical performance outcomes relative to targeted analysis, highlighting the trade-offs between different surrogate selection strategies.

Essential Research Reagent Solutions

Table 3: Key Research Reagents and Materials for NTA Performance Assessment

Reagent/Material Function in Performance Assessment Application Examples
QC Standard Mixtures Benchmarking detection capabilities and quantitative performance Community-defined mixtures for interlaboratory comparisons [33]
Stable Isotope-Labeled Standards Internal standards for quantitative accuracy assessment Isotope dilution methods for qNTA [60]
Matrix-Matched Calibrants Evaluation of matrix effects on qualitative and quantitative performance Studying quantitative bias in different environmental matrices [33]
Reference Materials Ground truth for method validation and accuracy assessment Certified Reference Materials (CRMs) for specific sample types [27]
Retention Time Index Standards Chromatographic performance monitoring and alignment Homologous series of compounds for retention time calibration
Ionization Efficiency Standards Calibration of quantitative response factors Compounds for predicting ionization efficiency in qNTA [60]

These research reagents form the foundation for systematic assessment of NTA method performance, enabling laboratories to benchmark their capabilities and identify areas for improvement.

The establishment of robust performance assessment metrics for both qualitative and quantitative NTA represents a critical step toward broader adoption and application of these powerful techniques. While significant progress has been made in developing frameworks and metrics, particularly through community-driven initiatives such as the BP4NTA working group, challenges remain in standardizing approaches and establishing universal benchmarks [33] [44]. The metrics and experimental protocols outlined in this guide provide a foundation for researchers to systematically evaluate their NTA methods, communicate performance limitations transparently, and work toward generating comparable, reliable data across laboratories and applications. As the field continues to evolve, further refinement of these metrics—particularly for quantitative applications—will enhance the utility of NTA for environmental monitoring, exposure assessment, and regulatory decision-making.

Ensuring Confidence: Validation Frameworks and Comparative Analysis

Systematic Validation Strategies for NTA Findings

Non-targeted analysis (NTA) using high-resolution mass spectrometry (HRMS) has emerged as a powerful approach for detecting and identifying unknown and unexpected compounds across diverse sample matrices, including environmental, biological, and food samples [14] [1]. Unlike targeted analytical methods with well-established performance criteria, NTA generates information-rich data with inherent uncertainties that complicate performance assessment and interpretation [14]. The absence of standardized validation procedures has significantly limited the adoption of NTA data by stakeholders and regulatory bodies [14] [62]. Systematic validation strategies are therefore essential to establish confidence in NTA findings and enable their effective utilization in chemical risk assessment, exposure science, and drug development [62].

The fundamental challenge in NTA validation stems from the core difference between targeted and non-targeted approaches. In targeted analysis, performance metrics for selectivity, sensitivity, accuracy, and precision are well-defined, with results considered unambiguously true within defined tolerances [14]. In contrast, NTA data are inherently less certain: reported compound identifications may be incorrect, absent compounds may be missed, and quantitative estimates may lack confidence intervals [14]. This technical guide outlines comprehensive validation strategies to address these challenges, providing researchers with structured approaches to demonstrate the reliability of their NTA findings for scientific and regulatory applications.

NTA Study Objectives and Performance Metrics

Categorizing NTA Study Objectives

Validation approaches for NTA must be aligned with study objectives, as different goals require distinct performance assessments. Research indicates that most NTA projects fall into three primary categories [14]:

Table 1: NTA Study Objectives and Corresponding Validation Focus Areas

| Study Objective | Primary Output | Key Validation Metrics |
| --- | --- | --- |
| Sample Classification | Pattern recognition and group differentiation | Confusion matrix statistics, model repeatability, transferability between instruments |
| Chemical Identification | Compound annotation and structural elucidation | Confidence levels, identification error rates, spectral matching scores |
| Chemical Quantitation | Concentration estimates | Accuracy, precision, linearity, detection limits |

For sample classification, the focus is on correctly categorizing samples based on their chemical profiles, often using multivariate statistical models [14]. Validation here must assess whether classification models are repeatable over time and transferable between instruments or laboratories [14]. For chemical identification, the goal is to accurately annotate and identify unknown compounds, with validation requiring confidence levels and error rate assessments [14]. Chemical quantitation in NTA aims to provide reliable concentration estimates, necessitating traditional analytical validation approaches adapted to NTA's unique challenges [14] [62].

Performance Metrics Framework

A robust validation framework for NTA should incorporate both qualitative and quantitative performance metrics, adapted from targeted analysis but acknowledging the distinct characteristics of NTA [14]. For qualitative studies focusing on sample classification and chemical identification, performance can be assessed using a confusion matrix approach, despite known limitations, such as the difficulty of defining true negatives when the chemical space under study is open-ended [14]. Key metrics include:

  • Sensitivity: Ability to correctly identify true positives
  • Specificity: Ability to correctly reject true negatives
  • Accuracy: Overall correctness of identification or classification
  • Precision: Consistency across repeated measurements

For quantitative NTA studies, performance assessment should include estimation procedures developed for targeted methods, with consideration for additional sources of uncontrolled experimental error [14]. These include accuracy (closeness to true value), precision (measurement reproducibility), and sensitivity (limit of detection) [14]. The specific combination and application of these metrics depend on the NTA study objectives and the required confidence level for the intended application.
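The confusion-matrix metrics listed above can be computed directly from validation counts. The sketch below is a minimal illustration with hypothetical counts, not a prescribed tool; note that "precision" here is the positive predictive value, whereas measurement precision (repeatability) is assessed separately.

```python
# Sketch: qualitative NTA performance from a confusion matrix.
# All counts are hypothetical; substitute results from your own
# validation sample set.

def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the confusion-matrix statistics discussed above."""
    return {
        "sensitivity": tp / (tp + fn),              # true-positive rate
        "specificity": tn / (tn + fp),              # true-negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),                # positive predictive value
    }

# Hypothetical example: 90 correct IDs, 10 missed, 5 false IDs,
# 95 correct rejections.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
```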

Experimental Design for Validation

Quality Assurance and Quality Control (QA/QC) Measures

Implementing comprehensive QA/QC protocols throughout the NTA workflow is fundamental to generating reliable, validated results. While specific QA/QC approaches should be incorporated throughout any NTA workflow to evaluate specific method steps [14], several key measures are particularly critical for validation:

  • System Suitability Checks: Regular analysis of reference standards to verify instrument performance and stability throughout the analytical sequence [63].
  • Blank Controls: Inclusion of method blanks (solvent without sample) to identify and correct for contamination background [23].
  • Quality Control Samples: Pooled quality control samples analyzed at regular intervals to monitor system stability and performance drift [23].
  • Reference Materials: Use of certified reference materials or well-characterized samples with known compositions to assess method accuracy [63].
  • Replication Strategies: Incorporation of technical replicates to assess measurement precision and sample replicates to evaluate overall method reproducibility [23].

The specific QA/QC measures should be documented thoroughly, including acceptance criteria for each parameter. For publications, primary method steps should be noted in the main text, with detailed procedures and settings provided in supporting information [23].

Validation Sample Design

Effective validation requires carefully designed sample sets that challenge the NTA method across its intended application range. Sample design should include:

  • Known Unknowns: Samples spiked with compounds not included in initial method development but relevant to the application domain [62].
  • Complex Matrices: Samples representing the complexity of actual study matrices, including environmental, biological, or product formulations [14].
  • Concentration Series: Samples with varying concentrations of target analytes to establish detection limits and quantitative performance [62].
  • Interference Testing: Samples containing structurally similar compounds or isobars to assess method selectivity [63].

The validation sample set should be of sufficient size and diversity to provide statistical confidence in performance estimates, with recommended minimums of 15-20 representative compounds across the concentration range of interest [63].

Data Processing and Compound Identification Validation

Data Processing Workflow

The transformation of raw HRMS data into meaningful chemical information involves multiple processing steps, each of which must be validated to ensure data integrity [23]. The typical workflow consists of several sequential segments:

Raw HRMS data first undergo data format conversion, then data processing, statistical and chemometric analysis, and finally annotation and identification, yielding validated results. A quality control checkpoint follows each stage: a format-conversion QC, a processing-parameter QC, a statistical-model QC, and an identification-confidence QC.

Diagram 1: NTA Data Processing Workflow

Data format conversion transforms raw data files to usable formats (e.g., .d, .raw to mzML, mzXML) without intentional data interpretation [23]. Data processing then reduces the raw data to meaningful information through steps including retention time alignment, peak detection, adduct and isotopologue grouping, and between-sample alignment [23]. Statistical and chemometric analysis identifies trends and relationships between samples and detections, while the annotation and identification step attributes molecular characteristics and specific compounds to detected features [23]. Each step requires specific quality control checkpoints to ensure proper validation.
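The between-sample alignment step described above amounts to matching features across runs within mass and retention time tolerance windows. The sketch below is a simplified illustration; the tolerances (5 ppm, 0.2 min) and feature lists are hypothetical, and production tools such as XCMS use considerably more sophisticated alignment algorithms.

```python
# Sketch: matching a detected feature to a reference feature list
# using accurate mass (ppm) and retention time windows.

def ppm_error(measured_mz: float, reference_mz: float) -> float:
    """Signed mass error in parts per million."""
    return 1e6 * (measured_mz - reference_mz) / reference_mz

def match_feature(feature, reference_features, mz_tol_ppm=5.0, rt_tol_min=0.2):
    """Return the first reference feature within both tolerance windows."""
    mz, rt = feature
    for ref_mz, ref_rt in reference_features:
        if (abs(ppm_error(mz, ref_mz)) <= mz_tol_ppm
                and abs(rt - ref_rt) <= rt_tol_min):
            return (ref_mz, ref_rt)
    return None

# Hypothetical aligned reference features: (m/z, retention time in min).
reference = [(301.1410, 6.42), (414.3220, 9.87)]

hit = match_feature((301.1421, 6.50), reference)   # ~3.7 ppm, 0.08 min: matches
miss = match_feature((301.1450, 6.50), reference)  # ~13 ppm: mass error too large
```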

Compound Identification Confidence Framework

Establishing confidence in compound identifications is a cornerstone of NTA validation. A standardized framework for communicating identification confidence should be implemented, typically consisting of multiple levels [23] [64]:

Table 2: Confidence Levels for Compound Identification in NTA

| Confidence Level | Required Evidence | Typical Applications |
| --- | --- | --- |
| Level 1 | Confirmed by reference standard (match on retention time, accurate mass, and fragmentation spectrum) | Definitive identification for regulatory decisions |
| Level 2 | Probable structure based on library spectrum match (without RT confirmation) or characteristic fragmentation | Prioritization for further investigation |
| Level 3 | Tentative candidate based on diagnostic evidence (e.g., class-specific fragmentation) | Compound class identification |
| Level 4 | Molecular formula assignment based on accurate mass and isotope pattern | Elemental composition determination |
| Level 5 | Accurate mass of interest (exact mass match to database) | Suspect screening |

The confidence level should be explicitly reported for all compound identifications in NTA studies, along with the specific evidence supporting the assignment [23]. This framework enables appropriate interpretation of identification confidence by stakeholders and facilitates comparison across studies.
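As a minimal illustration (not part of any cited framework's tooling), the hierarchy in Table 2 can be encoded as an evidence ladder in which the strongest available evidence determines the reportable level. The boolean flags below are, of course, a simplification of expert review.

```python
# Sketch: assigning a reporting confidence level from simplified
# evidence flags. Real assignments require expert interpretation.

def confidence_level(accurate_mass: bool, formula: bool,
                     diagnostic_evidence: bool, library_match: bool,
                     standard_confirmed: bool) -> int:
    """Return the most confident level supported by the evidence (1 = best)."""
    if standard_confirmed:
        return 1
    if library_match:
        return 2
    if diagnostic_evidence:
        return 3
    if formula:
        return 4
    if accurate_mass:
        return 5
    raise ValueError("no supporting evidence for this feature")

# A library MS/MS match without reference-standard confirmation -> Level 2.
level = confidence_level(accurate_mass=True, formula=True,
                         diagnostic_evidence=True, library_match=True,
                         standard_confirmed=False)
```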

Optimization of Identification Parameters

Parameter optimization is critical for reliable compound identification. An empirical approach to determining optimal positivity criteria has been demonstrated to achieve high identification efficiency [63]. Key parameters requiring optimization include:

  • Mass Error Tolerance: Maximum acceptable difference between measured and theoretical m/z values, typically < 5 ppm for high-resolution instruments [63].
  • Retention Time Alignment: Acceptable variation in retention time across samples, typically < 0.1-0.3 minutes depending on chromatographic conditions [23].
  • Isotope Pattern Matching: Threshold for matching observed and theoretical isotope abundance patterns, often using dot product or similarity scoring [63].
  • Spectral Library Matching: Criteria for matching experimental MS/MS spectra to reference spectra, typically using combined scoring approaches [63].

One validated approach uses a combined scoring threshold of 70, with 70% weight given to library match and 10% weight each to mass error, retention time error, and isotope pattern difference, achieving identification efficiency of 99.2% [63]. This demonstrates the importance of library matching while incorporating orthogonal identification evidence.
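The weighted scoring described above can be sketched as follows, assuming each line of evidence is first normalized to a 0-100 sub-score; the exact normalization used in the cited work is not reproduced here, and the candidate sub-scores are hypothetical.

```python
# Sketch: combined identification score with 70% weight on library
# match and 10% each on mass error, RT error, and isotope pattern,
# against a positivity threshold of 70 (as described in the text).

WEIGHTS = {"library": 0.70, "mass_error": 0.10, "rt_error": 0.10, "isotope": 0.10}
THRESHOLD = 70.0

def combined_score(subscores: dict) -> float:
    """Weighted sum of normalized (0-100) evidence sub-scores."""
    return sum(WEIGHTS[key] * subscores[key] for key in WEIGHTS)

# Hypothetical candidate sub-scores.
candidate = {"library": 85.0, "mass_error": 95.0, "rt_error": 90.0, "isotope": 80.0}

score = combined_score(candidate)   # 0.7*85 + 0.1*(95 + 90 + 80) = 86.0
is_positive = score >= THRESHOLD
```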

Quantitative NTA Validation

Approaches to Quantitative Validation

While traditionally qualitative, NTA is increasingly being applied to generate quantitative estimates, necessitating robust validation approaches [62]. Several strategies have emerged for quantitative NTA (qNTA) validation:

  • Internal Standardization: Use of stable isotope-labeled analogs or chemical analogs as internal standards to correct for matrix effects and ionization variability [62].
  • Proxy Compound Approaches: Application of response factors from structurally similar compounds with available standards to estimate concentrations [62].
  • Quantification by Ionization Efficiency: Use of models predicting ionization efficiency based on chemical structure to estimate concentrations without authentic standards [62].
  • Background Subtraction: Correction for native compound levels through analysis of procedural blanks [23].

Each approach requires specific validation experiments to establish performance characteristics, including linearity, accuracy, precision, and sensitivity over the concentration range of interest [62].
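Of the strategies listed, the proxy compound approach is the simplest to sketch: a response factor measured for a structurally similar surrogate with an authentic standard is applied to the unknown. The values below are hypothetical, and in practice the surrogate's ionization behavior must be validated as representative.

```python
# Sketch: proxy-compound quantitation for qNTA. Hypothetical values.

def response_factor(area: float, concentration: float) -> float:
    """Peak area per unit concentration for the surrogate standard."""
    return area / concentration

def estimate_concentration(unknown_area: float, rf_proxy: float) -> float:
    """Semi-quantitative estimate using the surrogate's response factor."""
    return unknown_area / rf_proxy

# Surrogate standard at 50 ng/mL gives a peak area of 2.0e6.
rf = response_factor(area=2.0e6, concentration=50.0)

# Unknown with area 8.0e5 -> estimated 20 ng/mL under the proxy assumption.
conc_est = estimate_concentration(unknown_area=8.0e5, rf_proxy=rf)
```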

Validation Parameters for qNTA

Comprehensive validation of quantitative NTA methods should establish performance across multiple parameters, adapted from targeted analysis but with consideration for NTA-specific challenges [14] [62]:

Table 3: Validation Parameters for Quantitative NTA

| Parameter | Assessment Method | Acceptance Criteria |
| --- | --- | --- |
| Linearity | Analysis of calibration standards across concentration range | R² > 0.98, residual plots without pattern |
| Accuracy | Comparison to reference methods or spiked recovery | 70-120% recovery for most matrices |
| Precision | Repeated analysis of quality control samples | < 20% RSD for intermediate precision |
| Limit of Detection | Signal-to-noise approach or statistical methods | Sufficient for intended application |
| Matrix Effects | Comparison of solvent vs. matrix-matched standards | Consistent ionization suppression/enhancement |
| Carryover | Analysis of blanks after high-concentration samples | < 20% of LOD |

Establishing these parameters provides stakeholders with confidence in quantitative estimates derived from NTA data, enabling their use in risk assessment and decision-making contexts [62].
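The linearity criterion in Table 3 can be checked with ordinary least squares on a calibration series. The sketch below computes R² from first principles; the calibration data are hypothetical, and residual plots should still be inspected visually for patterns.

```python
# Sketch: checking the R^2 > 0.98 linearity criterion from Table 3.

def r_squared(x, y):
    """Coefficient of determination for a simple linear fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical calibration series: concentration (ng/mL) vs. peak area.
conc = [1.0, 5.0, 10.0, 50.0, 100.0]
area = [1.1e4, 5.2e4, 9.8e4, 5.1e5, 1.0e6]

r2 = r_squared(conc, area)
linearity_ok = r2 > 0.98
```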

Advanced Validation Techniques

Machine Learning-Assisted Validation

Machine learning (ML) approaches offer promising avenues for enhancing NTA validation through improved chemical structure identification, advanced quantification methods, and enhanced toxicity prediction capabilities [1]. ML applications in NTA validation include:

  • Spectral Prediction: Models that predict MS/MS spectra from chemical structures, enabling comparison with experimental data for identification validation [1] [64].
  • Retention Time Prediction: Algorithms that predict chromatographic retention based on chemical structure, providing orthogonal confirmation of identifications [1].
  • Quality Assessment: Automated evaluation of spectral quality and identification confidence, reducing subjective manual review [1].
  • Error Detection: Identification of systematic errors or anomalies in data processing that may affect result validity [1].

While ML-assisted validation shows significant promise, challenges remain in refining these tools for complex mixtures and improving inter-laboratory validation [1].

Database Infrastructure for Validation

Robust database infrastructure supports validation through curated reference data and standardized data exchange formats [65]. Key developments include:

  • Reference Spectral Libraries: Curated databases of high-quality mass spectra with attached metadata on experimental conditions [65].
  • Standardized Metadata: Structured information about sample provenance, analytical methods, and processing parameters [65].
  • FAIR Data Principles: Implementation of Findable, Accessible, Interoperable, and Reusable data practices to enhance validation transparency [65].
  • Quality Control Metrics: Integration of quality assessment results directly with spectral data [65].

Tools such as the Database Infrastructure for Mass Spectrometry (DIMSpec) provide open-source toolkits for creating portable databases with comprehensive metadata, supporting validation through contextual data preservation [65].

Implementation and Reporting

Standardized Reporting Protocols

Comprehensive reporting of validation parameters is essential for interpreting NTA results and assessing their reliability. Minimum reporting standards should include:

  • Data Processing Parameters: All software programs, algorithms, method steps, thresholds, and settings used in data processing [23].
  • Identification Criteria: Specific thresholds and scoring approaches used for compound identification, including mass error, retention time tolerance, and spectral matching criteria [63].
  • Confidence Documentation: Explicit confidence levels for all compound identifications with supporting evidence [23].
  • Quality Control Results: System suitability data, blank contamination profiles, and QC sample performance [23].
  • Quantitative Approach: Detailed description of quantification method with associated validation data [62].

Adherence to standardized reporting frameworks facilitates comparison across studies and builds confidence in NTA findings among stakeholders [14].

Implementation of systematic NTA validation requires specific resources and tools. The following table details key solutions and their functions in the validation process:

Table 4: Essential Research Resources for NTA Validation

| Resource Category | Specific Tools/Resources | Function in Validation |
| --- | --- | --- |
| Reference Spectral Libraries | NIST Mass Spectral Library, MassBank, DIMSpec databases [65] | Provide reference spectra for identification confirmation and confidence assessment |
| Chemical Databases | EPA CompTox Dashboard, PubChem, CAS [62] | Supply structural information and metadata for identification and hazard context |
| Data Processing Tools | patRoon, RMassBank, XCMS [65] | Enable reproducible data processing with documented parameters |
| QC Materials | Custom synthetic opioid libraries [66], PFAS standards [65] | Provide reference materials for method validation and performance monitoring |
| Statistical Packages | R-based tools, Python libraries for chemometrics [1] | Support statistical validation and model performance assessment |
| FAIR Data Infrastructure | DIMSpec toolkit, mzML format [65] | Ensure data preservation, sharing, and reusability for validation |

Systematic validation of NTA findings is essential for building confidence in results and enabling application in regulatory and decision-making contexts. This guide has outlined comprehensive strategies spanning experimental design, data processing, compound identification, quantitative analysis, and reporting. By implementing these structured approaches, researchers can demonstrate the reliability of their NTA findings, address inherent uncertainties in non-targeted approaches, and facilitate broader adoption of NTA data by stakeholders. As the field continues to evolve, standardization of validation practices across laboratories will further enhance the comparability and interpretability of NTA results, ultimately strengthening their utility in chemical safety assessment and public health protection.

Confidence Level Frameworks for Chemical Identification

Non-targeted analysis (NTA) using high-resolution mass spectrometry (HRMS) has emerged as a powerful discovery-based approach for identifying unknown or unsuspected chemicals in complex samples across various fields, including environmental monitoring, food safety, and exposomics [19]. Unlike targeted methods that focus on predefined analytes, NTA aims to characterize sample composition without a priori knowledge of chemical content, making it particularly valuable for detecting emerging contaminants and identifying previously unrecognized chemical exposures [67]. However, the complexity of NTA workflows and the challenge of assigning identities to detected chemical features have highlighted the critical need for standardized confidence frameworks to ensure transparent and reproducible reporting of chemical identifications.

Confidence level frameworks provide systematic approaches for communicating the certainty associated with chemical identifications in NTA studies. These frameworks establish standardized criteria based on the type and quality of analytical data supporting each identification, enabling researchers, reviewers, and end-users to appropriately interpret and trust reported findings. The implementation of such frameworks addresses fundamental challenges in NTA, including variable data quality across laboratories, subjective interpretation of analytical results, and insufficient reporting of identification evidence [68]. This technical guide examines the established confidence frameworks, detailed methodologies for achieving different confidence levels, and practical implementation strategies within the broader context of NTA research using high-resolution mass spectrometry.

Established Confidence Level Frameworks

The Schymanski Confidence Scale

The most widely adopted confidence framework in NTA is the hierarchical scale proposed by Schymanski et al., which categorizes chemical identifications into five distinct levels based on the strength of supporting evidence [67]. This system has become the community standard for reporting identification confidence, with specific criteria that must be met at each level:

  • Level 1: Confirmed Structure - Requires matching of both retention time and mass spectral data (including fragmentation spectrum) with an authentic standard analyzed under identical analytical conditions [67]. This level provides the highest confidence and is considered definitive confirmation.

  • Level 2: Probable Structure - Demonstrates concordance of experimental MS/MS spectra with literature or library spectral data, but lacks confirmation with an authentic standard [67]. While highly confident, this level acknowledges potential for isomeric compounds to produce similar fragmentation patterns.

  • Level 3: Tentative Candidate - Assigns a specific structure based on diagnostic evidence such as spectral similarity to related compounds, characteristic fragmentation patterns, or class-specific retention time behavior, but with insufficient evidence for definitive identification [67].

  • Level 4: Unequivocal Molecular Formula - Determines a specific molecular formula based on accurate mass measurement and isotope pattern matching, but cannot distinguish between isomeric structures [67].

  • Level 5: Exact Mass of Interest - Identifies a feature of interest based solely on accurate mass measurement without additional supporting evidence for structure or molecular formula [67].

Table 1: Schymanski Confidence Levels for Chemical Identification in NTA

| Confidence Level | Identification Type | Required Evidence | Typical Applications |
| --- | --- | --- | --- |
| Level 1 | Confirmed structure | Match to authentic standard for RT and MS/MS | Definitive identification for risk assessment |
| Level 2 | Probable structure | Library spectrum match without reference standard | Compound identification when standards unavailable |
| Level 3 | Tentative candidate | Diagnostic evidence (class-specific fragments, etc.) | Structural class identification |
| Level 4 | Unequivocal molecular formula | Accurate mass + isotope pattern | Formula assignment for unknown prioritization |
| Level 5 | Exact mass feature | Accurate mass only | Feature detection and prioritization |

Complementary Confidence Assessment Approaches

Beyond the Schymanski scale, additional approaches provide complementary confidence metrics for specific aspects of NTA. Kilgour et al. developed confidence metrics for automatic peak assignment that combine mass accuracy, relative ion abundance, and ring-plus-double-bond equivalents with novel metrics based on the interconnectivity of mass difference networks and the confidence of initial library matches [69]. These metrics help analysts determine appropriate degrees of trust in automated elemental formula assignments, particularly for complex natural organic materials where manual verification is impractical.

The NTA Study Reporting Tool (SRT) provides a comprehensive framework for assessing reporting quality across all aspects of NTA studies, including chemical identification protocols [68]. Developed by the Benchmarking and Publications for Non-Targeted Analysis (BP4NTA) working group, the SRT structures reporting requirements according to study chronology and emphasizes harmonization between methods and results sections, with specific sub-categories for annotation and identification methods and corresponding identification outputs [68].

Experimental Protocols for Achieving Different Confidence Levels

Level 1 Identification: Confirmed Structure Protocol

Achieving Level 1 confirmation requires analysis of authentic chemical standards under identical analytical conditions as the sample. The detailed protocol includes:

  • Reference Standard Acquisition: Obtain certified reference materials or purified standards of candidate compounds from commercial suppliers or through synthesis.

  • Chromatographic Alignment: Analyze standards using identical chromatographic conditions (column, mobile phase composition, gradient, flow rate, temperature) as experimental samples.

  • Retention Time Matching: Compare sample and standard retention times using acceptable tolerance windows (typically ±0.1-0.2 minutes for LC, ±5 seconds for GC), accounting for proper retention time indexing if needed.

  • Mass Spectral Verification: Confirm match between experimental and standard MS and MS/MS spectra using similarity scoring (e.g., dot product ≥0.8 for MS/MS) with appropriate mass tolerance (≤5 ppm for HRMS).

  • Ion Ratio Consistency: Verify consistency in adduct formation, isotope patterns, and fragment ion ratios between sample and standard analyses.

This protocol was successfully implemented in a study of semi-volatile organic compounds in indoor dust, where 128 compounds were identified at confidence level 1 or 2 in standard reference material dust, with verification using authentic standards [70].
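The mass spectral verification step above relies on a dot-product (cosine) similarity between sample and standard spectra. The sketch below is a simplified version in which fragment m/z values are assumed to bin exactly; real implementations bin within a mass tolerance and often apply intensity weighting. The spectra are hypothetical.

```python
import math

def dot_product_similarity(spec_a: dict, spec_b: dict) -> float:
    """Cosine similarity of two spectra given as {fragment m/z: intensity}."""
    shared = set(spec_a) & set(spec_b)
    num = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    return num / (norm_a * norm_b)

# Hypothetical MS/MS spectra (already aligned to shared fragment bins).
sample = {91.0542: 100.0, 119.0491: 45.0, 147.0441: 20.0}
standard = {91.0542: 100.0, 119.0491: 50.0, 147.0441: 15.0}

score = dot_product_similarity(sample, standard)
match = score >= 0.8  # the criterion cited in the Level 1 protocol
```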

Level 2 Identification: Probable Structure Protocol

When authentic standards are unavailable, Level 2 identification can be achieved through comprehensive spectral matching:

  • High-Quality MS/MS Acquisition: Obtain fragmentation spectra using collision energies optimized for structural elucidation, typically employing data-dependent acquisition or inclusion lists.

  • Spectral Library Searching: Compare experimental MS/MS spectra against comprehensive reference databases such as mzCloud, MassBank, NIST, or GNPS.

  • Spectral Match Evaluation: Apply similarity scoring algorithms and manual spectral interpretation to confirm key fragment ions and neutral losses match proposed structure.

  • Orthogonal Evidence Integration: Support spectral matches with additional evidence including accurate mass measurement (≤5 ppm error), isotope pattern matching (≤10 mSigma fit), and retention time prediction when available.

In rapid response scenarios for unidentified chemicals, this approach has correctly assigned structures to more than half of features investigated, achieving Level 2 or 3 identifications sufficient for initial hazard assessment [67].

Level 3-4 Identification: Tentative Candidate and Molecular Formula Protocols

For unknown compounds without reference spectra or standards, systematic structure elucidation approaches include:

  • Molecular Formula Determination: Use accurate mass measurement (typically ≤3 ppm error) combined with isotope abundance pattern matching to assign molecular formulas.

  • In Silico Fragmentation: Employ computational tools such as CFM-ID, MetFrag, or SIRIUS to generate predicted fragments for candidate structures and compare with experimental MS/MS spectra.

  • Retention Time Prediction: Develop quantitative structure-retention relationship (QSRR) models using machine learning to predict retention behavior for candidate structures [71]. Recent applications have achieved prediction errors below 0.5 minutes, significantly improving confidence in tentative identifications [71].

  • Class-Specific Diagnostic Evidence: Identify characteristic fragments, neutral losses, or mass defects indicative of specific chemical classes (e.g., -CF2- units for PFAS, halogen patterns for brominated/chlorinated compounds).
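The molecular formula determination step above screens candidates against a ppm tolerance on the measured m/z. The sketch below illustrates this filter; the candidate list and monoisotopic masses are illustrative values for [M+H]+ ions and should be recomputed from authoritative atomic masses in practice.

```python
# Sketch: filtering candidate formulas by the <= 3 ppm mass-error
# criterion described in the text. Candidate masses are illustrative.

def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return 1e6 * (measured - theoretical) / theoretical

def formula_candidates(measured_mz, candidates, tol_ppm=3.0):
    """Keep candidates whose theoretical m/z lies within the tolerance."""
    return [name for name, mz in candidates
            if abs(ppm_error(measured_mz, mz)) <= tol_ppm]

# Two hypothetical candidates for a measured ion at m/z 181.0498.
candidates = [("C9H8O4 [M+H]+", 181.0495),
              ("C6H12O6 [M+H]+", 181.0707)]

hits = formula_candidates(181.0498, candidates)  # only the ~1.7 ppm match survives
```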

Workflow Visualization and Implementation

The complete workflow for implementing confidence frameworks in NTA studies involves sequential steps from initial detection to final confirmation, with decision points at each stage determining the achievable confidence level.

The workflow proceeds from MS feature detection through MS1 data analysis (accurate mass and isotope patterns), molecular formula assignment, MS/MS acquisition, and spectral library searching, ending with reference standard analysis. Each stage defines the highest confidence level attainable with the evidence in hand: accurate mass alone supports Level 5; an unequivocal molecular formula, Level 4; diagnostic evidence from the library search, Level 3; a probable-structure library match, Level 2; and confirmation against a reference standard, Level 1.

Confidence Level Determination Workflow

NTA Study Reporting Framework

The NTA Study Reporting Tool provides a structured approach for comprehensive documentation of confidence-related methodologies and results throughout the research process.

The SRT organizes reporting chronologically into a Methods section covering study design, data acquisition, data processing and analysis, and annotation and identification (where the confidence framework is documented), and a Results section covering data outputs, identification and confidence levels, and QA/QC metrics, with the annotation and identification methods mapped directly to the reported confidence levels.

NTA Study Reporting Structure

The Scientist's Toolkit: Essential Research Reagents and Materials

Successful implementation of confidence frameworks requires specific reagents, reference materials, and computational tools throughout the NTA workflow.

Table 2: Essential Research Reagents and Materials for Confidence Framework Implementation

| Category | Specific Items | Function in Confidence Assessment |
| --- | --- | --- |
| Reference Standards | Authentic chemical standards, certified reference materials (e.g., NIST SRM 2585 dust) | Level 1 confirmation via retention time and spectral matching [70] |
| Chromatography | LC columns (C18, HILIC, etc.), GC columns (non-polar, mid-polar), mobile phase additives | Compound separation and reproducible retention behavior for identification |
| Ionization Sources | ESI, APCI, APPI, EI, CI sources for LC- and GC-HRMS | Optimal ionization for different compound classes to maximize detection |
| Spectral Libraries | Commercial (mzCloud, NIST) and public (MassBank, GNPS) databases | Level 2-3 identification via spectral matching and similarity scoring [19] |
| Software Tools | Vendor (Compound Discoverer, MassHunter) and open-source (MS-DIAL, MZmine) platforms | Data processing, feature detection, and identification workflow implementation [19] |
| QA/QC Materials | Internal standards, QC spikes, procedural blanks, pooled quality control samples | Monitoring analytical performance and identifying potential artifacts [68] |

Advanced Applications and Future Directions

Machine Learning and Predictive Modeling

Recent advances in machine learning are enhancing confidence framework implementation through improved prediction capabilities. Quantitative structure-retention relationship (QSRR) models using machine learning algorithms can now predict retention times with errors below 0.5 minutes, providing valuable orthogonal evidence for compound identification [71]. These models establish correlations between molecular descriptors and chromatographic behavior, serving as supplementary evidence for confidence assignment in suspect and non-targeted screening.

Machine learning approaches are also being applied to MS/MS spectrum prediction, false discovery rate estimation, and automated structure annotation, addressing key challenges in NTA confidence assessment [1]. The integration of these computational tools with established confidence frameworks shows promise for reducing manual intervention while improving identification accuracy and reproducibility across laboratories.

Integrated Targeted and Non-Targeted Approaches

Hybrid approaches that integrate targeted and non-targeted analysis provide complementary advantages for comprehensive chemical characterization and confidence assessment. As demonstrated in a study of semi-volatile organic compounds in indoor dust, integrated NTA and TA approaches enable identification and prioritization of a wider range of chemicals while maintaining confidence in the results [70]. This strategy avoids the loss of critical trace compounds that might be missed by NTA alone while providing discovery capabilities beyond traditional targeted methods.

The integrated approach facilitates compound prioritization based on both exposure relevance and potential toxicity, supporting more informed risk-based assessment of identified chemicals [70]. This is particularly valuable in rapid response scenarios where timely and confident identification of unknown stressors is required, such as chemical threat incidents, illicit drug contamination, or accidental industrial spills [67].

Community Standards and Reporting Practices

The development and adoption of community-wide standards represent a critical direction for advancing confidence frameworks in NTA. The NTA Study Reporting Tool provides a living framework for assessing reporting quality, with integration into the BP4NTA website allowing continued evolution as community needs change [68]. Widespread implementation of such tools is expected to improve study design and standardize reporting practices, ultimately leading to broader use and acceptance of NTA data across scientific disciplines and regulatory applications.

Current efforts focus on addressing reporting areas that need immediate improvement, such as analytical sequence documentation and quality assurance/quality control information, which are essential for proper assessment of identification confidence [68]. As these community standards mature, they will enhance the transparency, reproducibility, and reliability of chemical identification confidence assignments in non-targeted analysis.

Non-targeted analysis (NTA) using high-resolution mass spectrometry has emerged as a powerful approach for characterizing the chemical composition of complex samples without prior knowledge of their content. This capability makes NTA invaluable across diverse fields including environmental science, food safety, exposomics, and drug development [68] [20]. Unlike traditional targeted methods that focus on predefined analytes, NTA workflows aim to detect and identify unknown chemicals, classify samples based on chemical profiles, and discover previously unrecognized compounds [68]. The exponential growth of NTA applications, however, has revealed significant challenges in research reproducibility and transparency due to the complexity of methodologies and lack of universal reporting standards [68] [72].

The fundamental challenge facing the NTA community lies in the tremendous diversity of analytical workflows, instrumentation, data processing techniques, and quality assurance practices employed across different laboratories and research domains [68]. This methodological heterogeneity, combined with insufficient reporting of critical experimental details, has hampered the ability to reproduce findings, compare results across studies, and build upon existing research [68] [57]. In response to these challenges, the Benchmarking and Publications for Non-Targeted Analysis (BP4NTA) working group developed the NTA Study Reporting Tool (SRT) as a comprehensive framework to standardize reporting practices and improve research transparency [73] [68] [72].

Genesis and Development Process

The SRT emerged from coordinated efforts by the BP4NTA working group, which formed in 2018 to address critical challenges in NTA research and reporting [20]. Comprising researchers from academic, government, and industrial sectors across North America and Europe, BP4NTA recognized that despite existing guidance on specific NTA elements—most notably for compound identification—no comprehensive reporting framework covered the complete NTA workflow [68] [20]. The development process involved rigorous validation where eleven NTA practitioners with diverse expertise evaluated eight published articles covering environmental, food, and health-based exposomic applications [68]. This evaluation demonstrated that the SRT provided a valid structure for guiding study design, manuscript preparation, and critical assessment of reporting quality [68] [57].

Tool Structure and Organization

The SRT is strategically organized to follow the chronological progression of a typical NTA study, ensuring logical flow and comprehensive coverage of all critical workflow components [68] [74]. The tool divides the research process into two major sections (Methods and Results) containing five categories total, with each category further broken down into specific sub-categories for detailed evaluation [68].

Table: Structure of the NTA Study Reporting Tool

| Section | Category | Sub-categories |
| --- | --- | --- |
| Methods | Study Design | Objectives & Scope; Sample Information & Preparation; QC Spikes & Samples |
| Methods | Data Acquisition | Analytical Sequence; Chromatography; Mass Spectrometry |
| Methods | Data Processing & Analysis | Data Processing; Statistical & Chemometric Analysis; Annotation & Identification |
| Results | Data Outputs | Statistical & Chemometric Outputs; Identification & Confidence Levels |
| Results | QA/QC Metrics | Data Acquisition QA/QC; Data Processing & Analysis QA/QC |

This organized structure enables systematic evaluation of reporting completeness at each stage of the NTA workflow, from initial study design through final quality assurance documentation [68]. Each sub-category includes examples of the specific information that should be reported, making the SRT accessible to both novice and experienced NTA researchers [73] [68].

Scoring System and Assessment Methodology

The SRT employs a hybrid scoring system that combines color-coding with numerical values to facilitate clear and consistent evaluation of reporting quality [68] [72]. The current version uses a 4-level scoring system where reviewers assign ratings based on the completeness of reporting in each sub-category, with space provided for rationales explaining each score assignment [73] [68].

Table: SRT Scoring System and Criteria

| Score | Color | Description |
| --- | --- | --- |
| 3 | Blue | All relevant reporting elements are present |
| 2 | Yellow | Some relevant reporting elements are present, but important information is missing |
| 1 | Red | Most or all relevant reporting elements are missing |
| NA | Gray | Reporting on this topic is outside the study scope |

It is crucial to emphasize that the SRT focuses exclusively on assessing reporting quality, not the scientific merit or quality of the research data itself [68] [72] [74]. This distinction ensures that the tool remains an objective framework for evaluating the transparency and reproducibility of NTA studies, regardless of their specific research goals or methodological approaches [74].
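The Excel version of the SRT aggregates and plots scores from multiple reviewers automatically; for readers who prefer scripting, the same aggregation can be sketched in a few lines of Python. This is a hypothetical illustration, not part of the official tool — the key detail is that NA entries are treated as missing rather than as zeros, so they do not drag down a sub-category's mean.

```python
# Illustrative sketch (not part of the official SRT): summarizing
# multi-reviewer SRT scores per sub-category. Scores use the SRT scale:
# 3 (blue), 2 (yellow), 1 (red), and None for NA (gray).

def aggregate_srt_scores(reviews):
    """Return mean, range, and reviewer count per sub-category,
    excluding NA (None) entries from the statistics."""
    summary = {}
    for subcat, scores in reviews.items():
        numeric = [s for s in scores if s is not None]
        if not numeric:
            summary[subcat] = {"mean": None, "range": None, "n": 0}
        else:
            summary[subcat] = {
                "mean": sum(numeric) / len(numeric),
                "range": (min(numeric), max(numeric)),
                "n": len(numeric),
            }
    return summary

reviews = {
    "Chromatography": [3, 3, 2],
    "Analytical Sequence": [1, 2, 1],
    "QC Spikes & Samples": [None, 2, 2],  # one reviewer marked NA
}
summary = aggregate_srt_scores(reviews)
print(summary["Analytical Sequence"]["mean"])  # ≈ 1.33
```

A low mean with a wide range (reviewers disagreeing) flags a sub-category worth revisiting before submission, which mirrors how the Excel plots are intended to be read.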

Implementation and Practical Application

Access and Format Options

The SRT is freely available to the scientific community through the BP4NTA website in two practical formats designed to accommodate different user preferences and use cases [73] [74]. The Excel version provides interactive functionality with dropdown menus for scoring and built-in plotting capabilities to visualize scores from multiple reviewers, while the PDF version offers a static format that can be easily annotated and shared [73]. Both formats maintain the same core structure and scoring system, ensuring consistency regardless of the chosen format [73].

Application Throughout the Research Lifecycle

The SRT is designed for implementation at multiple stages of the research process, providing maximum benefit to the NTA community [68] [74]. During study design and planning, researchers can use the SRT as a checklist to ensure all critical methodological elements are incorporated into their experimental protocols [68] [72]. When preparing manuscripts or research proposals, authors can perform self-evaluation using the SRT to identify reporting gaps before submission [68]. Evaluation results demonstrated that 72% of author self-assigned scores fell within the range of peer-assigned scores, indicating that SRT use for self-evaluation effectively strengthens reporting practices [57]. For peer review, journal referees and editors can apply the SRT to conduct standardized assessments of manuscript completeness, with the Excel version specifically including plotting functionality to facilitate comparison of multiple reviewer scores [73] [68] [74].

Proper citation practices ensure appropriate credit for the SRT developers and enable tracking of the tool's adoption and impact [73] [74]. The BP4NTA provides specific template language for acknowledging SRT use in publications, with slightly different formats depending on whether the tool was used during manuscript preparation or during peer review [73] [74].

Table: SRT Citation Guidelines for Different Use Cases

| Use Case | Location in Manuscript | Template Language |
| --- | --- | --- |
| Manuscript Preparation | Methods Section | "The NTA Study Reporting Tool (SRT) was used in the preparation of this manuscript (Peter et al., 2021; 10.6084/m9.figshare.19763482 [PDF] or 10.6084/m9.figshare.19763503 [Excel])." |
| Peer Review | Acknowledgements Section | "The NTA Study Reporting Tool (SRT) was used during peer review to document and improve the reporting and transparency of this study (10.1021/acs.analchem.1c02621; 10.6084/m9.figshare.19763482 [PDF] or 10.6084/m9.figshare.19763503 [Excel])." |

Technical Specifications and Workflow Integration

Comprehensive Workflow Visualization

The SRT encompasses the complete NTA workflow, from initial study conception through final reporting of results and quality metrics. The following diagram illustrates the logical relationships between different SRT components and their progression through the research lifecycle:

[Diagram: SRT structure — the Methods section comprises Study Design (Objectives & Scope; Sample Information & Preparation; QC Spikes & Samples), Data Acquisition (Analytical Sequence; Chromatography; Mass Spectrometry), and Data Processing & Analysis (Data Processing; Statistical & Chemometric Analysis; Annotation & Identification); the Results section comprises Data Outputs (Statistical & Chemometric Outputs; Identification & Confidence Levels) and QA/QC Metrics (Data Acquisition QA/QC; Data Processing & Analysis QA/QC).]

Essential Research Reagents and Materials

The SRT recognizes that comprehensive reporting of research reagents and materials is fundamental to experimental reproducibility. The following table details key components that should be documented throughout the NTA workflow:

Table: Essential Research Reagents and Materials for NTA Studies

| Category | Component | Function and Reporting Requirements |
| --- | --- | --- |
| Sample Preparation | QC Spikes & Samples | Internal standards, recovery surrogates, procedural blanks; report compounds, concentrations, and addition points [68] |
| Sample Preparation | Purification Materials | SPE cartridges, filtration devices; report specific sorbents, formats, and processing conditions [27] |
| Data Acquisition | Chromatography System | Separation mechanism; report column type, dimensions, mobile phases, and gradient program [68] |
| Data Acquisition | Mass Spectrometer | Mass analysis; report instrument type, resolution, mass accuracy, and acquisition mode [68] |
| Data Acquisition | Analytical Sequence | Sample randomization; report injection order, QC frequency, and blank placement [68] |
| Data Processing | Reference Spectral Libraries | Compound identification; report specific libraries and versions used [68] [27] |
| Data Processing | Software Platforms | Data processing; report software, algorithms, and parameter settings [68] [27] |

Current Reporting Gaps and Areas for Improvement

The initial validation of the SRT across eight published studies revealed significant disparities in reporting quality across different aspects of NTA workflows [68] [72]. While methods sections generally contained adequate descriptions of chromatography and mass spectrometry parameters, critical information about analytical sequences and quality assurance practices was frequently incomplete or entirely absent [68] [72]. Specifically, the evaluation identified that reporting scores were substantially lower and more variable for QA/QC metrics compared to other sub-categories, highlighting an urgent need for improved documentation of quality control practices across the NTA community [68] [57].

Impact and Future Directions

Adoption in Scientific Publishing and Research

The SRT has already gained traction within the scientific community, with encouraging adoption by journals, researchers, and reviewers [74]. The Journal of Exposure Science and Environmental Epidemiology (JESEE) has incorporated the SRT into its author and reviewer guidelines, particularly for special issues focused on exposomics using non-targeted analysis [74]. This formal integration into the peer review process represents a significant step toward standardizing NTA reporting across the scientific literature [74]. As more researchers and journals adopt the SRT, the tool is expected to drive measurable improvements in the completeness, transparency, and reproducibility of NTA studies [57] [74].

Evolution and Community Feedback

The BP4NTA working group explicitly designed the SRT as a "living framework" that will evolve alongside advancements in NTA research [68] [72]. The digital distribution strategy, with version-controlled files hosted on the BP4NTA website, enables periodic updates based on community feedback, technological developments, and emerging best practices [73] [72]. Researchers are actively encouraged to submit comments, suggestions, and improvement ideas through a dedicated portal on the BP4NTA website, ensuring that the tool remains responsive to the changing needs of the NTA community [73] [72].

The NTA Study Reporting Tool represents a significant advancement in promoting reproducibility and transparency for non-targeted analysis studies using high-resolution mass spectrometry. By providing a comprehensive, standardized framework for evaluating reporting quality across the entire NTA workflow, the SRT addresses critical challenges that have hindered comparison, replication, and utilization of NTA data across research domains. The structured organization, practical scoring system, and flexible implementation options make the SRT accessible to researchers at all experience levels, from novices entering the field to experienced practitioners developing complex methodologies. As adoption increases, the SRT is poised to fundamentally transform reporting practices throughout the NTA community, ultimately enhancing the scientific rigor, reliability, and impact of non-targeted analysis in exposure science, environmental epidemiology, drug development, and related fields.

Comparative Analysis of NTA Software and Computational Tools

Non-targeted analysis (NTA) represents a paradigm shift in analytical chemistry, enabling the comprehensive characterization of chemical composition in complex samples without prior knowledge of their content. Powered primarily by high-resolution mass spectrometry (HRMS), NTA has become an indispensable tool for researchers across diverse fields, including environmental science, drug development, and exposomics [20]. Unlike traditional targeted methods that focus on predefined compounds, NTA aims to capture both known "unknowns" (compounds that exist but are not specifically monitored) and truly novel chemicals, providing a holistic view of the chemical universe within a sample [3]. This capability is particularly valuable for drug development professionals who must understand complex metabolite profiles, identify impurities, and characterize biotransformation products that may elude conventional targeted approaches.

The fundamental challenge in modern NTA lies not in data generation but in computational interpretation. Contemporary HRMS instruments generate immensely complex datasets that require sophisticated software tools for feature extraction, compound identification, and data reduction [27]. The analytical process transforms raw instrument data into chemically meaningful information through a multi-stage workflow encompassing data preprocessing, feature finding, compound annotation, and ultimately, identification and quantification. Each stage introduces specific computational challenges that different software tools address through varied algorithms and processing approaches. A critical study examining the consistency of data processing across different NTA software tools revealed strikingly low concordance, with only approximately 10% of features overlapping between all four major programs tested (MZmine2, enviMass, Compound Discoverer, and XCMS Online), while 40-55% of features were unique to each individual software platform [75]. This lack of consistency underscores the critical importance of understanding software capabilities, limitations, and algorithmic differences when designing NTA studies and interpreting results.
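Overlap figures like these come from matching feature lists across tools, and the core operation is tolerance-based matching on mass-to-charge ratio and retention time. The sketch below is illustrative only — the function names, tolerances, and feature values are hypothetical, not taken from the cited study:

```python
# Hedged sketch of tolerance-based feature matching, the kind of
# comparison behind cross-software overlap statistics. A "feature" is an
# (m/z, retention time in minutes) pair; two features match when their
# m/z values agree within a ppm tolerance and their RTs within an
# absolute window.

def features_match(f1, f2, ppm_tol=5.0, rt_tol=0.2):
    mz1, rt1 = f1
    mz2, rt2 = f2
    ppm_error = abs(mz1 - mz2) / mz1 * 1e6
    return ppm_error <= ppm_tol and abs(rt1 - rt2) <= rt_tol

def overlap(list_a, list_b, **tol):
    """Count features in list_a with at least one match in list_b."""
    return sum(any(features_match(fa, fb, **tol) for fb in list_b)
               for fa in list_a)

tool_a = [(301.1410, 5.02), (455.2903, 8.11), (120.0808, 1.95)]
tool_b = [(301.1412, 5.05), (512.3301, 9.40)]
print(overlap(tool_a, tool_b))  # 1 — only the 301.141 feature matches
```

Even this toy version shows why overlap statistics are sensitive to the chosen tolerances: widening the ppm or RT window inflates apparent agreement between platforms.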

Critical Evaluation of NTA Software Platforms

Algorithmic Approaches and Performance Metrics

The divergence in results between NTA software platforms stems fundamentally from differences in their underlying algorithms and processing workflows. Each software package implements unique approaches to critical steps including chromatographic peak detection, chromatographic alignment, isotope and adduct grouping, and blank filtration. These algorithmic differences lead to substantial variation in the final feature lists, even when processing identical raw data files [75]. The implementation of replicate and blank filtering has been identified as a particularly significant source of observed divergences between software tools, suggesting that post-processing filters contribute substantially to the variability in reported results.

Performance benchmarking remains challenging due to the absence of standardized metrics for NTA software evaluation. Unlike targeted analysis, where metrics like sensitivity and specificity can be calculated for known compounds, NTA must contend with unknown features whose true presence or absence is difficult to verify. Some researchers have proposed using the number of true positives detected in spiked samples or the consistency of feature detection across replicates as potential metrics, but these approaches have limitations for evaluating true non-targeted performance [75]. The confidence level of identification represents another critical performance dimension, typically following a five-level hierarchy from confirmed structure (Level 1) to probable structure supported by a library spectrum match (Level 2) to tentative candidate (Level 3) to unequivocal molecular formula (Level 4) to exact mass only (Level 5) [20].

Table 1: Comparative Analysis of Major NTA Software Platforms

| Software Tool | Algorithmic Approach | Strengths | Limitations | Optimal Use Cases |
| --- | --- | --- | --- | --- |
| MZmine2 | Modular workflow with user-defined parameters | High flexibility, open-source, active development | Steeper learning curve, requires parameter optimization | Research requiring customized workflows and method development |
| XCMS Online | Cloud-based processing with standardized workflows | Accessibility, minimal setup, visualizations | Less customizable, dependency on internet connection | Initial exploratory analysis and collaborative projects |
| Compound Discoverer | Integrated workflow with automated processing | Streamlined workflow, commercial support, high throughput | Limited algorithmic transparency, cost | Routine screening in regulated environments |
| enviMass | R-based processing focused on environmental applications | Specialized for time-series data, trend analysis | Narrower focus, less broad applicability | Environmental monitoring and temporal trend analysis |

Quantitative Capabilities and Uncertainty Estimation

The transition from qualitative to quantitative NTA represents a critical frontier for the field, particularly for applications in risk assessment and regulatory decision-making where concentration estimates are essential. Significant efforts have been made in recent years to bridge this quantitative gap, with several approaches emerging for deriving quantitative estimates from NTA measurements [28]. These include the use of surrogate standards, machine learning-based prediction, and response factor modeling. However, quantitative NTA methods currently do not fully consider estimation uncertainty, and the effects of experimental recovery on this uncertainty remain largely unexplored in NTA studies [28].

The integration of quantitative NTA estimates with available hazard metrics may facilitate provisional safety evaluations, creating a pathway for NTA data to directly support chemical risk characterization. This is particularly relevant for drug development professionals who must assess the potential risk of identified impurities and metabolites. The conceptual framework for incorporating NTA data into contemporary risk assessment involves linking contaminant discovery with risk characterization through quantitative estimates, though significant methodological challenges remain in properly characterizing and communicating the associated uncertainties [28].
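One common surrogate-based approach estimates concentration from a surrogate standard's response factor, corrects for extraction recovery, and bounds the result with a multiplicative error factor to express estimation uncertainty. The sketch below illustrates that arithmetic only; the function name, numbers, and the threefold error factor are hypothetical, not a published method:

```python
# Illustrative surrogate-based quantitation sketch (hypothetical values).
# Concentration ≈ analyte peak area / surrogate response factor,
# corrected for experimental recovery; a multiplicative error factor
# turns the point estimate into a bounded interval.

def estimate_concentration(peak_area, surrogate_rf, recovery,
                           error_factor=3.0):
    """Return (best_estimate, lower_bound, upper_bound) in ng/mL.

    surrogate_rf : surrogate response factor (area per ng/mL)
    recovery     : fractional recovery of the extraction step (0-1]
    error_factor : multiplicative uncertainty (e.g., RF prediction error)
    """
    best = peak_area / surrogate_rf / recovery
    return best, best / error_factor, best * error_factor

best, low, high = estimate_concentration(
    peak_area=2.4e6, surrogate_rf=1.2e5, recovery=0.8)
print(f"{best:.1f} ng/mL ({low:.1f}-{high:.1f})")  # 25.0 ng/mL (8.3-75.0)
```

Note how ignoring the recovery term (setting it to 1.0) would understate the concentration by 20% here — exactly the kind of unexplored recovery effect on uncertainty that the text highlights.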

Experimental Design and Methodologies for Software Evaluation

Benchmarking Studies and Proficiency Testing

Rigorous evaluation of NTA software performance requires carefully designed benchmarking studies that assess both method capabilities and limitations. The Benchmarking and Publications for Non-Targeted Analysis Working Group (BP4NTA) has established a framework for such evaluations, emphasizing the need for harmonized terminology and clear guidance about best practices for analysis and reporting results [20]. Effective benchmarking studies should incorporate quality assurance and quality control (QA/QC) approaches specifically designed for NTA, including the use of quality control materials, standardized sample preparation protocols, and data quality assessment metrics.

The Non-Targeted Analysis Collaborative Trial (ENTACT) exemplifies a large-scale collaborative effort to evaluate and benchmark NTA performance across multiple laboratories and platforms [20]. Such proficiency testing exercises reveal substantial variability in results that can be attributed to differences in sample preparation techniques, instrumentation, software, and user settings rather than true sample differences. This highlights the critical importance of standardizing experimental protocols when conducting comparative software evaluations. Key components of an effective benchmarking protocol include: (1) use of standardized reference materials with known composition; (2) implementation of blank samples to identify and filter contamination; (3) incorporation of quality control samples to monitor instrument performance; (4) consistent data processing parameters across software platforms; and (5) standardized reporting formats for features and identifications.

Machine Learning-Enhanced NTA Workflows

The integration of machine learning (ML) with NTA represents a transformative advancement for extracting meaningful environmental information from the vast chemical datasets generated by HRMS [27]. ML algorithms are particularly effective at identifying latent patterns within high-dimensional data, making them well-suited for contamination source identification, sample classification, and biomarker discovery. A comprehensive workflow for ML-assisted NTA encompasses four key stages: (1) sample treatment and extraction; (2) data generation and acquisition; (3) ML-oriented data processing and analysis; and (4) result validation [27].

In the data processing stage, ML techniques address several critical challenges. Data preprocessing methods including noise filtering, missing value imputation (e.g., k-nearest neighbors), and normalization (e.g., TIC normalization) help mitigate batch effects and improve data quality [27]. Dimensionality reduction techniques like principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) simplify high-dimensional data, while clustering methods (hierarchical cluster analysis, k-means clustering) group samples by chemical similarity. Supervised ML models, including Random Forest (RF) and Support Vector Classifier (SVC), can then be trained on labeled datasets to classify contamination sources or predict sample properties. For example, one study successfully implemented ML classifiers to screen 222 targeted and suspect per- and polyfluoroalkyl substances (PFASs) across 92 samples, achieving classification balanced accuracy ranging from 85.5% to 99.5% across different sources [27].
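The balanced accuracy reported in that study is the unweighted mean of per-class recall, which prevents majority sources from masking poor performance on rare ones. A stdlib sketch with hypothetical source labels (not the PFAS study's classes or results):

```python
# Balanced accuracy — the metric cited above — is the unweighted mean of
# per-class recall, so minority contamination sources count as much as
# majority ones. Pure-stdlib sketch; the labels are hypothetical.
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

y_true = ["landfill"] * 8 + ["textile"] * 2
y_pred = ["landfill"] * 8 + ["textile", "landfill"]
# Plain accuracy is 0.9, but balanced accuracy penalizes the missed
# minority-class sample: mean(8/8, 1/2) = 0.75
print(balanced_accuracy(y_true, y_pred))  # 0.75
```

This asymmetry between the two metrics is why balanced accuracy is the more honest choice for source-attribution problems, where some contamination sources contribute far fewer samples than others.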

Table 2: Essential Research Reagents and Materials for NTA Studies

| Reagent/Material | Function/Purpose | Application Notes |
| --- | --- | --- |
| Solid Phase Extraction (SPE) Cartridges | Sample cleanup and analyte enrichment | Multi-sorbent strategies (e.g., Oasis HLB with ISOLUTE ENV+) provide broader chemical coverage |
| Quality Control Materials | Monitoring instrument performance and data quality | Includes solvent blanks, procedural blanks, and quality control samples |
| Internal Standards | Correcting for matrix effects and instrumental variance | Both labeled and non-labeled compounds across different retention times |
| Certified Reference Materials (CRMs) | Method validation and compound verification | Essential for confirming compound identities and validating quantitative approaches |
| Retention Time Markers | Chromatographic alignment and performance monitoring | Critical for retention time correction across different batches |

Integrated Workflow for Comparative Software Analysis

[Diagram: raw HRMS data → data preprocessing (peak picking, alignment) → parallel processing in MZmine2, XCMS Online, Compound Discoverer, and enviMass → feature lists → blank filtering and replicate analysis → feature overlap analysis → identification confidence assessment (Levels 1-5, applied to common features) → performance metric calculation (applied to all features) → comparative report.]

Diagram 1: NTA Software Evaluation Workflow. This workflow illustrates the systematic process for comparing multiple NTA software platforms, from raw data processing through performance evaluation.

The experimental workflow for comparative software analysis begins with standardized raw data acquisition using high-resolution mass spectrometry, typically with liquid or gas chromatography separation (LC/GC-HRMS). Following data acquisition, the same raw data files are processed in parallel through multiple software platforms, ensuring consistent parameter settings where possible [75]. The resulting feature lists from each software undergo blank filtering and replicate analysis to remove artifacts and ensure only reproducible features are considered. The critical analysis phase involves comparing the feature lists to identify overlaps and unique detections, with studies consistently showing approximately 10% overlap between all four major programs and 40-55% of features unique to each software [75].

For the overlapping features, identification confidence is assessed using the five-level hierarchy, examining supporting evidence including exact mass, isotopic patterns, fragmentation spectra, and when available, retention time matching with authentic standards [20]. Performance metrics are then calculated for each software, including sensitivity (number of true positives detected), selectivity (discrimination of true features from noise), and reproducibility (consistency across replicates). The final comparative report should transparently communicate both the strengths and limitations of each software platform for the specific application, noting that software performance may vary significantly based on sample matrix, instrumentation, and analytical objectives.
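The blank filtering and replicate analysis step described above typically reduces to two rules: require detection in a minimum number of replicate injections, and require the mean intensity to exceed the blank by a fold factor. The sketch below uses illustrative thresholds and a hypothetical feature structure; real workflows tune both per study:

```python
# Sketch of a replicate/blank filter (thresholds are illustrative, not
# prescriptive): keep a feature only if it is detected in at least
# `min_reps` replicate injections AND its mean intensity exceeds
# `blank_fold` times the corresponding blank intensity.

def filter_features(features, min_reps=2, blank_fold=3.0):
    kept = []
    for feat in features:
        detections = [i for i in feat["replicate_intensities"] if i > 0]
        if len(detections) < min_reps:
            continue  # not reproducible across replicates
        mean_int = sum(detections) / len(detections)
        if mean_int > blank_fold * feat["blank_intensity"]:
            kept.append(feat["id"])  # survives both filters
    return kept

features = [
    {"id": "F1", "replicate_intensities": [9e4, 8e4, 1.1e5],
     "blank_intensity": 1e4},    # reproducible and well above blank
    {"id": "F2", "replicate_intensities": [5e3, 0, 0],
     "blank_intensity": 1e3},    # seen in only one replicate
    {"id": "F3", "replicate_intensities": [2e4, 2.2e4, 1.9e4],
     "blank_intensity": 1.5e4},  # too close to the blank level
]
print(filter_features(features))  # ['F1']
```

Because each software platform implements these filters with different defaults, this post-processing stage is a major contributor to the cross-platform divergences discussed earlier.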

Implementation Framework and Best Practices

Quality Assurance and Quality Control in NTA

Implementing robust quality assurance and quality control (QA/QC) procedures is essential for generating reliable and reproducible NTA data. The BP4NTA working group emphasizes that study design should intentionally incorporate QA/QC approaches and yield the necessary data to enable performance assessments after data acquisition and analysis is complete [3]. Key QA/QC elements include the analysis of blank samples to identify contamination, quality control samples to monitor instrument stability, and replicate analyses to assess precision. The use of internal standards helps correct for matrix effects and instrumental variance, while standardized protocols for sample preparation and data processing minimize technical variability.

Confidence in compound identification represents a particular challenge in NTA, and should be communicated using a standardized confidence hierarchy. Level 1 identifications require confirmation with an authentic standard analyzed under identical analytical conditions, providing definitive confirmation [20]. Level 2 identifications are supported by library spectrum matches without a reference standard, while Level 3 represents tentative candidates based on diagnostic evidence. Lower confidence levels (4 and 5) provide progressively less certain information, from unambiguous molecular formula to exact mass only. Transparent reporting of identification confidence levels is essential for proper interpretation of NTA results, particularly in regulatory contexts or when informing risk assessment decisions.
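The decision logic of this hierarchy can be summarized as a cascade over the available lines of evidence. The sketch below is a deliberate simplification for illustration — real level assignments also involve expert judgment about the quality of each piece of evidence, not just its presence:

```python
# Simplified sketch of assigning an identification confidence level from
# available evidence, loosely following the five-level hierarchy
# described above. Real assignments weigh evidence quality as well.

def confidence_level(has_exact_mass, has_formula, has_candidate,
                     has_library_match, has_reference_standard):
    if has_reference_standard:
        return 1  # confirmed structure (authentic standard match)
    if has_library_match:
        return 2  # probable structure (library spectrum match)
    if has_candidate:
        return 3  # tentative candidate (diagnostic evidence)
    if has_formula:
        return 4  # unequivocal molecular formula
    if has_exact_mass:
        return 5  # exact mass only
    return None   # no usable evidence

# A library spectrum match without an authentic standard → Level 2
print(confidence_level(True, True, True, True, False))  # 2
```

The cascade structure makes the key property of the scheme explicit: stronger evidence always dominates, so adding a lower-tier line of evidence can never raise a compound above the level its best evidence supports.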

Future Directions and Community Initiatives

The NTA field continues to evolve rapidly, with several community-wide initiatives working to address current challenges and advance methodological rigor. The Benchmarking and Publications for Non-Targeted Analysis Working Group (BP4NTA) represents a prominent effort to harmonize approaches and reporting practices across the NTA community [20]. With participation from academic, government, and industry sectors, BP4NTA has developed consensus definitions for NTA-relevant terms, reference content to support methodological reporting, and resources for both novice and experienced NTA researchers.

Future methodological developments are likely to focus on improving quantitative capabilities, with significant efforts underway to bridge the gap between contaminant discovery and risk characterization [28]. The integration of machine learning and artificial intelligence approaches will continue to advance, particularly for compound identification and source attribution [27]. Meanwhile, community-wide proficiency testing exercises and collaborative trials will help establish performance benchmarks and best practices. As these efforts mature, NTA data will see expanded application in regulatory decision-making and risk-based prioritization, moving beyond purely exploratory applications to directly support chemical management and safety assessment.

The comparative analysis of NTA software and computational tools reveals a complex landscape characterized by diverse algorithmic approaches and substantial variability in results. The finding that different software tools applied to identical datasets can yield dramatically different feature lists underscores the critical importance of software selection and transparent methodology reporting in NTA studies [75]. Researchers must carefully consider their analytical objectives, sample characteristics, and available resources when selecting software tools, recognizing that performance is highly context-dependent.

The ongoing harmonization efforts led by community initiatives like BP4NTA provide promising pathways toward improved consistency and reliability in NTA [20]. As the field continues to mature, the integration of quantitative approaches [28] and machine learning methodologies [27] will expand the applications and impact of NTA across diverse scientific domains. For drug development professionals and other researchers leveraging NTA approaches, maintaining awareness of evolving best practices, participating in community initiatives, and implementing rigorous QA/QC procedures will be essential for generating chemically meaningful and scientifically defensible results. Through continued methodological refinement and community-wide collaboration, NTA promises to remain at the forefront of analytical innovation, providing unprecedented insights into the chemical complexity of biological and environmental systems.

Non-targeted analysis (NTA) using high-resolution mass spectrometry (HRMS) has emerged as a powerful approach for detecting and identifying unknown and unexpected compounds in complex sample matrices [27] [20]. Unlike targeted methods that focus on predefined analytes, NTA aims to characterize the comprehensive chemical composition of samples without prior knowledge of their chemical content [3]. This capability makes NTA particularly valuable for discovering emerging environmental contaminants, identifying transformation products, and classifying samples based on chemical profiles [1] [22]. However, the inherent uncertainty in NTA results poses significant challenges for interpretation and acceptance by stakeholders, including regulatory bodies [5].

The complexity of NTA data, often comprising thousands of detected features in a single sample, necessitates robust validation strategies to ensure reliable results [22]. Without standardized validation approaches, the translation of NTA findings into actionable environmental insights remains problematic [27]. Tiered validation addresses this challenge by implementing multiple layers of verification that collectively enhance the confidence in NTA results [27]. This systematic framework bridges the gap between analytical capability and environmental decision-making by providing structured approaches to verify compound identifications, assess model generalizability, and contextualize findings within real-world scenarios [27].

The Three-Tiered Validation Framework

The tiered validation framework for NTA consists of three complementary approaches: reference material verification, external dataset validation, and environmental plausibility assessment [27]. This multi-faceted strategy ensures that NTA results are both chemically accurate and environmentally meaningful by addressing different aspects of validation [27]. The framework is particularly crucial for supporting the interpretation of complex HRMS data and advancing the application of NTA in environmental research and decision-making [27].

Table 1: Overview of the Three-Tiered Validation Framework for NTA

| Validation Tier | Primary Objective | Key Techniques | Outcomes Assessed |
| --- | --- | --- | --- |
| Reference Material Verification | Confirm analytical confidence in compound identities | Certified reference materials (CRMs), spectral library matches, confidence-level assignments [27] | Chemical identity confirmation, analytical accuracy |
| External Dataset Testing | Evaluate model generalizability and robustness | Independent external datasets, cross-validation techniques (e.g., 10-fold) [27] | Model overfitting, transferability, performance stability |
| Environmental Plausibility Checks | Correlate model predictions with real-world context | Geospatial analysis, source-specific chemical markers, contextual data integration [27] | Environmental relevance, source-receptor relationships |

The integration of these three validation tiers addresses a critical gap in NTA research: previous approaches often emphasized laboratory-based tests whose performance may not transfer to real-world conditions, which require field-validated source-receptor relationships [27]. By implementing this comprehensive framework, researchers can more effectively contextualize machine learning outputs within actual contamination scenarios, thereby enhancing the practical utility of NTA data for environmental protection and public health decision-making [27].

[Diagram: an NTA study feeds three parallel validation tiers: Tier 1, reference material verification (CRMs, spectral library matches, confidence-level assignments); Tier 2, external dataset testing (independent external datasets, cross-validation techniques, overfitting risk assessment); and Tier 3, environmental plausibility assessment (geospatial analysis, source-specific chemical markers, contextual data integration). All three converge on validated NTA results.]

Figure 1: The three-tiered validation framework for non-targeted analysis, illustrating the interconnected approach to verifying NTA results through reference materials, external datasets, and environmental plausibility assessments.

Tier 1: Reference Material Verification

Experimental Protocols for Reference Material Verification

Reference material verification constitutes the foundational tier of NTA validation, focusing on confirming the analytical confidence in compound identities [27]. This process begins with the use of certified reference materials (CRMs) to verify the accuracy of compound identifications [27]. The experimental protocol involves analyzing CRMs alongside environmental samples using identical instrumentation and analytical conditions. This parallel analysis enables direct comparison of retention times, mass accuracy, and fragmentation spectra between suspected compounds in samples and authentic standards [27].
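In its simplest form, the parallel-analysis comparison described above reduces to tolerance checks on mass accuracy and retention time against the authentic standard. The sketch below is a minimal illustration: the 5 ppm and 0.3 min tolerances, and the caffeine [M+H]+ reference values, are illustrative assumptions rather than values prescribed by the cited protocol.

```python
# Minimal sketch of CRM-based verification: a detected feature passes if its
# measured m/z and retention time both fall within tolerance of the authentic
# standard. Tolerances here (5 ppm, 0.3 min) are illustrative assumptions.

def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Mass error of an observed m/z relative to a reference, in ppm."""
    return (observed_mz - reference_mz) / reference_mz * 1e6

def matches_crm(observed_mz, observed_rt, ref_mz, ref_rt,
                mz_tol_ppm=5.0, rt_tol_min=0.3):
    """True when both mass accuracy and retention time are within tolerance."""
    return (abs(ppm_error(observed_mz, ref_mz)) <= mz_tol_ppm
            and abs(observed_rt - ref_rt) <= rt_tol_min)

# Illustrative check against a caffeine [M+H]+ reference (m/z 195.0877)
print(matches_crm(195.0882, 4.52, 195.0877, 4.50))  # → True
```

Fragmentation-spectrum comparison (the third line of evidence named above) requires spectral similarity scoring and is not captured by this simple tolerance check.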

Spectral library matching represents another critical component of reference material verification [27]. The protocol requires acquiring MS/MS spectra for detected features in samples and comparing them against reference spectra in curated databases. Confidence-level assignments according to established frameworks (e.g., Level 1-5) provide systematic approaches for communicating identification certainty [27] [20]. Level 1 identification, the highest confidence level, requires matching retention time and fragmentation spectrum with an authentic standard analyzed under identical analytical conditions [27]. This tiered confidence system helps stakeholders appropriately interpret and utilize NTA results based on the strength of evidence supporting compound identifications.
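Spectral library matching is commonly scored with a spectral similarity measure. The sketch below implements a basic binned cosine (dot-product) similarity between an acquired MS/MS spectrum and a library spectrum; the 0.01 Da bin width is an illustrative assumption, and production tools use more sophisticated peak alignment and intensity weighting.

```python
import math

# Minimal sketch of spectral library matching via binned cosine similarity.
# The 0.01 Da bin width is an illustrative assumption; production tools use
# more sophisticated peak alignment and intensity weighting.

def bin_spectrum(peaks, bin_width=0.01):
    """Collapse (m/z, intensity) pairs into coarse integer m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned

def cosine_similarity(query, library, bin_width=0.01):
    """Dot-product similarity of two binned spectra, in [0, 1]."""
    q, ref = bin_spectrum(query, bin_width), bin_spectrum(library, bin_width)
    dot = sum(q[k] * ref[k] for k in q.keys() & ref.keys())
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in ref.values())))
    return dot / norm if norm else 0.0

spectrum = [(91.0542, 35.0), (119.0491, 100.0), (147.0441, 60.0)]
print(round(cosine_similarity(spectrum, spectrum), 6))  # → 1.0
```

A similarity threshold for accepting a library match would then feed into the confidence-level assignment described above.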

Practical Implementation Considerations

Successful implementation of reference material verification requires careful consideration of several practical factors. The selection of appropriate CRMs should reflect the chemical diversity expected in study samples and the specific research objectives [27]. For emerging contaminants where CRMs may not be commercially available, alternative approaches include synthesizing reference compounds or obtaining well-characterized materials from research collections [27]. Quality control measures, such as batch-specific QC samples and internal standards, should be incorporated throughout the analytical process to monitor instrument performance and ensure data quality [27] [20].

The implementation of confidence-level assignments requires transparent reporting of the evidence supporting each identification [20]. Schymanski et al.'s confidence level framework is widely adopted in NTA studies, with Level 1 representing confirmed structure via reference standard, Level 2 indicating probable structure through library spectrum match, Level 3 suggesting tentative candidate, Level 4 providing unequivocal molecular formula, and Level 5 representing exact mass of interest [27]. Clear communication of these confidence levels in research findings enables appropriate interpretation of results by stakeholders with different informational needs [20].
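The five-level scheme described above can be expressed as a simple decision cascade. In the sketch below, the boolean evidence flags are simplified stand-ins for the full criteria of the framework.

```python
# Sketch of the Schymanski confidence cascade described above. The boolean
# evidence flags are simplified stand-ins for the full criteria of the
# framework (Level 1 = highest confidence, Level 5 = lowest).

def assign_confidence_level(reference_standard_match=False,
                            library_spectrum_match=False,
                            tentative_candidate=False,
                            unequivocal_formula=False) -> int:
    if reference_standard_match:
        return 1  # confirmed structure (RT + MS/MS vs. authentic standard)
    if library_spectrum_match:
        return 2  # probable structure via library spectrum match
    if tentative_candidate:
        return 3  # tentative candidate
    if unequivocal_formula:
        return 4  # unequivocal molecular formula
    return 5      # exact mass of interest only

print(assign_confidence_level(library_spectrum_match=True))  # → 2
```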

Tier 2: External Dataset Testing

Methodologies for External Dataset Validation

External dataset testing serves as the second validation tier, focusing on assessing the generalizability and robustness of models developed through NTA [27]. This process involves validating classifiers and statistical models on independent external datasets that were not used during model development [27]. The experimental protocol requires partitioning data into training and completely separate testing sets, with the latter representing different geographical areas, temporal periods, or sample matrices than the original training data [27].

Cross-validation techniques represent a complementary approach to external dataset testing [27]. Methods such as k-fold cross-validation (e.g., 10-fold) involve partitioning the original dataset into k subsets, using k-1 subsets for training, and the remaining subset for testing, with the process repeated k times [27]. This approach provides robust assessment of model performance while maximizing data utility. For NTA applications involving sample classification, performance metrics including balanced accuracy, precision, recall, and F1-score should be calculated for both internal cross-validation and external testing to comprehensively evaluate model efficacy [27] [5].
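For readers implementing k-fold partitioning from scratch, a minimal stdlib-only sketch is shown below; the sample IDs and k = 10 are illustrative, and practical workflows would typically use a library implementation with stratification.

```python
import random

# Stdlib-only sketch of k-fold partitioning for internal validation, as an
# alternative to library implementations. Sample IDs and k = 10 are
# illustrative.

def k_fold_splits(sample_ids, k=10, seed=42):
    """Yield (train, test) ID lists; each ID appears in exactly one test fold."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)          # reproducible shuffle
    folds = [ids[i::k] for i in range(k)]     # k near-equal folds
    for i in range(k):
        test = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test

samples = [f"S{i:03d}" for i in range(50)]
for train, test in k_fold_splits(samples):
    assert not set(train) & set(test)         # folds never overlap
```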

Assessment of Model Overfitting and Transferability

A primary objective of external dataset testing is evaluating model overfitting and transferability [27]. Overfitting occurs when models perform well on training data but poorly on new, unseen data, indicating limited generalizability [27]. The comparison of performance metrics between internal validation (e.g., cross-validation) and external testing provides critical insights into overfitting risks. Significant performance degradation with external datasets suggests overfitting and limited practical utility of the model [27].
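One simple operationalization of this comparison is to flag a model when external-test performance drops more than a chosen margin below internal cross-validation performance. In the sketch below, the 10-percentage-point (0.10) margin is an illustrative assumption, not a published cutoff.

```python
# Sketch of an overfitting flag: external-test performance dropping more than
# a chosen margin below internal cross-validation performance. The 0.10
# margin is an illustrative assumption, not a published cutoff.

def overfitting_flag(internal_score: float, external_score: float,
                     max_drop: float = 0.10):
    """Return (flagged, drop) for a validation-score degradation check."""
    drop = internal_score - external_score
    return drop > max_drop, drop

flagged, drop = overfitting_flag(internal_score=0.92, external_score=0.71)
print(flagged, round(drop, 2))  # → True 0.21
```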

Transferability assessment extends beyond simple performance metrics to evaluate model applicability across different environmental contexts [27]. This process involves testing models on datasets representing varying conditions, such as different sample matrices (e.g., water, sediment, biological tissues), seasonal variations, or geographical diversity [27]. Successful transferability demonstrates model robustness and enhances confidence in its application for environmental decision-making across diverse scenarios. The Benchmarking and Publications for Non-Targeted Analysis Working Group (BP4NTA) has emphasized the importance of transferability assessment for advancing the acceptability of NTA methods in regulatory contexts [20].

Table 2: Performance Metrics for External Dataset Validation in NTA

| Metric | Calculation | Interpretation | Optimal Range |
| --- | --- | --- | --- |
| Balanced Accuracy | (Sensitivity + Specificity)/2 | Overall classification performance accounting for class imbalance | >80% |
| Precision | True Positives / (True Positives + False Positives) | Proportion of correct positive identifications | Context-dependent |
| Recall (Sensitivity) | True Positives / (True Positives + False Negatives) | Ability to identify all relevant cases | Context-dependent |
| F1-Score | 2 × (Precision × Recall) / (Precision + Recall) | Harmonic mean of precision and recall | >0.7 |
| Cross-Validation Consistency | Performance stability across validation folds | Model robustness | Low variance |
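The metrics in Table 2 can be computed directly from binary confusion-matrix counts, as in this stdlib-only sketch; the counts are invented for illustration.

```python
# Stdlib-only computation of the Table 2 metrics from binary confusion-matrix
# counts. The counts below are invented for illustration.

def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    sensitivity = tp / (tp + fn)                  # recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "precision": precision,
        "recall": sensitivity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(round(m["balanced_accuracy"], 3), round(m["f1"], 3))  # → 0.854 0.842
```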

Tier 3: Environmental Plausibility Assessment

Protocols for Environmental Plausibility Evaluation

Environmental plausibility assessment constitutes the third validation tier, focusing on correlating model predictions with real-world contextual data [27]. This process involves integrating multiple lines of evidence to evaluate whether NTA findings align with known environmental patterns and processes [27]. Geospatial analysis represents a key methodological approach, examining the spatial distribution of detected compounds in relation to potential contamination sources, such as industrial facilities, agricultural areas, or wastewater treatment plants [27]. This analysis can reveal spatial gradients that support hypothesized source-receptor relationships.
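A spatial gradient of the kind described above can be tested by rank-correlating feature intensity against distance from a putative source. The sketch below computes a Spearman coefficient using the classic no-ties formula; the site data are invented for illustration, and a real analysis would derive distances from GIS data.

```python
# Sketch of a spatial-gradient check: rank-correlate feature intensity with
# distance from a putative source (Spearman, no-ties formula). Site data are
# invented for illustration; real analyses would use GIS-derived distances.

def ranks(values):
    """1-based ranks of values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r

def spearman(x, y):
    """Spearman rank correlation via the sum-of-squared-rank-differences formula."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

distance_km = [0.5, 1.2, 3.0, 5.5, 9.0]   # distance from suspected source
intensity = [9.1, 7.4, 4.2, 2.0, 0.8]     # feature peak area (arbitrary units)
print(spearman(distance_km, intensity))   # → -1.0 (monotone decline)
```

A strongly negative coefficient is consistent with, though not proof of, the hypothesized source-receptor relationship.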

The identification and evaluation of source-specific chemical markers provides another critical approach for environmental plausibility assessment [27]. This protocol involves identifying compounds with known associations to specific contamination sources (e.g., specific PFAS compounds associated with fire-fighting foams, pharmaceuticals indicative of wastewater influence) and evaluating whether their detection patterns align with expected source contributions [27]. The consistent co-detection of multiple markers from the same source type strengthens the plausibility of source attribution. Additionally, examining temporal patterns, such as seasonal variations in detection frequencies or concentrations, can provide further support for environmental plausibility when aligned with known use patterns or environmental processes [27].
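Co-detection of source markers can be scored with a simple panel-fraction heuristic, as sketched below. The marker panels and the 0.5 threshold are illustrative examples of the source classes named above, not an authoritative list.

```python
# Sketch of a co-detection heuristic: score each candidate source by the
# fraction of its marker panel detected in a sample. The marker panels and
# the 0.5 threshold are illustrative examples, not an authoritative list.

SOURCE_MARKERS = {
    "AFFF/PFAS": {"PFOS", "PFHxS", "6:2 FTS"},
    "wastewater": {"carbamazepine", "caffeine", "sucralose"},
    "agriculture": {"atrazine", "metolachlor"},
}

def source_plausibility(detected_compounds, min_fraction=0.5):
    """Return sources whose marker panels are substantially co-detected."""
    hits = {}
    for source, markers in SOURCE_MARKERS.items():
        fraction = len(markers & detected_compounds) / len(markers)
        if fraction >= min_fraction:
            hits[source] = round(fraction, 2)
    return hits

print(source_plausibility({"carbamazepine", "caffeine", "PFOS"}))
# → {'wastewater': 0.67}
```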

Contextual Data Integration and Interpretation

Effective environmental plausibility assessment requires systematic integration of diverse contextual data types [27]. Land use information, industrial activity records, hydrological data, and known contamination histories should be compiled and analyzed in relation to NTA findings [27]. This integration enables assessment of whether detected chemical patterns logically correspond to potential influences in the study area. For example, detections of agricultural pesticides would be more plausible in areas with documented agricultural activity, while industrial compounds would be more expected near manufacturing facilities.

The interpretation of environmental plausibility involves both confirmatory and exploratory elements [27]. Confirmatory evaluation assesses whether NTA results align with pre-existing knowledge and expectations, while exploratory analysis identifies novel patterns that may reveal previously unrecognized contamination sources or pathways [27]. This dual approach ensures that environmental plausibility assessment neither simply confirms biases nor indiscriminately accepts all findings without critical evaluation. The BP4NTA working group emphasizes the importance of transparently reporting both supporting and contradictory evidence when presenting environmental plausibility assessments [20].

[Diagram: environmental plausibility assessment branches into geospatial analysis (industrial facility locations, land use patterns, hydrological connectivity), source-specific chemical markers (PFAS markers for AFFF sources, pharmaceuticals as wastewater indicators, pesticide markers for agricultural sources, industrial chemicals for manufacturing sources), temporal pattern analysis, and contextual data integration (known contamination history); all branches converge on the plausibility assessment output.]

Figure 2: Environmental plausibility assessment framework for NTA, showing the integration of geospatial, chemical marker, temporal, and contextual data to evaluate the real-world relevance of NTA findings.

Implementation Guide for Tiered Validation

Integrated Workflow for Tiered Validation

Successful implementation of tiered validation in NTA requires a systematic, integrated workflow that coordinates activities across all three tiers [27]. The process should begin during study design, with planning for appropriate reference materials, external validation datasets, and contextual data collection [3]. Sample analysis should incorporate quality control materials that enable reference material verification, while data analysis protocols should explicitly include procedures for external validation and environmental plausibility assessment [27] [20].

The workflow should emphasize iterative evaluation across validation tiers [27]. Initial reference material verification establishes foundational confidence in compound identities, which then supports meaningful external dataset testing [27]. Results from both laboratory-based tiers subsequently inform environmental plausibility assessment, which may identify needs for additional reference material verification or model refinement [27]. This iterative approach ensures continuous refinement of NTA results and enhances overall confidence in findings. Documentation of all validation activities, including materials used, methodologies applied, and results obtained, is essential for transparent reporting and stakeholder acceptance [20].

The Scientist's Toolkit: Essential Research Reagents and Materials

Table 3: Essential Research Reagents and Materials for Tiered Validation in NTA

| Reagent/Material | Application | Function in Validation | Implementation Considerations |
| --- | --- | --- | --- |
| Certified Reference Materials (CRMs) | Tier 1: Reference Material Verification | Confirm compound identities through retention time and fragmentation spectrum matching [27] | Select CRMs representative of expected contaminant classes; include isotopically labeled analogs when possible |
| Quality Control Samples | All Tiers | Monitor instrument performance and data quality throughout analytical process [27] [20] | Include procedural blanks, solvent blanks, and matrix spikes in each analytical batch |
| Spectral Libraries | Tier 1: Reference Material Verification | Support compound identification through mass spectral comparison [27] [22] | Use curated, domain-specific libraries (e.g., for PFAS, pharmaceuticals, pesticides) |
| Independent Validation Datasets | Tier 2: External Dataset Testing | Assess model generalizability and robustness [27] | Secure datasets representing different temporal periods, geographical areas, or sample matrices |
| Contextual Data Resources | Tier 3: Environmental Plausibility Assessment | Correlate chemical patterns with potential sources and environmental factors [27] | Compile land use records, industrial facility data, hydrological information, and known contamination history |
| Internal Standards | Tier 1: Reference Material Verification | Monitor analytical performance and correct for matrix effects [27] | Select compounds not expected in samples but with similar chemical properties to analytes of interest |

The implementation of a systematic tiered validation framework addressing reference material verification, external dataset testing, and environmental plausibility assessment represents a critical advancement for non-targeted analysis using high-resolution mass spectrometry [27]. This comprehensive approach directly addresses key uncertainties in NTA results, enhancing their utility for environmental decision-making [27] [5]. By integrating these complementary validation strategies, researchers can bridge the gap between analytical capability and practical application, supporting more effective contamination source identification, risk assessment, and environmental management [27].

As NTA methodologies continue to evolve, further refinement of tiered validation approaches remains essential [27]. Promising directions include the development of more comprehensive reference material collections, standardized protocols for external validation, and advanced computational methods for environmental plausibility assessment [27] [20]. Community-wide adoption of systematic validation frameworks, as promoted by initiatives such as the BP4NTA working group, will accelerate the transition of NTA from a research tool to a reliable approach for supporting environmental protection and public health decisions [20]. Through continued refinement and implementation of tiered validation strategies, the environmental research community can fully realize the potential of NTA for addressing complex contamination challenges.

Benchmarking Studies and Interlaboratory Comparison Initiatives

In the rapidly evolving field of non-targeted analysis (NTA) using high-resolution mass spectrometry (HRMS), benchmarking studies and interlaboratory comparisons have emerged as indispensable tools for advancing methodological rigor and data reliability. These initiatives address a fundamental challenge in NTA: the inherent uncertainty in identifying and quantifying unknown chemicals across different laboratories, instruments, and data processing workflows [14]. Unlike traditional targeted methods that benefit from well-established performance metrics and standardized protocols, NTA generates information-rich data where results are often ambiguous—if an analyst reports a chemical present, it may actually be absent (e.g., an isomer or incorrect identification), and if reported absent, it may actually be present [14]. This uncertainty has prevented broader adoption of NTA data in regulatory decision-making, creating an urgent need for community-wide efforts to establish performance assessment frameworks [14].

Interlaboratory studies (ILS) represent a strategic response to these challenges, enabling the environmental chemistry community to evaluate reproducibility, identify sources of variability, and agree on harmonized quality control procedures [76]. The growing recognition of this need is evidenced by initiatives such as the NTA collaborative trial organized by the Norman Network, which has promoted clear reporting strategies for confidence levels in identifying chemicals of emerging concern (CECs) in complex environmental samples [76]. Similarly, the Benchmarking and Publications for Non-Targeted Analysis (BP4NTA) working group has formed specifically to address challenges in NTA studies through community-driven subcommittees focused on education, study planning, PFAS analysis, and gas chromatography applications [77]. These collaborative frameworks are essential for transitioning NTA from a research tool to a reliable approach for chemical monitoring and risk assessment.

Key Initiatives and Their Structural Frameworks

The Norman Network Interlaboratory Study

One of the most comprehensive benchmarking efforts documented in the literature involved 21 participants from 11 countries organized through the Norman Network [76]. This study was strategically designed to assess uncertainty in identified compounds caused by different NTA workflows and spectral databases. The experimental design employed a passive sampling strategy of river water at a drinking water intake and post-treatment drinking water, enabling identification of substances present in source water, present in finished drinking water, removed during treatment, and generated during treatment processes [76].

The study architecture required all participants to analyze identical "ready for injection" samples consisting of two surface water and two drinking water passive sampler extracts using two distinct approaches: (1) a pre-defined method with detailed instructions on LC separation and MS data acquisition, and (2) individually developed in-house methods that reflected each laboratory's established protocols [76]. This dual approach enabled systematic assessment of method transfer, chromatography, data acquisition, and data processing impacts on the detectable chemical space. Participants provided raw data, converted files, raw feature lists, and their top 50 identified features with confidence levels based on the Schymanski scale—a tiered system for reporting identification confidence [76].

BP4NTA Subcommittee Structure and Focus Areas

The BP4NTA working group has established a structured subcommittee system to address specific challenges in NTA method development and standardization [77]:

Table: BP4NTA Subcommittees and Their Primary Functions

| Subcommittee | Primary Focus | Key Activities |
| --- | --- | --- |
| Educational Subcommittee | Knowledge dissemination | Developing upper-level college courses on NTA; maintaining literature references; creating educational materials |
| Study Planning Tool (SPT) | Method standardization | Developing tools for designing high-quality NTA studies; creating SOPs; quality assurance plans |
| PFAS Subcommittee | PFAS-specific challenges | Facilitating cross-sharing of PFAS spectral libraries; drafting quality control recommendations |
| GC NTA Subcommittee | Gas chromatography applications | Sharing resources and methods; providing direction on tool development for GC-based NTA |

These subcommittees address critical gaps in NTA standardization, particularly through developing guidance for transitioning between targeted and non-targeted analysis in both industrial and academic research settings [77]. The monthly meetings, forums, and collaborative publications facilitate ongoing community engagement and knowledge sharing essential for methodological advancement.

Quantitative Performance Assessment Frameworks

Defining NTA Study Objectives and Performance Metrics

A critical foundation for benchmarking studies is the clear categorization of NTA research objectives. According to performance assessment literature, most NTA projects fall into three distinct categories that determine appropriate evaluation metrics [14]:

  • Sample Classification: Focused on discriminating between sample groups based on chemical patterns
  • Chemical Identification: Aimed at confidently identifying unknown chemical structures
  • Chemical Quantitation: Designed to provide concentration estimates for discovered chemicals

For qualitative studies (sample classification and chemical identification), performance can be assessed using adaptations of the traditional confusion matrix, though with recognized challenges and limitations [14]. The Schymanski scale provides a standardized framework for reporting identification confidence levels, ranging from level 1 (confirmed structure) to level 5 (exact mass of interest only) [76]. For quantitative NTA studies, performance assessment can utilize estimation procedures developed for targeted methods, but must account for additional sources of uncontrolled experimental error [14].

Key Performance Metrics from Interlaboratory Studies

The Norman Network ILS provided valuable quantitative data on reproducibility and confidence in chemical identification across multiple laboratories. While the study highlighted significant variability in identified features across different workflows, it also demonstrated the value of standardized reporting frameworks and raw data sharing for improving reproducibility [76].

Table: Performance Metrics for NTA Method Evaluation

| Performance Dimension | Assessment Approach | Challenges in NTA Context |
| --- | --- | --- |
| Selectivity | Ability to differentiate chemical species from interferents | Difficult without reference standards; isomer distinction problematic |
| Sensitivity | Limit of detection (LOD) for chemical signals | Varies by compound; requires estimation without standards |
| Accuracy | Agreement between reported and true values | Challenging for identification without confirmed standards |
| Precision | Consistency across multiple measurements | Affected by instrumentation, data processing, and operator variability |
| Reproducibility | Consistency across different laboratories | Impacted by methodological differences; focus of ILS |

The integration of machine learning (ML) approaches shows particular promise for enhancing several aspects of NTA performance, including chemical structure identification, advanced quantification methods, and toxicity prediction capabilities [1]. However, challenges remain in refining ML tools for complex environmental mixtures and improving inter-laboratory validation [1].

Experimental Design and Methodologies for Benchmarking

Sample Preparation and Experimental Framework

The Norman Network study implemented a rigorous sample preparation protocol to ensure consistency across participating laboratories [76]. Horizon Atlantic HLB-L disks (47 mm diameter) were deployed for integrative sampling at both input and output of a drinking water treatment plant with exposure times of 2 and 4 days [76]. To increase sampling rates, passive samplers were placed in dynamic passive sampling devices (DPS) consisting of electrically driven large-volume water pumping systems coupled to exposure cells [76].

The sample processing methodology included:

  • Freeze-drying exposed HLB disks for 48 hours to remove water residues
  • Spiking with six isotope-labeled internal standards prior to extraction
  • Three consecutive extractions with 200 mL of acetone for 24 hours each
  • Solvent exchange to methanol following US EPA Method 3570 [76]
  • Pooling extracts from 13 different passive samplers to create homogeneous samples
  • Inclusion of procedural blanks from field blanks not exposed to water

This comprehensive approach generated samples with equivalent water volumes of 4.0-8.7 liters per vial, enabling trace-level detection of contaminants while minimizing matrix effects [76].

Analytical Conditions and Data Acquisition Parameters

The interlaboratory study allowed assessment of both predefined methods and in-house protocols, revealing how analytical conditions influence the detectable chemical space [76]. The predefined method specified detailed conditions for liquid chromatography separation and mass spectrometric data acquisition, while participating laboratories used their own established in-house methods for comparison [76].

The instrumentation landscape across 21 laboratories included various HRMS systems, with the most common being:

  • Time-of-flight (TOF) mass spectrometers
  • Orbitrap-based systems (noted for good sensitivity, accuracy, and versatility) [11]
  • Fourier transform ion cyclotron resonance (FT-ICR) instruments [11]

Among separation techniques, reversed-phase liquid chromatography (RPLC) was predominantly employed, potentially introducing selection bias against polar, highly polar, and ionic compounds [40]. This bias represents a significant methodological consideration for comprehensive chemical space assessment.

[Diagram: benchmarking workflow from sample collection (passive sampling, HLB disks) through sample preparation (freeze-drying, extraction, SPE), instrumental analysis (LC-HRMS/GC-HRMS) via either the pre-defined method (standardized conditions) or in-house methods (lab-specific protocols), data processing (peak picking, alignment), compound identification (spectral library matching), confirmation (standards, MS/MS), and data reporting with Schymanski confidence levels.]

NTA Benchmarking Workflow

Essential Research Tools and Reagents for NTA Benchmarking

Implementing robust NTA benchmarking studies requires specific instrumentation, analytical tools, and reference materials. The following table summarizes key components of the NTA research toolkit based on current methodologies and interlaboratory studies:

Table: Essential Research Toolkit for NTA Benchmarking Studies

| Tool Category | Specific Examples | Function in NTA Benchmarking |
| --- | --- | --- |
| HRMS Instrumentation | Orbitrap, TOF, FT-ICR [11] | High-resolution mass measurement for accurate compound identification |
| Separation Techniques | RPLC, HILIC, GC [40] [19] | Compound separation prior to MS analysis; different techniques cover complementary chemical spaces |
| Passive Sampling Media | HLB disks, silicone sheets [76] | Time-integrated sampling and preconcentration of contaminants from water |
| Internal Standards | Isotope-labeled compounds (Caffeine-13C3, Carbamazepine-D10) [76] | Retention time modeling and quality control |
| Performance Reference Compounds | 14 PRCs for silicone sheets [76] | Calibration of sampling rates and mass transfer coefficients |
| Data Processing Software | Vendor software (Compound Discoverer, MassHunter); open-source platforms (MZmine, MS-DIAL) [19] | Feature detection, peak alignment, and compound identification |
| Spectral Libraries | MassBank EU, MoNA, NIST [76] [19] | Reference spectra for compound identification |
| Confidence Assessment | Schymanski scale [76] | Standardized framework for reporting identification confidence levels |

The selection of appropriate tools significantly influences the detectable chemical space, with studies showing that 51% of NTA investigations use only LC-HRMS, 32% use only GC-HRMS, and 16% employ both techniques to expand coverage of chemical properties [19]. This distribution highlights a critical methodological consideration, as exclusive use of RPLC-HRMS may systematically exclude certain classes of polar and ionic compounds [40].

Analytical Considerations and Methodological Gaps

Critical Factors Influencing Benchmarking Outcomes

The design and interpretation of NTA benchmarking studies must account for several analytical factors that significantly impact outcomes and reproducibility:

  • Extraction and Cleanup Bias: Sample preparation protocols, particularly solid-phase extraction (SPE) choices, strongly influence which chemical classes are recovered and detected. Different sorbents exhibit selective retention for compounds with specific physicochemical properties, potentially introducing systematic biases [40] [19].

  • Chromatographic Selectivity: The overrepresentation of reversed-phase liquid chromatography (RPLC) in NTA studies contributes to selective exclusion of polar, highly polar, and ionic compounds from analysis [40]. This limitation can be mitigated by incorporating alternative separation mechanisms such as hydrophilic interaction liquid chromatography (HILIC) or gas chromatography (GC).

  • Ionization Efficiency Variability: The choice of ionization techniques (e.g., ESI+, ESI-, APCI, EI) significantly impacts the detectable chemical space. Studies show 43% of LC-HRMS applications use both ESI+ and ESI-, while 18% use only ESI+, and 22% use only ESI- [19], creating substantial differences in coverage.

  • Data Processing Inconsistencies: The predominance of vendor-specific software (used in 57 studies) over open-source platforms (used in 7 studies) introduces challenges for method standardization and reproducibility [19]. Algorithmic differences in peak picking, alignment, and identification contribute to interlaboratory variability.

Current Limitations and Research Needs

Despite considerable progress, significant methodological gaps remain in NTA benchmarking:

  • Spectral Library Limitations: Spectral libraries for liquid chromatography remain far less comprehensive than their GC-MS counterparts, leaving insufficient reference spectra for matching [40] [19]. This bottleneck partially explains why many detected features remain unidentified in NTA studies.

  • Quantitative Uncertainty: While qualitative NTA has advanced significantly, quantitative NTA (qNTA) still lacks standardized methods for characterizing estimation uncertainty, particularly regarding experimental recovery effects [28].

  • Integration with Risk Assessment: Limited frameworks exist for incorporating NTA data into formal risk assessment paradigms, despite the potential for NTA to bridge contaminant discovery and risk characterization [28].

  • Effect-Directed Analysis Integration: Combining NTA with effect-directed analysis (EDA) shows promise for identifying toxicity drivers, with studies reporting that NTA explains a median of 34% of observed toxicity, versus 8.86% for targeted analysis alone [40]. However, standardized approaches for this integration remain underdeveloped.
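
One way to make the estimation uncertainty flagged above explicit is to propagate calibration scatter into an interval rather than reporting a point estimate. The sketch below bootstraps a surrogate calibration curve; the calibration data are hypothetical and the bootstrap itself is an illustrative choice, not a method prescribed by the cited studies:

```python
import random
import statistics

# Hypothetical surrogate calibration: known concentrations (ng/mL) vs. responses.
conc = [1, 5, 10, 50, 100]
resp = [2.1, 9.8, 21.5, 98.0, 205.0]

def fit_slope(c, r):
    """Least-squares slope through the origin (response = slope * conc)."""
    return sum(ci * ri for ci, ri in zip(c, r)) / sum(ci * ci for ci in c)

def bootstrap_conc(response, n_boot=2000, seed=1):
    """Resample calibration points; return a median estimate and a 95% interval."""
    rng = random.Random(seed)
    estimates = []
    indices = range(len(conc))
    for _ in range(n_boot):
        sample = [rng.choice(list(indices)) for _ in indices]
        slope = fit_slope([conc[i] for i in sample], [resp[i] for i in sample])
        estimates.append(response / slope)
    estimates.sort()
    lo = estimates[int(0.025 * n_boot)]
    hi = estimates[int(0.975 * n_boot)]
    return statistics.median(estimates), (lo, hi)

est, (lo, hi) = bootstrap_conc(response=42.0)
print(f"estimated conc ~{est:.1f} ng/mL, 95% interval ({lo:.1f}, {hi:.1f})")
```

Reporting the interval alongside the estimate communicates calibration-driven uncertainty; a fuller qNTA treatment would also need to fold in ionization-efficiency and recovery effects, which this sketch deliberately omits.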

[Diagram: NTA study objectives branch into sample classification, chemical identification, and chemical quantitation, which map respectively to qualitative metrics (a confusion-matrix adaptation), confidence assessment (the Schymanski scale), and quantitative metrics (accuracy and precision with error consideration). Critical method factors feed into these objectives: extraction/sample preparation bias affects sample classification; chromatography selection (RPLC overrepresentation) and ionization technique bias affect chemical identification; data processing software variability affects chemical quantitation.]

NTA Performance Assessment Framework

Future Perspectives and Concluding Remarks

Benchmarking studies and interlaboratory comparisons represent fundamental pillars in the advancement of reliable NTA methods for environmental monitoring and exposure assessment. The growing body of research demonstrates that while significant variability exists across laboratories and methodologies, structured collaborative efforts can progressively improve reproducibility and confidence in results [76]. The future trajectory of NTA benchmarking will likely focus on several critical areas:

First, the integration of machine learning and artificial intelligence holds substantial promise for enhancing chemical structure identification, quantification accuracy, and toxicity prediction capabilities [1]. ML approaches may help address current bottlenecks in data processing and interpretation, particularly for complex environmental mixtures. Second, the development of standardized performance assessment protocols through initiatives like BP4NTA will be essential for translating NTA from a research tool to a reliable approach for regulatory decision-making [14] [77]. This includes establishing agreed-upon metrics for sensitivity, selectivity, accuracy, and precision in the NTA context.
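For the qualitative side of such performance protocols, a confusion-matrix adaptation can score an NTA workflow against a spiked reference mixture. The sketch below uses hypothetical compound sets; scoring true negatives against a finite candidate list (`universe`) is an illustrative convention, since true negatives are otherwise ill-defined in NTA:

```python
# Score one benchmarking run: spiked compounds vs. compounds the workflow reported.
# All sets are hypothetical; "universe" is the candidate list used for scoring.

universe = {"atrazine", "caffeine", "DEET", "PFOA", "bisphenol_A",
            "ibuprofen", "triclosan", "metolachlor"}
spiked   = {"atrazine", "caffeine", "DEET", "PFOA", "bisphenol_A"}
reported = {"atrazine", "caffeine", "PFOA", "triclosan"}

tp = len(spiked & reported)               # spiked and found
fn = len(spiked - reported)               # spiked but missed
fp = len(reported - spiked)               # reported but not spiked
tn = len((universe - spiked) - reported)  # neither spiked nor reported

sensitivity = tp / (tp + fn)   # true-positive rate (recall)
selectivity = tn / (tn + fp)   # true-negative rate over the candidate list
precision   = tp / (tp + fp)
accuracy    = (tp + tn) / len(universe)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"sensitivity={sensitivity:.2f} selectivity={selectivity:.2f} "
      f"precision={precision:.2f} accuracy={accuracy:.2f}")
```

Agreeing on such definitions, in particular what counts as the negative class, is exactly the kind of convention a standardized assessment protocol would need to fix.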

Finally, the environmental chemistry community must address the critical need for expanded spectral libraries, particularly for LC-HRMS applications, and multimethod approaches that combine complementary analytical techniques to overcome the limitations of any single method [40] [19]. As these advancements mature, benchmarking studies will continue to provide the essential foundation for assessing progress, identifying persistent challenges, and directing future research investments toward the shared goal of comprehensive chemical exposure assessment.

Conclusion

Non-targeted analysis with high-resolution mass spectrometry represents a paradigm shift in analytical science, enabling comprehensive characterization of complex chemical mixtures beyond predefined targets. Successful implementation requires meticulous attention throughout the entire workflow—from experimental design and data acquisition to advanced interpretation and validation. The integration of machine learning, development of standardized reporting frameworks, and advancement of quantitative approaches are rapidly addressing historical limitations. As these methodologies mature, NTA is poised to transform exposure science, biomarker discovery, and environmental monitoring by providing unprecedented insights into previously uncharacterized chemical spaces. Future directions should focus on enhancing quantitative rigor, improving interoperability across platforms, establishing standardized performance criteria, and expanding applications in clinical and public health decision-making contexts.

References