HRMS Retention Time Correction and Alignment: A Comprehensive Guide for Reliable Metabolomics and Proteomics Data

Isaac Henderson, Dec 02, 2025

This article provides a comprehensive overview of retention time (RT) correction and alignment for Liquid Chromatography-High-Resolution Mass Spectrometry (LC-HRMS) data, a critical preprocessing step in untargeted metabolomics and proteomics.

Abstract

This article provides a comprehensive overview of retention time (RT) correction and alignment for Liquid Chromatography-High-Resolution Mass Spectrometry (LC-HRMS) data, a critical preprocessing step in untargeted metabolomics and proteomics. Aimed at researchers, scientists, and drug development professionals, it covers foundational concepts explaining the sources and impacts of RT variability. The guide details methodological approaches, from traditional warping functions to advanced deep learning and multi-way analysis tools like ROIMCR and metabCombiner. It further offers practical troubleshooting strategies for common challenges and a comparative analysis of software performance to enhance data quality, ensure reproducibility, and unlock robust biological insights in large-cohort studies.

Understanding Retention Time Variability: The Critical Foundation for HRMS Data Integrity

Why RT Alignment is a Non-Negotiable Preprocessing Step in Untargeted HRMS

In untargeted High-Resolution Mass Spectrometry (HRMS), retention time (RT) alignment serves as a foundational preprocessing step that directly determines the reliability and accuracy of downstream analytical results. Liquid Chromatography coupled to HRMS (LC-HRMS) has become a premier analytical technology owing to its superior reproducibility, high sensitivity, and specificity [1]. However, the comparability of measurements carried out with different devices and at different times is inherently compromised by RT shifts resulting from multiple factors, including matrix effects, instrument performance variations, column aging, and contamination [2] [3]. These technical variations create substantial analytical noise that can obscure biological signals and lead to erroneous conclusions if not properly corrected.

The fundamental challenge in untargeted HRMS analysis lies in distinguishing between analytical artifacts and true biological variation across multiple samples. Without robust RT alignment, corresponding analytes cannot be accurately mapped across sample runs, fundamentally undermining the quantitative and comparative analysis that forms the basis of metabolomic, proteomic, and food authentication studies [2] [3]. This correspondence problem—finding the "same compound" in multiple samples—becomes increasingly critical as cohort sizes grow, making RT alignment not merely an optimization step but an essential prerequisite for meaningful data interpretation.

The Critical Impact of RT Variation on Data Quality

Consequences of Poor RT Alignment

The ramifications of inadequate RT alignment permeate every subsequent stage of HRMS data analysis. In feature detection and quantification, misalignment leads to inconsistent peak matching, where the same metabolite is incorrectly identified as different features across samples or different metabolites are erroneously grouped together. This directly compromises data integrity by introducing false positives and negatives in differential analysis [3]. The problem is particularly acute in large-scale studies where samples are analyzed over extended periods or across multiple instruments.

In machine learning applications for sample classification, unaddressed RT shifts substantially reduce model accuracy and generalizability. For instance, in geographical origin authentication of honey, RT variations between analytical batches can overshadow true biological variation, leading to misclassification and reduced predictive performance [2]. Similarly, in clinical biomarker discovery, poor alignment can obscure subtle but statistically significant metabolic changes, preventing the identification of crucial disease indicators. The absence of proper RT alignment thus represents a critical bottleneck in the translation of HRMS data into biologically or clinically meaningful insights.

Quantitative Impacts on Metabolite Identification

Table 1: Impact of RT Alignment on Metabolite Detection and Quantification

| Parameter | Without RT Alignment | With Proper RT Alignment | Improvement |
| --- | --- | --- | --- |
| Feature Consistency | High variance across runs | Low variance across runs | >70% reduction in technical variation [3] |
| Missing Values | 30-50% missing data in feature table | <10% missing values | 60-80% reduction [2] |
| Quantitative Accuracy | RSD >20-30% | RSD <10-15% | >50% improvement [2] |
| Identification Sensitivity | Limited to high-abundance features | Comprehensive, including low-abundance features | 25% increase in detected features [3] |

Methodological Approaches to RT Alignment

Established Alignment Methods

Current RT alignment methodologies predominantly fall into two categories: warping function methods and direct matching methods. Warping models correct RT shifts between runs using linear or non-linear warping functions, with popular tools including XCMS and MZmine 2 employing this approach [3]. These methods establish mathematical functions that transform the RT space of one sample to match another, effectively stretching or compressing the chromatographic timeline to maximize overlap between corresponding features. While effective for monotonic shifts (where the direction of RT drift is consistent across the separation), these methods struggle with non-monotonic shifts commonly encountered in complex sample matrices.

Direct matching methods attempt to perform correspondence solely based on similarity between specific signals from run to run without a warping function. Representative tools include RTAlign and MassUntangler, which rely on sophisticated algorithms to identify corresponding features across samples through multidimensional similarity measures [3]. While offering potential advantages for non-monotonic shifts, these methods have traditionally demonstrated inferior performance compared to warping function approaches due to uncertainties in MS signals and computational intensity, particularly with large datasets.

Emerging Deep Learning Approaches

Recent advances in deep learning have enabled the development of hybrid approaches that overcome limitations of traditional methods. DeepRTAlign represents one such innovation, combining a coarse alignment (pseudo warping function) with a deep neural network-based direct matching model [3]. This architecture can simultaneously address both monotonic and non-monotonic RT shifts, leveraging the strengths of both methodological paradigms.

The DeepRTAlign workflow begins with precursor detection and feature extraction, followed by coarse alignment that linearly scales RT across samples and applies piecewise correction based on average RT shifts within defined windows [3]. Subsequently, features are binned by m/z, and input vectors constructed from RT and m/z values of target features and their neighbors are processed through a deep neural network classifier that distinguishes between feature pairs that should or should not be aligned. This approach has demonstrated superior performance across multiple proteomic and metabolomic datasets, improving identification sensitivity without compromising quantitative accuracy [3].

Experimental Protocols for RT Alignment

Protocol 1: Standard Warping Function Alignment with XCMS

Principle: This protocol uses a non-linear warping function to correct RT shifts by aligning chromatographic peaks across samples through dynamic time warping algorithms. The approach is particularly effective for monotonic RT drifts commonly observed in batch analyses [2].

Materials and Reagents:

  • LC-HRMS system (e.g., Thermo Scientific Q Exactive series)
  • Chromatography column (e.g., Hypersil Gold C18, 150 × 2.1 mm)
  • Mobile phases: water and acetonitrile with 0.1% formic acid
  • Quality control (QC) samples: pooled from all experimental samples
  • R software environment with XCMS package installed

Procedure:

  • Data Conversion: Convert raw MS files to open-format mzML files using MSConvert (ProteoWizard software package) with the peakPicking filter to convert profile mode data to centroid mode [2].
  • Parameter Optimization: Set key XCMS parameters for feature detection: peakwidth = c(5,20), snthresh = 10, noise = 1000, and prefilter = c(3,1000).
  • Peak Detection: Perform chromatographic peak detection using the matchedFilter or centWave algorithm optimized for your instrument type and resolution.
  • Initial Alignment: Use the retcor function with the obiwarp method for initial RT correction with the following parameters: profStep = 1, center = 3, and response = 1.
  • Peak Grouping: Apply the group function to group corresponding peaks across samples with bw = 5 (bandwidth) and mzwid = 0.015 (m/z width).
  • Fill Peaks: Use the fillPeaks function to integrate signal in regions where peaks were detected in some but not all samples.
  • Quality Assessment: Evaluate alignment quality by examining RT deviation plots and the number of overlapping features in QC samples.
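
The steps above can be scripted directly in R with the classic xcms interface. The sketch below mirrors the parameter values listed in this protocol; the file location and the 10 ppm mass tolerance are illustrative assumptions rather than part of the protocol.

```r
library(xcms)

# Assumed layout: converted mzML files in a "mzML" subdirectory.
files <- list.files("mzML", pattern = "\\.mzML$", full.names = TRUE)

# Peak detection with centWave using the parameters listed above
# (ppm = 10 is an assumed value for a high-resolution instrument).
xset <- xcmsSet(files, method = "centWave", ppm = 10,
                peakwidth = c(5, 20), snthresh = 10,
                noise = 1000, prefilter = c(3, 1000))

# Obiwarp-based retention time correction (Initial Alignment step).
xset <- retcor(xset, method = "obiwarp", profStep = 1,
               center = 3, response = 1)

# Group corresponding peaks across samples, then fill missing peaks.
xset <- group(xset, bw = 5, mzwid = 0.015)
xset <- fillPeaks(xset)

# Feature table for quality assessment of the alignment.
feature_table <- peakTable(xset)
```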

Troubleshooting:

  • If alignment fails for specific samples, increase the bw parameter to allow for greater RT flexibility.
  • For poor peak matching, adjust mzwid according to your instrument's mass accuracy (typically 0.01-0.05 for high-resolution instruments).
  • If processing time is excessive, increase profStep to 2, though this may reduce alignment precision.

Protocol 2: Deep Learning-Based Alignment with DeepRTAlign

Principle: This protocol employs a deep neural network to learn complex RT shift patterns from the data itself, enabling correction of both monotonic and non-monotonic shifts without relying solely on warping functions [3].

Materials and Reagents:

  • LC-HRMS data from large cohort studies (≥100 samples recommended)
  • Linux operating system (required for DeepRTAlign)
  • Python (≥3.8) with PyTorch (≥1.8.0) and required dependencies
  • Sufficient computational resources (≥16GB RAM, GPU recommended)

Procedure:

  • Software Setup: Install DeepRTAlign from its source code repository, following the provided installation instructions for Linux systems [3].
  • Feature Extraction: Run XICFinder to detect precursors and extract features from raw MS files using a mass tolerance of 10 ppm for isotope pattern detection [3].
  • Coarse Alignment: Perform linear RT scaling to a standardized range (e.g., 80 minutes) and divide samples into 1-minute windows for piecewise correction against an anchor sample.
  • Binning and Filtering: Group features by m/z using default parameters (bin_width = 0.03, bin_precision = 2) and optionally filter to retain only the highest intensity feature in each m/z window.
  • Model Application: Process the binned features through the pre-trained DeepRTAlign deep neural network classifier (three hidden layers, 5000 neurons each) [3].
  • Quality Control: Run the built-in QC module to calculate the false discovery rate (FDR) of alignment results using decoy samples.
  • Result Validation: Compare the number of aligned features with and without DeepRTAlign and assess quantitative consistency in QC samples.
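
The coarse-alignment idea (linear RT scaling plus piecewise correction against an anchor sample) can be illustrated with a short R sketch. This is a conceptual illustration only, not DeepRTAlign's implementation; the nearest-m/z matching heuristic, tolerances, and column names are assumptions.

```r
# 'sample_ft' and 'anchor_ft' are hypothetical feature tables with columns mz and rt (minutes).
coarse_align <- function(sample_ft, anchor_ft, window = 1, mz_tol = 0.01) {
  sample_ft$rt_corr <- sample_ft$rt
  for (start in seq(0, max(anchor_ft$rt), by = window)) {
    in_win <- sample_ft$rt >= start & sample_ft$rt < start + window
    if (!any(in_win)) next
    # Crude matching by nearest m/z to estimate the average RT shift in this window.
    idx <- sapply(sample_ft$mz[in_win], function(m) which.min(abs(anchor_ft$mz - m)))
    ok  <- abs(anchor_ft$mz[idx] - sample_ft$mz[in_win]) < mz_tol
    if (sum(ok) >= 3) {
      shift <- mean(sample_ft$rt[in_win][ok] - anchor_ft$rt[idx][ok])
      sample_ft$rt_corr[in_win] <- sample_ft$rt[in_win] - shift
    }
  }
  sample_ft
}
```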

Troubleshooting:

  • If training a new model is necessary, collect 400,000 feature-feature pairs (200,000 positive, 200,000 negative) from identification results as ground truth [3].
  • For suboptimal performance on specific datasets, adjust the binning parameters (bin_width and bin_precision) to better match your instrument's precision.
  • If processing large datasets, consider increasing the RT window size from 1 minute to reduce computational load at the potential cost of alignment precision.

Protocol 3: Bucketing Approach for Large-Scale Studies (BOULS)

Principle: The BOULS (Bucketing Of Untargeted LCMS Spectra) approach enables separate processing of untargeted LC-HRMS data obtained from different devices and at different times through retention time alignment to a central spectrum and a 3D bucketing step [2].

Materials and Reagents:

  • Multiple LC-HRMS instruments (e.g., minimum of 2-3 systems for validation)
  • Hydrophilic interaction liquid chromatography (HILIC) and reverse phase (RP) columns
  • Mobile phase additives: acetic acid for negative ion mode, formic acid for positive ion mode
  • Reference standards for system suitability testing

Procedure:

  • Data Acquisition: Perform LC-HRMS analysis using consistent chromatographic methods across instruments. For honey authentication, use HILIC in negative ion mode for polar compounds and RP in positive ion mode for non-polar compounds [2].
  • Central Spectrum Selection: Choose a high-quality representative sample as the central reference spectrum for all subsequent alignments.
  • Retention Time Alignment: Align all samples to the central spectrum using the established xcms workflow with modified parameters for consistency across instruments.
  • Bucketing Implementation: Divide the aligned spectrum into three-dimensional buckets (retention time, m/z, and feature intensity) summing up the total intensity of signals within each bucket.
  • Data Integration: Compile the bucketed data into a unified feature table without requiring batch correction, feature identification, or feature matching for successful classification [2].
  • Model Building: Apply machine learning algorithms (e.g., Random Forest) to the compiled data for sample classification, using out-of-bag error estimation for validation.
  • Continuous Learning: Implement a framework where newly acquired spectra can be classified and added to the training dataset without re-evaluation of the entire dataset.
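
As a rough illustration of the bucketing and classification steps above, the R sketch below sums intensities over a retention time x m/z grid and fits a Random Forest with out-of-bag validation. The bucket sizes, column names, and class labels are assumptions, and the 3D bucketing described above is reduced here to a 2D grid for brevity.

```r
library(randomForest)

# 'features': hypothetical long-format table of aligned features with columns
# sample, rt (minutes), mz, and intensity; 'origin' is a factor of class labels
# ordered to match the samples in the bucketed matrix.
features$rt_bucket <- cut(features$rt, breaks = seq(0, ceiling(max(features$rt)), 0.5),
                          include.lowest = TRUE)
features$mz_bucket <- cut(features$mz, breaks = seq(floor(min(features$mz)),
                                                    ceiling(max(features$mz)), 1),
                          include.lowest = TRUE)
features$bucket <- interaction(features$rt_bucket, features$mz_bucket, drop = TRUE)

# Sum intensities per sample and bucket, giving a samples x buckets matrix.
X <- as.matrix(xtabs(intensity ~ sample + bucket, data = features))

# Random Forest classification with out-of-bag (OOB) error estimation.
rf <- randomForest(x = X, y = origin)
print(rf)  # OOB error rate serves as the internal validation estimate
```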

Troubleshooting:

  • If inter-instrument variation remains high, increase the number of QC samples analyzed across all instruments to improve alignment.
  • For poor classification accuracy, optimize the bucket size to balance resolution and signal-to-noise ratio.
  • When adding new instruments to the workflow, ensure sufficient system suitability testing and cross-calibration with existing systems.

Table 2: Research Reagent Solutions for HRMS RT Alignment Studies

| Reagent/Category | Function in RT Alignment | Application Examples | Technical Notes |
| --- | --- | --- | --- |
| HILIC Column (Accucore-150-Amide-HILIC) | Separation of polar compounds | Honey origin authentication [2] | Use in negative ion mode with acetic acid modifier |
| RP Column (Hypersil Gold C18) | Separation of non-polar compounds | Meat authentication [1] | Use in positive ion mode with formic acid modifier |
| Sorbic Acid Solution (2-10% in ACN-water) | Internal standard for normalization | Inter-instrument alignment [2] | Concentration varies by ion mode (2% positive, 10% negative) |
| QC Samples (pooled from all study samples) | Monitoring system performance | Large cohort studies [2] [3] | Analyze at regular intervals throughout the sequence |
| Trypsin (BioReagent grade) | Protein digestion for proteomic alignment | Meat speciation studies [1] | Use 1.0 mg/mL solution; incubate overnight at 37°C |

Analytical Workflow Integration

Workflow diagram: HRMS data preprocessing with RT alignment as the critical step. Raw data acquisition → file format conversion → feature detection → RT alignment (critical step) → peak grouping and correspondence → quality control and validation → downstream analysis. If RT alignment fails, data integrity is compromised and the results become invalid.

Validation and Quality Control

Establishing robust quality control measures is essential for verifying RT alignment effectiveness. The coefficient of variation (CV) for internal standards should be <15% in QC samples, with >75% of aligned features demonstrating RT deviations <0.1 minutes across technical replicates [2]. Multivariate analysis tools such as Principal Component Analysis (PCA) should show tight clustering of QC samples regardless of analytical batch, indicating successful removal of technical variation.
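
These acceptance criteria can be checked with a few lines of R; the object names below (a QC intensity matrix, a vector of internal-standard columns, and per-feature RT deviations) are assumptions for illustration.

```r
# 'qc_mat': feature intensity matrix for QC injections (rows = QC runs, columns = features);
# 'is_cols': column indices of the internal standards; 'rt_dev': per-feature RT
# deviation (minutes) across technical replicates.
cv <- apply(qc_mat, 2, function(x) 100 * sd(x) / mean(x))

all(cv[is_cols] < 15)   # internal standards should show CV < 15% in QC samples
mean(rt_dev < 0.1)      # should exceed 0.75 (>75% of features within 0.1 min)
```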

For the BOULS approach, validation includes demonstrating that classification models maintain accuracy >90% when applied to data from different instruments and timepoints [2]. With DeepRTAlign, quality control involves calculating the false discovery rate (FDR) of alignment results using decoy samples, with successful implementations achieving FDR <1% while increasing feature identification by 15-25% compared to traditional methods [3].

Retention time alignment stands as a non-negotiable preprocessing step in untargeted HRMS, forming the critical bridge between raw instrumental data and biologically meaningful results. As HRMS applications expand toward large-cohort studies, multi-center investigations, and continuous learning models, robust RT alignment becomes increasingly fundamental to data integrity. The development of sophisticated methods like DeepRTAlign and BOULS represents significant advances in addressing both monotonic and non-monotonic shifts while enabling cross-platform and cross-temporal data integration. Implementation of rigorous, validated RT alignment protocols ensures that the full analytical power of modern HRMS platforms is realized in research and diagnostic applications.

Retention time (RT) stability is a cornerstone of reliable liquid chromatography-high-resolution mass spectrometry (LC-HRMS) analysis in untargeted metabolomics, proteomics, and environmental screening. RT shifts, defined as non-biological variations in the elution time of an analyte, can severely compromise feature alignment, quantitative accuracy, and compound identification across large cohort studies [3] [4]. Within the broader context of HRMS data preprocessing research, understanding and correcting these shifts is paramount for data integrity. The primary sources of these shifts can be categorized into instrumental variations, column-related factors, and batch effects. This application note delineates these key sources and provides detailed protocols for their diagnosis and correction, leveraging the latest research and methodologies.

The following table summarizes the core sources of RT shifts and their quantitative impact on data analysis, as evidenced by recent benchmarking studies.

Table 1: Key Sources of Retention Time Shifts and Their Impacts

| Source Category | Specific Source | Demonstrated Impact | Citation |
| --- | --- | --- | --- |
| Instrument | Mass Accuracy Drift | Mass error >3 ppm can cause failure in MS2 selection and molecular formula assignment | [5] |
| Instrument | Time Since Calibration | Positive mode exhibits higher mass accuracy and precision than negative mode | [5] |
| Column | Mobile Phase pH & Chemistry | Most impactful factors for the accuracy of retention time projection and prediction models | [6] |
| Column | Column Hardware Inertness | Inert hardware enhances peak shape and analyte recovery for metal-sensitive compounds like phosphorylated molecules | [7] |
| Batch Effects | Confounded Batch Effects | Batch effects confounded with biological groups pose a major challenge, requiring specific correction algorithms | [8] |
| Data Preprocessing | Software | Different software tools (e.g., MZmine, XCMS, MS-DIAL) select different features as statistically important, significantly affecting downstream results | [9] |

Experimental Protocols for Assessing and Correcting RT Shifts

Protocol: High-Resolution Accurate Mass System Suitability Test (HRAM-SST)

This protocol evaluates instrumental mass accuracy, a prerequisite for reliable RT alignment, and is adapted from recent methodology [5].

1. Reagent Preparation:

  • Prepare a mixture of 13 reference standards covering both ionization modes and a range of chemical families, polarities, and m/z values (e.g., Acetaminophen, Caffeine, Verapamil).
  • Create a stock solution at 2.5 μg/mL in methanol and store at -20°C.
  • For each injection, prepare a working solution at 50 ng/mL in methanol.

2. Instrumental Analysis:

  • Inject the HRAM-SST working solution at the beginning and end of every sample analysis batch.
  • Use the same chromatographic column and mobile phases as the analytical method.
  • Perform the LC-HRMS analysis using the standard data acquisition method.

3. Data Processing and Acceptance Criteria:

  • For each SST injection, check the mass accuracy for all 13 compounds.
  • A successful calibration requires a mass error below 3 ppm for the majority of compounds.
  • If mass accuracy exceeds this threshold, consider system recalibration before proceeding with sample acquisition.
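
Mass accuracy against the 3 ppm criterion can be checked with a small helper; the data-frame columns below are hypothetical.

```r
# Parts-per-million mass error for each SST compound.
ppm_error <- function(observed_mz, theoretical_mz) {
  1e6 * (observed_mz - theoretical_mz) / theoretical_mz
}

# 'sst': hypothetical data frame with columns compound, observed_mz, theoretical_mz.
sst$ppm <- ppm_error(sst$observed_mz, sst$theoretical_mz)
sst$compound[abs(sst$ppm) > 3]   # compounds exceeding the 3 ppm acceptance threshold
```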

Protocol: Evaluating Data Preprocessing Software for Untargeted Metabolomics

This protocol outlines a comparative approach for selecting preprocessing software, a significant source of variation in feature detection and RT alignment [9].

1. Experimental Design:

  • Analyze a defined set of samples (e.g., 40 cancer patient urines vs. 40 healthy control urines) using UHPLC-HRMS.

2. Data Preprocessing:

  • Process the identical raw dataset through multiple preprocessing software tools (e.g., MZmine, XCMS, MS-DIAL, iMet-Q, Peakonly).
  • Use default or optimized parameters for peak picking, integration, and alignment in each software.

3. Comparative Metrics:

  • Peak Integration Quality: Compare the number of detected features and integration consistency against manual curation.
  • Statistical Analysis: Apply consistent statistical tests (e.g., Mann-Whitney U test) to the feature lists from each software.
  • Model Accuracy: Use top features from each software to build diagnostic models (e.g., using logistic regression) and compare their accuracies.

4. Software Selection:

  • Prefer software that demonstrates peak integration comparable to manual methods, high feature detection sensitivity, and yields diagnostic models with superior performance. Studies suggest MS-DIAL and iMet-Q often perform well for both identification and statistical analysis [9].
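
The comparative metrics in steps 3 and 4 can be prototyped in R as below. The feature matrix, group factor, and the use of resubstitution accuracy (rather than proper cross-validation) are simplifying assumptions for illustration.

```r
# 'feat': samples x features intensity matrix from one software tool;
# 'group': two-level factor (e.g., control vs. cancer). Both are assumed objects.
pvals <- apply(feat, 2, function(x) wilcox.test(x ~ group)$p.value)  # Mann-Whitney U test
top   <- order(pvals)[1:10]                                          # top-ranked features

# Simple logistic-regression diagnostic model on the top features.
d    <- data.frame(group = group, feat[, top, drop = FALSE])
fit  <- glm(group ~ ., data = d, family = binomial)
pred <- ifelse(predict(fit, type = "response") > 0.5, levels(group)[2], levels(group)[1])
mean(pred == group)  # resubstitution accuracy; repeat per software tool and compare
```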

Protocol: Retention Time Index Projection Using a Generalized Additive Model

This protocol describes a method to project RTs from a public database to a specific chromatographic system, addressing column and mobile phase-induced shifts [6].

1. Data Collection:

  • Analyze a set of calibration chemicals (e.g., 41 compounds) and suspect chemicals (e.g., 45 compounds) on both the source (CS_source) and target (CS_NTS) chromatographic systems.

2. Retention Time Index (RTI) Calculation:

  • For each chromatographic system, calculate the RTI for every detected compound using the formula: RTI = (RT(compound) - RT(first calibrant)) / (RT(last calibrant) - RT(first calibrant)) × 1000
  • This scaling normalizes RTs to a range of 0–1000, accounting for differences in flow rate and column length [6].

3. Model Fitting and Projection:

    • Using the calibration chemicals detected on both systems, fit a Generalized Additive Model (GAM) between the RTIs from CS_source and CS_NTS.
    • Apply the fitted GAM to the "known" RTIs of suspect chemicals from the CS_source to project their expected RTIs in the CS_NTS.

4. Performance Evaluation:

  • The accuracy of this projection is directly linked to the similarity (e.g., mobile phase pH, column chemistry) between the CS_source and CS_NTS [6].
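
A compact R sketch of steps 2 and 3 above: compute RTIs with the stated formula, then fit a GAM (via the mgcv package) between the two systems and project suspect RTIs. Object and column names are illustrative assumptions.

```r
library(mgcv)

# Step 2: RTI on a 0-1000 scale relative to the first and last calibrants.
rti <- function(rt, rt_first_cal, rt_last_cal) {
  1000 * (rt - rt_first_cal) / (rt_last_cal - rt_first_cal)
}

# 'calib': RTIs of calibration chemicals measured on both systems
# (columns rti_source, rti_nts); 'suspects': source-system RTIs of suspect chemicals.
fit <- gam(rti_nts ~ s(rti_source), data = calib)

# Step 3: project suspect RTIs from the source system onto the NTS system.
suspects$rti_nts_projected <- predict(fit, newdata = suspects)
```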

Workflow Visualization for RT Shift Investigation

The following diagram illustrates a comprehensive workflow for diagnosing and addressing the key sources of RT shifts in an LC-HRMS data preprocessing pipeline.

Workflow diagram: RT shift investigation. Starting from suspected RT shifts, check the instrument (perform the HRAM-SST; if mass accuracy exceeds 3 ppm, recalibrate and repeat), assess column and mobile phase conditions against standards, address batch effects (choose the correction level: precursor, peptide, or protein, and apply a BECA such as Ratio, ComBat, or RUV-III-C), then evaluate data preprocessing (compare software tools such as MZmine, XCMS, and MS-DIAL, and apply an RT alignment tool such as DeepRTAlign) to obtain an aligned and corrected dataset.

The Scientist's Toolkit: Essential Reagents and Materials

The following table lists key reagents, materials, and software tools essential for experiments aimed at characterizing and correcting RT shifts.

Table 2: Research Reagent Solutions for RT Shift Analysis

| Category | Item | Function & Application | Citation |
| --- | --- | --- | --- |
| Reference Standards | HRAM-SST Mixture (13 compounds) | Empirically confirms system mass accuracy readiness before/after sample batches. | [5] |
| Reference Standards | NORMAN Calibration Chemicals (41 compounds) | Enables retention time index (RTI) projection between different chromatographic systems. | [6] |
| Chromatography | Inert HPLC Columns (e.g., Halo Inert) | Passivated hardware minimizes analyte adsorption, improving peak shape and recovery for metal-sensitive compounds. | [7] |
| Chromatography | C18, Biphenyl, and HILIC Columns | Provide alternative selectivity for method development and analyzing diverse compound classes. | [7] |
| Software & Algorithms | Data Preprocessing Tools (MS-DIAL, MZmine, XCMS) | Extract features (m/z, RT, intensity) from raw LC-HRMS data; performance varies between tools. | [9] [4] |
| Software & Algorithms | Batch-Effect Correction Algorithms (BECAs) | Remove unwanted technical variation. Protein-level correction with Ratio or ComBat is often robust. | [8] |
| Software & Algorithms | Deep Learning Aligner (DeepRTAlign) | Corrects complex monotonic and non-monotonic RT shifts in large cohort studies. | [3] |

The Impact of Uncorrected Data on Feature Matching and Biomarker Discovery

In liquid chromatography-mass spectrometry (LC-MS)-based proteomic and metabolomic experiments, retention time (RT) alignment is a critical preprocessing step for accurately matching corresponding features (e.g., peptides or metabolites) across multiple sample runs [10]. Uncorrected RT shifts, caused by matrix effects, instrument variability, and chromatographic column aging, introduce significant errors in feature matching. This directly compromises downstream statistical analysis and the sensitivity of biomarker discovery pipelines [3]. In large cohort studies, where thousands of features are tracked across hundreds of samples, the cumulative effect of even minor RT inconsistencies can obscure true biological signals, leading to both false positives and false negatives [11] [12]. This article details the quantitative impact of RT misalignment and provides structured protocols to enhance data quality for more reliable biomarker identification and validation.

The Critical Role of Retention Time Alignment

Liquid chromatography (LC), when coupled with mass spectrometry (MS), separates complex biological samples to reduce ion suppression and increase analytical depth. However, the retention time of the same analyte can vary between runs due to:

  • Matrix effects from complex biological samples like plasma or serum [3].
  • Instrument performance fluctuations, including pump pressure inconsistencies and column degradation [13].
  • Temperature changes and mobile phase composition variations [10].

When uncorrected, these RT shifts disrupt the correspondence process—the matching of the same compound across different samples [3]. This failure directly impacts biomarker discovery by:

  • Reducing Identification Sensitivity: In data-dependent acquisition (DDA) mode, only 15-25% of precursors are typically identified. Match Between Runs (MBR) algorithms rely on accurate RT alignment to transfer identifications from identified to unidentified precursors across runs. Poor alignment causes this transfer to fail, leading to a significant loss of data for subsequent analysis [3].
  • Compromising Quantitative Accuracy: Incorrect feature matching results in inaccurate quantification, as the abundance of a given feature is measured from inconsistent peaks across samples. This introduces noise and bias into the statistical models used to distinguish between patient groups (e.g., healthy vs. diseased) [11] [12].

Classification of Alignment Methods and Their Limitations

Computational methods for RT alignment fall into two primary categories, each with distinct strengths and weaknesses for handling different types of RT shifts [10] [3]:

Table 1: Categories of Retention Time Alignment Methods

| Method Category | Principle | Representative Tools | Limitations |
| --- | --- | --- | --- |
| Warping Function | Corrects RT shifts using a linear or non-linear function applied to the entire chromatogram. | XCMS [14], MZmine 2 [10], OpenMS [11] | Struggles with non-monotonic shifts (local, direction-changing variations) because the warping function is inherently monotonic [3]. |
| Direct Matching | Performs correspondence based on feature similarity (e.g., m/z, RT, intensity) without a global warping function. | RTAlign [13], MassUntangler [15] | Performance can be inferior due to uncertainty in MS signals when relying solely on feature similarity [3]. |

The fundamental limitation of existing traditional tools is their inability to handle both monotonic and non-monotonic RT shifts simultaneously, a common challenge in large-scale studies [3].

Quantitative Impact of Uncorrected RT Shifts

The performance of an alignment algorithm directly influences the number of true biological features that can be reliably quantified, which is the foundation of biomarker discovery.

Table 2: Performance Comparison of Alignment Tools on a Proteomic Dataset

| Tool | Alignment Principle | True Positives Detected | False Discovery Rate (FDR) | Key Strength/Weakness |
| --- | --- | --- | --- | --- |
| DeepRTAlign [3] | Deep learning (coarse alignment + DNN) | ~95% | <1% | Effectively handles monotonic and non-monotonic shifts. |
| Tool A [3] | Warping function | ~85% | ~5% | Fails with complex, local RT shifts. |
| Tool B [3] | Direct matching | ~78% | ~8% | Performance suffers from signal uncertainty. |

Table 2 illustrates that advanced alignment methods can significantly increase the number of correctly aligned features, thereby expanding the pool of potential biomarkers available for downstream analysis. The use of poorly performing alignment tools directly translates into a loss of statistical power. In a typical untargeted metabolomics experiment, high-resolution mass spectrometers can limit m/z shifts to less than 10 ppm, making RT alignment the most variable parameter and thus the most critical for accurate feature matching [3]. The failure to align correctly results in a higher number of missing values across samples and reduces the ability of feature selection algorithms (e.g., Random Forest, SVM-RFE) to identify subtle but biologically significant changes, especially in the early stages of disease [11].

Protocols for Effective Retention Time Alignment

Protocol: Deep Learning-Based Alignment with DeepRTAlign

DeepRTAlign is an advanced tool that combines a coarse alignment with a deep neural network (DNN) to address complex RT shifts [3].

Workflow Diagram: DeepRTAlign

Workflow: raw MS files → precursor detection → feature extraction → coarse alignment → binning and filtering → input vector construction → DNN model → aligned feature list.

Step-by-Step Methodology:

  • Precursor Detection and Feature Extraction:

    • Input: Raw MS files.
    • Process: Use a feature detection tool (e.g., XICFinder, Dinosaur) to detect isotope patterns and group them into features across the retention time dimension [3].
    • Parameters: A mass tolerance of 10 ppm is typically used in this step.
  • Coarse Alignment:

    • Linearly scale the RT in all samples to a common range (e.g., 80 minutes).
    • Divide all samples (except an anchor sample) into pieces by a fixed RT window (e.g., 1 minute).
    • Calculate the average RT shift for features in each piece compared to the anchor sample.
    • Apply the average shift to all features within each piece to achieve a rough, piecewise linear alignment [3].
  • Binning and Filtering:

    • Group all features based on their m/z values using a defined bin_width (e.g., 0.03) and bin_precision (e.g., 2 decimal places).
    • Optional: For each sample in each m/z bin, retain only the feature with the highest intensity within a user-defined RT range to reduce complexity [3].
  • Input Vector Construction:

    • For a potential feature-feature pair from two different samples, construct a vector using the RT and m/z of the target feature and its two adjacent features (before and after in RT).
    • The vector includes both original values and difference values between the two samples, which are then normalized.
    • This results in a 5x8 vector that serves as the input to the DNN [3].
  • Deep Neural Network (DNN) Classification:

    • Model: A DNN with three hidden layers (5,000 neurons each) acts as a binary classifier.
    • Training: The model is trained on 400,000 feature-feature pairs (200,000 positive pairs from the same peptide, 200,000 negative pairs from different peptides) to distinguish between features that should and should not be aligned.
    • Output: The model predicts whether a feature pair represents the same compound [3].
  • Quality Control:

    • A decoy sample is created by randomly shuffling features within an m/z window.
    • The false discovery rate (FDR) of the alignment is calculated based on the number of matches to the decoy sample, ensuring the reliability of the final aligned feature list [3].
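
The decoy-based QC idea can be sketched as follows. This is a conceptual illustration only (DeepRTAlign's QC module implements its own procedure), and the shuffling scheme, tolerances, and object names are assumptions.

```r
# Build a decoy by shuffling feature RTs within each m/z bin.
make_decoy <- function(ft, bin_width = 0.03) {
  bins <- split(seq_len(nrow(ft)), round(ft$mz / bin_width))
  for (idx in bins) {
    if (length(idx) > 1) ft$rt[idx] <- sample(ft$rt[idx])
  }
  ft
}

# Count features of 'query' that find a match in 'ref' within the tolerances.
count_matches <- function(query, ref, rt_tol = 0.5, mz_tol = 0.01) {
  sum(sapply(seq_len(nrow(query)), function(i)
    any(abs(ref$mz - query$mz[i]) < mz_tol & abs(ref$rt - query$rt[i]) < rt_tol)))
}

# Rough FDR estimate: matches against the decoy relative to matches against the target.
fdr <- count_matches(query, make_decoy(ref)) / count_matches(query, ref)
```
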
Protocol: Traditional Warping-Based Alignment

For laboratories using established warping methods, the following protocol outlines key steps and considerations.

Workflow Diagram: Warping-Based Alignment

Workflow: raw MS data → peak picking → landmark selection → warping function → transformed RT.

Step-by-Step Methodology:

  • Peak Picking:

    • Process: Process raw MS data to detect chromatographic peaks and assemble them into features characterized by m/z, RT, and intensity.
    • Tools: This is a standard function in packages like XCMS and MZmine 2 [14] [10].
  • Landmark Selection:

    • Identify a set of corresponding features ("landmarks") present across all or most samples. These can be either:
      • Internal standards spiked into each sample.
      • Ubiquitous endogenous features with high intensity and consistent MS2 spectra [10].
  • Warping Function Calculation:

    • Model: Establish a mathematical function that maps the RT of a sample run to a reference run (e.g., a pooled quality control sample or the first run).
    • Algorithms: Use methods like Correlation Optimized Warping (COW) or Dynamic Time Warping (DTW) to determine the optimal warping function based on the landmarks [10].
  • RT Transformation:

    • Apply the calculated warping function to the RT of every feature in the sample, thereby adjusting its position to align with the reference run.

Considerations: This method works well for simple, monotonic drifts but will perform poorly if non-monotonic shifts are present, as the warping function cannot correct for local distortions [3].
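
A minimal R sketch of the landmark-based warping idea: fit a smooth mapping from sample RT to reference RT using shared landmarks, then apply it to all features. Here loess stands in for COW/DTW; unlike those algorithms it does not enforce monotonicity, and the data-frame columns are assumptions.

```r
# 'landmarks': data frame with rt_sample and rt_ref for shared landmark features;
# 'features': feature table for the sample run. Both are assumed objects.
warp_fn <- loess(rt_ref ~ rt_sample, data = landmarks, span = 0.75)

# Map every feature RT in the sample run onto the reference time scale
# (features outside the landmark RT range return NA and need separate handling).
features$rt_aligned <- predict(warp_fn, newdata = data.frame(rt_sample = features$rt))
```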

The Scientist's Toolkit: Essential Reagents and Software

Table 3: Key Research Reagent Solutions for HRMS Biomarker Studies

| Item | Function in Workflow | Application Note |
| --- | --- | --- |
| Stable Isotope-Labeled Internal Standards (SILIS) | Added to each sample to monitor and correct for RT shifts and quantify analyte abundance. | Essential for targeted validation (e.g., using SRM/PRM) and can aid alignment in warping methods [16]. |
| Quality Control (QC) Pool Sample | A pooled sample from all study samples, injected repeatedly throughout the analytical sequence. | Used to condition the system, monitor stability, and serve as an ideal reference for RT alignment [12]. |
| Depletion/Enrichment Kits | Immunoaffinity columns for removing high-abundance proteins (e.g., albumin, IgG) from plasma/serum. | Reduces dynamic range, improves detection of low-abundance potential biomarkers, and reduces matrix effects [13] [16]. |
| Trypsin (Sequencing Grade) | Protease for digesting proteins into peptides in bottom-up proteomics. | Standardizes protein analysis; digestion efficiency and completeness are critical for reproducibility [13]. |
| LC-MS Grade Solvents | High-purity solvents for mobile phase preparation and sample reconstitution. | Minimizes background chemical noise and ion suppression, improving feature detection and quantification [14]. |

Uncorrected retention time shifts are a major bottleneck in LC-MS-based omics studies, directly leading to inefficient feature matching and reduced sensitivity in biomarker discovery. The adoption of robust alignment protocols, particularly modern tools like DeepRTAlign that handle complex RT variations, is no longer optional but a necessity for generating high-quality, reproducible data in large cohort studies. By implementing the detailed protocols and utilizing the essential tools outlined in this document, researchers can significantly improve the fidelity of their data, thereby increasing the likelihood of discovering and validating clinically relevant biomarkers for early disease diagnosis and drug development.

Warping Functions, Direct Matching, and Data Dimensionality

Core Terminology and Quantitative Comparison in HRMS Alignment

In liquid chromatography-mass spectrometry (LC-MS)-based proteomic and metabolomic experiments, retention time (RT) alignment is a critical preprocessing step to ensure that the same biological entities from different samples are correctly matched for subsequent quantitative and statistical analysis. The two primary computational strategies for addressing RT shifts are warping functions and direct matching, each with distinct approaches to handling data dimensionality [3] [17].

Warping function methods correct RT shifts by applying a linear or non-linear function that warps the time axis of a sample run to match a reference run. A key characteristic of these methods is that they are monotonic, meaning they preserve the order of peaks and cannot correct for peak swaps [3] [17]. These algorithms typically use the complete chromatographic profile or total ion current (TIC), operating on a one-dimensional data vector (intensity over retention time) for alignment [17].

Direct matching methods, in contrast, attempt to perform correspondence between runs without a warping function. Instead, they rely on the similarity between specific signals, often using features detected in the data (such as m/z and RT) to find corresponding analytes directly [3]. This approach can, in theory, handle non-monotonic shifts, but its performance has been historically limited by the uncertainty inherent in MS signals [3].

The following table summarizes the core characteristics of these approaches and a third, hybrid method.

Table 1: Core Methodologies for Retention Time Alignment in HRMS

| Method Category | Core Principle | Data Dimensionality | Handles Non-Monotonic Shifts? | Representative Tools |
| --- | --- | --- | --- | --- |
| Warping Functions | Applies a mathematical function to warp the RT axis of a sample to a reference. | Primarily 1D (e.g., TIC) [17] | No [3] | COW, PTW, DDTW [18] [17] |
| Direct Matching | Matches features between runs based on similarity of their signals (m/z, RT). | Higher-dimensional (e.g., feature lists with m/z, RT, intensity) [3] | Yes, in theory [3] | RTAlign, MassUntangler, Peakmatch [3] |
| Hybrid (Deep Learning) | Combines a coarse warping function with a deep learning model for direct matching. | Multi-dimensional feature vectors [3] | Yes [3] | DeepRTAlign [3] |

A significant limitation of traditional warping methods is their inability to handle cases of peak swapping, where the elution order of compounds changes between runs due to complex chemical interactions. This phenomenon, once thought rare in LC-MS, is increasingly observed in complex proteomics and metabolomics samples [17]. Furthermore, the alignment of multi-trace data like full LC-MS datasets presents unique challenges. While the alignment is typically performed only along the retention time axis, the high-dimensional nature of the data (m/z and intensity at each time point) offers both challenges and opportunities for developing more robust alignment algorithms [17].

Experimental Protocols for HRMS Alignment

Protocol: DeepRTAlign for Large Cohort Studies

DeepRTAlign is a deep learning-based tool designed to handle both monotonic and non-monotonic RT shifts in large cohort LC-MS data analysis [3].

Workflow Overview:

  • Precursor Detection and Feature Extraction: Use a tool like XICFinder (similar to Dinosaur) to detect isotope patterns and merge them into features from raw MS files. A mass tolerance of 10 ppm is typically used [3].
  • Coarse Alignment:
    • Linearly scale the RT of all samples to a common range (e.g., 80 minutes).
    • For each sample, select the feature with the highest intensity for each m/z to create a new list.
    • Divide all non-anchor samples into RT windows (e.g., 1 min). For each window, compute the average RT shift of features relative to the anchor sample.
    • Apply the average RT shift to all features within each window [3].
  • Binning and Filtering: Group features based on their m/z values using a defined bin_width (default 0.03) and bin_precision (default 2). Optionally, filter to retain only the highest intensity feature within each m/z bin and RT range per sample [3].
  • Input Vector Construction for DNN: For a candidate feature pair from two samples, create an input vector using the RT and m/z of the target feature and its two adjacent neighbors (before and after). The vector includes original values and difference values, normalized by base vectors ([5, 0.03] for differences and [80, 1500] for original values) to form a 5x8 matrix [3].
  • Deep Neural Network (DNN) Classification: A classifier with three hidden layers (5000 neurons each) distinguishes between true alignments (positive pairs) and non-alignments (negative pairs). The model is trained using a large dataset (e.g., 400,000 pairs) derived from identification results [3].
  • Quality Control: A decoy-based method is used to estimate the false discovery rate (FDR) of the final alignment results [3].
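
Step 3 (binning and filtering) can be illustrated with a few lines of R. The rounding scheme below is a sketch of the concept using the quoted default parameters, not DeepRTAlign's exact implementation, and the column names are assumptions.

```r
bin_width     <- 0.03
bin_precision <- 2

# Assign each feature to an m/z bin of width 0.03, reported to 2 decimal places.
features$mz_bin <- round(round(features$mz / bin_width) * bin_width, bin_precision)

# Optional filter: keep only the most intense feature per sample within each m/z bin.
keep <- ave(features$intensity, features$sample, features$mz_bin,
            FUN = function(x) x == max(x)) == 1
filtered <- features[keep, ]
```
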
Protocol: Self-Calibrated Warping (SCW) for Spectral Data

SCW uses high-abundance "calibration peaks" to estimate a warping function for aligning mass spectra, such as from SELDI-TOF-MS [18].

Workflow Overview:

  • Reference Selection: Select a reference spectrum, typically the one with the highest average correlation coefficient with all other spectra [18].
  • Preprocessing (Optional): Apply smoothing (e.g., nine-point Savitzky-Golay filter twice) and baseline correction to reduce noise and baseline variance [18].
  • Calibration Peak Identification: Identify peaks corresponding to high-abundance proteins that are present across all samples. These peaks have a high signal-to-noise ratio, making alignment reliable [18].
  • Warping Function Estimation:
    • Align the calibration peaks from a test spectrum to those in the reference spectrum.
    • Record the shifts at the apices of these peaks as "calibration points."
    • Fit a low-order polynomial (e.g., 3rd or 4th order) or a piecewise polynomial function to these calibration points using weighted least squares fitting. This defines the warping function, w(x) [18].
  • Application of Warping: Apply the calculated warping function to the entire test spectrum, shifting all data points according to w(x) [18].
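
The warping-function estimation (step 4) can be sketched in R as a weighted least-squares polynomial fit of the observed shifts at the calibration points, followed by applying the fitted shift to the whole spectrum. Column names and the use of peak intensity as the fitting weight are assumptions.

```r
# 'calib': data frame of calibration points with columns
#   pos       - position of the calibration peak apex on the test spectrum
#   shift     - observed shift relative to the reference spectrum
#   intensity - peak intensity, used here as the fitting weight
fit <- lm(shift ~ poly(pos, 3), data = calib, weights = intensity)

# Apply the fitted warping function w(x) to every point of the test spectrum.
spectrum$pos_aligned <- spectrum$pos - predict(fit, newdata = data.frame(pos = spectrum$pos))
```
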
Protocol: Machine Learning-Enhanced Identification Probability

This protocol uses machine learning to enhance chemical identification confidence in non-targeted analysis (NTA) by integrating predicted retention time indices (RTIs) with MS/MS spectral matching [19].

Workflow Overview:

  • Model 1 - Molecular Fingerprint (MF) to RTI: Train a Random Forest (RF) regression model to predict a harmonized RTI value from 790 molecular fingerprints of known calibrants (e.g., 4,713 compounds). This model learns the quantitative structure-retention relationship (QSRR) [19].
  • Model 2 - Cumulative Neutral Loss (CNL) to RTI: Train a second RF regression model to predict the RTI from experimental MS/MS spectra. The model uses 15,961 cumulative neutral loss masses and the monoisotopic mass as features, trained on a large dataset of reference spectra (e.g., 485,577 spectra) [19].
  • Spectral Library Matching: Use an algorithm like the Universal Library Search Algorithm (ULSA) to match query MS/MS spectra against reference spectral libraries, generating a matching score [19].
  • Model 3 - Binary Classification for P(TP): Train a k-nearest neighbors (KNN) classifier to compute the probability of a true positive (P(TP)) spectral match. Input features include the RTI error (between RTI from Model 1 and Model 2), monoisotopic mass, and spectral matching parameters from ULSA. The model is trained on confirmed true positive and semi-synthetic true negative matches [19].
  • Identification Probability (IP) Calculation: The average P(TP) for a matched compound is used as its Identification Probability, significantly enhancing annotation confidence compared to spectral matching alone [19].

Workflow diagram: three parallel preprocessing workflows. The DeepRTAlign protocol [3] (feature extraction → coarse alignment → binning and filtering → input vector construction → DNN classification → FDR-based quality control); the self-calibrated warping (SCW) protocol [18] (reference selection → preprocessing → calibration peak identification → warping function fit → application of warping); and the ML-enhanced identification probability protocol [19] (MF→RTI and CNL→RTI random forest models plus ULSA spectral matching feeding a KNN-based P(TP) classifier to yield the identification probability).

Figure 1: Data preprocessing workflows for retention time alignment and identification.

The Scientist's Toolkit: Research Reagent Solutions

Table 2: Essential Software and Computational Tools for HRMS Data Alignment

| Tool Name | Type/Function | Key Application in Alignment |
| --- | --- | --- |
| DeepRTAlign [3] | Deep learning alignment tool | Corrects both monotonic and non-monotonic RT shifts in large cohort proteomics/metabolomics studies via a hybrid coarse-alignment and DNN model. |
| XCMS [20] | LC-MS data processing platform | Widely used metabolomics software providing feature detection and retention time correction based on warping functions. |
| MZmine 2 [3] | Modular MS data processing | Offers various preprocessing modules, including chromatographic alignment, for metabolomics and imaging MS data. |
| OpenMS [3] | C++ MS library and tools | Provides a suite of tools and algorithms for LC-MS data processing, including retention time alignment and feature finding. |
| Warp2D [21] | Web-based alignment service | High-throughput processing service that uses overlapping peak volume for retention time alignment of complex proteomics and metabolomics data. |
| MATLAB Bioinformatics Toolbox (MSAlign) [18] | Commercial computing environment | Contains built-in functions like MSAlign for aligning mass spectra, often based on peak matching. |
| R/Python [19] [17] | Programming languages | Essential environments for implementing custom alignment scripts, machine learning models (e.g., Random Forest, KNN), and data visualization. |
| ULSA (Universal Library Search Algorithm) [19] | Spectral matching algorithm | Used for annotating compounds by matching MS/MS spectra against various reference spectral databases. |

From Theory to Practice: A Guide to RT Alignment Algorithms and Software Tools

In liquid chromatography-high-resolution mass spectrometry (LC-HRMS) based proteomic and metabolomic experiments, retention time (RT) alignment is a critical preprocessing step for correlating identical components across different samples [10]. Variations in RT occur due to matrix effects, instrument performance, and changes in chromatographic conditions, making alignment essential for accurate comparative analysis [3]. Traditional warping methods, implemented in widely used open-source software like XCMS and MZmine, correct these RT shifts using mathematical models to align peaks across multiple runs [3] [22]. Within the broader context of HRMS data preprocessing research, these algorithms form the foundational approach for handling monotonic RT shifts, upon which newer, more complex methods are built.

Core Algorithms and Quantitative Comparison

The alignment algorithms in XCMS and MZmine operate on the principle of constructing a warping function that maps the retention times from one run to another. This function corrects for the observed shifts, ensuring that features from the same analyte are correctly grouped. The following table summarizes the core characteristics and algorithms of these two platforms.

Table 1: Core Algorithm Comparison between XCMS and MZmine

| Feature | XCMS | MZmine 2 |
| --- | --- | --- |
| Primary alignment method | Obiwarp (non-linear alignment) [23] | Random Sample Consensus (RANSAC) [22] |
| Algorithm type | Warping function-based [3] | Warping function-based [3] |
| Key strength | High flexibility with numerous supported algorithms and parameters [23] | Robustness against outlier peaks due to the RANSAC algorithm [22] |
| Typical input | Peak-picked feature lists from centroid or profile data [24] | Peak lists generated by its modular detection algorithms [25] |
| Handling of RT shifts | Corrects monotonic shifts [3] | Corrects monotonic shifts [3] |

The performance of these traditional warping methods has been extensively evaluated. In a comparative study of untargeted data processing workflows, XCMS and MZmine demonstrated similar capabilities in detecting true features. Notably, some research recommends combining the outputs of MZmine 2 and XCMS to select the most reliable discriminating markers [26].

Experimental Protocols for Retention Time Alignment

Protocol for RT Alignment using XCMS

The following protocol outlines a standard workflow for peak picking and alignment in XCMS within the Galaxy environment [23].

Step 1: Data Preparation and Import

  • Obtain raw LC-MS data files in an open format (e.g., mzML, mzXML).
  • Import files into a data analysis platform. In Galaxy, create a dataset collection containing all sample files to process them efficiently in parallel.
  • Use the MSnbase readMSData tool to read the raw files and generate RData objects suitable for XCMS processing [23].

Step 2: Peak Picking

  • Execute the xcms findChromPeaks function. Select an appropriate algorithm based on data characteristics:
    • Massifquant: A Kalman filter (KF)-based chromatographic peak detection method for centroid mode data. Key parameters include peakwidth (e.g., c(20, 50)), snthresh (signal-to-noise threshold, e.g., 10), and prefilter (e.g., c(3, 100)) [24].
    • CentWave: Ideal for high-resolution data with a high sampling rate; effective for detecting peaks with a broader width [24].
  • The output is a peak list for each sample, containing columns for mz, mzmin, mzmax, rtmin, rtmax, rt (retention time), into (integrated intensity), and maxo (maximum intensity) [24].

Step 3: Retention Time Alignment with Obiwarp

  • Apply the xcms adjustRtime function with the Obiwarp method to perform nonlinear alignment.
  • This method calculates a warping function for each sample based on a chosen reference, correcting for monotonic RT drifts across the run set [23].

Step 4: Correspondence and Grouping

  • Use the xcms group function to match peaks across samples by grouping features with similar m/z and aligned retention times.
  • Finally, fill in any missing peaks using a gap-filling algorithm to generate a complete feature table for statistical analysis [23].
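
The same workflow can be scripted with the current xcms interface outside Galaxy. The sketch below is a hedged example: peak-detection values follow the CentWave example above, the grouping bandwidth and bin size reuse the values from Protocol 1 as placeholders, and the file list, sample groups, and ppm tolerance are assumptions.

```r
library(xcms)

# Read the converted files (e.g., mzML) in on-disk mode.
raw <- MSnbase::readMSData(files, mode = "onDisk")

# Chromatographic peak detection (centWave shown; Massifquant is also available).
xdata <- findChromPeaks(raw, param = CentWaveParam(ppm = 10, peakwidth = c(20, 50),
                                                   snthresh = 10, prefilter = c(3, 100)))

# Obiwarp-based retention time alignment.
xdata <- adjustRtime(xdata, param = ObiwarpParam(binSize = 1))

# Correspondence (grouping) across samples, then gap filling.
xdata <- groupChromPeaks(xdata, param = PeakDensityParam(sampleGroups = groups,
                                                         bw = 5, binSize = 0.015))
xdata <- fillChromPeaks(xdata)
```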

Protocol for RT Alignment using MZmine

Step 1: Peak Detection and Peak List Building

  • Process raw data files through MZmine's peak detection modules. The software supports multiple algorithms, including:
    • Local Maxima: A simple method for well-defined spectra.
    • Wavelet Transform: Suitable for noisy data, based on continuous wavelet transform matched to a "Mexican hat" model [22].
  • The result is a peak list for each sample.

Step 2: Configuring the RANSAC Aligner

  • Run the Join Aligner module, which utilizes the RANSAC algorithm.
  • The RANSACParameters class handles the user-configurable settings. Critical parameters include:
    • mzTolerance: The maximum allowed m/z difference for two peaks to be considered a match.
    • RTTolerance: The maximum allowed retention time difference before alignment.
    • Iterations: The number of RANSAC iterations to perform.
  • The RANSACPeakAlignmentTask class contains the logic for executing the alignment [22].

Step 3: Executing the RANSAC Algorithm

  • The algorithm works by iteratively selecting random subsets of potential peak matches to form a candidate warping model.
  • It evaluates how many other peaks in the dataset are consistent with this model (the "consensus set").
  • The model with the largest consensus set is chosen as the optimal warping function, making the alignment robust to outlier peaks that do not fit the general RT shift pattern [22].
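
A conceptual R sketch of the RANSAC idea for a linear RT mapping between two peak lists is shown below; MZmine's Java implementation differs in detail, and the candidate-pair table, tolerances, and iteration count are assumptions.

```r
# 'pairs': hypothetical data frame of candidate peak matches (within m/z tolerance)
# with columns rt_a and rt_b (retention times in the two peak lists).
ransac_rt <- function(pairs, iterations = 1000, rt_tol = 0.2) {
  best_fit <- NULL
  best_inliers <- 0
  for (i in seq_len(iterations)) {
    s <- pairs[sample(nrow(pairs), 2), ]            # minimal subset: two candidate matches
    slope <- diff(s$rt_b) / diff(s$rt_a)
    if (!is.finite(slope)) next
    intercept <- s$rt_b[1] - slope * s$rt_a[1]
    inliers <- abs(pairs$rt_b - (slope * pairs$rt_a + intercept)) < rt_tol
    if (sum(inliers) > best_inliers) {              # keep the largest consensus set
      best_inliers <- sum(inliers)
      best_fit <- lm(rt_b ~ rt_a, data = pairs[inliers, ])
    }
  }
  best_fit                                          # warping model refit on the consensus set
}
```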

Step 4: Review and Export

  • Inspect the aligned peak list within MZmine's interactive table and visualization windows.
  • Export the final, aligned feature table for downstream statistical analysis.

The logical flow of the RANSAC alignment process within MZmine's modular framework is illustrated below.

Workflow diagram: MZmine RANSAC alignment. Raw data files → peak detection (e.g., wavelet transform) → individual peak lists → RANSAC configuration (m/z and RT tolerances) → iterative model selection and consensus-set evaluation → optimal warping model (largest consensus set) → aligned peak list table.

The Scientist's Toolkit: Essential Research Reagents and Software

Successful implementation of RT alignment protocols relies on a suite of software tools and computational resources. The following table details key components of the research toolkit.

Table 2: Essential Research Reagents and Software Solutions

| Tool/Resource | Function in RT Alignment Research | Source/Availability |
| --- | --- | --- |
| XCMS R Package | Open-source software for peak picking, alignment, and statistical analysis of LC/MS data [23]. | Available via Bioconductor [23]. |
| MZmine 2 | Modular, open-source framework for processing, visualizing, and analyzing MS-based molecular profile data [22]. | Available from http://mzmine.sourceforge.net/ [22]. |
| Galaxy / W4M | Web-based platform providing a user-friendly interface for XCMS workflows, enabling tool use without advanced programming [23]. | Public instance at https://workflow4metabolomics.us/ [23]. |
| metabCombiner | An R package for matching features in disparately acquired LC-MS data sets, overcoming significant RT alterations [27]. | R package at https://github.com/hhabra/metabCombiner [27]. |
| DeepRTAlign | A deep learning-based tool demonstrating improved performance over traditional warping for complex monotonic/non-monotonic shifts [3]. | Method described in Nature Communications [3]. |
| PARSEC | A post-acquisition strategy for improving metabolomics data comparability across separate studies or batches [28]. | Method described in Analytica Chimica Acta [28]. |

Traditional warping methods, as implemented in XCMS and MZmine, provide robust and well-established solutions for the crucial data preprocessing step of RT alignment. While their core warping function approach is highly effective for correcting monotonic RT shifts, a key limitation is their inability to handle non-monotonic shifts [3]. The emergence of new computational strategies, including deep learning-based tools like DeepRTAlign and advanced post-acquisition correction workflows like PARSEC, points toward the future of alignment research [3] [28]. These next-generation methods aim to overcome the limitations of traditional algorithms, particularly for integrating and performing meta-analyses on large-scale cohort data acquired under disparate conditions, thereby enhancing the reproducibility and scalability of HRMS-based studies [27] [28].

Advanced Direct Matching and Multi-Dataset Alignment with metabCombiner

Liquid Chromatography–High-Resolution Mass Spectrometry (LC-HRMS) has become an indispensable analytical technique in untargeted metabolomics, enabling the simultaneous detection of thousands of small molecules in biological samples [29]. A fundamental challenge in processing this complex data involves feature alignment, a computational process where LC-MS features derived from common ions across multiple samples or datasets are assembled into a unified data matrix suitable for statistical analysis [29] [30]. This alignment process is crucial for comparative analysis but is significantly complicated by analytical variability introduced when data is acquired across different laboratories, generated using non-identical instruments, or collected in multiple batches of large-scale studies [29]. Such variability manifests as retention time (RT) shifts that can be substantial (up to several minutes) and cannot be adequately corrected using conventional alignment approaches [29].

Several computational strategies have been developed to address the LC-MS alignment problem. Traditional methods can be broadly categorized into warping function approaches (e.g., XCMS, MZmine 2, OpenMS), which correct RT shifts using linear or non-linear warping functions but struggle with non-monotonic shifts, and direct matching methods (e.g., RTAlign, MassUntangler, Peakmatch), which perform correspondence based on signal similarity without a warping function but often exhibit inferior performance due to MS signal uncertainty [3]. More recently, deep learning approaches such as DeepRTAlign have emerged, combining pseudo warping functions with deep neural networks to handle both monotonic and non-monotonic RT shifts [3]. Additionally, optimal transport methods like GromovMatcher leverage correlation structures between feature intensities and advanced mathematical frameworks to align datasets [31]. Within this evolving landscape, metabCombiner occupies a unique position as a robust solution specifically designed for aligning disparately acquired LC-MS metabolomics datasets through a direct matching framework with retention time mapping capabilities [29].

metabCombiner Technical Framework

Core Algorithmic Architecture

metabCombiner employs a stepwise alignment workflow that enables the integration of multiple untargeted LC-MS metabolomics datasets through a cyclical process consisting of six distinct phases [29]. The software introduces a multi-dataset representation class called the "metabCombiner object," which serves as the main framework for executing the package workflow steps [29]. This object maintains two closely linked report tables: a combined table containing possible feature pair alignments (FPAs) with their associated per-sample abundances and alignment scores, and a feature data table that organizes aligned features and their descriptors by constituent dataset of origin [29].

A key innovation in metabCombiner 2.0 is its use of a template-based matching strategy, where one input object is designated as the projection ("X") feature list and the other serves as the reference ("Y") [29]. In this framework, a "primary" feature list acts as a template for matching compounds in "target" feature lists, facilitating inter-laboratory reproducibility studies [29]. The algorithm constructs a combined table showing possible FPAs arranged into m/z-based groups, constrained by a 'binGap' parameter [29]. For each feature pair, the table includes a 'score' column representing calculated similarity, rankX and rankY ordering alignment scores by individual features, and "rtProj" showing the mapping of retention times from the projection set to the reference [29].

Workflow and Process Diagram

The metabCombiner alignment process follows a structured, cyclical workflow consisting of six method steps that transform raw feature tables into aligned datasets [29]. The following diagram illustrates this comprehensive process:

[Workflow diagram: metabCombiner alignment. Input Feature Tables (pre-processed by XCMS, MZmine, or MS-DIAL) → 1. metabCombiner() construction (m/z-based feature grouping) → 2. selectAnchors() (high-abundance feature selection) → 3. fit_gam() (GAM retention time mapping) → 4. calcScores() (similarity score calculation, 0-1) → 5. reduceTable() (one-to-one correspondence assignment) → 6. updateTables() (non-aligned feature restoration) → Aligned Feature Matrix (combined dataset)]

Quantitative Similarity Scoring System

The similarity scoring system in metabCombiner represents a sophisticated computational approach that evaluates potential feature matches across multiple dimensions. The calcScores() function computes a similarity score between 0 and 1 for all grouped feature pairs using an exponential penalty function that accounts for differences in m/z, retention time (comparing model-projected RTy versus observed RTy), and quantile abundance (Q) [29]. This multi-parameter approach ensures that the highest scores are assigned to feature pairs with minimal differences across all three critical dimensions.

Following score calculation, pairwise score ranks (rankX and rankY) are computed for each unique feature with respect to their complements [29]. The most plausible matches are ranked first (rankX = 1 and rankY = 1) and typically score close to 1, providing a straightforward mechanism for identifying high-confidence alignments [29]. The algorithm also incorporates a conflict resolution system that identifies and resolves competing alignment hypotheses, particularly for closely eluting isomers, by selecting the combination of feature pair alignments within each subgroup with the highest sum of scores [29].
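
As a rough illustration of such an exponential-penalty score, the sketch below uses arbitrary weights and a simplified form; it is not metabCombiner's exact scoring function, whose weights and normalization are user-configurable:

    import math

    # Illustrative exponential-penalty similarity score (0-1). The weights A, B, C
    # are placeholders; metabCombiner's actual weighting differs and is tunable.
    def pair_score(mz_x, mz_y, rt_y_observed, rt_y_projected, q_x, q_y,
                   A=75, B=10, C=0.25):
        penalty = (A * abs(mz_x - mz_y)
                   + B * abs(rt_y_observed - rt_y_projected)
                   + C * abs(q_x - q_y))
        return math.exp(-penalty)   # approaches 1 when all three differences are small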

Comparative Analysis of Alignment Methodologies

Feature Comparison with Alternative Approaches

Table 1: Comparative analysis of LC-MS alignment methodologies

Method Algorithm Type RT Correction Approach Multi-Dataset Capability Strengths Limitations
metabCombiner Direct matching with warping Penalized basis spline (GAM) Yes (stepwise) Handles disparate datasets; maintains non-matched features; requires no identified peptides Limited functionality for >3 tables in initial version
DeepRTAlign [3] Deep learning Coarse alignment + DNN refinement Limited Handles monotonic and non-monotonic shifts; improved identification sensitivity Requires significant training data; computational complexity
GromovMatcher [31] Optimal transport Nonlinear map via weighted spline regression Yes Uses correlation structures; robust to data variations; minimal parameter tuning Limited validation with non-curated datasets
ROIMCR [15] [32] Multivariate curve resolution Not required (direct component analysis) Yes Processes positive/negative data simultaneously; reduces dimensionality Lower treatment sensitivity; different conceptual approach
Traditional Warping (XCMS, MZmine) [3] Warping function Linear/non-linear warping Limited Established methodology; extensive community use Cannot correct non-monotonic shifts; struggles with disparate data
Performance Benchmarking

When evaluated on experimental data, metabCombiner has demonstrated robust performance in challenging alignment scenarios. In an inter-laboratory lipidomics study involving four core laboratories using different in-house LC-MS instrumentation and methods, metabCombiner successfully aligned datasets despite significant analytical variability [29] [30]. The method's template-based approach allowed for the stepwise integration of multiple datasets, facilitating reproducibility assessments across participating institutions [29].

Comparative benchmarking studies have revealed that alignment tools exhibit significantly different characteristics in practical applications. While feature profiling methods like MZmine3 show increased sensitivity to treatment effects, they also demonstrate increased susceptibility to false positives [32]. Conversely, component-based approaches like ROIMCR provide superior consistency and reproducibility but may exhibit lower treatment sensitivity [32]. These findings highlight the importance of selecting alignment methodologies appropriate for specific research objectives and data characteristics.

Experimental Protocols

Multi-Dataset Alignment Procedure

Protocol 1: Stepwise Alignment of Disparate LC-MS Datasets

This protocol describes the procedure for aligning multiple disparately acquired LC-MS metabolomics datasets using metabCombiner 2.0, demonstrated through an inter-laboratory lipidomics study with four participating core laboratories [29].

  • Input Data Preparation

    • Process raw LC-MS data from each laboratory using feature detection software (XCMS, MZmine, or MS-DIAL) to generate feature tables [29]
    • Format all feature tables as metabData objects using the metabData() constructor function
    • Apply filters to exclude features based on proportion missingness and retention time ranges
    • Merge duplicate features representing the same compound into single representative rows [29]
  • metabCombiner Object Construction

    • Construct a metabCombiner object from two single datasets, a single and combined dataset, or two combined dataset objects
    • Designate one input object as the projection ("X") feature list and the other as the reference ("Y")
    • Assign unique identifiers to each dataset for tracking throughout the alignment process [29]
  • Retention Time Mapping and Alignment

    • Execute the selectAnchors() function to choose feature pairs among highly abundant compounds for modeling RT warping
    • Run fit_gam() to compute a penalized basis spline model for RT mapping using selected anchors
    • Constrain the RT mapping to appropriate ranges by removing empty head or tail chromatographic regions [29]
    • Perform calcScores() to compute similarity scores (0-1) for all grouped feature pairs
  • Feature Table Reduction and Annotation

    • Apply reduceTable() to assign one-to-one correspondence between feature pairs using calculated alignment scores and ranks
    • Implement thresholding for alignment scores, pairwise ranks, and RT prediction errors to exclude over 90% of mismatches
    • Resolve conflicting alignment possibilities using the integrated competitive hypothesis testing [29]
  • Multi-Dataset Integration

    • Utilize updateTables() to restore features from original inputs lacking complementary matches
    • Repeat the alignment cycle to incorporate additional single or combined datasets
    • Export the final aligned feature matrix containing matched features and their abundances across all datasets [29]
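
The anchor-based RT mapping performed in step 3 (selectAnchors() and fit_gam()) can be approximated outside of R with a smoothing spline; the sketch below (Python/SciPy, with invented anchor pairs) is only conceptually analogous to the penalized basis spline GAM used by metabCombiner:

    import numpy as np
    from scipy.interpolate import UnivariateSpline

    # Hypothetical anchors: (RT in projection dataset X, RT in reference dataset Y), in minutes.
    anchors = np.array([[1.2, 1.5], [3.4, 3.9], [5.1, 5.8], [7.7, 8.6],
                        [10.2, 11.4], [12.9, 14.1], [15.5, 16.8]])
    x, y = anchors[:, 0], anchors[:, 1]

    # Smooth mapping from projection RTs to reference RTs; the smoothing factor s plays
    # a role loosely analogous to the GAM penalization term.
    rt_map = UnivariateSpline(x, y, k=3, s=0.05)

    rt_proj = rt_map(6.0)   # projected reference RT for a feature eluting at 6.0 min in X
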
Batch Alignment for Large-Scale Studies

Protocol 2: batchCombine for Multi-Batch Experiments

This protocol outlines the application of the metabCombiner framework for aligning experiments composed of multiple batches, serving as an alternative to processing large datasets in single batches [29].

  • Batch Data Organization

    • Organize feature tables by batch, ensuring consistent formatting across all batches
    • Designate a primary batch with highest data quality to serve as the alignment template
  • Sequential Batch Processing

    • Align the primary batch with the first secondary batch using the standard metabCombiner workflow
    • Use the resulting combined dataset as the reference for subsequent batch alignments
    • Iterate through all batches until complete dataset integration is achieved [29]
  • Quality Assessment and Validation

    • Examine the distribution of alignment scores across batches to identify potential issues
    • Verify retention time mapping consistency across all integrated batches
    • Assess the proportion of features successfully matched versus those carried forward as unique to specific batches

Research Toolkit for LC-MS Alignment

Table 2: Essential research reagents and computational tools for LC-MS alignment studies

Category Item/Software Specifications Application Function
Software Packages metabCombiner R package (Bioconductor), R Shiny App Primary alignment tool for disparate datasets
XCMS [29] Open-source R package Feature detection and initial processing
MZmine [29] Java-based platform Alternative feature detection and processing
MS-DIAL [29] Comprehensive platform Data processing and preliminary alignment
Data Objects metabData object Formatted feature table (m/z, RT, abundance) Single dataset representation class
metabCombiner object Multi-dataset representation Main framework for executing alignment workflow
Instrumentation LC-HRMS Systems Various vendors (Thermo, Waters, etc.) Raw data generation with high mass accuracy
Reference Materials Quality Control Samples Matrix-matched with study samples Monitoring instrument performance and alignment quality

Advanced Applications and Implementation

Inter-Laboratory Reproducibility Assessment

The enhanced multi-dataset alignment capability of metabCombiner 2.0 enables systematic reproducibility assessments across laboratories and analytical platforms. In the demonstrated inter-laboratory lipidomics study, the algorithm successfully aligned datasets from four core laboratories generated using each institution's in-house LC-MS instrumentation and methods [29]. This application highlights metabCombiner's utility in addressing the significant challenges to data interoperability that persist despite efforts to standardize protocols in the metabolomics field [29].

For implementation of inter-laboratory studies, researchers should designate a reference dataset with the highest data quality or most comprehensive feature detection to serve as the primary alignment template. Subsequent laboratory datasets can then be sequentially aligned to this reference, with careful documentation of alignment quality metrics for each pairwise combination. This approach facilitates the identification of systematic biases and platform-specific sensitivities that may impact cross-study comparisons and meta-analyses.

Integration with Downstream Bioinformatics Workflows

Aligned feature matrices generated by metabCombiner serve as critical inputs for subsequent metabolomic data analysis steps. The unified data structure enables reliable comparative statistics to identify differentially abundant metabolites across experimental conditions, datasets, or laboratories. Additionally, the aligned features can be integrated with pathway analysis tools to elucidate altered metabolic pathways in biological studies.

For the ELEMENT (Early Life Exposures in Mexico to Environmental Toxicants) cohort study, which involved multi-batch untargeted LC-MS metabolomics analyses of fasting blood serum from Mexican adolescents, the batchCombine application of metabCombiner provided an effective solution for handling the significant chromatographic drift encountered between batches in large-scale studies [29]. This demonstrates the method's utility in epidemiological applications where data collection necessarily spans extended periods and multiple analytical batches.

Liquid chromatography-mass spectrometry (LC-MS) is a cornerstone technique in proteomics and metabolomics, enabling the separation, identification, and quantification of thousands of analytes in complex biological samples. However, a persistent challenge in experiments involving multiple samples is the shift in analyte retention time (RT) across different LC-MS runs. These shifts, caused by factors such as matrix effects and instrumental performance variations, complicate the correspondence process—the critical task of matching the same compound across multiple samples [3] [33]. In large cohort studies, which are essential for robust biomarker discovery and systems biology, accurate alignment becomes a major bottleneck [34].

Traditional computational strategies for RT alignment fall into two main categories. The warping function method (used by tools like XCMS, MZmine 2, and OpenMS) corrects RT shifts using a linear or non-linear warping function. A key limitation of this approach is its inherent inability to handle non-monotonic RT shifts because the warping function itself is monotonic [3] [33]. The direct matching method (exemplified by tools like RTAlign and MassUntangler) attempts correspondence based on signal similarity without a warping function but often underperforms due to the uncertainty of MS signals [3]. Consequently, existing tools struggle with complex RT shifts commonly found in large-scale clinical datasets. DeepRTAlign was developed to overcome these limitations by integrating a robust coarse alignment with a deep learning-based direct matching strategy, proving effective for both monotonic and non-monotonic shifts [3] [34].

DeepRTAlign: Methodology and Workflow

DeepRTAlign employs a hybrid workflow that synergizes a traditional coarse alignment with an advanced deep neural network (DNN). The entire process is divided into a training phase (which produces a reusable model) and an application phase (which uses the model to align new datasets) [3].

Detailed Workflow and Input Vector Construction

The workflow begins with precursor detection and feature extraction. While DeepRTAlign uses an in-house tool called XICFinder for this purpose, it is highly flexible and supports feature lists from other popular tools like Dinosaur, OpenMS, and MaxQuant, requiring only simple text or CSV files containing m/z, charge, RT, and intensity information [3] [35].

Next, a coarse alignment is performed to handle large-scale monotonic shifts. The retention times in all samples are first linearly scaled to a common range (e.g., 80 minutes). An anchor sample (typically the first sample) is selected, and all other samples are divided into fixed RT windows (e.g., 1 minute). For each window, features are compared to the anchor sample within a small m/z tolerance (e.g., 0.01 Da). The average RT shift for matched features within the window is calculated, and this average shift is applied to all features in that window to coarsely align it with the anchor [3].
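
The coarse alignment can be reproduced conceptually with the following sketch (plain Python; the window size, m/z tolerance, and data layout follow the defaults described above, but the code is illustrative and not the DeepRTAlign implementation):

    # Sketch of windowed coarse alignment against an anchor sample.
    # sample, anchor: lists of dicts with 'mz' and 'rt' (RTs already scaled to a common range).
    def coarse_align(sample, anchor, window=1.0, mz_tol=0.01, total_rt=80.0):
        aligned = []
        n_windows = int(total_rt / window) + 1
        for w in range(n_windows):
            lo, hi = w * window, (w + 1) * window
            in_win = [f for f in sample if lo <= f["rt"] < hi]
            shifts = []
            for f in in_win:
                candidates = [a["rt"] - f["rt"] for a in anchor
                              if abs(a["mz"] - f["mz"]) <= mz_tol]
                if candidates:
                    shifts.append(min(candidates, key=abs))   # closest matching anchor feature
            mean_shift = sum(shifts) / len(shifts) if shifts else 0.0
            for f in in_win:
                aligned.append({**f, "rt": f["rt"] + mean_shift})  # apply the window's average shift
        return aligned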

After coarse alignment, features are binned and filtered. Binning groups features based on their m/z values within a user-defined window (bin_width, default 0.03) and precision (bin_precision, default 2 decimal places). This step ensures that only features with similar m/z are considered for alignment, drastically reducing computational complexity. An optional filtering step can retain only the most intense feature within a specified RT range for each sample in each m/z bin [3] [35].

A critical innovation of DeepRTAlign is its input vector construction. Inspired by word embedding methods in natural language processing, the model considers the contextual neighborhood of a feature. For a target feature pair from two samples, the input vector incorporates the RT and m/z of the two target features, plus the two adjacent features (before and after) in each sample based on RT. This creates a comprehensive vector that includes both original values and difference values between the samples, which are then normalized using base vectors ([5, 0.03] for differences and [80, 1500] for original values). The final input to the DNN is a 5x8 vector that richly represents the feature and its local context [3] [34].

Deep Neural Network Architecture and Training

The core of DeepRTAlign is a deep neural network with three hidden layers, each containing 5000 neurons [3]. The network functions as a binary classifier, determining whether a pair of features from two different samples should be aligned (positive class) or not (negative class).

  • Training Data: The model is trained on a large set of 400,000 feature-feature pairs. Half of these are positive pairs, derived from the same peptides based on identification results (from search engines like Mascot), and the other half are negative pairs, collected from different peptides but with a small m/z tolerance (0.03 Da) [3].
  • Hyperparameters: The model uses the BCELoss loss function from PyTorch and the sigmoid activation function. Optimization is performed using the Adam optimizer with an initial learning rate of 0.001, which is reduced by a factor of 10 every 100 epochs. Training runs for 400 epochs with a batch size of 500 [3].
  • Quality Control: A crucial feature of DeepRTAlign is its built-in quality control module, which estimates the false discovery rate (FDR) of the alignment results. For each m/z window, a decoy sample is randomly created. Since features in this decoy should not genuinely align, the rate at which they are incorrectly aligned provides an FDR estimate, allowing users to filter results with a desired confidence level (e.g., FDR < 1%) [3].
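
The classifier described above maps onto a compact PyTorch model. The sketch below mirrors the published architecture and hyperparameters (three hidden layers of 5000 neurons, sigmoid output, BCELoss, Adam at 0.001 with 10-fold decay every 100 epochs); the hidden-layer activation (ReLU) and the flattening of the 5x8 input are assumptions, and this is a simplified stand-in rather than the authors' code:

    import torch
    import torch.nn as nn

    # Binary classifier over the flattened 5x8 input vector (40 values).
    # Hidden-layer activation (ReLU) is an assumption; only the output sigmoid is documented.
    model = nn.Sequential(
        nn.Linear(40, 5000), nn.ReLU(),
        nn.Linear(5000, 5000), nn.ReLU(),
        nn.Linear(5000, 5000), nn.ReLU(),
        nn.Linear(5000, 1), nn.Sigmoid(),   # probability that the feature pair aligns
    )

    criterion = nn.BCELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    # Reduce the learning rate by a factor of 10 every 100 epochs (call scheduler.step() per epoch).
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)

    def train_step(x_batch, y_batch):
        # x_batch: (500, 40) float tensor; y_batch: (500, 1) labels in {0, 1}
        optimizer.zero_grad()
        loss = criterion(model(x_batch), y_batch)
        loss.backward()
        optimizer.step()
        return loss.item()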

The following diagram illustrates the complete DeepRTAlign workflow, from raw data input to the final aligned feature list.

[Workflow diagram: DeepRTAlign. Raw MS Files → Feature Extraction (XICFinder, Dinosaur, etc.) → Coarse Alignment (pseudo warping function) → Binning and Filtering → Input Vector Construction → DNN Classifier (input layer: 5x8 vector; three hidden layers of 5000 neurons each; output layer: align / not align) → Quality Control (FDR calculation) → Aligned Feature List]

DeepRTAlign Workflow: From Raw Data to Aligned Features

Performance Benchmarking and Quantitative Evaluation

DeepRTAlign has been rigorously benchmarked against state-of-the-art tools like MZmine 2 and OpenMS across multiple real-world and simulated proteomic and metabolomic datasets [3] [33]. The performance is typically evaluated using precision (the fraction of correctly aligned features among all aligned features) and recall (the fraction of true corresponding features that are successfully aligned) [33].

Performance on Diverse Datasets

The following table summarizes the documented performance advantages of DeepRTAlign over existing methods on various test datasets.

Table 1: Performance Benchmarking of DeepRTAlign Across Diverse Datasets

Dataset Name Sample Numbers Key Finding Performance Improvement Reference
HCC (Liver Cancer) 101 Tumor + 101 Non-Tumor Improved biomarker discovery classifier AUC of 0.995 for recurrence prediction [34] [33]
Single-Cell DIA Not Specified Increased peptide identifications 298 more peptides aligned per cell vs. DIA-NN [33]
Multiple Test Sets 6 Datasets Average performance increase ~7% higher precision, ~20% higher recall [33]
UPS2-Y / UPS2-M 12 per set Handles complex samples better Outperformed MZmine 2 & OpenMS [3]

Comparison with Machine Learning Models

Beyond traditional tools, DeepRTAlign's DNN has been compared against other machine learning classifiers, including Random Forests (RF), k-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Logistic Regression (LR). After parameter optimization, the DNN consistently demonstrated superior performance, confirming that the depth and architecture of the neural network are well-suited for this complex matching task [33].

Application Notes and Experimental Protocols

This section provides a detailed, step-by-step protocol for applying DeepRTAlign to a typical large-cohort LC-MS dataset, enabling researchers to replicate and implement this method successfully.

Protocol: Aligning a Large-Cohort LC-MS Dataset using DeepRTAlign

Objective: To accurately align LC-MS features across multiple samples in a large cohort study using DeepRTAlign, enabling downstream comparative analysis.

I. Prerequisite Software and Data Preparation

  • Install DeepRTAlign: Install the tool using pip with the command pip install deeprtalign [35]. The software is compatible with Windows 10, Ubuntu 18.04, and macOS 12.1.
  • Input Data Preparation:
    • Feature Lists: Generate feature lists for each sample in your cohort using a supported feature extraction tool (e.g., Dinosaur, OpenMS, MaxQuant, XICFinder). Alternatively, prepare a custom text (TXT) or comma-separated value (CSV) file containing the following columns for each feature: m/z, charge, retention time (RT), and intensity [35].
    • Sample List: Create an Excel file (.xlsx) that maps each feature file name to a unique sample name. This file is crucial for the tool to correctly identify and label samples.

II. Configuration and Command Line Execution

  • Set Up Working Directory: Create a new project folder and place the file_dir (containing all feature files) and the sample_file.xlsx inside it. Navigate your command line to this folder. Note: Run different projects in separate folders to avoid overwriting results [35].
  • Execute Alignment Command: Run DeepRTAlign from the command line, specifying the feature extraction method, the directory of feature files, the sample list, and (optionally) the number of parallel processes. The key command-line flags are:

    • -m: Specifies the feature extraction method.
    • -f: Path to the directory containing feature files.
    • -s: Path to the sample list Excel file.
    • -pn: (Recommended) Sets the number of parallel processes. Set this according to your CPU core count for significantly faster execution [35].

III. Advanced Parameter Tuning (Optional)

For datasets with unique characteristics, the following parameters can be adjusted for optimal results. The default values are suitable for most scenarios [35].

Table 2: Key Configurable Parameters in DeepRTAlign

Parameter Command Flag Default Value Description
Processing Number -pn -1 (use all CPUs) Number of parallel processes. Adjust for speed.
Time Window -tw 1 (minute) RT window size for coarse alignment.
Bin Width -bw 0.03 m/z window size for binning features.
FDR Cutoff -fd 0.01 False discovery rate threshold for QC.
Max m/z Threshold -mm 20 (ppm) m/z tolerance for candidate feature pairing.
Max RT Threshold -mt 5 (minutes) RT tolerance for candidate feature pairing.

IV. Output Interpretation and Quality Control

  • Results Location: After successful execution, results are saved in the mass_align_all_information folder within your working directory.
  • Output File: The primary output file is information_target.csv. This file contains the final aligned feature list after quality control. Key columns include:
    • sample: The sample name.
    • group: The aligned feature group identifier. Features with the same group ID are considered the same analyte across samples.
    • mz, time, charge, intensity: The aligned feature's properties.
    • score: The DNN's confidence score for the alignment [35].
  • Handling Low-Output Scenarios: If the alignment produces very few results, which may happen with highly dissimilar samples, it is recommended to set the FDR (-fd) to 1 and manually filter the information_target.csv file to retain features with a score greater than 0.5 [35].
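
Downstream handling of the output can be scripted; the sketch below (Python/pandas, with column names taken from the description above and an assumed relative path) applies the manual score filter and pivots aligned groups into a feature-by-sample intensity matrix:

    import pandas as pd

    # Load the DeepRTAlign result table (path assumed relative to the working directory).
    df = pd.read_csv("mass_align_all_information/information_target.csv")

    # Optional manual filter, as recommended when the run was executed with -fd 1.
    df = df[df["score"] > 0.5]

    # One row per aligned feature group, one intensity column per sample.
    matrix = df.pivot_table(index="group", columns="sample",
                            values="intensity", aggfunc="max")
    matrix.to_csv("aligned_feature_matrix.csv")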

The Scientist's Toolkit: Essential Research Reagents and Solutions

The following table lists the key software tools and resources essential for implementing the DeepRTAlign protocol.

Table 3: Key Research Reagent Solutions for DeepRTAlign Implementation

Item Name Function / Role in the Workflow Example / Note
DeepRTAlign Python Package The core alignment tool performing coarse alignment and deep learning-based matching. Install via pip install deeprtalign [35].
Feature Extraction Software Generates the input feature lists from raw MS data. Dinosaur, OpenMS, MaxQuant, XICFinder, or custom TXT/CSV [3] [35].
Python Environment (v3.x) The runtime environment required to execute DeepRTAlign. Version 1.2.2 tested with PyTorch v1.8.0 [3] [35].
Sample List File (.xlsx) Maps feature files to sample names, ensuring correct sample tracking. A critical metadata input [35].
High-Resolution LC-MS Data The raw data source from which features are extracted. Data from Thermo or other high-resolution mass spectrometers [3].

Downstream Biological Applications

The primary value of accurate RT alignment is its ability to empower downstream biological analyses. A compelling application of DeepRTAlign was demonstrated in a study on hepatocellular carcinoma (HCC). Using the features aligned by DeepRTAlign from a large cohort of patients, the researchers trained a robust classifier to predict the early recurrence of HCC. This classifier, built on only 15 aligned features, was validated on an independent cohort using targeted proteomics, achieving an area under the curve (AUC) of 0.833, showcasing strong predictive power for a critical clinical outcome [34]. This success underscores how DeepRTAlign can directly contribute to advancing clinical proteomics and biomarker discovery.

DeepRTAlign represents a significant advancement in RT alignment by successfully leveraging deep learning to solve the long-standing problem of non-monotonic RT shifts in large-cohort LC-MS data. Its hybrid approach, combining a robust coarse alignment with a context-aware DNN, has proven more accurate and sensitive than current state-of-the-art tools across diverse datasets [3] [33]. Furthermore, its flexibility in accepting input from multiple feature extraction tools makes it a versatile solution for the proteomics and metabolomics community [35].

The developers have outlined clear future directions for DeepRTAlign. Planned improvements include reducing its current dependence on the specific training dataset (HCC-T), enhancing user-friendliness by developing a graphical interface, and boosting processing speed, potentially through a C++ implementation [34]. By continuing to address these limitations, DeepRTAlign is poised to become an even more accessible and powerful tool, solidifying its role in overcoming one of the major bottlenecks in large-scale omics research.

In liquid chromatography-high-resolution mass spectrometry (LC-HRMS) based untargeted analysis, the challenge of processing highly complex and voluminous datasets is a significant bottleneck. Traditional data analysis strategies often involve multiple steps, including chromatographic alignment and peak shaping, which can introduce errors and require extensive parameter optimization [36]. The ROIMCR (Regions of Interest Multivariate Curve Resolution) strategy emerges as a powerful component-based alternative that efficiently filters, compresses, and resolves LC-MS datasets without the need for prior retention time alignment or peak modeling [36] [37].

This methodology is particularly relevant in the context of HRMS data preprocessing retention time correction research, as it fundamentally bypasses the alignment problem. Instead of correcting for retention time shifts between samples, ROIMCR operates by resolving the data into their pure constituent components, effectively side-stepping the need for complex alignment procedures that can be problematic in large cohort studies [3] [15]. The method combines the benefits of data compression through region of interest (ROI) searching, which preserves spectral accuracy, with the powerful resolution capabilities of Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS) [36] [37].

Table 1: Core Advantages of ROIMCR Over Traditional Feature-Based Approaches

Aspect Traditional Feature-Based Approaches (e.g., XCMS, MZmine) ROIMCR Component-Based Approach
Retention Time Alignment Requires explicit alignment (warping or direct matching) [3] No alignment needed; resolves components across samples directly [36] [15]
Peak Modeling Often requires chromatographic peak modeling/shaping (e.g., Gaussian fitting) [36] No peak shaping required; handles real peak profiles [36]
Data Compression Often uses binning, which can reduce spectral accuracy [36] ROI compression preserves original spectral accuracy [36] [37]
Data Structure Output Produces a "feature profile" (FP) table (m/z, RT, intensity) [38] Produces "component profiles" (CP) with resolved spectra and elution profiles [38]
Handling of Co-elution Can be challenging, may lead to missed or split features Excellently resolves co-eluting compounds via multivariate resolution [36]

Theoretical Foundation and Workflow

The ROIMCR methodology is built upon a two-stage process that transforms raw LC-MS data into interpretable component information.

Regions of Interest (ROI) Compression

The first stage addresses the challenge of data volume and complexity. Raw LC-MS datasets are massive, making direct processing computationally intensive. Unlike traditional binning approaches, which divide the m/z axis into fixed-size bins and risk peak splitting or loss of spectral accuracy, ROI compression identifies contiguous regions in the m/z domain where analyte signals are concentrated [36]. These ROIs are defined based on specific criteria such as a signal intensity threshold, an admissible mass error, and a minimum number of consecutive scans where the signal appears [36] [38]. The result is a significantly compressed data matrix that retains the original spectral resolution of the high-resolution MS instrument.
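
The ROI criteria listed above translate into a simple scan-by-scan filter. The following sketch (plain Python, with assumed threshold values; the MSroi implementation is more elaborate) keeps only m/z traces that persist for a minimum number of consecutive scans within a mass tolerance and exceed an intensity threshold:

    # Simplified ROI search over centroided scans.
    # scans: list of consecutive scans, each a list of (mz, intensity) centroids.
    def find_rois(scans, mz_tol=0.005, intensity_threshold=1e4, min_scans=30):
        def keep(roi):
            return (len(roi["points"]) >= min_scans
                    and max(p[2] for p in roi["points"]) >= intensity_threshold)

        active, finished = [], []
        for scan_idx, scan in enumerate(scans):
            extended = set()
            for mz, inten in scan:
                for i, roi in enumerate(active):
                    if abs(mz - roi["mz"]) <= mz_tol:
                        roi["points"].append((scan_idx, mz, inten))
                        roi["mz"] = sum(p[1] for p in roi["points"]) / len(roi["points"])
                        extended.add(i)
                        break
                else:
                    active.append({"mz": mz, "points": [(scan_idx, mz, inten)]})
                    extended.add(len(active) - 1)
            # close ROIs that were not extended in this scan
            finished += [roi for i, roi in enumerate(active) if i not in extended and keep(roi)]
            active = [roi for i, roi in enumerate(active) if i in extended]
        finished += [roi for roi in active if keep(roi)]
        return finished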

Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS)

The second stage resolves the compressed data into pure chemical constituents. MCR-ALS is a bilinear model that decomposes the compressed data matrix (D) into the product of a matrix of pure elution profiles (C) and a matrix of pure mass spectra (S^T), according to the equation:

D = C S^T + E

where E is a matrix of residuals not explained by the model [36]. The "Alternating Least Squares" part refers to the iterative algorithm used to solve for C and S^T under suitable constraints (e.g., non-negativity of ion intensities and chromatographic profiles) [36]. This resolution occurs without requiring the chromatographic peaks to be aligned across different samples, a significant advantage when analyzing large sample sets where retention time shifts are inevitable [36] [15]. The final output consists of resolved components, each defined by a pure mass spectrum and its corresponding elution profile across samples.
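
The bilinear decomposition and its alternating optimization can be sketched with NumPy as shown below; this is a bare-bones illustration with non-negativity imposed by clipping, whereas the MCR-ALS 2.0 toolbox provides proper constraint handling, initial estimates, and convergence diagnostics:

    import numpy as np

    def mcr_als(D, n_components, n_iter=100, tol=1e-8):
        """Minimal MCR-ALS sketch: D (scans x m/z ROIs) is approximated by C @ S.T."""
        rng = np.random.default_rng(0)
        C = np.abs(rng.standard_normal((D.shape[0], n_components)))   # initial elution profiles
        prev_res = np.inf
        for _ in range(n_iter):
            # Solve D = C S^T alternately for S and C, clipping negatives (non-negativity).
            S = np.clip(np.linalg.lstsq(C, D, rcond=None)[0].T, 0, None)     # (m/z x k) spectra
            C = np.clip(np.linalg.lstsq(S, D.T, rcond=None)[0].T, 0, None)   # (scans x k) profiles
            E = D - C @ S.T                                                  # residuals
            res = np.linalg.norm(E)
            if abs(prev_res - res) / max(res, 1e-12) < tol:
                break
            prev_res = res
        return C, S, E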

The following diagram illustrates the logical flow of the complete ROIMCR procedure, from raw data to resolved components:

[Workflow diagram: ROIMCR. Stage 1, Data Filtering and Compression: Raw LC-HRMS Data → ROI Compression → Column-wise Augmented Data Matrix. Stage 2, Multivariate Resolution: MCR-ALS Resolution with Constraints → Resolved Components. Output and Application: Statistical Analysis and Biomarker Discovery]

ROIMCR Analytical Workflow

Comparative Performance and Applications

The practical utility of ROIMCR has been demonstrated in various scientific applications, from environmental monitoring to clinical biomarker discovery. A recent 2025 study provided a direct comparison between ROIMCR and the popular feature-based tool MZmine3, highlighting their distinct characteristics [38].

Table 2: Performance Comparison of ROIMCR vs. MZmine3 in a Non-Target Screening Study

Performance Metric MZmine3 (Feature Profile) ROIMCR (Component Profile)
Dominant Variance Comparable contributions from time (20.5-31.8%) and sample type (11.6-22.8%) [38] Temporal variation dominated (35.5-70.6% variance) [38]
Treatment Sensitivity Higher sensitivity to treatment effects [38] Lower treatment sensitivity [38]
False Positives Increased susceptibility to false positives [38] Superior consistency and reproducibility [38]
Temporal Pattern Clarity Less clear temporal trends [38] Excellent clarity for temporal dynamics [38]
Workflow Agreement Agreement between workflows diminishes with more specialized analytical objectives [38] Complementary use with feature-based methods is beneficial [38]

In a clinical application, ROIMCR was successfully used for plasma metabolomic profiling in a study on chronic kidney disease (CKD). The method simultaneously processed data from both positive (MS1+) and negative (MS1-) ionization modes without requiring time alignment, increasing metabolite coverage and identification efficiency. The analysis revealed distinct metabolic profiles for healthy controls, intermediate-stage CKD patients, and end-stage (dialysis) patients, successfully identifying both recognized CKD biomarkers and potential new indicators of disease onset and progression [15]. This demonstrates ROIMCR's capability to handle complex biological datasets and generate biologically meaningful results.

Detailed Experimental Protocol for ROIMCR Analysis

This protocol provides a step-by-step guide for implementing the ROIMCR strategy on LC-HRMS datasets using the MATLAB environment, based on the methodology described in the literature [36] [38] [3].

Software and Material Requirements

Table 3: Essential Research Reagent Solutions and Software for ROIMCR

Item Name Type Function/Purpose
MATLAB Software Platform The primary computing environment for running ROIMCR scripts and toolboxes [36].
MCR-ALS 2.0 Toolbox Software Library Provides the core functions for Multivariate Curve Resolution-Alternating Least Squares analysis [38].
MSroi GUI App Software Tool A MATLAB-based application for importing chromatograms and performing the initial ROI compression [38].
Centroided .mzXML Files Data Format The standard input file format; conversion from vendor raw files is required [38].
Quality Control (QC) Samples Sample Type Samples spiked with chemical standards used to optimize ROI and MCR-ALS parameters [38].

Step-by-Step Procedure

  • Data Preparation and Conversion:

    • Acquire raw LC-HRMS data in profile mode.
    • Convert the vendor-specific raw files into the open .mzXML format using a tool like msConvert [38].
    • Ensure the data is centroided during the conversion process.
  • ROI Compression and Matrix Building:

    • Import the converted .mzXML files into the MATLAB environment using the MSroi GUI app or equivalent functions [38].
    • Set the ROI search parameters:
      • Signal Intensity Threshold: Set based on instrument noise level and QC sample response.
      • Mass Tolerance: Typically set to 0.005 Da or according to the mass accuracy of the instrument [38].
      • Minimum Consecutive Scans: A parameter such as 30 consecutive scans ensures a stable signal and helps filter noise [38].
    • Execute the ROI search. This step processes all samples and generates a column-wise augmented data matrix. In this matrix, rows represent the product of elution times and different samples, and columns represent the compressed m/z variables (ROIs) [36] [38].
  • MCR-ALS Modeling and Resolution:

    • Load the augmented data matrix into the MCR-ALS 2.0 toolbox.
    • Initial Estimate: Provide an initial guess for the pure components. This can be done using methods like Singular Value Decomposition (SVD) or by selecting pure variables [36].
    • Apply Constraints: Define and apply appropriate constraints during the ALS optimization to ensure chemically meaningful solutions. Crucial constraints include:
      • Non-negativity: Applied to both chromatographic profiles and mass spectra, as ion intensities cannot be negative.
      • Closure (or Unimodality): Can be applied to chromatographic profiles if needed.
    • Run the MCR-ALS algorithm until convergence is achieved (e.g., until the relative change in residuals falls below a predefined threshold).
  • Interpretation and Component Validation:

    • Analyze the results. The output consists of two main matrices:
      • Matrix C: Contains the resolved elution profiles for each component across all samples.
      • Matrix S^T: Contains the resolved pure mass spectra for each component.
    • Use the resolved mass spectra for metabolite identification by comparing them against online databases (e.g., HMDB, MassBank) or authentic standards.
    • Use the resolved elution profile areas from matrix C as quantitative scores for subsequent statistical analysis (e.g., PCA, PLS-DA) to identify differentially abundant components between sample groups [15] [38].

ROIMCR represents a paradigm shift in HRMS data preprocessing, moving away from feature-based workflows that rely on error-prone alignment and peak modeling steps. By combining intelligent data compression via ROIs with the powerful component resolution of MCR-ALS, it offers a streamlined and robust analytical pipeline. The method has proven effective in diverse fields, from unveiling disease biomarkers in clinical metabolomics to clarifying temporal chemical dynamics in environmental monitoring. While feature-based and component-based approaches each have their own strengths, ROIMCR stands out as a powerful, alignment-free solution for the efficient and reproducible analysis of complex LC-HRMS datasets.

Post-Acquisition Correction Strategies to Improve Data Comparability

In high-resolution mass spectrometry (HRMS)-based metabolomics, the post-acquisition phase is critical for transforming raw instrumental data into biologically meaningful information. A significant challenge in this process, especially within large-scale or multi-batch studies, is maintaining data comparability by correcting for technical variations that occur after data acquisition. These variations, often termed batch effects, can arise from instrumental drift, environmental fluctuations, or differences in reagent batches, and they can severely obscure true biological signals if not properly addressed [28]. The issue is particularly acute in retention time alignment, where subtle shifts can misalign peaks across samples, leading to inaccurate feature matching and quantification. This document details application notes and protocols for post-acquisition correction strategies, framed within the broader context of HRMS data preprocessing and retention time correction alignment research. The goal is to provide researchers, scientists, and drug development professionals with robust methodologies to enhance data quality, interoperability, and the reliability of subsequent biological conclusions [28] [20].

Application Notes

The Challenge of Analytical Variability in HRMS Data

Liquid Chromatography-High Resolution Mass Spectrometry (LC-HRMS) is a cornerstone of modern untargeted metabolomics due to its high sensitivity and specificity [20]. However, the analytical process is susceptible to unwanted variability. Without correction, this variability limits the integration of data collected separately, creating a significant bottleneck that prevents meaningful inter-comparisons across studies and limits the impact of metabolomics in precision biology and drug development [28]. The initial output from a typical LC-HRMS data processing workflow is a peak table that records the intensities of detected signals. Preprocessing this data to harmonize the dataset and minimize noise is a necessary first step for ensuring data quality and consistency, which in turn enhances the reliability of all downstream machine learning and statistical outcomes [39].

A modern solution to this challenge is the Post-Acquisition Correction Strategy (PARSEC). PARSEC is a three-step workflow designed to improve metabolomics data comparability without the need for long-term quality controls. The workflow consists of:

  • Combined Data Extraction: Raw data is extracted from different studies or cohorts.
  • Standardization: The data undergoes batch-wise standardization.
  • Filtering: Features are filtered based on analytical quality criteria [28].

This strategy, which combines batch-wise standardization and mixed modeling, has been shown to enhance data comparability and scalability. It minimizes the influence of analytical conditions while preserving biological variability, allowing biological information initially masked by unwanted sources of variability to be revealed [28]. Its performance has been demonstrated to outperform the classically used LOESS (Locally Estimated Scatterplot Smoothing) method [28].

The Critical Role of Data Alignment

A fundamental aspect of post-acquisition correction is data alignment. Variations in MS data can arise from differences in analytical platforms or acquisition dates, making alignment essential to ensure the comparability of chemical features across all samples [39]. This alignment process primarily involves three key steps:

  • Retention Time Correction: Compensates for slight shifts in retention times caused by variations in chromatographic conditions.
  • Mass-to-Charge Ratio (m/z) Recalibration: Standardizes mass accuracy across different batches.
  • Peak Matching: Aligns identical chemical features detected across different batches, facilitating accurate compound identification and cross-sample comparison [39].

It is worth noting that different software platforms and instrument types can exhibit different behaviors. For instance, Orbitrap systems coupled with high-performance liquid chromatography often show lower retention time drift than some Q-TOF systems, though their higher mass accuracy may demand more stringent alignment procedures [39].

Validation as a Key to Reliability

For any HRMS method, including those used in large-scale untargeted metabolomics, demonstrating fitness-for-purpose through validation is crucial. One established approach involves validation experiments acquired in untargeted mode across multiple batches, evaluating key performance metrics such as reproducibility, repeatability, and stability [40]. This process often employs levelled quality control (QC) samples to monitor response linearity between batches. A laboratory that successfully validates its methods demonstrates its capability to produce reliable results, which in turn bolsters the credibility of the hypotheses generated from its studies [40].

Experimental Protocols

Protocol 1: Implementing the PARSEC Workflow

This protocol outlines the steps for implementing the PARSEC post-acquisition correction strategy to improve data comparability in multi-batch HRMS metabolomics studies.

  • 3.1.1 Application: Improving data comparability and revealing masked biological variability in multi-batch or multi-study LC-HRMS metabolomics datasets.
  • 3.1.2 Experimental Workflow:
    • The following diagram illustrates the three-step PARSEC workflow and its position within a broader LC-HRMS data processing pipeline:

[Workflow diagram: PARSEC within the LC-HRMS data processing pipeline. Data Acquisition: Sample Preparation → LC-HRMS Analysis → Raw Data Export. Standard Preprocessing: Peak Table Generation → Retention Time Alignment → m/z Recalibration. PARSEC Correction Workflow: 1. Combined Data Extraction → 2. Batch-wise Standardization → 3. Quality-Based Feature Filtering → Statistical Analysis and Biological Interpretation]

  • 3.1.3 Step-by-Step Procedure:
    • Combined Data Extraction:
      • Gather the raw or preprocessed peak tables from all the studies, cohorts, or analytical batches to be integrated.
      • Ensure that the data from different sources is in a compatible format, merging the datasets into a unified initial data matrix where rows represent samples and columns represent aligned chemical features (e.g., m/z and retention time pairs) [28].
    • Batch-wise Standardization:
      • For each analytical batch individually, apply a standardization algorithm to correct for systematic bias. This typically involves centering and scaling the intensity values for each feature within a batch.
      • The specific method may involve Z-score normalization or other scaling techniques that adjust for differences in the baseline and variance between batches [28].
      • This step is crucial for minimizing the influence of analytical conditions while preserving the relative biological variability within each batch.
    • Quality-Based Feature Filtering:
      • Evaluate the quality of each feature across the entire dataset. Apply filters to remove features that demonstrate poor analytical quality.
      • Common filtering criteria include:
        • High percentage of missing values (e.g., >20%) across QC samples or biological replicates.
        • Poor reproducibility in QC samples (e.g., coefficient of variation >20-30%) [40].
      • This step retains only high-quality, reliable features for downstream analysis, improving the overall robustness of the dataset [28].
  • 3.1.4 Expected Outcome: The final output is a corrected and filtered feature-intensity matrix ready for statistical analysis. The application of PARSEC should result in reduced inter-batch variability, a more homogeneous sample distribution in multivariate models like PCA, and an enhanced ability to detect biologically significant features [28].
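
A minimal sketch of steps 2 and 3 is given below (Python/pandas; the column names, thresholds, and use of plain per-batch z-scores are illustrative assumptions and not the published PARSEC implementation, which also incorporates mixed modeling):

    import pandas as pd

    def parsec_like_correction(df, batch_col="batch", qc_flag="is_qc",
                               max_missing=0.20, max_qc_cv=0.30):
        """df: samples x (metadata + feature) table; qc_flag is a boolean QC-sample column."""
        feature_cols = [c for c in df.columns if c not in (batch_col, qc_flag)]

        # Step 2: batch-wise standardization (center and scale each feature within each batch).
        corrected = df.copy()
        corrected[feature_cols] = (df.groupby(batch_col)[feature_cols]
                                     .transform(lambda x: (x - x.mean()) / x.std(ddof=0)))

        # Step 3: quality-based feature filtering (missingness and QC reproducibility).
        missing_rate = df[feature_cols].isna().mean()
        qc = df[df[qc_flag]]
        qc_cv = qc[feature_cols].std() / qc[feature_cols].mean()
        keep = [c for c in feature_cols
                if missing_rate[c] <= max_missing and qc_cv[c] <= max_qc_cv]
        return corrected[[batch_col, qc_flag] + keep]
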
Protocol 2: ML-Oriented Data Preprocessing for Retention Time Alignment

This protocol details a machine learning-oriented data preprocessing workflow, with a focus on robust retention time alignment, to prepare HRMS data for advanced pattern recognition.

  • 3.2.1 Application: Preparing a high-quality feature table from raw LC-HRMS data for downstream machine learning tasks such as contaminant source identification or biomarker discovery.
  • 3.2.2 Step-by-Step Procedure:
    • Noise Filtering and Missing Value Imputation:
      • Remove features with intensity signals that are indistinguishable from background noise.
      • Impute missing values using methods such as k-nearest neighbors (KNN) to create a complete data matrix, which is required for many ML algorithms [39].
    • Data Normalization:
      • Apply normalization to correct for sample-to-sample variations, such as those caused by differences in sample concentration or instrument sensitivity.
      • Common methods include Total Ion Current (TIC) normalization or probabilistic quotient normalization to mitigate batch effects [39].
    • Retention Time Alignment and m/z Recalibration:
      • Retention Time Correction: Use algorithms (e.g., in software like XCMS) to compensate for retention time drift. This typically involves identifying a set of anchor peaks present in most samples and using them to model and correct the drift across the entire run [39].
      • m/z Recalibration: Standardize the mass accuracy by aligning the m/z values to a reference, which could be a set of internal standards or a pooled QC sample analyzed throughout the batch.
      • Peak Matching: Finally, apply peak matching algorithms to align identical chemical features detected across all samples in the study, ensuring that each row in the final feature table consistently represents the same compound across the entire dataset [39].
    • Exploratory Analysis and Feature Selection:
      • Perform univariate statistics (e.g., t-tests, ANOVA) to identify features with large fold changes between groups.
      • Use dimensionality reduction techniques like Principal Component Analysis (PCA) to visualize sample grouping and identify potential outliers.
      • Before training an ML model, employ feature selection algorithms (e.g., recursive feature elimination) to refine the input variables, which optimizes model accuracy and interpretability [39].
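
The imputation and normalization steps of this protocol can be prototyped with scikit-learn and NumPy; the sketch below assumes a samples-by-features intensity matrix with NaNs marking missing values and is not a complete preprocessing pipeline:

    import numpy as np
    from sklearn.impute import KNNImputer

    def impute_and_tic_normalize(X, n_neighbors=5):
        """X: samples x features intensity matrix with NaNs for missing values."""
        # Missing value imputation with k-nearest neighbors.
        X_imputed = KNNImputer(n_neighbors=n_neighbors).fit_transform(X)

        # Total Ion Current (TIC) normalization: scale each sample to the median total intensity.
        tic = X_imputed.sum(axis=1, keepdims=True)
        return X_imputed * (np.median(tic) / tic)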

The Scientist's Toolkit: Key Research Reagents & Software

The following table details essential reagents, software, and materials used in post-acquisition HRMS data correction.

  • Table 1: Essential Research Reagents and Software for HRMS Data Correction
    Item Name Type Function / Application
    Reference Standards / QC Pool Reagent A consistent, pooled sample analyzed throughout the batch run; serves as a reference for retention time alignment, m/z recalibration, and monitoring instrumental performance [39].
    Certified Reference Materials (CRMs) Reagent Used for result validation to verify compound identities and ensure analytical confidence, particularly when identifying key biomarkers or contaminants [39].
    Multi-Sorbent SPE Cartridges Reagent Used in sample preparation for broad-spectrum analyte recovery; combinations like Oasis HLB with ISOLUTE ENV+ help maximize metabolome coverage, improving downstream data quality [39].
    XCMS Software A widely used open-source software platform for processing LC-MS data; provides comprehensive tools for peak picking, retention time correction, alignment, and statistical analysis [20] [39].
    MZmine Software A modular, open-source software for mass spectrometry data processing, offering advanced methods for visualization, peak detection, alignment, and deconvolution [20].
    MS-DIAL Software An integrated software for mass spectrometry-based metabolomics, providing a workflow from raw data to metabolite annotation, including retention time correction and alignment [20].

The performance of analytical methods and correction strategies is quantified using specific metrics. The table below summarizes key validation parameters from relevant studies.

  • Table 2: Key Performance Metrics from HRMS Metabolomics Validation and Correction Studies
    Metric / Parameter Reported Value (Method A) Reported Value (Method B) Context & Interpretation
    Median Repeatability (CV%) 4.5% 4.6% For validated metabolites on RPLC-ESI(+)-HRMS and HILIC-ESI(-)-HRMS, respectively; indicates high precision within a single run [40].
    Median Within-run Reproducibility (CV%) 1.5% 3.8% For validated metabolites on RPLC-ESI(+)-HRMS and HILIC-ESI(-)-HRMS, respectively; indicates precision across runs within a batch [40].
    Median Spearman Correlation (r_s) 0.93 (N=9) 0.93 (N=22) Concordance of semi-quantitative results from individual serum samples between methods; shows strong rank-order correlation [40].
    Classification Balanced Accuracy 85.5% to 99.5% N/A Achieved by ML classifiers (SVC, LR, RF) for screening PFASs from different sources, demonstrating the power of ML after proper data processing [39].
    D-ratio (median) 1.91 1.45 A measure of identification selectivity; a lower D-ratio indicates better separation of analyte signal from matrix background [40].

Solving Common RT Alignment Challenges: Parameter Optimization and Quality Control

In high-resolution mass spectrometry (HRMS)-based research, particularly in non-targeted analysis (NTA) and large-scale omics studies, the preprocessing of raw data is a critical step that directly impacts the quality and reliability of all subsequent biological interpretations. A cornerstone of this preprocessing is retention time (RT) correction and alignment, which ensures that the same chemical entities detected across multiple sample runs are accurately matched. The performance of these algorithms is governed by three fundamental parameters: m/z tolerance, RT windows, and score thresholds. Their optimal setting is not universal but is highly dependent on the specific instrumentation, chromatographic setup, and study cohort size. This application note provides a detailed protocol for optimizing these parameters within the context of HRMS data preprocessing, drawing on recent advancements in the field.

Parameter Optimization Guide

The following tables summarize recommended parameter ranges and strategies for optimization based on current literature and software benchmarks.

Table 1: Optimization Guidelines for Critical Preprocessing Parameters

Parameter Recommended Range Influencing Factors Optimization Strategy
m/z Tolerance 5-10 ppm (for high-res MS) [3]; 0.005 Da (for alignment) [38] Mass spectrometer accuracy and resolution; Data acquisition mode. Use instrument's calibrated mass accuracy; Can be widened for complex samples or lower-resolution data.
RT Window 0.3 min (for alignment) [38]; Linear scaling to a fixed range (e.g., 80 min for cohort alignment) [3] Chromatographic system stability; Cohort size and run duration; LC gradient. Perform pilot tests to assess RT drift; Implement coarse alignment before fine alignment.
Score Thresholds FDR < 1-5% (for confident alignment) [3]; Metrics: Accuracy, F1, MCC (for classifier-based alignment) [41] Data complexity; Required confidence level; Downstream application. Use decoy samples for FDR estimation [3]; Validate with known standards or identified features.
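
As a quick sanity check on the tolerances quoted above, a ppm tolerance can be converted to an absolute m/z window in Da; the helper below is a trivial illustration.

```python
def ppm_window(mz: float, ppm: float) -> float:
    """Return the absolute m/z tolerance (in Da) corresponding to a ppm tolerance."""
    return mz * ppm / 1e6

# Example: a 10 ppm tolerance at m/z 500 corresponds to +/- 0.005 Da
print(ppm_window(500.0, 10.0))  # 0.005
```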

Table 2: Parameter Settings from Representative HRMS Studies

Study Context Software / Tool m/z Tolerance RT Window/Alignment Score/QC Method
Large Cohort Proteomics [3] DeepRTAlign 10 ppm (feature detection), 0.01 Da (coarse alignment) 1 min window for coarse alignment; Linear scaling DNN classifier; Decoy sample for FDR
Environmental NTS [38] MZmine3 0.005 Da 0.3 min (max RT ambiguity) Gap-filling; Blank subtraction
Mycotoxin Screening [41] HPLC-HRMS with QSRR N/S Machine Learning for RT prediction Accuracy, F1 Score, MCC
Metabolomics (Cheese) [42] ROI-MCR & Compound Discoverer N/S Data compression via ROI PCA and ASCA for feature analysis

Experimental Protocols

Protocol 1: Benchmarking Alignment Performance Using a Spiked Standard Mixture

This protocol evaluates the effectiveness of RT alignment parameters by tracking known compounds in a complex matrix.

1. Reagent Preparation:

  • Internal Standard Mix: Prepare a solution of stable isotope-labeled or chemically distinct analogs of target compounds. Spike into all samples at a consistent concentration.
  • Quality Control (QC) Sample: Create a pooled sample comprising equal aliquots of all experimental samples.
  • Matrix-Matched Calibrants: Spike native analytical standards into the sample matrix at various concentrations.

2. Sample Analysis:

  • Analyze the QC sample repeatedly throughout the analytical sequence to characterize RT drift over time.
  • Analyze experimental samples in a randomized order.

3. Data Processing and Parameter Optimization:

  • Feature Detection: Extract all chromatographic features from raw data files.
  • Alignment with Varied Parameters: Run the alignment algorithm (e.g., in MZmine3, XCMS, or DeepRTAlign) iteratively, systematically varying the m/z tolerance (e.g., from 0.001 to 0.01 Da) and RT window (e.g., from 0.1 to 1.0 min).
  • Performance Assessment: For each parameter set, calculate:
    • Recall: Percentage of spiked internal standards correctly aligned across all runs.
    • Precision: Percentage of aligned internal standard features that are correct matches.
    • FDR: Estimated using decoy compounds or misaligned features [3].

4. Optimal Parameter Selection:

  • Select the parameter combination that maximizes recall while maintaining FDR below a pre-defined threshold (e.g., 5%).
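
The performance assessment in step 3 can be expressed as a small set of metric calculations, as in the Python sketch below; the identifiers and counts are toy values, and in practice the metrics would be recomputed for every (m/z tolerance, RT window) combination in the sweep before applying the selection rule in step 4.

```python
def alignment_metrics(aligned_ids, spiked_ids, n_aligned_total, n_decoy_aligned):
    """Recall and precision for spiked internal standards plus a decoy-based FDR."""
    correctly_aligned = aligned_ids & spiked_ids
    recall = len(correctly_aligned) / len(spiked_ids)
    precision = len(correctly_aligned) / max(len(aligned_ids), 1)
    fdr = n_decoy_aligned / max(n_aligned_total, 1)
    return recall, precision, fdr

# Toy values: 10 spiked standards, 8 recovered plus 1 spurious match,
# 25 decoy features aligned out of 2000 total aligned feature groups.
spiked = {f"IS_{i}" for i in range(10)}
aligned = {f"IS_{i}" for i in range(8)} | {"unknown_1"}
print(alignment_metrics(aligned, spiked, n_aligned_total=2000, n_decoy_aligned=25))
# Keep the parameter combination maximizing recall while FDR stays below 5%.
```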

Protocol 2: Implementing a Deep Learning-Based Alignment Workflow

This protocol outlines the steps for utilizing a tool like DeepRTAlign, which combines a pseudo warping function with a deep neural network (DNN) for high-performance alignment in large cohort studies [3].

1. Precursor Detection and Feature Extraction:

  • Use a feature detection tool (e.g., XICFinder, Dinosaur, MZmine3) on raw MS files.
  • Critical Parameters: Set mass_tolerance to 10 ppm for isotope pattern detection and feature grouping [3].

2. Coarse Alignment (Pseudo Warping):

  • Linearly scale the RT of all samples to a common range (e.g., 80 minutes).
  • Divide each sample into sequential RT windows (e.g., 1 minute).
  • Within each window, calculate the average RT shift of features relative to an anchor sample.
  • Apply the average shift to all features within that window.
  • Critical Parameter: Use a 0.01 Da m/z tolerance for matching features between the sample and anchor during this step [3].
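
A simplified numpy sketch of this piecewise coarse-alignment idea is shown below; it is not the DeepRTAlign implementation, and the feature representation and matching rule are illustrative assumptions.

```python
import numpy as np

def coarse_align(sample_rt, sample_mz, anchor_rt, anchor_mz,
                 target_range=80.0, window=1.0, mz_tol=0.01):
    """Piecewise coarse alignment: per RT window, shift features by the mean
    RT difference to m/z-matched features in the anchor sample."""
    # Linearly scale both runs to a common RT range (e.g., 0-80 min)
    scale = lambda rt: rt / rt.max() * target_range
    s_rt, a_rt = scale(np.asarray(sample_rt, float)), scale(np.asarray(anchor_rt, float))
    s_mz, a_mz = np.asarray(sample_mz, float), np.asarray(anchor_mz, float)

    corrected = s_rt.copy()
    for start in np.arange(0.0, target_range, window):
        in_win = (s_rt >= start) & (s_rt < start + window)
        shifts = []
        for rt, mz in zip(s_rt[in_win], s_mz[in_win]):
            close = np.abs(a_mz - mz) <= mz_tol          # m/z-matched anchor features
            if close.any():
                j = np.argmin(np.abs(a_rt[close] - rt))  # nearest anchor feature in RT
                shifts.append(a_rt[close][j] - rt)
        if shifts:
            corrected[in_win] += np.mean(shifts)          # apply average window shift
    return corrected
```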

3. Binning and Filtering:

  • Group features by m/z using a sliding window (bin_width) and precision (bin_precision), with default values of 0.03 Da and 2 decimal places, respectively [3].
  • This step reduces computational complexity by restricting alignment to features within the same m/z bin.

4. Deep Neural Network (DNN) for Fine Alignment:

  • Input Vector Construction: For each feature-feature pair, create a normalized input vector that includes original RT and m/z values, as well as difference values between the two features [3].
  • DNN Classification: The DNN (e.g., 3 hidden layers of 5000 neurons each) acts as a classifier, determining if a feature pair should be aligned.
  • Training: The DNN requires a pre-trained model, typically generated from hundreds of thousands of feature pairs labeled as positive (same peptide) or negative (different peptides) based on identification results [3].
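
The exact 5×8 vector layout is defined in [3]; as a rough illustration of the normalization step only, the sketch below scales (RT, m/z) values and pairwise differences with the base vectors quoted above. The array shapes and function name are assumptions.

```python
import numpy as np

# Base vectors quoted from the DeepRTAlign description [3]
DIFF_BASE = np.array([5.0, 0.03])     # normalization for (RT, m/z) differences
ORIG_BASE = np.array([80.0, 1500.0])  # normalization for original (RT, m/z) values

def pair_vector(feat_a, feat_b):
    """Build a normalized input vector for one candidate feature-feature pair.

    feat_a, feat_b: arrays of shape (k, 2) holding (RT, m/z) of the target
    feature plus its neighbours in samples A and B. This is a simplified
    stand-in for the 5x8 layout described in [3]."""
    a, b = np.asarray(feat_a, float), np.asarray(feat_b, float)
    orig_a = a / ORIG_BASE            # original values, sample A
    orig_b = b / ORIG_BASE            # original values, sample B
    diff_ab = (a - b) / DIFF_BASE     # normalized differences between samples
    return np.concatenate([orig_a, diff_ab, orig_b], axis=None)

# Example: target feature with one neighbour on each side (k = 3)
a = [[30.2, 512.28], [30.5, 512.30], [30.8, 512.31]]
b = [[30.9, 512.28], [31.1, 512.30], [31.4, 512.32]]
print(pair_vector(a, b).shape)
```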

5. Quality Control:

  • Implement a QC module that constructs decoy samples to empirically estimate the False Discovery Rate (FDR) of the final alignment results [3].
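
One simple way to realize such a decoy-based check, sketched below, is to permute the retention times of a copy of one sample so that none of its features should align, then treat any alignments that do occur as false positives; this is a hedged approximation of the idea, not the DeepRTAlign QC module itself.

```python
import numpy as np

def make_decoy(rt, mz, seed=0):
    """Build a decoy sample by permuting retention times within the run,
    breaking any true RT correspondence while keeping the m/z distribution."""
    rng = np.random.default_rng(seed)
    return rng.permutation(np.asarray(rt, float)), np.asarray(mz, float)

def decoy_fdr(n_target_alignments, n_decoy_alignments):
    """Estimate alignment FDR assuming no decoy feature should be aligned."""
    return n_decoy_alignments / max(n_target_alignments, 1)

# Usage: align the decoy against the anchor with the same parameters as the
# real samples, then e.g. decoy_fdr(n_target_alignments=50000, n_decoy_alignments=400)
```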

Workflow Visualization

The following diagram illustrates the logical workflow and decision points for parameter optimization in HRMS data preprocessing.

Workflow overview: Start (HRMS data preprocessing) → Parameter initialization: m/z tolerance (e.g., 5-10 ppm), RT window (e.g., 0.3-1.0 min), score threshold (e.g., FDR < 5%) → Feature detection and extraction → Retention time alignment: coarse alignment (e.g., linear scaling, 1 min windows) followed by fine alignment (e.g., DNN classifier, direct matching) → Performance assessment: recall/precision of internal standards and empirical FDR from decoy samples → Decision: if performance meets the goal, output the aligned feature table for downstream analysis; otherwise, adjust parameters and repeat the loop.

The Scientist's Toolkit: Essential Research Reagents & Materials

Table 3: Key Reagents and Software for HRMS Preprocessing Workflows

Item Name Function / Application Example Use Case
Stable Isotope-Labeled Internal Standards Acts as a reliable internal control for tracking RT shifts and evaluating alignment accuracy. Spiked into all samples to measure alignment recall and precision in Protocol 1.
Certified Reference Materials (CRMs) Provides a ground truth for validating compound identities and RT alignment performance. Used in model validation and for verifying alignment accuracy [39].
Quality Control (QC) Pool Sample A representative sample used to monitor instrument stability and RT drift over the sequence. Injected at regular intervals to assess system performance and the need for RT correction [38].
DeepRTAlign A deep learning-based tool for accurate RT alignment in large cohort studies. Used in Protocol 2 to handle both monotonic and non-monotonic RT shifts [3].
MZmine3 An open-source software for LC-MS data processing, including feature detection and alignment. Employed in environmental NTS for feature extraction and alignment with defined parameters [38].
ROIMCR A chemometric approach for component resolution from LC-HRMS data without peak-picking. Serves as an alternative to feature-based workflows, offering high consistency [42] [38].

Handling Non-Monotonic Shifts and Large Cohort Variability

In liquid chromatography-high resolution mass spectrometry (LC-HRMS) based proteomic and metabolomic studies, retention time (RT) alignment is a critical preprocessing step, especially for large cohort analyses. RT shifts occur between samples due to various reasons, including matrix effects, instrument performance variability, and chromatographic column aging [3]. While traditional alignment tools have served the community for years, they often struggle with non-monotonic RT shifts—irregular shifts that don't consistently increase or decrease over time—and the substantial variability present in large sample cohorts [3] [43]. These limitations present significant bottlenecks in proteomics and metabolomics research, potentially leading to inaccurate biological conclusions and reduced analytical sensitivity.

The challenge is particularly pronounced in large-scale studies such as clinical proteomics or environmental exposure monitoring, where hundreds or thousands of samples are analyzed. Without proper alignment, corresponding analytes cannot be accurately matched across samples, compromising downstream quantitative, comparative, and statistical analyses [3] [4]. This article examines advanced computational strategies that effectively handle both monotonic and non-monotonic RT shifts while maintaining robustness across large sample sets, thereby enabling more reliable biomarker discovery and clinical translation.

Theoretical Foundations and Methodological Approaches

Classification of RT Alignment Methods

RT alignment methods can be broadly categorized into three computational approaches, each with distinct strengths and limitations for handling non-monotonic shifts and cohort variability:

Warping Function Methods: These approaches correct RT shifts between runs using linear or non-linear warping functions. Representative tools include XCMS, MZmine 2, and OpenMS [3]. These methods model the relationship between retention times in different samples using mathematical functions that compress or stretch the time axis to maximize alignment. A significant limitation of conventional warping methods is their inherent monotonicity constraint—they cannot effectively correct non-monotonic shifts because the warping function must consistently increase or decrease across the chromatographic run [3] [43]. The adjustRtime function in XCMS implements several warping algorithms including Obiwarp, which performs retention time adjustment based on the full m/z-RT data using the original obiwarp algorithm with enhancements for multiple sample alignment by aligning each against a center sample [44].
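
As a generic illustration of the warping-function idea (not the Obiwarp or XCMS code), the sketch below fits a smooth LOWESS curve to the RT deviations of shared landmark features against a reference run and subtracts the fitted deviation from every feature; a single smooth curve of this kind is exactly what cannot capture non-monotonic local reversals.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def warp_rt(sample_rt_all, landmark_rt_sample, landmark_rt_reference, frac=0.3):
    """Smooth warping: model RT deviation vs. RT with LOWESS using shared
    landmark features, then correct every feature in the run."""
    dev = np.asarray(landmark_rt_sample) - np.asarray(landmark_rt_reference)
    fit = lowess(dev, landmark_rt_sample, frac=frac, return_sorted=True)
    # Interpolate the fitted deviation at every feature's RT and subtract it
    correction = np.interp(sample_rt_all, fit[:, 0], fit[:, 1])
    return np.asarray(sample_rt_all) - correction

# Toy example: a run drifting ~0.2 min late relative to the reference
ref = np.linspace(1, 60, 30)
samp = ref + 0.2 + 0.01 * ref
print(warp_rt(samp, samp, ref)[:3])  # approximately recovers the reference RTs
```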

Direct Matching Methods: These approaches attempt to perform correspondence solely based on feature similarity between runs without using a warping function. Representative tools include RTAlign, MassUntangler, and Peakmatch [3]. These methods typically compare features directly using their m/z values, retention times, and potentially other characteristics like spectral similarity or peak shape. While potentially more flexible for handling non-monotonic patterns, these methods have generally demonstrated inferior performance compared to warping-based approaches due to uncertainties in MS signals [3].

Hybrid and Machine Learning Approaches: Emerging methods combine elements of both approaches while incorporating advanced computational techniques. DeepRTAlign implements a two-stage alignment combining coarse alignment (pseudo warping function) with a deep learning-based model (direct matching) [3]. This hybrid approach allows it to handle both monotonic and non-monotonic shifts effectively. Automatic Time-Shift Alignment (ATSA) employs a multi-stage process involving automatic baseline correction, preliminary alignment through adaptive segment partition, and precise alignment based on test chromatographic peak information [43]. MetHR, designed for GC-HRTMS data, performs peak list alignment using both retention time and mass spectra similarity and can process heterogeneous data acquired under different experimental conditions [45].

The Deep Learning Revolution in RT Alignment

DeepRTAlign represents a significant advancement in RT alignment methodology by leveraging deep neural networks (DNNs) to overcome limitations of conventional approaches. The tool employs a sophisticated architecture with three hidden layers containing 5000 neurons each, functioning as a classifier that distinguishes between feature-feature pairs that should or should not be aligned [3].

The network is trained on 400,000 feature-feature pairs—200,000 positive pairs (features from the same peptides that should be aligned) and 200,000 negative pairs (features from different peptides that should not be aligned) [3]. During training, the model uses the BCELoss function in PyTorch with sigmoid activation functions and the Adam optimizer at an initial learning rate of 0.001, which is reduced by a factor of 10 every 100 epochs [3]. The input vector construction is particularly notable: it considers the RT and m/z of each feature along with the two adjacent features before and after the target feature, normalized using the base vectors [5, 0.03] for difference values and [80, 1500] for original values [3].

This deep learning approach enables the model to learn complex, non-monotonic shift patterns directly from data rather than relying on predefined warping functions, resulting in improved handling of the retention time variability commonly encountered in large cohort studies.
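
Based on the hyperparameters quoted above, a PyTorch configuration for such a pair classifier might look like the sketch below; the input dimension and the hidden-layer activation choice are assumptions made for illustration, not the published DeepRTAlign code.

```python
import torch
import torch.nn as nn

# Pair classifier: three hidden layers of 5000 neurons, sigmoid output [3].
# The input dimension (here 40) and ReLU hidden activations are assumptions.
model = nn.Sequential(
    nn.Linear(40, 5000), nn.ReLU(),
    nn.Linear(5000, 5000), nn.ReLU(),
    nn.Linear(5000, 5000), nn.ReLU(),
    nn.Linear(5000, 1), nn.Sigmoid(),
)

criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Learning rate multiplied by 0.1 every 100 epochs, as described in the text
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)

def train_epoch(loader):
    model.train()
    for x, y in loader:                      # x: pair vectors, y: 0/1 labels
        optimizer.zero_grad()
        loss = criterion(model(x).squeeze(1), y.float())
        loss.backward()
        optimizer.step()

# Training loop (400 epochs, as stated later in the protocol):
# for epoch in range(400): train_epoch(train_loader); scheduler.step()
```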

Comparative Analysis of Alignment Methods

Table 1: Performance Comparison of RT Alignment Methods for Handling Non-Monotonic Shifts

Method Algorithm Type Non-Monotonic Shift Handling Large Cohort Suitability Key Advantages Reported Limitations
DeepRTAlign Deep Learning Hybrid Excellent Excellent Handles both monotonic and non-monotonic shifts; Improved identification sensitivity without compromising quantitative accuracy Requires substantial training data; Computational intensity
ATSA Multi-stage Segmentation Good Good Peak-to-peak alignment strategy; Total Peak Correlation (TPC) criterion Complex parameter optimization; Segment definition challenges
MetHR Similarity-based Matching Good Moderate Uses both RT and mass spectra; Handles heterogeneous experimental conditions GC-MS focused; Limited LC-MS validation
Obiwarp (XCMS) Warping Function Limited Good Processes full m/z-RT data; No prerequisite feature detection Primarily designed for monotonic shifts; Profile data required
PeakGroups (XCMS) Feature-based Warping Limited Good Uses housekeeping compounds; Flexible smooth functions (loess/linear) Requires preliminary feature grouping; Dependent on reference compounds
Traditional Warping Methods Warping Function Poor Moderate Established algorithms; Wide implementation Monotonicity constraint; Limited complex shift correction

Table 2: Technical Specifications of Advanced Alignment Tools

Tool Input Data Format Feature Detection Alignment Basis Quality Control Metrics Implementation
DeepRTAlign Raw MS files XICFinder (in-house) DNN classification with coarse alignment Decoy-based FDR calculation Python/PyTorch
ATSA Chromatographic signals Multi-scale Gaussian smoothing Segment-based peak matching Correlation coefficients; Total Peak Correlation Not specified
MetHR Peak lists Spectral deconvolution z-score transformed RT + mass spectral similarity AUC >0.85 in ROC curves for spiked-in compounds Not specified
XCMS Raw/profile data CentWave or MatchedFilter Obiwarp or PeakGroups Alignment quality visualizations R/Bioconductor

Detailed Experimental Protocols

DeepRTAlign Implementation Protocol

Experimental Workflow Overview:

Workflow: Input Raw MS Files → Precursor Detection and Feature Extraction → Coarse Alignment → Binning and Filtering → Input Vector Construction → DNN Classification → Quality Control → Aligned Features Output.

Figure 1: DeepRTAlign Computational Workflow

Step-by-Step Procedure:

  • Precursor Detection and Feature Extraction

    • Utilize XICFinder (similar to Dinosaur) for precursor detection and feature extraction [3].
    • Apply a mass tolerance of 10 ppm for isotope pattern detection.
    • Process Thermo raw files directly using the Component Object Model (COM) of MSFileReader.
    • Detect isotope patterns in each spectrum, then merge isotope patterns from subsequent spectra into features.
  • Coarse Alignment

    • Linearly scale retention times across all samples to a standardized range (e.g., 80 minutes).
    • For each m/z value, select the feature with the highest intensity to build a representative list for each sample.
    • Divide all samples (except the anchor sample) into segments using a user-defined RT window (default: 1 minute).
    • Compare features in each segment with features in the anchor sample using a mass tolerance of 0.01 Da.
    • Calculate the average RT shift for feature pairs in each segment and apply this correction.
  • Binning and Filtering

    • Group features based on m/z values using parameters bin_width (default: 0.03) and bin_precision (default: 2).
    • Apply optional filtering to retain only the highest intensity feature in each m/z window within the user-defined RT range.
  • Input Vector Construction

    • Construct 5×8 input vectors considering RT and m/z of each feature along with two adjacent features.
    • Include both original values (Part 1 and Part 4) and difference values between samples (Part 2 and Part 3).
    • Normalize difference values using base1 [5, 0.03] and original values using base2 [80, 1500].
  • DNN Processing

    • Process feature pairs through the three hidden layers (5000 neurons each).
    • Classify feature pairs as alignable or non-alignable using sigmoid activation function.
    • Utilize trained model (400,000 feature-feature pairs) for prediction.
  • Quality Control

    • Implement decoy-based false discovery rate (FDR) calculation.
    • Randomly select samples as targets and build decoys in each m/z window.
    • Assume no features in decoy samples should align, enabling FDR estimation.

Critical Parameters for Large Cohort Studies:

  • Epoch number: 400 (determined empirically when loss stabilizes)
  • Batch size: 500
  • Learning rate: 0.001 with reduction factor of 0.1 every 100 epochs
  • Anchor sample: Typically the first sample in the cohort
  • RT window for segmentation: 1 minute (adjustable based on chromatographic complexity)

ATSA Alignment Protocol for Complex Chromatograms

Experimental Workflow Overview:

Workflow: Input Chromatograms → Baseline Correction using LMV-RSA → Chromatographic Peak Detection → Reference Chromatogram Selection → Preliminary Segment Alignment → Precise Peak-to-Peak Alignment → Alignment Quality Validation → Time-Corrected Chromatograms.

Figure 2: ATSA Method Workflow

Step-by-Step Procedure:

  • Baseline Correction and Peak Detection

    • Apply Local Minimum Values-Robust Statistical Analysis (LMV-RSA) to eliminate baseline drift [43].
    • Extract local minimum values (LMVs) in the chromatogram and mark corresponding positions.
    • Use iterative optimization with RSA to remove LMVs belonging to chromatographic peaks.
    • Estimate baseline drift using linear interpolation.
    • Perform automatic chromatographic peak detection using multi-scale Gaussian smoothing [43].
    • Identify chromatographic peaks as local maximal values maintained under various Gaussian smoothing scales.
    • Extract chromatographic information: retention time, peak elution range, peak height, and peak area.
  • Preliminary Alignment Stage

    • Select the chromatogram with the highest correlation coefficient as the reference.
    • Set initial time-shift value (default: 0.5 minutes).
    • Initialize segment size (default: 3 minutes) based on elution range.
    • Assign peaks to segments: if the retention time distance between current and first peaks in a segment is less than 3 minutes, add to the same segment; otherwise, create a new segment.
    • Consolidate segments with fewer than three chromatographic peaks with neighboring segments.
    • Determine segment boundaries as the average value of two successive segments.
    • Establish test segment boundaries by combining reference segment boundaries with pre-estimated time-shift value.
  • Precise Alignment Stage

    • Utilize Total Peak Correlation (TPC) as the alignment criterion instead of standard correlation coefficient [43].
    • Calculate TPC using the formula $\text{TPC} = \left( \frac{\sum_{i=1}^{I} w_i \, c_i}{\sum_{i=1}^{I} w_i} \right) \cdot \frac{I}{N}$, where $w_i = \frac{\text{PeakArea}_i}{\text{PeakLength}_i}$, $c_i$ is the correlation coefficient of matched peak $i$, $I$ is the number of matched peaks, and $N$ is the number of reference peaks [43].
    • Implement peak-to-peak alignment strategy constrained by reference chromatogram boundaries.
    • Perform alignment based on the largest peak in the reference segment.
    • Apply robust statistical methods to detect and correct inadequately aligned segments.
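
The TPC criterion defined above can be transcribed directly into a short Python function (the example values are arbitrary):

```python
import numpy as np

def total_peak_correlation(peak_areas, peak_lengths, peak_corrs, n_reference_peaks):
    """Total Peak Correlation (TPC) for ATSA [43]: weighted mean of matched-peak
    correlations (weights w_i = area_i / length_i) times the matched fraction I/N."""
    w = np.asarray(peak_areas, float) / np.asarray(peak_lengths, float)
    c = np.asarray(peak_corrs, float)
    matched_fraction = len(c) / n_reference_peaks
    return (np.sum(w * c) / np.sum(w)) * matched_fraction

# Example: 3 matched peaks out of 4 reference peaks
print(total_peak_correlation([1200, 800, 300], [0.4, 0.3, 0.2], [0.98, 0.95, 0.90], 4))
```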

Validation and Quality Assessment:

  • Calculate correlation coefficients between aligned samples
  • Verify alignment accuracy using spiked-in standards when available
  • Assess peak matching accuracy across the entire cohort

Quality Assurance and Method Validation

QA/QC Framework for LC-HRMS Data Preprocessing

For large cohort studies, implementing robust quality assurance and quality control (QA/QC) procedures is essential to ensure alignment reliability. The European Partnership for the Assessment of Risks from Chemicals (PARC) has proposed harmonized QA/QC guidelines to assess the sensitivity of feature detection, reproducibility, integration accuracy, precision, accuracy, and consistency of data preprocessing [4].

Key QA/QC provisions include:

  • Sensitivity of Feature Detection: Assess the ability to detect low-abundance compounds and minimize false negatives.
  • Reproducibility: Evaluate consistency of feature detection and alignment across technical replicates and batches.
  • Integration Accuracy: Verify precision of peak area and height quantification post-alignment.
  • Preprocessing Consistency: Monitor performance across different sample types and concentrations.

Validation Methods for Alignment Accuracy

Spiked-in Standard Validation:

  • Incorporate known compounds at defined concentrations across samples
  • Measure alignment accuracy for spiked-in compounds
  • Target AUC >0.85 in ROC curves for correctly aligned spiked-in compounds [45]

Cross-Validation Approaches:

  • Implement k-fold cross-validation for parameter optimization
  • Use separate training and validation cohorts when applying machine learning methods
  • Assess generalizability across different sample types and conditions

Decoy-Based FDR Estimation:

  • Implement target-decoy approaches similar to DeepRTAlign
  • Calculate FDR based on alignments to decoy samples
  • Maintain FDR below established thresholds (e.g., 1% or 5%)

Table 3: QA/QC Metrics for Alignment Validation

Validation Type Specific Metrics Acceptance Criteria Application Context
Spiked-in Standards Alignment recovery rate; Quantitative accuracy >85% recovery; AUC >0.85 Targeted validation; Method development
Feature-Based QC Peak capacity; Total feature count; Missing data rate Consistent across samples; <20% missing data after alignment Large cohort studies; Batch effects monitoring
Reproducibility Coefficient of variation; Intra-batch correlation CV <30%; Correlation >0.8 Technical replicates; Process evaluation
Downstream Analysis Multivariate model quality; Classification accuracy Improved post-alignment; Statistically significant gains Biological validation; Method impact assessment

Table 4: Key Research Reagent Solutions for RT Alignment Studies

Reagent/Resource Function/Application Implementation Example Considerations for Large Cohorts
Spiked-in Compound Standards Alignment accuracy assessment; Quantitative calibration 28 acid standards in MetHR validation; Restek MegaMix for GC Cover relevant RT range; Non-interfering with samples
Internal Standard Mixtures Retention time normalization; Instrument performance monitoring Deuterated semi-volatile internal standards; C7-C40 n-alkanes Consistent addition across all samples
Reference Chromatograms Alignment targets; Quality benchmarks Highest correlation sample; Pooled quality control samples Representativeness of entire cohort
Benchmark Datasets Method development; Comparative performance assessment Publicly available LC-HRMS datasets; Simulated shift datasets Documented shift patterns; Ground truth availability
Quality Control Samples Process monitoring; Batch effect correction Pooled samples; Reference materials Even distribution throughout sequence
Software Containers Computational reproducibility; Environment consistency Docker/Singularity containers with tool dependencies Version control; Dependency management

Effective handling of non-monotonic shifts and large cohort variability remains a critical challenge in LC-HRMS data preprocessing. Traditional warping methods face fundamental limitations due to their monotonicity constraints, while direct matching approaches often lack robustness. The emerging generation of alignment tools—particularly deep learning-based hybrids like DeepRTAlign and sophisticated segmentation approaches like ATSA—demonstrate significantly improved capability for managing complex retention time shifts in large sample cohorts.

These advanced methods share several key characteristics: multi-stage alignment strategies that combine global and local correction, intelligent use of feature relationships beyond simple retention time, incorporation of quality control mechanisms, and flexibility to accommodate both monotonic and non-monotonic shift patterns. Implementation requires careful attention to parameter optimization, quality assurance protocols, and validation using appropriate standards and benchmarks.

As LC-HRMS technologies continue to evolve toward higher throughput and larger cohort sizes, further development of robust, scalable alignment methods will remain essential. Integration of these alignment tools with comprehensive QA/QC frameworks will enhance reliability and reproducibility in proteomic and metabolomic studies, ultimately supporting more confident biological conclusions and clinical translations.

High-Resolution Mass Spectrometry (HRMS) generates complex, information-rich datasets essential for modern applications in exposomics, environmental monitoring, and drug development. However, the analytical workflow is frequently compromised by three pervasive data quality issues: missing values, high noise levels, and low-abundance features. These challenges are particularly pronounced in non-targeted analysis (NTA), where comprehensive detection of unknown compounds is paramount [46]. In typical NTA studies, fewer than 5% of detected compounds can be confidently identified, partly due to these data quality limitations [46]. Effectively addressing these issues during preprocessing is therefore critical for ensuring the reliability of downstream chemical and biological interpretations. This protocol provides detailed methodologies for diagnosing and correcting these common data quality problems within HRMS preprocessing workflows, with particular emphasis on their impact on retention time correction and alignment.

The table below summarizes the core data quality issues, their estimated prevalence in HRMS data, and primary origins.

Table 1: Prevalence and Origins of Major Data Quality Issues in HRMS Data

Data Quality Issue Typical Prevalence in HRMS Data Primary Causes
Missing Values Up to 20% of all values in MS-based datasets [47] MCAR (Missing Completely at Random): measurement errors; MAR (Missing at Random): probability of missingness depends on other variables; MNAR (Missing Not at Random): peaks below the detection limit or peak-picking thresholds [47].
Noise Varies significantly with instrumentation and sample matrix Electronic noise from detectors; chemical background from samples or solvents; co-elution and matrix effects that obscure relevant signals [46].
Low-Abundance Features A large proportion of detected features; exact quantification is complex Trace-level contaminants or metabolites; ion suppression from high-abundance compounds; inefficient ionization for certain chemical classes [46].

Experimental Protocols for Data Quality Assessment and Mitigation

Protocol for Classifying and Imputing Missing Values

The accurate handling of missing values requires a methodical approach to classify the nature of the missingness before applying an appropriate imputation strategy.

A. Missing Value Classification

A critical first step is to classify missing values as either Missing at Random (MAR) or Missing Not at Random (MNAR). This classification can be performed using criteria based on technical replicates [47].

  • Data Preparation: Utilize a data matrix where features (ions) are defined by their m/z, retention time, and intensity across samples.
  • Threshold Definition: For each feature, establish an intensity threshold, which can be derived from the Limit of Detection (LOD) or the minimum intensity observed in samples where the feature is present.
  • Classification Rule:
    • A value is classified as MNAR if it is missing in a sample but detected in at least one technical replicate of that same sample at an intensity above the defined threshold. This pattern indicates the analyte was present but failed to be detected or picked by the peak-picking algorithm.
    • A value is classified as MAR if it is missing in a sample and also absent from all technical replicates of that sample, suggesting a random failure in measurement or peak extraction.

B. Strategic Imputation Based on Classification

Following classification, apply targeted imputation methods [47]:

  • For MNAR Values: Impute using a method like "mean-LOD", where missing values are replaced by a value derived from the limit of detection (e.g., L/2, where L is the estimated LOD for that feature). This reflects the fact that the signal was likely below the instrument's detection capability.
  • For MAR Values: Impute using a multivariate statistical method like SVD-QRILC (Singular Value Decomposition with Quantile Regression Imputation of Left-Censored Data), which leverages correlations between features to estimate plausible values.
  • Forced Integration: As an alternative to the above classification-based approach, tools like the xcms.fillPeaks module in the XCMS R package can be used to perform a forced integration of the raw data in the regions where peaks are expected, often providing imputed values closer to reality [47].
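
A minimal pandas sketch of this classification-then-imputation logic is given below, assuming a features-by-samples intensity matrix, a per-feature LOD series, and a mapping from each sample to its technical replicates (all hypothetical); MAR values are deliberately left for a dedicated multivariate imputer such as SVD-QRILC rather than reimplemented here.

```python
import pandas as pd

def classify_missing(intensities: pd.DataFrame, replicate_groups: dict, lod: pd.Series):
    """Label each missing value MNAR or MAR based on technical replicates [47].

    intensities: features x samples matrix with NaN for missing values.
    replicate_groups: {sample: [technical replicate column names]} (hypothetical layout).
    lod: per-feature limit of detection used as the intensity threshold.
    """
    labels = pd.DataFrame("", index=intensities.index, columns=intensities.columns)
    for sample, replicates in replicate_groups.items():
        missing = intensities[sample].isna()
        # MNAR: detected above threshold in at least one technical replicate
        seen_in_rep = intensities[replicates].gt(lod, axis=0).any(axis=1)
        labels.loc[missing & seen_in_rep, sample] = "MNAR"
        labels.loc[missing & ~seen_in_rep, sample] = "MAR"
    return labels

def impute_mnar_mean_lod(intensities: pd.DataFrame, labels: pd.DataFrame, lod: pd.Series):
    """Replace MNAR values with LOD/2; MAR values are left for a multivariate
    imputer (e.g., SVD-QRILC), which is not reimplemented in this sketch."""
    out = intensities.copy()
    for sample in out.columns:
        mask = labels[sample] == "MNAR"
        out.loc[mask, sample] = lod[mask] / 2.0
    return out
```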

Protocol for Mitigating Noise and Prioritizing Features in NTS

Reducing noise and prioritizing chemically relevant features is essential for managing dataset complexity.

A. Data Quality Filtering

  • Blank Subtraction: Remove features detected in procedural blank samples using a defined fold-change threshold (e.g., feature intensity in the sample must be X times greater than in the blank) to eliminate background noise and carryover [48] [38].
  • Componentization: Group related signals (adducts, in-source fragments, isotopologues) that arise from a single compound. This reduces data dimensionality and minimizes the misclassification of noise as unique features [46].
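
Blank subtraction with a fold-change threshold reduces to a one-line filter, sketched below for a pandas feature table (column names are assumptions):

```python
import pandas as pd

def subtract_blanks(features: pd.DataFrame, sample_cols, blank_cols, fold=10):
    """Keep features whose mean sample intensity is at least `fold` times the
    mean intensity observed in procedural blanks."""
    sample_mean = features[sample_cols].mean(axis=1)
    blank_mean = features[blank_cols].mean(axis=1)
    return features[sample_mean >= fold * blank_mean]
```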

B. Advanced Prioritization Strategies

To focus on the most relevant features, employ a multi-strategy prioritization framework [48] [49]:

  • Chemistry-Driven Prioritization: Use HRMS data properties to flag features of interest, such as halogenated substances (based on isotopic patterns) or specific transformation products.
  • Process-Driven Prioritization: Use spatial, temporal, or process-based comparisons (e.g., pre- vs. post-treatment samples) to identify features with significant changes in intensity.
  • Effect-Directed Analysis (EDA): Link chemical features to observed biological effects. A novel extension is to use machine learning models (e.g., Random Forest Classification) to predict toxicity categories directly from MS1, retention time, and fragmentation data, bypassing the need for full compound identification [46].
  • Prediction-Based Prioritization: Apply Quantitative Structure-Property Relationship (QSPR) models or other machine learning approaches to estimate environmental risk or concentration levels for prioritization [48].

The following workflow diagram integrates the protocols for handling missing values and noise into a comprehensive HRMS data preprocessing pipeline.

Workflow: Raw HRMS Data → Peak Picking & Alignment (XCMS, MZmine) → Missing Value Classification (MAR vs. MNAR) → Strategic Imputation (MNAR: mean-LOD; MAR: SVD-QRILC) → Noise Filtering & Componentization (Blank Subtraction, Adduct Grouping) → Feature Prioritization (Chemical, Statistical, or Toxicity-based) → Clean Dataset for Downstream Analysis (RT Correction, Statistical Modeling).

Diagram 1: HRMS Data Preprocessing Workflow

Performance Benchmarking of Imputation and Preprocessing Methods

Selecting the optimal method requires benchmarking, as performance is highly dependent on data characteristics such as distribution, missingness mechanism, and skewness.

Table 2: Benchmarking of Selected Imputation and Preprocessing Methods

Method Category Example Tool/Algorithm Key Performance Findings Considerations for HRMS Data
Flexible Imputation Pympute's Flexible Algorithm Significantly outperformed single-model approaches on real-world EHR datasets, achieving the lowest MAPE and RMSE [50]. Intelligently selects the best imputation model (linear or nonlinear) for each variable. On skewed data, it consistently favored nonlinear models like Random Forest (RF) and XGBoost [50].
Non-Targeted Screening Workflows MZmine3 (Feature Profile) Showed high sensitivity to treatment effects but increased susceptibility to false positives. Performance varied significantly with processing parameters [38]. Offers flexibility but requires careful parameter optimization. Agreement with other workflows can be low [38].
Non-Targeted Screening Workflows ROIMCR (Component Profile) Provided superior consistency, reproducibility, and temporal clarity, but exhibited lower treatment sensitivity compared to MZmine3 [38]. A powerful multi-way chemometric alternative to standard peak-picking, directly recovering "pure" component profiles from complex data [38].
Toxicity Prioritization Random Forest Classification (RFC) with MS1, RT, and Fragmentation Data Effectively linked LC-HRMS features to aquatic toxicity categories without requiring full compound identification, enabling risk-based prioritization [46]. Highly valuable for offline prioritization in environmental studies. Requires good-quality MS2 data for optimal performance [46].

The Scientist's Toolkit: Essential Research Reagents and Software

A robust HRMS preprocessing workflow relies on a combination of specialized software tools and analytical standards.

Table 3: Essential Reagents and Software for HRMS Data Preprocessing

Item Name Category Function/Benefit Example Use Case
Internal Standards (ISs) Research Reagent Correct for instrumental drift, matrix effects, and variations in sample preparation. Enable retention time calibration. Added to every sample and quality control (QC) sample before analysis to normalize feature intensities and aid alignment [38].
Chemical Standards for QC Research Reagent Used to monitor instrument stability, optimize data processing parameters, and ensure all target substances are detected. A set of 11 chemical standards used in QC samples to tune ROI and MZmine3 feature extraction parameters [38].
Pympute Software Package (Python) A flexible imputation toolkit that intelligently selects the optimal imputation algorithm for each variable in a dataset. Addressing missing values in EHR or other structured data where variables have different underlying distributions [50].
MZmine 3 Software Tool An open-source, flexible software for LC-MS data processing, supporting feature detection, alignment, and identification. Building a feature list from raw LC-HRMS data in an environmental NTS study [38].
ROIMCR Software Tool (MATLAB) A multi-way chemometric method that uses Regions of Interest and Multivariate Curve Resolution to resolve component profiles directly. Processing complex LC-HRMS datasets to achieve more consistent and reproducible feature detection than traditional peak-picking [38].
SIRIUS/CSI:FingerID Software Tool Powerful tools for predicting molecular formulas and compound structures from MS/MS data. Identifying unknown features after preprocessing and prioritization, leveraging in-silico fragmentation matching [46].

Best Practices for Quality Control and False Discovery Rate (FDR) Estimation

In high-resolution mass spectrometry (HRMS)-based omics studies, the preprocessing of raw data is a critical step that directly impacts all subsequent biological interpretations. Retention time (RT) alignment across multiple liquid chromatography (LC)-MS runs is particularly crucial in large cohort studies, as it ensures that the same analyte is correctly matched despite analytical variations [3]. Without rigorous quality control (QC) and false discovery rate (FDR) estimation, even sophisticated alignment algorithms can produce results plagued by both false positive and false negative features, leading to compromised biological conclusions [51] [4]. This application note establishes a comprehensive framework for implementing QC procedures and FDR estimation methods specifically within the context of HRMS data preprocessing, with emphasis on retention time correction alignment research.

Foundational Concepts

The Critical Role of Quality Control in Data Preprocessing

Data preprocessing in LC-HRMS workflows transforms raw instrument data into a list of detected signals (features) characterized by mass-to-charge ratio (m/z), retention time, and intensity [4]. The quality of this step is paramount, as unoptimized feature detection can lead to:

  • False Positives (Type I Error): Noise being reported as a real feature.
  • False Negatives (Type II Error): Genuine peaks being missed, potentially omitting biologically relevant compounds from downstream analysis [4].

The limitations observed during preprocessing include incomplete peak-picking for low-abundance compounds and significant reproducibility issues between different laboratories and software tools [4]. These challenges are exacerbated in non-targeted analysis (NTA) and suspect screening analysis (SSA), where the goal is comprehensive detection of chemicals without prior knowledge of all potential compounds.

Understanding False Discovery Rate in Multiple Testing

When conducting thousands of statistical tests simultaneously in omics studies, traditional significance thresholds become problematic. The False Discovery Rate (FDR) has emerged as the standard error metric for large-scale inference problems, defined as the expected proportion of false discoveries among all features called significant [52]. Formally, FDR = E[V/R], where V is the number of false positives and R is the total number of discoveries [52] [53].

The q-value is the FDR analog of the p-value, representing the minimum FDR at which a feature may be called significant [52]. A q-value threshold of 0.05 indicates that approximately 5% of the features called significant are expected to be false positives. This approach provides more power compared to family-wise error rate (FWER) controls like Bonferroni correction, especially in high-dimensional settings where many true positives are expected [52] [53].
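
For the Benjamini-Hochberg procedure specifically, a minimal Python example using statsmodels is shown below; the simulated p-values are purely illustrative (Storey-style q-value estimation additionally requires estimating π₀ and is not shown).

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
# Toy p-values: 900 null features plus 100 features with genuine signal
p_values = np.concatenate([rng.uniform(size=900), rng.beta(0.1, 10, size=100)])

# Benjamini-Hochberg FDR control at 5%
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
print(f"{reject.sum()} features called significant at FDR 0.05")
```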

Table 1: Comparison of Error Control Methods in Multiple Testing

Method Error Rate Controlled Key Principle Best Use Case
Bonferroni Family-Wise Error Rate (FWER) Controls probability of ≥1 false positive by using α/m threshold Small number of hypotheses; confirmatory studies
Benjamini-Hochberg False Discovery Rate (FDR) Sequential p-value method controlling expected proportion of false discoveries High-dimensional data with positive dependency between tests
Storey's q-value FDR (estimated) Bayesian approach estimating FDR for each feature; uses proportion of true null hypotheses (π₀) Large-scale discovery studies with many expected true positives
Two-stage Benjamini-Hochberg FDR (adaptive) Adapts to estimated proportion of true null hypotheses Independent tests with moderate proportion of true alternatives

Quality Control Procedures for Retention Time Alignment

QC Integration in Alignment Workflows

Effective retention time alignment requires integrated QC measures throughout the preprocessing pipeline. The DeepRTAlign tool exemplifies this approach by incorporating a dedicated QC module that calculates the final FDR of alignment results [3]. This is achieved by randomly selecting a sample as a target and constructing its decoy, under the principle that all features in the decoy sample should not be aligned, thus providing a basis for FDR estimation [3].

Adaptive algorithms, such as the one implemented in the Proteios Software Environment, actively incorporate quality metrics into parameter estimation rather than merely reporting them post-analysis [54]. These algorithms estimate critical alignment parameters (m/z and retention time tolerances) directly from the data by maximizing precision and recall metrics, thereby minimizing systematic bias introduced by inappropriate default settings [54].

Practical QC Protocol for RT Alignment

The following protocol outlines a standardized approach for implementing QC during retention time alignment:

Sample Preparation and Experimental Design:

  • Incorporate quality control (QC) samples: pooled samples or reference standards analyzed at regular intervals throughout the analytical sequence.
  • Ensure QC samples closely mimic the composition of actual study samples.

Data Preprocessing with Integrated QC:

  • Perform feature detection using optimized parameters. Tools like XICFinder, OpenMS, or MZmine can be employed [3] [4].
  • Execute retention time alignment using advanced tools (e.g., DeepRTAlign) that handle both monotonic and non-monotonic RT shifts [3].
  • Apply the algorithm's built-in QC measures. For custom pipelines, implement:
    • Precision and Recall Calculation: Assess alignment accuracy using known internal standards or identified features [54].
    • FDR Estimation: Use decoy-based methods to estimate false alignment rates [3].

Alignment Quality Assessment:

  • Monitor intensity profiles of internal standards across samples for consistent peak shapes and intensities post-alignment.
  • Track the number of aligned features across samples; sudden drops may indicate alignment failures.
  • Calculate coefficients of variation (CV) for internal standards; CV > 20-30% suggests potential alignment issues requiring investigation.
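
The CV check for internal standards is a one-liner in pandas, sketched below for a standards-by-samples intensity table:

```python
import pandas as pd

def internal_standard_cv(intensities: pd.DataFrame) -> pd.Series:
    """Coefficient of variation (%) per internal standard (rows) across samples
    (columns) after alignment; values above ~20-30% warrant investigation."""
    return 100 * intensities.std(axis=1) / intensities.mean(axis=1)
```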

Documentation and Reporting:

  • Record all software tools, versions, and key parameters used.
  • Document QC metrics including number of features detected, percentage aligned, and FDR estimates.

Workflow: Sample Preparation → Inject QC Samples Throughout Sequence → Feature Detection (XICFinder, OpenMS, MZmine) → Retention Time Alignment (e.g., DeepRTAlign) → Apply Built-in QC Measures (Precision/Recall, FDR) → Alignment Quality Assessment. If quality metrics are acceptable, proceed to Documentation & Reporting and output the quality-aligned data; if unacceptable, investigate, re-optimize, and return to feature detection.

QC Workflow for RT Alignment

FDR Estimation Methods in Practice

Target-Decoy Strategy for FDR Estimation

The target-decoy approach has become the gold standard for FDR estimation in proteomics and metabolomics [55] [56]. The fundamental principle involves searching spectra against a concatenated database containing real (target) and artificial (decoy) sequences, with the assumption that false identifications are equally likely to match target or decoy sequences [56].

The standard FDR calculation is: FDR = (2 × Number of Decoy Hits) / (Total Number of Hits) [56]. Advanced implementations, such as the "picked" protein FDR approach, treat target and decoy sequences of the same protein as a pair rather than individual entities, choosing either the target or decoy based on which receives the highest score [55]. This method eliminates conceptual issues in the classic protein FDR approach that cause overprediction of false-positive protein identification in large data sets [55].
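
Expressed as code, the standard calculation is trivial; the counts in the example are invented for illustration:

```python
def target_decoy_fdr(n_decoy_hits: int, n_total_hits: int) -> float:
    """Classic concatenated target-decoy estimate: FDR = 2 * decoys / total [56]."""
    return 2 * n_decoy_hits / max(n_total_hits, 1)

# Example: 150 decoy hits among 10,000 accepted matches -> estimated FDR of 3%
print(target_decoy_fdr(150, 10_000))
```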

Table 2: Common Target-Decoy Methods and Applications

Method Key Principle Advantages Limitations
Standard Target-Decoy Concatenated target/decoy database search Simple implementation; widely understood Requires careful decoy generation; assumptions of equal size
Decoy Fusion Target and decoy sequences fused for each protein Maintains equal target/decoy size in multi-round searches; avoids uneven bonus scoring More complex database preparation
Picked FDR Target/decoy pairs chosen by highest score More accurate for protein-level FDR; stable for large datasets Primarily applied at protein level rather than PSM level

Important Considerations and Limitations of FDR Methods

While FDR methods are powerful, they have important limitations that researchers must recognize:

  • Dependency Structure: In datasets with strongly correlated features, FDR correction methods like Benjamini-Hochberg (BH) can counter-intuitively report very high numbers of false positives, even when all null hypotheses are true [51]. This is particularly problematic in metabolomics data where high degrees of dependency are common [51].

  • Low-dimensional Settings: FDR methods, particularly those estimating the proportion of true null hypotheses (π₀), perform poorly when the number of tested hypotheses is small [53]. In such cases, FWER methods like Bonferroni may be more appropriate despite being more conservative [53].

  • Common Misapplications: Several practices invalidate target-decoy FDR estimation, including: (1) using multi-round search approaches that create unequal target/decoy sizes; (2) incorporating protein-level information into peptide scoring without appropriate adjustments; and (3) overfitting during result re-ranking that eliminates decoy hits but not false target hits [56].

Integrated Workflow for QC and FDR Control

Implementing a robust framework that integrates both QC procedures and appropriate FDR control is essential for generating reliable results in HRMS-based studies. The following workflow represents best practices:

Workflow: Data Acquisition with QC Samples → Data Preprocessing (Feature Detection & RT Alignment) → Quality Control Assessment (Precision/Recall, CVs, Feature Counts). If QC metrics are unacceptable, return to preprocessing; if acceptable, proceed to FDR Estimation (Target-Decoy Approach) → Statistical Analysis with Appropriate Multiple Testing Correction → Biological Interpretation & Validation.

Integrated QC-FDR Workflow

The Scientist's Toolkit: Essential Research Reagents and Computational Solutions

Table 3: Key Research Reagent Solutions for HRMS Data Preprocessing

Tool/Category Specific Examples Primary Function Application Notes
RT Alignment Tools DeepRTAlign, XCMS, MZmine 2, OpenMS Correct retention time shifts between runs DeepRTAlign handles both monotonic and non-monotonic shifts using deep learning [3]
Feature Detection XICFinder, Dinosaur, MS-DIAL Extract peptide/compound features from raw MS data Optimize parameters for specific instrument platforms [3] [4]
FDR Estimation Target-decoy, Picked FDR, Decoy Fusion Estimate false discovery rates for identifications Decoy fusion method avoids common pitfalls of standard target-decoy [55] [56]
Quality Metrics Precision, Recall, CV, Feature Counts Assess data quality throughout pipeline Implement automated quality monitoring with predefined thresholds [54]
Benchmark Datasets PARC QA/QC provisions, Public repositories Validate preprocessing pipelines Use to optimize parameters and compare software performance [4]

Robust quality control and appropriate false discovery rate estimation are not optional components but fundamental requirements for generating reliable results in HRMS-based omics studies. As retention time alignment algorithms become more sophisticated, integrating comprehensive QC procedures and validated FDR estimation methods throughout the data preprocessing pipeline ensures that technical artifacts do not obscure genuine biological signals. The protocols and guidelines presented here provide a structured approach to maintaining data integrity from raw data acquisition through to biological interpretation, with particular emphasis on the challenges specific to retention time correction in large cohort studies. By adopting these best practices, researchers can significantly enhance the reproducibility and reliability of their findings in chemical exposure assessment, biomarker discovery, and other applications of HRMS-based technologies.

Integrating Multiple Ion Species and Adducts for Comprehensive Coverage

In high-resolution mass spectrometry (HRMS)-based proteomic and metabolomic studies, a single analyte can generate a multitude of ions, including adducts, isotopes, and fragments, during the electrospray ionization (ESI) process [57] [58]. This diversity, while rich in information, presents a significant challenge for accurate compound quantification and alignment across multiple samples. Traditional methods that select a single ion species for quantification are often inadequate, as the relative abundance of different ion types can vary considerably with instrumental conditions, such as the type of electrospray source and temperature [58]. Failure to integrate information from these multiple ion species can lead to incomplete feature detection, misalignment, and ultimately, quantitation errors, thereby reducing the coverage and accuracy of an experiment [57] [58] [48].

This article details protocols for the comprehensive annotation and integration of multiple ion species to enhance coverage in LC-HRMS data preprocessing, with a specific focus on improving retention time (RT) alignment for large cohort studies. By correctly grouping all ions derived from the same metabolite or peptide, researchers can represent a compound by its monoisotopic mass, which provides a more stable and accurate basis for matching corresponding features across different runs, even in the presence of complex RT shifts [57] [3].

Key Concepts and Definitions

Ion Annotation is the computational procedure for recognizing groups of ions that originate from the same underlying compound [57]. In LC-MS based omics, one analyte is frequently represented by several peak features with distinct m/z values but similar retention times. The primary types of ions include:

  • Adducts: Ions formed by the interaction of the analyte with other molecules or ions present in the solvent or system (e.g., [M+H]⁺, [M+Na]⁺, [M+NH₄]⁺ in positive mode, or [M-H]⁻ in negative mode) [57].
  • Isotopes: Ions representing the same molecule but containing heavier natural isotopes (e.g., ¹³C, ²H, ¹⁵N), leading to characteristic mass differences and intensity patterns in the mass spectrum [57].
  • Multiply Charged Ions: Large molecules, such as palytoxin analogues, can acquire multiple charges during ionization (e.g., [M+2H]²⁺, [M+3H]³⁺), which appear at lower m/z values [58].
  • In-Source Fragments: Ions resulting from the fragmentation of the parent molecule within the ion source before mass analysis [57].

Retention Time Alignment is a critical preprocessing step that corrects for retention time shifts of the same analyte across different LC-MS runs. These shifts can be monotonic (consistently increasing or decreasing over the run) or non-monotonic (variable and complex), caused by factors like column aging, sample matrix effects, and gradient inconsistencies [10] [3]. Accurate alignment is a prerequisite for correct correspondence, which is the process of finding the same compound across multiple samples [3].

Protocol: Ion Annotation-Assisted Workflow for Enhanced Coverage

This protocol describes a method to determine overlapping ions across multiple experiments by leveraging ion annotation, thereby providing better coverage and more accurate metabolite identification compared to traditional methods [57].

Research Reagent Solutions and Materials

Table 1: Essential Research Reagents and Software Tools

Item Name Function/Description Example Sources/Platforms
LC-HRMS System Separates and detects ions from complex mixtures. UHPLC coupled to high-resolution mass spectrometer (e.g., Orbitrap) [58].
Data Preprocessing Software Detects peaks, performs initial RT alignment, and normalizes data. XCMS, MZmine 2, MetaboAnalyst [57].
Ion Annotation Tools Groups isotopes, adducts, and fragments into ion clusters. Built-in modules in XCMS, MZmine 2, or SIRIUS [57].
Statistical Analysis Software Identifies significant differences in ion intensities between sample groups. R, Python with appropriate packages (e.g., for t-test, ANOVA) [57].
Metabolite Databases Used for mass-based search and putative identification. Human Metabolome Database (HMDB), Metlin, LipidMaps [57].
Experimental Workflow and Methodology

The following diagram illustrates the logical workflow of the ion annotation-assisted method for analyzing ions from multiple LC-MS experiments.

Workflow: LC-MS Data from Multiple Experiments → Peak Detection & Feature Extraction → Ion Annotation (group isotopes, adducts, fragments) → Representation of each ion cluster by its monoisotopic mass → Cross-Experiment Comparison using monoisotopic mass → Final list of overlapping ions.

Figure 1: Workflow for ion annotation-assisted analysis across multiple experiments. This process improves the accuracy of identifying overlapping metabolites by using monoisotopic mass for comparison instead of individual ion masses.

Step-by-Step Procedure:

  • LC-MS Data Preprocessing [57]

    • Input: Raw LC-MS data files from multiple experiments.
    • Peak Detection: Use tools like XCMS [57] or MZmine 2 [57] [3] to convert raw data into a list of ion features, each characterized by its mass-to-charge ratio (m/z), retention time (RT), and intensity.
    • Retention Time Alignment: Perform an initial rough alignment of features across samples using algorithms within the preprocessing software (e.g., correlation optimized warping) [10] [57].
    • Normalization: Correct for systematic biases across samples induced during preparation and data acquisition.
  • Ion Annotation [57]

    • For each sample, process the feature list to recognize and group ions that are likely derived from the same metabolite.
    • Deisotoping: Cluster isotopic ions that correspond to the same compound. Tools like MZmine 2 or Decon2LS can be used, which analyze isotopic distributions [57].
    • Adduct and Fragment Identification: Identify peaks that represent common adducts or in-source fragments associated with a parent ion. This is typically based on known m/z differences and correlated retention times.
  • Representation by Monoisotopic Mass

    • Following annotation, each group of ions (cluster) is represented by the monoisotopic mass of the compound [57].
    • This step collapses multiple feature entries (e.g., one for [M+H]⁺, one for [M+Na]⁺, and one for the ¹³C isotope of [M+H]⁺) into a single, more robust data point for cross-sample comparison.
  • Determining Overlapping Ions Across Experiments

    • Compare the processed ion lists from each experiment using the monoisotopic mass as the key identifier [57].
    • A mass tolerance window (e.g., ±10 ppm) is applied to match features across different runs.
    • This method allows for the correct identification of overlaps even if different derivative ions (e.g., an adduct in one run and an isotope in another) were selected as significant in different experiments.
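The matching logic reduces to a tolerance comparison on neutral monoisotopic masses. The Python sketch below illustrates this with hypothetical mass lists and a ±10 ppm window; the function names and the simple linear scan are illustrative assumptions, not taken from any published implementation.

```python
# Minimal sketch: matching ion clusters across experiments by
# monoisotopic mass within a ppm tolerance window.
from typing import List, Tuple

def ppm_window(mass: float, tol_ppm: float = 10.0) -> Tuple[float, float]:
    """Return the lower/upper bounds of a +/- tol_ppm window around a mass."""
    delta = mass * tol_ppm * 1e-6
    return mass - delta, mass + delta

def overlapping_masses(exp_a: List[float], exp_b: List[float],
                       tol_ppm: float = 10.0) -> List[Tuple[float, float]]:
    """Pair monoisotopic masses from two experiments that agree within tol_ppm."""
    matches = []
    b_sorted = sorted(exp_b)
    for mass_a in exp_a:
        lo, hi = ppm_window(mass_a, tol_ppm)
        # Linear scan kept simple; a bisect-based search scales better.
        for mass_b in b_sorted:
            if lo <= mass_b <= hi:
                matches.append((mass_a, mass_b))
    return matches

# Example: an [M+H]+ selected in run A and an [M+Na]+ in run B still match
# once both clusters have been collapsed to the neutral monoisotopic mass.
run_a = [180.06339, 255.23295]          # hypothetical neutral monoisotopic masses
run_b = [180.06351, 301.21620]
print(overlapping_masses(run_a, run_b))  # -> [(180.06339, 180.06351)]
```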
Application Note: Quantitative Analysis of Multiply Charged Toxins

The principles of integrating multiple ion species are critically important for the accurate quantification of complex molecules, as demonstrated in the analysis of palytoxin (PLTX) analogues [58].

Challenge: PLTX analogues produce ESI-HRMS spectra with a large number of mono- and multiply charged ions, including adducts with cations (Na⁺, K⁺, Ca²⁺). The profile and relative abundance of these ions can vary with instrument conditions, such as the electrospray source temperature [58]. Relying on a single ion for quantification can lead to significant errors.

Solution: A robust quantitative method was developed that incorporates ions from different multiply charged species to overcome variability in the toxin's mass spectrum profile [58].

Table 2: Key Ions for Palytoxin Analogues Quantification

| Toxin Type | Ion Species Examples | Charge State | Quantitation Relevance |
| --- | --- | --- | --- |
| Palytoxin (PLTX) analogues | [M + 2H–H₂O]²⁺, [M + 2H]²⁺, [M + H + Ca]³⁺, [M + 2H + K]³⁺ | Doubly and triply charged | Using a heated electrospray (HESI) source at 350 °C and integrating signals from multiple charged species provides a more reliable and robust quantitative result than using a single ion [58] |

Protocol: Advanced Retention Time Alignment with DeepRTAlign

For large cohort studies, advanced RT alignment tools are required to handle complex non-monotonic shifts. DeepRTAlign is a deep learning-based tool that combines a pseudo-warping function with a direct-matching neural network to address this challenge [3].

Experimental Workflow and Methodology

The following diagram outlines the two-part workflow of DeepRTAlign for aligning features across multiple LC-MS runs.

[Workflow — Model training: precursor detection & feature extraction (e.g., with XICFinder) → coarse alignment (linear scaling, piecewise RT shift correction) → binning and filtering (grouping features by m/z) → input vector construction (RT, m/z, and their differences) → deep neural network (DNN) training as a classifier for feature–feature pairs. Application to new data: new LC-MS raw files → feature extraction & coarse alignment → application of the trained DNN model for fine-grained alignment → quality control & final aligned feature table]

Figure 2: DeepRTAlign workflow for large cohort LC-MS data analysis. The tool combines coarse alignment with a deep neural network for high-accuracy feature matching.

Step-by-Step Procedure:

  • Feature Extraction and Coarse Alignment [3]

    • Input: Raw MS files from a large cohort.
    • Precursor Detection: Use a feature extraction tool (e.g., XICFinder) to detect isotope patterns and consolidate them into features.
    • Coarse Alignment: Linearly scale the RT of all samples to a common range. Then, divide each sample into small RT pieces (e.g., 1-minute windows) and calculate the average RT shift of features within each piece relative to an anchor sample. Apply this average shift to correct all features in the piece (a minimal sketch of this coarse step is shown after this procedure).
  • Binning and Input Vector Construction [3]

    • Group all features from all samples based on their m/z values within a specified window (e.g., bin width of 0.03 m/z).
    • For each feature in a bin, construct an input vector for the neural network. This vector includes:
      • The original RT and m/z of the target feature and its neighbors.
      • The difference in RT and m/z between features from two different samples.
      • These values are normalized using base vectors before being fed into the network.
  • Deep Neural Network for Feature Matching [3]

    • The DNN in DeepRTAlign is a classifier with three hidden layers (5,000 neurons each).
    • It is trained on hundreds of thousands of feature-feature pairs labeled as either "should be aligned" (positive) or "should not be aligned" (negative), based on ground-truth identification results.
    • The trained model evaluates pairs of features and assigns a probability that they represent the same analyte.
  • Quality Control [3]

    • A QC module calculates the false discovery rate (FDR) of the alignment by creating decoy samples. Features aligned to these decoys are used to estimate the error rate.
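As referenced in the coarse-alignment step above, the following Python sketch illustrates the general idea of linear RT rescaling followed by piecewise mean-shift correction against an anchor sample. It is a simplified illustration, not DeepRTAlign code; the window size, m/z tolerance, and nearest-neighbour matching are placeholder choices.

```python
# Sketch of the coarse-alignment idea only: linearly rescale RTs to the
# anchor's range, then correct each RT piece by the average shift of
# features matched to the anchor by m/z.
import numpy as np

def coarse_align(sample_rt, sample_mz, anchor_rt, anchor_mz,
                 piece_min=1.0, mz_tol=0.01):
    sample_rt = np.asarray(sample_rt, dtype=float)
    # 1. Linear scaling of the sample RT axis onto the anchor RT range.
    scaled = np.interp(sample_rt,
                       (sample_rt.min(), sample_rt.max()),
                       (min(anchor_rt), max(anchor_rt)))
    corrected = scaled.copy()
    # 2. Piecewise shift: per RT window, average the shift of m/z-matched features.
    edges = np.arange(min(anchor_rt), max(anchor_rt) + piece_min, piece_min)
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = np.where((scaled >= lo) & (scaled < hi))[0]
        shifts = []
        for i in idx:
            # nearest anchor feature in m/z, accepted only within tolerance
            d = np.abs(np.asarray(anchor_mz) - sample_mz[i])
            j = int(np.argmin(d))
            if d[j] <= mz_tol:
                shifts.append(anchor_rt[j] - scaled[i])
        if shifts:
            corrected[idx] = scaled[idx] + np.mean(shifts)
    return corrected

# The corrected RTs would then feed the fine-grained (DNN-based) matching step.
anchor_rt = [1.2, 2.8, 4.1];  anchor_mz = [150.05, 220.10, 310.15]
sample_rt = [1.4, 3.1, 4.5];  sample_mz = [150.05, 220.11, 310.16]
print(coarse_align(sample_rt, sample_mz, anchor_rt, anchor_mz))
```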

The integration of multiple ion species and adducts is not merely an optional refinement but a necessary strategy for achieving comprehensive and accurate coverage in HRMS-based omics studies. By systematically annotating all derivative ions and representing a compound by its monoisotopic mass, researchers can significantly improve the reliability of cross-sample comparison and metabolite identification [57]. This approach, when coupled with modern, robust retention time alignment tools such as DeepRTAlign that can handle complex RT shifts in large cohorts [3], provides a powerful framework for maximizing the value of LC-HRMS data. The protocols outlined herein for ion annotation and advanced alignment provide an actionable path for researchers in drug development and biomarker discovery to enhance the rigor and reproducibility of their data preprocessing pipelines.

Benchmarking Alignment Performance: Software Comparisons and Validation Metrics

Liquid Chromatography-High-Resolution Mass Spectrometry (LC-HRMS) has become a cornerstone technique for untargeted analysis in metabolomics, lipidomics, and environmental analytical chemistry. The complex, multi-dimensional data generated by these instruments require sophisticated computational processing to extract biologically meaningful information. The selection of an appropriate data processing workflow significantly influences experimental outcomes, biomarker discovery, and subsequent biological interpretations [26] [59]. This application note provides a detailed comparative analysis of three prominent computational workflows: MZmine 3 (open-source), ROIMCR (chemometric), and Compound Discoverer (commercial).

The challenge of analyzing LC-MS data represents a significant bottleneck in untargeted studies. As noted in recent literature, "the analysis of LC-MS metabolomic datasets appears to be a challenging task in a wide range of disciplines since it demands the highly extensive processing of a vast amount of data" [59]. Different software solutions employ distinct algorithms and philosophical approaches for feature detection, retention time alignment, and data compression, which can lead to varying results even when analyzing identical datasets [26] [32]. Understanding these fundamental differences is crucial for proper method selection and interpretation of results.

Within this context, we frame our comparison within the broader research scope of HRMS data preprocessing, with particular emphasis on retention time correction and alignment methodologies. We evaluate these workflows based on their technical approaches, performance characteristics, and suitability for different research scenarios, providing researchers with practical guidance for selecting and implementing these tools in drug development and other analytical applications.

Workflow Fundamentals and Theoretical Frameworks

Core Architectural Philosophies

The three workflows represent distinct architectural philosophies in LC-HRMS data processing. MZmine 3 employs a feature-based profiling approach, ROIMCR utilizes a component-based resolution strategy, and Compound Discoverer provides an all-in-one commercial solution.

MZmine 3 is an open-source, platform-independent software that supports diverse MS data types including LC-MS, GC-MS, IMS-MS, and MS imaging [60] [61]. Its modular architecture allows for flexible workflow construction and extensive customization. MZmine 3 performs conventional feature detection through sequential steps including mass detection, chromatogram building, deconvolution, alignment, and annotation [62] [32]. A key advantage is its integration with third-party tools like SIRIUS, GNPS, and MetaboAnalyst for downstream analysis [60].

ROIMCR (Regions of Interest Multivariate Curve Resolution) combines data compression through ROI searching with component resolution using Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS) [59] [42]. This approach avoids traditional peak modeling and alignment steps required by most other workflows. Instead, it performs bilinear decomposition of augmented ROI-based matrices to generate resolved "pure" LC profiles, their mass spectral counterparts, and quantification scores [59] [32]. This method is particularly powerful for resolving co-eluting compounds and maintaining spectral accuracy.

Compound Discoverer is a commercial software solution developed by Thermo Scientific, designed as an integrated platform specifically optimized for their instrumentation. It provides predefined workflows for untargeted metabolomics with minimal parameter adjustment required [26]. The software follows a traditional feature detection approach but with proprietary algorithms and limited customization options compared to open-source alternatives.

Data Processing Approaches

Table 1: Fundamental Data Processing Characteristics

| Characteristic | MZmine 3 | ROIMCR | Compound Discoverer |
| --- | --- | --- | --- |
| Primary Approach | Feature profiling | Component resolution | All-in-one solution |
| Data Compression | CentWave algorithm (ROI-based) [59] | ROI strategy with maintained spectral accuracy [59] | Proprietary methods |
| Retention Time Alignment | Join aligner with mass tolerance and RT ambiguity [32] | Not required (MCR-ALS resolution) [59] | Proprietary alignment |
| Peak Modeling | Local minimum resolver [32] | No peak modeling required [59] | Gaussian fitting (likely) |
| Customization Level | High (modular workflows) | Medium (MATLAB implementation) | Low (predetermined workflows) |
| Programming Skills | Intermediate | Advanced (MATLAB) | None required |

[Workflows — MZmine 3: raw LC-HRMS data → mass detection → chromatogram building → deconvolution → retention time alignment → feature table assembly → statistical analysis. ROIMCR: raw data → ROI compression → matrix augmentation → MCR-ALS resolution → component profiles → statistical analysis. Compound Discoverer: raw data → automated feature detection → alignment & annotation → statistical analysis]

Figure 1: Fundamental workflow architectures of the three compared approaches. MZmine 3 employs sequential feature processing, ROIMCR uses multivariate resolution after compression, and Compound Discoverer provides an integrated automated solution.

Experimental Protocols and Implementation

MZmine 3 Processing Protocol

Sample Preparation and Data Acquisition:

  • Analyze samples using LC-HRMS with appropriate chromatographic separation (e.g., 30-minute gradient elution)
  • Convert raw data to open formats (mzML, mzXML) using msConvert for platform independence [32]
  • Include quality control samples (pooled quality controls and solvent blanks) throughout sequence

Data Processing in MZmine 3:

  • Mass Detection: Set noise level to 1,000-10,000 counts depending on instrument sensitivity
  • Chromatogram Building: Use ADAP chromatogram builder with minimum group size of 3 scans, minimum group intensity of 5,000-20,000, and m/z tolerance of 0.005 Da or 5 ppm [32]
  • Deconvolution: Apply local minimum resolver with chromatographic threshold of 90%, minimum retention time range of 0.1-0.2 min, and minimum absolute height of 5,000-10,000
  • Isotopic Peak Grouping: Set m/z tolerance to 0.005 Da and retention time tolerance to 0.1-0.2 min
  • Alignment: Use join aligner with m/z tolerance of 0.005 Da and retention time tolerance of 0.3-0.5 min (depending on chromatographic stability) [32]
  • Gap Filling: Apply peak finder method with intensity tolerance of 20%, m/z tolerance of 0.005 Da, and retention time tolerance of 0.2-0.5 min
  • Annotation: Perform formula prediction, isotopic pattern matching, and compound identification using internal databases or GNPS integration

Validation and Quality Control:

  • Process QC samples to evaluate reproducibility and feature detection stability
  • Monitor internal standards for retention time shifts and intensity deviations
  • Verify feature detection using known compounds spiked into quality controls

ROIMCR Processing Protocol

Data Preparation and ROI Compression:

  • Import centroided mzXML files into MATLAB environment using MSroi GUI app [32]
  • Set ROI intensity threshold comparable to MZmine 3 settings (typically 5,000-10,000 counts)
  • Define mass tolerance parameter (0.005 Da or 5 ppm) and minimum consecutive scans (typically 30) to define valid ROIs [32]
  • Generate column-wise augmented data matrix where rows represent elution times × samples and columns represent ROIs

MCR-ALS Resolution:

  • Data Segmentation: Divide global data matrix into chromatographic regions to reduce complexity
  • Initialization: Apply Singular Value Decomposition (SVD) to determine initial estimates of component profiles
  • Constraints Implementation: Apply non-negativity constraints to both elution and spectral profiles
  • ALS Optimization: Iterate until convergence (typically 50-200 iterations) with convergence criterion of 0.1-1%
  • Model Dimensionality: Determine optimal number of components through PCA evaluation or core consistency diagnosis
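To make the alternating least squares logic concrete, the sketch below shows a stripped-down bilinear decomposition D ≈ CSᵀ with non-negativity imposed by simple clipping. This is an illustration of the ALS principle only; the actual MCR-ALS toolbox uses constrained least squares solvers, additional constraints, and formal lack-of-fit diagnostics.

```python
# Simplified illustration of the MCR-ALS bilinear model D ~= C @ S.T with
# non-negativity enforced by clipping negative values to zero.
import numpy as np

def mcr_als(D, n_components, n_iter=200, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    n_times, _ = D.shape
    C = np.abs(rng.standard_normal((n_times, n_components)))   # elution profiles
    prev_fit = np.inf
    for _ in range(n_iter):
        # Solve for spectra S given concentrations C, then clip to >= 0.
        S = np.clip(np.linalg.lstsq(C, D, rcond=None)[0].T, 0, None)
        # Solve for concentrations C given spectra S, then clip to >= 0.
        C = np.clip(np.linalg.lstsq(S, D.T, rcond=None)[0].T, 0, None)
        residual = D - C @ S.T
        fit = np.linalg.norm(residual) / np.linalg.norm(D)
        if abs(prev_fit - fit) < tol * prev_fit:   # relative change in lack of fit
            break
        prev_fit = fit
    return C, S, fit

# D would be the column-wise augmented ROI matrix
# (rows: elution times x samples; columns: ROIs).
D = np.abs(np.random.default_rng(1).standard_normal((300, 50)))
C, S, lack_of_fit = mcr_als(D, n_components=3)
print(C.shape, S.shape, round(lack_of_fit, 3))
```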

Component Analysis and Identification:

  • Resolve "pure" LC profiles and corresponding mass spectra for each component
  • Use quantification scores (area under resolved LC profiles) for statistical analysis
  • Annotate components by comparing resolved mass spectra with spectral libraries
  • Perform two-way ANOVA and ASCA to evaluate experimental factors and their interactions [63]

Compound Discoverer Protocol

Workflow Selection and Configuration:

  • Select "Untargeted Metabolomics with Statistics" workflow template
  • Use default parameters optimized for Thermo Scientific instruments
  • Define experimental groups and sample relationships

Automated Processing:

  • Peak Detection: Apply proprietary algorithms with minimal user intervention
  • Alignment: Use automated retention time correction based on detected compounds
  • Compound Annotation: Search against online databases (ChemSpider, mzCloud)
  • Statistical Analysis: Perform Welch t-test for two-group comparisons (software limitation prevents multi-group ANOVA) [26]

Results Review and Validation:

  • Manually inspect chromatographic peak quality for significant features
  • Verify compound identifications using fragmentation spectra when available
  • Export results for external statistical analysis if needed

Performance Comparison and Benchmarking

Quantitative Performance Metrics

Table 2: Experimental Performance Comparison Across Workflows

| Performance Metric | MZmine 3 | ROIMCR | Compound Discoverer |
| --- | --- | --- | --- |
| Significant Features (RP+) | 13 [32] | N/A | 5 [26] |
| Significant Features (NP+) | 32 (XCMS/MetaboAnalyst) [26] | 11 shared features [42] | 15 [26] |
| False Positive Rate | Moderate (increased susceptibility) [32] | Low (superior consistency) [32] | Low (conservative detection) [26] |
| Temporal Variance Captured | 20.5–31.8% [32] | 35.5–70.6% [32] | Not reported |
| Treatment Variance Captured | 11.6–22.8% [32] | Lower sensitivity [32] | Not reported |
| Isotope/Adduct Annotation | Comprehensive [64] | Integrated in resolution [59] | Limited [26] |
| Processing Time | Fast (47 min for 8,273 samples) [60] | Moderate (MATLAB dependency) [59] | Fast (optimized commercial) [26] |
| Multi-group Statistics | Full capability [32] | Full capability [63] | Limited to pairwise [26] |

Qualitative Strengths and Limitations

MZmine 3 demonstrates high sensitivity for detecting treatment effects and comprehensive feature annotation capabilities. Its open-source nature and active community development ensure continuous improvement and extensive third-party integrations [60]. However, it shows increased susceptibility to false positives compared to ROIMCR and requires intermediate bioinformatics skills for optimal implementation [32]. The software shows exceptional scalability, processing 8,273 fecal LC-MS² samples in just 47 minutes [60].

ROIMCR provides superior consistency and reproducibility, with enhanced capability for capturing temporal patterns in longitudinal studies [32]. The method excels at resolving co-eluting compounds without requiring traditional peak modeling or alignment steps [59]. However, it has lower sensitivity for detecting treatment effects and requires advanced knowledge of chemometric methods and MATLAB programming [32] [63]. The approach is particularly valuable for complex multi-factor experimental designs where interaction effects are anticipated [63].

Compound Discoverer offers ease of use with minimal programming skills required, making it accessible to researchers with limited computational background [26]. The software provides tight integration with Thermo Scientific instrumentation, potentially optimizing performance on these platforms. However, it demonstrates limited statistical capabilities (particularly for multi-group comparisons), reduced flexibility in parameter adjustment, and less comprehensive annotation of isotopes and adducts compared to open-source alternatives [26].

Case Studies and Applications

Environmental Toxicology Application

A 2025 comparative study analyzed river water samples impacted by treated wastewater effluent using both MZmine 3 and ROIMCR workflows [32]. The research employed a mesocosm experimental design with sampling over a 10-day exposure period. Results demonstrated that both workflows significantly differentiated treatment and temporal effects but exhibited distinct characteristics. MZmine 3 showed increased sensitivity to treatment effects but higher susceptibility to false positives, while ROIMCR provided superior consistency and temporal clarity but lower treatment sensitivity [32].

The study revealed that workflow agreement diminished with more specialized analytical objectives, highlighting the non-holistic capabilities of individual non-target screening workflows and the potential benefits of their complementary use. For environmental applications requiring high reproducibility, ROIMCR demonstrated advantages, while MZmine 3 proved more sensitive for detecting subtle treatment effects [32].

Food Authenticity and Metabolomics

A 2024 study compared ROI-MCR and Compound Discoverer for differentiating Parmigiano Reggiano cheese samples based on mountain quality certification versus conventional protected designation of origin [42]. Both approaches indicated that amino acids, fatty acids, and bacterial activity-related compounds played significant roles in distinguishing between the two sample types. The study concluded that while both methods yielded similar overall conclusions, ROI-MCR provided a more streamlined and manageable dataset, facilitating easier interpretation of the metabolic differences [42].

This application demonstrates the utility of both workflows for food authentication studies, with ROI-MCR offering advantages in data compression and management for complex sample matrices.

Lipidomics and Temporal Response Studies

Research on the disruptive effects of tributyltin (TBT) on Daphnia magna lipidomics demonstrated ROIMCR's capability for analyzing multi-factor experimental designs over time [63]. The approach successfully identified 87 lipids, with some proposed as biomarkers for the effects of TBT exposure and time. The study highlighted ROIMCR's strength in modeling the interaction between experimental factors (time and dose) and confirmed a reproducible multiplicative effect between these factors [63].

This case study illustrates how the component-based resolution approach of ROIMCR can provide unique insights into complex biological responses to environmental stressors, particularly when temporal dynamics and multiple experimental factors are involved.

Table 3: Key Research Reagents and Computational Resources

| Resource Category | Specific Tools/Reagents | Function/Purpose |
| --- | --- | --- |
| Software Platforms | MZmine 3 (mzmine.org) [64] | Open-source MS data processing |
| Software Platforms | MATLAB with MCR-ALS toolbox [59] | ROIMCR implementation |
| Software Platforms | Compound Discoverer (Thermo Scientific) [26] | Commercial all-in-one solution |
| Data Conversion Tools | msConvert [32] | Raw file conversion to open formats |
| Data Conversion Tools | MSroi GUI app [32] | ROI compression for MATLAB |
| Annotation Resources | SIRIUS suite [60] | In-silico metabolite annotation |
| Annotation Resources | GNPS platform [60] | Molecular networking and library matching |
| Annotation Resources | MetaboAnalyst [60] | Statistical analysis and visualization |
| Reference Materials | Internal standard mixtures [32] | Quality control and retention time monitoring |
| Reference Materials | Chemical standards [32] | Method optimization and validation |
| Computational Infrastructure | High-performance workstations | Data processing and visualization |
| Computational Infrastructure | MATLAB licensing [59] | ROIMCR implementation |

The comparative analysis of MZmine 3, ROIMCR, and Compound Discoverer reveals distinctive strengths and optimal application domains for each workflow. The selection of an appropriate data processing strategy should be guided by specific research objectives, computational resources, and technical expertise.

MZmine 3 is recommended for large-scale studies requiring comprehensive feature annotation, high sensitivity, and integration with diverse downstream analysis tools. Its scalability and active community support make it suitable for high-throughput applications in drug development and clinical metabolomics. The software's balance of performance and accessibility provides an excellent option for research groups with intermediate bioinformatics capabilities.

ROIMCR excels in studies prioritizing reproducibility, temporal dynamics analysis, and resolution of complex metabolite mixtures. Its component-based approach is particularly valuable for multi-factor experimental designs and when analyzing samples with significant co-elution. The methodology requires advanced chemometrics expertise but offers unique advantages for modeling complex biological responses to environmental exposures or pharmaceutical interventions.

Compound Discoverer provides an optimal solution for researchers seeking a streamlined, commercially supported workflow with minimal computational expertise requirements. Its ease of use and instrument integration make it valuable for routine analyses and quality control applications. However, its limited statistical capabilities and reduced flexibility may constrain more advanced research applications.

For comprehensive untargeted analysis, a complementary approach utilizing multiple workflows may provide the most robust results, particularly for novel biomarker discovery or complex sample analysis. Future developments in HRMS data preprocessing will likely focus on improved integration of feature-based and component-based approaches, enhanced retention time prediction models, and more efficient data compression strategies to handle increasingly complex datasets generated by modern instrumentation.

In high-resolution mass spectrometry (HRMS), the data preprocessing steps of retention time correction and alignment are critical for ensuring data quality and reliability in downstream analyses. The choice of preprocessing workflow directly impacts key performance metrics, including sensitivity, reproducibility, and false positive rates. Variations in data processing algorithms can lead to significantly different biological or environmental interpretations, making systematic performance evaluation essential [38]. This application note provides detailed protocols for evaluating preprocessing workflows and summarizes quantitative performance data from recent studies to guide researchers in selecting and optimizing HRMS data processing strategies.

Performance Comparison of Preprocessing Workflows

Workflow Characteristics and Performance Trade-offs

Different HRMS data preprocessing approaches exhibit distinct strengths and limitations. Feature profiling (FP) methods, such as MZmine3, and component profile (CP) approaches, such as Regions of Interest Multivariate Curve Resolution-Alternating Least Squares (ROIMCR), represent two fundamentally different strategies with characteristic performance trade-offs [38].

Table 1: Performance Characteristics of FP versus CP Preprocessing Workflows

| Performance Metric | MZmine3 (FP-based) | ROIMCR (CP-based) |
| --- | --- | --- |
| Treatment Effect Sensitivity | Increased sensitivity (11.6–22.8% variance explained) | Lower treatment sensitivity |
| Temporal Effect Clarity | Moderate (20.5–31.8% variance explained) | Superior clarity (35.5–70.6% variance explained) |
| False Positive Rate | Increased susceptibility to false positives | Reduced false positives |
| Consistency & Reproducibility | Variable between runs | Superior consistency and reproducibility |
| Data Utilization | Feature-based peak detection | Direct decomposition of raw data arrays |
| Workflow Agreement | High for general analysis, diminishes for specialized objectives | High for general analysis, diminishes for specialized objectives |

Mass Spectrometry Acquisition Modes

The data acquisition mode significantly impacts feature detection and identification reproducibility in HRMS analyses. Recent comparative studies have quantified the performance of different acquisition modes for detecting low-abundance metabolites in complex matrices.

Table 2: Performance Comparison of HRMS Acquisition Modes

| Performance Metric | Data-Dependent Acquisition (DDA) | Data-Independent Acquisition (DIA) | AcquireX |
| --- | --- | --- | --- |
| Average Feature Detection | 18% fewer than DIA | 1,036 metabolic features | 37% fewer than DIA |
| Reproducibility (CV) | 17% | 10% | 15% |
| Identification Consistency | 43% overlap between days | 61% overlap between days | 50% overlap between days |
| MS² Spectral Quality | High-quality spectra | Complex deconvolution required | Iterative improvement |
| Low-Abundance Detection | Cut-off at 0.1–0.01 ng/mL | Best detection power at 1–10 ng/mL | Cut-off at 0.1–0.01 ng/mL |

Experimental Protocols

Protocol 1: Technical Replicates for Enhanced Reproducibility

Purpose: To increase repeatability and reduce false positive/negative findings in non-target screening through replicate analysis [65].

Materials:

  • LC-HRMS system
  • Quality control samples
  • Data processing software (e.g., MZmine3, XCMS)
  • Minimum of 3 technical replicates per sample

Procedure:

  • Sample Preparation: Prepare a minimum of three technical replicates for each sample to be analyzed.
  • Data Acquisition: Analyze replicates using consistent LC-HRMS parameters with quality control samples interspersed throughout the batch.
  • Peak Finding: Apply peak detection algorithms to all replicates using appropriate intensity thresholds.
  • Data Alignment: Align peaks across all replicates using retention time and m/z tolerance.
  • Combinatorial Filtering: Apply frequency-based filtering to retain only features detected in at least two-thirds of technical replicates.
  • Blank Subtraction: Subtract features present in blank samples using defined fold-change thresholds.
  • Performance Assessment: Calculate peak recognition rates and false positive/negative rates using spiked standards.

Expected Outcomes: This protocol typically recovers >93% of spiked standards at 100 ng/L while filtering <5% of recognized standards, significantly improving repeatability and data quality [65].
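The frequency-based filtering and blank subtraction steps can be expressed compactly on an aligned feature table. The pandas sketch below assumes a simple table layout with one intensity column per injection; the column names, the two-thirds detection rule, and the three-fold blank threshold are illustrative parameters.

```python
# Sketch of the combinatorial filtering step on an aligned feature table
# with one row per feature and one intensity column per injection.
import pandas as pd

def replicate_filter(table: pd.DataFrame, replicate_cols, blank_cols,
                     min_fraction=2/3, blank_fold=3.0):
    """Keep features seen in >= min_fraction of replicates and well above blanks."""
    detected = (table[replicate_cols] > 0).sum(axis=1)
    freq_ok = detected >= min_fraction * len(replicate_cols)
    sample_mean = table[replicate_cols].mean(axis=1)
    blank_mean = table[blank_cols].mean(axis=1).replace(0, 1.0)  # avoid div by zero
    blank_ok = sample_mean >= blank_fold * blank_mean
    return table[freq_ok & blank_ok]

table = pd.DataFrame({
    "mz": [180.063, 255.233, 429.373],
    "rt": [2.1, 7.4, 11.0],
    "rep1": [5.2e4, 0.0, 8.0e3], "rep2": [4.9e4, 1.1e3, 7.5e3], "rep3": [5.5e4, 0.0, 7.9e3],
    "blank1": [1.0e3, 0.0, 6.0e3], "blank2": [9.0e2, 0.0, 5.5e3],
})
filtered = replicate_filter(table, ["rep1", "rep2", "rep3"], ["blank1", "blank2"])
print(filtered[["mz", "rt"]])   # only the first feature passes both filters
```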

Protocol 2: Comparing FP versus CP Workflows

Purpose: To quantitatively evaluate sensitivity and false positive rate differences between feature profiling and component profiling workflows [38].

Materials:

  • LC-HRMS dataset from controlled experiment
  • MZmine3 software
  • ROIMCR software (MATLAB with MCRALS2.0 toolbox)
  • Multivariate statistical packages (ASCA, PLS-DA)

Procedure:

  • Experimental Design: Utilize samples with known treatment and temporal factors (e.g., wastewater effluent exposure over time).
  • Data Preprocessing: Process identical datasets through both MZmine3 and ROIMCR workflows.
  • Multivariate Analysis: Apply ANOVA Simultaneous Component Analysis (ASCA) to quantify variance contributions from treatment and temporal factors.
  • Discriminant Analysis: Perform Partial Least Squares Discriminant Analysis (PLS-DA) to identify features with high discriminatory power.
  • Performance Quantification: Calculate variance percentages attributed to treatment and temporal effects for each workflow.
  • False Positive Assessment: Compare feature lists against known standards or spiked compounds to estimate false positive rates.

Expected Outcomes: ROIMCR typically explains 35.5-70.6% of variance from temporal effects, while MZmine3 shows more balanced contributions from time (20.5-31.8%) and treatment (11.6-22.8%) with higher false positive susceptibility [38].
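For readers implementing the variance partitioning themselves, the sketch below shows a minimal ASCA-style calculation: effect matrices are built from factor-level means and their sums of squares are expressed as percentages of total variance. It omits interaction and residual terms and is not a replacement for a full ASCA implementation.

```python
# Minimal ASCA-style variance partitioning on a centered feature matrix X
# (samples x features) with categorical treatment and time labels.
import numpy as np

def effect_matrix(X, labels):
    """Replace each row by the mean of its factor level (effect estimate)."""
    E = np.zeros_like(X)
    for level in set(labels):
        idx = [i for i, l in enumerate(labels) if l == level]
        E[idx] = X[idx].mean(axis=0)
    return E

def variance_explained(X, treatment, time):
    Xc = X - X.mean(axis=0)                     # center across all samples
    total = np.sum(Xc ** 2)
    pct = lambda E: 100 * np.sum(E ** 2) / total
    return {"treatment": pct(effect_matrix(Xc, treatment)),
            "time": pct(effect_matrix(Xc, time))}

rng = np.random.default_rng(0)
X = rng.standard_normal((12, 200))              # placeholder feature table
treatment = ["control"] * 6 + ["exposed"] * 6
time = ["d0", "d5", "d10"] * 4
print(variance_explained(X, treatment, time))
```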

Protocol 3: Reproducibility Assessment Using MaRR

Purpose: To assess reproducibility of metabolite features across replicate experiments using the nonparametric MaRR procedure [66].

Materials:

  • LC-HRMS data from replicate experiments
  • R statistical software with marr package
  • High-performance computing resources for large datasets

Procedure:

  • Data Preparation: Process replicate data through standard preprocessing (peak picking, alignment, normalization).
  • Feature Ranking: Rank metabolites by abundance or significance measure separately for each replicate.
  • MaRR Application: Apply the Maximum Rank Reproducibility procedure to identify the change point from reproducible to irreproducible signals.
  • False Discovery Control: Estimate and control the false discovery rate using the MaRR algorithm.
  • Reproducibility Classification: Classify metabolites as reproducible or irreproducible based on the MaRR statistical output.
  • Visualization: Create reproducibility plots showing correlation between replicate ranks.

Expected Outcomes: Technical replicates typically show higher reproducibility than biological replicates. The MaRR procedure effectively controls FDR while identifying reproducible metabolites without parametric assumptions [66].
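The full MaRR procedure is implemented in the Bioconductor marr R package; as a quick sanity check before running it, a simple rank correlation between two replicates can flag gross irreproducibility. The Python snippet below simulates two technical replicates and computes a Spearman correlation. It is not the MaRR algorithm, only a preliminary rank-agreement check.

```python
# Simple rank-agreement check between two simulated technical replicates
# (not the MaRR procedure itself).
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
true_abundance = rng.lognormal(mean=10, sigma=1, size=500)
rep1 = true_abundance * rng.normal(1.0, 0.05, size=500)   # technical noise
rep2 = true_abundance * rng.normal(1.0, 0.05, size=500)

rho, pval = spearmanr(rep1, rep2)
print(f"Spearman rank correlation between replicates: {rho:.3f}")
```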

Workflow Visualization

[Workflow diagram — raw LC-HRMS data → retention time correction → m/z recalibration → peak matching across batches → workflow application (MZmine3, FP-based; ROIMCR, CP-based) → sensitivity assessment → reproducibility analysis → false positive rate calculation → quantitative performance metrics]

HRMS Preprocessing Evaluation Workflow

Technical Replicates Impact Visualization

[Workflow diagram — original sample → technical replicates 1–3 → peak finding → peak alignment across replicates → frequency-based filtering → reduced false positives and false negatives with high recovery rates (>93% at 100 ng/L) → high-quality feature list]

Technical Replicates Quality Impact

The Scientist's Toolkit: Essential Research Reagents and Materials

Table 3: Key Research Reagents and Computational Tools for HRMS Preprocessing Evaluation

| Reagent/Tool | Function | Application Context |
| --- | --- | --- |
| Isotopically Labeled Standards | Quality control and recovery rate calculation | Protocol 1: Spiked at known concentrations to assess detection efficiency [65] |
| MZmine3 Software | Feature profiling-based preprocessing | Protocol 2: FP workflow for comparative performance analysis [38] |
| ROIMCR Software | Component profiling-based preprocessing | Protocol 2: CP workflow using multivariate curve resolution [38] |
| MATLAB with MCRALS2.0 | Computational environment for ROIMCR | Protocol 2: Implementation of multi-way decomposition algorithms [38] |
| marr R Package | Reproducibility assessment using MaRR | Protocol 3: Nonparametric evaluation of replicate consistency [66] |
| Progenesis QI | Data quality metric calculation | General use: retention time drift, missing values, reproducibility measures [67] |
| Cent2Prof Package | Centroid to profile data conversion | Data enhancement: recovers mass peak width information lost during centroiding [68] |
| Quality Control Samples | System performance monitoring | Protocol 1: Interspersed throughout batches to ensure measurement stability [38] |

Lipidomics, the large-scale determination of lipids in biological systems, has become one of the fastest expanding scientific disciplines in biomedical research [69]. As the field continues to advance, self-evaluation within the community is critical, particularly concerning inter-laboratory reproducibility [70] [71]. The translation of mass spectrometry (MS)-based lipidomic technologies to clinical applications faces significant challenges stemming from technical aspects such as dependency on stringent and consistent sampling procedures and reproducibility between different laboratories [72]. Prior interlaboratory studies have revealed substantial variability in lipid measurements when laboratories use non-standardized workflows [70]. This case study examines the sources of variability in lipidomic analyses and evaluates strategies that the community has developed to improve the harmonization of lipidomics data across different laboratories and platforms, with particular emphasis on implications for HRMS data preprocessing retention time correction alignment research.

Quantitative Evidence: Assessing the Reproducibility Problem

Recent large-scale interlaboratory studies provide quantitative evidence of both the challenges and progress in lipidomics reproducibility. A landmark study involving 34 laboratories from 19 countries quantified four clinically relevant ceramide species in the NIST human plasma Standard Reference Material (SRM) 1950 [72]. The results demonstrated that calibration using authentic labelled standards dramatically reduces data variability, achieving intra-laboratory coefficients of variation (CVs) ≤ 4.2% and inter-laboratory CVs < 14% [72]. These values represent the most precise and concordant community-derived absolute concentration values reported to date for these clinically used ceramides.

Earlier interlaboratory comparisons revealed greater variability. The 2017 NIST interlaboratory comparison exercise comprised 31 diverse laboratories, each using different lipidomics workflows [70]. This study identified 1,527 unique lipids measured across all laboratories but could only determine consensus location estimates and associated uncertainties for 339 lipids measured at the sum composition level by five or more participating laboratories [70]. The findings highlighted the critical need for standardized approaches to enable meaningful comparisons across studies and laboratories.

Table 1: Interlaboratory Reproducibility Assessment in Lipidomics Studies

| Study Reference | Number of Laboratories | Sample Material | Key Reproducibility Metrics | Major Findings |
| --- | --- | --- | --- | --- |
| Torta et al., 2024 [72] | 34 | NIST SRM 1950 plasma | Intra-lab CV ≤4.2%; inter-lab CV <14% | Authentic standards dramatically reduce variability |
| Bowden et al., 2017 [70] | 31 | NIST SRM 1950 plasma | Consensus estimates for 339 of 1,527 detected lipids | Highlighted need for standardized workflows |
| Shen et al., 2023 [73] | 5 | Mammalian tissue and biofluid | Common method improved detection of shared features | Harmonized methods improve inter-site reproducibility |

Advanced analytical workflows can achieve high reproducibility even with minimal sample volumes. A recent LC-HRMS workflow for combined lipidomics and metabolomics demonstrated excellent analytical precision using only 10 μL of serum, achieving relative standard deviations of 6% (positive mode) and 5% (negative mode) through internal standard normalization [74]. This workflow identified over 440 lipid species across 23 classes and revealed biologically significant alterations in age-related macular degeneration patients, including a 34-fold increase in a highly unsaturated triglyceride (TG 22:6/22:6/22:6) [74].

Experimental Protocols for Enhanced Reproducibility

Standardized Sample Preparation Protocol

Preanalytical procedures constitute a critical source of variability in lipidomics. The International Lipidomics Society (ILS) and Lipidomics Standards Initiative (LSI) have developed best practice guidelines covering all aspects of the lipidomics workflow [69]. The following protocol represents a consensus approach for reproducible sample preparation:

  • Sample Collection and Storage: Tissues should be immediately frozen in liquid nitrogen, while biofluids like plasma should be either immediately processed or frozen at -80°C. Enzymatic and chemical degradation processes can rapidly alter lipid profiles at room temperature, with particular impact on lysophospholipids, lysophosphatidic acid (LPA), and sphingosine-1-phosphate (S1P) [69].

  • Liquid-Liquid Extraction: The methyl-tert-butyl ether (MTBE) extraction method provides reduced toxicity and improved sample handling compared to traditional chloroform-based methods (Folch and Bligh & Dyer) [74] [69]. The recommended protocol uses methanol/MTBE (1:1, v/v) extraction, which enables simultaneous lipid-metabolite coverage from minimal sample volumes (10 μL serum) [74].

  • Internal Standard Addition: Internal standards should be added prior to extraction for internal control and quantification [69]. Ready-to-use internal standard mixtures normalize analytical precision and improve quality control clustering [74].

  • Quality Control Measures: Include quality control (QC) samples from pooled aliquots of study samples throughout the analytical sequence. Monitor lipid class ratios that reflect potential degradation, such as lyso-phospholipid to phospholipid ratios [69].

LC-HRMS Analysis with Retention Time Alignment

Liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS) has become the gold standard for comprehensive lipidomic analysis [74] [69]. The following protocol details a reproducible analytical workflow:

  • Chromatographic Separation: Utilize reversed-phase C18 columns with water/acetonitrile or water/methanol gradients containing 10 mM ammonium formate or acetate. The equivalent carbon number (ECN) model provides a regular retention behavior framework for validating lipid identifications [75].

  • Mass Spectrometry Parameters: Employ both positive and negative ionization modes with data-dependent acquisition (DDA) or data-independent acquisition (DIA). High-resolution mass analyzers (Orbitrap, TOF) with resolving power >30,000 provide accurate mass measurements for elemental composition determination [74] [69].

  • Retention Time Calibration: Implement indexed retention time (iRT) calibration using a set of endogenous reference lipids that span the LC gradient. This approach standardizes retention times across runs and facilitates prediction of retention times for unidentified features [76]. Studies have demonstrated an average of 2% difference between predicted and observed retention times with proper iRT calibration [76].

  • Ion Mobility Integration: When available, incorporate ion mobility separation to provide collision cross section (CCS) values as an additional molecular descriptor for improved identification confidence [76] [75].
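In its simplest form, iRT calibration is a regression between library iRT values and the observed retention times of reference lipids in the current run. The sketch below uses a hypothetical five-point calibrant set and a linear fit; real calibrations may use more anchors or non-linear models.

```python
# Hedged sketch of indexed retention time (iRT) calibration: fit a linear
# mapping from library iRT values to observed RTs of reference lipids,
# then predict RTs for other lipids in the same run.
import numpy as np

# Hypothetical reference lipids spanning the gradient (library iRT, observed RT in min)
irt_lib   = np.array([0.0, 25.0, 50.0, 75.0, 100.0])
rt_observ = np.array([1.8, 7.9, 14.2, 20.1, 26.3])

slope, intercept = np.polyfit(irt_lib, rt_observ, deg=1)

def predict_rt(irt_value: float) -> float:
    """Predict run-specific RT (min) from a library iRT value."""
    return slope * irt_value + intercept

print(round(predict_rt(62.5), 2))   # expected RT for a lipid with iRT 62.5
residuals = rt_observ - (slope * irt_lib + intercept)
print("max calibration error (min):", round(np.abs(residuals).max(), 2))
```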

Data Processing and Lipid Annotation Protocol

The data processing workflow significantly impacts reproducibility and requires careful implementation:

  • Feature Detection and Alignment: Use software tools (e.g., MzMine, MS-DIAL) with parameters optimized for lipidomic data. Apply retention time correction algorithms to align features across samples [77].

  • Lipid Identification: Employ a multi-parameter identification approach requiring: (1) accurate mass match (typically <5-10 ppm); (2) MS/MS spectral match to reference standards or libraries; (3) retention time consistency with lipid class-specific ECN patterns; and (4) when available, CCS value match to reference databases [77] [75].

  • Molecular Networking: Implement molecular networking through platforms such as GNPS to organize MS/MS spectra based on similarity and facilitate annotation of unknown lipids [77].

  • Quantification and Normalization: Use internal standard-based quantification with class-specific internal standards when available. Apply quality control-based normalization (e.g., QC-RLSC) to correct for instrumental drift [74] [69].
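The QC-based drift correction can be illustrated for a single feature as a LOESS fit of QC intensities against injection order, used to rescale every injection. The sketch below, using the lowess smoother from statsmodels, is a simplified stand-in for QC-RLSC; the span, QC spacing, and median-normalization choice are assumptions.

```python
# Sketch of a QC-based drift correction in the spirit of QC-RLSC: fit a
# LOESS curve to QC intensities of one feature versus injection order and
# rescale all injections by the interpolated drift factor.
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def qc_rlsc_feature(intensity, injection_order, is_qc, frac=0.9):
    intensity = np.asarray(intensity, float)
    order = np.asarray(injection_order, float)
    qc_fit = lowess(intensity[is_qc], order[is_qc], frac=frac, return_sorted=True)
    # Interpolate the QC drift curve at every injection, normalise to the QC median.
    drift = np.interp(order, qc_fit[:, 0], qc_fit[:, 1])
    return intensity * np.median(intensity[is_qc]) / drift

order = np.arange(20)
is_qc = np.array([i % 5 == 0 for i in order])           # QC every 5th injection
signal = 1e5 * (1 - 0.01 * order) + np.random.default_rng(0).normal(0, 1e3, 20)
corrected = qc_rlsc_feature(signal, order, is_qc)
print(corrected.round(0)[:5])
```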

The following workflow diagram illustrates the integrated protocol for reproducible lipidomics:

[Workflow diagram — sample preparation (10 μL serum, MTBE extraction) → LC-HRMS analysis (RP chromatography, HRMS detection) → data preprocessing (feature detection, RT alignment) → lipid identification (accurate mass, MS/MS, RT, CCS) → quantification & QC (internal standards, normalization) → reporting & sharing (Lipidomics Standards Checklist)]

The Scientist's Toolkit: Essential Research Reagents and Materials

Table 2: Essential Research Reagents and Materials for Reproducible Lipidomics

| Item | Function | Application Notes |
| --- | --- | --- |
| NIST SRM 1950 | Reference material for method validation | Commercial frozen human plasma with consensus values for 339 lipids [70] |
| Synthetic Lipid Standards | Internal standards for quantification | Isotopically labelled ceramides, phospholipids; added prior to extraction [72] |
| MTBE Extraction Solvents | Lipid extraction | Reduced toxicity vs. chloroform; compatible with automation [74] [69] |
| iRT Calibrant Lipids | Retention time calibration | Set of 20 endogenous lipids spanning the LC gradient; enables RT prediction [76] |
| Quality Control Materials | Monitoring analytical performance | Pooled study samples, commercial QC materials; interspersed in sequence [69] |
| Chromatographic Columns | Lipid separation | Reversed-phase C18 columns; consistent batch-to-batch performance [69] |
| Data Processing Software | Lipid identification/quantification | Skyline, MzMine, MS-DIAL; open-source options available [77] [76] |

Quality Control Framework and Reporting Standards

The Lipidomics Standards Initiative (LSI) has developed community-wide standards to enhance transparency, comparability, and repeatability of lipidomic studies [69] [78]. The key components of this framework include:

  • Lipidomics Minimal Reporting Checklist: This dynamic checklist condenses key information about lipidomic experiments into common terminology, covering preanalytics, sample preparation, MS analysis, lipid identification, and quantitation [78]. Adoption of this checklist ensures critical methodological details are reported, enabling proper interpretation and potential repurposing of resource data.

  • Standardized Nomenclature: Implement the shorthand nomenclature for lipids that reflects experimental evidence for existence, following the principle: "Report only what is experimentally proven, and clearly state where assumptions were made" [69]. This is particularly important for distinguishing between molecular species identification (e.g., PC 16:0/18:1) versus sum composition annotation (e.g., PC 34:1).

  • Validation Requirements: Correct lipid annotation requires multiple lines of evidence: (1) retention time consistency with ECN model predictions; (2) detection of expected adduct ions based on mobile phase composition; (3) presence of class-specific fragments in MS/MS spectra; and (4) when applicable, matching CCS values to reference standards [75]. Automated software annotations should be manually verified for a subset of features to ensure validity [75].

The following diagram illustrates the relationship between various quality control components in establishing reproducible lipidomics data:

[Diagram — quality control framework: preanalytical controls (standardized collection, extraction) → analytical controls (reference materials, internal standards) → data processing controls (RT alignment, batch correction) → identification controls (multi-parameter validation) → reporting controls (lipidomics checklist, nomenclature), with community efforts (ring trials, reference materials) feeding into every stage]

Implications for HRMS Data Preprocessing Research

The findings from interlaboratory reproducibility studies have significant implications for HRMS data preprocessing retention time correction alignment research:

  • Retention Time Prediction Models: The demonstrated regular retention behavior of lipids according to the ECN model provides a powerful constraint for retention time prediction algorithms [75]. Advanced models that incorporate both molecular structure descriptors and chromatographic parameters show promise for improving identification confidence and detecting erroneous annotations [75].

  • Multi-dimensional Alignment Strategies: The integration of multiple separation dimensions (retention time, ion mobility, m/z) creates opportunities for more robust alignment strategies in HRMS data preprocessing [76]. The use of CCS values as stable molecular descriptors can complement retention time alignment, particularly for compensating for chromatographic shifts in large batch sequences.

  • Error Detection in Automated Annotations: The documented patterns of questionable annotations in published datasets provide valuable training data for developing error-detection algorithms in preprocessing pipelines [75]. Rule-based systems can flag features that violate expected chromatographic behavior or adduct formation patterns.

  • Standardized Data Exchange Formats: Community efforts toward harmonization highlight the need for standardized data formats that capture not only intensity values but also quality metrics, processing parameters, and evidence trails for lipid identifications [78]. This facilitates the repurposing of resource data and comparative analyses across studies.

Inter-laboratory reproducibility in lipidomics has significantly improved through community-wide efforts to establish standardized protocols, reference materials, and reporting standards. The key advancements include the adoption of harmonized sample preparation methods, implementation of multi-parameter lipid identification requiring retention time consistency with physicochemical models, and the use of authentic standards for quantification. For HRMS data preprocessing research, these developments highlight the critical importance of retention time correction and alignment that respects the fundamental chromatographic behavior of lipid classes. Continued community efforts through organizations such as the International Lipidomics Society and Lipidomics Standards Initiative provide the framework for ongoing improvement in lipidomics reproducibility, ultimately supporting the translation of lipidomic technologies to clinical applications.

Assessing Impact on Downstream Statistical and Multivariate Analysis

In liquid chromatography-high resolution mass spectrometry (LC-HRMS), the data preprocessing steps of retention time (Rt) correction and alignment are critical for the integrity of downstream statistical and multivariate analyses [20] [4]. Technical variations during instrument operation introduce shifts and drifts in both retention time and mass-to-charge ratio (m/z) dimensions [79]. These inconsistencies, if uncorrected, propagate through the data processing workflow, compromising the accuracy of the resulting feature table and leading to erroneous biological interpretations [4]. This application note examines how the technical precision of Rt alignment protocols directly influences the reliability of subsequent data analysis within the broader context of HRMS data preprocessing research.

Quantitative Impact of Rt Alignment on Data Quality

The performance of Rt alignment algorithms directly determines the quality of the feature table, which is the foundation for all subsequent statistical analysis. Inconsistent feature matching across samples creates artifactual variance that can obscure true biological signals.

Performance Metrics for Alignment Quality

The following metrics are essential for evaluating how Rt alignment impacts data quality:

  • Feature Consistency: Measures the number of features reliably detected and matched across multiple sample runs. Poor alignment increases missing values and reduces dataset completeness [4].
  • Peak Area Precision: Quantifies the relative standard deviation (RSD) of peak areas for replicated samples. Effective alignment and batch correction reduce technical variance, lowering RSD values and improving quantification accuracy [80].
  • Multivariate Model Quality: Assessed via the clustering of quality control (QC) samples in Principal Component Analysis (PCA) scores plots. Tight QC clusters indicate low technical noise, a direct result of successful alignment [80].
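Two of these metrics are straightforward to compute directly from an aligned feature table, as in the sketch below; the dictionary-based table layout and the specific thresholds are illustrative simplifications.

```python
# Sketch of two quick alignment-quality metrics on an aligned feature table:
# (1) per-feature RSD across QC injections, (2) overall fraction of missing values.
import numpy as np

def qc_metrics(feature_table, qc_columns):
    """feature_table: dict {feature_id: {injection_id: intensity or None}}."""
    rsds, missing, total_cells = [], 0, 0
    for feature, intensities in feature_table.items():
        qc_vals = [intensities.get(c) for c in qc_columns]
        total_cells += len(qc_vals)
        missing += sum(v is None for v in qc_vals)
        observed = np.array([v for v in qc_vals if v is not None], float)
        if observed.size >= 2 and observed.mean() > 0:
            rsds.append(100 * observed.std(ddof=1) / observed.mean())
    return {"median_qc_rsd_percent": float(np.median(rsds)) if rsds else None,
            "missing_fraction": missing / total_cells if total_cells else None}

table = {
    "F001": {"QC1": 1.02e5, "QC2": 0.98e5, "QC3": 1.05e5},
    "F002": {"QC1": 3.1e3,  "QC2": None,   "QC3": 2.7e3},
}
print(qc_metrics(table, ["QC1", "QC2", "QC3"]))
```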
Comparative Performance of Alignment Approaches

Table 1: Impact of Different Rt Correction Methods on Downstream Data Quality

| Correction Method | Principle | Impact on Feature Detection | Effect on Multivariate Analysis |
| --- | --- | --- | --- |
| Constant Shift [79] | Applies a uniform Rt shift across the entire chromatogram | Limited effectiveness for non-linear drift; higher false negatives | Introduces artifacts in regions of non-linear drift, reducing model clarity |
| Linear Warping [79] | Applies a constant change in shift over time | Improves alignment over constant shift but may not capture complex patterns | Moderate improvement in sample clustering in PCA |
| Non-Linear Warping (e.g., COW, DTW) [10] | Uses complex functions (polynomials, splines) for local stretching/compression | Maximizes true positive feature matching; minimizes missing values | Leads to tight QC clustering and clear biological group separation in PCA [79] |
| QC-Based Batch Correction [80] | Uses quality control samples to model and correct systematic drift | Can be highly effective but relies on QC consistency; risk of over-fitting | Can significantly improve replicate similarity and multivariate model performance |
| Background Correction (non-QC) [80] | Uses all experimental samples to estimate variation | Avoids issues related to QC/sample response differences; can be more robust | Proven to reduce replicate differences and reveal hidden biological variations |

The choice of algorithm has a direct and measurable effect. For instance, non-linear warping methods, while potentially more computationally intensive, generally yield superior results by accurately modeling complex retention time shifts [79] [10]. Furthermore, the method of batch correction is pivotal. While QC-based methods are widespread, non-QC "background correction" methods that utilize all experimental samples have demonstrated potential to uncover biological differences previously masked by instrumental variation [80].
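To illustrate what non-linear warping does, the following toy dynamic time warping (DTW) example aligns two simulated chromatographic traces with shifted peaks and returns the index-mapping path. Production aligners use constrained and optimized variants (e.g., COW or parametric warping functions) rather than this unconstrained implementation.

```python
# Toy dynamic time warping (DTW) between two total-ion-current traces to
# show how non-linear warping maps scan indices of a target run onto a
# reference run.
import numpy as np

def dtw_path(reference, target):
    n, m = len(reference), len(target)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(reference[i - 1] - target[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Trace back the optimal alignment path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

t = np.linspace(0, 10, 60)
reference = np.exp(-(t - 4.0) ** 2) + np.exp(-(t - 7.0) ** 2 / 0.5)
target = np.exp(-(t - 4.6) ** 2) + np.exp(-(t - 7.4) ** 2 / 0.5)   # shifted peaks
print(dtw_path(reference, target)[:5])   # index pairs (reference_scan, target_scan)
```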

Essential Software Tools and Their Alignment Protocols

Several software tools are available for LC-HRMS data preprocessing, each implementing distinct algorithms for Rt correction and alignment. The choice of software and its correct parameterization is a critical determinant for downstream analysis success.

Comparison of LC-HRMS Data Processing Software

Table 2: Key Software Tools for LC-HRMS Data Preprocessing and Rt Alignment

| Software Tool | Rt Alignment Methodology | Key Strengths | Considerations for Downstream Analysis |
| --- | --- | --- | --- |
| XCMS [20] [4] | Non-linear, warping-based | High flexibility; widely used and cited; active community | Parameter optimization is crucial to avoid false positives/negatives [4] |
| MS-DIAL [20] [4] | Integrated with deconvolution and identification | Streamlined workflow; high identification confidence | May be less flexible for non-standard datasets |
| MZmine [20] [4] | Modular, with multiple algorithm options | High customizability; supports advanced workflows | Steeper learning curve due to extensive options |
| MetMatch [79] | Efficient non-linear alignment with ion accounting | Accounts for different ion species; intuitive interface | Particularly useful for cross-batch or cross-study comparisons |
| IDSL.IPA [81] | Multi-layered untargeted pipeline | Comprehensive from ion pairing to visualization; high-throughput | Provides a complete, integrated solution for large datasets |
| OpenMS [4] | Toolchain with MapAligner | Modular and flexible for building custom workflows | Requires computational expertise for pipeline setup |
Detailed Experimental Protocol: Retention Time Alignment with MetMatch

The following protocol outlines a typical workflow for semi-automated Rt alignment using MetMatch, which efficiently corrects for non-linear shifts and accounts for varying ion species [79].

Principle: A target dataset is aligned to a reference feature list by iteratively determining an m/z offset and a non-linear retention time shift function. The algorithm accounts for the formation of different ion adducts, ensuring comprehensive feature matching.

Materials:

  • Software: MetMatch software (Java-based) [79].
  • Data Files: LC-HRMS data files in open formats (e.g., mzML, mzXML).
  • Reference Feature List: A list containing reference metabolites and their ions (m/z, Rt), which can be generated manually or by processing a representative LC-HRMS chromatogram with tools like XCMS or MZmine2 [79].

Procedure:

  • Import Data: Load the reference feature list and the target LC-HRMS data files into MetMatch.
  • Define Parameters:
    • Ion Adduct List: Specify possible ion adducts ([M+H]+, [M+Na]+, [M-H]-, etc.) for the software to consider during matching.
    • Tolerance Windows: Set initial m/z (e.g., in ppm) and Rt (e.g., in seconds) search frames.
    • Peak Picking: Select a chromatographic peak picking algorithm (e.g., MassSpecWavelet, Gaussian correlation, or Savitzky-Golay filter).
  • m/z Offset Detection: The software will automatically parse signals within the search frames, calculate a weighted average m/z shift, and determine a systematic m/z offset for the target file via histogram analysis. It then corrects all signals by this offset [79].
  • Rt Shift Detection: The software constructs an Rt shift matrix and applies an iterative algorithm to deduce a non-linear Rt shift function. This function maps the retention times in the target file to those in the reference.
  • Validation and Matching: Visually inspect the alignment results. The software will match target peaks to reference features based on the corrected m/z and Rt values.
  • Output: Export the final data matrix containing matched features, their corrected m/z and Rt, chromatographic peak areas, and calculated shifts.
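The systematic m/z offset detection step can be approximated, for illustration, as the median ppm error of target features matched to their nearest reference masses. The sketch below is not MetMatch code; the search window and median estimator are placeholder choices standing in for the weighted-average and histogram analysis performed by the software.

```python
# Sketch of estimating and correcting a systematic m/z offset (in ppm)
# between a target run and a reference feature list.
import numpy as np

def estimate_ppm_offset(target_mz, reference_mz, search_ppm=20.0):
    """Median ppm error of target m/z values matched to the nearest reference m/z."""
    reference_mz = np.sort(np.asarray(reference_mz, float))
    errors = []
    for mz in target_mz:
        j = np.searchsorted(reference_mz, mz)
        candidates = reference_mz[max(j - 1, 0): j + 1]
        nearest = candidates[np.argmin(np.abs(candidates - mz))]
        ppm = (mz - nearest) / nearest * 1e6
        if abs(ppm) <= search_ppm:
            errors.append(ppm)
    return float(np.median(errors)) if errors else 0.0

reference = [150.0584, 220.1049, 310.2017, 430.3801]
target = [m * (1 + 4e-6) for m in reference]        # simulated +4 ppm drift
offset = estimate_ppm_offset(target, reference)
corrected = [m / (1 + offset * 1e-6) for m in target]
print(round(offset, 2), [round(m, 4) for m in corrected])
```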

Downstream Analysis: The output matrix is now suitable for statistical and multivariate analysis (e.g., PCA, ASCA). The reduced technical variance resulting from proper alignment will lead to more robust models and reliable biomarker discovery [42] [79].

Workflow Visualization: From Raw Data to Aligned Features

The following diagram illustrates the logical workflow for LC-HRMS data preprocessing, highlighting the central role of retention time alignment in ensuring data quality for downstream analysis.

Data preprocessing pipeline (diagram summary): Raw LC-HRMS data → data preparation (file conversion) → feature generation (peak picking, deconvolution) → RT & m/z alignment → batch effect correction → missing value imputation → feature table (m/z, RT, intensity) → statistical and multivariate analysis (e.g., PCA, ASCA).

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Materials and Reagents for LC-HRMS Preprocessing Experiments

Item | Function / Purpose
Quality Control (QC) Samples [4] [80] | Pooled samples from the study itself or a mixture of standard analytes; injected at regular intervals to monitor and correct for instrumental drift over the sequence.
Standard Reference Materials | Commercially available metabolite mixes with known retention times; used to create a calibration curve for RT correction and to verify mass accuracy.
Blank Solvent Samples | Samples of the pure mobile phase; used to identify and filter out background ions and contaminants originating from the solvent or system.
Benchmark Datasets [4] | Publicly available LC-HRMS datasets with known features and expected outcomes; used to optimize preprocessing parameters and benchmark software performance.
Software Containers/Virtual Machines [20] | Pre-configured computational environments (e.g., Docker, Singularity); ensure software version and dependency control, enhancing the reproducibility of the preprocessing workflow.

Retention time alignment is not merely a data cleaning step but a foundational process that dictates the validity of all subsequent conclusions drawn from LC-HRMS data. The choice of alignment algorithm and software tool directly influences the completeness of the feature table, the precision of quantification, and the discriminative power of multivariate models. As the field moves towards more complex and large-scale studies, adopting robust, reproducible, and well-documented alignment protocols is paramount. Ensuring the quality of this initial step is the key to unlocking biologically meaningful and statistically sound results in metabolomics, exposomics, and drug development research.

In liquid chromatography–high-resolution mass spectrometry (LC-HRMS) based proteomic and metabolomic experiments, retention time (RT) alignment is a critical preprocessing step, especially for large cohort studies [3]. The retention time of each analyte can shift between samples for multiple reasons, including matrix effects, instrument performance variability, and operational conditions over time [3] [82]. These shifts introduce errors in the correspondence process—the identification of the same compound across multiple samples—which is fundamental to comparative, quantitative, and statistical analysis [3].

The central challenge in RT alignment lies in effectively correcting for both monotonic shifts (consistent drift in one direction) and complex non-monotonic shifts (local, non-linear variations) that occur simultaneously in experimental data [3]. Failure to properly align retention times severely compromises downstream data interpretation, leading to inaccurate compound identification, unreliable quantification, and ultimately, flawed biological conclusions. This application note provides a structured framework for selecting and implementing RT correction strategies based on specific research objectives, data characteristics, and analytical requirements.

Key Alignment Methodologies and Tools

Primary Computational Approaches

Current computational methods for RT alignment fall into two main categories, each with distinct strengths and limitations:

  • Warping Function Methods: These approaches correct RT shifts between runs using a linear or non-linear warping function. Tools like XCMS, MZmine 2, and OpenMS employ this methodology [3]. A significant limitation of traditional warping models is their inherent difficulty in handling non-monotonic RT shifts, because the warping function itself is monotonic [3] [83]; a minimal warping sketch follows this list. They work best for datasets with consistent, predictable drift.

  • Direct Matching Methods: These methods attempt to perform correspondence solely on the basis of the similarity between specific signals from run to run, without constructing a warping function [3]. Representative tools include RTAlign and Peakmatch [3]. While potentially advantageous for complex shifts, existing direct matching tools have often been reported to perform worse than warping function methods because of uncertainties in MS signals [3].

  • Hybrid and Advanced Learning Methods: To overcome the limitations of the above approaches, newer tools combine elements of both methods or leverage machine learning. DeepRTAlign, for instance, integrates a coarse alignment (pseudo warping function) with a deep learning-based direct matching model, enabling it to address both monotonic and non-monotonic shifts effectively [3]. Other advanced methods utilize support vector regression (SVR) and Random Forest algorithms for normalization, particularly in scenarios involving long-term instrumental drift [82].
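To make the monotonicity constraint of warping functions concrete, the sketch below fits a warping function to a handful of hypothetical anchor RT pairs using isotonic regression. It illustrates the general warping idea only and is not the algorithm used by XCMS, MZmine 2, or OpenMS; the anchor values are invented for demonstration.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Hypothetical anchor pairs: RT (minutes) of the same landmark features in a target run
# and in the reference run.
rt_target    = np.array([1.2, 3.5, 6.1, 9.8, 14.2, 18.7, 23.0])
rt_reference = np.array([1.3, 3.9, 6.4, 9.9, 14.9, 19.1, 23.6])

# A warping function is constrained to be monotonic: a later target RT can never map to
# an earlier reference RT.
warp = IsotonicRegression(out_of_bounds="clip").fit(rt_target, rt_reference)

corrected_rt = warp.predict(np.array([2.0, 7.5, 20.0]))  # map any target RT onto the reference axis
```

Because the fitted mapping can never decrease, a feature that locally elutes earlier than its neighbours in one run cannot be repositioned independently, which is exactly the failure mode described above for non-monotonic shifts.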

Essential Research Reagents and Materials

Successful RT alignment and HRMS analysis depend on the use of proper quality control measures and reference standards.

Table 1: Key Research Reagent Solutions for HRMS Data Preprocessing

Reagent/Material | Function in RT Alignment & Quality Control
Pooled Quality Control (QC) Sample | A composite sample from all study samples; analyzed at regular intervals to establish a normalization curve or algorithm for correcting signal drift over time [82].
Internal Standards (IS) | A set of well-characterized compounds used to monitor and correct for RT shifts and signal intensity variations within and between batches [82].
System Suitability Test (SST) Mix | A defined set of reference standards covering a range of chemical properties, analyzed to verify instrument performance and mass accuracy before and after sample batches [5].
Virtual QC Sample | A computational construct incorporating chromatographic peaks from all QC results, serving as a meta-reference for analyzing and normalizing test samples when physical QC composition changes [82].

Decision Framework for Tool Selection

Selecting the optimal RT alignment tool requires a systematic assessment of your data and research goals. The following framework guides this decision-making process.

RT alignment tool selection framework (diagram summary): Begin by assessing the data type and shift complexity. Primarily monotonic shifts → consider warping tools (XCMS, MZmine, OpenMS). Complex non-monotonic shifts → choose a hybrid/ML tool (DeepRTAlign), then weigh computational resources and the correction algorithm needed: SVR for precise local correction, Random Forest for stable correction of long-term drift. Targeted analysis where no alignment is desired → use the ROIMCR approach, which avoids alignment altogether.
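The framework can be summarised as a simple lookup, sketched below. The category names mirror the diagram, and the returned strings are shorthand for the cited tools rather than calls into any software API; the pairing of hybrid alignment with a QC-based correction follows the branch structure of the diagram.

```python
def recommend_rt_strategy(shift_type: str, long_term_drift: bool = False,
                          prefer_local_precision: bool = False) -> str:
    """Map the decision framework above onto a recommendation (illustrative only)."""
    if shift_type == "none_desired":          # targeted / alignment-free workflows
        return "ROIMCR (process data without prior alignment)"
    if shift_type == "monotonic":
        return "Warping tools: XCMS, MZmine, or OpenMS"
    if shift_type == "non_monotonic":
        if long_term_drift:
            return "DeepRTAlign + QC-based Random Forest drift correction"
        if prefer_local_precision:
            return "DeepRTAlign + QC-based SVR correction"
        return "DeepRTAlign (hybrid coarse alignment + deep learning matching)"
    raise ValueError("shift_type must be 'monotonic', 'non_monotonic', or 'none_desired'")

print(recommend_rt_strategy("non_monotonic", long_term_drift=True))
```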

Assess Data Characteristics and Shift Complexity

The first step involves characterizing the nature of the RT shifts in your dataset.

  • For primarily monotonic shifts: If your data, potentially from a short-term, well-controlled experiment, exhibits consistent drift, traditional warping function tools like XCMS, MZmine 3, or OpenMS are appropriate and often show high consistency [3] [83].
  • For complex non-monotonic shifts: If your dataset is from a large cohort or long-term study (e.g., over 155 days) with local, non-linear RT variations, you need more advanced tools. A hybrid tool like DeepRTAlign is specifically designed to handle this complexity by combining a coarse alignment with a deep learning model [3].
  • When alignment is not desired: For specific untargeted metabolomics workflows, methods like Regions of Interest-Multivariate Curve Resolution (ROIMCR) can process data without requiring prior time alignment, simplifying the workflow for certain biological applications [15] [42].

After identifying the tool category, the next step is to consider implementation factors.

  • For adequate computational resources: If your infrastructure supports it, leveraging a tool with a built-in deep learning model (DeepRTAlign) or creating custom correction models is feasible. DeepRTAlign uses a neural network with three hidden layers (5,000 neurons each) for highly accurate feature pairing [3].
  • For long-term, highly variable data: When correcting for instrumental drift over extended periods, Random Forest (RF) algorithms have been shown to provide the most stable and reliable correction model, outperforming spline interpolation and Support Vector Regression (SVR), which can over-fit [82].
  • For precise local correction: If your focus is on accurate, localized RT adjustments and the data variation is not excessively high, SVR-based models can be effectively applied using batch number and injection order as input parameters for normalization [82].

Detailed Experimental Protocols

Protocol 1: DeepRTAlign for Complex Cohort Data

DeepRTAlign provides a robust solution for aligning large-scale proteomic and metabolomic datasets with complex RT shifts [3].

Workflow Overview:

DeepRTAlign analysis workflow (diagram summary): (1) precursor detection and feature extraction (tools: XICFinder, Dinosaur) → (2) coarse alignment (linear RT scaling, piece-wise average shift calculation) → (3) binning and filtering (group features by m/z window) → (4) input vector construction (normalize RT/m/z using base vectors) → (5) deep neural network classification (three hidden layers, 5,000 neurons each) → (6) quality control (decoy sample generation for FDR calculation).

Step-by-Step Methodology:

  • Precursor Detection and Feature Extraction: Use a feature detection tool (e.g., XICFinder, Dinosaur) on raw MS files. Detect isotope patterns in each spectrum and merge subsequent patterns into a feature. A mass tolerance of 10 ppm is recommended [3].
  • Coarse Alignment: Linearly scale the RT in all samples to a common range (e.g., 80 minutes). For each m/z, select the feature with the highest intensity. Divide samples into pieces by a defined RT window (e.g., 1 min). Calculate the average RT shift for features in each piece relative to an anchor sample and apply this shift [3]; a simplified sketch of this step follows the protocol.
  • Binning and Filtering: Group all features based on m/z using parameters bin_width (default 0.03) and bin_precision (default 2). Optionally, for each sample and m/z window, retain only the feature with the highest intensity in a user-defined RT range [3].
  • Input Vector Construction: Construct a 5x8 input vector for the neural network using the RT and m/z of the target feature and two adjacent features before and after it. Normalize the values using predefined base vectors (e.g., [5, 0.03] for difference values, [80, 1500] for original values) [3].
  • Deep Neural Network Processing: Process feature pairs through a DNN classifier with three hidden layers (5,000 neurons each). The model, trained on hundreds of thousands of feature pairs, distinguishes between alignable and non-alignable pairs using BCELoss and Adam optimizer [3].
  • Quality Control: Implement a QC module to calculate the final False Discovery Rate (FDR) by building decoy samples and verifying that features in the decoy are not aligned [3].
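The coarse-alignment step (step 2) can be approximated with the short sketch below. It follows the description in this protocol (linear scaling to a common range, 1-min pieces, average shift against an anchor sample) but simplifies feature selection and is not DeepRTAlign's actual code.

```python
import numpy as np

def scale_rt(rt, target_range=80.0):
    """Linearly scale retention times of one sample onto a common range (e.g., 0-80 min)."""
    rt = np.asarray(rt, dtype=float)
    return (rt - rt.min()) / (rt.max() - rt.min()) * target_range

def coarse_align(sample_rt, sample_mz, anchor_rt, anchor_mz,
                 rt_window=1.0, mz_tol=0.03, target_range=80.0):
    """Shift each RT piece of a sample by the average RT difference of the features it
    shares (within an m/z tolerance) with an anchor sample."""
    s_rt, a_rt = scale_rt(sample_rt, target_range), scale_rt(anchor_rt, target_range)
    s_mz, a_mz = np.asarray(sample_mz, dtype=float), np.asarray(anchor_mz, dtype=float)
    corrected = s_rt.copy()
    for start in np.arange(0.0, target_range, rt_window):
        in_piece = (s_rt >= start) & (s_rt < start + rt_window)
        shifts = []
        for rt, mz in zip(s_rt[in_piece], s_mz[in_piece]):
            close = np.abs(a_mz - mz) <= mz_tol              # anchor features with a matching m/z
            if close.any():
                j = np.argmin(np.abs(a_rt[close] - rt))      # nearest of those in RT
                shifts.append(a_rt[close][j] - rt)
        if shifts:
            corrected[in_piece] += np.mean(shifts)           # apply the piece's average shift
    return corrected
```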

Protocol 2: QC-Based Drift Correction for Long-Term Studies

This protocol is essential for studies involving data acquisition over weeks or months, where significant instrumental drift occurs [82].

Step-by-Step Methodology:

  • QC Sample Preparation and Analysis: Create a pooled QC sample from aliquots of all study samples. Analyze this QC sample repeatedly (e.g., 20 times over 155 days) at regular intervals throughout the data acquisition period [82].
  • Calculate Correction Factors: For each component k in the n QC measurements, calculate a correction factor: y_i,k = X_i,k / X_T,k, where X_i,k is the peak area in the i-th measurement, and X_T,k is the median peak area across all n measurements [82].
  • Model Correction Function: Express the correction factor y_k as a function of the batch number p and injection order number t: y_k = f_k(p, t). Use the calculated {y_i,k} as the target dataset and the corresponding {p_i} and {t_i} as inputs to train a correction model [82].
  • Apply Correction to Samples: For a given sample S with raw peak area x_S,k for component k, calculate the corrected peak area x'_S,k = x_S,k / y_S,k, where y_S,k = f_k(p_S, t_S) is the correction factor predicted by the model for that sample's batch number and injection order [82]; a minimal sketch of this correction follows the protocol.
  • Handle Missing QC Components: For compounds found in samples but absent from the QC, apply normalization using the average correction coefficient from all QC data or use the correction factor from an adjacent chromatographic peak [82].
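The correction factors and model in steps 2–4 can be sketched with scikit-learn as shown below. The CSV file names and column labels are assumptions, and a Random Forest regressor stands in for whatever implementation the cited study used; an SVR can be substituted for the precise-local-correction scenario.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical inputs: one row per injection with batch number, injection order,
# and peak areas of each component (columns named area_<k>).
qc = pd.read_csv("qc_peak_areas.csv")
samples = pd.read_csv("sample_peak_areas.csv")

for k in [c for c in qc.columns if c.startswith("area_")]:
    y = qc[k] / qc[k].median()                               # y_i,k = X_i,k / X_T,k (median as X_T,k)
    f_k = RandomForestRegressor(n_estimators=300, random_state=0)
    f_k.fit(qc[["batch", "injection_order"]], y)             # y_k = f_k(p, t)
    pred = f_k.predict(samples[["batch", "injection_order"]])
    samples[k + "_corrected"] = samples[k] / pred            # x'_S,k = x_S,k / f_k(p_S, t_S)

samples.to_csv("sample_peak_areas_corrected.csv", index=False)
```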

Table 2: Comparison of Alignment and Correction Tools

Tool / Method | Primary Methodology | Best For | Key Strengths | Limitations / Considerations
DeepRTAlign [3] | Hybrid (coarse alignment + DNN) | Large cohorts; complex non-monotonic shifts | Handles both shift types; improved identification sensitivity | Complex setup; requires computational resources
XCMS / MZmine / OpenMS [3] [83] | Warping function | Datasets with primarily monotonic shifts | High consistency; widely used and tested | Poor performance on non-monotonic shifts
ROIMCR [15] [42] | Data compression / multivariate resolution | Untargeted metabolomics; avoiding alignment | Processes +/- mode data simultaneously without alignment | May not be suitable for all quantitative applications
Random Forest (RF) Correction [82] | Machine learning (QC-based) | Long-term drift correction; highly variable data | Most stable and reliable model for long-term data | Requires extensive QC data for training
Support Vector Regression (SVR) [82] | Machine learning (QC-based) | Precise local correction | Effective for modeling complex non-linear relationships | Can over-fit and over-correct highly variable data

Concluding Recommendations

Selecting the right tool for HRMS RT alignment is a critical determinant of data quality and research outcomes. The choice must be guided by the specific nature of the RT shifts (monotonic vs. non-monotonic), the scale and duration of the study, and available computational resources. For modern large-cohort studies exhibiting complex RT behavior, hybrid tools like DeepRTAlign represent a powerful solution. For managing long-term instrumental drift, QC-based correction protocols using Random Forest algorithms offer superior stability. By applying this structured decision framework, researchers can make informed, objective choices that enhance the reliability and reproducibility of their HRMS-based findings.

Conclusion

Retention time correction is a pivotal, non-negotiable step in the HRMS data processing pipeline, directly influencing the validity of downstream biological conclusions. A one-size-fits-all solution does not exist; the choice of alignment strategy—be it traditional warping, direct matching, or innovative deep learning and multi-way component analysis—must be guided by the specific data characteristics and research objectives. As metabolomics and proteomics increasingly move toward large-scale, multi-center cohort studies, robust and automated alignment tools that handle disparate datasets and complex variability will be paramount. Future directions point toward the tighter integration of alignment with feature identification, the development of more intelligent, self-optimizing algorithms, and standardized reporting frameworks. By mastering these alignment techniques, researchers can significantly enhance data quality, ensure cross-study comparability, and confidently uncover the subtle molecular signatures that drive advancements in biomedical and clinical research.

References