MetaClean Automates Peak Quality Assessments

Leverage machine learning to detect poor-quality integrations and save hours of manual peak assessment

Markus Schmitt

Even the best metabolomics pipelines have a degree of variance, which can cause poor peak integration between samples. This reduces your ability to accurately quantify a metabolite, and usually means that you have to manually check the quality of every peak (of interest). 

Kelsey Chetnik, Lauren Petrick, and Gaurav Pandey have developed a new framework and R package called MetaClean. This combines eleven peak quality metrics and eight machine learning algorithms to automatically detect poorly integrated peaks. 

We interviewed Petrick and Pandey about the recent paper in Metabolomics and learned how MetaClean works and how you can apply it to your data.  

Current automatic peak integration is unreliable

Liquid chromatography coupled with mass spectrometry (LC-MS) is a standard tool for identifying and quantifying compounds in metabolomics workflows. In LC-MS, the retention times and peak shapes of identical compounds (metabolites) vary slightly from one measurement to another. Slight fluctuations are normal, but automated integration can still mischaracterize peaks and complicate sample-to-sample comparison.

In many cases, up to 30% of peaks integrated by standard post-processing tools may fail manual inspection. Low-quality peaks can be products of coeluting analytes, faulty alignment, or background noise. As a result, you can’t expect any two samples to have the exact same peaks, even when their makeup and concentrations match. But in order to compare concentrations, you still need to accurately integrate peaks from different samples. 

A set of four peak integrations. The top left integration is done correctly. The top right has integrated a badly shaped peak. The bottom right is integrating two peaks that are not resolved. The bottom left integrates only the first half of the peak. 
Three examples of possible integration problems common to LC-MS.

One way to evaluate peak integration is to review each peak manually and judge its shape and the area included. This isn’t just extremely time-consuming; it’s practically impossible when you’re analyzing thousands of peaks per batch. As a result, manual QC is often done on a small subset of the data, and often only on peaks of interest. While this may remove false positives from your analysis, it can’t address false negatives, and poorly integrated peaks can still slip through.

Machine learning can automate peak evaluation

Individual quality metrics can quantify specific aspects of peak quality. However, by themselves, these metrics are often insufficient to fully differentiate between low- and high-quality integrations.

Machine learning models can find complex and valuable combinations of metrics that can aid this differentiation. All you need are the metrics (your features) and labels that describe whether a peak integration is high or low quality (your target variable). 

With these features and the target variables, you can train models and evaluate how well they predict the peak integration of unseen examples. If you find the models are highly accurate, you can easily apply those models to every single peak in a dataset (containing thousands of peaks). You can then filter out the low-quality peaks before you analyze your data.
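
The workflow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature values are synthetic, the "pass"/"fail" labels and all function names are hypothetical, and the nearest-centroid classifier is a deliberately simple stand-in rather than one of the algorithms MetaClean uses.

```python
import random
from statistics import mean

def train_centroids(features, labels):
    """Return the mean feature vector (centroid) of each class label."""
    centroids = {}
    for label in set(labels):
        rows = [f for f, l in zip(features, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, row):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))

def accuracy(centroids, features, labels):
    """Fraction of rows whose predicted label matches the manual label."""
    hits = sum(predict(centroids, f) == l for f, l in zip(features, labels))
    return hits / len(labels)

# Synthetic stand-in for quality metrics: three made-up metric values per peak.
rng = random.Random(0)

def make_peak(label):
    center = 0.8 if label == "pass" else 0.3
    return [rng.gauss(center, 0.1) for _ in range(3)]

labels = ["pass"] * 60 + ["fail"] * 40          # the target variable
features = [make_peak(l) for l in labels]       # the features

# Hold out 30 labeled peaks to estimate performance on unseen examples.
order = list(range(len(labels)))
rng.shuffle(order)
train_idx, test_idx = order[:70], order[70:]

model = train_centroids([features[i] for i in train_idx],
                        [labels[i] for i in train_idx])
held_out_accuracy = accuracy(model,
                             [features[i] for i in test_idx],
                             [labels[i] for i in test_idx])

# Apply the trained model to new, unlabeled peaks and keep the predicted passes.
new_peaks = [make_peak("pass"), make_peak("fail"), make_peak("pass")]
kept = [p for p in new_peaks if predict(model, p) == "pass"]
```

The important design point is the held-out split: accuracy is measured on peaks the model never saw during training, which is what tells you whether it is safe to apply the model to the rest of the dataset.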

The MetaClean features: Eleven peak quality metrics  

Chetnik, Petrick, and Pandey explored whether a combination of quality metrics could serve as a more effective feature set for a peak quality classifier than individual metrics alone. The team created MetaClean to generate these classifiers and evaluate their performance against several datasets.

The team used a set of four metrics from Zhang et al. (M4 Metrics) and seven metrics from Eshghi et al. (M7 Metrics), as well as the combined M11 set consisting of all the metrics. These metrics were used to quantify overall peak shape and retention-time consistency between samples.

The MetaClean training data: 500 manually reviewed peaks

The team classified 500 pre-processed peaks by hand from 89 blood plasma samples: almost 40% failed manual visual inspection. They ensured a variety of different peak shapes were in the training data so the model would be trained on a representative set of peaks. 

The MetaClean model: The best of eight ML classification algorithms 

The next step was to find the machine learning algorithm that best classifies peak quality. MetaClean evaluates eight of the most commonly used algorithms, each paired with the M4, M7, and M11 metric sets, to develop 24 different peak quality classifiers. The performance of each classifier is assessed with five-fold cross-validation, repeated ten times, and the results are averaged. For this, MetaClean evaluates:

  • Decision Tree;
  • Logistic Regression;
  • Naive Bayes;
  • Neural Network;
  • SVM with linear kernel;
  • AdaBoost;
  • Model Averaged Neural Network;
  • Random Forest.
A process flow chart showing that each peak metric set is evaluated with all eight machine learning algorithms using a five-fold cross-validation setup. The setup is repeated ten times to ensure confidence in the results. 
MetaClean evaluates 24 different potential peak classifiers built from a combination of three peak quality metric sets and eight machine learning algorithms using five-fold cross-validation.
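
To make the bookkeeping concrete, here is a minimal Python sketch of the evaluation grid described above. It is not MetaClean's own code (MetaClean is an R package). It only enumerates the 24 algorithm and metric-set pairs and generates the index splits for five-fold cross-validation repeated ten times. The function name `repeated_kfold`, the plain (unstratified) shuffling, and the seed are assumptions made for illustration.

```python
import random
from itertools import product

METRIC_SETS = ["M4", "M7", "M11"]
ALGORITHMS = [
    "Decision Tree", "Logistic Regression", "Naive Bayes", "Neural Network",
    "SVM with linear kernel", "AdaBoost", "Model Averaged Neural Network",
    "Random Forest",
]

# Every algorithm paired with every metric set: 8 x 3 = 24 candidates.
candidates = list(product(ALGORITHMS, METRIC_SETS))

def repeated_kfold(n_samples, k=5, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Each repeat reshuffles the sample indices and cuts them into k folds;
    every fold serves once as the test set while the rest form the training set.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for f, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != f for i in fold]
            yield r, f, train_idx, test_idx

# 500 labeled peaks -> 10 repeats x 5 folds = 50 train/test splits.
splits = list(repeated_kfold(500))
```

In a full evaluation, each of the 24 candidates would be trained and scored on all 50 splits, and its scores averaged to give one number per candidate.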

After comparing the performance of each model, the team observed that the AdaBoost algorithm using the M11 metric set performed best. They called this model the “Global Peak Quality Classifier,” and it achieved almost 85% classification accuracy on the development dataset.

However, just testing the model on the same batch of samples doesn't provide an accurate picture of how well the model will perform in a real situation. An effective model also needs to perform well cross-batch and cross-platform:

  • Cross-batch: Other samples, from a different batch, measured on the same machine with the same settings.
  • Cross-platform: On samples measured on different machines, with different settings.

To test this, the team evaluated 500 peaks from four more datasets, applying the Global Peak Quality Classifier to:

  • a different dataset from the same instrument to test same-platform performance;
  • three datasets from different platforms to test cross-platform performance.

MetaClean performs well in analyses of data from the same platform

The Global Peak Quality Classifier evaluated a different dataset from the same instrument (Test 1). The classifier categorized peaks accurately in almost 81% of cases, even though this dataset was from a different sample type. 

MetaClean performs reasonably well in cross-platform analyses

The team applied the Global Peak Quality Classifier to the three publicly accessible cross-platform datasets (Tests 2 to 4), each using different MS instruments, LC columns, and/or ionization modes.

Even with these platform and sample matrix variabilities, the classifier correctly categorized between 65% and 80% of peaks. However, the reduced performance on these datasets showed that cross-platform analyses are still quite challenging.

A bar chart showing that the MetaClean Global Peak Quality Classifier achieved 85% accuracy against the development dataset, 81% accuracy against a dataset with a different sample matrix (Test 1), 65% accuracy against a dataset using a different ionization mode (Test 2), 75% accuracy against data from a different instrument in positive mode (Test 3) with a different sample matrix, and 79% accuracy against data from a different instrument in negative mode (Test 4) with a different sample matrix.
The Global Peak Quality Classifier achieved over 65% accuracy against all test sets, including those from different platforms and/or sample matrices. 

Train MetaClean on a per-platform basis

Two of these datasets (Test 3 and Test 4) came from a different lab than the one that generated the development set, so the team used them as an independent case for re-training and evaluating models. They trained two new models, one on the positive mode data and one on the negative mode data from the same platform, and evaluated each against the other dataset. Both models produced strong cross-mode results, which suggests that a model can be optimized for a specific platform and still be applied across ionization modes.

A bar chart showing that the positive mode model achieved 85% accuracy against the negative mode test dataset. The negative mode model achieved 81% accuracy against the positive mode test dataset. 
The model trained on the positive mode data performed very well against the negative mode data, and vice versa, when trained on a per-platform basis.

Comparing MetaClean with RSD filtration

Random analytical errors can be difficult to address, especially in complex samples. Often, they are the product of background noise or sample carryover from a previous run and are not relevant to the question at hand. These peaks are often highly variable, and their integrations are not consistent. 

One way to find these errors is through Relative Standard Deviation (RSD) filtration, in which pooled samples are analyzed multiple times. Random peaks are discarded and common peaks are evaluated for consistency: peaks whose relative standard deviation of peak area falls outside a cutoff are removed.
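
As a rough illustration of RSD filtration, the sketch below computes the relative standard deviation (sample standard deviation divided by the mean, as a percentage) of each peak's area across repeated injections of a pooled sample, then keeps only peaks at or below a cutoff. The peak names, area values, and the 30% cutoff are all made up for the example; the article does not state which cutoff the team used.

```python
from statistics import mean, stdev

def rsd_percent(areas):
    """Relative standard deviation of peak areas: stdev / mean * 100."""
    m = mean(areas)
    if m == 0:
        return float("inf")  # treat an all-zero peak as maximally inconsistent
    return stdev(areas) / m * 100

def rsd_filter(peak_areas, cutoff=30.0):
    """Keep peaks whose RSD across pooled-sample injections is within the cutoff.

    The default cutoff of 30% is an assumption for this example only.
    """
    return {peak: areas for peak, areas in peak_areas.items()
            if rsd_percent(areas) <= cutoff}

# Hypothetical peak areas from four injections of the same pooled sample.
pooled_qc = {
    "peak_A": [1000.0, 1050.0, 980.0, 1020.0],  # consistent integration
    "peak_B": [200.0, 900.0, 50.0, 600.0],      # highly variable integration
}

kept = rsd_filter(pooled_qc)  # only peak_A is within the cutoff
```

Note that this filter looks only at the consistency of the integrated area. A peak that is integrated badly but consistently would still pass.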

Since MetaClean operates independently from RSD filtration, the team thought that first subjecting the data to RSD filtration could serve dual purposes: 

  • Comparing the performance of MetaClean against an established method;
  • Increasing the performance of MetaClean through combination with RSD filtration.

The team found that MetaClean outperformed RSD filtration in all test cases, classifying peak quality more accurately than this widely used method.

While RSD does a great job of ensuring that random analytical error is minimized, MetaClean focuses on removing integration issues that may pass RSD filtration. Since RSD and MetaClean are complementary approaches to data quality filtering, they work very effectively in combination.

Use MetaClean with your data

Training a MetaClean model on your data could be valuable for your workflow. With little up-front time investment, you can analyze all peaks from all the runs in your lab rapidly and greatly reduce the rate of false integrations without dedicating hours to manual peak reviews. 

Kelsey, Lauren, and Gaurav provide an open R package that evaluates your data against the eight ML algorithms and three quality metric sets to develop a model. You can download the package today and train your own model to evaluate your LC-MS integrations.

Be sure to read the MetaClean article published in Metabolomics to learn more about the platform.
