
AutoTuner: Lightning-Fast Automatic Parameter Selection for MS2 Data Analysis

AutoTuner is a new Bioconductor package that quickly and accurately computes preprocessing parameters for your MS2 metabolomics data

by Markus Schmitt and Andrew Patt

You may have found MS2 parameter selection to be a major obstacle in your metabolomics research. Feature-finding algorithms need well-chosen parameters to see past noise and extract useful features. But optimizing these parameters by hand is inefficient, because there are too many possible parameter combinations to test.

AutoTuner is a new tool built to address this issue: it finds suitable parameters much more quickly, without compromising accuracy. Craig McLean and Elizabeth Kujawinski, the creators of AutoTuner, were happy to talk us through the details of how it works.

AutoTuner and another commonly used automatic optimization tool, isotopologue parameter optimization (IPO), find the best parameters by sifting through many candidate combinations for you. However, IPO relies on gradient descent for optimization, which requires repeated runs of the expensive centWave algorithm. This can tie up your computers for days on end. 

Instead, AutoTuner relies on a statistical inference strategy that cuts runtime to minutes in many cases. In this post, we dig into how AutoTuner achieves this speed, show how it can improve your results, and explain how to get started with it.

AutoTuner quickly optimizes parameters for the centWave algorithm

AutoTuner optimizes seven parameters for centWave, the popular peak-picking algorithm used in XCMS and MZmine. This means you can integrate AutoTuner into any workflow that already uses centWave.

Craig and Elizabeth chose to optimize the seven centWave parameters that have the greatest impact on peak selection results:

CentWave parameters optimized by AutoTuner

How AutoTuner finds optimal parameters

At the outset, AutoTuner asks you to identify peaks in the total ion chromatogram (TIC) of the experiment. The TIC represents the summed intensity of all ions detected at each time point in the chromatographic run. AutoTuner helps you out by applying a sliding-window analysis that suggests potential peaks.
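To make the sliding-window idea concrete, here is a minimal sketch. It builds a TIC by summing each scan's intensities, then flags points that rise well above a trailing window using a smoothed z-score detector. This is an illustrative stand-in chosen by us, not AutoTuner's code; the function names and the `lag`, `threshold`, and `influence` settings are hypothetical.

```python
import statistics


def total_ion_chromatogram(scans):
    """Sum the intensities of all ions in each scan, giving one TIC value per time point."""
    return [sum(intensities) for intensities in scans]


def suggest_tic_peaks(tic, lag=5, threshold=3.0, influence=0.5):
    """Smoothed z-score sliding window (hypothetical stand-in): flag a TIC point when it
    exceeds the mean of the previous `lag` baseline points by more than `threshold`
    standard deviations. `influence` (0 to 1) sets how much flagged points feed the baseline."""
    signals = [0] * len(tic)
    filtered = list(tic[:lag])  # baseline values used to build each trailing window
    for i in range(lag, len(tic)):
        window = filtered[i - lag:i]
        mean = statistics.fmean(window)
        sd = statistics.pstdev(window)
        if tic[i] - mean > threshold * sd:
            signals[i] = 1
            # Damp the peak's contribution so it does not inflate the baseline.
            filtered.append(influence * tic[i] + (1 - influence) * filtered[-1])
        else:
            filtered.append(tic[i])
    return signals


def flagged_regions(signals):
    """Collapse runs of consecutive flagged points into (start, end) index pairs."""
    regions, start = [], None
    for i, flag in enumerate(signals):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(signals) - 1))
    return regions
```

For a toy TIC such as `[10, 11, 10, 9, 10, 11, 10, 50, 80, 40, 10, 11, 10]`, `flagged_regions(suggest_tic_peaks(tic))` returns `[(7, 8)]`, a candidate peak region you would then confirm or adjust by hand.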

From these TIC peaks, AutoTuner automatically estimates the group difference parameter: the maximum retention-time difference between pairs of TIC peaks that belong to the same feature but come from different samples.
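The group difference rule can be sketched as follows, assuming the TIC peaks have already been matched across samples so that the k-th retention time in each sample's list refers to the same feature. The data layout and function name are hypothetical.

```python
from itertools import combinations


def estimate_group_difference(peak_rts):
    """Largest retention-time gap between matched TIC peaks in different samples.

    `peak_rts` maps a sample name to the retention times (seconds) of its TIC
    peaks; position k in every list is assumed to be the same feature."""
    lengths = {len(rts) for rts in peak_rts.values()}
    if len(lengths) != 1:
        raise ValueError("every sample needs the same number of matched peaks")
    n_features = lengths.pop()
    max_diff = 0.0
    for k in range(n_features):
        for a, b in combinations(peak_rts, 2):  # every pair of samples
            max_diff = max(max_diff, abs(peak_rts[a][k] - peak_rts[b][k]))
    return max_diff
```

With `{"s1": [60.0, 300.0], "s2": [62.5, 297.0], "s3": [61.0, 301.0]}`, the second feature drifts most (297.0 s versus 301.0 s), so the estimate is 4.0 seconds.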

Once you’ve identified TIC peaks, AutoTuner computes the remaining parameters by itself:

  • It estimates PPM (mass error) from empirical distributions of binned m/z values, which are used to test for a difference between hypothesized true features and noise.
  • It calculates S/N Threshold as the minimum intensity per bin minus the average noise intensity, divided by the standard deviation of the noise intensity.
  • It estimates Scan count as the smallest number of scans contained in any bin.
  • It computes Noise and Prefilter intensity as 90% of the minimum integrated bin intensity and of the minimum single-scan intensity, respectively.
  • It computes Minimum Peak-width as the smallest number of scans in any bin, multiplied by the duty cycle (which is instrument-specific).
  • AutoTuner estimates Maximum Peak-width by expanding bins until it finds a scan that falls below the computed PPM threshold.
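The simpler per-peak rules in this list (S/N Threshold, Scan count, Noise, Prefilter intensity, and Minimum Peak-width) translate into a few lines of arithmetic. This sketch assumes each bin is just a list of per-scan intensities and reads "minimum intensity per bin" as the smallest single-scan intensity in any bin; both are our assumptions, the container and field names are hypothetical, and the PPM and Maximum Peak-width estimation steps are not modeled.

```python
import statistics
from dataclasses import dataclass


@dataclass
class MassBin:
    """Hypothetical container: per-scan intensities of one m/z bin inside a TIC peak."""
    scan_intensities: list


def per_peak_parameters(bins, noise_intensities, duty_cycle_s):
    """Apply the per-peak rules to the bins of one TIC peak.

    `noise_intensities` are intensities judged to be noise (at least two values);
    `duty_cycle_s` is the instrument's time per scan in seconds."""
    scans_per_bin = [len(b.scan_intensities) for b in bins]
    # Assumption: "minimum intensity per bin" = smallest single-scan intensity in any bin.
    min_scan_intensity = min(min(b.scan_intensities) for b in bins)
    min_integrated = min(sum(b.scan_intensities) for b in bins)
    noise_mean = statistics.fmean(noise_intensities)
    noise_sd = statistics.stdev(noise_intensities)
    return {
        "snthresh": (min_scan_intensity - noise_mean) / noise_sd,
        "scan_count": min(scans_per_bin),
        "noise": 0.9 * min_integrated,                 # 90% of smallest integrated bin
        "prefilter_intensity": 0.9 * min_scan_intensity,  # 90% of smallest single scan
        "min_peakwidth_s": min(scans_per_bin) * duty_cycle_s,
    }
```

For example, two bins with intensities `[1000, 4000, 2000]` and `[500, 1500, 3000, 1200]`, noise values `[100, 120, 80]`, and a 0.5 s duty cycle give an S/N threshold of 20, a scan count of 3, and a minimum peak width of 1.5 s.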

AutoTuner then estimates dataset-wide parameters:

  • PPM and S/N Threshold are averages of the per-peak values, weighted by the number of bins per peak.
  • For Scan count, Noise, Prefilter intensity, and Minimum Peak-width, AutoTuner returns the minimum value across all detected bins.
  • Group difference is the dataset-wide maximum.
  • Maximum Peak-width is the average, across samples, of the largest peak width found in each sample.
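The aggregation rules above can be sketched as follows. It assumes each peak's estimates arrive as a dict that also records how many bins the peak had; the key names are hypothetical. Because each per-peak value is already a minimum over that peak's bins, taking the minimum over peaks gives the minimum over all detected bins.

```python
import statistics


def dataset_wide(peak_estimates, max_peakwidth_by_sample, group_differences):
    """Combine per-peak estimates into dataset-wide values.

    peak_estimates: list of dicts with hypothetical keys ppm, snthresh, scan_count,
    noise, prefilter_intensity, min_peakwidth_s and n_bins.
    max_peakwidth_by_sample: sample name -> largest peak width (s) in that sample.
    group_differences: group difference estimates to reduce to a maximum."""
    weights = [p["n_bins"] for p in peak_estimates]

    def weighted_mean(key):
        # Average across peaks, weighted by the number of bins per peak.
        return sum(p[key] * w for p, w in zip(peak_estimates, weights)) / sum(weights)

    return {
        "ppm": weighted_mean("ppm"),
        "snthresh": weighted_mean("snthresh"),
        "scan_count": min(p["scan_count"] for p in peak_estimates),
        "noise": min(p["noise"] for p in peak_estimates),
        "prefilter_intensity": min(p["prefilter_intensity"] for p in peak_estimates),
        "min_peakwidth_s": min(p["min_peakwidth_s"] for p in peak_estimates),
        "group_difference": max(group_differences),
        "max_peakwidth_s": statistics.fmean(max_peakwidth_by_sample.values()),
    }
```

A peak with twice as many bins counts twice as much toward PPM and S/N Threshold, while the threshold-like parameters take the most permissive (smallest) value seen anywhere in the data.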

Elizabeth and Craig designed each step of AutoTuner to minimize computational load and improve performance, without sacrificing accuracy. To test this design, they compared AutoTuner’s performance to IPO in an example metabolomics dataset with 85 pure metabolite standards.

AutoTuner finds more true features than IPO

Elizabeth and Craig first compared how often AutoTuner and IPO detected features belonging to standards spiked into their samples. AutoTuner detected a larger proportion of known features across isotopologues.

Bar chart comparing the performance of AutoTuner and IPO in recovering true features in a data set.
AutoTuner detected features corresponding to standards at a higher rate than IPO. [M] represents the ¹²Cₙ¹³C₀ isotopologue, [M+1] represents the ¹²Cₙ₋₁¹³C₁ isotopologue, and [M+2] represents the ¹²Cₙ₋₂¹³C₂ isotopologue.
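Each ¹³C atom that replaces a ¹²C atom shifts the mass by about 1.0033548 Da, which is what separates the [M], [M+1], and [M+2] features. A small helper, assuming singly charged ions by default (the function name is ours), shows where to expect each isotopologue:

```python
C13_MINUS_C12 = 1.0033548  # mass difference between 13C and 12C, in Da


def isotopologue_mz(monoisotopic_mz, n_13c, charge=1):
    """m/z of the isotopologue carrying n_13c carbon-13 atoms ([M+n_13c])."""
    return monoisotopic_mz + n_13c * C13_MINUS_C12 / abs(charge)
```

For a feature at m/z 200.0000, the [M+2] isotopologue sits near m/z 202.0067; for a doubly charged ion the spacing halves.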

They then compared AutoTuner and IPO on a different dataset, generated from cell cultures. Although the two methods identified a large number of features in common (1,022), IPO found many features not seen by AutoTuner (2,606) while AutoTuner found few that were not found by IPO (203).
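The three counts reported for the culture dataset imply the size of each method's feature table. This is plain arithmetic on the published numbers and says nothing yet about which features are real:

```python
# Feature counts reported for the culture dataset.
shared, ipo_only, autotuner_only = 1022, 2606, 203

ipo_total = shared + ipo_only               # 3,628 features in IPO's table
autotuner_total = shared + autotuner_only   # 1,225 features in AutoTuner's table
union = shared + ipo_only + autotuner_only  # 3,831 distinct features overall

# Fraction of each method's table that the other method also found.
autotuner_confirmed_by_ipo = shared / autotuner_total  # about 0.83
ipo_confirmed_by_autotuner = shared / ipo_total        # about 0.28
jaccard = shared / union                               # about 0.27
```

So roughly 83% of AutoTuner's features were also found by IPO, while only about 28% of IPO's were found by AutoTuner.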

Could this be a matter of quantity versus quality? To find out, Craig and Elizabeth calculated the continuous wavelet transform (CWT) coefficient, which measures peak steepness, for these non-overlapping features and compared the values between methods. They used the CWT coefficient as a proxy for feature resolution (how well peaks were separated by chromatography), which let them compare the quality of the features found uniquely by each method.

A graphic explaining the meaning of CWT coefficients
The CWT coefficient is higher for “steeper” peaks. A high CWT coefficient typically corresponds to a single rather than a compound peak. Therefore, features with high CWT coefficients are more likely to be correctly distinguished from adjacent features.
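To illustrate why a steeper peak scores higher, the sketch below computes a single CWT coefficient at the apex of two synthetic Gaussian peaks of equal height, using a Mexican-hat (Ricker) wavelet, the wavelet shape centWave's peak detection is built on. This is not AutoTuner's or centWave's code, and the scale of 5 scans is an arbitrary choice; in general the coefficient is largest when a peak's width roughly matches the wavelet scale.

```python
import math


def ricker(t, a):
    """Mexican-hat (Ricker) wavelet with width parameter `a`, evaluated at offset `t`."""
    norm = 2.0 / (math.sqrt(3.0 * a) * math.pi ** 0.25)
    x = t / a
    return norm * (1.0 - x * x) * math.exp(-0.5 * x * x)


def cwt_coefficient(signal, center, a):
    """CWT coefficient at one position and one scale: the inner product of the
    sampled signal with a Ricker wavelet centered on index `center`."""
    return sum(y * ricker(i - center, a) for i, y in enumerate(signal))


def gaussian_peak(n, center, sigma, height=1.0):
    """Synthetic chromatographic peak sampled at n evenly spaced scans."""
    return [height * math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(n)]


n, apex, scale = 201, 100, 5.0
steep = gaussian_peak(n, apex, sigma=5.0)     # narrow, steep peak
shallow = gaussian_peak(n, apex, sigma=20.0)  # broad peak with the same height
steep_coef = cwt_coefficient(steep, apex, scale)
shallow_coef = cwt_coefficient(shallow, apex, scale)
```

At this scale the steep peak's coefficient comes out roughly six times larger than the shallow peak's, even though both peaks have the same maximum intensity.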

They found that the CWT coefficient was significantly higher for features unique to AutoTuner than for features unique to IPO. This suggests that the unique features picked up by AutoTuner were more likely to be real.

Of course, when you’re preprocessing data, your end goal is often to find features that can be matched to database spectra, and it’s easier to be confident in your feature table when it contains more identifiable metabolites. Interestingly, when Craig and Elizabeth looked for MS2 spectra associated with features found by one method but not the other, they found that features unique to AutoTuner were more likely to have associated MS2 spectra than features unique to IPO.

Overall, AutoTuner identified more isotopologues of standards than IPO and found identifiable metabolites at a higher rate. However, because some of AutoTuner’s calculations are stochastic, Craig and Elizabeth wanted to make sure that its estimates were stable when computed repeatedly on the same data. Consistent estimates would mean that AutoTuner’s strong performance holds up across datasets of different sizes.

Estimated parameters are consistent

To test the stability of AutoTuner parameter estimates, Craig and Elizabeth compared parameter estimates that were made from 55 subsets (3 to 9 samples each) of a 90-sample rat fecal microbiome (community) dataset.

They found that results were highly consistent across these subsets, and that the coefficient of variation (CV) of the estimates decreased as sample size increased. For six of the seven parameters tested in the community dataset, the CV across estimates was at most 0.1 when 9 samples were used. In other words, AutoTuner parameter estimates are highly stable and become more stable as more samples are included.
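The stability criterion above is easy to check for your own repeated runs. This sketch computes the CV as the sample standard deviation divided by the mean; whether the original analysis used the sample or population standard deviation is an assumption here, and the function names are ours.

```python
import statistics


def coefficient_of_variation(estimates):
    """Sample standard deviation divided by the mean of repeated parameter estimates."""
    return statistics.stdev(estimates) / statistics.fmean(estimates)


def is_stable(estimates, max_cv=0.1):
    """Apply the CV <= 0.1 stability criterion to one parameter's estimates."""
    return coefficient_of_variation(estimates) <= max_cv
```

For instance, estimates of 9, 10, and 11 ppm from three subsets give a CV of exactly 0.1 and pass, whereas 5, 10, and 15 ppm give a CV of 0.5 and fail.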

Box plots showing dispersion of AutoTuner parameter estimates across data subsets
Comparing parameter estimates generated in subsets of the community dataset of varying size. Estimates had low variance across subsets, and variance decreased with increasing sample size.

AutoTuner dramatically improves run time

The most remarkable feature of AutoTuner is its blazing fast speed.

Across two ionization modes in the standards, culture, and community datasets, AutoTuner ran between 400 and 7,000 times faster than IPO. Note that these computations were run on subsets of the full data (6, 4, and 6 samples from the culture, standards, and community data, respectively). Still, they illustrate the dramatic efficiency gains that AutoTuner provides.

AutoTuner can automatically compute parameters in minutes for datasets that tie up IPO for over a day
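To put the 400x to 7,000x speedup range into wall-clock terms, here is the arithmetic for a hypothetical 24-hour IPO run; the run length is an illustrative assumption, not a reported measurement.

```python
def autotuner_minutes(ipo_hours, speedup):
    """Runtime implied for AutoTuner if it is `speedup` times faster than IPO."""
    return ipo_hours * 60 / speedup


# A hypothetical 24-hour IPO run at the low and high ends of the reported range.
slow_case = autotuner_minutes(24, 400)    # 3.6 minutes
fast_case = autotuner_minutes(24, 7000)   # about 0.21 minutes, roughly 12 seconds
```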

How to use AutoTuner in your work

AutoTuner is implemented as a Bioconductor package for the R programming language. As with all Bioconductor packages, instructions on installation, documentation, and vignettes can be found on the Bioconductor website. 

A potential drawback of AutoTuner is that it is designed only for the centWave feature detection algorithm, whereas IPO can estimate parameters for many different feature detection methods. Therefore, make sure to use AutoTuner in a workflow that runs centWave, for example through XCMS or MZmine. Craig and Elizabeth also recommend a minimum sample size of 9 for best results in culture data, and 12 for community data.

We love helping researchers implement cutting-edge tools and methods in metabolomics. If you have any questions, please contact us.
