In-source fragmentation (ISF) can be a pain in metabolomics MS experiments. Because of ISF and related in-source processes, one metabolite can produce multiple features, corresponding to fragments, adducts, or multimers of the parent ion. It is sometimes difficult to assign identities to ISFs, especially when no MS2 fragmentation information is available, since they can masquerade as fragments from other compounds.
However, researchers have noted that ISF produces intelligible patterns that can actually be helpful for metabolite identification. Tools like RAMClust and Enhanced in-Source Fragmentation Annotation (eISA) already use these patterns to their advantage.
CliqueMS is a new tool that also uses this ISF information, together with additional criteria like coelution profiles, to cluster and annotate features. We interviewed Antoni Aguilar-Mogas, a postdoctoral researcher at East Carolina University who helped create CliqueMS, to learn more about what makes CliqueMS a special addition to the metabolomics software ecosystem.
CliqueMS excels at accurately grouping features resulting from the same compound, which lets you correctly identify metabolites at a higher rate than other tools. In this post, we dive into the similarity network strategy that CliqueMS relies on, show some examples of CliqueMS at work in a variety of real datasets, and point you to resources to get started with CliqueMS.
How CliqueMS works
CliqueMS groups together related features using a network clustering strategy:
- Build network: CliqueMS calculates the similarity between the coelution profiles of each pair of features using the cosine similarity metric, and constructs a similarity network. Features are the nodes of this network, and each edge is weighted by the cosine similarity between the two features it connects.
- Find feature clusters: To find cliques in this network, CliqueMS employs a deterministic, greedy clustering approach to group features that probably originated from the same compound.
- Link related features within clusters: Just like in MS2 fragmentation, in-source fragmentation can produce coherent patterns of adduct formation. CliqueMS exploits this fact by examining the correlation of abundance of features within cliques. It compares these abundance profiles to an expected frequency of adducts and ISFs based on frequencies observed in real data. Once features are grouped together, CliqueMS uses differences in m/z values to identify putative isotopes, adducts and fragments within a group.
- Annotate compounds: Lastly, CliqueMS matches these features to spectra in the National Institute of Standards and Technology 14 MS/MS library. It returns the top five candidate matches for each clique, allowing you to inspect highly likely alternative identities as well.
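The first and third steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the CliqueMS implementation: the function names, the feature names, the intensity profiles, the neutral mass, and the 0.005 Da tolerance are all hypothetical, and the clique-finding step and the scoring of candidate annotations are not reproduced. The mass shifts are the standard monoisotopic values for a proton, a sodium ion, and the 13C/12C spacing.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero profile is treated as similar to nothing
    return dot / (norm_a * norm_b)


def build_similarity_network(profiles):
    """Return {(feature_a, feature_b): cosine similarity} for every pair."""
    return {
        (a, b): cosine_similarity(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Standard monoisotopic mass shifts in Da.
PROTON = 1.007276       # [M+H]+ minus M
SODIUM = 22.989218      # [M+Na]+ minus M
C13_SPACING = 1.003355  # 13C minus 12C

# A deliberately tiny lookup table (an assumption for this sketch).
KNOWN_DIFFERENCES = {
    "13C isotope": C13_SPACING,
    "[M+Na]+ vs [M+H]+": SODIUM - PROTON,
}


def label_mz_differences(mz_by_feature, tolerance=0.005):
    """Label feature pairs whose m/z difference matches a known shift."""
    labels = {}
    ordered = sorted(mz_by_feature, key=mz_by_feature.get)  # ascending m/z
    for low, high in combinations(ordered, 2):
        diff = mz_by_feature[high] - mz_by_feature[low]
        for name, expected in KNOWN_DIFFERENCES.items():
            if abs(diff - expected) <= tolerance:
                labels[(low, high)] = name
    return labels


# Hypothetical intensities per scan: f1 to f3 coelute, f4 elutes later.
profiles = {
    "f1": [0, 10, 50, 100, 50, 10, 0, 0, 0, 0],
    "f2": [0, 1, 6, 11, 5, 1, 0, 0, 0, 0],
    "f3": [0, 4, 22, 38, 21, 4, 0, 0, 0, 0],
    "f4": [0, 0, 0, 0, 0, 5, 40, 90, 40, 5],
}
network = build_similarity_network(profiles)

# Hypothetical m/z values for the three coeluting features.
neutral_mass = 180.06339
mz_values = {
    "f1": neutral_mass + PROTON,
    "f2": neutral_mass + PROTON + C13_SPACING,
    "f3": neutral_mass + SODIUM,
}
labels = label_mz_differences(mz_values)

for pair, weight in network.items():
    print(pair, round(weight, 3))
for pair, name in labels.items():
    print(pair, name)
```

In this toy data the coeluting features f1, f2, and f3 are joined by edges with weights close to 1, while every edge to f4 has a weight close to 0. Within the coeluting group, the m/z differences mark f2 as a candidate isotope peak of f1 and f3 as a candidate sodium adduct of the same compound.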
The feature clustering step is notably similar to the spectral grouping performed by the popular CAMERA package. The principal difference is that CliqueMS uses cosine similarity to group features, while CAMERA uses Pearson correlation. As an initial validation, Antoni and the team decided to compare how CliqueMS and CAMERA cluster features in a simple dataset.
Comparing CliqueMS feature clustering to CAMERA
Antoni and the team generated a real dataset on a mixture of nine standards. They then artificially introduced noise into the elution profiles of features, randomly increasing or decreasing the number of coeluting features. Using both algorithms, they clustered features and tracked how many features CliqueMS and CAMERA correctly or falsely clustered with one another.
The team found that across simulations, CliqueMS was better than CAMERA at grouping features from the same standard together. This is possibly because Pearson correlation relies on a linear relationship between abundances of features, whereas cosine similarity can detect nonlinear relationships, which are common in this data.
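To make the two metrics concrete, here is a small sketch of how each one is computed. Pearson correlation is cosine similarity applied to mean-centered vectors, so the two agree exactly when one profile is a scaled copy of the other and can diverge otherwise. The profiles and function names below are hypothetical, and the example only shows how the two numbers are obtained; it is not evidence for or against the explanation suggested above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation, written as cosine similarity after centering."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical elution profile of one feature across nine scans.
peak = [0, 1, 16, 81, 100, 81, 16, 1, 0]

# A coeluting feature whose intensity is a fixed fraction of the first.
scaled = [0.2 * x for x in peak]

# A coeluting feature whose response is compressed (square root here,
# chosen only as a simple stand-in for a nonlinear relationship).
compressed = [math.sqrt(x) for x in peak]

for name, other in [("scaled", scaled), ("compressed", compressed)]:
    cos = cosine_similarity(peak, other)
    pear = pearson_correlation(peak, other)
    print(f"{name}: cosine={cos:.4f} pearson={pear:.4f}")
```

For the scaled profile both metrics equal 1 up to floating-point error. For the compressed profile both drop below 1, and in this particular example the cosine similarity stays slightly higher than the Pearson correlation.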
With their next round of experiments, the team showed how better grouping techniques lead to higher quality identifications.
CliqueMS agrees with manual annotations in a simple dataset
The team next tested CliqueMS by examining the identities found in the nine-standard dataset described above. In this data, MS1 spectra were manually annotated, and their identities were further verified using MS2 spectral matching, to establish a ground truth. They again compared CliqueMS’ performance against CAMERA’s.
CliqueMS correctly assigned all nine standards their true identities: It identified between two and five adducts associated with each standard, and within each cluster the correct identity was always ranked among the top two candidate identities.
By contrast, CAMERA only correctly identified five of nine standards. Of the remaining four, three were unannotated and one was misidentified.
The major driver behind this discrepancy in performance is CliqueMS’ superior clustering approach. CliqueMS grouped the 275 features in this data into fewer groups (69) than CAMERA did (164). With fewer, larger groups, CliqueMS can draw on more features per clique when assigning a parent mass.
As a result, CliqueMS annotated 29 adducts and 13 isotopes, while CAMERA annotated only 17 adducts and 10 isotopes. By better grouping features from the same parent compound, CliqueMS annotates more features, which yields both more identifications and higher-quality ones.
CliqueMS also performs well in complex data
To test whether this trend held in a noisier, more complex dataset, Antoni’s team applied CliqueMS and CAMERA to an LC-MS1 dataset generated from mouse retinal tissue. The data contained 8,489 features in the positive ionization mode and 3,893 features in the negative ionization mode.
As before, CliqueMS organized the features into a smaller number of groups than CAMERA (606 vs 2,836 in positive and 349 vs 1,083 in negative). And again, CliqueMS annotated a higher proportion of features than CAMERA (70% vs 43% in positive and 44% vs 32% in negative).
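The group counts quoted for the nine-standard dataset and for the retinal dataset translate into average group sizes with a little arithmetic. The sketch below assumes that every feature is assigned to exactly one group, which the post does not state explicitly, so treat the averages as rough indicators rather than reported results. The dictionary keys are labels made up for this example.

```python
# Feature and group counts as quoted in the text.
datasets = {
    "nine standards": {"features": 275, "CliqueMS": 69, "CAMERA": 164},
    "retina, positive mode": {"features": 8489, "CliqueMS": 606, "CAMERA": 2836},
    "retina, negative mode": {"features": 3893, "CliqueMS": 349, "CAMERA": 1083},
}


def mean_group_size(n_features, n_groups):
    """Average features per group, assuming each feature sits in one group."""
    return n_features / n_groups


mean_sizes = {}
for name, counts in datasets.items():
    mean_sizes[name] = {
        tool: round(mean_group_size(counts["features"], counts[tool]), 1)
        for tool in ("CliqueMS", "CAMERA")
    }
    print(name, mean_sizes[name])
```

Under that assumption, CliqueMS groups average roughly 4 features in the nine-standard data and roughly 11 to 14 in the retinal data, while CAMERA groups average fewer than 4 features in every case.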
Antoni and the team note that the larger number of potential adducts in the positive mode gives the clustering algorithm more information to work with. This may explain why the gap between CliqueMS and CAMERA is wider in the positive mode.
CliqueMS outperforms other clustering-based methods that rely on multiple samples
CliqueMS can be applied to individual samples, as in the pilot experiments above. This makes it useful for both small and large experiments. Since CAMERA shares this capability, it is simple to benchmark CliqueMS against CAMERA using single-sample datasets. Other clustering-based annotation methods do exist, but they require the data to contain more than one sample.
You might expect that CliqueMS’ annotation rates would compare poorly with other methods because it annotates each sample independently, rather than using data from the entire experiment. To test this, Antoni and the team selected two additional tools (xMSannotator and MS-FLO) and applied all four tools to a publicly available dataset from MetaboLights (MTBLS103). The data comprised 13 samples run on a HILIC column and 18 samples run on an RP-C18 column.
Excitingly, the team still found that CliqueMS often had the highest annotation rate. Considering results across all samples, CliqueMS annotated the most metabolites on the HILIC column and finished second only to xMSannotator on the RP-C18 column. And CliqueMS is the clear winner when considering features annotated in only a portion of samples. This suggests that annotating each sample individually can be more productive than annotating the whole dataset at once.
How to get started with CliqueMS
CliqueMS is implemented as a Bioconductor package for the R programming language. You can find information on installation and use at this page. You can also find details on installation, getting started, and implementation in the supplemental section of the CliqueMS publication in Bioinformatics, which is open-access.
Or contact us with any questions on how to integrate CliqueMS into your work!