
Improved Molecular Networking with GNPS

How Feature-Based and Ion Identity Molecular Networking improve the accuracy of GNPS molecular networks

by
Markus Schmitt
Andrew Patt

The Global Natural Products Social Molecular Networking (GNPS) service is a popular online toolbox for analyzing untargeted tandem mass spectrometry data. The most popular tool on GNPS is Molecular Networking (MN). MN is great for:

  • Collapsing similar spectra into consensus spectra, reducing the size of a dataset;
  • Helping you visualize the chemical space represented in your experiment;
  • Suggesting identities for spectra that don’t have library matches.

Crucially, MN doesn’t use MS1-level data generated in your experiments. This can cause it to miss links between similar spectra, or make it lump together spectra that belong to different compounds. This means you miss compounds that were in your sample, many of which may be relevant to your research.

To remedy this, GNPS has released two new Molecular Networking tools that use MS1-level data in addition to the MS2 data used by classic MN. This allows you to better distinguish between similar compounds such as isomers or coeluting metabolites, which classic MN could not tell apart.

We spoke to Daniel Petras, part of the team that developed these tools, to understand how they work and how you can use them to annotate metabolites in your data.

Let’s first delve deeper into how classical MN annotates your spectra, and see where Daniel and the GNPS team saw room for improvement.

Classical molecular networking only uses MS2 data

MN groups similar spectra into clusters, which helps propagate spectral IDs obtained from spectral library matching. MN accomplishes this by carrying out a few steps under the hood:

  • Condenses matching spectra into consensus spectra: Identifies spectra with the same precursor ion mass and high spectral similarity and groups them, then collapses grouped spectra into a consensus spectrum (one per group). These groups become nodes in the molecular network.
  • Links similar consensus spectra: Calculates pairwise cosine similarity between consensus spectra and constructs edges in the molecular network based on these similarity values.
  • Library annotation: Matches as many nodes in the network to library spectra as possible.
  • Propagates identities: Uses chemical rationale to infer the identity of unannotated nodes that neighbor putatively identified nodes (called “identity propagation”).
A high-level schematic detailing the steps in Molecular Networking. The workflow shows how MS2 spectra are organized based on similarity/precursor ion, collapsed, and then clustered based on similarity between consensus spectra.
You run MN on a collection of MS2 spectra (1). Spectra in your data are grouped based on their spectral similarity, as well as the mass of their precursor ions (2). MN collapses spectra in the same group into a consensus spectrum, represented by a node in the molecular network (3), with molecular masses a, b and c in this example. Finally, MN draws edges in the network between nodes with highly similar consensus spectra. Connected nodes are likely to represent chemically similar compounds.
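
The edge-building step can be sketched in a few lines of Python. This is a simplified illustration, not the GNPS implementation: it uses a plain cosine score on binned peaks, and the bin width, the 0.7 threshold, and the three toy consensus spectra are all assumptions made for the example.

```python
import math
from itertools import combinations


def bin_spectrum(peaks, bin_width=0.01):
    """Map (m/z, intensity) peaks onto integer m/z bins, summing intensities."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Plain cosine similarity between two binned spectra (0.0 to 1.0)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_edges(consensus_spectra, min_cosine=0.7):
    """Connect every pair of nodes whose consensus spectra are similar enough."""
    edges = []
    for (id_a, spec_a), (id_b, spec_b) in combinations(consensus_spectra.items(), 2):
        score = cosine_similarity(spec_a, spec_b)
        if score >= min_cosine:
            edges.append((id_a, id_b, round(score, 3)))
    return edges


# Hypothetical consensus spectra for nodes a, b and c: lists of (m/z, intensity).
consensus_spectra = {
    "a": [(85.03, 40.0), (127.04, 100.0), (145.05, 60.0)],
    "b": [(85.03, 35.0), (127.04, 100.0), (145.05, 70.0)],
    "c": [(91.05, 100.0), (119.08, 20.0)],
}

edges = build_edges(consensus_spectra)
print(edges)  # only nodes a and b share fragments, so only they are linked
```

Note that nothing in this calculation looks at retention time or any other MS1-level information: two nodes are linked purely because their fragment peaks overlap.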

Despite the power of MN, there are some issues that it cannot resolve, since it doesn’t use MS1-level data. For example, isomers sometimes show the same fragmentation patterns but elute from the LC column at different times. These isomers would be lumped together by MN. You can circle back to the original data to distinguish these compounds using additional software packages like MS-DIAL or MZmine. But using multiple software tools to analyze the same data set is laborious and time-consuming.

To allow MN to capture these differences, and to make MN a one-stop shop for annotating your data, Daniel and his team developed a new version of MN that leverages MS1 data called Feature-Based Molecular Networking. 

Feature-Based Molecular Networking uses MS1 data to improve clustering results

Like MN, Feature-Based Molecular Networking (FBMN) clusters MS2 spectra, produces a consensus spectrum for each cluster, and builds edges between spectra based on cosine similarity. FBMN achieves superior resolving power to MN by incorporating MS1-level information into its clustering calculations, as well as two optional information sources: ion mobility separation and MSE.

To illustrate the advantages of FBMN, Daniel and the GNPS team ran FBMN on data generated in a drug discovery project of Euphorbia samples. They found that application of FBMN uncovered several positional isomers/stereoisomers of 4-deoxyphorbol ester compounds that were missed by MN. 

Example molecular networks generated by Molecular Networking and Feature-Based Molecular Networking. Incorporating chromatographic information from the MS1 level improves the ability of the Molecular Networking algorithm to distinguish between compounds with similar MS2 spectra.
Feature-Based Molecular Networking distinguishes isomers that were collapsed by MN. Several isomers of 4-deoxyphorbol ester compounds could not be identified in the example data from their MS2 spectra alone. However, their different elution times allowed FBMN to distinguish between them.
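
To see why retention time matters, here is a minimal sketch that groups scans with and without it. The scan values, the tolerances, and the greedy grouping rule are assumptions for illustration only; in practice, feature detection is done by tools such as MZmine or MS-DIAL, and the sketch assumes the MS2 spectra of all four scans look alike.

```python
def group_scans(scans, mz_tol=0.01, rt_tol=None):
    """Greedily group scans by precursor m/z and, optionally, retention time.

    If rt_tol is None, retention time is ignored (MS2-only behaviour).
    """
    groups = []
    for scan in sorted(scans, key=lambda s: (s["mz"], s["rt"])):
        for group in groups:
            ref = group[0]  # compare against the first scan of the group
            same_mz = abs(scan["mz"] - ref["mz"]) <= mz_tol
            same_rt = rt_tol is None or abs(scan["rt"] - ref["rt"]) <= rt_tol
            if same_mz and same_rt:
                group.append(scan)
                break
        else:
            groups.append([scan])
    return groups


# Hypothetical scans: two isomers with the same precursor m/z
# that elute roughly 90 seconds apart.
scans = [
    {"id": 1, "mz": 500.3000, "rt": 312.0},
    {"id": 2, "mz": 500.3001, "rt": 314.0},
    {"id": 3, "mz": 500.3000, "rt": 401.0},
    {"id": 4, "mz": 500.3002, "rt": 403.0},
]

ms2_only = group_scans(scans)                   # retention time ignored
feature_based = group_scans(scans, rt_tol=10)   # retention time in seconds

print(len(ms2_only), len(feature_based))
```

Without retention time, all four scans collapse into one node. With it, the two elution windows become two separate nodes, which is the behaviour the isomer example above relies on.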

In this example, using FBMN allowed the team to identify potential new anti-viral compounds that would have remained hidden had they used classical MN. Incorporating MS1-level information into clustering has proved useful well beyond this one project: FBMN has been cited in more than 80 publications since its release in 2017.

However, even with the advanced power of FBMN, you can still encounter difficulties when linking nodes derived from the same precursor compound in your data. Ions from the same compound sometimes fragment differently, resulting in different MS2 spectra, which isn’t traceable using the information leveraged by MN and FBMN. To fix this, a cutting-edge tool released this year called Ion Identity Molecular Networking (IIMN) improves the accuracy of Molecular Networking results by accounting for these scenarios using new ion identity information sources. 

Ion Identity Molecular Networking uses ion identity correlation to find further links between spectra

IIMN lets you link together ions derived from the same compound by using MS1 information such as chromatographic feature shapes. FBMN and IIMN have a lot of similarities, but they use different types of MS1 information.

MS1 information used by FBMN and IIMN. IIMN focuses on linking ions from the same parent compound using feature shape, as well as instrument/experimental parameters, to narrow the list of plausible compounds. FBMN can accept optional experimental information like ion mobility, and uses MS1-derived isotope patterns to link features from the same compound.
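
The idea of linking ions by feature shape can be sketched as follows. This is not the metaCorrelate implementation: the correlation threshold, the mass tolerance, the three-entry adduct table, and the feature values are all assumptions chosen for illustration. Two features are linked when their elution profiles correlate and their m/z values point to the same neutral mass under different adduct assumptions.

```python
import math
from itertools import combinations

# Mass shifts (Da) added to the neutral mass M for common positive-mode adducts.
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
}


def pearson(x, y):
    """Pearson correlation of two equally sampled elution profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def link_ion_identities(features, min_corr=0.85, mass_tol=0.005):
    """Link co-eluting features that agree on one neutral mass."""
    links = []
    for fa, fb in combinations(features, 2):
        if pearson(fa["profile"], fb["profile"]) < min_corr:
            continue  # peak shapes differ: unlikely to be the same compound
        for name_a, shift_a in ADDUCT_SHIFTS.items():
            for name_b, shift_b in ADDUCT_SHIFTS.items():
                if name_a == name_b:
                    continue
                if abs((fa["mz"] - shift_a) - (fb["mz"] - shift_b)) <= mass_tol:
                    links.append((fa["id"], name_a, fb["id"], name_b))
    return links


# Hypothetical features: f1 and f2 co-elute; f3 has the same m/z as f1 but elutes earlier.
features = [
    {"id": "f1", "mz": 409.2949, "profile": [0, 10, 50, 100, 50, 10, 0]},
    {"id": "f2", "mz": 431.2768, "profile": [0, 4, 22, 40, 21, 5, 0]},
    {"id": "f3", "mz": 409.2949, "profile": [100, 50, 10, 0, 0, 0, 0]},
]

links = link_ion_identities(features)
print(links)
```

Here f1 and f2 are linked as the protonated and sodiated forms of one compound, even though their MS2 spectra might look quite different. f3 is left alone because its peak shape does not match.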

Ion Identity Molecular Networking significantly increases your annotation rate 

Daniel and the GNPS team tested IIMN on 24 public datasets, using the metaCorrelate algorithm from the MZmine workflow. These public datasets ran the gamut of biospecimen varieties, from standards to cell cultures, feces, food, marine samples, and biofluids (saliva, urine, and plasma). The team propagated identities to first-neighbor nodes in the IIMN networks built on these datasets. They found that in 16 of the 24 datasets, the number of MS2 library-annotated features increased by over 10%.

Bar plot displaying the number of features assigned MS2 library identities using IIMN, normalized by the total number of library matches in the dataset.
IIMN identity propagation increased the proportion of features with MS2 library annotations by an average of 35% across 24 publicly available datasets.

One way IIMN improves annotation rates is by helping to overcome the spectral bias of most MS2 libraries. The team observed differing proportions of protonated, sodiated and ammoniated adducts across their samples, all of which differed from the adduct proportions in MoNA and GNPS (both libraries are ~65% protonated adducts, while in most datasets the proportion of protonated adducts was ~23%). IIMN can be used to expand spectral libraries to these less well-annotated adducts, which can have big payoffs for biological samples that feature unusual ions.

Find all the links between your spectra by combining FBMN and IIMN

Lastly, you can use FBMN and IIMN in tandem for maximum benefit, since they leverage distinct information sources. To demonstrate this, the team applied FBMN to a dataset with 88 feces/gall bladder extracts from several animal species. They zoomed in on a specific class of lipids (bile acids and bile acid conjugates) and observed that FBMN placed highly similar molecules with the same adducts in different subnetworks – an undesirable outcome. 

The team then applied IIMN to this same network, and it introduced additional edges based on ion identity information. As a result, the bile acid subnetworks became far more connected. Additionally, the number of nodes was reduced because IIMN was able to collapse different ion species from the same compound into a single node. IIMN also integrated a number of singleton nodes whose ion identity could be determined, which incorporated more features into the final network. 
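
Collapsing ion species into a single node is essentially a connected-components problem. Here is a minimal sketch, assuming the ion identity links are already available as pairs of feature IDs. The node names and links are hypothetical, and the union-find approach is just one way to do it, not necessarily how GNPS implements the collapse.

```python
def collapse_ion_identities(node_ids, ion_identity_edges):
    """Merge nodes connected by ion identity edges into one group per compound."""
    parent = {node: node for node in node_ids}

    def find(node):
        # Follow parent pointers to the group representative, compressing the path.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in ion_identity_edges:
        parent[find(a)] = find(b)

    groups = {}
    for node in node_ids:
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


# Hypothetical network: f1, f2 and f3 are different ion species of one compound.
node_ids = ["f1", "f2", "f3", "f4", "f5"]
ion_identity_edges = [("f1", "f2"), ("f2", "f3")]

collapsed = collapse_ion_identities(node_ids, ion_identity_edges)
print(collapsed)  # → [['f1', 'f2', 'f3'], ['f4'], ['f5']]
```

Five nodes become three, which mirrors the drop in node count the team saw once different ion species of the same compound were merged.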

Molecular networks generated by FBMN and IIMN. IIMN simplifies networks and finds additional relationships between features missed by FBMN.
FBMN alone (top left) was unable to link together ions with the same adducts, resulting in several subnetworks and a large number of singleton nodes. When IIMN (top right) was performed in addition to FBMN, the chemical links between features were recognized, and singletons were correctly clustered with other ions combined with the same adducts. Edges introduced by IIMN are represented by red-dashed lines. When features are collapsed to consensus spectra and matched to a library, lipids from the same class cluster together (bottom left). 

Increasing the number of connected nodes in your molecular networks greatly improves your chances of annotating as many features as possible in your data. Getting started with the right tools is as easy as accessing the GNPS website.

How to get started with FBMN and IIMN

Integrating FBMN and IIMN into your computational workflow is a breeze. FBMN accepts MS1-level information from popular feature-detection and alignment tools like XCMS, MZmine, and MS-DIAL, as well as standard formats like mzTab-M. You can try out FBMN at the GNPS website, and also find instructions on exporting output from your tools for use in the online FBMN interface. You can also read more about how FBMN works in the original publication in Nature Methods.

IIMN is also available on the GNPS website. IIMN is fully integrated into the FBMN workflow and interfaces with all the tools above that work with FBMN. IIMN is also featured in a publication in Nature Communications.

We help researchers understand and use the latest algorithms in metabolomics. If you want to improve your workflow and need advice on integrating FBMN and IIMN into your research, go ahead and reach out to us.
