net4pg: Handle Ambiguity of Protein Identifications from Shotgun Proteomics
In shotgun proteomics, shared peptides (i.e., peptides that might
originate from different proteins sharing homology, from different
proteoforms due to alternative mRNA splicing, post-translational
modifications, proteolytic cleavages, and/or allelic variants) represent a
major source of ambiguity in protein identifications. The 'net4pg' package
helps users assess and handle this ambiguity. It implements methods for
two main applications. First, it allows users to represent
and quantify ambiguity of protein identifications by means of graph
connected components (CCs). In graph theory, a CC is a maximal subgraph in
which any two vertices are connected to each other by a path and no vertex
is connected to any vertex outside the subgraph. Here,
proteins sharing one or more peptides are thus gathered in the same CC
(multi-protein CC), while unambiguous protein identifications constitute CCs
with a single protein vertex (single-protein CCs). Therefore, the proportion
of single-protein CCs and the size of multi-protein CCs can be used to
measure the level of ambiguity of protein identifications. The package
implements a strategy to efficiently calculate graph connected
components on large datasets and allows users to visually inspect them.
Second, the 'net4pg' package allows users to exploit the increasing
availability of matched transcriptomic and proteomic datasets to
reduce ambiguity of protein identifications. More precisely, it
implements a transcriptome-based filtering strategy that removes
those proteins whose corresponding transcript is not expressed in
the sample-matched transcriptome. The
underlying assumption is that, according to the central dogma of
biology, there can be no proteins without the corresponding
transcript. Most importantly, the package allows users to visually inspect
the effect of the filtering on protein identifications and to quantify
ambiguity before and after filtering by means of graph connected
components. As such, it constitutes a reproducible and transparent
method to exploit transcriptome information to enhance protein
identifications. All methods implemented in the 'net4pg' package are fully
described in Fancello and Burger (2021) <doi:10.1101/2021.09.07.459229>.
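
The two applications described above can be illustrated with a minimal Python sketch. It is not the 'net4pg' API (the package is written in R, and none of its function names appear here); every name below (connected_components, filter_by_transcriptome, single_protein_fraction, and the toy identifiers) is hypothetical. The sketch assumes that identifications are given as a mapping from each peptide to the set of proteins it maps to, uses a union-find structure as one possible way to compute CCs, and simply drops peptides left with no protein after filtering, which is a simplification.

```python
from collections import defaultdict


def connected_components(peptide_to_proteins):
    """Group proteins into CCs: proteins sharing a peptide end up together."""
    parent = {}

    def find(x):
        # Register unseen proteins as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for proteins in peptide_to_proteins.values():
        proteins = sorted(proteins)
        for protein in proteins:
            find(protein)
        # Every protein matched by this peptide joins the first one's CC.
        for other in proteins[1:]:
            root_a, root_b = find(proteins[0]), find(other)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(set)
    for protein in parent:
        groups[find(protein)].add(protein)
    return sorted(sorted(group) for group in groups.values())


def filter_by_transcriptome(peptide_to_proteins, protein_to_transcript,
                            expressed_transcripts):
    """Keep only proteins whose transcript is in the expressed set."""
    filtered = {}
    for peptide, proteins in peptide_to_proteins.items():
        kept = {p for p in proteins
                if protein_to_transcript.get(p) in expressed_transcripts}
        if kept:  # assumption of this sketch: drop peptides with no protein left
            filtered[peptide] = kept
    return filtered


def single_protein_fraction(components):
    """Proportion of CCs that contain exactly one protein."""
    if not components:
        return 0.0
    return sum(len(c) == 1 for c in components) / len(components)


# Toy data (hypothetical identifiers): pep2 and pep3 are shared peptides.
peptides = {
    "pep1": {"P1"},
    "pep2": {"P2", "P3"},
    "pep3": {"P3", "P4"},
    "pep4": {"P5"},
}
transcripts = {"P1": "T1", "P2": "T2", "P3": "T3", "P4": "T4", "P5": "T5"}
expressed = {"T1", "T2", "T4", "T5"}  # T3 is not expressed

before = connected_components(peptides)
after = connected_components(
    filter_by_transcriptome(peptides, transcripts, expressed))
print(before, single_protein_fraction(before))
print(after, single_protein_fraction(after))
```

In this toy example, removing P3 (whose transcript is absent) splits the multi-protein CC containing P2, P3 and P4 into single-protein CCs, so the proportion of single-protein CCs rises after filtering.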
Version: 0.1.0
Depends: R (≥ 3.6.0)
Imports: data.table, graph, magrittr, Matrix, methods, utils
Suggests: BiocStyle, ggplot2, igraph, knitr, rmarkdown, roxygen2, testthat (≥ 3.0.0)
Published: 2021-09-20
Author: Laura Fancello [aut, cre], Thomas Burger [aut, ctb]
Maintainer: Laura Fancello <laura.fancello at cea.fr>
BugReports: https://github.com/laurafancello/net4pg/issues
License: GPL-3
URL: https://github.com/laurafancello/net4pg
NeedsCompilation: no
Materials: README NEWS
CRAN checks: net4pg results
Linking:
Please use the canonical form
https://CRAN.R-project.org/package=net4pg
to link to this page.