agvgd

CRAN status Codecov test coverage

The R package {agvgd} provides an R implementation of the Align-GVGD1,2,3 (A-GVGD) method.

A-GVGD combines multiple sequence alignment of orthologous sequences with the Grantham distance4 to classify missense variants, i.e. to distinguish human disease susceptibility missense changes from changes of little clinical significance.

Installation

Install {agvgd} from CRAN:

install.packages("agvgd")

You can install the development version of {agvgd} like so:

# install.packages("remotes")
remotes::install_github("maialab/agvgd")

Usage

Here is a minimal example using a dummy alignment constructed from an R matrix:

library(agvgd)

# The alignment can be provided as a simple matrix:
alignment <- matrix(
  c('P', 'M', 'I',
    'P', 'I', 'I',
    'P', 'L', 'I'),
  nrow = 3,
  byrow = TRUE
)

# In this example we will interrogate changes to the second position in the
# alignment, i.e., changes to Methionine (M); therefore we set the position of
# interest (poi) to 2.
poi <- 2

# Here's a set of three possible missense changes (substitutions): Isoleucine
# (Ile, I), Leucine (Leu, L), and Tryptophan (Trp, W):
substitutions <- c('I', 'L', 'W')

# agvgd package's main function is `agvgd()` :)
agvgd(alignment = alignment, poi = poi, sub = substitutions)
#> # A tibble: 3 × 7
#>     res   poi ref   sub      gv    gd prediction
#>   <int> <int> <chr> <chr> <dbl> <dbl> <chr>     
#> 1     2     2 M     I      14.3   0   C0        
#> 2     2     2 M     L      14.3   0   C0        
#> 3     2     2 M     W      14.3  60.4 C35

This is another simple example but using one of the bundled alignments with {agvgd}. This one is the alignment for the gene ATM. Let’s say you are interested in position 16 of the alignment, and that you would like to know the impact of all possible missense substitutions:

# `read_alignment()` either reads in one of the bundled alignments with
# `{agvgd}` by gene name or directly from a FASTA file.
alignment <- read_alignment(gene = 'ATM')

# For the sake of illustration, let us define a new alignment based on a narrow
# region of the original alignment by focusing only on the first 30 positions.
# Note: rows are protein sequences; columns are alignment positions.
narrow_alignment <- alignment[, 1:30]

# poi: position of interest
poi <- 16

# Print the alignment to the console and highlight the position of interest
# (POI). NB: The POI is not highlighted in GitHub's README but it is on the R
# console.
print(narrow_alignment, poi = poi)
#> Hsap_ATM_AAB65827.1  1 M-SLVLNDLLICCRQLEHDRATERKKEVEK
#> Mmus_ATM_NP_031525.2 1 M-SLALNDLLICCRQLEHDRATERRKEVDK
#> Sscr_ATM_AAT01608.1  1 M-SLALNDLLICCRQLEHDRATERRKAVEN
#> Mdom_ATM_IARC        1 M-SLALNDLLLCCRQLENDRATERRKEVEK
#> Ggal_ATM_edited      1 M-SLVLHDLLTCCRRLENERATERRNEIEN
#> Xlae_ATM_AAT72929.1  1 M-SLALHELLLCCRQIETDKATERKKEIVK
#> Drer_ATM_IARC_v2     1 M-SLALHELLVCCRGLENEKATERKKEVDR
#> Bflo_ATM_IARC        1 MTDLLTHDLRDCCCHLESDKVTERKKNAEK
#> Spur_ATM_ABY60856.1  1 MAEVLIP-LRTACGYLGSDKITERKKQIDI

# You may use `amino_acids()` to get a vector of all the 20 standard amino acids
# and hence easily get a vector of all possible substitutions:
all_substitutions <- amino_acids()

agvgd(alignment = narrow_alignment, poi = poi, sub = all_substitutions)
#> # A tibble: 20 × 7
#>      res   poi ref   sub      gv    gd prediction
#>    <int> <int> <chr> <chr> <dbl> <dbl> <chr>     
#>  1    15    16 L     S      4.86 142.  C65       
#>  2    15    16 L     R      4.86  97.6 C65       
#>  3    15    16 L     L      4.86   0   C0        
#>  4    15    16 L     P      4.86  95.4 C65       
#>  5    15    16 L     T      4.86  89.3 C65       
#>  6    15    16 L     A      4.86  93.7 C65       
#>  7    15    16 L     V      4.86  29.6 C15       
#>  8    15    16 L     G      4.86 135.  C65       
#>  9    15    16 L     I      4.86   0   C0        
#> 10    15    16 L     F      4.86  21.3 C0        
#> 11    15    16 L     Y      4.86  33.0 C25       
#> 12    15    16 L     C      4.86 197.  C65       
#> 13    15    16 L     H      4.86  94.3 C65       
#> 14    15    16 L     Q      4.86 109.  C65       
#> 15    15    16 L     N      4.86 149.  C65       
#> 16    15    16 L     K      4.86 102.  C65       
#> 17    15    16 L     D      4.86 168.  C65       
#> 18    15    16 L     E      4.86 134.  C65       
#> 19    15    16 L     M      4.86  10.1 C0        
#> 20    15    16 L     W      4.86  60.5 C55

The {agvgd} logo, agvgd.png, is a derivative work of an illustration of “Globin Evolution” by David S. Goodsell and the RCSB PDB, used under CC-BY-4.0. agvgd.png is licensed under CC-BY-4.0 by Ramiro Magno.

Acknowledgements

The work on this package benefited greatly from the feedback of Professor Sean Tavtigian, and Dr. Russell Bell.

References

1. Tavtigian, S.V., Deffenbaugh, A. M., Yin, L., Judkins, T., Scholl, T., Samollow, P.B., de Silva, D., Zharkikh, A., Thomas, A. Comprehensive statistical study of 452 BRCA1 missense substitutions with classification of eight recurrent substitutions as neutral. Journal of Medical Genetics 43, 295–305 (2006). doi: 10.1136/jmg.2005.033878.

2. Mathe, E., Olivier, M., Kato, S., Ishioka, C., Hainaut, P., Tavtigian, S.V. Computational approaches for predicting the biological effect of p53 missense mutations: a comparison of three sequence analysis based methods. Nucleic Acids Research 34, 1317–1325 (2006). doi: 10.1093/nar/gkj518.

3. Tavtigian, S.V., Byrnes, G. B, Goldgar, D. E., Thomas, A. Classification of rare missense substitutions, using risk surfaces, with genetic- and molecular-epidemiology applications. Human Mutation 29, 1342–1354. doi: 10.1002/humu.20896

3. Grantham, R. Amino acid difference formula to help explain protein evolution. Science 185, 862–864 (1974). doi: 10.1126/science.185.4154.862.

4. Stone, E. A., Sidow, Arend. Physicochemical constraint violation by missense substitutions mediates impairment of protein function and disease severity. Genome Research 15, 978–986 (2005). doi: 10.1101/gr.3804205.