README

The bpcs package performs Bayesian estimation of Paired Comparison models utilizing Stan, such as variations of the Bradley-Terry (Bradley and Terry 1952) and the Davidson models (Davidson 1970).

Installation

For the bpcs package to work, we rely upon the Stan software and the rstan package (Stan Development Team 2020).

To install the development version of the bpcs package, install directly from the Github repository.

remotes::install_github('davidissamattos/bpcs')

Minimal example

The main function of the package is the bpc function. For the simple Bradley-Terry model, this function requires a specific type of data frame that contains:

We will utilize the tennis dataset available (Agresti 2003). The dataset can be seen below and is available as data(tennis_agresti):

dplyr::sample_n(tennis_agresti,10) %>% 
  knitr::kable()

player0	player1	y	id
Seles	Sanchez	0	14
Sabatini	Sanchez	1	42
Seles	Graf	0	1
Graf	Sabatini	1	22
Sabatini	Sanchez	1	41
Seles	Navratilova	1	12
Graf	Sanchez	0	32
Sabatini	Navratilova	0	35
Sabatini	Sanchez	0	39
Seles	Graf	1	4

Based on the scores of each contestant, the bpc function computes automatically who won the contest. Alternatively, you can provide a vector of who won if that is already available (for more information see ?bpc.

For the simple Bradley Terry Model we specify the model type as 'bt'. Here we hide the MCMC sampler chain messages for simplicity in the output.

m<-bpc(data = tennis_agresti, #datafrane
       player0 = 'player0', #name of the column for player 0
       player1 = 'player1', #name of the column for player 1
       result_column = 'y', #name of the column for the result of the match
       model_type = 'bt', #bt = Simple Bradley Terry model
       solve_ties = 'none' #there are no ties in the dataset so we can choose none here
       )
#> 
#> SAMPLING FOR MODEL 'bt' NOW (CHAIN 1).
#> Chain 1: 
#> Chain 1: Gradient evaluation took 8.7e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.87 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 0.299618 seconds (Warm-up)
#> Chain 1:                0.312838 seconds (Sampling)
#> Chain 1:                0.612456 seconds (Total)
#> Chain 1: 
#> 
#> SAMPLING FOR MODEL 'bt' NOW (CHAIN 2).
#> Chain 2: 
#> Chain 2: Gradient evaluation took 3.9e-05 seconds
#> Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.39 seconds.
#> Chain 2: Adjust your expectations accordingly!
#> Chain 2: 
#> Chain 2: 
#> Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 2: 
#> Chain 2:  Elapsed Time: 0.284584 seconds (Warm-up)
#> Chain 2:                0.344625 seconds (Sampling)
#> Chain 2:                0.629209 seconds (Total)
#> Chain 2: 
#> 
#> SAMPLING FOR MODEL 'bt' NOW (CHAIN 3).
#> Chain 3: 
#> Chain 3: Gradient evaluation took 4.1e-05 seconds
#> Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.41 seconds.
#> Chain 3: Adjust your expectations accordingly!
#> Chain 3: 
#> Chain 3: 
#> Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 3: 
#> Chain 3:  Elapsed Time: 0.305793 seconds (Warm-up)
#> Chain 3:                0.292243 seconds (Sampling)
#> Chain 3:                0.598036 seconds (Total)
#> Chain 3: 
#> 
#> SAMPLING FOR MODEL 'bt' NOW (CHAIN 4).
#> Chain 4: 
#> Chain 4: Gradient evaluation took 4.2e-05 seconds
#> Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.42 seconds.
#> Chain 4: Adjust your expectations accordingly!
#> Chain 4: 
#> Chain 4: 
#> Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 4: 
#> Chain 4:  Elapsed Time: 0.30643 seconds (Warm-up)
#> Chain 4:                0.300302 seconds (Sampling)
#> Chain 4:                0.606732 seconds (Total)
#> Chain 4:

If rstan is available and correctly working this function should sample the posterior distribution and create a bpc object.

To see a summary of the results we can run the summary function. Here we get three tables:

summary(m)
#> Estimated baseline parameters with HPD intervals:
#> 
#> 
#> Parameter               Mean   HPD_lower   HPD_higher    n_eff   Rhat
#> --------------------  ------  ----------  -----------  -------  -----
#> lambda[Seles]           0.51       -2.37         3.34   663.50   1.01
#> lambda[Graf]            0.94       -1.85         3.71   622.34   1.01
#> lambda[Sabatini]       -0.33       -3.12         2.50   662.40   1.01
#> lambda[Navratilova]     0.04       -2.87         2.91   637.48   1.01
#> lambda[Sanchez]        -1.12       -4.04         1.64   670.86   1.01
#> NOTES:
#> * A higher lambda indicates a higher team ability
#> 
#> 
#> Posterior probabilities:
#> These probabilities are calculated from the predictive posterior distribution
#> for all player combinations
#> 
#> 
#> i             j              i_beats_j
#> ------------  ------------  ----------
#> Graf          Navratilova         0.70
#> Graf          Sabatini            0.77
#> Graf          Sanchez             0.86
#> Graf          Seles               0.60
#> Navratilova   Sabatini            0.58
#> Navratilova   Sanchez             0.74
#> Navratilova   Seles               0.41
#> Sabatini      Sanchez             0.66
#> Sabatini      Seles               0.32
#> Sanchez       Seles               0.20
#> 
#> 
#> Rank of the players' abilities:
#> The rank is based on the posterior rank distribution of the lambda parameter
#> 
#> 
#> Parameter              MedianRank   MeanRank   StdRank
#> --------------------  -----------  ---------  --------
#> lambda[Graf]                    1       1.42      0.64
#> lambda[Seles]                   2       2.08      0.88
#> lambda[Navratilova]             3       3.04      0.91
#> lambda[Sabatini]                4       3.65      0.83
#> lambda[Sanchez]                 5       4.81      0.48

Features of the bpcs package

Models available

Roadmap

Goals for bpcs 1.1.0 (Before June 2021)

Vignettes

This package provides a series of small and self contained vignettes that exemplify the use of each model. In the vignettes, we also provide examples of code for data transformation, tables and plots.

Contributing and bugs

If you are interested you are welcome to contribute to the repository through pull requests.

Icon credits

References

Agresti, Alan. 2003. Categorical Data Analysis. Vol. 482. John Wiley & Sons.

Böckenholt, Ulf. 2001. “Hierarchical Modeling of Paired Comparison Data.” Psychological Methods 6 (1): 49.

Bradley, Ralph Allan, and Milton E Terry. 1952. “Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons.” Biometrika 39 (3/4): 324–45.

Davidson, Roger R. 1970. “On Extending the Bradley-Terry Model to Accommodate Ties in Paired Comparison Experiments.” Journal of the American Statistical Association 65 (329): 317–28.

Davidson, Roger R, and Robert J Beaver. 1977. “On Extending the Bradley-Terry Model to Incorporate Within-Pair Order Effects.” Biometrics, 693–702.

Springall, A. 1973. “Response Surface Fitting Using a Generalization of the Bradley-Terry Paired Comparison Model.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 22 (1): 59–68.

Stan Development Team. 2020. “RStan: The R Interface to Stan.” https://mc-stan.org/.

bpcs - A package for Bayesian Paired Comparison analysis with Stan