The package zoolog includes functions and reference data to generate and manipulate log-ratios (also known as log size index (LSI) values) from measurements obtained on zooarchaeological material. Log ratios are used to compare the relative (rather than the absolute) dimensions of animals from archaeological contexts (Meadow 1999). Essentially, the method compares archaeological measurements to a standard, producing a value that indicates how much larger or smaller the archaeological specimen is compared to that standard. zoolog is also able to seamlessly integrate data and references with heterogeneous nomenclature, which is internally managed by a zoolog thesaurus.
The methods included in the package were first developed in the framework of the ERC-Starting Grant 716298 ZooMWest (PI S. Valenzuela-Lamas), and were first used in the paper (Trentacoste, Nieto-Espinet, and Valenzuela-Lamas 2018). They are based on the techniques proposed by Simpson (1941) and Simpson, Roe, and Lewontin (1960), which calculates log size index (LSI) values as: \[ \mbox{LSI} = \log_{10} x - \log_{10} x_{\text{ref}} = \log_{10}(x/x_\text{ref}), \] where \(x\) is the considered measure value and \(x_\text{ref}\) is the corresponding reference value. Observe that LSI is defined using logarithms with base 10.
Several different sets of standard reference values are included in the package. These standards include several published and widely used biometric datasets (e.g. Davis (1996), Albarella and Payne (2005)) as well as other less known standards. These references, as well as the data example provided with the package, are based on the measures and measure abbreviations defined in Von den Driesch (1976) and Davis (1992).
In general, zooarchaeological datasets are composed of skeletal remains representing many different anatomical body parts. In investigation of animal size, the analysis of measurements from a given anatomical element provides the best control for the variables affecting size and shape and, as such, it is the preferable option. Unfortunately, this approach is not always viable due to low sample sizes in some archaeological assemblages. This problem can be mitigated by calculating the LSI values for measurements with respect to a reference, which provides a means of aggregating biometric information from different body parts. The resulting log ratios can be compared and statistically analysed under reasonable conditions (Albarella 2002). However, length, width, and depth measurements of the anatomical elements still should not be directly aggregated for statistical analysis.
The package includes a zoolog thesaurus to facilitate its usage by research teams across the globe, and working in different languages and with different recording traditions. The thesaurus enables the zoolog package to recognises many different names for taxa and skeletal elements (e.g. “Bos taurus”, “cattle”, “BT”, “bovino”, “bota”). Consequently, there is no need to use a particular, standardised recording code for the names of different taxa or elements.
The package also includes a zoolog taxonomy to facilitate the management of cases recorded as identified at different taxonomic ranks (Species, Genus, Tribe) and their match with the corresponding references.
We are particularly grateful to Sabine Deschler-Erb and Barbara Stopp, from the University of Basel (Switzerland) for making the reference values of several specimens available through the ICAZ Roman Period Working Group, which have been included here with their permission. We also thank Francesca Slim and Dimitris Filioglou from the University of Groningen, Claudia Minniti from University of Salento, Sierra Harding and Nimrod Marom from the University of Haifa, Carly Ameen and Helene Benkert from the University of Exeter, and Mikolaj Lisowski from the University of York for providing additional reference sets. Allowen Evin (CNRS-ISEM Montpellier) saw potential pitfalls in the use of Davis’ references for sheep, which have been now solved.
The thesaurus has benefited from the contributions from Moussab Albesso, Canan Çakirlar, Jwana Chahoud, Jacopo De Grossi Mazzorin, Sabine Deschler-Erb, Dimitrios Filioglou, Armelle Gardeisen, Sierra Harding, Pilar Iborra, Michael MacKinnon, Nimrod Marom, Claudia Minniti, Francesca Slim, Barbara Stopp, and Emmanuelle Vila.
We are grateful to all of them for their contributions, comments, and help. In addition, users are encouraged to contribute to the thesaurus and other references so that zoolog can be expanded and adapted to any database.
You can install the released version of zoolog from CRAN with:
And the development version from GitHub with:
The package zoolog includes several osteometrical references. Currently, the references include reference values for the main domesticates and their agriotypes (Bos, Ovis, Capra, Sus), red deer (Cervus elaphus), fallow deer (Dama mesopotamica), gazelle (Gazella gazella), donkey (Equus asinus), horse (Equus caballus) and rabbit (Oryctolagus cuniculus). These are drawn from a variety of publications and resources (see below). In addition, the user can consider other references, or the provided references can be extended and updated integrating newer research data. Submission of extended/improved references is encouraged. Please, contact the maintainer through the provided email address to make the new reference fully accessible within the package.
These references, as well as the data example provided with the package, are based on the measurements and measure abbreviations defined in Von den Driesch (1976) and Davis (1992). Please, note that Davis’ standard SD, equivalent to Von den Driesch’s DD, has been denoted as Davis.SD in order to resolve its incompatibility with Von den Driesch’s SD. This affects only Davis’ sheep reference and was introduced in Release 1.0.0.
The predefined reference sets included in zoolog are provided in the named list reference
, currently comprising the following 4 sets:
library(zoolog)
str(reference, max.level = 1)
#> List of 4
#> $ NietoDavisAlbarella:'data.frame': 133 obs. of 4 variables:
#> $ Basel :'data.frame': 176 obs. of 4 variables:
#> $ Combi :'data.frame': 558 obs. of 4 variables:
#> $ Groningen :'data.frame': 246 obs. of 4 variables:
The reference set reference$Combi
is the default reference for computing the log ratios, since it includes the most comprehensive reference for each species.
The package also includes a referencesDatabase
collecting the taxon-specific reference standards from all the considered resources. Each reference set is composed of a different combination of taxon-specific standards selected from this referencesDatabase
. The selection is defined by the data frame referenceSets
:
Bos taurus | Bos primigenius | Ovis aries | Ovis orientalis | Capra hircus | Capra aegagrus | Sus domesticus | Sus scrofa | Cervus elaphus | Dama mesopotamica | Gazella gazella | Equus asinus | Equus caballus | Oryctolagus cuniculus | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NietoDavisAlbarella | Nieto | Davis | Albarella | |||||||||||
Basel | Basel | Basel | Basel | Basel | Basel | |||||||||
Combi | Nieto | Clutton | Clutton | Basel | Basel | Haifa | Haifa | Haifa | Johnstone | Nottingham | ||||
Groningen | Degerbol | Uerpmann | Uerpmann | Hongo |
A detailed description of the reference data, including its structure, properties, and considered resources can be found in the ReferencesDatabase help page.
A thesaurus set is defined in order to make the package compatible with the different recording conventions and languages used by authors of zooarchaeological datasets. This enables the function LogRatios to match values in the user’s dataset with the corresponding ones in the reference standard, regardless of differences in nomenclature or naming conventions, as long as both terms are included in the relevant thesaurus. The thesaurus also allows the user to standardize the nomenclature of the dataset if desired.
The user can also use other thesaurus sets or modify the provided one. In this latter case, we encourage the user to contact the maintainer at the provided email address so that the additions can be incorporated into the new versions of the package.
Currently, the zoolog thesaurus set includes four thesauri:
A taxonomy for the most typical zooarchaeological taxa has been introduced in the zoolog major release 1.0.0. The taxonomic hierarchy is structured up to the family level:
Species | Genus | Tribe | Subfamily | Family |
---|---|---|---|---|
Bos taurus | Bos | Bovini | Bovinae | Bovidae |
Bos primigenius | Bos | Bovini | Bovinae | Bovidae |
Ovis aries | Ovis | Caprini | Caprinae | Bovidae |
Ovis orientalis | Ovis | Caprini | Caprinae | Bovidae |
Capra hircus | Capra | Caprini | Caprinae | Bovidae |
Capra aegagrus | Capra | Caprini | Caprinae | Bovidae |
Gazella gazella | Gazella | Antilopini | Antilopinae | Bovidae |
Sus domesticus | Sus | Suini | Suinae | Suidae |
Sus scrofa | Sus | Suini | Suinae | Suidae |
Cervus elaphus | Cervus | Cervini | Cervinae | Cervidae |
Dama mesopotamica | Dama | Cervini | Cervinae | Cervidae |
Equus asinus | Equus | Equini | Equinae | Equidae |
Equus caballus | Equus | Equini | Equinae | Equidae |
Oryctolagus cuniculus | Oryctolagus | Leporidae |
This taxonomy is intended to facilitate the management of cases recorded as identified at different taxonomic ranks and their match with the corresponding references.
The taxonomy is integrated in the function LogRatios, enabling it to automatically match different species in data and reference that are under the same genus. For instance, data of Bos taurus can be matched with reference of Bos primigenius, since both are Bos. It also enables the function LogRatios
to detect when a case taxon has been only partially identified and recorded at a higher taxonomic rank, such as tribe or family, and to suggest the user the set of possible reference species.
Besides, it is complemented with a series of functions enabling the user to query for the subtaxonomy or the set of species under a queried taxon at any taxonomic rank.
The full list of functions is available under the zoolog help page. We list them here sorted by their prominence for a typical user, and grouped by functionality:
reference$Combi
is used. The function includes the option ‘joinCategories’ allowing several taxa (typically Ovis, Capra, and unknown Ovis/Capra) to be considered together with the same reference taxon.
reference$NietoDavisAlbarella
, log ratios for goats will not be calculated unless ‘joinCategories’ is set to indicate that the Ovis aries standard should also be applied to goats.
CondenseLogs
employs the following by-default group and prioritization introduced in Trentacoste, Nieto-Espinet, and Valenzuela-Lamas (2018): Length considers in order of priority GL, GLl, GLm, and HTC. Width considers in order of priority BT, Bd, Bp, SD, Bfd, and Bfp. Depth considers in order of priority Dd, DD, BG, and Dp. This order maximises the robustness and reliability of the measurements, as priority is given to the most abundant, more replicable, and less age dependent measurements. But users can set their own features with any group of measures and priorities. The method “average” extracts the mean per group, ignoring the non-available log ratios.
InCategory
checks for equality assuming the equivalencies defined in the given thesaurus. It is intended for the user to easily select a subset of data without having to standardize the analysed dataset.
This includes two functions enabling the user to map data with heterogeneous nomenclature into a standard one as defined in a thesaurus:
StandardizeNomenclature
standardizes a character vector according to a given thesaurus.StandardizeDataSet
standardizes column names and values of a data frame according to a thesaurus set.This includes three functions enabling the user to query for some information on the subtaxonomy under a queried taxon at any taxonomic rank:
Subtaxonomy
provides the subtaxonomy dataframe collecting the species (rows) included in the queried taxon, and the taxonomic ranks (columns) up to its level.SubtaxonomySet
provides the set (character vector without repetions) of taxa, at any taxonomic rank, under the queried taxon.GetSpeciesIn
provides the set of species included in the queried taxon.By default, the subtaxonomy information is extracted from the zoolog taxonomy.
referencesDataSet
or in any other provided by the user.
ReadThesaurus
and WriteThesaurus
) and a thesaurus set (ReadThesaurusSet
and WriteThesaurusSet
).
This includes functions to modify and check thesauri:
NewThesaurus
generates an empty thesarus.AddToThesaurus
adds new names and categories to an existing thesaurus.RemoveRepeatedNames
cleans a thesarus from any repeated names on any category.ThesaurusAmbiguity
checks if there are names included in more than one category in a thesaurus.The following examples are designed to be read and run sequentially. They represent a possible pipeline, meaningful for the processing and analysis of a dataset. Only occasionally, a small diversion is included to illustrate some alternatives.
This example reads a dataset from a file in csv format and computes the log-ratios. Then, the cases with no available log-ratios are removed. Finally, the resulting dataset is saved in a file in csv format.
The first step is to set the local path to the folder where you have the dataset to be analysed (this is typically a comma-separated value (csv) file). Here the example dataset from Valenzuela-Lamas (2008) included in the package is used:
library(zoolog)
dataFile <- system.file("extdata", "dataValenzuelaLamas2008.csv.gz",
package = "zoolog")
data = read.csv2(dataFile,
quote = "\"", header = TRUE, na.strings = "",
encoding = "UTF-8")
knitr::kable(head(data)[, -c(6:20,32:64)])
Site | N.inv | UE | Especie | Os | GL | Bp | Dp | SD | DD | Bd | Dd | BT | GLc | BFd | Dl |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ALP | 4918 | 10364 | bota | 1 fal | 54.0 | 31.3 | 30.6 | 28.1 | 26.3 | 27.5 | 20.0 | ||||
ALP | 4919 | 10364 | bota | 1 fal | 54.5 | 27.9 | 31.8 | 26.0 | 22.8 | 25.3 | 19.5 | ||||
ALP | 3453 | 10410 | ovar | 1fal ant | 27.1 | 9.9 | 12.3 | 17.9 | 9.0 | 9.0 | |||||
ALP | 3455 | 10410 | ovar | 1fal ant | 27.6 | 9.6 | 12.2 | 7.6 | 8.9 | 8.3 | |||||
ALP | 4245 | 7036 | cahi | hum | 128.3 | 12.9 | 27.4 | 26.6 | 23.6 | ||||||
ALP | 4674 | 10227 | cahi | hum | 26.0 | 25.7 | 22.3 |
To enhance the visibility, we have shown only the most relevant columns.
We now calculate the log-ratios using the function LogRatios
. Only measurements that have an associated standard will be included in this calculation. The log values will appear as new columns with the prefix ‘log’ following the original columns with the raw measurements:
dataWithLog <- LogRatios(data)
#> Warning: Reference for Sus scrofa used for cases of Sus domesticus.
#> Reference for Sus scrofa used for cases of Sus.
#> Reference for Oryctolagus cuniculus used for cases of Oryctolagus.
#> Reference for Ovis aries used for cases of Ovis.
#> Set useGenusIfUnambiguous to FALSE if this behaviour is not desired.
#> Warning: Data includes some cases recorded as
#> * Equus (which is a Genus)
#> for which the reference for Equus asinus or Equus caballus could be used.
#> * Caprini (which is a Tribe)
#> for which the reference for Ovis aries or Capra hircus could be used.
#> Set joinCategories as appropriate if you want to use any of them.
knitr::kable(head(dataWithLog)[, -c(6:20,32:64)])
Site | N.inv | UE | Especie | Os | GL | Bp | Dp | SD | DD | Bd | Dd | BT | GLc | BFd | Dl | logGL | logBp | logDp | logSD | logBd | logDd | logBT | logGLc | logBFd | logDl | logGB | logSLC | logGLP | logBG | logLG | logDPA | logBPC | logLA | logLAR | logSH | logSB | logL | logH |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ALP | 4918 | 10364 | bota | 1 fal | 54.0 | 31.3 | 30.6 | 28.1 | 26.3 | 27.5 | 20.0 | |||||||||||||||||||||||||||
ALP | 4919 | 10364 | bota | 1 fal | 54.5 | 27.9 | 31.8 | 26.0 | 22.8 | 25.3 | 19.5 | |||||||||||||||||||||||||||
ALP | 3453 | 10410 | ovar | 1fal ant | 27.1 | 9.9 | 12.3 | 17.9 | 9.0 | 9.0 | -0.0799118 | -0.0726593 | 0.2629585 | -0.0891198 | ||||||||||||||||||||||||
ALP | 3455 | 10410 | ovar | 1fal ant | 27.6 | 9.6 | 12.2 | 7.6 | 8.9 | 8.3 | -0.0932757 | -0.0762046 | -0.1090810 | -0.1242842 | ||||||||||||||||||||||||
ALP | 4245 | 7036 | cahi | hum | 128.3 | 12.9 | 27.4 | 26.6 | 23.6 | 0.4016796 | -0.2116296 | -0.1513050 | -0.0678788 | |||||||||||||||||||||||||
ALP | 4674 | 10227 | cahi | hum | 26.0 | 25.7 | 22.3 | -0.1740822 | -0.0828273 |
Observe that LogRatios
has output two warnings above. The first one informs the user that the data includes some cases of recorded taxa that did not match any species in the reference, but was matched by genus. In many cases this can be the behaviour expected by the user. For instance, using the reference of Sus scrofa for computing the log ratios of cases of Sus domesticus, can be acceptable if no better reference is available or if both (sub)species are being compared. The warning also includes cases recorded by genus, a recording practice that is understandable when working only with one species in the genus, such as Oryctolagus or Ovis above. However, the warning also informs the user that this matching by genus can be suppressed by setting the parameter useGenusIfUnambiguous = FALSE
. This can be the case if the user wants to exclude from the analysis the cases with not matching species. Besides, this warning could make the user realize that the wrong reference was set by mistake.
The second warning informs the user that the data includes some cases recorded with a taxon not specifying a species or genus, but at a higher taxonomical rank (for instance undecided ovis/capra (equivalent to tribe Caprini)). It also informs of the cases recorded by the genus but for which the reference includes more than one species (as happens with Equus above). In addition, the user is remembered of the option to use the parameter joinCategories
to indicate which reference species should be used for them.
These relationships between taxonomical categories is possible thanks to the taxonomy included in the package. The user can include their own taxonomy if desired.
In the following examples, these warnings are suppressed unless some particular message is of interest for them.
If we observe the example dataset more carefully, we can see that the measures recorded for the astragali presents a deviation from the measure definitions in Von den Driesch (1976) and Davis (1992). To see this, we can select the cases where the element is an astragalus. The function InCategory allows us to select them with the help of the thesaurus without requiring to know the terms actually used:
AScases <- InCategory(dataWithLog$Os, "astragalus", zoologThesaurus$element)
knitr::kable(head(dataWithLog[AScases, -c(6:20,32:64)]))
Site | N.inv | UE | Especie | Os | GL | Bp | Dp | SD | DD | Bd | Dd | BT | GLc | BFd | Dl | logGL | logBp | logDp | logSD | logBd | logDd | logBT | logGLc | logBFd | logDl | logGB | logSLC | logGLP | logBG | logLG | logDPA | logBPC | logLA | logLAR | logSH | logSB | logL | logH | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7503 | ALP | 4410 | 8001 | bota | talus | 58.4 | 31.5 | 31.4 | -0.0927541 | ||||||||||||||||||||||||||||||
7504 | ALP | 1181 | 7011 | bota | talus | 57.7 | 35.7 | 32.8 | -0.0383964 | ||||||||||||||||||||||||||||||
7505 | ALP | 1180 | 7011 | bota | talus | 58.8 | |||||||||||||||||||||||||||||||||
7506 | ALP | 1136 | 7039 | bota | talus | 58.9 | 39.3 | 34.4 | 0.0033279 | ||||||||||||||||||||||||||||||
7507 | ALP | 1678 | 7092 | bota | talus | 59.4 | 39.8 | 34.9 | 0.0088185 | ||||||||||||||||||||||||||||||
7508 | TFC | 617 | 1061 | bota | talus | 56.6 | 34.0 | 32.3 | -0.0595857 |
For the involved taxa, according to the measure definitions, astragali should have no GL measurement, but GLl. However, in the example dataset the GLl measurements have been recorded merged in the GL column. This is a data-entry simplification that is used by some researchers. It is possible because GLl is only relevant for the astragalus, while GL is not applicable to it. Thus, there cannot be any ambiguity between both measures since they can be identified by the bone element. However, since the zoolog reference uses the proper measure name for each bone element (GLl for the astragalus), the reference measure has not been correctly identified. Consequently, the log ratio logGL has NA
values and the column logGLl does not exists.
The same effect happens for the measure GLpe, only relevant for the phalanges.
The optional parameter mergedMeasures facilitates the processing of this type of simplified datasets. For the example data, we can use
GLVariants <- list(c("GL", "GLl", "GLpe"))
dataWithLog <- LogRatios(data, mergedMeasures = GLVariants)
knitr::kable(head(dataWithLog[AScases, -c(6:20,32:64)]))
Site | N.inv | UE | Especie | Os | GL | Bp | Dp | SD | DD | Bd | Dd | BT | GLc | BFd | Dl | logGL | logBp | logDp | logSD | logBd | logDd | logBT | logGLc | logBFd | logDl | logGB | logSLC | logGLP | logBG | logLG | logDPA | logBPC | logLA | logLAR | logSH | logSB | logL | logH | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7503 | ALP | 4410 | 8001 | bota | talus | 58.4 | 31.5 | 31.4 | -0.0259788 | -0.0927541 | |||||||||||||||||||||||||||||
7504 | ALP | 1181 | 7011 | bota | talus | 57.7 | 35.7 | 32.8 | -0.0312159 | -0.0383964 | |||||||||||||||||||||||||||||
7505 | ALP | 1180 | 7011 | bota | talus | 58.8 | -0.0230144 | ||||||||||||||||||||||||||||||||
7506 | ALP | 1136 | 7039 | bota | talus | 58.9 | 39.3 | 34.4 | -0.0222764 | 0.0033279 | |||||||||||||||||||||||||||||
7507 | ALP | 1678 | 7092 | bota | talus | 59.4 | 39.8 | 34.9 | -0.0186052 | 0.0088185 | |||||||||||||||||||||||||||||
7508 | TFC | 617 | 1061 | bota | talus | 56.6 | 34.0 | 32.3 | -0.0395753 | -0.0595857 |
This option allows us to automatically select, for each bone element, the corresponding measure present in the reference. Observe that now the log ratios have been computed and assigned to the column logGL.
We could be interested in obtaining the log ratios of all caprines, including Ovis aries, Capra hircus, and undetermined Ovis/Capra, with respect to the reference for Ovis aries. This can be set using the argument joinCategories
.
caprineCategory <- list(ovar = SubtaxonomySet("caprine"))
dataWithLog <- LogRatios(data, joinCategories = caprineCategory, mergedMeasures = GLVariants)
knitr::kable(head(dataWithLog)[, -c(6:20,32:64)])
Site | N.inv | UE | Especie | Os | GL | Bp | Dp | SD | DD | Bd | Dd | BT | GLc | BFd | Dl | logGL | logBp | logDp | logSD | logBd | logDd | logBT | logGLc | logBFd | logDl | logGB | logSLC | logGLP | logBG | logLG | logDPA | logBPC | logLA | logLAR | logSH | logSB | logL | logH |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ALP | 4918 | 10364 | bota | 1 fal | 54.0 | 31.3 | 30.6 | 28.1 | 26.3 | 27.5 | 20.0 | |||||||||||||||||||||||||||
ALP | 4919 | 10364 | bota | 1 fal | 54.5 | 27.9 | 31.8 | 26.0 | 22.8 | 25.3 | 19.5 | |||||||||||||||||||||||||||
ALP | 3453 | 10410 | ovar | 1fal ant | 27.1 | 9.9 | 12.3 | 17.9 | 9.0 | 9.0 | -0.1078605 | -0.0799118 | -0.0726593 | 0.2629585 | -0.0891198 | |||||||||||||||||||||||
ALP | 3455 | 10410 | ovar | 1fal ant | 27.6 | 9.6 | 12.2 | 7.6 | 8.9 | 8.3 | -0.0999207 | -0.0932757 | -0.0762046 | -0.1090810 | -0.1242842 | |||||||||||||||||||||||
ALP | 4245 | 7036 | cahi | hum | 128.3 | 12.9 | 27.4 | 26.6 | 23.6 | 0.5316550 | -0.0299183 | -0.0189191 | 0.0490352 | -0.0627731 | ||||||||||||||||||||||||
ALP | 4674 | 10227 | cahi | hum | 26.0 | 25.7 | 22.3 | -0.0416963 | 0.0340867 | -0.0873803 |
The category to join can be manually defined, but here we have conveniently used the function Subtaxonomy
applied to the tribe Caprini:
SubtaxonomySet("caprine")
#> [1] "Ovis aries" "Ovis orientalis" "Capra hircus" "Capra aegagrus"
#> [5] "Ovis" "Capra" "Caprini"
Note that this option does not remove the distinction in the data between the different species, it just indicates that for these taxa the log ratios must be computed from the same reference (“ovar”).
The cases without log-ratios can be removed to facilitate subsequent analyses:
dataWithLogPruned=RemoveNACases(dataWithLog)
knitr::kable(head(dataWithLogPruned[, -c(6:20,32:64)]))
Site | N.inv | UE | Especie | Os | GL | Bp | Dp | SD | DD | Bd | Dd | BT | GLc | BFd | Dl | logGL | logBp | logDp | logSD | logBd | logDd | logBT | logGLc | logBFd | logDl | logGB | logSLC | logGLP | logBG | logLG | logDPA | logBPC | logLA | logLAR | logSH | logSB | logL | logH |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ALP | 3453 | 10410 | ovar | 1fal ant | 27.1 | 9.9 | 12.3 | 17.9 | 9.0 | 9.0 | -0.1078605 | -0.0799118 | -0.0726593 | 0.2629585 | -0.0891198 | |||||||||||||||||||||||
ALP | 3455 | 10410 | ovar | 1fal ant | 27.6 | 9.6 | 12.2 | 7.6 | 8.9 | 8.3 | -0.0999207 | -0.0932757 | -0.0762046 | -0.1090810 | -0.1242842 | |||||||||||||||||||||||
ALP | 4245 | 7036 | cahi | hum | 128.3 | 12.9 | 27.4 | 26.6 | 23.6 | 0.5316550 | -0.0299183 | -0.0189191 | 0.0490352 | -0.0627731 | ||||||||||||||||||||||||
ALP | 4674 | 10227 | cahi | hum | 26.0 | 25.7 | 22.3 | -0.0416963 | 0.0340867 | -0.0873803 | ||||||||||||||||||||||||||||
ALP | 4085 | 10253 | cahi | hum | 27.9 | 27.3 | 23.2 | -0.0110654 | 0.0603162 | -0.0701972 | ||||||||||||||||||||||||||||
TFC | 24 | 407 | ceel | mc | 262.7 | 41.3 | 30.8 | 25.0 | 21.2 | 41.1 | 27.1 | -0.0335411 |
You may want to write the resulting file in the working directory (you need to set it first):
After calculating log ratios using the LogRatios
function, many rows in the resultant dataframe (dataWithLog in the example above) may contain multiple log values, i.e. you will have several log values associated with a particular archaeological specimen. When analysing log ratios, it is preferential to avoid over-representation of bones with a greater number of measurements and account for each specimen only once. The CondenseLogs
function extracts one length, one width, and one depth value from each row and places these in new Length, Width, and Depth columns. The ‘priority’ method described in Trentacoste, Nieto-Espinet, and Valenzuela-Lamas (2018) has been set as default. In this case, the default option has been used:
dataWithSummary <- CondenseLogs(dataWithLogPruned)
knitr::kable(head(dataWithSummary)[, -c(6:20,32:64,72:86)])
Site | N.inv | UE | Especie | Os | GL | Bp | Dp | SD | DD | Bd | Dd | BT | GLc | BFd | Dl | logGL | logBp | logDp | logSD | logBd | logDd | logBT | logH | Length | Width | Depth |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ALP | 3453 | 10410 | ovar | 1fal ant | 27.1 | 9.9 | 12.3 | 17.9 | 9.0 | 9.0 | -0.1078605 | -0.0799118 | -0.0726593 | 0.2629585 | -0.0891198 | -0.1078605 | -0.0891198 | -0.0726593 | ||||||||
ALP | 3455 | 10410 | ovar | 1fal ant | 27.6 | 9.6 | 12.2 | 7.6 | 8.9 | 8.3 | -0.0999207 | -0.0932757 | -0.0762046 | -0.1090810 | -0.1242842 | -0.0999207 | -0.1242842 | -0.0762046 | ||||||||
ALP | 4245 | 7036 | cahi | hum | 128.3 | 12.9 | 27.4 | 26.6 | 23.6 | 0.5316550 | -0.0299183 | -0.0189191 | 0.0490352 | -0.0627731 | -0.0627731 | 0.0490352 | ||||||||||
ALP | 4674 | 10227 | cahi | hum | 26.0 | 25.7 | 22.3 | -0.0416963 | 0.0340867 | -0.0873803 | -0.0873803 | 0.0340867 | ||||||||||||||
ALP | 4085 | 10253 | cahi | hum | 27.9 | 27.3 | 23.2 | -0.0110654 | 0.0603162 | -0.0701972 | -0.0701972 | 0.0603162 | ||||||||||||||
TFC | 24 | 407 | ceel | mc | 262.7 | 41.3 | 30.8 | 25.0 | 21.2 | 41.1 | 27.1 | -0.0335411 | -0.0335411 |
Nevertheless, other options (e.g. average of all width log values for a given specimen) can be chosen if preferred.
The integration of the thesaurus functionality facilitates the use of datasets with heterogeneous nomenclatures, without further preprocessing. An extensive catalogue of names for equivalent categories has been integrated in the provided thesaurus set zoologThesaurus
. These equivalences are internally and silently managed without requiring any action from the user. However, it can be also interesting to explicitly standardize the data to make figures legible to a wider audience. This is especially useful when different nomenclature for the same concept is found in the same dataset, for instance “sheep” and “ovis” for the same taxon or “hum” and “HU” for the bone element.
If we standardize the studied data, we can see that zoologThesaurus
will change “ovar” to “Ovis aries”, “hum” to “humerus”, and “Especie” to “Taxon”, for instance.
dataStandardized <- StandardizeDataSet(dataWithSummary)
knitr::kable(head(dataStandardized)[, -c(6:20,32:64,72:86)])
Site | N.inv | UE | Taxon | Element | GL | Bp | Dp | SD | DD | Bd | Dd | BT | GLC | BFd | Dl | logGL | logBp | logDp | logSD | logBd | logDd | logBT | logH | Length | Width | Depth |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ALP | 3453 | 10410 | Ovis aries | anterior first phalanx | 27.1 | 9.9 | 12.3 | 17.9 | 9.0 | 9.0 | -0.1078605 | -0.0799118 | -0.0726593 | 0.2629585 | -0.0891198 | -0.1078605 | -0.0891198 | -0.0726593 | ||||||||
ALP | 3455 | 10410 | Ovis aries | anterior first phalanx | 27.6 | 9.6 | 12.2 | 7.6 | 8.9 | 8.3 | -0.0999207 | -0.0932757 | -0.0762046 | -0.1090810 | -0.1242842 | -0.0999207 | -0.1242842 | -0.0762046 | ||||||||
ALP | 4245 | 7036 | Capra hircus | humerus | 128.3 | 12.9 | 27.4 | 26.6 | 23.6 | 0.5316550 | -0.0299183 | -0.0189191 | 0.0490352 | -0.0627731 | -0.0627731 | 0.0490352 | ||||||||||
ALP | 4674 | 10227 | Capra hircus | humerus | 26.0 | 25.7 | 22.3 | -0.0416963 | 0.0340867 | -0.0873803 | -0.0873803 | 0.0340867 | ||||||||||||||
ALP | 4085 | 10253 | Capra hircus | humerus | 27.9 | 27.3 | 23.2 | -0.0110654 | 0.0603162 | -0.0701972 | -0.0701972 | 0.0603162 | ||||||||||||||
TFC | 24 | 407 | Cervus elaphus | metacarpus | 262.7 | 41.3 | 30.8 | 25.0 | 21.2 | 41.1 | 27.1 | -0.0335411 | -0.0335411 |
We may be interested in selecting all caprine elements. This can be done even without standardizing the data using the function InCategory
:
dataOC <- subset(dataWithSummary, InCategory(Especie,
SubtaxonomySet("caprine"),
zoologThesaurus$taxon))
knitr::kable(head(dataOC)[, -c(6:20,32:64)])
Site | N.inv | UE | Especie | Os | GL | Bp | Dp | SD | DD | Bd | Dd | BT | GLc | BFd | Dl | logGL | logBp | logDp | logSD | logBd | logDd | logBT | logGLc | logBFd | logDl | logGB | logSLC | logGLP | logBG | logLG | logDPA | logBPC | logLA | logLAR | logSH | logSB | logL | logH | Length | Width | Depth | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | ALP | 3453 | 10410 | ovar | 1fal ant | 27.1 | 9.9 | 12.3 | 17.9 | 9.0 | 9.0 | -0.1078605 | -0.0799118 | -0.0726593 | 0.2629585 | -0.0891198 | -0.1078605 | -0.0891198 | -0.0726593 | |||||||||||||||||||||||
2 | ALP | 3455 | 10410 | ovar | 1fal ant | 27.6 | 9.6 | 12.2 | 7.6 | 8.9 | 8.3 | -0.0999207 | -0.0932757 | -0.0762046 | -0.1090810 | -0.1242842 | -0.0999207 | -0.1242842 | -0.0762046 | |||||||||||||||||||||||
3 | ALP | 4245 | 7036 | cahi | hum | 128.3 | 12.9 | 27.4 | 26.6 | 23.6 | 0.5316550 | -0.0299183 | -0.0189191 | 0.0490352 | -0.0627731 | -0.0627731 | 0.0490352 | |||||||||||||||||||||||||
4 | ALP | 4674 | 10227 | cahi | hum | 26.0 | 25.7 | 22.3 | -0.0416963 | 0.0340867 | -0.0873803 | -0.0873803 | 0.0340867 | |||||||||||||||||||||||||||||
5 | ALP | 4085 | 10253 | cahi | hum | 27.9 | 27.3 | 23.2 | -0.0110654 | 0.0603162 | -0.0701972 | -0.0701972 | 0.0603162 | |||||||||||||||||||||||||||||
7 | ALP | 3524 | 10122 | ovar | 1fal ant | 27.7 | 9.2 | 11.5 | 7.5 | 9.4 | 8.1 | -0.0983500 | -0.1117591 | -0.1018666 | -0.1148333 | -0.1348773 | -0.0983500 | -0.1348773 | -0.1018666 |
Observe that no standardization is performed in the output subset. To standardize the subset data, StandardizeDataSet
can be applied either before or after the subsetting.
dataOCStandardized <- StandardizeDataSet(dataOC)
knitr::kable(head(dataOCStandardized)[, -c(6:20,32:64)])
Site | N.inv | UE | Taxon | Element | GL | Bp | Dp | SD | DD | Bd | Dd | BT | GLC | BFd | Dl | logGL | logBp | logDp | logSD | logBd | logDd | logBT | logGLc | logBFd | logDl | logGB | logSLC | logGLP | logBG | logLG | logDPA | logBPC | logLA | logLAR | logSH | logSB | logL | logH | Length | Width | Depth | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | ALP | 3453 | 10410 | Ovis aries | anterior first phalanx | 27.1 | 9.9 | 12.3 | 17.9 | 9.0 | 9.0 | -0.1078605 | -0.0799118 | -0.0726593 | 0.2629585 | -0.0891198 | -0.1078605 | -0.0891198 | -0.0726593 | |||||||||||||||||||||||
2 | ALP | 3455 | 10410 | Ovis aries | anterior first phalanx | 27.6 | 9.6 | 12.2 | 7.6 | 8.9 | 8.3 | -0.0999207 | -0.0932757 | -0.0762046 | -0.1090810 | -0.1242842 | -0.0999207 | -0.1242842 | -0.0762046 | |||||||||||||||||||||||
3 | ALP | 4245 | 7036 | Capra hircus | humerus | 128.3 | 12.9 | 27.4 | 26.6 | 23.6 | 0.5316550 | -0.0299183 | -0.0189191 | 0.0490352 | -0.0627731 | -0.0627731 | 0.0490352 | |||||||||||||||||||||||||
4 | ALP | 4674 | 10227 | Capra hircus | humerus | 26.0 | 25.7 | 22.3 | -0.0416963 | 0.0340867 | -0.0873803 | -0.0873803 | 0.0340867 | |||||||||||||||||||||||||||||
5 | ALP | 4085 | 10253 | Capra hircus | humerus | 27.9 | 27.3 | 23.2 | -0.0110654 | 0.0603162 | -0.0701972 | -0.0701972 | 0.0603162 | |||||||||||||||||||||||||||||
7 | ALP | 3524 | 10122 | Ovis aries | anterior first phalanx | 27.7 | 9.2 | 11.5 | 7.5 | 9.4 | 8.1 | -0.0983500 | -0.1117591 | -0.1018666 | -0.1148333 | -0.1348773 | -0.0983500 | -0.1348773 | -0.1018666 |
Observe also that the distinction between Ovis aries, Capra hircus, and Ovis/Capra has not been removed from the data.
If we were interested only in one summary measure, Width or Length, we could retain the cases including this measure:
dataOCWithWidth <- RemoveNACases(dataOCStandardized, measureNames = "Width")
dataOCWithLength <- RemoveNACases(dataOCStandardized, measureNames = "Length")
which gives respectively 237 (=nrow(dataOCWithWidth)
) and 149 (=nrow(dataOCWithLength)
) cases.
Condensed log values can be visualised as histograms and box plots using ggplot (Wickham 2011). Here we will look at some examples of plotting values from caprines.
For the example plots we will use the package ggplot2.
We can now create a boxplot for the widths:
ggplot(dataOCStandardized, aes(x = Site, y = Width)) +
geom_boxplot(outlier.shape = NA, na.rm = TRUE) +
geom_jitter(width = 0.2, height = 0, alpha = 1/2, color = 4, na.rm = TRUE) +
theme_bw() +
ggtitle("Caprine widths") +
ylab("Width log-ratio") +
coord_flip()
And another boxplot for the lengths:
We may choose to plot the width data as a histogram:
ggplot(dataOCStandardized, aes(Width)) +
geom_histogram(bins = 30, na.rm = TRUE) +
ggtitle("Caprine widths") +
xlab("Width log-ratio") +
facet_grid(Site ~.) +
theme_bw() +
theme(panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank()) +
theme(plot.title = element_text(hjust = 0.5, size = 14),
axis.title.x = element_text(size = 10),
axis.title.y = element_text(size = 10),
axis.text = element_text(size = 10) ) +
scale_y_continuous(breaks = c(0, 10, 20, 30))
Here we reorder the factor levels of dataOCStandardized$Taxon
to make the order of the boxplots more intuitive.
dataOCStandardized$Taxon <- factor(dataOCStandardized$Taxon,
levels = levels0[c(1,3,2)])
levels(dataOCStandardized$Taxon)
#> [1] "Ovis aries" "Caprini" "Capra hircus"
and assign specific colours for each category:
Ocolour <- c("#A2A475", "#D8B70A", "#81A88D")
ggplot(dataOCStandardized, aes(x=Site, y=Width)) +
geom_boxplot(aes(fill=Taxon),
notch = TRUE, alpha = 0, lwd = 0.377, outlier.alpha = 0,
width = 0.5, na.rm = TRUE,
position = position_dodge(0.75),
show.legend = FALSE) +
geom_point(aes(colour = Taxon, shape = Taxon),
alpha = 0.7, size = 0.8,
position = position_jitterdodge(jitter.width = 0.3),
na.rm = TRUE) +
scale_colour_manual(values=Ocolour) +
scale_shape_manual(values=c(15, 18, 16)) +
theme_bw(base_size = 8) +
ylab("LSI value") +
ggtitle("Sheep/goat LSI width values")
We may run a statistical test (here a Student t-test) to check whether the differences in log ratio length values between the sites “OLD” and “ALP” are statistically significant:
t.test(Length ~ Site, dataOCStandardized,
subset = Site %in% c("OLD", "ALP"),
na.action = "na.omit")
#>
#> Welch Two Sample t-test
#>
#> data: Length by Site
#> t = -12.23, df = 4.2233, p-value = 0.0001865
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -0.11262797 -0.07165242
#> sample estimates:
#> mean in group ALP mean in group OLD
#> -0.04589475 0.04624545
Similarly, for the differences in log ratio width values:
t.test(Width ~ Site, dataOCStandardized,
subset = Site %in% c("OLD", "ALP"),
na.action = "na.omit")
#>
#> Welch Two Sample t-test
#>
#> data: Width by Site
#> t = -3.0613, df = 8.8203, p-value = 0.01386
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -0.09749890 -0.01448767
#> sample estimates:
#> mean in group ALP mean in group OLD
#> -0.03922705 0.01676624
For testing all possible pairs of sites, the p-values must be adjusted for multiple comparisons:
library(stats)
pairwise.t.test(dataOCStandardized$Width, dataOCStandardized$Site,
pool.sd = FALSE)
#>
#> Pairwise comparisons using t tests with non-pooled SD
#>
#> data: dataOCStandardized$Width and dataOCStandardized$Site
#>
#> ALP OLD
#> OLD 0.042 -
#> TFC 0.450 0.051
#>
#> P value adjustment method: holm
We have invested a lot of time and effort in creating zoolog. Please cite the package if you publish an analysis or results obtained using zoolog. For example, “Log ratios calculation and analysis was done using the package zoolog (Pozo et al. 2022) in R 4.0.3 (R Core Team 2020).”
To get the details of the most recent version of the package, you can use the R citation function: citation("zoolog")
.
When publishing it is also recommended that the references for standards are properly cited, as well as any details on selecting feature values. The details on the source of each of the reference standards are given in the help page referencesDatabase. For instance, if using the zoolog reference$NietoDavisAlbarella
and the default selection method of features, a fair description may be: “Published references for cattle (Nieto-Espinet 2018), sheep/goat (Davis 1996) and pigs (Albarella and Payne 2005) were used as standards. One length and one width log ratio value from each specimen were included in the analysis, with values selected following the default zoolog ‘priority’ method (Trentacoste, Nieto-Espinet, and Valenzuela-Lamas 2018): length values - GL, GLl, GLm, HTC; width values - BT, Bd, Bp, SD, Bfd, Bfp.”
Albarella, Umberto. 2002. “’Size Matters’: How and Why Biometry Is Still Important in Zooarchaeology.” In Bones and the Man: Studies in Honour of Don Brothwell, edited by Keith Dobney and Terry O’Connor, 51–62. Oxbow Books, Oxford.
Albarella, Umberto, and Sebastian Payne. 2005. “Neolithic Pigs from Durrington Walls, Wiltshire, England: A Biometrical Database.” Journal of Archaeological Science 32 (4): 589–99.
Davis, Simon JM. 1996. “Measurements of a Group of Adult Female Shetland Sheep Skeletons from a Single Flock: A Baseline for Zooarchaeologists.” Journal of Archaeological Science 23 (4): 593–612.
Davis, SJ. 1992. “A Rapid Method for Recording Information About Mammal Bones from Archaeological Sites (Aml Report 19/92; English Heritage).” London.
Meadow, Richard H. 1999. “The Use of Size Index Scaling Techniques for Research on Archaeozoological Collections from the Middle East.” In Historia Animalium Ex Ossibus: Festschrift Für Angela von Den Driesch, edited by C Becker, H Manhart, J Peters, and J Schibler, 285–300. Verlag Marie Leidorf GmbH, Rahden.
Nieto-Espinet, A. 2018. “Element Measure Standard Biometrical Data from a Cow Dated to the Early Bronze Age (Minferri, Catalonia).” https://doi.org/10.13140/RG.2.2.13512.78081.
Pozo, Jose M, Angela Trentacoste, Ariadna Nieto-Espinet, Silvia Guimarães, and Silvia Valenzuela-Lamas. 2022. “Zoolog R Package: Zooarchaeological Analysis with Log-Ratios.” Quaternary International Submitted. https://josempozo.github.io/zoolog/.
R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Simpson, George Gaylord. 1941. “Large Pleistocene Felines of North America.” American Museum Novitates 1136: 1–27.
Simpson, GG, A Roe, and RC Lewontin. 1960. “Quantitative Zoology.,(Harcourt, Brace and Co.: New York.).”
Trentacoste, Angela, Ariadna Nieto-Espinet, and Silvia Valenzuela-Lamas. 2018. “Pre-Roman Improvements to Agricultural Production: Evidence from Livestock Husbandry in Late Prehistoric Italy.” PloS One 13 (12): e0208109. https://doi.org/10.1371/journal.pone.0208109.
Valenzuela-Lamas, Silvia. 2008. Alimentació I Ramaderia Al Penedès Durant La Protohistòria (Segles Vii-Iii aC). Societat Catalana d’Arqueologia (Premi d’Arqueologia - Memorial Josep Barberà i Farràs, 5a edició). http://www.scarqueologia.com/?page_id=10.
Von den Driesch, Angela. 1976. A Guide to the Measurement of Animal Bones from Archaeological Sites: As Developed by the Institut Für Palaeoanatomie, Domestikationsforschung Und Geschichte Der Tiermedizin of the University of Munich. Vol. 1. Peabody Museum Press.
Wickham, Hadley. 2011. “Ggplot2.” Wiley Interdisciplinary Reviews: Computational Statistics 3 (2): 180–85.