charlatan is a wee bit complex. This vignette aims to help you contribute to the package. For a general introduction on contributing to rOpenSci packages, see our Contributing guide.
Let’s start with some definitions.
For the purposes of this package:
Provider: a type of fake data that charlatan can generate. For example, we have providers for phone numbers, addresses and people’s names. Adding a provider may involve one file or several, and one R6 class or many.
Locale: a language and region combination (e.g., en-US, en-GB). Some fakers won’t have any locales, whereas others can have many.
If you aren’t familiar with R6, have a look at the R6 website, in particular the introductory vignette.
Open an issue if you want to add a new provider or locale to an existing provider; it helps make sure there’s no duplicated effort and we can help make sure you have the knowledge you need.
Providers are generally first created by making an R6 class. Let’s start with a heavily simplified base R6 class that defines some utility methods. We call it BaseProvider in charlatan, but here we’ll call it MyBaseProvider to avoid confusion.
library(R6)
MyBaseProvider <- R6::R6Class(
  'MyBaseProvider',
  public = list(
    # Pick one element of x at random; return '' for empty input
    random_element = function(x) {
      if (length(x) == 0) return('')
      if (inherits(x, "character")) if (!any(nzchar(x))) return('')
      x[sample.int(n = length(x), size = 1)]
    },
    # Draw `size` integers uniformly from min..max (inclusive)
    random_int = function(min = 0, max = 9999, size = 1) {
      stopifnot(max >= min)
      num <- max - min + 1
      sample.int(n = num, size = size, replace = TRUE) + (min - 1)
    }
  )
)
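To see why random_int() adds (min - 1) to the sampled values, here is a minimal base R sketch of the same logic outside any R6 class. The name random_int_sketch is hypothetical and is not part of charlatan; the seed is set only to make the sketch reproducible.

```r
# sample.int(n) draws from 1..n, so shifting by (min - 1) maps the draws
# onto min..max. random_int_sketch is a hypothetical stand-alone copy of
# the method above, for illustration only.
random_int_sketch <- function(min = 0, max = 9999, size = 1) {
  stopifnot(max >= min)
  num <- max - min + 1
  sample.int(n = num, size = size, replace = TRUE) + (min - 1)
}

set.seed(42)  # only to make the sketch reproducible
draws <- random_int_sketch(min = 5, max = 8, size = 1000)

# Every draw falls inside the closed interval [5, 8]
all(draws %in% 5:8)

# With min == max there is only one possible value
fixed <- random_int_sketch(min = 3, max = 3, size = 5)
```

Because sampling is done with replacement, size may exceed the number of distinct values in the range.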
If you don’t need to handle locales it becomes simpler:
FooBar <- R6::R6Class(
  'FooBar',
  inherit = charlatan::BaseProvider,
  public = list(
    integer = function(n = 1, min = 1, max = 1000) {
      super$random_int(min, max, n)
    }
  )
)
We can create an instance of the FooBar class by calling $new() on it. It defines only one method of its own, integer(), which we can call to get a random integer; the other methods shown below are inherited from BaseProvider.
x <- FooBar$new()
x
#> <FooBar>
#> Inherits from: <BaseProvider>
#> Public:
#> bothify: function (text = "## ??")
#> check_locale: function (x)
#> clone: function (deep = FALSE)
#> integer: function (n = 1, min = 1, max = 1000)
#> lexify: function (text = "????")
#> numerify: function (text = "###")
#> random_digit: function ()
#> random_digit_not_zero: function ()
#> random_digit_not_zero_or_empty: function ()
#> random_digit_or_empty: function ()
#> random_element: function (x)
#> random_element_prob: function (x)
#> random_int: function (min = 0, max = 9999, size = 1)
#> random_letter: function ()
#> randomize_nb_elements: function (number = 10, le = FALSE, ge = FALSE, min = NULL, max = NULL)
x$integer()
#> [1] 40
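The same pattern works for a provider that carries its own data. The following is a minimal sketch that assumes only that the R6 package is installed; SketchBase and CoinProvider are hypothetical names used for illustration and do not exist in charlatan.

```r
library(R6)

# A tiny stand-in for the base class, so the sketch does not need charlatan.
SketchBase <- R6::R6Class(
  "SketchBase",
  public = list(
    random_element = function(x) {
      if (length(x) == 0) return("")
      x[sample.int(n = length(x), size = 1)]
    }
  )
)

# CoinProvider is a hypothetical locale-free provider.
CoinProvider <- R6::R6Class(
  "CoinProvider",
  inherit = SketchBase,
  public = list(
    # The data this provider samples from
    sides = c("heads", "tails"),
    # Return n independent flips, each drawn via the inherited helper
    flip = function(n = 1) {
      vapply(
        seq_len(n),
        function(i) super$random_element(self$sides),
        character(1)
      )
    }
  )
)

coin <- CoinProvider$new()
flips <- coin$flip(5)
flips
```

In a real contribution the class would inherit from charlatan's BaseProvider, as FooBar does above, rather than from a stand-in.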
If your provider needs to handle different locales, it gets a bit more complex. The Python library faker, from which this package draws inspiration, lets you create a separate folder for each provider.
However, R doesn’t allow this, so instead we encode the locale for each provider in the file names. For example, the address provider has a main file, address-provider.R, plus one file per locale.
The locale files provide the data specific to each locale, and the main file has the AddressProvider class that pulls in the locale-specific data.
Here, we’ll create a very simplified AddressProvider class using an example locale file.
library(charlatan)
file <- system.file("examples", "address-provider-en_US.R", package = "charlatan")
source(file)
MyAddressProvider <- R6::R6Class(
  'MyAddressProvider',
  inherit = MyBaseProvider,
  lock_objects = FALSE,
  public = list(
    locale = NULL,
    city_suffixes = NULL,
    initialize = function() {
      self$locale <- 'en_us'
      self$city_suffixes <-
        eval(parse(text = paste0("city_suffixes_", self$locale)))
    },
    city_suffix = function() {
      super$random_element(self$city_suffixes)
    }
  )
)
We can create an instance of the MyAddressProvider class by calling $new() on it. It defines only one method of its own, city_suffix(), which we can call to get a random city suffix.
x <- MyAddressProvider$new()
x
#> <MyAddressProvider>
#> Inherits from: <MyBaseProvider>
#> Public:
#> city_suffix: function ()
#> city_suffixes: town ton land ville berg burgh borough bury view port mo ...
#> clone: function (deep = FALSE)
#> initialize: function ()
#> locale: en_us
#> random_element: function (x)
#> random_int: function (min = 0, max = 9999, size = 1)
x$city_suffix()
#> [1] "bury"
When you want to add a new locale to an existing provider, look in the R/ folder of the package; the available locales appear in the file names.
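As a rough illustration of reading locales off file names, the sketch below extracts the locale part with a regular expression. The file names are illustrative only (the actual contents of R/ may differ), and the pattern assumes locales follow the xx_XX form.

```r
# Illustrative file names only; the real contents of R/ may differ.
files <- c(
  "address-provider.R",
  "address-provider-en_US.R",
  "address-provider-en_GB.R"
)

# Keep files that end in "-<locale>.R", where a locale looks like xx_XX
pattern <- "^address-provider-([a-z]{2}_[A-Z]{2})\\.R$"
locale_files <- grep(pattern, files, value = TRUE)

# Pull out the captured locale part
locales <- sub(pattern, "\\1", locale_files)
locales
```

Against a checkout of the package, the files vector could instead come from list.files("R"), assuming the working directory is the package root.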
Pick one of the locale files for the provider you’re extending, make a duplicate of it and rename the file with your new locale. Then modify the duplicate, copying the format but putting in place the appropriate information for the new locale.
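Before opening a pull request, it can help to sanity-check the data in a new locale file. The sketch below assumes that locale data are character vectors named with a locale suffix, as city_suffixes_ and city_prefixes_ are in this vignette. The locale xx_XX, the values, and the helper check_locale_data() are all hypothetical placeholders, not part of charlatan.

```r
# Hypothetical contents of a new locale file. "xx_XX" is a placeholder
# locale and the values are made-up stand-ins, not real data.
city_suffixes_xx_XX <- c("ville", "burg", "haven")
city_prefixes_xx_XX <- c("North", "South", "New")

# check_locale_data is a hypothetical helper, not part of charlatan.
check_locale_data <- function(x) {
  is.character(x) &&      # a character vector
    length(x) > 0 &&      # at least one value to sample from
    !anyNA(x) &&          # no missing entries
    all(nzchar(x)) &&     # no empty strings
    !anyDuplicated(x)     # no repeated values
}

ok_suffixes <- check_locale_data(city_suffixes_xx_XX)
ok_prefixes <- check_locale_data(city_prefixes_xx_XX)

# A vector with an empty string fails the check
bad <- check_locale_data(c("ville", ""))
```

Which checks are appropriate depends on the data type; duplicates, for instance, may be intentional where they are used to weight sampling.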
Where the data comes from for the new locale may vary. One easy way to start may be porting over locales in the faker Python library that are not yet in charlatan.
If it’s a locale that you can’t easily port over from another library, you’ll need to gather the data from a variety of sources. Some R-based packages may help.
When using data from any source, check its license, if any, and whether it allows the data to be used in this package.
How the locale data gets loaded is a little tricky. In the initialize() block of each main provider file (e.g., address-provider.R) we pull in the appropriate locale-specific data based on the locale the user supplies. For example, here’s an abbreviated initialize() block from the AddressProvider:
initialize = function(locale = NULL) {
  if (!is.null(locale)) {
    # check global locales
    super$check_locale(locale)
    # check address provider locales
    check_locale_(locale, address_provider_locales)
    self$locale <- locale
  } else {
    self$locale <- 'en_US'
  }
  self$city_prefixes <- parse_eval("city_prefixes_", self$locale)
}
A few things to note:
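If the user supplies a locale, we check it first against the global locales and then against the locales the address provider supports.
If no locale is supplied, we default to en_US.
We use parse_eval() to pull in the data. Essentially, parse_eval() builds the string city_prefixes_en_US, finds the object with that name in the package environment, and evaluates it, which brings the data into the city_prefixes slot of the R6 object. We repeat this for each data type. The result is a class instance initialized with locale-specific data.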
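The check-then-look-up flow can be sketched in base R as follows. This is an illustration under stated assumptions, not charlatan's implementation: check_locale_sketch(), parse_eval_sketch(), load_city_prefixes(), and sketch_locales are hypothetical stand-ins, xx_XX is a placeholder locale, and the data values are made up.

```r
# Stand-in locale data; in the package these objects live in locale files.
city_prefixes_en_US <- c("North", "East", "West")
city_prefixes_xx_XX <- c("Upper", "Lower")  # placeholder locale

# Hypothetical list of supported locales for this sketch
sketch_locales <- c("en_US", "xx_XX")

# Hypothetical stand-in for the locale check: stop on unsupported locales
check_locale_sketch <- function(locale, supported) {
  if (!locale %in% supported) {
    stop("locale '", locale, "' not supported", call. = FALSE)
  }
  invisible(TRUE)
}

# Hypothetical stand-in for parse_eval(): build the object name from a
# prefix and a locale, then evaluate it starting from the caller's frame.
parse_eval_sketch <- function(prefix, locale, envir = parent.frame()) {
  eval(parse(text = paste0(prefix, locale)), envir = envir)
}

load_city_prefixes <- function(locale = NULL) {
  if (is.null(locale)) {
    locale <- "en_US"  # same default as the initialize block above
  } else {
    check_locale_sketch(locale, sketch_locales)
  }
  parse_eval_sketch("city_prefixes_", locale)
}

default_prefixes <- load_city_prefixes()
other_prefixes <- load_city_prefixes("xx_XX")
```

Checking the locale before building the object name means an unsupported locale fails with a clear message instead of an "object not found" error from the lookup.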