If you write an R package that wraps one or more Python packages, it’s likely that you’ll be importing Python modules within the .onLoad method of your package so that you can have convenient access to them within the rest of the package source code. When you do this, you should use the delay_load flag to the import() function, for example:
# global reference to scipy (will be initialized in .onLoad)
scipy <- NULL

.onLoad <- function(libname, pkgname) {
  # use superassignment to update global reference to scipy
  scipy <<- reticulate::import("scipy", delay_load = TRUE)
}
Using the delay_load flag has two important benefits:
It allows you to successfully load your package even when Python / Python packages are not installed on the target system (this is particularly important when testing on CRAN build machines).
It allows users to specify a desired location for Python before interacting with your package. For example:
library(mypackage)
reticulate::use_virtualenv("~/pythonenvs/userenv")
# call functions from mypackage
Without the delay_load flag, Python would be loaded immediately and the user’s call to use_virtualenv would have no effect.
Your R package likely depends on the installation of one or more Python packages. As a convenience to your users, you may want to provide a high-level R function to allow users to install these Python packages. It’s furthermore beneficial if multiple R packages that depend on Python packages install their dependencies in the same Python environment (so that they can be easily used together).
The py_install() function provides a high-level interface for installing one or more Python packages. The packages will by default be installed within a virtualenv or Conda environment named “r-reticulate”. For example:
library(reticulate)
py_install("scipy")
You can document the use of this function along with your package or alternatively provide a wrapper function for py_install(). For example:
install_scipy <- function(method = "auto", conda = "auto") {
  reticulate::py_install("scipy", method = method, conda = conda)
}
While reticulate is capable of binding to any Python environment available on a system, it’s much more straightforward for users if there is a common environment used by R packages with convenient high-level functions provided for installation. We therefore strongly recommend that R package developers use the approach described here.
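A wrapper can also make the target environment explicit and forward any extra arguments to py_install(). The following is a minimal sketch, not a prescribed interface: the choice to expose envname with a default of "r-reticulate" and to pass `...` through is an assumption about what a package author might want.

```r
# Sketch of an installation wrapper that names the shared environment
# explicitly. Exposing 'envname' and forwarding '...' are illustrative
# choices, not something reticulate requires.
install_scipy <- function(envname = "r-reticulate",
                          method = "auto",
                          conda = "auto",
                          ...) {
  reticulate::py_install(
    "scipy",
    envname = envname,
    method = method,
    conda = conda,
    ...
  )
}
```

Keeping the default environment name the same across packages is what lets their Python dependencies end up side by side.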
If you use reticulate in another R package you need to account for the fact that when your package is submitted to CRAN, the CRAN test servers may not have Python, NumPy, or whatever other Python modules you are wrapping in your package. If you don’t do this then your package may fail to load and/or pass its tests when run on CRAN.
There are two things you should do to ensure your package is well behaved on CRAN:
Use the delay_load option (as described above) to ensure that the module (and Python) is loaded only on its first use. For example:
# python 'scipy' module I want to use in my package
scipy <- NULL

.onLoad <- function(libname, pkgname) {
  # delay load scipy module (will only be loaded when accessed via $)
  scipy <<- reticulate::import("scipy", delay_load = TRUE)
}
When writing tests, check to see if your module is available and if it isn’t then skip the test. For example, if you are using the testthat package, you might do this:
# helper function to skip tests if we don't have the 'scipy' module
skip_if_no_scipy <- function() {
  have_scipy <- py_module_available("scipy")
  if (!have_scipy)
    skip("scipy not available for testing")
}

# then call this function from all of your tests
test_that("Things work as expected", {
  skip_if_no_scipy()
  # test code here...
})
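If a package wraps several Python modules, the same idea can be written once with the module name as an argument. This is a minimal sketch: the helper name skip_if_no_module() is hypothetical, and it assumes testthat's skip_if_not() together with reticulate's py_module_available().

```r
# Hypothetical generalisation of the helper above: skip the current test
# unless the named Python module can be found.
skip_if_no_module <- function(module) {
  testthat::skip_if_not(
    reticulate::py_module_available(module),
    message = paste(module, "not available for testing")
  )
}
```

Inside a test, it would be called as skip_if_no_module("scipy") before any code that touches the module.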
Python objects exposed by reticulate carry their Python classes into R, so it’s possible to write S3 methods to customize e.g. the str or print behavior for a given class (note that it’s not typically necessary that you do this since the default str and print methods call PyObject_Str, which typically provides an acceptable default behavior).
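To see how S3 dispatch on such a class vector works without needing Python, the sketch below uses a plain R list as a stand-in for a reticulate object. The class name MyModule.MyPythonClass and the name field are hypothetical, and real reticulate objects are not plain lists, so only the dispatch mechanism carries over.

```r
# Mock object carrying the kind of class vector a wrapped Python object
# would have. This is a stand-in for illustration, not a real Python object.
obj <- structure(
  list(name = "demo"),
  class = c("MyModule.MyPythonClass", "python.builtin.object")
)

# S3 print method selected via the first element of the class vector
print.MyModule.MyPythonClass <- function(x, ...) {
  cat("<MyPythonClass: ", x[["name"]], ">\n", sep = "")
  invisible(x)
}

# print() dispatches to the method above
out <- capture.output(print(obj))
```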
If you do decide to implement custom S3 methods for a Python class it’s important to keep in mind that when an R session ends the connection to Python objects is lost, so when the .RData saved from one R session is restored in a subsequent R session the Python objects are effectively lost (technically they become NULL R externalptr objects).
By default when you attempt to interact with a Python object from a previous session (a NULL R externalptr) an error is thrown. If you want to do something more customized in your S3 method you can use the py_is_null_xptr() function. For example:
method.MyModule.MyPythonClass <- function(x, y, ...) {
  if (py_is_null_xptr(x)) {
    # whatever is appropriate
  } else {
    # interact with the object
  }
}
Note that this check isn’t required, as by default an R error will occur. If it’s desirable to avoid this error for any reason then you can use py_is_null_xptr() to do so.
The reticulate package exports a py_str generic method which is called from the str method only after doing appropriate validation (if the object is NULL then <pointer: 0x0> is returned). You can implement the py_str method as follows:
#' @importFrom reticulate py_str
#' @export
py_str.MyModule.MyPythonClass <- function(object, ...) {
  # interact with the object to generate the string
}
The print and summary methods for Python objects both call the str method by default, so if you implement py_str() you will automatically inherit implementations for those methods.
reticulate provides the generics r_to_py() for converting R objects into Python objects, and py_to_r() for converting Python objects back into R objects. Package authors can provide methods for these generics to convert Python and R objects otherwise not handled by reticulate.
reticulate provides conversion operators for some of the most commonly used Python objects, including:
Pandas objects (Index, Series, DataFrame),
datetime objects.
If you see that reticulate is missing support for conversion of one or more objects from these packages, please let us know and we’ll try to implement the missing converter. For Python packages not in this set, you can provide conversion operators in your own extension package.
r_to_py() methods
r_to_py() accepts a convert argument, which controls how objects generated from the created Python object are converted. To illustrate, consider the difference between these two cases:
library(reticulate)

# [convert = TRUE] => convert Python objects to R when appropriate
sys <- import("sys", convert = TRUE)
class(sys$path)
# [1] "character"

# [convert = FALSE] => always return Python objects
sys <- import("sys", convert = FALSE)
class(sys$path)
# [1] "python.builtin.list" "python.builtin.object"
This is accomplished through the use of a convert flag, which is set on the Python object wrappers used by reticulate. Therefore, if you’re writing a method r_to_py.foo() for an object of class foo, you should take care to preserve the convert flag on the generated object. This is typically done by:
Passing convert along to the appropriate lower-level r_to_py() method;
Explicitly setting the convert attribute on the returned Python object.
As an example of the second:
# suppose 'make_python_object()' creates a Python object
# from R objects of class 'my_r_object'.
r_to_py.my_r_object <- function(x, convert) {
  object <- make_python_object(x)
  assign("convert", convert, envir = object)
  object
}
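The first approach, passing convert along to a lower-level r_to_py() method, might look like the following sketch. It assumes that my_r_object is a list-based S3 class, so that stripping the class leaves a plain list that reticulate already knows how to convert. That assumption is for illustration only.

```r
# Sketch of the first approach: delegate to reticulate's existing method
# for plain lists and pass 'convert' through unchanged.
# Assumes 'my_r_object' is built on a list (hypothetical class).
r_to_py.my_r_object <- function(x, convert = FALSE) {
  reticulate::r_to_py(unclass(x), convert = convert)
}
```

Because the lower-level method receives convert directly, there is no need to set the flag on the result by hand.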
Travis-CI is a commonly used platform for continuous integration and testing of R packages. Making it work with reticulate is pretty simple: all you need to do is add a before_install section to a standard R .travis.yml file that asks Travis to guarantee the testing machine has numpy (which reticulate depends on) and any Python modules you’re interacting with that don’t ship with the language itself:
before_install:
- pip install numpy any_other_dependencies go_here