Package 'abcrf'

Title:	Approximate Bayesian Computation via Random Forests
Description:	Performs Approximate Bayesian Computation (ABC) model choice and parameter inference via random forests. Pudlo P., Marin J.-M., Estoup A., Cornuet J.-M., Gautier M. and Robert C. P. (2016) <doi:10.1093/bioinformatics/btv684>. Estoup A., Raynal L., Verdu P. and Marin J.-M. <http://journal-sfds.fr/article/view/709>. Raynal L., Marin J.-M., Pudlo P., Ribatet M., Robert C. P. and Estoup A. (2019) <doi:10.1093/bioinformatics/bty867>.
Authors:	Jean-Michel Marin [aut, cre], Louis Raynal [aut], Pierre Pudlo [aut], Christian P. Robert [ctb], Arnaud Estoup [ctb]
Maintainer:	Jean-Michel Marin <[email protected]>
License:	GPL (>= 2)
Version:	1.9
Built:	2025-03-09 04:41:31 UTC
Source:	https://github.com/jmm34/abcrf

Help Index

Create an ABC-RF object: a classification random forest from a reference table towards performing an ABC model choice
Predict posterior covariance between two parameters for new data using two reg-ABC-RF objects
Plot the posterior density given a new summary statistic
Calculate and plot for different numbers of tree, the out-of-bag errors associated with an ABC-RF object
Calculate and plot for different numbers of tree, the out-of-bag mean squared errors associated with a REG-ABC-RF object
Plot of an ABC-RF object
Plot of a reg-ABC-RF object
Predict and evaluate the posterior probability of the MAP for new data using an ABC-RF object
Predict posterior expectation, median, variance and quantiles given a new dataset using a reg-ABC-RF object
Predict out-of-bag posterior expectation, median, variance, quantiles and error measures using a reg-ABC-RF object
Read a reference table simulated from DIYABC
Create a reg-ABC-RF object: a regression random forest from a reference table aimed out predicting posterior expectation, variance and quantiles for a parameter
A simulated example in population genetics
Variable importance plot from a random forest

Create an ABC-RF object: a classification random forest from a reference table towards performing an ABC model choice

Description

abcrf constructs a random forest from a reference table towards performing an ABC model choice. Basically, the reference table (i.e. the dataset that will be treated with the present package) includes a column with the index of the models to be compared and additional columns corresponding to the values of the simulated summary statistics.

Usage

## S3 method for class 'formula'
abcrf(formula, data, group=list(), lda=TRUE, ntree=500, sampsize=min(1e5, nrow(data)),
paral=FALSE, ncores= if(paral) max(detectCores()-1,1) else 1, ...)
## S3 method for class 'formula'
abcrf(formula, data, group=list(), lda=TRUE, ntree=500, sampsize=min(1e5, nrow(data)),
paral=FALSE, ncores= if(paral) max(detectCores()-1,1) else 1, ...)

Arguments

`formula`	a formula: left of ~, variable representing the model index; right of ~, summary statistics of the reference table.
`data`	a data frame containing the reference table.
`group`	a list containing groups (at least 2) of model(s) on which the model choice will be performed. This is not necessarily a partition, one or more models can be excluded from the elements of the list and by default no grouping is done.
`lda`	should LDA scores be added to the list of summary statistics?
`ntree`	number of trees to grow in the forest, by default 500 trees.
`sampsize`	size of the sample from the reference table to grow a tree of the classification forest, by default the minimum between the number of elements of the reference table and 100,000.
`paral`	a boolean that indicates if the calculations of the classification random forest (forest used to assign a model to the observed dataset) should be parallelized.
`ncores`	the number of CPU cores to use. If paral=TRUE, it is used the number of CPU cores minus 1. If ncores is not specified and `detectCores` does not detect the number of CPU cores with success then 1 core is used.
`...`	additional arguments to be passed on to `ranger` used to construct the classification random forest that preditcs the selected model.

Value

An object of class abcrf, which is a list with the following components:

`call`	the original call to `abcrf`,
`lda`	a boolean indicating if LDA scores have been added to the list of summary statistics,
`formula`	the formula used to construct the classification random forest,
`group`	a list contining the groups of model(s) used. This list is empty if no grouping has been performed,
`model.rf`	an object of class `randomForest` containing the trained forest with the reference table,
`model.lda`	an object of class `lda` containing the Linear Discriminant Analysis based on the reference table,
`prior.err`	prior error rates of model selection on the reference table, estimated with the "out-of-bag" error of the forest.

References

Pudlo P., Marin J.-M., Estoup A., Cornuet J.-M., Gautier M. and Robert, C. P. (2016) Reliable ABC model choice via random forests Bioinformatics doi:10.1093/bioinformatics/btv684

Estoup A., Raynal L., Verdu P. and Marin J.-M. (2018) Model choice using Approximate Bayesian Computation and Random Forests: analyses based on model grouping to make inferences about the genetic history of Pygmy human populations Jounal de la Société Française de Statistique http://journal-sfds.fr/article/view/709

Examples

data(snp)
modindex <- snp$modindex[1:500]
sumsta <- snp$sumsta[1:500,]
data1 <- data.frame(modindex, sumsta)
model.rf1 <- abcrf(modindex~., data = data1, ntree=100)
model.rf1
model.rf2 <- abcrf(modindex~., data = data1, group = list(c("1","2"),"3"), ntree=100)
model.rf2
data(snp)
modindex <- snp$modindex[1:500]
sumsta <- snp$sumsta[1:500,]
data1 <- data.frame(modindex, sumsta)
model.rf1 <- abcrf(modindex~., data = data1, ntree=100)
model.rf1
model.rf2 <- abcrf(modindex~., data = data1, group = list(c("1","2"),"3"), ntree=100)
model.rf2

Predict posterior covariance between two parameters for new data using two reg-ABC-RF objects

Description

Using two reg-ABC-RF objects constructed on the same reference table for two different response variables, this function predicts the posterior covariance between those two response variables, given a new dataset of summaries.

Usage

## S3 method for class 'regAbcrf'
covRegAbcrf(regForest1, regForest2, obs, training1, training2,
ntree=500, mtry=max(floor((dim(training1)[2]-1)/3), 1),
sampsize=min(1e5, dim(training1)[1]), paral = FALSE, 
ncores = if(paral) max(detectCores()-1,1) else 1, paral.predict = FALSE,
ncores.predict = if(paral.predict) max(detectCores()-1,1) else 1, ... )
## S3 method for class 'regAbcrf'
covRegAbcrf(regForest1, regForest2, obs, training1, training2,
ntree=500, mtry=max(floor((dim(training1)[2]-1)/3), 1),
sampsize=min(1e5, dim(training1)[1]), paral = FALSE, 
ncores = if(paral) max(detectCores()-1,1) else 1, paral.predict = FALSE,
ncores.predict = if(paral.predict) max(detectCores()-1,1) else 1, ... )

Arguments

`regForest1`, `regForest2`	`regAbcrf` objects.
`obs`	a data frame containing the summary statistics of the observed data sets.
`training1`, `training2`	data frames containing the reference table respectively used to train the `regAbcrf` objects regForest1 and regForest2.
`ntree`	number of trees to grow in the forest, by default equal to 500 trees.
`mtry`	Number of variables to possibly split at in each node for the regression random forest. Default is the minimum between 1 and the number of variables divided by 3.
`sampsize`	size of the sample from the reference table used to grow a tree of the regression forest, by default the minimum between the number of elements of the reference table and 100,000.
`paral`	a boolean that indicates whether or not the calculations of the regression random forest (forest used to predict a response from the observed dataset) should be parallelized.
`ncores`	the number of CPU cores to use. If paral=TRUE, it is used the number of CPU cores minus 1. If ncores is not specified and `detectCores` does not detect the number of CPU cores with success then 1 core is used.
`paral.predict`	a boolean that indicates if random forests predictions should be parallelized.
`ncores.predict`	the number of CPU cores to use for the regression random forest predictions. If paral.predict=TRUE, it is used the number of CPU cores minus 1. If ncores.predict is not specified and `detectCores` does not detect the number of CPU cores with success then 1 core is used.
`...`	additional arguments to be passed on to `ranger` used to construct the regression random forest that predicts posterior covariance.

Value

covRegAbcrf returns predicted posterior covariances between response variables of two reg-ABC-RF objects, for a new data set.

References

Raynal L., Marin J.-M. Pudlo P., Ribatet M., Robert C. P. and Estoup, A. (2019) ABC random forests for Bayesian parameter inference Bioinformatics doi:10.1093/bioinformatics/bty867

Examples

data(snp)
modindex <- snp$modindex
sumsta <- snp$sumsta[modindex == "3",]
r <- snp$param$r[modindex == "3"]
r <- r[1:500]
sumsta <- sumsta[1:500,]
data2 <- data.frame(r, sumsta)
model.rf.r <- regAbcrf(r~., data2, ntree=100)
N1 <- snp$param$N1[modindex == "3"]
N1 <- N1[1:500]
data3 <- data.frame(N1, sumsta) 
model.rf.N1 <- regAbcrf(N1~., data3, ntree=100)
data(snp.obs)
covRegAbcrf(model.rf.r, model.rf.N1, snp.obs, 
            data2, data3, ntree=100)
data(snp)
modindex <- snp$modindex
sumsta <- snp$sumsta[modindex == "3",]
r <- snp$param$r[modindex == "3"]
r <- r[1:500]
sumsta <- sumsta[1:500,]
data2 <- data.frame(r, sumsta)
model.rf.r <- regAbcrf(r~., data2, ntree=100)
N1 <- snp$param$N1[modindex == "3"]
N1 <- N1[1:500]
data3 <- data.frame(N1, sumsta) 
model.rf.N1 <- regAbcrf(N1~., data3, ntree=100)
data(snp.obs)
covRegAbcrf(model.rf.r, model.rf.N1, snp.obs, 
            data2, data3, ntree=100)

Plot the posterior density given a new summary statistic

Description

Given a reg-ABC-RF object and a new value of the summary statistics, densityPlot gives the corresponding posterior density plot of the parameter, as well as the prior (in grey).

Usage

## S3 method for class 'regAbcrf'
densityPlot(object, obs, training,
add=TRUE, main="Posterior density", log="", xlim=NULL,
ylim=NULL, xlab=NULL, ylab=NULL, paral=FALSE,
ncores= if(paral) max(detectCores()-1,1) else 1, ...)
## S3 method for class 'regAbcrf'
densityPlot(object, obs, training,
add=TRUE, main="Posterior density", log="", xlim=NULL,
ylim=NULL, xlab=NULL, ylab=NULL, paral=FALSE,
ncores= if(paral) max(detectCores()-1,1) else 1, ...)

Arguments

`object`	a `regAbcrf` object.
`obs`	a data frame containing the summary statistics of the observed data sets.
`training`	the data frame containing the reference table used to train the `regAbcrf` object.
`add`	a boolean that indicates if the posterior distributions should be ploted on the same graph or not, when more than one observed summary statistics is given.
`main`	main title to be used for the posterior density plot.
`log`	a character string which contains "x" if the x axis is to be logarithmic, "y" if the y axis is to be logarithmic and "xy" or "yx" if both axes are to be logarithmic. The default value "" implies no logarithmic transfomation.
`xlim`	range of the abscissa.
`ylim`	range of the ordinate.
`xlab`	label of the abscissa.
`ylab`	label of the ordinate.
`paral`	a boolean that indicates if random forests predictions should be parallelized.
`ncores`	the number of CPU cores to use for the regression random forest predictions. If paral=TRUE, it is used the number of CPU cores minus 1. If ncores is not specified and `detectCores` does not detect the number of CPU cores with success then 1 core is used.
`...`	additional arguments to be passed on to `density`, as for example the smoothing bandwidth `bw` to be used.

References

Raynal L., Marin J.-M. Pudlo P., Ribatet M., Robert C. P. and Estoup, A. (2019) ABC random forests for Bayesian parameter inference Bioinformatics doi:10.1093/bioinformatics/bty867

Examples

data(snp)
modindex <- snp$modindex
sumsta <- snp$sumsta[modindex == "3",]
r <- snp$param$r[modindex == "3"]
r <- r[1:500]
sumsta <- sumsta[1:500,]
data2 <- data.frame(r, sumsta)
model.rf.r <- regAbcrf(r~., data2, ntree=100)
data(snp.obs)
densityPlot(model.rf.r, snp.obs, data2, ylab="density", main = "Posterior density of r")
data(snp)
modindex <- snp$modindex
sumsta <- snp$sumsta[modindex == "3",]
r <- snp$param$r[modindex == "3"]
r <- r[1:500]
sumsta <- sumsta[1:500,]
data2 <- data.frame(r, sumsta)
model.rf.r <- regAbcrf(r~., data2, ntree=100)
data(snp.obs)
densityPlot(model.rf.r, snp.obs, data2, ylab="density", main = "Posterior density of r")

Calculate and plot for different numbers of tree, the out-of-bag errors associated with an ABC-RF object

Description

err.abcrf returns out-of-bag errors and plot them.

Usage

err.abcrf(object, training, paral=FALSE,
ncores= if(paral) max(detectCores()-1,1) else 1)
err.abcrf(object, training, paral=FALSE,
ncores= if(paral) max(detectCores()-1,1) else 1)

Arguments

`object`	an `abcrf` object.
`training`	the data frame containing the reference table used to train the `abcrf` object.
`paral`	a boolean that indicates if random forests predictions should be parallelized.
`ncores`	the number of CPU cores to use for the random forest predictions. If paral=TRUE, it is used the number of CPU cores minus 1. If ncores is not specified and `detectCores` does not detect the number of CPU cores with success then 1 core is used.

Value

A matrix with 2 columns: the number of trees and the out-of-bag errors. Errors are computed from 40 trees to the total number.

References

Pudlo P., Marin J.-M., Estoup A., Cornuet J.-M., Gautier M. and Robert, C. P. (2016) Reliable ABC model choice via random forests Bioinformatics doi:10.1093/bioinformatics/btv684

Examples

data(snp)
modindex <- snp$modindex[1:500]
sumsta <- snp$sumsta[1:500,]
data1 <- data.frame(modindex, sumsta)
model.rf <- abcrf(modindex~., data1, ntree=100)
err.rf <- err.abcrf(model.rf, data1)
data(snp)
modindex <- snp$modindex[1:500]
sumsta <- snp$sumsta[1:500,]
data1 <- data.frame(modindex, sumsta)
model.rf <- abcrf(modindex~., data1, ntree=100)
err.rf <- err.abcrf(model.rf, data1)

Calculate and plot for different numbers of tree, the out-of-bag mean squared errors associated with a REG-ABC-RF object

Description

err.regAbcrf returns out-of-bag mean squared errors and plot them.

Usage

err.regAbcrf(object, training, paral=FALSE, 
ncores= if(paral) max(detectCores()-1,1) else 1, what="mean")
err.regAbcrf(object, training, paral=FALSE, 
ncores= if(paral) max(detectCores()-1,1) else 1, what="mean")

Arguments

`object`	a `regAbcrf` object.
`training`	the data frame containing the reference table used to train the `regAbcrf` object.
`paral`	a boolean that indicates if random forests predictions should be parallelized.
`ncores`	the number of CPU cores to use for the random forest predictions. If paral=TRUE, it is used the number of CPU cores minus 1. If ncores is not specified and `detectCores` does not detect the number of CPU cores with success then 1 core is used.
`what`	a string caracter "mean" or "median" indicating if the predictions are computed with mean or median of the response variable.

Value

A matrix with 2 columns: the number of trees and the out-of-bag mean squared errors. NAs might be returned if the number of trees is too low. Errors are computed from 40 trees to the total number.

References

Raynal L., Marin J.-M. Pudlo P., Ribatet M., Robert C. P. and Estoup, A. (2019) ABC random forests for Bayesian parameter inference Bioinformatics doi:10.1093/bioinformatics/bty867

Examples

  data(snp)
  modindex <- snp$modindex
  sumsta <- snp$sumsta[modindex == "3",]
  r <- snp$param$r[modindex == "3"]
  r <- r[1:500]
  sumsta <- sumsta[1:500,]
  data2 <- data.frame(r, sumsta)
  model.rf.r <- regAbcrf(r~., data2, ntree=100)
  err.regAbcrf(model.rf.r, data2)
data(snp)
  modindex <- snp$modindex
  sumsta <- snp$sumsta[modindex == "3",]
  r <- snp$param$r[modindex == "3"]
  r <- r[1:500]
  sumsta <- sumsta[1:500,]
  data2 <- data.frame(r, sumsta)
  model.rf.r <- regAbcrf(r~., data2, ntree=100)
  err.regAbcrf(model.rf.r, data2)

Plot of an ABC-RF object

Description

plot.abcrf provides both a variable importance plot of a model choice ABC-RF object and the projection of the reference table on the LDA axes.

Usage

## S3 method for class 'abcrf'
plot(x, training, obs=NULL, n.var=20, pdf=FALSE, xlim=NULL, ...)
## S3 method for class 'abcrf'
plot(x, training, obs=NULL, n.var=20, pdf=FALSE, xlim=NULL, ...)

Arguments

`x`	an `abcrf` object.
`training`	the data frame containing the reference table used to train the `abcrf` object.
`obs`	a vector containing the summary statistics of an observed dataset that will be added to the graph of the projected reference table (black star or vertical line).
`n.var`	number of variables in the variable importance representation.
`pdf`	a boolean that indicates if a pdf version of the graph(s) should be saved in the current directory.
`xlim`	range of the abscissa for the variable importance plot.
`...`	not used.

Note

The graph of the reference table projected on the LD axes is shown only if LD axes has been added to the set of summary statistics in the call of abcrf.

References

Pudlo P., Marin J.-M., Estoup A., Cornuet J.-M., Gautier M. and Robert, C. P. (2016) Reliable ABC model choice via random forests Bioinformatics doi:10.1093/bioinformatics/btv684

Examples

data(snp)
modindex <- snp$modindex[1:500]
sumsta <- snp$sumsta[1:500,]
data1 <- data.frame(modindex, sumsta)
model.rf <- abcrf(modindex~., data1, ntree=100)
plot(model.rf, data1)
data(snp.obs)
plot(model.rf, data1, obs=snp.obs[1,])
data(snp)
modindex <- snp$modindex[1:500]
sumsta <- snp$sumsta[1:500,]
data1 <- data.frame(modindex, sumsta)
model.rf <- abcrf(modindex~., data1, ntree=100)
plot(model.rf, data1)
data(snp.obs)
plot(model.rf, data1, obs=snp.obs[1,])

Plot of a reg-ABC-RF object

Description

plot.regAbcrf provides a variable importance plot used to construct the reg-ABC-RF object, as measured by ranger with the argument importance='impurity'.

Usage

## S3 method for class 'regAbcrf'
plot(x, n.var=min(30, length(x$model.rf$variable.importance)), xlim=NULL, main=NULL, ...)
## S3 method for class 'regAbcrf'
plot(x, n.var=min(30, length(x$model.rf$variable.importance)), xlim=NULL, main=NULL, ...)

Arguments

`x`	a `regAbcrf` object.
`n.var`	number of variables in the variable importance representation. The default value is equal to the minimum between 30 and the number of summary statistics.
`xlim`	range of the abscissa for the variable importance plot.
`main`	an overall title for the variable importance plot.
`...`	not used.

References

Raynal L., Marin J.-M. Pudlo P., Ribatet M., Robert C. P. and Estoup, A. (2019) ABC random forests for Bayesian parameter inference Bioinformatics doi:bioinformatics/bty867

Examples

data(snp)
modindex <- snp$modindex
sumsta <- snp$sumsta[modindex == "3",]
r <- snp$param$r[modindex == "3"]
r <- r[1:500]
sumsta <- sumsta[1:500,]
data2 <- data.frame(r, sumsta)
model.rf.r <- regAbcrf(r~., data2, ntree=100)
plot(model.rf.r)
data(snp)
modindex <- snp$modindex
sumsta <- snp$sumsta[modindex == "3",]
r <- snp$param$r[modindex == "3"]
r <- r[1:500]
sumsta <- sumsta[1:500,]
data2 <- data.frame(r, sumsta)
model.rf.r <- regAbcrf(r~., data2, ntree=100)
plot(model.rf.r)

Predict and evaluate the posterior probability of the MAP for new data using an ABC-RF object

Description

Based on an ABC-RF object this function predicts the best model for new data and evaluate the posterior probability of the MAP.

Usage

## S3 method for class 'abcrf'
predict(object, obs, training, ntree = 1000, 
sampsize = min(1e5,  object$model.rf$num.samples ), paral = FALSE,
ncores = if(paral) max(detectCores()-1,1) else 1, paral.predict = FALSE, 
ncores.predict = if(paral.predict) max(detectCores()-1,1) else 1, ...)
## S3 method for class 'abcrf'
predict(object, obs, training, ntree = 1000, 
sampsize = min(1e5,  object$model.rf$num.samples ), paral = FALSE,
ncores = if(paral) max(detectCores()-1,1) else 1, paral.predict = FALSE, 
ncores.predict = if(paral.predict) max(detectCores()-1,1) else 1, ...)

Arguments

`object`	an `abcrf` object.
`obs`	a data frame containing the summary statistics of the observed data sets.
`training`	the data frame containing the reference table used to train the `abcrf` object.
`ntree`	number of trees to grow in the regression forest, by default 1,000 trees.
`sampsize`	size of the sample from the reference table used to grow a tree of the forest, by default the minimum between the number of elements of the reference table and 100,000.
`paral`	a boolean that indicates if the calculations of the regression random forest (forest that returns the posterior probability of the selected model) should be parallelized.
`ncores`	the number of CPU cores to use for the regression random forest construction. If paral=TRUE, it is used the number of CPU cores minus 1. If ncores is not specified and `detectCores` does not detect the number of CPU cores with success then 1 core is used.
`paral.predict`	a boolean that indicates if random forests predictions should be parallelized.
`ncores.predict`	the number of CPU cores to use for random forest predictions (classification and regression). If paral.predict=TRUE, it is used the number of CPU cores minus 1. If ncores.predict is not specified and `detectCores` does not detect the number of CPU cores with success then 1 core is used.
`...`	additional arguments to be passed on to `ranger` used to construct the regression random forest that estimates the posterior probability of the selected model.

Value

An object of class abcrfpredict, which is a list with the following components:

`allocation`	indices of the selected models for each observed data set,
`vote`	votes for each observed dataset,
`post.prob`	ABC-RF approximations of the posterior probability of the selected model for each observed dataset.

References

Pudlo P., Marin J.-M., Estoup A., Cornuet J.-M., Gautier M. and Robert, C. P. (2016) Reliable ABC model choice via random forests Bioinformatics doi:10.1093/bioinformatics/btv684

Examples

data(snp)
modindex <- snp$modindex[1:500]
sumsta <- snp$sumsta[1:500,]
data1 <- data.frame(modindex, sumsta)
model.rf <- abcrf(modindex~., data1, ntree=100)
data(snp.obs)
predict(model.rf, snp.obs, data1, ntree=100)
data(snp)
modindex <- snp$modindex[1:500]
sumsta <- snp$sumsta[1:500,]
data1 <- data.frame(modindex, sumsta)
model.rf <- abcrf(modindex~., data1, ntree=100)
data(snp.obs)
predict(model.rf, snp.obs, data1, ntree=100)

Predict posterior expectation, median, variance and quantiles given a new dataset using a reg-ABC-RF object

Description

Based on a reg-ABC-RF object this function predicts the posterior expectation, median, variance, quantiles for the corresponding parameter given new dataset. Somes posterior errors can be computed at an higher computational price.

Usage

## S3 method for class 'regAbcrf'
predict(object, obs, training, quantiles=c(0.025,0.975),
paral = FALSE, ncores = if(paral) max(detectCores()-1,1) else 1, rf.weights = FALSE,
post.err.med = FALSE, ...)
## S3 method for class 'regAbcrf'
predict(object, obs, training, quantiles=c(0.025,0.975),
paral = FALSE, ncores = if(paral) max(detectCores()-1,1) else 1, rf.weights = FALSE,
post.err.med = FALSE, ...)

Arguments

`object`	a `regAbcrf` object.
`obs`	a data frame containing the summary statistics of the observed data sets.
`training`	the data frame containing the reference table used to train the `regAbcrf` object.
`quantiles`	numeric vector of probabilities with values in [0,1]. The default value is equal to `c(0.025, 0.975)`.
`paral`	a boolean that indicates if random forests predictions should be parallelized.
`ncores`	the number of CPU cores to use for the regression random forest predictions. If paral=TRUE, it is used the number of CPU cores minus 1. If ncores is not specified and `detectCores` does not detect the number of CPU cores with success then 1 core is used.
`rf.weights`	a boolean that indicates if the random forest weights used to predict quantities of interest should we returned. The default value is FALSE.
`post.err.med`	a boolean that indicates if posterior errors based on posterior medians should be computed. The default value is FALSE. If computed, this function might take a much more time.
`...`	optional arguments to be passed on to the function `predict.ranger`.

Value

An object of class regAbcrfpredict, which is a list with the following components:

`expectation`	predicted posterior expectation for each oberved data set,
`med`	predicted posterior median for each oberved data set,
`variance`	predicted posterior variance for each observed data set, computed by reusing weights, this quantity is also the posterior mean squared error,
`variance.cdf`	predicted posterior variance for each observed data set, computed by approximation of the cumulative distribution function,
`quantiles`	predicted posterior quantiles for each observed data set,
`weights`	a matrix composed of the weights used to predict quantities of interest. Returned if `rf.weights` is TRUE,
`post.NMAE.mean`	posterior normalized mean absolute error obtained using the out-of-bag posterior expectation (mean) and previously computed random forest weights, for each observed data set,
`post.MSE.med`	posterior mean squared error obtained using the out-of-bag posterior median and previously computed random forest weights, for each observed data set,
`post.NMAE.med`	posterior normalized mean absolute error obtained using the out-of-bag posterior expectation (mean) and previously computed random forest weights, for each observed data set,
`prior.MSE`	prior mean squared error computed with training out-of-bag prediction based on mean of response variable,
`prior.NMAE`	prior normalized mean absolute error computed with training out-of-bag predictions based on mean of response variable,
`prior.MSE.med`	prior mean squared error computed with training out-of-bag predictions based on median of response variable,
`prior.NMAE.med`	prior normalized mean absolute error with training out-of-bag predictions based on median of response variable,
`prior.coverage`	prior credible inteval coverage computed for training instances, if only two quantiles are of interest, NULL otherwise.

References

Raynal L., Marin J.-M. Pudlo P., Ribatet M., Robert C. P. and Estoup, A. (2019) ABC random forests for Bayesian parameter inference Bioinformatics doi:10.1093/bioinformatics/bty867

Examples

data(snp)
modindex <- snp$modindex
sumsta <- snp$sumsta[modindex == "3",]
r <- snp$param$r[modindex == "3"]
r <- r[1:500]
sumsta <- sumsta[1:500,]
data2 <- data.frame(r, sumsta)
model.rf.r <- regAbcrf(r~., data2, ntree=100)
data(snp.obs)
predict(model.rf.r, snp.obs, data2)
data(snp)
modindex <- snp$modindex
sumsta <- snp$sumsta[modindex == "3",]
r <- snp$param$r[modindex == "3"]
r <- r[1:500]
sumsta <- sumsta[1:500,]
data2 <- data.frame(r, sumsta)
model.rf.r <- regAbcrf(r~., data2, ntree=100)
data(snp.obs)
predict(model.rf.r, snp.obs, data2)

Predict out-of-bag posterior expectation, median, variance, quantiles and error measures using a reg-ABC-RF object

Description

Based on a reg-ABC-RF object this function predicts the out-of-bag posterior expectation, median, variance, quantiles, mean squared errors, normalized mean absolute errors, credible interval and coverage, for the corresponding parameter using the out-of-bag observations of the training data set.

Mean squared errors and normalized mean absolute errors are computed both with mean and median of the response variable.

Memory allocation issues might be encountered when the size of the training data set is large.

Usage

## S3 method for class 'regAbcrf'
predictOOB(object, training, quantiles=c(0.025,0.975), paral = FALSE,
ncores = if(paral) max(detectCores()-1,1) else 1,...)
## S3 method for class 'regAbcrf'
predictOOB(object, training, quantiles=c(0.025,0.975), paral = FALSE,
ncores = if(paral) max(detectCores()-1,1) else 1,...)

Arguments

`object`	a `regAbcrf` object.
`training`	the data frame containing the reference table used to train the `regAbcrf` object.
`quantiles`	numeric vector of probabilities with values in [0,1]. The default value is equal to `c(0.025, 0.975)`.
`paral`	a boolean that indicates if training data predictions should be parallelized or not.
`ncores`	the number of CPU cores to use for the regression random forest predictions. If paral=TRUE, it is used the number of CPU cores minus 1. If ncores is not specified and `detectCores` does not detect the number of CPU cores with success then 1 core is used.
`...`	optional arguments to be passed on to the function `predict.ranger`.

Value

An object of class regAbcrfOOBpredict, which is a list with the following components:

`expectation`	predicted posterior expectation for each oberved data set,
`med`	predicted posterior median for each oberved data set,
`variance`	predicted posterior variance for each observed data set, computed by reusing weights,
`variance.cdf`	predicted posterior variance for each observed data set, computed by approximation of the cumulative distribution function,
`quantiles`	predicted posterior quantiles for each observed data set,
`MSE`	mean squared error computed with prediction based on mean of response variable,
`NMAE`	normalized mean absolute error computed with predictions based on mean of response variable,
`MSE.med`	mean squared error computed with predictions based on median of response variable,
`NMAE.med`	normalized mean absolute error with predictions based on median of response variable,
`coverage`	credible inteval coverage if only two quantiles are of interest, NULL otherwise.

References

Raynal L., Marin J.-M. Pudlo P., Ribatet M., Robert C. P. and Estoup, A. (2019) ABC random forests for Bayesian parameter inference Bioinformatics doi:10.1093/bioinformatics/bty867

Examples

data(snp)
modindex <- snp$modindex
sumsta <- snp$sumsta[modindex == "3",]
r <- snp$param$r[modindex == "3"]
r <- r[1:500]
sumsta <- sumsta[1:500,]
data2 <- data.frame(r, sumsta)
model.rf.r <- regAbcrf(r~., data2, ntree=100)
res <- predictOOB(model.rf.r, data2) 
data(snp)
modindex <- snp$modindex
sumsta <- snp$sumsta[modindex == "3",]
r <- snp$param$r[modindex == "3"]
r <- r[1:500]
sumsta <- sumsta[1:500,]
data2 <- data.frame(r, sumsta)
model.rf.r <- regAbcrf(r~., data2, ntree=100)
res <- predictOOB(model.rf.r, data2)

Read a reference table simulated from DIYABC

Description

readRefTable reads a reference table simulated from DIYABC thanks to a .bin and a .txt file, respectively containing the reference table and its header.

Usage

readRefTable(filename = "reftable.bin", header = "header.txt", N = 0)
readRefTable(filename = "reftable.bin", header = "header.txt", N = 0)

Arguments

`filename`	a .bin file from DIYABC containing the reference table.
`header`	a .txt file containing the header of the reference table in `filename`.
`N`	an integer indicating the number of observations to extract from the reference table. The default is 0 indicating that the whole reference table is used. Warning: if N is specified, nrecscen returns is obsolete.

Value

A list with the following components:

`nrec`	number of individuals of the reference table,
`nscen`	number of scenarios in the reference table,
`nrecscen`	number of individuals by scenario,
`nparam`	number of parameters by scenario,
`scenarios`	a vector of factor containing the scenario indices,
`params`	a matrix with the parameters,
`stats`	a matrix with the summary statistics.

References

Cornuet J.-M., Pudlo P., Veyssier J., Dehne-Garcia A., Gautier M., Leblois R., Marin J.-M. and Estoup A. (2014) DIYABC v2.0: a software to make Approximate Bayesian Computation inferences about population history using Single Nucleotide Polymorphism, DNA sequence and microsatellite data Bioinformatics doi:10.1093/bioinformatics/btn514

Create a reg-ABC-RF object: a regression random forest from a reference table aimed out predicting posterior expectation, variance and quantiles for a parameter

Description

regAbcrf constructs a regression random forest from a reference table towards predicting posterior expectations, variances and quantiles of a parameter.

Usage

## S3 method for class 'formula'
regAbcrf(formula, data, ntree=500,
mtry=max(floor((dim(data)[2]-1)/3), 1), sampsize=min(1e5, nrow(data)),
paral=FALSE, ncores=if(paral) max(detectCores()-1,1) else 1, ...)
## S3 method for class 'formula'
regAbcrf(formula, data, ntree=500,
mtry=max(floor((dim(data)[2]-1)/3), 1), sampsize=min(1e5, nrow(data)),
paral=FALSE, ncores=if(paral) max(detectCores()-1,1) else 1, ...)

Arguments

`formula`	a formula: left of ~, variable representing the response variable; right of ~, summary statistics of the reference table.
`data`	a data frame containing the reference table, composed of response variable (parameter) and summary statistics.
`ntree`	number of trees to grow in the forest, by default 500 trees.
`mtry`	Number of variables to possibly split at in each node. Default is the minimum between 1 and the number of variables divided by 3.
`sampsize`	size of the sample from the reference table used to grow a tree of the regression forest, by default the minimum between the number of elements of the reference table and 100,000.
`paral`	a boolean that indicates if the calculations of the regression random forest should be parallelized.
`ncores`	the number of CPU cores to use. If paral=TRUE, it is used the number of CPU cores minus 1. If ncores is not specified and `detectCores` does not detect the number of CPU cores with success then 1 core is used.
`...`	additional arguments to be passed on to `ranger` used to construct the regression random forest that predicts the response variable.

Value

An object of class regAbcrf, which is a list with the following components:

`call`	the original call to `regAbcrf`,
`formula`	the formula used to construct the regression random forest,
`model.rf`	an object of class `ranger` containing the trained forest with the reference table.

References

Raynal L., Marin J.-M. Pudlo P., Ribatet M., Robert C. P. and Estoup, A. (2019) ABC random forests for Bayesian parameter inference Bioinformatics doi:10.1093/bioinformatics/bty867

Examples

data(snp)
modindex <- snp$modindex
sumsta <- snp$sumsta[modindex == "3",]
r <- snp$param$r[modindex == "3"]
r <- r[1:500]
sumsta <- sumsta[1:500,]
data2 <- data.frame(r, sumsta)
model.rf.r <- regAbcrf(r~., data2, ntree=100)
model.rf.r
data(snp)
modindex <- snp$modindex
sumsta <- snp$sumsta[modindex == "3",]
r <- snp$param$r[modindex == "3"]
r <- r[1:500]
sumsta <- sumsta[1:500,]
data2 <- data.frame(r, sumsta)
model.rf.r <- regAbcrf(r~., data2, ntree=100)
model.rf.r

A simulated example in population genetics

Description

The simulated example of population genetics with SNP loci used in Pudlo et al. (2016): snp contains the reference table on which to perform ABC model choice, it also contains the simulated parameters to perform regression random forest. snp.obs contains two pseudo-observed data sets. The first one (favorable) should be easily allocated to a model, while that is not the case for the second one (unfavorable).

Usage

data(snp)
data(snp.obs)
data(snp)
data(snp.obs)

Format

snp is a list containing an ABC reference table of 10,000 simulations from a Bayesian prior predictive model (see Pudlo et al., 2016, for a description of the model choice issue). The first element, named modindex is a factor containing the model indices, the second element, param, is a data frame with seven simulated parameters. The last element of this list, named sumsta, contains the reference table on which to perform ABC model choice and parameter estimation.

snp.obs is a data frame containing the summary statistics of two pseudo-observed data sets.

Source

Pudlo, P., Marin, J.-M., Estoup, A., Cornuet, J.-M., Gautier, M. and Robert, C.P. (2016) Reliable ABC model choice via random forests Bioinformatics doi:10.1093/bioinformatics/btv684

Examples

data(snp)
data(snp.obs)
data(snp)
data(snp.obs)

Variable importance plot from a random forest

Description

variableImpPlot provides a dotchart of variable importance as measured by ranger with the argument importance='impurity'.

Usage

variableImpPlot(object, 
n.var=min(30, length(object$model.rf$variable.importance)),
xlim=NULL, main=NULL)
variableImpPlot(object, 
n.var=min(30, length(object$model.rf$variable.importance)),
xlim=NULL, main=NULL)

Arguments

`object`	an `abcrf` or `regAbcrf` object.
`n.var`	number of variables in the variable importance representation.
`xlim`	range of the abscissa.
`main`	an overall title for the variable importance plot.

Value

Invisibly, the importance of the variables that were plotted.

Examples

data(snp)
modindex <- snp$modindex[1:500]
sumsta <- snp$sumsta[1:500,]
data1 <- data.frame(modindex, sumsta)
model.rf <- abcrf(modindex~., data1, ntree=100)
variableImpPlot(model.rf)
data(snp)
modindex <- snp$modindex[1:500]
sumsta <- snp$sumsta[1:500,]
data1 <- data.frame(modindex, sumsta)
model.rf <- abcrf(modindex~., data1, ntree=100)
variableImpPlot(model.rf)

Package 'abcrf'

Help Index

Create an ABC-RF object: a classification random forest from a reference table towards performing an ABC model choice

Description

Usage

Arguments

Value

References

See Also

Examples

Predict posterior covariance between two parameters for new data using two reg-ABC-RF objects

Description

Usage

Arguments

Value

References

See Also

Examples

Plot the posterior density given a new summary statistic

Description

Usage

Arguments

References

See Also

Examples

Calculate and plot for different numbers of tree, the out-of-bag errors associated with an ABC-RF object

Description

Usage

Arguments

Value

References

See Also

Examples

Calculate and plot for different numbers of tree, the out-of-bag mean squared errors associated with a REG-ABC-RF object

Description

Usage

Arguments

Value

References

See Also

Examples

Plot of an ABC-RF object

Description

Usage

Arguments

Note

References

See Also

Examples

Plot of a reg-ABC-RF object

Description

Usage

Arguments

References

See Also

Examples

Predict and evaluate the posterior probability of the MAP for new data using an ABC-RF object

Description

Usage

Arguments

Value

References

See Also

Examples

Predict posterior expectation, median, variance and quantiles given a new dataset using a reg-ABC-RF object

Description

Usage

Arguments

Value

References

See Also

Examples

Predict out-of-bag posterior expectation, median, variance, quantiles and error measures using a reg-ABC-RF object

Description

Usage

Arguments

Value

References

See Also

Examples