Get 'observation' table

Download data from the 'observation' ("observacao") table of one or more datasets published in the Free Brazilian Repository for Open Soil Data (FEBR), https://www.pedometria.org/febr/. This table includes data such as latitude, longitude, date of observation, underlying geology, land use and vegetation, local topography, soil classification, and much more.

observation(
  data.set,
  variable,
  stack = FALSE,
  missing = list(coord = "keep", time = "keep", data = "keep"),
  standardization = list(crs = NULL, time.format = NULL, units = FALSE, round = FALSE),
  harmonization = list(harmonize = FALSE, level = 2),
  progress = TRUE,
  verbose = TRUE,
  febr.repo = NULL
)

Arguments

data.set	Character vector indicating the identification code of one or more data sets. Use `data.set = "all"` to download all data sets.
variable	(optional) Character vector indicating one or more variables. Accepts only general identification codes, e.g. `"ferro"` and `"carbono"`. If missing, then a set of standard identification variables is downloaded. Use `variable = "all"` to download all variables. See ‘Details’ for more information.
stack	(optional) Logical value indicating if tables from different datasets should be stacked into a single table for output. Requires `standardization = list(units = TRUE)` -- see below. Defaults to `stack = FALSE`, the output being a list of tables.
missing	(optional) List with named sub-arguments indicating what should be done with an observation missing spatial coordinates, `coord`, date of observation, `time`, or data on variables, `data`. Options are `"keep"` (default) and `"drop"`.
standardization	(optional) List with named sub-arguments indicating how to perform data standardization.
	`crs` Character string indicating the EPSG code of the coordinate reference system (CRS) to which spatial coordinates should be transformed. For example, `crs = "EPSG:4674"`, i.e. SIRGAS 2000, the standard CRS for Brazil. Defaults to `crs = NULL`, i.e. no transformation is performed.
	`time.format` Character string indicating how to format dates. For example, `time.format = "%d-%m-%Y"`, i.e. dd-mm-yyyy such as in 31-12-2001. Defaults to `time.format = NULL`, i.e. no formatting is performed. See `base::as.Date()` for more details.
	`units` Logical value indicating if the measurement unit(s) of the continuous variable(s) should be converted to the standard measurement unit(s). Defaults to `units = FALSE`, i.e. no conversion is performed. See `dictionary()` for more information.
	`round` Logical value indicating if the values of the continuous variable(s) should be rounded to the standard number of decimal places. Requires `units = TRUE`. Defaults to `round = FALSE`, i.e. no rounding is performed. See `dictionary()` for more information.
harmonization	(optional) List with named sub-arguments indicating if and how to perform data harmonization.
	`harmonize` Logical value indicating if data should be harmonized. Defaults to `harmonize = FALSE`, i.e. no harmonization is performed.
	`level` Integer value indicating the number of levels of the identification code of the variable(s) that should be considered for harmonization. Defaults to `level = 2`. See ‘Details’ for more information.
progress	(optional) Logical value indicating if a download progress bar should be displayed.
verbose	(optional) Logical value indicating if informative messages should be displayed. Generally useful to identify datasets with inconsistent data. Please report to febr-forum@googlegroups.com if you find any issue.
febr.repo	(optional) Character string with the location of the data repository. Defaults to the remote file directory of the Federal University of Technology - Paraná at https://cloud.utfpr.edu.br/index.php/s/Df6dhfzYJ1DDeso. Alternatively, a local directory path can be provided if the user has a local copy of the data repository.
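
As a rough illustration of what `missing = list(coord = "drop")` and `time.format = "%d-%m-%Y"` amount to, the base R sketch below applies both ideas to a toy table. The data frame `obs` and its values are made up, and the code is not the package's internal implementation. It also assumes that an observation counts as missing coordinates when either coordinate is `NA`, which the documentation does not state.

```r
# Toy observation table (made-up values); column names follow the
# standard identification variables described under 'Details'.
obs <- data.frame(
  observacao_id = c("P1", "P2", "P3"),
  observacao_data = as.Date(c("2001-12-31", NA, "1999-07-15")),
  coord_x = c(-53.8, NA, -51.2),
  coord_y = c(-29.7, -30.1, -30.0),
  stringsAsFactors = FALSE
)

# Analogue of missing = list(coord = "drop"): keep only rows with both
# coordinates (assumption: one missing coordinate is enough to drop a row)
has_coord <- !is.na(obs$coord_x) & !is.na(obs$coord_y)
obs_coord <- obs[has_coord, ]

# Analogue of time.format = "%d-%m-%Y": format dates as dd-mm-yyyy strings
obs_coord$observacao_data <- format(obs_coord$observacao_data, "%d-%m-%Y")
print(obs_coord)
```

Observation `P2` is removed because it lacks `coord_x`, and the remaining dates are returned as character strings rather than `Date` objects.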

Value

A list of data frames or a data frame with data on the chosen variable(s) of the chosen dataset(s).
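
The two output shapes can be pictured with a small base R sketch. The tables below are made-up stand-ins for per-dataset results, and `do.call(rbind, ...)` is only one simple way to stack tables that share the same columns; it is not necessarily how the package does it.

```r
# Two toy tables standing in for the per-dataset output (made-up values)
res_list <- list(
  data.frame(dataset_id = "ctb0004", observacao_id = c("A1", "A2"),
             stringsAsFactors = FALSE),
  data.frame(dataset_id = "ctb0005", observacao_id = "B1",
             stringsAsFactors = FALSE)
)

# Unstacked form (stack = FALSE): a list with one data frame per dataset
length(res_list)

# Stacked form (stack = TRUE): a single data frame with the rows of all datasets
res_stacked <- do.call(rbind, res_list)
nrow(res_stacked)
```

In the stacked form, `dataset_id` is what identifies the dataset each row came from.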

Details

Standard identification variables

Standard identification variables and their content are as follows:

dataset_id. Identification code of the dataset in the FEBR to which an observation belongs.
observacao_id. Identification code of an observation in a dataset.
sisb_id. Identification code of an observation in the Brazilian Soil Information System maintained by the Brazilian Agricultural Research Corporation (EMBRAPA).
ibge_id. Identification code of an observation in the database of the Brazilian Institute of Geography and Statistics (IBGE).
observacao_data. Date (dd-mm-yyyy) on which an observation was made.
coord_sistema. EPSG code of the coordinate reference system.
coord_x. Longitude (deg) or easting (m).
coord_y. Latitude (deg) or northing (m).
coord_precisao. Precision with which x- and y-coordinates were determined (m).
coord_fonte. Source of the x- and y-coordinates.
pais_id. Country code (ISO 3166-1 alpha-2).
estado_id. Code of the Brazilian federative unit where an observation was made.
municipio_id. Name of the Brazilian municipality where an observation was made.
amostra_tipo. Type of sample taken.
amostra_quanti. Number of samples taken.
amostra_area. Sampling area.

Further details about the content of the standard identification variables can be found in https://docs.google.com/document/d/1Bqo8HtitZv11TXzTviVq2bI5dE6_t_fJt0HE-l3IMqM (in Portuguese).

Harmonization

Data harmonization consists of converting the values of a variable determined using some method B so that they are (approximately) equivalent to the values that would have been obtained if the standard method A had been used instead. For example, converting carbon content values obtained using a wet digestion method to the standard dry combustion method is data harmonization.

A heuristic data harmonization procedure is implemented in the febr package. It consists of grouping variables based on a chosen number of levels of their identification code. For example, consider a variable with an identification code composed of four levels, aaa_bbb_ccc_ddd, where aaa is the first level and ddd is the fourth level. Now consider a related variable, aaa_bbb_eee_fff. If the harmonization is to consider all four coding levels (level = 4), then these two variables will remain coded as separate variables. But if level = 2, then both variables will be re-coded as aaa_bbb, thus becoming the same variable.
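
The re-coding step described above can be sketched in base R. The helper `recode_level()` is hypothetical and written only for illustration; it is not a function of the febr package, and it assumes that the levels of an identification code are separated by underscores, as in the example.

```r
# Hypothetical helper: keep the first `level` underscore-separated parts
# of each identification code
recode_level <- function(code, level = 2) {
  parts <- strsplit(code, "_", fixed = TRUE)
  vapply(
    parts,
    function(p) paste(p[seq_len(min(level, length(p)))], collapse = "_"),
    character(1)
  )
}

codes <- c("aaa_bbb_ccc_ddd", "aaa_bbb_eee_fff")
recode_level(codes, level = 4)  # codes stay distinct
recode_level(codes, level = 2)  # both become "aaa_bbb"
```

Once two variables share the same re-coded name they are treated as one variable, which is why a lower `level` merges more variables.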

Note

Check the new core data download function readFEBR().

Author

Alessandro Samuel-Rosa alessandrosamuelrosa@gmail.com

Examples

res <- observation(data.set = "ctb0013")
#> Reading ctb0013-observacao...

if (interactive()) {
# Download two data sets and standardize CRS
res <- observation(
  data.set = paste("ctb000", 4:5, sep = ""),
  variable = "taxon",
  standardization = list(crs = "EPSG:4674"))

# Try to download a data set that is not available yet
res <- observation(data.set = "ctb0020")

# Try to download a nonexistent data set
#res <- observation(data.set = "ctb0000")

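# Try to read all files from a local directory
febr.repo <- "~/ownCloud/febr-repo/publico"
# Fall back to NULL (the remote repository) if the local copy is absent;
# if/else is used because ifelse() cannot return NULL
febr.repo <- if (dir.exists(febr.repo)) febr.repo else NULL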
res <- observation(data.set = "all", febr.repo = febr.repo)
}