Expert Tools Glossary of Terms
Species distribution modeling is a method of statistically modeling the suitable habitat of a species based on a set of known occurrence points for that species and environmental layers.
Species distribution modeling requires a set of observed presence and absence points for a given species (i.e., geographic coordinates) and environmental layers such as temperature, soil type, or precipitation. The model reads the values of the environmental layers at each presence and absence point to find a correlation between environmental characteristics and the geographic distribution of the species points; it then applies that correlation to the whole modeling extent to predict how suitable different areas are for the modeled species.
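The three steps described above (read layer values at the points, fit a correlation, apply it to the whole extent) can be sketched in a few lines. This is only an illustration under stated assumptions: the two layers, the grid size, and the presence/absence points are synthetic, and a plain logistic regression fitted by gradient descent stands in for whatever algorithm the actual models use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical environmental layers on a 20 x 30 grid (rows, cols).
rows, cols = 20, 30
temperature = np.linspace(5.0, 25.0, cols)[None, :].repeat(rows, axis=0)
precipitation = np.linspace(200.0, 1200.0, rows)[:, None].repeat(cols, axis=1)
layers = np.stack([temperature, precipitation], axis=-1)  # shape (rows, cols, 2)

# Standardize each layer so gradient descent converges smoothly.
flat = layers.reshape(-1, 2)
layers_std = (layers - flat.mean(axis=0)) / flat.std(axis=0)

# Synthetic points given as grid indices; label 1 = presence, 0 = absence.
# In this made-up example the species occurs only in warm cells.
point_rows = rng.integers(0, rows, size=200)
point_cols = rng.integers(0, cols, size=200)
labels = (temperature[point_rows, point_cols] > 15.0).astype(float)

# Step 1: read the environmental values at each point.
X = layers_std[point_rows, point_cols]  # shape (200, 2)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Step 2: fit a logistic regression (an assumed stand-in model).
weights = np.zeros(2)
bias = 0.0
learning_rate = 0.5
for _ in range(2000):
    p = sigmoid(X @ weights + bias)
    weights -= learning_rate * (X.T @ (p - labels)) / len(labels)
    bias -= learning_rate * np.mean(p - labels)

# Step 3: apply the fitted relationship to every cell in the extent.
suitability = sigmoid(layers_std @ weights + bias)  # shape (rows, cols)
```

Here `suitability` plays the role of the prediction map: one value between 0 and 1 per grid cell, higher where the environment resembles the presence points.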
True absence data is difficult to obtain for a species: it is easy enough to record a species in a particular location, but much more difficult to say with certainty that a species does not occur there at all. So, we generate “pseudo-absence” points, called “Background Points” in the evaluation interface, that simulate random absence points for the given species. SDMs made with pseudo-absence points can still generate accurate predictions, but we must be mindful of the bias that the pseudo-absence points may introduce.
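As a rough illustration of how pseudo-absence points might be generated, the sketch below draws random coordinates inside a rectangular modeling extent. The function name `sample_background_points` and its parameters are hypothetical, and real workflows often restrict sampling further (for example to a range map, or away from known presences), which is not shown here.

```python
import numpy as np


def sample_background_points(min_lon, max_lon, min_lat, max_lat, n_points, seed=None):
    """Draw n_points random (lon, lat) pairs inside a bounding box.

    Simplifying assumption: coordinates are sampled uniformly in degrees,
    which is not exactly uniform by area over large latitude ranges.
    """
    rng = np.random.default_rng(seed)
    lon = rng.uniform(min_lon, max_lon, size=n_points)
    lat = rng.uniform(min_lat, max_lat, size=n_points)
    return np.column_stack([lon, lat])  # shape (n_points, 2)


# Example: 500 background points over a made-up extent.
background = sample_background_points(-120.0, -100.0, 30.0, 45.0, 500, seed=1)
```

Because these points are random rather than observed, some of them may fall where the species actually occurs, which is one source of the bias mentioned above.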
One or more polygons drawn by a human expert to show where a species is expected to occur.
The range map that was used to create the species distribution model for a species: the expert range map if one is available for that species, otherwise the ecoregion range map.
Computer-generated points that simulate observations of the species’ absence.
An SDM outputs a map where each grid cell contains a value representing the likelihood that the species occurs there. The prediction threshold consists of an upper and a lower limit on those likelihoods: values above the upper limit are considered present (100% likelihood), values below the lower limit are considered absent (0% likelihood), and values in between are considered to have a likelihood between 0 and 100%.
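A small sketch may make the two limits concrete. It assumes that values between the limits are rescaled linearly onto 0 to 100%, which the definition above does not specify; the function name `apply_prediction_thresholds` is hypothetical.

```python
import numpy as np


def apply_prediction_thresholds(prediction, lower, upper):
    """Map raw model output onto a 0..1 likelihood using two limits.

    Values at or below `lower` become 0, values at or above `upper` become 1,
    and values in between are rescaled linearly (an assumption of this sketch).
    """
    if not lower < upper:
        raise ValueError("lower must be smaller than upper")
    prediction = np.asarray(prediction, dtype=float)
    rescaled = (prediction - lower) / (upper - lower)
    return np.clip(rescaled, 0.0, 1.0)


raw = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
likelihood = apply_prediction_thresholds(raw, lower=0.25, upper=0.75)
```

With these example limits, the cell at 0.5 sits halfway between them and ends up with a likelihood of 50%, while 0.1 and 0.9 are pinned to 0% and 100%.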
A binary version of the prediction uses just one threshold value where every grid cell above that value is considered present (100% likelihood) and every grid cell below that value is considered absent (0% likelihood).
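In code, a binary prediction is a single comparison per grid cell. The sketch below uses a hypothetical function name and treats cells exactly equal to the threshold as absent, a case the definition above leaves open.

```python
import numpy as np


def binarize_prediction(prediction, threshold):
    """Return True (present) where the prediction exceeds the threshold.

    Assumption: cells exactly equal to the threshold count as absent.
    """
    prediction = np.asarray(prediction, dtype=float)
    return prediction > threshold


grid = np.array([[0.10, 0.40],
                 [0.60, 0.95]])
present = binarize_prediction(grid, threshold=0.5)
```

The result has the same shape as the input map, with True for present cells and False for absent ones.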
AUC stands for Area Under the Curve, where the curve is the receiver operating characteristic (ROC) curve. It is a metric of how well a statistical model can classify binary data, in this case the presence or absence of a species. In theory, the closer the AUC value is to 1, the “better” the model has performed; however, since we do not have true absence data, the AUC value may not be reliable for all models.
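One way to see what the AUC measures: it equals the fraction of (presence, absence) pairs in which the presence point received the higher predicted value, with ties counted as half. The sketch below computes it that way with NumPy. The function name `auc_score` is hypothetical, and with pseudo-absence points the "absence" labels are themselves simulated, so the same caveat about reliability applies.

```python
import numpy as np


def auc_score(labels, scores):
    """AUC as the share of presence/absence pairs ranked correctly.

    labels: 1 (or True) for presence, 0 (or False) for absence.
    scores: predicted likelihood for each point.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    presence = scores[labels]
    absence = scores[~labels]
    if presence.size == 0 or absence.size == 0:
        raise ValueError("need at least one presence and one absence point")
    # Compare every presence score with every absence score.
    diff = presence[:, None] - absence[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (presence.size * absence.size))


# Three of the four presence/absence pairs are ranked correctly here.
example_auc = auc_score([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

A value of 0.5 corresponds to ranking pairs no better than chance, and 1 means every presence point scored higher than every absence point.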
An area where the species was falsely predicted to be present (or have a high likelihood of being present).
An area where the species was falsely predicted to be absent (or have a low likelihood of being present).
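Given a binary prediction and a set of reference observations for the same cells, both kinds of error can be counted directly. This is a minimal sketch with a hypothetical function name; it assumes the reference values are trustworthy, which is rarely true for absences.

```python
import numpy as np


def count_prediction_errors(predicted_present, actually_present):
    """Count cells falsely predicted present and falsely predicted absent."""
    predicted = np.asarray(predicted_present, dtype=bool)
    actual = np.asarray(actually_present, dtype=bool)
    if predicted.shape != actual.shape:
        raise ValueError("inputs must have the same shape")
    falsely_present = int(np.sum(predicted & ~actual))
    falsely_absent = int(np.sum(~predicted & actual))
    return falsely_present, falsely_absent


predicted = [True, True, False, False, True]
observed = [True, False, False, True, True]
falsely_present, falsely_absent = count_prediction_errors(predicted, observed)
```

In this example one cell is falsely predicted present (the second) and one is falsely predicted absent (the fourth).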