Generating pseudo-absences

using SimpleSDMLayers
using Plots
using GBIF
using StatsBase

Justification for this use case: in contrast to the BIOCLIM model from the previous use case, many models require background knowledge about where the species is absent, which is rarely available. For this reason, we often need to generate pseudo-absences, which are informed guesses derived from where we know the species occurs.

In this example, we will see how to generate pseudo-absences (according to Barbet-Massin et al.) using three methods: radius-based, surface range envelope, and random selection. To begin with, we will get occurrences for the Lobster mushroom in Canada and the US.

sp = GBIF.taxon("Hypomyces lactifluorum")
observations = occurrences(sp, "hasCoordinate" => true, "limit" => 300, "country" => "CA", "country" => "US")
while length(observations) < size(observations)
    occurrences!(observations)
end

To have a layer to work with, we will get the annual precipitation layer (BioClim variable 12), clipped to the extent of the observations:

layer = clip(SimpleSDMPredictor(WorldClim, BioClim, 12), observations)
SDM predictor → 278×712 grid with 100109 Float32-valued cells
  Latitudes	24.666666666666668 ⇢ 71.0
  Longitudes	-169.0 ⇢ -50.33333333333333

We can visualize the results of this query:

plot(layer, c=:devon)
scatter!(longitudes(observations), latitudes(observations), lab="", msw=0.0, ms=1, c=:orange)

The first step here is to remove the redundancy in observations: multiple observations in the same cell do not really convey a lot of information. For this reason, we can create a very sparse layer with only presences:

presences = mask(layer, observations, Bool)
SDM response → 278×712 grid with 100109 Bool-valued cells
  Latitudes	24.666666666666668 ⇢ 71.0
  Longitudes	-169.0 ⇢ -50.33333333333333
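The idea behind a presence layer can be illustrated without the package. The following is a minimal sketch, not the implementation of `mask`: the helper `presence_grid`, its keyword arguments, and the bounding box and coordinates are all hypothetical and chosen only for illustration. It shows that several observations falling in the same cell collapse to a single `true` value.

```julia
# Hypothetical helper (not part of SimpleSDMLayers): mark the grid cells
# that contain at least one observation.
function presence_grid(lons, lats; left, right, bottom, top, nrows, ncols)
    grid = falses(nrows, ncols)
    for (lon, lat) in zip(lons, lats)
        # Ignore points that fall outside the bounding box
        (left <= lon <= right && bottom <= lat <= top) || continue
        # Convert coordinates to 1-based column and row indices
        col = clamp(floor(Int, (lon - left) / (right - left) * ncols) + 1, 1, ncols)
        row = clamp(floor(Int, (lat - bottom) / (top - bottom) * nrows) + 1, 1, nrows)
        grid[row, col] = true
    end
    return grid
end

# Made-up coordinates: the first two points share a one-degree cell
lons = [-75.1, -75.12, -120.4]
lats = [45.2, 45.21, 50.3]
grid = presence_grid(lons, lats; left=-169.0, right=-50.0, bottom=24.0, top=71.0, nrows=47, ncols=119)
println(count(grid))  # number of occupied cells
```

Three observations go in, but only two cells are marked as occupied, which is the redundancy removal described above.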

The presence layer is enough to start generating pseudo-absences. We will first use the RandomSelection method, which picks positions anywhere on the layer except in cells that are already occupied. Because our species has one occurrence far away in Alaska, which stretches the clipped layer well beyond the bulk of the observations, this might not be the best method, but it is a simple one to grasp.

rs_pa = rand(RandomSelection, presences)
SDM response → 278×712 grid with 100109 Bool-valued cells
  Latitudes	24.666666666666668 ⇢ 71.0
  Longitudes	-169.0 ⇢ -50.33333333333333

We can plot this layer to see what it looks like:

plot(convert(Float32, rs_pa), c=:Greys, leg=false)
scatter!(longitudes(observations), latitudes(observations), lab="", msw=0.0, ms=1, c=:orange)
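The logic of random selection can also be sketched on a plain Boolean matrix. This is an illustration under stated assumptions rather than the code behind `rand(RandomSelection, presences)`: the helper `random_pseudoabsences` is hypothetical, the number of pseudo-absences `n` is passed explicitly, and every cell without a presence is treated as a valid candidate (a real layer would also need to exclude cells with no data, such as the ocean).

```julia
using Random

# Hypothetical helper (not part of SimpleSDMLayers): draw `n` pseudo-absence
# cells uniformly at random among the cells that hold no presence.
function random_pseudoabsences(presences::AbstractMatrix{Bool}, n::Integer; rng=Random.default_rng())
    candidates = findall(.!presences)  # indices of unoccupied cells
    n <= length(candidates) || throw(ArgumentError("not enough candidate cells"))
    chosen = shuffle(rng, candidates)[1:n]  # sample without replacement
    absences = falses(size(presences))
    absences[chosen] .= true
    return absences
end

# Toy 5×5 presence layer with two occupied cells
toy_presences = falses(5, 5)
toy_presences[2, 3] = true
toy_presences[4, 1] = true

toy_absences = random_pseudoabsences(toy_presences, 4; rng=MersenneTwister(42))
println(count(toy_absences))                  # number of pseudo-absences drawn
println(any(toy_absences .& toy_presences))   # whether any overlaps a presence
```

Sampling only from the unoccupied cells guarantees that no pseudo-absence coincides with a known presence, whichever cells the random draw picks.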