Working with GBIF data

Justification for this use case: building species distribution models requires information of where species are. In this document, we will see how the SimpleSDMLayers and GBIF packages interact, as a first step to get in a usable state. Specifically, we will work on the occurences of the relationship between temperature and precipitation for a few occurrences of the fungus Hypomyces lactifluorum, which will be the taxon used for all SDM-related vignettes.

using SimpleSDMLayers
using GBIF
using Plots
using Statistics

We will focus on showing where on the temperature/precipitation space the occurrences are, so we will download these layers:

temperature, precipitation = SimpleSDMPredictor(WorldClim, BioClim, [1, 12])
2-element Vector{SimpleSDMPredictor{Float32}}:
 SDM predictor → 1080×2160 grid with 808053 Float32-valued cells
 SDM predictor → 1080×2160 grid with 808053 Float32-valued cells

And now, we can follow the GBIF documentation to get some occurrences

observations = occurrences(
    GBIF.taxon("Hypomyces lactifluorum"; strict=true),
    "hasCoordinate" => "true",
    "country" => "CA",
    "country" => "US",
    "limit" => 300,
while length(observations) < size(observations)

@info observations
[ Info: GBIF records: downloaded 2921 out of 2921

We can then extract the temperature for the first occurrence:


Of course, it would be unwieldy to do this for every occurrence in our dataset, and so we will see a way do it much faster. But first, we do not need the entire surface of the planet to perform our analysis, and so we will instead clip the layers:

temperature = clip(temperature, observations)
precipitation = clip(precipitation, observations)
SDM predictor → 278×712 grid with 100109 Float32-valued cells
  Latitudes	24.666666666666668 ⇢ 71.0
  Longitudes	-169.0 ⇢ -50.33333333333333

This will make the future queries a little faster. By default, the clip function will ad a 5% margin on every side. To get the values of a layer at every occurrence in a GBIFRecord, we simply pass the records as a position:

histogram2d(temperature, precipitation; c=:devon, frame=:zerolines, leg=false, lab="")
xaxis!("Temperature (°C)")
yaxis!("Precipitation (mm)")

This will return a record of all data for all geo-localized occurrences (i.e. neither the latitude nor the longitude is missing) in a GBIFRecords collection, as an array of the eltype of the layer. Note that the layer values can be nothing, in which case you might need to run filter(!isnothing, temperature_clip[kf_occurrences] for it to work with the plotting functions.

We can also plot the records over space, using the overloads of the latitudes and longitudes functions:

contour(temperature; c=:cork, frame=:box, fill=true, clim=(-20, 20), levels=6)
    longitudes(observations), latitudes(observations); lab="", c=:white, msc=:orange, ms=2