# Working with DataFrames

Both SimpleSDMLayers.jl and GBIF.jl offer optional integration with the DataFrames.jl package, so our previous example with the kingfisher *Megaceryle alcyon* can also be approached with a DataFrame-centered workflow.

We will illustrate this using the same data and producing the same figures as in the previous example, with GBIF.jl building the occurrence `DataFrame` used throughout. You can also use a `DataFrame` of your own instead of one generated by GBIF.jl, as long as it holds one occurrence per row, a column with the latitude of each occurrence, and a column with its longitude. Any other columns can hold whatever information you like.

Most of our functions assume by default that the coordinates are stored in columns named `:latitude` and `:longitude` (in any order), but you can generally pass other names with keyword arguments such as `latitude = :lat` if you don't want to rename the columns (we will show how below).
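As a minimal sketch of the requirements above, here is a small custom occurrence table. The column names `lat`, `lon`, and `observer`, and all the values, are hypothetical and only illustrate the expected shape: one row per occurrence plus two coordinate columns.

```julia
using DataFrames

# Hypothetical occurrence table: one occurrence per row, with the
# coordinates stored in columns named `lat` and `lon` (made-up values)
my_occurrences = DataFrame(
    species = fill("Megaceryle alcyon", 3),
    lat = [45.5, 49.2, 38.9],
    lon = [-73.6, -123.1, -77.0],
    observer = ["A", "B", "C"],   # extra columns are allowed
)

# Option 1: get a copy that uses the default coordinate column names
renamed = rename(my_occurrences, :lat => :latitude, :lon => :longitude)

# Option 2: keep `lat`/`lon` and pass `latitude = :lat, longitude = :lon`
# to functions that accept these keywords, such as `clip`
```

`rename` (without `!`) leaves `my_occurrences` untouched, so both options remain available.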

So let's start by getting our data:

```julia
# Load packages
using SimpleSDMLayers
using GBIF
using Plots
using Statistics
using DataFrames

# BioClim variables 1 (annual mean temperature) and 12 (annual precipitation)
temperature, precipitation = SimpleSDMPredictor(WorldClim, BioClim, [1,12])

# Get GBIF occurrences
kingfisher = GBIF.taxon("Megaceryle alcyon", strict=true)
kf_occurrences = occurrences(kingfisher,
    "hasCoordinate" => "true",
    "decimalLatitude" => (0.0, 65.0),
    "decimalLongitude" => (-180.0, -50.0),
    "limit" => 200)
# Fetch four more pages of 200 records (1000 records in total)
for i in 1:4
    occurrences!(kf_occurrences)
end
@info kf_occurrences
```

```
[ Info: Loading DataFrames support for SimpleSDMLayers.jl
[ Info: GBIF records: downloaded 1000 out of 100000
```

Once the data is loaded, we can easily convert the environmental layers to a DataFrame with the corresponding coordinates. We can do this for a single layer or for multiple layers at the same time:

```julia
# Single layer
temperature_df = DataFrame(temperature)
# Multiple layers
env_layers = [temperature, precipitation]
env_df = DataFrame(env_layers)
# The layer columns are named x1, x2, … by default
rename!(env_df, :x1 => :temperature, :x2 => :precipitation)
first(env_df, 5)
```

5 rows × 4 columns

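| | longitude | latitude | temperature | precipitation |
|--:|--:|--:|--:|--:|
| | Float64 | Float64 | Float32? | Float32? |
| 1 | -179.917 | -89.9167 | -31.0171 | 43.0 |
| 2 | -179.917 | -89.75 | -30.3919 | 40.0 |
| 3 | -179.917 | -89.5833 | -33.4822 | 40.0 |
| 4 | -179.917 | -89.4167 | -33.6104 | 37.0 |
| 5 | -179.917 | -89.25 | -33.7199 | 40.0 |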
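The conversion above flattens each layer into one row per grid cell. To illustrate that idea only (this is not the package's actual code), here is a hypothetical `layer_to_dataframe` helper working on a plain matrix. It assumes rows are latitudes from south to north and columns are longitudes from west to east, uses cell-center coordinates, and turns `nothing` into `missing`. With this layout latitude varies fastest, which matches the row order in the table above.

```julia
using DataFrames

# Hypothetical helper (not part of SimpleSDMLayers.jl): one row per cell,
# using cell-center coordinates. Assumes rows = latitudes (south to north)
# and columns = longitudes (west to east).
function layer_to_dataframe(grid::AbstractMatrix, left, right, bottom, top; name = :value)
    nlat, nlon = size(grid)
    Δlon = (right - left) / nlon
    Δlat = (top - bottom) / nlat
    lons = left .+ Δlon .* ((1:nlon) .- 0.5)
    lats = bottom .+ Δlat .* ((1:nlat) .- 0.5)
    # Column-major order: latitude changes fastest, longitude slowest
    df = DataFrame(
        longitude = repeat(lons, inner = nlat),
        latitude = repeat(lats, outer = nlon),
    )
    # Cells set to `nothing` become `missing`
    df[!, name] = [isnothing(v) ? missing : v for v in vec(grid)]
    return df
end

# Toy 2×3 grid covering 0–3°E and 0–2°N, with one empty cell
grid = Union{Float32,Nothing}[1.0f0 2.0f0 nothing; 4.0f0 5.0f0 6.0f0]
df = layer_to_dataframe(grid, 0.0, 3.0, 0.0, 2.0; name = :temperature)
```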

Note that cells set to `nothing` in the layers appear as `missing` values in the resulting `DataFrame` (hence the `Float32?` column types). We might want to remove those rows using `filter!` or `dropmissing!`:

```julia
dropmissing!(env_df, [:temperature, :precipitation]);
last(env_df, 5)
```

5 rows × 4 columns

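| | longitude | latitude | temperature | precipitation |
|--:|--:|--:|--:|--:|
| | Float64 | Float64 | Float32 | Float32 |
| 1 | 179.917 | 70.9167 | -11.1038 | 153.0 |
| 2 | 179.917 | 71.0833 | -12.7957 | 161.0 |
| 3 | 179.917 | 71.25 | -12.8151 | 148.0 |
| 4 | 179.917 | 71.4167 | -12.3703 | 136.0 |
| 5 | 179.917 | 71.5833 | -12.3328 | 122.0 |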
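The `filter!` alternative mentioned above can be sketched on a small made-up table with the same columns as `env_df`. One difference worth knowing: by default `dropmissing!` also narrows the column types (note `Float32` instead of `Float32?` in the table above), whereas `filter!` keeps `Union{Missing, Float32}` until you call `disallowmissing!`.

```julia
using DataFrames

# Made-up table with the same columns as `env_df`
env = DataFrame(
    longitude = [-179.9, -179.9, -179.9],
    latitude = [-89.9, -89.8, -89.6],
    temperature = Union{Float32,Missing}[-31.0f0, missing, -33.5f0],
    precipitation = Union{Float32,Missing}[43.0f0, 40.0f0, missing],
)

# Keep only rows where both values are present
filter!([:temperature, :precipitation] => (t, p) -> !ismissing(t) && !ismissing(p), env)

# filter! leaves the element types as Union{Missing, Float32}; narrow them explicitly
disallowmissing!(env, [:temperature, :precipitation])
```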

GBIF.jl allows us to convert a set of occurrences to a DataFrame just as easily:

```julia
kf_df = DataFrame(kf_occurrences)
last(kf_df, 5)
```

5 rows × 18 columns


We can then extract the temperature value at the location of each occurrence:

```julia
temperature[kf_df]
```

```
1000-element Array{Float32,1}:
 18.931166
 18.931166
 17.857887
 17.08904
  9.419209
 12.142196
 10.534634
 12.142196
 24.394215
 14.874448
  ⋮
 18.559376
 18.931166
 18.26275
 10.0455265
  9.921742
 20.455812
  9.230989
 15.448521
 20.617886
```
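Conceptually, extracting values means finding the grid cell that contains each coordinate pair. The sketch below illustrates this with a hypothetical `cell_value` helper and a toy grid; it is not how SimpleSDMLayers.jl implements `temperature[kf_df]`, and the row/column orientation (rows = latitudes from south to north) is an assumption. It also stores the extracted values as a new column and summarizes them with `mean` from Statistics.

```julia
using DataFrames, Statistics

# Hypothetical helper (not the SimpleSDMLayers.jl implementation): value of the
# cell containing (lon, lat). Assumes rows = latitudes (south to north) and
# columns = longitudes (west to east); points on the outer edge are clamped in.
function cell_value(grid, left, right, bottom, top, lon, lat)
    nlat, nlon = size(grid)
    j = clamp(floor(Int, (lon - left) / (right - left) * nlon) + 1, 1, nlon)
    i = clamp(floor(Int, (lat - bottom) / (top - bottom) * nlat) + 1, 1, nlat)
    return grid[i, j]
end

grid = Float32[10 12 14; 20 22 24]   # toy 2×3 layer covering 0–3°E, 0–2°N
occ = DataFrame(longitude = [0.2, 2.7, 1.5], latitude = [0.4, 1.9, 1.2])

# One extracted value per occurrence row, stored as a new column
occ.temperature = [cell_value(grid, 0.0, 3.0, 0.0, 2.0, r.longitude, r.latitude)
                   for r in eachrow(occ)]
avg = mean(occ.temperature)
```

With the real data, and assuming `temperature[kf_df]` returns one value per row as in the output above, the same column could be added with `kf_df.temperature = temperature[kf_df]`.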

Alternatively, we can clip the layers to the extent of the occurrences:

```julia
temperature_clip = clip(temperature, kf_df)
precipitation_clip = clip(precipitation, kf_df)
```

```
SDM predictor → 289×738 grid with 75299 Float32-valued cells
Latitudes	(13.083333333333334, 61.083333333333336)
Longitudes	(-172.58333333333334, -49.75)
```
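As a rough sketch of what clipping to occurrences involves, the hypothetical `occurrence_extent` function below computes the bounding box of the coordinates in a `DataFrame`, with the same `latitude`/`longitude` keyword convention that `clip` accepts. The real `clip` may snap or pad these limits to the layer's grid cells, so its extent need not match these values exactly.

```julia
using DataFrames

# Hypothetical helper (not part of SimpleSDMLayers.jl): bounding box of the
# occurrence coordinates, with configurable coordinate column names
function occurrence_extent(df::AbstractDataFrame;
                           latitude::Symbol = :latitude,
                           longitude::Symbol = :longitude)
    bottom, top = extrema(df[!, latitude])
    left, right = extrema(df[!, longitude])
    return (left = left, right = right, bottom = bottom, top = top)
end

# Made-up coordinates for illustration
occ = DataFrame(latitude = [10.0, 55.0, 32.5], longitude = [-120.0, -60.0, -95.0])
ext = occurrence_extent(occ)

# Same call on a table with shorter column names
short = rename(occ, :latitude => :lat, :longitude => :lon)
ext_short = occurrence_extent(short; latitude = :lat, longitude = :lon)
```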

If your `DataFrame` uses different column names for the coordinates, for example `:lat` and `:lon`, pass them to `clip` as keyword arguments:

```julia
kf_df_shortnames = rename(kf_df, :latitude => :lat, :longitude => :lon)
clip(temperature, kf_df_shortnames; latitude = :lat, longitude = :lon)
```

```
SDM predictor → 289×738 grid with 75299 Float32-valued cells
Latitudes	(13.083333333333334, 61.083333333333336)
Longitudes	(-172.58333333333334, -49.75)
```

Finally, we can plot the layer values and the values at the occurrences just as we would plot any array or `DataFrame` column:

```julia
histogram2d(temperature_clip, precipitation_clip, c = :viridis)
scatter!(temperature_clip[kf_df], precipitation_clip[kf_df],
    lab = "", c = :white, msc = :orange)
```

To plot the occurrences over space, we can overlay their coordinates on the temperature layer:

```julia
contour(temperature_clip, c = :alpine, title = "Temperature",
    frame = :box, fill = true)
scatter!(kf_df.longitude, kf_df.latitude,
    lab = "", c = :white, msc = :orange, ms = 2)
```