Node-based analysis of species distributions
This example demonstrates how to do a node-based comparison of species distributions, as described in Borregaard, M.K., Rahbek, C., Fjeldså, J., Parra, J.L., Whittaker, R.J. and Graham, C.H. (2014). Node-based analysis of species distributions. Methods in Ecology and Evolution 5: 1225-1235.
We will reimplement the method from the paper from first principles, using SpatialEcology functionality and the EcoJulia phylogenetics package Phylo. We start by loading the basic data objects and finish by defining a function that provides the full functionality of the published paper.
We will work with the distributions and phylogenetic relationships of all species of the family Tyrannidae in the Americas. The species occurrences are defined on a regular grid with a cell size of 1 degree of latitude and longitude. This is one of the datasets used in the Borregaard et al. (2014) paper.
Load data and create objects
First, let's load the data.
Species occurrence data for spatial ecological analysis exist in a variety of formats. A common approach is to store the data in one or several CSV files.
In this case, we have the data in two CSV tables, one of species occurrences in each grid cell, and one with the lat-long coordinates of each grid cell.
The CSV table of occurrences is in the widely used phylocom format, a long-form format that records which species occur at which sites. It consists of three columns: one of sites (Plot), one of abundances (Record; here all values are 1, as this is a presence-absence data set), and one of species names (Species).
using CSV, DataFrames, SpatialEcology
phylocom = CSV.read("../../data/tyrann_phylocom.tsv", DataFrame)
4 rows × 3 columns
| | Plot | Record | Species |
|---|---|---|---|
| | Int64 | Int64 | String6… |
| 1 | 6504 | 1 | Empidonax_hammondii |
| 2 | 6504 | 1 | Empidonax_alnorum |
| 3 | 6504 | 1 | Sayornis_saya |
| 4 | 6504 | 1 | Contopus_cooperi |
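To make the long-form layout concrete, here is a minimal sketch in plain Julia of how such records map onto a sites × species presence-absence matrix. The site and species names are made up for illustration, and this is not how SpatialEcology builds its objects internally; it only shows the idea behind the format.

```julia
# Toy long-form (phylocom-style) records: (site, abundance, species).
# These values are invented for illustration, not taken from the Tyrannidae data.
records = [
    ("site1", 1, "sp_a"),
    ("site1", 1, "sp_b"),
    ("site2", 1, "sp_a"),
    ("site3", 1, "sp_c"),
]

# Unique sites and species, in order of first appearance
sites = unique(first.(records))
species = unique(last.(records))

# Lookup tables from name to row/column index
site_index = Dict(s => i for (i, s) in enumerate(sites))
species_index = Dict(s => i for (i, s) in enumerate(species))

# Fill a sites × species presence-absence matrix
occ = falses(length(sites), length(species))
for (site, abundance, sp) in records
    occ[site_index[site], species_index[sp]] = abundance > 0
end

# Species richness per site is the row sum of the matrix
richness = vec(sum(occ, dims = 2))
```

Each row of the long-form table contributes exactly one filled cell in the matrix, which is why the format stays compact for sparse occurrence data.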
The coordinates are stored in a simple DataFrame with a column of sites, one of latitude, and one of longitude.
coord = CSV.read("../../data/tyrann_coords.tsv", DataFrame)
4 rows × 3 columns
| | Long | Lat | cell |
|---|---|---|---|
| | Float64 | Float64 | Int64 |
| 1 | -156.5 | 71.5 | 6504 |
| 2 | -155.5 | 71.5 | 6505 |
| 3 | -162.5 | 70.5 | 6858 |
| 4 | -161.5 | 70.5 | 6859 |
We ensure that the site columns are represented as strings in both data sets. We then construct the Assemblage object. The site columns are used to match the two DataFrames.
phylocom.Plot = string.(phylocom.Plot)
coord.cell = string.(coord.cell)
tyrants = Assemblage(phylocom, coord)
Assemblage with 390 species in 3716 sites
Species names:
Empidonax_hammondii, Empidonax_alnorum, Sayornis_saya...Muscisaxicola_capistratus, Neoxolmis_rufiventris
Site names:
6504, 6505, 6858...51588, 51589
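The type of the site identifiers matters whenever two tables are matched by key, because in Julia an integer never compares equal to its string form. The sketch below uses small made-up vectors in place of the real columns to show the effect of broadcasting `string` over a column. It illustrates the general principle only and makes no claim about how Assemblage performs the matching.

```julia
# Hypothetical site identifiers standing in for the two site columns
occurrence_sites = [6504, 6504, 6505]         # parsed as Int64
coordinate_sites = ["6504", "6505", "6858"]   # already strings

# An integer and its string representation are not equal
6504 == "6504"   # false

# Count occurrence records whose site is found among the coordinate sites
n_before = count(in(coordinate_sites), occurrence_sites)

# Broadcasting `string` over the column makes the keys comparable
occurrence_keys = string.(occurrence_sites)
n_after = count(in(coordinate_sites), occurrence_keys)

# Sites that have coordinates but no occurrence records
unmatched = setdiff(coordinate_sites, occurrence_keys)
```

A check like `unmatched` can be a useful sanity test before combining tables, since sites that fail to match are otherwise easy to overlook.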
Let's have a look at the data.
using Plots
default(color = cgrad(:Spectral, rev = true))
plot(tyrants)
Next, we'll read in the phylogenetic tree.
using Phylo
tree = open(parsenewick, "../../data/tyrannid_tree.tre")
sort!(tree) # sort the nodes on the tree in order of size - useful for plotting
plot(tree, treetype = :fan, tipfont = (5,))
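For readers unfamiliar with the Newick format that `parsenewick` reads, the following sketch parses a tiny made-up tree from a string instead of a file. It assumes that `parsenewick` also accepts a Newick string directly and that `nleaves` and `getleafnames` are available from Phylo, which matches the package documentation as far as we know; the tree itself is invented and unrelated to the Tyrannidae data.

```julia
using Phylo

# A made-up three-tip tree in Newick format: nested parentheses define the
# clades, and the number after each colon is a branch length
newick = "((A:1.0,B:1.0):1.0,C:2.0);"
toy_tree = parsenewick(newick)

# Basic queries on the parsed tree
n_tips = nleaves(toy_tree)            # number of tips
tip_names = getleafnames(toy_tree)    # tip labels
```

Here `A` and `B` form a clade that is sister to `C`. The same nesting logic applies to the Tyrannidae tree, only with many more tips.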