Node-based analysis of species distributions

This example demonstrates how to do a node-based comparison of species distributions, as described in Borregaard, M.K., Rahbek, C., Fjeldså, J., Parra, J.L., Whittaker, R.J. and Graham, C.H. (2014). Node-based analysis of species distributions. Methods in Ecology and Evolution 5: 1225-1235.

We will reimplement the method from the paper from first principles, using SpatialEcology functionality and Phylo, the EcoJulia phylogenetics package. We start by loading the basic data objects and finish by defining a function with the full functionality of the published method.

We will work with the distributions and phylogenetic relationships of all species of the family Tyrannidae in the Americas. The species occurrences are defined on a regular grid with a cell size of 1° latitude by 1° longitude. This is one of the datasets used in Borregaard et al. (2014).

Load data and create objects

First, let's load the data.

Species occurrence data for spatial ecological analysis comes in a variety of formats. A common choice is to store the data in one or more CSV files.

In this case, we have the data in two CSV tables, one of species occurrences in each grid cell, and one with the lat-long coordinates of each grid cell.

The occurrence table is in the widely used phylocom format, a long-form format that associates species occurrences with sites. It has three columns: one of sites (Plot), one of abundances (Record; all values are 1 here, because this is a presence-absence data set), and one of species names (Species).

using CSV, DataFrames, SpatialEcology
phylocom = CSV.read("../../data/tyrann_phylocom.tsv", DataFrame)

4 rows × 3 columns

Row  Plot   Record  Species
     Int64  Int64   String6…
1    6504   1       Empidonax_hammondii
2    6504   1       Empidonax_alnorum
3    6504   1       Sayornis_saya
4    6504   1       Contopus_cooperi
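
To make the long format concrete, here is a minimal sketch in plain Julia of how such records could be pivoted into a sites × species presence-absence matrix. The three records are made up for illustration and the variable names are hypothetical. This is not how SpatialEcology stores the data internally, only a way to see what the three columns encode.

```julia
# Hypothetical long-form records laid out as (Plot, Record, Species).
# The values are illustrative only, not taken from the real data file.
records = [
    ("6504", 1, "Empidonax_hammondii"),
    ("6504", 1, "Sayornis_saya"),
    ("6505", 1, "Sayornis_saya"),
]

# Collect the unique sites and species, keeping first-seen order.
sites   = unique(first.(records))
species = unique(last.(records))

# Map each name to its row (site) or column (species) index.
site_index    = Dict(s => i for (i, s) in enumerate(sites))
species_index = Dict(s => j for (j, s) in enumerate(species))

# Fill a sites × species presence-absence matrix.
occ = falses(length(sites), length(species))
for (plot, record, sp) in records
    occ[site_index[plot], species_index[sp]] = record > 0
end

# Species richness per site is the row sum of the matrix.
richness = vec(sum(occ, dims = 2))
```

In this toy example, site 6504 holds two species and site 6505 holds one, so the long table and the matrix carry the same information in different shapes.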

The coordinates are stored in a simple DataFrame with a column of sites, one of latitude, and one of longitude.

coord = CSV.read("../../data/tyrann_coords.tsv", DataFrame)

4 rows × 3 columns

Row  Long     Lat      cell
     Float64  Float64  Int64
1    -156.5   71.5     6504
2    -155.5   71.5     6505
3    -162.5   70.5     6858
4    -161.5   70.5     6859
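
The coordinates shown all end in .5, which is what you would expect if each value marks the centre of a 1° grid cell. Under that assumption (it is not stated in the data files), a point location can be snapped to its cell centre as sketched below. The function name `cell_center` and its `cellsize` keyword are hypothetical and not part of SpatialEcology.

```julia
# Snap a coordinate to the centre of the grid cell that contains it.
# Assumes cell edges fall on whole multiples of `cellsize`.
cell_center(x; cellsize = 1.0) = (floor(x / cellsize) + 0.5) * cellsize

# A hypothetical point record at longitude -156.2, latitude 71.9
long, lat = -156.2, 71.9
snapped = (cell_center(long), cell_center(lat))
```

With these inputs the point falls in the cell centred at (-156.5, 71.5), which matches the first row of the coordinate table.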

We make sure that the site columns are represented as strings in both data sets, and then construct the Assemblage object. The site columns are used to match the two DataFrames.

phylocom.Plot = string.(phylocom.Plot)
coord.cell = string.(coord.cell)
tyrants = Assemblage(phylocom, coord)
Assemblage with 390 species in 3716 sites

Species names:
Empidonax_hammondii, Empidonax_alnorum, Sayornis_saya...Muscisaxicola_capistratus, Neoxolmis_rufiventris

Site names:
6504, 6505, 6858...51588, 51589
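
Because the two tables are matched on site names, it can be worth checking that the identifiers agree before building the object. The following is a minimal sketch using made-up identifiers. In practice the two vectors would be `phylocom.Plot` and `coord.cell`. Converting both to strings matters because an integer and a string never compare equal in Julia (`6504 == "6504"` is `false`).

```julia
# Hypothetical site identifiers, converted to strings as in the tutorial.
occurrence_sites = string.([6504, 6504, 6505, 6858])
coordinate_sites = string.([6504, 6505, 6858, 6859])

# Sites that have occurrences but no coordinates.
missing_coords = setdiff(occurrence_sites, coordinate_sites)

# Sites that have coordinates but no recorded species.
empty_sites = setdiff(coordinate_sites, occurrence_sites)

println("Sites without coordinates: ", missing_coords)
println("Sites without occurrences: ", empty_sites)
```

An empty `missing_coords` means that every occurrence record can be placed on the grid. Entries in `empty_sites` are grid cells where no species is recorded.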

Let's have a look at the data by plotting the assemblage.

using Plots
default(color = cgrad(:Spectral, rev = true))
plot(tyrants)

Next, we'll read in the phylogenetic tree.

using Phylo
tree = open(parsenewick, "../../data/tyrannid_tree.tre")
sort!(tree) # sort the nodes on the tree in order of size - useful for plotting
plot(tree, treetype = :fan, tipfont = (5,))