There isn’t too much analysis in this post. It’s mostly a walk-through of the project that I did. The end result is interesting to look at, but I haven’t figured out what to do with the visualisation yet. Coincidentally, a paper was published in Nature a couple of weeks ago that used the same dataset to do some actual science. Chris Mooney at the Washington Post does a good job of summarizing the findings.
Since the dataset is so large (over 530,000 trees are mapped within the City of Toronto alone), I’ve trimmed the map to cover only the nine most common types of trees.
This left 206,756 trees that could be mapped out.
Norway Maple: 75,070
Crab Apple: 24,033
Colorado Blue Spruce: 22,512
Silver Maple: 20,633
Honey Locust: 18,837
Schwedler Norway Maple: 15,642
Littleleaf Linden: 13,890
All the work I’ve done here has been in an effort to teach myself how to do it. You’ll notice a few instances where I don’t know exactly why a specific piece of code is needed. As time goes on, I’m hoping to be able to answer my own questions. For now, I think it’s a great exercise to try to document what I’m learning. There were a couple of sites that helped me to put it all together. I went through the tutorials on both sites and used them as guides to help me create this.
I’ll start with the data source. I searched for data about Toronto and came across this one. I recognized the WGS84 version of the data and hoped that it would match what I did in the tutorials. The City site seems to have quite a few more data sets that should provide a good source for analysis in the future.
These are the libraries I used:
library("rgdal") library("ggplot2") library("rgeos") library("plyr") library("ggmap") library("RColorBrewer") library("grid")
First, I set the working directory to where I previously downloaded the files. Then I used the readOGR function to read the WGS84 files; this actually takes a little bit of time to run. The data is saved to trees.
setwd("c:/coding/R/Toronto") trees <- readOGR(dsn = ".", "street_tree_general_data_wgs84")
This next part is still something I’m trying to understand. Using the Spatial Reference website, I found what looked like the proper EPSG code for Toronto. I still need to figure out why it’s needed and not already built in.
proj4string(trees) <- CRS("+init=epsg:3348")
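For reference, my understanding (and I’m still working this out) is that proj4string<- only attaches a CRS label to the data without moving any points, and that spTransform is what actually reprojects; WGS84 itself corresponds to EPSG:4326. A sketch of how that would look, not part of what I actually ran:

# check whatever CRS is currently attached to the data
proj4string(trees)
# spTransform actually reprojects the coordinates; the proj4string<- line
# above only relabels them (this is a sketch, not something I ran)
trees_4326 <- spTransform(trees, CRS("+init=epsg:4326"))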
Here I’m converting the data into a data frame for manipulation, then creating a count of the number of each type of tree so that I can subset the data.
trees_df <- as.data.frame(trees)
tt <- table(trees$COMMON_NAM)
ComTrees <- subset(trees_df, trees_df$COMMON_NAM %in% names(tt[tt > 12000]))
A quick change to the column headers to make them easier to read/handle.
names(ComTrees)[9] <- "long"
names(ComTrees)[10] <- "lat"
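A couple of quick checks at this point are useful; the numbers should line up with the counts listed earlier:

nrow(ComTrees)                        # should come out to roughly 206,756
head(sort(tt, decreasing = TRUE), 9)  # the most common species and their counts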
This part is done just to check how the data points are laid out; it’s not actually overlaying them on the map yet. But it’s pretty interesting being able to pick out the shape of the city, along with some of the landmarks, just from this.
Following the code…
“ggplot(ComTrees” calls up the ggplot function and uses the ComTrees data.
“aes(…)” sets the aesthetics. It passes along the x-axis and y-axis data, then separates the colours by COMMON_NAM.
“geom_point(size = 0.01)” adds the scatter plot layer and sets the size of each point.
“coord_equal()” sets how the latitude and longitude should be scaled in the projection.
“scale_colour_brewer(type = "qual", palette = 1)” indicates what colour palette should be used.
Some of these functions will be repeated for the final code of layering the scatter plot of trees onto the map.
Map <- ggplot(ComTrees, aes(long, lat, color = COMMON_NAM), group = group) +
  geom_point(size = 0.01) +
  coord_equal() +
  scale_colour_brewer(type = "qual", palette = 1)
Map
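If you want to keep a copy of this intermediate plot, ggsave works fine; the file name and dimensions below are just arbitrary choices of mine:

ggsave("toronto_trees_points.png", Map, width = 10, height = 8, dpi = 300)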
Now I had to pull in the coordinate boundaries of the data. I don’t think this is the cleanest way of doing it, but it worked right away, so I didn’t spend a lot of time looking into it. There’s a small amount added to the max and min latitudes and longitudes so that all the data points fall inside the map. I’d still like to understand the bbox function better to figure out why I couldn’t get it to work with the ComTrees data frame.
b <- bbox(trees)
b[1, ] <- (b[1, ] - mean(b[1, ])) * 1.05 + mean(b[1, ])
b[2, ] <- (b[2, ] - mean(b[2, ])) * 1.05 + mean(b[2, ])
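As far as I can tell, bbox() is written for Spatial* objects rather than plain data frames, which would explain why ComTrees didn’t work with it directly. A base-R way to build the same min/max matrix by hand (a sketch, using the long/lat column names from the renaming above, and not what I actually ran) would be:

# rows are the coordinate dimensions, columns are their min and max,
# mirroring the layout that bbox() returns
b_alt <- t(apply(ComTrees[, c("long", "lat")], 2, range))
colnames(b_alt) <- c("min", "max")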
A quick change to row names in b to make it easier to follow.
row.names(b)[1] <- "long"  # row 1 of the bbox is the x coordinate (longitude)
row.names(b)[2] <- "lat"   # row 2 is the y coordinate (latitude)
Now here’s the big chunk of code that will get most of our work done. I’ll try to break down each part of the function as best as I understand it. “get_map(location = b)”: this passes the coordinates to the get_map function, which pulls in the proper map from Google Maps. There are other providers that can be used.
“ggmap(Tor.b1, extent = "panel", maprange = FALSE)”: I honestly don’t know why I had to specify the extent and maprange, but it worked. This function just grabs the map that was previously pulled from Google Maps. “%+% ComTrees”: here we add in the data for the trees. As far as I can tell, %+% replaces the plot’s default data with a new data frame, but this is something I have to look into more.
“aes(x = long, y = lat, color = COMMON_NAM)”: just a repeat of the aesthetics from the geom_point chart done previously, but I wanted to redo it for this chunk of code. Again, this pulls in the longitude and latitude data for the x and y axes, while the dots are separated by their common name.
“geom_point(size = 0.01) + scale_colour_brewer(palette = "Set1") + coord_equal()”: all the same as above. I could have reused the earlier layers, but since I was doing lots of tweaking to see the different options, it was easier to have a single piece of code to rerun every time.
Tor.b1 <- get_map(location = b)
Tor.Map <- ggmap(Tor.b1, extent = "panel", maprange = FALSE) %+% ComTrees +
  aes(x = long, y = lat, color = COMMON_NAM) +
  geom_point(size = 0.01) +
  scale_colour_brewer(palette = "Set1") +
  theme_opts +
  coord_equal()
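One thing the chunk above relies on that I haven’t shown is theme_opts. It comes from the tutorials I followed and is just a list of theme() tweaks that strip out the axis labels, ticks, and background so only the map shows. Something along these lines works, and it needs to be defined before running the chunk above (this is a sketch; the exact settings here are an approximation rather than exactly what I used):

theme_opts <- list(theme(panel.grid.minor = element_blank(),
                         panel.grid.major = element_blank(),
                         panel.background = element_blank(),
                         plot.background = element_blank(),
                         axis.line = element_blank(),
                         axis.text.x = element_blank(),
                         axis.text.y = element_blank(),
                         axis.ticks = element_blank(),
                         axis.title.x = element_blank(),
                         axis.title.y = element_blank()))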