By the end of this practical lab you will be able to:
The dynamics of urban systems are captured by various types of interaction data with varying degrees of spatio-temporal granularity. In this practical you will explore various ways in which interaction data can be mapped; including the estimation and representation of routes from co-ordinate pairs, mapping GPS trails and how very large origin-destination flows can be summarized.
Many interaction data within cities are simply pairs of origin and destination locations, with some flow recorded between them. A good source of such data which are available within many municipalities relate to the location and flow between bike share docking stations. Many of the operators of these systems now make these data openly available.
We will now read in the September 2015 - August 2016 data from the Bay Area Bike Share, SF, USA. Although the original zip files contain other data, we will just read in the data related to the station location and also the use records.
#Read in data
stations <- read.csv("./data/201608_station_data.csv")
trips <- read.csv("./data/201608_trip_data.csv")
Each of the stations has various attributes and cover a series of locations within the bay area - in this case, we will subset to only those within San Francisco.
head(stations)
## station_id name lat long
## 1 2 San Jose Diridon Caltrain Station 37.32973 -121.9018
## 2 3 San Jose Civic Center 37.33070 -121.8890
## 3 4 Santa Clara at Almaden 37.33399 -121.8949
## 4 5 Adobe on Almaden 37.33141 -121.8932
## 5 6 San Pedro Square 37.33672 -121.8941
## 6 7 Paseo de San Antonio 37.33380 -121.8869
## dockcount landmark installation
## 1 27 San Jose 8/6/2013
## 2 15 San Jose 8/5/2013
## 3 11 San Jose 8/6/2013
## 4 19 San Jose 8/5/2013
## 5 15 San Jose 8/7/2013
## 6 15 San Jose 8/7/2013
# Limit to SF stations
stations <- stations[stations$landmark == "San Francisco",]
library(ggmap)
#Get background map for Chicago
SanFran <- get_map(location = c(-122.405,37.79), zoom = 14,color = 'bw')
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=37.79,-122.405&zoom=14&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
#Basic point plot
ggmap(SanFran) + geom_point(data = stations, aes(x = long, y = lat, colour = "red")) +
theme_bw() +
theme(axis.line = element_blank(),
axis.text = element_blank(),
axis.title=element_blank(),
axis.ticks = element_blank(),
legend.key = element_blank(),
legend.position="none",
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank())
Turning now to the trips taken between these origin and destinations, these are ordered within the data frame as a trip per row, with each trip giving a various details including the start and end terminals. Using these data we will create a table of aggregate origin destination flows - however, we will only consider those flows between the stations identified as “San Francisco” and therefore shown on the map above.
# View top six rows of the trips data
head(trips)
## Trip.ID Duration Start.Date Start.Station
## 1 913465 746 9/1/2015 0:10 San Francisco Caltrain 2 (330 Townsend)
## 2 913466 969 9/1/2015 0:15 Clay at Battery
## 3 913467 233 9/1/2015 0:15 Davis at Jackson
## 4 913468 213 9/1/2015 1:29 Clay at Battery
## 5 913469 574 9/1/2015 1:33 Steuart at Market
## 6 913470 623 9/1/2015 1:36 San Jose Diridon Caltrain Station
## Start.Terminal End.Date End.Station
## 1 69 9/1/2015 0:23 San Francisco City Hall
## 2 41 9/1/2015 0:31 Washington at Kearny
## 3 42 9/1/2015 0:19 Commercial at Montgomery
## 4 41 9/1/2015 1:32 Steuart at Market
## 5 74 9/1/2015 1:42 San Francisco Caltrain 2 (330 Townsend)
## 6 2 9/1/2015 1:47 Japantown
## End.Terminal Bike.. Subscriber.Type Zip.Code
## 1 58 238 Subscriber 94107
## 2 46 16 Subscriber 94133
## 3 45 534 Subscriber 94111
## 4 74 312 Subscriber 94107
## 5 69 279 Subscriber 94107
## 6 9 261 Subscriber 95112
# Get a list of the station IDs within SF
s_SF <- unique(stations$station_id)
# Limit trips to those with both origin and destination within the SF subset
trips_SF <- trips[(trips$Start.Terminal %in% s_SF) & (trips$End.Terminal %in% s_SF),]
# Create an table with origins and destination pairs
OD_trips_SF <- table(trips$Start.Terminal,trips$End.Terminal)
#View the top six rows
head(OD_trips_SF)
##
## 2 3 4 5 6 7 8 9 10 11 12 13 14 16 21 22 23
## 2 67 193 971 354 402 390 118 350 96 289 93 209 26 135 0 0 0
## 3 148 118 23 31 39 24 13 23 11 18 15 8 20 12 0 0 0
## 4 932 35 96 14 14 29 11 10 11 11 49 8 206 4 0 0 0
## 5 372 29 9 58 51 26 53 15 37 22 15 14 17 3 0 0 0
## 6 396 51 25 37 81 38 31 137 30 16 18 131 53 47 0 0 0
## 7 322 27 18 17 54 55 60 51 15 24 13 30 17 83 0 0 0
##
## 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 41
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##
## 42 45 46 47 48 49 50 51 54 55 56 57 58 59 60 61 62
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##
## 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 80 82
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 19 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 50 0
## 7 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 0
##
## 83 84 88 89 90 91
## 2 0 314 1 8 0 0
## 3 0 19 1 8 0 0
## 4 0 13 0 1 0 0
## 5 0 65 0 0 0 0
## 6 0 57 0 17 0 0
## 7 0 43 0 1 0 0
If you remember from an earlier practical we can convert a table object into a data frame as follows which turns it from a wide to a narrow format:
# Create a data frame of the OD pairs
OD_trips_SF_Narrow <- data.frame(OD_trips_SF)
# Create sensible column names
colnames(OD_trips_SF_Narrow) <- c("Origin","Destination","Trips")
We will now identify the top ten most frequently ridden origin-destination pairs
# Sorts the trips in decending order
OD_trips_SF_Narrow <- OD_trips_SF_Narrow[order(OD_trips_SF_Narrow$Trips,decreasing = TRUE),]
# Get the top 10 trips
top10 <- OD_trips_SF_Narrow[1:10,]
top10
## Origin Destination Trips
## 4272 65 69 3341
## 4274 67 69 3035
## 3593 50 60 2986
## 3010 61 50 2781
## 4637 60 74 2605
## 3019 70 50 2593
## 3667 50 61 2531
## 4355 74 70 2482
## 4334 51 70 2384
## 4333 50 70 2362
We will now add origin and destination latitude and longitude co-ordinates by merging with the stations data. First the origin locations:
# Add origin co-ordinates
top10 <- merge(top10,stations, by.x="Origin",by.y="station_id", all.x=TRUE)
# Remove unwanted columns
top10 <- subset(top10, select=c("Origin","Destination","Trips","lat","long"))
# Change column names
colnames(top10) <- c("Origin","Destination","Trips","O_lat","O_long")
And then the destinations:
# Add destination co-ordinates
top10 <- merge(top10,stations, by.x="Destination",by.y="station_id", all.x=TRUE)
# Remove unwanted columns
top10 <- subset(top10, select=c("Origin","Destination","Trips","O_lat","O_long","lat","long"))
# Change column names
colnames(top10) <- c("Origin","Destination","Trips","O_lat","O_long","D_lat","D_long")
One of the simplest ways of calculating a route is to use the Google maps API which is implemented in the googleway package.
# Install package
install.packages("googleway")
# Load package
library(googleway)
For this you will need to get a Google maps API key:
# Set your key
key <- "your_api_key"
We will then extract an origin destination pair from our top10 object, and then use the google_directions() function to generate a route - this is then converted to a set of lat lon waypoints using decode_pl():
# Using the first origin/destination
x <- 1 # You can change this between 1 - 10 to view each of the routes
origin <- c(top10[x,"O_lat"],top10[x,"O_long"])
destination <- c(top10[x,"D_lat"],top10[x,"D_long"])
# get the directions from Google Maps API
res <- google_directions(origin = origin,destination = destination,key = key, mode= "bicycling")
# Convert the results to co-ordinates
df_polyline <- decode_pl(res$routes$overview_polyline$points)
# See the top six rows
head(df_polyline)
## lat lon
## 1 37.77671 -122.3951
## 2 37.77686 -122.3953
## 3 37.77700 -122.3951
## 4 37.77723 -122.3948
## 5 37.77825 -122.3936
## 6 37.77859 -122.3931
These can then be mapped with ggmap:
ggmap(SanFran) +
geom_path(aes(x = lon, y = lat), color = "red", size = 0.5, data = df_polyline, lineend = "round")
We can extend the above to run a conditional statement with the for() function which does something (in this case, what is in brackets) until a condition is satisfied. Here loop changes the value of x from 1 to the number of rows in the top10 object (i.e. 10), and for each change in x the code between the { and } is run. For loops are very helpful to run a block of code multiple times.
Because x is changed from 1-10 on each run, we can use this value in various helpful ways, firstly to select a particular row from the data frame top10, and second to act as an ID for each set of routes extracted.
tmp <- data.frame(lat = numeric(0), lon = numeric(0), ID = numeric(0), Trips = numeric(0))
for (x in 1:nrow(top10)) {
# Get origins and destinations
origin <- c(top10[x,"O_lat"],top10[x,"O_long"])
destination <- c(top10[x,"D_lat"],top10[x,"D_long"])
# get the directions from Google Maps API
res <- google_directions(origin = origin,destination = destination,key = key, mode= "bicycling")
# Convert the results to co-ordinates
df_polyline <- decode_pl(res$routes$overview_polyline$points)
# Add a route ID and Trips to the data frame
df_polyline$ID <- x
df_polyline$Trips <- top10[x,"Trips"]
# Append the results to the tmp object
tmp <- rbind(tmp,df_polyline)
}
We can now visualize this using the ID as a factor which shows each route as a separate color.
ggmap(SanFran) +
geom_path(aes(x = lon, y = lat,color = as.factor(ID)), size = 0.5, data = tmp, lineend = "round")
To enable some more experimentation with the flow data visualization without having to generate all the potential routes yourself, we have run these already for all origin destination station pairs where the flow was greater than 0. We will load these now:
# Load flows
load("./data/All_Flows.Rdata")
# Show the top six rows of the table
head(All_Flows)
## lat lon ID Trips
## 1 37.33245 -121.8903 1 4
## 2 37.33209 -121.8900 1 4
## 3 37.33227 -121.8900 1 4
## 4 37.33230 -121.8900 1 4
## 5 37.33230 -121.8900 1 4
## 6 37.33282 -121.8904 1 4
We can now show these on a map - we use the group option within the aes to tell ggmap that these are id that separate the routes, otherwise the whole set of co-ordinates are interpreted as a single route. You can remove these and generate the plot again to see what happens.
ggmap(SanFran) +
geom_path(aes(x = lon, y = lat, group = ID), data = All_Flows)
## Warning: Removed 2506 rows containing missing values (geom_path).
For those who know the topography of San Francisco will understand why the Google routes have been calculated to avoid the central area.
We can now use the trip information to adjust the plot - for example, to scale the routes by the flow volume. We add the size option, but also divide the flows by 1000 to make the line widths an acceptable size. Thicker lines represent greater flows. We have also added the location of the stations in red.
ggmap(SanFran) +
geom_path(aes(x = lon, y = lat, group = ID), data = All_Flows,size = All_Flows$Trips/1000) +
geom_point(data=stations, aes(long, lat),colour="red")
## Warning: Removed 2506 rows containing missing values (geom_path).
There are a lot of other adjustment options, for example, we can darken the map and change the line color:
ggmap(SanFran,darken = 0.8) +
geom_path(aes(x = lon, y = lat, group = ID), data = All_Flows, size = All_Flows$Trips/1000, colour = "white") +
geom_point(data=stations, aes(long, lat),colour="red")
## Warning: Removed 2506 rows containing missing values (geom_path).
Or color the lines by intensity of flow; plus, we have also added some labels for the station ID using geom_text():
ggmap(SanFran,darken = 0.8) +
geom_path(aes(x = lon, y = lat, group = ID,colour = All_Flows$Trips), data = All_Flows, size = All_Flows$Trips/1000) +
scale_colour_gradient(low="#900C3F", high="#FFC300") +
geom_point(data=stations, aes(long, lat),colour="red") +
geom_text(data = stations,aes(x = long, y = lat, label = station_id), check_overlap = TRUE, colour="#FFFFFF",hjust=-0.6)
## Warning: Removed 2506 rows containing missing values (geom_path).
We will now clean up this plot by removing the unwanted elements and changing the title of the legend:
ggmap(SanFran,darken = 0.8) +
geom_path(aes(x = lon, y = lat, group = ID,colour = All_Flows$Trips), data = All_Flows, size = All_Flows$Trips/1000) +
scale_colour_gradient(low="#900C3F", high="#FFC300",name="Trips") +
geom_point(data=stations, aes(long, lat),colour="red") +
geom_text(data = stations,aes(x = long, y = lat, label = station_id), check_overlap = TRUE, colour="#FFFFFF",hjust=-0.6) +
theme (
axis.text = element_blank (),
axis.title = element_blank (),
axis.ticks = element_blank ()
)
## Warning: Removed 2506 rows containing missing values (geom_path).
In the previous example we estimated the routes that cyclists may have taken as part of a bike share scheme - in reality the true routes taken would be divergent from these estimated paths which are essentially an optimized shortest path (based on how google calculate these for cyclists), and as such are a set of co-ordinates that follow the road topology.
For some urban analytics applications we may have tracking data gathered by GPS. For this example we will use a sample of the GeoLife data. This GPS trajectory dataset was collected by 182 users during a period of three years from April 2007 to August 2012, and was part of a Microsoft Research Asia project. We will import a sample of this data which are the records for a single user. There are multiple text files with the file extension “.pit”, each of which relate to a different journey. The first six rows of these data can be discarded.
In total there are 395 of these files which as you could imagine would take a very long time to import one by one. As such, we have written a small block of code that will import these for you and store each imported data frame in a list. This is an alternative to the loop that was presented in the last section.
library(lubridate)
# Create a list of the files to import
file_list <- list.files("./data/GeoLife", full=T)
count <- length(file_list)
# This function imports a file
file_con <- lapply(file_list, function(x){
tab <- read.table(x, head=F, quote = "\"", skip = 6, sep = ",") # import file
colnames(tab) <- c("lat","lon","zero","alt","days","date","time") # change the column headings
tab <- subset(tab,select=c("lat","lon","date","time")) # discard unwanted columns
tab$date <- ymd(tab$date) # Set the time date
tab$time <- hms(tab$time) # Set the time format
tab$ID <- strsplit(x,"/|\\.")[[1]][5] # Uses the file name as an ID
return(tab)
})
# Run the function and rbind the data frames together
file_con_df <- do.call(rbind, file_con)
The imported files are very similar in structure to the combined routes we explored in the last section. However, here, each GPS trail is separated by an ID.
head(file_con_df)
## lat lon date time ID
## 1 39.99997 116.3271 2008-10-23 17H 58M 52S 20081023175852
## 2 40.00001 116.3272 2008-10-23 17H 58M 58S 20081023175852
## 3 40.00001 116.3271 2008-10-23 17H 59M 3S 20081023175852
## 4 40.00002 116.3271 2008-10-23 17H 59M 8S 20081023175852
## 5 40.00000 116.3271 2008-10-23 17H 59M 13S 20081023175852
## 6 40.00002 116.3271 2008-10-23 17H 59M 18S 20081023175852
We will first create a simple map of the densest area of the activity:
#Get background map
beijing <- get_map(location = c(116.3244, 39.99202), zoom = 13,color = 'bw')
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=39.99202,116.3244&zoom=13&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
#Create plot
ggmap(beijing) +
geom_path(aes(x = lon, y = lat, group = ID), data = file_con_df)
## Warning: Removed 27092 rows containing missing values (geom_path).
With GPS gathered data the tracks can be messy and will not necessarily snap to network features such as paths or road. One way to make a GPS map more readily interpretable we can adjust the alpha of the plot.
ggmap(beijing) +
geom_path(aes(x = lon, y = lat, group = ID), data = file_con_df,alpha=0.1)
## Warning: Removed 27092 rows containing missing values (geom_path).
We can make this clearer by adjusting some of the colors:
ggmap(beijing,darken = 0.8) +
geom_path(aes(x = lon, y = lat, group = ID), data = file_con_df,alpha=0.1,colour="#FFFFFF")
## Warning: Removed 27092 rows containing missing values (geom_path).
We can also use the wday function to split the gps data into weekdays or weekend and see if there are different patterns visible:
ggmap(beijing,darken = 0.8) +
# plots the weekday
geom_path(aes(x = lon, y = lat, group = ID), data = file_con_df[!wday(file_con_df$date) %in% c(0,7),],alpha=0.1,colour="green") +
# plots weekend
geom_path(aes(x = lon, y = lat, group = ID), data = file_con_df[wday(file_con_df$date) %in% c(0,7),],alpha=0.1,colour="white")
## Warning: Removed 18524 rows containing missing values (geom_path).
## Warning: Removed 10489 rows containing missing values (geom_path).
We can also use a facet grid to visualize the patterns by a temporal period - in this case by hour:
# Create a new hour of the day variable
file_con_df$hour <- hour(file_con_df$time)
# Facet plot
ggmap(beijing,darken = 0.8, legend='none') +
geom_path(aes(x = lon, y = lat, group = ID), data = file_con_df,alpha=0.2,colour="white") +
facet_wrap(~hour,ncol=3) +
theme (
axis.text = element_blank (),
axis.title = element_blank (),
axis.ticks = element_blank ()
)
## Warning: Removed 35037 rows containing missing values (geom_path).
At more disaggregate temporal scales flow data are captured within many urban contexts through more traditional survey data or censuses. These commonly are created for questions around home-work locations or migration; and as such for a variety of variables it is often possible to map aggregate flows both internally and within urban systems.
In this final section of the practical we will map some longitudinal employer-household dynamics data from the US Census Bureau. There are a range of different data within this collection, but for these purposes we will utilize an extract of the origin destination data - LEHD Origin-Destination Employment Statistics (LODES). These are available for each state, however, the extract we consider here is for 2014 in Texas. This is at a block level.
First we will read in the origin-destination data and a cross walk file which gives a list of higher aggregations for the blocks. We have cut down both tables from the full file to save space.
# Import OD
load("./data/tx_od_main_JT00_2014.Rdata")
# Import cross walk
XWalk <- read.csv("./data/tx_xwalk.csv")
# We will also turn off scientific notation for this example as the block codes are stored as a very large number
options(scipen=999)
# View top six rows of OD data
head(OD)
## w_geocode h_geocode S000
## 1 480019501001001 480019501002117 1
## 2 480019501001001 480019501003007 1
## 3 480019501001001 480019501003080 1
## 4 480019501001001 480739508012094 1
## 5 480019501001001 482030201032067 1
## 6 480019501001001 482139501001101 1
There are a lot of variables within the file, the details of which can be found here; however, for this practical we will just be using “S000” which relates to the total number of jobs alongside the origin (homes - h_geocode) and destination (work - w_geocode) block codes.
We will view the crosswalk data frame and then we need to append this to the OD data. We merge this twice as the tract codes are needed for both the origin and destination blocks.
# View the top six rows of data
head(XWalk)
## tabblk2010 trct
## 1 480019501001000 48001950100
## 2 480019501001001 48001950100
## 3 480019501001002 48001950100
## 4 480019501001003 48001950100
## 5 480019501001004 48001950100
## 6 480019501001005 48001950100
# Merge onto the home block code
OD <- merge(OD,XWalk, by.x = "h_geocode", by.y = "tabblk2010", all.x= TRUE)
# Change column names
colnames(OD) <- c("h_geocode","w_geocode","S000","h_geocode_trct")
# Merge onto the work block code
OD <- merge(OD,XWalk, by.x = "w_geocode", by.y = "tabblk2010", all.x= TRUE)
# Change column names
colnames(OD) <- c("h_geocode","w_geocode","S000","h_geocode_trct","w_geocode_trct")
We can then aggregate the flows into the origin and destination tracts:
# Aggregate flows into Tracts
OD_Tract <- aggregate(data=OD, S000 ~ h_geocode_trct + w_geocode_trct, sum)
Before we can plot the flows we need the location of the tracts, which we can extract from a shapefile of the zone locations:
library(rgdal)
We will then read in the tract polygons for Texas downloaded from the US Census Beurueau site:
# Import spatial data
TX_SP <- readOGR("./data/Texas_Tract.geojson", "OGRGeoJSON")
## OGR data source with driver: GeoJSON
## Source: "./data/Texas_Tract.geojson", layer: "OGRGeoJSON"
## with 5254 features
## It has 9 fields
# Convert to WGS84
TX_SP <- spTransform(TX_SP, CRS("+init=epsg:4326"))
We will now have a look at the content of the data frame associated with TX_SP and then use the coordinates() function which extracts the centroid of a zone to build a new lookup:
# View the top 6 rows
head(TX_SP@data)
## STATEFP COUNTYFP TRACTCE AFFGEOID GEOID NAME LSAD
## 0 48 005 000102 1400000US48005000102 48005000102 1.02 CT
## 1 48 005 001002 1400000US48005001002 48005001002 10.02 CT
## 2 48 005 001300 1400000US48005001300 48005001300 13 CT
## 3 48 007 950100 1400000US48007950100 48007950100 9501 CT
## 4 48 007 950400 1400000US48007950400 48007950400 9504 CT
## 5 48 021 950200 1400000US48021950200 48021950200 9502 CT
## ALAND AWATER
## 0 156392425 2427401
## 1 225237385 1495765
## 2 403384219 36228741
## 3 474927337 467095102
## 4 6040756 0
## 5 37063780 31442
# Create lookup table
TX_tract_centroids <- data.frame(TX_SP@data$GEOID,coordinates(TX_SP))
# Change column names
colnames(TX_tract_centroids) <- c("Tract","lon","lat")
# View the top six rows of the new data
head(TX_tract_centroids)
## Tract lon lat
## 0 48005000102 -94.87112 31.32737
## 1 48005001002 -94.70740 31.16608
## 2 48005001300 -94.52019 31.19509
## 3 48007950100 -96.94047 28.23178
## 4 48007950400 -97.07097 28.01728
## 5 48021950200 -97.37416 30.35373
We will now merge the tract coordinates onto the tract flow data frame - again, we will do this twice and adjust the column names so both origin and destinations are coded.
# Add home lat lon
OD_Tract <- merge(OD_Tract,TX_tract_centroids,by.x="h_geocode_trct", by.y="Tract",all.x=TRUE)
# Fix column names
colnames(OD_Tract) <- c("h_geocode_trct","w_geocode_trct","S000","h_lon","h_lat")
# Add work lat lon
OD_Tract <- merge(OD_Tract,TX_tract_centroids,by.x="w_geocode_trct", by.y="Tract",all.x=TRUE)
# Fix column names
colnames(OD_Tract) <- c("w_geocode_trct","h_geocode_trct","S000","h_lon","h_lat","w_lon","w_lat")
We can now create a map - we will first create a map connecting tracts with flows of over 70
# Get base map
texas <- get_map(location = "Texas", zoom = 6,color = 'bw')
# Create plot
ggmap(texas) +
geom_segment(data=OD_Tract[OD_Tract$S000 > 70,],aes(y = h_lat, x = h_lon, yend = w_lat, xend = w_lon))
This is a bit messy, but we can improve this plot by adjusting some colors, line widths and transparency:
# Create plot
ggmap(texas,darken = 0.8) +
geom_segment(data=OD_Tract[OD_Tract$S000 > 70,],aes(y = h_lat, x = h_lon, yend = w_lat, xend = w_lon),colour= "white", alpha= 0.1, size=0.2)
We will now create a map at a city scale; in this case Houston. We have adjusted some of the parameters again, and also increased the number of origin-destinations:
# Get base map
Houston <- get_map(location = "Houston, TX", zoom = 10,color = 'bw')
# Create plot
ggmap(Houston,darken = 0.8) +
geom_segment(data=OD_Tract[OD_Tract$S000 > 5,],aes(y = h_lat, x = h_lon, yend = w_lat, xend = w_lon),colour= "white", alpha= 0.01, size=0.2)
## Warning: Removed 274660 rows containing missing values (geom_segment).
The problem with this plot is that all the origin destination pairs are being considered uniformly, and there is no consideration of the volume of flows. We can adjust this using the scale_alpha_continuous() option.
# Create plot
ggmap(Houston,darken = 0.8) +
geom_segment(data=OD_Tract[OD_Tract$S000 > 5,],aes(y = h_lat, x = h_lon, yend = w_lat, xend = w_lon, alpha= S000), size=0.3, colour = "white") +
scale_alpha_continuous(range = c(0.004, 0.3)) +
theme ( legend.position="none",
axis.text = element_blank (),
axis.title = element_blank (),
axis.ticks = element_blank ()
)
## Warning: Removed 274660 rows containing missing values (geom_segment).