The City of Vancouver releases a dataset of the 3-1-1 phone calls - the general hotline regarding maintenance issues in the city. Currently there is no tool to visualize and access the data. How can citizens engage the city for these matters if there’s no way to work with the data? In the name of civic “hacking”, the project brief is to develop a project that:
As a data and design team, you must deliver:
The timeline for the project is 9 hours total:
We will use this project to go through Ben Fry’s Data Visualization Pipeline.
This will be a collaborative project - you will need to communicate within and accross teams and be organized and have a well documented workflow. This will become clear when we generate the interactive map tile layers later on in the project.
Data viz is hard and in the end comes down to a lot of experimentation and exploration. This script attempts to showcase how the data viz pipeline is done in practice and how it is far from a linear process, but rather a very interactive and dynamic process.
Let’s being our script by a nice cheeky commented header:
######################################################
# Vancouver 3-1-1: Data Processing Script
# Date:
# By:
# Desc:
######################################################
We noticed how libraries can help us to read in geographic data and even help us make new scales. Since this is a bigger project, we’re going to need to help of some more libraries:
# ------------------------------------------------- #
# --------------- Install Libararies --------------- #
# ------------------------------------------------- #
install.packages("GISTools")
install.packages('rgdal')
install.packages('curl')
install.packages("devtools")
devtools::install_github("corynissen/geocodeHERE")
# Unused Libraries:
# install.packages("ggmap")
# library(ggmap)
You will notice we have some unused libraries - these are some that I started out using in the beginning, but decided to not use. I kept them here just for future reference. NOTE: ggmap was previously used for it’s geocode() function - but with google’s 2500 api call limit, it wasn’t enough for the 10,000+ geocoding events we would need for our project.
Now that you’ve installed the libraries now Let’s load up our libraries to make all those new functions available to us:
# ------------------------------------------------- #
# ---------------- Load Libararies ----------------- #
# ------------------------------------------------- #
library(GISTools)
## Loading required package: maptools
## Loading required package: sp
## Checking rgeos availability: TRUE
## Loading required package: RColorBrewer
## Loading required package: MASS
## Loading required package: rgeos
## rgeos version: 0.3-13, (SVN revision 508)
## GEOS runtime version: 3.4.2-CAPI-1.8.2 r3921
## Linking to sp version: 1.2-0
## Polygon checking: TRUE
library(rgdal)
## rgdal: version: 1.0-7, (SVN revision 559)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 1.11.2, released 2015/02/10
## Path to GDAL shared files: /usr/local/Cellar/gdal/1.11.2_3/share/gdal
## Loaded PROJ.4 runtime: Rel. 4.9.2, 08 September 2015, [PJ_VERSION: 492]
## Path to PROJ.4 shared files: (autodetected)
## Linking to sp version: 1.2-0
library(geocodeHERE)
library(curl)
We use the curl() function to make http/https requests from the web to get data and used our read.csv() function to read our table in to R.
# ------------------------------------------------- #
# ------------------- Acquire --------------------- #
# ------------------------------------------------- #
# access from the interwebz using "curl"
fname = curl('https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/201401CaseLocationsDetails.csv')
# Read data as csv
data = read.csv(fname, header=T)
# inspect your data
print(head(data))
## Year Month Day Hour Minute Department
## 1 2014 1 1 7 21 CSG - Licenses
## 2 2014 1 1 7 45 ENG - Solid Waste
## 3 2014 1 1 9 27 ENG - Solid Waste
## 4 2014 1 1 10 6 ENG - Solid Waste
## 5 2014 1 1 10 15 PRB - Administration
## 6 2014 1 1 10 17 ENG - Streets
## Division
## 1 Animal Control
## 2 Sanitation
## 3 Sanitation
## 4 Sanitation
## 5 General Park Board Information
## 6 Traffic and Electrical Operations
## Case_Type Hundred_Block
## 1 Dead Animal Pickup Case Intersection
## 2 Missed Garbage Pickup 22##
## 3 Abandoned Garbage Pickup - City Property & Parks 10##
## 4 Missed Garbage Pickup 27##
## 5 PRB_Park Ranger SR 11##
## 6 Sign - Repair 8600
## Street_Name Local_Area
## 1 E 64TH AV and PRINCE ALBERT ST Sunset
## 2 SW MARINE DRIVE Kerrisdale
## 3 W 12TH AV Fairview
## 4 W 16TH AV Arbutus Ridge
## 5 W CORDOVA ST Downtown
## 6 - 8699 GRANVILLE ST Marpole
Upon inspecting our data, we notice we have the addresses, but the city has put int “#’s” to help with the anonymity of the callers. Furthermore, we notice that we don’t have any lat/lon coordinates to work with to turn our 3-1-1 calls into a something spatial. How can we develop a solution for this?
First let’s sub out the “#” with “0”:
# ------------------------------------------------- #
# -------------- Parse: Geocoder ------------------- #
# ------------------------------------------------- #
# change intersection to 00's
data$h_block = gsub("#", "0", data$Hundred_Block)
print(head(data$h_block))
## [1] "Intersection" "2200" "1000" "2700"
## [5] "1100" "8600"
Next let’s concatenate the newly created “h_block” column with the Street_Name column, and a string that specifices that all of the calls are from Vancouver, BC, Cananda:
# Join the strings from each column together & add "Vancouver, BC":
data$full_address = paste(data$h_block,
paste(data$Street_Name,
"Vancouver, BC, Canada",
sep=", "),
sep=" ")
We also notice that the city has put in the word “intersection” for those calls that refer to an intersection. Let’s take those out so as to make our geocoding parsing potentially easier:
data$full_address = gsub("Intersection", "", data$full_address)
print(head(data$full_address))
## [1] " E 64TH AV and PRINCE ALBERT ST, Vancouver, BC, Canada"
## [2] "2200 SW MARINE DRIVE, Vancouver, BC, Canada"
## [3] "1000 W 12TH AV, Vancouver, BC, Canada"
## [4] "2700 W 16TH AV, Vancouver, BC, Canada"
## [5] "1100 W CORDOVA ST, Vancouver, BC, Canada"
## [6] "8600 - 8699 GRANVILLE ST, Vancouver, BC, Canada"
Now we will use Nokia’s Here API for their geocoder - their limit on geocoding requests is ~10,000 calls per 24 hours. This works well since our data is ~10,000 features. This will take all of our addresses and return lat/lon coordinates for us to use. The total process takes about 20 min which is pretty fast - possibly faster with a good internet connection.
# Geocode the events - we use Nokia's Here API
# Create an empty vector for lat and lon coordinates
lat = c(); lon = c()
# loop through the addresses
for(i in 1: length(data$full_address)){
# store the address at index "i" as a character
address = as.character(data$full_address[i])
# append the latitude of the geocoded address to the lat vector
lat = c(lat,geocodeHERE_simple(address)$Latitude)
# append the longitude of the geocoded address to the lon vector
lon = c(lon,geocodeHERE_simple(address)$Longitude)
# at each iteration through the loop, print the coordinates - takes about 20 min.
# print(paste("#",i,", ", lat[i], lon[i], sep=","))
}
# add the lat lon coordinates to the dataframe
data$lat = lat
data$lon = lon
After a major computational task (e.g. geocoding) you probably want to write your file out so that you keep a copy of your data at this step. In side the quotes for the variable ofile, put in the path to where you want your data stored.
# after geocoding, it's a good idea to write your file out!
# joey's computer (mac): '/Users/Jozo/Projects/Github-local/Workshop/aloha-r/data/calls_2014/201401CaseLocationsDetails-geo.csv'
# sally's computer (windows): 'c:\\Sally\\Documents\\van311-project\\201401CaseLocationsDetails-geo.csv'
ofile ='/Users/Jozo/Desktop/temp-geo.csv'
write.csv(data, ofile)
We won’t be doing any heavy analysis on the data, but we will try to tease out some categories to make the data more intuitive for exploring. The first thing we’ll do is to see what are the types of unique cases that are being called in:
# ------------------------------------------------- #
# --------------------- Mine ---------------------- #
# ------------------------------------------------- #
# --- Examine the unique cases --- #
# examine how the cases are grouped - are these intuitive?
unique(data$Department)
## [1] CSG - Licenses
## [2] ENG - Solid Waste
## [3] PRB - Administration
## [4] ENG - Streets
## [5] ENG - Transportation
## [6] ENG - Water & Sewer
## [7] PRB - Planning and Operations
## [8] Business Planning & Services
## [9] Public Safety - Fire
## [10] CSG - Inspections
## [11] CSG - Licenses & Inspections
## [12] Interdepartmental Initiatives
## [13] ENG - Office of the Deputy City Engineer
## [14] City Administration
## 14 Levels: Business Planning & Services ... Public Safety - Fire
unique(data$Division)
## [1] Animal Control
## [2] Sanitation
## [3] General Park Board Information
## [4] Traffic and Electrical Operations
## [5] Neighbourhood Parking and Transportation
## [6] Traffic and Data Management
## [7] Street Activities - Integrated Graffiti Management
## [8] Street Activities - Streets Furniture
## [9] Water Operations
## [10] Streets Design - P & D - Accessibility
## [11] Street Trees
## [12] 311 Contact Centre
## [13] Sewers Operations
## [14] Prevention
## [15] Electrical
## [16] Property Use
## [17] Administration Branch
## [18] License Office
## [19] Streets Operations
## [20] Solid Waste Management
## [21] Parking Ops & Enforcement
## [22] Vegetation
## [23] Building
## [24] Snow Angel Program
## [25] Plumbing and Gas
## [26] Kent Construction Supplies and Services
## [27] Elections
## [28] Communications
## [29] Street Activities - Streets Horticulture
## [30] Sewers and Drainage Design
## [31] Sewer Separation
## [32] Water Design
## [33] Transfer and Landfill Operations
## [34] Active Transportation
## 34 Levels: 311 Contact Centre ... Water Operations
# examine the types of cases - can we make new groups that are more useful?
# unique(data$Case_Type)
# Print each unique case on a new line for easier inspection
for (i in 1:length(unique(data$Case_Type))){
print(unique(data$Case_Type)[i], max.levels=0)
}
## [1] Dead Animal Pickup Case
## [1] Missed Garbage Pickup
## [1] Abandoned Garbage Pickup - City Property & Parks
## [1] PRB_Park Ranger SR
## [1] Sign - Repair
## [1] Animal Control General Inquiry Case
## [1] Lost Pets Case
## [1] Residential Parking Requests
## [1] Missed Yard Trimmings and Food Scraps Pickup
## [1] Traffic & Pedestrian Signal - New
## [1] Graffiti Removal - City Property
## [1] Poster/Sign Removal Request
## [1] Street Furniture Repair and Maintenance Request
## [1] Water Leaks/Breaks
## [1] Street Cleaning & Debris Pickup
## [1] Wheelchair Curb/Ramp Request
## [1] Recycling Bag Request
## [1] Animal Complaint - Non-Emergency Case
## [1] Street Light - Out
## [1] Missed Recycling Pickup
## [1] Missed Apartment Recycling Pickup
## [1] Street Tree Work Request SR
## [1] Citizen Feedback
## [1] Cart - Green (Yard Trimmings and Food Scraps)
## [1] Water Service Turn On/Off Request
## [1] Sewer Pipe Inquiries
## [1] Fire Reinspection Request for Firehall
## [1] Street Litter Can Request
## [1] Electrical Inspection Cancellation Case
## [1] Cart - Garbage
## [1] Fire Reinspection Request for Inspector
## [1] Collection Calendar Mail-Out Request
## [1] PUI Noise Complaint Case
## [1] Building Plans Information Request
## [1] Licence Payment Request Case
## [1] Streets - General Issues
## [1] Street Light - Pole Repair
## [1] Water Service Locate Request
## [1] Recycling Box Request
## [1] Gone Out of Business Case
## [1] Street - Surface Water Flooding
## [1] Blue Box and Leaf Removal Guide Mail-Out Request
## [1] Illegal Dumping/Abandoned Garbage Pickup
## [1] Sidewalk - Repair
## [1] Abandoned Vehicle Request
## [1] Cart - Apartment Recycling
## [1] Water Work Site Complaint
## [1] Parks Litter Can or Cart Request
## [1] Water Hydrant Issue
## [1] Secondary Suite Information Request
## [1] Vegetation Maintenance SR
## [1] Street - Repair
## [1] Building Inspection Cancellation Case
## [1] PUI Noise General Inquiry Case
## [1] Snow Angel Program - Individual Volunteer
## [1] Street Light - New/Relocation
## [1] FPB_General Inquiry Case
## [1] PUI General Inquiry Case
## [1] Pothole - Repair
## [1] Trees and Vegetation Encroachment - City Property
## [1] Holding Stray Case
## [1] Green Bin Program - Feedback and General Inquiry
## [1] Traffic Calming Request
## [1] Traffic & Pedestrian Signal - Modify
## [1] Catch Basin Issues
## [1] Snow & Ice Removal - City Property
## [1] Plumbing and Gas Inspection Cancellation Case
## [1] Street Light - Flat Glass Fixture Request
## [1] General Information Request SR
## [1] Traffic Sign - Modify
## [1] Sewer Manhole Issues
## [1] Snow and Ice Removal - Sidewalk Bylaw Violation
## [1] Water General Inquiry
## [1] Sewer General Inquiries
## [1] Curbside Sign - New
## [1] Occupancy Permit Information Request
## [1] Graffiti Removal - External Organization
## [1] Water General Work Request
## [1] Pavement Marking - Repair
## [1] Election General Concerns
## [1] Home Safety Check Request Case
## [1] Apartment Recycling - Registration Request
## [1] Street and Traffic Light - Utility Damage
## [1] Sewer Odour Complaints
## [1] Horticulture Inquiry on Right-of-Way
## [1] Traffic Sign - New
## [1] Sewer Design General Inquiries
## [1] Homelessness/Transient Issue
## [1] Banner Request
## [1] Sewer Separation Inspection Cancellation Case
## [1] Boulevard Maintenance Issues
## [1] Parking Meter Requests
## [1] Bridges & Structures - Repair
## [1] Street Sign - New
## [1] Animal Cremation Case
## [1] Dead Skunk Pickup
## [1] Water Pressure or No Water Issue
## [1] Water Conservation Violation
## [1] Truck Violation
## [1] Water Meter Issue
## [1] Flag Request
## [1] Pavement Markings Request - New/Modify
## [1] Snow and Ice Removal - Sidewalk Bylaw Inquiry
## [1] Transfer Station & Recycling - General Inquiries
## [1] Curbside Sign - Modify
## [1] Crosswalk Marking - New
## [1] Chafer Beetle Feedback
## [1] Water Damage To City Water System
## [1] Traffic Count Request
## [1] Sewer Utility Damage
## [1] Bicycle Route Map Request
Whoa! That’s a heck of a lot of cases. Our role of data visualization people is to take this mess and try to make sense of it so that we can represent it in a manner that is more intuitive and possibly delightful. There are a bunch of ways that we can organize these cases into specific classes, so I invite you to think about how we might best organize these. For now, I’ve come up with 6 classes that bin these cases together. Does this make sense? Should we change it?:
// --- graffiti and noise --- //
Graffiti Removal - City Property
Graffiti Removal - External Organization
PUI Noise Complaint Case
PUI Noise General Inquiry Case
// --- street surface & Maintenance --- //
Street Furniture Repair and Maintenance Request
Street Cleaning & Debris Pickup
Street Light - Out
Street Tree Work Request SR
Street Litter Can Request
Streets - General Issues
Street Light - Pole Repair
Street - Surface Water Flooding
Street - Repair
Street Light - New/Relocation
Street Light - Flat Glass Fixture Request
Street and Traffic Light - Utility Damage
Street Sign - New
Crosswalk Marking - New
Boulevard Maintenance Issues
Bicycle Route Map Request
Sidewalk - Repair
Pothole - Repair
Pavement Markings Request - New/Modify
Pavement Marking - Repair
Sewer Pipe Inquiries
Sewer Manhole Issues
Sewer General Inquiries
Sewer Design General Inquiries
Sewer Separation Inspection Cancellation Case
Sewer Utility Damage
Sewer Odour Complaints
Plumbing and Gas Inspection Cancellation Case
Snow Angel Program - Individual Volunteer
Snow & Ice Removal - City Property
Snow and Ice Removal - Sidewalk Bylaw Violation
Snow and Ice Removal - Sidewalk Bylaw Inquiry
Traffic & Pedestrian Signal - New
Traffic Calming Request
Traffic & Pedestrian Signal - Modify
Traffic Sign - Modify
Street and Traffic Light - Utility Damage
Traffic Sign - New
Traffic Count Request
Truck Violation
Residential Parking Requests
Parking Meter Requests
Abandoned Vehicle Request
// --- garbage, Recycling & organics --- //
Missed Garbage Pickup
Abandoned Garbage Pickup - City Property & Parks
Cart - Garbage
Illegal Dumping/Abandoned Garbage Pickup
Parks Litter Can or Cart Request
Recycling Bag Request
Missed Recycling Pickup
Missed Apartment Recycling Pickup
Recycling Box Request
Cart - Apartment Recycling
Apartment Recycling - Registration Request
Transfer Station & Recycling - General Inquiries
Blue Box and Leaf Removal Guide Mail-Out Request
Missed Yard Trimmings and Food Scraps Pickup
Cart - Green (Yard Trimmings and Food Scraps)
Green Bin Program - Feedback and General Inquiry
Collection Calendar Mail-Out Request
// --- Water Related Issues --- //
Water Leaks/Breaks
Water Service Turn On/Off Request
Water Service Locate Request
Street - Surface Water Flooding
Water Work Site Complaint
Water Hydrant Issue
Water General Inquiry
Water General Work Request
Water Pressure or No Water Issue
Water Conservation Violation
Water Meter Issue
Water Damage To City Water System
Catch Basin Issues
// --- animal and vegetation related --- //
Dead Animal Pickup Case
Animal Control General Inquiry Case
Animal Complaint - Non-Emergency Case
Animal Cremation Case
Dead Skunk Pickup
Lost Pets Case
Holding Stray Case
Chafer Beetle Feedback
Vegetation Maintenance SR
Trees and Vegetation Encroachment - City Property
Horticulture Inquiry on Right-of-Way
// --- Other --- //
Poster/Sign Removal Request
Sign - Repair
Curbside Sign - New
Curbside Sign - Modify
Banner Request
Fire Reinspection Request for Firehall
Fire Reinspection Request for Inspector
Citizen Feedback
Wheelchair Curb/Ramp Request
Wheelchair
PRB_Park Ranger SR
Building Plans Information Request
Building Inspection Cancellation Case
Licence Payment Request Case
Gone Out of Business Case
FPB_General Inquiry Case
PUI General Inquiry Case
Electrical Inspection Cancellation Case
Bridges & Structures - Repair
Secondary Suite Information Request
General Information Request SR
Election General Concerns
Occupancy Permit Information Request
Home Safety Check Request Case
Flag Request
Homelessness/Transient Issue
In the end our 6 classes are:
Now that we have our classes, we can put them in arrays:
# Determine classes to group case types:
# graffiti and noise
graffiti_noise = c('Graffiti Removal - City Property','Graffiti Removal - External Organization','PUI Noise Complaint Case','PUI Noise General Inquiry Case')
# street surface & Maintenance
street_traffic_maint = c('Street Furniture Repair and Maintenance Request','Street Cleaning & Debris Pickup','Street Light - Out','Street Tree Work Request SR','Street Litter Can Request','Streets - General Issues','Street Light - Pole Repair','Street - Surface Water Flooding','Street - Repair','Street Light - New/Relocation','Street Light - Flat Glass Fixture Request','Street and Traffic Light - Utility Damage','Street Sign - New','Crosswalk Marking - New','Boulevard Maintenance Issues','Bicycle Route Map Request','Sidewalk - Repair','Pothole - Repair','Pavement Markings Request - New/Modify','Pavement Marking - Repair','Sewer Pipe Inquiries','Sewer Manhole Issues','Sewer General Inquiries','Sewer Design General Inquiries','Sewer Separation Inspection Cancellation Case','Sewer Utility Damage','Sewer Odour Complaints','Plumbing and Gas Inspection Cancellation Case','Snow Angel Program - Individual Volunteer','Snow & Ice Removal - City Property','Snow and Ice Removal - Sidewalk Bylaw Violation','Snow and Ice Removal - Sidewalk Bylaw Inquiry','Traffic & Pedestrian Signal - New','Traffic Calming Request','Traffic & Pedestrian Signal - Modify','Traffic Sign - Modify','Street and Traffic Light - Utility Damage','Traffic Sign - New','Traffic Count Request','Truck Violation','Residential Parking Requests','Parking Meter Requests','Abandoned Vehicle Request')
# garbage, Recycling & organics related
garbage_recycling_organics = c('Missed Garbage Pickup','Abandoned Garbage Pickup - City Property & Parks','Cart - Garbage','Illegal Dumping/Abandoned Garbage Pickup','Parks Litter Can or Cart Request','Recycling Bag Request','Missed Recycling Pickup','Missed Apartment Recycling Pickup','Recycling Box Request','Cart - Apartment Recycling','Apartment Recycling - Registration Request','Transfer Station & Recycling - General Inquiries','Blue Box and Leaf Removal Guide Mail-Out Request','Missed Yard Trimmings and Food Scraps Pickup','Cart - Green (Yard Trimmings and Food Scraps)','Green Bin Program - Feedback and General Inquiry','Collection Calendar Mail-Out Request')
# water related
water = c('Water Leaks/Breaks','Water Service Turn On/Off Request','Water Service Locate Request','Street - Surface Water Flooding','Water Work Site Complaint','Water Hydrant Issue','Water General Inquiry','Water General Work Request','Water Pressure or No Water Issue','Water Conservation Violation','Water Meter Issue','Water Damage To City Water System','Catch Basin Issues')
#animal and vegetation related
animal_vegetation = c('Dead Animal Pickup Case','Animal Control General Inquiry Case','Animal Complaint - Non-Emergency Case','Animal Cremation Case','Dead Skunk Pickup','Lost Pets Case','Holding Stray Case','Chafer Beetle Feedback','Vegetation Maintenance SR','Trees and Vegetation Encroachment - City Property','Horticulture Inquiry on Right-of-Way')
# other
other = c('Poster/Sign Removal Request','Sign - Repair','Curbside Sign - New','Curbside Sign - Modify','Banner Request','Fire Reinspection Request for Firehall','Fire Reinspection Request for Inspector','Citizen Feedback','Wheelchair Curb/Ramp Request','Wheelchair','PRB_Park Ranger SR','Building Plans Information Request','Building Inspection Cancellation Case','Licence Payment Request Case','Gone Out of Business Case','FPB_General Inquiry Case','PUI General Inquiry Case','Electrical Inspection Cancellation Case','Bridges & Structures - Repair','Secondary Suite Information Request','General Information Request SR','Election General Concerns','Occupancy Permit Information Request','Home Safety Check Request Case','Flag Request','Homelessness/Transient Issue')
We can then run through the lists and give each of the classes and cid so that it is easier to identify and represent them later:
# give class id numbers:
data$cid = 9999
for(i in 1:length(data$Case_Type)){
if(data$Case_Type[i] %in% graffiti_noise){
data$cid[i] = 1
}else if(data$Case_Type[i] %in% street_traffic_maint){
data$cid[i] = 2
}else if(data$Case_Type[i] %in% garbage_recycling_organics){
data$cid[i] = 3
}else if(data$Case_Type[i] %in% water){
data$cid[i] = 4
}else if(data$Case_Type[i] %in% animal_vegetation){
data$cid[i] = 5
}else{
data$cid[i] = 0
}
}
Great, so we now have our data binned into specific groups in a way that seems to make sense. However, if do a little poking around at our data, we notice that since our addresses are aggregated a lot of times to the same address point, our lat/lon coordinates come out the same. How can we deal with this spatial overlap?
One way to do this is to add in some “jitter” to each point if it happens to have the same coordinates.
# --- handle overlapping points --- #
# Set offset for points in same location:
data$lat_offset = data$lat
data$lon_offset = data$lon
# Run loop - if value overlaps, offset it by a random number
for(i in 1:length(data$lat)){
if ( (data$lat_offset[i] %in% data$lat_offset) && (data$lon_offset[i] %in% data$lon_offset)){
data$lat_offset[i] = data$lat_offset[i] + runif(1, 0.0001, 0.0005)
data$lon_offset[i] = data$lon_offset[i] + runif(1, 0.0001, 0.0005)
}
}
To derive some more insight into the data, how about looking at the top calls:
# --- what are the top calls? --- #
# get a frequency distribution of the calls:
top_calls = data.frame(table(data$Case_Type))
top_calls = top_calls[order(top_calls$Freq),]
print(top_calls)
## Var1 Freq
## 7 Banner Request 1
## 18 Chafer Beetle Feedback 1
## 30 Flag Request 1
## 38 Home Safety Check Request Case 1
## 67 Sewer Odour Complaints 1
## 70 Sewer Utility Damage 1
## 74 Snow and Ice Removal - Sidewalk Bylaw Inquiry 1
## 76 Snow Angel Program - Individual Volunteer 1
## 98 Truck Violation 1
## 100 Water Conservation Violation 1
## 8 Bicycle Route Map Request 2
## 11 Bridges & Structures - Repair 2
## 21 Crosswalk Marking - New 2
## 69 Sewer Separation Inspection Cancellation Case 2
## 95 Traffic Sign - New 2
## 101 Water Damage To City Water System 2
## 26 Election General Concerns 3
## 48 Occupancy Permit Information Request 3
## 52 Pavement Markings Request - New/Modify 3
## 63 Secondary Suite Information Request 3
## 64 Sewer Design General Inquiries 3
## 96 Transfer Station & Recycling - General Inquiries 3
## 49 Parking Meter Requests 4
## 75 Snow and Ice Removal - Sidewalk Bylaw Violation 4
## 82 Street Light - Flat Glass Fixture Request 4
## 93 Traffic Count Request 4
## 106 Water Meter Issue 4
## 5 Animal Cremation Case 5
## 17 Catch Basin Issues 5
## 22 Curbside Sign - Modify 5
## 29 Fire Reinspection Request for Inspector 5
## 51 Pavement Marking - Repair 5
## 87 Street Sign - New 5
## 25 Dead Skunk Pickup 6
## 111 Wheelchair Curb/Ramp Request 6
## 59 PUI Noise General Inquiry Case 7
## 94 Traffic Sign - Modify 7
## 40 Horticulture Inquiry on Right-of-Way 8
## 107 Water Pressure or No Water Issue 8
## 6 Apartment Recycling - Registration Request 9
## 79 Street and Traffic Light - Utility Damage 9
## 102 Water General Inquiry 10
## 66 Sewer Manhole Issues 12
## 110 Water Work Site Complaint 12
## 10 Boulevard Maintenance Issues 13
## 23 Curbside Sign - New 13
## 32 General Information Request SR 13
## 83 Street Light - New/Relocation 13
## 9 Blue Box and Leaf Removal Guide Mail-Out Request 14
## 36 Green Bin Program - Feedback and General Inquiry 14
## 99 Vegetation Maintenance SR 14
## 39 Homelessness/Transient Issue 15
## 108 Water Service Locate Request 17
## 53 Plumbing and Gas Inspection Cancellation Case 18
## 27 Electrical Inspection Cancellation Case 19
## 103 Water General Work Request 19
## 33 Gone Out of Business Case 24
## 92 Traffic Calming Request 25
## 97 Trees and Vegetation Encroachment - City Property 28
## 65 Sewer General Inquiries 29
## 31 FPB_General Inquiry Case 30
## 54 Poster/Sign Removal Request 30
## 90 Traffic & Pedestrian Signal - Modify 30
## 37 Holding Stray Case 31
## 12 Building Inspection Cancellation Case 32
## 73 Snow & Ice Removal - City Property 32
## 57 PUI General Inquiry Case 33
## 91 Traffic & Pedestrian Signal - New 33
## 104 Water Hydrant Issue 39
## 81 Street Furniture Repair and Maintenance Request 40
## 86 Street Litter Can Request 44
## 85 Street Light - Pole Repair 45
## 68 Sewer Pipe Inquiries 50
## 35 Graffiti Removal - External Organization 55
## 109 Water Service Turn On/Off Request 55
## 43 Lost Pets Case 56
## 56 PRB_Park Ranger SR 64
## 44 Missed Apartment Recycling Pickup 65
## 13 Building Plans Information Request 66
## 20 Collection Calendar Mail-Out Request 67
## 50 Parks Litter Can or Cart Request 70
## 71 Sidewalk - Repair 71
## 72 Sign - Repair 77
## 62 Residential Parking Requests 80
## 24 Dead Animal Pickup Case 82
## 4 Animal Control General Inquiry Case 87
## 77 Street - Repair 87
## 58 PUI Noise Complaint Case 102
## 89 Streets - General Issues 105
## 28 Fire Reinspection Request for Firehall 119
## 34 Graffiti Removal - City Property 140
## 55 Pothole - Repair 142
## 105 Water Leaks/Breaks 149
## 78 Street - Surface Water Flooding 164
## 61 Recycling Box Request 165
## 3 Animal Complaint - Non-Emergency Case 168
## 2 Abandoned Vehicle Request 175
## 14 Cart - Apartment Recycling 181
## 41 Illegal Dumping/Abandoned Garbage Pickup 190
## 42 Licence Payment Request Case 201
## 80 Street Cleaning & Debris Pickup 266
## 88 Street Tree Work Request SR 267
## 60 Recycling Bag Request 325
## 46 Missed Recycling Pickup 366
## 16 Cart - Green (Yard Trimmings and Food Scraps) 371
## 19 Citizen Feedback 513
## 15 Cart - Garbage 689
## 1 Abandoned Garbage Pickup - City Property & Parks 721
## 84 Street Light - Out 761
## 45 Missed Garbage Pickup 763
## 47 Missed Yard Trimmings and Food Scraps Pickup 1229
So we’ve added in some new coordinates and fiddled with them a bit to help with the representation. At this point, it is a good idea to filter out any extraneous points or points that fall outside of our area of interest - in this case Vancouver. Geocoders aren’t perfect and so we should be wary to include any false positives.
First, let’s subset out data that is either an NA or falls outside of Vancouver’s bounds:
# ------------------------------------------------- #
# -------------------- Filter --------------------- #
# ------------------------------------------------- #
# Subset only the data if the coordinates are within our bounds or if it is not a NA
data_filter = subset(data, (lat <= 49.313162) & (lat >= 49.199554) &
(lon <= -123.019028) & (lon >= -123.271371) & is.na(lon) == FALSE )
# plot the data
plot(data_filter$lon, data_filter$lat)
# If everthing looks good, we might also write out our file:
# joey's computer (mac): '/Users/Jozo/Projects/Github-local/Workshop/aloha-r/data/calls_2014/201401CaseLocationsDetails-geo-filtered.csv'
# sally's computer (windows): 'c:\\Sally\\Documents\\van311-project\\201401CaseLocationsDetails-geo-filtered.csv'
ofile_filtered = '/Users/Jozo/Desktop/temp-geo-filtered.csv'
write.csv(data_filter, ofile_filtered)
NOTE: we write another file out here so we can save our “final dataset”
With a filtered dataset, we can now write our file out to a shapefile and a geojson (if you want to open it later in another GIS).
I highly recommend that you look into learning more about all the different geographic data types like shapefiles and geojson. The last thing you want to do is show up to a job interview and not understand how they work, what the differences are, what are their advantages and disadvantages, etc. For more check out this link, or this link, or this link
Now let’s write out our data
# --- Convert Data to Shapefile --- #
# store coordinates in dataframe
coords_311 = data.frame(data_filter$lon_offset, data_filter$lat_offset)
# create spatialPointsDataFrame
data_shp = SpatialPointsDataFrame(coords = coords_311, data = data_filter)
# set the projection to wgs84
projection_wgs84 = CRS("+proj=longlat +datum=WGS84")
proj4string(data_shp) = projection_wgs84
# set an output folder for our geojson files:
# joey's computer (mac): '/Users/Jozo/Projects/Github-local/Workshop/aloha-r/data/geo/'
# sally's computer (windows): 'c:\\Sally\\Documents\\van311-project\\data\\geo\\'
geofolder = '/Users/Jozo/Desktop/'
# Join the folderpath to each file name:
opoints_shp = paste(geofolder, "calls_","1401",".shp", sep="")
print(opoints_shp) # print this out to see where your file will go & what it will be called
## [1] "/Users/Jozo/Desktop/calls_1401.shp"
opoints_geojson = paste(geofolder, "calls_","1401",".geojson", sep="")
print(opoints_geojson) # print this out to see where your file will go & what it will be called
## [1] "/Users/Jozo/Desktop/calls_1401.geojson"
# write the file to a shp
writeOGR(check_exists = FALSE, data_shp, opoints_shp, layer="data_shp", driver="ESRI Shapefile")
## Warning in writeOGR(check_exists = FALSE, data_shp, opoints_shp, layer =
## "data_shp", : Field names abbreviated for ESRI Shapefile driver
# write file to geojson
writeOGR(check_exists = FALSE, data_shp, opoints_geojson, layer="data_shp", driver="GeoJSON")
We have all the raw data, but remember that saying “overview first, details on demand”? Our brains simply can’t understand the sheer number of points on the map. How about using some way of aggregating the points to a grid? Turns our hexagonal grids are quite good for conveying density of data points. I’ve created a hexagonal grid in QGIS at a resolution of about 250m x 280m:
First let’s read in the grid and reproject it from utm zone 10 north to wgs84:
# --- aggregate to a grid --- #
# ref: http://www.inside-r.org/packages/cran/GISTools/docs/poly.counts
# set the file name - combine the shpfolder with the name of the grid
# grid_fn = paste(shpfolder,'hgrid_100m.shp', sep="")
grid_fn = 'https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/geo/hgrid_250m.geojson'
# define the utm 10n projection
# projection_utm10n = CRS('+proj=utm +zone=10 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs')
# read in the hex grid setting the projection to utm10n
hexgrid = readOGR(grid_fn, 'OGRGeoJSON')
## OGR data source with driver: GeoJSON
## Source: "https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/geo/hgrid_250m.geojson", layer: "OGRGeoJSON"
## with 5104 features
## It has 4 fields
# transform the projection to wgs84 to match the point file and store it to a new variable (see variable: projection_wgs84)
hexgrid_wgs84 = spTransform(hexgrid, projection_wgs84)
Next, let’s use the poly.counts() function to count the number of points in each grid cell:
# Use the poly.counts() function to count the number of occurences of calls per grid cell
grid_cnt = poly.counts(data_shp, hexgrid_wgs84)
# create a data frame of the counts
grid_total_counts = data.frame(grid_cnt)
# set grid_total_counts dataframe to the hexgrid data
hexgrid_wgs84$data = grid_total_counts$grid_cnt
# remove all the grids without any calls
hexgrid_wgs84 = subset(hexgrid_wgs84, grid_cnt >0)
Finally, write out the data to a shapefile and geojson - we will use the geojson later, but just for good practice:
# define the output names:
ohex_shp = paste(geofolder, "hexgrid_250m_","1401","_cnts",".shp")
print(ohex_shp)
## [1] "/Users/Jozo/Desktop/ hexgrid_250m_ 1401 _cnts .shp"
ohex_geojson = paste(geofolder, "hexgrid_250m_","1401","_cnts",".geojson")
print(ohex_geojson)
## [1] "/Users/Jozo/Desktop/ hexgrid_250m_ 1401 _cnts .geojson"
# write the file to a shp
writeOGR(hexgrid_wgs84, ohex_shp, layer="hexgrid_wgs84", driver="ESRI Shapefile", check_exists = FALSE)
# write file to geojson
writeOGR(hexgrid_wgs84, ohex_geojson, layer="hexgrid_wgs84", driver="GeoJSON", check_exists = FALSE)
Technically, we have enough data now to start putting together our web app which visualizes the 3-1-1 data. The next step will be to work with the representation and using the data we produced to filter the visuals with user interactions.
If all is well, this should run in it’s entirety - remember you need to change the folder paths!!!:
######################################################
# Vancouver 3-1-1: Data Processing Script
# Date:
# By:
# Desc:
######################################################
# ------------------------------------------------- #
# --------------- Install Libararies --------------- #
# ------------------------------------------------- #
install.packages("GISTools")
install.packages('rgdal')
install.packages('curl')
install.packages("devtools")
devtools::install_github("corynissen/geocodeHERE")
# ------------------------------------------------- #
# ---------------- Load Libararies ----------------- #
# ------------------------------------------------- #
library(GISTools)
library(rgdal)
library(geocodeHERE)
library(curl)
# ------------------------------------------------- #
# ------------------- Acquire --------------------- #
# ------------------------------------------------- #
# access from the interwebz using "curl"
fname = curl('https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/201401CaseLocationsDetails.csv')
# Read data as csv
data = read.csv(fname, header=T)
# inspect your data
print(head(data))
# ------------------------------------------------- #
# -------------- Parse: Geocoder ------------------- #
# ------------------------------------------------- #
# change intersection to 00's
data$h_block = gsub("#", "0", data$Hundred_Block)
print(head(data$h_block))
# Join the strings from each column together & add "Vancouver, BC":
data$full_address = paste(data$h_block,
paste(data$Street_Name,
"Vancouver, BC, Canada",
sep=", "),
sep=" ")
data$full_address = gsub("Intersection", "", data$full_address)
print(head(data$full_address))
# Geocode the events - we use Nokia's Here API
# Create an empty vector for lat and lon coordinates
lat = c(); lon = c()
# loop through the addresses
for(i in 1: length(data$full_address)){
# store the address at index "i" as a character
address = as.character(data$full_address[i])
# append the latitude of the geocoded address to the lat vector
lat = c(lat,geocodeHERE_simple(address)$Latitude)
# append the longitude of the geocoded address to the lon vector
lon = c(lon,geocodeHERE_simple(address)$Longitude)
# at each iteration through the loop, print the coordinates - takes about 20 min.
print(paste("#",i,", ", lat[i], lon[i], sep=","))
}
# add the lat lon coordinates to the dataframe
data$lat = lat
data$lon = lon
# after geocoding, it's a good idea to write your file out!
# joey's computer (mac): '/Users/Jozo/Projects/Github-local/Workshop/aloha-r/data/calls_2014/201401CaseLocationsDetails-geo.csv'
# sally's computer (windows): 'c:\\Sally\\Documents\\van311-project\\201401CaseLocationsDetails-geo.csv'
ofile ='insert filepath here'
write.csv(data, ofile)
# ------------------------------------------------- #
# --------------------- Mine ---------------------- #
# ------------------------------------------------- #
# --- Examine the unique cases --- #
# examine how the cases are grouped - are these intuitive?
unique(data$Department)
unique(data$Division)
# examine the types of cases - can we make new groups that are more useful?
# unique(data$Case_Type)
# Print each unique case on a new line for easier inspection
for (i in 1:length(unique(data$Case_Type))){
print(unique(data$Case_Type)[i], max.levels=0)
}
# Determine classes to group case types:
# graffiti and noise
graffiti_noise = c('Graffiti Removal - City Property','Graffiti Removal - External Organization','PUI Noise Complaint Case','PUI Noise General Inquiry Case')
# street surface & Maintenance
street_traffic_maint = c('Street Furniture Repair and Maintenance Request','Street Cleaning & Debris Pickup','Street Light - Out','Street Tree Work Request SR','Street Litter Can Request','Streets - General Issues','Street Light - Pole Repair','Street - Surface Water Flooding','Street - Repair','Street Light - New/Relocation','Street Light - Flat Glass Fixture Request','Street and Traffic Light - Utility Damage','Street Sign - New','Crosswalk Marking - New','Boulevard Maintenance Issues','Bicycle Route Map Request','Sidewalk - Repair','Pothole - Repair','Pavement Markings Request - New/Modify','Pavement Marking - Repair','Sewer Pipe Inquiries','Sewer Manhole Issues','Sewer General Inquiries','Sewer Design General Inquiries','Sewer Separation Inspection Cancellation Case','Sewer Utility Damage','Sewer Odour Complaints','Plumbing and Gas Inspection Cancellation Case','Snow Angel Program - Individual Volunteer','Snow & Ice Removal - City Property','Snow and Ice Removal - Sidewalk Bylaw Violation','Snow and Ice Removal - Sidewalk Bylaw Inquiry','Traffic & Pedestrian Signal - New','Traffic Calming Request','Traffic & Pedestrian Signal - Modify','Traffic Sign - Modify','Street and Traffic Light - Utility Damage','Traffic Sign - New','Traffic Count Request','Truck Violation','Residential Parking Requests','Parking Meter Requests','Abandoned Vehicle Request')
# garbage, Recycling & organics related
garbage_recycling_organics = c('Missed Garbage Pickup','Abandoned Garbage Pickup - City Property & Parks','Cart - Garbage','Illegal Dumping/Abandoned Garbage Pickup','Parks Litter Can or Cart Request','Recycling Bag Request','Missed Recycling Pickup','Missed Apartment Recycling Pickup','Recycling Box Request','Cart - Apartment Recycling','Apartment Recycling - Registration Request','Transfer Station & Recycling - General Inquiries','Blue Box and Leaf Removal Guide Mail-Out Request','Missed Yard Trimmings and Food Scraps Pickup','Cart - Green (Yard Trimmings and Food Scraps)','Green Bin Program - Feedback and General Inquiry','Collection Calendar Mail-Out Request')
# water related
water = c('Water Leaks/Breaks','Water Service Turn On/Off Request','Water Service Locate Request','Street - Surface Water Flooding','Water Work Site Complaint','Water Hydrant Issue','Water General Inquiry','Water General Work Request','Water Pressure or No Water Issue','Water Conservation Violation','Water Meter Issue','Water Damage To City Water System','Catch Basin Issues')
#animal and vegetation related
animal_vegetation = c('Dead Animal Pickup Case','Animal Control General Inquiry Case','Animal Complaint - Non-Emergency Case','Animal Cremation Case','Dead Skunk Pickup','Lost Pets Case','Holding Stray Case','Chafer Beetle Feedback','Vegetation Maintenance SR','Trees and Vegetation Encroachment - City Property','Horticulture Inquiry on Right-of-Way')
# other
other = c('Poster/Sign Removal Request','Sign - Repair','Curbside Sign - New','Curbside Sign - Modify','Banner Request','Fire Reinspection Request for Firehall','Fire Reinspection Request for Inspector','Citizen Feedback','Wheelchair Curb/Ramp Request','Wheelchair','PRB_Park Ranger SR','Building Plans Information Request','Building Inspection Cancellation Case','Licence Payment Request Case','Gone Out of Business Case','FPB_General Inquiry Case','PUI General Inquiry Case','Electrical Inspection Cancellation Case','Bridges & Structures - Repair','Secondary Suite Information Request','General Information Request SR','Election General Concerns','Occupancy Permit Information Request','Home Safety Check Request Case','Flag Request','Homelessness/Transient Issue')
# give class id numbers:
data$cid = 9999
for(i in 1:length(data$Case_Type)){
if(data$Case_Type[i] %in% graffiti_noise){
data$cid[i] = 1
}else if(data$Case_Type[i] %in% street_traffic_maint){
data$cid[i] = 2
}else if(data$Case_Type[i] %in% garbage_recycling_organics){
data$cid[i] = 3
}else if(data$Case_Type[i] %in% water){
data$cid[i] = 4
}else if(data$Case_Type[i] %in% animal_vegetation){
data$cid[i] = 5
}else{
data$cid[i] = 0
}
}
# --- handle overlapping points --- #
# Set offset for points in same location:
data$lat_offset = data$lat
data$lon_offset = data$lon
# Run loop - if value overlaps, offset it by a random number
for(i in 1:length(data$lat)){
if ( (data$lat_offset[i] %in% data$lat_offset) && (data$lon_offset[i] %in% data$lon_offset)){
data$lat_offset[i] = data$lat_offset[i] + runif(1, 0.0001, 0.0005)
data$lon_offset[i] = data$lon_offset[i] + runif(1, 0.0001, 0.0005)
}
}
# --- what are the top calls? --- #
# get a frequency distribution of the calls:
top_calls = data.frame(table(data$Case_Type))
top_calls = top_calls[order(top_calls$Freq),]
print(top_calls)
# ------------------------------------------------- #
# -------------------- Filter --------------------- #
# ------------------------------------------------- #
# Subset only the data if the coordinates are within our bounds or if it is not a NA
data_filter = subset(data, (lat <= 49.313162) & (lat >= 49.199554) &
(lon <= -123.019028) & (lon >= -123.271371) & is.na(lon) == FALSE )
# plot the data
plot(data_filter$lon, data_filter$lat)
# If everthing looks good, we might also write out our file:
# joey's computer (mac): '/Users/Jozo/Projects/Github-local/Workshop/aloha-r/data/calls_2014/201401CaseLocationsDetails-geo-filtered.csv'
# sally's computer (windows): 'c:\\Sally\\Documents\\van311-project\\201401CaseLocationsDetails-geo-filtered.csv'
ofile_filtered = 'insert filepath here'
write.csv(data_filter, ofile_filtered)
# --- Convert Data to Shapefile --- #
# store coordinates in dataframe
coords_311 = data.frame(data_filter$lon_offset, data_filter$lat_offset)
# create spatialPointsDataFrame
data_shp = SpatialPointsDataFrame(coords = coords_311, data = data_filter)
# set the projection to wgs84
projection_wgs84 = CRS("+proj=longlat +datum=WGS84")
proj4string(data_shp) = projection_wgs84
# set an output folder for our geojson files:
# joey's computer (mac): '/Users/Jozo/Projects/Github-local/Workshop/aloha-r/data/geo/'
# sally's computer (windows): 'c:\\Sally\\Documents\\van311-project\\data\\geo\\'
geofolder = 'insert folder path here'
# Join the folderpath to each file name:
opoints_shp = paste(geofolder, "calls_","1401",".shp", sep="")
print(opoints_shp) # print this out to see where your file will go & what it will be called
opoints_geojson = paste(geofolder, "calls_","1401",".geojson", sep="")
print(opoints_geojson) # print this out to see where your file will go & what it will be called
# write the file to a shp
writeOGR(data_shp, opoints_shp, layer="data_shp", driver="ESRI Shapefile", check_exists = FALSE)
# write file to geojson
writeOGR(data_shp, opoints_geojson, layer="data_shp", driver="GeoJSON", check_exists = FALSE)
# --- aggregate to a grid --- #
# ref: http://www.inside-r.org/packages/cran/GISTools/docs/poly.counts
# set the file name - combine the shpfolder with the name of the grid
# grid_fn = paste(shpfolder,'hgrid_100m.shp', sep="")
grid_fn = 'https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/geo/hgrid_250m.geojson'
# define the utm 10n projection
# projection_utm10n = CRS('+proj=utm +zone=10 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs')
# read in the hex grid setting the projection to utm10n
hexgrid = readOGR(grid_fn, 'OGRGeoJSON')
# transform the projection to wgs84 to match the point file and store it to a new variable (see variable: projection_wgs84)
hexgrid_wgs84 = spTransform(hexgrid, projection_wgs84)
# Use the poly.counts() function to count the number of occurences of calls per grid cell
grid_cnt = poly.counts(data_shp, hexgrid_wgs84)
# create a data frame of the counts
grid_total_counts = data.frame(grid_cnt)
# set grid_total_counts dataframe to the hexgrid data
hexgrid_wgs84$data = grid_total_counts$grid_cnt
# remove all the grids without any calls
hexgrid_wgs84 = subset(hexgrid_wgs84, grid_cnt >0)
# define the output names:
ohex_shp = paste(geofolder, "hexgrid_250m_","1401","_cnts",".shp", sep="")
print(ohex_shp)
ohex_geojson = paste(geofolder, "hexgrid_250m_","1401","_cnts",".geojson", sep="")
print(ohex_geojson)
# write the file to a shp
writeOGR(check_exists = FALSE, hexgrid_wgs84, ohex_shp, layer="hexgrid_wgs84", driver="ESRI Shapefile")
# write file to geojson
writeOGR(check_exists = FALSE, hexgrid_wgs84, ohex_geojson, layer="hexgrid_wgs84", driver="GeoJSON")
We completed the exploration for 1 month, but what about the remaining 11 months? What is the value added of having the other temporal data? For the future, you can explore the full year’s worth of data by running this script on each of the datasets. Don’t forget to change the filenames for each year - if you don’t you’ll write over your data and have to run all the code again ;)
We’ve just created a whole bunch of data and so far it is just a bunch of shapefiles (or geojson files - depending on what you exported) and .csv files. We haven’t even had the chance to look at what we’ve made! While there are a dozen tools we could use to visualize and interact with our data (e.g. Tilemill, Mapbox Studio, CartoDB, Tableau, etc), we are going to stick with R for now and let it “gently” introduce us to the fundamentals of a web map.
A number of great people out there have created an overview of web map fundamentals, let’s take a look and learn how it all works:
Alan McConchie & Beth Schechter! - anatomy of a web map
So if we know that to make a web map generally is composed of:
geodata
html/css/javascript (your webstack)
And we also know that:
THEN HOW THE HELL ARE WE SUPPOSED TO MAKE OUR OWN WEB MAP???
Well, turns out that others have also been in your same predicament and have developed a library to bring web mapping to R programmers.
Wait a second… so we can make a web map, without coding any html, css, or javascript?
YES! sort of. Some clever people got together and wrote a library in R that takes a very famous and awesome javascript library (yes there are libraries in javascript, and every other language out there!) called “Leaflet.js” and allows you to write R code and export a fully functional web map, with tiles and geodata drawn right on top!
Basically what happens is:
With 5 lines of code, we’re going to make our first interactive web map!
Run these lines of code and create your first interactive map with R!
# install the leaflet library
devtools::install_github("rstudio/leaflet")
## Downloading GitHub repo rstudio/leaflet@master
## Installing leaflet
## '/usr/local/Cellar/r/3.2.2_1/R.framework/Resources/bin/R' --no-site-file \
## --no-environ --no-save --no-restore CMD INSTALL \
## '/private/var/folders/77/gr3849bn3s7_060664hy9_0h0000gn/T/RtmpDkRWto/devtools79f26fb29b9/rstudio-leaflet-3b4d8a3' \
## --library='/usr/local/lib/R/3.2/site-library' --install-tests
# add the leaflet library to your script
library(leaflet)
# initiate the leaflet instance and store it to a variable
m = leaflet()
# we want to add map tiles so we use the addTiles() function - the default is openstreetmap
m = addTiles(m)
# we can add markers by using the addMarkers() function
m = addMarkers(m, lng=-123.256168, lat=49.266063, popup="T")
# we can "run"/compile the map, by running the printing it
m
Now if we export the map and save as webpage…:
The convention for naming .html files is “index”, therefore, let’s name our file: index.html
CRAZY - with just 4 lines of code, you added a pin to a map that now works on the interwebz! Let’s scale this up and use our dataset and see what we can come up with!
So let’s shift gears and work with our 3-1-1 data. We now we have a bunch of points for the month of January and hexagonal grids with the call densities. How do we begin to interact with the data?
What are the main interactions we are going to work with?:
** Let’s get started with a rather verbose first example**
If you’re continuing from the 3-1-1 data example, we should have a dataframe called data_filter. If you don’t, then you can read in your final csv dataset below - aren’t you glad you wrote your final dataset out???.
data_filter = read.csv(curl('https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/201401CaseLocationsDetails-geo-filtered.csv'), header=TRUE)
We’re going to use the subset() function to get store our data into separate variables.
# subset out your data_filter:
df_graffiti_noise = subset(data_filter, cid == 1)
df_street_traffic_maint = subset(data_filter, cid == 2)
df_garbage_recycling_organics = subset(data_filter, cid == 3)
df_water = subset(data_filter, cid == 4)
df_animal_vegetation = subset(data_filter, cid == 5)
df_other = subset(data_filter, cid == 0)
Great, now each of our data categories lives in its own variable name.
Now we’re going to initiate our leaflet map:
# initiate leaflet
m = leaflet()
# add openstreetmap tiles (default)
m = addTiles(m)
# you can now see that our maptiles are rendered
m
Now we are going to define some colors using leaflet’s special colorFactor() function:
colorFactors = colorFactor(c('red', 'orange', 'purple', 'blue', 'pink', 'brown'), domain=data_filter$cid)
What the colorFactor function does is take our list of colors and “maps” then to the domain that we defined. For example, when we apply the colorFactor() function to our data, it will color a point “red”, if the “cid” in the data is equal to “0”, “orange” if the “cid” is equal to 1, etc.
The next step is to add our points to the map. We can do so using the addCircleMarkers() function:
m = addCircleMarkers(
m,
lng=df_graffiti_noise$lon_offset, # we feed the longitude coordinates
lat=df_graffiti_noise$lat_offset,
popup=df_graffiti_noise$Case_Type,
radius=2,
stroke = FALSE,
fillOpacity = 0.75,
color = colorFactors(df_graffiti_noise$cid),
group = "1 - graffiti & noise"
)
Here’s what each argument is saying:
Now repeat this function across the other layers:
m = addCircleMarkers(m,
lng=df_street_traffic_maint$lon_offset, lat=df_street_traffic_maint$lat_offset,
popup=df_street_traffic_maint$Case_Type,
radius=2,
stroke = FALSE, fillOpacity = 0.75,
color = colorFactors(df_street_traffic_maint$cid),
group = "2 - street & traffic & maintenance")
m = addCircleMarkers(m,
lng=df_garbage_recycling_organics$lon_offset, lat=df_garbage_recycling_organics$lat_offset,
popup=df_garbage_recycling_organics$Case_Type,
radius=2,
stroke = FALSE, fillOpacity = 0.75,
color = colorFactors(df_garbage_recycling_organics$cid),
group = "3 - garbage related")
m = addCircleMarkers(m,
lng=df_water$lon_offset, lat=df_water$lat_offset,
popup=df_water$Case_Type,
radius=2,
stroke = FALSE, fillOpacity = 0.75,
color = colorFactors(df_water$cid),
group = "4 - water related")
m = addCircleMarkers(m,
lng=df_animal_vegetation$lon_offset, lat=df_animal_vegetation$lat_offset,
popup=df_animal_vegetation$Case_Type,
radius=2,
stroke = FALSE, fillOpacity = 0.75,
color = colorFactors(df_animal_vegetation$cid),
group = "5 - animals & vegetation")
m = addCircleMarkers(m,
lng=df_other$lon_offset, lat=df_other$lat_offset,
popup=df_other$Case_Type,
radius=2,
stroke = FALSE, fillOpacity = 0.75,
color = colorFactors(df_other$cid),
group = "0 - other")
Now we’re going to add some additional map tiles by using the addTiles() and the addProviderTiles() functions:
m = addTiles(m, group = "OSM (default)")
m = addProviderTiles(m,"Stamen.Toner", group = "Toner")
m = addProviderTiles(m, "Stamen.TonerLite", group = "Toner Lite")
We can now add in our layer toggle control using the addLayersControl() function - notice baseGroups are our tileLayers and the overlayGroups are our data layers:
m = addLayersControl(m,
baseGroups = c("Toner Lite","Toner"),
overlayGroups = c("1 - graffiti & noise", "2 - street & traffic & maintenance",
"3 - garbage related","4 - water related", "5 - animals & vegetation",
"0 - other")
)
# make the map
m
So we were able to make a functioning little web app, but we notice our code had some redundancies and make it hard to change colors and things. What do we do when we have redundancy? How about loops?!
We already subset our data from before:
# subset out your data_filter:
df_graffiti_noise = subset(data_filter, cid == 1)
df_street_traffic_maint = subset(data_filter, cid == 2)
df_garbage_recycling_organics = subset(data_filter, cid == 3)
df_water = subset(data_filter, cid == 4)
df_animal_vegetation = subset(data_filter, cid == 5)
df_other = subset(data_filter, cid == 0)
Here’ we are:
# Now we're going to put those variables into a **list** so we can loop through them:
data_filterlist = list(df_graffiti_noise = subset(data_filter, cid == 1),
df_street_traffic_maint = subset(data_filter, cid == 2),
df_garbage_recycling_organics = subset(data_filter, cid == 3),
df_water = subset(data_filter, cid == 4),
df_animal_vegetation = subset(data_filter, cid == 5),
df_other = subset(data_filter, cid == 0))
# Remember we also had these groups associated with each variable? Let's put them in a list too:
layerlist = c("1 - graffiti & noise", "2 - street & traffic & maintenance",
"3 - garbage related","4 - water related", "5 - animals & vegetation",
"0 - other")
# We can keep that same color variable:
colorFactors = colorFactor(c('red', 'orange', 'purple', 'blue', 'pink', 'brown'), domain=data_filter$cid)
# Now we have our loop - each time through the loop, it is adding our markers to the map object:
for (i in 1:length(data_filterlist)){
m = addCircleMarkers(m,
lng=data_filterlist[[i]]$lon_offset,
lat=data_filterlist[[i]]$lat_offset,
popup=data_filterlist[[i]]$Case_Type,
radius=2,
stroke = FALSE,
fillOpacity = 0.75,
color = colorFactors(data_filterlist[[i]]$cid),
group = layerlist[i]
)
}
Now this time we flip “overlayGroups” and “baseGroups” so that we can get the radiobutton functionality - that way only 1 category of calls are shown:
m = addTiles(m, "Stamen.TonerLite", group = "Toner Lite")
m = addLayersControl(m,
overlayGroups = c("Toner Lite"),
baseGroups = layerlist
)
m
Remember we creates that hexagonal grid to aggregate our data to? Well, let’s work with it to get a general impression of the call density across the city:
If you haven’t already, you should have the rgdal and the GISTools libraries loaded:
library(rgdal)
library(GISTools)
Now we need to read in our hexgrid we aggregted our data to:
# define the filepath
hex_1401_fn='https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/geo/hgrid_250m_1401_counts.geojson'
# read in the geojson
hex_1401 = readOGR(hex_1401_fn, "OGRGeoJSON")
## OGR data source with driver: GeoJSON
## Source: "https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/geo/hgrid_250m_1401_counts.geojson", layer: "OGRGeoJSON"
## with 1418 features
## It has 5 fields
Now initiate a new map object but this time with the Stamen Toner lite style:
m = leaflet()
m = addProviderTiles(m, "Stamen.TonerLite", group = "Toner Lite")
Like any choropleth map, we need to set a color scale. We can do so by using the colorNumeric() function which is part of the R leaflet package.
We use the “Greens” color and set the “domain” to the column called “data” in our geojson file.
Here’s the exciting stuff. Let’s add our polygon to the map:
# Create a continuous palette function
pal = colorNumeric(
palette = "Greens",
domain = hex_1401$data
)
# add the polygons to the map
m = addPolygons(m,
data = hex_1401,
stroke = FALSE,
smoothFactor = 0.2,
fillOpacity = 1,
color = ~pal(hex_1401$data),
popup= paste("Number of calls: ",hex_1401$data, sep="")
)
let’s break this down:
And what is a choropleth map without a legend? Let’s add one using the addLegend() function:
m = addLegend(m, "bottomright", pal = pal, values = hex_1401$data,
title = "3-1-1 call density",
labFormat = labelFormat(prefix = " "),
opacity = 0.75
)
m
Now that we’ve got the two pieces of our map - toggleable & clickable point layers and hex grid - let’s put them together.
# initiate leaflet map layer
m = leaflet()
m = addProviderTiles(m, "Stamen.TonerLite", group = "Toner Lite")
# --- hex grid --- #
# store the file name of the geojson hex grid
hex_1401_fn = 'https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/geo/hgrid_250m_1401_counts.geojson'
# read in the data
hex_1401 = readOGR(hex_1401_fn, "OGRGeoJSON")
## OGR data source with driver: GeoJSON
## Source: "https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/geo/hgrid_250m_1401_counts.geojson", layer: "OGRGeoJSON"
## with 1418 features
## It has 5 fields
# Create a continuous palette function
pal = colorNumeric(
palette = "Greens",
domain = hex_1401$data
)
# add hex grid
m = addPolygons(m,
data = hex_1401,
stroke = FALSE,
smoothFactor = 0.2,
fillOpacity = 1,
color = ~pal(hex_1401$data),
popup= paste("Number of calls: ",hex_1401$data, sep=""),
group = "hex"
)
# add legend
m = addLegend(m, "bottomright", pal = pal, values = hex_1401$data,
title = "3-1-1 call density",
labFormat = labelFormat(prefix = " "),
opacity = 0.75
)
# --- points data --- #
# subset out your data:
df_graffiti_noise = subset(data_filter, cid == 1)
df_street_traffic_maint = subset(data_filter, cid == 2)
df_garbage_recycling_organics = subset(data_filter, cid == 3)
df_water = subset(data_filter, cid == 4)
df_animal_vegetation = subset(data_filter, cid == 5)
df_other = subset(data_filter, cid == 0)
data_filterlist = list(df_graffiti_noise = subset(data_filter, cid == 1),
df_street_traffic_maint = subset(data_filter, cid == 2),
df_garbage_recycling_organics = subset(data_filter, cid == 3),
df_water = subset(data_filter, cid == 4),
df_animal_vegetation = subset(data_filter, cid == 5),
df_other = subset(data_filter, cid == 0))
layerlist = c("1 - graffiti & noise", "2 - street & traffic & maintenance",
"3 - garbage related","4 - water related", "5 - animals & vegetation",
"0 - other")
colorFactors = colorFactor(c('red', 'orange', 'purple', 'blue', 'pink', 'brown'), domain=data_filter$cid)
for (i in 1:length(data_filterlist)){
m = addCircleMarkers(m,
lng=data_filterlist[[i]]$lon_offset, lat=data_filterlist[[i]]$lat_offset,
popup=data_filterlist[[i]]$Case_Type,
radius=2,
stroke = FALSE, fillOpacity = 0.75,
color = colorFactors(data_filterlist[[i]]$cid),
group = layerlist[i]
)
}
m = addLayersControl(m,
overlayGroups = c("Toner Lite", "hex"),
baseGroups = layerlist
)
# show map
m
Congratulations you just made your first super awesome and fully functional web map using R!
Now, play with oyur maps and try to identify weaknesses and strengths of your map. Are the colors appropriate? could they be bettter? what’s missing? REFINE YOUR STUFF
In this tutorial we covered a lot of ground. We’re neither masters of R nor data viz experts, however we got the chance to go through the entire data viz pipeline:
Right now our maps can only live on our computers, but we will have to learn how to put them onto the internet. For now, continue exploring locally, but keep in mind that the next step is to get these things on the web.