Skip to content

Latest commit

 

History

History
74 lines (54 loc) · 2.55 KB

00b-identifying-home-locations.md

File metadata and controls

74 lines (54 loc) · 2.55 KB

Identifying home locations for users

Identifying home locations for users

To identify home locations for users, we use four built-in recipes from homelocator package.

library(homelocator)
## Welcome to homelocator package!

Load de-identified dataset

df <- read_csv(here("analysis/data/derived_data/deidentified_sg_tweets.csv")) %>% #load de-identified dataset
  mutate(created_at = with_tz(created_at, tzone = "Asia/Singapore")) # the tweets were sent in Singapore, so must convert the timezone to SGT, the default timezone is UTC! 

Recipe: APDM

Recipe ‘APDM’ (Rein Ahas et al., 2010) calculates both the average and standard deviation timestamps in each location for each user.

#must load neighbors when using APDM recipe 
df_neighbors <- readRDS(here("analysis/data/derived_data/neighbors.rds"))
hm_apdm <- identify_location(df, user = "u_id", timestamp = "created_at", location = "grid_id", 
                             tz = "Asia/Singapore", keep_score = F, recipe = "APDM")

Recipe: FREQ

Recipe ‘FREQ’ simply selects the most frequently ‘visited’ location as users’ home locations.

hm_freq <- identify_location(df, user = "u_id", timestamp = "created_at", location = "grid_id", 
                             tz = "Asia/Singapore", show_n_loc = 1, recipe = "FREQ")

Recipe: HMLC

Recipe ‘HMLC’ weighs data points across multiple time frames to ‘score’ potentially meaningful locations.

hm_hmlc <- identify_location(df, user = "u_id", timestamp = "created_at", location = "grid_id", 
                             tz = "Asia/Singapore", show_n_loc = 1, keep_score = F, recipe = "HMLC")

Recipe: OSNA

Recipe ‘OSNA’ (Efstathiades et al., 2015), only considers data points sent on weekdays and divides a day into three time frames - ‘rest time’, ‘leisure time’ and ‘active time’. The algorithm finds the most ‘popular’ location during ‘rest’ and ‘leisure’ time as the home locations for users.

hm_osna <- identify_location(df, user = "u_id", timestamp = "created_at", location = "grid_id", 
                             tz = "Asia/Singapore", show_n_loc = 1, recipe = "OSNA")

The identified homes are in analysis/data/derived_data.

write_csv(hm_apdm, path = here("analysis/data/derived_data/hm_apdm.csv"))
write_csv(hm_freq, path = here("analysis/data/derived_data/hm_freq.csv"))
write_csv(hm_hmlc, path = here("analysis/data/derived_data/hm_hmlc.csv"))
write_csv(hm_osna, path = here("analysis/data/derived_data/hm_osna.csv"))