R - Adding a column of corresponding seasons to a data frame
Here is an example of my data frame. I am working in R.
date       name count
2016-11-12 joe      5
2016-11-15 bob      5
2016-06-15 nick    12
2016-10-16 cate     6
I would like to add a column to the data frame that tells me which season each date corresponds to, like this:
date       name count season
2016-11-12 joe      5 winter
2016-11-15 bob      5 winter
2017-06-15 nick    12 summer
2017-10-16 cate     6 fall
I have started with this code:
startwinter <- c(month.name[1], month.name[12], month.name[11])
startsummer <- c(month.name[5], month.name[6], month.name[7])
startspring <- c(month.name[2], month.name[3], month.name[4])

# Create a function to find the correct season based on the month
monthseason <- function(month) {
  # !is.na() ignores NA values
  # match() returns a vector of the positions of matches
  # If the starting month matches the spring season, return "spring";
  # if it matches the summer season, return "summer", etc.
  ifelse(!is.na(match(month, startspring)),
         return("spring"),
         return(ifelse(!is.na(match(month, startwinter)),
                       "winter",
                       ifelse(!is.na(match(month, startsummer)),
                              "summer",
                              "fall"))))
}
This code gives me the season for a month, but I'm not sure I'm going about the problem the right way. Can anyone help me out? Thanks!
There are a couple of hacks here, and their usability depends on whether you want to use meteorological or astronomical seasons. I'll offer both; I think they offer sufficient flexibility.
I'm going to use the second data set you provided, since it provides more "winter".
txt <- "date name count
2016-11-12 joe 5
2016-11-15 bob 5
2017-06-15 nick 12
2017-10-16 cate 6"
dat <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
dat$date <- as.Date(dat$date)
The quickest method works when seasons are defined strictly by month.
metseasons <- c(
  "01" = "winter", "02" = "winter",
  "03" = "spring", "04" = "spring", "05" = "spring",
  "06" = "summer", "07" = "summer", "08" = "summer",
  "09" = "fall", "10" = "fall", "11" = "fall",
  "12" = "winter"
)
metseasons[format(dat$date, "%m")]
#       11       11       06       10
#   "fall"   "fall" "summer"   "fall"
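The question asks for a new column, so as a minimal sketch the same lookup can be assigned straight back to dat. The unname() call is my addition: it strips the month names ("11", "06", ...) that the named-vector lookup carries along, so the column is a plain character vector.

```r
# Sample data (second version from the question)
txt <- "date name count
2016-11-12 joe 5
2016-11-15 bob 5
2017-06-15 nick 12
2017-10-16 cate 6"
dat <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
dat$date <- as.Date(dat$date)

# Meteorological seasons keyed by two-digit month
metseasons <- c(
  "01" = "winter", "02" = "winter", "03" = "spring", "04" = "spring",
  "05" = "spring", "06" = "summer", "07" = "summer", "08" = "summer",
  "09" = "fall",   "10" = "fall",   "11" = "fall",   "12" = "winter"
)

# Look up each date's month and drop the names before storing the column
dat$season <- unname(metseasons[format(dat$date, "%m")])
print(dat)
```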
If you choose to use date ranges for seasons that are not defined by month start/stop, such as astronomical seasons, here's a 'hack':
astroseasons <- as.integer(c("0000", "0320", "0620", "0922", "1221", "1232"))
astroseasons_labels <- c("winter", "spring", "summer", "fall", "winter")
If you use proper Date or POSIX types, including the year makes things a little less generic. One might think of using Julian dates, but during leap years that produces anomalies. So, with the assumption that Feb 28 is never a seasonal boundary, I'm "numericizing" the month-day. Though R does character comparisons fine, cut expects numbers, so we convert them to integers.
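A small worked example of the month-day key, as a sketch: it builds the same integer key two ways (via format() and via the POSIXlt fields), and shows the leap-year shift that makes day-of-year awkward. The variable names (key_fmt, key_lt, yday, mmdd) are illustrative only.

```r
d <- as.Date(c("2016-11-12", "2016-11-15", "2017-06-15", "2017-10-16"))

# Route 1: format as "mmdd" text, then convert to integer (cut() wants numbers)
key_fmt <- as.integer(format(d, "%m%d"))

# Route 2: same key from POSIXlt fields ($mon is 0-based, $mday is 1-based)
lt <- as.POSIXlt(d)
key_lt <- 100L * (1L + lt$mon) + lt$mday
print(key_fmt)

# Why not day-of-year? March 1 gets a different $yday in a leap year,
# while its month-day key stays the same
mar1 <- as.Date(c("2016-03-01", "2017-03-01"))
yday <- as.POSIXlt(mar1)$yday
mmdd <- as.integer(format(mar1, "%m%d"))
print(yday)
print(mmdd)
```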
Two safeguards: because cut is either right-open (and left-closed) or right-closed (and left-open), our two book-ends need to extend beyond the legal dates, ergo "0000" and "1232". There are other techniques that work equally well here (e.g., using -Inf and Inf after integerization).
astroseasons_labels[ cut(as.integer(format(dat$date, "%m%d")), astroseasons, labels = FALSE) ]
# [1] "fall"   "fall"   "spring" "fall"
Notice that the third date is in spring when using astronomical seasons, but in summer otherwise.
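To see why the "0000" and "1232" book-ends matter, here is a sketch that runs a few edge-case keys through cut() with and without them. The keys and the inner breaks vector are chosen for illustration. Note that with cut()'s default right = TRUE the intervals are closed on the right, so key 320 itself still lands in the first (winter) interval.

```r
astroseasons <- as.integer(c("0000", "0320", "0620", "0922", "1221", "1232"))
astroseasons_labels <- c("winter", "spring", "summer", "fall", "winter")

# Month-day keys for Jan 1, Mar 20, Mar 21, Dec 31
keys <- c(101L, 320L, 321L, 1231L)

# With the book-ends, every legal month-day falls into some interval
with_ends <- astroseasons_labels[cut(keys, astroseasons, labels = FALSE)]
print(with_ends)

# Without them, keys at or below the first break, or above the last,
# fall outside every interval and cut() returns NA
inner <- astroseasons[2:5]   # 320, 620, 922, 1221
without_ends <- cut(keys, inner, labels = FALSE)
print(without_ends)
```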
This solution can be adjusted to account for the southern hemisphere or other seasonal preferences/beliefs.
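One possible adjustment for the southern hemisphere, as a sketch: keep the same breaks and swap in labels shifted by half a year. The names astroseasons_labels_south and metseasons_south are hypothetical, and the labels assume the usual convention that southern seasons are the reverse of northern ones.

```r
astroseasons <- as.integer(c("0000", "0320", "0620", "0922", "1221", "1232"))

# Hypothetical southern-hemisphere labels: same intervals, opposite seasons
astroseasons_labels_south <- c("summer", "fall", "winter", "spring", "summer")

metseasons_south <- c(
  "01" = "summer", "02" = "summer", "03" = "fall",   "04" = "fall",
  "05" = "fall",   "06" = "winter", "07" = "winter", "08" = "winter",
  "09" = "spring", "10" = "spring", "11" = "spring", "12" = "summer"
)

d <- as.Date(c("2016-11-12", "2017-06-15"))

south_astro <- astroseasons_labels_south[
  cut(as.integer(format(d, "%m%d")), astroseasons, labels = FALSE)
]
south_met <- unname(metseasons_south[format(d, "%m")])
print(south_astro)
print(south_met)
```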
Edit: motivated by @kristofersen's answer (thanks), I looked into benchmarks. lubridate::month uses a POSIXct-to-POSIXlt conversion to extract the month, which can be over 10x faster than the format(x, "%m") method. As such:
metseasons2 <- c(
  "winter", "winter",
  "spring", "spring", "spring",
  "summer", "summer", "summer",
  "fall", "fall", "fall",
  "winter"
)
Noting that as.POSIXlt returns 0-based months, we add 1:
metseasons2[ 1 + as.POSIXlt(dat$date)$mon ]
# [1] "fall"   "fall"   "summer" "fall"
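Before comparing speed, it is worth confirming that the two meteorological lookups agree. This sketch runs both over every day of 2017 (an arbitrary non-leap year); days, via_lt and via_fmt are illustrative names, and the named vector is rebuilt from metseasons2 with setNames() rather than typed out again.

```r
metseasons2 <- c("winter", "winter", "spring", "spring", "spring", "summer",
                 "summer", "summer", "fall", "fall", "fall", "winter")

# Named version for the format(x, "%m") method: names "01" .. "12"
metseasons <- setNames(metseasons2, sprintf("%02d", 1:12))

days <- seq(as.Date("2017-01-01"), as.Date("2017-12-31"), by = "day")

via_lt  <- metseasons2[1 + as.POSIXlt(days)$mon]          # positional lookup
via_fmt <- unname(metseasons[format(days, "%m")])         # name lookup

# Both methods should label every day identically
stopifnot(identical(via_lt, via_fmt))
print(table(via_lt))
```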
Comparison:
library(lubridate)
library(microbenchmark)
set.seed(42)
x <- Sys.Date() + sample(1e3)
xlt <- as.POSIXlt(x)
microbenchmark(
  metfmt = metseasons[ format(x, "%m") ],
  metlt = metseasons2[ 1 + xlt$mon ],
  astrofmt = astroseasons_labels[ cut(as.integer(format(x, "%m%d")), astroseasons, labels = FALSE) ],
  astrolt = astroseasons_labels[ cut(100*(1+xlt$mon) + xlt$mday, astroseasons, labels = FALSE) ],
  # seasons() is the function from @kristofersen's answer (not defined here)
  lubridate = sapply(month(x), seasons)
)
# Unit: microseconds
#       expr      min       lq       mean    median        uq       max neval
#     metfmt 1952.091 2135.157 2289.63943 2212.1025 2308.1945  3748.832   100
#      metlt   14.223   16.411   22.51550   20.0575   24.7980    68.924   100
#   astrofmt 2240.547 2454.245 2622.73109 2507.8520 2674.5080  3923.874   100
#    astrolt   42.303   54.702   72.98619   66.1885   89.7095   163.373   100
#  lubridate 5906.963 6473.298 7018.11535 6783.2700 7508.0565 11474.050   100
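So the methods using as.POSIXlt(...)$mon are faster by a wide margin (median about 20 vs. 2212 microseconds for the meteorological lookup, and about 66 vs. 2508 for the astronomical one). (@kristofersen's answer could be improved by vectorizing it, perhaps with ifelse, but it still won't compare to the speed of vector lookups, with or without cut.)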
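For reference, a vectorized ifelse version might look like the sketch below. season_ifelse is a hypothetical helper, not @kristofersen's code (which isn't shown here), and it uses meteorological seasons. Because ifelse() evaluates its yes and no arguments over the whole vector at every level of nesting, it does more work per element than the single index lookup metseasons2[1 + xlt$mon].

```r
# Hypothetical vectorized classifier using nested ifelse() on month numbers
season_ifelse <- function(dates) {
  m <- as.POSIXlt(dates)$mon + 1   # months as 1..12
  ifelse(m %in% c(12, 1, 2), "winter",
    ifelse(m %in% 3:5, "spring",
      ifelse(m %in% 6:8, "summer", "fall")))
}

d <- as.Date(c("2016-11-12", "2016-11-15", "2017-06-15", "2017-10-16"))
res <- season_ifelse(d)
print(res)
```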