r - How to use dplyr to find unique entries in the previous rows

I have a long data frame with more or less the following structure:

```r
df <- data.frame(
  dates  = c("2011-10-01", "2011-10-01", "2011-10-01", "2011-10-02",
             "2011-10-03", "2011-10-05", "2011-10-06", "2011-10-06"),
  ids    = c("a", "a", "b", "c", "d", "a", "e", "d"),
  values = c(10, 1, 25, 2, 5, 10, 4, 1)
)

> df
       dates ids values
1 2011-10-01   a     10
2 2011-10-01   a      1
3 2011-10-01   b     25
4 2011-10-02   c      2
5 2011-10-03   d      5
6 2011-10-05   a     10
7 2011-10-06   e      4
8 2011-10-06   d      1
```

I would like the following output:

```
       dates unique_ids sum_values
1 2011-10-01          2         36
2 2011-10-02          3         38
3 2011-10-03          4         43
4 2011-10-04          4         43
5 2011-10-05          4         53
6 2011-10-06          5         58
```

That is, for each date, unique_ids gives the number of unique ids seen on that date and all earlier dates, and sum_values gives the sum of the values on that date and all earlier dates.
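To make this definition concrete, here is a deliberately simple base R sketch that recomputes both columns from scratch for every calendar date between the first and last date. It loops, so it is only meant as a reference for checking a faster version on small data; the function name `cumulative_reference` is hypothetical.

```r
# Sample data from the question
df <- data.frame(
  dates  = c("2011-10-01", "2011-10-01", "2011-10-01", "2011-10-02",
             "2011-10-03", "2011-10-05", "2011-10-06", "2011-10-06"),
  ids    = c("a", "a", "b", "c", "d", "a", "e", "d"),
  values = c(10, 1, 25, 2, 5, 10, 4, 1)
)

# Hypothetical reference helper: for each calendar date, look at all rows
# on that date or earlier, count the distinct ids and add up the values.
cumulative_reference <- function(df) {
  d <- as.Date(df$dates)
  all_dates <- seq(min(d), max(d), by = "day")
  unique_ids <- integer(length(all_dates))
  sum_values <- numeric(length(all_dates))
  for (i in seq_along(all_dates)) {
    upto <- d <= all_dates[i]  # rows on this date or earlier
    unique_ids[i] <- length(unique(df$ids[upto]))
    sum_values[i] <- sum(df$values[upto])
  }
  data.frame(dates = all_dates, unique_ids = unique_ids, sum_values = sum_values)
}

res <- cumulative_reference(df)
res
```

On the sample data this gives the same six rows as the expected output, including the 2011-10-04 row that has no entries of its own.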

I want to avoid loops because the original df is big, so I was thinking of using dplyr.

I know how to obtain sum_values:

```r
library(dplyr)

df %>%
  group_by(dates) %>%
  summarize(sum_values_daily = sum(values)) %>%
  mutate(sum_values = cumsum(sum_values_daily)) %>%
  select(dates, sum_values)
```

I don't know how to obtain the unique_ids column, though.

Any ideas?

Because you are trying to calculate the number of distinct ids across groups, we first need to define a logical column that is TRUE only for the first occurrence of each id; summing it then counts each id only once.
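A small base R illustration of that flag on the sample ids: `duplicated()` marks every repeat of an id, so its negation is TRUE only for the first row in which each id appears. This relies on the rows already being in date order, as they are in the example.

```r
ids   <- c("a", "a", "b", "c", "d", "a", "e", "d")
dates <- c("2011-10-01", "2011-10-01", "2011-10-01", "2011-10-02",
           "2011-10-03", "2011-10-05", "2011-10-06", "2011-10-06")

# TRUE only for the first appearance of each id (rows assumed sorted by date)
first_seen <- !duplicated(ids)
first_seen
# [1]  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE FALSE

# Number of ids seen for the first time on each date
new_per_date <- tapply(first_seen, dates, sum)
new_per_date
```

This gives 2, 1, 1, 0, 1 for the five dates present. Counting distinct ids per date instead (2, 1, 1, 1, 2) and taking the cumulative sum would count `a` again on 2011-10-05 and `d` again on 2011-10-06, which is why the first-occurrence flag is summed rather than the per-date distinct count.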

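Secondly, because you want the expected output to include dates that are missing from the original df, we need to perform a right_join with the full sequence of dates. I assume here that the dates column is of class Date. The join produces NA values for the missing dates, which we replace with 0.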
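The fill-in step can be sketched in base R with `merge(..., all.y = TRUE)`, which plays the role of `right_join()` here (a substitution for illustration, not part of the dplyr answer). The per-date totals below are the ones the first-occurrence flag and `sum(values)` give for the sample data.

```r
# Per-date totals for the dates that actually occur in df
daily <- data.frame(
  dates      = as.Date(c("2011-10-01", "2011-10-02", "2011-10-03",
                         "2011-10-05", "2011-10-06")),
  unique_ids = c(2L, 1L, 1L, 0L, 1L),
  sum_values = c(36, 2, 5, 10, 5)
)

# Every calendar date from the first to the last
all_dates <- data.frame(dates = seq(min(daily$dates), max(daily$dates), by = 1))

# all.y = TRUE keeps every row of all_dates, like a right join;
# merge() also sorts the result by the key column by default
filled <- merge(daily, all_dates, by = "dates", all.y = TRUE)

# The day with no rows (2011-10-04) gets NA; treat it as contributing nothing
for (col in c("unique_ids", "sum_values")) {
  filled[[col]][is.na(filled[[col]])] <- 0
}
filled
```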

Finally, we calculate the cumsum of both unique_ids and sum_values.
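With the per-date totals filled in, the running totals are just `cumsum()` on each column; on the numbers from the example this reproduces the expected output:

```r
# Per-date totals after filling 2011-10-04 with 0
daily_new_ids <- c(2, 1, 1, 0, 0, 1)
daily_values  <- c(36, 2, 5, 0, 10, 5)

unique_ids <- cumsum(daily_new_ids)
sum_values <- cumsum(daily_values)
unique_ids
# [1] 2 3 4 4 4 5
sum_values
# [1] 36 38 43 43 53 58
```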

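```r
library(dplyr)

# The steps below assume `dates` is of class Date
df$dates <- as.Date(df$dates)

df %>%
  # TRUE for the first appearance of each id (assumes df is sorted by dates)
  mutate(unique_ids = !duplicated(ids)) %>%
  group_by(dates) %>%
  summarise(unique_ids = sum(unique_ids),
            sum_values = sum(values)) %>%
  # add the dates that do not appear in df (here 2011-10-04)
  right_join(data.frame(dates = seq(min(df$dates), max(df$dates), by = 1)),
             by = "dates") %>%
  # right_join() can append unmatched dates at the end, so restore date order
  arrange(dates) %>%
  # missing dates contribute nothing
  mutate(across(-dates, ~ replace(.x, is.na(.x), 0))) %>%
  # running totals
  mutate(across(-dates, cumsum))

#        dates unique_ids sum_values
#       <date>      <dbl>      <dbl>
# 1 2011-10-01          2         36
# 2 2011-10-02          3         38
# 3 2011-10-03          4         43
# 4 2011-10-04          4         43
# 5 2011-10-05          4         53
# 6 2011-10-06          5         58
```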
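If you want to avoid the dplyr dependency, or just want to double-check the pipeline, the same three steps (flag first occurrences, include every calendar date with 0 for empty days, then `cumsum()`) can be sketched in base R. This version sorts by date first, since `!duplicated(ids)` only counts correctly when earlier dates come first; that sorting step is an addition, not part of the answer above.

```r
df <- data.frame(
  dates  = c("2011-10-01", "2011-10-01", "2011-10-01", "2011-10-02",
             "2011-10-03", "2011-10-05", "2011-10-06", "2011-10-06"),
  ids    = c("a", "a", "b", "c", "d", "a", "e", "d"),
  values = c(10, 1, 25, 2, 5, 10, 4, 1)
)
df$dates <- as.Date(df$dates)

# duplicated() works in row order, so put earlier dates first
df <- df[order(df$dates), ]

# Every calendar date, used as factor levels so that empty days keep a level
all_dates <- seq(min(df$dates), max(df$dates), by = "day")
day <- factor(as.character(df$dates), levels = as.character(all_dates))

# split() keeps empty levels, and sum() of an empty vector is 0
new_ids    <- vapply(split(!duplicated(df$ids), day), sum, integer(1))
day_values <- vapply(split(df$values, day), sum, numeric(1))

res <- data.frame(dates      = all_dates,
                  unique_ids = unname(cumsum(new_ids)),
                  sum_values = unname(cumsum(day_values)))
res
```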
