R - How to use dplyr to find unique entries in the previous rows


I have a long data frame with more or less the following structure:

df <- data.frame(
  dates  = c("2011-10-01", "2011-10-01", "2011-10-01", "2011-10-02",
             "2011-10-03", "2011-10-05", "2011-10-06", "2011-10-06"),
  ids    = c("a", "a", "b", "c", "d", "a", "e", "d"),
  values = c(10, 1, 25, 2, 5, 10, 4, 1)
)

> df
       dates ids values
1 2011-10-01   a     10
2 2011-10-01   a      1
3 2011-10-01   b     25
4 2011-10-02   c      2
5 2011-10-03   d      5
6 2011-10-05   a     10
7 2011-10-06   e      4
8 2011-10-06   d      1

I would like the following output:

       dates unique_ids sum_values
1 2011-10-01          2         36
2 2011-10-02          3         38
3 2011-10-03          4         43
4 2011-10-04          4         43
5 2011-10-05          4         53
6 2011-10-06          5         58

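That is, for each date, unique_ids gives the number of unique ids seen on that date or any earlier date, and sum_values gives the sum of the values on that date and all earlier dates. Dates missing from df (such as 2011-10-04) should still appear, carrying the previous totals forward.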

I want to avoid loops because the original df is quite big. I was thinking of using dplyr.

I know how to obtain sum_values:

df %>%
  group_by(dates) %>%
  summarize(sum_values_daily = sum(values)) %>%
  mutate(sum_values = cumsum(sum_values_daily)) %>%
  select(dates, sum_values)

But I don't know how to obtain the unique_ids column.

Any ideas?

Because you are trying to count distinct ids across groups, we'll first need to define a boolean column that flags the first occurrence of each id, so that summing it counts each id only once.
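As a minimal sketch of that first step, the flag can be built with base R's duplicated(). The vector below reuses the ids from the question; the name first_seen is only illustrative.

```r
ids <- c("a", "a", "b", "c", "d", "a", "e", "d")

# TRUE only where an id appears for the first time
first_seen <- !duplicated(ids)
first_seen
# TRUE FALSE TRUE TRUE TRUE FALSE TRUE FALSE

# The running total is the number of distinct ids seen so far, row by row
cumsum(first_seen)
# 1 1 2 3 4 4 5 5
```

Summing first_seen within each date and then taking the cumulative sum over dates gives the same count at day level.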

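Secondly, since you want the expected output to include dates that are missing from the original df, we'll need to perform a right_join with the full sequence of dates. I assume here that the dates column is of class Date. The join will produce NA values for the missing dates, which we replace with 0.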
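The gap-filling step on its own might look like the sketch below. It starts from the daily sums of the question's data; the names daily, all_dates and filled are illustrative, and the arrange() call is there on the assumption that the join may not return the rows in date order.

```r
library(dplyr)

# Daily sums for the dates present in the question's df (2011-10-04 is missing)
daily <- data.frame(
  dates = as.Date(c("2011-10-01", "2011-10-02", "2011-10-03",
                    "2011-10-05", "2011-10-06")),
  sum_values = c(36, 2, 5, 10, 5)
)

# Every calendar day between the first and last date
all_dates <- data.frame(dates = seq(min(daily$dates), max(daily$dates), by = 1))

filled <- daily %>%
  right_join(all_dates, by = "dates") %>%   # missing days come back as NA
  arrange(dates) %>%                        # restore date order before any cumsum
  mutate(sum_values = replace(sum_values, is.na(sum_values), 0))
```

Here filled has six rows, and 2011-10-04 carries a 0 that leaves the later cumulative sum unchanged.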

Finally, calculate the cumsum of both unique_ids and sum_values.

library(dplyr)

# The dates column must be of class Date for seq() to work
df$dates <- as.Date(df$dates)

df %>%
  mutate(unique_ids = !duplicated(ids)) %>%            # flag first occurrence of each id
  group_by(dates) %>%
  summarise(unique_ids = sum(unique_ids),
            sum_values = sum(values)) %>%
  right_join(data.frame(dates = seq(min(df$dates),
                                    max(df$dates),
                                    by = 1)),
             by = "dates") %>%                         # add the missing dates
  arrange(dates) %>%                                   # keep date order before cumsum
  mutate(across(-dates, ~ replace(.x, is.na(.x), 0))) %>%
  mutate(across(-dates, cumsum))

#       dates unique_ids sum_values
#      <date>      <dbl>      <dbl>
#1 2011-10-01          2         36
#2 2011-10-02          3         38
#3 2011-10-03          4         43
#4 2011-10-04          4         43
#5 2011-10-05          4         53
#6 2011-10-06          5         58
