Keeping the last entry but removing other duplicate row(s) in a data frame using R


I have a large table containing thousands of entries queried from a database, with a structure similar to Table 1 (reproduced below as table1). Among duplicate rows (same var2), I want to keep only the row with the highest value of var1, as shown in Table 2 (the expected result). The situation is similar to the one described in an earlier question on this forum, "Remove duplicates based on 1 column, keep last entry". Selecting the rows with a simple for loop works, but it takes a long time to run. Is there a faster, more elegant way of handling this in R?

table1 <- structure(list(var1 = 1001:1009,
                         var2 = c("aaa", "bbb", "ccc", "aaa", "ddd", "bbb", "aaa", "eee", "ddd"),
                         var3 = c(95L, 100L, 90L, 95L, 85L, 100L, 95L, 45L, 85L),
                         var4 = c("mg", "kg", "pg", "mg", "mg", "kg", "mg", "mg", "mg")),
                    .Names = c("var1", "var2", "var3", "var4"),
                    class = c("tbl_df", "tbl", "data.frame"),
                    row.names = c(NA, -9L),
                    spec = structure(list(cols = structure(list(
                        var1 = structure(list(), class = c("collector_integer", "collector")),
                        var2 = structure(list(), class = c("collector_character", "collector")),
                        var3 = structure(list(), class = c("collector_integer", "collector")),
                        var4 = structure(list(), class = c("collector_character", "collector"))),
                        .Names = c("var1", "var2", "var3", "var4")),
                        default = structure(list(), class = c("collector_guess", "collector"))),
                        .Names = c("cols", "default"), class = "col_spec"))


We can use slice() after grouping by 'var2':

library(dplyr)

table1 %>%
    group_by(var2) %>%
    slice(which.max(var1)) %>%
    arrange(var1)
#    var1  var2  var3  var4
#   <int> <chr> <int> <chr>
# 1  1003   ccc    90    pg
# 2  1006   bbb   100    kg
# 3  1007   aaa    95    mg
# 4  1008   eee    45    mg
# 5  1009   ddd    85    mg
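For readers without dplyr, the same group-wise maximum can be sketched in base R with ave(), which computes max(var1) within each var2 group and returns it aligned to the original rows. This is a minimal sketch assuming var1 is unique within each group: with ties it would keep every tied row, whereas slice(which.max(var1)) keeps only the first. The data frame re-creates the columns of table1 as a plain data.frame.

```r
# Re-create table1 as a plain data.frame (same values as the dput above)
table1 <- data.frame(
  var1 = 1001:1009,
  var2 = c("aaa", "bbb", "ccc", "aaa", "ddd", "bbb", "aaa", "eee", "ddd"),
  var3 = c(95L, 100L, 90L, 95L, 85L, 100L, 95L, 45L, 85L),
  var4 = c("mg", "kg", "pg", "mg", "mg", "kg", "mg", "mg", "mg"),
  stringsAsFactors = FALSE
)

# ave() returns, for every row, the maximum var1 of that row's var2 group
group_max <- ave(table1$var1, table1$var2, FUN = max)

# Keep only the rows whose var1 equals their group's maximum
result <- table1[table1$var1 == group_max, ]

# Same final ordering as arrange(var1)
result <- result[order(result$var1), ]
print(result)
```

Because ave() works on whole vectors, there is no explicit loop over rows, which is what makes this (and the dplyr version) much faster than a for loop on thousands of rows.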

Or arrange, then keep only the non-duplicated rows:

table1 %>%
    arrange(var2, -var1) %>%
    filter(!duplicated(var2)) %>%
    arrange(var1)
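The arrange/filter answer translates directly to base R: order() with a negated var1 sorts each var2 group from highest to lowest var1, so duplicated() flags every row after the first (highest) one. A minimal sketch, assuming var1 is numeric (unary minus fails on character columns) and re-creating only the two columns the logic needs:

```r
# Minimal stand-in for table1: only the key (var2) and the value (var1) matter here
table1 <- data.frame(
  var1 = 1001:1009,
  var2 = c("aaa", "bbb", "ccc", "aaa", "ddd", "bbb", "aaa", "eee", "ddd"),
  stringsAsFactors = FALSE
)

# Sort by var2, then by var1 descending (equivalent to arrange(var2, -var1))
sorted <- table1[order(table1$var2, -table1$var1), ]

# The first row of each var2 group now holds its largest var1; drop the rest
dedup <- sorted[!duplicated(sorted$var2), ]

# Restore ascending var1 order (equivalent to the final arrange(var1))
result <- dedup[order(dedup$var1), ]
print(result)
```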

Or with data.table:

library(data.table)

setDT(table1)[order(var2, -var1)][!duplicated(var2)][order(var1)]

Note: this can be done in one step with duplicated(..., fromLast = TRUE), but here I'm not sure whether the values in the original dataset are ordered. So a compact method is not guaranteed to work in every case.
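To make the caveat concrete: duplicated(..., fromLast = TRUE) keeps the last occurrence by row position, which is the row with the highest var1 only if the rows are already sorted by var1. The sketch below uses a small made-up unsorted example (not data from the question) and a hypothetical helper, keep_max_per_group(), that sorts first so the one-step trick is safe regardless of the input order.

```r
# Made-up example: three "aaa" rows stored out of var1 order
df <- data.frame(var1 = c(1007L, 1001L, 1004L),
                 var2 = c("aaa", "aaa", "aaa"),
                 stringsAsFactors = FALSE)

# Naive one-step version: keeps the last row by position (var1 = 1004), not the max
naive <- df[!duplicated(df$var2, fromLast = TRUE), ]

# Hypothetical helper: sort by the value column, then keep the last row per key
keep_max_per_group <- function(data, key, value) {
  sorted <- data[order(data[[value]]), , drop = FALSE]
  sorted[!duplicated(sorted[[key]], fromLast = TRUE), , drop = FALSE]
}

safe <- keep_max_per_group(df, key = "var2", value = "var1")
print(naive$var1)  # 1004
print(safe$var1)   # 1007
```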

We can also use more compact code:

table1[c(3, 6:9),] 
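The hard-coded indices c(3, 6:9) are exactly what the fromLast = TRUE trick returns for this sample, because table1 happens to be sorted by var1 already (1001:1009). A short sketch of computing them instead of typing them, valid only under that sorted-input assumption:

```r
table1 <- data.frame(
  var1 = 1001:1009,
  var2 = c("aaa", "bbb", "ccc", "aaa", "ddd", "bbb", "aaa", "eee", "ddd"),
  stringsAsFactors = FALSE
)

# Positions of the last occurrence of each var2 value; these are the
# max-var1 rows only because table1 is already in ascending var1 order
idx <- which(!duplicated(table1$var2, fromLast = TRUE))
print(idx)

result <- table1[idx, ]
```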

which returns exactly the expected rows :-)
