Keeping the last entry but removing other duplicate row(s) in a data frame using R
I have a large table containing thousands of entries queried from a database, with a structure similar to Table 1 below. For each set of duplicate rows, I want to keep the row that has the highest value of var1, as shown in Table 2. The situation is similar to the one described in an earlier query in this forum, "Remove duplicates based on one column, keep last entry". Selecting the rows with a simple for loop works, but it takes a long time to run. Is there a faster, more elegant way of handling this in R?
table1 <- structure(list(var1 = 1001:1009,
    var2 = c("aaa", "bbb", "ccc", "aaa", "ddd", "bbb", "aaa", "eee", "ddd"),
    var3 = c(95L, 100L, 90L, 95L, 85L, 100L, 95L, 45L, 85L),
    var4 = c("mg", "kg", "pg", "mg", "mg", "kg", "mg", "mg", "mg")),
    .Names = c("var1", "var2", "var3", "var4"),
    class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -9L),
    spec = structure(list(cols = structure(list(
        var1 = structure(list(), class = c("collector_integer", "collector")),
        var2 = structure(list(), class = c("collector_character", "collector")),
        var3 = structure(list(), class = c("collector_integer", "collector")),
        var4 = structure(list(), class = c("collector_character", "collector"))),
        .Names = c("var1", "var2", "var3", "var4")),
        default = structure(list(), class = c("collector_guess", "collector"))),
        .Names = c("cols", "default"), class = "col_spec"))
We can use slice after grouping by 'var2':
library(dplyr)
table1 %>%
    group_by(var2) %>%
    slice(which.max(var1)) %>%
    arrange(var1)
#    var1  var2  var3  var4
#   <int> <chr> <int> <chr>
# 1  1003   ccc    90    pg
# 2  1006   bbb   100    kg
# 3  1007   aaa    95    mg
# 4  1008   eee    45    mg
# 5  1009   ddd    85    mg
Or arrange by 'var2' and descending 'var1', then filter out the duplicates:
table1 %>% arrange(var2, -var1) %>% filter(!duplicated(var2)) %>% arrange(var1)
Or with data.table:
library(data.table)
setDT(table1)[order(var2, -var1)][!duplicated(var2)][order(var1)]
Note: this can be done in one step using fromLast = TRUE in duplicated, but here I am not sure whether the values are ordered in the original dataset. So a more compact method does not mean that it always works.
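For reference, the one-step variant mentioned in the note could look roughly like the sketch below. It rebuilds the example data as a plain data.frame (an assumption made only to keep the snippet self-contained) and is valid only when var1 is already in ascending order, so that the last occurrence of each var2 is also the one with the highest var1. The name `result` is just illustrative.

```r
# Example data rebuilt as a plain data.frame (same values as table1 above)
table1 <- data.frame(
  var1 = 1001:1009,
  var2 = c("aaa", "bbb", "ccc", "aaa", "ddd", "bbb", "aaa", "eee", "ddd"),
  var3 = c(95L, 100L, 90L, 95L, 85L, 100L, 95L, 45L, 85L),
  var4 = c("mg", "kg", "pg", "mg", "mg", "kg", "mg", "mg", "mg"),
  stringsAsFactors = FALSE
)

# Assumption: rows are already sorted by var1; stop early if they are not
stopifnot(!is.unsorted(table1$var1))

# fromLast = TRUE marks every occurrence except the last one as a duplicate,
# so negating it keeps only the last row for each var2
result <- table1[!duplicated(table1$var2, fromLast = TRUE), ]
result
```

If the rows are not sorted by var1, the ordering-based answers above are the safer choice, since they do not depend on the input order.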
We can use this compact code:
table1[c(3, 6:9),]
as it works the way expected :-)