SQL vs Python approach to check duplicates
Hello, I have 2 files like this (with millions of lines):

(1st) aaa bbb ccc
(2nd) aaa ccc ddd

Which would be faster if I want to check which lines in the first file are not in the second file? Should I put this data into a table and run a query, or should I let Python handle it?

This is the output I want (3rd file): aaa bbb ccc ddd

Thanks all!
In Python you can use a set and perform set operations on them, such as union, intersection, and things like that.

I would use a set in Python for this. The following gives the result you mentioned.
first = set(['aaa', 'bbb', 'ccc'])
second = set(['aaa', 'ccc', 'ddd'])
third = first.union(second)
print(third)
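Since you also asked how to find the lines in the first file that are not in the second, note that sets support a difference operation as well. A small sketch with the same sample values:

```python
first = {'aaa', 'bbb', 'ccc'}
second = {'aaa', 'ccc', 'ddd'}

# Entries present in the first file but not in the second.
only_in_first = first - second  # same as first.difference(second)
print(only_in_first)  # → {'bbb'}
```

The `-` operator and `first.difference(second)` are equivalent; both run in roughly O(len(first)) time because set membership tests are hash-based.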
As for which approach is faster, it depends on the data. If everything fits in memory, the Python-only way will likely be faster, since loading the data into a database just to run one query adds overhead.
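For the actual files, a minimal end-to-end sketch could look like the following. The filenames `first.txt`, `second.txt`, and `third.txt` are placeholders, and the sketch writes small sample files first so it is runnable as-is; in practice you would point it at your real million-line files.

```python
# Create small sample files so the sketch runs end to end
# (stand-ins for your real files).
with open('first.txt', 'w') as f:
    f.write('aaa\nbbb\nccc\n')
with open('second.txt', 'w') as f:
    f.write('aaa\nccc\nddd\n')

def read_lines(path):
    """Load each non-empty, stripped line of a file into a set."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

first = read_lines('first.txt')
second = read_lines('second.txt')

# Union of both files, written sorted to the third file.
with open('third.txt', 'w') as out:
    for entry in sorted(first | second):
        out.write(entry + '\n')
```

Each file is read once and deduplicated as it is loaded; the main cost is holding both sets in memory, which is usually fine for millions of short lines.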