SQL vs Python approach to checking duplicates
Hello, I have two files like this (each with millions of lines):

(1st) aaa bbb ccc
(2nd) aaa ccc ddd

Which approach would be faster if I want to check which lines in the first file are not in the second file? Should I put the data into a database table and run a query, or should I let Python handle this?

This is the output I want:

(3rd file) aaa bbb ccc ddd

Thank you all!
In Python you can use a set and perform set operations on them, such as taking the union, getting the intersection, and things like that.
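For example, the set operations just mentioned look like this (a small sketch using the sample values from the question):

```python
# Illustrative set operations on the question's sample data.
first = set(['aaa', 'bbb', 'ccc'])
second = set(['aaa', 'ccc', 'ddd'])

print(sorted(first | second))   # union: all distinct lines from both
print(sorted(first & second))   # intersection: lines present in both
print(sorted(first - second))   # difference: lines in first but not in second
```

The difference operator (`-`) is the one that directly answers "which lines in the first file are not in the second file".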
I would use a set in Python for this. The following gives the result you mentioned:
first = set(['aaa', 'bbb', 'ccc'])
second = set(['aaa', 'ccc', 'ddd'])
third = first.union(second)

print(third)

As for your question about which is faster: it depends on the data. If it fits in memory, the Python-only way will be faster.
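Applied to actual files, a sketch along these lines might work. The file names here are placeholders, and this assumes both files fit in memory as sets:

```python
# Sketch: compare two line-based files using sets.
# File names and contents are illustrative placeholders.

def read_lines(path):
    # Read a file into a set of stripped lines (assumes it fits in memory).
    with open(path) as f:
        return set(line.strip() for line in f)

# Create the two example files from the question.
with open('first.txt', 'w') as f:
    f.write('aaa\nbbb\nccc\n')
with open('second.txt', 'w') as f:
    f.write('aaa\nccc\nddd\n')

first = read_lines('first.txt')
second = read_lines('second.txt')

# Lines in the first file that are missing from the second.
print(first - second)

# Write all distinct lines from both files to a third file.
with open('third.txt', 'w') as f:
    for line in sorted(first | second):
        f.write(line + '\n')
```

Note that sets discard duplicate lines and do not preserve the original line order, which matches the deduplicated output asked for in the question.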