SQL vs Python approach to check duplicates
Hello, I have 2 files like this (with millions of lines):

(1st) aaa bbb ccc
(2nd) aaa ccc ddd

Which would be faster if I want to check which lines in the first file are not in the second file? Should I put this data into a table and run a query, or should I let Python handle it?

This is the output I want (3rd file): aaa bbb ccc ddd

Thanks all!
In Python you can use a set and perform set operations on them, such as union, intersection, and things like that.

I would use a set in Python for this. The following gives the result you mentioned.
first = set(['aaa', 'bbb', 'ccc'])
second = set(['aaa', 'ccc', 'ddd'])
third = first.union(second)
print(third)
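Since you also asked how to find the lines in the first file that are not in the second, note that sets support a difference operation as well. A small sketch with the same sample values:

```python
first = {'aaa', 'bbb', 'ccc'}
second = {'aaa', 'ccc', 'ddd'}

# Entries present in the first file but not in the second.
only_in_first = first - second  # same as first.difference(second)
print(only_in_first)  # → {'bbb'}
```

The `-` operator and `first.difference(second)` are equivalent; both run in roughly O(len(first)) time because set membership tests are hash-based.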
As for which approach is faster, it depends on the data. If everything fits in memory, the Python-only way will likely be faster, since loading the data into a database just to run one query adds overhead.
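For the actual files, a minimal end-to-end sketch could look like the following. The filenames `first.txt`, `second.txt`, and `third.txt` are placeholders, and the sketch writes small sample files first so it is runnable as-is; in practice you would point it at your real million-line files.

```python
# Create small sample files so the sketch runs end to end
# (stand-ins for your real files).
with open('first.txt', 'w') as f:
    f.write('aaa\nbbb\nccc\n')
with open('second.txt', 'w') as f:
    f.write('aaa\nccc\nddd\n')

def read_lines(path):
    """Load each non-empty, stripped line of a file into a set."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

first = read_lines('first.txt')
second = read_lines('second.txt')

# Union of both files, written sorted to the third file.
with open('third.txt', 'w') as out:
    for entry in sorted(first | second):
        out.write(entry + '\n')
```

Each file is read once and deduplicated as it is loaded; the main cost is holding both sets in memory, which is usually fine for millions of short lines.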