SQL vs Python approach to check duplicates


Hello, I have two files like this (with millions of lines):

(1st) aaa bbb ccc  (2nd) aaa ccc ddd

Which would be faster if I want to check for lines in the first file that are not in the second file? Should I put this data into a table and run a query, or should I let Python handle it?

This is the output I want: (3rd file) aaa bbb ccc ddd

Thanks, all!

In Python you can use a set and perform set operations on them, such as taking the union, getting the intersection, and things like that.

I would use a set in Python for this. The following gives the result you mentioned:

first = set(['aaa', 'bbb', 'ccc'])
second = set(['aaa', 'ccc', 'ddd'])
third = first.union(second)
print(third)
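The question also asks for the lines in the first file that are not in the second; with the same sets, that is a set difference rather than a union. A small sketch using the sample data:

```python
first = set(['aaa', 'bbb', 'ccc'])
second = set(['aaa', 'ccc', 'ddd'])

# Difference: elements of `first` that do not appear in `second`
only_in_first = first.difference(second)
print(only_in_first)  # only 'bbb' is unique to the first file
```

The shorthand `first - second` does the same thing as `first.difference(second)`.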

As for which is faster: it depends on the data. If everything fits in memory, the Python-only way is usually faster, because loading millions of rows into a database table just to run one query adds overhead.
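For files with millions of lines, the same idea applies by first reading each file into a set. A minimal sketch, assuming one entry per line; the function name and file paths are illustrative, not from the original post:

```python
def union_files(path_a, path_b, path_out):
    """Read two line-per-entry files into sets and write their sorted union."""
    with open(path_a) as f:
        first = {line.strip() for line in f}
    with open(path_b) as f:
        second = {line.strip() for line in f}
    # Union of both sets, written one entry per line
    with open(path_out, 'w') as out:
        for entry in sorted(first | second):
            out.write(entry + '\n')
```

Membership tests and set operations on Python sets are hash-based, so building the sets is a single pass over each file and the union itself is fast.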

