python - How do I group colours (Blue, Green, Purple, Red) from a csv file when the syntax (i.e. PURPLE or PURPAL) is wrong?


How do I group colours (blue, green, purple, red) from a CSV file (50,000 rows, example below) using Python when the syntax (i.e. case or spelling, such as purple or purpal) is wrong in several cases?

blue      5642
purpal    5640
red       5610
blue      5583
red       5541
green     5523
purple    5503
green     5491
red       5467
......

You are going to need to clean the data. How you do that is unique to whatever situation your data is in, but if you are trying to identify misspelled color names, you could filter the dataframe to show the values that are not blue, green, purple, or red.

You can do the following to identify the misfits, then figure out how to fix them.

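# Normalize case so 'PURPLE' and 'purple' compare equal
df.color = df.color.str.lower()
colors = ['blue', 'red', 'purple', 'green']
# Distinct values that are not one of the expected colors
misspellings = df.color[~df.color.isin(colors)].unique()
print(misspellings)
['purpal']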

From there you can fix each entry individually or write something that fixes them intelligently. Once you've done that, you can group as normal. To fix the entry or entries that are 'purpal', do something like:

df.loc[df.color == 'purpal', 'color'] = 'purple'  
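
If there are too many distinct misspellings to fix one by one, one possible way to "fix them intelligently" is fuzzy matching against the list of expected colors. The sketch below is only an illustration under assumptions: it uses the standard library's difflib.get_close_matches, the sample dataframe and the helper closest_color are hypothetical stand-ins for the real CSV, and the 0.6 cutoff is a guess that would need checking against the actual misfits.

```python
import difflib

import pandas as pd

# Hypothetical sample standing in for the 50,000-row CSV; the column
# name 'color' follows the snippets above.
df = pd.DataFrame(
    {"color": ["Blue", "PURPLE", "purpal", "red", "GREEN", "gren", "blue"]}
)

colors = ["blue", "red", "purple", "green"]


def closest_color(value, cutoff=0.6):
    """Return the nearest expected color, or the value unchanged if none is close."""
    if not isinstance(value, str):
        return value  # leave missing values (NaN) alone
    matches = difflib.get_close_matches(value, colors, n=1, cutoff=cutoff)
    return matches[0] if matches else value


# Normalize whitespace and case first, then snap each value to its nearest color.
df["color"] = df["color"].str.strip().str.lower().map(closest_color)

# With the names cleaned, grouping works as normal.
counts = df.groupby("color").size()
print(counts)
```

In this sample 'purpal' and 'gren' are folded into 'purple' and 'green'. Values that match nothing above the cutoff are left unchanged, so rerunning the isin check afterwards still shows whatever needs a manual fix.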
