python - How do I group colours (Blue, Green, Purple, Red) from a csv file when the syntax (e.g. PURPLE or PURPAL) is wrong?


How do I group colours (blue, green, purple, red) from a csv file (50000 rows, example below) using Python when the syntax (i.e. case or spelling, such as purple or purpal) is wrong in several cases? Counting the values currently gives:

    blue      5642
    purpal    5640
    red       5610
    blue      5583
    red       5541
    green     5523
    purple    5503
    green     5491
    red       5467
    ......

You are going to need to clean your data. How you do that is unique to whatever situation your data is in. If you are trying to identify misspelled color names, you could filter the dataframe to show whatever is not blue, green, purple, or red.

You can do the following to identify the misfits, and then figure out how to fix them.

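    # df is assumed to be a DataFrame loaded from the csv, with a 'color' column
    df['color'] = df['color'].str.lower()
    colors = ['blue', 'red', 'purple', 'green']
    # unique() lists each unexpected spelling once rather than once per row
    misspellings = df.color[~df.color.isin(colors)].unique()
    print(misspellings)

    ['purpal']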

From there you can either fix each entry individually or write something to fix them intelligently. Once you've done that, you can group as normal. You can fix the entry or entries of 'purpal' like this:

    df.loc[df.color == 'purpal', 'color'] = 'purple'
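If there turn out to be many different misspellings, fixing them one at a time gets tedious. One possible way to "intelligently fix them" is fuzzy matching against the list of valid colours. The sketch below uses `difflib.get_close_matches` from the standard library. The helper name `normalize_color`, the 0.6 similarity cutoff, and the sample values are illustrative assumptions, not part of the original answer, so check the results against your own data before trusting them.

```python
from collections import Counter
from difflib import get_close_matches

VALID_COLORS = ['blue', 'green', 'purple', 'red']

def normalize_color(raw, valid=VALID_COLORS, cutoff=0.6):
    """Map a raw colour string to the closest valid colour (hypothetical helper).

    Values with no sufficiently close match are returned lower-cased but
    otherwise unchanged, so they can still be inspected by hand.
    """
    name = raw.strip().lower()
    if name in valid:
        return name
    matches = get_close_matches(name, valid, n=1, cutoff=cutoff)
    return matches[0] if matches else name

# Made-up sample values standing in for the csv column
sample = ['blue', 'PURPLE', 'purpal', 'Red', 'green', 'purple']
cleaned = [normalize_color(c) for c in sample]

# Group the cleaned values and count each colour
counts = Counter(cleaned)
print(counts.most_common())

# With pandas, the same helper could be applied to the column and then grouped:
# df['color'] = df['color'].map(normalize_color)
# df['color'].value_counts()
```

A lower cutoff catches more misspellings but also raises the chance of mapping an unrelated value onto a valid colour, so it is worth printing whatever is still outside the valid list after cleaning.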
