python - How do I group colours (Blue, Green, Purple, Red) from a CSV file when the syntax (i.e. PURPLE or PURPAL) is wrong?
How can I group colours (blue, green, purple, red) from a CSV file (50,000 rows, example below) using Python when the syntax (i.e. case or spelling, such as purple or purpal) is wrong in several cases? A sample of the data:
blue 5642
purpal 5640
red 5610
blue 5583
red 5541
green 5523
purple 5503
green 5491
red 5467
......
You are going to need to clean the data. How you do that is unique to whatever situation your data is in. If you are trying to identify misspelled color names, perhaps filter the dataframe to show the values that are not blue, green, purple, or red.
You can do the following to identify the misfits, then figure out how to fix them.
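# Normalize case so 'PURPLE' and 'purple' compare equal
df.color = df.color.str.lower()
colors = ['blue', 'red', 'purple', 'green']
# Keep only the values that are not one of the expected colors
misspellings = df.color[~df.color.isin(colors)].values
print(misspellings)
# Output: ['purpal']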
From there you can individually fix each entry or write something to intelligently fix them. Once you've done that, you can group as normal. To fix the entry or entries 'purpal', do something like:
df.loc[df.color == 'purpal', 'color'] = 'purple'
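If there are too many distinct misspellings to fix one by one, one possible way to "intelligently fix them" is fuzzy matching against the list of known colors. The following is only a sketch under assumptions: the sample rows, the 'value' column name, and the helper closest_color are hypothetical, and difflib.get_close_matches from the standard library is just one option for the matching. The cutoff of 0.6 is difflib's default; a badly misspelled value could still be left unchanged or matched to the wrong color, so it is worth reviewing the result before grouping.

```python
import difflib

import pandas as pd

# Hypothetical sample standing in for the CSV; the 'value' column name is an assumption.
df = pd.DataFrame({
    "color": ["Blue", "purpal", "RED", "blue", "red", "green", "Purple", "grean"],
    "value": [5642, 5640, 5610, 5583, 5541, 5523, 5503, 5491],
})

colors = ["blue", "red", "purple", "green"]


def closest_color(name, cutoff=0.6):
    """Return the closest known color, or the cleaned input if nothing is close enough."""
    name = str(name).strip().lower()
    matches = difflib.get_close_matches(name, colors, n=1, cutoff=cutoff)
    return matches[0] if matches else name


# Replace every entry with its closest known color, then group as normal.
df["color"] = df["color"].map(closest_color)
totals = df.groupby("color")["value"].sum()
print(totals)
```

Here the grouping sums the numeric column per color; swap .sum() for .size() or another aggregation if that is what "group" means for your data.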