python - Undersampling vs class_weight in scikit-learn Random Forests
I am applying scikit-learn's random forests to an extremely unbalanced dataset (class ratio of about 1:10,000). I can use the class_weight='balanced' parameter, and I have read that it is equivalent to undersampling.
However, this method seems to apply weights to the samples and does not change the actual number of samples.
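For reference, a minimal sketch of what I am doing now (the synthetic data from make_classification is just a stand-in for my real dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in for my data: roughly a 1:10,000 class ratio.
X, y = make_classification(
    n_samples=200_000,
    weights=[0.9999, 0.0001],  # extreme imbalance
    random_state=0,
)

# 'balanced' reweights samples inversely to class frequency,
# but the number of samples each tree sees is unchanged.
clf = RandomForestClassifier(class_weight="balanced", random_state=0)
clf.fit(X, y)
```

If I understand the docs correctly, there is also class_weight='balanced_subsample', which recomputes the weights on each tree's bootstrap sample rather than on the whole training set, but it still does not change how many samples each tree sees.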
Because each tree of a random forest is built on a randomly drawn bootstrap sample of the training set, I am afraid the minority class will not be representative enough (or not represented at all) in each subsample. Is that true? That would lead to biased trees.
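To make the worry concrete, a quick back-of-the-envelope check (assuming the default bootstrap sample size, equal to the training set size; the numbers are my own illustration):

```python
n_train = 200_000      # hypothetical training set size
minority_frac = 1e-4   # 1:10,000 ratio

# Probability that a single bootstrap sample of size n_train
# contains no minority point at all: (1 - p)^n ≈ exp(-n * p).
p_miss = (1 - minority_frac) ** n_train
print(f"Expected minority points per bootstrap: {n_train * minority_frac:.0f}")
print(f"P(bootstrap sample has zero minority points) = {p_miss:.2e}")
```

Even when that probability is small, each tree still only sees a handful of minority points, which is the part I am worried about.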
Thus, my question is: does the class_weight="balanced" parameter allow building reasonably unbiased random forest models on extremely unbalanced datasets, or should I find a way to undersample the majority class for each tree or when building the training set?
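If per-tree undersampling is the better route, I believe the imbalanced-learn package provides it out of the box; a sketch, in case that is the recommended direction:

```python
# pip install imbalanced-learn
from imblearn.ensemble import BalancedRandomForestClassifier

# Each tree is fit on a bootstrap sample in which the majority
# class is randomly undersampled to match the minority class.
clf = BalancedRandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)  # X, y as above
```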
I think I could split the majority class into chunks of roughly 10,000 samples and train the same model on each chunk plus the same points of the minority class (see the sketch below).
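Roughly what I have in mind, as a sketch (the chunk count of 10 and the soft-voting average at the end are my own assumptions for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

maj_idx = np.where(y == 0)[0]  # majority class indices
min_idx = np.where(y == 1)[0]  # minority class indices
rng = np.random.default_rng(0)
rng.shuffle(maj_idx)

models = []
# Split the majority class into chunks; pair each chunk with the
# full minority class and train one forest per chunk.
for chunk in np.array_split(maj_idx, 10):  # 10 chunks for illustration
    idx = np.concatenate([chunk, min_idx])
    m = RandomForestClassifier(random_state=0)
    m.fit(X[idx], y[idx])
    models.append(m)

# Average the predicted minority-class probabilities over all
# sub-models (soft voting).
proba = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
```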