python - Undersampling vs class_weight in scikit-learn Random Forests
I am applying scikit-learn's random forests to an extremely unbalanced dataset (class ratio of about 1:10,000). I can use the class_weight='balanced' parameter, and I have read that it is equivalent to undersampling.
However, this method seems to apply weights to the samples and does not change the actual number of samples.
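For reference, a minimal sketch of what I am doing now (the synthetic data from make_classification is just a stand-in for my real dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in for my data: roughly a 1:10,000 class ratio.
X, y = make_classification(
    n_samples=200_000,
    weights=[0.9999, 0.0001],  # extreme imbalance
    random_state=0,
)

# 'balanced' reweights samples inversely to class frequency,
# but the number of samples each tree sees is unchanged.
clf = RandomForestClassifier(class_weight="balanced", random_state=0)
clf.fit(X, y)
```

If I understand the docs correctly, there is also class_weight='balanced_subsample', which recomputes the weights on each tree's bootstrap sample rather than on the whole training set, but it still does not change how many samples each tree sees.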
Because each tree of a random forest is built on a randomly drawn bootstrap sample of the training set, I am afraid the minority class will not be representative enough (or not represented at all) in each subsample. Is that true? That would lead to biased trees.
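To make the worry concrete, a quick back-of-the-envelope check (assuming the default bootstrap sample size, equal to the training set size; the numbers are my own illustration):

```python
n_train = 200_000      # hypothetical training set size
minority_frac = 1e-4   # 1:10,000 ratio

# Probability that a single bootstrap sample of size n_train
# contains no minority point at all: (1 - p)^n ≈ exp(-n * p).
p_miss = (1 - minority_frac) ** n_train
print(f"Expected minority points per bootstrap: {n_train * minority_frac:.0f}")
print(f"P(bootstrap sample has zero minority points) = {p_miss:.2e}")
```

Even when that probability is small, each tree still only sees a handful of minority points, which is the part I am worried about.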
Thus, my question is: does the class_weight="balanced" parameter allow building reasonably unbiased random forest models on extremely unbalanced datasets, or should I find a way to undersample the majority class for each tree or when building the training set?
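If per-tree undersampling is the better route, I believe the imbalanced-learn package provides it out of the box; a sketch, in case that is the recommended direction:

```python
# pip install imbalanced-learn
from imblearn.ensemble import BalancedRandomForestClassifier

# Each tree is fit on a bootstrap sample in which the majority
# class is randomly undersampled to match the minority class.
clf = BalancedRandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)  # X, y as above
```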
I think I could split the majority class into chunks of roughly 10,000 samples and train the same model on each chunk plus the same points of the minority class (see the sketch below).
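Roughly what I have in mind, as a sketch (the chunk count of 10 and the soft-voting average at the end are my own assumptions for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

maj_idx = np.where(y == 0)[0]  # majority class indices
min_idx = np.where(y == 1)[0]  # minority class indices
rng = np.random.default_rng(0)
rng.shuffle(maj_idx)

models = []
# Split the majority class into chunks; pair each chunk with the
# full minority class and train one forest per chunk.
for chunk in np.array_split(maj_idx, 10):  # 10 chunks for illustration
    idx = np.concatenate([chunk, min_idx])
    m = RandomForestClassifier(random_state=0)
    m.fit(X[idx], y[idx])
    models.append(m)

# Average the predicted minority-class probabilities over all
# sub-models (soft voting).
proba = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
```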