python - Undersampling vs class_weight in scikit-learn Random Forests


I am applying scikit-learn's random forests to an extremely unbalanced dataset (class ratio of about 1:10,000). I can use the class_weight='balanced' parameter, and I have read that it is equivalent to undersampling.

However, this method seems to apply weights to the samples and does not change the actual number of samples.
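For a concrete sense of what 'balanced' weighting does, here is a minimal sketch using scikit-learn's compute_class_weight utility on made-up labels at this imbalance:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy labels with a 1:10,000 imbalance: one positive per 10,000 negatives.
y = np.array([0] * 10_000 + [1])

weights = compute_class_weight(class_weight="balanced",
                               classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights)))
# 'balanced' sets each class weight to n_samples / (n_classes * n_class_samples),
# so the single minority sample is weighted ~10,000x more than a majority sample,
# but every sample still enters the training set exactly as before.
```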

Because each tree of the random forest is built on a randomly drawn bootstrap sample of the training set, I am afraid that the minority class will not be representative enough (or not represented at all) in each subsample. Is this true? It would lead to biased trees.
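This fear can be quantified. Assuming a standard bootstrap (a sample the same size as the training set, drawn with replacement), the probability that a given tree sees no minority samples at all is (1 - p)^n, which is roughly exp(-n_minority). A back-of-the-envelope sketch, with placeholder sample counts:

```python
# Hypothetical counts at a 1:10,000 ratio -- substitute your own.
n_minority = 10
n_majority = 10_000 * n_minority
n_total = n_minority + n_majority

p_minority = n_minority / n_total               # chance one bootstrap draw is a minority sample
p_tree_sees_none = (1 - p_minority) ** n_total  # standard bootstrap: n_total draws with replacement

print(f"P(a given tree sees zero minority samples) = {p_tree_sees_none:.2e}")
# Approximately exp(-n_minority): negligible unless the absolute number
# of minority samples is very small, regardless of the ratio itself.
```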

Thus, my question is: does the class_weight="balanced" parameter allow me to build reasonably unbiased random forest models on extremely unbalanced datasets, or should I find a way to undersample the majority class at each tree or when building the training set?
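As a point of comparison (not a definitive answer), RandomForestClassifier also accepts class_weight='balanced_subsample', which recomputes the weights per tree from each bootstrap sample rather than once from the full training set, and is therefore closer in spirit to rebalancing every tree individually. A minimal sketch on synthetic data (the imbalance here is illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a heavily unbalanced problem (~0.1% positives).
X, y = make_classification(n_samples=50_000, weights=[0.999], flip_y=0,
                           random_state=0)

# 'balanced' computes class weights once from the full training set;
# 'balanced_subsample' recomputes them per tree from its bootstrap sample.
clf = RandomForestClassifier(class_weight="balanced_subsample", random_state=0)
clf.fit(X, y)
```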

I think I could split the majority class into chunks of roughly 10,000 samples and train the same model on each chunk, together with the same set of minority-class points, as sketched below.
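A minimal sketch of that chunking idea (the function names are mine, not from any library): train one forest per majority-class chunk plus all minority points, then average the predicted probabilities across the per-chunk models.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def chunked_ensemble(X, y, minority_label=1, chunk_size=10_000, random_state=0):
    """Split the majority class into chunks, train one forest per chunk
    (plus all minority points), and return the list of fitted models."""
    rng = np.random.default_rng(random_state)
    min_idx = np.flatnonzero(y == minority_label)
    maj_idx = rng.permutation(np.flatnonzero(y != minority_label))

    models = []
    for start in range(0, len(maj_idx), chunk_size):
        idx = np.concatenate([maj_idx[start:start + chunk_size], min_idx])
        clf = RandomForestClassifier(random_state=random_state)
        clf.fit(X[idx], y[idx])
        models.append(clf)
    return models

def predict_proba_mean(models, X):
    # Average the probability estimates of the per-chunk forests.
    return np.mean([m.predict_proba(X) for m in models], axis=0)
```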

