Random Forest
- A random forest is an ensemble machine learning model. It makes a classification by aggregating the classifications of many decision trees.
- Random forests are used to avoid overfitting. Because the classifications of many trees are aggregated, a few overfitted trees have much less impact on the final prediction.
- Every decision tree in a random forest is created by using a different subset of data points from the training set. Those data points are chosen at random with replacement, which means a single data point can be chosen more than once. This process is known as bagging.
- When creating a tree in a random forest, a randomly selected subset of features is considered as candidates for the best splitting feature. If your dataset has n features, it is common practice to randomly select the square root of n features.
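To make both sources of randomness concrete, here is a minimal sketch using scikit-learn's RandomForestClassifier; the synthetic dataset and the parameter values are illustrative assumptions, not something from the original post.

```python
# Minimal random-forest sketch: each tree is trained on a bootstrap sample
# (rows drawn with replacement) and, at every split, only a random subset of
# features (sqrt of the total) is considered as candidates.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=16, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # number of decision trees to aggregate
    max_features="sqrt",  # consider sqrt(n_features) candidates at each split
    bootstrap=True,       # sample training rows with replacement (bagging)
    random_state=0,
)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```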
Boosting Steps:
- Draw a random subset of training samples d1 without replacement from the training set D to train a weak learner C1
- Draw a second random training subset d2 without replacement from the training set, add 50 percent of the samples that were previously misclassified by C1, and train a weak learner C2
- Find the training samples d3 in the training set D on which C1 and C2 disagree to train a third weak learner C3
- Combine all the weak learners via majority voting.
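A rough sketch of this three-learner procedure in Python, assuming a synthetic dataset, decision stumps as the weak learners, and illustrative subset sizes:

```python
# Sketch of the boosting procedure described above, using decision stumps
# as weak learners. Sample sizes and the handling of edge cases are
# simplified, illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

def stump():
    return DecisionTreeClassifier(max_depth=1, random_state=0)

# Step 1: train C1 on a random subset d1 drawn without replacement.
idx1 = rng.choice(len(X), size=300, replace=False)
c1 = stump().fit(X[idx1], y[idx1])

# Step 2: draw a second subset d2, add half of the samples C1 misclassified,
# and train C2 on the result.
idx2 = rng.choice(len(X), size=300, replace=False)
mis = np.where(c1.predict(X) != y)[0]
half_mis = rng.choice(mis, size=len(mis) // 2, replace=False)
c2 = stump().fit(X[np.concatenate([idx2, half_mis])],
                 y[np.concatenate([idx2, half_mis])])

# Step 3: train C3 on the samples where C1 and C2 disagree.
disagree = np.where(c1.predict(X) != c2.predict(X))[0]
if len(disagree) == 0:   # degenerate case in this toy sketch
    disagree = idx1
c3 = stump().fit(X[disagree], y[disagree])

# Step 4: combine the three weak learners by majority vote (labels are 0/1).
votes = np.stack([c.predict(X) for c in (c1, c2, c3)])
ensemble_pred = np.round(votes.mean(axis=0)).astype(int)
print("ensemble training accuracy:", (ensemble_pred == y).mean())
```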
Bagging:
Before understanding bagging, let's understand the concept of the bootstrap, which is nothing but choosing a random sample with replacement. As the name suggests, bagging is nothing but Bootstrap AGGregatING:
- Generate n different bootstrap training samples
- Train the algorithm on each bootstrapped sample separately
- Average the predictions at the end
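A minimal sketch of these three steps, assuming a synthetic dataset and decision trees as the base algorithm:

```python
# Bagging sketch: bootstrap n training samples, train a tree on each, then
# aggregate the predictions by majority vote. Dataset, number of bootstrap
# samples, and base learner are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

n_estimators = 25
models = []
for _ in range(n_estimators):
    # 1. Bootstrap: sample row indices with replacement.
    idx = rng.choice(len(X), size=len(X), replace=True)
    # 2. Train the algorithm on each bootstrapped sample separately.
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# 3. Aggregate the predictions at the end (majority vote for 0/1 labels).
preds = np.stack([m.predict(X) for m in models])
bagged_pred = np.round(preds.mean(axis=0)).astype(int)
print("bagged training accuracy:", (bagged_pred == y).mean())
```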
One of the key differences is how each training set is sampled: bagging samples with replacement (the bootstrap), whereas the boosting procedure above samples without replacement.
In theory, bagging is good for reducing variance (overfitting), whereas boosting helps to reduce both bias and variance, as per the Boosting vs. Bagging comparison; in practice, however, boosting (Adaptive Boosting) is known to have high variance because it can overfit.