Bias & Variance
Graphical interpretation of Bias - Variance.
Let’s understand this with a bull’s-eye diagram. Assume the center of the target (red) represents a model that perfectly predicts the correct values. As we move away from the bull’s-eye, our predictions get worse. Imagine we can repeat the entire model-building process to get a number of separate hits on the target. Each hit represents an individual realization of the model, given the chance variability in the training data we gather. Sometimes we get a good distribution of training data, so we predict well and land close to the bull’s-eye; other times the training data is full of outliers or non-standard values, resulting in poorer predictions. These different realizations produce a scatter of hits on the target.
Let’s look at the definitions of Bias and Variance:
Bias
- Bias is how far off our predictions are, on average, from the true values.
Parametric algorithms generally have high bias, making them fast to
learn and easy to understand, but less flexible. In turn, they
have lower predictive performance on complex problems that fail to
meet the simplifying assumptions of the algorithm's bias.
- Low Bias: Suggests fewer assumptions about the form of the target function.
- High Bias: Suggests more assumptions about the form of the target function.
Examples of low-bias machine learning algorithms include: Decision Trees, k-Nearest Neighbors and Support Vector Machines.
Examples of high-bias machine learning algorithms include: Linear Regression, Linear Discriminant Analysis and Logistic Regression.
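To make the bias idea concrete, here is a minimal sketch (using numpy polynomial fitting, not any specific library from this post) comparing a high-bias straight-line model against a more flexible low-bias polynomial on data whose true shape is nonlinear. The function, noise level, and degrees are illustrative choices, not prescriptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Nonlinear ground truth; a straight line cannot capture its shape.
X = np.linspace(0, 1, 200)
truth = np.sin(2 * np.pi * X)
y = truth + rng.normal(0, 0.1, X.shape)

# High-bias model: degree-1 polynomial (ordinary linear regression).
linear_fit = np.polyval(np.polyfit(X, y, deg=1), X)

# Lower-bias model: degree-9 polynomial (fewer assumptions, more flexible).
flex_fit = np.polyval(np.polyfit(X, y, deg=9), X)

# Error against the true function shows the linear model's bias.
mse_linear = np.mean((linear_fit - truth) ** 2)
mse_flex = np.mean((flex_fit - truth) ** 2)
print(f"linear (high bias) error vs truth:  {mse_linear:.4f}")
print(f"degree-9 (low bias) error vs truth: {mse_flex:.4f}")
```

No matter how much data we add, the straight line stays far from the sine curve; that systematic gap is the bias.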
Variance
- Change in predictions across different data sets. Again, imagine you
can repeat the entire model-building process multiple times. The
variance is how much the predictions for a given point vary between
different realizations of the model. In other words,
variance is the amount that the estimate of the target function will change if different training data is used.
The target function is estimated from the training data by a machine learning algorithm, so we should expect the algorithm to have some variance. Ideally, the estimate should not change too much from one training dataset to the next, meaning that the algorithm is good at picking out the hidden underlying mapping between the input and output variables.
Machine learning algorithms that have a high variance are strongly influenced by the specifics of the training data. This means that the specifics of the training data influence the number and types of parameters used to characterize the mapping function.
- Low Variance: Suggests small changes to the estimate of the target function with changes to the training dataset.
- High Variance: Suggests large changes to the estimate of the target function with changes to the training dataset.
Generally, nonparametric machine learning algorithms that have a lot of flexibility have high variance. For example, decision trees have high variance, which is even higher if the trees are not pruned before use.
Examples of low-variance machine learning algorithms include: Linear Regression, Linear Discriminant Analysis and Logistic Regression.
Examples of high-variance machine learning algorithms include: Decision Trees, k-Nearest Neighbors and Support Vector Machines.
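The "repeat the entire model-building process" thought experiment can be simulated directly. The sketch below (an illustrative numpy setup, not from the original post) refits a low-flexibility and a high-flexibility polynomial on many fresh training draws and measures how much their prediction at one fixed point varies across realizations.

```python
import numpy as np

rng = np.random.default_rng(1)

def predict_at(x0, degree, n_fits=200, n_train=30):
    """Refit a polynomial on fresh training draws and collect its
    prediction at x0 across the different realizations."""
    preds = []
    for _ in range(n_fits):
        X = rng.uniform(0, 1, n_train)
        y = np.sin(2 * np.pi * X) + rng.normal(0, 0.3, n_train)
        coeffs = np.polyfit(X, y, deg=degree)
        preds.append(np.polyval(coeffs, x0))
    return np.array(preds)

# Spread of the estimate at x0 = 0.5 across training sets.
low_var = predict_at(0.5, degree=1).var()    # rigid model: low variance
high_var = predict_at(0.5, degree=9).var()   # flexible model: high variance
print(f"prediction variance, degree 1: {low_var:.4f}")
print(f"prediction variance, degree 9: {high_var:.4f}")
```

The flexible model chases the noise in each particular training draw, so its prediction at the same point jumps around far more between realizations.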
Understanding Over- and Under-Fitting
At its root, dealing with bias and variance is really about dealing with over- and under-fitting. Bias is reduced and variance is increased as model complexity grows. As more and more parameters are added to a model, its complexity rises and variance becomes our primary concern while bias steadily falls. For example, adding more polynomial terms to a linear regression increases the resulting model's complexity. In other words, bias has a negative first-order derivative with respect to model complexity, while variance has a positive slope.
Understanding bias and variance is critical for understanding the behavior of prediction models, but in general what you really care about is overall error, not the specific decomposition. The sweet spot for any model is the level of complexity at which the increase in bias is equivalent to the reduction in variance. Mathematically:
dBias/dComplexity = −dVariance/dComplexity
If our model complexity exceeds this sweet spot, we are in effect over-fitting the model; if our complexity falls short of the sweet spot, we are under-fitting it. In practice, there is no analytical way to find this location. Instead, we must use an accurate measure of prediction error, explore differing levels of model complexity, and then choose the complexity level that minimizes the overall error.
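That search can be sketched in a few lines: fit models of increasing complexity and score each on held-out data, then keep the complexity that minimizes validation error. The polynomial-degree sweep below is one hypothetical way to do this, with arbitrary data sizes and a degree range chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n):
    # Noisy samples of an underlying nonlinear function.
    X = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * X) + rng.normal(0, 0.2, n)
    return X, y

X_train, y_train = make_data(40)
X_val, y_val = make_data(200)   # held-out data for measuring prediction error

# Explore increasing complexity and score each model on the held-out set.
val_errors = {}
for degree in range(1, 12):
    coeffs = np.polyfit(X_train, y_train, deg=degree)
    val_errors[degree] = np.mean((np.polyval(coeffs, X_val) - y_val) ** 2)

best = min(val_errors, key=val_errors.get)
print(f"degree with lowest validation error: {best}")
```

Low degrees under-fit (high bias) and very high degrees over-fit (high variance); the validation error is lowest somewhere in between, which is the sweet spot the text describes.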