Data Science

Posts

Showing posts from March, 2020

P-Value

March 31, 2020

In a hypothesis test, the p-value is the probability of observing a result at least as extreme as the one actually observed, assuming the null hypothesis is true. It is not the probability that the null hypothesis itself is true. It's good practice to set the p-value threshold before conducting the study. If the p-value is less than the threshold, we reject the null hypothesis. If the p-value is greater than the threshold, we fail to reject the null hypothesis. The most common p-value threshold is 0.05, or 5%.
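
As a rough illustration of the decision rule, here is a minimal sketch that computes an exact two-sided p-value for a coin-flip experiment and compares it with a threshold fixed in advance. The coin example, the counts (61 heads in 100 flips), and the function names `binomial_two_sided_p` and `decide` are assumptions made for illustration. They are not from the original post.

```python
from math import comb

ALPHA = 0.05  # threshold chosen before looking at the data


def binomial_two_sided_p(heads, flips, p_null=0.5):
    """P-value for a fair-coin null hypothesis (illustrative helper).

    Sums the null probability of every outcome whose head count is at
    least as far from the expected count as the observed one.
    """
    expected = flips * p_null
    observed_gap = abs(heads - expected)
    total = 0.0
    for k in range(flips + 1):
        if abs(k - expected) >= observed_gap:
            total += comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
    return total


def decide(p_value, alpha=ALPHA):
    """Apply the threshold rule described above."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# Hypothetical data: 61 heads in 100 flips of a coin assumed fair under the null.
p_value = binomial_two_sided_p(heads=61, flips=100)
print(f"p-value = {p_value:.4f}: {decide(p_value)}")
```

The threshold is declared before the data is touched, which mirrors the advice to fix it before conducting the study.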

Hypothesis

March 31, 2020

A hypothesis is a pattern or rule about a process in the world that can be tested. We use hypothesis testing to determine whether a change we made had a meaningful impact. You can use hypothesis testing to help you determine: if a new banner ad on a website caused a meaningful drop in user engagement, if raising the price of a product caused a meaningful drop in sales, if a new weight loss pill helped people lose more weight. Observing a decrease in user engagement or sales after instituting a change doesn't automatically imply that the change was the cause. Hypothesis testing allows us to calculate how likely a difference at least as large as the one we observed would be if random chance alone were responsible. Every process has some inherent amount of randomness that we can't measure, and understanding the role of chance helps us reach a conclusion that's more likely to be correct. We first set up a null hypothesis that describes the status quo. We then state an a...
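
To make the role of chance concrete, here is a minimal sketch of a permutation test, one of several ways to run such a test and not necessarily the method the post goes on to describe. The daily sales figures and the name `permutation_p_value` are hypothetical. The idea is to shuffle the group labels many times and count how often chance alone produces a difference in means at least as large as the observed one.

```python
import random


def average(values):
    return sum(values) / len(values)


def permutation_p_value(control, treatment, n_shuffles=10_000, seed=0):
    """Estimate how often random relabeling matches the observed gap."""
    rng = random.Random(seed)
    observed = abs(average(treatment) - average(control))
    pooled = list(control) + list(treatment)
    n_control = len(control)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # under the null, group labels are interchangeable
        gap = abs(average(pooled[n_control:]) - average(pooled[:n_control]))
        if gap >= observed:
            hits += 1
    # The +1 terms count the observed labeling itself, so the estimate is never 0.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical daily unit sales before and after a price increase.
sales_before = [52, 48, 55, 50, 47, 53, 49, 51]
sales_after = [44, 46, 41, 45, 43, 47, 42, 40]

p_value = permutation_p_value(sales_before, sales_after)
print(f"estimated p-value: {p_value:.4f}")
```

A small estimate suggests that a drop this large would rarely arise from reshuffling alone. It does not by itself prove that the price change caused the drop.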

Bias & Variance

March 19, 2020

What is the concise meaning and interpretation of bias and variance in machine learning and statistics? Graphical interpretation of Bias - Variance. Let’s understand this image. This is a bull’s-eye diagram. Assume that the center of the target (colored red) is a model that perfectly predicts the correct values. As we move away from the bull’s-eye, our predictions get worse. Imagine we can repeat our entire model-building process to get a number of separate hits on the target. Each hit represents an individual realization of our model, given the chance variability in the training data we gather. Sometimes we will get a good distribution of training data, so we predict very well and land close to the bull’s-eye, while sometimes our training data might be full of outliers or non-standard values, resulting in poorer predictions. These different realizations result in a scatter of hits on the target. Let’s look at the definition of Bias and Variance : Bias ...
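
The repeated-hits picture can be simulated directly. The sketch below is an illustration under stated assumptions. The true function `true_f`, the noise level, the query point `x0 = 0.9`, and the two toy predictors are all hypothetical choices, not taken from the post. It uses the standard definitions: bias is the average prediction minus the true value, and variance is the spread of the predictions across retrained models.

```python
import random
from statistics import mean, pvariance


def true_f(x):
    """Assumed ground truth that the models try to predict."""
    return x * x


def sample_training_set(rng, n=30, noise=0.5):
    """Draw one noisy training set: one 'realization' of the data."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, noise) for x in xs]
    return xs, ys


def predict_constant(xs, ys, x0):
    """Rigid model: ignores x0 and predicts the average target."""
    return mean(ys)


def predict_nearest(xs, ys, x0):
    """Flexible model: copies the target of the closest training point."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]


def bias_and_variance(predict, x0=0.9, rounds=2000, seed=0):
    """Retrain on fresh data many times and summarize the hits at x0."""
    rng = random.Random(seed)
    hits = [predict(*sample_training_set(rng), x0) for _ in range(rounds)]
    return mean(hits) - true_f(x0), pvariance(hits)


for name, model in [("constant", predict_constant), ("nearest", predict_nearest)]:
    bias, variance = bias_and_variance(model)
    print(f"{name:>8}: bias = {bias:+.3f}, variance = {variance:.3f}")
```

In this setup the constant predictor should land in a tight cluster away from the bull's-eye (high bias, low variance), while the nearest-neighbor predictor should scatter widely around it (low bias, high variance).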