Basic Probability and Statistics
Expectation value
The expectation value of a function of a univariate continuous random variable is given by

$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f(x)\, dx$

where $f$ is the probability density function. If $g$ is a function of a discrete random variable, it is given by

$E[g(X)] = \sum_{x} g(x)\, P(X = x)$

For a multivariate random variable, the expectation value is defined element-wise: $E[\mathbf{X}] = (E[X_1], E[X_2], \ldots, E[X_n])$.
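As a minimal sketch of the discrete case, assuming a fair six-sided die as the random variable (exact arithmetic via `fractions` so the results are not floating-point approximations):

```python
from fractions import Fraction

# Discrete case: E[g(X)] = sum over x of g(x) * P(X = x).
# Assumed example: a fair six-sided die, so P(X = x) = 1/6 for x = 1..6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(g, pmf):
    """Expectation of g(X) for a discrete random variable with the given pmf."""
    return sum(g(x) * p for x, p in pmf.items())

e_x = expectation(lambda x: x, pmf)       # E[X]   = 7/2
e_x2 = expectation(lambda x: x * x, pmf)  # E[X^2] = 91/6

print(e_x, e_x2)
```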
Mean

$\mu = E[X]$; for a sample, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Variance

$\mathrm{Var}(X) = E\left[(X - \mu)^2\right] = E[X^2] - (E[X])^2$

For a sample, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$. Also see Bessel's correction (the $n - 1$ in the denominator).
Standard deviation

$\sigma = \sqrt{\mathrm{Var}(X)}$
z-score

$z = \dfrac{x - \mu}{\sigma}$
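These four quantities can be sketched with the standard-library `statistics` module on an assumed toy sample; `pvariance`/`pstdev` divide by $n$ while `variance` divides by $n - 1$ (Bessel's correction):

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # toy sample (assumed)

mean = statistics.fmean(data)         # 5.0
pop_var = statistics.pvariance(data)  # divides by n     -> 4.0
samp_var = statistics.variance(data)  # divides by n - 1 (Bessel's correction)
pop_sd = statistics.pstdev(data)      # sqrt(pop_var)    -> 2.0

# z-score: how many (population) standard deviations a value is from the mean.
z = (9.0 - mean) / pop_sd             # (9 - 5) / 2 = 2.0
print(mean, pop_var, samp_var, z)
```

Note that the sample variance (32/7 ≈ 4.57) is larger than the population variance (4.0), which is exactly what Bessel's correction is for.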
Correlation
Uncorrelated: $E[XY] = E[X]\,E[Y]$, where $E[\cdot]$ denotes the expectation value.
Positive correlation: $E[XY] > E[X]\,E[Y]$
Negative correlation: $E[XY] < E[X]\,E[Y]$
Measure of correlation: Covariance, $\mathrm{Cov}(X, Y) = E\left[(X - E[X])(Y - E[Y])\right] = E[XY] - E[X]\,E[Y]$
Pearson's correlation coefficient: $\rho_{X,Y} = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X\, \sigma_Y}$
Range for Pearson's coefficient: [-1, 1].
Note: if two variables are independent, their covariance is 0. However, the reverse need not be true: two variables can depend on each other and still have zero covariance.
Confidence interval
Confidence Interval = Best Estimate ± Margin of Error
In the case of a 95% confidence interval, the formula becomes:

$\hat{p} \pm 1.96\, \sqrt{\dfrac{\hat{p}\,(1 - \hat{p})}{n}}$

where $n$ is the sample size and $\hat{p}$ is the sample proportion (the estimate of the population proportion). This means that if we repeat the sampling a large number of times, about 95% of the resulting intervals would contain the true value. This applies when the data come from a simple random sample, the sample size is large enough, and the variable is categorical.
Below is the $z$-multiplier for different confidence intervals. The z-values are taken from the normal distribution.
| Confidence interval (%) | z-multiplier |
|---|---|
| 90 | 1.645 |
| 95 | 1.96 |
| 98 | 2.326 |
| 99 | 2.576 |
When finding the difference between two proportions, the confidence interval is:

$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
There are also other approaches to define the standard error. A conservative one uses $\hat{p}(1 - \hat{p}) \le \frac{1}{4}$, giving the margin of error

$z^* \cdot \dfrac{1}{2\sqrt{n}}$

where $z^*$ is approximately 2 for a 95% confidence interval. This is a more conservative approach than the previous formula.
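A minimal sketch comparing the two margins of error for a proportion, using assumed counts (520 successes out of $n = 1000$); the conservative margin is always at least as wide:

```python
import math

# Assumed data: 520 "yes" answers out of n = 1000 sampled.
n, successes = 1000, 520
p_hat = successes / n
z = 1.96  # 95% z-multiplier from the table above

margin = z * math.sqrt(p_hat * (1 - p_hat) / n)  # standard formula
conservative = 2 * math.sqrt(0.25 / n)           # uses p(1-p) <= 1/4, z ~ 2

low, high = p_hat - margin, p_hat + margin
print(f"95% CI: ({low:.3f}, {high:.3f}); conservative margin {conservative:.4f}")
```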
In case of quantitative data, the standard error is given by:

$SE = \dfrac{s}{\sqrt{n}}$

where $s$ is the sample standard deviation. So the confidence interval is given by:

$\bar{x} \pm \text{(multiplier)} \cdot \dfrac{s}{\sqrt{n}}$

For a specific confidence interval:

$\bar{x} \pm t^* \cdot \dfrac{s}{\sqrt{n}}$

where the $t^*$ multiplier comes from a t-distribution with $(n - 1)$ degrees of freedom. For a 95% confidence interval with sample size $n = 25$, $t^* = 2.064$, and with a sample size of 1000, $t^* = 1.962$. For large sample sizes, the $t^*$ value will be close to the $z$-multiplier value that we used for categorical data.
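A sketch of the t-based interval for a mean on an assumed toy sample of $n = 25$; the standard library has no t-distribution, so $t^* = 2.064$ (24 degrees of freedom, the value quoted above) is hardcoded rather than computed:

```python
import math
import statistics

# Toy sample (assumed): values 10..14 repeated, mean exactly 12.
sample = [10 + (i % 5) for i in range(25)]

n = len(sample)
xbar = statistics.fmean(sample)
s = statistics.stdev(sample)       # sample sd (Bessel-corrected)
se = s / math.sqrt(n)              # standard error of the mean
t_star = 2.064                     # 95%, n - 1 = 24 degrees of freedom

low, high = xbar - t_star * se, xbar + t_star * se
print(f"95% CI: ({low:.3f}, {high:.3f})")
```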
Difference in the population means with a confidence interval for two independent groups:

$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
De Morgan's Law
We can express one probability event in terms of other events, especially through unions, intersections, and complements. Oftentimes, one expression is easier to calculate than another. In this regard, De Morgan's laws are very helpful:

$(A \cup B)^c = A^c \cap B^c$

$(A \cap B)^c = A^c \cup B^c$

Note that analogous results hold for unions and intersections of more than two events.
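Since events over a finite sample space are just subsets, De Morgan's laws can be checked directly with Python sets (sample space and events below are assumed examples):

```python
# Treat events A, B as subsets of an assumed finite sample space S.
S = set(range(10))
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}

def complement(event):
    """Complement of an event relative to the sample space S."""
    return S - event

# (A u B)^c == A^c n B^c  and  (A n B)^c == A^c u B^c
law1 = complement(A | B) == (complement(A) & complement(B))
law2 = complement(A & B) == (complement(A) | complement(B))
print(law1, law2)
```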
Binomial coefficient
For any nonnegative integers $k$ and $n$, the binomial coefficient $\binom{n}{k}$ (read as $n$ choose $k$) is the number of subsets of size $k$ of a set of size $n$.

For $k \le n$, we have

$\binom{n}{k} = \dfrac{n!}{k!\,(n - k)!}$

and $\binom{n}{k} = 0$ for $k > n$.

Binomial Theorem:

$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k}\, x^k\, y^{n-k}$

Vandermonde's identity:

$\binom{m + n}{k} = \sum_{j=0}^{k} \binom{m}{j} \binom{n}{k - j}$
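Both identities can be verified numerically with `math.comb` (the specific values of $x$, $y$, $n$, $m$, $k$ below are arbitrary assumed examples):

```python
from math import comb

# Number of 2-element subsets of a 5-element set: C(5, 2) = 10.
c52 = comb(5, 2)

# Binomial theorem, checked numerically for x = 3, y = 4, n = 6.
x, y, n = 3, 4, 6
lhs = (x + y) ** n
rhs = sum(comb(n, k) * x**k * y**(n - k) for k in range(n + 1))

# Vandermonde's identity for m = 4, n2 = 5, k = 3.
m, n2, k = 4, 5, 3
vander = comb(m + n2, k) == sum(comb(m, j) * comb(n2, k - j)
                                for j in range(k + 1))
print(c52, lhs == rhs, vander)
```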
Bayes' rule
Definition of conditional probability:

$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, if $P(B) > 0$

$P(A \mid B)$ is the probability of event $A$ given that event $B$ has occurred. Applying the definition to $P(A \cap B) = P(B \mid A)\,P(A)$ gives Bayes' rule:

$P(A \mid B) = \dfrac{P(B \mid A)\, P(A)}{P(B)}$
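A sketch computing a conditional probability both directly and via Bayes' rule, from assumed joint counts (a population of 1000 where event $A$ is "has a condition" and $B$ is "test flags it"); exact fractions avoid rounding:

```python
from fractions import Fraction

# Assumed counts: 10 of 1000 have the condition (A); the test flags (B)
# 9 of those 10, plus 99 of the 990 who do not have it.
n_total = 1000
n_A = 10
n_B_and_A = 9
n_B = n_B_and_A + 99  # everyone flagged

# Direct: P(A|B) = P(A n B) / P(B), computed from counts.
p_A_given_B = Fraction(n_B_and_A, n_B)

# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B).
p_A = Fraction(n_A, n_total)
p_B = Fraction(n_B, n_total)
p_B_given_A = Fraction(n_B_and_A, n_A)
via_bayes = p_B_given_A * p_A / p_B

print(p_A_given_B, via_bayes)  # both 1/12
```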
Resources
- Probability and statistics: https://projects.iq.harvard.edu/stat110
- Standard deviation Wikipedia: https://en.wikipedia.org/wiki/Standard_deviation
- Seeing Theory
- Bayes' rule Wikipedia: https://en.m.wikipedia.org/wiki/Bayes%27_theorem