
Basic Probability and Statistics

Expectation value

The expectation value of a function $g : \mathbb{R} \to \mathbb{R}$ of a univariate continuous random variable $X \sim p(x)$ is given by

$$\mathrm{E}_X[g(x)] = \int_{\mathcal{X}} g(x)\, p(x)\, dx$$

If $g$ is a function of a discrete random variable $X \sim p(x)$, the expectation value is given by

$$\mathrm{E}_X[g(x)] = \sum_{x \in \mathcal{X}} g(x)\, p(x)$$

For a multivariate random variable, the expectation value is defined element-wise:

$$\mathrm{E}_X[g(x)] = \begin{bmatrix} \mathrm{E}_{X_1}[g(x_1)] \\ \mathrm{E}_{X_2}[g(x_2)] \\ \vdots \\ \mathrm{E}_{X_D}[g(x_D)] \end{bmatrix} \in \mathbb{R}^D$$
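As a minimal sketch of the discrete case, the expectation of $g(X)$ can be computed directly from the sum above. The distribution (a fair six-sided die) and the choice $g(x) = x^2$ are hypothetical examples:

```python
# Expectation of g(X) for a discrete random variable, computed from
# E[g(X)] = sum over x of g(x) * p(x). Die example is hypothetical.
outcomes = [1, 2, 3, 4, 5, 6]

def expectation(g, support, pmf):
    """Sum g(x) * p(x) over the support of X."""
    return sum(g(x) * pmf(x) for x in support)

uniform = lambda x: 1 / 6  # pmf of a fair die

e_x = expectation(lambda x: x, outcomes, uniform)       # E[X] = 3.5
e_x2 = expectation(lambda x: x**2, outcomes, uniform)   # E[X^2] = 91/6
print(e_x, e_x2)
```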


Mean

$$\bar{x} = \frac{1}{N}\sum_i x_i$$


Variance

$$\text{Variance} = \frac{1}{N}\sum_i (x_i - \bar{x})^2$$

Also see Bessel's correction.

Standard deviation

$$\sigma = \sqrt{\text{Variance}}$$


z-score

$$z\text{-score}(x_i) = \frac{x_i - \bar{x}}{\sigma}$$
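The mean, variance, standard deviation, and z-scores can be computed in a few lines. The data below are hypothetical; the variance is shown both with $1/N$ and with Bessel's correction $1/(N-1)$:

```python
# Mean, variance, standard deviation, and z-scores for a small sample.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
n = len(data)

mean = sum(data) / n
var_population = sum((x - mean) ** 2 for x in data) / n    # 1/N
var_sample = sum((x - mean) ** 2 for x in data) / (n - 1)  # Bessel's correction
sigma = var_population ** 0.5

z_scores = [(x - mean) / sigma for x in data]  # z-scores sum to 0
print(mean, var_population, sigma)
```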


Correlation

Uncorrelated: $\mathcal{E}(XY) = \mathcal{E}(X)\,\mathcal{E}(Y)$, where $\mathcal{E}$ denotes the expectation value.

Positive correlation: $\mathcal{E}(XY) > \mathcal{E}(X)\,\mathcal{E}(Y)$

Negative correlation: $\mathcal{E}(XY) < \mathcal{E}(X)\,\mathcal{E}(Y)$

Measure of correlation: Covariance

$$\mathrm{Cov}(X, Y) = \mathcal{E}(XY) - \mathcal{E}(X)\,\mathcal{E}(Y)$$

Pearson's correlation coefficient:

$$\frac{\mathcal{E}(XY) - \mathcal{E}(X)\,\mathcal{E}(Y)}{\sigma(X)\,\sigma(Y)}$$

Range for Pearson's coefficient: [-1, 1].

Note: If two variables are independent, their covariance is 0. However, the converse need not be true: two variables can depend on each other and still have zero covariance.
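A quick sketch of these formulas, using hypothetical data. The second pair shows the caveat above: $Y = X^2$ is fully determined by $X$, yet for a symmetric $X$ the covariance is zero:

```python
# Covariance and Pearson's correlation from E(XY) - E(X)E(Y).
def mean(v):
    return sum(v) / len(v)

def cov(xs, ys):
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)

def pearson(xs, ys):
    sx = cov(xs, xs) ** 0.5  # sigma(X) = sqrt(Cov(X, X))
    sy = cov(ys, ys) ** 0.5
    return cov(xs, ys) / (sx * sy)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]        # perfectly linear in xs -> r = 1
print(pearson(xs, ys))

x_sym = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_dep = [x**2 for x in x_sym]    # dependent on X, yet zero covariance
print(cov(x_sym, y_dep))
```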

Confidence interval

Confidence Interval = Best Estimate ± Margin of Error

For a 95% confidence interval, the formula becomes:

$$\hat{p} \pm 1.96 \times \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

Where $n$ is the sample size and $\hat{p}$ is the sample proportion (an estimate of the population proportion). This means that if we repeat the sampling many times, about 95% of the intervals constructed this way will contain the true proportion. This formula applies best when the data come from a simple random sample, the sample size is large enough, and the variable is categorical.
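The interval can be sketched directly from the formula. The survey counts below are hypothetical:

```python
# 95% confidence interval for a proportion (Wald interval), a sketch.
# Hypothetical survey: 520 "yes" answers out of n = 1000.
import math

n = 1000
p_hat = 520 / n
z = 1.96  # z-multiplier for 95% confidence

margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - margin, p_hat + margin
print(f"{p_hat:.3f} +/- {margin:.3f} -> ({lo:.3f}, {hi:.3f})")
```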

Below is the $z$-multiplier for different confidence intervals. The $z$-values are taken from the normal distribution.

| Confidence interval (%) | $z$-multiplier |
| --- | --- |
| 90 | 1.645 |
| 95 | 1.96 |
| 98 | 2.326 |
| 99 | 2.576 |

Confidence interval for the difference between two proportions:

$$\hat{p}_1 - \hat{p}_2 \pm 1.96 \times \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

There are also other approaches to define the standard error:

$$\hat{p} \pm z^* \frac{1}{2\sqrt{n}} \approx \hat{p} \pm \frac{1}{\sqrt{n}}$$

$z^*$ is approximately 2 for a 95% confidence interval. This is a more conservative approach than the previous formula.
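The conservative bound follows from $\hat{p}(1-\hat{p}) \le 1/4$, so its margin is never smaller than the Wald margin. A quick comparison with hypothetical numbers:

```python
# Wald margin vs. the conservative 1/sqrt(n) margin.
import math

n = 400
p_hat = 0.3  # hypothetical sample proportion

wald = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
conservative = 1 / math.sqrt(n)  # z* * 1/(2 sqrt(n)) with z* ~ 2

print(wald, conservative)  # the conservative margin is the larger one
```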

In case of quantitative data, the standard error is given by:

$$SE = \frac{\sigma}{\sqrt{n}}$$

So the confidence interval is given by:

$$\bar{x} \pm \frac{\sigma}{\sqrt{n}}$$

For a specific confidence interval:

$$\bar{x} \pm t^* \frac{\sigma}{\sqrt{n}}$$

Where the $t^*$ multiplier comes from a $t$-distribution with $(n-1)$ degrees of freedom. For a 95% confidence interval with sample size $n = 25$, $t^* = 2.064$; with a sample size of 1000, $t^* = 1.962$. For large sample sizes, the $t^*$ value approaches the $z$-multiplier value that we used for categorical data.
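A sketch with hypothetical summary statistics; the $t^*$ value for $n = 25$ is the one quoted above (`scipy.stats.t.ppf(0.975, df)` would compute it if SciPy is available), and the standard library's `NormalDist` confirms the large-sample $z$-multiplier:

```python
# t-based confidence interval for a mean (hypothetical x_bar, s, n).
import math
from statistics import NormalDist

x_bar, s, n = 50.0, 10.0, 25
t_star = 2.064  # 95% confidence, df = n - 1 = 24

margin = t_star * s / math.sqrt(n)
print(f"{x_bar} +/- {margin}")  # 50.0 +/- 4.128

# For large n, t* approaches the normal z-multiplier:
z = NormalDist().inv_cdf(0.975)
print(round(z, 3))  # 1.96
```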

Confidence interval for the difference in population means between two independent groups:

$$\bar{x}_1 - \bar{x}_2 \pm t^* \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
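A sketch with hypothetical group summaries; the exact $t^*$ would come from a $t$-distribution (degrees of freedom via Welch's formula), so a rough multiplier of 2 is assumed here:

```python
# CI for the difference of two independent means (hypothetical data).
import math

x1, s1, n1 = 5.2, 1.1, 40
x2, s2, n2 = 4.8, 0.9, 50
t_star = 2.0  # rough 95% multiplier; exact value needs Welch's df

se = math.sqrt(s1**2 / n1 + s2**2 / n2)
lo = (x1 - x2) - t_star * se
hi = (x1 - x2) + t_star * se
print(lo, hi)
```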

De Morgan's Law

We can express one probability event in terms of other events, especially through unions, intersections and complements. Oftentimes, one expression is easier to calculate than another. In this regard, De Morgan's laws are very helpful.

$$(A \cup B)^c = A^c \cap B^c$$

$$(A \cap B)^c = A^c \cup B^c$$

Note that analogous results hold for unions and intersections of more than two events.
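Both laws can be checked mechanically on finite sets (the universe and events below are arbitrary examples):

```python
# Checking De Morgan's laws on finite sets.
universe = set(range(10))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def complement(s):
    return universe - s

# (A u B)^c == A^c n B^c  and  (A n B)^c == A^c u B^c
assert complement(A | B) == complement(A) & complement(B)
assert complement(A & B) == complement(A) | complement(B)
print("De Morgan's laws hold")
```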

Binomial coefficient

For any nonnegative integers $k$ and $n$, the binomial coefficient $\binom{n}{k}$ (read as "$n$ choose $k$") is the number of subsets of size $k$ of a set of size $n$.

$$\binom{n}{k} = \frac{n(n-1) \dots (n-k+1)}{k!} = \frac{n!}{(n-k)!\,k!}$$

For $k > n$, we have $\binom{n}{k} = 0$.
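Python's standard library provides `math.comb`, which also returns 0 for $k > n$; the falling-factorial form above gives the same values:

```python
# Binomial coefficient: math.comb vs. the falling-factorial formula.
import math

def binom(n, k):
    """n(n-1)...(n-k+1) / k! -- yields 0 when k > n (a factor hits 0)."""
    num = 1
    for i in range(k):
        num *= n - i
    return num // math.factorial(k)

print(math.comb(5, 2))  # 10
print(math.comb(3, 5))  # 0, since k > n
assert binom(5, 2) == math.comb(5, 2)
assert binom(3, 5) == 0
```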

Binomial Theorem:

$$(x + y)^n = \sum_{k=0}^n \binom{n}{k} x^k y^{n-k}$$
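The theorem is easy to verify numerically for arbitrary sample values of $x$, $y$, and $n$:

```python
# Numeric check of the binomial theorem for (x + y)**n.
import math

x, y, n = 2, 3, 6
lhs = (x + y) ** n
rhs = sum(math.comb(n, k) * x**k * y**(n - k) for k in range(n + 1))
assert lhs == rhs  # both equal 5**6 = 15625
print(lhs)
```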

Vandermonde's identity:

$$\binom{m + n}{k} = \sum_{j=0}^k \binom{m}{j} \binom{n}{k-j}$$
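Likewise, Vandermonde's identity can be spot-checked for sample values of $m$, $n$, and $k$:

```python
# Numeric check of Vandermonde's identity.
import math

m, n, k = 4, 5, 3
lhs = math.comb(m + n, k)  # C(9, 3) = 84
rhs = sum(math.comb(m, j) * math.comb(n, k - j) for j in range(k + 1))
assert lhs == rhs
print(lhs)
```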

Bayes' rule

Definition of conditional probability:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad \text{if } P(B) \neq 0$$

$P(A|B)$ is the probability of event $A$ given that event $B$ has occurred.

Substituting $P(A \cap B) = P(A)\,P(B|A)$ into the definition above gives Bayes' rule:

$$P(A|B) = \frac{P(A)\, P(B|A)}{P(B)}$$
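A classic sketch of Bayes' rule is a diagnostic test; all probabilities below are hypothetical, and $P(B)$ comes from the law of total probability:

```python
# Bayes' rule: P(disease | positive test), with hypothetical numbers.
p_disease = 0.01             # P(A): prior prevalence
p_pos_given_disease = 0.95   # P(B|A): test sensitivity
p_pos_given_healthy = 0.05   # P(B|A^c): false-positive rate

# Total probability: P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' rule: P(A|B) = P(A) P(B|A) / P(B)
p_disease_given_pos = p_disease * p_pos_given_disease / p_pos
print(round(p_disease_given_pos, 3))  # small despite a positive test
```

Even with a positive result, the posterior stays low because the prior prevalence is so small.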