Annie Adams
2025-08-28
A specific example of a continuous distribution is the Uniform distribution with parameters \(a\) and \(b\): \(X \sim \mathrm{Unif}(a, \ b)\).
The p.d.f. is given by \[ f_X(x) = \begin{cases} \frac{1}{b - a} & \text{if } a \leq x \leq b \\ 0 & \text{otherwise} \\ \end{cases} \]
The expected value and variance are: \[ \mathbb{E}[X] = \frac{a + b}{2} ; \quad \mathrm{Var}(X) = \frac{(b - a)^2}{12} \]
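A quick numerical check of these formulas, using an assumed \(\mathrm{Unif}(5, \ 15)\) for illustration:

```python
# Check of the Unif(a, b) mean and variance formulas for assumed a = 5, b = 15
a, b = 5, 15
mean = (a + b) / 2         # E[X] = 10.0
var = (b - a) ** 2 / 12    # Var(X) = 100/12, about 8.33
```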
Again, probabilities are found as areas underneath the density curve; such a probability can be decomposed as
\[ \mathbb{P}(a \leq X \leq b) = \underbrace{\mathbb{P}(X \leq b)}_{\text{c.d.f. at $b$}} - \underbrace{\mathbb{P}(X \leq a)}_{\text{c.d.f. at $a$}} \]
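The c.d.f. decomposition can be sketched for the uniform case; the helper name `unif_cdf` is mine, not from the notes:

```python
def unif_cdf(x, a, b):
    """c.d.f. of X ~ Unif(a, b): F(x) = (x - a)/(b - a) for x in [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

# P(10 <= X <= 13) for X ~ Unif(5, 15), as a difference of c.d.f. values
prob = unif_cdf(13, 5, 15) - unif_cdf(10, 5, 15)  # 0.8 - 0.5 = 0.3
```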
Chalkboard Example 2
The time (in minutes) spent waiting in line at Starbucks is found to vary uniformly between 5 minutes and 15 minutes.
A random sample of 10 customers is taken; what is the probability that exactly 4 of these customers will spend between 10 and 13 minutes waiting in line?
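A sketch of one way to work this out: each customer independently waits between 10 and 13 minutes with probability \(\frac{13 - 10}{15 - 5} = 0.3\), so the number of such customers among the 10 follows a \(\mathrm{Bin}(10, \ 0.3)\) distribution:

```python
from math import comb

# P(a single customer waits between 10 and 13 min) under Unif(5, 15)
p = (13 - 10) / (15 - 5)               # 0.3
# P(exactly 4 of 10 customers do so): Binomial(10, 0.3) p.m.f. at 4
prob = comb(10, 4) * p**4 * (1 - p)**6  # about 0.2001
```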
We also learned about the Normal distribution: \(X \sim \mathcal{N}(\mu, \ \sigma)\).
The normal density curve is bell-shaped.
The primary goal of inferential statistics is to take samples from some population, and use summary statistics to try to make inferences about population parameters.
For example, we could take samples, compute sample proportions \(\widehat{P}\), and try to make inferences about the population proportion \(p\).
We could also take samples, compute sample means \(\overline{X}\), and try to make inferences about the population mean \(\mu\).
Our summary statistics will often be point estimators (i.e. statistics whose expected value equals the corresponding population parameter; such estimators are called unbiased), which are random variables, as they depend on the sample taken.
The distribution of a point estimator is called the sampling distribution of the estimator.
Given a population with population proportion \(p\), we use \(\widehat{P}\) as a point estimator of \(p\).
Assume the success-failure conditions are met; i.e. \(np \geq 10\) and \(n(1 - p) \geq 10\).
Then, the Central Limit Theorem for Proportions tells us that \[ \widehat{P} \sim \mathcal{N}\left(p, \ \sqrt{\frac{p(1 - p)}{n}} \right) \]
If we don’t have access to \(p\) directly (as is often the case), we use the substitution approximation to check whether \(n\widehat{p} \geq 10\) and \(n(1 - \widehat{p}) \geq 10\).
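A simulation sketch of the Central Limit Theorem for Proportions, using assumed values \(p = 0.4\) and \(n = 100\): the empirical mean and standard deviation of many simulated \(\widehat{p}\) values should match the theorem's \(p\) and \(\sqrt{p(1 - p)/n}\).

```python
import math
import random

random.seed(0)
p, n, trials = 0.4, 100, 20_000  # assumed values for illustration

# simulate `trials` sample proportions, each from a sample of size n
phats = [sum(random.random() < p for _ in range(n)) / n for _ in range(trials)]

mean_phat = sum(phats) / trials
sd_phat = math.sqrt(sum((x - mean_phat) ** 2 for x in phats) / trials)
# CLT for proportions predicts mean near 0.4 and sd near sqrt(0.4 * 0.6 / 100)
```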
Given a population with population mean \(\mu\) and population standard deviation \(\sigma\), we use \(\overline{X}\) as a point estimator of \(\mu\).
If the population is normally distributed, then \[ \overline{X} \sim \mathcal{N}\left(\mu, \ \frac{\sigma}{\sqrt{n}} \right) \] or, equivalently, \[ \frac{\overline{X} - \mu}{\sigma / \sqrt{n}} \sim \mathcal{N}\left(0, \ 1 \right) \]
If the population is not normally distributed, but the sample size \(n\) is at least 30, then the Central Limit Theorem for the Sample Mean (or just the Central Limit Theorem) tells us \[ \frac{\overline{X} - \mu}{\sigma / \sqrt{n}} \sim \mathcal{N}\left(0, \ 1 \right) \]
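A sketch of using this standardization, with assumed values \(\mu = 50\), \(\sigma = 10\), \(n = 25\) (and a normal population):

```python
from statistics import NormalDist

mu, sigma, n = 50, 10, 25            # assumed population values
# P(Xbar <= 52): standardize using SD(Xbar) = sigma / sqrt(n) = 2
z = (52 - mu) / (sigma / n ** 0.5)   # = 1.0
prob = NormalDist().cdf(z)           # Phi(1), about 0.8413
```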
If the population is non-normal, the sample size is large, and we don’t have access to \(\sigma\) (but access to \(s\), the sample standard deviation instead), then \[ \frac{\overline{X} - \mu}{s / \sqrt{n}} \sim t_{n - 1}\]
Recall that the \(t-\)distribution looks like a standard normal distribution, but has wider tails than the standard normal distribution (which accounts for the additional uncertainty injected into the problem by using \(s\), a random variable, in place of \(\sigma\), a deterministic constant).
Also recall that \(t_{\infty}\) (i.e. the \(t-\)distribution with infinitely many degrees of freedom) is the same thing as the standard normal distribution.
Instead of using point estimators (which are random) to estimate population parameters (which are deterministic), it may make more sense to provide an interval that, with some confidence level, contains the true parameter value.
In general, when constructing a confidence interval for a parameter \(\theta\), we use \[ \widehat{\theta} \pm c \cdot \mathrm{SD}(\widehat{\theta}) \] where \(c\) is some constant that depends on our confidence level.
The coefficient \(c\) will also depend on the sampling distribution of \(\widehat{\theta}\).
Worked-Out Example 3
Saoirse would like to construct a 95% confidence interval for the true proportion of California residents that speak Spanish. To that end, she took a representative sample of 120 CA residents and found that 36 of these residents speak Spanish.
The population is the set of all California residents.
The parameter of interest is \(p\), the true proportion of CA residents that speak Spanish.
The random variable of interest is \(\widehat{P}\), the proportion of people in a representative sample of 120 CA residents that speak Spanish.
We check the success-failure conditions, with the substitution approximation: \(n \widehat{p} = 120 \cdot 0.3 = 36 \geq 10\) and \(n(1 - \widehat{p}) = 120 \cdot 0.7 = 84 \geq 10\).
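The interval itself, as a sketch: the numbers follow from \(\widehat{p} = 36/120 = 0.3\) and \(z_{0.975} = 1.96\).

```python
import math

n, x = 120, 36
phat = x / n                               # 0.3
# success-failure check with the substitution approximation
assert n * phat >= 10 and n * (1 - phat) >= 10

z = 1.96                                   # critical value for a 95% confidence level
se = math.sqrt(phat * (1 - phat) / n)      # about 0.0418
lo, hi = phat - z * se, phat + z * se      # about (0.218, 0.382)
```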
Recall that in the framework of hypothesis testing, we wish to utilize data to assess the plausibility/validity of a hypothesis, called the null hypothesis.
In the case of hypothesis testing for a population proportion \(p\), our null takes the form \(H_0: p = p_0\) and there are several different alternative hypotheses we could consider:
Two-Sided Test for a Proportion:
When testing \(H_0: \ p = p_0\) vs \(H_A: \ p \neq p_0\) at an \(\alpha\) level of significance, where \(p\) denotes a population proportion, the test takes the form \[ \texttt{decision}(\mathrm{TS}) = \begin{cases} \texttt{reject } H_0 & \text{if } |\mathrm{TS}| > z_{1 - \alpha/2} \\ \texttt{fail to reject } H_0 & \text{otherwise}\\ \end{cases} \] where:
\(\displaystyle \mathrm{TS} = \frac{\widehat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}\)
\(z_{1 - \alpha/2}\) denotes the \((1 - \alpha/2) \times 100\)th percentile of the standard normal distribution (equivalently, the \((\alpha/2) \times 100\)th percentile scaled by \(-1\))
provided that: \(n p_0 \geq 10\) and \(n (1 - p_0) \geq 10\).
Lower-Tailed Test for a Proportion:
When testing \(H_0: \ p = p_0\) vs \(H_A: \ p < p_0\) at an \(\alpha\) level of significance, where \(p\) denotes a population proportion, the test takes the form \[ \texttt{decision}(\mathrm{TS}) = \begin{cases} \texttt{reject } H_0 & \text{if } \mathrm{TS} < z_{\alpha} \\ \texttt{fail to reject } H_0 & \text{otherwise}\\ \end{cases} \] where:
\(\displaystyle \mathrm{TS} = \frac{\widehat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}\)
\(z_{\alpha}\) denotes the \(\alpha \times 100\)th percentile of the standard normal distribution, with no scaling needed (note this value is negative for \(\alpha < 0.5\))
provided that: \(n p_0 \geq 10\) and \(n (1 - p_0) \geq 10\).
Upper-Tailed Test for a Proportion:
When testing \(H_0: \ p = p_0\) vs \(H_A: \ p > p_0\) at an \(\alpha\) level of significance, where \(p\) denotes a population proportion, the test takes the form \[ \texttt{decision}(\mathrm{TS}) = \begin{cases} \texttt{reject } H_0 & \text{if } \mathrm{TS} > z_{1 - \alpha} \\ \texttt{fail to reject } H_0 & \text{otherwise}\\ \end{cases} \] where:
\(\displaystyle \mathrm{TS} = \frac{\widehat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}\)
\(z_{1 - \alpha}\) denotes the \((1 - \alpha) \times 100\)th percentile of the standard normal distribution, with no scaling needed
provided that: \(n p_0 \geq 10\) and \(n (1 - p_0) \geq 10\).
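The three tests above share the same test statistic and differ only in the rejection rule; the following sketch collects them into one function (the name `prop_test` and its signature are mine, not from the notes):

```python
import math
from statistics import NormalDist

def prop_test(phat, p0, n, alpha=0.05, alternative="two-sided"):
    """z-test for a population proportion; returns (TS, reject H0?)."""
    ts = (phat - p0) / math.sqrt(p0 * (1 - p0) / n)
    z = NormalDist().inv_cdf  # percentile function of N(0, 1)
    if alternative == "two-sided":
        reject = abs(ts) > z(1 - alpha / 2)   # |TS| > z_{1 - alpha/2}
    elif alternative == "less":
        reject = ts < z(alpha)                # TS < z_{alpha}
    else:  # "greater"
        reject = ts > z(1 - alpha)            # TS > z_{1 - alpha}
    return ts, reject
```

For instance, an upper-tailed test of \(H_0: p = 0.24\) with \(\widehat{p} = 0.30\) and \(n = 120\) gives \(\mathrm{TS} \approx 1.539\), which does not exceed \(z_{0.95} \approx 1.645\), so we would fail to reject \(H_0\).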
Worked-Out Example 4
Administration within a Statistics department at an unnamed university claims to admit 24% of all applicants. A disgruntled student, dubious of the administration’s claims, takes a representative sample of 120 students who applied to the Statistics major, and finds that 20% of these students were actually admitted into the major.
Conduct a two-sided hypothesis test at a 5% level of significance on the administrator’s claims that 24% of applicants into the Statistics major are admitted. Be sure you phrase your conclusion clearly, and in the context of the problem.
We first phrase the hypotheses.
Let \(p\) denote the true proportion of applicants who get admitted into the major. Since we are performing a two-sided test, our hypotheses take the form \[ \left[ \begin{array}{rr} H_0: p = 0.24 \\ H_A: p \neq 0.24 \end{array} \right.\]
Now we compute the observed value of the test statistic: \[ \mathrm{ts} = \frac{\widehat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} = \frac{0.2 - 0.24}{\sqrt{\frac{(0.24) \cdot (1 - 0.24)}{120}}} \approx -1.026 \]
Next, we compute the critical value. Since we are using an \(\alpha = 0.05\) level of significance, we will use the critical value \(z_{1 - \alpha/2} = z_{0.975} = 1.96\).
Finally, we perform the test: we will reject the null in favor of the alternative if \(|\mathrm{ts}|\) is larger than the critical value.
In this case, \(|\mathrm{ts}| = |-1.026| = 1.026 < 1.96\) meaning we fail to reject the null:
At an \(\alpha = 0.05\) level of significance, there was insufficient evidence to reject the null hypothesis that 24% of applicants are admitted into the major in favor of the alternative that the true admittance rate was not 24%.
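The arithmetic above, as a quick check:

```python
import math

n, phat, p0 = 120, 0.20, 0.24
ts = (phat - p0) / math.sqrt(p0 * (1 - p0) / n)  # about -1.026
crit = 1.96                                      # z_{0.975}
reject = abs(ts) > crit                          # False: fail to reject H0
```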
Similar to the tests for proportions, we also have two-sided, lower-tailed, and upper-tailed alternatives when testing hypotheses about a population mean \(\mu\) against \(H_0: \ \mu = \mu_0\).
The form of the test statistic depends on which of the following conditions hold:
\[ \mathrm{TS} = \begin{cases} \displaystyle \frac{\overline{X} - \mu_0}{\sigma / \sqrt{n}} & \text{if } \quad \begin{array}{rl} \bullet & \text{pop. is normal, OR} \\ \bullet & \text{$n \geq 30$ AND $\sigma$ is known} \end{array} \quad \stackrel{H_0}{\sim} \mathcal{N}(0, \ 1) \\[5mm] \displaystyle \frac{\overline{X} - \mu_0}{s / \sqrt{n}} & \text{if } \quad \begin{array}{rl} \bullet & \text{$n \geq 30$ AND $\sigma$ is not known} \end{array} \quad \stackrel{H_0}{\sim} t_{n - 1} \end{cases} \]
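A sketch of the second case, using assumed sample summaries (\(n = 36\), \(\bar{x} = 52.1\), \(s = 6.0\), \(\mu_0 = 50\); these numbers are illustrative only):

```python
import math

# assumed sample summaries; sigma is unknown and n >= 30, so we use s
n, xbar, s, mu0 = 36, 52.1, 6.0, 50.0
ts = (xbar - mu0) / (s / math.sqrt(n))  # about 2.1; compare against t_{35} percentiles
```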
We also talked about the p-value: the probability of obtaining a value of the statistic as extreme as, or more extreme than, the observed statistic when the null hypothesis is true.
When the p-value is small, we have evidence that our observed results are unlikely to have occurred by chance alone when the null hypothesis is true. The smaller the p-value, the stronger the evidence is against the null hypothesis.
A small p-value suggests that our results are inconsistent with the null hypothesis and that the null should be rejected.
A large p-value suggests that our results are not inconsistent with the null hypothesis and that the null should not be rejected.
| Condition | Decision |
|---|---|
| p-value \(\leq \alpha\) | Reject \(H_0\) |
| p-value \(> \alpha\) | Fail to reject \(H_0\) |
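As a sketch, applying this decision rule to the observed statistic from Worked-Out Example 4 (\(\mathrm{ts} \approx -1.026\), two-sided test):

```python
from statistics import NormalDist

ts = -1.026          # observed statistic from Worked-Out Example 4
# two-sided p-value: probability of a statistic at least this extreme under H0
pval = 2 * NormalDist().cdf(-abs(ts))   # about 0.305
alpha = 0.05
# pval > alpha, so we fail to reject H0, matching the earlier conclusion
```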