## Problem 1: Analyzing Educational Data

### Problem Description:

This statistical analysis assignment uses a dataset (ps4data.xlsx) of education-related variables. The objective is to compute sample means, run one- and two-sample t-tests, and construct confidence intervals.

**Part a:** Descriptive Statistics and Confidence Interval

```
sample.mean <- mean(ps4data$educ)
print(sample.mean)
## [1] 7.044534
# Standard error
sample.n <- length(ps4data$educ)
sample.sd <- sd(ps4data$educ)
sample.se <- sample.sd/sqrt(sample.n)
print(sample.se)
## [1] 0.1061065
# t score corresponding to the Confidence Interval
alpha <- 0.05
degrees.freedom <- sample.n - 1
t.score <- qt(p = alpha/2, df = degrees.freedom, lower.tail = FALSE)
print(t.score)
## [1] 1.963175
# Margin of error
margin.error <- t.score * sample.se
# Confidence Interval
lower.bound <- sample.mean - margin.error
upper.bound <- sample.mean + margin.error
print(c(lower.bound,upper.bound))
## [1] 6.836229 7.252840
```

**Outcome:** The sample mean of `educ` is 7.044534, and the 95% confidence interval for the population mean is (6.836229, 7.252840).
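
As a cross-check on the manual calculation, the same interval can be obtained from `t.test()`. The sketch below wraps the steps in a helper and compares it with the built-in result. The helper name `ci_mean` and the simulated vector `educ_sim` are illustrative assumptions, not part of the assignment, and the simulated data is only a stand-in for `ps4data$educ`.

```r
# Hypothetical helper: t-based confidence interval for a mean.
ci_mean <- function(x, conf.level = 0.95) {
  x <- x[!is.na(x)]                           # drop missing values
  n <- length(x)
  se <- sd(x) / sqrt(n)                       # standard error of the mean
  t_crit <- qt(1 - (1 - conf.level) / 2, df = n - 1)
  mean(x) + c(-1, 1) * t_crit * se            # lower and upper bounds
}

set.seed(1)
educ_sim <- rnorm(741, mean = 7, sd = 3)      # simulated stand-in, NOT ps4data$educ

manual  <- ci_mean(educ_sim)
builtin <- as.numeric(t.test(educ_sim)$conf.int)
print(manual)
print(builtin)
```

Both vectors should agree up to floating-point error, which is a quick way to confirm that the manual steps implement the same t-based interval.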

**Part b:** One Sample t-test

```
t.test(ps4data$educ, mu = 5, alternative = "two.sided")
##
## One Sample t-test
##
## data: ps4data$educ
## t = 19.269, df = 740, p-value < 2.2e-16
## Alternative hypothesis: the true mean is not equal to 5
## 95 percent confidence interval:
## 6.836229 7.252840
## sample estimates:
## mean of x
## 7.044534
```

**Outcome:** With t = 19.269 and p < 2.2e-16, we reject the null hypothesis that the true mean equals 5.

**Part c:** One Sample t-test with Different Hypothesis

```
t.test(ps4data$educ, mu = 7.2, alternative = "two.sided")
##
## One Sample t-test
##
## data: ps4data$educ
## t = -1.4652, df = 740, p-value = 0.1433
## Alternative hypothesis: the true mean is not equal to 7.2
## 95 percent confidence interval:
## 6.836229 7.252840
## sample estimates:
## mean of x
## 7.044534
```

**Outcome:** When testing against a mean of 7.2, we fail to reject the null hypothesis at the 5% level (p = 0.1433). This agrees with the 95% confidence interval, which contains 7.2.
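
The test statistics in Parts b and c can be reproduced from the summary numbers reported in Part a, since the one-sample t statistic is the distance between the sample mean and the hypothesized mean, measured in standard errors. This is a minimal sketch using only the reported values. The function names `t_stat` and `p_two_sided` are illustrative.

```r
x_bar <- 7.044534    # sample mean reported in Part a
se    <- 0.1061065   # standard error reported in Part a
df    <- 740         # degrees of freedom reported by t.test()

# One-sample t statistic for a hypothesized mean mu0
t_stat <- function(mu0) (x_bar - mu0) / se

# Two-sided p-value from the t distribution
p_two_sided <- function(mu0) 2 * pt(-abs(t_stat(mu0)), df = df)

t_b <- t_stat(5)          # about 19.27, as in Part b
t_c <- t_stat(7.2)        # about -1.465, as in Part c
p_c <- p_two_sided(7.2)   # about 0.143, as in Part c
print(c(t_b = t_b, t_c = t_c, p_c = p_c))
```

A hypothesized mean is rejected at the 5% level exactly when it falls outside the 95% confidence interval, which is why 5 is rejected and 7.2 is not.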

**Part d:** Two-Sample t-test

```
Y_t <- subset(ps4data, ps4data$abd == 1)
Y_c <- subset(ps4data, ps4data$abd == 0)
# two sided t-test
t.test(Y_t$educ, Y_c$educ, alternative = "two.sided", var.equal = FALSE)
##
## Welch Two Sample T-test
##
## data: Y_t$educ and Y_c$educ
## t = -2.6798, df = 551.58, p-value = 0.007587
## Alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.0318702 -0.1589784
## sample estimates:
## mean of x mean of y
## 6.820346 7.415771
```

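**Outcome:** The two-sample t-test rejects the null hypothesis of equal means at the 5% level (p = 0.007587). The mean of `educ` is significantly lower in the `abd == 1` group (6.820346) than in the `abd == 0` group (7.415771).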
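
The fractional degrees of freedom in the output (551.58) come from the Welch-Satterthwaite approximation, which `t.test()` uses when `var.equal = FALSE`. The sketch below computes the Welch statistic and degrees of freedom by hand and compares them with `t.test()`. The helper `welch_t`, the group sizes, and the simulated data are illustrative assumptions, not values from `ps4data`.

```r
# Hypothetical helper: Welch t statistic and Welch-Satterthwaite df.
welch_t <- function(x, y) {
  vx <- var(x) / length(x)                    # squared standard error, group x
  vy <- var(y) / length(y)                    # squared standard error, group y
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df <- (vx + vy)^2 /
    (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  c(t = t_stat, df = df)
}

set.seed(2)
treated <- rnorm(400, mean = 6.8, sd = 2.8)   # simulated stand-in for Y_t$educ
control <- rnorm(300, mean = 7.4, sd = 3.0)   # simulated stand-in for Y_c$educ

manual  <- welch_t(treated, control)
builtin <- t.test(treated, control, var.equal = FALSE)
print(manual)
print(c(t = unname(builtin$statistic), df = unname(builtin$parameter)))
```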

**Part e:** Advantages of One-Tailed Test

# Explanation of advantages

**Outcome:** At the same significance level, a one-tailed test has more statistical power than a two-tailed test, provided the true effect lies in the hypothesized direction.
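
The power gain can be illustrated with `power.t.test()` from base R's `stats` package. The sample size, effect size, and standard deviation below are arbitrary illustrative values, not quantities taken from the assignment data.

```r
# Power of a two-sample t-test with 50 observations per group,
# a true difference of 0.5 and a standard deviation of 1 (assumed values).
pw_two <- power.t.test(n = 50, delta = 0.5, sd = 1, sig.level = 0.05,
                       type = "two.sample",
                       alternative = "two.sided")$power
pw_one <- power.t.test(n = 50, delta = 0.5, sd = 1, sig.level = 0.05,
                       type = "two.sample",
                       alternative = "one.sided")$power
print(c(two_sided = pw_two, one_sided = pw_one))
```

The one-sided power is larger because the whole 5% rejection region sits in one tail. The cost is that an effect in the opposite direction cannot be detected.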

**Part f:** One-Tailed Two-Sample t-test

```
Y_t <- subset(ps4data, ps4data$abd == 1)
Y_c <- subset(ps4data, ps4data$abd == 0)
# one-sided t-test (alternative: mean of Y_t is less than mean of Y_c)
t.test(Y_t$educ, Y_c$educ, alternative = "less", var.equal = FALSE)
##
## Welch Two Sample T-test
##
## data: Y_t$educ and Y_c$educ
## t = -2.6798, df = 551.58, p-value = 0.003794
## Alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -0.2293362
## sample estimates:
## mean of x mean of y
## 6.820346 7.415771
```

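**Outcome:** Testing whether the mean of Y_t is less than the mean of Y_c yields a p-value of 0.003794, so we reject the null hypothesis at the 5% level. This p-value is half of the two-sided p-value from Part d (0.007587).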
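
The relationship between the one-sided and two-sided p-values can be checked directly: when the observed t statistic lies in the direction of the alternative, the one-sided p-value is half the two-sided one, and otherwise it is one minus that half. This is a small sketch on simulated data. The vectors `x` and `y` are hypothetical stand-ins, not `Y_t$educ` and `Y_c$educ`.

```r
set.seed(3)
x <- rnorm(200, mean = 6.8, sd = 3)   # simulated stand-in for the first group
y <- rnorm(200, mean = 7.4, sd = 3)   # simulated stand-in for the second group

two_sided <- t.test(x, y, alternative = "two.sided", var.equal = FALSE)
one_sided <- t.test(x, y, alternative = "less", var.equal = FALSE)

t_val  <- unname(two_sided$statistic)
p_two  <- two_sided$p.value
p_less <- one_sided$p.value

# Expected one-sided p-value given the sign of the t statistic
expected <- if (t_val < 0) p_two / 2 else 1 - p_two / 2
print(c(t = t_val, p_two = p_two, p_less = p_less, expected = expected))
```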

**Part g:** Two-Sample t-test with Different Variable

```
Y_t <- subset(ps4data, ps4data$abd == 1)
Y_c <- subset(ps4data, ps4data$abd == 0)
# two sided t-test
t.test(Y_t$fthr_ed, Y_c$fthr_ed, alternative = "two.sided", var.equal = FALSE)
##
## Welch Two Sample T-test
##
## data: Y_t$fthr_ed and Y_c$fthr_ed
## t = -1.1125, df = 572.99, p-value = 0.2664
## Alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.8408032 0.2327410
## sample estimates:
## mean of x mean of y
## 5.764069 6.068100
```

**Outcome:** For `fthr_ed`, we fail to reject the null hypothesis of equal means (p = 0.2664). The 95% confidence interval for the difference, (-0.8408032, 0.2327410), contains 0.

**Part h:** Minimizing Type I Error

# Explanation on minimizing Type I error

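**Outcome:** The Type I error rate is set by the significance level, so lowering it (for example, from 0.05 to 0.01) reduces the chance of a false rejection. Changing the sample size does not change the Type I error rate.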
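
A short simulation can illustrate this claim. The sketch below draws normal samples for which the null hypothesis is true and records how often a one-sample t-test rejects. The function name `rejection_rate`, the sample sizes, and the number of replications are illustrative assumptions.

```r
# Share of simulated samples in which a true null hypothesis is rejected.
rejection_rate <- function(n, alpha, reps = 5000) {
  rejections <- replicate(reps, {
    x <- rnorm(n, mean = 0, sd = 1)        # the null hypothesis mu = 0 is true
    t.test(x, mu = 0)$p.value < alpha
  })
  mean(rejections)
}

set.seed(4)
rates <- c(
  n10_a05  = rejection_rate(10,  0.05),
  n100_a05 = rejection_rate(100, 0.05),
  n10_a01  = rejection_rate(10,  0.01),
  n100_a01 = rejection_rate(100, 0.01)
)
print(rates)
```

For normal data the rejection rate should stay close to the chosen significance level at both sample sizes, and it should drop only when the significance level itself is lowered.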

**Part i:** One Sample t-test for Wages Improvement

# Explanation and code for Part i

**Outcome:** The one-sample t-test assesses wage improvement, comparing those with vocational training to those without.

## Problem 2: Simulation and Central Limit Theorem

### Problem Description:

This problem uses simulation to show how sample size affects hypothesis testing and how the Central Limit Theorem explains the result. It compares the rejection rates of a t-test at a small and a larger sample size.

**Part a:** Small Sample Size Issue

# Explanation and code for Part a

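**Outcome:** With small samples from an exponential distribution, the t-test rejects a true null hypothesis more often than the nominal significance level, because the sample mean of a small, skewed sample is not close to normally distributed.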

**Part b:** Larger Sample Size and Central Limit Theorem

# Explanation and code for Part b

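**Outcome:** With a larger sample size (100), the rejection rate falls toward the nominal significance level. By the central limit theorem, the sampling distribution of the mean is closer to normal at this sample size, so the t-test behaves as intended.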
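
The comparison in Parts a and b can be sketched as the simulation below. The source does not state the small sample size, the number of replications, or the rate parameter, so `n = 5`, `reps = 5000`, and `rate = 1` are assumptions chosen for illustration, as is the function name `exp_rejection_rate`.

```r
# Share of exponential samples in which the true null hypothesis
# (mean = 1) is rejected by a two-sided one-sample t-test.
exp_rejection_rate <- function(n, reps = 5000, alpha = 0.05) {
  rejections <- replicate(reps, {
    x <- rexp(n, rate = 1)                 # true mean is 1 / rate = 1
    t.test(x, mu = 1)$p.value < alpha
  })
  mean(rejections)
}

set.seed(5)
rate_small <- exp_rejection_rate(n = 5)     # assumed "small" sample size
rate_large <- exp_rejection_rate(n = 100)   # sample size named in Part b
print(c(small = rate_small, large = rate_large))
```

Because the null hypothesis is true in every replication, any rejection is a Type I error. The small-sample rate should sit clearly above 0.05, while the rate at n = 100 should be much closer to it.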