
Statistical Analysis with R: Hypothesis Testing, Regression, and ANOVA in Real-world Scenarios

Explore the dynamic landscape of statistical analysis using R. In this guide, we work through real-world scenarios, applying R to hypothesis testing, regression analysis, and ANOVA: essential techniques for drawing meaningful insights from diverse datasets. From assessing the efficacy of medical interventions and comparing the effectiveness of pharmaceuticals to testing for astrological influences on driving accidents, these examples show how R can be used to analyze and interpret complex data. Gain a practical understanding of statistical methods and strengthen your data-driven decision-making skills with hands-on applications in R.

1: Investigating Body Temperature Norms

Problem Description:

In the mid-1800s, Dr. Carl Wunderlich established 98.6°F as the "normal" body temperature. A contemporary researcher seeks to challenge this norm by studying the mean body temperature in 200 healthy adults. The researcher collected temperature data and aimed to use statistical analysis to verify or refute Dr. Wunderlich's findings.

The recorded temperatures (°F), entered as the R vector temp used in the code that follows:

temp <- c(97.10,97.19,97.25,97.46,97.52,97.63,97.63,97.63,97.70,97.71,97.71,
97.72,97.74,97.75,97.75,97.76,97.78,97.81,97.81,97.82,97.83,97.89,
97.90,97.90,97.91,97.92,97.92,97.93,97.96,97.96,97.98,98.00,98.01,
98.01,98.02,98.02,98.03,98.03,98.03,98.05,98.05,98.06,98.08,98.08,
98.09,98.10,98.11,98.11,98.11,98.11,98.11,98.11,98.12,98.12,98.12,
98.12,98.13,98.14,98.14,98.14,98.17,98.17,98.18,98.18,98.19,98.19,
98.20,98.20,98.21,98.21,98.21,98.21,98.22,98.25,98.26,98.26,98.28,
98.28,98.30,98.31,98.32,98.32,98.32,98.33,98.33,98.34,98.34,98.35,
98.35,98.36,98.36,98.36,98.36,98.37,98.38,98.39,98.42,98.42,98.43,
98.45,98.46,98.46,98.46,98.48,98.48,98.51,98.51,98.51,98.52,98.53,
98.53,98.54,98.54,98.55,98.55,98.55,98.56,98.56,98.57,98.57,98.57,
98.58,98.58,98.60,98.60,98.61,98.66,98.67,98.67,98.67,98.68,98.68,
98.69,98.70,98.72,98.72,98.72,98.73,98.74,98.77,98.78,98.79,98.79,
98.79,98.80,98.80,98.82,98.82,98.83,98.83,98.84,98.85,98.85,98.86,
98.87,98.87,98.89,98.89,98.90,98.92,98.92,98.93,98.97,98.98,98.98,
98.98,99.00,99.01,99.02,99.03,99.04,99.05,99.07,99.08,99.08,99.09,
99.09,99.11,99.13,99.13,99.14,99.15,99.15,99.16,99.20,99.24,99.27,
99.27,99.27,99.27,99.30,99.31,99.32,99.34,99.40,99.42,99.52,99.67,
99.67,99.87)

(a) At the 1% significance level, is there enough evidence to conclude that the mean body temperature is less than 98.6°F?

Hypothesis Testing: The researcher formulates the null hypothesis H0: μ = 98.6°F (the mean body temperature is not less than 98.6°F) and the alternative hypothesis H1: μ < 98.6°F. A left-tailed one-sample t-test at the 1% significance level gives t = (x̄ − 98.6)/(s/√200) ≈ −3.65, which is below the critical value t(0.01, 199) ≈ −2.345. H0 is therefore rejected: at the 1% level, there is enough evidence to conclude that the mean body temperature is less than 98.6°F.

(b) Find a 95% Confidence Interval for the mean body temperature according to the data.

Confidence Interval: To provide a more nuanced perspective, a 95% confidence interval for the mean body temperature is calculated as x̄ ± t(0.975, 199) · s/√200, giving a range of plausible values for the true population mean.

# Margin of error: t critical value (df = n - 1 = 199) times the standard error
moe <- qt(0.975, df = 199) * (sd(temp) / sqrt(200))
CI <- c(mean(temp) - moe, mean(temp) + moe)
CI

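Output: CI = (98.39917, 98.54003). Since 98.6°F lies above the upper limit of this interval, the interval also suggests that the population mean is below 98.6°F, consistent with part (a).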
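As a cross-check on parts (a) and (b), R's built-in t.test() can run the left-tailed test and produce the confidence interval directly. The sketch below re-enters the data so that it runs on its own; the names t_res, t_stat, t_crit, and ci are illustrative, not part of the original analysis.

```r
# Body temperatures (degrees F) of the 200 adults, copied from the data listing
temp <- c(97.10,97.19,97.25,97.46,97.52,97.63,97.63,97.63,97.70,97.71,97.71,
          97.72,97.74,97.75,97.75,97.76,97.78,97.81,97.81,97.82,97.83,97.89,
          97.90,97.90,97.91,97.92,97.92,97.93,97.96,97.96,97.98,98.00,98.01,
          98.01,98.02,98.02,98.03,98.03,98.03,98.05,98.05,98.06,98.08,98.08,
          98.09,98.10,98.11,98.11,98.11,98.11,98.11,98.11,98.12,98.12,98.12,
          98.12,98.13,98.14,98.14,98.14,98.17,98.17,98.18,98.18,98.19,98.19,
          98.20,98.20,98.21,98.21,98.21,98.21,98.22,98.25,98.26,98.26,98.28,
          98.28,98.30,98.31,98.32,98.32,98.32,98.33,98.33,98.34,98.34,98.35,
          98.35,98.36,98.36,98.36,98.36,98.37,98.38,98.39,98.42,98.42,98.43,
          98.45,98.46,98.46,98.46,98.48,98.48,98.51,98.51,98.51,98.52,98.53,
          98.53,98.54,98.54,98.55,98.55,98.55,98.56,98.56,98.57,98.57,98.57,
          98.58,98.58,98.60,98.60,98.61,98.66,98.67,98.67,98.67,98.68,98.68,
          98.69,98.70,98.72,98.72,98.72,98.73,98.74,98.77,98.78,98.79,98.79,
          98.79,98.80,98.80,98.82,98.82,98.83,98.83,98.84,98.85,98.85,98.86,
          98.87,98.87,98.89,98.89,98.90,98.92,98.92,98.93,98.97,98.98,98.98,
          98.98,99.00,99.01,99.02,99.03,99.04,99.05,99.07,99.08,99.08,99.09,
          99.09,99.11,99.13,99.13,99.14,99.15,99.15,99.16,99.20,99.24,99.27,
          99.27,99.27,99.27,99.30,99.31,99.32,99.34,99.40,99.42,99.52,99.67,
          99.67,99.87)

# (a) Left-tailed one-sample t-test of H0: mu = 98.6 against H1: mu < 98.6
t_res <- t.test(temp, mu = 98.6, alternative = "less")
t_stat <- unname(t_res$statistic)
t_crit <- qt(0.01, df = length(temp) - 1)   # 1% critical value, df = 199
cat("t =", round(t_stat, 3), " critical t =", round(t_crit, 3),
    " p-value =", signif(t_res$p.value, 3), "\n")
cat("Reject H0 at the 1% level:", t_stat < t_crit, "\n")

# (b) Two-sided 95% confidence interval; same formula as the manual moe-based CI
ci <- t.test(temp, conf.level = 0.95)$conf.int
print(round(as.numeric(ci), 5))
```

Because t.test() uses the same x̄ ± t·s/√n construction, its interval should agree with the manually computed CI.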

2: Predicting Vehicle Prices through Regression Analysis

Problem Description

As a vehicle owner contemplating selling a 2015 Subaru Forester, you decide to determine the optimal selling price based on comparable vehicles in your area. You collect data on the number of miles and their corresponding asking prices, aiming to establish a linear relationship between these variables.

Miles Price
122,498 $9,850
39,115 $16,989
71,292 $11,990
74,632 $13,490
81,594 $15,298
104,799 $12,699
114,317 $13,999
77,251 $14,749
77,618 $15,498
94,820 $12,598
79,475 $13,999

Table 1: Mileage and asking prices of comparable vehicles

(a) Run a linear regression to determine the line of best fit. What is the equation for the line of best fit?

Line of Best Fit: A linear regression analysis is employed to find the equation for the line of best fit, helping predict the price of a vehicle based on its mileage.

We can use the following R code snippet to generate the linear regression equation.

# Miles and asking-price data from Table 1
miles <- c(122498, 39115, 71292, 74632, 81594, 104799, 114317, 77251, 77618, 94820, 79475)
price <- c(9850, 16989, 11990, 13490, 15298, 12699, 13999, 14749, 15498, 12598, 13999)

# Fit the simple linear regression price = b0 + b1 * miles
regression <- lm(price ~ miles)

# Coefficients of the fitted line
intercept <- coef(regression)[1]
intercept
# intercept = 18878.58
slope <- coef(regression)[2]
slope
# slope = -0.06027808

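The fitted regression equation is Price = 18878.58 − 0.06027808 × Miles. The slope means that each additional 1,000 miles lowers the expected asking price by about $60.28.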
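The coefficients reported by lm() can be checked against the closed-form least-squares formulas, slope = cov(Miles, Price) / var(Miles) and intercept = mean(Price) − slope × mean(Miles). A minimal sketch reusing the Table 1 data; b0, b1, and fit are illustrative names.

```r
miles <- c(122498, 39115, 71292, 74632, 81594, 104799, 114317, 77251, 77618, 94820, 79475)
price <- c(9850, 16989, 11990, 13490, 15298, 12699, 13999, 14749, 15498, 12598, 13999)

# Closed-form ordinary least squares estimates
b1 <- cov(miles, price) / var(miles)   # slope
b0 <- mean(price) - b1 * mean(miles)   # intercept: the line passes through the two means

# Compare with the coefficients from lm()
fit <- lm(price ~ miles)
print(rbind(formula = c(intercept = b0, slope = b1),
            lm      = unname(coef(fit))))
```

The intercept formula also explains why the fitted line always passes through the point (mean miles, mean price).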

(b) Determine the correlation coefficient between the miles and price of these vehicles.

# Correlation coefficient
correlation <- cor(miles, price)
correlation
# correlation = -0.713828

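Correlation Coefficient: The correlation coefficient quantifies the strength and direction of the linear relationship between a vehicle's mileage and its asking price. A value of r ≈ −0.71 indicates a fairly strong negative relationship: vehicles with more miles tend to be listed at lower prices.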
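Two related quantities follow directly from r: for a simple linear regression, R² equals r², and cor.test() tests whether the population correlation differs from zero. A short sketch; fit, r, r_squared, and ct are illustrative names.

```r
miles <- c(122498, 39115, 71292, 74632, 81594, 104799, 114317, 77251, 77618, 94820, 79475)
price <- c(9850, 16989, 11990, 13490, 15298, 12699, 13999, 14749, 15498, 12598, 13999)

fit <- lm(price ~ miles)
r <- cor(miles, price)
r_squared <- summary(fit)$r.squared  # share of price variation explained by mileage
ct <- cor.test(miles, price)         # two-sided test of H0: rho = 0, df = n - 2 = 9

cat("r =", round(r, 6), " r^2 =", round(r_squared, 4),
    " t =", round(unname(ct$statistic), 3), " p-value =", signif(ct$p.value, 3), "\n")
```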

(c) According to the regression, at what price would you expect to sell your own 2015 Subaru Forester if it had 83,000 miles?

Price = 18878.58 − 0.06027808 × 83,000 = 18878.58 − 5003.08 ≈ $13,875.50

Price Prediction (83,000 miles): Using the fitted regression equation, you would expect to sell a 2015 Subaru Forester with 83,000 miles for about $13,875.50.

(d) According to the regression, at what price would you expect to sell your own 2015 Subaru Forester if it had 200,000 miles?

Price = 18878.58 − 0.06027808 × 200,000 = 18878.58 − 12055.62 ≈ $6,822.96

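Price Prediction (200,000 miles): The same equation predicts a selling price of about $6,822.96 for a vehicle with 200,000 miles. This figure should be treated with caution: 200,000 miles lies far outside the observed range of 39,115 to 122,498 miles, so the prediction is an extrapolation and the linear relationship may not hold there.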
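predict() reproduces both point predictions and can attach a 95% prediction interval for an individual vehicle. The interval is much wider at 200,000 miles, which reflects the extrapolation. A sketch; new_cars, pred, and outside are illustrative names.

```r
miles <- c(122498, 39115, 71292, 74632, 81594, 104799, 114317, 77251, 77618, 94820, 79475)
price <- c(9850, 16989, 11990, 13490, 15298, 12699, 13999, 14749, 15498, 12598, 13999)
fit <- lm(price ~ miles)

# Point predictions plus 95% prediction intervals for a single car
new_cars <- data.frame(miles = c(83000, 200000))
pred <- predict(fit, newdata = new_cars, interval = "prediction", level = 0.95)
print(cbind(new_cars, round(pred, 2)))

# Flag mileages outside the range used to fit the model (extrapolation)
outside <- new_cars$miles < min(miles) | new_cars$miles > max(miles)
print(data.frame(miles = new_cars$miles, extrapolation = outside))
```

The widening comes from the (x0 − x̄)² term in the prediction-interval formula, which grows as the new mileage moves away from the sample mean.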

3: Analyzing the Impact of Diet on BMI

Problem Description

A comprehensive study was conducted to determine the efficacy of different diets in maintaining a healthy lifestyle. Diets examined include Regular, Low Carb, Low Calorie, and Low Sodium. Participants were randomly assigned to diet groups, and their BMI categories were self-reported after six months.

Diet Underweight Healthy Overweight Obese Total
(BMI < 18.5) (18.5 - 24.9) (25.0 - 29.9) (BMI ≥ 30.0)
Regular 42 492 637 819 1,990
Low Carb 35 490 637 810 1,972
Low Calorie 76 532 629 848 2,085
Low Sodium 30 488 610 902 2,030
Total 183 2,002 2,513 3,379 8,077

Table 2: Self-reported BMI category after six months, by assigned diet

(a) At the 5% significance level, is there an association between diet and BMI?

Chi-square Test of Independence: To assess the association between diet and BMI, a chi-square test of independence is employed at a 5% significance level. The observed frequencies in each BMI category for each diet group are used to perform the test.

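obs <- matrix(c(42, 492, 637, 819,
                35, 490, 637, 810,
                76, 532, 629, 848,
                30, 488, 610, 902),
              nrow = 4, byrow = TRUE)   # rows: diets; columns: BMI categories
result <- chisq.test(obs)
result

Pearson's Chi-squared test
data: obs
X-squared = 33.481, df = 9, p-value = 0.0001101

Since the p-value (0.0001101) is below 0.05, we reject the null hypothesis of independence and conclude that there is an association between diet and BMI category.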
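A common rule of thumb is that the chi-square approximation is reliable when every expected count is at least 5, and the standardized residuals show which diet and BMI combinations depart most from independence. A sketch that re-enters Table 2 with labels; res is an illustrative name.

```r
obs <- matrix(c(42, 492, 637, 819,
                35, 490, 637, 810,
                76, 532, 629, 848,
                30, 488, 610, 902),
              nrow = 4, byrow = TRUE,
              dimnames = list(Diet = c("Regular", "Low Carb", "Low Calorie", "Low Sodium"),
                              BMI = c("Underweight", "Healthy", "Overweight", "Obese")))
res <- chisq.test(obs)

# Rule of thumb: all expected counts should be at least 5
cat("smallest expected count:", round(min(res$expected), 1), "\n")

# Standardized residuals; cells beyond roughly +/-2 contribute most to the association
print(round(res$stdres, 2))
```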

(b) What are the weaknesses of the study as described? How can they be addressed?

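Weaknesses and Potential Improvements: The study does not account for participants' initial BMI, so the six-month BMI categories reflect where participants started as well as any effect of the diet. Recording each participant's actual BMI at the start and again after six months would allow a paired samples t-test on the change in BMI for each participant, giving a more robust picture of each diet's impact. In addition, BMI categories were self-reported, which invites reporting bias; having BMI measured by study staff would address this.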

4: Assessing COVID Vaccine Efficacy

Problem Description

Pfizer, a pharmaceutical company, ran one of the earliest COVID-19 vaccine trials. Of 34,922 patients, 17,411 received the vaccine (treatment group) and 17,511 received a placebo (control group). The study aims to determine whether the vaccine is effective in preventing COVID-19. In the vaccine group, 8 patients contracted COVID-19, compared with 162 in the placebo group. At the 0.1% significance level, is there sufficient evidence to conclude that the vaccine is effective?

(a) Hypothesis Testing: A two-sample proportion Z test is used to compare the infection rates in the vaccine group (p1) and the placebo group (p2). The null hypothesis is H0: p1 = p2 (the vaccine has no effect), and the alternative is H1: p1 < p2 (the vaccine lowers the infection rate). Because H1 is one-sided, the test is left-tailed at the 0.1% significance level, and the calculated Z value is compared with the lower critical value.

The alpha level is 0.001, so the critical value for this left-tailed test is Z = −3.090232 (qnorm(0.001) in R).

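#R code
n1 <- 17411   # vaccine group size
n2 <- 17511   # placebo group size
x1 <- 8       # COVID-19 cases in the vaccine group
x2 <- 162     # COVID-19 cases in the placebo group

# Sample proportions of patients who contracted COVID-19
p1 <- x1 / n1
p1
# p1 = 0.000459
p2 <- x2 / n2
p2
# p2 = 0.00925

# Z statistic using the unpooled standard error
z <- (p1 - p2) / sqrt((p1*(1-p1)/n1) + (p2*(1-p2)/n2))
z
# z = -11.85702

Since z = −11.857 is far below the critical value of −3.090, we reject H0: at the 0.1% significance level there is sufficient evidence that the vaccine is effective.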
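The z statistic above uses the unpooled standard error. Textbook two-proportion tests often pool the proportions under H0 instead; R's prop.test() without continuity correction does this, and for two groups its chi-squared statistic is the square of the pooled z. A sketch of that cross-check; cases, sizes, res, and z_pooled are illustrative names.

```r
cases <- c(vaccine = 8, placebo = 162)
sizes <- c(vaccine = 17411, placebo = 17511)

# One-sided pooled test of H0: p1 = p2 vs H1: p1 < p2, no continuity correction
res <- prop.test(cases, sizes, alternative = "less", correct = FALSE)

# Recover the signed pooled z from the chi-squared statistic
z_pooled <- sign(res$estimate[[1]] - res$estimate[[2]]) * sqrt(unname(res$statistic))
cat("pooled z =", round(z_pooled, 3), " critical z =", round(qnorm(0.001), 6),
    " p-value =", signif(res$p.value, 3), "\n")
```

The pooled z differs slightly from the unpooled value but lies just as far beyond the critical value, so the decision is unchanged.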

(b) Efficacy Calculation: Here the efficacy of the vaccine is defined as the number of placebo-group patients who contracted COVID-19 divided by the total number of patients who contracted COVID-19:

Efficacy = 162 / (8 + 162) = 162/170 ≈ 95.29%
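The definition used above measures how COVID-19 cases split between the two arms. The measure more commonly reported for vaccine trials is one minus the relative risk, 1 − p1/p2; because the two groups here are almost the same size, the two numbers nearly coincide. A short sketch computing both (the variable names are illustrative):

```r
cases <- c(vaccine = 8, placebo = 162)
sizes <- c(vaccine = 17411, placebo = 17511)

share_placebo <- cases[["placebo"]] / sum(cases)   # definition used above: 162 / 170
attack_rate <- cases / sizes                        # proportion infected in each arm
ve_rr <- 1 - attack_rate[["vaccine"]] / attack_rate[["placebo"]]   # 1 - relative risk

cat(sprintf("placebo share of cases: %.2f%%\n", 100 * share_placebo))
cat(sprintf("1 - relative risk:      %.2f%%\n", 100 * ve_rr))
```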

5: Comparative Analysis of Heart Medications

Problem Description

Researchers compare the effectiveness of two heart medications, clopidogrel and ticagrelor, in preventing heart attacks or strokes. A total of 13,408 patients participated, with 6,676 receiving clopidogrel and 6,732 receiving ticagrelor. The number of patients suffering a heart attack or stroke is recorded for both groups.

(a) Hypothesis Testing: A two-sample proportion Z test is used to compare the proportion of patients who suffered a heart attack or stroke in the two medication groups. The null hypothesis (H0) assumes no difference between the proportions, while the alternative hypothesis (H1) states that the proportion is lower for ticagrelor. The test is left-tailed at the 1% significance level, so the critical value is Z = −2.326 (qnorm(0.01) in R).
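The event counts for the two medication groups are not listed in this section, so the sketch below only packages the test as a function; two_prop_z_less is a hypothetical helper name, and the counts must be supplied from the study data. It assumes the pooled standard error and a left-tailed alternative, with the 1% critical value qnorm(0.01) as the default.

```r
# Hypothetical helper: left-tailed pooled two-proportion z test (H1: p1 < p2)
two_prop_z_less <- function(x1, n1, x2, n2, alpha = 0.01) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  p_pool <- (x1 + x2) / (n1 + n2)                        # pooled proportion under H0
  se <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))  # standard error under H0
  z <- (p1 - p2) / se
  z_crit <- qnorm(alpha)                                 # about -2.326 for alpha = 0.01
  list(z = z, z_crit = z_crit, p_value = pnorm(z), reject = z < z_crit)
}

# Usage once the counts are known (group 1 = ticagrelor, group 2 = clopidogrel):
# two_prop_z_less(x1 = ticagrelor_events, n1 = 6732, x2 = clopidogrel_events, n2 = 6676)
```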

6: Astrological Signs and Driving Accidents

Problem Description

In a survey conducted by an insurance company, 100 participants were asked to disclose their astrological signs and the number of accidents they experienced as drivers since obtaining their licenses. The data is categorized by astrological sign.

Astrological Sign Number of Accidents
Aries 1, 1, 0, 1, 0, 0, 3, 1
Taurus 0, 0, 0, 0, 2, 1, 0, 3, 2
Gemini 0, 0, 1, 1, 0, 1, 0
Cancer 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0
Leo 1, 0, 1, 1
Virgo 1, 0, 2, 1, 0, 0, 0, 1, 0, 0, 1
Libra 1, 2, 0, 0, 0, 2, 3, 2, 0, 1
Scorpio 0, 0, 2, 1, 2, 1, 1, 1
Sagittarius 1, 4, 0, 0, 1
Capricorn 0, 2, 0, 0, 0, 2, 0, 2
Aquarius 0, 2, 1, 0, 2, 2, 0, 0, 1, 0, 1, 0, 1, 0
Pisces 1, 1, 0, 3, 0

Table 3: Number of accidents reported by each driver, grouped by astrological sign

(a) ANOVA Test: A one-way ANOVA test is used to determine if a person's astrological sign has a statistically significant impact on the number of accidents. The null hypothesis assumes that the average number of accidents is the same for all astrological signs, while the alternative hypothesis suggests at least one sign differs. The analysis is conducted at a 5% significance level.

Mean number of accidents by astrological sign (R output):

Aquarius Aries Cancer Capricorn Gemini Leo
0.714286 0.875 0.454545 0.75 0.428571 0.75
Libra Pisces Sagittarius Scorpio Taurus Virgo
1.1 1 1.2 1 0.888889 0.545455

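Conclusion:

The sample means range from about 0.43 accidents (Gemini) to 1.2 (Sagittarius), but these differences are small relative to the variation within each sign. The one-way ANOVA gives F ≈ 0.52 with 11 and 88 degrees of freedom; an F statistic below 1 corresponds to a p-value far above 0.05, so we fail to reject the null hypothesis. The data provide no evidence that a person's astrological sign affects the number of accidents.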
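The group means above can be produced with tapply(), and the F test itself can be run with aov(). A sketch that enters Table 3 in long format, one row per driver; accidents, drivers, fit, and tab are illustrative names.

```r
accidents <- list(
  Aries       = c(1, 1, 0, 1, 0, 0, 3, 1),
  Taurus      = c(0, 0, 0, 0, 2, 1, 0, 3, 2),
  Gemini      = c(0, 0, 1, 1, 0, 1, 0),
  Cancer      = c(0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0),
  Leo         = c(1, 0, 1, 1),
  Virgo       = c(1, 0, 2, 1, 0, 0, 0, 1, 0, 0, 1),
  Libra       = c(1, 2, 0, 0, 0, 2, 3, 2, 0, 1),
  Scorpio     = c(0, 0, 2, 1, 2, 1, 1, 1),
  Sagittarius = c(1, 4, 0, 0, 1),
  Capricorn   = c(0, 2, 0, 0, 0, 2, 0, 2),
  Aquarius    = c(0, 2, 1, 0, 2, 2, 0, 0, 1, 0, 1, 0, 1, 0),
  Pisces      = c(1, 1, 0, 3, 0)
)

# Long format: one row per driver (factor levels are sorted alphabetically)
drivers <- data.frame(
  sign = factor(rep(names(accidents), lengths(accidents))),
  accidents = unlist(accidents, use.names = FALSE)
)
print(round(tapply(drivers$accidents, drivers$sign, mean), 6))

# One-way ANOVA of accidents on astrological sign
fit <- aov(accidents ~ sign, data = drivers)
tab <- summary(fit)[[1]]
print(tab)
f_value <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]
cat("Reject H0 at the 5% level:", p_value < 0.05, "\n")
```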