## Problem Description:

This STATA assignment investigates the relationship between blood cholesterol levels and dietary choices, focusing on carbohydrate and fat intake. We test whether these variables are linearly related, with the aim of informing preventive measures against heart disease.

### Solution:

**Method**

### Part I: Data Overview

This section gives an overview of the dataset, which contains 15 variables. We create a new variable, "Cholesterol level," which groups cholesterol levels into low, medium, and high categories, and a labeled "Race" variable, then examine the distribution of the remaining variables. Some variables are approximately normally distributed, while others, such as "Annual Household Income," are not.
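
One way to check which variables are close to normal is sketched below. It assumes the dataset is already loaded and uses variable names from the script in this document (`dr2tchol`, `indhhin2`). The specific checks chosen here (overlaid normal curve, quantile plot, skewness/kurtosis test) are an illustration and not necessarily the ones used in the original analysis.

```stata
* Assumes cholesterol_Fall2022.dta is already in memory
* Histogram with a normal density curve overlaid
histogram dr2tchol, normal
* Normal quantile plot: points close to the diagonal suggest normality
qnorm dr2tchol
* Skewness/kurtosis test: a small p-value is evidence against normality
sktest dr2tchol indhhin2
* Percentiles, skewness, and kurtosis for a skewed variable
summarize indhhin2, detail
```

Variables that fail these checks are usually better described by the median and interquartile range than by the mean and standard deviation.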

### Part II: Hypothesis Testing

We test whether dietary carbohydrate intake is linearly related to cholesterol levels, and repeat the analysis for protein and total fat intake. Hypothesis tests are used to decide whether each relationship is statistically supported.
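
A test for a linear relationship can be sketched with correlation commands. This assumes the dataset is already loaded and uses the variable names from the script in this document. The choice of Pearson and Spearman correlations is an assumption made for illustration. In each case the null hypothesis is that the correlation is zero.

```stata
* Assumes the dataset is already in memory
* Pearson correlations with p-values (sig) and pairwise sample sizes (obs)
pwcorr dr2tchol dr2tcarb dr2tprot dr2ttfat, sig obs
* Rank-based alternative that is less sensitive to skewed intake data
spearman dr2tchol dr2tcarb, stats(rho p)
* Scatter plot with a fitted line to inspect linearity visually
twoway (scatter dr2tchol dr2tcarb) (lfit dr2tchol dr2tcarb)
```

A p-value below the chosen significance level (commonly 0.05) would lead to rejecting the null hypothesis of no linear association.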

### Part III: Regression Analysis

We fit regression models of total blood cholesterol on carbohydrate and total fat consumption. The R-squared values and p-values indicate how much of the variation in serum cholesterol these intake variables explain and whether their coefficients are statistically significant.
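
A minimal sketch of reading R-squared and p-values from a fitted model, assuming the dataset is already loaded and using the variable names from the script in this document. The stored results `e(r2)`, `e(r2_a)`, and `e(df_r)` are the standard ones left behind by `regress`.

```stata
* Assumes the dataset is already in memory
regress dr2tchol dr2tcarb dr2ttfat
* Share of the variance in dr2tchol explained by the model
display "R-squared:          " e(r2)
display "Adjusted R-squared: " e(r2_a)
* Two-sided p-value for the carbohydrate coefficient, from its t statistic
display "p-value (dr2tcarb): " 2*ttail(e(df_r), abs(_b[dr2tcarb]/_se[dr2tcarb]))
* Joint F test that both slopes are zero
test dr2tcarb dr2ttfat
```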

**Results**

### Part I: Data Insights

We report the mean, variance, median, and interquartile range of the variables to describe the distribution of key factors: gender, education level, cholesterol level, race, and household income.
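
These summary statistics can be produced in a single table with `tabstat`. The sketch below assumes the dataset is already loaded and uses the variable names from the script in this document. Which variables to summarize this way is an assumption for illustration.

```stata
* Assumes the dataset is already in memory
* Count, mean, variance, median, and interquartile range in one table
tabstat dr2tchol dr2tcarb dr2tprot dr2ttfat, statistics(n mean variance median iqr) columns(statistics)
* For categorical variables, frequencies are more informative than means
tabulate dmdeduc2, missing
* The same statistics for carbohydrate intake, broken down by gender
tabstat dr2tcarb, statistics(mean variance median iqr) by(riagendr)
```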

### Part II: Hypothesis Testing Outcomes

We present the results of our hypothesis tests, which indicate whether a linear relationship exists between cholesterol levels and each dietary intake variable (carbohydrates, protein, and total fat).

### Part III: Regression Analysis Findings

We report the regression results for total serum cholesterol against carbohydrate and total fat intake. These results indicate the extent to which these dietary factors are associated with serum cholesterol levels.

**Discussion**

We consider adding further variables to a multivariate model to better explain the relationship between the dependent and independent variables. We emphasize the importance of education level, and note that other variables, such as race and income, could also be included to give a more complete picture of the factors affecting blood cholesterol levels.
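
A multivariate model of this kind could be sketched as follows. It assumes the dataset is already loaded and that education, race, and income are stored as nonnegative integer category codes (`dmdeduc2`, `ridreth1`, `indhhin2`), so they can be entered with Stata's `i.` factor-variable prefix. Any codes that stand for refused or unknown responses would need to be set to missing first. This is an illustration and not a model fitted in the original analysis.

```stata
* Assumes the dataset is already in memory
* Dietary predictors plus categorical covariates entered as indicator sets
regress dr2tchol dr2tcarb dr2ttfat i.dmdeduc2 i.ridreth1 i.indhhin2
* Joint test of the education indicators
testparm i.dmdeduc2
* Joint test of the race indicators
testparm i.ridreth1
```

Comparing the adjusted R-squared of this model with that of the two-predictor model gives a rough indication of whether the extra covariates add explanatory power.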

**STATA Code**

```
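* Load the data (clear drops any dataset already in memory)
use "C:\Users\User\OneDrive\Desktop\Stata\cholesterol_Fall2022.dta", clear
* Cholesterol category; the 200-260 "Medium" band is assumed from the low/medium/high description
recode dr2tchol (0/200 = 0 "Low") (200/260 = 1 "Medium") (260/max = 2 "High"), generate(chol)
* Race labels assume the standard NHANES RIDRETH1 coding (1-5)
recode ridreth1 (1 = 1 "Mexican American") (2 = 2 "Other Hispanic") (3 = 3 "Non-Hispanic White") (4 = 4 "Non-Hispanic Black") (5 = 5 "Other"), generate(Race)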

* Histograms
histogram riagendr, width(1)
histogram Race, width(3)
histogram dmdeduc2, width(1)
histogram indhhin2, width(10)
histogram chol, width(1)

* Box plots by group
graph box seqn, over(riagendr)
graph box seqn, over(Race)
graph box seqn, over(dmdeduc2)
graph box seqn, over(indhhin2)
graph box seqn, over(chol)

* Mean and standard deviation for the normally distributed variables
histogram chol, width(1)
tabstat riagendr, statistics(mean sd) columns(variables)
tabstat ridreth1, statistics(mean sd) columns(variables)
tabstat dmdeduc2, statistics(mean sd) columns(variables)
tabstat chol, statistics(mean sd) columns(variables)
tabstat indhhin2, statistics(mean sd) columns(variables)
tabstat Race, statistics(mean sd) columns(variables)

* Non-normally distributed variable: detailed summary with percentiles
summarize indhhin2, detail

* Frequencies
tabulate dmdeduc2
tabulate indhhin2
tabulate Race
tabulate chol
tabulate riagendr

* Chi-square tests of association with cholesterol category
tabulate riagendr chol, chi2
tabulate dmdeduc2 chol, chi2
tabulate indhhin2 chol, chi2
tabulate Race chol, chi2

* Examine the distribution
histogram dr2tchol, width(70)
* Mean intake within each cholesterol category
tabulate chol, summarize(dr2tcarb)
tabulate chol, summarize(dr2tprot)
tabulate chol, summarize(dr2ttfat)

* Two-sample mean comparisons
ttest dr2tchol == dr2tcarb, unpaired
ttest dr2tchol == dr2tprot, unpaired
ttest dr2tchol == dr2ttfat, unpaired

* Linear models
regress dr2tchol dr2tcarb
regress dr2tchol dr2ttfat dr2tcarb
```
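
After the final model, a few standard postestimation checks can indicate whether the linear specification is reasonable. The sketch below assumes the dataset is already loaded and refits the last model from the script above. The diagnostics shown are an illustrative selection and were not part of the original analysis.

```stata
* Assumes the dataset is already in memory
regress dr2tchol dr2ttfat dr2tcarb
* Variance inflation factors: large values suggest collinear predictors
estat vif
* Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
estat hettest
* Residuals against fitted values; a patternless band supports linearity
rvfplot, yline(0)
```

Fat and carbohydrate intake tend to move together with overall food intake, so the collinearity check is particularly relevant for this pair of predictors.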