Hypothesis formulation
Based on a review of the research, it appears that although urbanization has drawn more women into the labor force, they have also encountered cultural and social obstacles that have depressed their employment rates. This analysis therefore examines the relationship between urbanization rate and female employment rate in the Gapminder dataset. Hypothesis: Higher urbanization rates lead to lower female employment rates.
Summary of findings:
- The “Class Level Information” table lists the variable in the CLASS statement, its levels, and the number of observations in the data set. There are 6 levels; 213 observations were read and 173 were used, because the 40 observations with a missing value for femaleemployrate or UrbanizationRateGroup were excluded from the analysis.
- The model degrees of freedom (DF) for a one-way analysis of variance are the number of levels minus 1; in this case, 6 - 1 = 5.
- The Corrected Total degrees of freedom are the number of observations used minus 1; in this case, 173 - 1 = 172. The Error degrees of freedom are 172 - 5 = 167, so the Model and Error degrees of freedom sum to the Corrected Total.
- The overall F test is significant (F = 6.29; p < 0.0001), indicating that the model as a whole accounts for a significant portion of the variability in the female employment rate (R-Square = 0.158, or about 16% of the variance).
- The F test for the urbanization rate groups is significant, indicating that at least one contrast between the urbanization rate group means is different from zero. The null hypothesis that all group means are equal is rejected.
- The Model and UrbanizationRateGroup F tests are identical because UrbanizationRateGroup is the only term in the model.
- The F test for UrbanizationRateGroup (F = 6.29; p < 0.0001) indicates that the mean female employment rate differs among the urbanization rate groups, but it does not reveal which groups differ. To gather that information, a mean comparison method is used: Tukey’s multiple comparisons test for pairwise differences between the urbanization rate group means.
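- Tukey’s test shows significant differences in the mean female employment rate between:
  - Urbanization rate group 1 and groups 4, 5, and 6
  - Urbanization rate group 2 and groups 4, 5, and 6
- In each of these pairs, group 1 or group 2 has the higher mean. No other pairwise comparison is significant at the 0.05 level.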
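The degrees-of-freedom bookkeeping and the F statistic described above can be checked against the ANOVA table in the output. This is a sketch using the standard one-way ANOVA formulas; the symbols k (number of groups, 6) and N (observations used, 173) are introduced here only for this check, and values are rounded to the precision SAS prints.

$$
\begin{aligned}
df_{\text{Model}} &= k - 1 = 6 - 1 = 5 \\
df_{\text{Total}} &= N - 1 = 173 - 1 = 172 \\
df_{\text{Error}} &= N - k = 173 - 6 = 167 = 172 - 5 \\
MS_{\text{Model}} &= \frac{SS_{\text{Model}}}{df_{\text{Model}}} = \frac{5948.15795}{5} = 1189.63159 \\
MS_{\text{Error}} &= \frac{SS_{\text{Error}}}{df_{\text{Error}}} = \frac{31599.31337}{167} \approx 189.21745 \\
F &= \frac{MS_{\text{Model}}}{MS_{\text{Error}}} = \frac{1189.63159}{189.21745} \approx 6.29
\end{aligned}
$$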
/** USING PROC ANOVA FOR ONE-WAY ANALYSIS OF VARIANCE **/
/** PROC ANOVA is used when the independent variable is categorical and the dependent variable is continuous.
Independent variable: UrbanizationRateGroup (categorized in the Binning/Grouping assignment)
Dependent variable: femaleemployrate (female employment rate, continuous variable)
**/
PROC ANOVA DATA=NEW;
CLASS UrbanizationRateGroup; /** defines the categorical variable **/
MODEL femaleemployrate = UrbanizationRateGroup; /** defines the dependent variable & effects **/
RUN;
/** Using the TUKEY option of the MEANS statement to find which group means differ **/
PROC ANOVA DATA=NEW;
CLASS UrbanizationRateGroup;
MODEL femaleemployrate = UrbanizationRateGroup;
MEANS UrbanizationRateGroup / TUKEY; /** Tukey's studentized range (HSD) test **/
RUN;
Urbanization Rate Vertical Bar Chart
The ANOVA Procedure
Class Level Information
Class | Levels | Values |
---|---|---|
UrbanizationRateGroup | 6 | 1 2 3 4 5 6 |
Number of Observations Read | 213 |
---|---|
Number of Observations Used | 173 |
Dependent Variable: femaleemployrate
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|
Model | 5 | 5948.15795 | 1189.63159 | 6.29 | <.0001 |
Error | 167 | 31599.31337 | 189.21745 | ||
Corrected Total | 172 | 37547.47132 |
R-Square | Coeff Var | Root MSE | femaleemployrate Mean |
---|---|---|---|
0.158417 | 28.80604 | 13.75563 | 47.75260 |
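The fit statistics in this row follow directly from the ANOVA table. A sketch using the standard definitions (with \bar{y} the femaleemployrate mean), rounded to SAS's printed precision:

$$
\begin{aligned}
R^2 &= \frac{SS_{\text{Model}}}{SS_{\text{Corrected Total}}} = \frac{5948.15795}{37547.47132} \approx 0.1584 \\
\text{Root MSE} &= \sqrt{MS_{\text{Error}}} = \sqrt{189.21745} \approx 13.7556 \\
\text{Coeff Var} &= 100 \times \frac{\text{Root MSE}}{\bar{y}} = 100 \times \frac{13.75563}{47.75260} \approx 28.806
\end{aligned}
$$

So urbanization rate group membership accounts for roughly 15.8% of the variance in female employment rate, even though the F test is highly significant.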
Source | DF | Anova SS | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|
UrbanizationRateGroup | 5 | 5948.157954 | 1189.631591 | 6.29 | <.0001 |
Tukey's Studentized Range (HSD) Test for femaleemployrate
This test controls the Type I experimentwise error rate.
Alpha | 0.05 |
---|---|
Error Degrees of Freedom | 167 |
Error Mean Square | 189.2174 |
Critical Value of Studentized Range | 4.07731 |
Comparisons significant at the 0.05 level are indicated by ***.
UrbanizationRateGroup Comparison | Difference Between Means | Simultaneous 95% Lower Confidence Limit | Simultaneous 95% Upper Confidence Limit | Significant |
---|---|---|---|---|
1 - 2 | 13.615 | -7.742 | 34.972 | |
1 - 3 | 20.462 | -0.648 | 41.572 | |
1 - 6 | 25.280 | 4.459 | 46.102 | *** |
1 - 4 | 26.208 | 5.098 | 47.318 | *** |
1 - 5 | 27.537 | 6.845 | 48.229 | *** |
2 - 1 | -13.615 | -34.972 | 7.742 | |
2 - 3 | 6.847 | -3.893 | 17.586 | |
2 - 6 | 11.665 | 1.504 | 21.826 | *** |
2 - 4 | 12.593 | 1.854 | 23.333 | *** |
2 - 5 | 13.922 | 4.030 | 23.815 | *** |
3 - 1 | -20.462 | -41.572 | 0.648 | |
3 - 2 | -6.847 | -17.586 | 3.893 | |
3 - 6 | 4.818 | -4.813 | 14.449 | |
3 - 4 | 5.747 | -4.493 | 15.987 | |
3 - 5 | 7.076 | -2.272 | 16.423 | |
6 - 1 | -25.280 | -46.102 | -4.459 | *** |
6 - 2 | -11.665 | -21.826 | -1.504 | *** |
6 - 3 | -4.818 | -14.449 | 4.813 | |
6 - 4 | 0.928 | -8.703 | 10.559 | |
6 - 5 | 2.257 | -6.419 | 10.934 | |
4 - 1 | -26.208 | -47.318 | -5.098 | *** |
4 - 2 | -12.593 | -23.333 | -1.854 | *** |
4 - 3 | -5.747 | -15.987 | 4.493 | |
4 - 6 | -0.928 | -10.559 | 8.703 | |
4 - 5 | 1.329 | -8.019 | 10.677 | |
5 - 1 | -27.537 | -48.229 | -6.845 | *** |
5 - 2 | -13.922 | -23.815 | -4.030 | *** |
5 - 3 | -7.076 | -16.423 | 2.272 | |
5 - 6 | -2.257 | -10.934 | 6.419 | |
5 - 4 | -1.329 | -10.677 | 8.019 |
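The confidence limits in this table can be read with the standard Tukey-Kramer interval, which SAS applies for the TUKEY option when group sizes are unequal (the interval half-widths differ across comparisons here, so they are). The group sizes n_i and n_j are not printed in this output, so this sketch only shows the form of the interval and how the *** flag follows from it; q and MS_Error are taken from the Tukey settings table.

$$
(\bar{y}_i - \bar{y}_j) \pm \frac{q_{0.05;\,6,\,167}}{\sqrt{2}} \sqrt{MS_{\text{Error}}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)},
\qquad q_{0.05;\,6,\,167} = 4.07731, \quad MS_{\text{Error}} = 189.2174
$$

Each interval is centered on the difference between means; for 1 - 6, (4.459 + 46.102)/2 ≈ 25.28. A comparison is marked *** exactly when its simultaneous interval excludes 0: 1 - 6 spans [4.459, 46.102] and is significant, while 1 - 3 spans [-0.648, 41.572] and is not.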