Monday, March 7, 2016

Chi-Square Test of Independence

This analysis uses two variables from the Gapminder codebook: female employ rate and urban rate. The output was generated using SAS Studio.


Hypothesis formulation
Based on a review of the research, it appears that while more women have joined the labor force thanks to urbanization, they have encountered several cultural and social obstacles that have negatively impacted their employment rates. It would be interesting to study the association between urbanization and the female employment rate in the Gapminder dataset.

Hypothesis: Higher urbanization rates lead to lower female employment rates.

Chi-Square Test
A Chi-Square test of independence was used to test for an association between female employment rate and urbanization rate, based on the hypothesis above. Because both variables are continuous, each one was grouped into categories in a previous assignment. Those categories were then used in the chi-square test.

SAS Code
/** CHI SQUARE TEST OF INDEPENDENCE **/
PROC FREQ DATA=NEW;   
TABLES FemaleEmploymentRateGroup * UrbanizationRateGroup / CHISQ; 
TITLE 'Chi-Squared Test of Independence: Female Employment Rate and Urbanization Rate';
RUN;



Results
The contingency table shows:
The row percentages (Female Employment Rate Group, the response variable) appear to be higher than the column percentages (Urbanization Rate Group, the independent variable) for Female Employment Rate Groups 1, 2, 5 and 6. The reverse is true for rows 3, 4 and 5. The data suggest an inverse relationship between the two variables: the higher urbanization rate groups appear to have lower female employment rates.

The Statistics table shows a Chi-Square value of 52.58 with a p-value below 0.001, indicating a statistically significant association between these two variables.
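
One way to look more closely at the pattern behind this result, offered as a sketch rather than the analysis that was actually run here, is to ask PROC FREQ for expected counts and each cell's contribution to the chi-square statistic, and to print only column percentages so that each Urbanization Rate Group can be read as its own distribution. It assumes the same NEW data set and variable names as the code above; EXPECTED, CELLCHI2, NOROW and NOPERCENT are standard TABLES statement options.

```sas
/* Sketch: same test, with extra cell-level detail.                     */
/* Assumes the NEW data set and the grouped variables used above.       */
PROC FREQ DATA=NEW;
    TABLES FemaleEmploymentRateGroup * UrbanizationRateGroup
           / CHISQ        /* chi-square test of independence            */
             EXPECTED     /* expected count per cell under independence */
             CELLCHI2     /* each cell's contribution to chi-square     */
             NOROW        /* suppress row percentages                   */
             NOPERCENT;   /* suppress overall percentages               */
    TITLE 'Chi-Square Test with Expected Counts and Column Percentages';
RUN;
TITLE;  /* clear the title so it does not carry over to later output */
```

Cells with large chi-square contributions are the ones where the observed count departs most from what independence would predict.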












Post-hoc Tests for Chi Square Tests of Independence
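A significant overall Chi-Square test does not show which groups differ, so each pair of the 6 Urbanization Rate Groups is compared separately, giving 15 comparisons. To protect against a Type 1 error across these comparisons, the Bonferroni Adjustment is used: the null hypothesis is rejected only when the p-value falls below the significance level divided by the number of comparisons, i.e. 0.05/15 = 0.003333.
Below are the Chi-Square p-values generated from pairs of each of the 6 levels in the Urbanization Rate Group. Level 1 has the largest number of statistically significant results (p-values below the adjusted threshold), for the pairwise comparisons (1,4), (1,5) and (1,6).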
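
The arithmetic behind the adjusted threshold can be checked with a short DATA step. This is only an illustrative sketch: the variable names (levels, comparisons, alpha, adjusted_alpha) are made up for the example, and COMB is the built-in SAS function that counts combinations.

```sas
/* Sketch: Bonferroni-adjusted threshold for all pairs of 6 groups. */
/* Variable names are illustrative only.                            */
DATA _NULL_;
    levels = 6;                            /* Urbanization Rate Groups        */
    comparisons = COMB(levels, 2);         /* number of distinct pairs        */
    alpha = 0.05;                          /* unadjusted significance level   */
    adjusted_alpha = alpha / comparisons;  /* per-comparison threshold        */
    PUT comparisons= adjusted_alpha= 8.6;  /* write both values to the log    */
RUN;
```

With 6 levels there are 15 distinct pairs, which is where the divisor in 0.05/15 comes from.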



SAS Code:

/* Each block keeps two Urbanization Rate Groups and reruns the chi-square test on that pair */
DATA COMPARISON1;
    SET WORK.NEW;
    IF UrbanizationRateGroup = '1' OR UrbanizationRateGroup = '2';
RUN;
PROC FREQ DATA=COMPARISON1;
    TABLES FemaleEmploymentRateGroup * UrbanizationRateGroup / CHISQ;
RUN;

DATA COMPARISON2;
    SET WORK.NEW;
    IF UrbanizationRateGroup = '1' OR UrbanizationRateGroup = '3';
RUN;
PROC FREQ DATA=COMPARISON2;
    TABLES FemaleEmploymentRateGroup * UrbanizationRateGroup / CHISQ;
RUN;

DATA COMPARISON3;
    SET WORK.NEW;
    IF UrbanizationRateGroup = '1' OR UrbanizationRateGroup = '4';
RUN;
PROC FREQ DATA=COMPARISON3;
    TABLES FemaleEmploymentRateGroup * UrbanizationRateGroup / CHISQ;
RUN;

DATA COMPARISON4;
    SET WORK.NEW;
    IF UrbanizationRateGroup = '1' OR UrbanizationRateGroup = '5';
RUN;
PROC FREQ DATA=COMPARISON4;
    TABLES FemaleEmploymentRateGroup * UrbanizationRateGroup / CHISQ;
RUN;

DATA COMPARISON5;
    SET WORK.NEW;
    IF UrbanizationRateGroup = '1' OR UrbanizationRateGroup = '6';
RUN;
PROC FREQ DATA=COMPARISON5;
    TABLES FemaleEmploymentRateGroup * UrbanizationRateGroup / CHISQ;
RUN;

DATA COMPARISON6;
    SET WORK.NEW;
    IF UrbanizationRateGroup = '2' OR UrbanizationRateGroup = '3';
RUN;
PROC FREQ DATA=COMPARISON6;
    TABLES FemaleEmploymentRateGroup * UrbanizationRateGroup / CHISQ;
RUN;

DATA COMPARISON7;
    SET WORK.NEW;
    IF UrbanizationRateGroup = '2' OR UrbanizationRateGroup = '4';
RUN;
PROC FREQ DATA=COMPARISON7;
    TABLES FemaleEmploymentRateGroup * UrbanizationRateGroup / CHISQ;
RUN;

DATA COMPARISON8;
    SET WORK.NEW;
    IF UrbanizationRateGroup = '2' OR UrbanizationRateGroup = '5';
RUN;
PROC FREQ DATA=COMPARISON8;
    TABLES FemaleEmploymentRateGroup * UrbanizationRateGroup / CHISQ;
RUN;

DATA COMPARISON9;
    SET WORK.NEW;
    IF UrbanizationRateGroup = '2' OR UrbanizationRateGroup = '6';
RUN;
PROC FREQ DATA=COMPARISON9;
    TABLES FemaleEmploymentRateGroup * UrbanizationRateGroup / CHISQ;
RUN;

DATA COMPARISON10;
    SET WORK.NEW;
    IF UrbanizationRateGroup = '3' OR UrbanizationRateGroup = '4';
RUN;
PROC FREQ DATA=COMPARISON10;
    TABLES FemaleEmploymentRateGroup * UrbanizationRateGroup / CHISQ;
RUN;

DATA COMPARISON11;
    SET WORK.NEW;
    IF UrbanizationRateGroup = '3' OR UrbanizationRateGroup = '5';
RUN;
PROC FREQ DATA=COMPARISON11;
    TABLES FemaleEmploymentRateGroup * UrbanizationRateGroup / CHISQ;
RUN;

DATA COMPARISON12;
    SET WORK.NEW;
    IF UrbanizationRateGroup = '3' OR UrbanizationRateGroup = '6';
RUN;
PROC FREQ DATA=COMPARISON12;
    TABLES FemaleEmploymentRateGroup * UrbanizationRateGroup / CHISQ;
RUN;

DATA COMPARISON13;
    SET WORK.NEW;
    IF UrbanizationRateGroup = '4' OR UrbanizationRateGroup = '5';
RUN;
PROC FREQ DATA=COMPARISON13;
    TABLES FemaleEmploymentRateGroup * UrbanizationRateGroup / CHISQ;
RUN;

DATA COMPARISON14;
    SET WORK.NEW;
    IF UrbanizationRateGroup = '4' OR UrbanizationRateGroup = '6';
RUN;
PROC FREQ DATA=COMPARISON14;
    TABLES FemaleEmploymentRateGroup * UrbanizationRateGroup / CHISQ;
RUN;

DATA COMPARISON15;
    SET WORK.NEW;
    IF UrbanizationRateGroup = '5' OR UrbanizationRateGroup = '6';
RUN;
PROC FREQ DATA=COMPARISON15;
    TABLES FemaleEmploymentRateGroup * UrbanizationRateGroup / CHISQ;
RUN;
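
The fifteen comparisons above follow one pattern, so they could also be generated by a short macro. The following is a sketch only, not the code used to produce the results in this post. The macro name PAIRWISE_CHISQ and its levels parameter are hypothetical, and it assumes, as the code above does, that UrbanizationRateGroup is a character variable coded '1' through '6' in WORK.NEW.

```sas
/* Sketch: run the chi-square test for every pair of urbanization groups. */
/* PAIRWISE_CHISQ and its LEVELS parameter are hypothetical names.        */
%MACRO PAIRWISE_CHISQ(levels=6);
    %LOCAL i j;
    %DO i = 1 %TO %EVAL(&levels - 1);
        %DO j = %EVAL(&i + 1) %TO &levels;
            PROC FREQ DATA=WORK.NEW;
                /* keep only the two groups being compared */
                WHERE UrbanizationRateGroup IN ("&i", "&j");
                TABLES FemaleEmploymentRateGroup * UrbanizationRateGroup / CHISQ;
                TITLE "Post hoc comparison: Urbanization Rate Group &i vs &j";
            RUN;
        %END;
    %END;
    TITLE;  /* clear the title after the last comparison */
%MEND PAIRWISE_CHISQ;

%PAIRWISE_CHISQ(levels=6);
```

Filtering with a WHERE statement inside PROC FREQ avoids creating fifteen intermediate data sets. Each p-value is still compared against the Bonferroni-adjusted threshold of 0.05/15 by hand.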









