Monday, March 14, 2016

Pearson Correlation

This analysis uses two variables from the Gapminder codebook: female employment rate (femaleemployrate) and urban rate (urbanrate). The output was generated using SAS Studio.


Hypothesis formulation
Based on a review of the research, it appears that although urbanization has brought more women into the labor force, they have encountered cultural and social obstacles that have negatively affected their employment rates. It would therefore be interesting to study the correlation between urbanization and the female employment rate in the Gapminder dataset. Hypothesis: Higher urbanization rates are associated with lower female employment rates.

Calculating the Pearson Correlation

Code:
/** DETERMINING THE CORRELATION COEFFICIENT **/
PROC CORR DATA=work.newdata;
          VAR urbanrate femaleemployrate;
RUN;

Results:

For the association between urban rate and female employment rate, the correlation coefficient is approximately -0.303, with a p-value < 0.0001. The association is therefore modestly negative and statistically significant, so it is unlikely to be due to chance alone.
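To reuse the coefficient in later steps rather than reading it off the printed table, one option is to save the PROC CORR statistics to a data set. The sketch below assumes the same work.newdata input; the data set names work.corr_out and work.r_value are hypothetical. Note that the OUTP= data set holds the means, standard deviations, counts, and correlations, but not the p-values.

```sas
/* Sketch: same analysis, with OUTP= saving the statistics to a data set. */
/* work.corr_out and work.r_value are hypothetical names.                 */
PROC CORR DATA=work.newdata OUTP=work.corr_out NOPRINT;
          VAR urbanrate femaleemployrate;
RUN;

/* Keep the correlation row for urbanrate; its femaleemployrate column holds r. */
DATA work.r_value;
          SET work.corr_out;
          WHERE _TYPE_ = 'CORR' AND UPCASE(_NAME_) = 'URBANRATE';
          r = femaleemployrate;
          KEEP r;
RUN;

PROC PRINT DATA=work.r_value;
RUN;
```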

Squaring the correlation coefficient gives the fraction of the variability in one variable that can be predicted from the other.

R^2 = (-0.303)^2 ≈ 0.0918

Therefore, if we know the urban rate, we can predict approximately 9.2% of the variability in the female employment rate. The remaining 91% or so of the variability is not accounted for by the urban rate.
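The arithmetic above can be checked with a short DATA step. This is only a sketch that plugs in the rounded coefficient reported earlier (-0.303); the variable names r, r_squared, and unexplained are illustrative and do not come from the dataset.

```sas
/* Sketch: square the rounded coefficient and compute the remaining share. */
DATA _NULL_;
          r = -0.303;                    /* rounded value from the PROC CORR output */
          r_squared = r**2;              /* share of variability predicted */
          unexplained = 1 - r_squared;   /* share not accounted for by urban rate */
          PUT r_squared= 6.4 unexplained= 6.4;
RUN;
```

The values written to the log should be about 0.0918 and 0.9082, in line with the 9.2% and roughly 91% figures quoted above.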

The following scatter plot, with a fitted regression line, shows the association between urban rate and female employment rate:

/** SCATTER PLOT WITH FITTED REGRESSION LINE **/
PROC SGPLOT DATA=work.newdata;
          REG X=urbanrate Y=femaleemployrate;
RUN;



