This post uses three variables from the Gapminder codebook: female employment rate (femaleemployrate), overall employment rate (employrate) and urbanization rate (urbanrate). The output was generated with SAS Studio.
Hypothesis formulation
Based on a review of the research, it appears that while urbanization has drawn more women into the labor force, they have also encountered cultural and social obstacles that have negatively affected their employment rates. It would therefore be interesting to study the correlation between urbanization and the female employment rate in the Gapminder data set. Hypothesis: higher urbanization rates lead to lower female employment rates.
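The hypothesis is framed around the correlation between urbanrate and femaleemployrate. The analysis in this post works in SAS and stops at descriptive statistics, so the following is only a minimal Python sketch, under my own assumptions, of how Pearson's correlation coefficient r could be computed while dropping countries that are missing either value. The function `pearson_r` and the sample values are hypothetical, not Gapminder data.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r for paired values; pairs with a missing (None) value are skipped."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical urban and female employment rates for five countries (None = missing).
urban = [20.0, 40.0, None, 60.0, 80.0]
female_employ = [70.0, 55.0, 50.0, 45.0, None]
print(round(pearson_r(urban, female_employ), 3))
```

A negative r would be consistent with the hypothesis, although a correlation on its own cannot show that urbanization causes lower female employment.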
The program below uses PROC MEANS to compute the following descriptive statistics:
- N: number of non-missing values
- NMISS: number of missing values
- MEAN: arithmetic mean
- MEDIAN: middle value
- MODE: most frequent value
- MIN: minimum value
- MAX: maximum value
- STD: standard deviation
- RANGE: maximum minus minimum
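To make the list above concrete, here is a hedged Python sketch (not part of the original SAS workflow) of how each statistic is computed, with None standing in for a SAS missing value. Two choices are assumptions intended to mirror PROC MEANS defaults as I understand them: the standard deviation uses the n - 1 divisor, and when several values tie for the mode the smallest one is returned. The function name `describe` is hypothetical.

```python
import math
from collections import Counter


def describe(values):
    """Rough analogue of the PROC MEANS statistics above; None marks a missing value."""
    present = sorted(v for v in values if v is not None)
    n = len(present)
    if n < 2:
        raise ValueError("need at least two non-missing values")
    mean = sum(present) / n
    mid = n // 2
    median = present[mid] if n % 2 else (present[mid - 1] + present[mid]) / 2
    counts = Counter(present)
    top = max(counts.values())
    mode = min(v for v, c in counts.items() if c == top)  # smallest of any tied modes
    std = math.sqrt(sum((v - mean) ** 2 for v in present) / (n - 1))  # sample SD
    return {
        "N": n, "NMISS": len(values) - n, "MEAN": mean, "MEDIAN": median,
        "MODE": mode, "MIN": present[0], "MAX": present[-1], "STD": std,
        "RANGE": present[-1] - present[0],
    }


print(describe([1.0, 2.0, None, 2.0, 5.0]))
```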
Each of the three variables takes many distinct values over a wide range, which makes frequency tables of the raw values hard to read. To simplify the analysis, the values are grouped (binned) into six categories with IF-THEN statements. The minimum, maximum and standard deviation of each variable served as a guide when choosing the cut-off points. The first IF-THEN statement for each variable also excludes observations with missing data ('^= .'): SAS treats a missing numeric value as smaller than any number, so without this check missing values would be placed in group 1.
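The binning rule can be expressed as a small function. This is a Python sketch, not the SAS code itself: `bin_rate` and `EMPLOY_CUTOFFS` are hypothetical names, None plays the role of a SAS missing value, and the default cutoffs are the ones used for femaleemployrate and urbanrate (employrate uses 35, 45, 55, 65 and 75). Each group covers the interval above the previous cutoff up to and including the next one, matching the > and <= comparisons in the IF-THEN statements.

```python
def bin_rate(value, cutoffs=(15, 30, 45, 60, 75)):
    """Return group "1".."6" for a rate, or None when the value is missing."""
    if value is None:
        return None  # same effect as the '^= .' test: missing values get no group
    for group, upper in enumerate(cutoffs, start=1):
        if value <= upper:  # group covers (previous cutoff, upper]
            return str(group)
    return str(len(cutoffs) + 1)  # everything above the last cutoff


EMPLOY_CUTOFFS = (35, 45, 55, 65, 75)
print([bin_rate(v) for v in (11.3, 47.55, 83.3, None)])  # ['1', '4', '6', None]
print(bin_rate(58.7, EMPLOY_CUTOFFS))  # 4
```

Looping over an ordered tuple of cutoffs makes the intervals mutually exclusive by construction, which is the same guarantee the paired > and <= conditions give in the SAS version.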
/** USING SAS STUDIO **/
/** Import the saved CSV file from the folder **/
PROC IMPORT DATAFILE="\\fsp5800\users\maryt\Desktop\gapminder.csv"
    DBMS=CSV              /** data source identifier **/
    OUT=work.newdata;     /** the output SAS data set **/
    GETNAMES=YES;         /** takes SAS variable names from the first record of the import file **/
RUN;

/** BINNING VARIABLES: execute statements only for observations that meet certain conditions **/
DATA NEW;                 /** creates the new SAS output data set **/
    SET work.newdata;     /** reads observations from the imported SAS data set **/
    KEEP femaleemployrate employrate urbanrate   /** original variables kept in the output data set **/
         FemaleEmploymentRateGroup
         EmploymentRateGroup
         UrbanizationRateGroup;                  /** new (secondary) grouping variables **/

    /** FemaleEmploymentRateGroup: '^= .' excludes missing data **/
    if (femaleemployrate ^= . & femaleemployrate <= 15) then FemaleEmploymentRateGroup = "1";
    else if (femaleemployrate > 15 & femaleemployrate <= 30) then FemaleEmploymentRateGroup = "2";
    else if (femaleemployrate > 30 & femaleemployrate <= 45) then FemaleEmploymentRateGroup = "3";
    else if (femaleemployrate > 45 & femaleemployrate <= 60) then FemaleEmploymentRateGroup = "4";
    else if (femaleemployrate > 60 & femaleemployrate <= 75) then FemaleEmploymentRateGroup = "5";
    else if (femaleemployrate > 75) then FemaleEmploymentRateGroup = "6";

    /** EmploymentRateGroup **/
    if (employrate ^= . & employrate <= 35) then EmploymentRateGroup = "1";
    else if (employrate > 35 & employrate <= 45) then EmploymentRateGroup = "2";
    else if (employrate > 45 & employrate <= 55) then EmploymentRateGroup = "3";
    else if (employrate > 55 & employrate <= 65) then EmploymentRateGroup = "4";
    else if (employrate > 65 & employrate <= 75) then EmploymentRateGroup = "5";
    else if (employrate > 75) then EmploymentRateGroup = "6";

    /** UrbanizationRateGroup **/
    if (urbanrate ^= . & urbanrate <= 15) then UrbanizationRateGroup = "1";
    else if (urbanrate > 15 & urbanrate <= 30) then UrbanizationRateGroup = "2";
    else if (urbanrate > 30 & urbanrate <= 45) then UrbanizationRateGroup = "3";
    else if (urbanrate > 45 & urbanrate <= 60) then UrbanizationRateGroup = "4";
    else if (urbanrate > 60 & urbanrate <= 75) then UrbanizationRateGroup = "5";
    else if (urbanrate > 75) then UrbanizationRateGroup = "6";
RUN;

/** Exploring the data using PROC MEANS to produce descriptive statistics **/
PROC MEANS DATA=NEW N NMISS MEAN MODE MEDIAN MIN MAX STD RANGE;
    VAR femaleemployrate employrate urbanrate;   /** the three numeric variables of interest **/
RUN;

/** Frequency tables for the new grouping variables **/
PROC FREQ DATA=NEW;
    TABLES FemaleEmploymentRateGroup EmploymentRateGroup UrbanizationRateGroup;
RUN;

The MEANS Procedure

Variable | N | N Miss | Mean | Mode | Median | Minimum | Maximum | Std Dev | Range |
---|---|---|---|---|---|---|---|---|---|
femaleemployrate | 178 | 35 | 47.5494381 | 42.0999985 | 47.5499992 | 11.3000002 | 83.3000031 | 14.6257425 | 72.0000029 |
employrate | 178 | 35 | 58.6359551 | 47.2999992 | 58.6999989 | 32.0000000 | 83.1999969 | 10.5194545 | 51.1999969 |
urbanrate | 203 | 10 | 56.7693596 | 100.0000000 | 57.9400000 | 10.4000000 | 100.0000000 | 23.8449326 | 89.6000000 |
Female employment rate (femaleemployrate):
- 178 of the 213 countries have female employment data; the remaining 35 are missing.
- The highest female employment rate is 83.3%, in Burundi, and the lowest is 11.3%, in the West Bank and Gaza. The range is therefore 72.0 percentage points, and the standard deviation is 14.6.
- The mean female employment rate across the 178 countries is 47.5%.
- The most frequent value (mode) is 42.1%.
- The largest group is FemaleEmploymentRateGroup = 4, with 75 countries (42.1% of non-missing observations), i.e. female employment rates above 45% and up to 60%. This is the most common band, but it is not a majority.
- The second-largest group is group 3 (above 30% and up to 45%), with 55 countries, or 30.9% of the observations.
Employment rate (employrate):
- 178 of the 213 countries have overall employment data; the remaining 35 are missing.
- The highest employment rate is 83.2%, in both Burundi and Uganda, and the lowest is 32%, in the West Bank and Gaza. The range is therefore 51.2 percentage points, and the standard deviation is 10.5.
- The mean employment rate across the 178 countries is 58.6%.
- The most frequent value (mode) is 47.3%.
- The largest group is EmploymentRateGroup = 4, with 76 countries (42.7% of non-missing observations), i.e. employment rates above 55% and up to 65%. Again, this is the most common band rather than a majority.
- The second-largest group is group 3 (above 45% and up to 55%), with 41 countries, or 23.0% of the observations.
Urbanization rate (urbanrate):
- 203 of the 213 countries have urbanization data; the remaining 10 are missing.
- The highest urban rate is 100%, shared by 6 countries: Hong Kong / China, Singapore, Macao / China, Cayman Islands, Monaco and Bermuda. The lowest is 10.4%, in Burundi. The range is therefore 89.6 percentage points, and the standard deviation is 23.8.
- The mean urban rate across the 203 countries is 56.8%.
- The most frequent value (mode) is 100%.
- The largest group is UrbanizationRateGroup = 5, with 50 countries (24.6% of non-missing observations), i.e. urban rates above 60% and up to 75%. No single band comes close to a majority.
- The second-largest group is group 6 (above 75%), with 48 countries, or 23.7% of the observations.
The FREQ Procedure
FemaleEmploymentRateGroup | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
1 | 3 | 1.69 | 3 | 1.69 |
2 | 15 | 8.43 | 18 | 10.11 |
3 | 55 | 30.90 | 73 | 41.01 |
4 | 75 | 42.13 | 148 | 83.15 |
5 | 21 | 11.80 | 169 | 94.94 |
6 | 9 | 5.06 | 178 | 100.00 |

Frequency Missing = 35

EmploymentRateGroup | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
1 | 2 | 1.12 | 2 | 1.12 |
2 | 17 | 9.55 | 19 | 10.67 |
3 | 41 | 23.03 | 60 | 33.71 |
4 | 76 | 42.70 | 136 | 76.40 |
5 | 28 | 15.73 | 164 | 92.13 |
6 | 14 | 7.87 | 178 | 100.00 |

Frequency Missing = 35

UrbanizationRateGroup | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
1 | 5 | 2.46 | 5 | 2.46 |
2 | 30 | 14.78 | 35 | 17.24 |
3 | 35 | 17.24 | 70 | 34.48 |
4 | 35 | 17.24 | 105 | 51.72 |
5 | 50 | 24.63 | 155 | 76.35 |
6 | 48 | 23.65 | 203 | 100.00 |

Frequency Missing = 10
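The Percent and Cumulative columns in these tables, and the point that the largest group is not necessarily a majority, can be checked with a short Python sketch. It is an illustration only: `freq_table` and `largest_group` are hypothetical helpers, percentages are taken over non-missing observations (the same denominator the tables use), and the group labels are assumed to be the single-character strings "1" to "6" so that they sort correctly.

```python
def freq_table(counts):
    """Rows of (group, frequency, percent, cumulative frequency, cumulative percent)."""
    total = sum(counts.values())  # non-missing observations only
    rows, cumulative = [], 0
    for group in sorted(counts):  # "1".."6" sort correctly as strings
        cumulative += counts[group]
        rows.append((group, counts[group], round(100 * counts[group] / total, 2),
                     cumulative, round(100 * cumulative / total, 2)))
    return rows


def largest_group(counts):
    """Return the most frequent group, its share, and whether that share is a majority."""
    group = max(counts, key=counts.get)
    share = counts[group] / sum(counts.values())
    return group, share, share > 0.5


# Frequencies copied from the FemaleEmploymentRateGroup table above.
female_counts = {"1": 3, "2": 15, "3": 55, "4": 75, "5": 21, "6": 9}
for row in freq_table(female_counts):
    print(row)
print(largest_group(female_counts))
```

With these counts, group 4 holds 75 of 178 observations (about 42%), the largest share but short of a majority.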