R Tutorial : Intermediate Data Analysis – Part 2

This tutorial builds on the previous R tutorial and arms you with a few more fundamental analysis tools for capturing the essence of a dataset. These go a long way toward understanding the data better.

Continued from the previous post, R : Basic Data Analysis – Part 1

Please use the dataset: WHO (open the XLS and save it as CSV for the rest of the discussion).
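
For readers starting here, loading the saved file might look like the sketch below. The file name WHO.csv and its location in the working directory are assumptions, so adjust the path to wherever you saved your export.

```r
# Assumption: the spreadsheet was exported as "WHO.csv" in the working
# directory. Change this path if your file lives elsewhere.
path <- "WHO.csv"

if (file.exists(path)) {
  WHO <- read.csv(path)

  # Since R 4.0, read.csv keeps text columns as character vectors,
  # so convert Region to a factor explicitly.
  WHO$Region <- as.factor(WHO$Region)

  # Inspect column names, types, and the first few values.
  str(WHO)
} else {
  message("File not found: ", path)
}
```

The table and tapply calls in the rest of the post work whether Region is a factor or a character column, but converting it makes the levels explicit.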


How to Create Summary Tables

  • To get to know the data better, we need to dig deeper by computing the sum, count, mean, etc. over the levels of a factor or over ranges of a numerical variable.
  • Let's say we want to know how many countries are in each region, using the Region variable, which is a categorical variable (factor). We can do this with the simple table command.
table(WHO$Region)

Africa              Americas Eastern Mediterranean 
  46                    35                    22 
Europe       South-East Asia       Western Pacific 
  53                    11                    27
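  • Let's make it a little more complex. Say we want to know how many countries in each region have a population above 30000. Passing a second argument to table gives a two-way table, with the regions as rows and the result of the comparison as columns.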

table(WHO$Region,WHO$Population > 30000)

                        FALSE TRUE
  Africa                   38    8
  Americas                 29    6
  Eastern Mediterranean    16    6
  Europe                   44    9
  South-East Asia           6    5
  Western Pacific          22    5
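
If you do not have the WHO file at hand, the two uses of table above can be tried on a small made-up data frame. The sketch below uses a hypothetical data frame called toy with invented values, so the counts are illustrative only.

```r
# Toy data standing in for the WHO data frame (values are made up).
toy <- data.frame(
  Region     = c("Africa", "Africa", "Europe", "Europe", "Europe"),
  Population = c(12000, 45000, 8000, 64000, 31000)
)

# One-way table: number of rows per region.
region_counts <- table(toy$Region)
print(region_counts)

# Two-way table: regions as rows, the logical test as columns.
by_size <- table(toy$Region, toy$Population > 30000)
print(by_size)

# Individual cells can be looked up by row and column label.
print(by_size["Europe", "TRUE"])
```

The column labels of the two-way table are the strings "FALSE" and "TRUE", which is why the last line indexes with a quoted "TRUE".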

  • The limitation of the table command is that it only gives you counts. To get a mean, sum or standard deviation per group we need another function, tapply. It takes the values to summarise, the grouping variable and the function to apply, and is very similar to pivot tables in Excel.
  • Let's say we want the mean life expectancy of each region
tapply(WHO$LifeExpectancy,WHO$Region,mean)

 Africa              Americas Eastern Mediterranean 
57.95652              74.34286              69.59091 
 Europe       South-East Asia       Western Pacific 
76.73585              69.36364              72.33333 
  • Now let's say I want the standard deviation of life expectancy for countries whose population is above or below 30000, ignoring any missing values. Extra arguments given to tapply are passed on to the function, so na.rm=TRUE is handed to sd
tapply(WHO$LifeExpectancy,WHO$Population > 30000,sd,na.rm=TRUE)

FALSE     TRUE 
9.372043 8.823353 
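
To see why the missing-value handling matters, here is a minimal sketch on a made-up data frame (the name toy and all values are invented for illustration). It compares tapply with and without na.rm = TRUE.

```r
# Toy data standing in for the WHO data frame (values are made up).
toy <- data.frame(
  Region         = c("Africa", "Africa", "Europe", "Europe", "Europe"),
  LifeExpectancy = c(55, 61, 78, NA, 80)
)

# Without na.rm, any group that contains an NA yields NA.
means_raw <- tapply(toy$LifeExpectancy, toy$Region, mean)
print(means_raw)

# Arguments after the function are passed on to it, so na.rm = TRUE
# reaches mean() and the missing value is dropped before averaging.
means_clean <- tapply(toy$LifeExpectancy, toy$Region, mean, na.rm = TRUE)
print(means_clean)
```

The same pattern applies to sd, sum, median and other summary functions that accept na.rm.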

With this brief introduction to peeking into the data, we are ready to go.
See you next time.
