# R Tutorial : Intermediate Data Analysis – Part 2

This R tutorial builds on the previous one and arms you with a few more fundamental analysis tools for capturing the essence of your data: counting rows per category and computing summary statistics per group.

Continued from the previous post, R : Basic Data Analysis – Part 1

Please use the dataset : WHO ( open the XLS and save it as CSV for the rest of the discussion )
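A minimal sketch of the loading step. The file name `WHO.csv` mentioned in the comment is an assumption, since the post does not say what the saved file is called. So that the snippet runs without the download, it first writes a tiny made-up stand-in file with the three columns used in this post.

```r
# Stand-in for the real file: three made-up rows using the column names
# that appear in this post (Region, Population, LifeExpectancy).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Country,Region,Population,LifeExpectancy",
  "A,Africa,1200,55",
  "B,Europe,45000,79",
  "C,Europe,800,81"
), csv_path)

# With the real data this would be something like read.csv("WHO.csv")
# (hypothetical file name). stringsAsFactors = TRUE turns text columns
# such as Region into factors; since R 4.0.0 the default is FALSE.
WHO <- read.csv(csv_path, stringsAsFactors = TRUE)

str(WHO)            # column types at a glance
levels(WHO$Region)  # the categories of the Region factor
```

table and tapply also work on plain character columns, so the factor conversion is optional here; it simply matches the description of Region as a factor.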

How to create summary tables

• To get to know the data better, we need to dig deeper by computing the sum, count, mean, etc. over the levels of a factor or over ranges of a numerical variable.
• Let's say we want to know how many countries are in each region, using the Region variable, which is a categorical variable (factor). We can do this with the simple table command.
```
table(WHO$Region)

               Africa              Americas Eastern Mediterranean 
                   46                    35                    22 
               Europe       South-East Asia       Western Pacific 
                   53                    11                    27 
```
• Let's make it a little more complex. Say we want to know how many countries in each region have a population above 30000. Passing a second, logical argument to table splits each region's count into FALSE and TRUE columns.

```
table(WHO$Region, WHO$Population > 30000)

                        FALSE TRUE
  Africa                   38    8
  Americas                 29    6
  Eastern Mediterranean    16    6
  Europe                   44    9
  South-East Asia           6    5
  Western Pacific          22    5
```
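Two small additions often help when reading a two-way table like this one: row and column totals, and shares instead of raw counts. The sketch below uses a made-up five-row data frame called `toy` (not the WHO file) so that it runs on its own; the dimension labels `Region` and `Above30000` are names chosen for illustration.

```r
# Made-up toy data, not the WHO dataset.
toy <- data.frame(
  Region     = factor(c("Africa", "Africa", "Europe", "Europe", "Europe")),
  Population = c(1200, 45000, 800, 61000, 35000)
)

# Naming the arguments labels the two dimensions of the table.
counts <- table(Region = toy$Region, Above30000 = toy$Population > 30000)
counts

# Row and column totals added around the counts.
addmargins(counts)

# Share within each region: margin = 1 makes every row sum to 1.
prop.table(counts, margin = 1)
```

The same calls should work on the real data by swapping `toy` for `WHO`, assuming the column names match those used in this post.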

• The limitation of the table command is that it only gives you counts. To get a mean, sum or standard deviation per group we need another function called tapply. tapply is very similar to pivot tables in Excel: it takes the values, the grouping variable and the function to apply to each group.
• Let's say we want the mean life expectancy of each region.
```
tapply(WHO$LifeExpectancy, WHO$Region, mean)

               Africa              Americas Eastern Mediterranean 
             57.95652              74.34286              69.59091 
               Europe       South-East Asia       Western Pacific 
             76.73585              69.36364              72.33333 
```
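To make the pivot-table comparison concrete, here is a minimal sketch on a made-up four-row data frame (`toy`, not the WHO file). It computes group means with tapply and then repeats the calculation by hand with split and sapply, which is conceptually what tapply does: split the values by group, then apply the function to each piece.

```r
# Made-up toy data, not the WHO dataset.
toy <- data.frame(
  Region         = factor(c("Africa", "Africa", "Europe", "Europe")),
  LifeExpectancy = c(50, 60, 78, 82)
)

# tapply(values, groups, function)
means <- tapply(toy$LifeExpectancy, toy$Region, mean)
means

# The same result by hand: split the values by region,
# then take the mean of each piece.
by_hand <- sapply(split(toy$LifeExpectancy, toy$Region), mean)
by_hand
```

Any function that reduces a vector to a single number (sum, median, min, max and so on) can take the place of mean.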
• Now let's say we want the standard deviation of life expectancy for countries whose population is above 30000 and for those whose population is not, ignoring any missing values. Extra arguments given after the function name are passed on to it, so na.rm = TRUE tells sd to drop missing values.
```
tapply(WHO$LifeExpectancy, WHO$Population > 30000, sd, na.rm = TRUE)

   FALSE     TRUE 
9.372043 8.823353 
```
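The effect of ignoring missing values is easiest to see on data that actually has one. The sketch below uses a made-up six-row data frame (`toy`, not the WHO file) with a single NA, and compares tapply with and without `na.rm = TRUE`.

```r
# Made-up toy data with one missing life expectancy.
toy <- data.frame(
  Population     = c(1200, 45000, 800, 61000, 35000, 500),
  LifeExpectancy = c(55, 70, NA, 80, 75, 65)
)
big <- toy$Population > 30000

# Without na.rm, a group that contains an NA comes back as NA.
sd_raw <- tapply(toy$LifeExpectancy, big, sd)
sd_raw

# Arguments after the function are forwarded to it,
# so na.rm = TRUE reaches sd() and the NA is dropped.
sd_clean <- tapply(toy$LifeExpectancy, big, sd, na.rm = TRUE)
sd_clean
```

Here the FALSE group holds the NA, so only that group changes between the two calls; the TRUE group has no missing values and gives the same result either way.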

With this short introduction to peeking into the data, we are ready to go.
See you next time.