This tutorial introduces the basics of machine learning and its different types of algorithms, and shows how understanding these basics can help you kick-start a career in data science.
Machine learning and data science are among the hottest buzzwords in the industry these days, and there is a lot of confusion about what exactly machine learning is. In this article I will try to explain what machine learning is and how it differs from conventional algorithms. I will also introduce the types of learning and where they are applicable. So let's start the voyage.
What is Machine Learning?
There are many definitions of machine learning, from the simplest to the most complex. Here is how I think of machine learning:
Machine learning is a system based on one or more algorithms that aims to improve with experience, using statistical computation as its base.
Now if that is too complex, let me try to explain. Machine learning is very similar to how we humans learn, which is from experience. In the case of machine learning, the experience is the data that we feed into the system for it to learn from.
There are three broad types of learning:
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
Supervised learning is a type of machine learning in which historical data, given along with the correct outcomes, guides us in creating a function that can generalize and predict the outcome for future inputs.
A wonderful example of supervised learning is how a mother teaches her child to speak. She demonstrates how to speak by giving a lot of example words, and tells the child which ones are spoken correctly and which are not. So, in a way, the mother supervises the whole speech training process by providing the child with data on what is right and what is wrong. This is broadly what supervised learning techniques do: using a lot of examples, they train the system to create a general model that learns to do the task.
Now the question most of us ask is: how is this different from conventional programming?
- Well, in conventional programming we assume that we know the relationship between the input and output variables, and we simply write a program that transforms the inputs into outputs. This is deduction: we already have a general rule, and we apply it to get outcomes for particular inputs. This type of programming cannot cope with input values that the general rule did not account for.
- To give an example, let us say we are writing a program to find the square of a number. In this case we already know how to calculate the square from the input number. Let us call this f(x) = y, where x is the input, y is the output and f is the function that converts x into y.
- In contrast to conventional programming, where we already know the relationship between input and output, in supervised learning we do not know the relationship. So part of the solution is to approximate it. Let us see this with an example where only data is given; X are the inputs and Y are the corresponding outputs.
X : 1 2 3
Y : 1 4 9
- We may be really tempted to say that it's easy and that we can figure out the function that converts X into Y with 100% accuracy. In this case we may say that the relationship is Y = X^2. Well, maybe. But then let us say that some more data comes in and your new dataset is:
X : 1 2 3 4 5 6 7
Y : 1 4 9 8 10 12 15
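To see concretely why the earlier guess stops working, the short sketch below checks the rule Y = X^2 against every point in the larger dataset. The function name `square_rule` and the variable names are illustrative choices, not part of the original example.

```python
# Data from the example above
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [1, 4, 9, 8, 10, 12, 15]


def square_rule(x):
    """The tempting guess: Y = X^2 (hypothetical helper for illustration)."""
    return x ** 2


# Collect the points where the guessed rule disagrees with the observed output
mismatches = [
    (x, y, square_rule(x))
    for x, y in zip(xs, ys)
    if square_rule(x) != y
]

for x, y, guess in mismatches:
    print(f"X={x}: observed Y={y}, but X^2 gives {guess}")
```

The rule matches the first three points and fails on the remaining four, so it cannot be the exact relationship for this data.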
- Now can we find a function that converts every X to its Y with 100% accuracy? Not so easy anymore, right? That is why we say we can only do function approximation: we look for the function that makes the least error when converting X to Y.
- This is also called induction, since we are generalizing a rule that explains the transformation of X into Y as best it can. Let us say our approximate function is Y = 2X. When we encounter a new, unseen value X = 8, we predict the outcome Y = 16. The actual outcome could be 17, so we would still be making an error of 1, but this is acceptable since we are only approximating the function.
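The idea of picking the rule with the least error can be sketched in a few lines. The example below compares the hand-picked rule Y = 2X with a straight line fitted by ordinary least squares. Least squares and the sum-of-squared-errors measure are assumptions made for illustration, since the article does not name a specific fitting method, and the function names are hypothetical.

```python
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [1, 4, 9, 8, 10, 12, 15]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def squared_error(predict, xs, ys):
    """Sum of squared differences between predictions and observed outputs."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys))


slope, intercept = fit_line(xs, ys)


def fitted(x):
    """Rule learned from the data (assumed to be a straight line)."""
    return slope * x + intercept


def doubled(x):
    """Hand-picked approximate rule from the text: Y = 2X."""
    return 2 * x


print("squared error of Y = 2X:", squared_error(doubled, xs, ys))
print("squared error of fitted line:", round(squared_error(fitted, xs, ys), 3))
print("Y = 2X predicts", doubled(8), "for X = 8")
print("fitted line predicts", round(fitted(8), 2), "for X = 8")
```

On this data the fitted line has a slightly smaller squared error than Y = 2X, but neither rule is exact, which is the point of function approximation.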
- Supervised learning is commonly used for classification problems, where you want to assign each input to one of a set of predetermined classes depending on the input variables, and for predicting numeric values, as in the X and Y example above.
- How well the model is trained is judged by how well it predicts the correct classes for new, unseen combinations of input values.
- In unsupervised learning, unlike supervised learning, we do not have the right answers for the historical data. We only have input variables, and we are looking to group items that are very similar to each other into clusters.
- The best example of this would be a child figuring out on his own that all things which are green at the top and connected to the ground by brown rectangular structures form one category, different from the multicolored boxes that are connected to the ground by four circular structures on which they move. This is unsupervised learning. Later on, the child labels the first category of objects as trees and the second as cars.
- Unsupervised learning is, in most cases, used for clustering problems, e.g. customer segmentation or grouping similar articles.
- Clustering is in most cases not the end but an intermediate step in the complete training process. Many times, mixed input data does not show trends crisp enough for us to categorize it accurately. In such cases clustering is used to segment the data into similar sets, and other techniques are then applied separately to each subset. This usually gives much better results.
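As a rough illustration of clustering without labels, the sketch below groups customers by a single number using a basic k-means loop. The choice of k-means, the starting centroids and the spend figures are all assumptions made for this example; the article does not prescribe a particular clustering algorithm or dataset.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group 1-D values around the given starting centroids (basic k-means)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so stop early
        centroids = new_centroids
    return centroids, clusters


# Hypothetical monthly spend per customer; note that no labels are provided
spend = [12, 15, 14, 90, 95, 88, 13, 92]

centroids, clusters = k_means_1d(spend, centroids=[0, 100])
print("cluster centres:", centroids)
print("clusters:", clusters)
```

The two groups that come out (low spenders and high spenders) were never labelled in the input. Each group could then be handed to a separate technique, as described above.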
- Reinforcement learning (also called reinforced learning) takes a different path. Here we learn by trial and error. Every successful trial gives positive feedback, and every failed trial gives negative feedback.
- The best example of this, again, is a child learning by making mistakes. Let's say the child sees a cup of steaming hot coffee on the floor. He crawls up to the cup and touches it (this is the trial). Immediately he knows this is a mistake because he burns his hand (this is the feedback). That sends negative feedback to his brain, associated with the situation: a cup on the floor filled with something that has steam coming out of it is not good. The next time, he sees a glass full of steaming hot water on the coffee table. The child crawls over and touches it again, and again negative feedback is sent to his brain. After several such trials the child learns the hard way that anything in a cup or glass with steam coming out of it is harmful and should not be touched. This is reinforcement learning.
- Reinforcement learning is used in categorization problems with delayed feedback and in environmental control problems.
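The trial-and-error loop from the steaming cup example can be sketched as follows. This is a minimal illustration, not a full reinforcement learning algorithm: the two actions, the feedback values, the learning rate and the occasional random exploration are all assumptions chosen for the example.

```python
import random


def feedback(action):
    """Hypothetical environment: touching a steaming cup hurts, avoiding it does not."""
    return -1.0 if action == "touch" else 0.0


def learn(trials=50, learning_rate=0.5, explore_prob=0.2, seed=0):
    """Estimate how good each action is from repeated trials and feedback."""
    rng = random.Random(seed)
    values = {"touch": 0.0, "avoid": 0.0}
    for _ in range(trials):
        if rng.random() < explore_prob:
            # Occasionally try a random action to keep exploring
            action = rng.choice(list(values))
        else:
            # Otherwise pick the action that currently looks best
            action = max(values, key=values.get)
        reward = feedback(action)
        # Nudge the estimate for this action toward the feedback just received
        values[action] += learning_rate * (reward - values[action])
    return values


values = learn()
print(values)
print("preferred action:", max(values, key=values.get))
```

Once touching has produced negative feedback, its estimated value drops below that of avoiding, so the learner ends up preferring to avoid the cup, much like the child in the example.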
I hope I was able to give a sufficient introduction to machine learning and the different types of learning. We will explore machine learning and different machine learning techniques in the next few posts.
Till then, happy learning!
To shift up a gear and learn more about machine learning, try reading through the following easy tutorials with examples…