A decision tree is a supervised learning algorithm used for both classification and regression problems, which means it can predict either a class label or a continuous value.
As we know, building a machine learning model involves several steps: collecting data, pre-processing the data, training the model (the phase where we feed data to the model), and finally a testing phase, where we evaluate the model on a number of input values.
A decision tree represents a function that takes a vector of input features and returns a decision as output. The goal of a decision tree based model is to predict the class or value of the target variable by learning simple decision rules inferred from previous data.
The target variable is the feature that the model is trying to predict or classify.
Types of Decision Tree
There are two types of decision tree:
1. Categorical variable decision tree
2. Continuous variable decision tree
We can understand these two terms with the help of the following example.
Suppose that, based on a student's previous academic record, we want to predict whether he/she will pass the next examination (YES/NO). This is a categorical variable decision tree, because the target feature takes no numeric values.
In the same example, if instead we want to predict the percentage that the student will achieve, it is a continuous variable decision tree.
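As a quick sketch of the difference, here is how the two cases might look in scikit-learn (the choice of library is my assumption, and the academic-record numbers below are made up purely for illustration):

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Made-up academic records: [attendance %, previous exam score]
X = [[60, 45], [85, 78], [40, 30], [90, 88], [70, 55]]

# Categorical target: pass (1) / fail (0)
y_class = [0, 1, 0, 1, 1]
clf = DecisionTreeClassifier(random_state=0).fit(X, y_class)
print(clf.predict([[80, 70]]))  # predicts a class label (0 or 1)

# Continuous target: percentage achieved in the next exam
y_reg = [48.0, 80.0, 35.0, 90.0, 60.0]
reg = DecisionTreeRegressor(random_state=0).fit(X, y_reg)
print(reg.predict([[80, 70]]))  # predicts a number
```

The same tree-building machinery handles both cases; only the type of the target variable (and the splitting criterion) changes.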
Algorithm
There are many algorithms for building decision trees, but ID3 is one of the most common.
ID3 stands for Iterative Dichotomiser 3; to dichotomise means to divide something (here, the data) into two completely opposite groups.
The algorithm iteratively divides the attributes into groups:
1. Calculate the entropy and information gain for each attribute.
2. Select the attribute with the highest information gain as the decision node, split the data on it, and repeat the process on each subgroup.
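The attribute-selection step above can be sketched in plain Python. (The toy dataset and function names here are my own, chosen for illustration; a full ID3 implementation would recurse on each subgroup.)

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels: -sum(p * log2(p))."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Information gain from splitting on attribute index `attr`."""
    total = len(labels)
    gain = entropy(labels)
    # Group the labels by the value this attribute takes in each row
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Subtract the weighted entropy of each subgroup
    for subset in groups.values():
        gain -= (len(subset) / total) * entropy(subset)
    return gain

# Toy data: attributes are [weather, studied?], target is pass/fail
rows = [["sunny", "yes"], ["rainy", "no"], ["sunny", "no"], ["rainy", "yes"]]
labels = ["pass", "fail", "fail", "pass"]

# ID3 picks the attribute with the highest information gain as the next node
best = max(range(2), key=lambda a: info_gain(rows, labels, a))
print(best)  # → 1 ("studied?" perfectly separates pass from fail)
```

Here "studied?" wins because splitting on it produces two pure subgroups (entropy 0 each), so its information gain equals the full parent entropy, while "weather" tells us nothing.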
Terminologies in Decision Tree
Root node: the topmost node of the tree, from which the first split is made.
Parent node: a node that is divided into sub-nodes.
Child node: a sub-node of a parent node.
Splitting: dividing a node into two or more sub-nodes.
Branch/Subgroup: a sub-section of the tree formed by splitting.
Leaf node: a terminal node that is not split further and holds the final decision.
Formulas
Entropy of a sample with P positive and N negative outcomes:
Entropy(S) = -(P/(P+N)) ㏒₂(P/(P+N)) - (N/(P+N)) ㏒₂(N/(P+N))
Equivalently, writing p = P/(P+N) and q = 1 - p:
Entropy(S) = -p㏒₂p - q㏒₂q
Information gain of an attribute A that splits the sample S into subsets Sᵥ:
Gain(S, A) = Entropy(S) - Σᵥ (|Sᵥ|/|S|) Entropy(Sᵥ)
where
P = positive outcomes
N = negative outcomes
P+N = total number of outcomes.
If the sample is completely homogeneous (all positive or all negative), the entropy is zero; if it is split evenly, the entropy is 1.
I hope you now have a better understanding of the Decision Tree algorithm and how it works.
If you have any doubts, let me know in the comment section.
Connect with us through the following link-
Thank you.