Decision Tree | Machine learning | Supervised Learning Algorithm

April 20, 2020

A decision tree is a supervised learning algorithm used to solve classification and regression problems, which means it predicts either a class label or a numeric value.

As we know, building a machine learning model involves several steps: collecting data, pre-processing the data, training the model (in this phase, we give data to the model), and finally testing, where we evaluate the model on a number of input values.

A decision tree represents a function that takes input in the form of a vector and returns a decision as output. The goal of a decision-tree-based machine learning model is to predict the class or value of the target variable by learning simple decision rules from previous data.
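To make the idea of "a function from a vector to a decision" concrete, here is a minimal hand-written sketch. The feature names and thresholds (attendance of 75, previous score of 60) are made up for illustration; a real decision tree would learn such rules from data.

```python
def predict_pass(features):
    """Map a feature vector (attendance %, previous score %) to a decision."""
    attendance, previous_score = features
    # First rule: tested on every input (hypothetical threshold).
    if previous_score >= 60:
        # Second rule: only reached when the first rule holds.
        if attendance >= 75:
            return "YES"
        return "NO"
    # No further questions on this path: return the decision.
    return "NO"


print(predict_pass((80, 72)))  # YES
print(predict_pass((50, 72)))  # NO
print(predict_pass((90, 40)))  # NO
```

Each `if` plays the role of one simple decision rule, and each `return` is a final decision.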

The target variable is the feature whose value the model is trained to predict, either as a class or as a number.

Types of Decision Tree

There are two types of decision tree.

1. Categorical variable decision tree.

2. Continuous variable decision tree.

The following example illustrates both types.

Take a student. If, based on their previous academic record, we want to predict whether they will pass the next examination (YES/NO), that is a categorical variable decision tree, because the target takes one of a fixed set of labels rather than a numeric value.

If, for the same student, we instead want to predict the percentage they will score, that is a continuous variable decision tree.
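The difference between the two types shows up in what a leaf of the tree returns. The sketch below assumes two common choices that this post does not spell out: a majority vote for a categorical target and the mean for a continuous target. The sample values are invented.

```python
from collections import Counter
from statistics import mean


def classification_leaf(labels):
    """Categorical target: predict the most common class in the leaf."""
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(values):
    """Continuous target: predict the mean of the values in the leaf."""
    return mean(values)


# Hypothetical past outcomes of students that ended up in the same leaf.
past_results = ["YES", "YES", "NO", "YES"]
past_percentages = [72.0, 65.0, 48.0, 81.0]

print(classification_leaf(past_results))  # YES
print(regression_leaf(past_percentages))  # 66.5
```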

Algorithm

There are many decision tree algorithms, but ID3 is one of the most common.

ID3 stands for Iterative Dichotomiser 3. To dichotomise means to divide something (here, the data, by its features) into two completely opposite groups.

The algorithm works iteratively: at each step it picks one attribute and divides the data into groups according to that attribute's values (two groups when the attribute is binary).

At each step, it calculates the entropy and information gain for each attribute. The attribute with the highest information gain is split next and becomes a decision node of the tree.
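The steps above can be sketched in a few lines of Python. This is a simplified illustration, not a full ID3 implementation: it assumes categorical attributes stored as dictionaries, creates one branch per attribute value, and falls back to the majority class when no attributes remain. The `students` data is a made-up toy example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after the split."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after


def id3(rows, attributes, target):
    """Return a nested dict {attribute: {value: subtree}} or a class label."""
    labels = [r[target] for r in rows]
    # Stop when the node is pure or no attributes remain: predict the majority.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # The attribute with the highest information gain becomes the decision node.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree


# Hypothetical toy data: "studied" fully determines the outcome.
students = [
    {"attendance": "high", "studied": "yes", "passed": "YES"},
    {"attendance": "high", "studied": "no", "passed": "NO"},
    {"attendance": "low", "studied": "yes", "passed": "YES"},
    {"attendance": "low", "studied": "no", "passed": "NO"},
]
tree = id3(students, ["attendance", "studied"], "passed")
print(tree)  # {'studied': {'no': 'NO', 'yes': 'YES'}}
```

In this toy data, splitting on "attendance" leaves both groups evenly mixed (gain 0), while splitting on "studied" separates the outcomes completely (gain 1), so "studied" is chosen as the root.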

Terminologies in Decision Tree

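Root node: the topmost node, which represents the entire dataset before any split.

Parent node: a node that is split into further nodes.

Child node: a node produced by splitting a parent node.

Splitting: dividing a node into two or more child nodes based on an attribute.

Branch/Subgroup: a section of the tree reached by following one outcome of a split.

Leaf node: a node that is not split further and holds the final decision.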

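Formulas

Let P be the number of positive outcomes, N the number of negative outcomes, and P+N the total number of outcomes.

Entropy = -p log₂ p - q log₂ q

where p = P/(P+N) is the probability of a positive outcome and q = 1-p = N/(P+N).

Written in terms of the counts, this is the same quantity:

Entropy = -(P/(P+N)) log₂(P/(P+N)) - (N/(P+N)) log₂(N/(P+N))

Information gain of an attribute = entropy before the split - weighted average of the entropies of the subgroups after the split, where each subgroup is weighted by its share of the total outcomes.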

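If the sample is completely homogeneous (all outcomes are the same), the entropy is zero. If it is split evenly between positive and negative outcomes, the entropy is one.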
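Here is a small sketch that computes these quantities from the counts P and N. The function names and the sample counts are chosen for illustration only; `0 * log₂(0)` is treated as 0, which is the usual convention.

```python
from math import log2


def entropy(p_count, n_count):
    """Entropy in bits of a sample with P positive and N negative outcomes."""
    total = p_count + n_count
    result = 0.0
    for count in (p_count, n_count):
        if count:  # treat 0 * log2(0) as 0
            fraction = count / total
            result -= fraction * log2(fraction)
    return result


def information_gain(parent, subsets):
    """parent and each subset are (P, N) pairs; subsets partition the parent."""
    total = sum(parent)
    weighted = sum((p + n) / total * entropy(p, n) for p, n in subsets)
    return entropy(*parent) - weighted


print(entropy(10, 0))  # 0.0 (completely homogeneous)
print(entropy(5, 5))   # 1.0 (evenly mixed)
print(round(entropy(9, 5), 3))  # 0.94
print(information_gain((5, 5), [(5, 0), (0, 5)]))  # 1.0 (perfect split)
```

A split that produces only homogeneous subgroups removes all the entropy, so its information gain equals the entropy before the split.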

I hope you now have a better understanding of the decision tree algorithm and how it works.

If you have any doubts, let me know in the comments section.

Connect with us at the following link:

www.instagram.com/datasciencewithkp27

Thank you.
