

Decision Tree | Machine learning | Supervised Learning Algorithm

A decision tree is a supervised learning algorithm used to solve classification and regression problems, which means it predicts either a category or a numeric value from labelled data.

As we know, building a machine learning model involves several steps: collecting data, pre-processing the data, training the model (in this phase, we give data to the model) and, finally, testing the model on a number of input values.

A decision tree represents a function that takes a vector of inputs and returns a decision as its output. The goal of a decision tree based machine learning model is to predict the class or value of the target variable by learning simple decision rules from previous data.
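To make the "function from a vector to a decision" idea concrete, here is a minimal sketch. The feature names, thresholds and the function `predict_pass` are hypothetical and written by hand; a real decision tree would learn such rules from data.

```python
# Hypothetical hand-written rules, for illustration only.
# A trained decision tree would learn the tests and thresholds from data.
def predict_pass(student):
    """Map a feature vector (attendance %, previous score %) to a decision."""
    attendance, previous_score = student
    if previous_score >= 50:        # first decision rule
        if attendance >= 75:        # second decision rule
            return "YES"
        return "NO"
    return "NO"


print(predict_pass((80, 65)))  # YES
print(predict_pass((60, 65)))  # NO
print(predict_pass((90, 40)))  # NO
```

Each `if` plays the role of one decision rule, and each `return` is the decision reached at the end of a path.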


The target variable is the feature that the model is trying to predict or classify.

Types of Decision Tree

There are two types of decision tree:
1. Categorical variable decision tree.
2. Continuous variable decision tree.

We can understand these two terms with the help of the following example.

Take a student. Based on his/her previous academic record, we want to predict whether he/she will pass the next examination or not (YES/NO). This is an example of a categorical variable decision tree, because the target feature takes category values rather than numeric ones.
But in the same example, if we want to predict the percentage that the student will achieve, it is an example of a continuous variable decision tree.
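The difference between the two types shows up in what the tree returns at the end of a path. The sketch below assumes the common convention that a categorical tree answers with the most frequent class among similar past records, while a continuous tree answers with their average. The helper names and the sample values are hypothetical.

```python
from collections import Counter
from statistics import mean


def categorical_prediction(labels):
    """Return the most common class among similar past records."""
    return Counter(labels).most_common(1)[0][0]


def continuous_prediction(values):
    """Return the average target value of similar past records."""
    return mean(values)


# Hypothetical results of past students with a similar academic record.
print(categorical_prediction(["YES", "YES", "NO"]))  # YES
print(continuous_prediction([70.0, 80.0, 90.0]))     # 80.0
```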

Algorithm

There are many approaches to building a decision tree, but ID3 is one of the most common decision tree algorithms.
ID3 stands for Iterative Dichotomiser 3. To dichotomise means to divide things (here, the data described by its features) into separate groups, and iterative means that this division is repeated.
At each step, the algorithm splits the data into subgroups based on the values of one attribute.
To choose that attribute, it calculates the entropy and information gain for each candidate attribute. The attribute with the highest information gain is split next and becomes a decision node of the tree.
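The steps above can be sketched in a few lines of Python. This is a simplified illustration of the ID3 idea, not a complete implementation: it assumes categorical attributes, uses no pruning, and the toy data and function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy before the split minus the weighted entropy after it."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    after = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - after


def id3(rows, labels, attrs):
    """Return a nested dict {attribute: {value: subtree or class label}}."""
    if len(set(labels)) == 1:      # homogeneous subgroup: make a leaf
        return labels[0]
    if not attrs:                  # nothing left to split on: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    tree = {best: {}}
    for value in sorted({row[best] for row in rows}):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        tree[best][value] = id3([r for r, _ in pairs],
                                [l for _, l in pairs], remaining)
    return tree


# Hypothetical toy data: does the student pass?
rows = [
    {"attendance": "high", "homework": "done"},
    {"attendance": "high", "homework": "missed"},
    {"attendance": "low", "homework": "done"},
    {"attendance": "low", "homework": "missed"},
]
labels = ["YES", "YES", "NO", "NO"]

tree = id3(rows, labels, ["attendance", "homework"])
print(tree)  # {'attendance': {'high': 'YES', 'low': 'NO'}}
```

In this toy data, attendance separates the outcomes perfectly, so it has the highest information gain and becomes the only decision node.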

Terminologies in Decision Tree 

Root node
Parent node
Child node
Splitting
Branch/Subgroup
Leaf node
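These terms can be mapped onto a small data structure. In the sketch below, the `Node` class and the example tree are hypothetical: the root node is the topmost node, a parent node is any node with nodes beneath it, those nodes are its child nodes, each key in `children` is a branch, and a leaf node has no children.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: dict = field(default_factory=dict)  # branch label -> child node

    def is_leaf(self):
        return not self.children


# Hypothetical tree for the pass/fail example.
root = Node("attendance")                 # root node (splitting starts here)
homework = Node("homework")               # child of root, parent of two leaves
root.children["high"] = homework          # "high" is a branch
root.children["low"] = Node("NO")         # leaf node
homework.children["done"] = Node("YES")   # leaf node
homework.children["missed"] = Node("NO")  # leaf node


def leaves(node):
    """Collect the names of all leaf nodes below (and including) node."""
    if node.is_leaf():
        return [node.name]
    return [name for child in node.children.values() for name in leaves(child)]


print(leaves(root))  # ['YES', 'NO', 'NO']
```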

Formulas

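Entropy = -p log₂(p) - q log₂(q)

where p = P / (P+N) (probability of a positive outcome)
      q = 1 - p = N / (P+N) (probability of a negative outcome)
      P = number of positive outcomes
      N = number of negative outcomes
    P+N = total number of outcomes.

Written in terms of the counts, the same entropy is:

Entropy = -(P/(P+N)) log₂(P/(P+N)) - (N/(P+N)) log₂(N/(P+N))

Information gain of an attribute = Entropy before the split - Σ ((Pᵢ+Nᵢ)/(P+N)) × Entropy of subgroup i

where Pᵢ and Nᵢ are the positive and negative outcomes in subgroup i created by the split.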
If the sample is completely homogeneous then entropy will be zero.
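As a numeric check, here is a small sketch that computes entropy from the counts P and N. It also computes information gain in the usual ID3 sense, that is, the entropy before a split minus the weighted entropy of the subgroups after it. The counts used are hypothetical example values.

```python
import math


def entropy(p_count, n_count):
    """Entropy in bits of a sample with P positive and N negative outcomes."""
    total = p_count + n_count
    result = 0.0
    for count in (p_count, n_count):
        if count:  # a class with zero outcomes contributes nothing
            p = count / total
            result -= p * math.log2(p)
    return result


def information_gain(parent, subgroups):
    """parent is (P, N); subgroups is a list of (P_i, N_i) after the split."""
    total = sum(parent)
    after = sum((p + n) / total * entropy(p, n) for p, n in subgroups)
    return entropy(*parent) - after


print(entropy(7, 0))             # 0.0 (completely homogeneous sample)
print(entropy(5, 5))             # 1.0 (evenly mixed sample)
print(round(entropy(9, 5), 3))   # 0.94

# Hypothetical split of 9 positive / 5 negative outcomes into two subgroups.
print(round(information_gain((9, 5), [(6, 2), (3, 3)]), 3))  # 0.048
```

A gain close to zero, as in the last line, means the split barely reduces the uncertainty about the outcome.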

I hope you now have a better understanding of the decision tree algorithm and how it works.

If you have any doubts, let me know in the comment section.

Thank you.
