
Precision | Recall | Accuracy | F1 score | ML model evaluation parameters


A machine learning classification model is commonly evaluated using four metrics:
1. Accuracy
2. Recall
3. Precision
4. F1 score

To understand these four metrics, we first need to understand the confusion matrix.


Confusion matrix -

It is a two-dimensional matrix whose two dimensions are the actual results and the predicted results. It describes the performance of a machine learning classification model by counting how many predictions were true (correct) and how many were false (incorrect).

It contains four types of values.
To understand the terms below, let's take an example: suppose a person visits a doctor's clinic to find out whether he has a fever or is healthy. What are the possible outcomes?


TN (True Negative): the person does not have a fever, and the doctor predicts that he does not.
TP (True Positive): the person has a fever, and the doctor predicts that he does.
FP (False Positive): the person does not have a fever, but the doctor predicts that he does.
FN (False Negative): the person has a fever, but the doctor predicts that he does not.

With the four counts above (TN, TP, FP, FN), we can compute all four metrics.
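As a minimal sketch of how these four counts can be obtained, the snippet below tallies them from two lists of labels. The labels are made up for illustration (1 = fever, 0 = normal), and `confusion_counts` is a hypothetical helper name, not something from a library.

```python
# Hypothetical labels, made up for illustration: 1 = fever, 0 = normal
actual    = [1, 1, 1, 0, 0, 0, 1, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0]

def confusion_counts(actual, predicted):
    """Return (tn, tp, fp, fn) for binary labels where 1 is the positive class."""
    tn = tp = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1   # has fever, predicted fever
        elif a == 0 and p == 0:
            tn += 1   # no fever, predicted no fever
        elif a == 0 and p == 1:
            fp += 1   # no fever, but predicted fever
        else:
            fn += 1   # has fever, but predicted no fever
    return tn, tp, fp, fn

tn, tp, fp, fn = confusion_counts(actual, predicted)
print(tn, tp, fp, fn)  # 4 3 1 2
```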

Precision -

Precision is the ratio of True Positives to all predicted Positives. As the confusion matrix shows, predicted positives consist of True Positives and False Positives.
Formula - Precision = TP / (TP + FP)
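Here is a small sketch of the precision formula with made-up counts (TP = 40, FP = 10). Returning 0.0 when nothing is predicted positive is an assumed convention, since the formula itself is undefined in that case.

```python
def precision(tp, fp):
    """TP / (TP + FP). Returns 0.0 when there are no predicted positives (assumed convention)."""
    predicted_positives = tp + fp
    return tp / predicted_positives if predicted_positives else 0.0

# Hypothetical counts: 40 true positives, 10 false positives
print(precision(tp=40, fp=10))  # 0.8
```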

Recall -

Recall is the ratio of True Positives to all actual Positives. Actual positives consist of True Positives and False Negatives.
Formula - Recall = TP / (TP + FN)
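The recall formula can be sketched the same way, again with made-up counts (TP = 40, FN = 20). The 0.0 fallback for the case of no actual positives is an assumption, not part of the formula.

```python
def recall(tp, fn):
    """TP / (TP + FN). Returns 0.0 when there are no actual positives (assumed convention)."""
    actual_positives = tp + fn
    return tp / actual_positives if actual_positives else 0.0

# Hypothetical counts: 40 true positives, 20 false negatives
print(round(recall(tp=40, fn=20), 3))  # 0.667
```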

Accuracy -

It is the ratio of correct predictions (True Positives plus True Negatives) to the total number of observations.
Formula - Accuracy = (TP + TN) / (TP + TN + FN + FP)
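A short sketch of the accuracy formula follows. The counts (TP = 40, TN = 30, FP = 10, FN = 20) are hypothetical and chosen so that the total is 100 observations.

```python
def accuracy(tp, tn, fp, fn):
    """(TP + TN) / (TP + TN + FN + FP): the share of all predictions that were correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

# Hypothetical counts summing to 100 observations
print(accuracy(tp=40, tn=30, fp=10, fn=20))  # 0.7
```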

F1 score -

F1 score is the harmonic mean of Precision and Recall and can be calculated with the following formula.
F1 score = 2*(Recall * Precision) / (Recall + Precision)
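As a sketch, the formula can be applied to precision and recall values computed from hypothetical counts (TP = 40, FP = 10, FN = 20). Returning 0.0 when both inputs are zero is an assumed convention to avoid dividing by zero.

```python
def f1_score(precision, recall):
    """2 * (Recall * Precision) / (Recall + Precision); 0.0 if both are zero (assumed convention)."""
    if precision + recall == 0:
        return 0.0
    return 2 * (recall * precision) / (recall + precision)

p = 40 / (40 + 10)   # precision = TP / (TP + FP) = 0.8
r = 40 / (40 + 20)   # recall    = TP / (TP + FN), about 0.667
print(round(f1_score(p, r), 3))  # 0.727
```

Because this is a harmonic mean, the result sits closer to the smaller of the two values than a simple average would (0.727 here, versus a plain average of about 0.733).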

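Accuracy and F1 score both summarize model performance in a single number, but they suit different situations. Accuracy works well when the classes are roughly balanced and False Negatives and False Positives have similar costs. When the classes are imbalanced, accuracy can look high even if the model misses most of the positive cases; in that situation F1 score is usually the more useful metric for model selection, because it ignores True Negatives and reflects only how well the positive class is handled.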
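To see how the two metrics can disagree, consider a made-up case: 1000 patients, only 10 of whom have a fever, and a model that predicts "normal" for everyone. The sketch below uses a hypothetical `evaluate` helper and the assumed convention that a ratio with a zero denominator counts as 0.0.

```python
def evaluate(tp, tn, fp, fn):
    """Return (accuracy, recall, f1) from the four confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # assumed: 0.0 when undefined
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # assumed: 0.0 when undefined
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, recall, f1

# Hypothetical model that always predicts "normal": it never finds a fever case
accuracy, recall, f1 = evaluate(tp=0, tn=990, fp=0, fn=10)
print(accuracy, recall, f1)  # 0.99 0.0 0.0
```

The accuracy of 0.99 looks excellent, yet the model detects none of the fever cases, which the recall and F1 score of 0.0 make obvious.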

I hope you now have a better understanding of the terms used in machine learning model evaluation.
If you found this post helpful, share it with your friends. If you have any doubts, just drop a comment.

Thank You for reading this.
