Support vector machine is basically used for supervised learning means it deals with classification and regression and the dataset which is used with this, is labeled. It is one of the popular algorithm that is used for classification problem.This algorithm is powerful as well as clear for complex task.
Definition
SVM is a supervised algorithm that classifies cases by finding a separator and it involves two steps that are following.
1.Mapping data into a high dimensional feature space.
2.Finding a separator.
Above figure have three blocks. In first block model is taking dataset ,after then model will be trained on that dataset, in second phase, it will be tested for some inputs and at last we will get the result.
Support vector machine generally deals with two types of data, first is linear separable and next one is non separable data.
It is an example of non linear separable data.
We use kernalling to solve non linear problem.
It is nothing but conversion of non linear into linear by using polynomials, sigmoid function etc.
eg. non linear data with one variable can be easily converted in to linear separable by using following function.
f(x) = [ x : x^2 ]
In this figure data can be separated by a single straight line, that's why it is linear separable data.
some terms that are related with SVM are given below.
1. Hyperplane - Straight line which divides datasset into two clusters and separate the whole dataset in to two parts is know as hyperplane. It is also known as decision boundary.
2. Margin- two straight lines are drawn parallel to hyperplane. The distance between hyperplane and each parallel lines is known as margin. It is denoted by D- and D+ respectively. It is the distance of parallel lines.
3. Support vectors - two data points that are nearest to the parallel lines is known as support vectors.
Significance of Margin
We only take that model which have greater margin because models having large margin are more accurate and it will be good for future prediction.
distance between margin is directly proportional to accuracy.
Advantage
1. Accurate in high dimensional space
2. memory efficient.
Disadvantage
1. leads to over-fitting.
2. No probability estimation.
3. Useful for small datasets.
Applications
1. Image classification.
2. text mining.
3. hand writing recognition.
4. genes expression classification.
Implementation of SVM algorithm
here I am going to upload snapshots of code , if you guys want the whole project based on Support vector machine, just drop a comment I'll explain the whole project to you.
After running this script you'll get the value
0.9635036496350365 which is almost 1, means our model has approx 96 % accuracy that will be better for future prediction.
If you have any doubt, just drop a comment.
connect with us on -
https://www.instagram.com/kavyansh.pandey
Thank You.
👌 👌 👌 👌 👌 👌 👌 👌
ReplyDelete