A Support Vector Machine (SVM) is a powerful and versatile machine learning model that can perform linear or nonlinear classification, regression, and even outlier detection. In this notebook, we will learn about the support vector machine technique and how to apply it in scikit-learn. We will also learn about Principal Component Analysis (PCA) and how to use it with scikit-learn.

Many practitioners favor support vector machines because they often achieve high accuracy at a relatively modest computational cost on small and medium-sized datasets. SVMs can be used for both regression and classification, but they are most commonly applied to classification.

The goal of the support vector machine technique is to find a hyperplane in an N-dimensional space (N = the number of features) that clearly separates the data points of the different classes. As the figure above shows, many hyperplanes could separate the two groups of data points. Our goal is to find the plane with the maximum margin, i.e. the maximum distance between the nearest data points of the two classes. Maximizing the margin provides some reinforcement, so that future data points can be classified with more confidence.
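To make the margin concrete, here is a minimal sketch on a hypothetical six-point toy dataset (not from the original). With a linear kernel and a very large C, which approximates a hard margin, the fitted hyperplane is w·x + b = 0 and the margin width is 2/‖w‖. For this data the widest gap lies between the points at x = -1 and x = 1, so the margin should come out close to 2.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data: class 0 on the left, class 1 on the right
X = np.array([[-2.0, 0.0], [-1.0, 1.0], [-1.0, -1.0],
              [1.0, 1.0], [1.0, -1.0], [2.0, 0.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C makes misclassification so costly that the solution approximates a hard margin
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]                   # normal vector of the separating hyperplane
b = clf.intercept_[0]              # offset of the hyperplane
margin = 2.0 / np.linalg.norm(w)   # distance between the two margin boundaries

print("w =", w, "b =", b)
print("margin width =", margin)
print("support vectors:\n", clf.support_vectors_)
```

Only the support vectors (the points lying on the margin boundaries) determine w and b; moving the points at x = -2 or x = 2 further away would not change the hyperplane.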

SVM Implementation in Python

We will use a support vector machine to predict whether a breast cancer diagnosis is benign or malignant based on several observations (features). Ten real-valued features are computed for each cell nucleus, and the mean, standard error, and "worst" (mean of the three largest values) of each are recorded, giving 30 features in total. The ten base features are:
- radius (mean of distances from center to points on the perimeter)
- texture (standard deviation of gray-scale values)
- perimeter
- area
- smoothness (local variation in radius lengths)
- compactness (perimeter^2 / area - 1.0)
- concavity (severity of concave portions of the contour)
- concave points (number of concave portions of the contour)
- symmetry
- fractal dimension ("coastline approximation" - 1)

The dataset is linearly separable using all 30 input features.
- Number of instances: 569
- Class distribution: 212 malignant, 357 benign
- Target classes: malignant, benign
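This dataset ships with scikit-learn as load_breast_cancer (target 0 = malignant, 1 = benign), so a minimal sketch of the prediction task looks like the following. The 80/20 stratified split, random_state=42, and the RBF kernel with default C are illustrative assumptions, not choices taken from the original. Features are standardized first because SVMs are sensitive to feature scale.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load the Wisconsin breast cancer dataset bundled with scikit-learn
data = load_breast_cancer()
X, y = data.data, data.target    # target: 0 = malignant, 1 = benign
print(X.shape)                   # (569, 30)
print(np.bincount(y))            # [212 357]

# Hold out 20% of the samples, keeping the class proportions (assumed split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scale the features, then fit an RBF-kernel SVM
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("test accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=data.target_names))
```

Wrapping the scaler and classifier in one pipeline ensures the scaler is fitted only on the training data, so no information from the test set leaks into preprocessing.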

Support Vector Machines (Kernels)

C parameter: controls the trade-off between classifying training points correctly and having a smooth decision boundary.
- Small C (loose): makes the cost (penalty) of misclassification low, giving a softer, wider margin.
- Large C (strict): makes the cost of misclassification high, approaching a hard margin and forcing the model to fit the training data more strictly, potentially overfitting it.
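For reference, C enters the standard soft-margin SVM objective shown below (labels y_i ∈ {-1, +1}, slack variables ξ_i measuring how far each point violates the margin). This is the textbook C-SVM primal problem, which is also the one scikit-learn's SVC documentation describes; C multiplies the total slack, so a large C makes violations expensive and a small C lets the model trade violations for a smaller ‖w‖, i.e. a wider margin 2/‖w‖.

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \quad i = 1, \dots, n
```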

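Gamma parameter: controls how far the influence of a single training example reaches (used by the 'rbf', 'poly', and 'sigmoid' kernels).
- Large gamma: close reach (only nearby data points have high weight), giving a more complex boundary that can overfit.
- Small gamma: far reach (a smoother, more generalized solution).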
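The effect of gamma is easiest to see in the RBF (Gaussian) kernel, which scikit-learn defines as below. The kernel value, i.e. the similarity between two points, decays with their squared distance at a rate set by γ: with a large γ it drops to almost zero a short distance away, so each training point only influences its immediate neighborhood; with a small γ it decays slowly, so each point influences a wide region.

```latex
K(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\gamma \,\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right), \qquad \gamma > 0
```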

Degree parameter: degree of the polynomial kernel function ('poly'). Ignored by all other kernels.
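The sketch below trains SVC with a few contrasting settings of C, gamma, and degree on a synthetic two-moons dataset and prints training accuracy, test accuracy, and the number of support vectors for each. The dataset, noise level, and parameter values are illustrative assumptions, not from the original. The degree setting only affects the 'poly' kernel, which scikit-learn defines as (γ⟨x, x'⟩ + r)^d with d = degree and r = coef0.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, non-linearly separable data (an illustrative assumption)
X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

configs = [
    {"kernel": "rbf", "C": 0.01, "gamma": 1.0},    # small C: cheap misclassification, soft margin
    {"kernel": "rbf", "C": 100.0, "gamma": 1.0},   # large C: strict, close to a hard margin
    {"kernel": "rbf", "C": 1.0, "gamma": 0.1},     # small gamma: far reach, smoother boundary
    {"kernel": "rbf", "C": 1.0, "gamma": 100.0},   # large gamma: close reach, very local boundary
    {"kernel": "poly", "C": 1.0, "degree": 3},     # degree is only used by the 'poly' kernel
]

results = []
for params in configs:
    clf = SVC(**params).fit(X_train, y_train)
    train_acc = clf.score(X_train, y_train)
    test_acc = clf.score(X_test, y_test)
    n_sv = int(clf.n_support_.sum())  # total number of support vectors across both classes
    results.append((params, train_acc, test_acc, n_sv))
    print(f"{params}: train={train_acc:.2f} test={test_acc:.2f} support vectors={n_sv}")
```

Typically, a very small C or a very small gamma underfits, while a very large gamma drives training accuracy up without a matching gain on the test set; the exact numbers depend on the data and the random seed.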

A common approach to finding good hyperparameter values is grid search. It is often faster to first run a very coarse grid search and then a finer grid search around the best values found. Having a good sense of what each hyperparameter actually does also helps you search in the right part of the hyperparameter space.
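Here is a minimal sketch of the coarse-then-fine strategy using scikit-learn's GridSearchCV on the breast cancer data. The grid values, the 5-fold cross-validation, and the factor-of-3 refinement window are illustrative assumptions. The coarse grid spans several orders of magnitude; the fine grid then zooms in around the best coarse values.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
# make_pipeline names the SVC step "svc", hence the "svc__" parameter prefix
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Step 1: coarse grid over several orders of magnitude
coarse_grid = {
    "svc__C": [0.01, 0.1, 1, 10, 100, 1000],
    "svc__gamma": [1e-4, 1e-3, 1e-2, 1e-1, 1],
}
coarse = GridSearchCV(pipe, coarse_grid, cv=5).fit(X, y)
best_C = coarse.best_params_["svc__C"]
best_gamma = coarse.best_params_["svc__gamma"]
print("coarse best:", coarse.best_params_, round(coarse.best_score_, 3))

# Step 2: finer grid around the best coarse values (from 1/3x to 3x)
factors = np.array([1 / 3, 1 / 2, 1, 2, 3])
fine_grid = {
    "svc__C": best_C * factors,
    "svc__gamma": best_gamma * factors,
}
fine = GridSearchCV(pipe, fine_grid, cv=5).fit(X, y)
print("fine best:", fine.best_params_, round(fine.best_score_, 3))
```

For brevity this sketch searches on the full dataset; for an unbiased estimate of final performance, keep a separate test set out of both searches.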

Data Preparation for SVM

This section lists some suggestions for how best to prepare your training data when learning an SVM model.
- Numerical inputs: SVM assumes that your inputs are numeric. If you have categorical inputs, you may need to convert them to binary dummy variables (one variable for each category).
- Binary classification: the basic SVM is intended for binary (two-class) classification problems, although extensions have been developed for regression and multi-class classification.
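A minimal sketch of both suggestions, using a hypothetical toy dataset (not from the original) with one numeric and one categorical feature and three classes. OneHotEncoder turns the categorical column into one binary dummy column per category; scikit-learn's SVC then handles the three classes itself by training one-vs-one binary classifiers internally.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Hypothetical toy data: one numeric feature, one categorical feature, three classes
X_num = np.array([[1.0], [1.2], [3.1], [3.3], [5.0], [5.2]])
X_cat = np.array([["red"], ["red"], ["green"], ["green"], ["blue"], ["blue"]])
y = np.array([0, 0, 1, 1, 2, 2])

# Convert the categorical column into binary dummy variables (one column per category)
encoder = OneHotEncoder()
X_dummies = encoder.fit_transform(X_cat).toarray()  # encoder returns a sparse matrix by default

# Standardize the numeric column
X_scaled = StandardScaler().fit_transform(X_num)

# Combine into one purely numeric feature matrix: 1 numeric + 3 dummy columns
X = np.hstack([X_scaled, X_dummies])

# SVC extends the binary SVM to multiple classes with a one-vs-one scheme
clf = SVC(kernel="linear").fit(X, y)
pred = clf.predict(X)
print(encoder.categories_)
print(X.shape)
print(pred)
```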

Principal Component Analysis

PCA is:
- Linear dimensionality reduction that uses the Singular Value Decomposition (SVD) of the data to project it onto a lower-dimensional space.
- An unsupervised machine learning method: it does not use the target labels.
- A transformation of your data that finds the directions (principal components) explaining the most variance in your data; each component is a linear combination of the original features.

PCA in scikit-learn follows the same procedure as the other preprocessing methods included with scikit-learn. We create a PCA object, use the fit() method to discover the principal components, and then apply the rotation and dimensionality reduction by calling transform().
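The create/fit/transform procedure can be sketched on the breast cancer data as follows. Keeping two components and standardizing first are illustrative assumptions: scikit-learn's PCA centers the data but does not scale it, so without standardization large-valued features such as area would dominate the variance. The last lines show one way to chain PCA with an SVM in a pipeline.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Standardize first so every feature contributes on the same scale
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)        # create the PCA object, keeping 2 components
pca.fit(X_scaled)                # discover the principal components
X_pca = pca.transform(X_scaled)  # rotate and reduce to 2 dimensions

print(X_scaled.shape, "->", X_pca.shape)           # (569, 30) -> (569, 2)
print("explained variance ratio:", pca.explained_variance_ratio_)
print("components shape:", pca.components_.shape)  # (2, 30): one row of feature weights per component

# Chain scaling, PCA, and an SVM; cross_val_score refits all steps inside each fold
pipe = make_pipeline(StandardScaler(), PCA(n_components=2), SVC(kernel="rbf"))
scores = cross_val_score(pipe, X, y, cv=5)
print("mean CV accuracy with 2 components:", scores.mean())
```

The explained_variance_ratio_ attribute reports the share of total variance captured by each component, in decreasing order, which helps decide how many components to keep.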


- from sklearn.svm import SVC, LinearSVC
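Both classes imported above can learn a linear decision boundary, but they use different solvers: SVC(kernel="linear") is built on libsvm, while LinearSVC is built on liblinear, uses a squared hinge loss by default, and scales better to large numbers of samples. A minimal comparison sketch on the breast cancer data follows; C=1.0, 5-fold cross-validation, and max_iter=10000 (raised to avoid convergence warnings) are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

X, y = load_breast_cancer(return_X_y=True)

# Two linear SVMs: libsvm-based SVC with a linear kernel, and liblinear-based LinearSVC
models = {
    "SVC(kernel='linear')": make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0)),
    "LinearSVC": make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000)),
}

scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()  # mean accuracy over 5 folds
    print(f"{name}: mean CV accuracy = {scores[name]:.3f}")
```

On a dataset this small the two usually score similarly; LinearSVC mainly pays off when the number of samples grows large, whereas SVC is needed for nonlinear kernels such as 'rbf' or 'poly'.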

Full Code Exercise of Support Vector Classifier

See the full code below: