Support Vector Machine
Links
Support Vector Machine
Zulaikha Lateef: Support Vector Machine in R: Using SVM to Predict Heart Disease
SVM is a supervised machine learning algorithm used to classify data into different classes.
It uses a hyperplane to act as a decision boundary between two classes.
It can generate multiple separating hyperplanes to divide data into multiple segments.
SVR (support vector regression) is used for regression problems.
SVM can classify non-linear data using the kernel trick, which transforms the data into another dimension that has a clear dividing margin between the classes.
The closest data points to the hyperplane are known as support vectors.
The optimal hyperplane has the maximum distance from each of the support vectors; this distance between the hyperplane and the support vectors is known as the margin.
A non-linear support vector machine uses a kernel to transform the data into another dimension that has a clear dividing margin (see the sketch below).
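As a rough illustration of the kernel trick, the sketch below uses the e1071 package (which also appears later on this page) on made-up data whose two classes are separated by a circle rather than a line; the variable names and values are purely illustrative.

library(e1071)

# Toy data: class depends on the radius, so no straight line separates it
set.seed(1)
x1 <- runif(200, -1, 1)
x2 <- runif(200, -1, 1)
cls <- factor(ifelse(x1^2 + x2^2 > 0.5, "outer", "inner"))
toy <- data.frame(x1, x2, cls)

# A linear kernel cannot follow the circular boundary; a radial kernel can
fit.linear <- svm(cls ~ x1 + x2, data = toy, kernel = "linear")
fit.radial <- svm(cls ~ x1 + x2, data = toy, kernel = "radial")

mean(predict(fit.linear, toy) == toy$cls)  # little better than guessing the majority class
mean(predict(fit.radial, toy) == toy$cls)  # close to 1
plot(fit.radial, toy)                      # decision region follows the circle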
Tom Sharp: An Introduction to Support Vector Regression (SVR)
The objective function of SVR is to minimize the l2-norm of the coefficient vector. \[\min \frac{1}{2} ||w||^2\]
Constraint: the absolute error must be less than or equal to a specified margin \[|y_i - w x_i| \le \epsilon\]
Another hyperparameter: the slack variable (\(\xi_i\)) denotes the deviation from the margin for data points that fall outside of \(\epsilon\)
Minimize \[\min \frac{1}{2} ||w||^2 + C\sum_{i=1}^{n}|\xi_i|\]
Constraint \[|y_i - w x_i| \le \epsilon + |\xi_i|\]
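A minimal sketch of how \(\epsilon\) and C appear in practice, assuming the e1071 package (its svm() fits eps-regression for a numeric response, with arguments epsilon and cost); the data and names below are illustrative only.

library(e1071)

set.seed(1)
x <- seq(0, 5, length.out = 100)
y <- 2 * x + rnorm(100, sd = 0.5)
d <- data.frame(x, y)

# Points inside the epsilon tube contribute no loss; cost (C) penalizes
# the slack of points that fall outside the tube
fit.wide   <- svm(y ~ x, data = d, type = "eps-regression", epsilon = 0.5, cost = 1)
fit.narrow <- svm(y ~ x, data = d, type = "eps-regression", epsilon = 0.1, cost = 10)

# A wider tube leaves fewer points outside it, hence fewer support vectors
length(fit.wide$index)
length(fit.narrow$index)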
Kevin Swersky: Support Vector Machines vs Logistic Regression
Logistic regression focuses on maximizing the probability of the data. The farther the data points lie from the separating hyperplane, the happier LR is.
SVM tries to find the separating hyperplane that maximizes the distance of the closest points (the support vectors) to the margin. If a point is not a support vector, it doesn't really matter.
We don't care about getting the right probability; we just want to make the right decision.
For LR, we express this as a constraint on the likelihood ratio \[\frac{P(y=1|x)}{P(y=0|x)} \ge C, \quad C \ge 1\]
Putting a quadratic penalty on the weights to make the solution unique turns LR into an SVM; in other words, SVM can be derived by asking LR to make the right decision.
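A small sketch of the contrast, assuming base glm() for LR and e1071::svm() for the SVM (toy data, illustrative names): LR uses every observation to model P(y|x), while only the support vectors drive the SVM's hyperplane, yet both usually make the same decisions.

library(e1071)

set.seed(1)
x1 <- rnorm(100)
x2 <- rnorm(100)
y  <- factor(ifelse(x1 + x2 + rnorm(100, sd = 0.5) > 0, "pos", "neg"))
d  <- data.frame(x1, x2, y)

# Logistic regression: models the probability for every point
lr <- glm(y ~ x1 + x2, data = d, family = binomial)

# Linear SVM: only the support vectors matter
sv <- svm(y ~ x1 + x2, data = d, kernel = "linear", cost = 1)
length(sv$index)  # number of support vectors, usually far fewer than 100

# The two classifiers usually agree on the decision itself
table(LR  = ifelse(predict(lr, type = "response") > 0.5, "pos", "neg"),
      SVM = predict(sv, d))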
R CARET package
- SVM with CARET
- Support Vector Machine with linear kernel
- Cost (C) is a tuning parameter that determines the tolerance for misclassification. It imposes a penalty on the model for making an error: the higher the value of C, the less likely the SVM algorithm is to misclassify a point (see the sketch below).
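A quick sketch of that effect, using e1071::svm() rather than caret and arbitrary parameter values: as cost grows, margin violations are penalized more heavily, the margin narrows, and the number of support vectors typically drops.

library(e1071)

set.seed(1)
x <- matrix(rnorm(80), 40, 2)
y <- rep(c(-1, 1), each = 20)
x[y == 1, ] <- x[y == 1, ] + 1.5
d <- data.frame(x, y = factor(y))

# Larger cost: heavier penalty for errors, narrower margin, fewer support vectors
for (C in c(0.01, 1, 100)) {
  fit <- svm(y ~ ., data = d, kernel = "linear", cost = C)
  cat("cost =", C, " support vectors =", length(fit$index), "\n")
}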
Linear Kernel
Preprocess data, center and scale
library(tidyverse)
library(caret)
data("PimaIndiansDiabetes2", package = "mlbench")
pima.data <- na.omit(PimaIndiansDiabetes2)
# Inspect the data
sample_n(pima.data, 3)
    pregnant glucose pressure triceps insulin mass pedigree age diabetes
522        3     124       80      33     130 33.2    0.305  26      neg
459       10     148       84      48     237 37.6    1.001  51      pos
544        4      84       90      23      56 39.5    0.159  25      neg
# Set up Repeated k-fold Cross Validation
train_control <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

# Fit the model
svm1 <- train(diabetes ~ ., data = pima.data,
              method = "svmLinear",
              trControl = train_control,
              preProcess = c("center", "scale"))
# View the model
svm1
Support Vector Machines with Linear Kernel
392 samples
  8 predictor
  2 classes: 'neg', 'pos'

Pre-processing: centered (8), scaled (8)
Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 353, 352, 353, 353, 353, 353, ...
Resampling results:
Accuracy Kappa
0.7814744 0.4758302
Tuning parameter ‘C’ was held constant at a value of 1
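The fitted caret object can be used for prediction like any other model; a brief sketch (here simply re-predicting the training data for illustration):

# Predict classes and compare against the observed labels
pred1 <- predict(svm1, newdata = pima.data)
confusionMatrix(pred1, pima.data$diabetes)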
Tune on Cost
# Fit the model
svm2 <- train(diabetes ~ ., data = pima.data,
              method = "svmLinear",
              trControl = train_control,
              preProcess = c("center", "scale"),
              tuneGrid = expand.grid(C = seq(0, 2, length = 20)))
# View the model
svm2
Support Vector Machines with Linear Kernel
392 samples
  8 predictor
  2 classes: 'neg', 'pos'

Pre-processing: centered (8), scaled (8)
Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 353, 353, 353, 353, 353, 352, ...
Resampling results across tuning parameters:

  C          Accuracy   Kappa
  0.0000000        NaN        NaN
  0.1052632  0.7794017  0.4715285
  0.2105263  0.7794017  0.4713448
  0.3157895  0.7827991  0.4790678
  0.4210526  0.7819444  0.4767819
  0.5263158  0.7844872  0.4818525
  0.6315789  0.7844872  0.4818525
  0.7368421  0.7861752  0.4858907
  0.8421053  0.7870299  0.4876050
  0.9473684  0.7870299  0.4876050
  1.0526316  0.7861752  0.4859893
  1.1578947  0.7861752  0.4859893
  1.2631579  0.7870299  0.4876050
  1.3684211  0.7878846  0.4900540
  1.4736842  0.7870299  0.4884383
  1.5789474  0.7870299  0.4878579
  1.6842105  0.7861752  0.4862422
  1.7894737  0.7861752  0.4862422
  1.8947368  0.7861752  0.4862422
  2.0000000  0.7861752  0.4862422
Accuracy was used to select the optimal model using the largest value. The final value used for the model was C = 1.368421.
# Plot model accuracy vs different values of Cost
plot(svm2)
# Print the best tuning parameter C that
# maximizes model accuracy
svm2$bestTune
C
14 1.368421
res2 <- as_tibble(svm2$results[which.min(svm2$results[,2]),])
res2
# A tibble: 1 x 5
      C Accuracy Kappa AccuracySD KappaSD
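The resampled performance of the selected model can also be pulled directly with caret's getTrainPerf() helper; a short sketch:

# Resampled accuracy and kappa of the final (best-C) model
getTrainPerf(svm2)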
Non-Linear Kernel
# Fit the model
svm3 <- train(diabetes ~ ., data = pima.data,
              method = "svmRadial",
              trControl = train_control,
              preProcess = c("center", "scale"),
              tuneLength = 10)
# Print the best tuning parameter sigma and C that maximizes model accuracy
svm3$bestTune
sigma C
2 0.1392566 0.5
svm3
Support Vector Machines with Radial Basis Function Kernel
392 samples
  8 predictor
  2 classes: 'neg', 'pos'

Pre-processing: centered (8), scaled (8)
Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 353, 353, 352, 353, 353, 353, ...
Resampling results across tuning parameters:

  C       Accuracy   Kappa
    0.25  0.7578419  0.4119995
    0.50  0.7646368  0.4370571
    1.00  0.7620940  0.4349336
    2.00  0.7561111  0.4247877
    4.00  0.7528205  0.4239482
    8.00  0.7408333  0.3968142
   16.00  0.7212607  0.3549167
   32.00  0.7076282  0.3361284
   64.00  0.6939744  0.3145535
  128.00  0.6991667  0.3285441
Tuning parameter 'sigma' was held constant at a value of 0.1392566

Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.1392566 and C = 0.5.
# Save the results for later
res3 <- as_tibble(svm3$results[which.min(svm3$results[,2]),])
res3
# A tibble: 1 x 6
   sigma     C Accuracy Kappa AccuracySD KappaSD
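To put the linear and radial fits side by side, caret's resamples() can collect their cross-validated results; a sketch, with the caveat that a strict comparison would fix the same resampling indices (for example via set.seed() or the index argument of trainControl()) before training both models.

# Collect the cross-validated accuracy/kappa of both models
comp <- resamples(list(svmLinear = svm2, svmRadial = svm3))
summary(comp)
bwplot(comp)  # lattice box plots of the resampled metrics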
R e1071 package
TechVidvan: SVM in R for Data Classification using e1071 Package
Generate two-dimensional data
set.seed(100)
x <- matrix(rnorm(40), 20, 2)
y <- rep(c(-1, 1), c(10, 10))
x[y == 1, ] <- x[y == 1, ] + 1
plot(x, col = y + 3, pch = 19)

library(e1071)
data <- data.frame(x, y = as.factor(y))
Linear Kernel
data.svm <- svm(y ~ ., data = data, kernel = "linear", cost = 10, scale = FALSE)
print(data.svm)
Call:
svm(formula = y ~ ., data = data, kernel = "linear", cost = 10, scale = FALSE)

Parameters:
   SVM-Type:  C-classification
 SVM-Kernel:  linear
       cost:  10

Number of Support Vectors:  5
plot(data.svm, data)
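e1071 also ships its own cross-validated tuning helper; a short sketch with tune() over an arbitrary grid of cost values:

# 10-fold cross-validation (the default) over a grid of cost values
set.seed(100)
tuned <- tune(svm, y ~ ., data = data,
              kernel = "linear",
              ranges = list(cost = c(0.01, 0.1, 1, 10, 100)))
summary(tuned)
tuned$best.model  # the svm refit at the best cost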
R flexmix package
Two Discrete Slopes Linear Model
https://cran.r-project.org/web/packages/flexmix/vignettes/flexmix-intro.pdf
library(flexmix)
set.seed(2021)
x <- 1:10
y1 <- x * 2 + rnorm(10, 0, 1.5)
y2 <- x * 4 + rnorm(10, 0, 1.5)
t <- rbind(data.frame(x, y = y1, type = 2), data.frame(x, y = y2, type = 4))

m <- flexmix(y ~ x + 0, data = t, k = 2)
print(m)
Call:
flexmix(formula = y ~ x + 0, data = t, k = 2)

Cluster sizes:
 1  2
 9 11
convergence after 11 iterations
summary(m)
Call:
flexmix(formula = y ~ x + 0, data = t, k = 2)

       prior size post>0 ratio
Comp.1 0.478    9     13 0.692
Comp.2 0.522   11     12 0.917

'log Lik.' -48.27758 (df=5)
AIC: 106.5552   BIC: 111.5338
parameters(m)
         Comp.1   Comp.2
coef.x 4.167150 2.090996
sigma  1.721344 1.338027
clusters(m)
[1] 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1
table(t$type, clusters(m))
     1  2
  2  0 10
  4  9  1
plot(m)
summary(refit(m))
$Comp.1
  Estimate Std. Error z value  Pr(>|z|)
x 4.167407   0.085556   48.71 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$Comp.2
  Estimate Std. Error z value  Pr(>|z|)
x 2.090862   0.066312  31.531 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Computing Environment
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
other attached packages:
 [1] flexmix_2.3-17      e1071_1.7-4         caret_6.0-86
[4] lattice_0.20-41 forcats_0.5.1 stringr_1.4.0
[7] purrr_0.3.4 readr_1.4.0 tidyr_1.1.2
[10] tibble_3.0.6 tidyverse_1.3.0 Wu_0.0.0.9000
[13] flexdashboard_0.5.2 lme4_1.1-26 Matrix_1.2-18
[16] mgcv_1.8-33 nlme_3.1-149 png_0.1-7
[19] scales_1.1.1 nnet_7.3-14 labelled_2.7.0
[22] kableExtra_1.3.2 plotly_4.9.3 gridExtra_2.3
[25] ggplot2_3.3.3 DT_0.17 tableone_0.12.0
[28] magrittr_2.0.1 lubridate_1.7.9.2 dplyr_1.0.4
[31] plyr_1.8.6 data.table_1.13.6 rmdformats_0.3.7
[34] knitr_1.31
loaded via a namespace (and not attached):
  [1] minqa_1.2.4          colorspace_2.0-0     modeltools_0.2-23
[4] ellipsis_0.3.1 class_7.3-17 fs_1.5.0
[7] rstudioapi_0.13 prodlim_2019.11.13 fansi_0.4.2
[10] xml2_1.3.2 codetools_0.2-16 splines_4.0.3
[13] jsonlite_1.7.2 nloptr_1.2.2.2 pROC_1.16.2
[16] broom_0.7.1 kernlab_0.9-29 dbplyr_2.0.0
[19] compiler_4.0.3 httr_1.4.2 backports_1.2.0
[22] assertthat_0.2.1 lazyeval_0.2.2 survey_4.0
[25] cli_2.3.0 htmltools_0.5.1.1 tools_4.0.3
[28] gtable_0.3.0 glue_1.4.2 reshape2_1.4.4
[31] Rcpp_1.0.6 cellranger_1.1.0 jquerylib_0.1.3
[34] vctrs_0.3.6 iterators_1.0.13 timeDate_3043.102
[37] xfun_0.21 gower_0.2.2 ps_1.5.0
[40] rvest_0.3.6 lifecycle_1.0.0 statmod_1.4.35
[43] MASS_7.3-53 ipred_0.9-9 hms_1.0.0
[46] yaml_2.2.1 sass_0.3.1 rpart_4.1-15
[49] stringi_1.5.3 highr_0.8 foreach_1.5.1
[52] boot_1.3-25 lava_1.6.8.1 rlang_0.4.10
[55] pkgconfig_2.0.3 evaluate_0.14 recipes_0.1.15
[58] htmlwidgets_1.5.3 tidyselect_1.1.0 bookdown_0.21
[61] R6_2.5.0 generics_0.1.0 DBI_1.1.1
[64] pillar_1.4.7 haven_2.3.1 withr_2.4.1
[67] survival_3.2-7 modelr_0.1.8 crayon_1.4.1
[70] utf8_1.1.4 rmarkdown_2.7 grid_4.0.3
[73] readxl_1.3.1 ModelMetrics_1.2.2.2 reprex_0.3.0
[76] digest_0.6.27 webshot_0.5.2 stats4_4.0.3
[79] munsell_0.5.0 viridisLite_0.3.0 bslib_0.2.4
[82] mitools_2.4