Support Vector Machine

  • Zulaikha Lateef: Support Vector Machine in R: Using SVM to Predict Heart Disease

  • SVM is a supervised machine learning algorithm used to classify data into different classes.

  • It uses a hyperplane to act as a decision boundary between two classes.

  • It can generate multiple separating hyperplanes to divide data into multiple segments.

  • SVR (support vector regression) is used for regression problems.

  • SVM can classify non-linear data using the kernel trick, which transforms the data into another dimension that has a clear dividing margin between the classes.

  • The closest data points to the hyperplane are known as support vectors.

  • The optimal hyperplane has the maximum distance from each of the support vectors; this distance between the hyperplane and the support vectors is known as the margin.

  • A non-linear support vector machine uses a kernel to transform data into another dimension that has a clear dividing margin (common kernel choices are listed below).
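  • For reference (my addition, following the libsvm/e1071 parameterization), common kernel choices are the linear kernel \[K(x, x') = x^\top x'\] the polynomial kernel \[K(x, x') = (\gamma\, x^\top x' + r)^d\] and the radial basis function kernel \[K(x, x') = \exp(-\gamma \|x - x'\|^2)\]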

  • Tom Sharp: An Introduction to Support Vector Regression (SVR)

  • The objective of SVR is to minimize the l2-norm of the coefficient vector \(w\): \[\min \frac{1}{2} \|w\|^2\]

  • Constraint: the absolute error must be less than or equal to a specified margin \[|y_i - w_i x_i| \le \epsilon\]

  • Slack variables \(\xi_i\) denote the deviation from the margin for data points that fall outside of \(\epsilon\); the hyperparameter \(C\) controls how heavily these deviations are penalized.

  • Minimize \[\frac{1}{2} \|w\|^2 + C\sum_{i=1}^{n}|\xi_i|\]

  • Constraint: \[|y_i - w_i x_i| \le \epsilon + |\xi_i|\]
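  • A minimal SVR sketch in R using e1071::svm() (my addition, not from the article); the argument epsilon plays the role of \(\epsilon\) and cost plays the role of \(C\):

library(e1071)

# Simulated one-dimensional regression data (illustrative only)
set.seed(1)
x <- seq(0, 10, length.out = 100)
y <- 2 * x + rnorm(100, sd = 1)

# eps-regression: epsilon is the insensitive margin, cost is C
svr.fit <- svm(y ~ x, data = data.frame(x, y),
               type = "eps-regression", kernel = "linear",
               epsilon = 0.1, cost = 1)

# Fitted values and the number of support vectors
y.hat <- predict(svr.fit, data.frame(x))
length(svr.fit$index)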

  • Kevin Swersky: Support Vector Machines vs Logistic Regression

  • Logistic regression focuses on maximizing the probability of the data. The farther the data points lie from the separating hyperplane, the happier LR is.

  • SVM tries to find a separating hyperplane that maximizes the distance to the closest points (the support vectors). If a point is not a support vector, it does not really matter.

  • We don’t care about getting the right probability; we just want to make the right decision.

  • For LR, we express this as a constraint on the likelihood ratio \[\frac{P(y=1|x)}{P(y=0|x)} \ge C, \quad C \ge 1\]

  • Putting a quadratic penalty on the weights of LR to make the solution unique yields SVM; in other words, SVM can be derived by asking LR to make the right decision.
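  • A sketch of that derivation (my reading, not a quote from the talk): LR models the log-odds linearly, \(\log\frac{P(y=1|x)}{P(y=0|x)} = w^\top x\), so the likelihood-ratio constraint becomes a margin constraint on the linear score, \[y_i\, w^\top x_i \ge \log C\] for labels \(y_i \in \{-1, +1\}\). Since \(w\) can be rescaled so that the margin equals 1, the quadratic penalty makes the solution unique and gives the hard-margin SVM \[\min_w \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i\, w^\top x_i \ge 1\]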

R CARET package

  • SVM with CARET
  • Support Vector Machine with linear kernel
  • The cost parameter C is a tuning parameter that controls the tolerance for misclassification. It imposes a penalty on the model for making an error: the higher the value of C, the less likely the SVM algorithm is to misclassify a point.

Linear Kernel

Preprocess data, center and scale

library(tidyverse)
library(caret)

data("PimaIndiansDiabetes2", package = "mlbench")
pima.data <- na.omit(PimaIndiansDiabetes2)
# Inspect the data
sample_n(pima.data, 3)
    pregnant glucose pressure triceps insulin mass pedigree age diabetes
522        3     124       80      33     130 33.2    0.305  26      neg
459       10     148       84      48     237 37.6    1.001  51      pos
544        4      84       90      23      56 39.5    0.159  25      neg

# Set up Repeated k-fold Cross Validation
train_control <- trainControl(method="repeatedcv", number=10, repeats=3)

# Fit the model
svm1 <- train(diabetes ~.
            , data = pima.data
            , method = "svmLinear"
            , trControl = train_control
           ,  preProcess = c("center","scale"))
#View the model
svm1

Support Vector Machines with Linear Kernel

392 samples
  8 predictor
  2 classes: 'neg', 'pos'

Pre-processing: centered (8), scaled (8)
Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 353, 352, 353, 353, 353, 353, ...
Resampling results:

  Accuracy   Kappa
  0.7814744  0.4758302

Tuning parameter ‘C’ was held constant at a value of 1

Tune on Cost

# Fit the model
svm2 <- train(diabetes ~.
            , data = pima.data
            , method = "svmLinear"
            , trControl = train_control
            , preProcess = c("center","scale")
            , tuneGrid = expand.grid(C = seq(0, 2, length = 20)))
#View the model
svm2

Support Vector Machines with Linear Kernel

392 samples
  8 predictor
  2 classes: 'neg', 'pos'

Pre-processing: centered (8), scaled (8)
Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 353, 353, 353, 353, 353, 352, ...
Resampling results across tuning parameters:

  C          Accuracy   Kappa
  0.0000000        NaN        NaN
  0.1052632  0.7794017  0.4715285
  0.2105263  0.7794017  0.4713448
  0.3157895  0.7827991  0.4790678
  0.4210526  0.7819444  0.4767819
  0.5263158  0.7844872  0.4818525
  0.6315789  0.7844872  0.4818525
  0.7368421  0.7861752  0.4858907
  0.8421053  0.7870299  0.4876050
  0.9473684  0.7870299  0.4876050
  1.0526316  0.7861752  0.4859893
  1.1578947  0.7861752  0.4859893
  1.2631579  0.7870299  0.4876050
  1.3684211  0.7878846  0.4900540
  1.4736842  0.7870299  0.4884383
  1.5789474  0.7870299  0.4878579
  1.6842105  0.7861752  0.4862422
  1.7894737  0.7861752  0.4862422
  1.8947368  0.7861752  0.4862422
  2.0000000  0.7861752  0.4862422

Accuracy was used to select the optimal model using the largest value. The final value used for the model was C = 1.368421.

# Plot model accuracy vs different values of Cost
plot(svm2)

# Print the best tuning parameter C that
# maximizes model accuracy
svm2$bestTune
          C
14 1.368421

# Save one row of the results for later (column 2 is Accuracy, so which.min picks the lowest resampled accuracy)
res2 <- as_tibble(svm2$results[which.min(svm2$results[, 2]), ])
res2

A tibble: 1 x 5

      C Accuracy Kappa AccuracySD KappaSD
1 0.105    0.779 0.472     0.0777   0.191
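A sketch of using the tuned model for prediction (my addition, not in the original; here I predict back on the training data, so the accuracy is optimistic):

# Predicted classes from the tuned linear-kernel model
pred2 <- predict(svm2, newdata = pima.data)

# Compare predictions with the observed classes
confusionMatrix(pred2, pima.data$diabetes)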

Non-Linear Kernel

# Fit the model
svm3 <- train(diabetes ~.
            , data = pima.data
            , method = "svmRadial"
            , trControl = train_control
            , preProcess = c("center","scale")
            , tuneLength = 10)
# Print the best tuning parameter sigma and C that maximizes model accuracy
svm3$bestTune
      sigma   C
2 0.1392566 0.5

svm3

Support Vector Machines with Radial Basis Function Kernel

392 samples
  8 predictor
  2 classes: 'neg', 'pos'

Pre-processing: centered (8), scaled (8)
Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 353, 353, 352, 353, 353, 353, ...
Resampling results across tuning parameters:

  C       Accuracy   Kappa
    0.25  0.7578419  0.4119995
    0.50  0.7646368  0.4370571
    1.00  0.7620940  0.4349336
    2.00  0.7561111  0.4247877
    4.00  0.7528205  0.4239482
    8.00  0.7408333  0.3968142
   16.00  0.7212607  0.3549167
   32.00  0.7076282  0.3361284
   64.00  0.6939744  0.3145535
  128.00  0.6991667  0.3285441

Tuning parameter 'sigma' was held constant at a value of 0.1392566
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.1392566 and C = 0.5.

# Save one row of the results for later (column 2 here is C, so which.min picks the smallest cost)
res3 <- as_tibble(svm3$results[which.min(svm3$results[, 2]), ])
res3

A tibble: 1 x 6

  sigma     C Accuracy Kappa AccuracySD KappaSD
1 0.139  0.25    0.758 0.412     0.0634   0.157
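A small sketch (my addition) to line up the saved linear and radial rows side by side; columns present in only one model are filled with NA:

# res2 (linear kernel) and res3 (radial kernel) were stored above
bind_rows(linear = res2, radial = res3, .id = "kernel")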

R e1071 package

TechVidvan: SVM in R for Data Classification using e1071 Package

Generate two-dimensional data

set.seed(100)
x <- matrix(rnorm(40), 20, 2)    # 20 points in two dimensions
y <- rep(c(-1, 1), c(10, 10))    # two classes of 10 points each
x[y == 1, ] <- x[y == 1, ] + 1   # shift the positive class away from the negative class
plot(x, col = y + 3, pch = 19)

library(e1071)
data = data.frame(x, y = as.factor(y))

Linear Kernel

data.svm = svm(y ~ ., data = data, kernel = "linear", cost = 10, scale = FALSE)
print(data.svm)

Call:
svm(formula = y ~ ., data = data, kernel = "linear", cost = 10, scale = FALSE)

Parameters:
   SVM-Type:  C-classification
 SVM-Kernel:  linear
       cost:  10

Number of Support Vectors:  5

plot(data.svm, data)
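A brief sketch (my addition, not part of the tutorial) of inspecting the fitted object: which training points are support vectors, and how the in-sample predictions compare with the labels:

# Indices of the training rows that are support vectors
data.svm$index

# In-sample predictions vs. the true labels
table(predicted = predict(data.svm, data), actual = data$y)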

R flexmix package

Linear Model with Two Discrete Slopes

https://cran.r-project.org/web/packages/flexmix/vignettes/flexmix-intro.pdf

library(flexmix)

set.seed(2021)
x <- 1:10
y1 <- x * 2 + rnorm(10, 0, 1.5)   # group with slope 2
y2 <- x * 4 + rnorm(10, 0, 1.5)   # group with slope 4
t <- rbind(data.frame(x, y = y1, type = 2), data.frame(x, y = y2, type = 4))

# Fit a two-component mixture of regressions through the origin
m <- flexmix(y ~ x + 0, data = t, k = 2)

print(m)

Call:
flexmix(formula = y ~ x + 0, data = t, k = 2)

Cluster sizes:
 1  2
 9 11

convergence after 11 iterations

summary(m)

Call:
flexmix(formula = y ~ x + 0, data = t, k = 2)

       prior size post>0 ratio
Comp.1 0.478    9     13 0.692
Comp.2 0.522   11     12 0.917

'log Lik.' -48.27758 (df=5)
AIC: 106.5552   BIC: 111.5338

parameters(m)
         Comp.1   Comp.2
coef.x 4.167150 2.090996
sigma  1.721344 1.338027

clusters(m)

[1] 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1

table(t$type, clusters(m))
     1  2
  2  0 10
  4  9  1

plot(m)

summary(refit(m))

$Comp.1
  Estimate Std. Error z value  Pr(>|z|)
x 4.167407   0.085556   48.71 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$Comp.2
  Estimate Std. Error z value  Pr(>|z|)
x 2.090862   0.066312  31.531 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
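As a sketch (my addition, not from the vignette), the soft cluster assignments behind clusters(m) can be inspected with posterior(), which returns one probability per observation and component:

# Posterior component probabilities for the first few observations
round(head(posterior(m)), 3)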

Computing Environment

sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] flexmix_2.3-17 e1071_1.7-4 caret_6.0-86
[4] lattice_0.20-41 forcats_0.5.1 stringr_1.4.0
[7] purrr_0.3.4 readr_1.4.0 tidyr_1.1.2
[10] tibble_3.0.6 tidyverse_1.3.0 Wu_0.0.0.9000
[13] flexdashboard_0.5.2 lme4_1.1-26 Matrix_1.2-18
[16] mgcv_1.8-33 nlme_3.1-149 png_0.1-7
[19] scales_1.1.1 nnet_7.3-14 labelled_2.7.0
[22] kableExtra_1.3.2 plotly_4.9.3 gridExtra_2.3
[25] ggplot2_3.3.3 DT_0.17 tableone_0.12.0
[28] magrittr_2.0.1 lubridate_1.7.9.2 dplyr_1.0.4
[31] plyr_1.8.6 data.table_1.13.6 rmdformats_0.3.7
[34] knitr_1.31

loaded via a namespace (and not attached): [1] minqa_1.2.4 colorspace_2.0-0 modeltools_0.2-23
[4] ellipsis_0.3.1 class_7.3-17 fs_1.5.0
[7] rstudioapi_0.13 prodlim_2019.11.13 fansi_0.4.2
[10] xml2_1.3.2 codetools_0.2-16 splines_4.0.3
[13] jsonlite_1.7.2 nloptr_1.2.2.2 pROC_1.16.2
[16] broom_0.7.1 kernlab_0.9-29 dbplyr_2.0.0
[19] compiler_4.0.3 httr_1.4.2 backports_1.2.0
[22] assertthat_0.2.1 lazyeval_0.2.2 survey_4.0
[25] cli_2.3.0 htmltools_0.5.1.1 tools_4.0.3
[28] gtable_0.3.0 glue_1.4.2 reshape2_1.4.4
[31] Rcpp_1.0.6 cellranger_1.1.0 jquerylib_0.1.3
[34] vctrs_0.3.6 iterators_1.0.13 timeDate_3043.102
[37] xfun_0.21 gower_0.2.2 ps_1.5.0
[40] rvest_0.3.6 lifecycle_1.0.0 statmod_1.4.35
[43] MASS_7.3-53 ipred_0.9-9 hms_1.0.0
[46] yaml_2.2.1 sass_0.3.1 rpart_4.1-15
[49] stringi_1.5.3 highr_0.8 foreach_1.5.1
[52] boot_1.3-25 lava_1.6.8.1 rlang_0.4.10
[55] pkgconfig_2.0.3 evaluate_0.14 recipes_0.1.15
[58] htmlwidgets_1.5.3 tidyselect_1.1.0 bookdown_0.21
[61] R6_2.5.0 generics_0.1.0 DBI_1.1.1
[64] pillar_1.4.7 haven_2.3.1 withr_2.4.1
[67] survival_3.2-7 modelr_0.1.8 crayon_1.4.1
[70] utf8_1.1.4 rmarkdown_2.7 grid_4.0.3
[73] readxl_1.3.1 ModelMetrics_1.2.2.2 reprex_0.3.0
[76] digest_0.6.27 webshot_0.5.2 stats4_4.0.3
[79] munsell_0.5.0 viridisLite_0.3.0 bslib_0.2.4
[82] mitools_2.4