Restricted Cubic Spline

Cubic Spline Function

Suppose a continuous variable x is divided into three intervals at cutpoints a and b, a model can be built between y and x by assuming piecewise linear in which there is a linear relationship for each of three intervals but with different slope.

A cubic spline curve is a piecewise cubic curve with continuous second derivative. Instead of a linear line, a cubic polynomial has a good ability to model a non-linear relationship by fitting a curve. Meanwhile, a piecewise cubic spline is smooth on the joint points (cutpoints) by forcing the first and second derivatives to be same on the joint points.

A piecewise cubic spline with two cutpoints a and b on the x variable has 5 regression coefficients in the model related to the variable x.

\[Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_3X_3 + \beta_4X_4 + \beta_5X_5 + \cdots \beta_{\cdots}X_{\cdots}\]

\(X_1 = X\)
\(X_2 = X^2\)
\(X_3 = X^3\)
\(X_4 = (X-a)^3_+\), any value of X less than a is set as zero.
\(X_5 = (X-b)^3_+\), any value of X less than b is set as zero.
Notation
- \(U_+ = 0\), if \(U <= 0\)
- \(U_+ = U\), if \(U > 0\)

Restricted Cubic Spline

The restricted cubic spline restrains the functions to be linear before the first knot and after the last knot. It has only \(k - 1\) parameters must be estimated as opposed to \(k + 3\) parameters with unrestricted cubic spline where \(k\) is the number of knots.

\[f(X) = \beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_{[k-1]}X_{k-1}\] where \(X_1 = X\) and for \(j = 1,...,k-2\), \[X_{j+1} = (X-t_j)_+^3 - (X-t_{k-1})_+^3(t_k - t_j)/(t_k - t_{k-1}) + (X - t_k)_+^3(t_{k-1} - t_j)/(t_k - t_{k-1})\]

In case of four knots \(t_1, t_2, t_3, t_4\) which means that there are four cutpoints with five intervals on a line, then \(k = 4\), there should be \(k - 1 = 3\) variables \(X_1, X_2, X_3\) being generated and put into the model, and three corresponding parameters need to be estimated with the restricted cubic spline.

\(X_1 = X\)
For \(j = 1,...,k-2\), when \(k = 4\), \(j\) has values of \(1\) and \(2\).
- When \(j = 1, k = 4\): \(X_2 = (X - t_1)_+^3 - (X - t_3)_+^3(t_4 - t_1)/(t_4 - t_3) + (X - t_4)_+^3(t_3 - t_1)/(t_4 - t_3)\)
- When \(j = 2, k = 4\): \(X_3 = (X - t_2)_+^3 - (X - t_3)_+^3(t_4 - t_2)/(t_4 - t_3) + (X - t_4)_+^3(t_3 - t_2)/(t_4 - t_3)\)
Reference: Frank Harrel, Hmisc Package, “rscpline.eval” Function

library(Hmisc)
x <- seq(0,100,by=1)

knots <- quantile(x,c(0.05,0.35,0.65,0.95))
t <- quantile(x,c(0.05,0.35,0.65,0.95))

y <- rcspline.eval(x,nk=4,inclx = TRUE)
y <- as.data.frame(y)
## str(y)
## dim(y)

y$y2 <- (x >= t[1])*(x - t[1])^3 - (x >= t[3])*(x - t[3])^3*(t[4]-t[1])/(t[4]-t[3]) + (x >= t[4])*(x-t[4])^3*(t[3]-t[1])/(t[4]-t[3])
y$y3 <- (x >= t[2])*(x - t[2])^3 - (x >= t[3])*(x - t[3])^3*(t[4]-t[2])/(t[4]-t[3]) + (x >= t[4])*(x-t[4])^3*(t[3]-t[2])/(t[4]-t[3])


tau <- (t[4] - t[1])^2
y$V2b <- y$V2*tau
y$V3b <- y$V3*tau
# View(y)

## sum(y$V3b - y$y3)
## sum(y$V2b - y$y2)

plot(y$x, y$V2)

plot(y$x, y$V3)

gam Package

set.seed(123456)
dt <- data.table(x = y$x, y = y$V3)
ids <- sample(1:nrow(dt), size = 250, replace = TRUE)

dt <- dt[ids, ]
p0 <- 0.2
dt <- dt[, odds0 := p0 / (1 - p0)
         ][, log_odds := log(odds0) + log(y)
           ][, p := exp(log_odds) / (1 + exp(log_odds))]

vsample <- function(p){
    sample(c(1, 0), size = 1, replace = TRUE, prob = c(p, 1 - p))
}

vsample <- Vectorize(vsample)

dt <- dt[, outcome := vsample(p)]

library(mgcv)
library(sjPlot)
m <- gam(
    outcome ~ s(x, bs = "cr", k = 10)
  , data=dt
  , method="REML"
  , family = binomial
)


prt_coef.gam <- function(obj, exp = FALSE, digits = 2){
    obj_summary <- summary(obj)
    rtn <- data.table(
        coef_name = names(obj_summary$p.coeff)
      , coef_value = obj_summary$p.coeff
      , df = sum(obj$edf)
    )
    rtn <- merge(
        x = rtn
      , y = data.table(coef_name = names(obj_summary$se), se = obj_summary$se)
      , on = "coef_name"
      , all.x = TRUE
      , all.y = FALSE
    )
    rtn[, coef_value_lower := coef_value + qt(0.025, 999) * se
        ][, coef_value_upper := coef_value + qt(0.975, 999) * se
          ]
    if (exp == TRUE){
        rtn[, coef_value := exp(coef_value)
            ][, coef_value_lower := exp(coef_value_lower)
              ][, coef_value_upper := exp(coef_value_upper)]
    }
    rtn[, ci := paste0(
              format(round(coef_value, digits), nsmall = digits)
            , "("
            , format(round(coef_value_lower, digits), nsmall = digits)
            , ","
            , format(round(coef_value_upper, digits), nsmall = digits)
            , ")")
        ]
    ## print(rtn)

    if (obj$family$family == "multinom"){
        ## print(obj$family$family)
    }
    return(rtn)
}

get_coef.gam <- function(obj, exp=FALSE, digits=2){
    obj_summary <- summary(obj)
    rtn <- obj_summary$p.table
    rtn <- as.data.table(rtn)
    colnames(rtn) <- c("coef_value", "se", "t_value", "p_value")
    rtn$coef_name <- rownames(obj_summary$p.table)
    rtn <- rtn[, list(coef_name, coef_value, se, t_value, p_value)]
    rtn <- rtn[, coef_value_lower := coef_value + qnorm(0.025) * se
               ][, coef_value_upper := coef_value + qnorm(0.975) * se
                 ]
    if (exp == TRUE) {
        rtn <- rtn[, coef_value := exp(coef_value)
                   ][, coef_value_lower := exp(coef_value_lower)
                     ][, coef_value_upper := exp(coef_value_upper)]
    }
    rtn <- rtn[, ci := paste0(
                     format(round(coef_value, digits), nsmall = digits)
                   , "("
                   , format(round(coef_value_lower, digits), nsmall = digits)
                   , ","
                   , format(round(coef_value_upper, digits), nsmall = digits)
                   , ")")
               ]
}

t <- get_coef.gam(m)
t[, list(coef_name, ci, p_value)] %>%
    prt(caption="Fixed Effects", col.names=c("Variable", "Effect (95% CI)", "p-value"))

Fixed Effects
Variable	Effect (95% CI)	p-value
(Intercept)	-3.29(-5.82,-0.77)	0.0105635

t2 <- anova(m)$s.table
t2 <- as.data.table(t2, keep.rownames = T)
t2[-2, -(3:4)] %>% prt(caption="Wald tests of the smooth term (experience)")

Wald tests of the smooth term (experience)
rn	edf	p-value
s(x)	2.618781	0

get_pred <- function(obj){
    m_age <- median(dt$age)
    ## 59;
    m_bmi <- median(dt$bmi, na.rm = TRUE)
    ## 27.4
    m_apacheiiscore <- median(dt$apacheiiscore)
    ## 21
    m_baselinesat_dcs <- median(dt$baselinesat_dcs, na.rm = TRUE)
    ## 99
    newdata <- data.table(
        x = dt$x
    )
    prd <- predict(obj, newdata, type = "link", se.fit = TRUE)
    newdata$fit <- prd$fit
    newdata$se <- prd$se.fit
    newdata <- newdata[
      , lower := fit - qnorm(0.975) * se
    ][, upper := fit + qnorm(0.975) * se]
    return(newdata)
}

prd_m1 <- get_pred(m)

x_breaks <- prd_m1[x %in% seq(0, 100, 25)]$x
x_labels <- prd_m1[x %in% seq(0, 100, 25)]$x


dtm <- m$model
dtm$ystart <- -30
dtm$yend <- -28

xj <- jitter(dtm$x, 0.5)

bd <- ggplot(data = prd_m1
           , aes(x = x
               , y = fit)) +
    geom_line(color = "#666666") +
    geom_ribbon(aes(ymin = lower, ymax = upper, linetype = NA)
              , color = "#666666"
              , alpha = 0.2
              , inherit.aes = TRUE
              , show.legend = FALSE
                ) +
    geom_segment(data=dtm
               , x=xj
               , xend=xj
               , y=dtm$ystart
               , yend=dtm$yend
               , alpha=0.3
               , color = "#666666"
                 ) +
    labs(x = "x"
       , y = "Linear Predictor") +
    scale_x_continuous(breaks = x_breaks
                     , labels = x_labels
                     , limits = c(0, 100)
                     , expand = c(0, 0)) +
    scale_y_continuous(breaks = seq(-30, 5, by=5)
                     , limits = c(-30, 5)
                     , expand = c(0, 0)) +
    theme(
        ## panel.grid.major = element_blank()
        ## , panel.grid.minor = element_blank()
        panel.background = element_blank()
        ## , axis.ticks.x = element_blank()
          )
bd

Plot Nonlinear Relationship

plot_nonlinear(x = x, y = outcome, data = dt)

Computing Environment

sessionInfo()

R version 4.2.0 (2022-04-22) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04.3 LTS

Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale: [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] sjPlot_2.8.10 Hmisc_4.7-0 Formula_1.2-4
[4] survival_3.2-13 lattice_0.20-45 Wu_0.0.0.9000
[7] flexdashboard_0.5.2 lme4_1.1-29 Matrix_1.4-0
[10] mgcv_1.8-38 nlme_3.1-152 png_0.1-7
[13] scales_1.2.0 nnet_7.3-16 labelled_2.9.1
[16] kableExtra_1.3.4 plotly_4.10.0 gridExtra_2.3
[19] ggplot2_3.3.6 DT_0.23 tableone_0.13.2
[22] magrittr_2.0.3 lubridate_1.8.0 dplyr_1.0.9
[25] plyr_1.8.7 data.table_1.14.2 rmdformats_1.0.4
[28] knitr_1.39

loaded via a namespace (and not attached): [1] TH.data_1.1-1 minqa_1.2.4 colorspace_2.0-3
[4] ellipsis_0.3.2 sjlabelled_1.2.0 estimability_1.4
[7] htmlTable_2.4.0 parameters_0.18.1 base64enc_0.1-3
[10] rstudioapi_0.13 farver_2.1.1 fansi_1.0.3
[13] mvtnorm_1.1-3 xml2_1.3.3 codetools_0.2-18
[16] splines_4.2.0 sjmisc_2.8.9 jsonlite_1.8.0
[19] nloptr_2.0.3 ggeffects_1.1.2 broom_0.8.0
[22] cluster_2.1.2 effectsize_0.7.0 compiler_4.2.0
[25] httr_1.4.3 sjstats_0.18.1 emmeans_1.7.5
[28] backports_1.4.1 assertthat_0.2.1 fastmap_1.1.0
[31] lazyeval_0.2.2 survey_4.1-1 cli_3.3.0
[34] htmltools_0.5.3 tools_4.2.0 coda_0.19-4
[37] gtable_0.3.0 glue_1.6.2 Rcpp_1.0.9
[40] jquerylib_0.1.4 vctrs_0.4.1 svglite_2.1.0
[43] insight_0.18.0 xfun_0.31 stringr_1.4.0
[46] rvest_1.0.2 lifecycle_1.0.1 klippy_0.0.0.9500
[49] MASS_7.3-54 zoo_1.8-10 hms_1.1.1
[52] sandwich_3.0-2 RColorBrewer_1.1-3 yaml_2.3.5
[55] sass_0.4.1 rpart_4.1-15 latticeExtra_0.6-29 [58] stringi_1.7.8 highr_0.9 bayestestR_0.12.1
[61] checkmate_2.1.0 boot_1.3-28 rlang_1.0.4
[64] pkgconfig_2.0.3 systemfonts_1.0.4 evaluate_0.15
[67] purrr_0.3.4 labeling_0.4.2 htmlwidgets_1.5.4
[70] tidyselect_1.1.2 bookdown_0.27 R6_2.5.1
[73] generics_0.1.3 multcomp_1.4-19 DBI_1.1.2
[76] pillar_1.8.0 haven_2.5.0 foreign_0.8-81
[79] withr_2.5.0 datawizard_0.4.1 tibble_3.1.8
[82] performance_0.9.1 modelr_0.1.8 utf8_1.2.2
[85] rmarkdown_2.14 jpeg_0.1-9 grid_4.2.0
[88] forcats_0.5.1 digest_0.6.29 webshot_0.5.3
[91] xtable_1.8-4 tidyr_1.2.0 munsell_0.5.0
[94] viridisLite_0.4.0 bslib_0.3.1 mitools_2.4