Sparse regularisation · resampling for honest test error · choosing λ
Name the file Asset2026_surname1_surname2_surname3 and email it to oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de and your team-mates.
Scope
We will:
We will NOT:
Approach
Part I — Foundations
Part II — Application
Foundations (Week 1, 15.04.2026): Course outline · Backtesting fundamentals
Introduction to R (Week 2, 22.04.2026): RStudio · variables · vectors · data frames · live coding
Assessing model accuracy & Ridge regression (Week 3, 29.04.2026): Statistical learning · MSE · bias-variance · linear model selection · Ridge
Lasso, cross-validation & Elastic Net (Week 4, 06.05.2026): Sparse regularisation · resampling for honest test error · choosing λ
Prediction markets, the Polymarket Quant Bench & your project (Week 5, 13.05.2026): From Welch-Goyal to event-resolved binary contracts
Final presentations (Week 13, 01.07.2026): Group presentations · Q&A · wrap-up
Project (Code + Report): 50% of your grade
Rmd code + knitr-rendered PDF report. Build a library of indicators over the Polymarket Quant Bench dataset (curated OHLCV bars on HuggingFace, derived from Jon Becker’s polymarket-data dump), derive trade signals, back-test, and write a critical reflection.
Group of up to 3.
Submit by emailing oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de. Subject pattern: Finance Project — Asset Management_assignment-1-project-report_surname1_surname2_…
Deadline: 30 June 2026
Final Presentation: 50% of your grade
20-minute group presentation in class on 1 July 2026; submit slides as PDF together with the project zip.
Group of up to 3.
Submit by emailing oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de. Subject pattern: Finance Project — Asset Management_assignment-2-final-presentation_surname1_surname2_…
Deadline: 1 July 2026
Ridge minimises:
\[ \sum_{i=1}^n \Bigl(y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij}\Bigr)^2 + \boxed{\lambda \sum_{j=1}^p \beta_j^2} \]
Lasso minimises:
\[ \sum_{i=1}^n \Bigl(y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij}\Bigr)^2 + \boxed{\lambda \sum_{j=1}^p |\beta_j|} \]
L2 vs L1: squared coefficients (Ridge) vs absolute values (Lasso).
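For reference, the same glmnet call with alpha = 0 implements the Ridge penalty. A minimal sketch on the Hitters data used later in these slides (ridge.mod is an illustrative name, not part of Lasso.R):
library(ISLR); library(glmnet)
Hitters <- na.omit(Hitters)                    # glmnet cannot handle NAs
x <- model.matrix(Salary ~ ., Hitters)[, -1]   # predictor matrix, intercept column dropped
y <- Hitters$Salary
# alpha = 0 selects the L2 (Ridge) penalty; alpha = 1 the L1 (Lasso) penalty
ridge.mod <- glmnet(x, y, alpha = 0, lambda = 10^seq(5, -5, length = 100))
plot(ridge.mod, xvar = "lambda")               # coefficients shrink smoothly, never exactly to 0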
Lasso.R — glmnet with alpha = 1. How to choose λ? Answer next: cross-validation.
Tools that involve repeatedly drawing samples from a training set and refitting a model on each sample, to obtain more information about the fitted model.
Drawback: resampling is computationally expensive.
In this course we use cross-validation (we skip bootstrapping).
We cover the validation-set approach, LOOCV, and k-fold CV.
Validation-set approach: predicting mpg from horsepower (Auto data).
library(ISLR); attach(Auto)
# 10 random splits × 10 polynomial degrees → matrix of test MSEs
mse <- matrix(0, 10, 10)
for (i in 1:10) {
  set.seed(i)
  train <- sample(392, 196)
  for (j in 1:10) {
    lm.fit <- lm(mpg ~ poly(horsepower, j), data = Auto, subset = train)
    mse[i, j] <- mean((mpg - predict(lm.fit, Auto))[-train]^2)
  }
}
plot(mse[1, ], type = "l", col = 1, xlab = "Flexibility", ylab = "MSE",
     ylim = c(15, 30))
for (j in 2:10) lines(mse[j, ], col = j)
Avoid for loops in your project where possible — prefer vectorised / apply-family code (see https://www.datacamp.com/community/tutorials/r-tutorial-apply-family).
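For example, the inner degree loop above collapses to one sapply call. A minimal sketch for a single split (mse_row is an illustrative name):
library(ISLR); attach(Auto)
set.seed(1)
train <- sample(392, 196)                      # hold out half of the 392 rows
# Test MSE for polynomial degrees 1..10, no explicit inner for loop
mse_row <- sapply(1:10, function(j) {
  fit <- lm(mpg ~ poly(horsepower, j), data = Auto, subset = train)
  mean((mpg - predict(fit, Auto))[-train]^2)
})
mse_row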
library(ISLR); attach(Auto)
# Manual LOOCV across 10 polynomial degrees
mse <- matrix(0, 392, 10)
for (j in 1:10) {
  for (i in 1:392) {
    lm.fit <- lm(mpg ~ poly(horsepower, j), data = Auto[-i, ])
    mse[i, j] <- (mpg - predict(lm.fit, Auto))[i]^2
  }
}
mse_loocv <- colMeans(mse)
plot(mse_loocv, type = "l", xlab = "Flexibility", ylab = "MSE",
     ylim = c(15, 30))
Use boot::cv.glm for an order-of-magnitude speed-up; k-fold CV (5fold_CV.R) gives the same curve shape, much faster. LOOCV is computationally heavy, so we run K-fold CV instead:
\[ \mathrm{CV}_{(k)} \;=\; \dfrac{1}{k} \sum_{i=1}^k \mathrm{MSE}_i \]
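In practice we can let boot::cv.glm build the folds. A minimal 10-fold sketch for the same degree sweep (cv.mse is an illustrative name; assumes the Auto data as above):
library(ISLR); library(boot)
set.seed(1)
# 10-fold CV estimate of test MSE for polynomial degrees 1..10
cv.mse <- sapply(1:10, function(j) {
  glm.fit <- glm(mpg ~ poly(horsepower, j), data = Auto)  # glm() without family = same fit as lm()
  cv.glm(Auto, glm.fit, K = 10)$delta[1]                  # raw cross-validation estimate
})
cv.mse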
library(ISLR); library(glmnet)
Hitters <- na.omit(Hitters)
x <- model.matrix(Salary ~ ., Hitters)[, -1]
y <- Hitters$Salary
grid <- 10^seq(5, -5, length = 100)
set.seed(1)
train <- sample(1:nrow(x), nrow(x) / 2)
test <- -train
y.test <- y[test]
# Lasso path on the training half
lasso.mod <- glmnet(x[train, ], y[train], alpha = 1, lambda = grid)
plot(lasso.mod, xvar = "lambda")
# 10-fold CV — pick the best λ
cv.out <- cv.glmnet(x[train, ], y[train], alpha = 1)
plot(cv.out)
bestlam <- cv.out$lambda.min
# Test MSE
lasso.pred <- predict(lasso.mod, s = bestlam, newx = x[test, ])
mean((lasso.pred - y.test)^2)
# Refit on full sample, list non-zero coefficients
out <- glmnet(x, y, alpha = 1, lambda = grid)
lasso.coef <- predict(out, type = "coefficients", s = bestlam)[1:20, ]
lasso.coef[lasso.coef != 0]
cv.glmnet builds the CV folds and returns lambda.min.
glmnet is then fit on the full sample at the chosen \(\lambda\).
lasso.coef[lasso.coef != 0] lists the surviving variables — your sparse model.
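Besides lambda.min, cv.glmnet also returns lambda.1se (the one-standard-error rule). A short sketch reusing the cv.out object from above:
# lambda.1se: largest λ whose CV error is within one SE of the minimum
cv.out$lambda.1se
lasso.pred.1se <- predict(cv.out, s = "lambda.1se", newx = x[test, ])
mean((lasso.pred.1se - y.test)^2)   # usually a sparser model at a small cost in fit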
Recipe — OLS post-Lasso:
library(ISLR); library(glmnet)
Hitters <- na.omit(Hitters)
x <- model.matrix(Salary ~ ., Hitters)[, -1]
y <- Hitters$Salary
set.seed(1)
train <- sample(1:nrow(x), nrow(x) / 2)
# Step 1: CV-tuned Lasso to find non-zero coefficients
cv.out <- cv.glmnet(x[train, ], y[train], alpha = 1)
bestlam <- cv.out$lambda.min
out <- glmnet(x, y, alpha = 1, lambda = 10^seq(5, -5, length = 100))
lasso.coef <- predict(out, type = "coefficients", s = bestlam)[2:20, ]  # rows 2:20 drop the intercept, so indices align with the columns of x
indexLasso <- which(lasso.coef != 0)
# Step 2: re-fit OLS on the surviving columns
fitPostLasso <- lm(y ~ x[, indexLasso])
summary(fitPostLasso)
Step 1 finds the non-zero coefficients via CV-tuned Lasso; Step 2 is lm restricted to that set.
Elastic Net interpolates between Ridge and Lasso: in glmnet, the parameter \(\alpha\) defines the mix between the two penalties.
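Concretely, in glmnet's parameterisation (per the package documentation) the elastic-net penalty is
\[ \lambda \Bigl( \tfrac{1-\alpha}{2} \sum_{j=1}^p \beta_j^2 \;+\; \alpha \sum_{j=1}^p |\beta_j| \Bigr), \]
so \(\alpha = 0\) recovers Ridge and \(\alpha = 1\) the Lasso.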
library(ISLR); library(glmnet)
Hitters <- na.omit(Hitters)
x <- model.matrix(Salary ~ ., Hitters)[, -1]
y <- Hitters$Salary
set.seed(1)
train <- sample(1:nrow(x), nrow(x) / 2)
test <- -train
y.test <- y[test]
# Sweep α from 0 to 1 in 0.1 steps; tune λ via 10-fold CV at each α
mse <- matrix(0, 11, 2)
for (i in 1:11) {
  alpha_i <- (i - 1) / 10   # α = 0, 0.1, ..., 1
  cv.out <- cv.glmnet(x[train, ], y[train], alpha = alpha_i)
  bestlam <- cv.out$lambda.min
  pred <- predict(cv.out, s = bestlam, newx = x[test, ])
  mse[i, ] <- c(alpha_i, mean((pred - y.test)^2))
}
mse
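To read off the winning mix, take the row with the smallest held-out MSE (a one-line sketch):
# Column 1 holds α, column 2 the test MSE; pick the row minimising column 2
mse[which.min(mse[, 2]), ]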
Run Lasso.R, OLS_Post_Lasso.R, EN.R, and 5fold_CV.R locally.
Reminder
Institute of Strategic Management and Finance · Ulm University