Lecture 4: Lasso, cross-validation & Elastic Net

Sparse regularisation · resampling for honest test error · choosing λ

Prof. Dr. Andre Guettler, Director of the Institute
Helmholtzstraße 22, Room 205
andre.guettler@uni-ulm.de
+49 731 50 31 030

Oliver Padmaperuma, Doctoral Candidate
Helmholtzstraße 22, Room 203
oliver.padmaperuma@uni-ulm.de
+49 731 50 31 036

4.1 Course objectives

  • 4.1 Course objectives
  • 4.2 Recap from Lecture 3
  • 4.3 The Lasso
  • 4.4 Resampling methods
  • 4.5 Selecting λ for Lasso
  • 4.6 Refinements
  • 4.M Conclusion of Lecture 4
  • Welcome to Finance Project — Asset Management
  • Course Objective
  • Course at a glance (1/2)
  • Course at a glance (2/2)
  • Assignments / Exams

Welcome to Finance Project — Asset Management

  • This is a project course: there is no central exam to register for. Sign up on the course Moodle page by 15 April 2026 so you receive announcements and the data link.
  • Submit the project by 30 June 2026 as a single zip — name pattern: Asset2026_surname1_surname2_surname3. Email it to oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de and your team-mates.
  • Ask questions during or right after each session — that is the preferred channel.
  • Admin / studies / exam-eligibility questions go to the registrar’s office (Studiensekretariat) at studiensekretariat@uni-ulm.de.
  • Course-content questions outside class: email oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de.
  • We also recommend the student advisory service.

Course Objective

Scope

We will:

  • Build an end-to-end empirical pipeline in R: load, explore, model, back-test
  • Cover the core ML toolbox for asset-management research: linear models, Ridge, Lasso, Elastic Net, cross-validation
  • Apply it to a non-traditional asset class: prediction markets
  • Develop your own indicator library and trading strategy in groups of three

We will NOT:

  • Drift into deep-learning or reinforcement-learning methods
  • Cover prediction markets in depth
  • Provide a “ready-to-fork” backtest — the demo code is intentionally basic

Approach

Part I — Foundations

  • L1: Motivation, organisation, backtesting fundamentals
  • L2: Hands-on R intro — RStudio, live coding, etc.
  • L3 + L4: Statistical learning — model accuracy, regularisation, resampling

Part II — Application

  • L5: Prediction-markets primer + the Polymarket dataset + assignment briefing
  • Project work in groups of three (≈ 7 weeks of self-organised work)
  • Final session (1 July): 20-minute presentations per team

Course at a glance (1/2)

Foundations

Week 1

15.04.2026

Course outline · Backtesting fundamentals

  • Course aim & organisation
  • Backtesting overview & case study
  • In-sample tests (Welch & Goyal 2008)
  • Out-of-sample (walk-forward, R²_OS)
  • Useful predictors & p-hacking

Introduction to R

Week 2

22.04.2026

RStudio · variables · vectors · data frames · live coding

  • Why R for empirical asset-management research
  • RStudio and the script editor
  • Variables, vectors, matrices, data frames, lists
  • Functions and loops
  • Data import and export

Assessing model accuracy & Ridge regression

Week 3

29.04.2026

Statistical learning · MSE · bias-variance · linear model selection · Ridge

  • Statistical learning: Y = f(X) + ε
  • Quality of fit and the train/test MSE distinction
  • Bias-variance trade-off and overfitting
  • OLS limits: prediction accuracy & interpretability
  • Ridge regression and the L2 penalty

Lasso, cross-validation & Elastic Net

Week 4

06.05.2026

Sparse regularisation · resampling for honest test error · choosing λ

  • Lasso: L1 penalty and exact-zero coefficients
  • Cross-validation: validation set, LOOCV, K-fold
  • Choosing the optimal λ for Lasso
  • OLS post-Lasso for cleaner coefficient inference
  • Elastic Net — combining Ridge and Lasso

Prediction markets, the Polymarket Quant Bench & your project

Week 5

13.05.2026

From Welch-Goyal to event-resolved binary contracts

  • Prediction markets — definition and Polymarket as the canonical venue
  • How prices form: liquidity, resolution, mechanics
  • The Polymarket Quant Bench dataset (HuggingFace): access and schema
  • First look at the data in R
  • Your project: indicator design, back-test, deliverables, R toolbox

Course at a glance (2/2)

Final presentations

Week 13

01.07.2026

Group presentations · Q&A · wrap-up

  • Presentation order and time budget
  • Q&A rules
  • Closing thoughts and feedback

Assignments / Exams

Project (Code + Report) 50% of your grade

Rmd code + knitr-rendered PDF report. Build a library of indicators over the Polymarket Quant Bench dataset (curated OHLCV bars on HuggingFace, derived from Jon Becker’s polymarket-data dump), derive trade signals, back-test, and write a critical reflection.

Group of up to 3.

Submit by emailing oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de. Subject pattern: Finance Project — Asset Management_assignment-1-project-report_surname1_surname2_…

30 June 2026

Final Presentation 50% of your grade

20-minute group presentation in class on 1 July 2026; submit slides as PDF together with the project zip.

Group of up to 3.

Submit by emailing oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de. Subject pattern: Finance Project — Asset Management_assignment-2-final-presentation_surname1_surname2_…

1 July 2026

4.2 Recap from Lecture 3

  • 4.1 Course objectives
  • 4.2 Recap from Lecture 3
  • 4.3 The Lasso
  • 4.4 Resampling methods
  • 4.5 Selecting λ for Lasso
  • 4.6 Refinements
  • 4.M Conclusion of Lecture 4
  • Where we are

Where we are

  • Statistical learning as estimation of \(f\) from \(Y = f(X) + \varepsilon\).
  • MSE: training vs test MSE — they diverge as flexibility grows.
  • Bias / variance trade-off: more flexible ⇒ less bias, more variance; expected test MSE has a minimum.
  • Ridge regression: L2 penalty \(\lambda \sum \beta_j^2\) shrinks coefficients but never to exactly zero.

4.3 The Lasso

  • 4.1 Course objectives
  • 4.2 Recap from Lecture 3
  • 4.3 The Lasso
  • 4.4 Resampling methods
  • 4.5 Selecting λ for Lasso
  • 4.6 Refinements
  • 4.M Conclusion of Lecture 4
  • Why move past Ridge?
  • Lasso’s penalty term
  • What’s the big deal?
  • Hitters data — Lasso coefficient paths

Why move past Ridge?

  • Ridge’s penalty never forces any coefficient to be exactly zero.
  • The final model always includes all variables — harder to interpret with many predictors.
  • A more modern alternative is the Lasso (Tibshirani, 1996).
  • Lasso works similarly to Ridge — but with a different penalty.

Lasso’s penalty term

Ridge minimises:

\[ \sum_{i=1}^n \Bigl(y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij}\Bigr)^2 + \boxed{\lambda \sum_{j=1}^p \beta_j^2} \]

Lasso minimises:

\[ \sum_{i=1}^n \Bigl(y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij}\Bigr)^2 + \boxed{\lambda \sum_{j=1}^p |\beta_j|} \]

L2 vs L1: squared coefficients (Ridge) vs absolute values (Lasso).

What’s the big deal?

  • Looks like a tiny change — but the L1 penalty can drive some coefficients to exactly zero (see the special case below).
  • Lasso therefore yields a model that has high predictive power and is simple to interpret (variable selection!).
  • Drawback: there is no closed-form solution like Ridge’s \((X'X + \lambda I)^{-1} X'y\) — numerical optimisation only.
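Why does the L1 penalty produce exact zeros? A special case makes it visible. Assuming n = p and an orthonormal design (a simplification, following James et al. 2021, §6.2.2), both estimators have closed forms in terms of the OLS estimate \(\hat\beta_j\):

\[ \hat\beta_j^{\mathrm{ridge}} = \frac{\hat\beta_j}{1+\lambda}, \qquad \hat\beta_j^{\mathrm{lasso}} = \begin{cases} \hat\beta_j - \lambda/2 & \text{if } \hat\beta_j > \lambda/2 \\ \hat\beta_j + \lambda/2 & \text{if } \hat\beta_j < -\lambda/2 \\ 0 & \text{if } |\hat\beta_j| \le \lambda/2 \end{cases} \]

Ridge rescales every coefficient by the same factor and never reaches zero; Lasso soft-thresholds, so every coefficient with \(|\hat\beta_j| \le \lambda/2\) is set exactly to zero.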

Hitters data — Lasso coefficient paths

  • Reproduce with Lasso.R: glmnet with alpha = 1 (a minimal sketch follows below).
  • Note that — unlike Ridge — coefficients hit exactly zero at finite \(\lambda\).
  • Question: how do we pick the optimal \(\lambda\)?
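A minimal sketch of the path plot, assuming the ISLR Hitters data (the course script Lasso.R may differ in detail):

library(ISLR);  library(glmnet)
Hitters <- na.omit(Hitters)
x <- model.matrix(Salary ~ ., Hitters)[, -1]
y <- Hitters$Salary

# alpha = 1 gives the Lasso; glmnet computes the whole coefficient path over λ
lasso.mod <- glmnet(x, y, alpha = 1)
plot(lasso.mod, xvar = "lambda", label = TRUE)

# Sparsity grows with λ: count non-zero coefficients (incl. intercept)
sum(coef(lasso.mod, s = 10)  != 0)
sum(coef(lasso.mod, s = 100) != 0)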

➡ Answer next: cross-validation.

4.4 Resampling methods

  • 4.1 Course objectives
  • 4.2 Recap from Lecture 3
  • 4.3 The Lasso
  • 4.4 Resampling methods
  • 4.5 Selecting λ for Lasso
  • 4.6 Refinements
  • 4.M Conclusion of Lecture 4
  • What are resampling methods?
  • Three types of cross-validation
  • The Validation Set Approach
  • Example — Auto data, validation set
  • Live demo — validation-set approach
  • Validation set — pros & cons
  • Leave-One-Out Cross-Validation (LOOCV)
  • LOOCV vs validation set
  • Live demo — LOOCV by hand
  • K-fold Cross-Validation
  • Auto data — LOOCV vs K-fold
  • What do we do in practice?

What are resampling methods?

Tools that involve repeatedly drawing samples from a training set and refitting a model on each sample, to obtain more information about the fitted model.

  • Model assessment — estimate test error rates.
  • Model selection — pick the appropriate level of model flexibility (e.g. \(\lambda\)).

Drawback: resampling is computationally expensive.

In this course we use cross-validation (we skip bootstrapping).

Three types of cross-validation

We cover:

  1. The Validation Set Approach
  2. Leave-One-Out Cross-Validation (LOOCV)
  3. K-fold Cross-Validation

The Validation Set Approach

  • Suppose we want the variable set with the lowest test (not training) error rate.
  • With a large data set: randomly split into training and validation parts.
  • Fit each candidate model on the training set.
  • Pick the model with the lowest validation error.

Example — Auto data, validation set

  • Predict mpg from horsepower.
  • Two candidate models:
    • \(\mathrm{mpg} \sim \mathrm{horsepower}\)
    • \(\mathrm{mpg} \sim \mathrm{horsepower} + \mathrm{horsepower}^2\) (and higher-order polynomials)
  • Randomly split 392 obs into 196 training + 196 validation.
  • Fit both models on training; evaluate test MSE on the validation half.
  • Lowest test MSE wins.

Live demo — validation-set approach

library(ISLR);  attach(Auto)

# 10 random splits × 10 polynomial degrees → matrix of test MSEs
mse <- matrix(0, 10, 10)
for (i in 1:10) {
  set.seed(i)
  train <- sample(392, 196)
  for (j in 1:10) {
    lm.fit    <- lm(mpg ~ poly(horsepower, j), data = Auto, subset = train)
    mse[i, j] <- mean((mpg - predict(lm.fit, Auto))[-train]^2)
  }
}

plot(mse[1, ], type = "l", col = 1, xlab = "Flexibility", ylab = "MSE",
     ylim = c(15, 30))
for (j in 2:10) lines(mse[j, ], col = j)
  • Outer loop: 10 different random train/test splits.
  • Inner loop: polynomial degrees 1–10.
  • Resulting plot shows a lot of variability between splits — hence the validation MSE itself is unreliable.
  • Note: avoid for loops in your project where possible — prefer vectorised / apply-family code (see https://www.datacamp.com/community/tutorials/r-tutorial-apply-family); a sketch follows below.
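The same experiment without explicit for loops — a sketch using sapply under the same assumptions (Auto from ISLR; 10 splits × 10 degrees):

library(ISLR)

# One validation-set run: returns the test MSE for degrees 1-10
val_mse <- function(i) {
  set.seed(i)
  train <- sample(392, 196)
  sapply(1:10, function(j) {
    fit <- lm(mpg ~ poly(horsepower, j), data = Auto, subset = train)
    mean((Auto$mpg - predict(fit, Auto))[-train]^2)
  })
}

mse <- t(sapply(1:10, val_mse))   # rows: splits, columns: degrees
matplot(t(mse), type = "l", lty = 1, xlab = "Flexibility", ylab = "MSE")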

Validation set — pros & cons

  • Simple to think about.
  • Easy to implement.
  • The validation MSE is highly variable between random splits.
  • Only a subset of observations is used to fit — methods perform worse with fewer training observations.

Leave-One-Out Cross-Validation (LOOCV)

  • For each candidate model:
    • Split the data of size \(n\) into training (size \(n-1\)) and validation (size 1).
    • Fit the model on the training set.
    • Compute the squared error for the held-out observation.
    • Repeat \(n\) times.
  • \(\mathrm{CV}_{(n)} = \dfrac{1}{n}\sum_{i=1}^n \mathrm{MSE}_i\)

LOOCV vs validation set

  • LOOCV has less bias — almost the entire data set is used to fit each model.
  • LOOCV produces a more stable MSE — the validation approach gives different MSEs each time due to randomness in splitting; LOOCV always returns the same answer.
  • LOOCV is computationally intensive — fit each model \(n\) times.

Live demo — LOOCV by hand

library(ISLR);  attach(Auto)

# Manual LOOCV across 10 polynomial degrees
mse <- matrix(0, 392, 10)
for (j in 1:10) {
  for (i in 1:392) {
    lm.fit    <- lm(mpg ~ poly(horsepower, j), data = Auto[-i, ])
    mse[i, j] <- (mpg - predict(lm.fit, Auto))[i]^2
  }
}

mse_loocv <- colMeans(mse)
plot(mse_loocv, type = "l", xlab = "Flexibility", ylab = "MSE",
     ylim = c(15, 30))
  • Outer loop over polynomial degrees (1–10), inner loop over 392 observations.
  • This is the slow, didactic LOOCV. In practice use boot::cv.glm (sketched below); for least squares the LOOCV error even follows from a single fit via the leverage shortcut (James et al. 2021, Eq. 5.2).
  • Compare against 5-fold CV (5fold_CV.R) — same shape, much faster.
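A sketch of both routes, assuming the same Auto setup. cv.glm (from the boot package) defaults to K = n, i.e. LOOCV; delta[1] is the raw CV estimate:

library(ISLR);  library(boot)

# LOOCV via cv.glm: convenient, but still refits the model n times
mse_loocv <- sapply(1:10, function(j) {
  glm.fit <- glm(mpg ~ poly(horsepower, j), data = Auto)
  cv.glm(Auto, glm.fit)$delta[1]
})
plot(mse_loocv, type = "l", xlab = "Flexibility", ylab = "MSE")

# Least-squares shortcut: LOOCV from a single fit via leverage values
fit <- lm(mpg ~ poly(horsepower, 2), data = Auto)
mean(((Auto$mpg - fitted(fit)) / (1 - hatvalues(fit)))^2)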

K-fold Cross-Validation

LOOCV is computationally heavy, so we run K-fold CV instead:

  1. Divide the data into K different parts (e.g. \(K = 5\) or \(K = 10\)).
  2. Remove the first part; fit the model on the remaining \(K-1\) parts; compute MSE on the omitted part.
  3. Repeat \(K\) times — taking out a different part each round.
  4. Average the \(K\) MSEs:

\[ \mathrm{CV}_{(K)} \;=\; \frac{1}{K} \sum_{k=1}^{K} \mathrm{MSE}_k \]

Auto data — LOOCV vs K-fold

  • Left: LOOCV error curve — single, deterministic.
  • Right: 5-fold CV run 10 times — curves coincide tightly (much less spread than the validation-set approach).
  • LOOCV is a special case of K-fold with \(K = n\).
  • Both stable; LOOCV more compute-heavy.

What do we do in practice?

  • We tend to use K-fold CV with \(K = 5\) or \(K = 10\).
  • Empirically these yield test-error estimates that suffer neither from excessively high bias nor from very high variance — the best balance; a sketch follows below.
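A minimal sketch of 10-fold CV with cv.glm, assuming the Auto setup from above (K is passed directly; the folds are random, hence the seed):

library(ISLR);  library(boot)

# 10-fold CV across polynomial degrees 1-10
set.seed(17)
mse_k10 <- sapply(1:10, function(j) {
  glm.fit <- glm(mpg ~ poly(horsepower, j), data = Auto)
  cv.glm(Auto, glm.fit, K = 10)$delta[1]
})
round(mse_k10, 2)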

4.5 Selecting λ for Lasso

  • 4.1 Course objectives
  • 4.2 Recap from Lecture 3
  • 4.3 The Lasso
  • 4.4 Resampling methods
  • 4.5 Selecting λ for Lasso
  • 4.6 Refinements
  • 4.M Conclusion of Lecture 4
  • Lasso — selecting the tuning parameter λ
  • Live demo — CV-tuned Lasso

Lasso — selecting the tuning parameter λ

  • Pick a grid of candidate \(\lambda\) values.
  • Use cross-validation to estimate test error for each.
  • Choose the \(\lambda\) giving the smallest test error.
  • In this example the CV error is minimised at λ ≈ 9.3 (log λ ≈ 2.2); only 10 of 19 coefficients remain — Lasso has shrunk 9 to zero.

Live demo — CV-tuned Lasso

library(ISLR);  library(glmnet)
Hitters <- na.omit(Hitters)
x <- model.matrix(Salary ~ ., Hitters)[, -1]
y <- Hitters$Salary

grid <- 10^seq(5, -5, length = 100)
set.seed(1)
train  <- sample(1:nrow(x), nrow(x) / 2)
test   <- -train
y.test <- y[test]

# Lasso path on the training half
lasso.mod <- glmnet(x[train, ], y[train], alpha = 1, lambda = grid)
plot(lasso.mod, xvar = "lambda")

# 10-fold CV — pick the best λ
cv.out  <- cv.glmnet(x[train, ], y[train], alpha = 1)
plot(cv.out)
bestlam <- cv.out$lambda.min

# Test MSE
lasso.pred <- predict(lasso.mod, s = bestlam, newx = x[test, ])
mean((lasso.pred - y.test)^2)

# Refit on full sample, list non-zero coefficients
out        <- glmnet(x, y, alpha = 1, lambda = grid)
lasso.coef <- predict(out, type = "coefficients", s = bestlam)[1:20, ]
lasso.coef[lasso.coef != 0]
  • cv.glmnet builds the CV folds and returns lambda.min.
  • The final glmnet is fit on the full sample at the chosen \(\lambda\).
  • lasso.coef[lasso.coef != 0] lists the surviving variables — your sparse model.
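cv.glmnet also reports lambda.1se, the largest λ whose CV error lies within one standard error of the minimum. This is a common, more conservative choice that yields an even sparser model. A sketch, reusing the objects from the demo above:

# One-standard-error rule: sparser model, CV error statistically comparable
lam1se  <- cv.out$lambda.1se
coef1se <- predict(out, type = "coefficients", s = lam1se)[1:20, ]
sum(coef1se != 0)   # typically fewer survivors than at lambda.min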

4.6 Refinements

  • 4.1 Course objectives
  • 4.2 Recap from Lecture 3
  • 4.3 The Lasso
  • 4.4 Resampling methods
  • 4.5 Selecting λ for Lasso
  • 4.6 Refinements
  • 4.M Conclusion of Lecture 4
  • OLS post-Lasso
  • Live demo — OLS post-Lasso
  • Elastic Net — combining Ridge and Lasso
  • Live demo — tuning Elastic Net

OLS post-Lasso

  • Lasso’s penalty mitigates overfitting and yields a sparse solution …
  • … but it also tends to shrink coefficients of selected variables too much.

Recipe — OLS post-Lasso:

  1. Use Lasso to reduce the dimension of the model.
  2. Re-estimate the coefficients of the selected predictors with plain OLS — this undoes the Lasso's shrinkage bias.
  3. Standard errors need adjusting (not naïve OLS standard errors).

Live demo — OLS post-Lasso

library(ISLR);  library(glmnet)
Hitters <- na.omit(Hitters)
x <- model.matrix(Salary ~ ., Hitters)[, -1]
y <- Hitters$Salary

set.seed(1)
train <- sample(1:nrow(x), nrow(x) / 2)

# Step 1: CV-tuned Lasso to find non-zero coefficients
cv.out  <- cv.glmnet(x[train, ], y[train], alpha = 1)
bestlam <- cv.out$lambda.min
out     <- glmnet(x, y, alpha = 1, lambda = 10^seq(5, -5, length = 100))
lasso.coef <- predict(out, type = "coefficients", s = bestlam)[2:20, ]
indexLasso <- which(lasso.coef != 0)

# Step 2: re-fit OLS on the surviving columns
fitPostLasso <- lm(y ~ x[, indexLasso])
summary(fitPostLasso)
  • Step 1 picks the non-zero set via Lasso + CV.
  • Step 2 runs plain lm restricted to that set.
  • Caveat: the printed standard errors ignore the selection step; correcting inference (e.g. via post-selection inference (Belloni and Chernozhukov 2013)) is a separate research topic.

Elastic Net — combining Ridge and Lasso

  • Elastic Net combines Ridge (\(\alpha = 0\)) and Lasso (\(\alpha = 1\)).
  • In glmnet, the parameter \(\alpha\) defines the mix between the two penalties (made explicit below).
  • Useful framing:
    • \(\alpha\) controls the mixing between the L2 and L1 penalties.
    • \(\lambda\) controls the amount of penalisation.
  • For the Hitters data set, \(\alpha = 0\) (pure Ridge) yielded the lowest test MSE.
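Concretely, glmnet's penalty term is (per the package documentation)

\[ \lambda \sum_{j=1}^p \Bigl( \tfrac{1-\alpha}{2}\,\beta_j^2 + \alpha\,|\beta_j| \Bigr), \]

so \(\alpha = 0\) recovers the Ridge penalty (up to the factor \(\tfrac{1}{2}\)), \(\alpha = 1\) the Lasso penalty, and intermediate \(\alpha\) blends the two.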

Live demo — tuning Elastic Net

library(ISLR);  library(glmnet)
Hitters <- na.omit(Hitters)
x <- model.matrix(Salary ~ ., Hitters)[, -1]
y <- Hitters$Salary

set.seed(1)
train  <- sample(1:nrow(x), nrow(x) / 2)
test   <- -train
y.test <- y[test]

# Sweep α from 0 to 1 in 0.1 steps; tune λ via 10-fold CV at each α
mse <- matrix(0, 11, 2)
for (i in 1:11) {
  alpha_i  <- (i - 1) / 10          # α = 0, 0.1, …, 1
  cv.out   <- cv.glmnet(x[train, ], y[train], alpha = alpha_i)
  bestlam  <- cv.out$lambda.min
  pred     <- predict(cv.out, s = bestlam, newx = x[test, ])
  mse[i, ] <- c(alpha_i, mean((pred - y.test)^2))
}
mse
  • Outer sweep: \(\alpha \in \{0, 0.1, \ldots, 1\}\).
  • Inner: 10-fold CV picks the best \(\lambda\) for each \(\alpha\).
  • Read the resulting MSE matrix to find the best (\(\alpha, \lambda\)) pair.
  • For Hitters, the minimum lands at \(\alpha = 0\) — pure Ridge.

4.M Conclusion of Lecture 4

  • 4.1 Course objectives
  • 4.2 Recap from Lecture 3
  • 4.3 The Lasso
  • 4.4 Resampling methods
  • 4.5 Selecting λ for Lasso
  • 4.6 Refinements
  • 4.M Conclusion of Lecture 4
  • Course at a glance (1/2)
  • Course at a glance (2/2)
  • Further reading
  • Prepare before next lecture
  • See you next time
  • References

Course at a glance (1/2)

Foundations

Week 1

15.04.2026

Course outline · Backtesting fundamentals

  • Course aim & organisation
  • Backtesting overview & case study
  • In-sample tests (Welch & Goyal 2008)
  • Out-of-sample (walk-forward, R²_OS)
  • Useful predictors & p-hacking

Introduction to R

Week 2

22.04.2026

RStudio · variables · vectors · data frames · live coding

  • Why R for empirical asset-management research
  • RStudio and the script editor
  • Variables, vectors, matrices, data frames, lists
  • Functions and loops
  • Data import and export

Assessing model accuracy & Ridge regression

Week 3

29.04.2026

Statistical learning · MSE · bias-variance · linear model selection · Ridge

  • Statistical learning: Y = f(X) + ε
  • Quality of fit and the train/test MSE distinction
  • Bias-variance trade-off and overfitting
  • OLS limits: prediction accuracy & interpretability
  • Ridge regression and the L2 penalty

Lasso, cross-validation & Elastic Net

Week 4

06.05.2026

Sparse regularisation · resampling for honest test error · choosing λ

  • Lasso: L1 penalty and exact-zero coefficients
  • Cross-validation: validation set, LOOCV, K-fold
  • Choosing the optimal λ for Lasso
  • OLS post-Lasso for cleaner coefficient inference
  • Elastic Net — combining Ridge and Lasso

Prediction markets, the Polymarket Quant Bench & your project

Week 5

13.05.2026

From Welch-Goyal to event-resolved binary contracts

  • Prediction markets — definition and Polymarket as the canonical venue
  • How prices form: liquidity, resolution, mechanics
  • The Polymarket Quant Bench dataset (HuggingFace): access and schema
  • First look at the data in R
  • Your project: indicator design, back-test, deliverables, R toolbox

Course at a glance (2/2)

Final presentations

Week 13

01.07.2026

Group presentations · Q&A · wrap-up

  • Presentation order and time budget
  • Q&A rules
  • Closing thoughts and feedback

Further reading

  • James et al. (2021) — Chapter 5 (resampling), Chapter 6 (Lasso, Ridge, Elastic Net).
  • Tibshirani (1996) — the original Lasso paper.
  • Belloni and Chernozhukov (2013) — formal post-selection inference for OLS post-Lasso.

Prepare before next lecture

  1. Run Lasso.R, OLS_Post_Lasso.R, EN.R, and 5fold_CV.R locally.
  2. Compare CV-selected \(\lambda\) across multiple seeds — how stable is it?
  3. Read ISLR §6.2 (Ridge & Lasso) and §5.1 (CV).

See you next time

Reminder

  • Lecture 5 (13 May 2026): prediction-markets primer + the Polymarket dataset + your project briefing. Bring questions!

References

Belloni, Alexandre, and Victor Chernozhukov. 2013. “Least Squares After Model Selection in High-Dimensional Sparse Models.” Bernoulli 19 (2): 521–47. https://doi.org/10.3150/11-BEJ410.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning with Applications in R. 2nd ed. New York, NY: Springer. https://www.statlearning.com/.
Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society: Series B (Methodological) 58 (1): 267–88. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.