Lecture 3: Assessing model accuracy & Ridge regression

Statistical learning · MSE · bias-variance · linear model selection · Ridge

Prof. Dr. Andre Guettler, Director of the Institute
Helmholtzstraße 22, Room 205
andre.guettler@uni-ulm.de
+49 731 50 31 030

Oliver Padmaperuma, Doctoral Candidate
Helmholtzstraße 22, Room 203
oliver.padmaperuma@uni-ulm.de
+49 731 50 31 036

3.1 Course objectives


Welcome to Finance Project — Asset Management

  • This is a project course: there is no central exam to register for. Sign up on the course Moodle page by 15 April 2026 so you receive announcements and the data link.
  • Submit the project by 30 June 2026 as a single zip — name pattern: Asset2026_surname1_surname2_surname3. Email it to oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de and your team-mates.
  • Ask questions during or right after each session — that is the preferred channel.
  • Admin / studies / exam-eligibility questions go to the registrar’s office (Studiensekretariat) at studiensekretariat@uni-ulm.de.
  • Course-content questions outside class: email oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de.
  • We also recommend the student advisory service.

Course Objective

Scope

We will:

  • Build an end-to-end empirical pipeline in R: load, explore, model, back-test
  • Cover the core ML toolbox for asset-management research: linear models, Ridge, Lasso, Elastic Net, cross-validation
  • Apply it to a non-traditional asset class: prediction markets
  • Develop your own indicator library and trading strategy in groups of three

We will NOT:

  • Drift into deep-learning or reinforcement-learning methods
  • Cover prediction markets in depth
  • Provide a “ready-to-fork” backtest — the demo code is intentionally basic

Approach

Part I — Foundations

  • L1: Motivation, organisation, backtesting fundamentals
  • L2: Hands-on R intro — RStudio, live coding, etc.
  • L3 + L4: Statistical learning — model accuracy, regularisation, resampling

Part II — Application

  • L5: Prediction-markets primer + the Polymarket dataset + assignment briefing
  • Project work in groups of three (≈ 7 weeks of self-organised work)
  • Final session (1 July): 20-minute presentations per team

Course at a glance (1/2)

Foundations · Week 1 · 15.04.2026
Course outline · Backtesting fundamentals

  • Course aim & organisation
  • Backtesting overview & case study
  • In-sample tests (Welch & Goyal 2008)
  • Out-of-sample (walk-forward, R²_OS)
  • Useful predictors & p-hacking

Introduction to R · Week 2 · 22.04.2026
RStudio · variables · vectors · data frames · live coding

  • Why R for empirical asset-management research
  • RStudio and the script editor
  • Variables, vectors, matrices, data frames, lists
  • Functions and loops
  • Data import and export

Assessing model accuracy & Ridge regression · Week 3 · 29.04.2026
Statistical learning · MSE · bias-variance · linear model selection · Ridge

  • Statistical learning: Y = f(X) + ε
  • Quality of fit and the train/test MSE distinction
  • Bias-variance trade-off and overfitting
  • OLS limits: prediction accuracy & interpretability
  • Ridge regression and the L2 penalty

Lasso, cross-validation & Elastic Net · Week 4 · 06.05.2026
Sparse regularisation · resampling for honest test error · choosing λ

  • Lasso: L1 penalty and exact-zero coefficients
  • Cross-validation: validation set, LOOCV, K-fold
  • Choosing the optimal λ for Lasso
  • OLS post-Lasso for cleaner coefficient inference
  • Elastic Net — combining Ridge and Lasso

Prediction markets, the Polymarket Quant Bench & your project · Week 5 · 13.05.2026
From Welch-Goyal to event-resolved binary contracts

  • Prediction markets — definition and Polymarket as the canonical venue
  • How prices form: liquidity, resolution, mechanics
  • The Polymarket Quant Bench dataset (HuggingFace): access and schema
  • First look at the data in R
  • Your project: indicator design, back-test, deliverables, R toolbox

Course at a glance (2/2)

Final presentations · Week 13 · 01.07.2026
Group presentations · Q&A · wrap-up

  • Presentation order and time budget
  • Q&A rules
  • Closing thoughts and feedback

Assignments / Exams

Project (Code + Report) · 50% of your grade · due 30 June 2026

Rmd code + knitr-rendered PDF report. Build a library of indicators over the Polymarket Quant Bench dataset (curated OHLCV bars on HuggingFace, derived from Jon Becker’s polymarket-data dump), derive trade signals, back-test, and write a critical reflection.

Groups of up to 3. Submit by emailing oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de. Subject pattern: Finance Project — Asset Management_assignment-1-project-report_surname1_surname2_…

Final Presentation · 50% of your grade · 1 July 2026

20-minute group presentation in class on 1 July 2026; submit the slides as a PDF together with the project zip.

Groups of up to 3. Submit by emailing oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de. Subject pattern: Finance Project — Asset Management_assignment-2-final-presentation_surname1_surname2_…

3.2 Recap from Lectures 1 & 2


Where we are

  • L1: backtesting fundamentals — IS vs OOS, \(R^2_{OS}\), useful-predictor checklist, p-hacking.
  • L2: R fundamentals — RStudio, vectors, data frames, functions, loops, import/export.
  • Today: how to assess how good a model really is, and meet the first regularised regression — Ridge.

3.3 Assessing model accuracy


What is statistical learning?

We observe \(Y\) and \(X = (X_1, \ldots, X_p)\): \(p\) predictors measured on \(n\) observations.

We believe a relationship exists (e.g. excess return of S&P 500 vs dividend yield):

\[Y = f(X) + \varepsilon\]

  • \(f\) — unknown function
  • \(\varepsilon\) — random error term

Statistical learning is all about how to estimate \(f\). In this class we use predictors \(X\) to forecast \(Y\).

Measuring quality of fit — MSE

A common measure of accuracy in regression is mean squared error:

\[ MSE \;=\; \dfrac{1}{n} \sum_{i=1}^{n} (y_i - \hat y_i)^2 \]

where \(\hat y_i\) is the prediction for observation \(i\) in the training data.
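
In R, the training MSE is one line once you have fitted values. A minimal sketch on simulated data (the sine truth and noise level here are illustrative assumptions, not from the course scripts):

set.seed(1)
n <- 100
x <- runif(n, 0, 10)
y <- sin(x) + rnorm(n, sd = 0.5)   # Y = f(X) + eps with f = sin
fit <- lm(y ~ x)
mean((y - fitted(fit))^2)          # training MSE: (1/n) * sum of squared residuals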

A problem

  • Methods are designed to minimise MSE on training data (e.g., OLS picks the line that does so).
  • What we really care about is performance on new data — we call this test data.
  • There is no guarantee that the smallest training MSE delivers the smallest test MSE.

Training vs test MSE

  • The more flexible a method, the lower its training MSE: flexible methods can generate richer shapes for \(f\) than restrictive ones (e.g. linear regression).
  • But its test MSE can still be higher than that of a simple approach like linear regression; the sketch below makes this concrete.
  • Less flexible also means easier to interpret: there is a trade-off between flexibility and interpretability.
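
A minimal sketch of this gap, assuming simulated data and polynomial degree as the flexibility knob (stand-ins for the course setup, not taken from it):

set.seed(42)
n <- 200
x <- runif(n, 0, 10)
y <- sin(x) + rnorm(n, sd = 0.5)
train <- sample(n, n / 2)                      # random 50/50 split

for (d in c(1, 3, 9, 15)) {                    # increasing flexibility
  fit    <- lm(y ~ poly(x, d), subset = train)
  mse_tr <- mean((y[train]  - predict(fit, data.frame(x = x[train])))^2)
  mse_te <- mean((y[-train] - predict(fit, data.frame(x = x[-train])))^2)
  cat(sprintf("degree %2d: train MSE %.3f | test MSE %.3f\n", d, mse_tr, mse_te))
}

Training MSE keeps falling with the degree; test MSE eventually rises again.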

Example I — splines on a noisy curve

  • Reproduce with StatLearning.R (splines, OLS, train/test MSE loop).
  • Black = truth, orange = OLS, blue = smoothing spline (less flexible), green = smoothing spline (more flexible).
  • Higher flexibility hugs the training data more closely; track training and test MSE separately (a simulated stand-in for the figure follows below).
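
A rough, simulated stand-in for that figure (the real script is StatLearning.R; the truth, noise level and df values below are assumptions):

set.seed(7)
x <- sort(runif(100, 0, 10))
y <- sin(x) + rnorm(100, sd = 0.4)

plot(x, y, col = "grey")
curve(sin(x), add = TRUE, lwd = 2)                           # black: truth
abline(lm(y ~ x), col = "orange", lwd = 2)                   # orange: OLS
lines(smooth.spline(x, y, df = 5),  col = "blue",  lwd = 2)  # blue: less flexible spline
lines(smooth.spline(x, y, df = 25), col = "green", lwd = 2)  # green: more flexible spline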

Example II — train vs test MSE curve

  • Grey = training MSE: declines monotonically with flexibility.
  • Red = test MSE: U-shape — falls, then rises.
  • Vertical dashed line marks the minimum test MSE, i.e. the optimal flexibility; the loop sketched below traces such a curve.
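
The U-shape can be traced with a short loop; this sketch reuses the simulated setup above rather than the data in StatLearning.R:

set.seed(7)
n <- 200
x <- runif(n, 0, 10)
y <- sin(x) + rnorm(n, sd = 0.4)
train <- sample(n, n / 2)

dfs <- 2:30                                          # flexibility grid
mse <- sapply(dfs, function(df) {
  fit <- smooth.spline(x[train], y[train], df = df)
  c(train = mean((y[train]  - predict(fit, x[train])$y)^2),
    test  = mean((y[-train] - predict(fit, x[-train])$y)^2))
})
matplot(dfs, t(mse), type = "l", lty = 1, col = c("grey", "red"),
        xlab = "Flexibility (df)", ylab = "MSE")
abline(v = dfs[which.min(mse["test", ])], lty = 2)   # dashed line: minimum test MSE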

Bias-variance trade-off

The previous figure illustrates the trade-off that governs every choice of statistical learning method:

There are always two competing forces — bias and variance.

Bias of learning methods

  • Approximating a complicated real-life relationship by a simpler model introduces error called bias.
  • Linear regression assumes the relationship between \(Y\) and \(X\) is linear; in reality it is rarely exactly linear, so some bias is present.
  • The more flexible / complex a method, the less bias it generally has.

Variance of learning methods

  • Variance measures how much your estimate for \(f\) would change with a different training data set.
  • Generally, the more flexible a method, the more variance it has.

The trade-off — formula

For any given \(X = x_0\), the expected test MSE of a new response \(y_0\) at \(x_0\) decomposes as:

\[ E\bigl(y_0 - \hat f(x_0)\bigr)^2 \;=\; \mathrm{Bias}\bigl(\hat f(x_0)\bigr)^2 \;+\; \mathrm{Var}\bigl(\hat f(x_0)\bigr) \;+\; \underbrace{\sigma^2}_{\text{irreducible error}} \]

As complexity rises, bias falls and variance grows — but expected test MSE may go either way.
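
The decomposition can be checked by simulation: draw many training sets, refit, and compare the two sides of the formula at one point. A sketch with a hypothetical \(f\), \(x_0\) and noise level:

set.seed(1)
f <- function(x) sin(x); x0 <- 2; sigma <- 0.5
fhat <- replicate(2000, {                      # 2000 independent training sets
  x <- runif(50, 0, 10)
  y <- f(x) + rnorm(50, sd = sigma)
  predict(lm(y ~ poly(x, 3)), data.frame(x = x0))
})
bias2 <- (mean(fhat) - f(x0))^2
vr    <- var(fhat)
c(decomposition = bias2 + vr + sigma^2,        # Bias^2 + Var + irreducible error
  direct = mean((f(x0) + rnorm(2000, sd = sigma) - fhat)^2))  # E(y0 - fhat(x0))^2

The two numbers agree up to simulation noise.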

Over- vs underfitting

  • Ideal (low bias, low variance): tight cluster on the bull’s-eye.
  • Overfitting (low bias, high variance): scattered around the centre.
  • Underfitting (high bias, low variance): tight cluster off-centre.
  • Worst (high bias, high variance): scattered and off-centre.

A fundamental picture

  • Training error: monotonically declines with complexity.
  • Test error: declines first (bias dominates), then rises (variance dominates).
  • More flexible / complicated is not always better — keep this picture in mind when choosing a learning method.

3.4 Linear model selection & regularisation


Starting point — OLS

\[ Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + \varepsilon_i \]

  • \(\beta_0\) — intercept (mean of \(Y\) when all \(X\)’s are zero).
  • \(\beta_j\) — average increase in \(Y\) when \(X_j\) increases by 1, holding other \(X\)’s constant.
  • Closed form (matrix notation):

\[ \hat\beta = (X'X)^{-1} X' y \]

If you need to refresh OLS, read Chapter 3 of the textbook (ISLR).
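
The closed form is easy to verify against lm(); a minimal sketch with simulated data (names and dimensions are illustrative):

set.seed(1)
n <- 100
X <- cbind(1, matrix(rnorm(n * 2), n, 2))     # intercept column + 2 predictors
y <- drop(X %*% c(1, 2, -3) + rnorm(n))

beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y  # (X'X)^{-1} X'y
cbind(beta_hat, coef(lm(y ~ X[, -1])))        # identical to lm()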

Why might we improve on OLS?

We may be able to improve on OLS by replacing plain least-squares fitting with an alternative estimation procedure. Two reasons to consider alternatives:

  1. Prediction accuracy
  2. Model interpretability

1 · Prediction accuracy

  • OLS estimates have low bias and low variance when the true relationship is approximately linear and \(n \gg p\).
  • When \(n \approx p\), OLS has high variance: it tends to overfit and predicts poorly on unseen data.
  • When \(n < p\), OLS fails completely: there is no unique solution and the variance is infinite (a sketch follows below).
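
A quick simulated sketch of the \(n < p\) breakdown: lm() can only identify as many coefficients as the design has rank and returns NA for the rest.

set.seed(1)
n <- 20; p <- 30
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)
fit <- lm(y ~ X)
sum(is.na(coef(fit)))   # coefficients lm() could not estimate: no unique solution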

2 · Model interpretability

  • With a large number of predictors, many often have little or no effect on \(Y\).
  • Leaving them in obscures the important variables.
  • Removing them (setting coefficients to zero) makes the model easier to interpret.
  • Simpler models also imply lower information costs and faster run times.

Solutions — three families

  1. Subset selection — identify a subset of predictors \(X\) believed to relate to \(Y\), then fit on that subset (best subset, stepwise — covered in ISLR §6.1).
  2. Shrinkage (Ridge and Lasso — our focus) — shrink coefficient estimates towards zero to reduce variance; under the Lasso penalty some coefficients become exactly zero, performing variable selection.
  3. Dimension reduction — e.g. principal-components regression (PCR).

Ridge regression — the equation

OLS minimises the residual sum of squares:

\[ \mathrm{RSS} \;=\; \sum_{i=1}^n \Bigl( y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij} \Bigr)^2 \]

Ridge regression adds a penalty on the coefficients:

\[ \sum_{i=1}^n \Bigl( y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij} \Bigr)^2 \;+\; \boxed{\lambda \sum_{j=1}^p \beta_j^2} \]

Ridge — what the penalty does

  • Tuning parameter \(\lambda \ge 0\).
  • The penalty shrinks large \(|\beta_j|\) towards zero.
  • The intercept is not penalised.
  • Shrinking the coefficients reduces their variance, which is why the constraint can improve test fit.
  • When \(\lambda = 0\), Ridge collapses back to OLS.

Manual calculation of betas

\[ \hat\beta^{\mathrm{ridge}} \;=\; (X'X + \lambda I)^{-1} X' y \]

  • Penalty term \(\lambda I\): \(\lambda\) times the \(p \times p\) identity matrix, so the dimensions match the coefficient vector \((\beta_1, \ldots, \beta_p)\).
  • If the predictors are centred (mean zero), \(\hat\beta_0 = \bar Y\), so the intercept need not appear in the matrix formula.

Live demo — Ridge by hand

library(ISLR);  library(glmnet)
Hitters <- na.omit(Hitters)                    # drop rows with missing Salary

# simplified to two predictors
x  <- as.matrix(data.frame(Hitters$AtBat, Hitters$Hits))
y  <- Hitters$Salary
xs <- scale(x, center = TRUE, scale = FALSE)   # centre predictors (no scaling)
n      <- nrow(x)
sd_y   <- sqrt(var(y) * (n - 1) / n)[1]        # population sd of y, used by glmnet internally
iden   <- diag(2)                              # p x p identity (p = 2)

# lambda = 0: should recover OLS
lam <- 0
ridge.mod <- glmnet(xs, y, alpha = 0,          # alpha = 0 selects Ridge
                    lambda = lam * sd_y / n,   # rescale lambda to glmnet's convention
                    standardize = FALSE, thresh = 1e-20)
ridge.man <- solve(t(xs) %*% xs + lam * iden) %*% t(xs) %*% y   # closed-form Ridge
beta_0    <- mean(y)                           # intercept when predictors are centred

cbind(coef(ridge.mod),
      coef(lm(y ~ xs)),
      c(beta_0, ridge.man))                    # all three columns match

  • We compute Ridge three ways at \(\lambda = 0\) (glmnet, base lm, and the closed-form formula) to verify they coincide.
  • glmnet’s lambda is on a different scale than the textbook formula; multiply by sd_y / n to align (see stats.stackexchange).
  • thresh = 1e-20 tightens glmnet’s convergence so the comparison is numerically tight.
  • Centring the predictors removes the need to include the intercept in the matrix formula.

Hitters data — coefficient paths vs λ

  • Reproduce with Ridge_figures.R (loop over 0–1000 λ values, plot standardised coefficients); a shortcut sketch using glmnet’s built-in plot follows after this list.
  • As \(\lambda\) increases, standardised coefficients shrink towards zero.
  • Bar at the bottom: flexibility decreases as \(\lambda\) grows.
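
glmnet can draw the same picture directly; a shortcut sketch on the full Hitters design (the course version, Ridge_figures.R, builds the loop by hand):

library(ISLR); library(glmnet)
Hitters <- na.omit(Hitters)
x <- model.matrix(Salary ~ ., Hitters)[, -1]   # design matrix without intercept column
y <- Hitters$Salary

ridge <- glmnet(x, y, alpha = 0)               # alpha = 0 selects the Ridge penalty
plot(ridge, xvar = "lambda", label = TRUE)     # coefficient paths as lambda grows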

Why shrinking towards zero helps

  • OLS estimates have low bias but can be highly variable, especially when \(n \approx p\).
  • The penalty makes Ridge estimates biased, but substantially reduces variance.
  • Net effect: a bias / variance trade-off that often improves test MSE.

Ridge bias / variance trade-off

  • Bias² (black) rises with \(\lambda\).
  • Variance (green) falls with \(\lambda\).
  • Test MSE (purple) is U-shaped with a clear minimum — pick the \(\lambda\) that minimises it.
  • Ridge wins most when OLS estimates have high variance.

Computational advantages of Ridge

  • For large \(p\), best-subset selection would search through \(2^p\) models — combinatorially expensive.
  • With Ridge, for any given \(\lambda\), fit one model — the computations are very simple.
  • Ridge even works when \(p > n\), where OLS fails completely (a sketch follows below).
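
A sketch of that last point, reusing the simulated \(n < p\) setup from the OLS slide: glmnet fits a Ridge model without complaint.

library(glmnet)
set.seed(1)
n <- 20; p <- 30
X <- matrix(rnorm(n * p), n, p)
y <- drop(X[, 1] - 2 * X[, 2] + rnorm(n))

fit <- glmnet(X, y, alpha = 0, lambda = 1)     # one fit per lambda: cheap
length(coef(fit))                              # all p + 1 coefficients are estimated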

3.M Conclusion of Lecture 3


Course at a glance

The week-by-week overview is the same as in Section 3.1; see “Course at a glance (1/2)” and “(2/2)” above.

Further reading

  • James et al. (2021) — Chapter 2 (statistical learning), Chapter 6 (linear model selection & regularisation).
  • Welch and Goyal (2008) — bias / variance arguments mirror the IS-vs-OOS results we saw in Lecture 1.

Prepare before next lecture

  1. Run StatLearning.R locally — confirm you can reproduce Figure 2.9.
  2. Run Ridge_comparison.R and Ridge_figures.R — verify all three Ridge implementations agree at \(\lambda = 0\).
  3. Read ISLR §2.2 (assessing model accuracy) and §6.2 (Ridge & Lasso).

See you next time

Reminder

  • Lecture 4 (6 May 2026): Lasso, Elastic Net, cross-validation — selecting the optimal \(\lambda\) honestly.

References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning with Applications in R. 2nd ed. New York, NY: Springer. https://www.statlearning.com/.
Scheuch, Christoph, Stefan Voigt, and Patrick Weiss. 2023. Tidy Finance with R. Chapman & Hall/CRC. https://www.tidy-finance.org/r/.
Welch, Ivo, and Amit Goyal. 2008. “A Comprehensive Look at the Empirical Performance of Equity Premium Prediction.” Review of Financial Studies 21 (4): 1455–1508. https://doi.org/10.1093/rfs/hhm014.