This is a project course: there is no central exam to register for. Sign up on the course Moodle page by 15 April 2026 so you receive announcements and the data link.
Submit the project by 30 June 2026 as a single zip — name pattern: Asset2026_surname1_surname2_surname3. Email it to oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de and your team-mates.
Ask questions during or right after each session — that is the preferred channel.
Admin / studies / exam-eligibility questions go to the registrar’s office (Studiensekretariat) at studiensekretariat@uni-ulm.de.
Course-content questions outside class: email oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de.
Rmd code + knitr-rendered PDF report. Build a library of indicators over the Polymarket Quant Bench dataset (curated OHLCV bars on HuggingFace, derived from Jon Becker’s polymarket-data dump), derive trade signals, back-test, and write a critical reflection.
The training mean squared error is
\[
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat y_i \right)^2,
\]
where \(\hat y_i\) is the prediction for observation \(i\) in the training data.
A problem
Methods are designed to minimise MSE on training data (e.g., OLS picks the line that does so).
What we really care about is performance on new data — we call this test data.
There is no guarantee that the smallest training MSE delivers the smallest test MSE.
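The three points above can be seen in a small simulation. This is a sketch under assumptions not in the slides: a sine-curve truth, a 50/50 random train/test split, and polynomial degree as the flexibility knob. The training MSE can only fall as the degree grows, while the test MSE need not.

```r
# Simulated illustration: training MSE falls with flexibility, test MSE need not.
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- sin(x) + rnorm(n, sd = 0.5)          # true f(x) = sin(x) plus noise
train <- sample(n, n / 2)                  # random 50/50 train/test split

mse <- function(fit, x, y) mean((y - predict(fit, newdata = data.frame(x = x)))^2)

for (d in c(1, 3, 5, 10, 15)) {            # increasing flexibility
  fit <- lm(y ~ poly(x, d), data = data.frame(x = x[train], y = y[train]))
  cat(sprintf("degree %2d | train MSE %.3f | test MSE %.3f\n",
              d, mse(fit, x[train], y[train]), mse(fit, x[-train], y[-train])))
}
```

Because the degree-\(d\) polynomial nests all lower degrees, the training MSE is monotonically non-increasing in \(d\); the test MSE is computed on data the fit never saw.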
Training vs test MSE
The more flexible a method, the lower its training MSE — flexible methods can generate richer shapes for \(f\) than restrictive ones (e.g. linear regression).
But the test MSE may be higher for a flexible method than for a simple approach such as linear regression.
Less flexible ⇒ easier to interpret. Trade-off: flexibility vs interpretability.
Example I — splines on a noisy curve
Reproduce with StatLearning.R (splines, OLS, train/test MSE loop).
Black = truth, orange = OLS, blue = smoothing spline (less flexible), green = smoothing spline (more flexible).
Higher flexibility hugs the data closer — but track training vs test MSE separately.
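A minimal base-R sketch of this setup (the exact data and df values in StatLearning.R may differ; these are illustrative assumptions): an OLS line and two smoothing splines of different flexibility fitted to the same noisy curve, with their training errors compared.

```r
# Sketch of Example I: OLS line vs. two smoothing splines on a noisy curve.
set.seed(2)
x <- seq(0, 10, length.out = 150)
y <- sin(x) + rnorm(150, sd = 0.4)

fit_lin    <- lm(y ~ x)                    # OLS (orange in the figure)
fit_smooth <- smooth.spline(x, y, df = 5)  # less flexible spline (blue)
fit_wiggly <- smooth.spline(x, y, df = 25) # more flexible spline (green)

# The more flexible the fit, the closer it hugs the training data:
rss <- function(fitted_vals) mean((y - fitted_vals)^2)
c(linear = rss(fitted(fit_lin)),
  df5    = rss(predict(fit_smooth, x)$y),
  df25   = rss(predict(fit_wiggly, x)$y))
```

The printed training MSEs shrink with flexibility; whether the test MSE does is exactly the question Example II answers.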
Example II — train vs test MSE curve
Grey = training MSE: declines monotonically with flexibility.
Red = test MSE: U-shape — falls, then rises.
Vertical dashed line marks the minimum test MSE — the optimal flexibility.
Bias-variance trade-off
The previous figure illustrates the trade-off that governs every choice of statistical learning method:
There are always two competing forces — bias and variance.
Bias of learning methods
Approximating a complicated real-life relationship with a simpler model introduces an error called bias.
Linear regression assumes the relationship between \(Y\) and \(X\) is linear; in reality the relationship is rarely exactly linear, so some bias is almost always present.
The more flexible / complex a method, the less bias it generally has.
Variance of learning methods
Variance measures how much your estimate for \(f\) would change with a different training data set.
Generally, the more flexible a method, the more variance it has.
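This definition of variance can be simulated directly. A sketch under illustrative assumptions (sine-curve truth, variance measured as the spread of predictions at a single point \(x_0 = 5\) across 100 independently drawn training sets):

```r
# Variance of a learning method = how much its prediction at a fixed point
# changes when the training set is redrawn.
set.seed(3)
x0 <- 5
pred_lin <- pred_flex <- numeric(100)

for (b in 1:100) {                          # 100 independent training sets
  x <- runif(100, 0, 10)
  y <- sin(x) + rnorm(100, sd = 0.5)
  pred_lin[b]  <- predict(lm(y ~ x), data.frame(x = x0))
  pred_flex[b] <- predict(smooth.spline(x, y, df = 30), x = x0)$y
}

sd(pred_lin)   # the fitted line barely moves between samples
sd(pred_flex)  # the flexible spline typically varies more, chasing each sample's noise
```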
The trade-off — formula
For any given \(X = x_0\), the expected test MSE of a new observation \(y_0\) at \(x_0\) decomposes as:
\[
E\left(y_0 - \hat f(x_0)\right)^2 = \mathrm{Var}\left(\hat f(x_0)\right) + \left[\mathrm{Bias}\left(\hat f(x_0)\right)\right]^2 + \mathrm{Var}(\varepsilon)
\]
To minimise the expected test MSE we must keep both variance and squared bias low; the irreducible error \(\mathrm{Var}(\varepsilon)\) sets the floor.
OLS recap
\(\beta_0\) — intercept (mean of \(Y\) when all \(X\)’s are zero).
\(\beta_j\) — average increase in \(Y\) when \(X_j\) increases by 1, holding other \(X\)’s constant.
Closed form (matrix notation):
\[
\hat{\beta} = (X'X)^{-1} X' y
\]
If you need to refresh OLS, read Chapter 3 of the textbook (ISLR).
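The closed form above can be checked in a few lines of base R against `lm()`; the simulated design and true coefficients below are illustrative assumptions.

```r
# The OLS closed form beta_hat = (X'X)^{-1} X'y, verified against lm().
set.seed(4)
n <- 100
X <- cbind(1, matrix(rnorm(n * 2), n, 2))      # intercept column + two predictors
beta_true <- c(1, 2, -3)
y <- drop(X %*% beta_true + rnorm(n))

beta_hat <- solve(crossprod(X), crossprod(X, y))  # solves (X'X) b = X'y
fit <- lm(y ~ X[, -1])                             # same model via lm()

cbind(closed_form = drop(beta_hat), lm = unname(coef(fit)))  # identical up to rounding
```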
Why might we improve on OLS?
We want to improve OLS by replacing least-squares fitting with an alternative procedure. Two reasons to consider alternatives:
Prediction accuracy
Model interpretability
1 · Prediction accuracy
OLS estimates have low bias and low variability when the relationship between \(Y\) and \(X\) is linear and \(n \gg p\).
When \(n \approx p\), OLS has high variance — possible overfitting and poor estimates on unseen data.
When \(n < p\), OLS fails completely: no unique solution; variance is infinite.
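The \(n < p\) failure is easy to demonstrate: with more predictors than observations, \(X'X\) is singular, and `lm()` signals the non-uniqueness by returning `NA` for the surplus coefficients. The dimensions below are illustrative.

```r
# With p = 60 predictors and only n = 30 observations, OLS has no unique solution.
set.seed(5)
n <- 30; p <- 60
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)

fit <- lm(y ~ X)
sum(is.na(coef(fit)))   # count of NA coefficients: X'X is rank-deficient
```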
2 · Model interpretability
With a large number of predictors, many often have little or no effect on \(Y\).
Leaving them in obscures the important variables.
Removing them (setting coefficients to zero) makes the model easier to interpret.
Simpler models also imply lower information costs and faster run times.
Solutions — three families
Subset selection — identify a subset of predictors \(X\) believed to relate to \(Y\), then fit on that subset (best subset, stepwise — covered in ISLR §6.1).
Shrinkage (Ridge and Lasso — our focus) — shrink coefficient estimates towards zero to reduce variance; some may shrink to exactly zero, performing variable selection.
Dimension reduction — e.g. principal-components regression (PCR).
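The shrinkage idea can be sketched without any packages via the Ridge closed form \(\hat\beta^{\text{ridge}} = (X'X + \lambda I)^{-1} X'y\). This is a simplified illustration (standardised \(X\), no separate intercept handling); in practice we will use a dedicated package such as glmnet, and the Lasso, which has no closed form, in particular requires one.

```r
# Hand-rolled Ridge: coefficients shrink towards zero as lambda grows.
set.seed(6)
n <- 100; p <- 5
X <- scale(matrix(rnorm(n * p), n, p))      # standardised predictors
y <- drop(X %*% c(3, -2, 1, 0, 0) + rnorm(n))

ridge <- function(lambda)
  drop(solve(crossprod(X) + lambda * diag(p), crossprod(X, y)))

round(cbind(lambda_0 = ridge(0), lambda_10 = ridge(10), lambda_100 = ridge(100)), 3)
# Ridge shrinks coefficients towards (but never exactly to) zero;
# the Lasso's L1 penalty instead sets some of them exactly to zero.
```

At \(\lambda = 0\) the Ridge solution coincides with OLS, which is the check the Ridge_comparison.R exercise asks for.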
Sparse regularisation · resampling for honest test error · choosing λ
Lasso: L1 penalty and exact-zero coefficients
Cross-validation: validation set, LOOCV, K-fold
Choosing the optimal λ for Lasso
OLS post-Lasso for cleaner coefficient inference
Elastic Net — combining Ridge and Lasso
Prediction markets, the Polymarket Quant Bench & your project
Week 5
13.05.2026
From Welch-Goyal to event-resolved binary contracts
Prediction markets — definition and Polymarket as the canonical venue
How prices form: liquidity, resolution, mechanics
The Polymarket Quant Bench dataset (HuggingFace): access and schema
First look at the data in R
Your project: indicator design, back-test, deliverables, R toolbox
Course at a glance (2/2)
Final presentations
Week 13
01.07.2026
Group presentations · Q&A · wrap-up
Presentation order and time budget
Q&A rules
Closing thoughts and feedback
Further reading
James et al. (2021) — Chapter 2 (statistical learning), Chapter 6 (linear model selection & regularisation).
Welch and Goyal (2008) — bias / variance arguments mirror the IS-vs-OOS results we saw in Lecture 1.
Prepare before next lecture
Run StatLearning.R locally — confirm you can reproduce Figure 2.9.
Run Ridge_comparison.R and Ridge_figures.R — verify all three Ridge implementations agree at \(\lambda = 0\).
Read ISLR §2.2 (assessing model accuracy) and §6.2 (Ridge & Lasso).
See you next time
Reminder
Lecture 4 (6 May 2026): Lasso, Elastic Net, cross-validation — selecting the optimal \(\lambda\) honestly.
References
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning with Applications in R. 2nd ed. New York, NY: Springer. https://www.statlearning.com/.
Scheuch, Christoph, Stefan Voigt, and Patrick Weiss. 2023. Tidy Finance with R. Chapman & Hall/CRC. https://www.tidy-finance.org/r/.
Welch, Ivo, and Amit Goyal. 2008. “A Comprehensive Look at the Empirical Performance of Equity Premium Prediction.” Review of Financial Studies 21 (4): 1455–1508. https://doi.org/10.1093/rfs/hhm014.