Lecture 5: Prediction markets, the Polymarket Quant Bench & your project

From Welch-Goyal to event-resolved binary contracts

Authors
Affiliation

Prof. Dr. Andre Guettler

Institute of Strategic Management and Finance, Ulm University

Oliver Padmaperuma

Institute of Strategic Management and Finance, Ulm University

Published

May 13, 2026

5.1 Course objectives

  • 5.1 Course objectives
  • 5.2 Recap of the empirical R toolkit
  • 5.3 Prediction markets — primer
  • 5.4 The Polymarket Quant Bench dataset
  • 5.5 Your project
  • 5.6 Conclusion of Lecture 5
  • Welcome to Finance Project — Asset Management
  • Course Objective
  • Course at a glance (1/2)
  • Course at a glance (2/2)
  • Assignments / Exams

Welcome to Finance Project — Asset Management

  • This is a project course: there is no central exam to register for. Sign up on the course Moodle page by 15 April 2026 so you receive announcements and the data link.
  • Submit the project by 30 June 2026 as a single zip — name pattern: Asset2026_surname1_surname2_surname3. Email it to oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de and your team-mates.
  • Ask questions during or right after each session — that is the preferred channel.
  • Admin / studies / exam-eligibility questions go to the registrar’s office (Studiensekretariat) at studiensekretariat@uni-ulm.de.
  • Course-content questions outside class: email oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de.
  • We also recommend the student advisory service.

Course Objective

Scope

We will:

  • Build an end-to-end empirical pipeline in R: load, explore, model, back-test
  • Cover the core ML toolbox for asset-management research: linear models, Ridge, Lasso, Elastic Net, cross-validation
  • Apply it to a non-traditional asset class: prediction markets
  • Develop your own indicator library and trading strategy in groups of three

We will NOT:

  • Drift into deep-learning or reinforcement-learning methods
  • Cover prediction markets in depth
  • Provide a “ready-to-fork” backtest — the demo code is intentionally basic

Approach

Part I — Foundations

  • L1: Motivation, organisation, backtesting fundamentals
  • L2: Hands-on R intro — RStudio, live coding, etc.
  • L3 + L4: Statistical learning — model accuracy, regularisation, resampling

Part II — Application

  • L5: Prediction-markets primer + the Polymarket dataset + assignment briefing
  • Project work in groups of three (≈ 7 weeks of self-organised work)
  • Final session (1 July): 20-minute presentations per team

Course at a glance (1/2)

Foundations

Week 1

15.04.2026

Course outline · Backtesting fundamentals

  • Course aim & organisation
  • Backtesting overview & case study
  • In-sample tests (Welch & Goyal 2008)
  • Out-of-sample (walk-forward, R²_OS)
  • Useful predictors & p-hacking

Introduction to R

Week 2

22.04.2026

RStudio · variables · vectors · data frames · live coding

  • Why R for empirical asset-management research
  • RStudio and the script editor
  • Variables, vectors, matrices, data frames, lists
  • Functions and loops
  • Data import and export

Assessing model accuracy & Ridge regression

Week 3

29.04.2026

Statistical learning · MSE · bias-variance · linear model selection · Ridge

  • Statistical learning: Y = f(X) + ε
  • Quality of fit and the train/test MSE distinction
  • Bias-variance trade-off and overfitting
  • OLS limits: prediction accuracy & interpretability
  • Ridge regression and the L2 penalty

Lasso, cross-validation & Elastic Net

Week 4

06.05.2026

Sparse regularisation · resampling for honest test error · choosing λ

  • Lasso: L1 penalty and exact-zero coefficients
  • Cross-validation: validation set, LOOCV, K-fold
  • Choosing the optimal λ for Lasso
  • OLS post-Lasso for cleaner coefficient inference
  • Elastic Net — combining Ridge and Lasso

Prediction markets, the Polymarket Quant Bench & your project

Week 5

13.05.2026

From Welch-Goyal to event-resolved binary contracts

  • Prediction markets — definition and Polymarket as the canonical venue
  • How prices form: liquidity, resolution, mechanics
  • The Polymarket Quant Bench dataset (HuggingFace): access and schema
  • First look at the data in R
  • Your project: indicator design, back-test, deliverables, R toolbox

Course at a glance (2/2)

Final presentations

Week 13

01.07.2026

Group presentations · Q&A · wrap-up

  • Presentation order and time budget
  • Q&A rules
  • Closing thoughts and feedback

Assignments / Exams

Project (Code + Report) 50% of your grade

Rmd code + knitr-rendered PDF report. Build a library of indicators over the Polymarket Quant Bench dataset (curated OHLCV bars on HuggingFace, derived from Jon Becker’s polymarket-data dump), derive trade signals, back-test, and write a critical reflection.

Group of up to 3.

Submit by emailing oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de. Subject pattern: Finance Project — Asset Management_assignment-1-project-report_surname1_surname2_…

30 June 2026

Final Presentation 50% of your grade

20-minute group presentation in class on 1 July 2026; submit slides as PDF together with the project zip.

Group of up to 3.

Submit by emailing oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de. Subject pattern: Finance Project — Asset Management_assignment-2-final-presentation_surname1_surname2_…

1 July 2026

5.2 Recap of the empirical R toolkit

  • What you’ve already covered
  • You’re now ready to …

What you’ve already covered

  • L1: backtesting fundamentals — IS vs OOS, \(R^2_{OS}\), the useful-predictor checklist, p-hacking discipline.
  • L2: R fundamentals — RStudio, vectors, data frames, functions.
  • L3: bias / variance trade-off, Ridge regression (L2 penalty).
  • L4: Lasso (L1 penalty, exact-zero coefficients), cross-validation (validation set, LOOCV, K-fold), OLS post-Lasso, Elastic Net.

Notes

Lecture 5 is the bridge between methods (L1–L4) and project (today onwards). From now on we apply the toolkit to a real dataset rather than worked examples. The pace shifts: less new theory, more “here’s how you assemble these pieces into a working pipeline”.

If any L1–L4 concept feels foggy, the corresponding handout (slides → handout.html for that lecture) is the reference document — much more detailed than the in-class delivery. Spending an hour reviewing the L4 handout (especially the CV and Lasso sections) before today is high-leverage if you didn’t internalise those lectures fully the first time.

You’re now ready to …

  • Engineer features (indicators) from a price time-series.
  • Use Ridge / Lasso / Elastic Net to combine many indicators into one signal without overfitting.
  • Pick hyper-parameters honestly with K-fold CV.
  • Run a walk-forward back-test that respects the chronology of your data.

Today: meet the asset class that will host all of this — prediction markets.

Notes

The four-bullet capability list is what you’ll exercise during the project. Each capability maps to one or two CRAN packages — the toolbox slide later in the deck names them. Walk-forward backtesting is the harder one operationally; if you’ve never written a rolling-window time loop, plan an hour to internalise the pattern with a small synthetic example before doing it on the real dataset.
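
If you want that small synthetic example, here is a minimal sketch of the expanding-window pattern (toy data, base R only; the variable names are illustrative and not from the dataset):

# Walk-forward on synthetic data: at each step t, fit on observations 1..t
# only, then predict observation t + 1. Nothing from the future enters the fit.
set.seed(42)
n <- 200
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)                        # toy predictor and target
preds <- rep(NA_real_, n)

for (t in 50:(n - 1)) {                        # require a minimum training window
  fit          <- lm(y[1:t] ~ x[1:t])          # train strictly on the past
  preds[t + 1] <- coef(fit)[1] + coef(fit)[2] * x[t + 1]
}

mean((y - preds)^2, na.rm = TRUE)              # out-of-sample MSE only

The for loop is fine here: refitting a model each period is not a vectorisable computation, unlike the rolling indicators discussed later in the deck.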

5.3 Prediction markets — primer

  • What is a prediction market?
  • Polymarket — the canonical venue
  • How prices form — and why finance researchers care
  • Differences vs traditional asset classes

What is a prediction market?

  • A prediction market is a venue where participants trade contracts whose payoff depends on the outcome of a future event.
  • The most common contract pays $1 if the event happens, $0 otherwise — a “Yes / No” binary contract.
  • The price of the Yes contract therefore behaves like a market-implied probability of the event between 0 and 1.
  • Examples: “Will the ECB cut rates by December 2026?”, “Will Bitcoin go up or down in the next 5 minutes?”.

Notes

The structural feature that makes prediction markets distinctive is the bounded payoff. Every Yes contract is worth between $0 and $1 at any moment, and exactly $0 or $1 at resolution. That bound:

  • Forces price interpretation as probability — under standard no-arbitrage assumptions, the price equals the market’s risk-neutral probability that the event occurs. A price of 0.40 means “the market thinks there’s a 40 % chance”.
  • Creates a known-terminal-value time series — unlike stocks or bonds where the long-run value is uncertain, here we know exactly what the asset is worth at the end. That makes evaluating predictions much cleaner: did your indicator anticipate the resolution?
  • Naturally bounds drawdown per market — you can’t lose more than $1 per share per market, which simplifies risk management compared to leveraged or open-ended assets.
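
To make the bounded payoff concrete, a tiny worked example with hypothetical numbers (not taken from the dataset):

# Buy 100 YES shares at $0.40.
price    <- 0.40
n_shares <- 100
cost     <- n_shares * price        # $40 at risk: the maximum possible loss
pnl_yes  <- n_shares * 1 - cost     # +$60 if the event occurs (each share pays $1)
pnl_no   <- n_shares * 0 - cost     # -$40 if it does not (each share pays $0)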

Wolfers and Zitzewitz (Wolfers and Zitzewitz 2004) is the canonical survey of the academic literature on prediction markets — calibration evidence, design choices, and the conditions under which prices aggregate information well.

Polymarket — the canonical venue

  • Web-native prediction-market exchange running on the Polygon blockchain (USDC-settled).
  • Users buy and sell Yes / No shares at any time before the market resolves.
  • Each market has an order book, a last-traded price, volume, and liquidity.
  • At resolution, Yes-share holders receive $1 per share if the event occurred; No-share holders receive $1 otherwise.
  • Polymarket’s website lists thousands of resolved markets, mostly across politics, sports, crypto, geopolitics, and macro.

Notes

Polymarket has become the dominant prediction-market exchange — its 2024 US-election markets traded billions of dollars and were widely cited in mainstream coverage. Operationally:

  • USDC settlement on the Polygon blockchain — gas costs are tiny, so traders can update positions frequently.
  • Central limit order book rather than an automated market maker — bids and offers from individual users determine the price, similar to a stock exchange.
  • Resolution is determined by oracle — typically the UMA optimistic oracle for events that need objective verification. Disputes are rare but possible; the dataset metadata flags them.

For your project, the clean academic interpretation (“price = probability”) works best on the actively-traded markets. Thinly-traded markets have wide spreads and stale prices that don’t reflect updated beliefs — you’ll want to filter these out as part of universe construction (covered later in this deck).

How prices form — and why finance researchers care

  • Prices aggregate the beliefs of traders weighted by their dollar conviction.
  • Under standard assumptions (no manipulation, low frictions) the price is a calibrated probability of the event.
  • The empirical literature treats prediction-market prices as forward-looking forecasts — they react to news within seconds and outperform polls in many event categories.
  • For an asset-management course they offer a clean test bed: thousands of independent, event-resolved time series with a known terminal value (0 or 1).

Notes

The “calibrated probability” claim is the empirical regularity that makes prediction markets interesting for academic research. Wolfers and Zitzewitz (Wolfers and Zitzewitz 2004) survey decades of evidence: prediction-market prices generally outperform expert forecasts and polls, especially for events where many independent informed agents trade.

Manski (Manski 2006) is the cautionary counterpoint — under certain assumptions about trader beliefs and risk preferences, the price is not equal to the average probability across traders. Read both papers; the truth is somewhere in between, and being explicit about the assumptions matters when interpreting your indicator’s predictions.

For your project, thinking of price as “the market’s probability” is the right working assumption. Deviations from that assumption (overpricing of long-shots, underpricing of certainties) are themselves trading opportunities — these are well-documented anomalies in horse-race and political markets, and your indicator could exploit them.

Differences vs traditional asset classes

Traditional asset classes (e.g. equities):

  • No terminal value — perpetual security.
  • Price ≈ discounted expected future cash flows.
  • Drift is the equity / risk premium.
  • Cross-sectional anomalies dominate the literature.
  • Liquidity is deep and continuous.

Prediction-market binary contracts:

  • Terminal value is exactly 0 or 1 at resolution.
  • Price ≈ instantaneous probability of the event.
  • “Drift” is mechanically driven by news and the resolution clock.
  • Each market is its own micro-asset class.
  • Liquidity is highly heterogeneous — many markets are thin.

Many of your L1–L4 tools transfer; the interpretation of risk and return changes. There is no Sharpe-ratio analogue without first defining a notion of “return” on a binary contract.

Notes

The asset-class differences shape how you build indicators. Some translation patterns to keep in mind:

  • “Trend” in stocks = price has been rising. “Trend” in prediction markets = probability has been rising. Same indicator (rolling mean of return / change), but the magnitudes are bounded by [0, 1] and the trend has to terminate at the resolution.
  • “Mean reversion” in stocks = price returns to a long-run mean. “Mean reversion” in prediction markets is dubious — the probability genuinely changes as the underlying event clarifies. A market that’s been at 0.3 for six months has no reason to revert toward 0.5.
  • “Risk premium” in stocks is the long-run excess return for bearing market risk. There is no risk premium in prediction markets in the same sense — the expected return depends on whether you’re buying mispriced contracts.

The “no Sharpe analog” caveat is important to think through early in your project: how will you measure risk-adjusted return? Hit rate (% of trades correct) is one option; PnL per dollar risked over many markets is another. Your project report should justify whichever measure you pick.
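
As one possible starting point, a minimal sketch of both measures over a hypothetical per-trade table (the column names and numbers are made up for illustration; terminal would come from outcome_prices in practice):

library(dplyr)

# One row per market traded: entry_price is the YES price paid,
# side is +1 (bought YES) or -1 (bought NO), terminal is 1 if the event occurred.
trades <- data.frame(
  entry_price = c(0.40, 0.75, 0.10, 0.55),
  side        = c(+1,   -1,   +1,   +1),
  terminal    = c(1,     1,    0,    1)
)

eval <- trades |>
  mutate(
    risked = ifelse(side == +1, entry_price, 1 - entry_price),   # cost of the side bought
    pnl    = ifelse(side == +1, terminal - entry_price,
                                (1 - terminal) - (1 - entry_price)),
    hit    = pnl > 0
  )

summarise(eval,
          hit_rate       = mean(hit),                 # share of trades that made money
          pnl_per_dollar = sum(pnl) / sum(risked))    # PnL per dollar risked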

5.4 The Polymarket Quant Bench dataset

  • Source — Polymarket Quant Bench (HuggingFace)
  • One-time setup — download the dataset (HF CLI)
  • If the setup gets stuck
  • Loading the data in R
  • Three configs — schema cheat sheet
  • A first descriptive look
  • One market end-to-end — YES price + 20-day SMA
  • Caveats & data-quality notes

Source — Polymarket Quant Bench (HuggingFace)

  • We use the Polymarket Quant Bench dataset published by our institute on HuggingFace — a curated OHLCV view over ~603 MB of parquet shards.
  • Built on top of Jon Becker’s prediction-market-analysis raw on-chain dump (Becker 2025); we resample to clean hourly and daily bars and pre-filter to liquid resolved markets (≥ $100k cumulative volume and ≥ 200 on-chain trade fills).
  • Three named configs: markets (36,831 rows, one per market), bars_hourly (~12.66M rows), bars_daily (~1.46M rows).
  • Public, CC-BY-4.0, citation key smf2026polymarketquantbench (Strategic Management and Finance 2026).

Notes

The Polymarket Quant Bench (Strategic Management and Finance 2026) is the SMF Lab’s curated, OHLCV-shaped view onto the raw Polymarket trade history. Why a curated dataset rather than the raw dump?

  • Bars, not ticks. The upstream data is at the on-chain trade level — millions of OrderFilled events with timestamps. Resampling those into uniform OHLCV bars (open/high/low/close/vwap/volume/n_trades per token-period) is the operation every project ends up doing anyway. We do it once, deterministically, and ship the result.
  • Liquidity floor pre-applied. The raw dump includes thousands of thinly-traded markets that contaminate any backtest. We filter to markets with at least $100k cumulative volume AND at least 200 on-chain fills. The resulting 36,831-market panel is dense enough to do empirical work on.
  • Reproducibility. HuggingFace versions the dataset by commit sha, so when you cite the dataset with a specific revision your project remains exactly reproducible even if we publish a new revision later.

Always credit Jon Becker’s polymarket-data (Becker 2025) as the upstream source whenever you reference the dataset — that’s where the raw trade data comes from. The Quant Bench is a derived product.

One-time setup — download the dataset (HF CLI)

# 1. Install the HuggingFace CLI (one-time, per machine).
#    Comes with the official `huggingface_hub` Python package.
pip install huggingface_hub

# 2. Set up your group's project folder. Recommended layout:
#
#      asset-group-NN/
#      ├── asset-group-NN.Rproj ← RStudio project file
#      ├── asset-group-NN.Rmd   ← your analysis lives here
#      └── data/
#          └── polymarket/      ← dataset lands here (~603 MB)

# 3. Download the dataset. The CLI shows a live progress bar.
hf download smf-ulm/polymarket-quant-bench \
    --repo-type dataset \
    --local-dir data/polymarket
  • One-time per machine: pip-install huggingface_hub, then run the CLI command once.
  • --local-dir data/polymarket writes real file copies into your project tree, the same data/polymarket path the R code reads from.
  • If you work with GitHub, add data/ to .gitignore so the 603 MB never enters your repo.

Notes

This is a one-time setup that lives outside your Rmd, in a terminal. You only run it once per machine — after that, the dataset is on disk and every R session in the project just reads from data/polymarket/.

Where to get the terminal: on Windows use PowerShell or the Anaconda Prompt; on macOS use Terminal; on Linux any shell. You don’t need to run R / RStudio while doing this step.

Why this folder layout? Keeping the dataset inside the project folder (rather than in a global cache directory) means: (a) your data/polymarket/ path is the same on every group member’s machine; (b) it’s clearly excluded from git via the data/ entry in .gitignore; (c) when the project is over you delete the folder and recover all 603 MB cleanly. The earlier alternative — a global cache — sounds elegant but creates Windows-symlink issues and makes it harder for teammates to share the exact-same file paths in code.

Pinning revisions: the HuggingFace dataset page shows the commit history (the “History” link on the dataset page). Each commit has a sha — append --revision <sha> to the CLI command and you get exactly that revision. Do this before submission so the marker re-runs against the same bytes you did. Without it, you implicitly pin to main, which can drift if we publish a new revision.

Python install if you don’t have it: download from https://python.org (Windows / macOS) or use your distro’s package manager (Linux). pip ships with Python 3. Conda / Anaconda / miniconda also work — the hf command becomes available after pip install huggingface_hub in either case. (Older guides may mention huggingface-cli; that’s the deprecated alias for the same tool — use hf.)

Re-downloads are safe — the CLI checks file hashes and only re-fetches what changed. You can re-run the command after an interrupted download and it picks up where it left off.

If the setup gets stuck

  • No Python on your machine → install from https://python.org (Windows / macOS) or your distro’s package manager (Linux). pip ships with Python ≥ 3.4. Tick “Add Python to PATH” during install on Windows.
  • pip: command not found → use the Python-launcher fallback:
    • Windows → py -m pip install huggingface_hub
    • macOS / Linux → python3 -m pip install huggingface_hub
  • hf: command not found after a successful install → close and reopen your terminal (PATH refresh). If still missing, fall back to the module form python -m huggingface_hub download … (same flags as before).
  • You see huggingface-cli is deprecated, use hf instead → just replace huggingface-cli with hf in your command. The flags are identical.
  • Download interrupted (Wi-Fi drop, timeout) → just re-run the same command. The CLI hashes every file and only re-fetches what’s missing or partial.
  • Behind a corporate / university proxy → set HTTPS_PROXY=http://<proxy-host>:<port> (and HTTP_PROXY similarly) before running the CLI. Ask IT for the right address if you’re not sure.

Notes

The categories above cover ~95% of student install issues. A few extra operational notes:

Python install on Windows — be careful to tick “Add Python to PATH” during the python.org installer. If you skip that box, every subsequent pip and python invocation fails with “command not found” until you either re-run the installer with that option on, or manually add C:\Users\<you>\AppData\Local\Programs\Python\Python3XX\ to PATH.

The py Python launcher is Windows-specific and ships with the python.org installer. It’s the reliable way to invoke pip if your PATH is broken — py -m pip always works, regardless of how many Pythons are installed or which is “active”. For students with multiple Python versions, prefer py -m pip over pip until you’ve sorted PATH out.

hf not found after install is the most common single complaint — happens when pip writes the CLI executable to a Scripts/ (Windows) or bin/ (macOS/Linux) folder that isn’t on PATH yet for the current terminal session. The reliable fallback python -m huggingface_hub download … always works because it invokes the module directly without depending on the PATH-managed shim. Slightly verbose but bulletproof; teach this one if a student is panicking before class.

Note on the rename: huggingface-cli was the original name; HuggingFace renamed it to hf and deprecated the old form. Recent versions still recognise huggingface-cli but print a deprecation warning saying “use hf instead”. Students reading older Stack Overflow answers will see the old name — point them to this slide’s command syntax as the canonical version.

Proxies / corporate firewalls: Ulm’s university network sometimes routes through a proxy that breaks the default CLI download. If a student hits an SSL or connection error on a uni-Wi-Fi laptop, the proxy env vars usually fix it. The HTTPS_PROXY env var is honoured by Python’s requests library which huggingface_hub uses internally. If they’re on home Wi-Fi or eduroam direct, there’s no proxy and the problem is something else.

Disk space — make sure you have at least 1 GB free on the drive your project folder lives on (603 MB for the dataset + headroom for parquet decompression during reads). On Windows that’s usually C: unless you keep your projects on a different drive.

If all else fails — the dataset is also browsable in the HuggingFace web UI; in the worst case students can click-download individual parquet shards manually. Slow but unblocks them for class.

Loading the data in R

# install.packages(c("arrow", "dplyr", "ggplot2", "lubridate"))
library(arrow)        # parquet + lazy datasets
library(dplyr)        # wrangle
library(ggplot2)      # plot
library(lubridate)    # dates

# Path that matches the --local-dir from the CLI download.
local_path <- "data/polymarket"

# Open each config lazily — 1,418 parquet shards across three folders;
# arrow stitches them as one logical table.
markets     <- arrow::open_dataset(file.path(local_path, "markets"))     |> collect()
bars_daily  <- arrow::open_dataset(file.path(local_path, "bars_daily"))  |> collect()
bars_hourly <- arrow::open_dataset(file.path(local_path, "bars_hourly")) |> collect()
  • arrow::open_dataset() |> collect() loads each config into RAM as a regular tibble. Peak memory across all three is ~1.5 GB — comfortable on a 16 GB laptop.
  • The |> (pipe) is base-R syntax sugar — same as collect(open_dataset(...)). No magic, just chains the calls left-to-right.
  • arrow::open_dataset() stitches the parquet shards transparently — bars_hourly is hundreds of separate files but you see one tibble.
  • Re-running these three lines after the first download is instant (data already on disk).

Notes

The R code is intentionally minimal: open the three configs and materialise all of them upfront.

Why arrow::open_dataset() rather than read_parquet()? The dataset is sharded — each config is split into hundreds of separate parquet files. read_parquet() reads one file; open_dataset() reads many and stitches them as one logical table. We pipe straight into collect() because we materialise everything (1.5 GB peak across the three configs is fine for the modern laptops the cohort runs).

About the |> pipe operator: it’s the modern R native pipe, available since R 4.1. x |> f() is exactly f(x); x |> f() |> g() is g(f(x)). Purely a readability convenience — no RAM impact, no special evaluation semantics. If you prefer the tidyverse %>% from magrittr, that’s fully interchangeable here.

If RAM ever becomes a constraint (e.g., you’re on an 8 GB machine and your trading-rule loop balloons memory): keep the arrow::open_dataset() call WITHOUT |> collect(). That gives you a lazy reference that only materialises rows when you filter and collect() later. We’re not doing that in the lecture for simplicity, but it’s a one-line change if you need it.
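
If you do take the lazy route, a minimal sketch (same local_path as above; yes_id stands for a single token id string, obtained e.g. from clob_token_ids as shown later in the deck):

# Lazy version: no collect() here, so nothing is loaded into RAM yet.
bars_hourly_lazy <- arrow::open_dataset(file.path(local_path, "bars_hourly"))

# The filter is pushed down to the parquet shards; only matching rows
# are materialised when collect() is finally called.
one_token <- bars_hourly_lazy |>
  dplyr::filter(token_id == yes_id) |>
  dplyr::collect()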

Advanced alternative: DuckDB’s httpfs extension can query parquet over HTTPS without any local download — dbExecute(con, "INSTALL httpfs; LOAD httpfs;"), then SELECT … FROM 'https://huggingface.co/.../file.parquet'. Cleaner for read-once exploration, but you’d lose the local-cache speed for the thousand-iteration indicator-tuning loop. The CLI-downloaded local copy is the right default.

Three configs — schema cheat sheet

  • markets (one row per market, 36,831 rows):
    • id, condition_id (parent / child link), question, slug, category
    • outcomes, outcome_prices (JSON arrays — winner closes near 1.0)
    • clob_token_ids (JSON [yes_token_id, no_token_id] — pairs market to bars)
    • volume, liquidity, created_at, end_date
  • bars_daily / bars_hourly (one row per token × period):
    • token_id (YES or NO — not the market!)
    • period_start, period_end (UTC)
    • open, high, low, close, vwap — all in [0, 1] (implied probability)
    • volume_usd, n_trades, n_buys, n_sells

Three gotchas: (1) bars are per token, so YES and NO are separate series — pair via clob_token_ids if you want a mid. (2) Bars are sparse (no row for periods with zero trades) — expand to a full calendar (tidyr::complete()) and then forward-fill with tidyr::fill(). (3) liquidity in markets is a snapshot taken when the data was collected, not a time series — use volume_usd in bars for time-varying liquidity. A parsing sketch for the JSON columns follows in the notes below.

Notes

Three things worth internalising before writing any indicator code:

  1. The token-level grain of the bars. Polymarket is a CTF (conditional-token framework) market — each binary market has TWO outcome tokens (YES and NO). Bars are emitted per token. If you compute mean(close) on bars_daily across all rows naively, you’re averaging YES and NO prices, which sum to 1 by construction (modulo spread) — your “mean” is meaningless. Always filter to one side (YES) first, or compute the mid via the clob_token_ids mapping.

  2. Sparse bars are sparse on purpose. A market that traded 3 times last week has 3 hourly rows that week, not 168. For rolling indicators you’ll want to forward-fill: bars |> arrange(period_start) |> tidyr::fill(close, .direction = "down"). Pick a fill semantic and document it.

  3. The terminal value is encoded in outcome_prices. It’s a JSON-stringified array [p_yes_final, p_no_final]. Parse with jsonlite::fromJSON() (or purrr::map + fromJSON per row). The winning outcome’s price is essentially 1.0; the losing outcome’s is essentially 0. Use this to compute realised win-rates, P&L, etc.

The Quant Bench README on HuggingFace (Strategic Management and Finance 2026) is authoritative for the full column list — bookmark it.
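
A minimal parsing sketch for both JSON columns, plus the mid-price pairing from gotcha 1 (column names as on the schema slide; the helper function is hypothetical, written here for illustration):

library(dplyr)
library(purrr)
library(jsonlite)

markets_parsed <- markets |>
  mutate(
    # clob_token_ids: JSON "[yes_id, no_id]" string, split into two plain columns.
    yes_token_id = map_chr(clob_token_ids, ~ fromJSON(.x)[1]),
    no_token_id  = map_chr(clob_token_ids, ~ fromJSON(.x)[2]),
    # outcome_prices: JSON "[p_yes_final, p_no_final]", keep the terminal YES value.
    yes_terminal = map_dbl(outcome_prices, ~ as.numeric(fromJSON(.x))[1])
  )

# Mid price for one market: pair the YES close with (1 - NO close) per day.
mid_one_market <- function(bars_daily, yes_id, no_id) {
  yes <- bars_daily |> filter(token_id == yes_id) |> select(period_start, yes_close = close)
  no  <- bars_daily |> filter(token_id == no_id)  |> select(period_start, no_close  = close)
  inner_join(yes, no, by = "period_start") |>
    mutate(mid = (yes_close + (1 - no_close)) / 2)
}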

A first descriptive look

# Row counts per config — sanity check after download.
nrow(markets)        # 36,831
nrow(bars_daily)     # 1,462,282
nrow(bars_hourly)    # 12,655,266

# Distribution of markets by category.
markets |>
  count(category, sort = TRUE) |>
  ggplot(aes(reorder(category, n), n)) +
  geom_col() + coord_flip() +
  labs(x = NULL, y = "Markets", title = "Resolved markets by category")

# Histogram of cumulative volume per market (log scale).
markets |>
  filter(volume > 0) |>
  ggplot(aes(volume)) +
  geom_histogram(bins = 60) + scale_x_log10() +
  labs(x = "Cumulative volume (USDC, log)", y = "Markets",
       title = "Volume distribution across resolved markets")
  • The three counts (36,831 / ~1.46M / ~12.66M) match the dataset card on HuggingFace — first reproducibility check of any project.
  • category is best-effort upstream labelling (Politics, Sports, Crypto, …). Treat as a hint, not a contract — verify by sampling.
  • Volume is heavy-tailed — log scale on the histogram x-axis is necessary; linear obscures the structure.

Notes

The “row count per config” check is a 10-second sanity test that catches a surprising fraction of data issues: a corrupted shard, a partial download, an unzipped directory at the wrong layer. Add it as the first cell of every Rmd that touches this dataset.

The category histogram tells you immediately where the bulk of the data lives — typically Politics dominates, with Sports and Crypto strong-but-smaller. If your project plans to focus on (say) Crypto markets, the count tells you whether you have enough markets to do meaningful empirical work in that bucket.

The volume histogram on a log scale is the standard plot for any heavy-tailed financial quantity. The Quant Bench is pre-filtered to ≥ $100k volume, so the left edge of the histogram is hard-floored — what’s interesting is the upper tail, which spans many orders of magnitude (the largest US-election markets traded > $1bn USDC).

One market end-to-end — YES price + 20-day SMA

# install.packages(c("jsonlite", "slider", "patchwork"))
library(jsonlite)     # parse JSON columns (clob_token_ids)
library(slider)       # rolling-window computations
library(patchwork)    # combine ggplot panels

# Pick the most heavily traded market, parse out its YES token id.
top_mkt <- markets |> slice_max(volume, n = 1)
yes_id  <- fromJSON(top_mkt$clob_token_ids)[1]

# Pull its full daily bar history (one row per calendar day).
mkt_bars <- bars_daily |>
  filter(token_id == yes_id) |>
  arrange(period_start) |>
  mutate(sma_20 = slide_dbl(close, mean, .before = 19, .complete = TRUE))

p_price <- ggplot(mkt_bars, aes(period_start)) +
  geom_line(aes(y = close),  colour = "steelblue", linewidth = 0.4) +
  geom_line(aes(y = sma_20), colour = "darkorange", linewidth = 0.6) +
  labs(title = top_mkt$question, y = "Implied probability", x = NULL)

p_vol <- ggplot(mkt_bars, aes(period_start, volume_usd)) +
  geom_col(width = 1, fill = "grey40") +
  labs(y = "Volume (USDC)", x = NULL)

p_price / p_vol      # patchwork: stacks the two panels vertically
  • slice_max(volume, n = 1) returns the single most-traded market — a US-election or crypto-price market in most snapshots.
  • clob_token_ids is a JSON-stringified [yes, no] array — fromJSON() parses it to a length-2 character vector; take [1] for YES.
  • slider::slide_dbl() is the modern, vectorised rolling-window primitive — .before = 19 + the current row = 20-day window.
  • patchwork’s / operator stacks two ggplot panels vertically, sharing the x-axis.

Notes

This is the “first plot” template for the project — once it runs end-to-end on the most-traded market, you have a working skeleton you can adapt to any market in the panel.

Three places to look when something looks wrong:

  • Flat-line price. If the close stays constant for long stretches, the bars are likely sparse and you forgot to forward-fill. Re-run with tidyr::fill(close, .direction = "down") after the arrange().
  • Price outside [0, 1]. Shouldn’t happen on the Quant Bench (we clip upstream), but a sanity check summary(mkt_bars$close) should always show min ≥ 0 and max ≤ 1.
  • NA in sma_20 at the start. Normal — the first 19 rows can’t form a 20-day window. .complete = TRUE ensures NA rather than a partial-window mean.

The 20-day SMA is the simplest trend indicator you’ll see during the project. RSI (Relative Strength Index) and Bollinger bands are similar one-liners using TTR::RSI() and TTR::BBands() against the same close column — try them next.
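
If you try those next, a minimal sketch on the same mkt_bars series (assuming close has been forward-filled and the series is longer than the indicator windows):

library(TTR)

bb <- BBands(mkt_bars$close, n = 20, sd = 2)     # matrix with columns dn, mavg, up, pctB

mkt_ind <- mkt_bars |>
  mutate(
    rsi_14   = RSI(close, n = 14),               # momentum oscillator on a 0-100 scale
    bb_width = bb[, "up"] - bb[, "dn"]           # band width as a simple volatility proxy
  )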

Caveats & data-quality notes

  • Sparse bars — many markets trade in bursts; intermediate hours / days have no row. Expand to a full calendar and forward-fill (tidyr::complete(), then tidyr::fill(close, .direction = "down"), after arrange(period_start)) before any rolling computation; a sketch follows in the notes below.
  • YES vs NO is a token, not a market. bars_daily is keyed on token_id. Always filter to the YES (or NO) side first; otherwise you’re mixing two anti-correlated series.
  • Categories are best-effort labels. Useful for stratifying but not contractually correct — sample 10 markets per category and confirm the labels match your intuition.
  • Survivorship — every market in this dataset resolved. Be careful interpreting “the average market converges to its terminal value” — by construction, these are the markets that did terminate.
  • Timestamps are UTC throughout. No timezone conversion needed inside the dataset; only at the reporting layer (German lecture times etc.).

Notes

Each caveat is a concrete defensive step:

  • Sparse-bar fill is the most common silent bug: a momentum indicator computed without forward-fill double-counts the trading-day gaps. Make tidyr::fill() the second operation after arrange().
  • Token-level grain: filter on clob_token_ids[1] (YES) for “Yes-side” indicators; on [2] (NO) for the opposing side. If you want a market mid, pair the two and take (yes + (1 - no)) / 2.
  • Category labels: upstream metadata is occasionally mis-categorised — “Politics” markets sometimes drift into “World” etc. Don’t make category-conditional claims without spot-checking 5–10 rows.
  • Survivorship: the Quant Bench’s universe is resolved markets. Markets that were cancelled, paused, or never reached resolution are excluded by construction. Your trading rule cannot “see” cancellation risk because the data doesn’t include it — note this in the report’s limitations section.

Document each defensive step in the Rmd. The marker reads the script and forms an opinion about whether your universe construction was deliberate.
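
One more defensive detail on the sparse-bar fill: tidyr::fill() only fills NA values in rows that already exist, so you first have to create rows for the missing periods. A minimal sketch for one token's daily bars (assuming mkt_bars as built earlier, with period_start at daily grain):

library(dplyr)
library(tidyr)

filled <- mkt_bars |>
  mutate(day = as.Date(period_start)) |>
  complete(day = full_seq(day, period = 1)) |>     # insert the missing calendar days as NA rows
  arrange(day) |>
  fill(close, .direction = "down") |>              # carry the last traded close forward
  mutate(volume_usd = coalesce(volume_usd, 0))     # no trades in a gap means zero volume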

5.5 Your project

  • Goal
  • Suggested workflow
  • R package toolbox
  • Indicator categories — pick at least five
  • Frequently overlooked things
  • Deliverables — recap

Goal

Project goal

In groups of three, design a small library of indicators on your chosen subset of the Polymarket Quant Bench data, derive trading signals from them, back-test a strategy on the price history, and write a critical reflection on what works and what doesn’t.

Optional but encouraged: bring external data (Google Trends, news, related markets, sports / political odds, weather, …) to enrich your indicators.

Notes

The “indicators → signal → backtest → reflection” structure is the canonical empirical-finance project shape. A few things to know going in:

  • Indicators don’t have to be original. Reusing well-known technical indicators (moving averages, RSI, momentum) is fine — your contribution is in combining them, adapting them to prediction-market dynamics, or adding a non-standard signal (Google Trends, news sentiment).
  • External data is high-value. Most groups use only the price data. Bringing in Google Trends, news headlines, or related-market prices materially differentiates your project. Two cautions: (a) make sure the external data is publicly accessible so the marker can reproduce your pipeline; (b) be honest about the look-ahead risk — Google Trends data has reporting lags that you must respect.
  • “What works and what doesn’t” is graded as carefully as “what works”. A project that honestly reports a failed indicator with a clear analysis of why it failed is much better than one that buries failures and overstates a marginal success.

Suggested workflow

  1. Explore the data — glimpse(), summary(), basic plots; pick the market category you’ll focus on.
  2. Define a universe — set inclusion rules (minimum volume, minimum trading days, category, resolution date range); see the sketch after this list.
  3. Engineer indicators — at least 5–7, drawn from technical / statistical / external-data buckets (next slide).
  4. Combine them into a signal — Ridge / Lasso / Elastic Net with K-fold CV (Lectures 3–4).
  5. Back-test honestly — walk-forward, frictions explicit, cross-validate \(\lambda\) on training data only (Lecture 1).
  6. Reflect — does the strategy still work on a held-out cohort of markets? Where does it break?
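
For step 2, a minimal sketch of universe construction with hypothetical thresholds and an example category (tune both to your own question; end_date is assumed to parse as an ISO-8601 timestamp):

library(dplyr)
library(lubridate)

universe <- markets |>
  mutate(end_dt = as_datetime(end_date)) |>
  filter(
    category == "Politics",                         # example focus category
    volume   >= 250000,                             # stricter than the dataset's $100k floor
    end_dt   >= as_datetime("2023-01-01"),          # resolution window you want to study
    end_dt   <  as_datetime("2026-01-01")
  )

nrow(universe)                                      # how many markets survive the rules?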

Notes

The six-step workflow is sequential — don’t jump to step 4 without doing 1–3 properly. Two pitfalls to avoid:

  • Skipping universe definition. “I included all markets” is rarely the right choice — the data has thin markets, cancelled markets, edge cases. Defining a universe with explicit inclusion rules is part of the project, and the marker will read those rules to assess whether your conclusions generalise.
  • Mixing CV and walk-forward. CV (Lecture 4) is for choosing hyperparameters; walk-forward (Lecture 1) is for evaluating the final strategy. Both are necessary, but the order matters. Wrong: tune \(\lambda\) once on the full sample (which includes your test periods) and then backtest with that \(\lambda\). Right: at each walk-forward step, run CV only on the data available up to that point, then use the CV-selected \(\lambda\) to predict the next step. A sketch of this pattern follows at the end of these notes.

The “held-out cohort of markets” check (step 6) is the hardest robustness test: train your model on, say, all politics markets resolved before 2024; evaluate on politics markets resolved in 2024–25. Genuine signal generalises across time and similar contexts; an overfit model collapses.
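
The sketch referenced above: the right ordering, shown on synthetic data so the shapes and names are illustrative rather than taken from the dataset.

library(glmnet)
set.seed(42)

# Synthetic panel: 300 time-ordered observations, 10 candidate indicators,
# of which only the first two carry real signal.
n <- 300; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- as.numeric(X %*% c(1, -0.5, rep(0, p - 2)) + rnorm(n))

preds <- rep(NA_real_, n)
for (t in seq(100, n - 25, by = 25)) {                        # refit every 25 steps
  cv  <- cv.glmnet(X[1:t, ], y[1:t], alpha = 1, nfolds = 5)   # CV on the past only
  nxt <- (t + 1):(t + 25)
  preds[nxt] <- predict(cv, newx = X[nxt, , drop = FALSE], s = "lambda.min")
}

mean((y - preds)^2, na.rm = TRUE)                             # out-of-sample MSE only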

R package toolbox

  • arrow — read parquet (incl. sharded open_dataset())
  • (no Hub client needed — the CLI does the download once, R just reads the local copy)
  • tidyverse (dplyr, readr, tidyr)
  • lubridate — dates
  • data.table — large data
  • jsonlite — JSON dumps
  • gh — GitHub API
  • xts, zoo — time-series objects
  • tsibble, slider — rolling windows in tidy form
  • TTR — classic technical indicators (SMA, EMA, RSI, Bollinger…)
  • quantmod, tidyquant — quant wrappers
  • forecast — ARIMA, ETS if needed
  • glmnet — Ridge / Lasso / EN
  • caret or tidymodels — CV pipelines
  • PerformanceAnalytics — Sharpe-like metrics
  • ggplot2 — figures
  • rmarkdown, knitr, kableExtra — your deliverable

Notes

The package toolbox covers ~95 % of what you’ll need. A few high-leverage notes:

  • arrow — much faster than readr::read_csv for the parquet files. Install once, use everywhere.
  • slider::slide_dbl is the modern, tidyverse-friendly way to compute rolling-window indicators. xts::rollapply is the classic alternative; either works. Avoid bare for-loops for rolling computations — both packages are vectorised in C and are 10–100× faster.
  • TTR has every classical technical indicator implemented and tested — SMA, EMA, MACD, RSI, Bollinger, ROC, ADX, etc. Use ?SMA etc. to learn the API; the function signatures are consistent.
  • glmnet + caret — glmnet for the actual fitting; caret for cross-validation pipelines that handle preprocessing, CV, and tuning sweeps in one consistent API. tidymodels is the modern alternative; either works.
  • PerformanceAnalytics — Sharpe-like and drawdown metrics in one library. Useful even for non-stock-return data once you’ve defined a “return” measure.
  • rmarkdown + kableExtra — the deliverable is an Rmd that knits to PDF. kableExtra makes regression and summary tables look publication-grade.

Indicator categories — pick at least five

  • Trend — moving averages (SMA / EMA), MACD, slope of fitted line over a rolling window.
  • Momentum — RSI, rate-of-change, recent return percentile.
  • Volatility — Bollinger bands, rolling std-dev, GARCH if you’re brave.
  • Volume / liquidity — VWAP, volume z-score, bid-ask spread when available.
  • Time-to-resolution — days until resolution, log-clock decay.
  • Cross-market — correlation with related Polymarket markets, parent / child contracts.
  • External signals — Google Trends, news sentiment (e.g. tidytext + a labelled corpus), polls, weather, sports odds.

Notes

Aim for 5–7 indicators across at least three categories. A balanced selection has more chance of generating a useful combined signal than seven momentum indicators (which are highly correlated with each other).

Specific suggestions per category:

  • Trend (1–2 indicators) — 7-day and 30-day moving averages of the Yes price. Slope of a linear fit over a rolling window. Fast/slow MA crossover.
  • Momentum (1–2) — 7-day return; rank of current price within trailing 30-day window.
  • Volatility (1) — rolling standard deviation of daily return; or Bollinger band width.
  • Volume / liquidity (1) — log volume rolling mean; bid-ask spread when available.
  • Time-to-resolution (1) — days remaining until resolution; log-decay clock. This is special to prediction markets — no equivalent in stocks. Captures the “as the resolution date approaches, prices should become more binary” intuition.
  • External (optional but encouraged) — Google Trends search volume for keywords related to the market; news sentiment from tidytext on a corpus you scrape.

For categorical markets (e.g. politics), poll data is the most natural external signal. For sports, odds from bookmakers (Pinnacle, Betfair) are publicly accessible.
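
The time-to-resolution features are one join away. A minimal sketch, assuming end_date parses as a timestamp and yes_token_id comes from the clob_token_ids parse sketched in the dataset section:

library(dplyr)
library(lubridate)

bars_ttr <- bars_daily |>
  inner_join(
    markets_parsed |> select(yes_token_id, end_date),
    by = c("token_id" = "yes_token_id")              # keeps YES-side bars only
  ) |>
  mutate(
    days_left = as.numeric(difftime(as_datetime(end_date), period_start, units = "days")),
    log_clock = log1p(pmax(days_left, 0))            # log-decay clock feature
  )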

Frequently overlooked things

  • Don’t use for loops for vectorisable computations — use slider::slide_dbl, dplyr::mutate(across(...)), or data.table syntax. Marked down at grading.
  • Cite your data — anywhere you read a column, comment what it represents. The Rmd should be self-explanatory.
  • Train-test discipline — never tune \(\lambda\) on the test set. Walk-forward in time wherever possible.
  • Transaction costs — be explicit (even a flat 1 % per trade is honest; ignoring them is not).
  • Reproducibility — set.seed() everywhere, lock package versions if you can (renv is overkill but worth knowing).

Notes

The five rules are the difference between a well-graded project and a deduction-heavy one.

No for loops for vectorisable work — this is a code-quality concern but also a real performance issue. A walk-forward backtest with a naive for-loop on 5000 markets × 365 days takes minutes; a slider-based vectorised version runs in seconds. The marker checks for vectorisation idioms.

Cite your data in code as well as in the report — comment every column read with what it represents, every threshold with why it was chosen. The Rmd should be self-explanatory to a reader who hasn’t read the report.

Train-test discipline is the most common source of lost points. Even sophisticated groups occasionally tune \(\lambda\) on the test set without realising it. Two safeguards: (1) write the train/test split before any model code; (2) never refer to test data inside cv.glmnet calls.

Transaction costs — a back-test that ignores costs is essentially fictional. Even a flat 1 % per trade is honest; the better choice is to model the bid-ask spread observed in the data (use the spread quintile of the relevant market as a per-trade cost).
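
The flat-cost version is a one-liner; a minimal sketch on the hypothetical per-trade table from the primer section (drop the factor 2 if you hold every position to resolution):

library(dplyr)

cost_rate <- 0.01                                    # flat 1% of the notional per trade side

eval_net <- eval |>                                  # 'eval': per-trade table sketched in Section 5.3
  mutate(
    cost    = cost_rate * risked * 2,                # entry plus exit
    pnl_net = pnl - cost
  )

sum(eval_net$pnl_net) / sum(eval_net$risked)         # net PnL per dollar risked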

Reproducibility — set.seed(42) at the top of every chunk that uses randomness (CV folds, random splits). The marker should be able to run your Rmd and reproduce every number; if they can’t, that’s a deduction.

Deliverables — recap

Summary

Submit your assignment by 30 June 2026, 18:00 in a single zip-folder named Asset2026_surname1_surname2_surname3 containing:

  1. Your Rmd code (well-commented, vectorised, helper functions for repetitive logic).
  2. Your project report as PDF (knitted from the Rmd, 10–15 pages).
  3. Your presentation slides as PDF (~20 minutes’ worth of content).

Email the zip to oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de and your team-mates. Subject line follows the same pattern as the zip name.

Notes

Three artefacts in one zip — Rmd, knitted PDF, presentation slides PDF. Each has a specific role:

  • Rmd — the source. The marker runs this end-to-end to confirm reproducibility. Every analysis, every figure, every number in the report comes from this file. Comment liberally.
  • Report PDF — the polished narrative. Knit from the Rmd; don’t hand-edit. 10–15 pages with sections roughly mirroring an academic paper structure (Introduction, Data, Methods, Results, Conclusion). The Lecture 4 handout in the Research in Finance sister course has a longer discussion of academic-paper structure that’s worth a skim.
  • Presentation slides PDF — for the final-presentation session. ~20 minutes’ worth of content, so 15–20 slides typically. The audience is your peers and instructors; assume they have the report but haven’t read it yet.

The naming convention Asset2026_surname1_surname2_surname3 is parsed by the marker’s inbox filter — getting it wrong risks the email being lost in noise. Triple-check before sending.

5.6 Conclusion of Lecture 5

  • Course at a glance (1/2)
  • Course at a glance (2/2)
  • Further reading
  • Prepare before the final session
  • See you on 1 July
  • References

Course at a glance (1/2)

Foundations

Week 1

15.04.2026

Course outline · Backtesting fundamentals

  • Course aim & organisation
  • Backtesting overview & case study
  • In-sample tests (Welch & Goyal 2008)
  • Out-of-sample (walk-forward, R²_OS)
  • Useful predictors & p-hacking

Introduction to R

Week 2

22.04.2026

RStudio · variables · vectors · data frames · live coding

  • Why R for empirical asset-management research
  • RStudio and the script editor
  • Variables, vectors, matrices, data frames, lists
  • Functions and loops
  • Data import and export

Assessing model accuracy & Ridge regression

Week 3

29.04.2026

Statistical learning · MSE · bias-variance · linear model selection · Ridge

  • Statistical learning: Y = f(X) + ε
  • Quality of fit and the train/test MSE distinction
  • Bias-variance trade-off and overfitting
  • OLS limits: prediction accuracy & interpretability
  • Ridge regression and the L2 penalty

Lasso, cross-validation & Elastic Net

Week 4

06.05.2026

Sparse regularisation · resampling for honest test error · choosing λ

  • Lasso: L1 penalty and exact-zero coefficients
  • Cross-validation: validation set, LOOCV, K-fold
  • Choosing the optimal λ for Lasso
  • OLS post-Lasso for cleaner coefficient inference
  • Elastic Net — combining Ridge and Lasso

Prediction markets, the Polymarket Quant Bench & your project

Week 5

13.05.2026

From Welch-Goyal to event-resolved binary contracts

  • Prediction markets — definition and Polymarket as the canonical venue
  • How prices form: liquidity, resolution, mechanics
  • The Polymarket Quant Bench dataset (HuggingFace): access and schema
  • First look at the data in R
  • Your project: indicator design, back-test, deliverables, R toolbox

Course at a glance (2/2)

Final presentations

Week 13

01.07.2026

Group presentations · Q&A · wrap-up

  • Presentation order and time budget
  • Q&A rules
  • Closing thoughts and feedback

Further reading

  • Wolfers and Zitzewitz (2004) — foundational survey of prediction-markets research and design.
  • Manski (2006) — when prediction-market prices are not calibrated probabilities; cautionary reading.
  • James et al. (2021) — Chapters 5–6 stay relevant; Lasso + CV are your default tools.

Notes

Two short, high-leverage academic readings on the asset class itself:

  • Wolfers and Zitzewitz (Wolfers and Zitzewitz 2004) is the canonical academic survey. Read sections on calibration, market design, and prediction-market vs. expert-forecast comparisons. Section on “issues raised by prediction markets” is the right framing for your project’s interpretation discussion.
  • Manski (Manski 2006) is the cautionary counterpoint — under particular assumptions about trader risk preferences, prices are not equal to average beliefs. A useful corrective if your project’s interpretation depends on naïve “price = probability”.

JWHT chapters 5 (CV) and 6 (Lasso, Ridge, Elastic Net) remain the operational reference for the modelling part of your project — you’ll reach for them throughout the project phase.

Prepare before the final session

  1. Form your group of three by 20 May 2026 (one week from today) at the latest. Email Oliver if you can’t form one.
  2. Sketch your indicator menu before writing code — peer-review within the group.
  3. Reach out to oliver.padmaperuma@uni-ulm.de (CC andre.guettler@uni-ulm.de) for any blocking questions during the project phase — fast turnaround.
  4. Finish your assignment 😎!

Notes

Group formation: groups of 3 produce the best work for a project of this scope — solo is too much load; pairs lack a tie-breaker for design choices; 4+ has coordination overhead. If you don’t have a group, email Oliver and we’ll allocate.

Indicator menu sketch first — peer-reviewing within the group before any code is written catches misaligned expectations early. Each member proposes 2–3 indicators with one-line motivation; the group picks the best 5–7. Saves rework when you discover three weeks in that the team had different mental models of what the project was about.

Consultation hours during the project phase are fast-turnaround — email with a specific blocker and we’ll get back within a day. Don’t suffer alone for two weeks on a problem that takes 30 minutes to debug with help.

See you on 1 July

Final presentations
  • 20 minutes per group + Q&A.
  • Submit Rmd + report PDF + slides PDF as a single zip by 30 June 2026, 18:00.
  • Bring two laptops (primary + backup) on presentation day.
  • Best of luck — apply what you learned, and be honest about what doesn’t work in your back-test.

References

Becker, Jonathan. 2025. “polymarket-data: Raw Trade and Market Data from Polymarket.” GitHub repository. https://github.com/jon-becker/prediction-market-analysis.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning with Applications in R. 2nd ed. New York, NY: Springer. https://www.statlearning.com/.
Manski, Charles F. 2006. “Interpreting the Predictions of Prediction Markets.” Economics Letters 91 (3): 425–29. https://doi.org/10.1016/j.econlet.2006.01.004.
Institute of Strategic Management and Finance, Ulm University. 2026. “Polymarket Quant Bench: OHLCV Bars for High-Liquidity Resolved Markets.” HuggingFace dataset. https://huggingface.co/datasets/smf-ulm/polymarket-quant-bench.
Wolfers, Justin, and Eric Zitzewitz. 2004. “Prediction Markets.” Journal of Economic Perspectives 18 (2): 107–26. https://doi.org/10.1257/0895330041371321.