Lecture 5: Prediction markets, the Polymarket Quant Bench & your project

From Welch-Goyal to event-resolved binary contracts

Prof. Dr. Andre Guettler, Director of the Institute
Helmholtzstraße 22, Room 205
andre.guettler@uni-ulm.de
+49 731 50 31 030
Oliver Padmaperuma, Doctoral Candidate
Helmholtzstraße 22, Room 203
oliver.padmaperuma@uni-ulm.de
+49 731 50 31 036

5.1 Course objectives

  • 5.1 Course objectives
  • 5.2 Recap of the empirical R toolkit
  • 5.3 Prediction markets — primer
  • 5.4 The Polymarket Quant Bench dataset
  • 5.5 Your project
  • 5.6 Conclusion of Lecture 5
  • Welcome to Finance Project — Asset Management
  • Course Objective
  • Course at a glance (1/2)
  • Course at a glance (2/2)
  • Assignments / Exams

Welcome to Finance Project — Asset Management

  • This is a project course: there is no central exam to register for. Sign up on the course Moodle page by 15 April 2026 so you receive announcements and the data link.
  • Submit the project by 30 June 2026 as a single zip — name pattern: Asset2026_surname1_surname2_surname3. Email it to oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de and your team-mates.
  • Ask questions during or right after each session — that is the preferred channel.
  • Admin / studies / exam-eligibility questions go to the registrar’s office (Studiensekretariat) at studiensekretariat@uni-ulm.de.
  • Course-content questions outside class: email oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de.
  • We also recommend the student advisory service.

Course Objective

Scope

We will:

  • Build an end-to-end empirical pipeline in R: load, explore, model, back-test
  • Cover the core ML toolbox for asset-management research: linear models, Ridge, Lasso, Elastic Net, cross-validation
  • Apply it to a non-traditional asset class: prediction markets
  • Develop your own indicator library and trading strategy in groups of three

We will NOT:

  • Drift into deep-learning or reinforcement-learning methods
  • Cover prediction-market theory or microstructure in depth — today's primer is all the background we provide
  • Provide a “ready-to-fork” backtest — the demo code is intentionally basic

Approach

Part I — Foundations

  • L1: Motivation, organisation, backtesting fundamentals
  • L2: Hands-on R intro — RStudio, live coding, etc.
  • L3 + L4: Statistical learning — model accuracy, regularisation, resampling

Part II — Application

  • L5: Prediction-markets primer + the Polymarket dataset + assignment briefing
  • Project work in groups of three (≈ 7 weeks of self-organised work)
  • Final session (1 July): 20-minute presentations per team

Course at a glance (1/2)

Foundations

Week 1

15.04.2026

Course outline · Backtesting fundamentals

  • Course aim & organisation
  • Backtesting overview & case study
  • In-sample tests (Welch & Goyal 2008)
  • Out-of-sample (walk-forward, R²_OS)
  • Useful predictors & p-hacking

Introduction to R

Week 2

22.04.2026

RStudio · variables · vectors · data frames · live coding

  • Why R for empirical asset-management research
  • RStudio and the script editor
  • Variables, vectors, matrices, data frames, lists
  • Functions and loops
  • Data import and export

Assessing model accuracy & Ridge regression

Week 3

29.04.2026

Statistical learning · MSE · bias-variance · linear model selection · Ridge

  • Statistical learning: Y = f(X) + ε
  • Quality of fit and the train/test MSE distinction
  • Bias-variance trade-off and overfitting
  • OLS limits: prediction accuracy & interpretability
  • Ridge regression and the L2 penalty

Lasso, cross-validation & Elastic Net

Week 4

06.05.2026

Sparse regularisation · resampling for honest test error · choosing λ

  • Lasso: L1 penalty and exact-zero coefficients
  • Cross-validation: validation set, LOOCV, K-fold
  • Choosing the optimal λ for Lasso
  • OLS post-Lasso for cleaner coefficient inference
  • Elastic Net — combining Ridge and Lasso

Prediction markets, the Polymarket Quant Bench & your project

Week 5

13.05.2026

From Welch-Goyal to event-resolved binary contracts

  • Prediction markets — definition and Polymarket as the canonical venue
  • How prices form: liquidity, resolution, mechanics
  • The Polymarket Quant Bench dataset (HuggingFace): access and schema
  • First look at the data in R
  • Your project: indicator design, back-test, deliverables, R toolbox

Course at a glance (2/2)

Final presentations

Week 13

01.07.2026

Group presentations · Q&A · wrap-up

  • Presentation order and time budget
  • Q&A rules
  • Closing thoughts and feedback

Assignments / Exams

Project (Code + Report) 50% of your grade

Rmd code + knitr-rendered PDF report. Build a library of indicators over the Polymarket Quant Bench dataset (curated OHLCV bars on HuggingFace, derived from Jon Becker’s polymarket-data dump), derive trade signals, back-test, and write a critical reflection.

Group of up to 3.

Submit by emailing oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de. Subject pattern: Finance Project — Asset Management_assignment-1-project-report_surname1_surname2_…

30 June 2026

Final Presentation 50% of your grade

20-minute group presentation in class on 1 July 2026; submit slides as PDF together with the project zip.

Group of up to 3.

Submit by emailing oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de. Subject pattern: Finance Project — Asset Management_assignment-2-final-presentation_surname1_surname2_…

1 July 2026

5.2 Recap of the empirical R toolkit

  • What you’ve already covered
  • You’re now ready to …

What you’ve already covered

  • L1: backtesting fundamentals — IS vs OOS, \(R^2_{OS}\), the useful-predictor checklist, p-hacking discipline.
  • L2: R fundamentals — RStudio, vectors, data frames, functions.
  • L3: bias / variance trade-off, Ridge regression (L2 penalty).
  • L4: Lasso (L1 penalty, exact-zero coefficients), cross-validation (validation set, LOOCV, K-fold), OLS post-Lasso, Elastic Net.

You’re now ready to …

  • Engineer features (indicators) from a price time-series.
  • Use Ridge / Lasso / Elastic Net to combine many indicators into one signal without overfitting.
  • Pick hyper-parameters honestly with K-fold CV.
  • Run a walk-forward back-test that respects the chronology of your data.

Today: meet the asset class that will host all of this — prediction markets.

5.3 Prediction markets — primer

  • What is a prediction market?
  • Polymarket — the canonical venue
  • How prices form — and why finance researchers care
  • Differences vs traditional asset classes

What is a prediction market?

  • A prediction market is a venue where participants trade contracts whose payoff depends on the outcome of a future event.
  • The most common contract pays $1 if the event happens, $0 otherwise — a “Yes / No” binary contract.
  • The price of the Yes contract therefore behaves like a market-implied probability of the event between 0 and 1.
  • Examples: “Will the ECB cut rates by December 2026?”, “Will Bitcoin go up or down in the next 5 minutes?”.
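
A quick worked example with illustrative numbers: if the Yes contract trades at \(p = 0.62\), the market-implied probability of the event is 62 %. Buying 100 Yes shares costs $62 and pays $100 if the event occurs, $0 otherwise; the corresponding No contract trades at roughly \(1 - p = 0.38\).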

Polymarket — the canonical venue

  • Web-native prediction-market exchange running on the Polygon blockchain (USDC-settled).
  • Users buy and sell Yes / No shares at any time before the market resolves.
  • Each market has an order book, a last-traded price, volume, and liquidity.
  • At resolution, Yes-share holders receive $1 per share if the event occurred; No-share holders receive $1 otherwise.
  • Polymarket’s website lists thousands of resolved markets, mostly across politics, sports, crypto, geopolitics, and macro.

How prices form — and why finance researchers care

  • Prices aggregate the beliefs of traders weighted by their dollar conviction.
  • Under standard assumptions (no manipulation, low frictions) the price is a calibrated probability of the event.
  • The empirical literature treats prediction-market prices as forward-looking forecasts — they react to news within seconds and outperform polls in many event categories.
  • For an asset-management course they offer a clean test bed: thousands of independent, event-resolved time series with a known terminal value (0 or 1).

Differences vs traditional asset classes

  • Traditional asset classes (e.g. equities):
    • No terminal value — perpetual securities.
    • Price ≈ discounted expected future cash flows.
    • Drift is the equity / risk premium.
    • Cross-sectional anomalies dominate the literature.
    • Liquidity is deep and continuous.
  • Event-resolved binary contracts:
    • Terminal value is exactly 0 or 1 at resolution.
    • Price ≈ instantaneous probability of the event.
    • “Drift” is mechanically driven by news and the resolution clock.
    • Each market is its own micro-asset class.
    • Liquidity is highly heterogeneous — many markets are thin.

Many of your L1–L4 tools transfer; the interpretation of risk and return changes. There is no Sharpe-ratio analogue without first defining a notion of “return” on a binary contract.
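
One possible convention, for illustration: buying Yes at price \(p\) and holding to resolution yields \(r = (\text{payoff} - p)/p\) with payoff 1 or 0, so at \(p = 0.25\) the realised return is either +300 % or −100 %. Any Sharpe-like statistic you report therefore needs a stated convention first (hold-to-resolution vs. mark-to-market per bar, per market vs. pooled).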

5.4 The Polymarket Quant Bench dataset

  • Source — Polymarket Quant Bench (HuggingFace)
  • One-time setup — download the dataset (HF CLI)
  • If the setup gets stuck
  • Loading the data in R
  • Three configs — schema cheat sheet
  • A first descriptive look
  • One market end-to-end — YES price + 20-day SMA
  • Caveats & data-quality notes

Source — Polymarket Quant Bench (HuggingFace)

  • We use the Polymarket Quant Bench dataset published by our institute on HuggingFace — a curated OHLCV view over ~603 MB of parquet shards.
  • Built on top of Jon Becker’s prediction-market-analysis raw on-chain dump (Becker 2025); we resample to clean hourly and daily bars and pre-filter to liquid resolved markets (≥ $100k cumulative volume and ≥ 200 on-chain trade fills).
  • Three named configs: markets (36,831 rows, one per market), bars_hourly (~12.66M rows), bars_daily (~1.46M rows).
  • Public, CC-BY-4.0, citation key smf2026polymarketquantbench (Strategic Management and Finance 2026).

One-time setup — download the dataset (HF CLI)

# 1. Install the HuggingFace CLI (one-time, per machine).
#    Comes with the official `huggingface_hub` Python package.
pip install huggingface_hub

# 2. Set up your group's project folder. Recommended layout:
#
#      asset-group-NN/
#      ├── asset-group-NN.R     ← RStudio project file
#      ├── asset-group-NN.Rmd   ← your analysis lives here
#      └── data/
#          └── polymarket/      ← dataset lands here (~603 MB)

# 3. Download the dataset into data/polymarket/. The CLI shows a live progress bar.
hf download smf-ulm/polymarket-quant-bench \
    --repo-type dataset \
    --local-dir data/polymarket
  • One-time per machine: pip-install huggingface_hub, run the CLI command once.
  • --local-dir data/polymarket writes real file copies into your project tree (matching the local_path used in the R chunk below).
  • If you work with GitHub, add data/ to .gitignore so the 603 MB never enters your repo.

If the setup gets stuck

  • No Python on your machine → install from https://python.org (Windows / macOS) or your distro’s package manager (Linux). pip ships with Python ≥ 3.4. Tick “Add Python to PATH” during install on Windows.
  • pip: command not found → use the Python-launcher fallback:
    • Windows → py -m pip install huggingface_hub
    • macOS / Linux → python3 -m pip install huggingface_hub
  • hf: command not found after a successful install → close and reopen your terminal (PATH refresh). If still missing, fall back to the module form python -m huggingface_hub download … (same flags as before).
  • You see huggingface-cli is deprecated, use hf instead → just replace huggingface-cli with hf in your command. The flags are identical.
  • Download interrupted (Wi-Fi drop, timeout) → just re-run the same command. The CLI hashes every file and only re-fetches what’s missing or partial.
  • Behind a corporate / university proxy → set HTTPS_PROXY=http://<proxy-host>:<port> (and HTTP_PROXY similarly) before running the CLI. Ask IT for the right address if you’re not sure.

Loading the data in R

# install.packages(c("arrow", "dplyr", "ggplot2", "lubridate"))
library(arrow)        # parquet + lazy datasets
library(dplyr)        # wrangle
library(ggplot2)      # plot
library(lubridate)    # dates

# Path that matches the --local-dir from the CLI download.
local_path <- "data/polymarket"

# Open each config lazily — 1,418 parquet shards across three folders;
# arrow stitches them as one logical table.
markets     <- arrow::open_dataset(file.path(local_path, "markets"))     |> collect()
bars_daily  <- arrow::open_dataset(file.path(local_path, "bars_daily"))  |> collect()
bars_hourly <- arrow::open_dataset(file.path(local_path, "bars_hourly")) |> collect()
  • arrow::open_dataset() |> collect() loads each config into RAM as a regular tibble. Peak memory across all three is ~1.5 GB — comfortable on a 16 GB laptop.
  • The |> (pipe) is base-R syntax sugar — same as collect(open_dataset(...)). No magic, just chains the calls left-to-right.
  • arrow::open_dataset() stitches the parquet shards transparently — bars_hourly is hundreds of separate files but you see one tibble.
  • Re-running these three lines after the first download is instant (data already on disk).
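
If memory is tight, you do not have to collect() a whole config. A minimal sketch, continuing the chunk above: arrow runs dplyr verbs lazily on the parquet shards and only materialises the matching rows. The token id string is a placeholder; take a real one from markets$clob_token_ids.

# Lazy alternative: filter on disk, collect only what you need.
hourly_ds <- arrow::open_dataset(file.path(local_path, "bars_hourly"))

one_token_hourly <- hourly_ds |>
  filter(token_id == "<yes_token_id>") |>    # placeholder token id
  select(token_id, period_start, close, volume_usd) |>
  arrange(period_start) |>
  collect()                                  # materialises only this token's rows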

Three configs — schema cheat sheet

  • markets (one row per market, 36,831 rows):
    • id, condition_id (parent / child link), question, slug, category
    • outcomes, outcome_prices (JSON arrays — winner closes near 1.0)
    • clob_token_ids (JSON [yes_token_id, no_token_id] — pairs market to bars)
    • volume, liquidity, created_at, end_date
  • bars_daily / bars_hourly (one row per token × period):
    • token_id (YES or NO — not the market!)
    • period_start, period_end (UTC)
    • open, high, low, close, vwap — all in [0, 1] (implied probability)
    • volume_usd, n_trades, n_buys, n_sells

Three gotchas: (1) bars are per token, so YES and NO are separate series — pair via clob_token_ids if you want a mid. (2) Bars are sparse (no row for periods with zero trades) — tidyr::fill() to forward-fill. (3) liquidity in markets is a snapshot when the data was collected, not a time series — use volume_usd in bars for time-varying liquidity.
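
A minimal sketch of gotcha (1), assuming the markets and bars_daily tibbles loaded on the previous slide: pair the YES and NO daily series of one market via clob_token_ids and build a simple mid probability.

library(jsonlite)

one_mkt <- markets |> slice_max(volume, n = 1)
tok     <- fromJSON(one_mkt$clob_token_ids)          # c(yes_token_id, no_token_id)

yes_bars <- bars_daily |> filter(token_id == tok[1]) |> select(period_start, p_yes = close)
no_bars  <- bars_daily |> filter(token_id == tok[2]) |> select(period_start, p_no  = close)

# Mid: average of the YES price and the probability implied by the NO price.
mid <- full_join(yes_bars, no_bars, by = "period_start") |>
  arrange(period_start) |>
  mutate(p_mid = (p_yes + (1 - p_no)) / 2)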

A first descriptive look

# Row counts per config — sanity check after download.
nrow(markets)        # 36,831
nrow(bars_daily)     # 1,462,282
nrow(bars_hourly)    # 12,655,266

# Distribution of markets by category.
markets |>
  count(category, sort = TRUE) |>
  ggplot(aes(reorder(category, n), n)) +
  geom_col() + coord_flip() +
  labs(x = NULL, y = "Markets", title = "Resolved markets by category")

# Histogram of cumulative volume per market (log scale).
markets |>
  filter(volume > 0) |>
  ggplot(aes(volume)) +
  geom_histogram(bins = 60) + scale_x_log10() +
  labs(x = "Cumulative volume (USDC, log)", y = "Markets",
       title = "Volume distribution across resolved markets")
  • The three counts (36,831 / ~1.46M / ~12.66M) match the dataset card on HuggingFace — first reproducibility check of any project.
  • category is best-effort upstream labelling (Politics, Sports, Crypto, …). Treat as a hint, not a contract — verify by sampling.
  • Volume is heavy-tailed — log scale on the histogram x-axis is necessary; linear obscures the structure.

One market end-to-end — YES price + 20-day SMA

library(jsonlite)
library(slider)
library(patchwork)

# Pick the most heavily traded market, parse out its YES token id.
top_mkt <- markets |> slice_max(volume, n = 1)
yes_id  <- fromJSON(top_mkt$clob_token_ids)[1]

# Pull its full daily bar history (one row per calendar day).
mkt_bars <- bars_daily |>
  filter(token_id == yes_id) |>
  arrange(period_start) |>
  mutate(sma_20 = slide_dbl(close, mean, .before = 19, .complete = TRUE))

p_price <- ggplot(mkt_bars, aes(period_start)) +
  geom_line(aes(y = close),  colour = "steelblue", linewidth = 0.4) +
  geom_line(aes(y = sma_20), colour = "darkorange", linewidth = 0.6) +
  labs(title = top_mkt$question, y = "Implied probability", x = NULL)

p_vol <- ggplot(mkt_bars, aes(period_start, volume_usd)) +
  geom_col(width = 1, fill = "grey40") +
  labs(y = "Volume (USDC)", x = NULL)

p_price / p_vol      # patchwork: stacks the two panels vertically
  • slice_max(volume, n = 1) returns the single most-traded market — a US-election or crypto-price market in most snapshots.
  • clob_token_ids is a JSON-stringified [yes, no] array — fromJSON() parses it to a length-2 character vector; take [1] for YES.
  • slider::slide_dbl() is the modern, vectorised rolling-window primitive — .before = 19 + the current row = 20-day window.
  • patchwork’s / operator stacks two ggplot panels vertically, sharing the x-axis.

Caveats & data-quality notes

  • Sparse bars — many markets trade in bursts; intermediate hours / days simply have no row, so tidyr::fill() on its own cannot help. Build the full calendar first (e.g. tidyr::complete()), then forward-fill with tidyr::fill(close, .direction = "down") after arrange(period_start) and before any rolling computation; see the sketch after this list.
  • YES vs NO is a token, not a market. bars_daily is keyed on token_id. Always filter to the YES (or NO) side first; otherwise you’re mixing two anti-correlated series.
  • Categories are best-effort labels. Useful for stratifying but not contractually correct — sample 10 markets per category and confirm the labels match your intuition.
  • Survivorship — every market in this dataset resolved. Be careful interpreting “the average market converges to its terminal value” — by construction, these are the markets that did terminate.
  • Timestamps are UTC throughout. No timezone conversion needed inside the dataset; only at the reporting layer (German lecture times etc.).
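
A minimal sketch of that forward-fill, assuming mkt_bars is the daily YES series from the end-to-end example and that period_start is a daily UTC timestamp:

library(tidyr)

mkt_filled <- mkt_bars |>
  arrange(period_start) |>
  complete(period_start = seq(min(period_start), max(period_start), by = "1 day")) |>
  fill(close, .direction = "down") |>               # carry last traded price forward
  mutate(volume_usd = replace_na(volume_usd, 0))    # no trades means zero volume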

5.5 Your project

  • Goal
  • Suggested workflow
  • R package toolbox
  • Indicator categories — pick at least five
  • Frequently overlooked things
  • Deliverables — recap

Goal

Project goal

In groups of three, design a small library of indicators on your chosen universe of Polymarket markets, derive trading signals from them, back-test a strategy on the price history, and write a critical reflection on what works and what doesn’t.

Optional but encouraged: bring external data (Google Trends, news, related markets, sports / political odds, weather, …) to enrich your indicators.

Suggested workflow

  1. Explore the data — glimpse(), summary(), basic plots; pick the market category you’ll focus on.
  2. Define a universe — set inclusion rules (minimum volume, minimum trading days, category, resolution date range).
  3. Engineer indicators — at least 5–7, drawn from technical / statistical / external-data buckets (next slide).
  4. Combine them into a signal — Ridge / Lasso / Elastic Net with K-fold CV (Lectures 3–4).
  5. Back-test honestly — walk-forward, frictions explicit, cross-validate \(\lambda\) on training data only (Lecture 1).
  6. Reflect — does the strategy still work on a held-out cohort of markets? Where does it break?
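
A minimal sketch of steps 4 and 5, with loudly hypothetical names: indicator_panel is a tibble you are assumed to have built in step 3 (one row per token and day, indicator columns prefixed ind_, a close column, and a target fwd_return), and the 70/30 chronological split is purely illustrative.

library(dplyr)
library(glmnet)

set.seed(1)

feat <- indicator_panel |>                       # hypothetical output of step 3
  arrange(period_start) |>
  filter(if_all(starts_with("ind_"), ~ !is.na(.x)), !is.na(fwd_return))

cut   <- floor(0.7 * nrow(feat))                 # chronological split: no shuffling across time
train <- feat[seq_len(cut), ]
test  <- feat[(cut + 1):nrow(feat), ]

x_train <- as.matrix(select(train, starts_with("ind_")))
x_test  <- as.matrix(select(test,  starts_with("ind_")))

# Elastic Net; lambda tuned by K-fold CV on the training window only.
cv_fit <- cv.glmnet(x_train, train$fwd_return, alpha = 0.5, nfolds = 10)

test$signal <- as.numeric(predict(cv_fit, newx = x_test, s = "lambda.min"))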

R package toolbox

  • arrow — read parquet (incl. sharded open_dataset())
  • (no Hub client needed — the CLI does the download once, R just reads the local copy)
  • tidyverse (dplyr, readr, tidyr)
  • lubridate — dates
  • data.table — large data
  • jsonlite — JSON dumps
  • gh — GitHub API
  • xts, zoo — time-series objects
  • tsibble, slider — rolling windows in tidy form
  • TTR — classic technical indicators (SMA, EMA, RSI, Bollinger…)
  • quantmod, tidyquant — quant wrappers
  • forecast — ARIMA, ETS if needed
  • glmnet — Ridge / Lasso / EN
  • caret or tidymodels — CV pipelines
  • PerformanceAnalytics — Sharpe-like metrics
  • ggplot2 — figures
  • rmarkdown, knitr, kableExtra — your deliverable

Indicator categories — pick at least five

  • Trend — moving averages (SMA / EMA), MACD, slope of fitted line over a rolling window.
  • Momentum — RSI, rate-of-change, recent return percentile.
  • Volatility — Bollinger bands, rolling std-dev, GARCH if you’re brave.
  • Volume / liquidity — VWAP, volume z-score, bid-ask spread when available.
  • Time-to-resolution — days until resolution, log-clock decay.
  • Cross-market — correlation with related Polymarket markets, parent / child contracts.
  • External signals — Google Trends, news sentiment (e.g. tidytext + a labelled corpus), polls, weather, sports odds.
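
A sketch of a few of these, assuming mkt_filled is the forward-filled daily YES series from Section 5.4; the window lengths are illustrative, and the time-to-resolution column uses the last observed bar as a stand-in for end_date from the markets config.

library(dplyr)
library(slider)
library(TTR)

indicators <- mkt_filled |>
  arrange(period_start) |>
  mutate(
    sma_20    = slide_dbl(close, mean, .before = 19, .complete = TRUE),            # trend
    roc_5     = close / lag(close, 5) - 1,                                         # momentum
    rsi_14    = TTR::RSI(close, n = 14),                                           # momentum
    sd_20     = slide_dbl(close, sd, .before = 19, .complete = TRUE),              # volatility
    vol_z20   = (volume_usd - slide_dbl(volume_usd, mean, .before = 19, .complete = TRUE)) /
                 slide_dbl(volume_usd, sd, .before = 19, .complete = TRUE),        # volume / liquidity
    days_left = as.numeric(difftime(max(period_start), period_start, units = "days"))  # time-to-resolution proxy
  )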

Frequently overlooked things

  • Don’t use for loops for vectorisable computations — use slider::slide_dbl, dplyr::mutate(across(...)), or data.table syntax. Loop-heavy code will be marked down at grading.
  • Cite your data — anywhere you read a column, comment what it represents. The Rmd should be self-explanatory.
  • Train-test discipline — never tune \(\lambda\) on the test set. Walk-forward in time wherever possible.
  • Transaction costs — be explicit (even a flat 1 % per trade is honest; ignoring them is not); see the sketch after this list.
  • Reproducibility — set.seed() everywhere, lock package versions if you can (renv is overkill but worth knowing).
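
A toy sketch of the transaction-cost point, continuing the hypothetical test and signal objects from the workflow sketch above; the 1 % fee and the ±0.02 signal thresholds are illustrative only, not a recommendation.

library(dplyr)

fee <- 0.01                                      # flat 1% of traded notional per position change

bt <- test |>
  arrange(period_start) |>
  mutate(
    position = case_when(signal >  0.02 ~  1,    # long YES
                         signal < -0.02 ~ -1,    # short YES (i.e. long NO)
                         TRUE            ~  0),
    turnover = abs(position - lag(position, default = 0)),
    pnl      = lag(position, default = 0) * (close - lag(close)) - fee * turnover
  )

sum(bt$pnl, na.rm = TRUE)                        # net P&L in $ per YES share traded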

Deliverables — recap

Summary

Submit your assignment by 30 June 2026, 18:00 in a single zip-folder named Asset2026_surname1_surname2_surname3 containing:

  1. Your Rmd code (well-commented, vectorised, helper functions for repetitive logic).
  2. Your project report as PDF (knitted from the Rmd, 10–15 pages).
  3. Your presentation slides as PDF (~20 minutes’ worth of content).

Email the zip to oliver.padmaperuma@uni-ulm.de, CC andre.guettler@uni-ulm.de and your team-mates. Subject line follows the same pattern as the zip name.

5.6 Conclusion of Lecture 5

  • Course at a glance (1/2)
  • Course at a glance (2/2)
  • Further reading
  • Prepare before the final session
  • See you on 1 July
  • References

Course at a glance (1/2)

Foundations

Week 1

15.04.2026

Course outline · Backtesting fundamentals

  • Course aim & organisation
  • Backtesting overview & case study
  • In-sample tests (Welch & Goyal 2008)
  • Out-of-sample (walk-forward, R²_OS)
  • Useful predictors & p-hacking

Introduction to R

Week 2

22.04.2026

RStudio · variables · vectors · data frames · live coding

  • Why R for empirical asset-management research
  • RStudio and the script editor
  • Variables, vectors, matrices, data frames, lists
  • Functions and loops
  • Data import and export

Assessing model accuracy & Ridge regression

Week 3

29.04.2026

Statistical learning · MSE · bias-variance · linear model selection · Ridge

  • Statistical learning: Y = f(X) + ε
  • Quality of fit and the train/test MSE distinction
  • Bias-variance trade-off and overfitting
  • OLS limits: prediction accuracy & interpretability
  • Ridge regression and the L2 penalty

Lasso, cross-validation & Elastic Net

Week 4

06.05.2026

Sparse regularisation · resampling for honest test error · choosing λ

  • Lasso: L1 penalty and exact-zero coefficients
  • Cross-validation: validation set, LOOCV, K-fold
  • Choosing the optimal λ for Lasso
  • OLS post-Lasso for cleaner coefficient inference
  • Elastic Net — combining Ridge and Lasso

Prediction markets, the Polymarket Quant Bench & your project

Week 5

13.05.2026

From Welch-Goyal to event-resolved binary contracts

  • Prediction markets — definition and Polymarket as the canonical venue
  • How prices form: liquidity, resolution, mechanics
  • The Polymarket Quant Bench dataset (HuggingFace): access and schema
  • First look at the data in R
  • Your project: indicator design, back-test, deliverables, R toolbox

Course at a glance (2/2)

Final presentations

Week 13

01.07.2026

Group presentations · Q&A · wrap-up

  • Presentation order and time budget
  • Q&A rules
  • Closing thoughts and feedback

Further reading

  • Wolfers and Zitzewitz (2004) — foundational survey of prediction-markets research and design.
  • Manski (2006) — when prediction-market prices are not calibrated probabilities; cautionary reading.
  • James et al. (2021) — Chapters 5–6 stay relevant; Lasso + CV are your default tools.

Prepare before the final session

  1. Form your group of three within one week of today’s session, i.e. by 20 May 2026 at the latest. Email Oliver if you can’t form one.
  2. Sketch your indicator menu before writing code — peer-review within the group.
  3. Reach out to oliver.padmaperuma@uni-ulm.de (CC andre.guettler@uni-ulm.de) for any blocking questions during the project phase — fast turnaround.
  4. Finish your assignment 😎!

See you on 1 July

Final presentations

  • 20 minutes per group + Q&A.
  • Submit Rmd + report PDF + slides PDF as a single zip by 30 June 2026, 18:00.
  • Bring two laptops (primary + backup) on presentation day.
  • Best of luck — apply what you learned, and be honest about what doesn’t work in your back-test.

References

Becker, Jonathan. 2025. “polymarket-data: Raw Trade and Market Data from Polymarket.” GitHub repository. https://github.com/jon-becker/prediction-market-analysis.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning with Applications in R. 2nd ed. New York, NY: Springer. https://www.statlearning.com/.
Manski, Charles F. 2006. “Interpreting the Predictions of Prediction Markets.” Economics Letters 91 (3): 425–29. https://doi.org/10.1016/j.econlet.2006.01.004.
Institute of Strategic Management and Finance, University of Ulm. 2026. “Polymarket Quant Bench: OHLCV Bars for High-Liquidity Resolved Markets.” HuggingFace dataset. https://huggingface.co/datasets/smf-ulm/polymarket-quant-bench.
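Welch, Ivo, and Amit Goyal. 2008. “A Comprehensive Look at the Empirical Performance of Equity Premium Prediction.” The Review of Financial Studies 21 (4): 1455–1508. https://doi.org/10.1093/rfs/hhm014.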
Wolfers, Justin, and Eric Zitzewitz. 2004. “Prediction Markets.” Journal of Economic Perspectives 18 (2): 107–26. https://doi.org/10.1257/0895330041371321.