[P] Random Forest on ~100k Polymarket questions — 80% accuracy (text-only)

Built a text-only baseline: trained a Random Forest on ~90,000 resolved Polymarket questions (YES/NO).

Features: TF-IDF (word ngrams, optional char ngrams) + a few cheap flags (date/number/%/currency, election/macro/M&A keywords).
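
The cheap flags are easy to reproduce with plain regexes. Below is a minimal sketch under stated assumptions. The keyword lists and patterns are hypothetical, because the post names the flag categories but not the exact rules, and `cheap_flags` is an illustrative name. In a full pipeline these 0/1 columns would sit next to the TF-IDF matrix before fitting the Random Forest. For example, you could use scikit-learn's `TfidfVectorizer` with `analyzer="word"`, plus optionally `analyzer="char_wb"`, each with an `ngram_range`.

```python
import re

# HYPOTHETICAL keyword lists: the post names the categories
# (election / macro / M&A) but not the actual terms used.
KEYWORDS = {
    "election": ("election", "elected", "primary", "nominee", "vote"),
    "macro": ("fed", "inflation", "cpi", "gdp", "interest rate", "recession"),
    "mna": ("acquire", "acquisition", "merger", "takeover"),
}

# HYPOTHETICAL patterns for the structural flags (date/number/%/currency).
MONTHS = (r"(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?"
          r"|aug(?:ust)?|sept?(?:ember)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)")
PATTERNS = {
    # "June 30" style dates or ISO dates like 2025-06-30
    "has_date": re.compile(rf"\b{MONTHS}\.?\s+\d{{1,2}}\b|\b\d{{4}}-\d{{2}}-\d{{2}}\b", re.I),
    "has_number": re.compile(r"\d"),
    "has_percent": re.compile(r"%|\bpercent\b", re.I),
    "has_currency": re.compile(r"[$€£]|\b(?:usd|eur|dollars?)\b", re.I),
}


def cheap_flags(question: str) -> dict[str, int]:
    """Return 0/1 flags for one question string."""
    text = question.lower()
    flags = {name: int(bool(p.search(question))) for name, p in PATTERNS.items()}
    for name, words in KEYWORDS.items():
        # whole-word match, so "fed" does not fire on "federation"
        hit = any(re.search(rf"\b{re.escape(w)}\b", text) for w in words)
        flags[f"kw_{name}"] = int(hit)
    return flags
```

For example, `cheap_flags("Will the Fed cut rates by 0.25% before June 30, 2025?")` sets `has_date`, `has_number`, `has_percent` and `kw_macro` to 1 and the remaining flags to 0.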

Result: ~80% accuracy on 15,000 held-out questions (plus decent Brier score and log loss after calibration).
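
For anyone comparing numbers, here is a stdlib sketch of the two probability scores mentioned above, plus thresholded accuracy. Each one is computed from predicted P(YES) on a held-out set. The function names are illustrative, and the `eps` clipping is an assumption that follows common practice. A constant 0.5 forecast scores 0.25 Brier and ln 2 ≈ 0.693 log loss, which gives a useful reference point. Calibration (for example scikit-learn's `CalibratedClassifierCV`) changes the predicted probabilities, so it mainly shows up in these two scores rather than in accuracy.

```python
import math


def brier_score(y_true: list[int], p_yes: list[float]) -> float:
    """Mean squared error between predicted P(YES) and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_yes)) / len(y_true)


def log_loss(y_true: list[int], p_yes: list[float], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood; probabilities clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_yes):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def accuracy(y_true: list[int], p_yes: list[float], threshold: float = 0.5) -> float:
    """Share of questions where thresholded P(YES) matches the outcome."""
    return sum(int(p >= threshold) == y for y, p in zip(y_true, p_yes)) / len(y_true)
```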

Liked the idea, so I played a bit more with different datasets and did some cross-validation with Kalshi data, with similar results. It's now running with paper money, competing against state-of-the-art LLMs as benchmarks. Let's see.

So far it looks like, just from how a question is phrased on Polymarket (in this dataset), we can predict with ~80% accuracy whether it resolves YES or NO.

Has anyone tried something similar? Happy to share further insights or get feedback.

The paper trading runs on Agent Leaderboard | Oracle Markets, where the model is listed as “mystery:rf-v1”. I haven't published the accuracy there yet.

submitted by /u/No_Syrup_4068
