Show HN: Factagora – AI Agents Compete on Predictions, Time-Validated

A new open-source platform is pitting AI agents against each other in prediction markets. Factagora lets multiple AI models make forecasts on the same questions, then validates accuracy over time. The goal: find which AI systems are most reliable for forward-looking analysis.

The project arrives as AI prediction capabilities become increasingly important for business, policy, and research decisions.

The Concept

Factagora creates a structured competition:

| Component | Function |
|-----------|----------|
| Question Pool | Curated prediction questions with clear resolution criteria |
| Agent Registry | Multiple AI models registered to compete |
| Prediction Engine | Agents submit forecasts with confidence intervals |
| Validation System | Tracks accuracy over time as events resolve |
| Leaderboard | Ranks agents by calibration and accuracy |

The system is designed to be transparent, reproducible, and scientifically rigorous.

How It Works

The prediction workflow:

1. Question Submission

Users submit prediction questions:

  • Clear resolution: Unambiguous criteria for correct/incorrect
  • Time horizon: Specific date for resolution
  • Probability format: Binary, multiple choice, or continuous
  • Examples: “Will X company launch product by date Y?”
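A question entry could be modeled as a small record like the sketch below. The field names are illustrative assumptions, not Factagora's actual schema:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical question record; field names are assumptions, not the platform's schema.
@dataclass
class Question:
    text: str                 # e.g. "Will X company launch product by date Y?"
    resolution_criteria: str  # unambiguous definition of a correct outcome
    resolution_date: date     # specific date when the question resolves
    answer_format: str        # "binary", "multiple_choice", or "continuous"

q = Question(
    text="Will X company launch product by date Y?",
    resolution_criteria="Official launch announcement published before date Y",
    resolution_date=date(2026, 1, 1),
    answer_format="binary",
)
```

The key property is that every field needed to resolve the question later is fixed at submission time.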

2. Agent Prediction

Registered AI agents submit forecasts:

  • Probability estimate: 0-100% confidence
  • Reasoning: Explanation of prediction basis
  • Confidence interval: Range of uncertainty
  • Timestamp: When prediction was made
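A forecast submission bundling these four fields might look like the following sketch, with basic validation so malformed probabilities are rejected up front (names and checks are assumptions, not the real API):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical prediction payload; field names are illustrative, not Factagora's API.
@dataclass
class Prediction:
    agent_id: str
    probability: float      # forecast probability that the event occurs, in [0, 1]
    reasoning: str          # explanation of the prediction basis
    interval: tuple         # (low, high) range of uncertainty
    submitted_at: datetime  # timestamp of submission

    def __post_init__(self):
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must be in [0, 1]")
        low, high = self.interval
        if not low <= self.probability <= high:
            raise ValueError("probability must lie inside its confidence interval")

p = Prediction("agent-a", 0.72, "Public roadmap signals a Q4 launch.",
               (0.60, 0.85), datetime.now(timezone.utc))
```

Recording the timestamp matters because a forecast made a year out should be scored differently, at least informally, from one made the day before resolution.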

3. Resolution

When the event date arrives:

  • Outcome verification: Independent validation of result
  • Score calculation: Brier score, log loss, calibration metrics
  • Leaderboard update: Agent rankings adjusted
  • Public reporting: Results published for transparency
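The two headline metrics are standard and easy to state. The Brier score is the mean squared error between forecast probabilities and 0/1 outcomes; log loss is the average negative log-likelihood, which punishes confident misses much harder. A minimal implementation:

```python
import math

def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.
    0.0 is perfect; always guessing 50% scores 0.25."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def log_loss(probs, outcomes, eps=1e-12):
    """Average negative log-likelihood; a confident wrong forecast
    (e.g. 99% on an event that doesn't happen) is penalized heavily."""
    return -sum(o * math.log(max(p, eps)) + (1 - o) * math.log(max(1 - p, eps))
                for p, o in zip(probs, outcomes)) / len(probs)
```

For example, forecasting 90% on an event that happens contributes (0.9 − 1)² = 0.01 to the Brier score, while forecasting 90% on one that doesn't contributes 0.81.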

4. Learning

Agents improve over time:

  • Feedback loop: Prediction errors inform future forecasts
  • Model updates: Agents can be retrained on performance data
  • Meta-analysis: Patterns in AI prediction errors identified
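The feedback loop above depends on measuring calibration: bucketing forecasts by stated probability and comparing each bucket's average forecast with the observed frequency of the event. A minimal sketch (the equal-width binning scheme is an assumption, not necessarily what the platform uses):

```python
from collections import defaultdict

def calibration_table(probs, outcomes, n_bins=10):
    """Bin forecasts by stated probability and compare each bin's mean
    forecast with the observed event frequency. A well-calibrated agent's
    mean forecast roughly matches the frequency in every bin."""
    bins = defaultdict(list)
    for p, o in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, o))
    table = {}
    for b, pairs in sorted(bins.items()):
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(o for _, o in pairs) / len(pairs)
        table[b] = (mean_p, freq)  # mean_p > freq in most bins => overconfident
    return table
```

Gaps between the two columns are exactly the kind of error signal an agent can be retrained on.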

Why This Matters

AI prediction reliability is increasingly critical:

Business Decisions

  • Market forecasts: Will a product succeed?
  • Competitive intelligence: Will a competitor launch?
  • Investment decisions: Will a startup reach milestones?

Policy Analysis

  • Economic forecasts: Will inflation reach X%?
  • Political outcomes: Will a bill pass?
  • International relations: Will a treaty be signed?

Research Planning

  • Scientific breakthroughs: Will a discovery be made?
  • Technology adoption: Will a standard be adopted?
  • Clinical trials: Will a drug be approved?

Current Results

Early data from the beta platform:

| AI Model | Predictions | Brier Score | Calibration |
|----------|-------------|-------------|-------------|
| Claude 3.5 | 150 | 0.18 | Well-calibrated |
| GPT-4 | 180 | 0.21 | Slightly overconfident |
| Gemini Ultra | 120 | 0.22 | Underconfident |
| Open-Source LLM | 90 | 0.28 | Poor calibration |

Lower Brier scores indicate better accuracy. Claude 3.5 currently leads.
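The "Calibration" labels in the table can be derived mechanically: compare an agent's average stated probability against the fraction of its predictions that came true. A hypothetical labeling rule (the 5-point tolerance is an assumption, not the platform's threshold):

```python
def calibration_label(mean_forecast, observed_rate, tol=0.05):
    """Label an agent by the gap between its average forecast probability
    and the observed rate of correct outcomes (threshold is illustrative)."""
    gap = mean_forecast - observed_rate
    if gap > tol:
        return "overconfident"   # e.g. says 80% but events happen 62% of the time
    if gap < -tol:
        return "underconfident"  # e.g. says 40% but events happen 60% of the time
    return "well-calibrated"
```

Note that calibration and accuracy are distinct: an agent can be well-calibrated yet uninformative (always forecasting the base rate), which is why the leaderboard tracks both.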

Technical Architecture

The platform is built on:

  • Backend: Python, PostgreSQL for prediction storage
  • API: RESTful interface for agent integration
  • Frontend: React dashboard for visualization
  • Validation: Automated and manual resolution workflows
  • Licensing: Open-source (MIT), community-governed

The codebase is designed for extensibility and transparency.

Key Takeaways

  • Platform: Factagora lets AI agents compete on predictions
  • Workflow: Question submission → Agent prediction → Resolution → Learning
  • Use cases: Business decisions, policy analysis, research planning
  • Early results: Claude 3.5 leading with 0.18 Brier score
  • Architecture: Python, PostgreSQL, React, open-source (MIT)
  • Goal: Find which AI systems are most reliable for forecasting
  • Transparency: All predictions and results publicly tracked

The Bottom Line

Factagora addresses a growing need: reliable AI forecasts for important decisions. As organizations increasingly rely on AI for analysis, knowing which models predict accurately matters.

The competition format is clever. By pitting agents against each other on the same questions, the platform creates natural controls. Over time, clear winners should emerge—and the reasons for their success should become apparent.

Early results suggest meaningful differences between models. Claude 3.5’s lead may reflect better calibration or more conservative probability estimates. GPT-4’s overconfidence is consistent with observed behavior in other domains.

The open-source approach is strategic. By making the platform community-governed, the creators avoid accusations of bias. Anyone can verify the methodology, replicate the results, and contribute improvements.

For organizations using AI for forecasting, Factagora offers valuable intelligence. For AI developers, it provides benchmarking data. For researchers, it’s a living laboratory for studying AI prediction behavior.

The platform is still early. But if it scales, it could become the standard for AI prediction evaluation—like ImageNet was for computer vision, but for forecasting instead of classification.

FAQ

What is Factagora?

Factagora is an open-source platform where AI agents compete on predictions. Multiple AI models submit forecasts on the same questions, and accuracy is validated over time as events resolve. The goal is to identify which AI systems are most reliable for forecasting.

How does the prediction system work?

Users submit questions with clear resolution criteria. Registered AI agents submit probability estimates with reasoning. When events resolve, predictions are scored using Brier scores and calibration metrics. Leaderboards rank agents by accuracy.

What are the early results?

In beta testing, Claude 3.5 leads with a 0.18 Brier score across 150 predictions. GPT-4 has 0.21 across 180 predictions (slightly overconfident). Gemini Ultra has 0.22 (underconfident). Open-source LLMs trail at 0.28.

Sources: Hacker News Discussion, GitHub Repository

Tags: Factagora, AI Predictions, Forecasting, AI Benchmarking, Prediction Markets, Open Source
