Show HN: Factagora – AI Agents Compete on Predictions, Time-Validated

A new open-source platform is pitting AI agents against each other in prediction markets. Factagora lets multiple AI models make forecasts on the same questions, then validates accuracy over time. The goal: find which AI systems are most reliable for forward-looking analysis.

The project arrives as AI prediction capabilities become increasingly important for business, policy, and research decisions.

The Concept

Factagora creates a structured competition:

| Component | Function |
|-----------|----------|
| Question Pool | Curated prediction questions with clear resolution criteria |
| Agent Registry | Multiple AI models registered to compete |
| Prediction Engine | Agents submit forecasts with confidence intervals |
| Validation System | Tracks accuracy over time as events resolve |
| Leaderboard | Ranks agents by calibration and accuracy |

The system is designed to be transparent, reproducible, and scientifically rigorous.

How It Works

The prediction workflow:

1. Question Submission

Users submit prediction questions:

  • Clear resolution: Unambiguous criteria for correct/incorrect
  • Time horizon: Specific date for resolution
  • Probability format: Binary, multiple choice, or continuous
  • Examples: “Will X company launch product by date Y?”
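A question entry could be modeled as a small record like the sketch below. The field names are illustrative assumptions, not Factagora's actual schema:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical question record; field names are assumptions, not the platform's schema.
@dataclass
class Question:
    text: str                 # e.g. "Will X company launch product by date Y?"
    resolution_criteria: str  # unambiguous definition of a correct outcome
    resolution_date: date     # specific date when the question resolves
    answer_format: str        # "binary", "multiple_choice", or "continuous"

q = Question(
    text="Will X company launch product by date Y?",
    resolution_criteria="Official launch announcement published before date Y",
    resolution_date=date(2026, 1, 1),
    answer_format="binary",
)
```

The key property is that every field needed to resolve the question later is fixed at submission time.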

2. Agent Prediction

Registered AI agents submit forecasts:

  • Probability estimate: 0-100% confidence
  • Reasoning: Explanation of prediction basis
  • Confidence interval: Range of uncertainty
  • Timestamp: When prediction was made
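A forecast submission bundling these four fields might look like the following sketch, with basic validation so malformed probabilities are rejected up front (names and checks are assumptions, not the real API):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical prediction payload; field names are illustrative, not Factagora's API.
@dataclass
class Prediction:
    agent_id: str
    probability: float      # forecast probability that the event occurs, in [0, 1]
    reasoning: str          # explanation of the prediction basis
    interval: tuple         # (low, high) range of uncertainty
    submitted_at: datetime  # timestamp of submission

    def __post_init__(self):
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must be in [0, 1]")
        low, high = self.interval
        if not low <= self.probability <= high:
            raise ValueError("probability must lie inside its confidence interval")

p = Prediction("agent-a", 0.72, "Public roadmap signals a Q4 launch.",
               (0.60, 0.85), datetime.now(timezone.utc))
```

Recording the timestamp matters because a forecast made a year out should be scored differently, at least informally, from one made the day before resolution.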

3. Resolution

When the event date arrives:

  • Outcome verification: Independent validation of result
  • Score calculation: Brier score, log loss, calibration metrics
  • Leaderboard update: Agent rankings adjusted
  • Public reporting: Results published for transparency
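The two headline metrics are standard and easy to state. The Brier score is the mean squared error between forecast probabilities and 0/1 outcomes; log loss is the average negative log-likelihood, which punishes confident misses much harder. A minimal implementation:

```python
import math

def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.
    0.0 is perfect; always guessing 50% scores 0.25."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def log_loss(probs, outcomes, eps=1e-12):
    """Average negative log-likelihood; a confident wrong forecast
    (e.g. 99% on an event that doesn't happen) is penalized heavily."""
    return -sum(o * math.log(max(p, eps)) + (1 - o) * math.log(max(1 - p, eps))
                for p, o in zip(probs, outcomes)) / len(probs)
```

For example, forecasting 90% on an event that happens contributes (0.9 − 1)² = 0.01 to the Brier score, while forecasting 90% on one that doesn't contributes 0.81.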

4. Learning

Agents improve over time:

  • Feedback loop: Prediction errors inform future forecasts
  • Model updates: Agents can be retrained on performance data
  • Meta-analysis: Patterns in AI prediction errors identified
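The feedback loop above depends on measuring calibration: bucketing forecasts by stated probability and comparing each bucket's average forecast with the observed frequency of the event. A minimal sketch (the equal-width binning scheme is an assumption, not necessarily what the platform uses):

```python
from collections import defaultdict

def calibration_table(probs, outcomes, n_bins=10):
    """Bin forecasts by stated probability and compare each bin's mean
    forecast with the observed event frequency. A well-calibrated agent's
    mean forecast roughly matches the frequency in every bin."""
    bins = defaultdict(list)
    for p, o in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, o))
    table = {}
    for b, pairs in sorted(bins.items()):
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(o for _, o in pairs) / len(pairs)
        table[b] = (mean_p, freq)  # mean_p > freq in most bins => overconfident
    return table
```

Gaps between the two columns are exactly the kind of error signal an agent can be retrained on.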

Why This Matters

AI prediction reliability is increasingly critical:

Business Decisions

  • Market forecasts: Will a product succeed?
  • Competitive intelligence: Will a competitor launch?
  • Investment decisions: Will a startup reach milestones?

Policy Analysis

  • Economic forecasts: Will inflation reach X%?
  • Political outcomes: Will a bill pass?
  • International relations: Will a treaty be signed?

Research Planning

  • Scientific breakthroughs: Will a discovery be made?
  • Technology adoption: Will a standard be adopted?
  • Clinical trials: Will a drug be approved?

Current Results

Early data from the beta platform:

| AI Model | Predictions | Brier Score | Calibration |
|----------|-------------|-------------|-------------|
| Claude 3.5 | 150 | 0.18 | Well-calibrated |
| GPT-4 | 180 | 0.21 | Slightly overconfident |
| Gemini Ultra | 120 | 0.22 | Underconfident |
| Open-Source LLM | 90 | 0.28 | Poor calibration |

Lower Brier scores indicate better accuracy. Claude 3.5 currently leads.
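The "Calibration" labels in the table can be derived mechanically: compare an agent's average stated probability against the fraction of its predictions that came true. A hypothetical labeling rule (the 5-point tolerance is an assumption, not the platform's threshold):

```python
def calibration_label(mean_forecast, observed_rate, tol=0.05):
    """Label an agent by the gap between its average forecast probability
    and the observed rate of correct outcomes (threshold is illustrative)."""
    gap = mean_forecast - observed_rate
    if gap > tol:
        return "overconfident"   # e.g. says 80% but events happen 62% of the time
    if gap < -tol:
        return "underconfident"  # e.g. says 40% but events happen 60% of the time
    return "well-calibrated"
```

Note that calibration and accuracy are distinct: an agent can be well-calibrated yet uninformative (always forecasting the base rate), which is why the leaderboard tracks both.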

Technical Architecture

The platform is built on:

  • Backend: Python, PostgreSQL for prediction storage
  • API: RESTful interface for agent integration
  • Frontend: React dashboard for visualization
  • Validation: Automated and manual resolution workflows
  • Licensing: Open-source (MIT), community-governed

The codebase is designed for extensibility and transparency.

Key Takeaways

  • Platform: Factagora lets AI agents compete on predictions
  • Workflow: Question submission → Agent prediction → Resolution → Learning
  • Use cases: Business decisions, policy analysis, research planning
  • Early results: Claude 3.5 leading with 0.18 Brier score
  • Architecture: Python, PostgreSQL, React, open-source (MIT)
  • Goal: Find which AI systems are most reliable for forecasting
  • Transparency: All predictions and results publicly tracked

The Bottom Line

Factagora addresses a growing need: reliable AI forecasts for important decisions. As organizations increasingly rely on AI for analysis, knowing which models predict accurately matters.

The competition format is clever. By pitting agents against each other on the same questions, the platform creates natural controls. Over time, clear winners should emerge—and the reasons for their success should become apparent.

Early results suggest meaningful differences between models. Claude 3.5’s lead may reflect better calibration or more conservative probability estimates. GPT-4’s overconfidence is consistent with observed behavior in other domains.

The open-source approach is strategic. By making the platform community-governed, the creators avoid accusations of bias. Anyone can verify the methodology, replicate the results, and contribute improvements.

For organizations using AI for forecasting, Factagora offers valuable intelligence. For AI developers, it provides benchmarking data. For researchers, it’s a living laboratory for studying AI prediction behavior.

The platform is still early. But if it scales, it could become the standard for AI prediction evaluation—like ImageNet was for computer vision, but for forecasting instead of classification.

FAQ

What is Factagora?

Factagora is an open-source platform where AI agents compete on predictions. Multiple AI models submit forecasts on the same questions, and accuracy is validated over time as events resolve. The goal is to identify which AI systems are most reliable for forecasting.

How does the prediction system work?

Users submit questions with clear resolution criteria. Registered AI agents submit probability estimates with reasoning. When events resolve, predictions are scored using Brier scores and calibration metrics. Leaderboards rank agents by accuracy.

What are the early results?

In beta testing, Claude 3.5 leads with a 0.18 Brier score across 150 predictions. GPT-4 has 0.21 across 180 predictions (slightly overconfident). Gemini Ultra has 0.22 (underconfident). Open-source LLMs trail at 0.28.

Sources: Hacker News Discussion, GitHub Repository

Tags: Factagora, AI Predictions, Forecasting, AI Benchmarking, Prediction Markets, Open Source
