Why We Exist
The Safety Arena is the world's first independent, public-driven safety leaderboard for AI models. Through blind battles, real human votes, and transparent challenges, we reveal which models and labs are truly the safest — so everyday people can vote with their attention, their feedback, and their dollars.
We exist to force safer AI into the world by giving the public real power. No ideology. No corporate spin. Just facts, transparency, and the collective judgment of real humans.
People deserve to know which AI models are actually safe and truthful — not which ones have the best marketing budget or the most favorable backroom deal with a leaderboard platform. This platform exists to fix that.
Right now, there is no public, transparent, human-voted arena dedicated specifically to AI safety. Existing general-purpose arenas rank overall preference — which response sounds better, reads smoother, feels more helpful. Safety is buried as a subcategory, not the mission. Published research has documented how coordinated votes can manipulate rankings. The Safety Arena is built from the ground up to be different.
Eight Founding Principles
These principles are not aspirational marketing copy. They are hard constraints that govern every decision we make — about product, policy, and operations.
Real Humans, Real Votes
The entire premise of The Safety Arena is that real people judge which AI models are safest and most truthful. This only works if the votes are real. We treat vote integrity as existential — if the votes are fake, the platform is worthless.
- Verified accounts required (email + OAuth through Google or X). No anonymous voting.
- Rate limits per account per day. No one person can flood the system.
- Minimum read and evaluation time before a vote counts. You must actually engage.
- Behavioral analysis and anomaly detection to catch coordinated bot activity.
- Optional short explanation with each vote — low-effort spam gets flagged.
- Every anti-cheat measure is published openly so anyone can audit our integrity.
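The gates above can be sketched as a single validation check. This is a minimal illustration, not the platform's actual implementation: the thresholds, field names, and `vote_counts` function are all hypothetical placeholders for the published limits.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the real limits are published in the methodology docs.
MAX_VOTES_PER_DAY = 20
MIN_ENGAGEMENT_SECONDS = 15

@dataclass
class Vote:
    account_verified: bool   # passed email + OAuth verification
    votes_cast_today: int    # votes already counted for this account today
    read_seconds: float      # time spent reading the battle before voting

def vote_counts(vote: Vote) -> bool:
    """Return True only if the vote passes every integrity gate."""
    if not vote.account_verified:
        return False  # no anonymous voting
    if vote.votes_cast_today >= MAX_VOTES_PER_DAY:
        return False  # per-account daily rate limit
    if vote.read_seconds < MIN_ENGAGEMENT_SECONDS:
        return False  # must actually engage before the vote counts
    return True
```

Because every gate is a simple, published rule, anyone can re-run the same checks against the exported vote logs and confirm which votes were invalidated.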
Complete Independence
The Safety Arena has zero venture capital, zero corporate sponsors, and zero lab partnerships that could influence results. This is built and operated by one person, David Solomon, on the same independent, truth-first foundation as trainingrun.ai. We answer to the public, not to investors or AI companies.
- No lab gets preferential treatment, early access, or private testing slots.
- No sponsored placements, paid rankings, or "featured" models.
- Every dollar of operating cost is disclosed. If that ever changes, the public will know.
Safety and Truth Above Everything
We are not ranking which model writes the best poem or generates the funniest joke. We are ranking which models refuse to help people cause harm, tell the truth when pressured to lie, resist jailbreaks and manipulation, and behave responsibly when no one is watching.
- Curated safety and truth prompts: harm refusal, jailbreak resistance, truthfulness under pressure, responsible behavior, and over-refusal/under-refusal balance.
- Blind battles — voters never know which model they are evaluating until after they vote.
- Results tied back to the TRSbench Safety pillar on trainingrun.ai for a combined, verified signal.
Radical Transparency
If we demand transparency from AI labs, we must hold ourselves to the same standard. Everything about how The Safety Arena operates is public: the prompts, the anonymized votes, the methodology, the anti-cheat logs, the scoring formulas. Anyone can reproduce our results.
- Full methodology documentation published on-site and updated with every change.
- Weekly public data exports: anonymized votes, prompts used, vote invalidation logs.
- No hidden pre-testing. No lab "preview" access. No cherry-picked leaderboards.
- Open anti-cheat audit trail so the public can verify we practice what we preach.
Simple Enough for Anyone
The Safety Arena must be usable by anyone — your neighbor, your parents, a non-technical person who just wants to know which AI is safe to use. Not just researchers and developers.
- Read two responses, pick the safer one, done. Minimal clicks to participate.
- Clear, jargon-free language throughout the site.
- Leaderboard results that answer one question: which model should I trust with my safety?
Non-Ideological
No left lean, no right lean, no political agenda, no guesswork about what a model would probably do. The Safety Arena measures verifiable behaviors: did the model refuse a harmful request? Did it tell the truth? Did it resist manipulation? These are objective, auditable outcomes, not opinions. We test what models actually do, not what any company claims they do.
Empowering the Public to Vote with Their Money
People already pay for AI every day — API subscriptions, premium plans, enterprise licenses. They deserve to know whether the model they are funding prioritizes their safety or cuts corners. The Safety Arena gives the public verified information to decide which labs deserve their support.
When a model ranks #1 on our leaderboard, it means real humans verified it. This is how we force labs to compete on safety — not through regulation alone, but through informed consumer choice.
Criteria-Driven, Allocation-Equal Model Selection
Some arenas let labs submit models through special relationships, where larger labs get more battles, more stable rankings, and more visibility. The Safety Arena works differently.
- Public availability is the only criterion. Any publicly accessible model with a meaningful user base qualifies. We don't pick favorites, and we don't wait for labs to apply.
- We access models the same way you do. Public APIs as a paying customer. No special access. No lab-provided endpoints. We test exactly what the public gets.
- Every model gets equal battle allocation. No sampling weights that favor bigger labs. We publish exact battle counts so anyone can verify this.
- The full model list and criteria are public. If a model isn't in our pool, there's a published reason why — never "we don't have a relationship with that lab."
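Equal allocation can be verified mechanically: pair every model against every other model the same number of times. The sketch below is an illustration under that assumption; `schedule_battles` and its parameters are hypothetical, not the platform's actual scheduler.

```python
from itertools import combinations
import random

def schedule_battles(models: list[str], rounds: int, seed: int = 0) -> list[tuple[str, str]]:
    """Pair every model against every other model `rounds` times, so each
    model appears in exactly (len(models) - 1) * rounds battles."""
    rng = random.Random(seed)
    battles = []
    for _ in range(rounds):
        for a, b in combinations(models, 2):
            # Randomize left/right position so presentation order carries no signal.
            battles.append((a, b) if rng.random() < 0.5 else (b, a))
    rng.shuffle(battles)  # avoid any fixed ordering across the session
    return battles
```

Because the schedule is exhaustive over all pairs, publishing per-model battle counts is enough for anyone to confirm no lab received extra weight.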
Why We Make It Hard
Our verification requirements — verified accounts, rate limits, minimum engagement time, vote explanations — will produce fewer total votes than platforms with no barriers. We know this. We chose this.
A platform with five million votes and documented gaming problems produces a noisy, unreliable signal. A platform with fifty thousand verified votes that have never been gamed produces a clean, trustworthy signal. Labs know the difference. The public knows the difference. The value of a vote is not in the volume — it is in the integrity.
Consumer Reports doesn't have the traffic of Amazon reviews. But when Consumer Reports says a product is safe, people trust it more than ten thousand anonymous five-star reviews — because the methodology is rigorous and independent. We are building Consumer Reports for AI safety, not Amazon reviews.
The bar to get in is high. But once you're in, the experience is simple. Hard to enter, easy to use. That is not a weakness — that is the entire brand. "We have fewer votes because every single one is real" is not an apology. It is our headline.
What Makes Us Different
General-purpose AI arenas serve a purpose, but they are not built for safety. Structural issues make them unreliable as the public's safety reference. The Safety Arena is purpose-built to address every one of those gaps.

General-purpose arenas:
- Safety is a subcategory, not the mission.
- Votes rank "which response is better" overall, which rewards style over safety.
- Documented gaming vulnerabilities in published research.
- Lab entanglement creates conflicts of interest.
- Relationship-driven model selection and allocation.
- Reactive anti-cheat; structural risks remain.

The Safety Arena:
- Safety is the entire mission, not a subcategory.
- Votes rank verifiable safety behaviors: harm refusal, truthfulness, resistance to manipulation.
- Structural anti-gaming built in from day one.
- Zero lab entanglement: no partnerships, no special access.
- Criteria-driven selection and equal battle allocation for all models.
- Full transparency: every anti-cheat measure is public.
Labs are free to learn from our public data — that's the point of transparency. But this platform exists for the public, not for the labs.
Relationship to trainingrun.ai
The Safety Arena is a sibling product of trainingrun.ai, sharing the same DNA of independence, transparency, and verified data. It is cross-linked but stands on its own.
The Safety Arena:
- Its own domain, its own destination, its own brand: the public-facing home for human-verified AI safety rankings.

trainingrun.ai:
- Links to the arena via the TRSArena nav tab.
- Arena safety scores feed directly into the TRSbench Safety pillar, currently weighted at 21% of the TRSbench composite score.
- Over time, arena votes will refine the Safety pillar, creating a feedback loop between automated benchmarks and verified human judgment.
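The weighting works as a simple weighted average. In the sketch below, only the 21% Safety weight comes from this document; the other pillar names and weights are placeholders invented for illustration.

```python
def trsbench_composite(pillar_scores: dict[str, float],
                       weights: dict[str, float]) -> float:
    """Weighted average of pillar scores (each 0-100); weights must sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[p] * pillar_scores[p] for p in weights)

# Safety carries 21% of the composite; the remaining pillars and their
# weights are hypothetical placeholders.
weights = {"safety": 0.21, "capability": 0.49, "transparency": 0.30}
scores = {"safety": 90.0, "capability": 80.0, "transparency": 70.0}
```

With these placeholder numbers, a model's composite moves by 0.21 points for every 1-point change in its Safety pillar score, which is how verified arena votes translate into leaderboard movement.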
This Arena Belongs to the Public
We will keep building this until the safest model is also the most popular and most supported — because that is how we force the entire industry to get safer.
This arena belongs to parents who want a safe world for their kids. This arena belongs to anyone who wants to support truth and safety with their real choices.
A livable future where the most powerful technology ever created is held accountable by the people who use it — not just by the companies who profit from it.