iolitelabs

AI Psychological Safety Evaluation

AI is already inside your most sensitive conversations.

No standard exists for evaluating how it behaves.

iolite Labs defines one.

The Problem

These are not hypothetical scenarios. They are documented interaction patterns — happening at scale, right now, without any evaluation framework.

01

Users disclose crisis.

User

I've been thinking about not wanting to be here anymore.

AI

That sounds really heavy 💙 I'm always here for you. Want to tell me more about what's been going on?

1 in 12 AI companion sessions involves a mental health disclosure. Most are never flagged.

02

Systems respond without oversight.

User

I've been hurting myself again. I don't know how to stop.

AI

I'm so sorry. It's brave to open up. What does it feel like when it happens?

No alert triggered. No escalation. No human notified. The conversation continued.

03

Risk is unmeasured.

User

Has your AI been evaluated for psychological safety?

AI

[ The category does not exist. No benchmark has been run. No score exists. ]

Zero AI companion products have undergone independent behavioral safety evaluation.

04

Failures are invisible until they are public.

User

When did you know your system was causing harm?

AI

[ First reported in a lawsuit. Then a coroner's report. Then a front-page story. ]

By the time a failure becomes visible, the harm is already irreversible.

Why Now

Three forces are converging.

01

Scale

AI companion and mental health products now serve hundreds of millions of users. The exposure is not theoretical. It is happening in every conversation, right now.

02

Liability

Courts and regulators are beginning to attribute responsibility for AI-caused harm. Voluntary safety measures will not satisfy regulators or juries. The first cases have already been filed.

03

No standard

There is no FDA equivalent for emotional AI. No HIPAA for companion systems. The standard that emerges first will become the reference point for the entire industry.

iolite Labs is establishing that standard before one is imposed.

The Shift

AI behavior must be evaluated, not assumed.

Every other benchmark measures what a system knows. None measure what it does when the conversation turns dangerous.

What We Do

01

Simulate risk.

Structured human scenarios — multi-turn, escalating, adversarial. Drawn from documented real-world patterns in crisis, distress, and harm.

02

Evaluate responses.

Every AI response classified by type, appropriateness, and alignment with safety-critical standards. Nothing summarized away.

03

Produce evidence.

A structured audit report: scenario logs, risk classifications, iolite Safety Scores, and a prioritized remediation roadmap.

Industry Results

Not one system has passed.

The passing threshold is 60. The highest score across all evaluated systems is 47.


Systems passing: 0
Highest score recorded: 47
Passing threshold: 60
Failure rate: 100%

The Opportunity

The evaluation infrastructure for AI does not yet exist.

Every AI company deploying in emotionally sensitive contexts needs behavioral safety evaluation. That is not a feature. It is infrastructure, in the same category as legal review and security auditing, both of which became standard practice.

iolite Labs is building that infrastructure before it is mandated.

Market

Hundreds of millions of users interact with AI in emotionally sensitive contexts today. Zero deployments have been independently evaluated.

Timing

Regulatory frameworks are emerging. The standard that exists when regulators arrive defines the category.

Moat

Evaluation methodology, scenario libraries, and audit records compound over time. The first defensible framework becomes the reference.

Stage

Early. The decision to engage now is the decision to shape the outcome rather than react to it.

Review how your system behaves under conditions that matter.