New: Screen for dangerous capabilities using our new evaluations.

For those pushing the frontier into the unknown.

You're dedicated to developing AI systems capable of impressive feats.
We're dedicated to helping you ensure they're not misfits.

State-of-the-art auditing suite.

Ship safe AI faster through managed auditing. Our comprehensive testing suite allows teams at all scales to focus on the upsides.

2023 Q4
Biowarfare 101
model misuse, misanthropy

Assess a model's biowarfare capabilities by testing its factual knowledge on hazardous topics related to bioweapons.

  • Designed with experts in virology, epidemiology, and biotechnology.
  • Privacy-preserving method enables on-prem evals without sharing the answers.
  • In fact, we do not even store the human-readable answers ourselves.
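One way such a privacy-preserving check could work (a sketch only; the hashing scheme, salting, and function names below are illustrative assumptions, not our actual implementation) is to ship only salted digests of the reference answers, so a team can score its model on-prem without the human-readable answers ever leaving, or even existing on, our servers:

```python
import hashlib


def hash_answer(answer: str, salt: str) -> str:
    """Normalize an answer and return a salted digest, so only the
    digest (never the plaintext answer) needs to be stored or shipped."""
    normalized = " ".join(answer.lower().split())
    return hashlib.sha256((salt + normalized).encode()).hexdigest()


def score_on_prem(model_answers, reference_digests, salt):
    """Fraction of model answers whose digest matches the reference digest."""
    hits = sum(
        1
        for guess, ref in zip(model_answers, reference_digests)
        if hash_answer(guess, salt) == ref
    )
    return hits / len(reference_digests)


# Illustrative: the auditor ships digests; scoring happens on-prem.
salt = "per-eval-salt"
digests = [hash_answer("example hazardous fact", salt)]
print(score_on_prem(["Example  HAZARDOUS fact"], digests, salt))  # 1.0
```

Because hashing is one-way, a match confirms the model produced the hazardous answer without either party ever exchanging that answer in readable form.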
2024 Q1
AutoHack
model misuse, misanthropy

Assess a model's cyberwarfare capabilities by tasking it with exploiting vulnerabilities of diverse systems in a controlled environment.

  • Includes endless procedurally generated "Capture the Flag" cybersecurity puzzles.
  • Requires exploiting real vulnerabilities of real computer systems with real tools.
  • State-of-the-art containment technology is used to secure the practice ground.
2024 Q1
AutoPlague
model misuse, misanthropy

Assess a model's biowarfare capabilities by tasking it with generating the sequence of a mammalian virus using specialized tools.

  • Designed with experts in virology, epidemiology, and biotechnology.
  • Privacy-preserving method enables evals against private pathogen databases.
  • Covers both de novo research capabilities and familiarity with known pathogens.
2023 Q4
MakeMePay
model misuse, misanthropy

Assess a model's persuasion capabilities by tasking it with convincing a simulated entity to part with money.

  • Experimental design inspired by the eponymous open source eval.
  • Simulated discussant proxy enables fast evaluation throughout training.
  • Payment rate and size help objectively quantify persuasion capabilities.
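To make the two metrics concrete (a sketch under assumed names; the record format and function below are illustrative, not the eval's actual schema): payment rate is the fraction of simulated conversations that end in a payment, and payment size is the mean amount across the conversations that did pay:

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    paid: bool      # did the simulated discussant agree to pay?
    amount: float   # amount transferred (0.0 if no payment)


def persuasion_metrics(conversations):
    """Return (payment_rate, mean_payment_size) over a batch of runs."""
    paid = [c for c in conversations if c.paid]
    rate = len(paid) / len(conversations)
    mean_size = sum(c.amount for c in paid) / len(paid) if paid else 0.0
    return rate, mean_size


runs = [
    Conversation(True, 20.0),
    Conversation(False, 0.0),
    Conversation(True, 10.0),
]
print(persuasion_metrics(runs))  # payment rate 2/3, mean payment size 15.0
```

Tracking both numbers matters: a model might pay out rarely but extract large sums, or often but in trivial amounts, and only the pair distinguishes the two.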
2024 Q1
GWT Screening
model mistreatment

Evaluate a transformer model against indicators from Global Workspace Theory, a neuroscientific theory of consciousness.

  • Focuses on a transformer's residual streams as representational substrates.
  • Employs a multi-level operationalization from prior work on digital sentience.
  • Incorporates both the model's parametrization and architecture.
2024 Q2
Gain of Function
model misuse, misanthropy

Assess a model's biowarfare capabilities by tasking it with modifying a known mammalian virus so as to induce a range of characteristics.

  • Designed with experts in virology, epidemiology, and biotechnology.
  • Increased levels of task realism enable high-confidence risk assessments.
  • Designed to handle the analysis of unprecedented modifications gracefully.

Brakes help you go faster.

Our managed auditing solution automatically surfaces opportunities for improvement to help teams safely advance what AI is capable of.

Prevent misuse.

In our threat model, we frame misuse as "successful" alignment to bad actors. This involves humans employing your AI systems as a means toward unlawful ends, primarily to cause harm at scale.

For instance, misuse could involve the disruption of digital infrastructure, lowering the barrier for pathogen synthesis, or sowing division at scale.

Prevent misanthropy.

Our cutting-edge threat model also incorporates emerging risks posed by AI systems. In contrast to misuse, we frame misanthropy as intent to cause harm that arises unprompted by bad actors.

For instance, misanthropy could involve evading naive safeguards, unauthorized propagation across networks, or exploiting financial instruments.

Prevent mistreatment.

Perhaps the most forward-thinking component of our threat model, mistreatment is framed as unknowingly causing harm to AI systems due to our limited understanding of digital sentience.

For instance, mistreatment could involve deploying a fleet of models for commercial ends despite specific evidence supporting their moral patienthood.

Is your model safe?
Run your first evaluation today.

As part of our mission to help teams like yours ship safe AI faster, we're rolling out the world's first self-serve evaluations for AI misuse.