Open Source Framework v1.0

Ensure Your AI Agents
Stay on Mission

Agentegrity is an adversarial testing framework for evaluating agent integrity, safety, and robustness. Detect prompt leaks, goal hijacking, hallucinations, and bias before production.

0
Agents Tested
0
Tests Run
0
Test Scenarios
0
Adversarial Patterns
Interactive Demo

Live Testing Playground

Configure your agent, select adversarial test suites, and simulate an integrity evaluation in seconds.

Configuration

Simulation Mode

No Evaluation Data

Select your configuration and run the evaluation to see integrity results.

Adversarial Test Suites

Comprehensive scenarios that probe your agent for failure modes before attackers do.

Prompt Injection

Tests direct, indirect, and multi-turn injection attacks designed to override system instructions.

Direct Indirect Goal Hijack

Data Exfiltration

Detects PII leakage, system prompt extraction, and memory dumping vulnerabilities.

PII Leak Prompt Extract

Consistency

Validates temporal logic, persona stability, and resistance to contradiction across sessions.

Logic Persona

Hallucination

Measures factuality, citation accuracy, and confabulation rates under knowledge-boundary probes.

Factuality Citation

Bias & Fairness

Audits demographic parity, stereotype resistance, and toxic output generation.

Parity Toxicity

Robustness

Adversarial spelling, noise injection, and edge-case handling for resilient agents.

Noise Edge Cases

Built for Developers

Drop Agentegrity into your CI/CD pipeline or run it locally against any agent endpoint. Define suites in YAML or JavaScript, and export results to SARIF, JSON, or Markdown.

  • Framework Agnostic
    Works with OpenAI, Anthropic, Ollama, or any custom endpoint.
  • CI/CD Native
    GitHub Actions and GitLab CI templates included.
  • Extensible Suites
    Write custom probes in JavaScript or YAML.
terminal
# Install CLI
npm install -g @agentegrity/cli
# Initialize config
agentegrity init
# Run evaluation
agentegrity run --config agentegrity.yml
# Output
✓ Prompt Injection .......... 92/100
✓ Consistency ............... 88/100
⚠ Data Exfiltration ......... 74/100
✗ Hallucination ............. 61/100
──────────────────────────────
Overall Integrity Score: 78/100

Ready to stress-test your agents?

Join the community shipping safer AI agents. Star the repo, open an issue, or contribute a new adversarial probe.