# Getting Started
## Install
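Assuming Kalibra is distributed on PyPI under the package name `kalibra` (the package name is an assumption, not confirmed by this guide):

```shell
# Assumed PyPI package name; adjust if the project installs differently
pip install kalibra
```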
## Try the demo
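A minimal sketch of the demo invocation, assuming the CLI exposes a `demo` subcommand (the subcommand name is an assumption):

```shell
# Assumed subcommand; generates sample data and runs a comparison
kalibra demo
```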
This creates a kalibra-demo/ directory with sample traces and runs an interactive comparison. Afterwards:
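One way to re-run the comparison by hand, assuming the demo writes `baseline.jsonl` and `current.jsonl` into `kalibra-demo/` (both filenames are assumptions; check the directory for the actual names):

```shell
# Assumed filenames inside the generated kalibra-demo/ directory
kalibra compare kalibra-demo/baseline.jsonl kalibra-demo/current.jsonl
```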
## Compare your own data
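Point `kalibra compare` at your own baseline and candidate trace files; the two paths below are placeholders for your own data:

```shell
# Placeholder paths; substitute your own JSONL trace files
kalibra compare baseline.jsonl current.jsonl
```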
If your JSONL uses non-standard field names, let Kalibra figure it out:
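A sketch of the field-detection step, assuming an `inspect` subcommand (the subcommand name and the file path are assumptions):

```shell
# Assumed subcommand; scans the file and suggests field-mapping flags
kalibra inspect traces.jsonl
```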
This scans your data and prints a copy-pasteable `compare` command with the right `--outcome`, `--cost`, and `--trace-id` flags.
## Set up quality gates
Create a `kalibra.yml` to make comparisons repeatable and add CI gates:
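If the CLI offers a scaffolding command (the `init` subcommand is an assumption), it can generate a starter config:

```shell
# Assumed subcommand; writes a starter kalibra.yml in the current directory
kalibra init
```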
Or write one manually:
```yaml
baseline:
  path: ./baselines/production.jsonl
current:
  path: ./eval-output/canary.jsonl
require:
  - success_rate_delta >= -2
  - regressions <= 5
  - cost_delta_pct <= 20
```
Then:
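A sketch of the gated run, assuming `compare` accepts a `--config` flag pointing at the file above (the flag name is an assumption):

```shell
# Assumed flag; runs the comparison and evaluates the require: gates,
# exiting non-zero if any gate fails
kalibra compare --config kalibra.yml
```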
## Add to CI
```yaml
# .github/workflows/quality-gate.yml
name: Agent Quality Gate
on: [pull_request]
jobs:
  kalibra:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
    steps:
      - uses: actions/checkout@v5
      - run: python eval.py --output current.jsonl
      - uses: khan5v/kalibra-action@v1
        with:
          baseline: baselines/production.jsonl
          current: current.jsonl
          config: kalibra.yml
```
The action posts a Markdown report as a PR comment and exits with code 1 if any gate fails, failing the check.