Evaluation and Control: Evaluation Framework, zepctl CLI, and Dashboard Overhaul
An evaluation framework for testing Zep against your data, a redesigned dashboard with analytics, and zepctl — a CLI for administering Zep projects.
Last week we covered custom extraction instructions, graph search upgrades, and webhooks. This week: a framework for testing Zep against your own data, a dashboard overhaul, and a new CLI for Zep projects.
Evaluation Framework
How do you know Zep's context retrieval works for your data, in your domain? We built an evaluation framework to answer that question with measurements. Write 3–5 example interactions, generate test cases from them, and run the automated evaluation. The pipeline measures two things:
- Context completeness — Did Zep retrieve everything needed to answer the question?
- Answer accuracy — Given the retrieved context, did the LLM produce the correct answer?
Search parameters are configurable so you can tune retrieval for your use case. Run the evaluation in CI to catch regressions as your system evolves. Evaluation framework docs →
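To make the two measurements and the CI idea concrete, here is a minimal Python sketch of how per-test-case results could be rolled up into scores and checked against minimum thresholds. It does not use the evaluation framework's actual API or output format: `CaseResult`, `summarize`, `failing_metrics`, and the threshold values are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Outcome of one generated test case (hypothetical shape, not Zep's format)."""
    question: str
    context_complete: bool  # did retrieval return everything needed to answer?
    answer_correct: bool    # given that context, was the LLM's answer correct?


def summarize(results: list[CaseResult]) -> dict[str, float]:
    """Return the fraction of test cases that pass each of the two checks."""
    if not results:
        raise ValueError("no test cases to summarize")
    total = len(results)
    return {
        "context_completeness": sum(r.context_complete for r in results) / total,
        "answer_accuracy": sum(r.answer_correct for r in results) / total,
    }


def failing_metrics(scores: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """List every metric whose score is below its minimum (assumed thresholds)."""
    return [
        name
        for name, minimum in thresholds.items()
        if scores.get(name, 0.0) < minimum
    ]
```

In a CI job, a script built on this sketch would exit with a non-zero status whenever `failing_metrics` returns a non-empty list, so a drop in either score after a change to search parameters fails the build instead of going unnoticed.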
zepctl CLI
zepctl is a command-line interface for administering Zep projects. Install with brew install zepctl. Key capabilities:
- Graph management — Clone graphs between environments, add data, search with filters
- Batch operations — Import data from files or stdin, monitor async tasks to completion
- RTBF compliance — Delete a user and all associated data in one command
- Debugging — Inspect nodes, edges, and episodes. Export as JSON or YAML
- Advanced search — Property filters, date filters, exclusion filters, and multiple rerankers from the command line
Dashboard Overhaul
The Zep dashboard got a significant upgrade focused on visibility and day-to-day efficiency:
- Account analytics — Episode processing counts with hourly and 15-minute granularity, real-time status
- Bulk operations — Multi-select and bulk delete for users, graphs, threads, and episodes
- Column sorting — Server-side sorting on users, graphs, and threads tables
- Episode filtering — Filter by source type (text/json/message) and processing status
- User search — Find users across your project
- Graph visualization — Collapsible Entity Types legend with click-to-highlight
Other Updates
- Granular RBAC (Enterprise) — Permissions now support both account-wide and project-level scopes
- Audit Logging (Enterprise) — Track team actions with filtering by time, actor, action, and resource. 30 days in dashboard, 1 year retained
- Enterprise API Logging — API request logging for compliance and debugging
- ElevenLabs Voice Agent Guide — Building voice agents with persistent context using Zep + ElevenLabs
- NVIDIA NeMo Agent Toolkit — Integration with NVIDIA's enterprise agent toolkit for context-aware agents
- Stripe Self-Service Billing — Manage billing details and tax IDs directly through the dashboard
- Graph Visualization 2.0 — Faster graph explorer with nodes sized by connection count and no cap on rendered nodes
Get Started
All of these features are live. If you missed Part 1 — custom extraction instructions, graph search upgrades, and webhooks — read it here.