KDD Workshop on Evaluation and
Trustworthiness of Agentic AI

KDD 2026

August 9-13, 2026, International Convention Center Jeju (ICC Jeju), Jeju, Korea. Held in conjunction with KDD '26.


Welcome to the KDD Workshop on Evaluation and Trustworthiness of Agentic AI 2026!

This workshop advances evaluation and trustworthiness methodologies for agentic AI systems across their full deployment lifecycle, with particular emphasis on real-time post-market monitoring, model evolution, and production governance. As autonomous agents increasingly perform multi-step reasoning, planning, and action in open-ended real-world settings, traditional pre-deployment benchmarks and static evaluation frameworks prove insufficient.

We address core challenges including stochastic agent behavior, the absence of ground truth, evolving user contexts, API-driven model updates, and the lack of standardized metrics and audit practices. This workshop aims to foster interdisciplinary collaboration by bringing together researchers, industry practitioners, and policymakers to develop advanced evaluation techniques and governance frameworks for agentic AI systems that can be safely and reliably deployed in production.

Contact: kdd-ws-agentic-eval@amazon.com

Call for Contributions

This workshop focuses on the unique challenges of evaluating and ensuring the trustworthiness of agentic AI systems throughout their deployment lifecycle. As large language models and autonomous agents are increasingly deployed in real-world, open-ended settings, we need new methods and frameworks that go beyond traditional pre-deployment benchmarks. Topics of interest include (but are not limited to):

  • Real-Time Post-Market Monitoring: Continuous evaluation of deployed agentic systems, including drift detection, anomaly identification, performance degradation tracking, and monitoring under evolving user populations and contexts.
  • Agentic AI Evaluation: Assessing autonomy, multi-step reasoning, planning and tool use, goal alignment, adaptability, emergent failure modes, and multi-agent orchestration in dynamic environments.
  • Model Evolution and API Risk: Evaluation methods for detecting regressions, capability shifts, and safety risks introduced by model updates, version changes, and upstream dependency modifications.
  • Trustworthiness and Safety: Evaluation of reliability, bias and fairness, privacy, misuse resistance, robustness to distribution shift, explainability of agent actions, and safety guarantees.
  • Benchmarking, Metrics, and Standardization: Agent-centric benchmarks, LLM-as-judge methods, standardized metrics and logging protocols, evaluation frameworks for compound AI systems, and best practices for production monitoring.
  • Lifecycle and Governance Frameworks: End-to-end evaluation spanning pre-training, fine-tuning, deployment, and post-market phases, including auditability, liability attribution, regulatory compliance, and alignment with emerging AI governance standards.
  • User-Centric and Cross-Modal Assessment: Human-centered evaluation, trust calibration, human-in-the-loop systems, and assessment of agent behavior across text, image, audio, video, and multimodal inputs.
  • Industrial and Public-Sector Applications: Case studies of real-world deployments, enterprise-scale monitoring systems, sector-specific requirements (healthcare, finance, customer service), and scalable evaluation infrastructure for agentic AI.

Keynote Speakers

TBD


Schedule

TBD


Accepted Papers

TBD

Submission Guidelines

  • Please ensure your paper submission is anonymous.
  • Accepted papers will be posted on the workshop website but will not be included in the KDD proceedings.
  • Paper submissions are limited to 9 pages, excluding references; they must be in PDF format and use the ACM Conference Proceedings template.
  • Additional supplemental material focused on reproducibility may be provided. Proofs, pseudo-code, and code may also be included in the supplement, which has no explicit page limit and may be formatted in either single or double column. The paper itself should be self-contained, since reviewers are not required to read the supplement.
  • The Word template guidelines can be found here: link
  • The LaTeX/Overleaf template guidelines can be found here: link
  • Papers should be submitted in PDF format through OpenReview at the following link: OpenReview Submission Portal.

Organizers



Sadid Hasan
Microsoft

George Karypis
Amazon & UMN