August 9-13, 2026, Jeju, Korea. Held in conjunction with KDD'26, International Convention Center Jeju (ICC Jeju)
This workshop advances evaluation and trustworthiness methodologies for agentic AI systems across their full deployment lifecycle, with particular emphasis on real-time post-market monitoring, model evolution, and production governance. As autonomous agents increasingly perform multi-step reasoning, planning, and action in open-ended real-world settings, traditional pre-deployment benchmarks and static evaluation frameworks prove insufficient.
We address core challenges including stochastic agent behavior, the absence of ground truth, evolving user contexts, API-driven model updates, and the lack of standardized metrics and audit practices. By bringing together researchers, industry practitioners, and policymakers, this workshop aims to foster interdisciplinary collaboration on advanced evaluation techniques and governance frameworks for agentic AI systems that can be safely and reliably deployed in production.
Contact: kdd-ws-agentic-eval@amazon.com

This workshop focuses on the unique challenges of evaluating and ensuring the trustworthiness of agentic AI systems throughout their deployment lifecycle. As large language models and autonomous agents are increasingly deployed in real-world, open-ended settings, we need new methods and frameworks that go beyond traditional pre-deployment benchmarks. Topics of interest include (but are not limited to):
TBD
TBD
TBD