Aug 03, 2025 (TBD). Toronto, ON, Canada. Held in conjunction with KDD'25
The rapid advancement of generative and agentic AI models has enabled transformative applications across diverse domains, from creative content generation to autonomous decision-making systems. However, as these models grow more capable, their widespread deployment raises pressing concerns about evaluation, reliability, and ethical implications. Current evaluation methods fall short of capturing the complexities and risks of generative outputs and autonomous AI behaviors.
To bridge this gap, robust evaluation frameworks are needed to assess these models holistically, ensuring they are not only performant but also aligned with societal values and safety expectations. Existing benchmarks focus primarily on standard performance metrics, overlooking critical aspects such as trustworthiness, interpretability, and real-world usability. Without rigorous evaluation methodologies, generative and agentic AI systems may inadvertently perpetuate biases, propagate misinformation, or act unpredictably in high-stakes scenarios.
This workshop aims to foster interdisciplinary collaboration by bringing together researchers, industry practitioners, and policymakers to develop advanced evaluation techniques for generative and agentic AI. Our discussions will focus on new methodologies for assessing reasoning capabilities, ethical robustness, and cross-modal generation, along with scalable and user-centric evaluation frameworks. By addressing these challenges, we seek to pave the way for more reliable and responsible AI systems that can be safely integrated into society.
Contact: kdd2025-ws-agentic-genai-eval@amazon.com
This workshop will focus on the unique challenges and opportunities at the intersection of evaluation and trustworthiness in generative and agentic AI. Generative AI models, such as large language models (LLMs) and diffusion models, have shown remarkable abilities in producing human-quality text, images, and other content. Agentic AI refers to AI systems that act autonomously and purposefully, raising critical concerns about safety, control, and alignment with human values. Evaluating these advanced AI models requires going beyond traditional metrics and benchmarks: we need new methods and frameworks to assess their performance, identify potential biases, and ensure they are used responsibly. This workshop will delve into these issues, with a particular focus on the following topics:
Speakers to be announced
Our keynote speakers for the Agentic & GenAI Evaluation KDD 2025 workshop will be announced closer to the event date.
Check back for updates on distinguished speakers from academia and industry, who will share their insights on the evaluation and trustworthiness of agentic and generative AI models.
August 2025 (2:00–6:00 PM), [Location TBA]
Introduction by organizers
Distinguished Speaker (TBA)
Selected paper presentations (3 papers, 15 minutes each)
Poster presentations and networking opportunity
Distinguished Speaker (TBA)
Selected paper presentations (3 papers, 15 minutes each)
Interactive panel with keynote speakers and selected experts
Closing remarks by organizers
Papers will be announced after the submission deadline