Agentic & GenAI Evaluation KDD 2025

KDD workshop on Evaluation and Trustworthiness of Agentic and Generative AI Models

Aug 03, 2025 (TBD). Toronto, ON, Canada. Held in conjunction with KDD'25.


Welcome to Agentic & GenAI Evaluation KDD 2025!

The rapid advancement of generative and agentic AI models has ushered in transformative applications across diverse domains, ranging from creative content generation to autonomous decision-making systems. However, as these models become more capable, their widespread deployment raises pressing concerns about evaluation, reliability, and ethical implications. Current evaluation methods are insufficient for capturing the complexities and risks of generative outputs and autonomous AI behaviors.

To bridge this gap, robust evaluation frameworks are needed to assess these models holistically, ensuring they are not only performant but also aligned with societal values and safety expectations. Current benchmarks primarily focus on standard performance metrics, overlooking critical aspects such as trustworthiness, interpretability, and real-world usability. Without rigorous evaluation methodologies, generative and agentic AI systems may inadvertently perpetuate biases, propagate misinformation, or act unpredictably in high-stakes scenarios.

This workshop aims to foster interdisciplinary collaboration by bringing together researchers, industry practitioners, and policymakers to develop advanced evaluation techniques for generative and agentic AI. Our discussions will focus on new methodologies for assessing reasoning capabilities, ethical robustness, and cross-modal generation, along with scalable and user-centric evaluation frameworks. By addressing these challenges, we seek to pave the way for more reliable and responsible AI systems that can be safely integrated into society.

Contact: kdd2025-ws-agentic-genai-eval@amazon.com

Call for Contributions

  • Link to the submission website: https://openreview.net/group?id=KDD.org/2025/Workshop/Agentic-GenAI-Eval
  • This workshop will focus on the unique challenges and opportunities presented by the intersection of evaluation and trustworthiness in the context of generative and agentic AI. Generative AI models, such as large language models (LLMs) and diffusion models, have shown remarkable abilities in generating human-quality text, images, and other forms of content. Agentic AI refers to AI systems that can act autonomously and purposefully, raising critical concerns about safety, control, and alignment with human values. Evaluating these advanced AI models requires going beyond traditional metrics and benchmarks. We need new methods and frameworks to assess their performance, identify potential biases, and ensure they are used responsibly. This workshop will delve into these issues, with a particular focus on the following topics:

    • Agentic AI Evaluation: Assessing autonomous AI behavior, decision-making, goal alignment, adaptability, security, privacy, tool use, memory, and self-verification in dynamic and open-ended environments.
    • Trustworthiness in Generative AI Models: Truthfulness and reliability, safety and security, bias and fairness, ethical and privacy considerations, misuse resistance, explainability, and robustness.
    • User-Centric Assessment: Evaluating AI from a user experience perspective, including trust calibration, mental models, and usability.
    • Multi-Perspective Evaluation: Emphasizing logical reasoning, knowledge depth, problem-solving abilities, contextual understanding, and user alignment.
    • Evaluating Reasoning Models: Measuring AI's ability to conduct step-by-step reasoning, causal inference, and complex problem-solving across various domains.
    • Efficient Evaluation Methods: Scalable, automated, and cost-effective approaches for assessing generative AI performance with minimal manual oversight.
    • Synthetic Data Generation Evaluation: Assessing the quality, representativeness, and bias implications of synthetic data used for AI training and evaluation.
    • Evaluating Misinformation and Manipulative Content: Techniques for detecting, measuring, and mitigating misinformation propagation by generative models.
    • Cross-Modal Evaluation: Assessing AI's capabilities across text, image, audio, and multimodal generation.
    • Holistic Evaluation Frameworks: Developing standardized datasets, metrics, and methodologies for comprehensive AI assessment.

    Submission Guidelines

    • Paper submissions are limited to 9 pages, excluding references. Submissions must be in PDF format and use the ACM Conference Proceedings template (two-column format).
    • Additional supplemental material focused on reproducibility may be provided. Proofs, pseudo-code, and code may also be included in the supplement, which has no explicit page limit. The supplement may be in either single-column or double-column format. The paper should be self-contained, since reviewers are not required to read the supplement.
    • The Word template guideline can be found here: [link]
    • The LaTeX/Overleaf template guideline can be found here: [link]
    • Submissions will be judged on quality and relevance through single-blind review.
    • A paper should be submitted in PDF format through OpenReview at the following link: https://openreview.net/group?id=KDD.org/2025/Workshop/Agentic-GenAI-Eval

    Keynote Speakers

    Speakers to be announced



    Our keynote speakers for the Agentic & GenAI Evaluation KDD 2025 workshop will be announced closer to the event date.

    Check back for updates on distinguished speakers from academia and industry who will share their insights on the evaluation and trustworthiness of agentic and generative AI models.