Diverse And Private Synthetic Datasets Generation for RAG evaluation: A multi-agent framework
This paper introduces a novel multi-agent framework for generating diverse and privacy-preserving synthetic QA datasets to evaluate Retrieval-Augmented Generation (RAG) systems. It addresses the critical need for high-quality evaluation datasets that capture real-world constraints such as sensitive-information protection and topical coverage. The framework aims to provide a practical and ethically aligned pathway toward more comprehensive RAG system evaluation.
Article Points:
1. RAG evaluation requires diverse, privacy-preserving synthetic QA datasets.
2. Existing RAG benchmarks often lack real-world complexity and topical coverage.
3. The proposed multi-agent framework generates diverse, privacy-aware QA datasets.
4. A diversity agent uses clustering for broad topical coverage and semantic variability.
5. A privacy agent detects and masks sensitive PII across various domains.
6. The framework outperforms baselines in diversity and ensures robust privacy masking.
Problem
- RAG evaluation challenges
- Benchmarks lack diversity & real-world complexity
- Retrieval systems face privacy issues

Proposed Solution
- Multi-agent framework for synthetic QA generation
- Prioritizes semantic diversity
- Ensures privacy preservation
- Offers an ethically aligned pathway

Multi-Agent Framework
- Diversity Agent: clustering for broad topical coverage
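The summary does not spell out the clustering step, but the idea can be sketched in plain NumPy: cluster document-chunk embeddings and pick one representative chunk per cluster, so generated questions span all topics rather than bunching in one. A minimal sketch under that assumption; `kmeans` and `pick_representatives` are hypothetical helper names, not the paper's API:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain NumPy k-means; returns a cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each embedding to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster empties.
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers

def pick_representatives(X, k):
    """One chunk per cluster -> broad topical coverage for QA generation."""
    labels, centers = kmeans(X, k)
    reps = []
    for c in range(k):
        idx = np.where(labels == c)[0]
        # Representative = cluster member closest to the cluster center.
        best = idx[np.linalg.norm(X[idx] - centers[c], axis=1).argmin()]
        reps.append(int(best))
    return reps

# Toy embeddings: two well-separated topic clusters.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
reps = pick_representatives(X, k=2)  # one index from each topic cluster
```

In practice the embeddings would come from a sentence-encoder over the corpus; the selection logic is unchanged.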

- Privacy Agent: detects & masks sensitive PII across domains
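The masking step can be illustrated with a minimal regex-based sketch; the actual agent presumably uses model-based PII detection (in the style of the AI4Privacy entity labels), but the replace-span-with-typed-placeholder mechanics are the same. `PII_PATTERNS` and `mask_pii` are illustrative names, not the paper's:

```python
import re

# Minimal regex patterns for illustration; a production privacy agent
# would use an NER model, but the masking step looks the same.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def mask_pii(text: str) -> str:
    """Replace each detected PII span with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

masked = mask_pii("Contact Jane at jane.doe@example.com or 555-123-4567.")
# masked == "Contact Jane at [EMAIL] or [PHONE]."
```

Typed placeholders (rather than plain redaction) let the downstream QA curation agent still generate coherent questions over the masked text.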

- QA Curation Agent: synthesizes privacy-preserving QA pairs
- LangGraph orchestrates the agents

Evaluation
- Diversity assessment: LLM-as-a-Judge & cosine similarity
- Privacy assessment: AI4Privacy datasets for entity masking
- Outperforms baseline methods
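The cosine-similarity side of the diversity assessment can be sketched as mean pairwise similarity over question embeddings, where lower values indicate a more diverse set. This is a plausible form of such a metric, not necessarily the paper's exact formula:

```python
import numpy as np

def mean_pairwise_cosine(E: np.ndarray) -> float:
    """Mean pairwise cosine similarity; lower means a more diverse set."""
    # L2-normalize rows so dot products become cosine similarities.
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    S = E @ E.T
    n = len(E)
    # Average over the off-diagonal (i != j) pairs only.
    return float((S.sum() - n) / (n * (n - 1)))

# Near-duplicate question embeddings score high; spread-out ones score low.
dupes = np.array([[1.0, 0.0], [0.99, 0.01]])
spread = np.array([[1.0, 0.0], [0.0, 1.0]])
```

A score near 1 flags a dataset of near-duplicate questions; a score near 0 indicates the broad semantic variability the diversity agent targets.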

Future Work
- Enhance agent autonomy & collaboration
- Adaptive PII identification
- Rigorous privacy-attack evaluation
- Align with evolving AI regulations