Cards

Developer-Gathered, AI-Crafted, Human-Checked.

ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization

Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Deep Think with Confidence

AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Large Concept Models: Language Modeling in a Sentence Representation Space

Google’s Approach for Secure AI Agents: An Introduction

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

What makes Claude Code so damn good (and how to recreate that magic in your agent)!?

Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

LeanRAG: Knowledge-Graph-Based Generation with Semantic Aggregation and Hierarchical Retrieval

Dummy's Guide to Modern LLM Sampling Intro Knowledge

DOM-based Extension Clickjacking: Your Password Manager Data at Risk

The Future of AI: Exploring the Potential of Large Concept Models

Enhancing Retrieval-Augmented Generation: A Study of Best Practices

CodeACT: Code Adaptive Compute-efficient Tuning Framework for Code LLMs

Titans: Learning to Memorize at Test Time

Don’t Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks

Tree-of-Code: A Hybrid Approach for Robust Complex Task Planning and Execution

Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning

RAG+: Enhancing Retrieval-Augmented Generation with Application-Aware Reasoning

MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models

Learning without training: The implicit dynamics of in-context learning

Retrieval-Augmented Reasoning with Lean Language Models

MCP vs CLI: Benchmarking Tools for Coding Agents