Cards

Developer-Gathered, AI-Crafted, Human-Checked.

Why Language Models Hallucinate

Extract-0: A Specialized Language Model for Document Information Extraction

CommonForms: A Large, Diverse Dataset for Form Field Detection

How OpenAI uses Codex

TEMPO: Prompt-Based Generative Pre-Trained Transformer for Time Series Forecasting

LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers

Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models

Where AI is failing design systems, and where we are failing AI

On the Theoretical Limitations of Embedding-Based Retrieval

Key-value memory in the brain

Learning Facts at Scale with Active Reading

Building your own CLI Coding Agent with Pydantic-AI

Diverse And Private Synthetic Datasets Generation for RAG evaluation: A multi-agent framework

Anemoi: A Semi-Centralized Multi-agent System Based on Agent-to-Agent Communication MCP server from Coral Protocol

Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward

Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving

ASTRA: Autonomous Spatial-Temporal Red-teaming for AI Software Assistants

Persona Vectors: Monitoring and Controlling Character Traits in Language Models

ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization

Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Deep Think with Confidence

AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Large Concept Models: Language Modeling in a Sentence Representation Space

Google’s Approach for Secure AI Agents: An Introduction