LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers
LLM-FE is a novel framework for automated feature engineering on tabular data that combines the domain knowledge and reasoning of Large Language Models (LLMs) with evolutionary search. It frames feature engineering as a program search problem in which LLMs iteratively propose and refine feature transformations, guided by data-driven feedback. LLM-FE consistently outperforms state-of-the-art baselines and significantly improves the performance of tabular prediction models. ✨
Article Points:
1. LLM-FE: LLMs + evolutionary search for automated tabular feature engineering.
2. Formulates FE as program search; LLMs propose, data feedback refines transformations.
3. Leverages LLM domain knowledge and iterative data-driven feedback for feature discovery.
4. Consistently outperforms state-of-the-art baselines in tabular prediction tasks.
5. Enhances performance across diverse models: XGBoost, MLP, TabPFN.
6. Domain knowledge, evolutionary search, and feedback are crucial for LLM-FE's impact.
Problem

Traditional FE: limited search space, no domain knowledge

Prior LLM-based FE: direct prompting alone, no insights carried over from earlier attempts

Tabular data: challenging, with a vast combinatorial space of candidate features

Approach

Combines LLMs' domain knowledge & reasoning

Uses evolutionary search for feature optimization

Formulates FE as a program search problem

Iterative generation & data-driven feedback
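
A minimal sketch of how such a loop could look in Python. Everything below is an illustrative assumption, not the paper's actual implementation: the caller supplies the prompt builder, the LLM client, and the data-driven scorer, and the loop keeps scored programs in a multi-population memory that seeds later prompts.

```python
import random

def llm_fe_loop(build_prompt, llm_propose, evaluate, task_description,
                n_iters=20, n_islands=4, k_shots=2, seed=0):
    """Evolutionary feature search driven by an LLM (illustrative skeleton).

    build_prompt(task_description, examples) -> str : structured input prompt
    llm_propose(prompt) -> str                      : candidate feature program
    evaluate(program_code) -> float                 : validation score (reward)
    """
    rng = random.Random(seed)
    # Experience management: multi-population ("island") memory of scored programs.
    islands = [[] for _ in range(n_islands)]
    for _ in range(n_iters):
        island = rng.choice(islands)
        # Seed the prompt with the best-scoring programs from the sampled island.
        examples = sorted(island, key=lambda p: p["score"], reverse=True)[:k_shots]
        prompt = build_prompt(task_description, examples)
        code = llm_propose(prompt)   # LLM proposes a feature transformation
        score = evaluate(code)       # data-driven feedback as the reward
        island.append({"code": code, "score": score})
    # Best program found across all populations.
    return max((p for isl in islands for p in isl), key=lambda p: p["score"])
```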

Key Components

Feature Generation: LLM creates programs

Data-Driven Evaluation: Model performance as reward

Experience Management: Multi-population memory

Structured Input Prompt: Guides LLM
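
For concreteness, here is a hedged sketch of what these components might look like in code. The prompt wording, the `add_features` program (including the column names bmi, age, glucose, insulin), and the `score_program` helper are all assumptions for illustration, not the paper's actual artifacts; the reward is simply cross-validated XGBoost accuracy on the transformed data.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Structured input prompt (illustrative wording): task description, column
# names, and previous high-scoring programs are filled in before each call.
PROMPT_TEMPLATE = """Task: predict {target} from a tabular dataset.
Columns: {columns}
Write a Python function add_features(df) that returns df with new, informative columns.
Previous high-scoring programs:
{examples}
"""

# Feature generation: the kind of program an LLM might emit for a
# diabetes-style dataset (column names are hypothetical).
def add_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["bmi_age_ratio"] = out["bmi"] / (out["age"] + 1)               # interaction
    out["glucose_per_insulin"] = out["glucose"] / (out["insulin"] + 1e-6)
    out["is_obese"] = (out["bmi"] > 30).astype(int)                    # domain threshold
    return out

# Data-driven evaluation: downstream model performance is the reward signal.
def score_program(feature_fn, X: pd.DataFrame, y: pd.Series) -> float:
    X_new = feature_fn(X)
    model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
    return cross_val_score(model, X_new, y, cv=5, scoring="accuracy").mean()
```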

Performance

Outperforms SOTA baselines consistently

Enhances XGBoost, MLP, TabPFN models

Effective on classification & regression tasks

Robust to noise, computationally efficient
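
As a usage sketch, the same engineered features can be scored under different downstream predictors. The helper below is hypothetical: it assumes an `add_features`-style program like the one sketched above and sklearn-compatible estimators; TabPFN's classifier from the `tabpfn` package can be slotted in the same way.

```python
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

def compare_downstream_models(feature_fn, X, y, cv=5):
    """Cross-validated accuracy of several predictors on the engineered features."""
    X_new = feature_fn(X)
    models = {
        "xgboost": XGBClassifier(n_estimators=200, eval_metric="logloss"),
        "mlp": make_pipeline(StandardScaler(),
                             MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500)),
        # If the `tabpfn` package is installed, its sklearn-style classifier
        # (e.g. TabPFNClassifier()) can be added here in the same way.
    }
    return {name: cross_val_score(m, X_new, y, cv=cv, scoring="accuracy").mean()
            for name, m in models.items()}
```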

Impact

Reduces manual effort, improves predictive power

Generates interpretable, contextually relevant features

Generalizable across models & LLM backbones

Future Directions

Integrate more powerful LLMs

Extend to data cleaning, augmentation

Apply to model tuning and hyperparameter optimization (HPO)