Data scientists have long accepted a specific hierarchy in machine learning. If you are working with images, you use a Convolutional Neural Network. If you are working with text, you use a Transformer. But if you are working with tabular data—the rows and columns of spreadsheets and SQL databases that power most of the corporate world—you use a decision tree ensemble. For years, the industry standard has been a relentless cycle of deploying Random Forest, XGBoost, or CatBoost, followed by hours of tedious hyperparameter tuning to squeeze out a fractional percentage of accuracy. Deep learning was largely dismissed for these tasks, unable to consistently beat the efficiency of gradient-boosted trees.

The Architecture of TabPFN

TabPFN challenges this status quo by treating tabular prediction not as a training problem, but as an inference problem. Unlike traditional models that start with a blank slate for every new dataset, TabPFN is a tabular foundation model. It is pretrained on millions of synthetic tabular tasks generated from causal processes, allowing it to internalize the general logic of supervised learning. When a user provides a dataset, TabPFN does not undergo iterative training. Instead, it employs in-context learning, similar to how a Large Language Model processes a prompt, to make predictions directly from the pretrained weights.

The latest iteration, TabPFN-2.5, expands this capability to handle larger and more complex datasets. It is designed to match the performance of heavy-duty ensemble systems like AutoGluon while removing the need for manual tuning. To address the computational overhead of this approach, the developers introduced a distillation method that allows the model's predictions to be converted into smaller neural networks or tree ensembles, preserving accuracy while slashing latency for production environments. Access to the model requires a TabPFN API Key, available at https://ux.priorlabs.ai/home.

To quantify the performance of this approach, a controlled experiment was conducted using a synthetic binary classification dataset generated via `make_classification` from scikit-learn. The dataset consists of 5,000 samples and 20 features: 10 are informative, 5 are redundant linear combinations of the informative ones, and the remaining 5 are pure noise, simulating the irrelevant columns common in real-world data. The data is split into an 80% training set and a 20% testing set to ensure a fair evaluation of generalization.
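This setup can be reproduced with a few lines of scikit-learn. The random seed below is an assumption for reproducibility; the experiment does not specify one.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary classification task matching the setup described above:
# 5,000 samples, 20 features (10 informative, 5 redundant, 5 noise).
X, y = make_classification(
    n_samples=5000,
    n_features=20,
    n_informative=10,
    n_redundant=5,
    random_state=42,  # assumed seed; not stated in the experiment
)

# 80% / 20% train/test split for evaluating generalization.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```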

The first baseline is a Random Forest classifier configured with 200 trees. As a robust ensemble method, Random Forest aggregates multiple decision trees to reduce variance. In this test, the model achieved an accuracy of 95.5%. The training process took 9.56 seconds, while the inference time was a lean 0.0627 seconds. This establishes a high baseline for accuracy and a very fast response time for predictions.
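A minimal sketch of this baseline, including the fit/predict timing split discussed below, might look as follows. Exact accuracy and timing numbers will vary with hardware and seed.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same synthetic task as above (seed assumed).
X, y = make_classification(n_samples=5000, n_features=20,
                           n_informative=10, n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# 200-tree ensemble, as in the benchmark.
rf = RandomForestClassifier(n_estimators=200, random_state=42)

t0 = time.perf_counter()
rf.fit(X_train, y_train)          # the heavy lifting happens here
fit_seconds = time.perf_counter() - t0

t0 = time.perf_counter()
preds = rf.predict(X_test)        # tree traversal: near-instant
predict_seconds = time.perf_counter() - t0

print(f"accuracy={accuracy_score(y_test, preds):.3f} "
      f"fit={fit_seconds:.2f}s predict={predict_seconds:.4f}s")
```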

The second baseline is CatBoost, a gradient boosting model optimized specifically for tabular data. CatBoost builds trees sequentially, with each new tree correcting the errors of its predecessor. This model pushed the accuracy higher to 96.7%. Despite using 500 boosting iterations, it trained slightly faster than Random Forest at 8.15 seconds. Most notably, its inference speed was the fastest of the group at 0.0119 seconds, cementing its reputation as the gold standard for low-latency production environments.

Finally, TabPFN was applied to the same dataset. Because it does not train in the traditional sense, the `.fit()` step primarily involves loading pretrained weights and storing the training data as context for later inference. This resulted in a fit time of just 0.47 seconds. More importantly, TabPFN achieved the highest accuracy of all three models at 98.8%. However, this performance came with a significant cost in speed: inference took 2.21 seconds.

The Inference-Training Paradox

The results reveal a fundamental shift in where the computational burden lies in the machine learning pipeline. In traditional models like CatBoost and Random Forest, the heavy lifting happens during the training phase. The model spends seconds or hours analyzing the data to build a static structure of trees. Once that structure is built, predicting a new value is a simple matter of traversing those trees, which is why inference is nearly instantaneous.

TabPFN flips this logic entirely. By utilizing in-context learning, it effectively moves the learning process from the training phase to the inference phase. When you ask TabPFN for a prediction, it processes the training data and the test sample simultaneously. It is not just looking up a value in a pre-built tree; it is performing a complex reasoning task based on the provided context. This explains why the fit time is nearly zero but the inference time is orders of magnitude slower than CatBoost.

This creates a new trade-off for developers. For the first time, the bottleneck is no longer the time spent tuning hyperparameters or waiting for a model to converge. Instead, the bottleneck is the latency of the prediction itself. For offline analysis or batch processing where accuracy is the only metric that matters, TabPFN is the clear winner. For real-time applications where milliseconds matter, the traditional tree-based approach remains dominant unless the TabPFN distillation process is used to compress the foundation model into a faster format.
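The details of Prior Labs' distillation method are not public in this article, but the general compress-for-latency pattern can be sketched with scikit-learn alone: a slow, accurate "teacher" (here a Random Forest standing in for the foundation model) labels the training data, and a small "student" tree is fit to imitate those labels. Both models and the hyperparameters below are illustrative assumptions, not the actual TabPFN pipeline.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20,
                           n_informative=10, n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Teacher: stands in for the slow, accurate foundation model.
teacher = RandomForestClassifier(n_estimators=200, random_state=42)
teacher.fit(X_train, y_train)

# Student: a single shallow tree trained on the teacher's predictions
# (label distillation), trading a little accuracy for far lower latency.
student = DecisionTreeClassifier(max_depth=8, random_state=42)
student.fit(X_train, teacher.predict(X_train))

for name, model in [("teacher", teacher), ("student", student)]:
    t0 = time.perf_counter()
    preds = model.predict(X_test)
    latency = time.perf_counter() - t0
    print(f"{name}: accuracy={accuracy_score(y_test, preds):.3f} "
          f"latency={latency:.4f}s")
```

In production, the student would serve real-time traffic while the teacher is reserved for offline batch scoring or periodic re-distillation.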

This shift suggests a future where tabular data is no longer treated as a series of isolated problems to be solved one by one, but as a unified domain that can be mastered by a single, massive pretrained model.