This week’s JAX dev routine in Colab looks almost ritualistic: you run a cell, print the Equinox version, list the available devices, and then immediately sanity-check that your environment is ready to compile and train. Right after that, you build a small Linear module, inspect its PyTree leaves and structure, and realize the tutorial is quietly teaching a deeper idea: in Equinox, your “model” is treated as a PyTree first, and everything else follows from that.
Section 1: Equinox setup and PyTree-first module construction
The tutorial starts by placing Equinox on top of JAX, positioning it as a neural network library that leans into JAX’s functional design. It begins with the install-and-import phase, pulling in the core stack you need to run the examples end-to-end: JAX, Equinox, Optax for optimization, jaxtyping for array type annotations, and Matplotlib for visualization.
After the imports, the notebook prints the JAX and Equinox versions and then enumerates the devices available in the current runtime. In Colab, that step matters because it confirms whether you’re actually running on the accelerator you think you are, before you spend time compiling jitted functions.
From there, the tutorial makes the PyTree assumption concrete. It explains that eqx.Module treats the model as a PyTree, which changes how you reason about parameters, transformations, and serialization. Instead of hiding model internals behind opaque objects, Equinox’s PyTree approach makes it feel like you can “see” what will be treated as parameters versus what will remain static or auxiliary.
The tension here is subtle: most frameworks let you define a module and then hope the training loop “does the right thing.” Equinox flips that expectation by making the model’s structure inspectable early, so you can validate the assumptions before you ever start training.
The practical takeaway is that you should treat PyTree structure as part of your model design, not an implementation detail.
Section 2: filter_jit and filter_grad for array-only training
Once the notebook has established the PyTree mental model, it moves into the two workhorses that make Equinox feel different in real training code: filter_jit and filter_grad.
The tutorial first constructs a simple module that includes a Linear layer, then uses an example with Conv1dBlock to illustrate how static fields and learnable layers can coexist inside the same module. The point isn’t the convolution itself; it’s the separation of what should be treated as part of the differentiable computation versus what should remain fixed.
Next, it builds an MLP that includes dropout, and it demonstrates filter_jit by compiling the forward pass even when the model contains non-array fields. In JAX, compilation and differentiation often get tangled when objects contain values that aren’t arrays. filter_jit addresses that by letting you compile the parts that matter for execution while tolerating the presence of non-array fields.
Then the tutorial introduces filter_grad, which handles differentiation rather than compilation: it decides what participates in gradient computation. In other words, filter_grad computes derivatives only with respect to the inexact (floating-point) array leaves of the PyTree, and leaves everything else out of the gradient.
To make the behavior tangible, the notebook runs forward passes on synthetic data and computes gradients of the resulting losses. The effect is that Equinox connects model definition and differentiable computation in a way that stays JAX-friendly, without forcing you to restructure your entire module just to satisfy compilation rules.
The tension this section resolves is the common pain point where “the model is defined correctly” but training fails or becomes inefficient because JAX can’t cleanly trace through mixed data structures.
The conclusion you reach is that filter_jit and filter_grad turn PyTree structure into a controllable boundary between compiled execution and differentiable learning.
Section 3: PyTree splitting, tree_at immutable updates, and BatchNorm state
After showing how to compile and differentiate selectively, the tutorial shifts to a more stateful problem: BatchNorm.
PyTree manipulation utilities come into play first. The notebook splits the model into array and non-array components, then creates a trainable filter that targets the parts you want to update. This is where the earlier PyTree inspection pays off: you can reason about which leaves are trainable and which are not.
Then it uses tree_at to perform immutable updates. Instead of mutating the model in place, tree_at lets you produce an updated version of the module by replacing specific parts of the PyTree. The key detail is that it updates only what you specify, without rewriting the entire model object from scratch.
With that machinery in place, the tutorial defines a stateful model built around BatchNorm. It creates the model and its state separately, then runs a batched forward pass that consumes the current state and returns an updated one alongside the outputs.
This is the twist inside the twist: the training loop here isn’t just “change weights.” It also carries BatchNorm’s running statistics as first-class state that must move forward with training. In many frameworks, BatchNorm state is handled implicitly; in this tutorial, it’s explicit and therefore inspectable.
The tension is that stateful layers often break the clean functional story people associate with JAX. Here, the notebook shows a pattern that keeps the functional approach intact by treating state as data that flows through the training step.
The resolution is that Equinox’s PyTree + immutable update approach makes BatchNorm state management feel like part of the model’s structure rather than an afterthought.
Section 4: residual blocks, warmup cosine schedules, serialization, and make_jaxpr
The final section scales the ideas into a more complete end-to-end training example.
The tutorial defines a residual block and a ResNetMLP model tailored for noisy sine regression, turning the earlier “toy module” work into a deeper pipeline. It then generates synthetic training and validation datasets, initializes the model, and sets up a warmup cosine learning rate schedule.
Optimization details follow the same selective-leaf philosophy. The optimizer state is prepared using only the model’s array leaves, aligning with the earlier filter_jit and filter_grad logic. Then the notebook defines jitted train_step and evaluate functions, which become the core mechanisms for training and validation.
The training loop runs across multiple epochs. Each epoch shuffles data, processes mini-batches, and tracks both training loss and validation loss over time. This part matters because it tests whether the earlier compilation, differentiation, and state handling patterns hold up when you put them into a realistic loop.
After training, the tutorial uses Equinox utilities to serialize the trained model. It reconstructs the same skeleton model and then deserializes into it, verifying that the restored weights match what you trained. That serialization check is important because it confirms that the PyTree-based structure is stable across save and load.
Finally, the notebook inspects the traced computation using jax.make_jaxpr. Instead of treating the jitted train step as a black box, it lets you read the jaxpr — the intermediate program that JAX will compile and execute — which supports debugging and introspection of the trained Equinox model.
The tension this section addresses is that “it trains” is not enough when you’re building production-grade JAX systems. You need confidence that compilation, state updates, serialization, and the executed graph all align with your intent.
The conclusion is that the tutorial connects PyTree modeling, selective compilation and differentiation, stateful BatchNorm handling, and graph-level debugging into one coherent workflow.
Equinox’s real promise in this notebook is that it treats model structure as inspectable data, so training becomes a sequence of explicit, verifiable transformations rather than a hidden sequence of framework magic.