numbox.core.variable

Overview

A framework for defining and evaluating a Directed Acyclic Graph (DAG) of variables in pure Python. Although this module contains no JIT-compiled code and imports nothing from numba, computationally heavy parts can be placed on the graph as JIT-compiled functions via the formula key of the graph variable specifications (see below).

Modules

numbox.core.variable.variable

Overview

A graph can be defined as follows:

from numbox.core.variable.variable import Graph

def derive_x(y_):
    return 2 * y_

def derive_a(x_):
    return x_ - 74

def derive_u(a_):
    return 2 * a_

x = {"name": "x", "inputs": {"y": "basket"}, "formula": derive_x}
a = {"name": "a", "inputs": {"x": "variables1"}, "formula": derive_a}
u = {"name": "u", "inputs": {"a": "variables1"}, "formula": derive_u}

graph = Graph(
    variables_lists={
        "variables1": [x, a],
        "variables2": [u],
    },
    external_source_names=["basket"]
)

Here we have the variable y sourced externally from the basket, and calculated variables x and a in the variables1 namespace, and u in the variables2 namespace.

The dictionaries x, a, and u are called variable specifications. On their own, these specs are agnostic about the namespace they are placed in. The namespaces are assigned via the variables_lists argument given to the Graph at initialization time.

The full and unambiguous way to denote a variable is its qualified name, which applies both to externally sourced variables, such as basket.y, and to calculated ones, such as variables1.x, variables1.a, and variables2.u.
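The qualified-name form used above can be illustrated with a minimal sketch. The helpers qualify and split_qualified are hypothetical names introduced here for illustration only (they are not imported from numbox), and the sketch assumes that namespace names contain no dots.

```python
# Hypothetical helpers mirroring the "<namespace>.<name>" form seen in the
# examples above; they are illustrative and not part of numbox.
def qualify(namespace_name: str, var_name: str) -> str:
    return f"{namespace_name}.{var_name}"


def split_qualified(qual_name: str) -> tuple[str, str]:
    # Assumption: the namespace name itself contains no dot.
    namespace_name, _, var_name = qual_name.partition(".")
    return namespace_name, var_name


names = [
    qualify("basket", "y"),
    qualify("variables1", "x"),
    qualify("variables2", "u"),
]
print(names)
print(split_qualified("variables2.u"))
```

The library's own helper for this purpose is make_qual_name, documented in the References below.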

One entry of a variable specification, under the key formula, gives the function whose parameters correspond to the input variables (this graph node’s dependencies), which are in turn listed under the key inputs. The parameter names of the function assigned to formula do not have to match the names of the inputs, but their order must correspond one-to-one. This tells the graph which input supplies the value for each parameter of the formula.

The Python function specified by the formula can be a wrapper around a numba JIT-compiled function, i.e., a proxy to numba’s FunctionType or CPUDispatcher objects [1].

The inputs entry of a variable specification (if present) gives both the names of the dependency variables required to calculate the given variable via its formula and the namespaces in which these variables will be looked up.

Graph end nodes, located at the edge of the graph (a.k.a. leaf nodes), have neither inputs nor formula in their specifications. Specifying formula without inputs does not raise an exception, which accommodates a function that computes and returns a value independent of any input parameters. It is also possible to specify inputs but no formula, which technically defines the placement of the node on the graph while letting the developer defer specifying the node’s calculation logic until later at runtime.
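The positional correspondence between inputs and formula parameters can be sketched in plain Python. The function call_formula and the (namespace, name) keyed values dictionary below are assumptions made for illustration; they imitate the matching rule described above and are not numbox's implementation.

```python
# Illustrative only: bind the i-th entry of `inputs` to the i-th positional
# parameter of `formula`, regardless of the parameter names.
def call_formula(spec, values):
    # `inputs` maps input name -> namespace; dicts preserve insertion order.
    args = [values[(namespace, name)] for name, namespace in spec["inputs"].items()]
    return spec["formula"](*args)


def derive_ratio(numerator_, denominator_):
    # Parameter names deliberately differ from the input names "p" and "q".
    return numerator_ / denominator_


values = {("basket", "p"): 6.0, ("basket", "q"): 3.0}

ratio = {"name": "ratio", "inputs": {"p": "basket", "q": "basket"}, "formula": derive_ratio}
swapped = {"name": "ratio", "inputs": {"q": "basket", "p": "basket"}, "formula": derive_ratio}

print(call_formula(ratio, values))    # p feeds numerator_, q feeds denominator_
print(call_formula(swapped, values))  # order reversed, so the result changes
```

Reordering the inputs changes which value reaches which parameter, which is why the order of inputs has to follow the order of the formula's parameters.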

Names of the ‘external’ sources (of data values) need to be given to the Graph as well, via the external_source_names argument. When the numbox.core.variable.variable.Graph is compiled to the numbox.core.variable.variable.CompiledGraph, it will automatically figure out which variables need to be sourced from each of the specified external sources (such as, ‘basket’) in order to perform the required calculation:

from numbox.core.variable.variable import CompiledGraph

# What is required from this calculation, the names of qualified variables
required = ["variables2.u"]

# Compile the graph for the required variables
compiled = graph.compile(required)
assert isinstance(compiled, CompiledGraph)

# The graph will figure out what external variables it needs to do the calculation
required_external_variables = compiled.required_external_variables
assert list(required_external_variables.keys()) == ["basket"]
basket = required_external_variables["basket"]
assert list(basket.keys()) == ["y"]
assert basket["y"].name == "y"

Graph uses the variable specifications given to it to create instances of numbox.core.variable.variable.Variable. Namespaces of calculated Variable instances are numbox.core.variable.variable.Variables. Namespaces of externally sourced Variable instances are numbox.core.variable.variable.External.

Semantically, each Variable is defined by its scoped name, that is, a tuple of its namespace / source name and its own name.

In DAG terminology, External scopes contain variables with no inputs, that is, edge (or end / leaf) nodes.

Instances of Variables and External (the namespaces, which in turn hold the Variable instances) are stored in the Graph instance’s registry:

from numbox.core.variable.variable import Variables, Variable

registry = graph.registry

# Get the namespaces...
variables1 = registry["variables1"]
variables2 = registry["variables2"]

# ... and the variables defined in these namespaces
assert list(variables1.variables.keys()) == ["x", "a"]
assert list(variables2.variables.keys()) == ["u"]

assert isinstance(variables1, Variables)
assert isinstance(variables1.variables["x"], Variable)

basket_ = registry["basket"]
... # same `basket` as above
assert basket_["y"] is basket["y"]

That is, users are not expected to instantiate Variable or Variables themselves, although they may do so if needed (the recommended design is to retrieve Variable instances from the registry of the Graph instance whenever they are needed). Instead, users provide variable specifications, such as the dictionaries x, a, and u in the example above (plus the variable name “y”, which is referred to and thereby implied to be ‘external’), and pass them to the Graph. The Graph then creates instances of Variables (one per namespace) and instances of External (one per ‘external’ source). Finally, Variables and External in turn create the Variable instances and store them.

To calculate the required variables, one first needs to instantiate an execution-scoped storage, numbox.core.variable.variable.Values, which holds the values of all variables scoped in the Variables and External namespaces. This storage is populated automatically with all calculated nodes, as a mapping from each Variable to an instance of numbox.core.variable.variable.Value, which wraps the data. The data of every non-external variable is initialized to _null, the instance of numbox.core.variable.variable._Null.

Then, one needs to supply external_values for the leaf nodes that are needed for the calculation. As discussed above, these required external variables are identified programmatically. Once their values have been provided, the graph can be calculated as:

from numbox.core.variable.variable import Values

# Instantiate the storage
values = Values()

# Request the calculation by executing the graph
compiled.execute(
    external_values={"basket": {"y": 137}},
    values=values,
)

This populates the values with the correct data:

x_var = variables1["x"]
a_var = variables1["a"]
u_var = variables2["u"]

assert values.get(x_var).value == 274
assert values.get(a_var).value == 200
assert values.get(u_var).value == 400

The graph can be recomputed if some of its nodes have been changed. Only the affected nodes will be re-evaluated:

compiled.recompute({"basket": {"y": 1}}, values)
assert values.get(basket["y"]).value == 1
assert values.get(x_var).value == 2
assert values.get(a_var).value == -72
assert values.get(u_var).value == -144
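The notion of "affected nodes" can be made concrete with a small pure-Python sketch. The dependency map and the function downstream_cone below are hypothetical and written for illustration; they show one standard way to collect everything downstream of a change (a breadth-first walk over inverted edges), not how numbox computes it internally.

```python
from collections import deque

# Hypothetical dependency map for the example graph: node -> its direct inputs.
inputs = {
    "variables1.x": ["basket.y"],
    "variables1.a": ["variables1.x"],
    "variables2.u": ["variables1.a"],
}


def downstream_cone(changed, inputs):
    # Invert the edges: input -> nodes that directly consume it.
    dependents = {}
    for node, deps in inputs.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(node)
    # Breadth-first walk from the changed nodes along the inverted edges.
    cone, queue = set(), deque(changed)
    while queue:
        current = queue.popleft()
        for node in dependents.get(current, []):
            if node not in cone:
                cone.add(node)
                queue.append(node)
    return cone


print(sorted(downstream_cone(["basket.y"], inputs)))      # every calculated node
print(sorted(downstream_cone(["variables1.a"], inputs)))  # only the node after a
```

A change to basket.y touches all three calculated nodes, while a change to variables1.a would leave variables1.x untouched.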

References

class numbox.core.variable.variable.CompiledGraph(ordered_nodes: list[numbox.core.variable.variable.CompiledNode], required_external_variables: dict[str, dict[str, numbox.core.variable.variable.Variable]], debug: bool = False, dependents: dict[numbox.core.variable.variable.Variable, list[numbox.core.variable.variable.CompiledNode]] = <factory>, affected_cache: dict[frozenset[numbox.core.variable.variable.Variable], list[numbox.core.variable.variable.CompiledNode]] = <factory>)[source]

Bases: object

affected_cache: dict[frozenset[Variable], list[CompiledNode]]

debug: bool = False

dependents: dict[Variable, list[CompiledNode]]

execute(external_values: dict[str, dict[str, Any]], values: Storage)[source]

Main entry point to calculate values of nodes of the compiled graph. Calculation requires the following inputs:

Parameters:

external_values – actual values of all required external variables, this can be a superset of what is really needed for the calculation. The map is first from the name of the external namespace and then from the name of the variable within that source to the variable’s actual value.
values – runtime storage of all values, e.g., an instance of Values.

ordered_nodes: list[CompiledNode]

recompute(changed: dict[str, dict[str, Any]], values: Storage)[source]

Parameters:

changed – dict of sources to names to new values of changed Variable instances coming from either External or Variables source.
values – storage of all Variable values.

Recompute takes priority: each node in the affected downstream cone is reset and recomputed from its formula, so the graph structure decides the final values. A supplied value persists only for a node not downstream of any other change; a co-changed downstream value is recomputed, not held.
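The priority rule can be illustrated with a toy evaluator over plain names. Everything in this sketch (the formulas table, the fixed topological order, and toy_recompute) is an assumption made for illustration and only imitates the stated rule; it is not the CompiledGraph implementation.

```python
# Toy graph: name -> (input names, formula), listed in topological order.
formulas = {
    "x": (["y"], lambda y: 2 * y),
    "a": (["x"], lambda x: x - 74),
    "u": (["a"], lambda a: 2 * a),
}
order = ["x", "a", "u"]


def toy_recompute(changed, store):
    # Collect every node downstream of any changed name.
    affected = set()
    for node in order:
        deps, _ = formulas[node]
        if any(dep in changed or dep in affected for dep in deps):
            affected.add(node)
    # Write the supplied values first ...
    store.update(changed)
    # ... then recompute the affected cone, overwriting co-changed values.
    for node in order:
        if node in affected:
            deps, formula = formulas[node]
            store[node] = formula(*(store[dep] for dep in deps))
    return store


store = {"y": 137, "x": 274, "a": 200, "u": 400}

held = dict(toy_recompute({"a": 10}, dict(store)))
overridden = dict(toy_recompute({"y": 1, "a": 10}, dict(store)))
print(held)        # a keeps the supplied 10, only u is recomputed
print(overridden)  # a is downstream of y, so the supplied 10 is not held
```

In the first call the supplied value of a persists because nothing upstream of it changed; in the second call a lies downstream of the changed y and is recomputed from its formula.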

required_external_variables: dict[str, dict[str, Variable]]

class numbox.core.variable.variable.CompiledNode(variable: numbox.core.variable.variable.Variable, inputs: list[numbox.core.variable.variable.Variable])[source]

Bases: object

inputs: list[Variable]

variable: Variable

class numbox.core.variable.variable.External(name: str)[source]

Bases: Namespace

An ‘external’ namespace that facilitates discovery of requested names.

When requesting a Variable with the given name via a typical __getitem__ call, if the Variable is not found, it will be created and added to this dictionary. This way the graph will be able to infer which variables are required from the external source abstracted by this namespace.
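The create-on-lookup behavior can be imitated with a dict subclass. ToyExternal below is a hypothetical stand-in, and the plain dictionary it stores in place of a Variable is an assumption made to keep the sketch free of numbox imports.

```python
class ToyExternal(dict):
    """Toy namespace that creates a placeholder on first lookup."""

    def __init__(self, name):
        super().__init__()
        self.name = name

    def __missing__(self, key):
        # Called by dict.__getitem__ only when `key` is absent.
        # The dict below is a stand-in for a Variable instance.
        variable = {"name": key, "source": self.name}
        self[key] = variable
        return variable


basket = ToyExternal("basket")
first = basket["y"]   # not present yet: created and stored
second = basket["y"]  # present now: the same object is returned

print(list(basket.keys()))
print(first is second)
```

After the lookups, the keys of the namespace are exactly the names that were requested, which is the information needed to infer what must be sourced externally.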

declare(name: str, params: Params) → Variable[source]: Pre-seed a typed external before compile (the only supported route to attach params to an external, which is otherwise auto-created untyped on lookup).

class numbox.core.variable.variable.Graph(variables_lists: dict[str, list[VarSpec]], external_source_names: list[str])[source]

Bases: object

compile(required: list[str] | str, debug: bool = False) → CompiledGraph[source]

Parameters:

required – list of qualified variable names that need to be calculated.

dependents_of(qual_names: list[str] | set[str] | str) → set[str][source]: Return qualified names of Variable instances that directly or indirectly depend on any of qual_names.

explain(qual_name: str, right_to_left: bool = True) → str[source]

Follow the dependencies chain to explain how the given variable is derived.

Uses metadata of the Variable instances.

Parameters:

qual_name – qualified name of the Variable.
right_to_left – when True (default), begin explanation with qual_name. That is, move towards the ends of the graph.

class numbox.core.variable.variable.Namespace[source]

Bases: ABC

keys()[source]

name: str

update(key: str, var: Variable) → None[source]: Post-initialization update for dynamically generated Variable instances.

class numbox.core.variable.variable.Params(jitable: bool = True, type: Any = None)[source]

Bases: object

Optional per-Variable declaration driving static jitability in compile_kernel. jitable=False declares a deliberately plain-Python node; type is the variable’s numba Type (None means undeclared).

Like formula, params must be attached to a node before the first compile() of any required set containing that node: a Graph caches its compiled result, so a params attached afterward is not picked up.

jitable: bool = True

type: Any = None

class numbox.core.variable.variable.Storage(*args, **kwargs)[source]

Bases: Protocol

get(variable: Variable) → Value[source]: Principal access point to the requested variable. Instantiates the corresponding value when first invoked for the given variable.

class numbox.core.variable.variable.Value(variable: ~numbox.core.variable.variable.Variable, value: ~typing.Any | ~numbox.core.variable.variable._Null = <numbox.core.variable.variable._Null object>)[source]

Bases: object

Value of the corresponding Variable. Best used when created indirectly by the Values storage.

value: Any | _Null = <numbox.core.variable.variable._Null object>

variable: Variable

class numbox.core.variable.variable.Values[source]

Bases: object

Values of all Variable instances, computed and external, will be held here.

get(variable: Variable) → Value[source]

class numbox.core.variable.variable.VarSpec[source]

Bases: VarSpecBase

formula: Callable

inputs: dict[str, str]

metadata: str

name: str

params: Params

class numbox.core.variable.variable.VarSpecBase[source]

Bases: TypedDict

name: str

class numbox.core.variable.variable.Variable(name: str, source: str = '', inputs: ~typing.Mapping[str, str] = <factory>, formula: ~typing.Callable = None, metadata: str | None = None, params: ~numbox.core.variable.variable.Params | None = None)[source]

Bases: object

An instance of Variable is anything that can be calculated from the values of the given inputs dependencies using the provided formula (i.e., a Python function).

Calculated value can be None, that is why a non-calculated value is designated with _null.
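The reason for a dedicated sentinel can be shown with a short sketch. The names _ToyNull, _toy_null, and ToyValues are hypothetical and only imitate the idea of a lazily populated storage whose slots start out as "not calculated"; they are not the library's _Null, Value, or Values.

```python
class _ToyNull:
    """Marker type meaning 'not calculated yet'."""

    def __repr__(self):
        return "_null"


_toy_null = _ToyNull()


class ToyValues:
    def __init__(self):
        self._data = {}

    def get(self, name):
        # Create the slot on first access, initialized to the sentinel.
        return self._data.setdefault(name, {"value": _toy_null})


values = ToyValues()
slot = values.get("variables1.x")
not_calculated = slot["value"] is _toy_null

# A formula may legitimately return None; the slot is then calculated.
slot["value"] = None
calculated_none = slot["value"] is not _toy_null

print(not_calculated, calculated_none)
```

Had None been used as the initial marker, the second state would be indistinguishable from the first.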

An instance of Variable is best created within the given Namespace. For example, when the Variables subtype of the Namespace is instantiated, it gets populated with the freshly created Variable instances per the VarSpec specifications passed to it. Or, when the External subtype of the Namespace is queried for the given variable name, if a Variable with such a name is not already present in that external namespace, it will be created and stored there.

Parameters:

name – name of the Variable instance.
source – name of the Namespace instance which is the namespace / source of this Variable.
inputs – (optional) map from names of the Variable inputs (which are names of other Variable instances) to names of their Namespace instances.
formula – (optional) function that calculates the value of this Variable from its inputs.
metadata – any possible metadata associated with this variable.

formula: Callable = None

inputs: Mapping[str, str]

metadata: str | None = None

name: str

params: Params | None = None

qual_name() → str[source]: Qualified name of Variable incorporates both the name of the Variable and the name of its source / namespace.

source: str = ''

class numbox.core.variable.variable.Variables(name: str, variables: list[VarSpec])[source]

Bases: Namespace

numbox.core.variable.variable.make_qual_name(namespace_name: str, var_name: str) → str[source]

Each Variable instance is best initialized in and owned by a Namespace object (such as, instances of External and Variables), with the given namespace_name.

This function returns the qualified name of the Variable instance, built from namespace_name and var_name.

numbox.core.variable.compile_kernel

Overview

Alongside numbox.core.variable.variable.Graph.compile() (which produces a numbox.core.variable.variable.CompiledGraph evaluated node-by-node in pure Python), numbox.core.variable.compile_kernel.compile_kernel() compiles a Graph into fused @njit kernel code for a requested set of variables. It does not replace core.work or CompiledGraph; it is an additional, JIT’ed evaluation path.

When every formula is njit-able the graph fuses into one @njit kernel that takes the required external inputs as positional arguments and returns the requested variables as a tuple, with every interior graph node lowered to an SSA temporary inside the single compiled function. No per-node type information needs to be supplied: numba infers every interior type from the runtime argument types. Plain-Python formulas are auto-wrapped with njit(). When some formulas are not njit-able for the actual argument types, the first call detects them and the graph is split into @njit segments orchestrated from Python (see Graphs with non-jittable nodes below).

Per-node type information is optional. Each Variable may carry a params (a Params(jitable, type)) declaring whether its formula is jittable and the variable’s numba type. A node with no params behaves exactly as above – jitability is discovered at the first call. When every node in the required cone is declared and every consumed external is typed, compile_kernel resolves the execution mode at build time instead: an all-jittable graph compiles eagerly into one fused kernel; a declared jittable/non-jittable mix compiles eagerly into a static segment plan with no probing; and CompiledKernel.partition is populated at build, inspectable before any call. Declaring types moves type errors to build time: a coercible but wrong params.type (for instance declaring int64 over a body that naturally returns float64) raises at compile_kernel rather than silently truncating, because it is caught by an explicit unconstrained return-type probe of the formula – not by binding the formula to the declared signature, which numba would silently coerce. A graph that declares nothing is byte-for-byte the behavior described above.

The call to compile_kernel returns a numbox.core.variable.compile_kernel.CompiledKernel. It exposes .kernel (the hot-path callable — positional in, tuple out: the bare numba dispatcher once the graph resolves fully fused, the Python master when the graph is segmented around non-jittable nodes) and a dict-in / dict-out .execute convenience that mirrors numbox.core.variable.variable.CompiledGraph.execute(). The qualified names of the kernel’s positional inputs and tuple outputs are available as .params and .outputs, the generated kernel text as .source, and the per-variable temporary identifiers as .identifiers.

This fused path does not honor the cacheable memoization of individual nodes; use numbox.core.variable.variable.CompiledGraph (or the core.work graph) when that is needed. It does, however, support incremental recompute of only the affected nodes via CompiledKernel.recompute (see Incremental recompute below).

Caching. The fused kernel is cached on disk, content-addressed by a fingerprint of the generated kernel source, every formula’s behavioral state (bytecode, constants, default values, closure-cell values, referenced module-level globals including helper functions, defining module), and the effective jit flags. Changing any of these recompiles instead of reusing a stale binary; cosmetic edits that do not change behavior (comments, local renames) do not. The generated source never mentions types, so a declared graph’s signatures are folded into the digest as well (the consumed external signature for an eager fused kernel, each segment’s live-in/out signature for an eager segment): two declared-type variants of one type-free graph therefore get distinct cache anchors and never reuse each other’s binary. Formulas whose state cannot be fingerprinted – a cres-compiled callable, or a value with no canonical form – make that one kernel uncacheable: always recompiled per process, never wrong. The cache keyword is tri-state: None (the default) defers to jit_options["cache"], then the NUMBOX_JIT_OPTIONS environment default, then True; an explicit True/False wins. Two costs are worth knowing: a formula that references or closes over a large array pays a per-compile sha256 over that array’s bytes (proportional to its size) on every compile_kernel call; and numba itself declines to disk-cache a kernel that calls a @cfunc formula or references a large global array – the kernel still computes correctly, it is simply recompiled in each process regardless of the content-addressed anchor. A @vectorize (DUFunc) formula, by contrast, caches cleanly.
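The idea of a content-addressed fingerprint can be sketched with the standard library. The function toy_fingerprint is hypothetical and covers only bytecode, constants, and default values; the fingerprint described above also folds in closure cells, referenced globals, the defining module, and the jit flags, none of which are modeled here.

```python
import hashlib


def define(source):
    # Build a function named `f` from source text, for comparison purposes.
    namespace = {}
    exec(source, namespace)
    return namespace["f"]


def toy_fingerprint(func):
    # Hash a subset of the behavioral state: bytecode, constants, defaults.
    code = func.__code__
    payload = repr((code.co_code, code.co_consts, func.__defaults__)).encode()
    return hashlib.sha256(payload).hexdigest()


plain = define("def f(y):\n    return y * 2.5\n")
commented = define("def f(y):\n    # cosmetic comment\n    return y * 2.5\n")
renamed = define("def f(z):\n    return z * 2.5\n")
changed = define("def f(y):\n    return y * 3.5\n")

print(toy_fingerprint(plain) == toy_fingerprint(commented))  # cosmetic edit
print(toy_fingerprint(plain) == toy_fingerprint(renamed))    # local rename
print(toy_fingerprint(plain) == toy_fingerprint(changed))    # behavior change
```

Comments and local renames leave the hashed state untouched, while a changed constant produces a different digest and therefore a different cache anchor.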

Practical limits. Graph traversal is recursive: dependency chains deeper than roughly sys.getrecursionlimit() raise a RecursionError naming the remedy (raise the limit before compiling). Cold compilation of the fused kernel costs on the order of 20 ms and ~1 MiB of memory per formula node (numba 0.65, CPython 3.12); graphs beyond a few thousand nodes compile increasingly slowly and are better split or evaluated via numbox.core.variable.variable.CompiledGraph.
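The recursion-depth limit and its remedy can be reproduced with a plain recursive traversal. The function depth_of and the 3000-node chain are assumptions chosen for illustration; the sketch does not call numbox and only shows the effect of sys.setrecursionlimit on a deep dependency chain.

```python
import sys


def depth_of(node, inputs):
    # Plain recursive traversal: one Python frame per level of the chain.
    deepest = 0
    for dep in inputs.get(node, []):
        deepest = max(deepest, depth_of(dep, inputs))
    return 1 + deepest


# A chain n0 <- n1 <- ... <- n2999, deeper than the usual default limit.
chain_length = 3000
inputs = {f"n{i}": [f"n{i - 1}"] for i in range(1, chain_length)}
last = f"n{chain_length - 1}"

old_limit = sys.getrecursionlimit()
try:
    sys.setrecursionlimit(1000)
    try:
        depth_of(last, inputs)
        hit_limit = False
    except RecursionError:
        hit_limit = True
    # Remedy: raise the limit before traversing the deep chain.
    sys.setrecursionlimit(20000)
    depth = depth_of(last, inputs)
finally:
    sys.setrecursionlimit(old_limit)

print(hit_limit, depth)
```

The same pattern applies when compiling a deep graph: raise the limit first, compile, and restore the previous limit afterwards if desired.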

A graph can be compiled to a fused kernel as follows:

from numba import njit
from numbox.core.variable.variable import Graph
from numbox.core.variable.compile_kernel import compile_kernel

graph = Graph(
    variables_lists={"variables": [
        {"name": "x", "inputs": {"y": "basket"}, "formula": njit(lambda y: 2 * y)},
        {"name": "u", "inputs": {"x": "variables"}, "formula": njit(lambda x: x - 74)},
    ]},
    external_source_names=["basket"],
)

ck = compile_kernel(graph, ["variables.u"])
assert ck.execute({"basket": {"y": 100}}) == {"variables.u": 126}
assert ck.kernel(100) == (126,)

Here the dict-in / dict-out ck.execute looks up the required external value basket.y and returns the requested variables.u, while ck.kernel is called positionally with the external input and returns the output tuple directly.
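What "interior nodes become temporaries of one straight-line function" means can be sketched with a toy source generator. The function toy_fuse and its naming scheme (arg0, t0, f0, ...) are hypothetical, and the generated function is plain Python without @njit; the text numbox actually generates is whatever CompiledKernel.source reports.

```python
def toy_fuse(nodes, externals, required):
    # nodes: (qualified name, input names, formula), in topological order.
    namespace = {}
    ident = {name: f"arg{i}" for i, name in enumerate(externals)}
    lines = [f"def kernel({', '.join(ident[name] for name in externals)}):"]
    for i, (name, input_names, formula) in enumerate(nodes):
        ident[name] = f"t{i}"          # each node gets a single-assignment temporary
        namespace[f"f{i}"] = formula   # formulas are looked up by generated name
        args = ", ".join(ident[dep] for dep in input_names)
        lines.append(f"    t{i} = f{i}({args})")
    lines.append(f"    return ({', '.join(ident[name] for name in required)},)")
    source = "\n".join(lines)
    exec(source, namespace)
    return namespace["kernel"], source


nodes = [
    ("variables.x", ["basket.y"], lambda y: 2 * y),
    ("variables.u", ["variables.x"], lambda x: x - 74),
]
kernel, source = toy_fuse(nodes, ["basket.y"], ["variables.u"])

print(source)
print(kernel(100))
```

The toy kernel takes the external input positionally and returns the requested outputs as a tuple, matching the calling convention described for ck.kernel.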

Graphs with non-jittable nodes

compile_kernel detects non-jittable formulas automatically at the first call: it first tries to compile the fully fused kernel for the actual argument types; if that fails, it probes each node against the real intermediate values, runs the offenders in plain Python, and fuses the jittable remainder into the minimal number of @njit segments any topological order permits (one plus the maximum number of jit/Python alternations along a dependency path). A Python master then threads values between segments and Python nodes. Compile-time failures demote a node; runtime errors always propagate.

import json

from numbox.core.variable.compile_kernel import compile_kernel
from numbox.core.variable.variable import Graph

def n3(v):
    json.dumps({"k": 1})    # no nopython lowering for the json module
    return v * 3.0

graph = Graph(
    variables_lists={"calc": [
        {"name": "n1", "inputs": {"x": "ext"}, "formula": lambda x: x + 1.0},
        {"name": "n2", "inputs": {"n1": "calc"}, "formula": lambda n1: n1 * 2.0},
        {"name": "n3", "inputs": {"n2": "calc"}, "formula": n3},
        {"name": "n4", "inputs": {"n3": "calc"}, "formula": lambda n3: n3 - 4.0},
        {"name": "n5", "inputs": {"n4": "calc"}, "formula": lambda n4: n4 / 2.0},
    ]},
    external_source_names=["ext"],
)
ck = compile_kernel(graph, "calc.n5")
ck.kernel(7.0)              # first call: probes, partitions, still correct
print(str(ck.partition))    # 2 jit segments around the python n3, with reasons

ck.partition is None until the first call resolves the mode; a fully fused graph reports a single jit segment. Each jit segment is cached content-addressed on disk the same way the fused kernel is; the learned partition itself is per-process. If a later call’s types break a segment, the partition is re-learned for those values and replaces the previous plan — workloads alternating between type families whose partitions differ re-pay discovery on each alternation.

Incremental recompute

CompiledKernel.recompute is a value-only refresh that re-evaluates only the cone of nodes affected by a change, reading every unchanged input from a persistent value store seeded by a prior full call. It mirrors numbox.core.variable.variable.CompiledGraph.recompute(): same types across calls, {source: {name: value}} in, a tuple in outputs order out. A changed name may resolve to an interior node, in which case its value is overridden and only its downstream cone recomputes. Do not interleave input-changing throughput kernel(...) calls between recompute calls – the store is seeded once and a throughput call does not update it.

For an undeclared graph a changed value of a different numba type triggers a one-time flush-and-reseed recovery. A declared kernel enforces a contract instead: a changed value is checked for numba assignability to the node’s declared type, so a value numba cannot assign raises a crisp declared type X, got Y error, while a benign difference numba accepts – a C-contiguous array against an 'A'-layout array declaration, or a safe scalar promotion – is accepted. The check is convertibility, not type identity.

from numba import njit
from numbox.core.variable.variable import Graph
from numbox.core.variable.compile_kernel import compile_kernel

graph = Graph(
    variables_lists={"variables": [
        {"name": "a", "inputs": {"y": "basket"}, "formula": njit(lambda y: y + 1.0)},
        {"name": "b", "inputs": {"y": "basket"}, "formula": njit(lambda y: y * 2.0)},
        {"name": "u", "inputs": {"a": "variables", "b": "variables"},
         "formula": njit(lambda a, b: a + b)},
    ]},
    external_source_names=["basket"],
)

ck = compile_kernel(graph, ["variables.u"])
assert ck.kernel(100.0) == (301.0,)        # full call seeds the value store
assert ck.recompute({"basket": {"y": 101.0}}) == (304.0,)

Compile a core.variable Variable graph into fused @njit kernel(s).

Alongside core.work (a structref graph), this turns a Graph/CompiledGraph into JIT-compiled straight-line code. When every formula is njit-able the whole graph becomes a single fused @njit function whose interior nodes are SSA temporaries (no per-node type info needed: numba infers every interior type from the kernel’s runtime argument types). When some formulas are not njit-able, the first call detects them automatically – numba compile errors demote a node to plain Python, runtime errors always propagate – and a Python master orchestrates fused @njit segments around the demoted nodes, with a fusion-maximizing linearization choosing the segment boundaries. The resulting partition is described by CompiledKernel.partition (a PartitionReport with per-node demotion reasons); formulas with no Python fallback (cres/CompileResultWAP, CFunc, DUFunc) are always treated as jittable.

Jitability may be either discovered at the first call (above) or declared up front. Each Variable carries an optional params (Params(jitable, type)). A node with no params is discovered exactly as before – byte-for-byte the same behavior. When every node in the required cone is declared (and every consumed external is typed), compile_kernel() resolves the execution mode at build time instead of at the first call: an all-jittable graph compiles eagerly into one fused kernel (“fused”), a declared jit/Python mix compiles eagerly into a static segment plan (“segmented”) with no probing, and CompiledKernel.partition is populated at build (inspectable before any call). A declared params.type that the formula does not naturally yield is caught at build by an explicit unconstrained return-type probe – not by binding the formula to the declared signature, which would silently coerce a convertible-but-wrong scalar type (a node declared int64 over a float-returning body would otherwise return a truncated value). Any node left undeclared (or only partially typed) keeps the runtime-discovery path; a graph that declares nothing behaves exactly as today.

The on-disk cache is content-addressed per compiled unit (the fused kernel, or each jit segment): the digest fingerprints each formula’s code, constants, default arguments, closure-cell values, referenced globals, and the kernel’s effective jit flags, so a stale binary is never reused and two distinct kernels never collide. The kernel source never mentions types, so declared signatures are appended to the digest as well: two declared-type variants of one graph therefore get distinct cache anchors. A formula with no canonical fingerprint forces its unit uncached (no anchor, no numba cache) – never reused, never wrong.

class numbox.core.variable.compile_kernel.CompiledKernel(kernel: Dispatcher, params: list[tuple[str, str, str]], outputs: list[str], source: str, identifiers: dict[str, str], ctx: _KernelCtx, required_vars: list[Variable], external_vars: list[Variable], is_declared: bool = False)[source]

Bases: object

A fused @njit kernel compiled from a Variable graph.

Attributes:

kernel      - hot-path callable: resolver before the first call, the
              bare numba dispatcher once fused, the segmented master
              otherwise. Positional external args (in `params` order)
              -> tuple (in `outputs` order).
recompute   - value-only incremental refresh of only the cone affected
              by a change, over a store seeded by a prior `kernel` call;
              returns a tuple in `outputs` order (see `recompute`).
params      - external input qual_names, kernel-argument order.
outputs     - requested variable qual_names, return-tuple order.
source      - generated kernel source text.
identifiers - {qual_name: temp identifier} for inspection.
partition   - PartitionReport describing what runs where. None until the
              first call for undeclared graphs; set at build time for
              fully-declared (eager) graphs.
is_declared - True when the graph was fully declared and the mode resolved
              eagerly at build; False for a discovery (undeclared) kernel.
              Declared kernels enforce the `recompute()` type contract
              instead of re-discovering.

execute(external_values: dict) → dict[source]: Dict-in / dict-out convenience, symmetric with CompiledGraph.execute.

property kernel: Callable

recompute(changed: dict) → tuple[source]

Incrementally re-evaluate only the cone affected by changed.

Mirrors numbox.core.variable.variable.CompiledGraph.recompute(): this is a value-only refresh, not a recompile. changed is {source: {name: value}}; the returned tuple is in outputs order. The same variables may carry different values across calls, but their numba types must stay the same as the seeding call (a type change is recovered from once, see below, but the contract is same-types).

Precondition: a prior full call (kernel(...) / execute(...)) must have seeded the value store. Calling recompute first raises RuntimeError.

What it does: it writes the changed values into a persistent value store, collects the downstream cone of the changed nodes, and re-fuses just that cone – reading every unchanged input from the store. The cone sub-plan is compiled on first use and kept in a bounded LRU cache keyed on the cone and its live-in boundary, so a recurring change pattern reuses its compiled plan without re-fusing. Nodes that were demoted to plain Python at seed time stay Python in the cone; the jittable remainder fuses into @njit segments.
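A bounded LRU cache keyed on the cone and its live-in boundary can be sketched as follows. ToyPlanCache, its max_size argument, and the build callback are hypothetical; the actual cache size and key layout used by numbox are not stated here and are not implied by this sketch.

```python
from collections import OrderedDict


class ToyPlanCache:
    def __init__(self, max_size):
        self.max_size = max_size
        self._plans = OrderedDict()
        self.builds = 0

    def get(self, cone, live_ins, build):
        key = (frozenset(cone), frozenset(live_ins))
        if key in self._plans:
            self._plans.move_to_end(key)   # mark as most recently used
            return self._plans[key]
        plan = build(cone, live_ins)       # "compile" on first use
        self.builds += 1
        self._plans[key] = plan
        if len(self._plans) > self.max_size:
            self._plans.popitem(last=False)  # evict the least recently used
        return plan


def build(cone, live_ins):
    # Stand-in for compiling a cone sub-plan.
    return ("plan", tuple(sorted(cone)), tuple(sorted(live_ins)))


cache = ToyPlanCache(max_size=2)
first = cache.get({"a", "u"}, {"y"}, build)
again = cache.get({"u", "a"}, {"y"}, build)   # same cone: reused, not rebuilt
cache.get({"b", "u"}, {"y"}, build)
cache.get({"u"}, {"a", "b"}, build)           # third distinct key: evicts the oldest

print(cache.builds, first is again)
```

A recurring change pattern hits the same key and reuses its plan, while rarely used plans are evicted once the bound is reached.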

Interior overrides: a changed name may resolve to an interior (computed) node rather than an external input – mirroring the interpreted path. Its value is overridden in the store and only its downstream cone recomputes; the overridden node’s own formula is not re-run – unless the node is itself downstream of another changed input in the same call, in which case graph priority applies and it is recomputed from its formula. The first override of a not-yet-seen interior node expands the change-source set and rebuilds the persisted-node boundary (and invalidates cached cone plans, whose boundaries have shifted).

Limitations:

Do not interleave input-changing kernel(...) throughput calls between recompute calls. recompute is the stateful entry point: the store is seeded once, and a throughput call does not update it, so a subsequent recompute would read stale unchanged values. Use recompute for the incremental workflow and the bare kernel for independent one-shot calls.
An interior plain-Python (demoted) node must return a stable numba type across recomputes. The same-types contract extends to demoted outputs: a demoted node whose output type drifts between recomputes is not supported.
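The first limitation can be demonstrated with a toy stateful kernel. ToyStatefulKernel is a hypothetical stand-in with two external inputs (y and z) and hard-coded formulas; it only imitates the documented behavior that the store is seeded once and that throughput calls do not update it.

```python
class ToyStatefulKernel:
    def __init__(self):
        self.store = None

    def _full(self, y, z):
        a = y + 1.0
        b = z * 2.0
        return {"y": y, "z": z, "a": a, "b": b, "u": a + b}

    def kernel(self, y, z):
        values = self._full(y, z)
        if self.store is None:   # seeded once, by the first full call
            self.store = values
        return (values["u"],)

    def recompute(self, changed):
        if self.store is None:
            raise RuntimeError("recompute called before a seeding full call")
        store = self.store
        store.update(changed)
        if "y" in changed:
            store["a"] = store["y"] + 1.0
        if "z" in changed:
            store["b"] = store["z"] * 2.0
        store["u"] = store["a"] + store["b"]
        return (store["u"],)


ck = ToyStatefulKernel()
seeded = ck.kernel(1.0, 1.0)        # seeds the store with z == 1.0
throughput = ck.kernel(1.0, 10.0)   # correct result, but the store is untouched
stale = ck.recompute({"y": 2.0})    # still reads z == 1.0 from the store

print(seeded, throughput, stale)
```

The last result reflects z == 1.0 from the seeding call rather than the z == 10.0 passed to the throughput call, which is the stale read the limitation warns about.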

On a live-in type change a cached cone dispatcher fails to compile; the whole cone-plan cache is flushed once, the store re-seeded from the last full call, the change re-applied, and the cone rebuilt against the new types.

Declared kernels enforce a contract instead of recovering. For a kernel built from a fully-declared graph, a changed value is checked for numba assignability to the node’s declared type: a value numba cannot assign to the declared type raises a crisp declared type X, got Y error, while a benign difference numba accepts – a C-contiguous array against an 'A'-layout array declaration, or a safe scalar promotion – is accepted. The check is convertibility, not type identity, and is scoped to declared (eager) kernels; an individually-declared node inside an otherwise discovered kernel keeps the flush-and-reseed recovery above.
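
The difference between convertibility and type identity can be shown with a small stand-in. Everything here is an assumption made for illustration: type names are plain strings, SAFE_PROMOTIONS is a hypothetical two-entry table that does not reproduce numba's conversion rules, and TypeError is chosen arbitrarily since the text above does not name the exception class.

```python
# Hypothetical promotion table; numba's real conversion rules are richer.
SAFE_PROMOTIONS = {("int32", "int64"), ("float32", "float64")}


def check_assignable(declared, got):
    """Accept identical or safely promotable types, reject the rest."""
    if got == declared or (got, declared) in SAFE_PROMOTIONS:
        return True
    # Exception class is an assumption; only the message shape is sourced.
    raise TypeError(f"declared type {declared}, got {got}")
```

A value typed int32 passes against an int64 declaration even though the types differ, while float64 against int64 is rejected.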

numbox.core.variable.compile_kernel.compile_kernel(graph: Graph, required: str | list[str], *, jit_options: dict | None = None, cache: bool | None = None) → CompiledKernel

Compile graph into a fused @njit kernel for the required variables.

Parameters:

graph – its dependency structure and formulas are fused into one straight-line @njit function (see CompiledKernel).
required – name, or list of names, of the variables to compute. Order is preserved and fixes the order of CompiledKernel.outputs / the kernel’s return tuple; a duplicate entry raises ValueError (each output is requested once – the return tuple is positional, so a repeat carries no information).
jit_options – merged over numbox’s defaults (NUMBOX_JIT_OPTIONS env) and passed to @njit. All options except cache participate in the content-addressed digest.
cache – tri-state. None (default) defers to jit_options["cache"], then the NUMBOX_JIT_OPTIONS env default, then True. An explicit True/False wins over both.
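
The precedence rules for cache and the duplicate check on required can be sketched as two small helpers. Both resolve_cache and check_required are hypothetical names, and env_defaults is assumed to be the already-parsed content of NUMBOX_JIT_OPTIONS (its on-disk format is not described here), so this shows the documented ordering only.

```python
def resolve_cache(cache=None, jit_options=None, env_defaults=None):
    """Explicit True/False wins; else jit_options, env default, then True."""
    if cache is not None:
        return cache
    for options in (jit_options or {}, env_defaults or {}):
        if "cache" in options:
            return options["cache"]
    return True


def check_required(required):
    """Normalize to a list, keep the order, reject duplicate entries."""
    names = [required] if isinstance(required, str) else list(required)
    seen = set()
    for name in names:
        if name in seen:
            raise ValueError(f"duplicate required entry: {name!r}")
        seen.add(name)
    return names
```

For example, resolve_cache(True, {"cache": False}) returns True because the explicit argument outranks jit_options, and check_required(["variables2.u", "variables2.u"]) raises ValueError.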

Error timing: structural problems raise here (unknown or malformed required entries, non-callable formulas, arity mismatches against the declared inputs, graphs deeper than the recursion limit). For an undeclared (or partially-declared) graph, numba typing problems surface at the kernel’s first call (auto-njit of plain-Python formulas is lazy). For a fully-declared graph (every node carries params, every consumed external is typed) the mode resolves eagerly here, so type errors move to build time: a formula whose natural return at the declared input types is non-convertible to the declared type, a cross-node type mismatch, and – crucially – a coercible-but-wrong params.type. The last is caught by an explicit unconstrained return-type probe that compares the formula’s naturally inferred return type against the declaration; binding the formula to the declared signature does not catch it, because numba silently coerces a convertible scalar (declaring int64 over an x * 1.5 body would otherwise compute 7, not 7.5, for x = 5). Fully-declared graphs thus fail fast at build; a graph with any undeclared part still fails at the first call. Runtime errors never demote – they propagate.
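
The coercible-but-wrong case is easy to reproduce in plain Python, with int() standing in for a silent cast to a declared int64. The helpers bind_to_declared and probe_return_type are hypothetical analogues written for this illustration; they use Python types rather than numba types and are not the probe numbox runs.

```python
def formula(x):
    return x * 1.5


def bind_to_declared(func, declared):
    """Mimic silent coercion: cast whatever the formula returns."""
    return lambda *args: declared(func(*args))


def probe_return_type(func, declared, *sample_args):
    """Compare the natural return type against the declaration."""
    natural = type(func(*sample_args))
    if natural is not declared:
        raise TypeError(
            f"declared type {declared.__name__}, got {natural.__name__}"
        )


coerced = bind_to_declared(formula, int)(5)  # 7.5 silently truncated
```

Binding alone yields 7 without complaint, whereas probe_return_type(formula, int, 5) raises because the natural return type is float.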

Caching: the kernel digest fingerprints each formula’s bytecode, constants, default values, closure-cell values, referenced module-level globals (including helper functions, recursively), defining module, and the effective jit flags. Because the generated source never mentions types, a declared graph’s signatures are appended to the digest too (the consumed external signature for an eager fused kernel, each segment’s live-in/out signature for an eager segment), so two declared-type variants of one type-free graph get distinct cache anchors and never reuse each other’s binary. A formula whose state cannot be fingerprinted (e.g. cres/CompileResultWAP objects, values with no canonical form) downgrades that one kernel to cache=False: always recompiled, never stale. When caching is enabled, a content-addressed anchor .py file is written under numba’s cache directory; with caching off (or the cache dir unwritable, which warns and degrades) nothing is written.
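
A reduced version of such a fingerprint can be written with the standard library. This sketch, with the hypothetical names formula_digest and make_scaler, hashes only bytecode, constants, defaults, closure-cell values, the defining module and the jit flags; it leaves out referenced globals and declared signatures, and it assumes every hashed value has a stable repr.

```python
import hashlib


def formula_digest(func, jit_flags=()):
    """Hash a subset of the state that determines a formula's behavior."""
    code = func.__code__
    cells = tuple(cell.cell_contents for cell in (func.__closure__ or ()))
    parts = (
        code.co_code,        # bytecode
        code.co_consts,      # constants
        func.__defaults__,   # default values
        cells,               # closure-cell values
        func.__module__,     # defining module
        tuple(sorted(jit_flags)),  # effective jit flags as (name, value)
    )
    return hashlib.sha256(repr(parts).encode()).hexdigest()


def make_scaler(k):
    def scale(x):
        return k * x
    return scale
```

Two closures built by make_scaler share bytecode, so only the closure-cell value separates make_scaler(2) from make_scaler(3); passing [("fastmath", True)] as jit_flags changes the digest as well.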

Non-jittable formulas: for an undeclared graph the first call resolves the execution mode. If the fully fused kernel cannot be typed for the actual argument types, each node is probed against the real intermediate values; nodes whose formulas fail to compile (or whose input values numba cannot type) run in plain Python, and the jittable remainder is fused into segments orchestrated from Python. A declared jitable=False node is instead demoted by declaration (no probing): a graph mixing declared jittable and declared-Python nodes resolves eagerly to a “segmented” plan at build, with CompiledKernel.partition populated immediately with per-node reasons. CompiledKernel.partition describes the result, including per-node demotion reasons; it is None before the first call only for an undeclared graph (set at build for a fully-declared one). CompiledKernel.kernel is the hot-path callable: the bare @njit dispatcher once the graph resolves fully fused, the Python master when segmented. For an undeclared graph a later call whose types break a segment re-learns and replaces the partition (one active plan); a declared kernel does not re-discover. The crisp declared type X, got Y contract is enforced at build time and on recompute() (can_convert against each declared type), not on the throughput kernel(...) path: throughput retains numba’s polymorphic widening across calls, and a later kernel(...) whose off-contract type breaks a jit segment raises numba’s own typing error (the declared _demoted is left untouched – no silent re-discovery). Once fully fused, the fused mode is permanent (a later-signature typing failure raises). The discovery call computes jit-node values through per-node dispatchers while later calls use fused segments – identical under default IEEE semantics, but non-default jit_options such as fastmath could in principle differ across fusion boundaries.
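
The shape of a segmented plan can be pictured with a small grouping routine. The function partition below is a hypothetical sketch: it assumes, purely for illustration, that segments are contiguous runs of jittable nodes in topological order, and it takes the jittable predicate as given instead of probing with numba.

```python
def partition(order, jittable):
    """Group consecutive jittable nodes; demoted nodes stand alone."""
    plan, segment = [], []
    for node in order:
        if jittable(node):
            segment.append(node)
            continue
        if segment:
            plan.append(("jit", segment))  # close the running segment
            segment = []
        plan.append(("python", [node]))  # demoted node runs in Python
    if segment:
        plan.append(("jit", segment))
    return plan
```

With the chain x, a, u and a demoted, the result is a jit segment for x, a Python step for a and a second jit segment for u; when every node is jittable a single segment remains, which corresponds to the fully fused case.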

The identifier-assignment and formula helpers used by the compiler live in numbox.core.variable.utils.

Formula and identifier helpers for the fused-kernel compiler.

Shared utilities used by numbox.core.variable.compile_kernel: identifier assignment for generated kernel source, formula njit-wrapping, and formula arity validation. Kept here so they have a stable home as the kernel machinery grows and so a second consumer can reuse them without importing compile_kernel.