numbox.core.work

Overview

Functionality for fully-jitted and light-weight calculation on a graph.

Modules

numbox.core.work.node

Overview

numbox.core.work.node.Node represents a node on a directed acyclic graph (DAG) that exists in a fully jitted scope and is accessible both at the low level and via a Python proxy.

Node can be used on its own (in which case the recommended way to create it is via the factory function numbox.core.work.node.make_node()) or as a prototype for more functionally rich graph nodes, such as numbox.core.work.work.Work.

The logic of Node and its sub-classes follows a graph-optional design: no graph orchestration structure is required to register and manage a graph of Node instances. This reduces unnecessary computational overhead and simplifies the program design.

To that end, each node is identified by its name and contains a uniformly typed, vector-like container member (rendered by the numba-native numba.typed.List) holding references to all the input nodes it has a directed dependency on. This enables traversal not only of graphs of Node instances themselves but also of graphs of objects representable by Node, such as graphs of Work nodes.
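The graph-optional idea can be sketched in plain Python, without numba: each node carries only its name and a list of references to its inputs, and every query is answered by walking those references, with no central registry. The class name SketchNode and the exact semantics of its methods (for example, that all_inputs_names reports each name once in depth-first order, or that depends_on compares by name) are assumptions for illustration, loosely mirroring the Node methods listed below; this is not the numbox implementation.

```python
class SketchNode:
    """Hypothetical pure-Python analogue of Node: a name plus input references."""

    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)  # plays the role of the typed List member

    def get_inputs_names(self):
        # Names of the direct inputs only.
        return [node.name for node in self.inputs]

    def all_inputs_names(self):
        # Depth-first walk over the whole sub-graph, reporting each name once.
        seen, order = set(), []

        def visit(node):
            for inp in node.inputs:
                if inp.name not in seen:
                    seen.add(inp.name)
                    order.append(inp.name)
                    visit(inp)

        visit(self)
        return order

    def depends_on(self, other):
        # Nodes are identified by name, so reachability is checked by name.
        return other.name in self.all_inputs_names()


n1 = SketchNode("first")
n2 = SketchNode("second")
n3 = SketchNode("third", inputs=(n1, n2))
n5 = SketchNode("fifth", inputs=(n3, SketchNode("fourth")))
print(n5.get_inputs_names())  # ['third', 'fourth']
print(n5.all_inputs_names())  # ['third', 'first', 'second', 'fourth']
```

No registry object exists anywhere in the sketch: the graph is fully described by the nodes themselves.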

Node implementation makes heavy use of the numba meminfo paradigm, which manages memory-allocated payloads by reference counting smart pointers (pointers to numba’s meminfo objects). This allows users to reference the desired memory location via a ‘void’ structref type, such as numbox.core.any.erased_type.ErasedType or numbox.core.utils.void_type.VoidType, or via a base structref type, such as numbox.core.work.node_base.NodeBaseType, and to dereference its payload when needed via the appropriate numbox.utils.lowlevel.cast().

class numbox.core.work.node.Node(name, inputs)

Bases: NodeBase

all_inputs_names()

depends_on(obj_)

get_input(i)

get_inputs_names()

property inputs

class numbox.core.work.node.NodeTypeClass(*args, **kwargs)

Bases: NodeBaseTypeClass

numbox.core.work.node.make_node(name, inputs=())

numbox.core.work.node.ol_all_inputs_names(self_ty)

numbox.core.work.node.ol_depends_on(self_ty, obj_ty)

numbox.core.work.node.ol_get_input(self_ty, i_ty)

numbox.core.work.node.ol_get_inputs_names(self_ty)

numbox.core.work.node.ol_node(name_ty, inputs_ty)

numbox.core.work.node_base

Overview

Base class for numbox.core.work.node.Node and numbox.core.work.work.Work. Contains functionality dependent only on the node name.

class numbox.core.work.node_base.NodeBase(name)

Bases: StructRefProxy

property name

class numbox.core.work.node_base.NodeBaseTypeClass(*args, **kwargs)

Bases: StructRef

numbox.core.work.print_tree

Overview

Provides utilities to print a tree of the given node’s dependencies. The node can be an instance of either numbox.core.work.node.Node or numbox.core.work.work.Work:

from numbox.core.work.node import make_node
from numbox.core.work.print_tree import make_image

n1 = make_node("first")
n2 = make_node("second")
n3 = make_node("third", inputs=(n1, n2))
n4 = make_node("fourth")
n5 = make_node("fifth", inputs=(n3, n4))
tree_image = make_image(n5)
print(tree_image)

which outputs:

fifth--third---first
       |       |
       |       second
       |
       fourth

Notice that the depth of the tree extends horizontally, while its width extends vertically and is aligned to recursively fit the images of the sub-trees.

For the sake of readability, if multiple nodes depend on a given node, that node is displayed multiple times on the tree image, once under each of its dependents, for instance:

n1 = make_node("n1")
n2 = make_node("n2", inputs=(n1,))
n3 = make_node("n3", inputs=(n1,))
n4 = make_node("n4", inputs=(n2, n3))
tree_image = make_image(n4)
print(tree_image)

produces:

n4--n2--n1
    |
    n3--n1

Here both references to ‘n1’ point to the same node, which happens to be a source of two other nodes, ‘n2’ and ‘n3’.
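The layout described above can be reproduced with a short recursive renderer. The sketch below is an assumption about how such a renderer may work, not the numbox make_image implementation: each depth column is as wide as the longest name at that depth plus two, the first input continues on its parent’s row after a run of dashes, later inputs hang below a "|" connector row, and shared inputs are simply rendered again under every dependent. The names TreeNode, column_widths, render and sketch_make_image are hypothetical.

```python
class TreeNode:
    """Hypothetical minimal node: a name and a tuple of input nodes."""

    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = tuple(inputs)


def column_widths(root):
    # Width of each depth column: longest name at that depth plus two.
    widths = {}

    def visit(node, depth):
        widths[depth] = max(widths.get(depth, 0), len(node.name) + 2)
        for child in node.inputs:
            visit(child, depth + 1)

    visit(root, 0)
    return widths


def render(node, depth, widths):
    # Lines of the sub-tree image, relative to the column of `node`.
    if not node.inputs:
        return [node.name]
    width = widths[depth]
    pad = " " * width
    lines = []
    last = len(node.inputs) - 1
    for i, child in enumerate(node.inputs):
        image = render(child, depth + 1, widths)
        if i == 0:
            # The first input continues the parent's row after a run of dashes.
            lines.append(node.name + "-" * (width - len(node.name)) + image[0])
        else:
            # Later inputs hang below a connector row.
            lines.append(pad + "|")
            lines.append(pad + image[0])
        for line in image[1:]:
            # Keep the vertical connector running while more siblings follow.
            lines.append(pad + ("|" + line[1:] if i < last else line))
    return lines


def sketch_make_image(root):
    return "\n".join(render(root, 0, column_widths(root)))


n1 = TreeNode("n1")
n2 = TreeNode("n2", (n1,))
n3 = TreeNode("n3", (n1,))
n4 = TreeNode("n4", (n2, n3))
print(sketch_make_image(n4))
```

On the two graphs used above, this sketch prints the same images as the ones shown, including the repeated ‘n1’.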

class numbox.core.work.print_tree.ImmutableItemDict

Bases: dict

numbox.core.work.print_tree.calculate_col_widths(graph)

numbox.core.work.print_tree.make_graph(node)

numbox.core.work.print_tree.make_image(node)

numbox.core.work.work

Overview

Defines the numbox.core.work.work.Work StructRef. Work is a unit of calculation designed to be included as a node on a jitted graph of other Work nodes.

The Work type subclasses numbox.core.work.node_base.NodeBase and follows the graph design logic of numbox.core.work.node.Node. However, since numba StructRef does not support low-level subclassing, there is no inheritance relation between NodeBaseType and WorkType, so the data design follows the composition pattern instead. Namely, the member (name) of the NodeBase payload is a header in the payload of Work, which makes a meaningful numbox.utils.lowlevel.cast() possible.
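The header-based composition can be illustrated outside numba with the standard ctypes module: as long as the derived payload starts with exactly the same members as the base payload, a pointer to the derived payload can be reinterpreted as a pointer to the base one. The structure names below are hypothetical stand-ins for the NodeBase and Work payloads; numbox itself performs the cast on numba meminfo-backed structrefs via numbox.utils.lowlevel.cast(), not with ctypes.

```python
import ctypes


class NodeBasePayload(ctypes.Structure):
    # Hypothetical base payload: just the name header.
    _fields_ = [("name", ctypes.c_char_p)]


class WorkPayload(ctypes.Structure):
    # Hypothetical derived payload: the same header first, then its own members.
    _fields_ = [
        ("name", ctypes.c_char_p),
        ("data", ctypes.c_double),
        ("derived", ctypes.c_int8),
    ]


work = WorkPayload(b"v0", 3.14, 0)

# Reinterpret the address of the Work payload as a NodeBase payload.
base_ptr = ctypes.cast(ctypes.pointer(work), ctypes.POINTER(NodeBasePayload))
print(base_ptr.contents.name)  # b'v0'
```

The cast is only meaningful because name sits at offset 0 in both layouts; reordering the members of WorkPayload would break it.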

The main way to create a Work instance is via the numbox.core.work.work.make_work() constructor (direct Work(…) instantiation is in fact disabled in both Python and jitted scope). make_work() can be invoked from either Python or jitted scope; in the example below it is called inside the jitted run function, although run could just as well be a plain-Python function:

import numpy
from numba import float64, njit
from numbox.core.work.work import make_work
from numbox.utils.highlevel import cres

@cres(float64(), cache=True)
def derive_work():
    return 3.14

@njit(cache=True)
def run(derive_):
    work = make_work("work", 0.0, derive=derive_)
    work.calculate()
    return work.data

assert numpy.isclose(run(derive_work), 3.14)

When make_work() is called from jitted scope and the caller function needs to be cacheable, the derive function should be passed to run as an argument of FunctionType (not as an njit-produced CPUDispatcher), i.e., decorated with numbox.utils.highlevel.cres(). By contrast, pulling derive_work from the global scope inside an argument-less run will prevent run from being cached.

For performance-critical large graphs containing hundreds or more nodes created in jitted scope, using numbox.core.work.work.make_work() is not practical: it either results in large memory use (and takes up a lot of disk space when the jitted caller is cached) or, when make_work is declared with the inline=True directive, takes substantial time to compile (albeit producing a much slimmer and better optimized final result). In that case, it is recommended to use the low-level intrinsic numbox.core.work.lowlevel_work_utils.ll_make_work() as follows:

from numba import njit
from numba.core.types import float64
from numpy import isclose

from numbox.core.work.lowlevel_work_utils import ll_make_work
from numbox.utils.highlevel import cres

@cres(float64())
def derive_v0():
    return 3.14

@njit(cache=True)
def v0_maker(derive_):
    return ll_make_work("v0", 0.0, (), derive_)

v0 = v0_maker(derive_v0)
assert v0.data == 0
assert v0.name == "v0"
assert v0.inputs == ()
assert not v0.derived
v0.calculate()
assert isclose(v0.data, 3.14)

Importantly, Work objects support the as_node method (overloaded in numbox.core.work.work.ol_as_node()), which creates a numbox.core.work.node.Node instance with the same name as the Work instance and a vector of inputs of the numbox.core.work.node_base.NodeBase type referencing the original Work instance’s sources. Upon the first invocation of as_node on a given Work instance, Node representations of itself and, recursively, of all its sources are created once and stored in their node attributes. Subsequent invocations of as_node on either the given Work node or any node on its sub-graph return the previously created Node objects stored in the node attribute.
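The create-once behaviour of as_node amounts to memoizing the conversion in each object’s node attribute. The plain-Python sketch below is illustrative only: SketchWork and SketchNode are hypothetical classes, not numbox types, and numbox performs this conversion in jitted code.

```python
class SketchNode:
    """Hypothetical stand-in for Node: a name and a list of input nodes."""

    def __init__(self, name, inputs):
        self.name = name
        self.inputs = list(inputs)


class SketchWork:
    """Hypothetical stand-in for Work, reduced to what as_node needs."""

    def __init__(self, name, sources=()):
        self.name = name
        self.sources = tuple(sources)
        self.node = None  # cached Node representation, created on first use

    def as_node(self):
        # Build the Node once; sources are converted recursively and cached too.
        if self.node is None:
            self.node = SketchNode(self.name, [s.as_node() for s in self.sources])
        return self.node


a = SketchWork("a")
b = SketchWork("b", (a,))
c = SketchWork("c", (a, b))
root = c.as_node()
print([n.name for n in root.inputs])  # ['a', 'b']
```

Because a is a source of both b and c, its Node is created during the first conversion and then reused, so both dependents reference the very same Node object.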

Graph manager

While not a requirement, it is recommended that the name attribute of a Work instance match the name of the variable to which that instance is assigned. Moreover, no out-of-the-box assertions for the uniqueness of Work names are provided. Users are free to implement their own graph managers that register the Work nodes and enforce additional requirements on the names as needed. The core numbox library remains agnostic as to whether such an overhead is universally beneficial (and worth the performance tradeoff).

One way to build a graph manager is via a constructor such as:

from numba import njit
from numba.core.errors import NumbaError
from numbox.core.configurations import default_jit_options
from numbox.core.work.node import NodeType
from numbox.core.work.work import _make_work
from numbox.utils.lowlevel import _cast
from work_registry import _get_global, registry_type

@njit(**default_jit_options)
def make_registered_work(name, data, sources=(), derive=None):
    """ Optional graph manager. Consider using `make_work`
    where performance is more critical and name clashes are
    unlikely and/or inconsequential. """
    registry_ = _get_global(registry_type, "_work_registry")
    if name in registry_:
        raise NumbaError(f"{name} is already registered")
    work_ = _make_work(name, data, sources, derive)
    registry_[name] = _cast(work_, NodeType)
    return work_

Here _make_work refers to the original Work constructor, whose overload is numbox.core.work.work.ol_make_work(), while the utility registry module, imported above as work_registry, can be defined as:

"""
These functions included in this module::

    get_or_make_global
    _get_global
    _set_global

were mainly based on the

    `CognitiveRuleEngine <https://github.com/DannyWeitekamp/Cognitive-Rule-Engine/blob/main/cre/utils.py>`_

open-source project, distributed under

MIT License

Copyright (c) 2023 Daniel Weitekamp

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell  # noqa: E501
copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  # noqa: E501

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  # noqa: E501

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.  # noqa: E501
"""

from llvmlite import ir
from numba import njit
from numba.extending import intrinsic
from numba.core import cgutils
from numba.core.types import DictType, unicode_type, void
from numba.typed.typeddict import Dict

from numbox.core.configurations import default_jit_options
from numbox.core.work.node import NodeType


def get_or_make_global(context, builder, fe_type, name):
    mod = builder.module
    try:
        gv = mod.get_global(name)
    except KeyError:
        ll_ty = context.get_value_type(fe_type)
        gv = ir.GlobalVariable(mod, ll_ty, name=name)
        gv.linkage = "common"
        gv.initializer = cgutils.get_null_value(gv.type.pointee)
    return gv


@intrinsic(prefer_literal=True)
def _get_global(typingctx, type_ref, name_ty):
    ty = type_ref.instance_type
    name = name_ty.literal_value

    def codegen(context, builder, signature, arguments):
        gv = get_or_make_global(context, builder, ty, name)
        v = builder.load(gv)
        context.nrt.incref(builder, ty, v)
        return v
    sig = ty(type_ref, name_ty)
    return sig, codegen


@intrinsic(prefer_literal=True)
def _set_global(typingctx, type_ref, name_ty, v_ty):
    ty = type_ref.instance_type
    name = name_ty.literal_value

    def codegen(context, builder, signature, arguments):
        _, __, v = arguments
        gv = get_or_make_global(context, builder, ty, name)
        builder.store(v, gv)
    sig = void(type_ref, name_ty, v_ty)
    return sig, codegen


registry_type = DictType(unicode_type, NodeType)


@njit(**default_jit_options)
def set_global(registry_):
    _set_global(registry_type, "_work_registry", registry_)


registry = Dict.empty(unicode_type, NodeType)
set_global(registry)

Implementation details

Behind the scenes, Work provides individual access to its sources (the other Work nodes that point to the given Work node on the DAG) via a ‘Python-native compiler’ backdoor: source code tailored to the given node’s sources is generated as text and turned into functions with Python’s built-in compile and exec before being used in overloads in the numba jitted scope. This technique is fully compatible with caching of jitted functions and provides a natural Python counterpart to virtual functions (which numba does not support). It is used extensively in numbox.core.work.work.ol_calculate(), which overloads the calculate method of the Work class.
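The code-generation idea can be sketched in plain Python. In numba, a heterogeneous tuple such as Work.sources can generally only be indexed with compile-time constant indices, so instead of a runtime loop one can generate source text with literal indices for a given number of sources and turn it into a function with compile and exec. The helper names below (make_call_derive_code, build_call_derive) are hypothetical and only illustrate the technique; they are not the functions numbox generates.

```python
from types import SimpleNamespace


def make_call_derive_code(num_sources):
    # Unroll the access to each source with a literal index.
    args = ", ".join(f"sources[{i}].data" for i in range(num_sources))
    return (
        "def call_derive(derive, sources):\n"
        f"    return derive({args})\n"
    )


def build_call_derive(num_sources):
    # Compile the generated text and pull the new function out of its namespace.
    code = compile(make_call_derive_code(num_sources), "<generated>", "exec")
    namespace = {}
    exec(code, namespace)
    return namespace["call_derive"]


print(make_call_derive_code(2))
call_derive_2 = build_call_derive(2)
sources = (SimpleNamespace(data=1.41), SimpleNamespace(data=3.1415))
print(call_derive_2(lambda d, p: d * p, sources))
```

Each distinct number of sources yields its own specialized function, which is what makes this a stand-in for dispatching on the shape of a node.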

Invoking the calculate method on a Work node triggers a depth-first (DFS) calculation of its sources: all of the sources are automatically calculated before the node itself. Calculating a Work node sets its data attribute to the outcome of the calculation, which in turn can depend on the data values of its sources.

To avoid repeated calculation of the same node, Work has a derived flag that is set once the node has been calculated, preventing subsequent re-derivation. In particular, this ensures that the DFS calculation of the node’s sources happens just once.
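The interplay between the DFS and the derived flag can be sketched in plain Python. SketchWork below is a hypothetical simplification (data is a plain attribute, derive is any Python callable, and derived is a bool rather than int8). The example builds a diamond-shaped graph in which pi and diameter are shared by two dependents and records the order in which the derive functions run.

```python
class SketchWork:
    """Hypothetical pure-Python stand-in for Work: DFS calculation with a derived flag."""

    def __init__(self, name, data, sources=(), derive=None):
        self.name = name
        self.data = data
        self.sources = tuple(sources)
        self.derive = derive
        self.derived = False

    def calculate(self):
        if self.derived:
            return  # already calculated: skip the whole sub-graph
        for source in self.sources:
            source.calculate()  # depth-first: sources before the node itself
        if self.derive is not None:
            self.data = self.derive(*(s.data for s in self.sources))
        self.derived = True


calls = []


def make_derive(name, fn):
    # Wrap fn so that every invocation is recorded in `calls`.
    def derive(*args):
        calls.append(name)
        return fn(*args)
    return derive


pi = SketchWork("pi", 0.0, derive=make_derive("pi", lambda: 3.1415))
diameter = SketchWork("diameter", 2.0)
circumference = SketchWork(
    "circumference", 0.0, (diameter, pi), make_derive("circumference", lambda d, p: d * p)
)
area = SketchWork("area", 0.0, (diameter, pi), make_derive("area", lambda d, p: p * d * d / 4))
total = SketchWork("total", 0.0, (circumference, area), make_derive("total", lambda c, a: c + a))
total.calculate()
total.calculate()  # no effect: total is already derived
print(calls)  # ['pi', 'circumference', 'area', 'total']
```

Although pi is a source of two nodes, its derive function runs only once.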

class numbox.core.work.work.Work(*args, **kws)

Bases: NodeBase

Structure describing a unit of work.

Instances of this class can be connected in a graph with other Work instances.

Attributes

name (str): Name of the structure instance.
inputs (UniTuple[NodeBaseType]): Uniform tuple of Work.sources, cast as NodeBaseType.
data (Any): Scalar or array data payload contained in (and calculated by) this structure.
sources (Tuple[Work, …]): Heterogeneous tuple of Work instances that this Work instance depends on.
derive (FunctionType): Function of the signature determined by the data types of sources and data.
derived (int8): Flag indicating whether the data has already been calculated.
node (NodeType): Work as Node, with its sources in a List.

The (name,) attributes of the Work structure payload are homogeneously typed across all instances of Work, which makes Work castable to its numbox.core.work.node_base.NodeBase base of NodeBaseType.

all_inputs_names()

as_node()

calculate()

property data

depends_on(obj_)

property derived

get_input(i)

get_inputs_names()

property inputs

load(data)

make_inputs_vector()

property sources

numbox.core.work.work.make_inputs_vector_code(num_sources)

numbox.core.work.work.make_work(name, data, sources=(), derive=None)

numbox.core.work.work.ol_all_inputs_names(self_ty)

numbox.core.work.work.ol_as_node(self_ty)

numbox.core.work.work.ol_calculate(self_ty)

numbox.core.work.work.ol_depends_on(self_ty, obj_ty)

numbox.core.work.work.ol_get_input(self_ty, i_ty)

numbox.core.work.work.ol_get_inputs_names(self_ty)

numbox.core.work.work.ol_load(work_ty, data_ty: DictType)

Loads data into the graph whose root node is work. The data is provided as a dictionary mapping node names to Any-typed values, each containing an erased payload p of the corresponding work.data type.

numbox.core.work.work.ol_make_inputs_vector(self_ty)
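Functionally, the loading described for ol_load above amounts to a graph walk from the root that overwrites the data of every node whose name appears in the mapping. The sketch below uses plain Python values in place of the type-erased Any payloads, and its names (SketchWork, load_by_name) are hypothetical; it does not show whether or how numbox’s load interacts with the derived flag.

```python
class SketchWork:
    """Hypothetical stand-in for Work with just a name, data and sources."""

    def __init__(self, name, data, sources=()):
        self.name = name
        self.data = data
        self.sources = tuple(sources)


def load_by_name(root, data_by_name):
    # Visit every node reachable from the root once, overwriting matching data.
    seen = set()

    def visit(work):
        if id(work) in seen:
            return
        seen.add(id(work))
        if work.name in data_by_name:
            work.data = data_by_name[work.name]
        for source in work.sources:
            visit(source)

    visit(root)


diameter = SketchWork("diameter", 0.0)
pi = SketchWork("pi", 3.1415)
circumference = SketchWork("circumference", 0.0, (diameter, pi))
load_by_name(circumference, {"diameter": 2.0})
print(diameter.data, pi.data)  # 2.0 3.1415
```

Nodes absent from the mapping keep their current data, so the same routine can be used to update only part of a graph.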

numbox.core.work.work_utils

Overview

Convenience utilities for creating Work-graphs from Python scope.

The numbox.core.work.work.make_work() constructor accepts a cres-compiled derive function as an argument, which requires the signature of the derive function to be provided explicitly. The return type of the derive function should match the type of the data attribute of the corresponding Work instance, while its argument types should match the data types of the Work instance’s sources.

Utilities defined in this module make it easier to ensure these requirements are met with a minimal amount of coding:

import numpy
from numbox.core.work.work_utils import make_init_data, make_work_helper


pi = make_work_helper("pi", 3.1415)


def derive_circumference(diameter_, pi_):
    return diameter_ * pi_


def run(diameter_):
    diameter = make_work_helper("diameter", diameter_)
    circumference = make_work_helper(
        "circumference",
        make_init_data(),
        sources=(diameter, pi),
        derive_py=derive_circumference,
        jit_options={"cache": True}
    )
    circumference.calculate()
    return circumference.data


if __name__ == "__main__":
    assert numpy.isclose(run(1.41), 3.1415 * 1.41)

numbox.core.work.work_utils.make_init_data(shape=(), val=0.0, ty=None)

numbox.core.work.work_utils.make_work_helper(name, init_data, sources=(), derive_py=None, jit_options=None)

Utility for creating instances of Work from Python scope. Python abstraction for the jitted scope functionality.