numbarrow.utils

numbarrow.utils.utils

Overview

Low-level pointer utilities for zero-copy access to Arrow memory buffers. Provides Numba-compatible functions that reinterpret a raw memory address (from pyarrow.Buffer.address) as a typed NumPy array, enabling @njit code to read Arrow buffer data directly without copying.

The key abstraction is arrays_viewers — a dictionary mapping NumPy dtypes to pre-compiled viewer functions. Each viewer takes (address, length) and returns a NumPy array backed by the memory at that address.

numbarrow.utils.utils.numpy_array_from_ptr_factory(dtype_)

Create a JIT-compiled function that views memory at a given address as a NumPy array.

Returns an @njit function with signature (ptr_as_int, sz) -> ndarray that uses numba.carray() to reinterpret sz elements starting at address ptr_as_int as a contiguous C-order NumPy array of dtype_. No data is copied — the returned array is a view over the original memory.

Parameters:

dtype_ – NumPy dtype for the resulting array (e.g. np.int32)

Returns:

JIT-compiled function (int, int) -> np.ndarray
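
The same idea can be illustrated outside Numba. The following is a minimal sketch, not the library's implementation: it uses ctypes and np.frombuffer instead of numba.carray(), and the names make_viewer and viewers are hypothetical. A NumPy array stands in for the Arrow buffer here; with Arrow, the address would come from pyarrow.Buffer.address.

```python
import ctypes
import numpy as np

def make_viewer(dtype_):
    """Return a function (address, length) -> ndarray that views raw memory.

    Hypothetical pure-Python analogue of a viewer factory; not JIT-compiled.
    """
    dt = np.dtype(dtype_)

    def viewer(address, length):
        # Wrap the raw bytes at `address` without copying them.
        raw = (ctypes.c_char * (length * dt.itemsize)).from_address(address)
        # Reinterpret those bytes as `length` elements of dtype `dt`.
        return np.frombuffer(raw, dtype=dt, count=length)

    return viewer

# A dtype -> viewer mapping, similar in spirit to a dictionary of viewers.
viewers = {np.dtype(t): make_viewer(t) for t in (np.int32, np.float64)}

# Stand-in for Arrow memory: the owner must stay alive while the view is used.
source = np.arange(5, dtype=np.int32)
view = viewers[np.dtype(np.int32)](source.ctypes.data, source.size)

# Writing through the owner is visible in the view: the memory is shared.
source[0] = 42
```

Because the view does not own its memory, whatever allocated the buffer has to outlive every array created from its address.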

numbarrow.utils.arrow_array_utils

Overview

Higher-level utilities for extracting data from PyArrow array buffers as NumPy arrays. Handles uniform arrays (fixed-width elements), string arrays (variable-length with offset buffers), struct arrays, and list-of-struct arrays. Validity bitmaps are extracted as uint8 arrays for use with is_null().

numbarrow.utils.arrow_array_utils.create_bitmap(bitmap_buf: Buffer | None, offset: int = 0, length: int = 0)

Create a NumPy uint8 array containing the validity bitmap of the array entries, adjusted for the array offset.
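
For orientation, an Arrow validity bitmap stores one bit per entry, least-significant bit first within each byte, with a set bit meaning the entry is valid. The sketch below shows how such a bitmap can be tested; bit_is_set is a hypothetical helper, and how create_bitmap itself applies the offset is not specified here, so the offset parameter is an assumption.

```python
def bit_is_set(bitmap, index, offset=0):
    """Return True if entry `index` is valid in an LSB-first validity bitmap.

    `bitmap` is any indexable sequence of byte values (bytes, uint8 array).
    `offset` is an assumed logical start position, counted in entries.
    """
    pos = offset + index
    # Byte pos // 8 holds the bit; pos % 8 selects the bit inside that byte.
    return ((bitmap[pos >> 3] >> (pos & 7)) & 1) == 1

# 0b00001101: bits 0, 2 and 3 are set, so entry 1 is null.
bitmap = bytes([0b00001101])
validity = [bit_is_set(bitmap, i) for i in range(4)]

# The same bitmap read with an offset of two entries.
shifted = [bit_is_set(bitmap, i, offset=2) for i in range(3)]
```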

numbarrow.utils.arrow_array_utils.create_str_array(pa_str_array: StringArray) -> ndarray

Copy data from a densely packed pa.StringArray into a padded NumPy array whose fixed-width character dtype is determined by the length of the longest string.
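
To make the padding step concrete, here is a minimal sketch that works on the two pieces of the Arrow string layout directly: an offsets array with n + 1 entries and one packed byte string. The function pad_strings is hypothetical, and the choice of a byte-string dtype ("S") is an assumption; the actual function may use a different character dtype.

```python
import numpy as np

def pad_strings(offsets, data):
    """Copy packed variable-length strings into a fixed-width NumPy array.

    offsets: n + 1 integer positions into `data`; string i is
             data[offsets[i]:offsets[i + 1]].
    data:    the packed bytes of all strings, back to back.
    """
    lengths = np.diff(offsets)
    # The widest string sets the item size; NumPy needs a width of at least 1.
    width = max(int(lengths.max()) if len(lengths) else 0, 1)
    out = np.zeros(len(lengths), dtype=f"S{width}")
    for i in range(len(lengths)):
        # Shorter strings are padded with trailing null bytes by NumPy.
        out[i] = data[offsets[i]:offsets[i + 1]]
    return out

# "hi", "there", "numba" packed back to back.
data = b"hitherenumba"
offsets = np.array([0, 2, 7, 12], dtype=np.int32)
padded = pad_strings(offsets, data)
```

Unlike the fixed-width adapters, this step has to copy, because the packed Arrow layout and the padded NumPy layout differ.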

numbarrow.utils.arrow_array_utils.structured_array_adapter(struct_array: StructArray) -> Tuple[ndarray | None, Dict[str, ndarray | None], Dict[str, ndarray]]

NumPy adapter of PyArrow StructArray.

Returns a 3-tuple of: the struct-level validity bitmap (None if all rows are valid), a dictionary mapping field names to per-field validity bitmaps, and a dictionary mapping field names to per-field value arrays.

numbarrow.utils.arrow_array_utils.structured_list_array_adapter(list_array: ListArray) -> Tuple[ndarray | None, Dict[str, ndarray | None], Dict[str, ndarray]]

NumPy adapter of PyArrow array of same-length lists of structures.

Parameters:

list_array – PyArrow array whose elements are of pa.ListType. All lists have the same length, and each list element is of pa.StructType.

Returns a 3-tuple of: the struct-level validity bitmap (or None if all values are valid), a dictionary mapping field names to per-field validity bitmaps (each None if all values are valid), and a dictionary mapping field names to the contiguous field data arrays.

Data is not copied, because it is already stored uniformly in a columnar format: the underlying values are stored contiguously in a pa.StructArray.
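
The consequence of this layout can be sketched with plain NumPy. When every list has the same length, a flat field array of n_lists * list_len values can be viewed per list by reshaping, still without a copy. The array flat_x below is a stand-in for one field's data array; whether the adapter returns flat or reshaped arrays is not stated here, so the reshape is an assumption about how a caller might use the result.

```python
import numpy as np

n_lists, list_len = 3, 2

# Stand-in for one field's contiguous values across all lists.
flat_x = np.arange(n_lists * list_len, dtype=np.float64)

# Row i holds the field values of the structs in list i.
per_list = flat_x.reshape(n_lists, list_len)

# The reshape is a view: both arrays share the same memory.
shares = np.shares_memory(flat_x, per_list)
second_list = per_list[1].tolist()
```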

numbarrow.utils.arrow_array_utils.uniform_arrow_array_adapter(pa_array: Array) -> Tuple[ndarray | None, ndarray]

NumPy adapter for PyArrow arrays with uniformly sized elements. Returns NumPy array views over the contiguous memory regions holding the validity bitmap and the data.