numbarrow.utils
numbarrow.utils.utils
Overview
Low-level pointer utilities for zero-copy access to Arrow memory buffers.
Provides Numba-compatible functions that reinterpret a raw memory address
(from pyarrow.Buffer.address) as a typed NumPy array, enabling @njit
code to read Arrow buffer data directly without copying.
The key abstraction is arrays_viewers — a dictionary mapping NumPy dtypes
to pre-compiled viewer functions. Each viewer takes (address, length) and
returns a NumPy array backed by the memory at that address.
- numbarrow.utils.utils.numpy_array_from_ptr_factory(dtype_)
Create a JIT-compiled function that views memory at a given address as a NumPy array.
Returns an @njit function with signature (ptr_as_int, sz) -> ndarray that uses numba.carray() to reinterpret sz elements starting at address ptr_as_int as a contiguous C-order NumPy array of dtype_. No data is copied: the returned array is a view over the original memory.
- Parameters:
dtype_ – NumPy dtype for the resulting array (e.g. np.int32)
- Returns:
JIT-compiled function (int, int) -> np.ndarray
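The idea of viewing the memory at an address as a typed array can be illustrated without Numba. The sketch below is a plain-Python analogue built on ctypes and NumPy, not the library's implementation. The names view_from_address_factory and viewers are hypothetical, and a NumPy array stands in for an Arrow buffer, with its ctypes.data attribute playing the role of pyarrow.Buffer.address.

```python
import ctypes
import numpy as np


def view_from_address_factory(dtype_):
    """Hypothetical analogue: build a (ptr_as_int, sz) -> ndarray viewer."""
    dtype_ = np.dtype(dtype_)

    def viewer(ptr_as_int, sz):
        # Wrap the existing memory in a ctypes byte buffer; nothing is copied.
        raw = (ctypes.c_char * (sz * dtype_.itemsize)).from_address(ptr_as_int)
        # Reinterpret those bytes as sz elements of dtype_.
        return np.frombuffer(raw, dtype=dtype_, count=sz)

    return viewer


# A dtype-keyed table of viewers, in the spirit of arrays_viewers (assumption).
viewers = {np.dtype(t): view_from_address_factory(t) for t in (np.int32, np.float64)}

source = np.arange(5, dtype=np.int32)          # stand-in for an Arrow buffer
view = viewers[np.dtype(np.int32)](source.ctypes.data, source.size)

source[0] = 42                                  # a write to the source ...
first = int(view[0])                            # ... is visible through the view
```

A view built this way does not keep the underlying memory alive, so the owner of the buffer (here, source) must stay referenced for as long as the view is in use.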
numbarrow.utils.arrow_array_utils
Overview
Utilities for extracting data from PyArrow array buffers as NumPy arrays.
Handles uniform arrays (fixed-width elements), string arrays (variable-length
with offset buffers), struct arrays, and list-of-struct arrays. Validity
bitmaps are extracted as uint8 arrays for use with is_null().
- numbarrow.utils.arrow_array_utils.create_bitmap(bitmap_buf: Buffer | None, offset: int = 0, length: int = 0)
Create numpy array of uint8 type containing bit-map of valid array entries, adjusted for array offset.
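Arrow validity bitmaps pack one bit per element, least-significant bit first, and a set bit marks a valid entry. The sketch below shows how such a uint8 bitmap can be queried. The helper is_valid is a hypothetical stand-in, not the library's is_null(), and it applies the array offset at lookup time, which may differ from how create_bitmap adjusts for it.

```python
import numpy as np


def is_valid(bitmap, i, offset=0):
    """Hypothetical helper: True if logical element i is valid.

    A bitmap of None is treated as "no nulls".
    """
    if bitmap is None:
        return True
    pos = i + offset                      # position of the bit in the buffer
    return bool((bitmap[pos >> 3] >> (pos & 7)) & 1)


# Validity flags for 10 elements, packed LSB-first as Arrow does.
flags = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1], dtype=np.uint8)
bitmap = np.packbits(flags, bitorder="little")

valid = [is_valid(bitmap, i) for i in range(len(flags))]
# With an offset of 2, logical element 0 maps to bit 2 of the buffer.
sliced_valid = [is_valid(bitmap, i, offset=2) for i in range(3)]
```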
- numbarrow.utils.arrow_array_utils.create_str_array(pa_str_array: StringArray) -> ndarray
Copy data from densely packed pa.StringArray into padded numpy array of the character sequence type determined by the length of the longest string.
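The copy from a densely packed layout into a padded one can be sketched with plain NumPy. Here the offsets and data arrays are hand-built stand-ins for the two buffers of a string array, pad_strings is a hypothetical helper, and the fixed-width byte-string dtype S{width} is an assumption; the dtype actually chosen by create_str_array is not stated above. The strings are ASCII, so byte length and character count coincide.

```python
import numpy as np


def pad_strings(offsets, data):
    """Hypothetical helper: copy packed byte strings into a fixed-width array."""
    lengths = np.diff(offsets)
    # The element width is the length of the longest string (at least 1).
    width = max(int(lengths.max()), 1) if lengths.size else 1
    out = np.zeros(lengths.size, dtype=f"S{width}")
    for i in range(lengths.size):
        # Slice i spans data[offsets[i]:offsets[i + 1]]; shorter entries are
        # left zero-padded on the right.
        out[i] = data[offsets[i]:offsets[i + 1]].tobytes()
    return out


data = np.frombuffer(b"hibyehello", dtype=np.uint8)   # packed characters
offsets = np.array([0, 2, 5, 10], dtype=np.int32)     # n + 1 offsets
padded = pad_strings(offsets, data)
```

The padded layout trades memory for uniform element size: every entry occupies as many bytes as the longest string.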
- numbarrow.utils.arrow_array_utils.structured_array_adapter(struct_array: StructArray) -> Tuple[ndarray | None, Dict[str, ndarray | None], Dict[str, ndarray]]
NumPy adapter of PyArrow StructArray.
Returns a 3-tuple:
- struct-level validity bitmap (None if all rows valid)
- dict mapping field names to per-field validity bitmaps
- dict mapping field names to per-field value arrays
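One way to consume a tuple of this shape is to combine the struct-level and field-level bitmaps, since in Arrow a field value is logically null when either its struct row or the field itself is null. The sketch below uses hand-built stand-in values rather than a real adapter result, ignores array offsets, and the helper unpack is hypothetical.

```python
import numpy as np


def unpack(bitmap, n):
    """Hypothetical helper: expand an LSB-first bitmap to n booleans."""
    if bitmap is None:                       # None means every entry is valid
        return np.ones(n, dtype=bool)
    return np.unpackbits(bitmap, count=n, bitorder="little").astype(bool)


n = 4
# Stand-ins shaped like the documented 3-tuple (values are made up).
struct_bitmap = np.array([0b00001101], dtype=np.uint8)        # row 1 is null
field_bitmaps = {"x": None,
                 "y": np.array([0b00000111], dtype=np.uint8)}  # y[3] is null
field_values = {"x": np.array([1, 2, 3, 4], dtype=np.int64),
                "y": np.array([0.5, 1.5, 2.5, 3.5])}

# A value counts as present only if both the row and the field are valid.
mask_x = unpack(struct_bitmap, n) & unpack(field_bitmaps["x"], n)
mask_y = unpack(struct_bitmap, n) & unpack(field_bitmaps["y"], n)
valid_y = field_values["y"][mask_y]
```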
- numbarrow.utils.arrow_array_utils.structured_list_array_adapter(list_array: ListArray) -> Tuple[ndarray | None, Dict[str, ndarray | None], Dict[str, ndarray]]
NumPy adapter of PyArrow array of same-length lists of structures.
- Parameters:
list_array – PyArrow array with elements being of pa.ListType.
Each list is in turn of the same length, and each element of the list is of pa.StructType.
Returns a 3-tuple of: the struct-level validity bitmap (or None if all values are valid), a dictionary mapping field names to per-field validity bitmaps (each None if all values are valid), and a dictionary mapping field names to the contiguous field data arrays.
Data is not copied as it is uniformly stored in a columnar format, that is, the underlying values are stored contiguously in a pa.StructArray.
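Because every list has the same length and each field's values sit in one contiguous array, a per-list view of a field can be obtained by reshaping without copying. The sketch below is a possible follow-on step under that assumption, using a made-up flat array as a stand-in for one field. Whether the adapter itself returns flat or reshaped arrays is not stated above.

```python
import numpy as np

n_lists, list_len = 3, 2

# Stand-in for one field's contiguous values across all lists (made-up data).
flat = np.arange(n_lists * list_len, dtype=np.float64)

# Row k holds the field values of the structs in list k; reshape returns a view.
per_list = flat.reshape(n_lists, list_len)

flat[2] = 20.0                      # a write to the flat array ...
second_list = per_list[1].tolist()  # ... shows up in the reshaped view
```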