numbarrow.utils
numbarrow.utils.utils
Overview
Low-level pointer utilities for zero-copy access to Arrow memory buffers.
Provides Numba-compatible functions that reinterpret a raw memory address
(from pyarrow.Buffer.address) as a typed NumPy array, enabling @njit
code to read Arrow buffer data directly without copying.
The key abstraction is arrays_viewers — a dictionary mapping NumPy dtypes
to pre-compiled viewer functions. Each viewer takes (address, length) and
returns a NumPy array backed by the memory at that address.
- numbarrow.utils.utils.numpy_array_from_ptr_factory(dtype_)
Create a JIT-compiled function that views memory at a given address as a NumPy array.
Returns an @njit function with signature (ptr_as_int, sz) -> ndarray that uses numba.carray() to reinterpret sz elements starting at address ptr_as_int as a contiguous C-order NumPy array of dtype_. No data is copied; the returned array is a view over the original memory.
- Parameters:
dtype_ – NumPy dtype for the resulting array (e.g. np.int32)
- Returns:
JIT-compiled function (int, int) -> np.ndarray
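The idea of viewing memory at a raw address as a typed array can be illustrated without Numba. The sketch below is a plain-Python analogue built on ctypes and NumPy, not the library's implementation: array_from_ptr_factory_sketch and viewers_sketch are hypothetical names, and keying the dictionary by np.dtype objects is an assumption about how a dtype-to-viewer mapping might be organised.

```python
import ctypes

import numpy as np


def array_from_ptr_factory_sketch(dtype_):
    """Return a function (ptr_as_int, sz) -> ndarray viewing memory at ptr_as_int.

    Hypothetical stand-in: uses ctypes instead of numba.carray().
    """
    dtype_ = np.dtype(dtype_)

    def viewer(ptr_as_int, sz):
        # Wrap the raw address in a ctypes byte array; no bytes are copied.
        raw = (ctypes.c_uint8 * (sz * dtype_.itemsize)).from_address(ptr_as_int)
        # Reinterpret those bytes as sz contiguous elements of dtype_.
        return np.frombuffer(raw, dtype=dtype_, count=sz)

    return viewer


# Assumed layout of a dtype -> viewer mapping (hypothetical).
viewers_sketch = {
    np.dtype(t): array_from_ptr_factory_sketch(t) for t in (np.int32, np.float64)
}

# Use a NumPy array as the memory owner; a pyarrow.Buffer.address would play
# the same role for Arrow data.
source = np.arange(5, dtype=np.int32)
view = viewers_sketch[np.dtype(np.int32)](source.ctypes.data, source.size)

# Writing through the owner is visible through the view: same memory.
source[0] = 42
```

As with any pointer-based view, the object that owns the memory (here source) has to stay alive for as long as the view is in use.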
numbarrow.utils.arrow_array_utils
Overview
Higher-level utilities for extracting data from PyArrow array buffers as NumPy arrays. Handles uniform arrays (fixed-width elements), string arrays (variable-length with offset buffers), struct arrays, and list-of-struct arrays.
Validity bitmaps are extracted as uint8 arrays for use with is_null().
- numbarrow.utils.arrow_array_utils.create_bitmap(bitmap_buf: Buffer | None)
Create a NumPy array of type uint8 containing the validity bitmap of the array entries.
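To show how a uint8 validity bitmap can be read, here is a minimal sketch assuming the Arrow convention that bits are packed least-significant-bit first and that a set bit marks a valid entry. The function is_null_sketch is a hypothetical illustration; the signature of the library's own is_null() is not given in this document.

```python
import numpy as np


def is_null_sketch(bitmap, i):
    """Return True when bit i of the validity bitmap is 0 (entry i is null)."""
    # Byte index is i // 8; bit index within the byte is i % 8 (LSB first).
    return bool(((bitmap[i >> 3] >> (i & 7)) & 1) == 0)


# Ten entries; 1 marks a valid entry, 0 marks a null.
valid = np.array([1, 0, 1, 1, 0, 1, 1, 1, 1, 0], dtype=np.uint8)

# Pack into bytes using the same LSB-first bit order.
bitmap = np.packbits(valid, bitorder="little")

null_positions = [i for i in range(valid.size) if is_null_sketch(bitmap, i)]
```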
- numbarrow.utils.arrow_array_utils.create_str_array(pa_str_array: StringArray) -> ndarray
Copy data from a densely packed pa.StringArray into a padded NumPy array whose character-sequence dtype is determined by the length of the longest string.
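The conversion from a densely packed layout to a padded one can be sketched with NumPy alone. The example assumes the usual variable-length layout (one contiguous data buffer plus n + 1 offsets) and a fixed-width bytes dtype for the result; padded_from_offsets_sketch is a hypothetical helper, and both the actual dtype chosen by create_str_array and its handling of nulls may differ.

```python
import numpy as np


def padded_from_offsets_sketch(data, offsets):
    """Copy variable-length byte strings into a fixed-width NumPy array."""
    n = len(offsets) - 1
    lengths = np.diff(offsets)
    # Width is the longest string; at least 1 so the dtype is well defined.
    width = max(int(lengths.max()) if n else 0, 1)
    out = np.zeros(n, dtype=f"S{width}")
    for i in range(n):
        # String i occupies data[offsets[i]:offsets[i + 1]].
        out[i] = data[offsets[i]:offsets[i + 1]]
    return out


# "ab", "" and "hello" packed back to back.
data = b"abhello"
offsets = np.array([0, 2, 2, 7], dtype=np.int32)

padded = padded_from_offsets_sketch(data, offsets)
```

Unlike the pointer-based viewers, this step has to copy: the packed strings have different lengths, so they cannot be exposed as a fixed-width array in place.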
- numbarrow.utils.arrow_array_utils.structured_array_adapter(struct_array: StructArray) -> Tuple[Dict[str, ndarray], Dict[str, ndarray]]
NumPy adapter for a PyArrow StructArray.
Returns a tuple of two dictionaries: the first maps the names of the struct fields to contiguous bitmap arrays, and the second maps the same names to contiguous value arrays.
- numbarrow.utils.arrow_array_utils.structured_list_array_adapter(list_array: ListArray) -> Tuple[Dict[str, ndarray], Dict[str, ndarray]]
NumPy adapter for a PyArrow array of same-length lists of structures.
- Parameters:
list_array – PyArrow array whose elements are of type pa.ListType.
All lists have the same length, and each list element is of type pa.StructType.
Returns a tuple of two dictionaries: the first maps the names of the struct fields to contiguous bitmap arrays, and the second maps the same names to contiguous value arrays.
Data is not copied, because it is already stored uniformly in a columnar format: the underlying values are stored contiguously in a pa.StructArray.
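The reason same-length lists need no copy can be seen with NumPy alone. In the sketch below, flat_x stands in for the contiguous values of one struct field (a hypothetical field named x); because every list has the same length, list k occupies a fixed-size slice of that flat array, and a reshape exposes the per-list structure as a view. Whether the adapter returns flat or reshaped arrays is not stated here, so the reshape is only an illustration.

```python
import numpy as np

n_lists, list_len = 3, 2

# Contiguous values of one struct field across all lists (hypothetical data).
flat_x = np.arange(n_lists * list_len, dtype=np.float64)

# List k covers flat_x[k * list_len:(k + 1) * list_len]; reshape makes that
# explicit without copying any data.
per_list_x = flat_x.reshape(n_lists, list_len)

# Both arrays refer to the same memory.
same_memory = np.shares_memory(flat_x, per_list_x)
```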