from_pyarrow

from_pyarrow(table: Table, reject_nesting: list[str] | str | None = None, autocast_list: bool = False, use_pandas_metadata: bool = True) -> NestedFrame

Load a pyarrow Table object into a NestedFrame.

Parameters:
  • table (pa.Table) – PyArrow Table object to load NestedFrame from

  • reject_nesting (list or str, default=None) – Column(s) to exclude from casting to a nested dtype. By default, nested-pandas treats any struct column whose fields are all lists as castable to a nested column. That assumption fails if, for any row, the lists within the struct have different lengths. Columns specified here are read with the corresponding pandas.ArrowDtype instead.

  • autocast_list (bool, default=False) – If True, automatically cast list columns to nested columns with NestedDtype.

  • use_pandas_metadata (bool, default=True) – If True, apply the pandas metadata stored in the table's schema when constructing the NestedFrame (e.g. restoring the index and column dtypes). This matches the default behavior of pd.read_parquet. Set to False to ignore the metadata.

Return type:

NestedFrame

Examples

>>> import nested_pandas as npd
>>> import pyarrow as pa
>>> table = pa.table({
...     "obj_id": [1, 2, 3],
...     "nested": pa.array([
...         [{"flux": 0.5, "time": 1}],
...         [{"flux": 1.2, "time": 2}, {"flux": 0.8, "time": 3}],
...         [{"flux": 2.0, "time": 4}],
...     ])
... })
>>> npd.from_pyarrow(table)
   obj_id                              nested
0       1              [{flux: 0.5, time: 1}]
1       2  [{flux: 1.2, time: 2}; …] (2 rows)
2       3              [{flux: 2.0, time: 4}]
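
The reject_nesting description says a struct-of-lists column can only be treated as nested when, for every row, the lists inside the struct have the same length. The following pure-Python sketch models that rule to make it concrete. It is an illustration only, not the check nested-pandas performs internally; the helper names lists_are_aligned and column_is_castable are hypothetical, and plain dicts of lists stand in for Arrow struct values.

```python
# Illustrative model of the rule described for reject_nesting.
# NOT nested-pandas code: helper names and data layout are assumptions.


def lists_are_aligned(struct_row: dict[str, list]) -> bool:
    """Return True if every list field in one struct value has the same length."""
    lengths = {len(values) for values in struct_row.values()}
    return len(lengths) <= 1


def column_is_castable(struct_column: list[dict[str, list]]) -> bool:
    """Return True if every row of a struct-of-lists column has aligned lists."""
    return all(lists_are_aligned(row) for row in struct_column)


# Each element is one row: a struct whose fields are lists.
aligned = [
    {"flux": [0.5], "time": [1]},
    {"flux": [1.2, 0.8], "time": [2, 3]},
]
mismatched = [
    {"flux": [0.5], "time": [1]},
    {"flux": [1.2, 0.8], "time": [2]},  # two flux values but only one time
]

print(column_is_castable(aligned))     # True
print(column_is_castable(mismatched))  # False
```

A column shaped like mismatched is the kind you would name in reject_nesting, for example npd.from_pyarrow(table, reject_nesting="nested") for a column called "nested", so that it is read with pandas.ArrowDtype rather than cast to a nested column.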