Nested-Pandas

An extension of pandas for efficient representation of nested associated datasets.

Nested-Pandas extends the pandas package with tooling and support for nested dataframes, which are packed into the values of columns in a top-level dataframe. PyArrow is used internally to aid scalability and performance.

Nested-Pandas allows data like this:

To instead be represented like this:

Where the nested data is represented as nested dataframes:

# Each row of "object_nf" now has its own sub-dataframe of matched rows from "source_df"
object_nf.loc[0]["nested_sources"]

This allows powerful and straightforward operations, such as:

# Compute the mean flux for each row of "object_nf"
import numpy as np
object_nf.map_rows(np.mean, "nested_sources.flux")

Nested-Pandas is motivated by time-domain astronomy use cases, where data typically has two levels of information: information about astronomical objects, and an associated set of N measurements of each object. Nested-Pandas offers a performant and memory-efficient package for working with these types of datasets.

Its core advantages are:

hierarchical column access
efficient packing of nested information into inputs to custom user functions
avoiding costly groupby operations
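
For comparison, the flat-table workflow that the last point refers to can be sketched in plain pandas. This is only an illustrative baseline and does not use nested-pandas itself; the table name object_df and the column names object_id, ra, dec, and flux are hypothetical, chosen for the example.

```python
import pandas as pd

# Hypothetical object-level table: one row per astronomical object.
object_df = pd.DataFrame(
    {"ra": [10.0, 20.0, 30.0], "dec": [-5.0, 0.0, 5.0]},
    index=pd.Index([0, 1, 2], name="object_id"),
)

# Hypothetical source-level table: several measurements per object,
# linked back to the object table through "object_id".
source_df = pd.DataFrame(
    {
        "object_id": [0, 0, 1, 1, 1, 2],
        "flux": [1.0, 3.0, 2.0, 4.0, 6.0, 5.0],
    }
)

# Flat-table approach: group the measurements by object, aggregate,
# then join the per-object result back onto the object table.
mean_flux = source_df.groupby("object_id")["flux"].mean().rename("mean_flux")
result = object_df.join(mean_flux)

print(result)
```

In a nested representation, each object's measurements are already packed alongside its row, so a per-row operation such as the map_rows call shown earlier would not need this separate group-and-join step.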

How to Use This Guide

Begin with the Getting Started guide to learn the basics of installation and walk through a simple example of using nested-pandas.

The Tutorials section showcases the fundamental features of nested-pandas.

API-level information about nested-pandas is viewable in the API Reference section.

The About Nested-Pandas section provides information on the design and performance advantages of nested-pandas.

Learn more about contributing to this repository in our Contribution Guide.
