describe

NestedFrame.describe(exclude_nest: bool = False, percentiles=None, include=None, exclude=None)

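Generate descriptive statistics for the base columns and for nested columns. Each nested sub-column is labeled with the name of its nested column as a prefix (for example, nested.flux) to indicate its source.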

Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values, similar to the behavior of pandas.DataFrame.describe().

Nested columns use pyarrow data types for efficiency, which are not always directly compatible with pandas’ type-based filtering.

  • pyarrow strings are not treated as the object dtype, so filtering with include=[object] will not select them.

  • numerical types from pyarrow (e.g., int, double) are still matched by pandas’ np.number, so filtering with include=[np.number] will include numeric nested columns.

Parameters:
  • exclude_nest (bool, default False) – If True, exclude the nested columns and compute the statistics over the base columns only.

  • percentiles (list-like of numbers, optional) – The percentiles to include in the output. All should fall between 0 and 1. Defaults to [.25, .5, .75].

  • include ('all', list-like of dtypes or None (default), optional) – A white list of data types to include in the output.

  • exclude (list-like of dtypes or None (default), optional) – A black list of data types to exclude from the output.

Returns:

A NestedFrame with the summary statistics.

Return type:

NestedFrame

Raises:

ValueError – If no statistics can be generated from the columns. A combined error message will be given.

Examples

>>> from nested_pandas.datasets.generation import generate_data
>>> nf = generate_data(5, 5, seed=1)
>>> nf_desc = nf.describe()
>>> nf_desc
              a         b   nested.t  nested.flux  nested.flux_error
count  5.000000  5.000000       25.0         25.0               25.0
mean   0.317310  0.623897  10.095623    45.252724                1.0
std    0.274904  0.351880   6.434858    30.152261                0.0
min    0.000114  0.184677   0.547752     1.828828                1.0
25%    0.146756  0.372520    3.96203    21.162812                1.0
50%    0.302333  0.691121  10.663306    44.789353                1.0
75%    0.417022  0.793535  16.014891    69.975836                1.0
max    0.720324  1.077633  19.365232    98.886109                1.0

See Also

pandas.DataFrame.describe()