describe
- NestedFrame.describe(exclude_nest: bool = False, percentiles=None, include=None, exclude=None)
Generate descriptive statistics for base and nested columns. Nested sub-columns are prefixed with the name of their nested column (e.g. nested.flux) to indicate their source.
Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values, similar to the behavior of pandas.DataFrame.describe().
Nested columns use pyarrow data types for efficiency, and these are not always compatible with pandas’ type-based filtering (the include and exclude parameters):
pyarrow strings are not treated as object dtype, so filtering on object does not select them.
Numeric pyarrow types (e.g., int, double) are still matched by np.number, so include=[np.number] selects numeric nested columns.
- Parameters:
exclude_nest (bool, default False) – If True, exclude the nested columns and compute statistics over the base columns only.
percentiles (list-like of numbers, optional) – The percentiles to include in the output. All should fall between 0 and 1. Defaults to [.25, .5, .75].
include ('all', list-like of dtypes or None (default), optional) – A white list of data types to include in the output.
exclude (list-like of dtypes or None (default), optional) – A black list of data types to exclude from the output.
- Returns:
A NestedFrame with the summary statistics.
- Return type:
NestedFrame
- Raises:
ValueError – If no statistics can be generated from the columns. A combined error message will be given.
Examples
>>> from nested_pandas.datasets.generation import generate_data
>>> nf = generate_data(5, 5, seed=1)
>>> nf_desc = nf.describe()
>>> nf_desc
              a         b   nested.t  nested.flux  nested.flux_error
count  5.000000  5.000000       25.0         25.0               25.0
mean   0.317310  0.623897  10.095623    45.252724                1.0
std    0.274904  0.351880   6.434858    30.152261                0.0
min    0.000114  0.184677   0.547752     1.828828                1.0
25%    0.146756  0.372520    3.96203    21.162812                1.0
50%    0.302333  0.691121  10.663306    44.789353                1.0
75%    0.417022  0.793535  16.014891    69.975836                1.0
max    0.720324  1.077633  19.365232    98.886109                1.0
See Also