dropna
- NestedFrame.dropna(*, axis: int | Literal['index', 'columns', 'rows'] = 0, how: Literal['any', 'all', _NoDefault.no_default] = <no_default>, thresh: int | Literal[_NoDefault.no_default] = <no_default>, on_nested: bool = False, subset: Hashable | Sequence[Hashable] | None = None, inplace: bool = False, ignore_index: bool = False) -> NestedFrame | None
Remove missing values for one layer of the NestedFrame.
- Parameters:
axis ({0 or 'index', 1 or 'columns'}, default 0) –
Determine if rows or columns which contain missing values are removed.
0, or 'index' : Drop rows which contain missing values.
1, or 'columns' : Drop columns which contain missing values.
Only a single axis is allowed.
how ({'any', 'all'}, default 'any') –
Determine whether a row or column is removed when it contains at least one NA value or only NA values.
'any' : If any NA values are present, drop that row or column.
'all' : If all values are NA, drop that row or column.
thresh (int, optional) – Require that many non-NA values. Cannot be combined with how.
on_nested (str or bool, optional) – If not False, applies the call to the nested dataframe in the column with label equal to the provided string. If specified, the nested dataframe should align with any columns given in subset.
subset (column label or sequence of labels, optional) –
Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.
Access nested columns using nested_df.nested_col (where nested_df refers to a particular nested dataframe and nested_col is a column of that nested dataframe).
inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one.
ignore_index (bool, default False) –
If True, the resulting axis will be labeled 0, 1, …, n - 1.
Added in version 2.0.0.
- Returns:
DataFrame with NA entries dropped from it, or None if inplace=True.
- Return type:
DataFrame or None
Examples
A common use case for dropna is to remove empty nested rows:
>>> from nested_pandas.datasets.generation import generate_data
>>> nf = generate_data(5, 5, seed=1)

>>> # this query empties several of the nested dataframes
>>> nf = nf.query("nested.t > 19")
>>> nf
          a         b                                             nested
0  0.417022  0.184677                                               None
1  0.720324  0.372520  [{t: 19.365232, flux: 90.85955, flux_error: 1....
2  0.000114  0.691121  [{t: 19.157791, flux: 14.672857, flux_error: 1...
3  0.302333  0.793535                                               None
4  0.146756  1.077633                                               None

>>> # dropna removes rows with those emptied dataframes
>>> nf.dropna(subset="nested")
          a         b                                             nested
1  0.720324  0.372520  [{t: 19.365232, flux: 90.85955, flux_error: 1....
2  0.000114  0.691121  [{t: 19.157791, flux: 14.672857, flux_error: 1...
dropna can also be used on nested columns:
>>> nf = generate_data(5, 5, seed=1)
>>> # Either on the whole dataframe
>>> nf.dropna(on_nested="nested")
          a         b                                             nested
0  0.417022  0.184677  [{t: 8.38389, flux: 31.551563, flux_error: 1.0...
1  0.720324  0.372520  [{t: 13.70439, flux: 68.650093, flux_error: 1....
2  0.000114  0.691121  [{t: 4.089045, flux: 83.462567, flux_error: 1....
3  0.302333  0.793535  [{t: 17.562349, flux: 1.828828, flux_error: 1....
4  0.146756  1.077633  [{t: 0.547752, flux: 75.014431, flux_error: 1....
>>> # or on a specific nested column
>>> nf.dropna(subset="nested.t")
          a         b                                             nested
0  0.417022  0.184677  [{t: 8.38389, flux: 31.551563, flux_error: 1.0...
1  0.720324  0.372520  [{t: 13.70439, flux: 68.650093, flux_error: 1....
2  0.000114  0.691121  [{t: 4.089045, flux: 83.462567, flux_error: 1....
3  0.302333  0.793535  [{t: 17.562349, flux: 1.828828, flux_error: 1....
4  0.146756  1.077633  [{t: 0.547752, flux: 75.014431, flux_error: 1....
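The how and thresh rules described under Parameters can be modeled in a few lines of plain Python. This is only a sketch of the row-dropping decision, not the library's implementation: keep_row and the sample rows are hypothetical names and data invented for illustration, and None stands in for NA.

```python
def keep_row(values, how="any", thresh=None):
    """Return True if a row with these values would survive the drop.

    Hypothetical helper that models the documented rule only; None plays
    the role of NA. The documentation says thresh cannot be combined with
    how, so this sketch assumes the caller passes only one of them.
    """
    non_na = sum(v is not None for v in values)
    if thresh is not None:
        # thresh: require at least this many non-NA values
        return non_na >= thresh
    if how == "any":
        # drop if any value is NA, so only fully populated rows survive
        return non_na == len(values)
    if how == "all":
        # drop only if every value is NA, so one non-NA value is enough
        return non_na > 0
    raise ValueError(f"invalid how: {how!r}")


rows = [
    (1.0, 2.0, 3.0),     # no NA
    (1.0, None, 3.0),    # one NA
    (None, None, None),  # all NA
]

kept_any = [r for r in rows if keep_row(r, how="any")]
kept_all = [r for r in rows if keep_row(r, how="all")]
kept_thresh = [r for r in rows if keep_row(r, thresh=2)]
```

In this model, how="any" keeps only the first row, while how="all" and thresh=2 both keep the first two rows.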
Notes
Operations that target a particular nested structure return a dataframe in which only the rows of that nested structure are affected.
Values for on_nested and subset should consistently point to a single layer; multi-layer operations are not supported.
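To make the single-layer behavior concrete, the following plain-Python sketch models a nested frame as a list of base rows, each holding a list of nested records. It is an assumed reading of the notes above, not nested-pandas code: frame, drop_base, and drop_nested are hypothetical names, and None stands in for NA.

```python
# Hypothetical stand-in for a nested frame: each base row has a scalar
# column "a" plus a "nested" list of records. None marks a missing value.
frame = [
    {"a": 0.4, "nested": [{"t": 1.0, "flux": None}, {"t": 2.0, "flux": 5.0}]},
    {"a": None, "nested": [{"t": 3.0, "flux": 7.0}]},
]


def drop_base(frame, subset):
    """Base layer: remove whole base rows that have NA in the subset columns."""
    return [row for row in frame if all(row[c] is not None for c in subset)]


def drop_nested(frame, nested_col, subset):
    """Nested layer: keep every base row, remove only nested records with NA."""
    result = []
    for row in frame:
        kept = [
            rec for rec in row[nested_col]
            if all(rec[c] is not None for c in subset)
        ]
        # copy the base row and swap in the filtered nested records
        result.append({**row, nested_col: kept})
    return result


# Targets the base layer: the second base row (a is None) is removed.
base_result = drop_base(frame, subset=["a"])

# Targets the nested layer: both base rows remain, one nested record is removed.
nested_result = drop_nested(frame, "nested", subset=["flux"])
```

Each call touches exactly one layer, which mirrors the requirement that on_nested and subset point to the same layer.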