join_nested

NestedFrame.join_nested(obj, name: str, *, how: str = 'left', on: None | str | list[str] = None, dtype: NestedDtype | pd.ArrowDtype | pa.DataType | None = None) → Self

Packs the input object into a nested column and adds it to the NestedFrame.

This method returns a new NestedFrame with the added nested column.
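Conceptually, packing groups the rows of the input by index value, so that every index value becomes a single nested element holding a list of records. The sketch below models that idea with plain Python dictionaries for the default `how="left"` case; the helpers `pack_by_index` and `join_packed_left` are hypothetical and are not part of the nested-pandas API or its implementation.

```python
from collections import defaultdict


def pack_by_index(index, rows):
    """Group flat rows into one list of records per index value.

    Hypothetical helper illustrating the packing idea only.
    """
    packed = defaultdict(list)
    for idx, row in zip(index, rows):
        packed[idx].append(row)
    return dict(packed)


def join_packed_left(calling_index, packed):
    """Attach packed records to each calling-frame index value (how="left").

    Index values without any packed rows get None, i.e. a missing nested element.
    """
    return {idx: packed.get(idx) for idx in calling_index}


# Flat data shaped like nf2 in the first example: column "c", repeated index.
index = [0, 0, 0, 1, 1, 1, 2, 2, 2]
rows = [{"c": v} for v in range(1, 10)]

nested = join_packed_left([0, 1, 2], pack_by_index(index, rows))
print(nested[0])  # [{'c': 1}, {'c': 2}, {'c': 3}]
```

Each calling index value ends up with exactly one nested element, which is why the result keeps one row per row of the calling frame.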

Parameters:
  • obj (pd.DataFrame or a sequence of items convertible to nested structures) – The object to be packed into a nested pd.Series and added to the NestedFrame. If a DataFrame is passed, its index is used to pack it: rows that share the same index value (the index is typically non-unique) are packed into a single nested element. If a sequence is passed, it is packed into a nested pd.Series element by element. Sequence elements may be individual pd.DataFrames, dictionaries (keys are nested column names, values are arrays of the same length), or any other object convertible to pa.StructArray. None and pd.NA are also allowed as elements to represent missing values.

  • name (str) – The name of the nested column to be joined to the NestedFrame.

  • how ({'left', 'right', 'outer', 'inner'}, default: 'left') –

    How to handle the operation of the two objects:

    • left: use calling frame’s index.

    • right: use the calling frame’s index and order but drop values not in the other frame’s index.

    • outer: form union of calling frame’s index with other frame’s index, and sort it lexicographically.

    • inner: form intersection of calling frame’s index with other frame’s index, preserving the order of the calling index.

  • on (str or list of str, default: None) – Name(s) of the key(s) to join on instead of the index. In the calling frame the keys may be columns or index levels; in obj they are columns, which are used only as join keys and are dropped from the nested structure. The original index of the calling frame is always preserved.

  • dtype (NestedDtype, pd.ArrowDtype, pa.DataType or None, default: None) – NestedDtype to use for the nested column; a pd.ArrowDtype or pa.DataType can also be used to specify the nested dtype. If None, the dtype is inferred from the input object.
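The `left`, `inner` and `outer` options described under `how` can be modeled as operations on the index labels alone. The sketch below is a hypothetical helper, not nested-pandas code; it assumes both inputs have unique labels (the packed object has one element per index value) and does not model `right`.

```python
def aligned_index(calling, other, how="left"):
    """Resulting index labels for how in {"left", "inner", "outer"}.

    Hypothetical helper following the parameter descriptions above;
    "right" is deliberately not modeled.
    """
    other_set = set(other)
    if how == "left":
        # Keep the calling frame's index exactly as it is.
        return list(calling)
    if how == "inner":
        # Intersection, preserving the calling frame's order.
        return [label for label in calling if label in other_set]
    if how == "outer":
        # Union of both indexes, sorted.
        return sorted(set(calling) | other_set)
    raise ValueError(f"unsupported how: {how!r}")


calling = [2, 0, 1]
other = [0, 1, 3]
print(aligned_index(calling, other, "left"))   # [2, 0, 1]
print(aligned_index(calling, other, "inner"))  # [0, 1]
print(aligned_index(calling, other, "outer"))  # [0, 1, 2, 3]
```

With `left`, calling labels absent from the other index (here `2`) would receive a missing nested element; with `inner` they are dropped.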

Returns:

A new NestedFrame with the joined nested column.

Return type:

NestedFrame

Examples

>>> import nested_pandas as npd
>>> nf = npd.NestedFrame({"a": [1, 2, 3], "b": [4, 5, 6]},
...                      index=[0, 1, 2])
>>> nf2 = npd.NestedFrame({"c": [1, 2, 3, 4, 5, 6, 7, 8, 9]},
...                       index=[0, 0, 0, 1, 1, 1, 2, 2, 2])
>>> # By default, aligns on the index
>>> nf.join_nested(nf2, "nested")
   a  b                nested
0  1  4  [{c: 1}; …] (3 rows)
1  2  5  [{c: 4}; …] (3 rows)
2  3  6  [{c: 7}; …] (3 rows)
>>> # We can also align on key columns; the index of nf is preserved.
>>> # Here "a" and "b" are index levels of nf and regular columns of nf2.
>>> nf = npd.NestedFrame({"a": [1, 2, 2, 3], "b": [4, 4, 5, 6]}).set_index(["a", "b"])
>>> nf2 = npd.NestedFrame({"a": [1, 2, 2, 2], "b": [4, 4, 4, 5], "c": [1, 2, 3, 4]})
>>> nf.join_nested(nf2, "nested", on=["a", "b"])
                    nested
a b
1 4              [{c: 1}]
2 4  [{c: 2}; …] (2 rows)
  5              [{c: 4}]
3 6                  None
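The key-based join in the second example can be mimicked in plain Python: pack the rows of the other object by their key tuple, drop the key columns from each record, and look up every key of the calling frame. This is a minimal sketch; `join_on_keys` is a hypothetical helper and not how nested-pandas is implemented.

```python
from collections import defaultdict


def join_on_keys(calling_keys, obj_rows, on):
    """Pack obj_rows by the key columns in `on` and look them up per calling key.

    Hypothetical helper: key columns are dropped from each packed record,
    and calling keys with no match get None.
    """
    packed = defaultdict(list)
    for row in obj_rows:
        key = tuple(row[col] for col in on)
        packed[key].append({k: v for k, v in row.items() if k not in on})
    # dict.get does not trigger the defaultdict factory, so misses give None.
    return [(key, packed.get(key)) for key in calling_keys]


calling_keys = [(1, 4), (2, 4), (2, 5), (3, 6)]  # the (a, b) index of nf
obj_rows = [
    {"a": 1, "b": 4, "c": 1},
    {"a": 2, "b": 4, "c": 2},
    {"a": 2, "b": 4, "c": 3},
    {"a": 2, "b": 5, "c": 4},
]
for key, nested in join_on_keys(calling_keys, obj_rows, on=["a", "b"]):
    print(key, nested)
# (1, 4) [{'c': 1}]
# (2, 4) [{'c': 2}, {'c': 3}]
# (2, 5) [{'c': 4}]
# (3, 6) None
```

The printed groups line up with the doctest output above, including the missing element for the key (3, 6).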