generate_parquet_file


generate_parquet_file(n_base, n_layer, path, seed=None)

Generates a toy dataset and outputs it as a parquet file.

Parameters:
  • n_base (int) – The number of rows to generate for the base layer

  • n_layer (int or dict) – The number of rows to generate in a nested layer for each base-layer row. Alternatively, a dictionary of (layer label, layer size) pairs may be specified to create multiple nested columns with custom sizing.

  • path (str) – The path to the parquet file to write.

  • seed (int, default=None) – A seed to use for random generation of data

Return type:

None
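
To make the sizing implied by n_base and n_layer concrete, the sketch below computes how many nested rows each layer would hold in total. It is an illustration only: expected_layer_rows is a hypothetical helper, not part of the library, and the default label "nested" for the integer case is an assumption rather than something stated above.

```python
def expected_layer_rows(n_base, n_layer, default_label="nested"):
    """Hypothetical helper (not part of the library): total nested rows per layer."""
    # An int describes a single nested layer; the label used for it is an assumption.
    if isinstance(n_layer, int):
        n_layer = {default_label: n_layer}
    # Each base row gets `size` nested rows, so a layer holds n_base * size rows.
    return {label: n_base * size for label, size in n_layer.items()}


# One nested layer with 5 rows per base row.
single = expected_layer_rows(10, 5)
# Two nested layers with custom sizes.
multi = expected_layer_rows(10, {"a": 5, "b": 3})
print(single)
print(multi)
```

Under the same reading of the parameters, a call such as generate_parquet_file(10, {"a": 5, "b": 3}, "toy.parquet", seed=1) would write 10 base rows with two nested columns, "a" and "b", to the hypothetical path toy.parquet.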