Subject: How to Initialize a Delta Table That Doesn't Exist Using Kedro?
Hello everyone,
I’m facing an issue related to the Delta library. When I attempt to read a Delta table using spark.DeltaTableDataset, I receive a message stating that the table does not exist. This is expected since the table hasn't been created yet. However, my goal is to initialize the table with data that I will subsequently provide.
Unfortunately, DeltaTableDataset does not support write operations. Does anyone know how to handle the initialization of a Delta table in this scenario?
Currently, I am working on a custom hook using the @hook_impl decorator:
@hook_impl
def before_dataset_loaded(self, dataset_name: str, node: Node) -> None:
    # My logic to initialize the Delta table
    ...
The idea is to initialize the Delta table (if it doesn’t already exist) using PySpark within this hook. However, I am struggling to dynamically retrieve the schema of the table for its creation.
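One way the body of such a hook could look, as a rough sketch only: it assumes delta-spark is installed and that the schema is declared up front rather than discovered dynamically. The registry `DELTA_TABLE_SPECS`, the helpers `lookup_spec` and `ensure_delta_table`, and the example dataset name, path and columns are all hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical registry: catalog dataset name -> (table path, Spark DDL schema).
# The name, path and columns below are made up for illustration.
DELTA_TABLE_SPECS = {
    "customers": ("data/02_intermediate/customers", "id BIGINT, name STRING"),
}


def lookup_spec(dataset_name: str) -> Optional[Tuple[str, str]]:
    """Return (path, ddl_schema) for a registered dataset, else None."""
    return DELTA_TABLE_SPECS.get(dataset_name)


def ensure_delta_table(spark, dataset_name: str) -> bool:
    """Create an empty Delta table for dataset_name if none exists yet.

    Returns True when a table was created, False otherwise.
    """
    spec = lookup_spec(dataset_name)
    if spec is None:
        return False  # not a dataset this helper manages

    # Imported lazily so this module loads without delta-spark installed.
    from delta.tables import DeltaTable

    path, ddl_schema = spec
    if DeltaTable.isDeltaTable(spark, path):
        return False  # already initialized, nothing to do

    # An empty DataFrame carries the schema; writing it creates the table.
    empty_df = spark.createDataFrame([], schema=ddl_schema)
    empty_df.write.format("delta").save(path)
    return True
```

Inside before_dataset_loaded, this could be called as ensure_delta_table(spark, dataset_name), with spark obtained from SparkSession.builder.getOrCreate(). Declaring the schema explicitly sidesteps the dynamic-schema problem rather than solving it.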
Hey @Mohamed El Guendouz, our @juanlu tried this hack, DeltaTable.is_deltatable(), before when working with Delta tables.
https://github.com/delta-io/delta-rs/pull/2715
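For reference, a minimal sketch of how that check could guard table creation, assuming the deltalake Python package (delta-rs) is installed. Only DeltaTable.is_deltatable comes from the library; `is_delta_table`, `init_if_missing` and the `create`/`exists` parameters are hypothetical names for illustration.

```python
from typing import Callable, Optional


def is_delta_table(table_uri: str) -> bool:
    """Thin wrapper around deltalake's DeltaTable.is_deltatable."""
    # Imported lazily so this module loads even without deltalake installed.
    from deltalake import DeltaTable

    return DeltaTable.is_deltatable(table_uri)


def init_if_missing(
    table_uri: str,
    create: Callable[[str], None],
    exists: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Call create(table_uri) only when no Delta table is found there.

    Returns True if create was called, False otherwise.
    """
    # Fall back to the deltalake-based check unless a custom one is supplied.
    check = exists if exists is not None else is_delta_table
    if check(table_uri):
        return False
    create(table_uri)
    return True
```

The create callback is where the actual initialization would go (for example, writing an empty frame with the desired schema); it is left open here because the thread does not say how the table was created.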
Thank you @Huong Nguyen! 🙂
Ultimately, I created a custom Dataset to give it a specific schema.