Hello Kedro Community,
I am working on a project where I need to store a Spark DataFrame in Delta format using Kedro. Specifically, I want the data to be written exactly as in the following function:
```python
from delta.tables import DeltaTable


def export_results_to_delta(summary_df, output_path="/mnt/success5/Success5_results/metric_changes"):
    # Assumes an active SparkSession bound to the name `spark`.
    if DeltaTable.isDeltaTable(spark, output_path):
        # The table already exists: upsert the new results on the
        # composite business key.
        DeltaTable.forPath(spark, output_path).alias("target").merge(
            summary_df.alias("source"),
            """target.reference_id = source.reference_id
               AND target.country = source.country
               AND target.provider_id = source.provider_id
               AND target.matching_run_id = source.matching_run_id""",
        ).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()
    else:
        # First write: create the table, partitioned for downstream reads.
        summary_df.write.format("delta").mode("overwrite").partitionBy(
            "country", "matching_run_id", "provider_id"
        ).save(output_path)
```

Is it possible to create a catalog entry in Kedro that stores the dataset in this manner? If so, could you please provide an example of how to configure it?
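For context, here is a sketch of what I imagine the entry might look like, assuming `spark.SparkDataset` from kedro-datasets with `file_format: delta`. The dataset name `metric_changes` is just a placeholder, and this would only cover the partitioned-overwrite branch of my function:

```yaml
# Hypothetical catalog.yml entry -- covers only the partitioned overwrite,
# not the merge/upsert branch. Dataset name is a placeholder.
metric_changes:
  type: spark.SparkDataset
  filepath: /mnt/success5/Success5_results/metric_changes
  file_format: delta
  save_args:
    mode: overwrite
    partitionBy: ["country", "matching_run_id", "provider_id"]
```

For the upsert branch, I suppose I could load the existing table with `spark.DeltaTableDataset` and run the merge inside the node itself, but I would appreciate confirmation that this is the intended pattern.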
Issue Summary
Confusion with Credential Configuration in Kedro 0.19 vs 0.18
Hello Kedro team,
I have encountered an issue regarding the configuration of credentials for accessing storage via abfss in Kedro 0.19.3, which was not present in version 0.18. Here is a summary of the problem:
In Kedro 0.18, I configured the credentials for accessing storage through Spark configurations with Azure Service Principal, and everything worked fine. However, after upgrading to Kedro 0.19.3, the same setup stopped working. After spending a couple of days troubleshooting, I discovered that adding the credentials as environment variables resolved the issue.
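For reference, the workaround that got 0.19.3 working for me looks roughly like the sketch below. I am assuming the standard environment variable names picked up by azure-identity/adlfs; in my actual setup the values come from a secret store rather than literals:

```python
import os

# Expose the Azure Service Principal credentials as environment variables
# so that the fsspec/adlfs layer used by Kedro 0.19 dataset I/O can
# authenticate. Placeholder values shown; use a secret store in practice.
os.environ["AZURE_CLIENT_ID"] = "<service-principal-client-id>"
os.environ["AZURE_TENANT_ID"] = "<directory-tenant-id>"
os.environ["AZURE_CLIENT_SECRET"] = "<service-principal-secret>"
```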
My questions are: