ISSUE: Deployment of Kedro Pipelines on GCP with Dataproc and Cloud Composer
Description
I am conducting a POC with Kedro in a GCP environment and need assistance deploying my Kedro project in a GCP-compatible format. The goal is to package the Kedro project for execution on Cloud Dataproc clusters.
The intended workflow is as follows:
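For context, a minimal sketch of one common pattern (not necessarily the workflow intended here): build a wheel with `kedro package`, install it on the Dataproc cluster, and launch pipelines from a thin PySpark entry point. The package name, conf location, and argument names below are placeholders, and the exact `KedroSession.create` signature varies across Kedro versions.

```python
# dataproc_entrypoint.py - hypothetical entry point for a Kedro project that
# has been packaged with `kedro package` and installed on the cluster.
import argparse

from kedro.framework.project import configure_project
from kedro.framework.session import KedroSession


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--pipeline", default="__default__")
    parser.add_argument("--env", default="base")
    # e.g. the conf archive produced by `kedro package`, unpacked next to the job
    parser.add_argument("--conf-source", default="conf")
    args = parser.parse_args()

    # The project is importable as an installed package, so it is configured
    # by package name ("my_project" is a placeholder) instead of a source checkout.
    configure_project("my_project")
    with KedroSession.create(env=args.env, conf_source=args.conf_source) as session:
        session.run(pipeline_name=args.pipeline)


if __name__ == "__main__":
    main()
```

The wheel and entry point can then be uploaded to GCS and submitted with `gcloud dataproc jobs submit pyspark`, and Cloud Composer can trigger that job through Airflow's Dataproc operators.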
Hello 🙂
I would like to know whether, when generating Airflow DAGs for a Kedro project with the kedro-airflow plugin, it is possible to create a separate DAG for each pipeline in the project rather than a single DAG for the whole project. If so, how can I configure the start time and other parameters of the DAG corresponding to each pipeline?
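For example, what I'm after is one DAG per registered pipeline, each with its own `start_date` and schedule. The following is a hand-written sketch rather than the file kedro-airflow generates, and the project path, DAG ID, and pipeline name are placeholders:

```python
from datetime import datetime
from pathlib import Path

from airflow import DAG
from airflow.operators.python import PythonOperator

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

# Placeholder: where the Kedro project is deployed alongside the DAG files.
PROJECT_PATH = Path("/home/airflow/gcs/dags/my_kedro_project")


def run_kedro_pipeline(pipeline_name: str, env: str = "base") -> None:
    """Run one named Kedro pipeline in a fresh KedroSession."""
    bootstrap_project(PROJECT_PATH)
    with KedroSession.create(project_path=PROJECT_PATH, env=env) as session:
        session.run(pipeline_name=pipeline_name)


# One DAG per pipeline, each with its own start date and schedule.
with DAG(
    dag_id="orders_ingestion",            # placeholder names
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                    # `schedule_interval` on Airflow < 2.4
    catchup=False,
) as dag:
    PythonOperator(
        task_id="run_orders_ingestion",
        python_callable=run_kedro_pipeline,
        op_kwargs={"pipeline_name": "orders_ingestion"},
    )
```

(If the installed version of kedro-airflow supports the `--pipeline` option of `kedro airflow create`, running it once per pipeline, possibly with a custom Jinja template via `--jinja-file`, should produce one generated DAG per pipeline instead.)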
Hello,
I would like to work with Delta tables in a GCS bucket using PySpark, but I'm having trouble with spark.DeltaTableDataset. My catalog entry looks like this:

table_name:
  type: spark.DeltaTableDataset
  filepath: "gs://XXXX/poc-kedro/table_name/*.parquet"

Could you tell me what might be wrong with it?
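For comparison, a minimal sketch of reading a Delta table on GCS directly with PySpark (the Delta package version and session config are assumptions, and the Dataproc image's built-in GCS connector is assumed for the `gs://` scheme); a Delta table is normally addressed by its root directory rather than a glob over its parquet files:

```python
from pyspark.sql import SparkSession

# Assumed version: use the delta-core build matching the Spark version
# on your Dataproc image.
spark = (
    SparkSession.builder.appName("delta-on-gcs")
    .config("spark.jars.packages", "io.delta:delta-core_2.12:2.4.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)

# The Delta table root (the directory containing _delta_log/), not *.parquet.
df = spark.read.format("delta").load("gs://XXXX/poc-kedro/table_name")
df.show()
```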
Hi everyone,
I’m a Data Engineer, and my team is working on multiple pipelines, each addressing a different use case (1 use case = 1 pipeline). We have both ingestion pipelines and export pipelines that deliver data to various clients.
We’re considering grouping certain nodes into a common library to be shared across these pipelines. I wanted to ask if this is considered a good practice within the Kedro framework. If so, could you recommend an approach or best practices for implementing this?
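For concreteness, the kind of setup we have in mind looks roughly like the sketch below: shared node functions live in a separate, pip-installable package and each use-case pipeline only wires them together. The package name `my_company_common` and the function names are made up for illustration:

```python
# src/my_project/pipelines/orders_ingestion/pipeline.py
from kedro.pipeline import node, pipeline

# Hypothetical shared library of plain Python functions (no Kedro dependency),
# installed as a separate package and reused across several pipelines.
from my_company_common.cleaning import drop_duplicates, standardise_columns


def create_pipeline(**kwargs):
    return pipeline(
        [
            node(standardise_columns, inputs="raw_orders", outputs="typed_orders"),
            node(drop_duplicates, inputs="typed_orders", outputs="clean_orders"),
        ]
    )
```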
Additionally, do you have any recommendations for structuring a Kedro project when working with multiple pipelines like this?
Thanks in advance for your help!
Best regards,
El Guendouz Mohamed