Jannik Wiedenhaupt

Solved

Using Pandas with BigQuery and Parallel Runners

Hey team, is there a way to use pandas BigQuery with parallel runners, or is the answer to use Ibis again?

10 comments

Create Empty BigQuery Tables in Kedro

Hey team, thank you so much for answering my previous questions. Another question, on setting up a dataset correctly at the beginning:

I have table schemas defined in my Kedro catalog for BigQuery tables. I would like to create empty versions of these tables at the beginning of my Kedro pipeline, based on these schemas.

How can I do this cleanly in Kedro?

4 comments
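A minimal sketch of one option, assuming the schema can be expressed as a plain column-to-dtype mapping (the `SCHEMA` dict and `empty_frame_from_schema` helper are hypothetical, not part of Kedro): build a zero-row pandas DataFrame with the right dtypes in a node that runs first, and let the BigQuery dataset save it. Whether saving a zero-row frame actually creates the table depends on the underlying writer, so that part is worth checking before relying on it.

```python
import pandas as pd

# Hypothetical schema; in practice it might be read from the catalog entry.
SCHEMA = {
    "user_id": "string",
    "event_ts": "datetime64[ns]",
    "amount": "float64",
}


def empty_frame_from_schema(schema: dict) -> pd.DataFrame:
    """Return a zero-row DataFrame whose columns carry the given dtypes."""
    return pd.DataFrame(
        {column: pd.Series(dtype=dtype) for column, dtype in schema.items()}
    )


empty_events = empty_frame_from_schema(SCHEMA)
print(empty_events.dtypes)
```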

Jannik Wiedenhaupt

Solved

Validating External Dataset Accessibility in BigQuery

Hey team, what is a good way to check whether all the input tables for the nodes I want to run are accessible? I am having issues with permissions in BigQuery, and testing is cumbersome. Is there a way to validate all external datasets in the catalog?

I was thinking of adding a hook and a metadata tag that identifies the datasets as external.

My main concerns are:

How do I handle different dataset types?
How do I only ping each table (or load just the first row) instead of loading it in full, for speed?

5 comments
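One way to structure the check, sketched in plain Python under assumptions: each external dataset gets a cheap probe (for BigQuery, something like a `LIMIT 1` query rather than a full load), chosen according to the dataset's type, and the probes are collected by name. The `check_accessible` helper and the probe functions are hypothetical; in a Kedro project they could be called from a `before_pipeline_run` hook for datasets carrying an "external" metadata tag.

```python
from typing import Callable, Dict


def check_accessible(probes: Dict[str, Callable[[], object]]) -> Dict[str, str]:
    """Run every probe and return {dataset name: error} for the ones that fail.

    All probes run even after a failure, so one pass reports every
    inaccessible dataset instead of stopping at the first.
    """
    failures: Dict[str, str] = {}
    for name, probe in probes.items():
        try:
            probe()
        except Exception as exc:  # e.g. permission denied, table not found
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


# Hypothetical probes standing in for "fetch one row" calls.
def ok_probe() -> object:
    return [{"id": 1}]


def denied_probe() -> object:
    raise PermissionError("access denied")


failures = check_accessible({"raw_users": ok_probe, "raw_orders": denied_probe})
print(failures)
```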

Jannik Wiedenhaupt

Pass Current Date To Kedro Pipelines

Hi folks, I would like to pass the current date to my Kedro pipelines at multiple steps. What is the best way to do this?

5 comments
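A minimal sketch of one pattern: compute the date once and pass it downstream as an ordinary input, so every step sees the same value even if the run crosses midnight. The function names are hypothetical. In Kedro, `get_run_date` could be wired as a node with `inputs=None` and `outputs="run_date"`, and later nodes would list `"run_date"` among their inputs; without a catalog entry the value would be held in memory for the run.

```python
from datetime import date
from typing import List, Optional


def get_run_date(today: Optional[date] = None) -> str:
    """Return the run date as an ISO string; `today` is injectable for tests."""
    return (today or date.today()).isoformat()


def tag_rows(rows: List[dict], run_date: str) -> List[dict]:
    """Example downstream step that stamps each row with the shared run date."""
    return [{**row, "run_date": run_date} for row in rows]


run_date = get_run_date(date(2024, 1, 31))
print(tag_rows([{"id": 1}, {"id": 2}], run_date))
```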

Jannik Wiedenhaupt

Can async functions be passed to nodes

Hey team, is there any way to pass async functions to nodes?

11 comments
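Assuming the runner calls node functions synchronously (as Kedro's built-in runners do), one workaround is to wrap the coroutine function in a regular function that drives it with `asyncio.run`. A minimal sketch; `run_sync`, `fetch_one`, and `fetch_all` are hypothetical names, and `asyncio.run` raises if an event loop is already running in the same thread (as in some notebook setups).

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable, List


def run_sync(async_func: Callable[..., Awaitable[Any]]) -> Callable[..., Any]:
    """Wrap a coroutine function so it can be called like a normal function."""

    @functools.wraps(async_func)  # keeps the name and signature visible to callers
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        return asyncio.run(async_func(*args, **kwargs))

    return wrapper


async def fetch_one(item_id: int) -> int:
    await asyncio.sleep(0)  # stand-in for real async I/O
    return item_id * 2


async def fetch_all(item_ids: List[int]) -> List[int]:
    # Run the per-item coroutines concurrently and keep input order.
    return list(await asyncio.gather(*(fetch_one(i) for i in item_ids)))


fetch_all_sync = run_sync(fetch_all)
print(fetch_all_sync([1, 2, 3]))
```

`fetch_all_sync` can then be passed to a node like any other plain function.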

Jannik Wiedenhaupt

Caching Results to Avoid Expensive Operations

When you have an expensive operation, is there a good way to load from an existing dataset first? I am trying to check whether a certain ID already exists and only run a node's logic when the ID is new. If it is new, I then append those new entries to the saved dataset so that next time I don't recalculate them. Effectively, caching results.

2 comments
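A minimal pandas sketch of that pattern, assuming results are keyed by an `id` column (the `expensive` and `update_cache` functions are hypothetical): filter out IDs already present in the cache, compute only the rest, and return the combined table to be saved. Kedro rejects a node whose inputs and outputs share a dataset name, so the cache would likely need two catalog entries (one read, one written) pointing at the same location; treat that wiring as an assumption to verify.

```python
import pandas as pd


def expensive(ids: pd.Series) -> pd.Series:
    """Stand-in for the costly per-ID computation."""
    return ids * 10


def update_cache(inputs: pd.DataFrame, cache: pd.DataFrame) -> pd.DataFrame:
    """Compute results only for IDs missing from `cache` and append them."""
    is_new = ~inputs["id"].isin(cache["id"])
    todo = inputs.loc[is_new, ["id"]].drop_duplicates()
    todo = todo.assign(result=expensive(todo["id"]))
    return pd.concat([cache, todo], ignore_index=True)


cache = pd.DataFrame({"id": [1, 2], "result": [10, 20]})
inputs = pd.DataFrame({"id": [2, 3, 3]})
updated = update_cache(inputs, cache)
print(updated)
```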

Jannik Wiedenhaupt

Defining a Default BigQuery Dataset Project-Wide

When using BigQuery datasets, how do you define a default dataset project-wide?

12 comments

Jannik Wiedenhaupt

Kedro-Viz only shows one node from multiple pipelines

Does anybody know why Kedro-Viz might only show one node? I have three pipelines, but only one node from one of them is shown.

All my pipelines are summed into the default pipeline in the registry.

3 comments

Jannik Wiedenhaupt

CSV column dtypes not being set correctly

Hey everyone, I am trying to define the column dtypes of a CSV dataset because some columns contain IDs that Kedro interprets as floats but that should be read as strings. Setting

load_args:
  dtype:
    user_id: str

save_args:
  dtype:
    user_id: str

does not seem to work for me. Appreciate your help!

9 comments
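For reference, Kedro's pandas `CSVDataset` hands `load_args` to `pandas.read_csv`, where `dtype` is supported, while `pandas.DataFrame.to_csv` has no `dtype` parameter, so the `save_args` entry cannot control column types. The plain-pandas sketch below (with made-up sample data) shows why IDs become floats when a value is missing and how `dtype` keeps them as strings, using the string `"str"` that the unquoted YAML value arrives as.

```python
import io

import pandas as pd

# Made-up sample: one ID has a leading zero and one is missing.
CSV_TEXT = "user_id,score\n00123,1.5\n,2.0\n4567,3.0\n"

# Default parsing: the missing value forces a float column and drops the zero.
default = pd.read_csv(io.StringIO(CSV_TEXT))

# Same idea as `load_args: {dtype: {user_id: str}}`; YAML delivers the string "str".
as_text = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"user_id": "str"})

print(default["user_id"].tolist())
print(as_text["user_id"].tolist())
```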
