Jannik Wiedenhaupt
Joined October 9, 2024

Hey team, is there a way to use the pandas BigQuery dataset with parallel runners, or is the answer to use Ibis again?

10 comments

Hey team, thank you so much for answering my previous questions. Another question, this time on setting up datasets correctly at the start:

I have table schemas defined in my Kedro catalog for BigQuery tables. I would like to create empty versions of these tables, based on these schemas, at the beginning of my Kedro pipeline.

How can I do this cleanly in Kedro?

4 comments
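
One possible approach, sketched under assumptions: keep each table's schema in its catalog entry (here a hypothetical `schema` list of `{"name": ..., "type": ...}` dicts; the key and format are assumptions, not a Kedro convention), turn it into a BigQuery `CREATE TABLE IF NOT EXISTS` statement, and execute that statement from a hook such as `before_pipeline_run`. The helper below only builds the DDL string, so it runs without credentials.

```python
from typing import Dict, List

# Hypothetical schema format: a list of {"name": ..., "type": ...} dicts,
# e.g. read from a custom "schema" key in a catalog entry.
Schema = List[Dict[str, str]]


def build_create_table_ddl(project: str, dataset: str, table: str, schema: Schema) -> str:
    """Build a BigQuery DDL statement that creates an empty table if it is missing."""
    if not schema:
        raise ValueError(f"Empty schema for {project}.{dataset}.{table}")
    # Backticks quote each column name; types are passed through as-is (STRING, INT64, ...).
    columns = ",\n  ".join(f"`{col['name']}` {col['type']}" for col in schema)
    return (
        f"CREATE TABLE IF NOT EXISTS `{project}.{dataset}.{table}` (\n"
        f"  {columns}\n"
        f")"
    )


if __name__ == "__main__":
    ddl = build_create_table_ddl(
        "my-project",  # hypothetical project ID
        "analytics",   # hypothetical BigQuery dataset
        "users",
        [{"name": "user_id", "type": "STRING"}, {"name": "signup_date", "type": "DATE"}],
    )
    print(ddl)
    # Inside a hook, the statement could be executed with google-cloud-bigquery:
    # bigquery.Client(project="my-project").query(ddl).result()
```

Using `IF NOT EXISTS` keeps the step idempotent: re-running the pipeline leaves existing tables (and their data) untouched instead of failing.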

Hey team, what is a good way of checking whether all the input tables for the nodes I want to run are accessible? I am having issues with permissions in BigQuery, and testing is cumbersome. Is there a way to validate all external datasets in the catalog?

I was thinking of adding a hook and a metadata tag that identifies the datasets as external.

My main concerns are:

  1. How do I handle different dataset types?
  2. How do I only ping each table (or load just the first row) instead of loading it in full, for speed reasons?

5 comments
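
A rough sketch of the hook idea, assuming each dataset carries a `metadata` dict with a hypothetical `external: true` flag and exposes an `exists()` method. Kedro's `AbstractDataset` interface does provide `exists()`, which addresses concern 1 by avoiding type-specific code; how cheap the check is (concern 2) and whether it surfaces permission errors depends on each dataset implementation, so for some types a custom probe such as a `LIMIT 1` query might be needed instead. Failures are collected rather than raised one at a time, giving a single report of every inaccessible table. `StubDataset` is a stand-in so the sketch runs on its own.

```python
from typing import Any, Dict, List, Mapping, Tuple


def validate_external_datasets(datasets: Mapping[str, Any]) -> List[Tuple[str, str]]:
    """Return (name, problem) pairs for every external dataset that is not accessible.

    Assumes each dataset has an optional ``metadata`` dict and an ``exists()`` method.
    """
    problems: List[Tuple[str, str]] = []
    for name, dataset in datasets.items():
        metadata = getattr(dataset, "metadata", None) or {}
        if not metadata.get("external", False):  # hypothetical metadata flag
            continue
        try:
            if not dataset.exists():
                problems.append((name, "missing"))
        except Exception as exc:  # e.g. a permission error raised by the backend
            problems.append((name, f"error: {exc}"))
    return problems


class StubDataset:
    """Stand-in for a catalog dataset, used only to make this sketch runnable."""

    def __init__(self, metadata: Dict[str, Any], accessible: bool = True, error: str = ""):
        self.metadata = metadata
        self._accessible = accessible
        self._error = error

    def exists(self) -> bool:
        if self._error:
            raise PermissionError(self._error)
        return self._accessible


if __name__ == "__main__":
    catalog = {
        "raw_users": StubDataset({"external": True}),
        "raw_orders": StubDataset({"external": True}, error="403 Access Denied"),
        "raw_events": StubDataset({"external": True}, accessible=False),
        "model_input": StubDataset({}),  # not external, so it is skipped
    }
    for name, problem in validate_external_datasets(catalog):
        print(f"{name}: {problem}")
```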

Hi folks, I would like to pass the current date to my Kedro pipelines at multiple steps. What is the best way to do this?

5 comments
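
A minimal sketch of one common pattern: compute the date once in a dedicated node and pass its output to every node that needs it, so all steps of a run see the same value even if the run crosses midnight. The function names are hypothetical; in Kedro they would be wired up with `node(...)`, listing the date output as an input of the downstream nodes.

```python
from datetime import date
from typing import Dict, List, Optional


def get_run_date(today: Optional[date] = None) -> str:
    """Node that produces the run date once, as an ISO string (YYYY-MM-DD)."""
    return (today or date.today()).isoformat()


def tag_records(records: List[Dict[str, str]], run_date: str) -> List[Dict[str, str]]:
    """Downstream node: stamp each record with the shared run date."""
    return [{**record, "run_date": run_date} for record in records]


def build_output_path(base: str, run_date: str) -> str:
    """Another downstream node using the same value, e.g. for a date-partitioned path."""
    return f"{base}/dt={run_date}"


if __name__ == "__main__":
    run_date = get_run_date()
    print(tag_records([{"id": "1"}], run_date))
    print(build_output_path("data/03_primary/events", run_date))
    # Wiring in Kedro (sketch):
    # node(get_run_date, inputs=None, outputs="run_date")
    # node(tag_records, inputs=["records", "run_date"], outputs="tagged_records")
```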

Hey team, is there any way to pass async functions to nodes?

11 comments
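
Node functions are invoked as ordinary calls, so passing an `async def` function directly would return an un-awaited coroutine. A common workaround is to wrap the coroutine function in a plain function that drives it with `asyncio.run`. A minimal sketch, assuming the node does not already execute inside a running event loop (there `asyncio.run` raises a `RuntimeError`); `fetch_score` is a hypothetical async function standing in for real I/O.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable, Dict, List


def run_sync(async_fn: Callable[..., Awaitable[Any]]) -> Callable[..., Any]:
    """Wrap an async function so it can be used where a plain function is expected."""

    @functools.wraps(async_fn)  # keeps the original name and signature
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Starts a fresh event loop, runs the coroutine to completion, then closes the loop.
        return asyncio.run(async_fn(*args, **kwargs))

    return wrapper


async def fetch_score(item_id: str) -> float:
    """Hypothetical async call, e.g. to an HTTP API."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return float(len(item_id))


async def score_items(item_ids: List[str]) -> Dict[str, float]:
    """Run the async calls concurrently and collect the results."""
    scores = await asyncio.gather(*(fetch_score(i) for i in item_ids))
    return dict(zip(item_ids, scores))


# The synchronous version is what would be passed to a node, e.g.
# node(score_items_sync, inputs="item_ids", outputs="scores")
score_items_sync = run_sync(score_items)

if __name__ == "__main__":
    print(score_items_sync(["a", "bb", "ccc"]))
```

The concurrency happens inside the wrapped coroutine (`asyncio.gather`), so the node as a whole still looks synchronous to the runner.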

When you have an expensive operation, is there a good way of loading from an existing dataset? I am trying to check whether a certain ID already exists and only run the node's logic when the ID is new. If it is new, I then add the new entries to the saved dataset so that next time I don't recalculate them. Effectively, I want to cache results.

2 comments
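
The caching logic itself can be sketched independently of Kedro: load the previously saved results, compute only the IDs that are not in them yet, and return the merged result to be saved back. The names are hypothetical and plain dicts stand in for the saved dataset. Kedro rejects a node whose output is also one of its own inputs, so in a pipeline this pattern may need two catalog entries pointing at the same underlying table or file (one read, one written).

```python
from typing import Callable, Dict, Iterable


def compute_new_only(
    ids: Iterable[str],
    existing: Dict[str, float],
    expensive_fn: Callable[[str], float],
) -> Dict[str, float]:
    """Return existing results plus results for IDs not seen before.

    ``expensive_fn`` is only called for IDs missing from ``existing``.
    """
    new_ids = [i for i in dict.fromkeys(ids) if i not in existing]  # dedupe, keep order
    new_results = {i: expensive_fn(i) for i in new_ids}
    # Merge without mutating the loaded data; new entries come after the old ones.
    return {**existing, **new_results}


if __name__ == "__main__":
    calls = []

    def slow_square(item_id: str) -> float:  # hypothetical expensive operation
        calls.append(item_id)
        return float(int(item_id) ** 2)

    cache = {"1": 1.0, "2": 4.0}  # previously saved results
    updated = compute_new_only(["2", "3", "3", "4"], cache, slow_square)
    print(updated)  # {'1': 1.0, '2': 4.0, '3': 9.0, '4': 16.0}
    print(calls)    # ['3', '4']
```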

When using BigQuery datasets, how do you define a default dataset project-wide?

12 comments

Does anybody know why Kedro-Viz might only show one node? I have three pipelines, but only one node from one of them is shown.

All my pipelines are summed into one default pipeline in the registry.

3 comments

Hey everyone, I am trying to define the column dtypes of a CSV dataset because some columns contain IDs that Kedro interprets as floats, but should be interpreted as strings instead. Setting

load_args:
  dtype:
    user_id: str

save_args:
  dtype:
    user_id: str

does not seem to work for me. Appreciate your help!

9 comments