Updated last week

Kedro dataset factories

Good morning, we have a question about Kedro dataset factories that we're hoping you can help with. I will put the details in the thread to keep this channel tidy 🙂

We have a custom dataset defined as

class MyDataset(SparkDataset):

    def __init__(  # noqa: PLR0913
        self,
        *,
        filepath: str,
        table: str
    ):
        ...

We are then trying to use it in our catalog, but this entry fails:
integration.int.{source}.data1:
  type: MyDataset
  filepath: ${globals:integration_source_path}/int/{source}/data1
  table: {source}_data1

with the following error pointing to the table: {source}_data1 line:
An error has occurred: Invalid YAML or JSON file .../catalog.yml, unable to read line 20, position 17.
                    ERROR    An error has occurred: Invalid YAML or   ....py:212
                             JSON file                                          
                             .../catalog.yml,           
                             unable to read line 20, position 17.

We managed to solve it by putting {source} at the end of the table name, like this:
integration.int.{source}.data1:
  type: MyDataset
  filepath: ${globals:integration_source_path}/int/{source}/data1
  table: data1_{source}

Is this expected behaviour, or should we raise it as an issue?

Hi Jacques,
YAML gets confused because it sees the leading { at the start of the value and tries (and fails) to parse it as an inline mapping. So table: data1_{source} or table: "{source}_data1" should work, and I don't think there's any need to raise an issue.
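
For anyone who wants to check this outside Kedro, here is a minimal sketch that feeds the three variants of the table line to a YAML parser. It assumes PyYAML is installed; load_or_none is a hypothetical helper written for this illustration, and Kedro's own config loader may word the error differently.

```python
import yaml  # PyYAML, assumed to be installed


def load_or_none(text):
    """Return the parsed YAML document, or None if PyYAML rejects it."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError:
        return None


# Unquoted value starting with "{": YAML opens an inline mapping,
# then cannot make sense of the trailing "_data1".
print(load_or_none("table: {source}_data1"))

# Quoted: the braces are part of an ordinary string.
print(load_or_none('table: "{source}_data1"'))

# Placeholder not in first position: the value is read as a plain string.
print(load_or_none("table: data1_{source}"))
```

Quoting is probably the more robust of the two fixes, since it keeps working wherever the placeholder appears in the value.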

Thanks a lot, I'll try with the double quotes to see if it works!

it worked, thanks again 🙂
