Kedro dataset factories

Question

Good morning, we have a question about Kedro dataset factories and were hoping you could help. I will put the details in the thread to keep this channel tidy 🙂

Jacques Vergine · Answer

We have a custom dataset defined as:

    class MyDataset(SparkDataset):
        def __init__(  # noqa: PLR0913
            self,
            *,
            filepath: str,
            table: str
        ):
            ...

We are then trying to use it in our catalog, but this entry was failing:

    integration.int.{source}.data1:
      type: MyDataset
      filepath: ${globals:integration_source_path}/int/{source}/data1
      table: {source}_data1

with the following error pointing to the  table: {source}_data1  line:

    An error has occurred: Invalid YAML or JSON file .../catalog.yml, unable to read line 20, position 17.

Console output:

    ERROR    An error has occurred: Invalid YAML or JSON file .../catalog.yml, unable to read line 20, position 17.    ....py:212

We managed to solve it by putting  {source}  at the end of the table name, like this:

    integration.int.{source}.data1:
      type: MyDataset
      filepath: ${globals:integration_source_path}/int/{source}/data1
      table: data1_{source}

Is this an expected behaviour, or should we raise it as an issue?

Jitendra Gundaniya · Answer

Hi Jacques,
YAML gets confused because the value starts with {, so it tries (and fails) to parse it as an inline (flow) mapping. Either table: data1_{source} or the quoted form table: "{source}_data1" should work, and I don't think there is any need to raise an issue.
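
For anyone who wants to check this outside Kedro, here is a minimal sketch using PyYAML directly. It assumes the catalog is ultimately read by a PyYAML-compatible parser, which the thread does not confirm, and parse_entry is a hypothetical helper written only for this illustration. It feeds the three spellings of the table line to the parser and reports which ones load.

```python
import yaml  # PyYAML; assumed to behave like the parser behind the catalog loader


def parse_entry(text):
    """Hypothetical helper: return the parsed mapping, or None if YAML rejects it."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError:
        return None


# Value starts with "{": YAML reads "{source}" as a flow mapping and then
# cannot make sense of the trailing "_data1".
unquoted = "table: {source}_data1\n"

# Quoting makes the whole value a plain string, braces included.
quoted = 'table: "{source}_data1"\n'

# A value that does not start with "{" is an ordinary plain scalar,
# so braces later in the string are harmless.
suffix = "table: data1_{source}\n"

for label, text in [("unquoted", unquoted), ("quoted", quoted), ("suffix", suffix)]:
    print(label, "->", parse_entry(text))
```

Only the unquoted form is rejected. The other two keep the {source} placeholder as literal text in the value, so it is still there for the dataset factory to fill in.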

Jacques Vergine · Answer

Thanks a lot, I'll try with the double quotes to see if it works!

Jacques Vergine · Answer

it worked, thanks again 🙂
