Thiago José Moser Poletto

Vertex AI Pipelines and MLOps

Guys, I would like to know if any of you work with Vertex AI Pipelines, and how you handle MLOps...

Attribute Error: 'CustomDataCatalog' Object Has No Attribute '_data_sets'

Hey guys, has anyone had this issue before?

AttributeError: 'CustomDataCatalog' object has no attribute '_data_sets'

11 comments

Thiago José Moser Poletto

Enforcing Schema Data Type on Catalog

Hey Guys, quick question:

Is there a way to enforce the schema data types in the catalog?

Like:

cars:
  type: pandas.CSVDataset
  filepath: data/01_raw/company/cars.csv
  load_args:
    sep: ','
  save_args:
    index: False
    date_format: '%Y-%m-%d %H:%M'
    decimal: .
    schema:
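
Kedro's pandas.CSVDataset forwards load_args to pandas.read_csv, so a dtype mapping under load_args is one place to pin column types. Another option is an explicit check inside a node. The block below is a minimal sketch of that second idea using only the standard library; the column names, the CARS_SCHEMA mapping and the enforce_schema helper are all hypothetical and not part of Kedro.

```python
import csv
import io
from datetime import datetime

# Hypothetical schema: column name -> callable that converts the raw string.
CARS_SCHEMA = {
    "model": str,
    "price": float,
    "doors": int,
    "sold_at": lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M"),
}


def enforce_schema(rows, schema):
    """Cast every row to the schema; raise ValueError on a missing column or bad value."""
    typed = []
    for line_no, row in enumerate(rows, start=1):
        missing = set(schema) - set(row)
        if missing:
            raise ValueError(f"row {line_no}: missing columns {sorted(missing)}")
        try:
            typed.append({col: cast(row[col]) for col, cast in schema.items()})
        except (TypeError, ValueError) as exc:
            raise ValueError(f"row {line_no}: {exc}") from exc
    return typed


# Inline sample data standing in for data/01_raw/company/cars.csv.
raw = io.StringIO("model,price,doors,sold_at\nFusca,1500.50,2,2024-01-31 09:30\n")
cars = enforce_schema(csv.DictReader(raw), CARS_SCHEMA)
```

Failing loudly on the first bad row keeps a type problem from travelling further down the pipeline.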

11 comments

Thiago José Moser Poletto

Partitioning issues with PartitionedDataset

Hey guys, I'm having some issues with partitions.PartitionedDataset. I managed to create multiple files, but my problem is accessing them in an .ipynb to check each partition. I would like to make sure they are OK so that the next pipeline can open them one by one by iterating over them. Can someone help me with that?

my_partitioned_dataset:
  type: partitions.PartitionedDataset
  path: data/02_intermediate  # path to the location of partitions
  dataset: pandas.CSVDataset
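
When a PartitionedDataset is loaded, Kedro returns a dictionary that maps each partition id to a load function, so nothing is read until that function is called. The block below is a rough sketch of iterating over such a dictionary in a notebook or a node. The inspect_partitions helper and the fake_partitions stand-in are hypothetical; in a real project the dictionary would come from loading my_partitioned_dataset.

```python
from typing import Any, Callable, Dict


def inspect_partitions(partitions: Dict[str, Callable[[], Any]]) -> Dict[str, int]:
    """Load each partition lazily, one at a time, and report its length."""
    summary = {}
    for partition_id, load_partition in sorted(partitions.items()):
        data = load_partition()  # the partition is only read at this point
        summary[partition_id] = len(data)
    return summary


# Stand-in for the dict a loaded PartitionedDataset would provide (assumption).
fake_partitions = {
    "part_b": lambda: [{"id": 3}],
    "part_a": lambda: [{"id": 1}, {"id": 2}],
}
summary = inspect_partitions(fake_partitions)
print(summary)
```

If other files share that folder, the filename_suffix option of PartitionedDataset can restrict which files count as partitions.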

14 comments

Thiago José Moser Poletto

Handling Large Databases with Partial Node Processing

Guys, is there any built-in solution for handling large databases, so that the nodes process them in parts? For example, 100k rows would run in batches of 10k each, instead of doing it by hand with a for loop or something like it...
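
For reference, the hand-rolled version of that batching can be kept quite small. The block below is a minimal sketch of a generic batching generator built on the standard library; iter_batches is a hypothetical helper, not a Kedro feature, and range(100_000) stands in for the real rows.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items from rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 100k rows processed in batches of 10k each.
batch_sizes = [len(batch) for batch in iter_batches(range(100_000), 10_000)]
```

Because the generator pulls from an iterator, only one batch is held in memory at a time when the source itself is lazy.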

8 comments

Thiago José Moser Poletto

Simpler Way To Use A Run Identifier On The Path Into The Catalog

Guys, I would like to check with you whether there's a simpler way to use a run_identifier in the catalog path:

I'm loading a dataset from BigQuery and splitting each row to run in another pipeline, where I load and save the inputs/outputs dynamically.

I would like to get a value from a column and use it as the run_identifier in the catalog path:

filepath: ${root_folder}/${current_datetime}/${run_identifier}/data/model/{placeholder:name}.pt

Is there a known way to do something like that? I'm open to suggestions...
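
To make the goal concrete, the block below is a small sketch of filling that path once per row with plain string substitution. It is not Kedro's own config resolution; the run_id column name, the bucket and the datetime value are hypothetical, and the {placeholder:name} part is deliberately left untouched.

```python
from string import Template

# Same shape as the catalog filepath; only the ${...} parts are substituted.
PATH_TEMPLATE = Template(
    "${root_folder}/${current_datetime}/${run_identifier}/data/model/{placeholder:name}.pt"
)


def build_model_path(row: dict, root_folder: str, current_datetime: str) -> str:
    """Fill the template using the value of the (hypothetical) run_id column."""
    return PATH_TEMPLATE.substitute(
        root_folder=root_folder,
        current_datetime=current_datetime,
        run_identifier=row["run_id"],
    )


# Hypothetical rows, standing in for what was loaded from BigQuery.
rows = [{"run_id": "client_001"}, {"run_id": "client_002"}]
paths = [build_model_path(row, "gs://my-bucket", "2024-01-31T09-30") for row in rows]
```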

51 comments

Thiago José Moser Poletto

Kedro vertex ai plugin installation issue with version constraints

Guys, I have a problem: I'm trying to use the Kedro Vertex AI plugin, but every time I try to install it, it also updates kedro to 0.19.9, while the version limit is "kedro>=0.18.1,<0.19.0". Any suggestions for a workaround?

16 comments

Thiago José Moser Poletto

Seeking Help with Kedro Vertex AI Plugin Async Node Runs

Hey guys, I would like to know if anyone has tested the latest version of the Kedro Vertex AI plugin. I'm having some issues with async node runs: for some reason they take a lot longer than when run locally. It might be because I'm allocating a GPU to part of the process, but in my view it shouldn't, so if anyone has any ideas or suggestions, I'd appreciate them...

12 comments
