Hi everyone! I'm having trouble using tensorflow.TensorFlowModelDataset
with an S3 bucket. The model saves fine locally, but when I configure it to save/load directly from S3, it doesn't work.
Some key points:

- If I save the model locally and upload it myself with boto3 or another script, I can access it in S3 just fine (see the sketch after this list).
- .h5 models – Initially, I could retrieve .h5 files from S3 but loading was not working properly, so I switched to the .keras format, which works fine when handling files manually.
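For reference, the manual upload that works looks roughly like this (a minimal sketch; the bucket name and paths are placeholders for my actual setup):

```python
import boto3

# Minimal sketch of the manual upload that works;
# bucket name and paths are placeholders.
s3 = boto3.client("s3")
s3.upload_file(
    Filename="data/06_models/lstm_model.keras",  # local model file
    Bucket="my-bucket",                          # target S3 bucket
    Key="data/06_models/lstm_model.keras",       # object key in the bucket
)
```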
Has anyone successfully used tensorflow.TensorFlowModelDataset with S3? Is there a recommended workaround or configuration to get it working? Any insights would be much appreciated!

Hi @João Dias, can you please clarify whether this is happening only on save, and whether you can load the model successfully?

I am only having problems when the node output - the model - is pointed to S3.
@Elena Khaustova I will get back to you asap, because I have run into other issues and lost track of where I am.
For now, the problem was in both saving and loading from S3. I am beginning to suspect it is my Keras version.
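A quick way to check which versions are in play, since the native .keras format is only supported by newer TensorFlow/Keras releases (a minimal sketch):

```python
import tensorflow as tf

# Print the installed versions; the native .keras saving format
# requires a reasonably recent TensorFlow/Keras release.
print("TensorFlow:", tf.__version__)
print("Keras:", tf.keras.__version__)
```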
When saving/loading, TensorFlowModelDataset first writes the model to a local temporary directory:
```python
def save(self, data: tf.keras.Model) -> None:
    save_path = get_filepath_str(self._get_save_path(), self._protocol)

    with tempfile.TemporaryDirectory(prefix=self._tmp_prefix) as tempdir:
        if self._is_h5:
            path = str(PurePath(tempdir) / TEMPORARY_H5_FILE)  # noqa: PLW2901
        else:
            # We assume .keras
            path = str(PurePath(tempdir) / TEMPORARY_KERAS_FILE)  # noqa: PLW2901

        tf.keras.models.save_model(data, path, **self._save_args)

        # Use fsspec to take from local tempfile directory/file and
        # put in ArbitraryFileSystem
        self._fs.put(path, save_path)
```
Can you double-check that save_path is what you expect it to be, and that a local copy of the model is created on save?
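One way to check this is to instantiate the dataset directly and print the resolved path (a hypothetical debugging snippet; the filepath mirrors your catalog entry, the credential values are placeholders, and _get_save_path() is the internal helper that save() calls to resolve the target path):

```python
from kedro_datasets.tensorflow import TensorFlowModelDataset

# Instantiate the dataset directly with the same settings as catalog.yml;
# credential values below are placeholders.
dataset = TensorFlowModelDataset(
    filepath="s3://my-bucket/data/06_models/lstm_model.keras",
    credentials={"key": "<aws-access-key-id>", "secret": "<aws-secret-access-key>"},
)

# Print the resolved path that fsspec's put() will target on save.
print(dataset._get_save_path())
```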
I'm back with more info.
I'm using Kedro to save a TensorFlow model as .keras
directly to S3, but I'm getting an "Access Denied" error.
- Saving .h5 and .keras locally as node outputs works fine, and the file command confirms they are correctly saved.
- Saving a .pkl scaler to the same directory where I want to store the models works.
- When I configure catalog.yml to save .keras directly to S3, Kedro throws "Access Denied", even though .h5 and other datasets uploaded fine through a script, and other datasets were saved in the same bucket during preprocessing.

The catalog entry:

```yaml
lstm_model:
  type: tensorflow.TensorFlowModelDataset
  filepath: s3://my-bucket/data/06_models/lstm_model.keras
  credentials: dev_s3
```
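For completeness, the dev_s3 entry in conf/local/credentials.yml uses the usual s3fs-style keys (a sketch with values redacted; assuming key/secret credentials rather than an IAM role):

```yaml
dev_s3:
  key: <aws-access-key-id>
  secret: <aws-secret-access-key>
```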
This looks like a bug, and I suspect it relates to the temporary directories created (see the message above). I can't test it with S3 right now, but if you try the suggestion above, it might help us understand the reason for the problem.
Otherwise, feel free to open an issue so we can investigate it.
Thank you for the support! I will open an issue, and I am happy to contribute as much as I can to resolve it.