[Question: Repo separation between ETL and ML apps]
Hello, team!
I have a question regarding best practices. I am developing a relatively classic ML solution that reads data from S3, runs ETL, and then trains and serves multiple models. Each model has its own preprocessing pipeline, while the ETL contains model-independent logic. I plan to use Kedro with the Kedro-MLflow plugin. I think the suggested application architecture works great for me, but I have doubts about separation of concerns. My main concern is keeping the ETL and ML applications together in one repository. Here are some thoughts and inputs that I think will be useful for the decision:
hi @Oleg Litvinov, Kedro doesn't have a best practice as such; it really depends on your team's workflows and requirements. Kedro is modular by nature and supports both integrated and decoupled approaches. Given that your ETL and ML parts have distinct infrastructure requirements, separating them makes sense.
hi @Rashida Kanchwala! Thank you very much for sharing your thoughts. I appreciate it!