Updated last week

Setting Up an Internal MLflow Server

Hi folks,
We have our own MLflow server with internal S3 storage.
Below are the settings I use locally:

import os

os.environ["MLFLOW_TRACKING_URI"] = "https://xxx.com/mlflow/"
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://s3xxx.com"
os.environ["S3_BUCKET_PATH"] = "s3://xxx/mlflow"
os.environ["AWS_ACCESS_KEY_ID"] = "xxx"
os.environ["AWS_SECRET_ACCESS_KEY"] = "xxx"
os.environ['MLFLOW_TRACKING_USERNAME'] = 'xxx'
os.environ['MLFLOW_TRACKING_PASSWORD'] = 'xxx'
os.environ["MLFLOW_TRACKING_SERVER_CERT_PATH"] = "C:\\xxx\\ca-bundle.crt"
EXPERIMENT_NAME = "ZeMC012"
To use this in Kedro, I created a mlflow.yml file in the conf/local folder with the following content:
server:
  mlflow_tracking_uri: https://xxx.com/mlflow/
  MLFLOW_S3_ENDPOINT_URL: http://s3xxx.com
  S3_BUCKET_PATH: s3://xxx/mlflow
  AWS_ACCESS_KEY_ID: xxx
  AWS_SECRET_ACCESS_KEY: xxx
  MLFLOW_TRACKING_USERNAME: xxx
  MLFLOW_TRACKING_PASSWORD: xxx
  MLFLOW_EXPERIMENT_NAME: ZeMC012
  MLFLOW_TRACKING_SERVER_CERT_PATH: C:/xxx/ca-bundle.crt
But I got this error: ValidationError: 8 validation errors for KedroMlflowConfig
How should I modify the file?
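
One way to narrow down this kind of ValidationError is to compare the keys under server with the keys that appear in the template generated by kedro mlflow init. The sketch below assumes that the server section accepts only the four keys shown in that template and rejects anything else; that list and the helper name unexpected_server_keys are assumptions for illustration, not the plugin's actual validation code.

```python
# Keys assumed to be accepted under `server:`. This list is an assumption,
# copied from the template that `kedro mlflow init` generates.
ALLOWED_SERVER_KEYS = {
    "mlflow_tracking_uri",
    "mlflow_registry_uri",
    "credentials",
    "request_header_provider",
}


def unexpected_server_keys(server_section):
    """Return, sorted, the keys of the `server` section that are not in the template."""
    return sorted(set(server_section) - ALLOWED_SERVER_KEYS)


# The `server` section from the question, written as a plain dict.
server = {
    "mlflow_tracking_uri": "https://xxx.com/mlflow/",
    "MLFLOW_S3_ENDPOINT_URL": "http://s3xxx.com",
    "S3_BUCKET_PATH": "s3://xxx/mlflow",
    "AWS_ACCESS_KEY_ID": "xxx",
    "AWS_SECRET_ACCESS_KEY": "xxx",
    "MLFLOW_TRACKING_USERNAME": "xxx",
    "MLFLOW_TRACKING_PASSWORD": "xxx",
    "MLFLOW_EXPERIMENT_NAME": "ZeMC012",
    "MLFLOW_TRACKING_SERVER_CERT_PATH": "C:/xxx/ca-bundle.crt",
}

extra = unexpected_server_keys(server)
print(len(extra), extra)
```

If extra keys are indeed rejected, the eight keys this reports would be consistent with the "8 validation errors" in the message, which suggests moving them out of the server section.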


Hi Shu-Chun,
Have you tried using the Kedro-MLflow plugin? Here's the link for more details: Kedro-MLflow Setup. It helps generate a correct mlflow.yml file, and as I understand, there should be multiple sections included.


After I used kedro mlflow init to generate mlflow.yml, I don't see these parameters in the template:

MLFLOW_S3_ENDPOINT_URL: http://s3xxx.com
S3_BUCKET_PATH: s3://xxx/mlflow
MLFLOW_TRACKING_USERNAME: xxx
MLFLOW_TRACKING_PASSWORD: xxx
MLFLOW_TRACKING_SERVER_CERT_PATH: C:/xxx/ca-bundle.crt
Where and how should I set those parameters?
I am still getting these error messages:
SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)
MaxRetryError: HTTPSConnectionPool(host='xxx.com', port=443): Max retries exceeded with url: 
/mlflow/api/2.0/mlflow/experiments/get-by-name?experiment_name=ZeMC012 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED]     
certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)')))

SSLError: HTTPSConnectionPool(host='xxx.com', port=443): Max retries exceeded with url:
/mlflow/api/2.0/mlflow/experiments/get-by-name?experiment_name=ZeMC012 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED]     
certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)')))

MlflowException: API request to https://xxx.com/mlflow/api/2.0/mlflow/experiments/get-by-name failed with exception
HTTPSConnectionPool(host='dad-rbg.icp.infineon.com', port=443): Max retries exceeded with url:
/mlflow/api/2.0/mlflow/experiments/get-by-name?experiment_name=ZeMC012 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED]     
certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)')))

You can add those settings manually under the tracking section. The errors suggest that the connection to the MLflow server could not be established, most likely because MLFLOW_TRACKING_SERVER_CERT_PATH is not set.
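
As a minimal sketch of setting the certificate path from Python: MLFLOW_TRACKING_SERVER_CERT_PATH is the environment variable MLflow reads for a CA bundle, so it has to be set before the first request to the tracking server. The helper name use_ca_bundle is hypothetical, and where to call it in a Kedro project (for example at the top of the project's settings.py) is an assumption rather than a documented recommendation.

```python
import os
from pathlib import Path


def use_ca_bundle(cert_path):
    """Set MLFLOW_TRACKING_SERVER_CERT_PATH, failing early if the file is missing."""
    path = Path(cert_path)
    if not path.is_file():
        # A wrong path would otherwise only surface later as an SSL or file error.
        raise FileNotFoundError(f"CA bundle not found: {path}")
    os.environ["MLFLOW_TRACKING_SERVER_CERT_PATH"] = str(path)
    return str(path)


# Example call (path taken from the question; adjust to the real location):
# use_ca_bundle("C:/xxx/ca-bundle.crt")
```

Checking that the file exists first separates a wrong path from a bundle that does not contain the server's issuing CA, since a missing issuer produces the same "unable to get local issuer certificate" error.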

What do you mean by the tracking section? In which file should I add MLFLOW_TRACKING_SERVER_CERT_PATH?

It looks like you should try to split them into two groups. Some variables, like MLFLOW_S3_ENDPOINT_URL, S3_BUCKET_PATH, and MLFLOW_TRACKING_SERVER_CERT_PATH, should remain as OS environment variables, as they were originally. The credentials for MLflow tracking (username and password) should be specified in mlflow.yml under the credentials section (as shown in the manual: Kedro Data Catalog - Dataset Access Credentials). Alternatively, you could try specifying them as environment variables as well.

But after I ran kedro mlflow init, the generated mlflow.yml file contains this comment:

# All credentials needed for mlflow must be stored in credentials .yml as a dict
# they will be exported as environment variable
# If you want to set some credentials,  e.g. AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
# > in `credentials.yml`:
# your_mlflow_credentials:
#   AWS_ACCESS_KEY_ID: 132456
#   AWS_SECRET_ACCESS_KEY: 132456
# > in this file `mlflow.yml`:
# credentials: mlflow_credentials
This mixes AWS credentials and MLflow credentials, which is not clear to me. Do I need both?
Currently, in mlflow.yml, I have:
server:
  mlflow_tracking_uri: https://xxx.com/mlflow/
  mlflow_registry_uri: null
  credentials: mlflow_credentials
  request_header_provider:
    type: null
    pass_context: False
    init_kwargs: {}
And in credentials.yml, I have:
mlflow_credentials:
   MLFLOW_TRACKING_USERNAME: xxx
   MLFLOW_TRACKING_PASSWORD: xxx
Both mlflow.yml and credentials.yml are in the conf/local folder.
I also have S3 credentials in credentials.yml, but they don't seem to be read anywhere.
I also still don't know how to make it use my certificate file:
MLFLOW_TRACKING_SERVER_CERT_PATH: C:/xxx/ca-bundle.crt
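
The comment in the generated template says that the entry referenced by credentials is stored as a dict and exported as environment variables. Read that way, AWS and MLflow credentials could sit side by side in the same entry. The sketch below only illustrates that behaviour with a plain dict; it is not the plugin's actual code, and export_credentials is a hypothetical helper.

```python
import os


def export_credentials(all_credentials, key):
    """Export every item under all_credentials[key] as an environment variable."""
    entry = all_credentials[key]
    for name, value in entry.items():
        os.environ[name] = str(value)
    return sorted(entry)


# What credentials.yml might look like once loaded into a dict.
credentials = {
    "mlflow_credentials": {
        "AWS_ACCESS_KEY_ID": "xxx",
        "AWS_SECRET_ACCESS_KEY": "xxx",
        "MLFLOW_TRACKING_USERNAME": "xxx",
        "MLFLOW_TRACKING_PASSWORD": "xxx",
    }
}

exported = export_credentials(credentials, "mlflow_credentials")
print(exported)
```

By the same logic, MLFLOW_TRACKING_SERVER_CERT_PATH and MLFLOW_S3_ENDPOINT_URL might also work when placed in that entry, since MLflow reads them from the environment, but that is an untested assumption here.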
