Jacob Pieniazek

Hello! I am running into issues with the Kedro 0.19.11 release while running pipelines in Databricks. Specifically, a Python module imported by one of my nodes is unable to find the active SparkSession via SparkSession.getActiveSession() (see first image). Our pipeline consists entirely of Ibis.TableDataset datasets and I/O with the pyspark backend. What is throwing me off is that other nodes use the pyspark connection and perform operations across the Spark session without problems; only this single node fails, when it calls into the imported module, which cannot find the Spark session. This issue is not present in Kedro 0.19.10. My best guess is that it has something to do with the updated code in kedro/runner/sequential_runner.py using ThreadPoolExecutor, and possibly scoping issues? Apologies for the somewhat scattered explanation; there is quite a bit I don't fully understand here, so I appreciate any help or guidance. Let me know if I can provide any additional info as well.
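
To illustrate the kind of scoping issue I am guessing at, here is a minimal sketch in plain Python. It assumes (I have not confirmed this) that the "active" session is tracked per thread, so a lookup made from a ThreadPoolExecutor worker would not see what the main thread set. The names `_active`, `set_active_session`, `get_active_session` and `node_func` are hypothetical stand-ins, not Kedro or PySpark APIs.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a per-thread "active session" registry.
# This is an analogy only; it is not how Kedro or PySpark are implemented.
_active = threading.local()


def set_active_session(session):
    # Records the session for the calling thread only.
    _active.session = session


def get_active_session():
    # Returns None if the calling thread never had a session set.
    return getattr(_active, "session", None)


def node_func():
    # Plays the role of the imported module looking up the active session.
    return get_active_session()


# The session is registered from the main thread.
set_active_session("spark-session-placeholder")

# Called directly in the main thread, the lookup succeeds.
seen_in_main = node_func()

# Called through a worker thread, the thread-local value is not visible.
with ThreadPoolExecutor(max_workers=1) as pool:
    seen_in_worker = pool.submit(node_func).result()

print("main thread:", seen_in_main)
print("worker thread:", seen_in_worker)
```

If something like this is what is happening, I would expect that passing the session into the module explicitly, instead of looking it up, would avoid the problem, but I have not verified that.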


Issues With Kedro 0.19.11 Release While Running Pipelines In Databricks