Using this extension in a notebook that also uses pySpark leads to continuous Spark queries. By default, Spark is a lazy-evaluation system and only runs queries when an output operation is performed on a dataframe. With the extension loaded, however, Spark queries run continuously.
My guess is that the extension tries to continuously convert/show any variables that are Spark dataframes. This is a particular problem with larger dataframes, as it keeps the Spark instance continuously busy.
Using Jupyter Lab version 3 and lckr-jupyterlab-variableinspector 3.0.9
One mitigation would be a setting/option to skip variables that reference Spark DataFrames.
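A minimal sketch of what such a skip check might look like, assuming the inspector can test a variable's type before materializing a preview. The helper names (`is_spark_dataframe`, `summarize`) are hypothetical, not part of the extension's actual API; the check inspects the class's module name so it never triggers Spark work and does not even require pyspark to be importable.

```python
def is_spark_dataframe(obj):
    """Return True if obj looks like a pyspark DataFrame.

    Checking the class's module name avoids importing pyspark and,
    more importantly, avoids any operation that would launch a Spark query.
    """
    module = getattr(type(obj), "__module__", "") or ""
    return module.startswith("pyspark.") and type(obj).__name__ == "DataFrame"


def summarize(name, obj, skip_spark=True):
    """Hypothetical inspector hook: report name/type only for Spark DFs.

    For ordinary variables the content is shown as usual; for Spark
    DataFrames the content is skipped so no query is triggered.
    """
    if skip_spark and is_spark_dataframe(obj):
        return {"name": name, "type": "pyspark DataFrame", "content": "<skipped>"}
    return {"name": name, "type": type(obj).__name__, "content": repr(obj)}
```

With a check like this gated behind a user setting, the variable inspector could still list Spark DataFrames by name and type while leaving their contents unevaluated.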