Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to stabilize the snowpark odd errors #230

Closed
wants to merge 2 commits into from

Conversation

joshelser
Copy link
Contributor

  • Cache the Session returned by Session.builder.create()
  • Properly pin snowpark to use 1.5.1 (latest in INFORMATION_SCHEMA.PACKAGES)
  • Use the execute_select variant on Connection which calls to_pandas()

Aggressively clicking locally prior to this change, would result in an exception every ~5 page views. After this change, I've yet to get an exception on the page (one in the console, though). I'm not sure what about the non is_select branch is giving us grief though.

rel #35

I'm a little suspect of this, not knowing what exactly was cranky in that else branch. Food for thought for now. Curious if it helps devdeploy+local streamlit for y'all too.

vqmarkman
vqmarkman previously approved these changes Sep 1, 2023
Copy link
Contributor

@vqmarkman vqmarkman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that we should merge this change just to see if exceptions go away.

* Cache the Session returned by Session.builder.create()
* Properly pin snowpark to use 1.5.1 (latest in INFORMATION_SCHEMA.PACKAGES)
* Use the `execute_select` variant on Connection which calls
  `to_pandas()`

Aggressively clicking locally prior to this change, would result in an
exception every ~5 page views. After this change, I've yet to get an
exception on the page (one in the console, though). I'm not sure what
about the non `is_select` branch is giving us grief though.

rel sundeck-io#35
@rymurr
Copy link
Contributor

rymurr commented Sep 4, 2023

Is this fixed because of hte execute_select or the session cache? I don't feel great about this either. Its almost like changing the execute type takes longer (translation to pandas) which means we don't hit the 'race'

@joshelser
Copy link
Contributor Author

joshelser commented Sep 4, 2023

I don't feel great about this either. Its almost like changing the execute type takes longer (translation to pandas) which means we don't hit the 'race'

Yeeeep. After posting this PR, I did have one of the same errors which I hadn't seen which does make it sound like it's just making it less likely to hit this problem, not fully addressing it.

If I trust the documentation and what I can see local streamlit doing, the root of the problem is the Session non-thread-safeness e.g.

  • Visit /
  • Streamlit starts running queryA
  • Visit '/Settings`
  • Streamlit starts running queryB
  • queryA finishes
  • Streamlit doesn't get results for queryB

We could add a mutex around the snowpark Session to serialize concurrent queries, but that would be a big refactor on a hunch. I spent some time digging through streamlit forums/docs and snowpark-python issues this weekend, but couldn't find anything which matches our symptoms.

@rymurr
Copy link
Contributor

rymurr commented Sep 5, 2023

Does this happen in the Snowflake app? Or is this only happening when running streamlit locally?

I feel like:

  1. if this is local only I am not so bothered by it (eg we have bigger fish to fry)
  2. The ultimate cause of this is storing the session in a static variable (https://github.com/sundeck-io/OpsCenter/blob/main/app/ui/connection.py#L28) and using it direclty. I think something like https://github.com/streamlit/streamlit/blob/4f45c18a4323a796440d651ba77b5eb29409cb2b/lib/streamlit/connections/snowpark_connection.py#L208 is better. Ideally we would switch most of our code to
with connection.get() as session:
   do stuff

Then we are guaranteed to keep our sessions from crossing.

@joshelser
Copy link
Contributor Author

When we first released the app, I feel like we also saw this running as an app in Snowflake, but I don't trust my memory. Agree that isolating the session is probably the correct fix to the problem. Not sure if that would also bring connection latency with it inside Snowflake where we use get_active_session(), i.e. if we close that session, do we force a new session to be made by the snowpark api?

@rymurr
Copy link
Contributor

rymurr commented Sep 5, 2023

i.e. if we close that session, do we force a new session to be made by the snowpark api?

I think the with lock and context manager paradigm in the link above will help this a lot, we won't have to close and open the session just hide it behind a context manager

Not sure if that would also bring connection latency with it inside Snowflake where we use get_active_session()

I agree, we will likely need subtley different handling for local vs snowflake. Howevr switching to the new connection api in streamlit is prob the way forward. We may be able to completely deprecate our connection handling (https://docs.streamlit.io/knowledge-base/tutorials/databases/snowflake#using-a-snowpark-session)

@joshelser
Copy link
Contributor Author

Talked with Ryan on this one, too. There's one more hunch about how we can try to update Cypress tests to avoid changing "the product" for tests.

If that doesn't work, we can try out https://docs.streamlit.io/library/api-reference/connections/st.experimental_connection. Either way, closing this out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants