Connection establishment errors introduced with 2.3.12
#149
Ok, so I have been trying to get to the bottom of this. What is failing is the SSL connection, and it seems to be the case (at least from what I can see) that it fails when a connection is reused via the pool from one thread to another. The driver manager is fine with that, but the drivers (both Easysoft and Microsoft) seem to fail in the same way when the SSL connection is passed between threads (it seems to fail with an invalid MAC in the TLS data).

It's not perfect, but I have altered the pooling code to not pass connections between processes. After doing that it seems to be working fine with the test case. It's a shame that we don't seem to be able to do this safely, but that's the way it seems to be.

The change is back in git and in unixODBC-2.3.13pre.tar.gz on the FTP site. Give that a go and see if it works for you.

If the MS folk have any further info about SSL and the driver, please let me know.
Ok, I think I now know what's going on. It seems that Python makes use of magic memory: I can see that the connections in the pool seem to live in different memory spaces, and because of this, when the child ends and the main process takes over (in the test application), the memory pool is rewound. This leaks connections that were in the child's pool, and it can also mean that a connection that was created in the main thread may have been used in the child thread, and then when time is rewound (that's what it looks like to the process), that connection that was used is now back in the state of being unused again.

The driver can get away with that, as there was nothing in process, so it just carries on talking to the server on the same socket. However, if SSL is in use, the replay detection in TLS kicks in, as the server at the other end spots the fact that the client has gone backwards in time. Then, when the server reports that back, the child sees that the server is also out of sequence. TLS shuts down the connection at this point and it fails as we have seen.

Bottom line is, I don't think the driver manager pooling is doing anything wrong here; it's just not going to play with what Python is doing. It worked in previous versions because the pooling was not sharing the connections as it is now, so in effect it was working by luck.
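To make the failure mode described above concrete, here is a minimal, hypothetical repro sketch (not the reporter's actual test case). The DSN name, credentials, and the assumption that pooling is enabled for the driver in odbcinst.ini are all placeholders; the point is only that the parent opens and closes a connection before forking, so the pooled TLS session is duplicated into every child.

```python
# Hypothetical sketch of the failure pattern described above.
# Assumes a DSN named "MSSQL" in odbc.ini and pooling enabled for the driver
# in odbcinst.ini; names and credentials are placeholders.
import multiprocessing as mp
import pyodbc

CONN_STR = "DSN=MSSQL;UID=user;PWD=secret;Encrypt=yes"  # placeholder

def worker(i):
    # With the default "fork" start method the child inherits the parent's
    # driver-manager pool, so this connect() can be handed the TLS session the
    # parent already used, which is when the invalid-MAC/TCP errors show up.
    cnxn = pyodbc.connect(CONN_STR)
    try:
        return cnxn.execute("SELECT 1").fetchone()[0]
    finally:
        cnxn.close()

if __name__ == "__main__":
    # Parent opens and closes a connection; with driver-manager pooling on,
    # it is parked in the in-process pool rather than torn down...
    pyodbc.connect(CONN_STR).close()
    # ...and that pool (including the live TLS socket) is duplicated by fork.
    with mp.Pool(4) as pool:
        print(pool.map(worker, range(4)))
```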
Thanks @lurcher for tackling this issue so fast! To me, it looks as if bb7ed6e already solved a fair chunk of the TCP issues I was encountering. However, there are still a couple of …
Are you actually forking the process and trying to reuse connections that way? Yes, as @lurcher says, that will cause a bunch of problems. However, multiple threads within the same process taking their turn with a single connection handle (as long as it's not concurrent) shouldn't be a problem.
I think that's the problem: behind the simple-looking application, the process is forking a child.
I would have saved myself a lot of time if I had just realized it was
forking in the first place :-(.
Sorry for the confusion: yes, Python multiprocessing Pools fork processes. As Python does not have 'true' multithreading due to the GIL (Global Interpreter Lock), multiple processes are needed to truly parallelize in Python.

I am forking the processes (or rather Python does), but I do not really intend to reuse the connections by doing that. As I am creating new SQLAlchemy engines for every process, I thought that the connections wouldn't be reused.
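For what it's worth, here is a minimal sketch of that "one engine per process" intent, using a placeholder DSN-based URL: creating the engine in a Pool initializer means it only ever exists after the fork. Note this alone does not undo the copied driver-manager pool, so the parent should also avoid connecting before forking.

```python
# Sketch only: placeholder URL; the engine is created per worker process,
# after the fork, via a Pool initializer.
import multiprocessing as mp
from sqlalchemy import create_engine, text

URL = "mssql+pyodbc://user:secret@MSSQL"  # placeholder DSN-based URL

_engine = None  # set in each worker process

def init_worker():
    global _engine
    _engine = create_engine(URL, pool_pre_ping=True)

def run_query(i):
    with _engine.connect() as conn:
        return conn.execute(text("SELECT 1")).scalar()

if __name__ == "__main__":
    with mp.Pool(processes=4, initializer=init_worker) as pool:
        print(pool.map(run_query, range(8)))
```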
The problem is caused by a connection being opened and closed from the main process. That then gets copied over the fork each time it creates a child, because it gets pushed back into the pool. This connection then gets reused in the child each time. Simple to see the problem once you realize what's going on.
@lurcher Thanks for the clarification. Did I understand correctly that currently there's no way of getting the pooling to work on Python, and you therefore rolled back the changes that were made with 2.3.12 in that regard? I am asking as I am still facing TCP 0x20 (ResourceClosedErrors) issues when running the most recent commit of unixODBC's main branch (0x9b31cd2).
Pooling should (AFAIK) be fine in Python now, as long as you only try to pool within the same process with the same DSN. What your code was doing was creating a pooled connection in the main code; that pooled connection was then being used in the child processes, which was breaking that connection, especially when SSL was used, as it would see the reversion and throw the error you're seeing.

I have not been making changes to 2.3.12, that's released and fixed. All the changes have been in 2.3.13pre. What I did at one point was add code that stopped the pooling working across threads and processes; this solved your problem as it effectively stopped the pooling. But I rolled that back (in 2.3.13pre) because it would have affected users pooling in threaded applications when there was no need for that.

What I suggested, and think will work for you, is to use a different DSN (same details, but because the name is different it won't get mixed up in the pooling lists) in the main process and the clients, to prevent the connections getting reused in child processes.

Other than that, the way Python is doing its multiprocessing is not compatible with connection pooling.
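A rough sketch of the separate-DSN workaround suggested above, with made-up DSN names MSSQL_MAIN and MSSQL_WORKER that would point at the same server in odbc.ini: because the names differ, a connection pooled by the parent is never matched and handed out in a child.

```python
# Sketch of the two-DSN workaround; DSN names and credentials are placeholders.
import multiprocessing as mp
import pyodbc

MAIN_CONN = "DSN=MSSQL_MAIN;UID=user;PWD=secret"      # used only in the parent
WORKER_CONN = "DSN=MSSQL_WORKER;UID=user;PWD=secret"  # used only in children

def worker(i):
    cnxn = pyodbc.connect(WORKER_CONN)
    try:
        return cnxn.execute("SELECT 1").fetchone()[0]
    finally:
        cnxn.close()

if __name__ == "__main__":
    # Parent-side work pools under MSSQL_MAIN, so it stays out of the
    # children's pooling list.
    pyodbc.connect(MAIN_CONN).close()
    with mp.Pool(4) as pool:
        print(pool.map(worker, range(4)))
```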
Change your multiprocessing start method to spawn. It's much slower, but it will not copy memory and you should avoid this issue: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
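A small sketch of that suggestion, again with a placeholder connection string: using a spawn context means each worker starts with a fresh interpreter and does not inherit the parent's pooled sockets.

```python
# Sketch: spawn start method so children do not inherit forked state.
import multiprocessing as mp
import pyodbc

CONN_STR = "DSN=MSSQL;UID=user;PWD=secret"  # placeholder

def worker(i):
    cnxn = pyodbc.connect(CONN_STR)
    try:
        return cnxn.execute("SELECT 1").fetchone()[0]
    finally:
        cnxn.close()

if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # or multiprocessing.set_start_method("spawn")
    with ctx.Pool(4) as pool:
        print(pool.map(worker, range(4)))
```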
Are you doing some heavy computations in your script? While you are correct for CPU-bound tasks, Python multithreading is fine for working around IO wait, which is common when working with databases.
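As an aside, a minimal thread-based sketch for IO-bound database work (placeholder URL): threads share the process, so nothing is copied across a fork, and each thread checks its own connection out of the engine's pool.

```python
# Sketch: thread pool for IO-bound queries; URL is a placeholder.
from concurrent.futures import ThreadPoolExecutor
from sqlalchemy import create_engine, text

engine = create_engine("mssql+pyodbc://user:secret@MSSQL")  # placeholder

def run_query(i):
    with engine.connect() as conn:
        return conn.execute(text("SELECT 1")).scalar()

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as ex:
        print(list(ex.map(run_query, range(8))))
```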
I am using SQLAlchemy (2.0) to maintain connections to an MS SQL Server instance in Python. Under the hood, unixodbc is used as a driver when the scripts are run under Linux.

With the release of 2.3.12, I noticed some multiprocessed (Python multiprocessing) applications failing with sqlalchemy/unixodbc errors that did not appear to be deterministic. After further debugging, I found out that this problem also occurs in single-threaded applications that try to establish many connections in a short amount of time via SQLAlchemy with pooling enabled.

After building unixodbc at different commits on main and integrating these builds into my Python environment, I was able to narrow down the issue to "Fix possible seg faults with SQLAPI and pooling". As this commit also has significant changes with regard to pooling, it is very likely that this is indeed the root of my problems.

The error messages that I'm facing look like the following: most of these errors have in common that they contain some TCP Provider error (mostly error code 0x2…).

I also drafted a small example showing the problems.

Before diving deeper into the inner workings of unixodbc and the specifics of commit 2a73009, I wanted to ask if anyone has an idea where this could be coming from. @lurcher Any idea?

In case anyone else is experiencing these issues, please let me know.