Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New two phase transaction in thin mode has different default timeout than thick mode #376

Open
CaselIT opened this issue Aug 8, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@CaselIT
Copy link

CaselIT commented Aug 8, 2024

Hi,

With the 2.3 release I've added support for oracledb two phase commit in sqlalchemy. sqlalchemy/sqlalchemy@a9c0487
In the ci we noticed that a test was unreliable and decided to skip it in the ci. The test is this one https://github.com/sqlalchemy/sqlalchemy/blob/6cf5e2a188fc5e337d22a098a5fe9a9fe10cc7e7/test/engine/test_transaction.py#L476-L527 when commit=True. The same code using only oracledb is reported below.

Initially the idea was to ask guidance since it seemed there was some kind of timeout, but trying the standalone example with different timeout and with and without the oracle client, seem to have surfaced a bug in the thin mode implementation.

  1. What versions are you using?

2.3.0

  1. Is it an error or a hang or a crash?

error

  1. What error(s) or behavior you are seeing?
oracledb.exceptions.DatabaseError: ORA-24756: transaction with identifier gti: 0x000c371f.666F6F bid: 0x626172
 does not exist
  1. Does your application call init_oracle_client()?

no, calling it fixes it

  1. Include a runnable Python script that shows the problem.
import time
import oracledb
import random
import sys
import platform

print("platform.platform:", platform.platform())
print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
print("platform.python_version:", platform.python_version())
print("oracledb.__version__:", oracledb.__version__)

# oracledb.init_oracle_client() # calling this fixes it
url = dict(user="scott", password="tiger", host="127.0.0.1", port=1521, service_name="FREEPDB1")

def setup():
    with oracledb.connect(**url) as connection, connection.cursor() as c:
        c.execute("CREATE TABLE tbl(id integer, name varchar2(42))")
        connection.commit()

def teardown():
    with oracledb.connect(**url) as connection, connection.cursor() as c:
        c.execute("drop table tbl")
        connection.commit()
        for x in connection.tpc_recover():
            try:
                connection.tpc_forget(x)
            except:
                pass

def using_oracledb():
    connection = oracledb.connect(**url)
    xid_tuple = (base + i, b"foo", b"bar")
    xid = connection.xid(*xid_tuple)
    connection.tpc_begin(xid)
    with connection.cursor() as c:
        c.execute(
            "insert into tbl(id, name) values (:id, :name)",
            dict(id=1, name="user1"),
        )
    assert connection.tpc_prepare()
    connection.close()

    with oracledb.connect(**url) as connection2, connection2.cursor() as c:
        assert c.execute("select id from tbl").fetchall() == []

    with oracledb.connect(**url) as connection3:
        recoverables = connection3.tpc_recover()
        assert xid in recoverables
        # time.sleep(2) # adding a timeout makes it error in very few tries (1 or 2)
        connection3.tpc_commit(xid)

        with connection3.cursor() as c:
            assert c.execute("select id from tbl").fetchall() == [(1,)]
        print("ok", xid)

    with oracledb.connect(**url) as connection4, connection4.cursor() as c:
        c.execute("delete from tbl")
        connection4.commit()

# teardown()
setup()
try:
    base = random.randint(0, 1_000_000)
    i = 0
    while True:
        print("try", i + 1, '\n')
        using_oracledb()
        i += 1
        if i % 20 == 0 and input(f"Run {i}. Exit? (y)") == "y":
            break
finally:
    teardown()

The above script errors usually after 10 to 30 runs:

platform.platform: Windows-11-10.0.22631-SP0
sys.maxsize > 2**32: True
platform.python_version: 3.12.4
oracledb.__version__: 2.3.0
try 1

ok Xid(format_id=375793, global_transaction_id=b'foo', branch_qualifier=b'bar')
try 2

ok Xid(format_id=375794, global_transaction_id=b'foo', branch_qualifier=b'bar')
...
try 15

Traceback (most recent call last):
  File "path\to\test4.py", line 77, in <module>
    using_oracledb()
  File "path\to\test4.py", line 59, in using_oracledb
    connection3.tpc_commit(xid)
  File "path\to\oracledb\connection.py", line 1057, in tpc_commit
    self._impl.tpc_commit(xid, one_phase)
  File "src\\oracledb\\impl/thin/connection.pyx", line 470, in oracledb.thin_impl.ThinConnImpl.tpc_commit
  File "src\\oracledb\\impl/thin/protocol.pyx", line 434, in oracledb.thin_impl.Protocol._process_single_message
  File "src\\oracledb\\impl/thin/protocol.pyx", line 435, in oracledb.thin_impl.Protocol._process_single_message
  File "src\\oracledb\\impl/thin/protocol.pyx", line 428, in oracledb.thin_impl.Protocol._process_message
  File "src\\oracledb\\impl/thin/messages.pyx", line 72, in oracledb.thin_impl.Message._check_and_raise_exception
oracledb.exceptions.DatabaseError: ORA-24756: transaction with identifier gti: 0x0005bbff.666F6F bid: 0x626172
 does not exist
Help: https://docs.oracle.com/error-help/db/ora-24756/

uncommenting the timeout makes the fail even quicker:

platform.platform: Windows-11-10.0.22631-SP0
sys.maxsize > 2**32: True
platform.python_version: 3.12.4
oracledb.__version__: 2.3.0
try 1

Traceback (most recent call last):
  File "path\to\test4.py", line 77, in <module>
    using_oracledb()
  File "path\to\test4.py", line 59, in using_oracledb
    connection3.tpc_commit(xid)
  File "path\to\oracledb\connection.py", line 1057, in tpc_commit
    self._impl.tpc_commit(xid, one_phase)
  File "src\\oracledb\\impl/thin/connection.pyx", line 470, in oracledb.thin_impl.ThinConnImpl.tpc_commit
  File "src\\oracledb\\impl/thin/protocol.pyx", line 434, in oracledb.thin_impl.Protocol._process_single_message
  File "src\\oracledb\\impl/thin/protocol.pyx", line 435, in oracledb.thin_impl.Protocol._process_single_message
  File "src\\oracledb\\impl/thin/protocol.pyx", line 428, in oracledb.thin_impl.Protocol._process_message
  File "src\\oracledb\\impl/thin/messages.pyx", line 72, in oracledb.thin_impl.Message._check_and_raise_exception
oracledb.exceptions.DatabaseError: ORA-24756: transaction with identifier gti: 0x00064460.666F6F bid: 0x626172
 does not exist
Help: https://docs.oracle.com/error-help/db/ora-24756/

With or without timeout using the oracle client results in no error (run about 100 try without timeout and 60 with). I've tried only on windows, but @zzzeek can reproduce locally on linux (fedora I think) using a different oracle version and config

@CaselIT CaselIT added the bug Something isn't working label Aug 8, 2024
@anthony-tuininga
Copy link
Member

Ok, I looked into this and technically this isn't a bug. The problem is that if you call tpc_begin() without specifying a value for the timeout parameter, the default is 0 and that means you are telling the system that as soon as a transaction becomes inactive, the system can clean it up. So whether it actually gets around to it or not defines whether or not you get the error you are seeing. If you ensure that you set a value for the timeout parameter that covers the length of time that the transaction will be inactive, then the problem goes away completely. The fact that thick mode doesn't exhibit this behavior suggests to me that the default is not actually 0 but some other value. I'll ask internally and find out what it is and change thin mode (and the documentation) to whatever I discover. In the meantime, you can simply set the timeout parameter yourself and verify that the issue is resolved.

@CaselIT
Copy link
Author

CaselIT commented Aug 9, 2024

Thanks for checking. So a timeout was invited!

I failed to notice that timeout parameter in tpc_begin, but the behaviour seems consistent with it.

I'll retry the example with a timeout set, and see if it's settable using the sqlalchemy api

@CaselIT
Copy link
Author

CaselIT commented Aug 9, 2024

it does indeed seem the timeout fault. I think it would make sense from an usage point of view to have the same timeout as the thick mode, or at least document the difference.

see if it's settable using the sqlalchemy api

seems not. Will coordinate with mike about accepting kw-args there

@CaselIT CaselIT changed the title New two phase transaction in thin mode is unreliable New two phase transaction in thin mode has different default timeout than thick mode Aug 9, 2024
@anthony-tuininga
Copy link
Member

it does indeed seem the timeout fault. I think it would make sense from an usage point of view to have the same timeout as the thick mode, or at least document the difference.

I agree and intend to make them the same once I find out what thick mode does. Stay tuned!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants