Skip to content
This repository has been archived by the owner on Nov 22, 2023. It is now read-only.

ReferenceCounter can become negative #447

Closed
ixje opened this issue Jan 31, 2022 · 18 comments
Closed

ReferenceCounter can become negative #447

ixje opened this issue Jan 31, 2022 · 18 comments

Comments

@ixje
Copy link
Contributor

ixje commented Jan 31, 2022

TestNet block 805344 has 1 transaction that near the end of its execution will have a ReferenceCounter with a negative count. I don't think this should be possible. I have not looked further into the implications, but per definition it sounds wrong.

To be more specific. The script ends with 3 RET instructions. After executing the first of those 3 RETs the reference count will be -1. If you'd add an instruction counter like so

                    ExecutionContext context = CurrentContext!;
                    instruction_counter++;
                    PreExecuteInstruction();
                    try
                    {
                        ExecuteInstruction();
                    }

then it will be -1 after instruction 237.

@ixje
Copy link
Contributor Author

ixje commented Jan 31, 2022

Yes correct

@erikzhang
Copy link
Member

@ixje Can you check if #459 fixed this issue?

@ixje
Copy link
Contributor Author

ixje commented May 9, 2022

@erikzhang I can't say because the new code throws an exception at
https://github.com/neo-project/neo-vm/pull/459/files#diff-16db3bdafec7d5064e400bb7ca4815ee7b674442a256858494ddbf890fa8eeb0R136 (line 136 of ReferenceCounter in the new code)
with the message
The given key 'Neo.VM.Types.ByteString' was not present in the dictionary. for instruction 237 (see first post)

The TX now fails with a FAULT, where it previously succeeded.

@ixje
Copy link
Contributor Author

ixje commented May 9, 2022

fwiw; the tarjin-2 branch also fails but at a different place.

System.Collections.Generic.KeyNotFoundException: The given key 'Neo.VM.Types.ByteString' was not present in the dictionary.
   at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
   at Neo.VM.ReferenceCounter.CheckZeroReferred() in /home/erik/Documents/code/neo-ref-ctr/neo-vm/src/neo-vm/ReferenceCounter.cs:line 127
   at Neo.VM.ExecutionEngine.PostExecuteInstruction() in /home/erik/Documents/code/neo-ref-ctr/neo-vm/src/neo-vm/ExecutionEngine.cs:line 1664
   at Neo.SmartContract.ApplicationEngine.PostExecuteInstruction() in /home/erik/Documents/code/neo-ref-ctr/neo/src/neo/SmartContract/ApplicationEngine.cs:line 518
   at Neo.VM.ExecutionEngine.ExecuteNext() in /home/erik/Documents/code/neo-ref-ctr/neo-vm/src/neo-vm/ExecutionEngine.cs:line 1463

@erikzhang
Copy link
Member

I don't know. I don't think this is theoretically possible. What version of node are you using?

@erikzhang
Copy link
Member

When this happens, a CompoundType that was not assigned a ReferenceCounter was pushed onto the stack. The exception occurs when it is popped off the stack and the vm tries to remove the reference to it. We need to find out where this CompoundType with no ReferenceCounter came from.

@erikzhang
Copy link
Member

@ixje Can you dump all the items in the EvaluationStack before the exception is thrown?

@ixje
Copy link
Contributor Author

ixje commented May 9, 2022 via email

@ixje
Copy link
Contributor Author

ixje commented May 9, 2022

patches.zip
Apply the above patches to neo and neo-vm. Then sync up to testnet block 805344, put a break point on line 516 of ApplicationEngine.cs (in the PostExecuteInstruction() override). Run and when it breaks step into base.PostExecuteInstruction() and next into CheckZeroReferred() function.

The above patches were made on neo commit 1e449185c0d05e9824c257ab4404f380106d15b2 and neo-vm commit 491d317 (of the tarjin-2 branch).

I don't think this is theoretically possible

I didn't think it would be possible either, but here is a screenshot running the TX with neo-vm commit 288bc88 (latest master as of writing)

image
You can tell the state is HALT by the branch chosen and the ReferenceCounter shows negative.

@ixje
Copy link
Contributor Author

ixje commented May 9, 2022

fwiw; branch tarjin-3 also fails with a key not found

@roman-khimov
Copy link
Contributor

NeoGo execution trace for this tx (opcode, parameter, reference counter before instruction processing):

PUSHDATA1 3f8667bb58f2c287e2370b8c1a538c01078503262ab23aeca2c46079ab854889 0
PUSHDATA1 4f66666572 1
PUSH2  2
PACK  3
PUSH1  3
PUSHDATA1 a9f1462d30fdb15a09497403e06ececdeac64cca 4
PUSHDATA1 2f570e495ae609a68484384e02299ee043511912 5
PUSH4  6
PACK  7
PUSH15  7
PUSHDATA1 7472616e73666572 8
PUSHDATA1 f563ea40bc283d4d0e05c48ea305b3f2a07340ef 9
SYSCALL 627d5b52 10
PUSH0  6
SYSCALL 1af77b67 7
INITSLOT 0003 3
NOP  3
LDARG2  3
PUSHNULL  4
EQUAL  5
JMPIFNOT_L 0c000000 4
NOP  3
JMP_L 3a010000 3
RET  3
INITSLOT 0003 5
NOP  5
LDARG2  5
PUSHNULL  6
EQUAL  7
JMPIFNOT_L 0c000000 6
NOP  5
LDARG2  5
PUSH0  6
PICKITEM  7
PUSHDATA1 4e656f4653466565 6
EQUAL  7
JMPIFNOT_L 13000000 6
NOP  5
LDARG2  5
PUSH0  6
PICKITEM  7
PUSHDATA1 4d696e74 6
EQUAL  7
JMPIFNOT_L 1a000000 6
NOP  5
LDARG2  5
PUSH0  6
PICKITEM  7
PUSHDATA1 4c697374 6
EQUAL  7
JMPIFNOT_L 20000000 6
NOP  5
LDARG2  5
PUSH0  6
PICKITEM  7
PUSHDATA1 5075726368617365 6
EQUAL  7
JMPIFNOT_L 15000000 6
NOP  5
LDARG2  5
PUSH0  6
PICKITEM  7
PUSHDATA1 4f66666572 6
EQUAL  7
JMPIFNOT_L 15000000 6
NOP  5
LDARG1  5
LDARG2  6
PUSH1  7
PICKITEM  8
CALL_L 13040000 7
INITSLOT 0002 7
NOP  7
LDARG0  7
CALL_L bbfaffff 8
INITSLOT 0101 8
NOP  8
SYSCALL 39536e3c 8
PUSHDATA1 f563ea40bc283d4d0e05c48ea305b3f2a07340ef 9
EQUAL  10
JMPIFNOT_L 2d000000 9
NOP  8
LDARG0  8
PUSH1  9
PACK  10
PUSH15  10
PUSHDATA1 676574436f6d6d697373696f6e 11
CALL_L dafcffff 12
CALL_L 33000000 12
PUSHDATA1 536b794d61726b6574436f6e7472616374 12
SYSCALL 9bf667ce 13
PUSH2  14
PACK  15
RET  15
PUSHDATA1 4d696e74436f6e7472616374 15
OVER  16
PUSH1  17
PICKITEM  18
SWAP  17
CAT  17
SWAP  16
PUSH0  16
PICKITEM  17
SYSCALL 925de831 14
DUP  13
ISNULL  14
JMPIF_L 0f000000 14
DUP  13
SIZE  14
PUSHINT8 14 14
JMPEQ_L 06000000 15
NOP  13
RET  13
SYSCALL 627d5b52 13
INITSSLOT 02 9
PUSHA a1f5ffff 9
PUSHA cef3ffff 10
PUSHA 9ef3ffff 11
PUSH3  12
PACK  13
STSFLD0  13
PUSHA 8ff5ffff 13
PUSHA bcf3ffff 14
PUSHNULL  15
PUSH3  16
PACK  17
STSFLD1  17
RET  17
INITSLOT 0001 17
CALL_L 54fdffff 17
PUSHDATA1 534343 17
SYSCALL 9bf667ce 18
PUSH2  19
PACK  20
RET  20
LDARG0  20
OVER  21
PUSH1  22
PICKITEM  23
SWAP  22
CAT  22
SWAP  21
PUSH0  21
PICKITEM  22
SYSCALL 925de831 19
DUP  18
ISNULL  19
JMPIFNOT 04 19
CONVERT 21 18
RET  18
STLOC0  9
LDLOC0  9
PUSH0  10
EQUAL  11
JMP_L 0f000000 10
RET  10
NOT  8
JMPIFNOT_L 23000000 8
NOP  7
CALL_L 66000000 7
PUSHDATA1 544f 7
SYSCALL 9bf667ce 8
PUSH2  9
PACK  10
RET  10
LDARG0  10
CALL_L f2f7ffff 11
SYSCALL 2d510830 11
RET  20
PUSH3  20
PICKITEM  21
CAT  12
CONVERT 28 11
LDARG1  11
PUSH2  12
PICK  13
PUSH1  13
PICKITEM  14
ROT  13
CAT  13
ROT  12
PUSH0  12
PICKITEM  13
SYSCALL e63f1884 10
CALL_L 53000000 7
PUSHDATA1 544348 7
SYSCALL 9bf667ce 8
PUSH2  9
PACK  10
RET  10
LDARG0  10
CALL_L d3f7ffff 11
SYSCALL 2d510830 11
RET  20
PUSH3  20
PICKITEM  21
CAT  12
CONVERT 28 11
SYSCALL 39536e3c 11
PUSH2  12
PICK  13
PUSH1  13
PICKITEM  14
ROT  13
CAT  13
ROT  12
PUSH0  12
PICKITEM  13
SYSCALL e63f1884 10
NEWARRAY0  7
DUP  8
CALL_L b4f7ffff 9
SYSCALL 2d510830 9
RET  18
PUSH3  18
PICKITEM  19
APPEND  10
DUP  9
LDARG0  10
APPEND  11
DUP  10
LDARG1  11
APPEND  12
DUP  11
SYSCALL 39536e3c 12
APPEND  13
PUSHDATA1 4f66666572 12
SYSCALL 95016f61 13
RET  7
NOP  5
JMP_L 7d000000 5
NOP  5
NOP  5
NOP  5
NOP  5
NOP  5
RET  5
RET  1
RET  1

@roman-khimov
Copy link
Contributor

Same thing with the stack contents (but slots are also actively used here, so stack alone probably isn't sufficient): https://gist.github.com/roman-khimov/31ab0daa131c98371d3b66754c209272

@erikzhang
Copy link
Member

erikzhang commented May 10, 2022

Not sure if neo-project/neo#2731 fixes this. But that fixed a similar bug.

@ixje
Copy link
Contributor Author

ixje commented May 10, 2022

Not sure if neo-project/neo#2731 fixes this. But that fixed a similar bug.

I applied those changes and it still fails.

@roman-khimov
Copy link
Contributor

The way I see it is:

SYSCALL 627d5b52 10
[
    {
        "type": "Array",
        "value": [
            {
                "type": "ByteString",
                "value": "L1cOSVrmCaaEhDhOAime4ENRGRI="
            },
            {
                "type": "ByteString",
                "value": "qfFGLTD9sVoJSXQD4G7OzerGTMo="
            },
            {
                "type": "Integer",
                "value": "1"
            },
            {
                "type": "Array",
                "value": [
                    {
                        "type": "ByteString",
                        "value": "T2ZmZXI="
                    },
                    {
                        "type": "ByteString",
                        "value": "P4Znu1jywofiNwuMGlOMAQeFAyYqsjrsosRgeauFSIk="
                    }
                ]
            }
        ]
    },
    {
        "type": "Integer",
        "value": "15"
    },
    {
        "type": "ByteString",
        "value": "dHJhbnNmZXI="
    },
    {
        "type": "ByteString",
        "value": "9WPqQLwoPU0OBcSOowWz8qBzQO8="
    }
]

Native NEO is being called to transfer 1 NEO from A to B with some data that is a byte array with two elements inside. B is some contract. The way it's executed is

SYSCALL 627d5b52 10
rm ref 10 {"type":"ByteString","value":"9WPqQLwoPU0OBcSOowWz8qBzQO8="}
rm ref 9 {"type":"ByteString","value":"dHJhbnNmZXI="}
rm ref 8 {"type":"Integer","value":"15"}
rm ref 7 {"type":"Array","value":[{"type":"ByteString","value":"L1cOSVrmCaaEhDhOAime4ENRGRI="},{"type":"ByteString","value":"qfFGLTD9sVoJSXQD4G7OzerGTMo="},{"type":"Integer","value":"1"},{"type":"Array","value":[{"type":"ByteString","value":"T2ZmZXI="},{"type":"ByteString","value":"P4Znu1jywofiNwuMGlOMAQeFAyYqsjrsosRgeauFSIk="}]}]}
rm ref 6 {"type":"ByteString","value":"L1cOSVrmCaaEhDhOAime4ENRGRI="}
rm ref 5 {"type":"ByteString","value":"qfFGLTD9sVoJSXQD4G7OzerGTMo="}
rm ref 4 {"type":"Integer","value":"1"}
rm ref 3 {"type":"Array","value":[{"type":"ByteString","value":"T2ZmZXI="},{"type":"ByteString","value":"P4Znu1jywofiNwuMGlOMAQeFAyYqsjrsosRgeauFSIk="}]}
rm ref 2 {"type":"ByteString","value":"T2ZmZXI="}
rm ref 1 {"type":"ByteString","value":"P4Znu1jywofiNwuMGlOMAQeFAyYqsjrsosRgeauFSIk="}
add ref 0 {"type":"Array","value":[{"type":"ByteString","value":"T2ZmZXI="},{"type":"ByteString","value":"P4Znu1jywofiNwuMGlOMAQeFAyYqsjrsosRgeauFSIk="}]}
add ref 1 {"type":"ByteString","value":"T2ZmZXI="}
add ref 2 {"type":"ByteString","value":"P4Znu1jywofiNwuMGlOMAQeFAyYqsjrsosRgeauFSIk="}
add ref 3 {"type":"Integer","value":"1"}
add ref 4 {"type":"ByteString","value":"qfFGLTD9sVoJSXQD4G7OzerGTMo="}
add ref 5 {"type":"ByteString","value":"L1cOSVrmCaaEhDhOAime4ENRGRI="}
PUSH0  6
add ref 6 {"type":"Integer","value":"0"}
SYSCALL 1af77b67 7
rm ref 7 {"type":"Integer","value":"0"}
rm ref 6 {"type":"ByteString","value":"L1cOSVrmCaaEhDhOAime4ENRGRI="}
rm ref 5 {"type":"ByteString","value":"qfFGLTD9sVoJSXQD4G7OzerGTMo="}
rm ref 4 {"type":"Integer","value":"1"}
rm ref 3 {"type":"Array","value":[{"type":"ByteString","value":"T2ZmZXI="},{"type":"ByteString","value":"P4Znu1jywofiNwuMGlOMAQeFAyYqsjrsosRgeauFSIk="}]}
rm ref 2 {"type":"ByteString","value":"T2ZmZXI="}
rm ref 1 {"type":"ByteString","value":"P4Znu1jywofiNwuMGlOMAQeFAyYqsjrsosRgeauFSIk="}

So the stack is technically absolutely empty. Then GAS contract kicks in an invokes onNEP17Payment on B:

INITSLOT 0003 3
[
    {
        "type": "Any"
    },
    {
        "type": "Integer",
        "value": "213"
    },
    {
        "type": "Any"
    }
]

Or

add ref 0 {"type":"Any"}
add ref 1 {"type":"Integer","value":"213"}
add ref 2 {"type":"Any"}
INITSLOT 0003 3

It ends and then it's time for onNEP17Payment call from NEO:

INITSLOT 0003 5
[
    {
        "type": "Array",
        "value": [
            {
                "type": "ByteString",
                "value": "T2ZmZXI="
            },
            {
                "type": "ByteString",
                "value": "P4Znu1jywofiNwuMGlOMAQeFAyYqsjrsosRgeauFSIk="
            }
        ]
    },
    {
        "type": "Integer",
        "value": "1"
    },
    {
        "type": "ByteString",
        "value": "L1cOSVrmCaaEhDhOAime4ENRGRI="
    }
]

With appropriate data from the original parameters. As @ixje noted, that's where the difference between NeoGo and C# node occurs, we've got:

add ref 0 {"type":"Array","value":[{"type":"ByteString","value":"T2ZmZXI="},{"type":"ByteString","value":"P4Znu1jywofiNwuMGlOMAQeFAyYqsjrsosRgeauFSIk="}]}
add ref 1 {"type":"ByteString","value":"T2ZmZXI="}
add ref 2 {"type":"ByteString","value":"P4Znu1jywofiNwuMGlOMAQeFAyYqsjrsosRgeauFSIk="}
add ref 3 {"type":"Integer","value":"1"}
add ref 4 {"type":"ByteString","value":"L1cOSVrmCaaEhDhOAime4ENRGRI="}
INITSLOT 0003 5

Which I think is the way it should be, the array is pushed onto the empty stack, so we properly add all of its child elements to the counter. If C# node has a counter of 3 at this point then it's likely treating this array as already existing on the stack.

@ixje
Copy link
Contributor Author

ixje commented May 10, 2022

@erikzhang it does not fail, but not sure if it is correct either. Previously the reference count between neo-go and C# was identical up to

PUSHDATA1 3f8667bb58f2c287e2370b8c1a538c01078503262ab23aeca2c46079ab854889 0
PUSHDATA1 4f66666572 1
PUSH2  2
PACK  3
PUSH1  3
PUSHDATA1 a9f1462d30fdb15a09497403e06ececdeac64cca 4
PUSHDATA1 2f570e495ae609a68484384e02299ee043511912 5
PUSH4  6
PACK  7
PUSH15  7
PUSHDATA1 7472616e73666572 8
PUSHDATA1 f563ea40bc283d4d0e05c48ea305b3f2a07340ef 9
SYSCALL 627d5b52 10
PUSH0  6
SYSCALL 1af77b67 7
INITSLOT 0003 3
NOP  3
LDARG2  3
PUSHNULL  4
EQUAL  5
JMPIFNOT_L 0c000000 4
NOP  3
JMP_L 3a010000 3
RET  3
INITSLOT 0003 5 <- C# ref count here is 3 instead of 5

now the reference counter in C# is 3 higher than neo-go after the 2nd syscall (to System.Contract.CallNative)

@erikzhang
Copy link
Member

Fixed in neo-project/neo#2732

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants