Parsing the log topics for events (filter_log) may fail depending on the "indexed" inputs in the ABI #335

apehex · 2023-08-27T11:19:23Z

Hey there :)

I just noticed that filter_log in the Python SDK sometimes misses events.

If the ABI used to parse a log doesn't exactly match the ABI of the contract that emitted it, filter_log fails.
In particular, developers are free to choose which event arguments are indexed, and that choice doesn't change the event signature: only the number of topics differs.

For example the following event:

event Transfer(address indexed from, address to, uint256 value);

Will not be matched by an ABI built for:

event Transfer(address indexed from, address indexed to, uint256 value);

This can be identified by catching LogTopicError exceptions in filter_log, like:
Expected 2 log topics. Got 1
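
To make the mismatch concrete, here is a minimal sketch of the comparison behind that message. It assumes ABIs are plain dicts in the standard JSON ABI layout; the helper names and the placeholder topics are hypothetical and not part of the SDK.

```python
def count_indexed_inputs(event_abi: dict) -> int:
    """Number of inputs flagged as indexed in an event ABI."""
    return sum(1 for _input in event_abi.get('inputs', ()) if _input.get('indexed', False))

def count_indexed_topics(topics: list, anonymous: bool = False) -> int:
    """Number of topics holding indexed arguments in a log."""
    # for non-anonymous events, topics[0] is the signature hash, not an argument
    return len(topics) if anonymous else max(len(topics) - 1, 0)

# ABI used for parsing: both "from" and "to" are indexed
PARSING_ABI = {
    'type': 'event', 'name': 'Transfer', 'anonymous': False,
    'inputs': [
        {'name': 'from', 'type': 'address', 'indexed': True},
        {'name': 'to', 'type': 'address', 'indexed': True},
        {'name': 'value', 'type': 'uint256', 'indexed': False}]}

# placeholder topics for a log emitted with only "from" indexed
LOG_TOPICS = ['<signature hash>', '<from address>']

expected = count_indexed_inputs(PARSING_ABI)
got = count_indexed_topics(LOG_TOPICS)
if expected != got:
    print(f'Expected {expected} log topics. Got {got}')
```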

apehex · 2023-09-08T16:02:50Z

So far I'm working around this issue locally & manually with custom parsing.
I have a few proposals to handle it.

Solution 1: passing all the ABI variants as inputs

First, generate all the variants of the ABI, with the "indexed" flag of each input switched on and off:

import itertools

def generate_all_abi_indexation_variants(abi: ABIEvent) -> dict:
    """Generate all the variants of the ABI by switching each "indexed" field true / false for the inputs."""
    _count = len(abi.get('inputs', ()))
    _indexed = tuple(itertools.product((True, False), repeat=_count)) # all the True / False combinations
    _abis = {_c: [] for _c in range(_count + 1)} # group by number of indexed inputs
    for _i in _indexed: # each indexation variant
        _abis[sum(_i)].append(_apply_indexation_mask(abi=abi, mask=_i))
    return _abis
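
The helper _apply_indexation_mask is not shown in this thread. A minimal sketch of what it could look like, assuming the ABI is a plain dict in the JSON ABI layout and the mask holds one boolean per input (this is a guess at the helper, not the actual implementation):

```python
import copy

def _apply_indexation_mask(abi: dict, mask) -> dict:
    """Return a copy of the event ABI with the "indexed" flag of each input set from the mask."""
    _abi = copy.deepcopy(abi) # leave the original ABI untouched
    for _input, _flag in zip(_abi.get('inputs', ()), mask):
        _input['indexed'] = bool(_flag)
    return _abi

# example: index only the first input of the ERC20 Transfer event
TRANSFER_ABI = {
    'type': 'event', 'name': 'Transfer', 'anonymous': False,
    'inputs': [
        {'name': 'from', 'type': 'address', 'indexed': True},
        {'name': 'to', 'type': 'address', 'indexed': True},
        {'name': 'value', 'type': 'uint256', 'indexed': False}]}

variant = _apply_indexation_mask(abi=TRANSFER_ABI, mask=(True, False, False))
```

The deep copy matters here: every variant is built from the same source ABI, so mutating it in place would make all the variants identical.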

Then it could be used directly with the current filter_log:

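_abis = generate_all_abi_indexation_variants(abi=abi)
_variants = [_a for _group in _abis.values() for _a in _group] # flatten the groups into a single list of ABIs
tx.filter_log(_variants) # tx is a TransactionEvent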

Pros

This solution will always work!

It doesn't require any modifications on filter_log since it already handles lists of ABIs.

Cons

It comes at the cost of performance.
With only 3 input arguments in the ABI, there would be 2^3 = 8 variants and the computation could be up to 8x slower.

Solution 2: using only the relevant ABI variants

In the snippet above, the variants are grouped by number of indexed inputs.

This makes it possible to reduce the computation time by only using the variants that match the number of topics in the log.
Inside filter_log, select only the matching ABIs:

_abi = abi.get(len(log['topics']) - 1, None) # topics[0] is the event signature
contract = web3Provider.eth.contract("0x0000000000000000000000000000000000000000", abi=_abi)
for event_name in event_names:
    try:
        results.append(
            contract.events[event_name]().processLog(log))
    except Exception: # none of the selected variants match this log
        continue

filter_log could be modified to handle dict, list and a single ABI value.
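
A rough sketch of how that input handling could look, written as a standalone helper. The name select_abis is hypothetical, and it assumes that a dict with integer keys is the mapping from number of indexed inputs to variants, while any other dict is a single ABI:

```python
def select_abis(abi, topics: list) -> list:
    """Return the list of ABIs worth trying for a log with the given topics."""
    if isinstance(abi, list): # plain list of ABIs: try them all, as today
        return list(abi)
    if isinstance(abi, dict) and all(isinstance(_k, int) for _k in abi):
        # mapping from number of indexed inputs to variant(s)
        _match = abi.get(len(topics) - 1, []) # topics[0] is the event signature
        return list(_match) if isinstance(_match, list) else [_match]
    return [abi] # single ABI
```

The mapping branch accepts either a list of variants per key (solution 2) or a single ABI per key, so the same helper would also cover a one-variant-per-count mapping.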

Pros

Less processing than solution 1: instead of growing exponentially with the number of inputs (2^n), it's now "only" one binomial coefficient, C(n, k) for k indexed inputs out of n.
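
To put numbers on the comparison, a small sketch counting the variants tried per log in each case (the function names are just for illustration; math.comb needs Python 3.8+):

```python
import math

def count_all_variants(inputs: int) -> int:
    """Solution 1: every combination of indexed flags."""
    return 2 ** inputs

def count_matching_variants(inputs: int, indexed: int) -> int:
    """Solution 2: only the variants with the right number of indexed inputs."""
    return math.comb(inputs, indexed)

# for an event with 3 inputs, like the ERC20 Transfer
total = count_all_variants(3)
per_count = [count_matching_variants(3, _k) for _k in range(4)]
```

For 3 inputs, the worst case goes from 8 variants down to 3, and the groups always add up to the total.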

Cons

filter_log would have to be reworked.

Solution 3: using the most probable ABI

Here, the idea would be to generate only one ABI per number of indexed inputs.

For example, the ABI for the ERC20 Transfer event with 1 indexed input would be:

event Transfer(address indexed from, address to, uint256 value);

And we ignore the other 2 variants with 1 indexed input:

event Transfer(address from, address indexed to, uint256 value);
event Transfer(address from, address to, uint256 indexed value);

It would be up to the user to generate the mapping from the number of indexed inputs to the ABI.
The naive way, giving more importance to the arguments from left to right, would be:

def generate_the_most_probable_abi_indexation_variants(abi: ABIEvent) -> dict:
    """Generate the most probable variant of the ABI for each count of indexed inputs."""
    _count = len(abi.get('inputs', ()))
    _indexed = tuple((_i * [True] + (_count - _i) * [False]) for _i in range(_count + 1)) # index from left to right, without gaps
    return {sum(_i): _apply_indexation_mask(abi=abi, mask=_i) for _i in _indexed} # order by number of indexed inputs
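
To see which variants this keeps, here is the mask generation on its own, as a standalone sketch (the name left_to_right_masks is made up for the example):

```python
def left_to_right_masks(count: int) -> list:
    """One mask per number of indexed inputs, indexing from left to right without gaps."""
    return [tuple(_i * [True] + (count - _i) * [False]) for _i in range(count + 1)]

masks = left_to_right_masks(3)
# a variant with only the second input indexed is not covered
covered = (False, True, False) in masks
```

With 3 inputs this gives 4 masks instead of 8, one for each possible number of indexed inputs.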

And then select the relevant ABI in filter_log:

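_abi = abi.get(len(log['topics']) - 1, None) # single ABI for this number of indexed inputs
contract = web3Provider.eth.contract("0x0000000000000000000000000000000000000000", abi=[_abi]) # web3 expects a list of ABI entries
for event_name in event_names:
    try:
        results.append(
            contract.events[event_name]().processLog(log))
    except Exception: # the selected variant does not match this log
        continue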

Pros

This method would not impact performance at all: only one ABI is processed per log.

Cons

filter_log would have to be modified here too.

It will still miss events when the most probable ABI doesn't match the actual ABI of the emitted event.

Using the function to its fullest would also require the user to be aware of the issue.
Still, the generation of the variants could be handled directly by filter_log to keep its usage transparent.

christian-forta · 2023-09-08T21:06:21Z

@haseebrabbani, what are your thoughts on this?

christian-forta assigned haseebrabbani Sep 8, 2023
