Propose RETURNDATACOPY and RETURNDATASIZE. #211

Merged
merged 15 commits into from
Dec 1, 2017
61 changes: 61 additions & 0 deletions EIPS/returndatacopy.md
@@ -0,0 +1,61 @@
## Preamble

EIP:
Title: New opcodes: RETURNDATASIZE and RETURNDATACOPY
Author: Christian Reitwiessner <chris@ethereum.org>
Type: Standards Track
Category: Core
Status: Draft
Created: 2017-02-13
Requires:
Replaces: 5/8


## Simple Summary

A mechanism for returning arbitrary-length data inside the EVM has been requested for quite a while now. Existing proposals always had very intricate problems associated with charging gas. This proposal solves the same problem with a very simple gas charging mechanism and requires minimal changes to the call opcodes. It works much like the existing handling of calldata: after a call, return data is kept inside a virtual buffer from which the caller can copy it (or parts thereof) into memory. At the next call, the buffer is overwritten. This mechanism is 100% backwards compatible.

## Abstract

Please see summary.

## Motivation

In some situations, it is vital for a function to be able to return data whose length cannot be anticipated before the call. In principle, this can be solved without alterations to the EVM, for example by splitting the call into two calls where the first is used to compute only the size. All of these mechanisms, though, are very expensive in at least some situations. A very useful example of such a worst-case situation is a generic forwarding contract: A contract that takes call data, potentially makes some checks and then forwards it as is to another contract. The return data should of course be transferred in a similar way to the original caller. Since the contract is generic and does not know about the contract it calls, there is no way to determine the size of the output without adapting the called contract accordingly or trying a logarithmic number of calls.

Compiler implementors are advised to reserve a zero-length area for return data if the size of the return data is unknown before the call and then use `RETURNDATACOPY` in conjunction with `RETURNDATASIZE` to actually retrieve the data.
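
A minimal Python sketch of the forwarding pattern described above, under simplifying assumptions: `Frame`, `call`, `forward` and the convention of modelling a contract as a Python callable are hypothetical names chosen for illustration, not part of this EIP, and gas is ignored entirely. It shows a caller reserving a zero-length output area and then fetching the whole result via the two new opcodes.

```python
class Frame:
    """Hypothetical model of one call frame: memory plus a return data buffer."""

    def __init__(self):
        self.memory = bytearray()
        self.return_data = b""  # empty until this frame makes a call

    def _ensure(self, size):
        # Grow memory with zero bytes so that indices below `size` are valid.
        if len(self.memory) < size:
            self.memory.extend(bytes(size - len(self.memory)))

    def call(self, callee, calldata, out_offset=0, out_size=0):
        # Model of a call: the buffer is overwritten, and at most out_size
        # bytes of the output land in the pre-reserved output area.
        self.return_data = b""
        output = bytes(callee(calldata))
        self.return_data = output
        n = min(out_size, len(output))
        self._ensure(out_offset + n)
        self.memory[out_offset:out_offset + n] = output[:n]
        return True

    def returndatasize(self):
        return len(self.return_data)

    def returndatacopy(self, mem_offset, data_offset, size):
        # Reads beyond the end of the buffer are zero-filled, as in this draft.
        chunk = self.return_data[data_offset:data_offset + size]
        chunk = chunk.ljust(size, b"\x00")
        self._ensure(mem_offset + size)
        self.memory[mem_offset:mem_offset + size] = chunk


def forward(target, calldata):
    """Generic forwarder: no knowledge of the callee's output size is needed."""
    frame = Frame()
    frame.call(target, calldata, out_offset=0, out_size=0)  # zero-length area
    size = frame.returndatasize()
    frame.returndatacopy(0, 0, size)
    return bytes(frame.memory[:size])
```

Here `forward(lambda data: data * 3, b"ab")` hands back all six bytes produced by the callee even though the output area passed to the call had length zero.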

Note that this proposal also makes the EIP that proposes allowing data to be returned in case of an intentional state reversion (EIP [206](https://github.com/ethereum/EIPs/pull/206)) much more useful. The failure data might be larger than the regular return data (or of unknown size); with this proposal it can still be retrieved after the CALL opcode has signalled a failure, even if the regular output area is not large enough to hold it.

## Specification

Member

Should this EIP specify the starting block somehow, or not?

Add two new opcodes:

`RETURNDATASIZE`: `0xd`
Member

Why this opcode number? It creates gap after SIGNEXTEND 0xb.

Member @axic, May 5, 2017

I think it should be after EXTCODECOPY as that block contains call related lookup. That is 0x3d and 0x3e.


Pushes the size of the return data (or the failure return data, see EIP [206](https://github.com/ethereum/EIPs/pull/206)) of the previous call onto the stack. If there was no previous call, pushes zero.
Member

Does "previous call" mean "previous call in the current transaction" or "previous call in the current message call/contract creation"?

Contributor Author

Previous call made from the current call frame, i.e. the EVM execution that shares the same memory with the current executing opcode - not sure if there is a proper name for that somewhere.

Member

I don't find any. Sometimes the Yellow Paper says "this execution" or "message-call or contract-creation". Maybe "in the current machine state" is good enough.

Gas costs: 2 (same as `CALLDATASIZE`)

`RETURNDATACOPY`: `0xe`

This opcode has similar semantics to `CALLDATACOPY`, but instead of copying data from the call data, it copies data from the return data of the previous call. If the return data is accessed beyond its length, it is considered to be filled with zeros. If there was no previous call, copies zeros.
Gas costs: `3 + 3 * ceil(amount / 32)` (same as `CALLDATACOPY`)
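
As a rough illustration of the two rules above, here is a small Python sketch of the gas formula and the zero-fill behaviour as specified in this draft. The function names are hypothetical, and the gas figure covers only the formula quoted above, not any memory expansion cost.

```python
def returndatacopy_gas(amount: int) -> int:
    """Gas per the formula above: 3 + 3 * ceil(amount / 32)."""
    words = (amount + 31) // 32  # integer ceiling of amount / 32
    return 3 + 3 * words


def returndatacopy(return_data: bytes, data_offset: int, amount: int) -> bytes:
    """Return `amount` bytes starting at `data_offset`, zero-filled past the end."""
    chunk = return_data[data_offset:data_offset + amount]
    return chunk + bytes(amount - len(chunk))
```

For example, copying 33 bytes touches two 32-byte words, so the formula gives 3 + 3 * 2 = 9 gas, and reading 4 bytes from a 1-byte buffer yields that byte followed by three zero bytes.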

Member

I need something like this:

In a machine state, the return data of the previous call is maintained as follows. When a new machine state is launched, the return data of the previous call is defined to be the empty byte sequence. When the program counter reaches CALL, CREATE, CALLCODE, DELEGATECALL or STATICCALL, the return data of the previous call is reset to the empty byte sequence. When this instruction gives return data, the resultant data becomes the return data of the previous call.

In particular, it's currently impossible to tell whether CREATE counts as a previous call.
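
The maintenance rule suggested in this comment could be modelled roughly as follows. This is a Python sketch only, not wording from the EIP or the Yellow Paper; `MachineState`, `execute_call_like` and the `result` parameter (the data the instruction gives, or `None` if it gives none) are hypothetical names.

```python
# Instructions that reset the buffer, as listed in the comment above.
CALL_LIKE = {"CALL", "CREATE", "CALLCODE", "DELEGATECALL", "STATICCALL"}


class MachineState:
    """Hypothetical machine state carrying only the return data buffer."""

    def __init__(self):
        # A freshly launched machine state starts with the empty byte sequence.
        self.return_data = b""

    def execute_call_like(self, opcode, result=None):
        if opcode not in CALL_LIKE:
            raise ValueError(f"{opcode} does not touch the return data buffer")
        # Reaching the instruction resets the buffer first ...
        self.return_data = b""
        # ... and it is refilled only if the instruction gives return data.
        if result is not None:
            self.return_data = bytes(result)
```

Under this reading, CREATE is treated like any other call-like instruction: it resets the buffer even when it gives no data back.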

Contributor

What happens in the following scenario:

  • Call foo()
    • Call bar() -> returns 42
    • RETURNDATA is now 42
    • Error (e.g. oog or invalid jump)
  • What does RETURNDATA give now? Was it cleared when going up a level?

Contributor Author

@holiman I'm not sure I fully understand. In your example, the call to the foo contract signals a failure, correct? The RETURNDATA is always cleared when going up a level unless the call frame returns data using return or revert. In that case, it is set to that data.

Member

It's cleared at Call foo() and stays empty, regardless of what happens in other call stacks. It is not cleared when going up. The RETURNDATA in different machine states do not interfere with each other.

In this scenario, at least two machine states are involved. The machine state that calls foo() and the machine state that calls bar(). The machine state that calls foo() has RETURNDATA reset at Call foo().

In the Yellow Paper (9.4.1. "The Machine State"), a machine state is defined to be a tuple containing the program counter.
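
The foo()/bar() scenario from this thread can be traced with a small Python sketch, assuming one buffer per machine state as described in this comment. `Frame`, `Failure` and the function-as-contract convention are hypothetical modelling choices, not part of the proposal.

```python
class Failure(Exception):
    """Hypothetical stand-in for an exceptional halt (out of gas, invalid jump)."""


class Frame:
    """One machine state; only its return data buffer is modelled."""

    def __init__(self):
        self.return_data = b""

    def call(self, code):
        self.return_data = b""       # reset when the call is made
        child = Frame()              # the callee runs in its own machine state
        try:
            output = code(child)
        except Failure:
            return False             # failure without data: buffer stays empty
        self.return_data = output
        return True


observed = {}


def bar(frame):
    return b"\x2a"                   # returns 42


def foo(frame):
    frame.call(bar)
    observed["inside_foo"] = frame.return_data
    raise Failure("out of gas")


outer = Frame()
ok = outer.call(foo)
```

In this model the buffer seen inside foo() holds 42 after bar() returns, while the buffer of the machine state that called foo() is empty once foo() fails. Both answers in the thread agree on that outcome.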

Member @pirapira, Mar 22, 2017

@chriseth, have you changed your answer to my old question:

> Does "previous call" mean "previous call in the current transaction" or "previous call in the current message call/contract creation"?

> Previous call made from the current call frame, i.e. the EVM execution that shares the same memory with the current executing opcode - not sure if there is a proper name for that somewhere.

Now your description reads as if the RETURNDATA buffer belongs to the transaction, not to the machine state.

Contributor Author

@pirapira it is kind of a different viewpoint on the same thing. As mentioned in another comment, over all call stack frames, only one return data buffer has nonzero size at any point in time. Because of that, you can also think of a single return data buffer for the whole transaction. But I think that viewpoint (one buffer for the whole transaction) might just be useful for implementations. The specification is probably easier to understand when talking about one buffer per call stack frame.

Contributor

> Because of that, you can also think of a single return data buffer for the whole transaction

It was in that mode of thinking that my question about the clearing above came about. Ok!

Member

I'm interested because I want to know if I should change the formulation in YP ethereum/yellowpaper#264 (currently a new buffer is added to the machine state; adding a transaction-wide buffer is also doable).

Member

@chriseth OK. I'll try to follow your choice in the EIP text.

## Rationale

Other solutions that would allow returning dynamic data were considered, but they all had to deduct the gas from the call opcode and thus were both complicated to implement and specify ([5/8](https://github.com/ethereum/EIPs/issues/8)). Since this proposal is very similar to the way calldata is handled, it fits nicely into the concept. Furthermore, the eWASM architecture already handles return data in exactly the same way.

Note that the EVM implementation needs to keep the return data until the next call or the return from the current call. Since this resource was already paid for as part of the memory of the callee, it should not be a problem. Implementations may either choose to keep the full memory of the callee alive until the next call or copy only the return data to a special memory area.

The numeric values of the opcodes were allocated in the same nibble block that also contains `CALLDATASIZE` and `CALLDATACOPY`.

## Backwards Compatibility

This proposal introduces two new opcodes and stays fully backwards compatible apart from that.

## Test Cases

## Implementation

## Copyright
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).