
panprog - Invalid oracle versions can cause desync of global and local positions making protocol lose funds and being unable to pay back all users #49

sherlock-admin opened this issue Aug 15, 2023 · 28 comments
Labels: Escalation Resolved · High · Reward · Sponsor Confirmed · Will Fix

Comments


sherlock-admin commented Aug 15, 2023

panprog

high

Invalid oracle versions can cause desync of global and local positions making protocol lose funds and being unable to pay back all users

Summary

When an oracle version is skipped for any reason (marked as invalid), pending positions are invalidated (reset to the previous latest position):

    function _processPositionGlobal(Context memory context, uint256 newPositionId, Position memory newPosition) private {
        Version memory version = _versions[context.latestPosition.global.timestamp].read();
        OracleVersion memory oracleVersion = _oracleVersionAtPosition(context, newPosition);
>        if (!oracleVersion.valid) newPosition.invalidate(context.latestPosition.global);
...
    function _processPositionLocal(
        Context memory context,
        address account,
        uint256 newPositionId,
        Position memory newPosition
    ) private {
        Version memory version = _versions[newPosition.timestamp].read();
>        if (!version.valid) newPosition.invalidate(context.latestPosition.local);
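
Conceptually, invalidate resets the pending position's sizes to the previous latest position while keeping its timestamp. A rough TypeScript model of that behavior (my paraphrase, not the Solidity source; fees being zeroed follows the settlement description later in this report):

// Rough model of Position.invalidate (paraphrase, not the Solidity source):
type InvalidatablePos = { timestamp: number; maker: number; long: number; short: number; fee: number };

function invalidate(newPosition: InvalidatablePos, latest: InvalidatablePos): InvalidatablePos {
  // sizes collapse to the previous latest position; fees are zeroed
  return { ...newPosition, maker: latest.maker, long: latest.long, short: latest.short, fee: 0 };
}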

This invalidation is only temporary, until the next valid oracle version. The problem is that the global position and a user's local position can be settled against different "next valid" oracle versions, temporarily desyncing global and local positions. This in turn leads to incorrect accumulation of protocol values, mostly in profit and loss, breaking internal accounting: the total collateral of all users can increase or decrease while the deposited funds remain the same. This can trigger a bank run, since the last user to withdraw will be unable to do so, or some users may get their collateral reduced when it shouldn't be (a loss of funds for them).

Vulnerability Detail

In more detail: if there are 2 pending positions whose timestamps differ by 2 oracle versions, and the first of them has an invalid oracle version at its timestamp, then 2 different position flows are possible, depending on when the position is settled (when the update transaction is called):

  1. For an earlier update the flow is: previous position (oracle v1) -> position 1 (oracle v2) -> position 2 (oracle v3)
  2. For a later update, position 1 is skipped completely (its fees are also not charged) and the flow is: previous position (oracle v1) -> invalidated position 1 (in other words: the previous position again) (oracle v2) -> position 2 (oracle v3)

While the end result (position 2) is the same, it's possible that the pending global position is updated earlier (takes the 1st path) while the local position is updated later (takes the 2nd path). For a short time (between oracle versions 2 and 3), the global position accumulates everything (including profit and loss) using pending position 1's long/short/maker values, while the local position accumulates everything using the previous position, which has different values.

Consider the following scenario:
Oracle uses granularity = 100. Initially user B opens position maker = 2 with collateral = 100.
T=99: User A opens long = 1 with collateral = 100 (pending position long=1 timestamp=100)
T=100: Oracle fails to commit this version, thus it becomes invalid
T=201: At this point the oracle version at timestamp 200 is not yet committed, but new positions are added with the next timestamp = 300:
User A closes his long position (update(0,0,0,0)) (pending positions: long=1 timestamp=100; long=0 timestamp=300)
At this point, the current global long position is still 0 (the global pending positions are the same as user A's local pending positions)

T=215: Oracle commits version with timestamp = 200, price = $100
T=220: User B settles (update(2,0,0,0), keeping the same position).
At this point the latest oracle version is the one at timestamp = 200, so this update triggers the update of global pending positions, and the current latest global position is now long = 1.0 at timestamp = 200.
T=315: Oracle commits version with timestamp = 300, price = $90
After settlement of both user A and user B, we have the following:

  1. Global position settlement. It accumulates position [maker = 2.0, long = 1.0] from timestamp = 200 (price=$100) to timestamp = 300 (price=$90). In particular:
    longPnl = 1*($90-$100) = -$10
    makerPnl = -longPnl = +$10
  2. User B local position settlement. It accumulates position [maker = 2.0] from timestamp = 200 to timestamp = 300, adding makerPnl ($10) to user B's collateral, so user B's collateral = $110.
  3. User A local position settlement. When accumulating, pending position 1 (long = 1, timestamp = 100) is invalidated to the previous position (long = 0), and its fees are set to 0 by the invalidation. So user A's local settlement accumulates position [long = 0] from timestamp = 0 to timestamp = 300 (the next pending position), which doesn't change his collateral at all (it remains $100). Then the next pending position [long = 0] becomes the latest position (the long = 1 position is ignored entirely, as if it had never existed).

Result:
User A deposited $100 and user B deposited $100 ($200 deposited in total).
After the scenario above:
User B has collateral $110 and user A has collateral $100 ($210 of collateral withdrawable in total).
However, the protocol only has the $200 deposited. This means the last user to withdraw will be unable to get the final $10, since the protocol doesn't have it, a loss of funds for that user.
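
To make the arithmetic explicit, here is a minimal TypeScript sketch replaying the scenario (illustrative only; names are hypothetical and the protocol's 6-decimal fixed-point math is omitted):

// PnL accrued by a long position of the given size over one price move.
function longPnl(longSize: number, fromPrice: number, toPrice: number): number {
  return longSize * (toPrice - fromPrice);
}

let collateralA = 100; // user A (long), deposited $100
let collateralB = 100; // user B (maker), deposited $100

// Global accumulation t=200 -> t=300 uses [maker = 2.0, long = 1.0], price $100 -> $90:
const pnlLong = longPnl(1, 100, 90);  // -$10 accrued to longs
const pnlMaker = -pnlLong;            // +$10 accrued to makers

// User B's local settlement covers the same interval as maker and earns the maker PnL:
collateralB += pnlMaker;              // $110

// User A's pending long was invalidated to long = 0, so nothing accrues to him:
collateralA += longPnl(0, 100, 90);   // still $100

console.log(collateralA + collateralB); // 210 -- yet only $200 was ever deposited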

Impact

Any time the oracle skips a version (an invalid version), global and local positions for the different users who trade during this window are likely to desync, leading to broken accounting and loss of funds for users or the protocol, potentially triggering a bank run with the last user unable to withdraw all funds.

The severity of this issue is high because, while invalid versions are normally a rare event, in the current state of the codebase there is a bug whereby pyth oracle requests are made using the block timestamp instead of the granulated future time (as positions do), which makes oracle versions invalid for almost all updates (that bug is reported separately). Due to that other bug, the situation described in this issue will arise very often by itself in the normal flow of user requests, so it's almost certain that internal accounting for any semi-active market will be broken and total user collateral will drift away from the real deposited funds, meaning user funds loss.

But even with that other bug fixed, an invalid oracle version is a normal protocol event, and even 1 such event might be enough to break internal market accounting.

Proof of concept

The scenario above is demonstrated in the test below; add it to test/unit/market/Market.test.ts:

it('panprog global-local desync', async () => {
    const positionMaker = parse6decimal('2.000')
    const positionLong = parse6decimal('1.000')
    const collateral = parse6decimal('100')

    const oracleVersion = {
        price: parse6decimal('100'),
        timestamp: TIMESTAMP,
        valid: true,
    }
    oracle.at.whenCalledWith(oracleVersion.timestamp).returns(oracleVersion)
    oracle.status.returns([oracleVersion, oracleVersion.timestamp + 100])
    oracle.request.returns()

    dsu.transferFrom.whenCalledWith(userB.address, market.address, collateral.mul(1e12)).returns(true)
    await market.connect(userB).update(userB.address, positionMaker, 0, 0, collateral, false)

    const oracleVersion2 = {
        price: parse6decimal('100'),
        timestamp: TIMESTAMP + 100,
        valid: true,
    }
    oracle.at.whenCalledWith(oracleVersion2.timestamp).returns(oracleVersion2)
    oracle.status.returns([oracleVersion2, oracleVersion2.timestamp + 100])
    oracle.request.returns()

    dsu.transferFrom.whenCalledWith(user.address, market.address, collateral.mul(1e12)).returns(true)
    await market.connect(user).update(user.address, 0, positionLong, 0, collateral, false)

    let info = await market.locals(userB.address);
    console.log("collateral deposit maker: " + info.collateral);
    info = await market.locals(user.address);
    console.log("collateral deposit long: " + info.collateral);

    // invalid oracle version
    const oracleVersion3 = {
        price: 0,
        timestamp: TIMESTAMP + 200,
        valid: false,
    }
    oracle.at.whenCalledWith(oracleVersion3.timestamp).returns(oracleVersion3)

    // next oracle version is valid
    const oracleVersion4 = {
        price: parse6decimal('100'),
        timestamp: TIMESTAMP + 300,
        valid: true,
    }
    oracle.at.whenCalledWith(oracleVersion4.timestamp).returns(oracleVersion4)

    // still returns oracleVersion2, because nothing was committed for version 3, and version 4's time has passed but it is not yet committed
    oracle.status.returns([oracleVersion2, oracleVersion4.timestamp + 100])
    oracle.request.returns()

    // reset to 0
    await market.connect(user).update(user.address, 0, 0, 0, 0, false)

    // oracleVersion4 committed
    oracle.status.returns([oracleVersion4, oracleVersion4.timestamp + 100])
    oracle.request.returns()

    // settle
    await market.connect(userB).update(userB.address, positionMaker, 0, 0, 0, false)

    const oracleVersion5 = {
        price: parse6decimal('90'),
        timestamp: TIMESTAMP + 400,
        valid: true,
    }
    oracle.at.whenCalledWith(oracleVersion5.timestamp).returns(oracleVersion5)
    oracle.status.returns([oracleVersion5, oracleVersion5.timestamp + 100])
    oracle.request.returns()

    // settle
    await market.connect(userB).update(userB.address, positionMaker, 0, 0, 0, false)
    await market.connect(user).update(user.address, 0, 0, 0, 0, false)

    info = await market.locals(userB.address);
    console.log("collateral maker: " + info.collateral);
    info = await market.locals(user.address);
    console.log("collateral long: " + info.collateral);
})

Console output for the code:

collateral deposit maker: 100000000
collateral deposit long: 100000000
collateral maker: 110000028
collateral long: 100000000

The maker ends up with slightly more than $110 because he also earns funding and interest during the short time the ephemeral long position is active (while user A doesn't pay these fees).

Code Snippet

_processPositionGlobal invalidates the position if the oracle version at its timestamp is invalid:
https://github.com/sherlock-audit/2023-07-perennial/blob/main/perennial-v2/packages/perennial/contracts/Market.sol#L390-L393

_processPositionLocal does the same:
https://github.com/sherlock-audit/2023-07-perennial/blob/main/perennial-v2/packages/perennial/contracts/Market.sol#L430-L437

_settle loops over global and local pending positions up to the latest oracle version timestamp. In this loop, each position whose timestamp has an invalid oracle version is invalidated to the previous latest position. So if _settle is called after the invalid timestamp, the previous latest position is accumulated for it:
https://github.com/sherlock-audit/2023-07-perennial/blob/main/perennial-v2/packages/perennial/contracts/Market.sol#L333-L347

Later in _settle, the latest global and local positions are advanced to the latestVersion timestamp. The difference from the loop is that, since the position's timestamp is set to a valid oracle version, _processPositionGlobal and _processPositionLocal are called with a valid oracle version here, so the position (which would otherwise be invalidated in the loop) is treated as valid and set as the latest position:
https://github.com/sherlock-audit/2023-07-perennial/blob/main/perennial-v2/packages/perennial/contracts/Market.sol#L349-L360

This means that for early settlements, invalid-version positions become valid in the sync part of _settle, while for late settlements they are skipped completely in the loop before the sync. This is the core reason for the desync between local and global positions.
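
The divergence can be modeled compactly. Below is a simplified TypeScript model of the loop + sync behavior described above (my paraphrase with hypothetical names, not the Solidity code):

type Pos = { timestamp: number; long: number };

// Simplified model of _settle: the loop invalidates positions with invalid
// oracle versions, then the sync step re-processes the last pending position
// (re-read with its ORIGINAL sizes) at the latest committed, valid timestamp.
function settle(prevLong: number, pending: Pos[], validTs: Set<number>, latestOracleTs: number): Pos {
  let latest: Pos = { timestamp: 0, long: prevLong };
  let lastProcessed: Pos | undefined;
  for (const p of pending) {
    if (p.timestamp > latestOracleTs) break; // not ready yet
    latest = validTs.has(p.timestamp) ? p : { ...p, long: latest.long }; // invalidate
    lastProcessed = p;
  }
  if (lastProcessed && latest.timestamp < latestOracleTs) {
    latest = { timestamp: latestOracleTs, long: lastProcessed.long }; // sync: valid version
  }
  return latest;
}

const validTs = new Set([200, 300]); // the version at t=100 was never committed
const pos1 = { timestamp: 100, long: 1 };
const pos2 = { timestamp: 300, long: 0 };

// Early settlement (global side at T=220, latest oracle = t200): pos1 is
// invalidated in the loop, but the sync revives its original long = 1 at t=200.
console.log(settle(0, [pos1], validTs, 200)); // { timestamp: 200, long: 1 }

// Late settlement (user A's local side at T=315+, latest oracle = t300): pos1 is
// invalidated and pos2 settles directly, so long = 1 never exists locally.
console.log(settle(0, [pos1, pos2], validTs, 300)); // { timestamp: 300, long: 0 }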

Tool used

Manual Review

Recommendation

The issue is that positions with invalid oracle versions are ignored until the first valid oracle version; however, the first valid version can be different for global and local settlement. One solution I see is to introduce a map of position timestamp -> oracle version to settle with, filled in by global position processing. Local position processing would then follow the same path as global processing using this map, which should eliminate the possibility of different paths for global and local positions.

It might seem that the issue can only happen with exactly 1 oracle version between the invalid and valid positions. However, it's also possible that some non-requested oracle versions are committed (at arbitrary timestamps between normal oracle versions), so the global position can take a route like t100[pos0] -> t125[pos1] -> t144[pos1] -> t200[pos2] while the local one takes t100[pos0] -> t200[pos2], or it can go straight to t300 instead of t200, etc. The exact route can be anything, and the local settlement has to follow it; that's why I suggest a path map (sketched below).
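
As a rough illustration of the suggested fix (hypothetical structure, not a drop-in patch), the map could look like this in TypeScript terms:

// Hypothetical sketch: global settlement records which oracle version each
// position timestamp actually settled against; local settlement replays it.
const settlementRoute = new Map<number, number>(); // position timestamp -> oracle timestamp used

function recordGlobalStep(positionTimestamp: number, oracleTimestampUsed: number): void {
  settlementRoute.set(positionTimestamp, oracleTimestampUsed);
}

function oracleTimestampForLocal(positionTimestamp: number): number | undefined {
  // Local processing follows the exact route global took, including any
  // unrequested versions committed at arbitrary timestamps in between.
  return settlementRoute.get(positionTimestamp);
}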

There might be some other solutions possible.

@github-actions bot added the Medium label Aug 18, 2023
@sherlock-admin

1 comment(s) were left on this issue during the judging contest.

141345 commented:

m

@arjun-io added the Sponsor Confirmed and Will Fix labels Aug 22, 2023
@sherlock-admin changed the title from "Wonderful Silver Loris - Invalid oracle versions can cause desync of global and local positions making protocol lose funds and being unable to pay back all users" to "panprog - Invalid oracle versions can cause desync of global and local positions making protocol lose funds and being unable to pay back all users" Aug 23, 2023
@sherlock-admin added the Reward label Aug 23, 2023

panprog commented Aug 25, 2023

Escalate

This should be high, because:

  1. When the situation described in the issue happens, it causes a serious internal accounting problem, causing loss of funds for users and possibly a bank run, with loss of funds for the last users.
  2. The condition for this to happen is invalid oracle version, which in the current state of the code will happen regularly (see panprog - Oracle request timestamp and pending position timestamp mismatch can make most position updates invalid #42)
  3. For the situation described to happen, it's enough for 1 user to request opening a position, the oracle version at that position's timestamp to be invalid, the user to then request any modification of this position (increase or decrease), and then another user to do any action after the next oracle version is committed. That's it, internal accounting is broken.
  4. The stated flow of events can happen by itself very regularly in any semi-active market, breaking the accounting worse and worse.
  5. OR a malicious user can abuse this scenario to profit from it, or simply to break the protocol. Doing this is mostly free (except for some keeper fees). In that case the protocol will be broken very quickly.

I don't know why it was judged medium, but this issue is very likely to happen and will cause a lot of damage to the market, so it should be high.

@sherlock-admin2


You've created a valid escalation!

To remove the escalation from consideration: Delete your comment.

You may delete or edit your escalation comment anytime before the 48-hour escalation window closes. After that, the escalation becomes final.

@sherlock-admin added the Escalated label Aug 25, 2023

Emedudu commented Aug 26, 2023

Escalate

When an oracle version is skipped for any reason (marked as invalid), pending positions are invalidated (reset to the previous latest position):

This is not HIGH because there is a limitation: "When an oracle version is skipped for any reason".
This is a VERY unlikely event.
So by Sherlock rules, it is a MEDIUM:
"Medium: There is a viable scenario (even if unlikely) that could cause the protocol to enter a state where a material amount of funds can be lost."


panprog commented Aug 26, 2023

This is not HIGH because there is a limitation: "When an oracle version is skipped for any reason".
This is a VERY unlikely event.

While it is supposed to be a rare event, in the current state of the code this is a VERY LIKELY event, see #42.
It should be judged based on the current code, not on assumptions of how it will work in the future.


Emedudu commented Aug 26, 2023

Skipping of oracle versions is an unlikely event.
This was stated under the impact section of the issue:
"The severity of this issue is high, because while invalid versions are normally a rare event, however in the current state of the codebase there is a bug that pyth oracle requests are done using this block timestamp instead of granulated future time (as positions do), which leads to invalid oracle versions almost for all updates (that bug is reported separately)."

This makes the issue fall under MEDIUM severity according to Sherlock's classification rules.

@Minh-Trng

It should be judged based on the current code, not on assumptions of how it will work in the future.

it should be treated like different submissions describing different impacts with the same root cause:
does fixing that one root cause mitigate all described impacts? then all of them are considered duplicates.

now, clearly your submission is not a duplicate, but it builds on that same root cause. if this root cause were fixed, your impact would still hold, but with a much lower likelihood


panprog commented Aug 26, 2023

now, clearly your submission is not a duplicate, but it builds on that same root cause. if this root cause were fixed, your impact would still hold, but with a much lower likelihood

Yes, I chain 2 issues to demonstrate high impact. In this case both issues should be high. We can't start predicting the future ("what happens if that one is fixed..."). The way it is now, the existence of either issue creates a high impact for the protocol, and each issue is a separate one.
I disagree that the issue should be "isolated" and impact considered as if the other issues are fixed.
Sherlock has the following judging rule:

Future issues: Issues that result out of a future integration/implementation that was not intended (mentioned in the docs/README) or because of a future change in the code (as a fix to another issue) are not valid issues.

While it doesn't provide the same clear rule for the opposite case, I believe it's a logical continuation of that rule that impact shouldn't be decreased because of a future change in the code (as a fix to another issue).

@sherlock-admin2


You've created a valid escalation!

To remove the escalation from consideration: Delete your comment.

You may delete or edit your escalation comment anytime before the 48-hour escalation window closes. After that, the escalation becomes final.

@Minh-Trng

The judging rule explicitly talks about unintended future implementation. So your logical continuation would also only be applicable to unintended future implementations (which I would absolutely agree with).

Your submission however even shows that you were aware of the correct intended future implementations and the change in likelihood that it would bring.


panprog commented Aug 29, 2023

The judging rule explicitly talks about unintended future implementation.

It says OR: so either an unintended future implementation OR a future change in the code as a fix to another issue.

I still think this is high because

  1. Even by itself, it messes up the accounting and causes loss of funds whenever an oracle version is invalid. Invalid oracle versions are normal protocol operation, even if rare. This can be compared to liquidations: they're also rare but normal protocol operation, so issues in liquidations are considered high. Similarly, this issue, triggered by an invalid oracle version, should also be high.
  2. In the current implementation, oracle versions are invalid very frequently due to another bug, so this should be high either way. I still think disregarding the other bugs when considering impact is incorrect. And the fact that I'm aware of the intended future implementation is irrelevant: the way it is right now, the issue in this report happens very frequently by itself.


141345 commented Aug 29, 2023

Medium severity seems more appropriate.

Because:

  • the scenario where an "oracle version is skipped" is not a common one.
  • the loss is not significant.

Based on Sherlock's H/M criteria:

Medium: There is a viable scenario (even if unlikely) that could cause the protocol to enter a state where a material amount of funds can be lost. The attack path is possible with assumptions that either mimic on-chain conditions or reflect conditions that have a reasonable chance of becoming true in the future.


panprog commented Aug 29, 2023

  • the scenario where an "oracle version is skipped" is not a common one.

In the current state of the code it is a very likely scenario (the expected behavior is for this to be uncommon, but currently it is common)

  • the loss is not significant.

This depends. If a malicious user wants to cause damage (or comes up with a profitable scenario), the loss can be very significant. In the current state of the code, since it will happen often by itself, each instance will not be very significant (0.01-0.2% of the protocol funds depending on price volatility), but it will add up to large amounts over time.


141345 commented Aug 30, 2023

Although #42 points out the possibility of skipped versions, they do not look like common events.

0.01-0.2% each time won't be considered a material loss. Some normal operations can have even more profit than that, such as spot-futures arbitrage or cross-exchange arbitrage.

And the attacker needs to control the conditions to perform the action. As such, the loss amount and requirements fall into Medium.


panprog commented Aug 31, 2023

I added a detailed response previously but don't see it now; maybe it got deleted or wasn't sent properly. Here it is again.

Although #42 points out the possibility of skipped versions, they do not look like common events.

No, it's very easy to get skipped versions in the current implementation. For example:
t=24: user opens position (request timestamp = 25, position timestamp = 100)
t=48: user closes position (request timestamp = 48, position timestamp = 100)
t=96: user opens position (request timestamp = 96, position timestamp = 100)
t=108: user closes position (request timestamp = 108, position timestamp = 200)
...
If oracle versions are committed as requested (in the normal operation of the protocol), the committed timestamps will be: 25, 48, 96, 108. Timestamp = 100 will be missing, and all positions at this timestamp will have an invalid oracle version.
In order to have a valid oracle version for a position, a request to open or close a position must be made at exactly the timestamp of the previous positions (t=100 in the example). However, often there won't even be a block with such a timestamp. For example, Ethereum blocks currently happen every 12 seconds and can have odd timestamps, and on other networks the time between blocks is variable, so the probability of a block timestamp being divisible by the granularity is low.
Even if granularity and block timestamps align well, it still requires that a request is made at exactly the granularity timestamp; for example, if granularity = 120 and the time between blocks is 12, then every 10th block must open a position in order to request at the right timestamp.
It's still possible to commit unrequested versions, but first, this is not incentivized (there is no reward for the keeper who commits unrequested), and second, there is still a time window in which the commit has to happen. In the example above, an unrequested commit for timestamp = 100 can only be done after timestamp = 96 is committed but before timestamp = 108 is committed, so it's easy to miss this window.
So in summary, the way it is now, it's easier to have an invalid oracle version than a valid one.
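
As a sanity check on the arithmetic, a small TypeScript sketch of the example above (illustrative only; the rounding helper is my assumption of how position timestamps relate to granularity):

// Requests happen at block time t, but positions round t UP to the next
// granularity boundary; versions are committed only at the requested times.
const GRANULARITY = 100;
const positionTimestamp = (t: number) => Math.ceil(t / GRANULARITY) * GRANULARITY;

const committed = new Set([25, 48, 96, 108]); // committed (requested) version timestamps
for (const t of [24, 48, 96, 108]) {
  const needed = positionTimestamp(t); // 100, 100, 100, 200
  console.log(`position at t=${needed}: valid version committed? ${committed.has(needed)}`); // false every time
}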

0.01-0.2% each time won't be considered a material loss. Some normal operations can have even more profit than that, such as spot-futures arbitrage or cross-exchange arbitrage.

I argue that this is actually a material loss: it's a percentage of the protocol's funds. So if $100M is deposited into the protocol, the loss can be around $100K per instance. And since it can happen almost every granularity period, this adds up very quickly.

And the attacker needs to control the conditions to perform the action. As such, the loss amount and requirements fall into Medium.

Even if there is no attacker, the loss will be smaller but continuous in time, so even if it's, say, $10K per instance (with $100M deposited), it will add up to tens of millions over less than a day.


panprog commented Sep 2, 2023

Regarding the impact, I want to point out that #62 is high and has a similar impact (messed-up internal accounting); however, the real damage from it is:

The global account's assets and shares should be calculated with toAssetsGlobal and toSharesGlobal respectively, but now, they are calculated with toAssetsLocal and toSharesLocal.
toAssetsGlobal subtracts the globalKeeperFees from the global deposited assets, while toAssetsLocal subtracts globalKeeperFees/Checkpoint.count fees from the local account's assets.

So the real damage from #62 is a reduction of deposited assets by (keeper fees / count) instead of (keeper fees). Since keeper fees are low (comparable to gas fees), the real damage is small relative to the funds deposited (likely less than 0.001%).

However, the issue in this report also causes messed-up internal accounting, but the real damage depends on position size and price volatility and will on average be much higher than in #62, even when it happens by itself. If caused by malicious parties, it can be a very large amount.

Even though the damage from #62 is much easier to inflict, I believe that due to the higher damage per instance, the overall damage over time from this issue will be higher than from #62. Something like:
$5 per granularity period from #62 (and it still has to be executed by an attacker, so the attack has a gas cost of a similar amount)
$10K-$100K per invalid oracle version from this one; currently that's maybe once per 10-100 granularities by itself in the current state, so at least $100-$1000 per granularity period (and it happens by itself).

So based on possible real overall damage caused, this issue should be high.


Emedudu commented Sep 4, 2023

This should be MEDIUM because skipped oracle versions are not a common event.

In the current state of the code it is a very likely scenario (the expected behavior is for this to be uncommon, but currently it is common)

Skipped oracle versions are unlikely, and the reason they are a likely scenario in the current code is a bug which has already been reported in #42.
Fixing issue 42 will make this scenario unlikely again.
So this report is a combination of issue 42 (which is already of high severity) and another bug, which is very unlikely by itself.

Since different attack scenarios with the same fix are considered duplicates, this issue should be a MEDIUM, because once issue 42 (which makes this bug a likely scenario) is fixed, this issue becomes unlikely.


141345 commented Sep 6, 2023

1st, the scenario is conditional, not the kind that happens on a daily basis.
2nd, a loss magnitude like 0.01-0.2% is common; different exchanges can have that magnitude of price difference (a common arbitrage opportunity, and futures contracts of different terms can have even larger arbitrage than this).

As such, the conditional loss and capped loss amount suggest medium severity.


panprog commented Sep 6, 2023

1st, the scenario is conditional, not the kind that happens on a daily basis.

I don't think high severity means unconditional. My understanding is that high severity means a high impact that can happen with rather high probability. And high probability doesn't mean it can happen in every transaction; it just means it can reasonably happen within a week or a month.
Example (from another contest): liquidation can be frontrun to block it without spending funds. That was judged high, even though liquidations are not very frequent (days can pass without a single liquidation).

And this issue's probability of happening is high.

2nd, a loss magnitude like 0.01-0.2% is common; different exchanges can have that magnitude of price difference (a common arbitrage opportunity, and futures contracts of different terms can have even larger arbitrage than this).

Why do you only consider it a one-time event? The way it is now, that'll be like 0.1% per 1-10 minutes, which adds up to 10%+ in less than a day.

As such, the conditional loss and capped loss amount suggest medium severity.

I think no new arguments are presented here, so I leave it up to Sherlock to decide. I think ultimately it comes down to:

  • this issue is high as is now due to the other bug.
  • if the other bug is fixed (and oracle versions work the way they're supposed to work), this one will be medium.

So my argument is that, the way it is now, it's high, and the severity shouldn't be downgraded as if the other bug were fixed.


141345 commented Sep 6, 2023

I don't think it's one time; it's something intermittent. It happens every once in a while, but the frequency might not be as high as once per 1-10 minutes.

And even when it happens, there could be a loss, but sometimes no loss.

That's why, based on the probability of happening and the size of the loss, "conditional and capped loss" is the more suitable assessment.


panprog commented Sep 7, 2023

I still think this is High. The way it is now, it can happen as often as once per granularity period if a malicious user abuses it, or maybe once per 10 granularities by itself with semi-active trading. Either way it's very possible and causes loss of funds.
As I said earlier, I see this as medium only if #42 is disregarded, but I don't see any grounds to ignore that bug when assessing the severity of this one.


141345 commented Sep 7, 2023

What about the cross-exchange price difference? The magnitude could be at the same level. Also, spot and perpetual contracts can deviate; those cases are not considered the exchange's loss.


panprog commented Sep 7, 2023

What about the cross-exchange price difference? The magnitude could be at the same level. Also, spot and perpetual contracts can deviate; those cases are not considered the exchange's loss.

It's different: those are normal protocol operation, with funds just changing hands. This issue leads to a mismatch between the funds the protocol has and the funds the protocol owes (the protocol has 100, but the total collateral users can withdraw can be 200, so not all users can withdraw, which can trigger a bank run with the last users unable to withdraw).


kbrizzle commented Sep 8, 2023

To chime in here from our perspective -- this issue identified a pretty fundamental flaw in our accounting system that would have caused the markets to be completely out of sync (both w.r.t. positions as well as balances) in the event of an unfortunately timed invalid version.

While ideally rare, occasional invalid versions are expected behavior. Invalid versions can occur for a number of reasons during the course of normal operation:

  • Underlying oracle does not have data available for any timestamp in the validity window (occurs in Pyth from time to time)
  • No keeper responds within grace period window
  • Future oracle integration / upgrade adds anomaly detection to invalidate versions via a number of metrics

Given that this bug would have likely surfaced in normal operation plus its noted effect, our opinion is that this should be marked as High.

Fixed in: equilibria-xyz/perennial-v2#82 and equilibria-xyz/perennial-v2#94.

@hrishibhat added the High label and removed the Medium label Sep 8, 2023
@hrishibhat

Result:
High
Unique
After considering all the comments above and discussing this further:
In addition to the sponsor's comments above, the submission highlights the issue in the context of the current state of the codebase, and clearly justifies the severity by mentioning the other underlying issue in the impact section, which is again pointed out in the escalation. The future-implementation rule does not apply here. Also, agree with the points raised by @panprog in the subsequent comments.
Considering this a valid high.

@sherlock-admin2 removed the Escalated label Sep 8, 2023
@sherlock-admin added the Escalation Resolved label Sep 8, 2023
@sherlock-admin2

Escalations have been resolved successfully!

Escalation status:

@jacksanford1

From WatchPug:

https://github.com/equilibria-xyz/perennial-v2/blob/602218ca8cb62db649bf4b6a6722824e9ab20166/packages/perennial/contracts/Market.sol#L563-L595


L590 will revert due to underflow in the following case:

context.latestPosition.local.magnitude() is equal to 0
pendingLocalPositions[0].magnitude() is equal to 10
pendingLocalPositions[1].magnitude() is equal to 5
The initial value of previousMagnitude is 0.

After the first run (i: 0):

closableAmount is equal to 0
previousMagnitude is equal to 10
At the second run (i: 1), at L590, 0 - 10 will revert.

@arjun-io


Under the new delta invalidation system, the aggregate amount of your pending closes must be less than your currently settled (latest) position, as specified here. This underflow revert is actually intentional, to enforce this invariant. See this test to cross-reference the expected behavior.
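
To illustrate that invariant, here is a minimal TypeScript reconstruction of the closable computation as described above (my best-guess logic with hypothetical names, not the actual Market.sol code):

// Unsigned (UFixed6-style) subtraction: reverts on underflow.
function sub(a: number, b: number): number {
  if (b > a) throw new Error('underflow');
  return a - b;
}

// Reconstruction of the described closable check: aggregate pending closes
// must fit within the currently settled (latest) position magnitude.
function closable(latestMagnitude: number, pendingMagnitudes: number[]): number {
  let closableAmount = latestMagnitude;
  let previousMagnitude = latestMagnitude;
  for (const m of pendingMagnitudes) {
    if (m < previousMagnitude) {
      closableAmount = sub(closableAmount, previousMagnitude - m); // a pending close
    }
    previousMagnitude = m;
  }
  return closableAmount;
}

closable(0, [10, 5]); // second iteration tries to close 5 with 0 settled -> throws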
