Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Under load, brands do not match #5571

Closed
arirubinstein opened this issue Jun 10, 2022 · 13 comments
Closed

Under load, brands do not match #5571

arirubinstein opened this issue Jun 10, 2022 · 13 comments
Assignees
Labels
AMM bug Something isn't working Inter-protocol Overarching Inter Protocol
Milestone

Comments

@arirubinstein
Copy link
Contributor

Describe the bug

2022-06-10T18:38:43.026Z SwingSet: ls: v28: RangeError#266: Brands in left Object [Alleged: BLDPoolLiquidity brand] {} and right Object [Alleged: BLDPoolLiquidity brand] {} should match but do not
2022-06-10T18:38:43.027Z SwingSet: ls: v28: RangeError: Brands in left (an object) and right (an object) should match but do not
 at makeError (/usr/src/agoric-sdk/packages/SwingSet/src/supervisors/subprocess-xsnap/lockdown-subprocess-xsnap.js:2705)
 at fail (/usr/src/agoric-sdk/packages/SwingSet/src/supervisors/subprocess-xsnap/lockdown-subprocess-xsnap.js:2832)
 at baseAssert (/usr/src/agoric-sdk/packages/SwingSet/src/supervisors/subprocess-xsnap/lockdown-subprocess-xsnap.js:2850)
 at equal (/usr/src/agoric-sdk/packages/SwingSet/src/supervisors/subprocess-xsnap/lockdown-subprocess-xsnap.js:2861)
 at checkLRAndGetHelpers (.../ertp/src/amountMath.js:164)
 at isGTE (.../ertp/src/amountMath.js:303)
 at handleAddPoolOffer (.../run-protocol/src/vpool-xyk-amm/addPool.js:145)
 at handleAddPoolOffer ()
 at apply ()
 at apply ()
 at dispatchToHandler (/usr/src/agoric-sdk/packages/SwingSet/src/supervisors/subprocess-xsnap/lockdown-subprocess-xsnap.js:8729)
 at (/usr/src/agoric-sdk/packages/SwingSet/src/supervisors/subprocess-xsnap/lockdown-subprocess-xsnap.js:8362)
 at win (/usr/src/agoric-sdk/packages/SwingSet/src/supervisors/subprocess-xsnap/lockdown-subprocess-xsnap.js:9072)
 at ()

slog at:
https://drive.google.com/file/d/1FIg_y0Tp1PsE438LtKBxhp_ki1yqfjEF/view?usp=sharing

@arirubinstein arirubinstein added bug Something isn't working SwingSet package: SwingSet labels Jun 10, 2022
@warner
Copy link
Member

warner commented Jun 11, 2022

The cursed delivery was a handleOffer() on deliveryNum 679, and the marshalled arguments were:

[
  {
    "@qclass": "slot",
    "iface": "Alleged: InvitationHandle",
    "index": 0
  },
  {
    "@qclass": "slot",
    "iface": "Alleged: zoeSeatAdmin",
    "index": 1
  },
  {
    "initialAllocation": {
      "Central": {
        "brand": {
          "@qclass": "slot",
          "iface": "Alleged: RUN brand",
          "index": 2
        },
        "value": {
          "@qclass": "bigint",
          "digits": "1500000000"
        }
      },
      "Liquidity": {
        "brand": {
          "@qclass": "slot",
          "iface": "Alleged: BLDPoolLiquidity brand",
          "index": 3
        },
        "value": {
          "@qclass": "bigint",
          "digits": "0"
        }
      },
      "Secondary": {
        "brand": {
          "@qclass": "slot",
          "iface": "Alleged: BLD brand",
          "index": 4
        },
        "value": {
          "@qclass": "bigint",
          "digits": "300000000"
        }
      }
    },
    "notifier": {
      "@qclass": "slot",
      "iface": "Alleged: notifier",
      "index": 5
    },
    "offerArgs": {
      "@qclass": "undefined"
    },
    "proposal": {
      "exit": {
        "onDemand": null
      },
      "give": {
        "Central": {
          "brand": {
            "@qclass": "slot",
            "index": 2
          },
          "value": {
            "@qclass": "bigint",
            "digits": "1500000000"
          }
        },
        "Secondary": {
          "brand": {
            "@qclass": "slot",
            "index": 4
          },
          "value": {
            "@qclass": "bigint",
            "digits": "300000000"
          }
        }
      },
      "want": {
        "Liquidity": {
          "brand": {
            "@qclass": "slot",
            "index": 3
          },
          "value": {
            "@qclass": "bigint",
            "digits": "1000"
          }
        }
      }
    },
    "seatHandle": {
      "@qclass": "slot",
      "iface": "Alleged: SeatHandleHandle",
      "index": 6
    }
  }
]

The slots were ["o+56","o-149","o-54","o-124","o-119","o-150","o-151"], making the arguments more like:

[
  InvitationHandle(o+56),
  ZoeSeatAdmin(o-149),
  {
    initialAllocation: {
      Central: { value: 1500000000n, brand: RUNBrand(o-54) },
      Liquidity: { value: 0n, brand: BLDPoolLiquidityBrand(o-124) },
      Secondary: { value: 300000000n, brand: BLDBrand(o-119) },
    },
    "notifier": Notifier(o-150),
    "offerArgs": undefined,
    "proposal": {
      "exit": { "onDemand": null },
      "give": {
        Central: { value: 1500000000n, brand: RUNBrand(o-54) },
        Secondary: { value: 300000000n, brand: BLDBrand(o-119) },
      },
      "want": {
        Liquidity: { value: 1000n, brand: BLDPoolLiquidityBrand(o-124) },
      }
    },
    seatHandle: SeatHandleHandle(o-151),
  }
]

Now, what's interesting is that the error message ("Brands in left Object [Alleged: BLDPoolLiquidity brand] {} and right Object [Alleged: BLDPoolLiquidity brand] {} should match but do not") is showing the same BLDPoolLiquidity brand name for each side.

I'm looking at the deliveries to that vat, and the only time BLDPoolLiquidity brand appears is in the arguments to handleOffer deliveries. I wrote a little tool to examine those deliveries, and only three iface-bearing imports appear:

  • Alleged: BLD brand: o-119
  • Alleged: BLDPoolLiquidity brand: o-124
  • Alleged: RUN brand: o-54

The only vref that arrives with BLDPoolLiquidity brand as its iface is o-124. So it doesn't seem like something from outside the vat is sending two distinct objects that just happen to have the same iface. That points to a bug in liveslots, maybe somehow it's allowing two distinct Presences for the same vref.

I think the next step is to talk with someone who knows the contract well, and figure out which deliveries are providing the two sides of that comparison.

Also to try to reproduce it in a single vat, from just the slogfile trace.

@warner
Copy link
Member

warner commented Jun 11, 2022

For reference, Ari said this run was using 9d670a .

@arirubinstein
Copy link
Contributor Author

@warner
Copy link
Member

warner commented Jun 12, 2022

All three validators match on that delivery, so at least it's not a nondeterminism in the JS engine (the delivery->results function).

@warner
Copy link
Member

warner commented Jun 14, 2022

In the 679 vpool-xyk-amm deliveries leading up to the "brands do not match" error, I see 78 errors emitted in one of three categories:

  • 37 cases of keyword "BLDPoolLiquidity" must be unique
  • 40 cases of "secondaryBrand" not found [Alleged: BLD brand]
  • 1 case of "secondaryBrand" already registered Object [Alleged: BLD brand]

I think this is the army of clients all trying to be the first to register something unique, which is at least the right conditions for a race bug.

Interestingly, the "already registered" error appears on delivery 303, in the middle of the other two types of errors. It seems surprising that it would continue to complain that BLD isn't found even after it complained that BLD was already registered, but I don't really know what the registration is about.

@warner
Copy link
Member

warner commented Jun 14, 2022

@Chris-Hibbert and I narrowed this down to a race condition in the AMM's addPool.js code, specifically somewhere in

export const makeAddIssuer = (
zcf,
isInSecondaries,
brandToLiquidityMint,
getAddIssuerToReserve,
) => {
/**
* @param {Issuer} secondaryIssuer
* @param {string} keyword
*/
return async (secondaryIssuer, keyword) => {
const [secondaryAssetKind, secondaryBrand] = await Promise.all([
E(secondaryIssuer).getAssetKind(),
E(secondaryIssuer).getBrand(),
]);
assert(
!isInSecondaries(secondaryBrand),
X`issuer ${secondaryIssuer} already has a pool`,
);
assert(
secondaryAssetKind === AssetKind.NAT,
X`${keyword} asset not fungible (must use NAT math)`,
);
const liquidityKeyword = `${keyword}Liquidity`;
zcf.assertUniqueKeyword(liquidityKeyword);
const mint = await zcf.makeZCFMint(
liquidityKeyword,
AssetKind.NAT,
harden({ decimalPlaces: 6 }),
);
await zcf.saveIssuer(secondaryIssuer, keyword);
// this ensures that getSecondaryIssuer(thisIssuer) will return even before the pool is created
brandToLiquidityMint.init(secondaryBrand, mint);
const { issuer: liquidityIssuer } = mint.getIssuerRecord();
// defer lookup until necessary. more aligned with governed param we expect this to be eventually.
const addIssuerToReserve = getAddIssuerToReserve();
// tell the reserve about this brand, which it will validate by calling back
// to AMM for the issuer
await addIssuerToReserve(secondaryBrand);
return liquidityIssuer;
};

Chris will write up the details properly, but my (weak) understanding is:

  • @arirubinstein 's load generator has dozens of clients fighting to create the RUN/BLD AMM
  • they're all using the same "one true BLD" Issuer, so there can only be one true RUN/BLD pool
  • this addIssuer() function wants to pick a winner, then create and return a new BLDLiquidity issuer for them
  • a race in this function causes multiple contestants (perhaps only the first two) to be told that they've won, each with a distinct liquidity token issuer, however only one of those issuers is stored for later use
  • the (really) losing client then comes back with an offer that references the "false" liquidity token issuer (the one they were told about), and it does not match the issuer that was stored under the BLD issuer, hence the mismatch

@dtribble
Copy link
Member

dtribble commented Jun 14, 2022

All the assertUniqueKeyword are merely advisory because they are invalidated at each await. The test must be performed synchronously with the update. That means it must be in the same turn as zoeStorageManager.js#L133:

instanceRecordManager.addIssuerToInstanceRecord(keyword, issuerRecord);

which will always succeed, replacing any existing assignment to that keyword. Anything else is a TOCTOU confusion.

@Chris-Hibbert
Copy link
Contributor

Chris-Hibbert commented Jun 14, 2022

I think Zoe is safe and consistent.

the unsafe addIssuerToInstanceRecord is called from three places

  • zoeStorageManager:saveIssuer
    • provided to zcf, which makes it available to contracts immediately after a call to assertUniqueKeyword()
  • zoeStorageManager:wrapIssuerKitWithZoeMint
    • bottoms out in makeZcfMint(), which calls assertUniqueKeyword() immediately before.
  • zcfZygote:recordIssuer
    • called in doMakeZcfMint, which is called from registerFeeMint and makeZcfMint, both of which call assertUniqueKeyword() before, and without an intervening await.

That still leaves an issue in the AMM's addIssuer, which safely creates a unique liquidity token per pool, but unsafely hands out incorrect doppelgangers to anyone who asks before the canonical secondary pool is finished being created.

@dtribble
Copy link
Member

zoeStorageManager:saveIssuer

  • provided to zcf, which makes it available to contracts immediately after a call to assertUniqueKeyword()

It's not "immediately" because there is an intervening await in zoeStorageManager:saveIssuer:

    const saveIssuer = async (issuerP, keyword) => {
      const issuerRecord = await issuerStorage.storeIssuer(issuerP);
      await escrowStorage.createPurse(issuerRecord.issuer, issuerRecord.brand);
      instanceRecordManager.addIssuerToInstanceRecord(keyword, issuerRecord);
      return issuerRecord;
    };

So multiple issuers could be successfully stored as if they were the issuer for the keyword, but only the last one would win and be legit. That seems pretty unquestionably a race bug.

@Chris-Hibbert
Copy link
Contributor

Proposed solution:

saveIssuer() could call instanceRecordManager.reserveKeyword(keyword). That would add keyword to an internal list, and return a promise or a handle. The second attempt would be rebuffed. The first attempt would either redeem the handle or on an error, it would cancel the reservation.

instanceRecordStorage and zoeStorageManager can hold join responsibility for this, since they're both internal to Zoe.

Alternatively, we can disclaim the problem and just say "contracts should make sure to not register multiple issuers with the same keyword".

Any other approaches we should consider?

@Chris-Hibbert
Copy link
Contributor

@erights please respond to the above proposal.

@erights
Copy link
Member

erights commented Jun 16, 2022

I am reluctant to add more internal state, especially state that would need to be durable. Aside from that I'm still chewing on this. We should talk.

Chris-Hibbert added a commit that referenced this issue Jun 22, 2022
refs: #5571

There are two bugs in #5571, and this fixes only one of them.
Chris-Hibbert added a commit that referenced this issue Jun 28, 2022
refs: #5571

There are two bugs in #5571, and this fixes only one of them.
Chris-Hibbert added a commit that referenced this issue Jun 29, 2022
refs: #5571

There are two bugs in #5571, and this fixes only one of them.
Chris-Hibbert added a commit that referenced this issue Jun 29, 2022
refs: #5571

There are two bugs in #5571, and this fixes only one of them.
mergify bot pushed a commit that referenced this issue Jun 30, 2022
* fix: handle parallel requests for the same issuer in AMM's addIssuer

refs: #5571

There are two bugs in #5571, and this fixes only one of them.

* chore: clarify a comment

* refactor: revert to multiple await style; handle race at .init()

Added a test for multiple attempts to add the same issuer with the
same keyword.

* refactor: rearrange error handling

drop try-catch for brand-to-promise

add catch for saving issuer and notifying Reserve

* chore: delete from brandToLiquidityMint on failure

name cleanup

* chore: cleanups: types, variable name, logging

* chore: move DISPLAY_INFO to top level scope

* chore: delete entries from brandToLiquidityIssuer when done

This is probably what was intended instead of the
code fixed in #5611

* chore: convert a .then to E.when().

* refactor: return addIssuer as named (rather than anonymous) function
@Chris-Hibbert
Copy link
Contributor

#5692 is directed at the remaining issues.

@Tartuffo Tartuffo added this to the Mainnet 1 milestone Jul 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AMM bug Something isn't working Inter-protocol Overarching Inter Protocol
Projects
None yet
Development

No branches or pull requests

6 participants