Under load, brands do not match #5571

arirubinstein · 2022-06-10T19:59:14Z

Describe the bug

2022-06-10T18:38:43.026Z SwingSet: ls: v28: RangeError#266: Brands in left Object [Alleged: BLDPoolLiquidity brand] {} and right Object [Alleged: BLDPoolLiquidity brand] {} should match but do not
2022-06-10T18:38:43.027Z SwingSet: ls: v28: RangeError: Brands in left (an object) and right (an object) should match but do not
 at makeError (/usr/src/agoric-sdk/packages/SwingSet/src/supervisors/subprocess-xsnap/lockdown-subprocess-xsnap.js:2705)
 at fail (/usr/src/agoric-sdk/packages/SwingSet/src/supervisors/subprocess-xsnap/lockdown-subprocess-xsnap.js:2832)
 at baseAssert (/usr/src/agoric-sdk/packages/SwingSet/src/supervisors/subprocess-xsnap/lockdown-subprocess-xsnap.js:2850)
 at equal (/usr/src/agoric-sdk/packages/SwingSet/src/supervisors/subprocess-xsnap/lockdown-subprocess-xsnap.js:2861)
 at checkLRAndGetHelpers (.../ertp/src/amountMath.js:164)
 at isGTE (.../ertp/src/amountMath.js:303)
 at handleAddPoolOffer (.../run-protocol/src/vpool-xyk-amm/addPool.js:145)
 at handleAddPoolOffer ()
 at apply ()
 at apply ()
 at dispatchToHandler (/usr/src/agoric-sdk/packages/SwingSet/src/supervisors/subprocess-xsnap/lockdown-subprocess-xsnap.js:8729)
 at (/usr/src/agoric-sdk/packages/SwingSet/src/supervisors/subprocess-xsnap/lockdown-subprocess-xsnap.js:8362)
 at win (/usr/src/agoric-sdk/packages/SwingSet/src/supervisors/subprocess-xsnap/lockdown-subprocess-xsnap.js:9072)
 at ()

slog at:
https://drive.google.com/file/d/1FIg_y0Tp1PsE438LtKBxhp_ki1yqfjEF/view?usp=sharing

The text was updated successfully, but these errors were encountered:

warner · 2022-06-11T01:40:36Z

The cursed delivery was a handleOffer() on deliveryNum 679, and the marshalled arguments were:

[
  {
    "@qclass": "slot",
    "iface": "Alleged: InvitationHandle",
    "index": 0
  },
  {
    "@qclass": "slot",
    "iface": "Alleged: zoeSeatAdmin",
    "index": 1
  },
  {
    "initialAllocation": {
      "Central": {
        "brand": {
          "@qclass": "slot",
          "iface": "Alleged: RUN brand",
          "index": 2
        },
        "value": {
          "@qclass": "bigint",
          "digits": "1500000000"
        }
      },
      "Liquidity": {
        "brand": {
          "@qclass": "slot",
          "iface": "Alleged: BLDPoolLiquidity brand",
          "index": 3
        },
        "value": {
          "@qclass": "bigint",
          "digits": "0"
        }
      },
      "Secondary": {
        "brand": {
          "@qclass": "slot",
          "iface": "Alleged: BLD brand",
          "index": 4
        },
        "value": {
          "@qclass": "bigint",
          "digits": "300000000"
        }
      }
    },
    "notifier": {
      "@qclass": "slot",
      "iface": "Alleged: notifier",
      "index": 5
    },
    "offerArgs": {
      "@qclass": "undefined"
    },
    "proposal": {
      "exit": {
        "onDemand": null
      },
      "give": {
        "Central": {
          "brand": {
            "@qclass": "slot",
            "index": 2
          },
          "value": {
            "@qclass": "bigint",
            "digits": "1500000000"
          }
        },
        "Secondary": {
          "brand": {
            "@qclass": "slot",
            "index": 4
          },
          "value": {
            "@qclass": "bigint",
            "digits": "300000000"
          }
        }
      },
      "want": {
        "Liquidity": {
          "brand": {
            "@qclass": "slot",
            "index": 3
          },
          "value": {
            "@qclass": "bigint",
            "digits": "1000"
          }
        }
      }
    },
    "seatHandle": {
      "@qclass": "slot",
      "iface": "Alleged: SeatHandleHandle",
      "index": 6
    }
  }
]

The slots were ["o+56","o-149","o-54","o-124","o-119","o-150","o-151"], making the arguments more like:

[
  InvitationHandle(o+56),
  ZoeSeatAdmin(o-149),
  {
    initialAllocation: {
      Central: { value: 1500000000n, brand: RUNBrand(o-54) },
      Liquidity: { value: 0n, brand: BLDPoolLiquidityBrand(o-124) },
      Secondary: { value: 300000000n, brand: BLDBrand(o-119) },
    },
    "notifier": Notifier(o-150),
    "offerArgs": undefined,
    "proposal": {
      "exit": { "onDemand": null },
      "give": {
        Central: { value: 1500000000n, brand: RUNBrand(o-54) },
        Secondary: { value: 300000000n, brand: BLDBrand(o-119) },
      },
      "want": {
        Liquidity: { value: 1000n, brand: BLDPoolLiquidityBrand(o-124) },
      }
    },
    seatHandle: SeatHandleHandle(o-151),
  }
]

Now, what's interesting is that the error message ("Brands in left Object [Alleged: BLDPoolLiquidity brand] {} and right Object [Alleged: BLDPoolLiquidity brand] {} should match but do not") is showing the same BLDPoolLiquidity brand name for each side.

I'm looking at the deliveries to that vat, and the only time BLDPoolLiquidity brand appears is in the arguments to handleOffer deliveries. I wrote a little tool to examine those deliveries, and only three iface-bearing imports appear:

Alleged: BLD brand: o-119
Alleged: BLDPoolLiquidity brand: o-124
Alleged: RUN brand: o-54

The only vref that arrives with BLDPoolLiquidity brand as its iface is o-124. So it doesn't seem like something from outside the vat is sending two distinct objects that just happen to have the same iface. That points to a bug in liveslots, maybe somehow it's allowing two distinct Presences for the same vref.

I think the next step is to talk with someone who knows the contract well, and figure out which deliveries are providing the two sides of that comparison.

Also to try to reproduce it in a single vat, from just the slogfile trace.

warner · 2022-06-11T02:49:02Z

For reference, Ari said this run was using 9d670a .

arirubinstein · 2022-06-12T18:50:21Z

All slogs @ https://drive.google.com/drive/folders/1qHpDlJsiEXj0541cpqziV15Y2OZdHgxW?usp=sharing

warner · 2022-06-12T23:06:18Z

All three validators match on that delivery, so at least it's not a nondeterminism in the JS engine (the delivery->results function).

warner · 2022-06-14T00:06:32Z

In the 679 vpool-xyk-amm deliveries leading up to the "brands do not match" error, I see 78 errors emitted in one of three categories:

37 cases of keyword "BLDPoolLiquidity" must be unique
40 cases of "secondaryBrand" not found [Alleged: BLD brand]
1 case of "secondaryBrand" already registered Object [Alleged: BLD brand]

I think this is the army of clients all trying to be the first to register something unique, which is at least the right conditions for a race bug.

Interestingly, the "already registered" error appears on delivery 303, in the middle of the other two types of errors. It seems surprising that it would continue to complain that BLD isn't found even after it complained that BLD was already registered, but I don't really know what the registration is about.

warner · 2022-06-14T01:15:10Z

@Chris-Hibbert and I narrowed this down to a race condition in the AMM's addPool.js code, specifically somewhere in

agoric-sdk/packages/run-protocol/src/vpool-xyk-amm/addPool.js

Lines 17 to 59 in 8ad7422

    
           export const makeAddIssuer = ( 
        
             zcf, 
        
             isInSecondaries, 
        
             brandToLiquidityMint, 
        
             getAddIssuerToReserve, 
        
           ) => { 
        
             /** 
        
              * @param {Issuer} secondaryIssuer 
        
              * @param {string} keyword 
        
              */ 
        
             return async (secondaryIssuer, keyword) => { 
        
               const [secondaryAssetKind, secondaryBrand] = await Promise.all([ 
        
                 E(secondaryIssuer).getAssetKind(), 
        
                 E(secondaryIssuer).getBrand(), 
        
               ]); 
        
               assert( 
        
                 !isInSecondaries(secondaryBrand), 
        
                 X`issuer ${secondaryIssuer} already has a pool`, 
        
               ); 
        
               assert( 
        
                 secondaryAssetKind === AssetKind.NAT, 
        
                 X`${keyword} asset not fungible (must use NAT math)`, 
        
               ); 
        
               const liquidityKeyword = `${keyword}Liquidity`; 
        
               zcf.assertUniqueKeyword(liquidityKeyword); 
        
               const mint = await zcf.makeZCFMint( 
        
                 liquidityKeyword, 
        
                 AssetKind.NAT, 
        
                 harden({ decimalPlaces: 6 }), 
        
               ); 
        
               await zcf.saveIssuer(secondaryIssuer, keyword); 
        
               // this ensures that getSecondaryIssuer(thisIssuer) will return even before the pool is created 
        
               brandToLiquidityMint.init(secondaryBrand, mint); 
        
               const { issuer: liquidityIssuer } = mint.getIssuerRecord(); 
        
               // defer lookup until necessary. more aligned with governed param we expect this to be eventually. 
        
               const addIssuerToReserve = getAddIssuerToReserve(); 
        
               // tell the reserve about this brand, which it will validate by calling back 
        
               // to AMM for the issuer 
        
               await addIssuerToReserve(secondaryBrand); 
        
               return liquidityIssuer; 
        
             };

Chris will write up the details properly, but my (weak) understanding is:

@arirubinstein 's load generator has dozens of clients fighting to create the RUN/BLD AMM
they're all using the same "one true BLD" Issuer, so there can only be one true RUN/BLD pool
this addIssuer() function wants to pick a winner, then create and return a new BLDLiquidity issuer for them
a race in this function causes multiple contestants (perhaps only the first two) to be told that they've won, each with a distinct liquidity token issuer, however only one of those issuers is stored for later use
the (really) losing client then comes back with an offer that references the "false" liquidity token issuer (the one they were told about), and it does not match the issuer that was stored under the BLD issuer, hence the mismatch

dtribble · 2022-06-14T16:47:06Z

All the assertUniqueKeyword are merely advisory because they are invalidated at each await. The test must be performed synchronously with the update. That means it must be in the same turn as zoeStorageManager.js#L133:

instanceRecordManager.addIssuerToInstanceRecord(keyword, issuerRecord);

which will always succeed, replacing any existing assignment to that keyword. Anything else is a TOCTOU confusion.

Chris-Hibbert · 2022-06-14T21:56:33Z

I think Zoe is safe and consistent.

the unsafe addIssuerToInstanceRecord is called from three places

zoeStorageManager:saveIssuer
- provided to zcf, which makes it available to contracts immediately after a call to assertUniqueKeyword()
zoeStorageManager:wrapIssuerKitWithZoeMint
- bottoms out in makeZcfMint(), which calls assertUniqueKeyword() immediately before.
zcfZygote:recordIssuer
- called in doMakeZcfMint, which is called from registerFeeMint and makeZcfMint, both of which call assertUniqueKeyword() before, and without an intervening await.

That still leaves an issue in the AMM's addIssuer, which safely creates a unique liquidity token per pool, but unsafely hands out incorrect doppelgangers to anyone who asks before the canonical secondary pool is finished being created.

dtribble · 2022-06-15T00:23:06Z

zoeStorageManager:saveIssuer

provided to zcf, which makes it available to contracts immediately after a call to assertUniqueKeyword()

It's not "immediately" because there is an intervening await in zoeStorageManager:saveIssuer:

    const saveIssuer = async (issuerP, keyword) => {
      const issuerRecord = await issuerStorage.storeIssuer(issuerP);
      await escrowStorage.createPurse(issuerRecord.issuer, issuerRecord.brand);
      instanceRecordManager.addIssuerToInstanceRecord(keyword, issuerRecord);
      return issuerRecord;
    };

So multiple issuers could be successfully stored as if they were the issuer for the keyword, but only the last one would win and be legit. That seems pretty unquestionably a race bug.

refs: #5571 There are two bugs in #5571, and this fixes only one of them.

Chris-Hibbert · 2022-06-15T17:43:25Z

Proposed solution:

saveIssuer() could call instanceRecordManager.reserveKeyword(keyword). That would add keyword to an internal list, and return a promise or a handle. The second attempt would be rebuffed. The first attempt would either redeem the handle or on an error, it would cancel the reservation.

instanceRecordStorage and zoeStorageManager can hold join responsibility for this, since they're both internal to Zoe.

Alternatively, we can disclaim the problem and just say "contracts should make sure to not register multiple issuers with the same keyword".

Any other approaches we should consider?

Chris-Hibbert · 2022-06-16T17:27:08Z

@erights please respond to the above proposal.

erights · 2022-06-16T20:51:12Z

I am reluctant to add more internal state, especially state that would need to be durable. Aside from that I'm still chewing on this. We should talk.

refs: #5571 There are two bugs in #5571, and this fixes only one of them.

* fix: handle parallel requests for the same issuer in AMM's addIssuer refs: #5571 There are two bugs in #5571, and this fixes only one of them. * chore: clarify a comment * refactor: revert to multiple await style; handle race at .init() Added a test for multiple attempts to add the same issuer with the same keyword. * refactor: rearrange error handling drop try-catch for brand-to-promise add catch for saving issuer and notifying Reserve * chore: delete from brandToLiquidityMint on failure name cleanup * chore: cleanups: types, variable name, logging * chore: move DISPLAY_INFO to top level scope * chore: delete entries from brandToLiquidityIssuer when done This is probably what was intended instead of the code fixed in #5611 * chore: convert a .then to E.when(). * refactor: return addIssuer as named (rather than anonymous) function

Chris-Hibbert · 2022-07-06T18:44:19Z

#5692 is directed at the remaining issues.

arirubinstein added bug Something isn't working SwingSet package: SwingSet labels Jun 10, 2022

warner added Inter-protocol Overarching Inter Protocol AMM and removed SwingSet package: SwingSet labels Jun 14, 2022

warner assigned Chris-Hibbert Jun 14, 2022

warner mentioned this issue Jun 14, 2022

replay error involving promise fulfilling when it should reject (on linux/arm) #5587

Closed

Chris-Hibbert added a commit that referenced this issue Jun 15, 2022

fix: handle parallel requests for the same issuer in AMM's addIssuer

2209cc8

refs: #5571 There are two bugs in #5571, and this fixes only one of them.

Chris-Hibbert mentioned this issue Jun 15, 2022

fix: handle parallel requests to add an issuer in AMM #5596

Merged

warner mentioned this issue Jun 15, 2022

fix extract-transcript and replay-transcript tools #5602

Closed

Chris-Hibbert mentioned this issue Jun 21, 2022

AMM enforce minimum liquidity by wrapping addPool with a new contract #5634

Closed

Chris-Hibbert added a commit that referenced this issue Jun 22, 2022

fix: handle parallel requests for the same issuer in AMM's addIssuer

9c3d5ea

refs: #5571 There are two bugs in #5571, and this fixes only one of them.

Chris-Hibbert added a commit that referenced this issue Jun 28, 2022

fix: handle parallel requests for the same issuer in AMM's addIssuer

18fbd56

refs: #5571 There are two bugs in #5571, and this fixes only one of them.

Chris-Hibbert added a commit that referenced this issue Jun 29, 2022

fix: handle parallel requests for the same issuer in AMM's addIssuer

0641849

refs: #5571 There are two bugs in #5571, and this fixes only one of them.

Chris-Hibbert added a commit that referenced this issue Jun 29, 2022

fix: handle parallel requests for the same issuer in AMM's addIssuer

67bc039

refs: #5571 There are two bugs in #5571, and this fixes only one of them.

Chris-Hibbert mentioned this issue Jul 6, 2022

zcfZygote.saveIssuer() has a race #5692

Open

Chris-Hibbert closed this as completed Jul 6, 2022

Tartuffo added this to the Mainnet 1 milestone Jul 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Under load, brands do not match #5571

Under load, brands do not match #5571

arirubinstein commented Jun 10, 2022

warner commented Jun 11, 2022

warner commented Jun 11, 2022

arirubinstein commented Jun 12, 2022

warner commented Jun 12, 2022

warner commented Jun 14, 2022 •

edited

Loading

warner commented Jun 14, 2022

dtribble commented Jun 14, 2022 •

edited

Loading

Chris-Hibbert commented Jun 14, 2022 •

edited

Loading

dtribble commented Jun 15, 2022

Chris-Hibbert commented Jun 15, 2022

Chris-Hibbert commented Jun 16, 2022

erights commented Jun 16, 2022

Chris-Hibbert commented Jul 6, 2022

Under load, brands do not match #5571

Under load, brands do not match #5571

Comments

arirubinstein commented Jun 10, 2022

Describe the bug

warner commented Jun 11, 2022

warner commented Jun 11, 2022

arirubinstein commented Jun 12, 2022

warner commented Jun 12, 2022

warner commented Jun 14, 2022 • edited Loading

warner commented Jun 14, 2022

dtribble commented Jun 14, 2022 • edited Loading

Chris-Hibbert commented Jun 14, 2022 • edited Loading

dtribble commented Jun 15, 2022

Chris-Hibbert commented Jun 15, 2022

Chris-Hibbert commented Jun 16, 2022

erights commented Jun 16, 2022

Chris-Hibbert commented Jul 6, 2022

warner commented Jun 14, 2022 •

edited

Loading

dtribble commented Jun 14, 2022 •

edited

Loading

Chris-Hibbert commented Jun 14, 2022 •

edited

Loading