
Deserialization Error in MarketMap::sync for SpotMarket #55

Closed
0xNico opened this issue Sep 23, 2024 · 7 comments

@0xNico

0xNico commented Sep 23, 2024

Description

We are encountering a SizeMismatch error when deserializing SpotMarket data in the MarketMap::sync method. This error occurs during the initialization of the Drift Gateway, preventing the system from properly syncing market data.

Error Details

The error occurs in the following function:

<drift::state::spot_market::SpotMarket as anchor_lang::AccountDeserialize>::try_deserialize::h4a79f2753865082b

This suggests that there's a mismatch between the expected size of the SpotMarket struct and the actual size of the data being deserialized.
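To make the size check concrete: an Anchor zero-copy account is an 8-byte discriminator followed by the raw struct bytes, and bytemuck::from_bytes panics unless the slice length equals size_of::<T>() exactly. The sketch below assumes the decoder hands every byte after the discriminator to from_bytes, and compares an account's data length with 8 + size_of::<T>(). ExampleMarket and check_account_len are hypothetical names, not Drift types or SDK functions.

```rust
use std::mem::size_of;

/// Anchor prefixes every account's data with an 8-byte discriminator.
const DISCRIMINATOR_LEN: usize = 8;

/// Hypothetical stand-in for a zero-copy account; NOT Drift's real SpotMarket.
#[repr(C)]
#[allow(dead_code)]
struct ExampleMarket {
    market_index: u16,
    _padding: [u8; 6],
    balance: u64,
}

/// Returns Ok(()) when `data` has exactly the length a zero-copy decode of `T`
/// expects (assumption: all bytes after the discriminator are reinterpreted as `T`).
fn check_account_len<T>(data: &[u8]) -> Result<(), String> {
    let expected = DISCRIMINATOR_LEN + size_of::<T>();
    if data.len() == expected {
        Ok(())
    } else {
        Err(format!(
            "size mismatch: expected {} bytes, got {}",
            expected,
            data.len()
        ))
    }
}

fn main() {
    let good = vec![0u8; DISCRIMINATOR_LEN + size_of::<ExampleMarket>()];
    let bad = vec![0u8; DISCRIMINATOR_LEN + size_of::<ExampleMarket>() + 16];
    println!("{:?}", check_account_len::<ExampleMarket>(&good)); // Ok(())
    println!("{:?}", check_account_len::<ExampleMarket>(&bad)); // Err("size mismatch: expected 24 bytes, got 40")
}
```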

Possible Causes

  1. Version Mismatch: There might be an incompatibility between the SDK version and the on-chain program version, specifically for spot markets.
  2. Struct Definition Change: The SpotMarket struct definition in the SDK might not match the current on-chain data structure.
  3. Data Corruption: There could be corrupted or unexpected data in the spot market account being deserialized.
  4. Serialization/Deserialization Logic: There might be an issue in how the data is being serialized on-chain or deserialized in the SDK.

Impact

This error prevents the Drift Gateway from initializing properly, which could lead to system-wide issues and prevent proper market data synchronization.

Suggested Investigation Steps

  1. Verify that the SDK version matches the on-chain program version.
  2. Compare the SpotMarket struct definition in the SDK with the on-chain program's definition.
  3. Investigate any recent changes to the SpotMarket struct or related serialization/deserialization logic.
  4. Examine the raw data of the spot market accounts to ensure they are not corrupted.
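For step 4, one quick way to report what an account actually contains is to print its length and its first 8 bytes. Anchor derives each account's discriminator from the first 8 bytes of sha256("account:<StructName>"), so the hex can be compared with the value expected for SpotMarket. describe_account is a hypothetical helper, not part of the SDK.

```rust
/// Hypothetical helper: summarizes an account's raw bytes for a bug report.
/// Shows the total length and the first 8 bytes (the Anchor discriminator) as hex.
fn describe_account(data: &[u8]) -> String {
    let discriminator: String = data
        .iter()
        .take(8)
        .map(|b| format!("{:02x}", b))
        .collect();
    format!("len={} discriminator={}", data.len(), discriminator)
}

fn main() {
    let data = [0xde, 0xad, 0xbe, 0xef, 1, 2, 3, 4, 9];
    println!("{}", describe_account(&data)); // len=9 discriminator=deadbeef01020304
}
```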

Proposed Solution: Improved Logging and Error Handling

We suggest implementing more robust logging and error handling in the MarketMap::sync method. Here's a proposed update to the sync function:

// Assumes `synced` is an `AtomicBool`: a plain `bool` field cannot be assigned through `&self`.
pub(crate) async fn sync(&self) -> SdkResult<()> {
    if self.synced.load(Ordering::Relaxed) {
        return Ok(());
    }

    let sync_lock = self.sync_lock.as_ref().expect("expected sync lock");
    let lock = match sync_lock.try_lock() {
        Ok(lock) => lock,
        Err(_) => {
            log::warn!("Sync already in progress, skipping");
            return Ok(());
        }
    };

    log::info!("Starting market sync");

    // ... [existing code for setting up RPC request] ...

    if let OptionalContext::Context(accounts) = response {
        // Read the count and slot before `into_iter()` consumes `accounts.value`.
        let total = accounts.value.len();
        let slot = accounts.context.slot;
        log::info!("Received {} accounts to process", total);

        let mut successful_syncs = 0;
        let mut skipped_syncs = 0;

        for (index, account) in accounts.value.into_iter().enumerate() {
            let market_data = account.account.data;

            log::debug!(
                "Processing account {}/{}, data size: {}",
                index + 1,
                total,
                market_data.len()
            );

            // Note: this only handles decode failures returned as `Err`; a bytemuck
            // size-mismatch panic inside the decoder is not caught here.
            match decode::<T>(market_data.clone()) {
                Ok(data) => {
                    // Capture the index before `data` is moved into the map.
                    let market_index = data.market_index();
                    self.marketmap
                        .insert(market_index, DataAndSlot { data, slot });
                    successful_syncs += 1;
                    log::debug!("Successfully processed market index: {}", market_index);
                }
                Err(e) => {
                    skipped_syncs += 1;
                    log::warn!(
                        "Skipping market due to decoding error. Data size: {}, Error: {:?}",
                        market_data.len(),
                        e
                    );
                    if log::log_enabled!(log::Level::Trace) {
                        log::trace!("Skipped market data: {:?}", market_data);
                    }
                }
            }
        }

        self.latest_slot.store(slot, Ordering::Relaxed);

        log::info!(
            "Sync completed. Successful: {}, Skipped: {}, Latest slot: {}",
            successful_syncs,
            skipped_syncs,
            slot
        );

        // Mark as synced even if some markets were skipped
        self.synced.store(true, Ordering::Relaxed);
    } else {
        log::warn!("Received unexpected response format from RPC");
    }

    drop(lock);
    Ok(())
}
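The Err arm in the proposed sync only sees decode failures that are returned as errors, while the failure reported later in this thread is a bytemuck::from_bytes panic with SizeMismatch. One option, sketched here under the assumption that the gateway is built with the default panic = "unwind" strategy, is to wrap the decode call in std::panic::catch_unwind so a bad account is skipped instead of aborting the sync. panicking_decode and decode_or_skip are hypothetical names, not drift-rs functions.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical stand-in for a zero-copy decode that panics on a length
/// mismatch, the way bytemuck::from_bytes panics with SizeMismatch.
fn panicking_decode(data: &[u8]) -> u64 {
    assert!(data.len() == 16, "from_bytes>SizeMismatch");
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&data[8..16]);
    u64::from_le_bytes(buf)
}

/// Runs `decode` and converts a panic into an Err carrying the panic message,
/// so a sync loop can count the account as skipped and continue.
fn decode_or_skip<T>(data: &[u8], decode: impl FnOnce(&[u8]) -> T) -> Result<T, String> {
    panic::catch_unwind(AssertUnwindSafe(|| decode(data))).map_err(|payload| {
        // Panic payloads are usually a String or a &'static str.
        payload
            .downcast_ref::<String>()
            .cloned()
            .or_else(|| payload.downcast_ref::<&str>().map(|s| s.to_string()))
            .unwrap_or_else(|| "unknown panic".to_string())
    })
}

fn main() {
    let ok = vec![0u8; 16];
    let bad = vec![0u8; 24];
    println!("{:?}", decode_or_skip(&ok, panicking_decode)); // Ok(0)
    println!("{:?}", decode_or_skip(&bad, panicking_decode)); // Err("from_bytes>SizeMismatch")
}
```

The default panic hook still prints the panic message to stderr even when it is caught. A cheaper alternative that avoids unwinding entirely is to compare the data length with the expected struct size before decoding.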

This implementation provides:

  • More detailed logging at various stages of the sync process.
  • Error handling that allows the sync to continue even if some markets fail to decode.
  • A summary of successful and failed syncs.
  • The ability to trace raw market data for failed decodes (when trace logging is enabled).

Additional Context

We're using the latest version of the Drift Protocol SDK. The error occurs consistently during the Drift Gateway initialization process.

Next Steps

We would appreciate your insights on this issue and the proposed solution. If you need any additional information or if you'd like us to perform any specific tests, please let us know.

@jordy25519
Collaborator

jordy25519 commented Sep 23, 2024

thank you for the detailed issue!

please confirm 2 things:

  1. which rustc/cargo toolchain was used to compile this? e.g. the output of rustup show active-toolchain from the build dir

  2. which platform/OS are you using?

must use a Rust version <= 1.76.0, and the toolchain host should be x86_64,
e.g. on Apple M1 you must use 1.76.0-x86_64-apple-darwin

I'll take a look at your improved logging suggestions
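A startup guard along these lines could surface a wrong toolchain immediately rather than as a later panic. It is a sketch: std::env::consts::ARCH reports the target the binary was compiled for (x86_64 by default when building with the 1.76.0-x86_64-apple-darwin toolchain), and align_of::<u128>() serves as a proxy for the compiler version, since it is 8 on x86_64 with rustc <= 1.76.0 and 16 from 1.77.0 on. toolchain_warnings is a hypothetical name.

```rust
use std::mem::align_of;

/// Hypothetical startup check: returns warnings when the binary was not built
/// for x86_64, or when u128 alignment differs from the 8 bytes expected by
/// layouts compiled with rustc <= 1.76.0 on x86_64.
fn toolchain_warnings() -> Vec<String> {
    let mut warnings = Vec::new();
    if std::env::consts::ARCH != "x86_64" {
        warnings.push(format!(
            "target arch is {}, expected x86_64",
            std::env::consts::ARCH
        ));
    }
    if align_of::<u128>() != 8 {
        warnings.push(format!(
            "align_of::<u128>() is {}, expected 8 (rustc <= 1.76.0 on x86_64)",
            align_of::<u128>()
        ));
    }
    warnings
}

fn main() {
    for w in toolchain_warnings() {
        eprintln!("warning: {}", w);
    }
}
```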

@0xNico
Author

0xNico commented Sep 23, 2024

stable-aarch64-apple-darwin (default)
rustc 1.78.0 (9b00956e5 2024-04-29)

Mac M2.

Will attempt to switch toolchain host and report back.

@0xNico
Author

0xNico commented Sep 23, 2024

Unfortunately, changing the toolchain does not resolve the problem. The gateway now initialises, but after a few seconds it panics, likely in bytemuck. Running with RUST_BACKTRACE=full reveals the original details from the opening of the issue.

thread 'main' panicked at /Users/nico/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bytemuck-1.16.3/src/internal.rs:32:3:
from_bytes>SizeMismatch

@jordy25519
Collaborator

hmm, and was it Rust version <= 1.76.0? See this thread for details: the layout of i128/u128 changed, which broke bytemuck zero-copy from Rust 1.77.0 onward: drift-labs/protocol-v2#891 (comment)

if downgrading rust versions doesn't solve it, then some info about the gateway RPC provider used, startup command, and commit would be great to help me reproduce the issue.
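To illustrate why the i128/u128 change shows up as a SizeMismatch: in a #[repr(C)] struct, the padding before a u128 field depends on the u128's alignment, so the struct's size changes with the compiler. Example is a hypothetical struct, not a Drift type; with align_of::<u128>() == 8 it is 24 bytes, and with 16 it is 32 bytes, so a client built with a different layout expects a different account size than the one stored on chain.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical struct (NOT a Drift type): a u128 after a 1-byte field,
/// so the padding between them equals align_of::<u128>() - 1.
#[repr(C)]
#[allow(dead_code)]
struct Example {
    tag: u8,
    amount: u128,
}

fn main() {
    println!("align_of::<u128>() = {}", align_of::<u128>());
    // 24 when u128 is 8-aligned, 32 when it is 16-aligned.
    println!("size_of::<Example>() = {}", size_of::<Example>());
}
```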

@0xNico
Copy link
Author

0xNico commented Sep 23, 2024

Downgrading to rustc 1.76.0 using the toolchain recommended in the repository does not solve the issue.

The gateway RPC provider is a Helius mainnet RPC.
The command is as recommended: <rpc_url>, <emulation> <pubkey> <gateway path>
The commit is the latest hash of drift-gateway.

@0xNico
Copy link
Author

0xNico commented Sep 24, 2024

  • Checked the anchor version at compile time; it is vendored and forced to 0.29.0.
  • Checked that bytemuck 1.16.2 is compatible with the above.
  • Checked the gateway dependencies and Cargo.lock multiple times.
  • Checked the toolchain being used and implemented an override for the correct version.
  • Tried an exact copy of Cargo.lock from the repository; this did not work.

Here is a copy of the RUST_BACKTRACE which highlights the issue surrounding PerpMarket deserialisation.
[Screenshot 2024-09-24 at 14:44:54: RUST_BACKTRACE output]

@0xNico
Copy link
Author

0xNico commented Sep 24, 2024

Issue resolved.

To resolve an issue similar to mine:

  1. Vendor bytemuck pinned to bytemuck = "=1.17.0".
  2. Patch the pythnet dependency in Cargo so that it does not pull in anchor-lang = "0.30.0".
  3. Force-override the Rust toolchain to <= 1.76.0 (x86_64, not aarch64) before each build. I found that the toolchain could sometimes revert to the previous one, which was hard to notice.

0xNico closed this as completed Sep 24, 2024