
add back changes for solana token 2022 #6107

Merged · 15 commits merged into main from add-solana-back · Jun 10, 2024
Conversation

jeff-dude
Member

fyi @andrewhong5297
we tried to merge this PR and let it rebuild in prod along with its downstream dependencies. in the current prod setup, this fails with a spill-to-disk error.

steps taken prior to revert:

  • ran on the current setup, hit the spill-to-disk failure
  • reduced the thread count, hoping fewer models running would help; same error
  • worked with @belen-pruvost on the infra GH actions to increase cluster size, leveraging memory-intensive machines with the same number of workers; still failed on spill
  • increased disk size from 500GB to 1000GB (belen helped here; not part of the GH action at this time)
  • changed the dbt cloud job to run the solana transfers alone, then let the rest run in the normal modified step (see the sketch after this list)
    • new step for transfers as a simple select
    • exclude the spell in the modified step
  • then we hit dbt cloud errors
    • manually canceled a scheduled run that kicked off while the cluster was rebuilding
    • instead of canceling like normal, one step within it keeps spinning forever and can't be canceled
    • the job itself is marked as "succeeded" while this step still runs
    • future manual runs of the job fail with "manifest not available from previous run", which points to the stuck run above
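
A rough sketch of what the split job steps might look like as dbt invocations; the model name tokens_solana_transfers and the exact selectors are assumptions for illustration, not the actual job config:

# hypothetical step 1: run the solana transfers spell on its own
dbt run --select tokens_solana_transfers
# hypothetical step 2: the normal modified-state step, excluding the transfers spell
# (dbt cloud supplies the comparison state via the "compare changes against previous run" setting)
dbt run --select state:modified+ --exclude tokens_solana_transfers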

i have reverted the original PR and modified the dbt cloud job to try to get around these bugs. only the incremental step is enabled, and the setting for comparing to the previous run is turned off to get around the error. the cluster has been reverted back to its normal sizing.

assuming the incremental step runs as normal, we can reconfigure the dbt cloud job back to normal. then we would need to take steps like the above again to try to get the solana spells to run in prod without failing.

fyi @aalan3

@jeff-dude
Member Author

due to the dbt cloud bugs, we haven't tried running with the increased disk size yet. the next time we try this, we should consider:

  • memory-focused machines
  • more workers
  • increased disk size

@andrewhong5297
Collaborator

thanks for pushing on this! I appreciate it 🙏

@aalan3
Contributor

aalan3 commented Jun 7, 2024

I think we need to split these models into intermediate models in order for this to work; otherwise, the next time we do a full refresh, even the bigger cluster might not work. I also suspect that running these with partition by creates memory issues. A sketch of such a split follows the lists below.

For metadata:

  • Try to avoid using partition by; it is usually really heavy
  • Split between the different token types - spl_token and token22

For transfers

  • Try to avoid partition by (it's done here when getting fees)
  • Split out fees into its own model
  • Split spl_token and token22
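
For illustration, a minimal sketch of what the split into intermediate models could look like for transfers; the intermediate model names below are assumptions, not existing spellbook models:

-- hypothetical final transfers model that only unions per-program intermediate models,
-- so each intermediate scans a single token program and the heavy work stays smaller
SELECT * FROM {{ ref('tokens_solana_transfers_spl_token') }}
UNION ALL
SELECT * FROM {{ ref('tokens_solana_transfers_token2022') }}

Under this layout the fee logic would live in its own intermediate model and be joined in afterwards, rather than being computed with partition by inside the transfers model.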

@andrewhong5297
Collaborator

> I think we need to split these models into intermediate models in order for this to work […]

tokens_solana.fungible runs fairly fast; I believe it's transfers that's the problem.

We can do tokens_solana.fungible and the fee updates model in one PR, then the transfers model in another PR.

Does that work?

@aalan3
Contributor

aalan3 commented Jun 7, 2024

> We can do tokens_solana.fungible and the fee updates model in one PR, then the transfers model in another PR.

Yes, that works.

@andrewhong5297
Collaborator

this PR should only be merged after #6113

, call_outer_instruction_index as outer_instruction_index
, COALESCE(call_inner_instruction_index,0) as inner_instruction_index
, call_outer_executing_account as outer_executing_account
FROM base tr
--get token and accounts
LEFT JOIN {{ ref('solana_utils_token_accounts') }} tk_s ON tk_s.address = tr.account_source

Contributor

@andrewhong5297 could these be inner joins?

Collaborator

Yes changed
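
For reference, a sketch of the join after that change, assuming the merged version simply switches the join type on the snippet above:

FROM base tr
--get token and accounts; inner join keeps only transfers with a matching token account
INNER JOIN {{ ref('solana_utils_token_accounts') }} tk_s ON tk_s.address = tr.account_source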

, f.fee_time
, row_number() over (partition by tr.call_tx_id, tr.call_outer_instruction_index, tr.call_inner_instruction_index order by f.fee_time desc) as latest_fee
FROM {{ source('spl_token_2022_solana','spl_token_2022_call_transferChecked') }} tr
LEFT JOIN {{ ref('tokens_solana_fees_history') }} f ON tr.account_tokenMint = f.account_mint AND tr.call_block_time >= f.fee_time

We only care about the latest fee for a transfer (per the predicate WHERE latest_fee = 1).
Could we join to a narrower window of fees? Depending on how often there are fee updates, it could be, for example, one day:
... AND tr.call_block_time >= f.fee_time AND tr.call_block_time - interval '1' day < f.fee_time
That would limit the cost of the TopNRanking operator downstream.
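
A sketch of what the suggested narrowed join would look like in context; the one-day bound is only an example window, and per the reply below it was not adopted:

, row_number() over (partition by tr.call_tx_id, tr.call_outer_instruction_index, tr.call_inner_instruction_index order by f.fee_time desc) as latest_fee
FROM {{ source('spl_token_2022_solana','spl_token_2022_call_transferChecked') }} tr
LEFT JOIN {{ ref('tokens_solana_fees_history') }} f ON tr.account_tokenMint = f.account_mint
    AND tr.call_block_time >= f.fee_time
    AND tr.call_block_time - interval '1' day < f.fee_time -- extra bound so row_number() only ranks fees from the last day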

Collaborator

Unfortunately not, because the latest fee might have been set even a year or two ago.

aalan3 merged commit d06e7c7 into main on Jun 10, 2024 (1 of 2 checks passed)
aalan3 deleted the add-solana-back branch on June 10, 2024 at 11:57
github-actions bot locked and limited the conversation to collaborators on Jun 10, 2024