Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: fetch activity for collection as opposed to individual nft #217

Merged
merged 5 commits into from
Oct 13, 2023

Conversation

crnkovic
Copy link
Contributor

Summary

https://app.clickup.com/t/861nap53e

In this PR, we move away from fetching activity for each individual NFT, to fetching activity for entire collections. This greatly helps with performance as we suspect there are many, many NFTs that have no recent activity. Periodically running the job to fetch their activity is a waste of both the server resources and API quota as many of them will report no new activity.

By fetching the activity on the collection level, we can avoid many unnecessary requests and jobs being pushed to the queue. For example, if a collection has hundreds of NFTs, but none of those NFTs has any activity, this would be 1 job/request to ensure no new activity has happened for all those hundreds of NFTs.

We link NFTs with their respective activity by the collection ID and their token ID instead of the internal NFT ID as it is now.

As some collections have millions of NFTs and millions of activity items, we can prefetch the entire activity log for these collections and manually insert them into the database, avoiding unnecessary jobs/requests taking hours to index everything on the production server. Once we have all the activity up to a certain point in time (in the near past), we can schedule the job to sync the latest activity. I don't assume there will be a lot of recent activity that's too hard for us to handle, but we should fine-tune this as we have more actual data on how active collections are. For more details on data migration, look for comments on the ClickUp card.

Code changes

  • Instead of using nft_id in the nft_activity table, we now use collection_id and token_number. This allows us to prefetch the activity for the NFTs that do not exist in our database. Then we use those values as keys for activities relationship on the NFT model (diff).
  • When the collection is created in the database, we dispatch a FetchCollectionActivity job. This ensures we index the activity for newly created collections.
  • This PR adds the is_fetching_activity boolean column to collections. As fetching entire activity log can take hours, we want to ignore those collections that are still getting indexed from the scheduler. In essence, let it index in peace.
  • Rest of the logic is the same. We use the same Mnemonic endpoint (omitting the tokenId parameter) to fetch the activity. We UPSERT them, and if we get the limit (500 items) we dispatch another job in case there are more items to retrieve. Once we get less than 500 items, we can assume there is no more activity items to fetch.
  • By using timestamp as a cursor, FetchCollectionActivity job is restitant to errors. If the job is interrupted (for example failed job), the next time it runs it will just take the latest timestamp activity and start from that point.

Retrieving the latest activity only for a specific NFT

Since the job to fetch the collection activity uses the timestamp of the newest activity item as a cursor, dispatching a job to fetch the activity for a specific NFT would break both this logic (timestamp cursor would move) and sorting ("Recently Received") would be out of date as only some NFTs have their activity updated. So I opted not to include that in this PR at all. The only way to update the activity for a specific NFT is to update the activity for the entire collection.

Disabling activity indexing

There are times when we just want to disable indexing activities, as this takes a long time. This can be for demo, as we have a good set of data (10M) to test everything, or during the importing of production data. In that case, you can set ACTIVITIES_ENABLED=false environment variable.

Ignoring specific collections from having their activity indexed

If you want to ignore only some collections from having their activity indexed, you can update the activity_blacklist array to include the address of a collection in the config/dashbrd.php config file.

Setup

Our drive contains 2 files, demo.sql.gz and prod.sql.gz. Those contain activities data compressed into the archive. Workflow for this PR:

  1. Set ACTIVITIES_ENABLED=false environment variable so activities indexer doesn't trigger
  2. Truncate the nft_activity table. Make sure to restart the identity so the primary key sequence (ID) restarts. TablePlus has this option when truncating, which is how I did it.
  3. Deploy the PR and run migrations. You can repeat the truncation after this step just to be sure it's empty.
  4. Download gzips locally from drive and scp them onto the server into the /tmp directory: scp /path/to/demo.sql.gz forge@ip:/tmp
  5. SSH in the server, log in as postgres user and unzip the archive: sudo su postgres && cd /tmp && gzip -k -d demo.sql.gz
  6. Import the dump into the database (for prod this took around 5mins for me): psql -U postgres -d database_name < demo.sql (replace database_name with forge or whatever the database name is)
  7. Ensure it imported everything after it finishes
  8. Delete demo.sql and demo.sql.gz files
  9. Enable activity indexing: ACTIVITIES_ENABLED=true and restart Horizon with php artisan horizon:terminate. Ideally, we'd leave this disabled for demo as there's no need to keep indexing activities and SQL dump doesn't contain all activities.

If importing the dump into the demo fails, let me know. This will only happen if some of the records from the collections table have been removed, since we now match the collection_id from the dump to the id on the collections table. Our drive contains the CSV containing collections for both demo and prod as well.

Testing

For locally testing the PR, you don't need anything special. Set the ACTIVITIES_ENABLED=true and run php artisan collections:fetch-activity --collection-id=1 or whatever ID you want. Make sure to pass the ID or it will run for every collection and will take a while.

Checklist

  • I checked my UI changes against the design and there are no notable differences
  • I checked my UI changes for any responsiveness issues
  • I checked my (code) changes for obvious issues, debug statements and commented code
  • I opened a corresponding card on Clickup for any remaining TODOs in my code
  • I added a short description on how to test this PR (if necessary)
  • I added a storybook entry for the component that was added (if necessary)
  • Documentation (if necessary)
  • Tests (if necessary)
  • Ready to be merged

@crnkovic crnkovic marked this pull request as ready for review October 12, 2023 13:20
Copy link
Contributor

@ItsANameToo ItsANameToo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one issue Sam found and I was able to reproduce: App\Data\Collections\CollectionNftData::fromModel(): Argument #1 ($nft) must be of type App\Models\Nft, null given. This occurs when you have a collection owned by a user and fetch activity for it. Navigating to the collection page will throw the error as the user only owns a subset of the total amount of NFTs and it seems to try to find all the NFTs belonging to the activity rows even though we may not have all of them in our database

@ItsANameToo
Copy link
Contributor

The issue is with nft: CollectionNftData::fromModel($nftActivity->nft), as that can be null (in Data/Nfts/NftActivityData.php)

@ItsANameToo ItsANameToo marked this pull request as draft October 12, 2023 13:30
@crnkovic crnkovic marked this pull request as ready for review October 12, 2023 14:23
@ItsANameToo ItsANameToo added this to the 0.8.0 milestone Oct 12, 2023
@ItsANameToo ItsANameToo changed the title refactor: fetch activity for the entire collection as opposed to each nft refactor: fetch activity for collection as opposed to individual nft Oct 13, 2023
@ItsANameToo ItsANameToo merged commit bab3ce2 into rc-0.8.0 Oct 13, 2023
22 checks passed
@ItsANameToo ItsANameToo deleted the refactor/collection-activity-rebase branch October 13, 2023 08:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants