refactor: fetch activity for collection as opposed to individual nft #217
Summary
https://app.clickup.com/t/861nap53e
In this PR, we move away from fetching activity for each individual NFT and instead fetch activity for entire collections. This greatly helps with performance, as we suspect there are many, many NFTs with no recent activity; periodically running the job to fetch their activity wastes both server resources and API quota, since most of them will report no new activity.
By fetching the activity on the collection level, we avoid pushing many unnecessary requests and jobs to the queue. For example, if a collection has hundreds of NFTs but none of them has any activity, a single job/request is enough to confirm that no new activity has happened for all of those hundreds of NFTs.
We link NFTs with their respective activity by the collection ID and the token ID, instead of the internal NFT ID as before.
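As a rough sketch of the schema change this implies (this is not the exact migration from the diff; the column types and the index are assumptions):

```php
use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::table('nft_activity', function (Blueprint $table) {
            // Drop the internal NFT reference...
            $table->dropColumn('nft_id');

            // ...and key activity on the collection plus the on-chain token ID,
            // so activity can be stored before the NFT itself exists locally.
            $table->foreignId('collection_id')->constrained();
            $table->string('token_number');
            $table->index(['collection_id', 'token_number']);
        });
    }
};
```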
As some collections have millions of NFTs and millions of activity items, we can prefetch the entire activity log for these collections and manually insert it into the database, avoiding unnecessary jobs/requests that would take hours to index everything on the production server. Once we have all the activity up to a certain point in time (in the near past), we can schedule the job to sync the latest activity. I don't expect the volume of recent activity to be more than we can handle, but we should fine-tune this as we get more actual data on how active collections are. For more details on data migration, see the comments on the ClickUp card.
Code changes
- Instead of the `nft_id` in the `nft_activity` table, we now use `collection_id` and `token_number`. This allows us to prefetch the activity for NFTs that do not yet exist in our database. We then use those values as keys for the `activities` relationship on the NFT model (diff).
- Newly created collections dispatch the `FetchCollectionActivity` job, so we index the activity for them as soon as they are added.
- Added an `is_fetching_activity` boolean column to collections. As fetching the entire activity log can take hours, we want the scheduler to ignore collections that are still getting indexed. In essence, let it index in peace.
- We fetch the activity on the collection level (without the `tokenId` parameter) and `UPSERT` the items. If we receive the limit (500 items), we dispatch another job in case there are more items to retrieve; once we get fewer than 500 items, we can assume there are no more activity items to fetch (see the sketch after this list).
- The `FetchCollectionActivity` job is resilient to errors. If the job is interrupted (for example, it fails mid-run), the next run will simply take the timestamp of the latest stored activity and start from that point.
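To make the flow above concrete, here is a condensed sketch of how such a job could look. This is not the actual implementation: the `Web3Api` facade, the column names, and the upsert uniqueness key are assumptions for illustration.

```php
use App\Models\Collection;
use App\Models\NftActivity;
use App\Support\Facades\Web3Api; // hypothetical API client facade
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;

class FetchCollectionActivity implements ShouldQueue
{
    use Dispatchable, Queueable;

    public function __construct(public Collection $collection) {}

    public function handle(): void
    {
        // Cursor: resume from the newest activity we have already stored.
        // If a previous run failed midway, this picks up where it left off.
        $from = NftActivity::query()
            ->where('collection_id', $this->collection->id)
            ->max('timestamp');

        // One request per page of up to 500 items.
        $items = Web3Api::collectionActivity($this->collection->address, $from, 500);

        NftActivity::upsert(
            $items,
            ['collection_id', 'token_number', 'tx_hash'], // assumed uniqueness key
            ['type', 'timestamp'],
        );

        // A full page means there may be more activity; fetch the next page.
        if (count($items) === 500) {
            self::dispatch($this->collection);
        }
    }
}
```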
Retrieving the latest activity only for a specific NFT

Since the job that fetches collection activity uses the timestamp of the newest activity item as a cursor, dispatching a job to fetch the activity for a single NFT would break that logic (the timestamp cursor would move) and leave the sorting ("Recently Received") out of date, as only some NFTs would have their activity updated. So I opted not to include that in this PR at all. The only way to update the activity for a specific NFT is to update the activity for the entire collection.
Disabling activity indexing
There are times when we just want to disable activity indexing altogether, as it takes a long time. This can be for the demo, since we already have a good data set (10M records) to test everything with, or while importing production data. In that case, you can set the `ACTIVITIES_ENABLED=false` environment variable.
Ignoring specific collections from having their activity indexed

If you want to exclude only some collections from having their activity indexed, add the address of the collection to the `activity_blacklist` array in the `config/dashbrd.php` config file.
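For example (the address below is a placeholder):

```php
// config/dashbrd.php
'activity_blacklist' => [
    // Collection (contract) addresses whose activity should not be indexed.
    '0x0000000000000000000000000000000000000000', // placeholder
],
```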
Setup

Our drive contains 2 files, `demo.sql.gz` and `prod.sql.gz`, which hold the activities data compressed into archives. Workflow for this PR:

1. Set the `ACTIVITIES_ENABLED=false` environment variable so the activities indexer doesn't trigger.
2. Truncate the `nft_activity` table. Make sure to restart the identity so the primary key sequence (ID) restarts; TablePlus has this option when truncating, which is how I did it (see the snippet after this list).
3. Copy the dump to the `/tmp` directory: `scp /path/to/demo.sql.gz forge@ip:/tmp`
4. Switch to the `postgres` user and unzip the archive: `sudo su postgres && cd /tmp && gzip -k -d demo.sql.gz`
5. Import the dump: `psql -U postgres -d database_name < demo.sql` (replace `database_name` with `forge` or whatever the database name is).
6. Remove the `demo.sql` and `demo.sql.gz` files.
7. Set `ACTIVITIES_ENABLED=true` and restart Horizon with `php artisan horizon:terminate`. Ideally, we'd leave this disabled for the demo, as there's no need to keep indexing activities and the SQL dump doesn't contain all activities.
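If you prefer doing step 2 without TablePlus, the same truncate-plus-identity-restart can be run from `php artisan tinker` (Postgres syntax):

```php
use Illuminate\Support\Facades\DB;

// Clears nft_activity and resets its primary key sequence in one statement.
DB::statement('TRUNCATE TABLE nft_activity RESTART IDENTITY;');
```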
If importing the dump into the demo fails, let me know. This will only happen if some of the records from the `collections` table have been removed, since we now match the `collection_id` from the dump to the `id` on the `collections` table. Our drive also contains the CSVs with collections for both demo and prod.

Testing
For testing the PR locally, you don't need anything special. Set `ACTIVITIES_ENABLED=true` and run `php artisan collections:fetch-activity --collection-id=1` (or whatever ID you want). Make sure to pass the ID, or it will run for every collection and take a while.

Checklist