Forum Loader Optimization #45

Open
ajskateboarder opened this issue Aug 2, 2023 · 21 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@ajskateboarder
Contributor

ajskateboarder commented Aug 2, 2023

It takes about 30 seconds to load one page of a topic.

Maybe we should move from file-based archiving to something DB-based (SQLite or Supabase, depending on how my current PR goes).
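A minimal sketch of the DB-backed option, assuming SQLite through Python's built-in sqlite3 module (Supabase would need its own client). The table name, schema, and helper names (open_cache, save_response, load_response) are hypothetical. One row per request URL replaces one file per API response.

```python
import json
import sqlite3
from typing import Optional


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    # One table keyed by request URL; this schema is a hypothetical example
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses (url TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def save_response(conn: sqlite3.Connection, url: str, data: dict) -> None:
    # INSERT OR REPLACE keeps a single row per URL instead of piling up files
    with conn:  # commits the transaction if the block succeeds
        conn.execute(
            "INSERT OR REPLACE INTO responses (url, body) VALUES (?, ?)",
            (url, json.dumps(data)),
        )


def load_response(conn: sqlite3.Connection, url: str) -> Optional[dict]:
    # Returns None when nothing has been archived for this URL yet
    row = conn.execute("SELECT body FROM responses WHERE url = ?", (url,)).fetchone()
    return json.loads(row[0]) if row else None
```

Passing a file path such as "cache.db" instead of ":memory:" would keep the archive across restarts.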

@redstone-dev
Collaborator

I'd just like to say that a lot of that slowdown only appeared after I moved profile picture (pfp) loading from the client to the server, for some reason :P

I'll make a commit (basically) reverting that.

@NotFenixio
Contributor

It's probably because the posts are loaded synchronously, so you have to wait about 1-3 seconds for each post to load, and there are about 10 posts per topic...

We need to make it asynchronous.

(Note: I haven't worked with anything asynchronous in my life, but it is theoretically "faster".)

@redstone-dev
Collaborator

> We need to make it asynchronous.

I totally forgot asyncio existed. 🤦🏻
I don't know much about it, so someone else can do it lmao

@NotFenixio
Contributor

> I don't know much about it so someone else can do it lmao

Who says a human has to do it? :trollface:

Jokes aside, I think making something asynchronous means we need to make the whole project asynchronous.

@redstone-dev
Collaborator

That's actually not true. If I remember correctly, the way asyncio works, you can have some parts of your project async and the rest synchronous.
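A small sketch of that, assuming the async work is isolated in a helper: the hypothetical fetch_topic_title coroutine is async, while render_topic stays an ordinary function (as a Flask view would be) and drives the coroutine with asyncio.run(). asyncio.sleep stands in for a real network call.

```python
import asyncio


async def fetch_topic_title(topic_id: int) -> str:
    # Hypothetical async helper; asyncio.sleep stands in for a network call
    await asyncio.sleep(0.01)
    return f"Topic {topic_id}"


def render_topic(topic_id: int) -> str:
    # Plain synchronous code (e.g. a Flask view) stays synchronous:
    # asyncio.run() starts an event loop, runs the coroutine, then closes the loop
    title = asyncio.run(fetch_topic_title(topic_id))
    return f"<h1>{title}</h1>"
```

One caveat: asyncio.run() raises RuntimeError if an event loop is already running in the current thread, so this pattern fits plain synchronous views but not code that is already async.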

@ajskateboarder
Contributor Author

ajskateboarder commented Aug 5, 2023

> It's probably because the posts are loaded synchronously, so you have to wait about 1-3 seconds for a post to load, and keeping in mind that there are about 10 posts per topic...
>
> We need to make it asynchronous.

I think the issue comes from having to write every single API response to a cache directory (just in case ScratchDB goes down), even if the same response already exists. Asynchronous code doesn't make I/O work faster.

I think we could move away from ScratchDB to Scratch's RSS data (the button that looks like a signal). It's more reliable and has newer data. We should use lru_cache for it, though.

@redstone-dev
Collaborator

redstone-dev commented Aug 5, 2023

> I think the issue comes from having to write every single API response to a cache directory

I'm going to test this, actually, by removing the @archive_result() decorator from the get_topic_posts function.

@NotFenixio
Contributor

NotFenixio commented Aug 6, 2023

> I think we could move away from ScratchDB to Scratch's RSS data (the button that looks like a signal). It's more reliable and has newer data. We should use lru_cache for it though

Yep, it's probably more reliable. Should I start making a parser, or should we wait until most of the site works?
nvm, let's get the site working first.

> Asynchronous code doesn't make I/O work faster.

The current code loads posts sequentially: the server has to wait for the previous post to finish loading before it starts loading the next one.
We could use asynchronous code to load multiple posts at once.
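A sketch of the difference, with asyncio.sleep(0.1) simulating the network wait (load_post and the 0.1 s delay are assumptions). The sequential version awaits each post before starting the next; asyncio.gather starts them all at once. No single request gets faster, but the waits overlap, so ten posts should take roughly as long as one instead of ten times as long.

```python
import asyncio
import time


async def load_post(post_id: int) -> dict:
    # Hypothetical stand-in: asyncio.sleep simulates waiting on the network
    await asyncio.sleep(0.1)
    return {"id": post_id}


async def load_sequential(ids):
    # Each await finishes before the next request starts
    return [await load_post(i) for i in ids]


async def load_concurrent(ids):
    # gather() runs all requests concurrently; results keep the order of ids
    return await asyncio.gather(*(load_post(i) for i in ids))


def timed(coro):
    # Runs a coroutine to completion and returns (result, elapsed seconds)
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    ids = list(range(10))
    _, seq = timed(load_sequential(ids))
    _, con = timed(load_concurrent(ids))
    print(f"sequential: {seq:.2f}s, concurrent: {con:.2f}s")
```

With real HTTP calls, load_post would need an async client (for example aiohttp or httpx.AsyncClient); a blocking requests.get() inside a coroutine would still run one request at a time.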

@redstone-dev
Collaborator

redstone-dev commented Aug 15, 2023

I found that if you run flask run --with-threads, Snazzle runs a lot faster. We still do need to add asynchronous post loading, though.

EDIT:

> I think we could move away from ScratchDB to Scratch's RSS data (the button that looks like a signal). It's more reliable and has newer data. We should use lru_cache for it though

I will try working on this.

@redstone-dev redstone-dev changed the title from "Can we talk about the forum browser performance" to "Forum Browser Optimization" Aug 15, 2023
@redstone-dev
Collaborator

We could also try using Cython, which compiles Python to C, and C is much faster.

@ajskateboarder
Contributor Author

> We could also try using Cython, which will compile Python to C which is much faster

What exactly would we use Cython for?

@redstone-dev
Collaborator

redstone-dev commented Aug 15, 2023

>> We could also try using Cython, which will compile Python to C which is much faster
>
> What exactly would we use Cython for?

After thinking about it, I think we'd need to convert all of Flask to use Cython, so it's probably better to optimize our existing code.

My initial thought was that our code would be converted to C and compiled, so it would be faster. Correct me if I'm wrong, but I think this would make Snazzle harder to develop for: to actually run our code as C, we'd have to opt in explicitly using special syntax, and most people who would contribute to Snazzle probably don't know that syntax.

Also, I initially confused the capabilities of Cython with those of PyPy.

Finally, we could also add mypy for type checking, which would make our code more type-safe.

@ajskateboarder
Contributor Author

> My initial thought was that our code would be converted to C and compiled so it would be faster.

Cython does not make code faster in all cases. It's typically used for heavy math/statistics computing (such as NumPy and pandas).

> Correct me if I'm wrong, but I think this would be harder to develop for, because in order to make our code run in C, we have to do that explicitly and that requires special syntax, and most people that would contribute to Snazzle probably don't know this special syntax, therefore making it harder to develop for.

That, and you would also need to install a C compiler, which would be Visual Studio on Windows 😭

@redstone-dev
Collaborator

redstone-dev commented Aug 29, 2023

I'm going to use multiprocessing for this purpose.
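A sketch of one way to do this with the multiprocessing package, assuming the per-post work is a blocking HTTP call (load_post_blocking is a hypothetical stand-in using time.sleep). Because the work is mostly waiting on I/O, this uses multiprocessing.pool.ThreadPool, which offers the same map() API as multiprocessing.Pool but runs threads, so nothing has to be pickled between processes.

```python
import time
from multiprocessing.pool import ThreadPool


def load_post_blocking(post_id: int) -> dict:
    # Hypothetical stand-in for a blocking request such as requests.get(...)
    time.sleep(0.1)
    return {"id": post_id}


def load_posts(post_ids, workers: int = 10) -> list:
    # map() blocks until every post is loaded and keeps the input order;
    # up to `workers` requests wait on the network at the same time
    with ThreadPool(workers) as pool:
        return pool.map(load_post_blocking, post_ids)
```

Swapping in multiprocessing.Pool would also work, but process startup and pickling add overhead that buys nothing when the bottleneck is network latency.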

@ajskateboarder
Contributor Author

> I think we could move away from ScratchDB to Scratch's RSS data (the button that looks like a signal). It's more reliable and has newer data.

RSS only contains the most recent posts, so unfortunately we can't show all posts from it. I don't know what else to use if we want this to be reliable.

@redstone-dev
Collaborator

redstone-dev commented Aug 29, 2023

>> I think we could move away from ScratchDB to Scratch's RSS data (the button that looks like a signal). It's more reliable and has newer data.
>
> RSS only contains the most recent posts, so we can't show all posts from it unfortunately. I don't know what else to use if we want this to be reliable

We could get data from ScratchDB and then use RSS to top it up with posts that ScratchDB hasn't indexed yet. If there's a ScratchDB outage, we'll display an alert telling the user that older posts won't be visible until ScratchDB comes back online.
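A sketch of that merge, assuming both sources can be turned into the same Post records and that post IDs increase over time (so sorting by ID gives thread order). The Post class, merge_posts, and the convention of passing None when ScratchDB is unreachable are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Post:
    id: int
    author: str
    content: str


def merge_posts(
    scratchdb_posts: Optional[List[Post]], rss_posts: List[Post]
) -> Tuple[List[Post], bool]:
    """Return (posts, show_outage_alert).

    scratchdb_posts is None when ScratchDB could not be reached.
    """
    if scratchdb_posts is None:
        # Outage: only the recent posts from RSS are available
        return sorted(rss_posts, key=lambda p: p.id), True
    merged = {p.id: p for p in scratchdb_posts}
    for p in rss_posts:
        # RSS only fills in posts ScratchDB hasn't indexed yet
        merged.setdefault(p.id, p)
    return sorted(merged.values(), key=lambda p: p.id), False
```

When both sources have the same post, the ScratchDB copy wins; the boolean tells the page whether to show the outage alert.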

@redstone-dev redstone-dev changed the title from "Forum Browser Optimization" to "Optimization" Sep 6, 2023
@redstone-dev redstone-dev changed the title from "Optimization" to "Overall Optimization" Sep 6, 2023
@redstone-dev
Collaborator

This is basically a non-issue with the new Svelte port. However, before we discontinue the legacy codebase, I think it would be worthwhile to refine it a bit.

@dynamixbot
Member

What if all pages are loaded at the same time, but the posts within each page are loaded one by one? That way, once one page has finished loading, every other page has too, so no further processing is needed. With the post count per page set to 20, each page only needs to load 20 posts, side by side with the other pages. So if a thread has 20 pages, it would first load the first post of every page, then the second, then the third, and so on. We could do this by loading posts by their ones digit: start with 1, which loads every post whose number ends in 1, then 2, then 3, and so on until 0 (0 comes last because each page ends with a post number ending in 0). Or we can ditch this and just make loading parallel instead of serial (which is my approach).
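The round-robin order described above can be sketched as a function returning 0-based post indices (interleaved_order and its parameters are hypothetical). It uses each post's position within its page rather than its ones digit: with 10 posts per page the two coincide, but with 20 per page each page contains two posts ending in each digit.

```python
def interleaved_order(total_posts: int, per_page: int = 20) -> list:
    """Return 0-based post indices in round-robin order across pages:
    the 1st post of every page, then the 2nd post of every page, and so on."""
    pages = -(-total_posts // per_page)  # ceiling division
    order = []
    for position in range(per_page):
        for page in range(pages):
            index = page * per_page + position
            if index < total_posts:  # the last page may be only partly filled
                order.append(index)
    return order
```

For example, interleaved_order(7, per_page=3) gives [0, 3, 6, 1, 4, 2, 5]: the first post of each of the three pages, then the second, then the third (the last page only has one post).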

@dynamixbot dynamixbot changed the title from "Overall Optimization" to "Forum Loader Optimization" Apr 6, 2024
@dynamixbot dynamixbot added enhancement New feature or request help wanted Extra attention is needed labels Apr 26, 2024
@dynamixbot
Member

bump

@redstone-dev
Collaborator

With the release of Snazzle Production Server, bjoern should speed up page loading, but the main bottleneck (when ScratchDB still worked) was getting post data from it. It seems we just need to make as few HTTP requests as possible to make Snazzle more performant.

@dynamixbot
Member

> With the release of Snazzle Production Server, bjoern should speed up page loading, but the main bottleneck (when ScratchDB still worked) was getting post data from it. It seems that we just need to make as little HTTP requests as possible to make Snazzle more performant.

It probably is faster (I can't install Snazzle 😭).
Also, btw, what do you think about my forum loader structure idea?

Projects
None yet
Development

No branches or pull requests

4 participants