Replies: 1 comment 1 reply
Thank you for writing this up. Sounds pretty thorough... what would be the process for upgrading the DB server, though? Similar process? It seems a bit harder to test, but I don't know much about this. It also seems like the script may need to be updated for every Ubuntu version upgrade. Want to note here that Jason and I did quite a bit of tweaking to the MySQL config on the db server based on advice from people on Stack Overflow, and this improved performance pretty dramatically. So let's be sure to copy that config in whatever clone/migration we do; it is not a default config!
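If it helps, here is a rough sketch of how the tuned settings could be captured from the running db server and later diffed against a fresh clone to confirm nothing was lost. This assumes the `pymysql` package and a read-only account; the account name is a placeholder, not something that exists today.

```python
# Sketch only: dump the running MySQL global variables so the tuned values
# can be diffed against a fresh clone's defaults. Hostname is the real db
# server; the read-only account is a placeholder.
import os
import pymysql

def dump_globals(host, user, password, outfile):
    conn = pymysql.connect(host=host, user=user, password=password)
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW GLOBAL VARIABLES")
            rows = sorted(cur.fetchall())
    finally:
        conn.close()
    with open(outfile, "w") as f:
        for name, value in rows:
            f.write(f"{name} = {value}\n")

if __name__ == "__main__":
    # Run against the current db server and again against a clone,
    # then `diff` the two output files to confirm the tuning carried over.
    dump_globals(
        "db-2020-10.mushroomobserver.org",
        "readonly",                      # placeholder account
        os.environ["MYSQL_PASSWORD"],
        "db-2020-10-globals.txt",
    )
```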
There was a recent security issue with the version of the operating system we use for MO. We were able to address it with a fairly simple upgrade, but it raised some questions in my mind about how to better track such things and try them out in a safe environment before committing to them. At the moment we have the following servers at Digital Ocean:
mushroomobserver.org - Main production server
db-2020-10.mushroomobserver.org - Our database server
test.mushroomobserver.org - A recently created server for testing some of the changes to the Create Observation workflow
Small Updates
The approach I took wasn't really well thought out and took some unnecessary risks. Specifically, I simply ran an automated update on the production server and then rebooted without doing any testing. Only after that did I update the test server. Fortunately it worked this time, but in retrospect I really should have started with the test server and verified that things were working before touching production. Luckily we didn't have to change anything on the database server in this case; if we had, we would have run the same risk with no separate test database server to try it on first. We could clone that server and try an upgrade, but I don't know how much that would cost (it's our most expensive server) or exactly how we'd test it once the clone was made.
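To make the "test server first, production second" ordering harder to skip, the small updates could be wrapped in something like the sketch below. It assumes passwordless SSH and sudo on both servers, and that a 200 from the front page is an adequate smoke test; that last assumption is probably too weak and would need fleshing out.

```python
# Sketch of a test-first update run: upgrade the test server, smoke-test it,
# and only then touch production. Hostnames are the real ones; everything
# else is illustrative.
import subprocess
import urllib.request

def apt_upgrade(host):
    subprocess.run(
        ["ssh", host, "sudo apt-get update && sudo apt-get -y upgrade"],
        check=True,
    )

def healthy(url):
    try:
        return urllib.request.urlopen(url, timeout=30).status == 200
    except OSError:
        return False

if __name__ == "__main__":
    apt_upgrade("test.mushroomobserver.org")
    if not healthy("https://test.mushroomobserver.org/"):
        raise SystemExit("Test server unhealthy after upgrade; not touching production.")
    apt_upgrade("mushroomobserver.org")
```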
Big Updates
The second issue is that the versions of the operating system we have on these systems are getting out of date. The database server is particularly old (2020) and no longer "supported". The production and test servers are a bit more up to date (2022), but should also be upgraded. Our process for big upgrades like this is pretty cumbersome at the moment: essentially we build a new server by hand, work on it until everything runs correctly, and then swap it in for the old one. Build and swap is not an unreasonable approach, but I think a lot could be done to simplify that setup process using scripts and some better automation tools.
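The "build" half of build-and-swap is the obvious place to start scripting. Something along these lines would create the replacement droplet through the DigitalOcean API (v2); the region, size, and image slugs and the tag are placeholders, not our actual values, and it needs the `requests` package plus an API token.

```python
# Illustrative sketch of provisioning a replacement server via the
# DigitalOcean API so build-and-swap can be scripted end to end.
import os
import requests

API = "https://api.digitalocean.com/v2"
HEADERS = {"Authorization": f"Bearer {os.environ['DO_TOKEN']}"}

def create_droplet(name):
    resp = requests.post(
        f"{API}/droplets",
        headers=HEADERS,
        json={
            "name": name,
            "region": "nyc1",             # placeholder region
            "size": "s-2vcpu-4gb",        # placeholder size slug
            "image": "ubuntu-24-04-x64",  # target OS release
            "tags": ["mo-candidate"],
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["droplet"]["id"]
```

Once the droplet exists, a setup script kept in the repo (or a configuration tool) would install the app, and DNS or a floating IP would only be swapped after the new server passes the same checks we run against test.mushroomobserver.org.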
Scalability
The other big concern I have is how scalable our current production server is. Specifically, what happens if 10-20 folks are simultaneously uploading observations to the system? To assess this concern I spent some time analyzing past behavior on the site. From what I can tell, observation uploads take 6-7 seconds on average but can sometimes take around 30 seconds (based on a day and a half's worth of data). Over that time 113 observations were created.
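To turn the "10-20 simultaneous uploads" question into a measurement, we could point a small concurrency harness like the one below at the test server. A real test would need authenticated sessions and multipart image posts; the URL and worker count here are just placeholders to show the shape of the harness.

```python
# Bare-bones concurrency harness: fire N simultaneous requests at the test
# server and report per-request latency. Not a substitute for a real upload
# test, just the skeleton of one.
import time
from concurrent.futures import ThreadPoolExecutor
import requests

def timed_request(url):
    start = time.monotonic()
    resp = requests.get(url, timeout=60)
    return resp.status_code, time.monotonic() - start

if __name__ == "__main__":
    url = "https://test.mushroomobserver.org/"  # placeholder endpoint
    with ThreadPoolExecutor(max_workers=20) as pool:
        results = list(pool.map(timed_request, [url] * 20))
    for status, seconds in results:
        print(f"{status}  {seconds:.1f}s")
```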
Looking at it from another angle, our maximum number of observations in a day over the last year is 238. I would expect an event like NEMF or NAMA could increase that by 4x or more, and the uploads would often be concentrated at specific times of day (when folks return from forays). The maximum we've ever had uploaded in a day was 959 on July 17, 2012, which looks like a day that Jason decided to create a very large number of lichen observations (896). The next highest (865) was on 2014-09-03, when Christian created 794 from a trip to Alaska.
Assuming that Jason was pushing the system in some automated way, it looks like he was averaging 15s per upload based on the time between successive observations (the median was 6s).
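For reference, the inter-observation gaps above could be recomputed straight from the database along these lines. The table and column names (`observations`, `user_id`, `created_at`), the database name, and the user id are my guesses/placeholders, and it assumes the `pymysql` package.

```python
# Sketch: compute gaps between consecutive observation timestamps for one
# user on one day, then report mean and median.
import os
import statistics
import pymysql

def upload_gaps(conn, user_id, day):
    with conn.cursor() as cur:
        cur.execute(
            "SELECT created_at FROM observations "
            "WHERE user_id = %s AND DATE(created_at) = %s "
            "ORDER BY created_at",
            (user_id, day),
        )
        times = [row[0] for row in cur.fetchall()]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]

if __name__ == "__main__":
    conn = pymysql.connect(host="db-2020-10.mushroomobserver.org",
                           user="readonly",            # placeholder account
                           password=os.environ["MYSQL_PASSWORD"],
                           database="mo")               # placeholder db name
    gaps = upload_gaps(conn, user_id=1, day="2012-07-17")  # user id illustrative
    print(f"mean {statistics.mean(gaps):.1f}s, median {statistics.median(gaps):.1f}s")
```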
To address these performance concerns I think we'll want to enable running with multiple app servers. We could consider focusing on threading instead, but I expect that would require a lot more development and would still have a ceiling based on how many cores the server has (our current server has 2 CPUs). Running multiple servers would be similar to what we're currently doing with the test and production servers, except we'd put them behind a load balancer that distributes requests based on load. In theory this approach would allow us to spin servers up and down as needed. It would also give us an easy way to provide a real maintenance page when we need to bring the system down during a code deployment or an operating system update.
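If we went the DigitalOcean load balancer route, the spin-up/spin-down piece could look roughly like this: attach or detach app-server droplets from the balancer through the API. The load balancer id and droplet ids are placeholders; this is just a sketch of the idea, not a worked-out deployment flow.

```python
# Sketch: add or remove app-server droplets behind a DigitalOcean load
# balancer. Requires the `requests` package and a DO API token.
import os
import requests

API = "https://api.digitalocean.com/v2"
HEADERS = {"Authorization": f"Bearer {os.environ['DO_TOKEN']}"}

def set_membership(lb_id, droplet_ids, attach=True):
    method = requests.post if attach else requests.delete
    resp = method(
        f"{API}/load_balancers/{lb_id}/droplets",
        headers=HEADERS,
        json={"droplet_ids": droplet_ids},
        timeout=30,
    )
    resp.raise_for_status()
```

During a deploy or OS upgrade we could detach one server, upgrade it, and reattach it once it passes health checks; for a full outage we could detach everything and point the balancer at a small droplet that serves only a maintenance page.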
Suggested tasks