Allow Dataverse to work with secondary reads from a scaled database #4765
Comments
@thaorell, @pameyer, and I have been discussing this issue at http://irclog.iq.harvard.edu/dataverse/2018-07-17 and heard that @tkmonson is working on it, so I dragged it to the new "Community Dev" column at https://waffle.io/IQSS/dataverse. @thaorell emailed me some slides from a recent talk he and @tkmonson gave and said it's ok to upload them to this issue: Dataverse Lunch Talk.pdf. Here are the most relevant slides for this issue (looks like great stuff!): |
I got on 48c59f5 on my laptop but to test I was hoping to use the new The error that @tkmonson reported in Slack is a "broken pipe" exception in Glassfish's server.log. @tkmonson, at some point we need to bring your code and @thaorell's code together. I will very likely ask you to resolve merge conflicts soon, so if you want you can start merging your code into his, since his is the bigger change (#4805, above) and yours is smaller (#4827). |
@tkmonson hi! I'm catching up after a week-long vacation. I see you made pull request #4916 changing |
@pdurbin Yes, it's ready for code review at this point. I just forgot to change the title of the PR after finishing the changes yesterday. |
@tkmonson One suggestion here. It doesn't reduce the work in this PR, but it could help with future changes: rather than embedding the Persistence Context in each EJB, add a new EJB called something like EntityManagerBean and have that embed the Persistence Context. Have all other EJBs embed this new EJB, and put all persistence-unit-related methods in it. The new EJB could have methods like getEntityManager() that always returns the main entity manager, and you can then move this method from DatasetServiceBean onto the new EJB: That way, if other EJBs need this logic, we don't need to duplicate it, and we can add more complex EntityManager logic as needed. @michbarsinai thoughts on this? |
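For readers following along, here is a minimal sketch of the shape this suggestion might take. It is plain Java with no JPA dependency so that it compiles on its own; the names `EntityManagerProvider` and `getEntityManager(boolean)` are hypothetical, and the type parameter `EM` stands in for `javax.persistence.EntityManager`. In real code the two fields would presumably be injected persistence contexts inside the new EJB, not constructor arguments.

```java
import java.util.Objects;

// Hypothetical stand-in for the proposed EntityManagerBean.
// EM is a placeholder for javax.persistence.EntityManager.
final class EntityManagerProvider<EM> {
    private final EM primary;   // read/write, backed by the master database
    private final EM secondary; // read-only, backed by a replica (assumption)

    EntityManagerProvider(EM primary, EM secondary) {
        this.primary = Objects.requireNonNull(primary);
        this.secondary = Objects.requireNonNull(secondary);
    }

    // Default accessor: always returns the main (primary) entity manager.
    EM getEntityManager() {
        return primary;
    }

    // Callers that can tolerate replica reads opt in explicitly.
    EM getEntityManager(boolean allowSecondaryRead) {
        return allowSecondaryRead ? secondary : primary;
    }
}

class EntityManagerProviderDemo {
    public static void main(String[] args) {
        // Strings stand in for real entity managers in this sketch.
        EntityManagerProvider<String> provider =
                new EntityManagerProvider<>("primary-em", "secondary-em");
        System.out.println(provider.getEntityManager());
        System.out.println(provider.getEntityManager(true));
    }
}
```

Keeping the read-versus-write choice in one place means other beans would not need to duplicate it, which is the point of the suggestion.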
@scolapasta Way ahead of you! I'm almost finished making those exact changes. I'll try to push them tomorrow. |
In Slack @tkmonson asked us to move this issue to code review so I just did. |
@scolapasta has been tasked with working with the MOC and RH folks to put together a plan containing a few different options for moving Harvard Dataverse to MOC, one of which would leverage this code. As it stands now, we’ll put this PR on ice and not merge it, as it will potentially slow down development (developers will need to determine which entity manager to use - read vs write) and solves a problem that does not yet exist. If we decide to go down a path that will leverage this code, the work done here will prove valuable. I'll move this to the inbox for now. |
This is the PR that was put on ice: @danmcp are you still interested in this issue? If not, let's at least grab a beer sometime. 😄 🍻 |
@pdurbin I don't think any use cases are pushing it at the moment. No concerns if we want to close. Can always reopen later if the use case does pop up again. And it would be great to connect sometime! |
@danmcp exactly! Once I close this (and I will, thanks), there's a reopen button right there! I'll shoot you an email about that beer. 😄 |
With the recent containerization work, the postgres component added the ability to scale:
The problem now is that Dataverse isn't written to handle a scaled database, so in the above implementation Dataverse is configured to talk only to the master instance of postgres. More work is needed in Dataverse to take advantage of the master/slave setup and get any performance benefit, namely handling primary vs. secondary reads. The implications could be widespread, so using two datasources as a first step might be appropriate, and switching DB calls over one by one to allow secondary reads will probably be the best approach.
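One way the one-by-one switch-over could be tracked is sketched below. This is an illustration under assumptions, not Dataverse code: `DataSourceRouter`, `allowSecondaryRead`, `route`, and the call name `listDatasets` are all hypothetical. Every call goes to the primary datasource until it is explicitly opted in, and writes always go to the primary.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical router for a two-datasource setup.
final class DataSourceRouter {
    enum Target { PRIMARY, SECONDARY }

    // Names of read calls that have been reviewed and opted in to replica reads.
    private final Set<String> secondaryReadCalls = ConcurrentHashMap.newKeySet();

    void allowSecondaryRead(String callName) {
        secondaryReadCalls.add(callName);
    }

    Target route(String callName, boolean isWrite) {
        if (isWrite) {
            return Target.PRIMARY; // writes never go to a replica
        }
        return secondaryReadCalls.contains(callName) ? Target.SECONDARY : Target.PRIMARY;
    }
}

class DataSourceRouterDemo {
    public static void main(String[] args) {
        DataSourceRouter router = new DataSourceRouter();
        System.out.println(router.route("listDatasets", false)); // not opted in yet
        router.allowSecondaryRead("listDatasets");
        System.out.println(router.route("listDatasets", false)); // now opted in
        System.out.println(router.route("listDatasets", true));  // write path
    }
}
```

Defaulting to the primary keeps behavior unchanged for any call that has not been reviewed, so the migration can proceed incrementally.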