MOC collaboration phase 1: Create a Swift/OpenStack storage driver #2909
Launched a brand-new CentOS node on the MOC, at 129.10.3.145.
@pameyer from @sbgrid expressed interest in object storage (S3) at http://irclog.iq.harvard.edu/dataverse/2016-02-25#i_31584 . e672380 is a good commit to look at (heads up @bmckinney). Also, this issue is related to the more general "object storage" issue at #1347.
Scholars Portal in Ontario is both a Dataverse and OpenStack Swift user -- we have a 350TB Swift object store in production. Would be happy to help with testing as this develops.
I was wondering which branch I could use to get the version of DV that supports Swift. I would be interested in trying it with our local cluster. Alan

On Jul 29, 2016, at 2:38 PM, Danny Brooke <notifications@github.com> wrote: Hey @landreev, @anuj-rt - is this moving forward with the MOC work? If it's done or being tracked elsewhere can we close it? Thanks!
@anuj-rt ok, sorry. I guess I got excited by stuff like this.getDataFile().setStorageIdentifier(swiftFileObject.getPublicURL()). Carry on!
@pdurbin Yes, that line pulls the URL of the file stored on Swift, but the authentication method is different. Actually, @landreev does have a separate swift branch which works for Swift users, and if I understand the above scenario correctly, that branch will work for its purpose. Even this branch can be made to work with both keystone and swift users; some changes in the code base will be required.
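For concreteness, here is a minimal, hypothetical sketch (not code from either branch) of how a JOSS client could be configured for Keystone vs. plain Swift ("basic") authentication; the credentials, URLs, and class name are placeholders:

```java
import org.javaswift.joss.client.factory.AccountFactory;
import org.javaswift.joss.client.factory.AuthenticationMethod;
import org.javaswift.joss.model.Account;

public class SwiftAuthSketch {
    // Authenticate against a Swift endpoint either through Keystone or
    // through plain Swift ("basic") credentials.
    static Account authenticate(boolean useKeystone) {
        AccountFactory factory = new AccountFactory()
                .setUsername("tenant:user")                          // placeholder credentials
                .setPassword("secret")
                .setAuthUrl("https://swift.example.org/auth/v1.0");  // placeholder endpoint
        if (useKeystone) {
            factory.setAuthenticationMethod(AuthenticationMethod.KEYSTONE)
                   .setTenantName("demo-tenant");                    // Keystone needs a tenant/project
        } else {
            factory.setAuthenticationMethod(AuthenticationMethod.BASIC);
        }
        return factory.createAccount(); // performs the authentication handshake
    }
}
```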
That is interesting. What version is that branch at? We are running 4.5.1 now and would love to have the option of having data files stored on our OpenStack Swift storage cluster — for replication, to eliminate file storage size limits, etc… Alan

On Oct 24, 2016, at 4:26 PM, Anuj Thakur <notifications@github.com> wrote: @pdurbin Actually, @landreev does have a separate swift branch which works for Swift users and if I understand the above scenario correctly that branch will work for its purpose. Even this branch can be made to work with both keystone and swift users, some changes in the code base will be required.
You are not alone, Phil. It would be great if there were a generic implementation of storage services in DV that could be instantiated with code for third party providers — block storage, object storage, maybe even offline storage. getFile() etc...

On Oct 24, 2016, at 4:20 PM, Philip Durbin <notifications@github.com> wrote: @anuj-rt ok, sorry. I guess I got excited by stuff like this.getDataFile().setStorageIdentifier(swiftFileObject.getPublicURL()). Carry on!
@spalan for more on what @anuj-rt is up to, I highly recommend his series of blog posts:
I just discovered these myself at https://groups.google.com/d/msg/dataverse-community/tDjFfGQ-f0Q/psIVyNLgAQAJ
A note on performance: First of all, let's be careful not to jump to conclusions about Swift in general - that it is "slow" - based on the fact that it is observably slow in our current test build, used with this particular Swift endpoint (http://rdgw.kaizen.massopencloud.org/swift/v1). It is, of course, slow. Do note that this may have nothing to do with the actual Swift technology at all, and be solely the result of the network speed between us and BU. In reality, it's probably a combination of both. The MOC people themselves appear to believe that the node is indeed slow. Just like with everything cloud-based, its speed is likely a function of the actual hardware behind the virtual front. Still, it's fair to assume that, in practice, accessing these buckets over the wire will always be more expensive than getting the bytes off the local disk. We can assume that it should be possible to buy a guaranteed faster server, use an endpoint closer to the server, etc. But that makes it all the more important to test this thing with a slow node - just in order to identify all the bottlenecks. And for that purpose this node is just perfect.

Writing to that remote node over the wire is especially slow. The save operation on the upload page, as currently experienced on dataverse-internal, is very slow, and that's just how it is; what we are seeing is the speed of that remote transfer, and there's no wasted overhead there, as far as I can tell.

The performance of the dataset page, when there are thumbnails, was even worse - but that was actually caused by something wasteful and inefficient. Namely, while we were caching the actual images once read from the files, we were still assuming that checking whether a cached file exists on the filesystem was free, so we kept doing it repeatedly via various rendered="..." logic rules on the page (which PrimeFaces keeps calling repeatedly as it renders the page). With Swift, even these "if (file.exists())" calls are NOT free. Plus, the Swift driver kept repeating an expensive authentication handshake before each check... so that was snowballing out of control. I checked in some improvements for that yesterday. Simple stuff - cache everything, don't re-authenticate unless you have to - and it makes a difference. A dataset with a screen full of images should now be loading in a more reasonable time. Note that it will still be slower than running against a dataset with the same images stored locally. But that's life. Also note that this was solely on account of thumbnails. A page for a dataset without images loads just as fast regardless of where the files are stored (and if thumbnails are a problem, they can actually be disabled).

We may ask the MOC team for credentials for the Swift node that they are planning to use for the actual storage-and-computing project (swift-1.massopencloud.org, I think?) and see if it's any faster. Or see how it performs on their server, which is on a local subnet link to the cloud... But as I said, testing the worst-case scenario may be even more important/useful.
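To illustrate the "cache everything, don't re-authenticate unless you have to" fix described above, here is a minimal sketch of the idea (not the actual SwiftAccessIO code; the class and method names are made up for illustration):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.javaswift.joss.model.Account;
import org.javaswift.joss.model.StoredObject;

public class SwiftExistsCache {
    private final Account account;                 // authenticated once, then reused
    private final Map<String, Boolean> existsCache = new ConcurrentHashMap<>();

    public SwiftExistsCache(Account account) {
        this.account = account;
    }

    public boolean thumbnailExists(String containerName, String objectName) {
        String key = containerName + "/" + objectName;
        // Only go over the wire the first time we are asked about this object;
        // repeated rendered="..." checks on the page hit the cache instead.
        return existsCache.computeIfAbsent(key, k -> {
            StoredObject object = account.getContainer(containerName).getObject(objectName);
            return object.exists();                 // one remote existence check
        });
    }
}
```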
Good info, thanks for the write-up. I'd also like to add that what initially appears to be a performance issue, like the inefficient thumbnail code based on the local-access assumption, may actually be a coding issue, so it's worth taking a look to eliminate that. That said, there is at least one issue that seems to fall into this category: Another, simpler example of the above dataset-save-after-file-upload issue: Also: Performance issues aside, the above two issues are what's left.
#3747 This is a new branch based on 3747-swift-with-derivative-file-support

Conflicts (3747-swift-with-derivative-file-support wins):
doc/sphinx-guides/source/installation/config.rst
src/main/java/edu/harvard/iq/dataverse/dataaccess/DataAccess.java
src/main/java/edu/harvard/iq/dataverse/dataaccess/SwiftAccessIO.java
There's a third issue to look into...
I first noticed this on e2f5dcc on the new 909-3747-swift-compute-button branch but the test also fails as of 7dc197a on the
Regarding the "frozen upload page" bug - the reason I never saw it while I was working on it is that it only happens behind Apache. If you go to dataverse-internal directly, at http://dataverse-internal.iq.harvard.edu:8080, it's working. It's still slow, but working.
I was hoping that this was that "chunking encoding bug" biting us in the ass again... i.e., that we may have forgotten to apply that grizzly patch to Glassfish on that server... but no, the patch is there. |
OK, so this had nothing to do with tabular ingests, or any kind of extra processing, or any database timing conflicts (as we were guessing earlier). It's fixed by increasing the timeout on this line in /etc/httpd/conf.d/dataverse.conf: ProxyPass / ajp://localhost:8009/ timeout=1800 Once again, it's still slow-ish - @kcondon, your test pack still takes 2 min. to save. But it works: the files get saved, the ingests get started and are completed eventually, the thumbnails are generated, etc.
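For reference, the relevant part of the Apache config mentioned above would look something like this (the 1800-second timeout is the value applied on dataverse-internal; adjust for your own proxy setup):

```apache
# /etc/httpd/conf.d/dataverse.conf
# Raise the AJP proxy timeout so long dataset saves / ingests are not cut off by Apache
ProxyPass / ajp://localhost:8009/ timeout=1800
```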
Re: "Download URL results in the filesystem name in Swift rather than the original name, as with local files." The whole point of that "Download URL" on the files page was to have a direct link to the Swift location, for Swift files. So when you click on it, you are downloading from Swift, not from us. Swift does not know anything about our human-readable file name, so that generated name is all you get. At some point Anuj made an attempt to use the "pretty" file names for storage on the Swift side... It was decided to reverse that change, because the pretty file names are editable by the owner, so it would be much harder to guarantee uniqueness on the Swift side. The MOC team agreed that they were OK with those machine-generated file names. And you still have the "Download" button, which will send you to our download API, which will give you the human-readable file name.

(All that said, maybe a better solution would be to provide this direct Swift URL in addition to our API URL on the file page, not instead of it? And maybe it could benefit from an additional label - something like "this file is stored on a Swift endpoint; you can access it directly there at ..." - rather than just showing it as a "Download URL". But since this is a UI issue, I suggest that it be addressed/handled in the UI ("compute button") issue, if you feel it's worth addressing.)
@kcondon: I pushed a commit removing the extra logging/experimental code added to the branch while investigating this. |
@pdurbin the RA test, FileMetadataIT should now be passing. |
Yes! FileMetadataIT seems to pass now. Thanks! As of 423d1da I just ran
This behavior could well be something I introduced while working on #3559 and I'd like to fix this, but I don't think it should block merging pull request #3788 if he deems it ready. Also, I put a bug in the ears of @TaniaSchlatter @mheppler @dlmurphy and @jggautier about the "Download URL" issue you mentioned above having to do with the UI: https://iqss.slack.com/archives/G4D346Y4X/p1493730829970078 . As you say, perhaps this could be addressed in #3747 but it would need to be put on somebody's todo list.
@landreev I've traced the creation of those three "thumb48" files to the
I'm pretty sure these "thumb48" files weren't being created when I added that test in pull request #3703. I'd like to suggest that we get things back to normal now, either in this issue (#2909), or in #3747 (since we need to merge the latest into that "compute ui" branch), or in #2460, which is the issue I'm using to track the fact that I had to disable all sorts of Search API related tests because I was getting
@landreev I just pushed a fix at 8584031 into |
I just noticed that because the |
In 0357449 I resolved merge conflicts in pull request #3788. This would have been a blocker for @kcondon actually merging the code once he was done with QA. I merged the latest from develop into the
Thanks @pdurbin for resolving the merge conflict and making the branch ready for merging. |
Creating this GitHub issue, so that I could make a branch for the code I'm writing. And to serve as a starting point for other developers who may get involved in this project.
Buzzword Glossary:
MOC - Massachusetts Open Cloud project;
OpenStack - open source OS for cloud computing; used by the MOC.
Swift aka OpenStack Object Storage - scalable/redundant storage system used in OpenStack;
JOSS/javaswift - Java client for Swift
Ceph - distributed storage platform; in our specific case, Ceph provides the block storage underneath the Swift interface through which data files are read and written.
The first, demo iteration of this storage driver will use JOSS to read objects from a Swift endpoint. Thus Dataverse will be able to serve files physically stored on/by the MOC, transparently to the user. I.e., the file will be represented by a local DataFile object in the Dataverse database, and have its corresponding byte stream stored in the cloud and referenced by the storage identifier attribute of the DataFile. The code for this driver is based on the examples provided in javaswift/tutorial-joss-quickstart.
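For orientation, this is roughly what reading a stored object via JOSS looks like, in the style of the tutorial-joss-quickstart examples (a hedged sketch, not the actual driver code; credentials, container, and object names are placeholders):

```java
import java.io.InputStream;
import org.javaswift.joss.client.factory.AccountFactory;
import org.javaswift.joss.client.factory.AuthenticationMethod;
import org.javaswift.joss.model.Account;
import org.javaswift.joss.model.Container;
import org.javaswift.joss.model.StoredObject;

public class JossReadExample {
    public static void main(String[] args) throws Exception {
        // Authenticate against the Swift endpoint (placeholder credentials/URL).
        Account account = new AccountFactory()
                .setUsername("tenant:user")
                .setPassword("secret")
                .setAuthUrl("https://swift.example.org/auth/v1.0")
                .setAuthenticationMethod(AuthenticationMethod.BASIC)
                .createAccount();

        // Look up the container and the stored object by its storage identifier.
        Container container = account.getContainer("dataverse-files");   // placeholder container
        StoredObject object = container.getObject("10.5072/FK2/ABC123"); // placeholder identifier

        // The public URL is the kind of value a DataFile's storage identifier could reference.
        System.out.println("Public URL: " + object.getPublicURL());

        // Stream the object's bytes, e.g. to serve a download.
        try (InputStream in = object.downloadObjectAsInputStream()) {
            System.out.println("First byte: " + in.read());
        }
    }
}
```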
Instructions on how to access the MOC Swift "endpoint" and the Ceph storage "tenant" created for this collaboration are provided by Ata Turk from bu.edu here: https://www.dropbox.com/s/p053agfx31oxf6l/Readme.txt?dl=0 (ask me for the password).
The Dataverse instance for this project will be installed on a virtual server on the MOC. We'll have an account there that will allow us to launch a new VM from a CentOS image - should be trivial. (https://github.com/CCI-MOC/moc-public/wiki/Getting-started) I will add more info on that plus the address of the Dataverse node, etc.