
MOC collaboration phase 1: Create a Swift/OpenStack storage driver #2909

Closed
landreev opened this issue Feb 3, 2016 · 28 comments
Assignees
Labels
Type: Feature (a feature request)

Comments

@landreev
Contributor

landreev commented Feb 3, 2016

Creating this GitHub issue so that I can make a branch for the code I'm writing, and so that it can serve as a starting point for other developers who may get involved in this project.

Buzzword Glossary:

MOC - Massachusetts Open Cloud project
OpenStack - open-source "operating system" for cloud computing; used by the MOC
Swift, aka OpenStack Object Storage - the scalable, redundant storage system used in OpenStack
JOSS (javaswift) - a Java client library for Swift
Ceph - a distributed storage platform; in our specific case, the Swift endpoint is backed by Ceph, which handles the actual storage of the data files

The first, demo iteration of this storage driver will use JOSS to read objects from a Swift endpoint. Dataverse will thus be able to serve files physically stored on the MOC, transparently to the user. That is, the file will be represented by a local DataFile object in the Dataverse database, while its corresponding byte stream is stored in the cloud and referenced by the storage identifier attribute of the DataFile. The code for this driver is based on the examples provided in javaswift/tutorial-joss-quickstart.
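For reference, reading an object with JOSS looks roughly like the sketch below, based on the quickstart tutorial mentioned above. The endpoint URL, container name, and credentials here are placeholders, not the actual MOC values:

```java
import java.io.InputStream;

import org.javaswift.joss.client.factory.AccountConfig;
import org.javaswift.joss.client.factory.AccountFactory;
import org.javaswift.joss.model.Account;
import org.javaswift.joss.model.Container;
import org.javaswift.joss.model.StoredObject;

public class SwiftReadDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder credentials and endpoint; the real values come from
        // the MOC readme referenced in this issue.
        AccountConfig config = new AccountConfig();
        config.setUsername("swift-user");
        config.setPassword("swift-password");
        config.setAuthUrl("https://swift.example.org/auth/v1.0");

        Account account = new AccountFactory(config).createAccount();

        // Look up the object by the storage identifier recorded on the DataFile.
        Container container = account.getContainer("dataverse-container");
        StoredObject object = container.getObject("storage-identifier");
        try (InputStream in = object.downloadObjectAsInputStream()) {
            // ... stream the bytes back to the user, transparently.
        }
    }
}
```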

Instructions for accessing the MOC Swift endpoint and the Ceph storage tenant created for this collaboration were provided by Ata Turk from bu.edu here: https://www.dropbox.com/s/p053agfx31oxf6l/Readme.txt?dl=0 (ask me for the password).

The Dataverse instance for this project will be installed on a virtual server on the MOC. We'll have an account there that will allow us to launch a new VM from a CentOS image - should be trivial. (https://github.com/CCI-MOC/moc-public/wiki/Getting-started) I will add more info on that plus the address of the Dataverse node, etc.

@landreev landreev self-assigned this Feb 3, 2016
@landreev
Contributor Author

landreev commented Feb 4, 2016

Launched a brand-new CentOS node on the MOC, at 129.10.3.145.
(ask me for ssh credentials if you need to log in).
To get your own MOC account, fill out the form at https://github.com/CCI-MOC/moc-public/wiki/MOC-Production-(Kaizen)-User-Account-Requests
Once you have received the password for the account:
The "getting-started" instructions at https://github.com/CCI-MOC/moc-public/wiki/Getting-started appear to be missing a couple of important steps (as of today), but video tutorials are provided where everything is explained:

  1. how to configure and launch a VM:
    skip to 3:15!
    https://www.youtube.com/watch?v=9_PbcPV_jEU
  2. once the VM is running, how to ssh to it from the outside:
    https://www.youtube.com/watch?v=ZjdrVHPjltI

@pdurbin
Member

pdurbin commented Feb 25, 2016

@pameyer from @sbgrid expressed interest in object storage (S3) at http://irclog.iq.harvard.edu/dataverse/2016-02-25#i_31584 . e672380 is a good commit to look at (heads up @bmckinney ). Also, this issue is related to the more general "object storage" issue at #1347.

@spalan

spalan commented Feb 25, 2016

Scholars Portal in Ontario is both a Dataverse and OpenStack Swift user -- we have a 350TB Swift object store in production. Would be happy to help with testing as this develops.

@djbrooke
Contributor

Hey @landreev, @anuj-rt - is this moving forward with the MOC work? If it's done or being tracked elsewhere can we close it? Thanks!

@spalan

spalan commented Jul 29, 2016

I was wondering which branch I could use to get the version of DV that supports Swift. I would be interested in trying it with our local cluster.

Alan


@pdurbin
Member

pdurbin commented Oct 22, 2016

@spalan pull request #3239 contains some Swift code by @anuj-rt but I'm not sure exactly what it does.

@anuj-rt
Contributor

anuj-rt commented Oct 24, 2016

@spalan @pdurbin @landreev pull request #3239 contains Swift code with Keystone authentication. Instead of getting a token from the Swift service endpoint, we get a token from the Keystone identity service. This was needed to move forward with using the MOC's OpenStack production environment.
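As a rough sketch of that change in JOSS terms (the URLs, credentials, and tenant name below are placeholders, not the MOC's actual values):

```java
import org.javaswift.joss.client.factory.AccountConfig;
import org.javaswift.joss.client.factory.AccountFactory;
import org.javaswift.joss.client.factory.AuthenticationMethod;
import org.javaswift.joss.model.Account;

public class KeystoneAuthDemo {
    public static void main(String[] args) {
        AccountConfig config = new AccountConfig();
        config.setUsername("swift-user");          // placeholder credentials
        config.setPassword("swift-password");
        config.setTenantName("dataverse-tenant");  // placeholder tenant
        // Ask the Keystone identity service for the token, instead of
        // authenticating directly against the Swift service endpoint:
        config.setAuthenticationMethod(AuthenticationMethod.KEYSTONE);
        config.setAuthUrl("https://keystone.example.org:5000/v2.0/tokens");
        Account account = new AccountFactory(config).createAccount();
    }
}
```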

@pdurbin
Member

pdurbin commented Oct 24, 2016

@anuj-rt ok, sorry. I guess I got excited by stuff like this.getDataFile().setStorageIdentifier(swiftFileObject.getPublicURL()). Carry on!

@anuj-rt
Contributor

anuj-rt commented Oct 24, 2016

@pdurbin Yes, that line pulls the URL of the file stored on Swift, but the authentication method is different.

Actually, @landreev does have a separate swift branch that works for Swift-authenticated users, and if I understand the above scenario correctly, that branch will work for its purpose. This branch could also be made to work with both Keystone and Swift users, but some changes in the code base would be required.

@spalan

spalan commented Oct 25, 2016

That is interesting. What version is that branch at? We are running 4.5.1 now and would love to have the option of having data files stored on our OpenStack Swift storage cluster — for replication, to eliminate file storage size limits, etc…

Alan


@spalan

spalan commented Oct 25, 2016

You are not alone, Phil. It would be great if there were a generic implementation of storage services in DV that could be instantiated with code for third party providers — block storage, object storage, maybe even offline storage.

getFile()
putFile()
deleteFile()
replaceFile()

etc...
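To illustrate the idea (with hypothetical names only; Dataverse's actual abstraction lives in its dataaccess package, e.g. DataAccess and SwiftAccessIO), such a generic contract might look like the sketch below, with a trivial in-memory driver standing in for a real backend:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class StorageDemo {

    /** Generic storage contract that a Swift, S3, or local-disk driver could implement. */
    public interface StorageDriver {
        InputStream getFile(String storageId) throws IOException;
        void putFile(String storageId, byte[] content);
        void deleteFile(String storageId);
        boolean exists(String storageId);
    }

    /** Toy in-memory driver, standing in for a real third-party backend. */
    public static class InMemoryDriver implements StorageDriver {
        private final Map<String, byte[]> store = new HashMap<>();

        public InputStream getFile(String storageId) throws IOException {
            byte[] bytes = store.get(storageId);
            if (bytes == null) {
                throw new IOException("no such object: " + storageId);
            }
            return new ByteArrayInputStream(bytes);
        }

        public void putFile(String storageId, byte[] content) { store.put(storageId, content); }
        public void deleteFile(String storageId) { store.remove(storageId); }
        public boolean exists(String storageId) { return store.containsKey(storageId); }
    }

    public static void main(String[] args) {
        StorageDriver driver = new InMemoryDriver();
        driver.putFile("10.5072/FK2/EXAMPLE", "bytes".getBytes(StandardCharsets.UTF_8));
        System.out.println(driver.exists("10.5072/FK2/EXAMPLE")); // prints true
        driver.deleteFile("10.5072/FK2/EXAMPLE");
        System.out.println(driver.exists("10.5072/FK2/EXAMPLE")); // prints false
    }
}
```

The point of the interface is that page code and commands would only ever see StorageDriver, so swapping block, object, or offline storage behind it would not touch callers.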


@pdurbin
Member

pdurbin commented Oct 26, 2016

@landreev
Contributor Author

A note on performance:

First of all, let's be careful not to jump to conclusions about Swift in general - that it is "slow" - based on the fact that it is observably slow in our current test build, used with this particular Swift endpoint (http://rdgw.kaizen.massopencloud.org/swift/v1). It is of course slow. But note that this may have nothing to do with the actual Swift technology at all, and may be solely the result of the network speed between us and BU. In reality, it's probably a combination of both. The MOC people themselves appear to believe that the node is indeed slow. As with everything cloud-based, its speed is likely a function of the actual hardware behind the virtual front.

Still, it should be fair to assume that, in practice, accessing these buckets over the wire will always be more expensive than getting the bytes off the local disk. We can assume that it would be possible to buy a guaranteed-faster server, or use an endpoint closer to the server, etc. But that makes it all the more important to test this thing with a slow node - precisely in order to identify all the bottlenecks. And for that purpose this node is just perfect.

Writing to that remote node over the wire is especially slow. The save operation on the upload page, as currently experienced on dataverse-internal, is very slow, and that's just how it is: what we are seeing is the speed of that remote transfer. There's no wasted overhead there, as far as I can tell.

The performance of the dataset page, when there are thumbnails, was even worse - but that was actually caused by something wasteful and inefficient. Namely, while we were caching the actual images once read from the files, we were still assuming that checking whether a cached file exists on the filesystem was free, so we kept doing it repeatedly via various rendered="..." logic rules on the page (which PrimeFaces keeps evaluating repeatedly as it renders the page). With Swift, even these "if (file.exists())" calls are NOT free. On top of that, the Swift driver kept repeating an expensive authentication handshake before each check, so the cost was snowballing out of control. I checked in some improvements for that yesterday. Simple stuff - cache everything, don't re-authenticate unless you have to - and it makes a difference. A dataset with a screen full of images should now load in a more reasonable time. Note that it will still be slower than running against a dataset with the same images stored locally. But that's life.
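The fix described above boils down to memoizing the expensive checks. A minimal sketch of the idea (class and method names are illustrative, not Dataverse's actual code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/**
 * Memoizes an expensive "does this cached file exist?" check, so that
 * repeated rendered="..." evaluations during a single page render hit
 * the slow, re-authenticating backend only once per file.
 */
public class ExistenceCache {
    private final Map<String, Boolean> cache = new ConcurrentHashMap<>();
    private final Predicate<String> expensiveCheck; // e.g. a Swift lookup

    public ExistenceCache(Predicate<String> expensiveCheck) {
        this.expensiveCheck = expensiveCheck;
    }

    public boolean exists(String storageId) {
        return cache.computeIfAbsent(storageId, expensiveCheck::test);
    }
}
```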

Also note that this was solely on account of thumbnails. A page for a dataset without images loads just as fast, regardless of where the files are stored. (And if thumbnails are a problem, they can actually be disabled.)

We may ask the MOC team for credentials for the Swift node that they are planning to use for the actual storage-and-computing project (swift-1.massopencloud.org, I think?) and see if it's any faster. Or see how it performs on their server, which has a local subnet link to the cloud... But as I said, testing the worst-case scenario may be even more important/useful.

@kcondon
Contributor

kcondon commented Apr 28, 2017

Good info, thanks for the write-up. I'd also like to add that what initially appears to be a performance issue, like the inefficient thumbnail code based on the local-access assumption, may actually be a coding issue, so it's worth taking a look to rule that out.

That said, there is at least one issue that seems to fall into this category:
on dataset save after uploading a mix of files, some subsettable, the dataset file upload spinner disappears, the page remains grayed out and never comes back, yet the files are uploaded and ingested correctly. This does not happen when it is configured as local storage.

Another, simpler example of the above dataset save after file upload issue:
-upload 13 very small image files, total of 232k and see the same frozen gray page on swift storage only, works on local with same branch.

Also:
-Download URL results in the internal storage name used in Swift, rather than the original file name as with local files.

Performance issues aside, the above two issues are what's left.

@kcondon kcondon removed their assignment Apr 28, 2017
@landreev landreev self-assigned this May 1, 2017
pdurbin added a commit that referenced this issue May 1, 2017
 #3747

This is a new branch based on 3747-swift-with-derivative-file-support

Conflicts (3747-swift-with-derivative-file-support wins):
doc/sphinx-guides/source/installation/config.rst
src/main/java/edu/harvard/iq/dataverse/dataaccess/DataAccess.java
src/main/java/edu/harvard/iq/dataverse/dataaccess/SwiftAccessIO.java
@pdurbin
Member

pdurbin commented May 1, 2017

There's a third issue to look into... FileMetadataIT.java is failing, specifically the cleanup that happens after the (only) test is run:

// delete dataset
given().header(keyString, token)
        .delete("/api/datasets/" + dsId)
        .then().assertThat().statusCode(200);

I first noticed this on e2f5dcc on the new 909-3747-swift-compute-button branch but the test also fails as of 7dc197a on the 3747-swift-with-derivative-file-support branch. Here's a stack trace from the latter (from my laptop):

[2017-05-01T13:08:04.412-0400] [glassfish 4.1] [SEVERE] [] [edu.harvard.iq.dataverse.api.AbstractApiBean] [tid: _ThreadID=126 _ThreadName=http-listener-1(3)] [timeMillis: 1493658484412] [levelValue: 1000] [[
  Error while executing command edu.harvard.iq.dataverse.engine.command.impl.DeleteDatasetCommand@3ad7b1e5
edu.harvard.iq.dataverse.engine.command.exception.CommandExecutionException: Failed to initialize physical access driver.
	at edu.harvard.iq.dataverse.engine.command.impl.DeleteDataFileCommand.executeImpl(DeleteDataFileCommand.java:81)
	at edu.harvard.iq.dataverse.engine.command.AbstractVoidCommand.execute(AbstractVoidCommand.java:29)
	at edu.harvard.iq.dataverse.engine.command.AbstractVoidCommand.execute(AbstractVoidCommand.java:13)
	at edu.harvard.iq.dataverse.EjbDataverseEngine.submit(EjbDataverseEngine.java:207)
	at edu.harvard.iq.dataverse.EjbDataverseEngine$1$1.submit(EjbDataverseEngine.java:378)
	at edu.harvard.iq.dataverse.engine.command.impl.DestroyDatasetCommand.executeImpl(DestroyDatasetCommand.java:69)
	at edu.harvard.iq.dataverse.engine.command.AbstractVoidCommand.execute(AbstractVoidCommand.java:29)
	at edu.harvard.iq.dataverse.engine.command.AbstractVoidCommand.execute(AbstractVoidCommand.java:13)
	at edu.harvard.iq.dataverse.EjbDataverseEngine.submit(EjbDataverseEngine.java:207)
	at edu.harvard.iq.dataverse.EjbDataverseEngine$1$1.submit(EjbDataverseEngine.java:378)
	at edu.harvard.iq.dataverse.engine.command.impl.DeleteDatasetVersionCommand.executeImpl(DeleteDatasetVersionCommand.java:42)
	at edu.harvard.iq.dataverse.engine.command.AbstractVoidCommand.execute(AbstractVoidCommand.java:29)
	at edu.harvard.iq.dataverse.engine.command.AbstractVoidCommand.execute(AbstractVoidCommand.java:13)
	at edu.harvard.iq.dataverse.EjbDataverseEngine.submit(EjbDataverseEngine.java:207)
	at edu.harvard.iq.dataverse.EjbDataverseEngine$1$1.submit(EjbDataverseEngine.java:378)
	at edu.harvard.iq.dataverse.engine.command.impl.DeleteDatasetCommand.executeImpl(DeleteDatasetCommand.java:30)
	at edu.harvard.iq.dataverse.engine.command.AbstractVoidCommand.execute(AbstractVoidCommand.java:29)
	at edu.harvard.iq.dataverse.engine.command.AbstractVoidCommand.execute(AbstractVoidCommand.java:13)
	at edu.harvard.iq.dataverse.EjbDataverseEngine.submit(EjbDataverseEngine.java:207)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.glassfish.ejb.security.application.EJBSecurityManager.runMethod(EJBSecurityManager.java:1081)
	at org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:1153)
	at com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:4786)
	at com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:656)
	at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:822)
	at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:608)
	at org.jboss.weld.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:46)
	at org.jboss.weld.ejb.SessionBeanInterceptor.aroundInvoke(SessionBeanInterceptor.java:52)
	at sun.reflect.GeneratedMethodAccessor2279.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:883)
	at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:822)
	at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:608)
	at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.doCall(SystemInterceptorProxy.java:163)
	at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.aroundInvoke(SystemInterceptorProxy.java:140)
	at sun.reflect.GeneratedMethodAccessor2280.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:883)
	at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:822)
	at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:369)
	at com.sun.ejb.containers.BaseContainer.__intercept(BaseContainer.java:4758)
	at com.sun.ejb.containers.BaseContainer.intercept(BaseContainer.java:4746)
	at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:212)
	at com.sun.ejb.containers.EJBLocalObjectInvocationHandlerDelegate.invoke(EJBLocalObjectInvocationHandlerDelegate.java:88)
	at com.sun.proxy.$Proxy3430.submit(Unknown Source)
	at edu.harvard.iq.dataverse.__EJB31_Generated__EjbDataverseEngine__Intf____Bean__.submit(Unknown Source)
	at edu.harvard.iq.dataverse.api.AbstractApiBean.execCommand(AbstractApiBean.java:411)
	at edu.harvard.iq.dataverse.api.Datasets.lambda$deleteDataset$100(Datasets.java:202)
	at edu.harvard.iq.dataverse.api.AbstractApiBean.response(AbstractApiBean.java:470)
	at edu.harvard.iq.dataverse.api.Datasets.deleteDataset(Datasets.java:201)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:151)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:171)
	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:152)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:104)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:387)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:331)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:103)
	at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:271)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:297)
	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:254)
	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1028)
	at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:372)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:381)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:344)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:221)
	at org.apache.catalina.core.StandardWrapper.service(StandardWrapper.java:1682)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:344)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
	at org.ocpsoft.rewrite.servlet.RewriteFilter.doFilter(RewriteFilter.java:205)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
	at edu.harvard.iq.dataverse.api.ApiBlockingFilter.doFilter(ApiBlockingFilter.java:162)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
	at edu.harvard.iq.dataverse.api.ApiRouter.doFilter(ApiRouter.java:30)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
	at org.apache.catalina.core.ApplicationDispatcher.doInvoke(ApplicationDispatcher.java:873)
	at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:739)
	at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:575)
	at org.apache.catalina.core.ApplicationDispatcher.doDispatch(ApplicationDispatcher.java:546)
	at org.apache.catalina.core.ApplicationDispatcher.dispatch(ApplicationDispatcher.java:428)
	at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:378)
	at edu.harvard.iq.dataverse.api.ApiRouter.doFilter(ApiRouter.java:34)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:316)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:160)
	at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:734)
	at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:673)
	at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:99)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:174)
	at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:415)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:282)
	at com.sun.enterprise.v3.services.impl.ContainerMapper$HttpHandlerCallable.call(ContainerMapper.java:459)
	at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:167)
	at org.glassfish.grizzly.http.server.HttpHandler.runService(HttpHandler.java:201)
	at org.glassfish.grizzly.http.server.HttpHandler.doHandle(HttpHandler.java:175)
	at org.glassfish.grizzly.http.server.HttpServerFilter.handleRead(HttpServerFilter.java:235)
	at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:119)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:284)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:201)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:133)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:112)
	at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:77)
	at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:561)
	at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:112)
	at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:117)
	at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:56)
	at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:137)
	at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:565)
	at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:545)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to open local file file:///Users/pdurbin/dataverse/files/10.5072/FK2/H8ON76/data/subdir/yn152_4_002.img
	at edu.harvard.iq.dataverse.dataaccess.FileAccessIO.open(FileAccessIO.java:104)
	at edu.harvard.iq.dataverse.engine.command.impl.DeleteDataFileCommand.executeImpl(DeleteDataFileCommand.java:79)
	... 127 more
]]

@landreev
Contributor Author

landreev commented May 1, 2017

Regarding the "frozen upload page" bug: the reason I never saw it while I was working on this is that it only happens behind Apache. If you go to dataverse-internal directly, at http://dataverse-internal.iq.harvard.edu:8080, it works. Still slow, but working.

@landreev
Contributor Author

landreev commented May 1, 2017

I was hoping that this was that "chunked encoding" bug biting us in the ass again... i.e., that we may have forgotten to apply that Grizzly patch to Glassfish on that server... but no, the patch is there.

@landreev
Contributor Author

landreev commented May 1, 2017

OK, so this had nothing to do with tabular ingests, or any kind of extra processing, or any database timing conflicts (as we were guessing earlier).
It's just a timeout on the Apache proxy, past which PrimeFaces can no longer refresh the page. This can be reproduced with "dumb" files - no images, no ingestables: as long as Glassfish is behind Apache and the save takes longer than a minute, you get this bug.

It's fixed by increasing the timeout, on this line in /etc/httpd/conf.d/dataverse.conf:

ProxyPass / ajp://localhost:8009/ timeout=1800

Once again, it's still slow-ish. @kcondon, your test pack still takes 2 min. to save. But it works, the files get saved, the ingests get started and are completed eventually, the thumbnails are generated, etc.

@landreev
Contributor Author

landreev commented May 1, 2017

Re:

Download url results in filesystem name in Swift rather than original name as in local file.

The whole point of that "Download URL" on the files page was to have a direct link to the Swift location for Swift files. So when you click on it, you are downloading from Swift, not from us. Swift does not know anything about our human-readable file name, so that generated name is all you get. At some point Anuj made an attempt to use the "pretty" file names for storage on the Swift side; it was decided to reverse that change, because the pretty file names are editable by the owner, so it would be much harder to guarantee uniqueness on the Swift side.

The MOC team agreed that they were OK with those machine-generated file names. And you still have the "download" button, which sends you to our download API and gives you the human-readable file name.

(All that said, maybe a better solution would be to provide this direct Swift URL in addition to our API URL on the file page, not instead of it? And maybe it could benefit from some additional label - like, "this file is stored on a Swift endpoint; you can access it directly there at ..." - rather than just showing it as a "Download URL".)

But, since this is a UI issue, I suggest that it be addressed/handled in the UI ("compute button") issue, if you feel it's worth addressing.

@landreev
Contributor Author

landreev commented May 1, 2017

@kcondon: I pushed a commit removing the extra logging/experimental code added to the branch while investigating this.

@landreev
Contributor Author

landreev commented May 2, 2017

@pdurbin the REST-assured test, FileMetadataIT, should now be passing.
I'm not sure whether we actually need that test in its current form... but that's probably off-topic.

@landreev landreev assigned kcondon and unassigned landreev May 2, 2017
@pdurbin
Member

pdurbin commented May 2, 2017

FileMetadataIT should now be passing

Yes! FileMetadataIT seems to pass now. Thanks! As of 423d1da I just ran mvn test -Dtest=DataversesIT,DatasetsIT,SwordIT,AdminIT,BuiltinUsersIT,UsersIT,UtilIT,ConfirmEmailIT,FileMetadataIT,FilesIT,SearchIT on my laptop, which is the full test suite as of this writing. Oddly, these files are now being left behind, with git asking me if I want to add them:

    scripts/search/data/binary/trees.png.thumb48
    src/main/webapp/resources/images/cc0.png.thumb48
    src/main/webapp/resources/images/dataverseproject.png.thumb48

This behavior could well be something I introduced while working on #3559, and I'd like to fix it, but I don't think it should block merging pull request #3788 if it's otherwise deemed ready.

Also, I put a bug in the ear of @TaniaSchlatter @mheppler @dlmurphy and @jggautier about the "Download URL" issue you mentioned above having to do with the UI: https://iqss.slack.com/archives/G4D346Y4X/p1493730829970078 . As you say, perhaps this could be addressed in #3747, but it would need to be put on somebody's todo list.

@pdurbin
Member

pdurbin commented May 2, 2017

@landreev I've traced the creation of those three "thumb48" files to the testDatasetThumbnail test in SearchIT:

public void testDatasetThumbnail() {

I'm pretty sure these "thumb48" files weren't being created when I added that test in pull request #3703. I'd like to suggest that we get things back to normal now, either in this issue (#2909), in #3747 (since we need to merge the latest into that "compute ui" branch), or in #2460, which is the issue I'm using to track the fact that I had to disable all sorts of Search API related tests because I was getting PSQLException: ERROR: deadlock.

@pdurbin
Member

pdurbin commented May 2, 2017

@landreev I just pushed a fix at 8584031 into 3747-swift-with-derivative-file-support to blow away those three thumbnail files that are now being created by the SearchIT test. I'm still not sure what changed but at least now the test suite continues to clean up after itself so that git doesn't prompt us to add these files to the source tree.

@pdurbin
Member

pdurbin commented May 2, 2017

I just noticed that because the develop branch has advanced, pull request #3788 now has merge conflicts for doc/sphinx-guides/source/installation/config.rst. I'll merge the latest from develop and resolve the conflicts.

pdurbin added a commit that referenced this issue May 2, 2017


Tweaked Swift write-up.

Conflicts:
doc/sphinx-guides/source/installation/config.rst
@pdurbin
Member

pdurbin commented May 2, 2017

In 0357449 I resolved merge conflicts in pull request #3788. This would have been a blocker for @kcondon actually merging the code once he was done with QA. I merged the latest from develop into the 3747-swift-with-derivative-file-support branch and made some tweaks to what was written in the Installation Guide for Swift, because I was having trouble following it and had to refer to my notes at https://help.hmdc.harvard.edu/Ticket/Display.html?id=248384#txn-5019303

@landreev
Contributor Author

landreev commented May 2, 2017

Thanks @pdurbin for resolving the merge conflict and making the branch ready for merging.
I hope we can do just that, merge it today.
