Skip to content

Commit

Permalink
Merge pull request #6143 from IQSS/6035-memory-leaks
Browse files Browse the repository at this point in the history
6035 memory leaks
  • Loading branch information
kcondon authored Sep 10, 2019
2 parents 13041f8 + 5e47e25 commit 729101f
Show file tree
Hide file tree
Showing 19 changed files with 241 additions and 13 deletions.
8 changes: 8 additions & 0 deletions doc/release-notes/6035-configurable-user-sessions-timeout.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
Idle session timeout for logged-in users has been made configurable in this release.
The default is now set to 8 hours (this is a change from the previous default value of 24 hours).
If you want to change it, set the setting :LoginSessionTimeout to the new value *in minutes*.
For example, to reduce the timeout to 4 hours:

curl -X PUT -d 240 http://localhost:8080/api/admin/settings/:LoginSessionTimeout

Once again, this is the session timeout for *logged-in* users only. For the anonymous sessions the sessions are set to time out after the default ``session-timeout`` value (also in minutes) in the web.xml of the Dataverse application, which is set to 10 minutes. You will most likely not ever need to change this, but if you do, configure it by editing the web.xml file.
133 changes: 133 additions & 0 deletions doc/sphinx-guides/source/developers/tools.rst
Original file line number Diff line number Diff line change
Expand Up @@ -145,6 +145,139 @@ For example...
would be consistent with a file descriptor leak on the dataset page.

jmap and jstat
++++++++++++++

``jmap`` and ``jstat`` are parts of the standard JDK distribution.
jmap allows you to look at the contents of the java heap. It can be used to create a heap dump, that you can then feed to another tool, such as ``Memory Analyzer Tool`` (see above). It can also be used as a useful tool of its own, for example, to list all the classes currently instantiated in memory:

.. code-block:: bash
$ jmap -histo <glassfish process id>
will output a list of all classes, sorted by the number of instances of each individual class, with the size in bytes.
This can be very useful when looking for memory leaks in the application. Another useful tool is ``jstat``, that can be used in combination with ``jmap`` to monitor the effectiveness of garbage collection in reclaiming allocated memory.

In the example script below we stress running Dataverse applicatione with GET requests to a specific dataverse page, use ``jmap`` to see how many Dataverse, Dataset and DataFile class object get allocated, then run ``jstat`` to see how the numbers are affected by both "Young Generation" and "Full" garbage collection runs (``YGC`` and ``FGC`` respectively):

(This is script is provided **as an example only**! You will have to experiment and expand it to suit any specific needs and any specific problem you may be trying to diagnose, and this is just to give an idea of how to go about it)

.. code-block:: bash
#!/bin/sh
# the script takes the numeric id of the glassfish process as the command line argument:
id=$1
while :
do
# Access the dataverse xxx 10 times in a row:
for ((i = 0; i < 10; i++))
do
# hide the output, standard and stderr:
curl http://localhost:8080/dataverse/xxx 2>/dev/null > /dev/null
done
sleep 1
# run jmap and save the output in a temp file:
jmap -histo $id > /tmp/jmap.histo.out
# grep the output for Dataverse, Dataset and DataFile classes:
grep '\.Dataverse$' /tmp/jmap.histo.out
grep '\.Dataset$' /tmp/jmap.histo.out
grep '\.DataFile$' /tmp/jmap.histo.out
# (or grep for whatever else you may be interested in)
# print the last line of the jmap output (the totals):
tail -1 /tmp/jmap.histo.out
# run jstat to check on GC:
jstat -gcutil ${id} 1000 1 2>/dev/null
# add a time stamp and a new line:
date
echo
done
The script above will run until you stop it, and will output something like:

.. code-block:: none
439: 141 28200 edu.harvard.iq.dataverse.Dataverse
472: 160 24320 edu.harvard.iq.dataverse.Dataset
674: 60 9600 edu.harvard.iq.dataverse.DataFile
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 100.00 35.32 20.15 � 7 2.145 0 0.000 2.145
Total 108808814 5909776392
Wed Aug 14 23:13:42 EDT 2019
385: 181 36200 edu.harvard.iq.dataverse.Dataverse
338: 320 48640 edu.harvard.iq.dataverse.Dataset
524: 120 19200 edu.harvard.iq.dataverse.DataFile
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 100.00 31.69 45.11 � 9 3.693 0 0.000 3.693
Total 167998691 9080163904
Wed Aug 14 23:14:59 EDT 2019
367: 201 40200 edu.harvard.iq.dataverse.Dataverse
272: 480 72960 edu.harvard.iq.dataverse.Dataset
442: 180 28800 edu.harvard.iq.dataverse.DataFile
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 100.00 28.05 69.94 � 11 5.001 0 0.000 5.001
Total 226826706 12230221352
Wed Aug 14 23:16:16 EDT 2019
... etc.
How to analyze the output, what to look for:

First, look at the numbers in the jmap output. In the example above, you can immediately see, after the first three iterations, that every 10 dataverse page loads results in the increase of the number of Dataset classes by 160. I.e., each page load leaves 16 of these on the heap. We can also see that each of the 10 page load cycles increased the heap by roughly 3GB; that each cycle resulted in a couple of YG (young generation) garbage collections, and in the old generation allocation being almost 70% full. These numbers in the example are clearly quite high and are an indication of some problematic memory allocation by the dataverse page - if this is the result of something you have added to the page, you probably would want to investigate and fix it. However, overly generous memory use **is not the same as a leak** necessarily. What you want to see now is how much of this allocation can be reclaimed by "Full GC". If all of it gets freed by ``FGC``, it is not the end of the world (even though you do not want your system to spend too much time running ``FGC``; it costs CPU cycles, and actually freezes the application while it's in progress!). It is however a **really** serious problem, if you determine that a growing portion of the old. gen. memory (``"O"`` in the ``jmap`` output) is not getting freed, even by ``FGC``. This *is* a real leak now, i.e. something is leaving behind some objects that are still referenced and thus off limits to garbage collector. So look for the lines where the ``FGC`` counter is incremented. For example, the first ``FGC`` in the example output above:

.. code-block:: none
271: 487 97400 edu.harvard.iq.dataverse.Dataverse
216: 3920 150784 edu.harvard.iq.dataverse.Dataset
337: 372 59520 edu.harvard.iq.dataverse.DataFile
Total 277937182 15052367360
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 100.00 77.66 88.15 � 17 8.734 0 0.000 8.734
Wed Aug 14 23:20:05 EDT 2019
265: 551 110200 edu.harvard.iq.dataverse.Dataverse
202: 4080 182400 edu.harvard.iq.dataverse.Dataset
310: 450 72000 edu.harvard.iq.dataverse.DataFile
Total 142023031 8274454456
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 100.00 71.95 20.12 � 22 25.034 1 4.455 29.489
Wed Aug 14 23:21:40 EDT 2019
We can see that the first ``FGC`` resulted in reducing the ``"O"`` by almost 7GB, from 15GB down to 8GB (from 88% to 20% full). The number of Dataset classes has not budged at all - it has grown by the same 160 objects as before (very suspicious!). To complicate matters, ``FGC`` does not **guarantee** to free everything that can be freed - it will balance how much the system needs memory vs. how much it is willing to spend in terms of CPU cycles performing GC (remember, the application freezes while ``FGC`` is running!). So you should not assume that the "20% full" number above means that you have 20% of your stack already wasted and unrecoverable. Instead, look for the next **minium** value of ``"O"``; then for the next, etc. Now compare these consecutive miniums. With the above test (this is an output of a real experiment, a particularly memory-hungry feature added to the dataverse page), the minimums sequence (of old. gen. usage, in %) was looking as follows:


.. code-block:: none
2.19
2.53
3.00
3.13
3.95
4.03
4.21
4.40
4.64
5.06
5.17
etc. ...
It is clearly growing - so now we can conclude that indeed something there is using memory in a way that's not recoverable, and this is a clear problem.




----

Previous: :doc:`making-releases` | Next: :doc:`unf/index`
9 changes: 9 additions & 0 deletions doc/sphinx-guides/source/installation/config.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1421,6 +1421,15 @@ The relative path URL to which users will be sent for signup. The default settin

``curl -X PUT -d '/dataverseuser.xhtml?editMode=CREATE' http://localhost:8080/api/admin/settings/:SignUpUrl``

:LoginSessionTimeout
++++++++++++++++++++

Session timeout (in minutes) for logged-in users. The default is 8 hours (480 minutes). For the anonymous user sessions, the timeout is set to the default value, configured in the web.xml file of the Dataverse application.

In the example below we reduce the timeout to 4 hours:

``curl -X PUT -d 240 http://localhost:8080/api/admin/settings/:LoginSessionTimeout``

:TwoRavensUrl
+++++++++++++

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.ejb.EJB;
import javax.faces.context.FacesContext;
import javax.faces.view.ViewScoped;
import javax.inject.Inject;
import javax.inject.Named;
Expand Down Expand Up @@ -220,7 +221,19 @@ private TreeNode getDataverseNode(Dataverse dataverse, TreeNode root, boolean ex
public String logout() {
dataverseSession.setUser(null);
dataverseSession.setStatusDismissed(false);


// Important part of completing a logout - kill the existing HTTP session:
// from the ExternalContext.invalidateSession doc page:
// "Invalidates this session then unbinds any objects bound to it."
// - this means whatever allocated SessionScoped classes associated
// with this session may currently be on the heap will become
// garbage-collectable after we log the user out.
FacesContext.getCurrentInstance().getExternalContext().invalidateSession();
// Note that the HTTP session no longer exists -
// .getExternalContext().getSession(false) will return null at this point!
// so it is important to redirect the user to the next page, where a new
// session is going to be issued to them.

String redirectPage = navigationWrapper.getPageFromContext();
try {
redirectPage = URLDecoder.decode(redirectPage, "UTF-8");
Expand Down
15 changes: 14 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/DataverseSession.java
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
import edu.harvard.iq.dataverse.actionlogging.ActionLogServiceBean;
import edu.harvard.iq.dataverse.authorization.users.GuestUser;
import edu.harvard.iq.dataverse.authorization.users.User;
import edu.harvard.iq.dataverse.util.SystemConfig;
import java.io.IOException;
import java.io.Serializable;
import java.util.Locale;
Expand All @@ -16,6 +17,7 @@
import javax.inject.Inject;
import javax.inject.Named;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;

/**
*
Expand All @@ -40,6 +42,9 @@ public class DataverseSession implements Serializable{
@Inject
SettingsWrapper settingsWrapper;

@EJB
SystemConfig systemConfig;

private static final Logger logger = Logger.getLogger(DataverseSession.class.getCanonicalName());

private boolean statusDismissed = false;
Expand Down Expand Up @@ -133,6 +138,14 @@ public void updateLocaleInViewRoot() {
}
}


public void configureSessionTimeout() {
HttpSession httpSession = (HttpSession) FacesContext.getCurrentInstance().getExternalContext().getSession(false);

if (httpSession != null) {
logger.fine("jsession: "+httpSession.getId()+" setting the lifespan of the session to " + systemConfig.getLoginSessionTimeout() + " minutes");
httpSession.setMaxInactiveInterval(systemConfig.getLoginSessionTimeout() * 60); // session timeout, in seconds
}

}

}
1 change: 1 addition & 0 deletions src/main/java/edu/harvard/iq/dataverse/LoginPage.java
Original file line number Diff line number Diff line change
Expand Up @@ -168,6 +168,7 @@ public String login() {
AuthenticatedUser r = authSvc.getUpdateAuthenticatedUser(credentialsAuthProviderId, authReq);
logger.log(Level.FINE, "User authenticated: {0}", r.getEmail());
session.setUser(r);
session.configureSessionTimeout();

if ("dataverse.xhtml".equals(redirectPage)) {
redirectPage = redirectToRoot();
Expand Down
1 change: 1 addition & 0 deletions src/main/java/edu/harvard/iq/dataverse/Shib.java
Original file line number Diff line number Diff line change
Expand Up @@ -348,6 +348,7 @@ public String confirmAndConvertAccount() {
private void logInUserAndSetShibAttributes(AuthenticatedUser au) {
au.setShibIdentityProvider(shibIdp);
session.setUser(au);
session.configureSessionTimeout();
logger.fine("Groups for user " + au.getId() + " (" + au.getIdentifier() + "): " + getGroups(au));
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
import java.util.HashMap;
import java.util.Map;
import javax.ejb.EJB;
import javax.enterprise.context.RequestScoped;
import javax.faces.view.ViewScoped;
import javax.inject.Inject;
import javax.inject.Named;
Expand All @@ -32,7 +33,8 @@
*
* @author Leonid Andreev
*/
@ViewScoped
//@ViewScoped
@RequestScoped
@Named
public class ThumbnailServiceWrapper implements java.io.Serializable {
@Inject
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -329,6 +329,7 @@ public String save() {
// Authenticated user registered. Save the new bulitin, and log in.
builtinUserService.save(builtinUser);
session.setUser(au);
session.configureSessionTimeout();
/**
* @todo Move this to
* AuthenticationServiceBean.createAuthenticatedUser
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -183,6 +183,7 @@ public String createNewAccount() {
newUser.getDisplayInfo().getPosition());
final AuthenticatedUser user = authenticationSvc.createAuthenticatedUser(newUser.getUserRecordIdentifier(), getUsername(), newAud, true);
session.setUser(user);
session.configureSessionTimeout();
/**
* @todo Move this to AuthenticationServiceBean.createAuthenticatedUser
*/
Expand Down Expand Up @@ -210,6 +211,7 @@ public String convertExistingAccount() {
builtinUserSvc.removeUser(existingUser.getUserIdentifier());

session.setUser(existingUser);
session.configureSessionTimeout();
AuthenticationProvider newUserAuthProvider = authenticationSvc.getAuthenticationProvider(newUser.getServiceId());
JsfHelper.addSuccessMessage(BundleUtil.getStringFromBundle("oauth2.convertAccount.success", Arrays.asList(newUserAuthProvider.getInfo().getTitle())));

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,8 @@ public void exchangeCodeForToken() throws IOException {
} else {
// login the user and redirect to HOME of intended page (if any).
session.setUser(dvUser);
session.configureSessionTimeout();

final OAuth2TokenData tokenData = oauthUser.getTokenData();
tokenData.setUser(dvUser);
tokenData.setOauthProviderId(idp.getId());
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ public String init() {
if (confirmEmailData != null) {
user = confirmEmailData.getAuthenticatedUser();
session.setUser(user);
session.configureSessionTimeout(); // TODO: is this needed here? (it can't hurt, but still)
JsfHelper.addSuccessMessage(BundleUtil.getStringFromBundle("confirmEmail.details.success"));
return "/dataverse.xhtml?faces-redirect=true";
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -146,6 +146,7 @@ public String resetPassword() {
String builtinAuthProviderId = BuiltinAuthenticationProvider.PROVIDER_ID;
AuthenticatedUser au = authSvc.lookupUser(builtinAuthProviderId, user.getUserName());
session.setUser(au);
session.configureSessionTimeout();
return "/dataverse.xhtml?alias=" + dataverseService.findRootDataverse().getAlias() + "faces-redirect=true";
} else {
FacesContext.getCurrentInstance().addMessage(null, new FacesMessage(FacesMessage.SEVERITY_ERROR, response.getMessageSummary(), response.getMessageDetail()));
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ public String init() {
String draftDatasetPageToBeRedirectedTo = privateUrlRedirectData.getDraftDatasetPageToBeRedirectedTo() + "&faces-redirect=true";
PrivateUrlUser privateUrlUser = privateUrlRedirectData.getPrivateUrlUser();
session.setUser(privateUrlUser);
session.configureSessionTimeout();
logger.info("Redirecting PrivateUrlUser '" + privateUrlUser.getIdentifier() + "' to " + draftDatasetPageToBeRedirectedTo);
return draftDatasetPageToBeRedirectedTo;
} catch (Exception ex) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -41,14 +41,16 @@
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.ejb.EJB;
import javax.enterprise.context.RequestScoped;
import javax.faces.context.FacesContext;
import javax.faces.view.ViewScoped;
import javax.inject.Inject;
import javax.inject.Named;
import javax.servlet.http.HttpServletRequest;
import org.apache.commons.lang.StringUtils;

@ViewScoped
//@ViewScoped
@RequestScoped
@Named("SearchIncludeFragment")
public class SearchIncludeFragment implements java.io.Serializable {

Expand Down Expand Up @@ -195,6 +197,7 @@ public String searchRedirect(String dataverseRedirectPage) {
* 5. Someday the default sort order for browse mode will be by "release
* date" (newest first) but that functionality is not yet available in
* the system ( see https://redmine.hmdc.harvard.edu/issues/3628 and
*
* https://redmine.hmdc.harvard.edu/issues/3629 ) so for now the default
* sort order for browse mode will by alphabetical (sort by name,
* ascending). The default sort order for search mode will be by
Expand Down Expand Up @@ -439,6 +442,8 @@ public void search(boolean onlyDataRelatedToMe) {
}
}
}

setDisplayCardValues();

} else {
// if SOLR is down:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -409,7 +409,13 @@ Whether Harvesting (OAI) service is enabled
* Allow CORS flag (true or false). It is true by default
*
*/
AllowCors;
AllowCors,

/**
* Lifespan, in minutes, of a login user session 
* (both DataverseSession and the underlying HttpSession)
*/
LoginSessionTimeout;

@Override
public String toString() {
Expand Down
Loading

0 comments on commit 729101f

Please sign in to comment.