diff --git a/doc/sphinx-guides/source/_static/installation/files/etc/init.d/glassfish b/doc/sphinx-guides/source/_static/installation/files/etc/init.d/glassfish new file mode 100755 index 00000000000..8d74f89ec18 --- /dev/null +++ b/doc/sphinx-guides/source/_static/installation/files/etc/init.d/glassfish @@ -0,0 +1,44 @@ +#! /bin/sh +# chkconfig: 2345 99 01 +# description: GlassFish App Server + +set -e + +ASADMIN=/usr/local/glassfish4/bin/asadmin + +case "$1" in + start) + echo -n "Starting GlassFish server: glassfish" + # Increase file descriptor limit: + ulimit -n 32768 + # Allow "memory overcommit": + # (basically, this allows running exec() calls from inside the + # app, without the Unix fork() call physically hogging 2X + # the amount of memory glassfish is already using) + echo 1 > /proc/sys/vm/overcommit_memory + + #echo + #echo "GLASSFISH IS UNDER MAINTENANCE;" + #echo "PLEASE DO NOT USE service init script." + #echo + LANG=en_US.UTF-8; export LANG + $ASADMIN start-domain domain1 + echo "." + ;; + stop) + echo -n "Stopping GlassFish server: glassfish" + #echo + #echo "GLASSFISH IS UNDER MAINTENANCE;" + #echo "PLEASE DO NOT USE service init script." + #echo + + $ASADMIN stop-domain domain1 + echo "." + ;; + + *) + echo "Usage: /etc/init.d/glassfish {start|stop}" + exit 1 +esac + +exit 0 diff --git a/doc/sphinx-guides/source/_static/installation/files/etc/init.d/solr b/doc/sphinx-guides/source/_static/installation/files/etc/init.d/solr new file mode 100755 index 00000000000..5044a1b1e62 --- /dev/null +++ b/doc/sphinx-guides/source/_static/installation/files/etc/init.d/solr @@ -0,0 +1,35 @@ +#!/bin/sh + +# Starts, stops, and restarts Apache Solr.
+# +# chkconfig: 35 92 08 +# description: Starts and stops Apache Solr + +SOLR_DIR="/usr/local/solr-4.6.0/example" +JAVA_OPTIONS="-Xmx1024m -DSTOP.PORT=8079 -DSTOP.KEY=mustard -jar start.jar" +LOG_FILE="/var/log/solr.log" +JAVA="/usr/bin/java" + +case $1 in + start) + echo "Starting Solr" + cd $SOLR_DIR + $JAVA $JAVA_OPTIONS 2> $LOG_FILE & + ;; + stop) + echo "Stopping Solr" + cd $SOLR_DIR + $JAVA $JAVA_OPTIONS --stop + ;; + restart) + $0 stop + sleep 1 + $0 start + ;; + *) + echo "Usage: $0 {start|stop|restart}" >&2 + exit 1 + ;; +esac + diff --git a/doc/sphinx-guides/source/installation/administration.rst b/doc/sphinx-guides/source/installation/administration.rst index eb7f0428432..6c665cbb650 100644 --- a/doc/sphinx-guides/source/installation/administration.rst +++ b/doc/sphinx-guides/source/installation/administration.rst @@ -1,15 +1,81 @@ Administration ============== +This section focuses on system and database administration tasks. Please see the :doc:`/user/index` for tasks having to do with having the "Admin" role on a dataverse or dataset. + .. contents:: :local: +Solr Search Index +----------------- + +Dataverse requires Solr to be operational at all times. If you stop Solr, you should see an error about this on the home page, which is powered by the search index Solr provides. You set up Solr by following the steps in the :doc:`prerequisites` section, and the :doc:`config` section explained how to configure it. This section is about the care and feeding of the search index. PostgreSQL is the "source of truth" and the Dataverse application will copy data from PostgreSQL into Solr. For this reason, the search index can be rebuilt at any time, but depending on the amount of data you have, this can be a slow process. You are encouraged to experiment with production data to get a sense of how long a full reindexing will take. + +Full Reindex +++++++++++++ + +There are two ways to perform a full reindex of the Dataverse search index.
Starting with a "clear" ensures a completely clean index but involves downtime. Reindexing in place doesn't involve downtime but does not ensure a completely clean index. + +Clear and Reindex +~~~~~~~~~~~~~~~~~ + +Clearing Data from Solr +....................... + +Please note that the moment you issue this command, it will appear to end users looking at the home page that all data is gone! This is because the home page is powered by the search index. + +``curl http://localhost:8080/api/admin/index/clear`` + +Start Async Reindex +................... + +Please note that this operation may take hours depending on the amount of data in your system. This known issue is being tracked at https://github.com/IQSS/dataverse/issues/50 + +``curl http://localhost:8080/api/admin/index`` + +Reindex in Place +~~~~~~~~~~~~~~~~ + +An alternative to completely clearing the search index is to reindex in place. + +Clear Index Timestamps +...................... + +``curl -X DELETE http://localhost:8080/api/admin/index/timestamps`` + +Start or Continue Async Reindex +................................ + +If indexing stops, this command should pick up where it left off based on which index timestamps have been set, which is why we start by clearing these timestamps above. These timestamps are stored in the ``dvobject`` database table. + +``curl http://localhost:8080/api/admin/index/continue`` + +Glassfish +--------- + +``server.log`` is the main place to look when you encounter problems. Hopefully an error message has been logged. If there's a stack trace, it may be of interest to developers, especially if they can trace line numbers back to a tagged version. + +For debugging purposes, you may find it helpful to increase logging levels as mentioned in the :doc:`/developers/debugging` section of the Developer Guide.
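When digging through ``server.log``, a quick scan for recent errors can save time. Below is a minimal sketch; the log path assumes a default ``domain1`` layout, so adjust it to match your installation:

```shell
# Scan the tail of the Glassfish server log for problems. The path below
# assumes a default domain1 layout -- adjust LOG for your installation.
LOG="${LOG:-/usr/local/glassfish4/glassfish/domains/domain1/logs/server.log}"

scan_log() {
  [ -f "$1" ] || { echo "no log file at $1"; return 0; }
  # SEVERE and WARNING are the standard java.util.logging levels to look for.
  tail -n 500 "$1" | grep -E 'SEVERE|WARNING' || echo "no recent errors in $1"
}

scan_log "$LOG"
```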
+ +This guide has focused on using the command line to manage Glassfish, but you might be interested in an admin GUI at http://localhost:4848 + +Monitoring +---------- + +In production you'll want to monitor the usual suspects such as CPU, memory, free disk space, etc. + +https://github.com/IQSS/dataverse/issues/2595 contains some information on enabling monitoring of Glassfish, which is disabled by default. + +There is a database table called ``actionlogrecord`` that captures events that may be of interest. See https://github.com/IQSS/dataverse/issues/2729 for more discussion around this table. + User Administration ------------------- -Deleting an API token -~~~~~~~~~~~~~~~~~~~~~ +There isn't much in the way of user administration tools built into Dataverse. + +Deleting an API Token ++++++++++++++++++++++ -If an API token is compromised it should be deleted. Users can generate a new one for themselves, but someone with access to the database can delete tokens as well. +If an API token is compromised it should be deleted. Users can generate a new one for themselves as explained in the :doc:`/user/account` section of the User Guide, but you may want to preemptively delete tokens from the database. Using the API token 7ae33670-be21-491d-a244-008149856437 as an example: @@ -17,4 +83,3 @@ Using the API token 7ae33670-be21-491d-a244-008149856437 as an example: You should expect the output ``DELETE 1`` after issuing the command above. -After the API token has been deleted, users can generate a new one per :doc:`/user/account`.
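A hypothetical sketch of the token delete via ``psql`` follows. The table and column names (``apitoken``, ``tokenstring``) and the database and user names (``dvndb``, ``dvnapp``) are assumptions; verify them against your own schema before running anything:

```shell
# Build the DELETE statement for a compromised API token. Table/column
# names (apitoken, tokenstring) are assumptions -- check your schema.
delete_token_sql() {
  printf "DELETE FROM apitoken WHERE tokenstring = '%s';" "$1"
}

TOKEN="7ae33670-be21-491d-a244-008149856437"
# psql -U dvnapp dvndb -c "$(delete_token_sql "$TOKEN")"   # uncomment to run
delete_token_sql "$TOKEN"; echo
```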
diff --git a/doc/sphinx-guides/source/installation/config.rst b/doc/sphinx-guides/source/installation/config.rst new file mode 100644 index 00000000000..98d7e89fd9e --- /dev/null +++ b/doc/sphinx-guides/source/installation/config.rst @@ -0,0 +1,400 @@ +============= +Configuration +============= + +Now that you've successfully logged into Dataverse with a superuser account after going through a basic :doc:`installation-main`, you'll need to secure and configure your installation. + +Settings within Dataverse itself are managed via JVM options or by manipulating values in the ``setting`` table directly or through API calls. Configuring Solr requires manipulating XML files. + +Once you have finished securing and configuring your Dataverse installation, proceed to the :doc:`administration` section. Advanced configuration topics are covered in the :doc:`r-rapache-tworavens` and :doc:`shibboleth` sections. + +.. contents:: :local: + +Securing Your Installation +-------------------------- + +Blocking API Endpoints +++++++++++++++++++++++ + +The :doc:`/api/native-api` contains a useful but potentially dangerous API endpoint called "admin" that allows you to change system settings, make ordinary users into superusers, and more. There is a "test" API endpoint used for development and troubleshooting that has some potentially dangerous methods. The ``builtin-users`` endpoint lets people create a local/builtin user account if they know the ``BuiltinUsers.KEY`` value described below. + +By default, all APIs can be operated on remotely and without the need for any authentication. https://github.com/IQSS/dataverse/issues/1886 was opened to explore changing these defaults, but until then it is very important to block both the "admin" and "test" endpoints (and at least consider blocking ``builtin-users``). For details please see also the section on ``:BlockedApiPolicy`` below.
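The blocking itself is done through the settings API described under "Database Settings" below. A consolidated sketch, assuming the application answers on localhost:8080 (the commands are only echoed until you clear ``DRY_RUN``):

```shell
# Block the dangerous endpoints via the settings API. DRY_RUN=echo prints
# the commands for review; set DRY_RUN= (empty) to actually run them.
API="http://localhost:8080/api/admin/settings"
DRY_RUN="${DRY_RUN:-echo}"

$DRY_RUN curl -X PUT -d localhost-only "$API/:BlockedApiPolicy"
$DRY_RUN curl -X PUT -d "admin,test,builtin-users" "$API/:BlockedApiEndpoints"
```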
+ +Forcing HTTPS ++++++++++++++ + +To avoid having your users send credentials in the clear, it's strongly recommended to force all web traffic to go through HTTPS (port 443) rather than HTTP (port 80). The ease with which one can install a valid SSL cert into Apache compared with the same operation in Glassfish might be a compelling enough reason to front Glassfish with Apache. In addition, Apache can be configured to rewrite HTTP to HTTPS with rules such as those found at https://wiki.apache.org/httpd/RewriteHTTPToHTTPS or in the section on :doc:`shibboleth`. + +Solr +---- + +schema.xml +++++++++++ + +The :doc:`prerequisites` section explained that Dataverse requires a specific Solr schema file called ``schema.xml`` that can be found in the Dataverse distribution. You should have already replaced the default ``example/solr/collection1/conf/schema.xml`` file that ships with Solr. + +jetty.xml ++++++++++ + +Stop Solr and edit ``solr-4.6.0/example/etc/jetty.xml``, raising ``requestHeaderSize`` to ``102400``. Without this higher value in place, it will appear that no data has been added to your dataverse installation and ``WARN org.eclipse.jetty.http.HttpParser – HttpParser Full for /127.0.0.1:8983`` will appear in the Solr log. See also https://support.lucidworks.com/hc/en-us/articles/201424796-Error-when-submitting-large-query-strings- + +Network Ports +------------- + +The need to redirect HTTP (port 80) to HTTPS (port 443) for security has already been mentioned above and the fact that Glassfish puts these services on 8080 and 8181, respectively, was touched on in the :doc:`installation-main` section. You have a few options that basically boil down to whether you want to introduce Apache into the mix or not. If you need :doc:`shibboleth` support you need Apache and you should proceed directly to that doc for guidance on fronting Glassfish with Apache.
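If you do front Glassfish with Apache, a minimal rewrite in the spirit of the Apache wiki page linked above might look like the following (the ``ServerName`` is a placeholder, and ``mod_rewrite`` must be enabled):

```apache
<VirtualHost *:80>
    # Placeholder hostname -- substitute your own.
    ServerName dataverse.example.edu
    RewriteEngine On
    # Send all plain-HTTP requests to the HTTPS equivalent.
    RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
</VirtualHost>
```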
+ +If you don't want to front Glassfish with a proxy such as Apache or nginx, you will need to configure Glassfish to run HTTPS on 443 like this: + +``asadmin set server-config.network-config.network-listeners.network-listener.http-listener-2.port=443`` + +Most likely you'll want to put a valid cert into Glassfish, which is certainly possible but out of scope for this guide. + +What about port 80? Even if you don't front Dataverse with Apache, you may want to let Apache run on port 80 just to rewrite HTTP to HTTPS as described above. You can use a similar command as above to change the HTTP port that Glassfish uses from 8080 to 80 (substitute ``http-listener-1.port=80``). Glassfish can be used to enforce HTTPS on its own without Apache, but configuring this is an exercise for the reader. Answers here may be helpful: http://stackoverflow.com/questions/25122025/glassfish-v4-java-7-port-unification-error-not-able-to-redirect-http-to + +Root Dataverse Configuration +---------------------------- + +The user who creates a dataverse is given the "Admin" role on that dataverse. The root dataverse is created automatically for you by the installer and the "Admin" is the superuser account ("dataverseAdmin") we used in the :doc:`installation-main` section to confirm that we can log in. These next steps of configuring the root dataverse require the "Admin" role on the root dataverse, but not the much more powerful superuser attribute. In short, users with the "Admin" role are subject to the permission system. A superuser, on the other hand, completely bypasses the permission system. You can give non-superusers the "Admin" role on the root dataverse if you'd like them to configure the root dataverse. + +Root Dataverse Permissions +++++++++++++++++++++++++++ + +In order for non-superusers to start creating dataverses or datasets, you need to click "Edit" then "Permissions" and make choices about which users can add dataverses or datasets within the root dataverse.
(There is an API endpoint for this operation as well.) Again, the user who creates a dataverse will be granted the "Admin" role on that dataverse. + +Publishing the Root Dataverse ++++++++++++++++++++++++++++++ + +Non-superusers who are not "Admin" on the root dataverse will not be able to do anything useful until the root dataverse has been published. + +Customizing the Root Dataverse ++++++++++++++++++++++++++++++++ + +As the person installing Dataverse you may or may not be a local metadata expert. You may want to have others sign up for accounts and grant them the "Admin" role at the root dataverse to configure metadata fields, browse/search facets, templates, guestbooks, etc. For more on these topics, consult the :doc:`/user/dataverse-management` section of the User Guide. + +Once this configuration is complete, your Dataverse installation should be ready for users to start playing with it. That said, there are many more configuration options available, which will be explained below. + +JVM Options +----------- + +JVM stands for Java Virtual Machine and, as a Java application, Glassfish can read JVM options when it is started. A number of JVM options are configured by the installer. Below is a complete list of the Dataverse-specific JVM options. You can inspect the configured options by running ``asadmin list-jvm-options | egrep 'dataverse|doi'``. + +When changing these values with ``asadmin``, you'll need to delete the old value before adding a new one, like this: + +``asadmin delete-jvm-options "-Ddataverse.fqdn=old.example.com"`` + +``asadmin create-jvm-options "-Ddataverse.fqdn=dataverse.example.com"`` + +It's also possible to change these values by stopping Glassfish, editing ``glassfish4/glassfish/domains/domain1/config/domain.xml``, and restarting Glassfish. + +dataverse.fqdn ++++++++++++++ + +If the Dataverse server has multiple DNS names, this option specifies the one to be used as the "official" host name.
For example, you may want dataverse.foobar.edu, and not the less appealing server-123.socsci.foobar.edu, to appear exclusively in all the registered global identifiers, Data Deposit API records, etc. + +The password reset feature requires ``dataverse.fqdn`` to be configured. + +| Do note that whenever the system needs to form a service URL, by default, it will be formed with ``https://`` and port 443. I.e., +| ``https://{dataverse.fqdn}/`` +| If that does not suit your setup, you can define an additional option, ``dataverse.siteUrl``, explained below. + +dataverse.siteUrl ++++++++++++++++++ + +| Use ``dataverse.siteUrl`` to specify an alternative protocol and port number. +| For example, configured in domain.xml: +| ``-Ddataverse.fqdn=dataverse.foobar.edu`` +| ``-Ddataverse.siteUrl=http://${dataverse.fqdn}:8080`` + +dataverse.files.directory ++++++++++++++++++++++++++ + +This is how you configure the path under which files uploaded by users are stored. The installer prompted you for this value. + +dataverse.auth.password-reset-timeout-in-minutes +++++++++++++++++++++++++++++++++++++++++++++++++ + +Users have 60 minutes to change their passwords by default. You can adjust this value here. + +dataverse.rserve.host ++++++++++++++++++++++ + +Configuration for :doc:`r-rapache-tworavens`. + +dataverse.rserve.port ++++++++++++++++++++++ + +Configuration for :doc:`r-rapache-tworavens`. + +dataverse.rserve.user ++++++++++++++++++++++ + +Configuration for :doc:`r-rapache-tworavens`. + +dataverse.rserve.tempdir +++++++++++++++++++++++++ +Configuration for :doc:`r-rapache-tworavens`. + +dataverse.rserve.password ++++++++++++++++++++++++++ + +Configuration for :doc:`r-rapache-tworavens`. + +dataverse.dropbox.key ++++++++++++++++++++++ + +Dropbox integration is optional. Enter your key here. + +dataverse.path.imagemagick.convert +++++++++++++++++++++++++++++++++++ + +For overriding the default path to the ``convert`` binary from ImageMagick (``/usr/bin/convert``).
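For instance, to point Dataverse at a build of ImageMagick under ``/opt`` (a placeholder path), the same delete-then-create pattern shown at the top of this section applies:

```shell
# Override the ImageMagick convert path JVM option. The /opt path is a
# placeholder; DRY_RUN=echo prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-echo}"

$DRY_RUN asadmin delete-jvm-options "-Ddataverse.path.imagemagick.convert=/usr/bin/convert"
$DRY_RUN asadmin create-jvm-options "-Ddataverse.path.imagemagick.convert=/opt/ImageMagick/bin/convert"
```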
+ +dataverse.dataAccess.thumbnail.image.limit ++++++++++++++++++++++++++++++++++++++++++ + +For limiting the size of thumbnail images generated from files. + +dataverse.dataAccess.thumbnail.pdf.limit +++++++++++++++++++++++++++++++++++++++++ + +For limiting the size of thumbnail images generated from files. + +doi.baseurlstring ++++++++++++++++++ + +As of this writing "https://ezid.cdlib.org" is the only valid value. See also these related database settings below: + +- :DoiProvider +- :Protocol +- :Authority +- :DoiSeparator + +doi.username +++++++++++++ + +Used in conjunction with ``doi.baseurlstring``. + +doi.password +++++++++++++ + +Used in conjunction with ``doi.baseurlstring``. + +dataverse.handlenet.admcredfile ++++++++++++++++++++++++++++++++ + +For Handle support (not fully developed). + +dataverse.handlenet.admprivphrase ++++++++++++++++++++++++++++++++++ +For Handle support (not fully developed). + +Database Settings +----------------- + +These settings are stored in the ``setting`` table but can be read and modified via the "admin" endpoint of the :doc:`/api/native-api` for easy scripting. + +The most commonly used configuration options are listed first. + +:BlockedApiPolicy ++++++++++++++++++ + +Out of the box, all API endpoints are completely open as mentioned in the section on security above. It is highly recommended that you choose one of the policies below and also configure ``:BlockedApiEndpoints``. + +- localhost-only: Allow from localhost. +- unblock-key: Require a key defined in ``:BlockedApiKey``. +- drop: Disallow the blocked endpoints completely. + +``curl -X PUT -d localhost-only http://localhost:8080/api/admin/settings/:BlockedApiPolicy`` + +:BlockedApiEndpoints +++++++++++++++++++++ + +A comma-separated list of API endpoints to be blocked.
For a production installation, "admin" and "test" should be blocked (and perhaps "builtin-users" as well), as mentioned in the section on security above: + +``curl -X PUT -d "admin,test,builtin-users" http://localhost:8080/api/admin/settings/:BlockedApiEndpoints`` + +See the :doc:`/api/index` for a list of API endpoints. + +:BlockedApiKey +++++++++++++++ + +Used in conjunction with the ``:BlockedApiPolicy`` being set to ``unblock-key``. When calling blocked APIs, add a query parameter of ``unblock-key=theKeyYouChose`` to use the key. + +``curl -X PUT -d s3kretKey http://localhost:8080/api/admin/settings/:BlockedApiKey`` + +BuiltinUsers.KEY +++++++++++++++++ + +The key required to create users via API as documented at :doc:`/api/native-api`. Unlike other database settings, this one doesn't start with a colon. + +``curl -X PUT -d builtInS3kretKey http://localhost:8080/api/admin/settings/BuiltinUsers.KEY`` + +:SystemEmail +++++++++++++ + +This is the email address that "system" emails are sent from such as password reset links. + +``curl -X PUT -d "Support <support@example.edu>" http://localhost:8080/api/admin/settings/:SystemEmail`` + +:DoiProvider +++++++++++++ + +As of this writing "EZID" is the only valid option. DataCite support is planned: https://github.com/IQSS/dataverse/issues/24 + +``curl -X PUT -d EZID http://localhost:8080/api/admin/settings/:DoiProvider`` + +:Protocol ++++++++++ + +As of this writing "doi" is the only valid option for the protocol for a persistent ID. + +``curl -X PUT -d doi http://localhost:8080/api/admin/settings/:Protocol`` + +:Authority +++++++++++ + +Use the DOI authority assigned to you by EZID. + +``curl -X PUT -d 10.xxxx http://localhost:8080/api/admin/settings/:Authority`` + +:DoiSeparator ++++++++++++++ + +It is recommended that you keep this as a slash ("/").
+ +``curl -X PUT -d "/" http://localhost:8080/api/admin/settings/:DoiSeparator`` + +:ApplicationTermsOfUse +++++++++++++++++++++++ + +Upload an HTML file containing the Terms of Use to be displayed at sign up. Supported HTML tags are listed under the :doc:`/user/dataset-management` section of the User Guide. + +``curl -X PUT -d@/tmp/apptou.html http://localhost:8080/api/admin/settings/:ApplicationTermsOfUse`` + +Unfortunately, in most cases, the text file will probably be too big to upload (>1024 characters) due to a bug. A workaround has been posted to https://github.com/IQSS/dataverse/issues/2669 + +:ApplicationPrivacyPolicyUrl +++++++++++++++++++++++++++++ + +Specify a URL where users can read your Privacy Policy, linked from the bottom of the page. + +``curl -X PUT -d http://best-practices.dataverse.org/harvard-policies/harvard-privacy-policy.html http://localhost:8080/api/admin/settings/:ApplicationPrivacyPolicyUrl`` + +:ApiTermsOfUse +++++++++++++++ + +Specify a URL where users can read your API Terms of Use. + +``curl -X PUT -d http://best-practices.dataverse.org/harvard-policies/harvard-api-tou.html http://localhost:8080/api/admin/settings/:ApiTermsOfUse`` + +:GuidesBaseUrl +++++++++++++++ + +Set ``GuidesBaseUrl`` to override the default value "http://guides.dataverse.org". If you are interested in writing your own version of the guides, you may find the :doc:`/developers/documentation` section of the Developer Guide helpful. + +``curl -X PUT -d http://dataverse.example.edu http://localhost:8080/api/admin/settings/:GuidesBaseUrl`` + +:StatusMessageHeader +++++++++++++++++++++ + +For dynamically adding information to the top of every page. For example, "For testing only..." at the top of https://demo.dataverse.org is set with this: + +``curl -X PUT -d "For testing only..." 
http://localhost:8080/api/admin/settings/:StatusMessageHeader`` + +:MaxFileUploadSizeInBytes ++++++++++++++++++++++++++ + +Set ``MaxFileUploadSizeInBytes`` to "2147483648", for example, to limit the size of files uploaded to 2 GB. +Notes: +- For SWORD, this size is limited by the Java Integer.MAX_VALUE of 2,147,483,647. (see: https://github.com/IQSS/dataverse/issues/2169) +- If MaxFileUploadSizeInBytes is NOT set, uploads, including SWORD uploads, may be of unlimited size. + +``curl -X PUT -d 2147483648 http://localhost:8080/api/admin/settings/:MaxFileUploadSizeInBytes`` + +:TabularIngestSizeLimit ++++++++++++++++++++++++ + +Threshold in bytes for limiting whether or not "ingest" is attempted for tabular files (which can be resource intensive). For example, with the below in place, files greater than 2 GB in size will not go through the ingest process: + +``curl -X PUT -d 2000000000 http://localhost:8080/api/admin/settings/:TabularIngestSizeLimit`` + +(You can set this value to 0 to prevent files from being ingested at all.) + +You can override this global setting on a per-format basis for the following formats: + +- dta +- por +- sav +- Rdata +- CSV +- xlsx + +For example, if you want your installation of Dataverse to not attempt to ingest Rdata files larger than 1 MB, use this setting: + +``curl -X PUT -d 1000000 http://localhost:8080/api/admin/settings/:TabularIngestSizeLimit:Rdata`` + +:ZipUploadFilesLimit +++++++++++++++++++++ + +Limit the number of files in a zip that Dataverse will accept. + +:GoogleAnalyticsCode +++++++++++++++++++++ + +For setting up Google Analytics for your Dataverse installation. + +:SolrHostColonPort +++++++++++++++++++ + +By default Dataverse will attempt to connect to Solr on port 8983 on localhost. Use this setting to change the hostname or port. + +``curl -X PUT -d localhost:8983 http://localhost:8080/api/admin/settings/:SolrHostColonPort`` + +:SignUpUrl +++++++++++ + +The relative path URL to which users will be sent after signup.
The default setting is below. + +``curl -X PUT -d /dataverseuser.xhtml?editMode=CREATE http://localhost:8080/api/admin/settings/:SignUpUrl`` + +:TwoRavensUrl ++++++++++++++ + +The location of your TwoRavens installation. + +:GeoconnectCreateEditMaps ++++++++++++++++++++++++++ + +Set ``GeoconnectCreateEditMaps`` to true to allow the user to create GeoConnect Maps. This boolean affects whether the user sees the map button on the dataset page and if the ingest will create a shape file. + +``curl -X PUT -d true http://localhost:8080/api/admin/settings/:GeoconnectCreateEditMaps`` + +:GeoconnectViewMaps ++++++++++++++++++++ + +Set ``GeoconnectViewMaps`` to true to allow a user to view existing maps. This boolean affects whether a user will see the "Explore" button. + +``curl -X PUT -d true http://localhost:8080/api/admin/settings/:GeoconnectViewMaps`` + +:SearchHighlightFragmentSize ++++++++++++++++++++++++++++++ + +Set ``SearchHighlightFragmentSize`` to override the default value of 100 from https://wiki.apache.org/solr/HighlightingParameters#hl.fragsize . In practice, a value of "320" seemed to fix the issue at https://github.com/IQSS/dataverse/issues/2191 + +``curl -X PUT -d 320 http://localhost:8080/api/admin/settings/:SearchHighlightFragmentSize`` + +:ScrubMigrationData ++++++++++++++++++++ + +Allow for migration of non-conformant data (especially dates) from DVN 3.x to Dataverse 4. + +:ShibEnabled +++++++++++++ + +This setting is experimental per :doc:`/installation/shibboleth`. + +:AllowSignUp +++++++++++++ + +Set to false to disallow the creation of local accounts. This is useful if you are using :doc:`shibboleth`, but Shibboleth support is not recommended for production until https://github.com/IQSS/dataverse/issues/2838 has been fixed.
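Each of the settings above can be read back with a GET on the same ``/api/admin/settings/`` path used by the PUT examples, which is a handy way to confirm a change took effect (this assumes you are on the server itself, before the admin endpoint is blocked):

```shell
# Read a database setting back to verify it. Falls back to a hint when
# the application isn't reachable.
get_setting() {
  curl -s "http://localhost:8080/api/admin/settings/$1" || echo "is Dataverse running on localhost:8080?"
}

get_setting :SystemEmail
get_setting :BlockedApiPolicy
```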
diff --git a/doc/sphinx-guides/source/installation/img/3webservers.png b/doc/sphinx-guides/source/installation/img/3webservers.png new file mode 100644 index 00000000000..b8bd222a56f Binary files /dev/null and b/doc/sphinx-guides/source/installation/img/3webservers.png differ diff --git a/doc/sphinx-guides/source/installation/img/dataflow.png b/doc/sphinx-guides/source/installation/img/dataflow.png new file mode 100644 index 00000000000..49c49c36c18 Binary files /dev/null and b/doc/sphinx-guides/source/installation/img/dataflow.png differ diff --git a/doc/sphinx-guides/source/installation/index.rst b/doc/sphinx-guides/source/installation/index.rst index 621b3d448ca..ba2992c5ec4 100755 --- a/doc/sphinx-guides/source/installation/index.rst +++ b/doc/sphinx-guides/source/installation/index.rst @@ -4,7 +4,7 @@ contain the root `toctree` directive. Installation Guide -======================================================= +================== Contents: @@ -12,9 +12,12 @@ Contents: :titlesonly: :maxdepth: 2 + intro + prep prerequisites - installer-script installation-main + config + administration + upgrading r-rapache-tworavens shibboleth - administration diff --git a/doc/sphinx-guides/source/installation/installation-main.rst b/doc/sphinx-guides/source/installation/installation-main.rst index c8871c2fea8..9835584b82c 100755 --- a/doc/sphinx-guides/source/installation/installation-main.rst +++ b/doc/sphinx-guides/source/installation/installation-main.rst @@ -1,171 +1,137 @@ -==================================== -Application Configuration -==================================== +============ +Installation +============ -**Much of the Dataverse Application configuration is done by the automated installer (described above). 
This section documents the additional configuration tasks that need to be done after you run the installer.** +Now that the :doc:`prerequisites` are in place, we are ready to execute the Dataverse installation script (the "installer") and verify that the installation was successful by logging in with a "superuser" account. -.. _introduction: +.. contents:: :local: -Dataverse Admin Account -+++++++++++++++++++++++ - -Now that you've run the application installer and have your own Dataverse instance, you need to configure the Dataverse Administrator user. -By default installer pre-sets the Admin credentials as follows: - -.. code-block:: none - - First Name: Dataverse - Last Name: Admin - Affiliation: Dataverse.org - Position: Admin - Email: dataverse@mailinator.com - -Log in as the user dataverseAdmin with the password "admin" and change these values to suit your installation. - -(Alteratively, you can modify the file ``dvinstall/data/user-admin.json`` in the installer bundle **before** you run the installer. The password is in ``dvinstall/setup-all.sh``, which references this JSON file.) - -Solr Configuration -++++++++++++++++++ - -Dataverse requires a specific Solr schema file called `schema.xml` that can be found in the Dataverse distribution. It should replace the default `example/solr/collection1/conf/schema.xml` file that ships with Solr. - -If ``WARN org.eclipse.jetty.http.HttpParser – HttpParser Full for /127.0.0.1:8983`` appears in the Solr log, adding ``8192`` (or a higher number of bytes) to Solr's jetty.xml in the section matching the XPath expression ``//Call[@name='addConnector']/Arg/New[@class='org.eclipse.jetty.server.bio.SocketConnector']`` may resolve the issue. See also https://support.lucidworks.com/hc/en-us/articles/201424796-Error-when-submitting-large-query-strings- - -Solr Security -------------- - -Solr must be firewalled off from all hosts except the server(s) running Dataverse. 
Otherwise, any host that can reach the Solr port (8983 by default) can add or delete data, search unpublished data, and even reconfigure Solr. For more information, please see https://wiki.apache.org/solr/SolrSecurity +Running the Dataverse Installer +------------------------------- -Settings -++++++++ +A scripted, interactive installer is provided. This script will configure your Glassfish environment, create the database, set some required options and start the application. Some configuration tasks will still be required after you run the installer! So make sure to consult the next section. +At this point the installer only runs on RHEL 6 and similar. -ApplicationPrivacyPolicyUrl ---------------------------- +You should have already downloaded the installer from https://github.com/IQSS/dataverse/releases when setting up and starting Solr under the :doc:`prerequisites` section. Again, it's a zip file with "dvinstall" in the name. -Specify a URL where users can read your Privacy Policy, linked from the bottom of the page. +Unpack the zip file - this will create the directory ``dvinstall``. -``curl -X PUT -d http://best-practices.dataverse.org/harvard-policies/harvard-privacy-policy.html http://localhost:8080/api/admin/settings/:ApplicationPrivacyPolicyUrl`` +Execute the installer script like this:: -ApplicationTermsOfUse ---------------------- + # cd dvinstall + # ./install -Upload a text file containing the Terms of Use to be displayed at sign up. +The script will prompt you for some configuration values. 
If this is a test/evaluation installation, it should be safe to accept the defaults for most of the settings: -``curl -X PUT -d@/tmp/apptou.html http://localhost:8080/api/admin/settings/:ApplicationTermsOfUse`` +- Internet Address of your host: localhost +- Glassfish Directory: /usr/local/glassfish4 +- SMTP (mail) server to relay notification messages: localhost +- Postgres Server: localhost +- Postgres Server Port: 5432 +- Name of the Postgres Database: dvndb +- Name of the Postgres User: dvnapp +- Postgres user password: secret +- Rserve Server: localhost +- Rserve Server Port: 6311 +- Rserve User Name: rserve +- Rserve User Password: rserve -ApiTermsOfUse -------------- +The script is to a large degree a derivative of the old installer from DVN 3.x. It is written in Perl. If someone in the community is eager to rewrite it, perhaps in a different language, please get in touch. :) -Upload a text file containing the API Terms of Use. +All the Glassfish configuration tasks performed by the installer are isolated in the shell script ``dvinstall/glassfish-setup.sh`` (as ``asadmin`` commands). -``curl -X PUT -d@/tmp/api-tos.txt http://localhost:8080/api/admin/settings/:ApiTermsOfUse`` +As the installer finishes, it mentions a script called ``post-install-api-block.sh`` which is **very important** to execute for any production installation of Dataverse. Security will be covered in :doc:`config` section but for now, let's make sure your installation is working. -SolrHostColonPort ------------------ +Logging In +---------- -Set ``SolrHostColonPort`` to override ``localhost:8983``. +Out of the box, Glassfish runs on port 8080 and 8181 rather than 80 and 443, respectively, so visiting http://localhost:8080 (substituting your hostname) should bring up a login page. See the :doc:`shibboleth` page for more on ports, but for now, let's confirm we can log in by using port 8080. Poke a temporary hole in your firewall. 
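On a stock RHEL 6 box, that temporary hole might be poked with ``iptables``. This is a sketch, not a firewall guide; the commands are echoed until you clear ``DRY_RUN``:

```shell
# Temporarily open port 8080, then close it again after you've logged in.
# DRY_RUN=echo prints the commands; clear it to actually run them as root.
DRY_RUN="${DRY_RUN:-echo}"

$DRY_RUN iptables -I INPUT -p tcp --dport 8080 -j ACCEPT  # open
# ...confirm you can log in, then remove the rule:
$DRY_RUN iptables -D INPUT -p tcp --dport 8080 -j ACCEPT  # close
```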
-``curl -X PUT -d localhost:8983 http://localhost:8080/api/admin/settings/:SolrHostColonPort`` +Superuser Account ++++++++++++++++++ -SearchHighlightFragmentSize ---------------------------- +We'll use the superuser account created by the installer to make sure you can log into Dataverse. For more on the difference between being a superuser and having the "Admin" role, read about configuring the root dataverse in the :doc:`config` section. -Set ``SearchHighlightFragmentSize`` to override the default value of 100 from https://wiki.apache.org/solr/HighlightingParameters#hl.fragsize +(The ``dvinstall/setup-all.sh`` script, which is called by the installer, sets the password for the superuser account; the username and email address come from a file it references at ``dvinstall/data/user-admin.json``.) -``curl -X PUT -d 320 http://localhost:8080/api/admin/settings/:SearchHighlightFragmentSize`` +Use the following credentials to log in: -ShibEnabled ----------- +- URL: http://localhost:8080 +- username: dataverseAdmin +- password: admin -This setting is experimental per :doc:`/installation/shibboleth`. +Congratulations! You have a working Dataverse installation. Soon you'll be tweeting at `@dataverseorg `_ asking to be added to the map at http://dataverse.org :) -MaxFileUploadSizeInBytes ------------------------------- +(While you're logged in, you should go ahead and change the email address of the dataverseAdmin account to a real one rather than "dataverse@mailinator.com" so that you receive notifications.) -Set `MaxFileUploadSizeInBytes` to "2147483648", for example, to limit the size of files uploaded to 2 GB. -Notes: -- For SWORD, this size is limited by the Java Integer.MAX_VALUE of 2,147,483,647. (see: https://github.com/IQSS/dataverse/issues/2169) -- If the MaxFileUploadSizeInBytes is NOT set, uploads, including SWORD may be of unlimited size. +Trouble? See if you find an answer in the troubleshooting section below.
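Before attempting to log in through the browser, you can confirm from the command line that the application is answering at all. This is a hypothetical helper, not part of the installer; it assumes the stock port 8080 and polls Dataverse's public ``/api/info/version`` endpoint.

```shell
#!/bin/sh
# Poll the freshly installed Dataverse until it answers, then give up.
# Hypothetical helper; localhost:8080 is the stock Glassfish HTTP port.

wait_for_app() {
    url=$1
    tries=${2:-30}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if curl -sf "$url" >/dev/null 2>&1; then
            echo "up"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "no answer after $tries attempt(s)"
    return 1
}

wait_for_app http://localhost:8080/api/info/version 3 || true
```

If the poll keeps failing, check ``server.log`` per the troubleshooting advice below before suspecting the login page itself.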
-``curl -X PUT -d 2147483648 http://localhost:8080/api/admin/settings/:MaxFileUploadSizeInBytes`` +Next you'll want to check out the :doc:`config` section. -GuidesBaseUrl -------------- +Troubleshooting +--------------- -Set ``GuidesBaseUrl`` to override the default value "http://guides.dataverse.org". +If the following doesn't apply, please get in touch as explained in the :doc:`intro`. You may be asked to provide ``glassfish4/glassfish/domains/domain1/logs/server.log`` for debugging. -``curl -X PUT -d http://dataverse.example.edu http://localhost:8080/api/admin/settings/:GuidesBaseUrl`` +Dataset Cannot Be Published ++++++++++++++++++++++++++++ -GeoconnectCreateEditMaps ------------------------- +Check to make sure you used a fully qualified domain name when installing Dataverse. You can change the ``dataverse.fqdn`` JVM option after the fact per the :doc:`config` section. -Set ``GeoconnectCreateEditMaps`` to true to allow the user to create GeoConnect Maps. This boolean effects whether the user sees the map button on the dataset page and if the ingest will create a shape file. - -``curl -X PUT -d true http://localhost:8080/api/admin/settings/:GeoconnectCreateEditMaps`` - -GeoconnectViewMaps ------------------- - -Set ``GeoconnectViewMaps`` to true to allow a user to view existing maps. This boolean effects whether a user will see the "Explore" button. - -``curl -X PUT -d true http://localhost:8080/api/admin/settings/:GeoconnectViewMaps`` - - -JVM Options -+++++++++++ - -dataverse.fqdn --------------- +Problems Sending Email +++++++++++++++++++++++ -If the Dataverse server has multiple DNS names, this option specifies the one to be used as the "official" host name. For example, you may want to have dataverse.foobar.edu, and not the less appealling server-123.socsci.foobar.edu to appear exclusively in all the registered global identifiers, Data Deposit API records, etc. 
+You can confirm the SMTP server being used with this command: -To change the option on the command line: +``asadmin get server.resources.mail-resource.mail/notifyMailSession.host`` -``asadmin delete-jvm-options "-Ddataverse.fqdn=old.example.com"`` +UnknownHostException While Deploying +++++++++++++++++++++++++++++++++++++ -``asadmin create-jvm-options "-Ddataverse.fqdn=dataverse.example.com"`` +If you are seeing "Caused by: java.net.UnknownHostException: myhost: Name or service not known" in server.log and your hostname is "myhost" the problem is likely that "myhost" doesn't appear in ``/etc/hosts``. See also http://stackoverflow.com/questions/21817809/glassfish-exception-during-deployment-project-with-stateful-ejb/21850873#21850873 -The ``dataverse.fqdn`` JVM option also affects the password reset feature. +Fresh Reinstall +--------------- -| Do note that whenever the system needs to form a service URL, by default, it will be formed with ``https://`` and port 443. I.e., -| ``https://{dataverse.fqdn}/`` -| If that does not suit your setup, you can define an additional option - +Early on when you're installing Dataverse, you may think, "I just want to blow away what I've installed and start over." That's fine. You don't have to uninstall the various components like Glassfish, PostgreSQL and Solr, but you should be conscious of how to clear out their data. -dataverse.siteUrl ------------------ +Drop database ++++++++++++++ -| and specify the alternative protocol and port number. -| For example, configured in domain.xml: -| ``-Ddataverse.fqdn=dataverse.foobar.edu`` -| ``-Ddataverse.siteUrl=http://${dataverse.fqdn}:8080`` +In order to drop the database, you have to stop Glassfish, which will have open connections. Before you stop Glassfish, you may as well undeploy the war file. 
First, find the name like this: +``asadmin list-applications`` -dataverse.auth.password-reset-timeout-in-minutes ------------------------------------------------- +Then undeploy it like this: -Set the ``dataverse.auth.password-reset-timeout-in-minutes`` option if you'd like to override the default value put into place by the installer. +``asadmin undeploy dataverse-VERSION`` -Dropbox Configuration -++++++++++++++++++++++ - -- Add JVM option in the domain.xml: -``asadmin create-jvm-options "-Ddataverse.dropbox.key="`` +Stop Glassfish with the init script provided in the :doc:`prerequisites` section or just use: +``asadmin stop-domain`` +With Glassfish down, you should now be able to drop your database and recreate it: +``psql -U dvnapp -c 'DROP DATABASE "dvndb"' template1`` +Clear Solr +++++++++++ +The database is fresh and new but Solr has stale data in it. Clear it out with this command: +``curl http://localhost:8983/solr/update/json?commit=true -H "Content-type: application/json" -X POST -d "{\"delete\": { \"query\":\"*:*\"}}"`` +Deleting uploaded files ++++++++++++++++++++++++ +The path below will depend on the value for ``dataverse.files.directory`` as described in the :doc:`config` section: +``rm -rf /usr/local/glassfish4/glassfish/domains/domain1/files`` -The guide is intended for anyone who needs to install the Dataverse app. +Rerun Installer ++++++++++++++++ -If you encounter any problems during installation, please contact the -development team -at `support@thedata.org `__ -or our `Dataverse Users -Community `__. +With all the data cleared out, you should be ready to rerun the installer per above. +Related to all this is a series of scripts at https://github.com/IQSS/dataverse/blob/develop/scripts/deploy/phoenix.dataverse.org/deploy that Dataverse developers use to have the test server http://phoenix.dataverse.org rise from the ashes before integration tests are run against it. Your mileage may vary.
:) diff --git a/doc/sphinx-guides/source/installation/installer-script.rst b/doc/sphinx-guides/source/installation/installer-script.rst index 5cd7eb5e52d..72881587917 100644 --- a/doc/sphinx-guides/source/installation/installer-script.rst +++ b/doc/sphinx-guides/source/installation/installer-script.rst @@ -1,23 +1 @@ -==================================== -Dataverse Application Installer -==================================== - -**A scripted, interactive installer is provided. This script will configure your glassfish environment, create the database, set some required options and start the application. Some configuration tasks will still be required after you run the installer! So make sure to consult the next section. -At this point the installer only runs on RedHat 6.* derivatives.** - -Download and run the installer -------------------------------- - -Download the installer package (``dvnstall.zip``). Unpack the zip file - this will create the directory ``dvinstall``. -Execuite the installer script (``install``): - -``cd dvinstall`` - -``./install`` - -The script will prompt you for some configuration values. If this is a test/evaluation installation, it should be safe to accept nthe defaults for most of the settiongs. For for a developer's installation we recommend that you choose ``localhost`` for the host name. - -The script is to a large degree a derivative of the old installer from DVN 3.x. It is written in Perl. - -All the Glassfish configuration tasks performed by the installer are isolated in the shell script ``scripts/install/glassfish-setup.sh`` (as ``asadmin`` commands). - +This content has been moved to :doc:`/installation/installation-main`. diff --git a/doc/sphinx-guides/source/installation/intro.rst b/doc/sphinx-guides/source/installation/intro.rst new file mode 100644 index 00000000000..f3a2d226fb0 --- /dev/null +++ b/doc/sphinx-guides/source/installation/intro.rst @@ -0,0 +1,42 @@ +============ +Introduction +============ + +Welcome! 
Thanks for installing `Dataverse `_! + +Quick Links +----------- + +If you are installing Dataverse for the first time, please proceed to the :doc:`prep` section. + +Jump ahead to :doc:`config` or :doc:`upgrading` for an existing Dataverse installation. + +Intended Audience +----------------- + +This guide is intended primarily for sysadmins who are installing, configuring, and upgrading Dataverse. + +Sysadmins are expected to be comfortable using standard Linux commands, issuing ``curl`` commands, and running SQL scripts. + +Related Guides +-------------- + +Many "admin" functions can be performed by Dataverse users themselves (non-superusers) as documented in the :doc:`/user/index`, and that guide is a good introduction to the features of Dataverse from an end user perspective. + +If you are a sysadmin who likes to code, you may find the :doc:`/api/index` useful, and you may want to consider improving the installation script or hacking on the community-led configuration management options mentioned in the :doc:`prep` section. If you **really** like to code and want to help with the Dataverse code, please check out the :doc:`/developers/index`! + +.. _support: + +Getting Help +------------ + +To get help installing or configuring Dataverse, please try one or more of: + +- posting to the `dataverse-community `_ Google Group. +- asking at http://chat.dataverse.org (#dataverse on the freenode IRC network) +- emailing support@dataverse.org to open a private ticket at https://help.hmdc.harvard.edu + +Improving this Guide +-------------------- + +If you spot a typo in this guide or would like to suggest an improvement, please find the appropriate file in https://github.com/IQSS/dataverse/tree/develop/doc/sphinx-guides/source/installation and send a pull request. You are also welcome to simply open an issue at https://github.com/IQSS/dataverse/issues to describe the problem with this guide.
diff --git a/doc/sphinx-guides/source/installation/prep.rst new file mode 100644 index 00000000000..76c224b40bb --- /dev/null +++ b/doc/sphinx-guides/source/installation/prep.rst @@ -0,0 +1,106 @@ +=========== +Preparation +=========== + +:: + +> "What are you preparing? You're always preparing! Just go!" -- Spaceballs + +We'll try to get you up and running as quickly as possible, but we thought you might like to hear about your options. :) + +.. contents:: :local: + +Choose Your Own Installation Adventure +-------------------------------------- + +Vagrant (for Testing Only) +++++++++++++++++++++++++++ + +If you are looking to simply kick the tires on Dataverse and are familiar with Vagrant, running ``vagrant up`` after cloning the Dataverse repo **should** give you a working installation at http://localhost:8888 . This is one of the :doc:`/developers/tools` developers use to test the installation process, but you're welcome to give it a shot. + +Pilot Installation +++++++++++++++++++ + +Vagrant is not a bad way for a sysadmin to get a quick sense of how an application like Dataverse is put together in a sandbox (a virtual machine running on a laptop, for example), but to allow end users to start playing with Dataverse, you'll need to install Dataverse on a server. + +Installing Dataverse involves some system configuration followed by executing an installation script that will guide you through the installation process as described in :doc:`installation-main`, but reading about the :ref:`architecture` of Dataverse is recommended first. + +.. 
_advanced: + +Advanced Installation ++++++++++++++++++++++ + +There are some community-led projects to use configuration management tools such as Puppet and Ansible to automate Dataverse installation and configuration, but support for these solutions is limited to what the Dataverse community can offer as described in each project's webpage: + +- https://github.com/IQSS/dataverse-puppet +- https://github.com/IQSS/dataverse-ansible + +The Dataverse development team is happy to "bless" additional community efforts along these lines (e.g. Docker, Chef, Salt) by creating a repo under https://github.com/IQSS and managing team access. + +Dataverse permits a fair amount of flexibility in where you choose to install the various components. The diagram below shows a load balancer, multiple proxies and web servers, redundant database servers, and offloading of potentially resource-intensive work to a separate server. A setup such as this is advanced enough to be considered out of scope for this guide, but you are welcome to ask questions about similar configurations via the support channels listed in the :doc:`intro`. + +|3webservers| + + +.. _architecture: + +Architecture and Components +--------------------------- + +Dataverse is a Java Enterprise Edition (EE) web application that is shipped as a war (web archive) file. + +When planning your installation you should be aware of the following components of the Dataverse architecture: + +- Linux: RHEL/CentOS is highly recommended since all development and QA happens on this distribution. +- Glassfish: a Java EE application server to which the Dataverse application (war file) is to be deployed. +- PostgreSQL: a relational database. +- Solr: a search engine. A Dataverse-specific schema is provided. +- SMTP server: for sending mail for password resets and other notifications. +- Persistent identifier service: DOI support is provided. An EZID subscription is required for production use.
+ +There are a number of optional components you may choose to install or configure, including: + +- R, rApache, Zelig, and TwoRavens: :doc:`/user/data-exploration/tworavens` describes the feature, and :doc:`r-rapache-tworavens` describes how to install these components. +- Dropbox integration: for uploading files via the Dropbox API. +- Apache: a web server that can "reverse proxy" Glassfish applications and rewrite HTTP traffic. +- Shibboleth: an authentication system described in :doc:`shibboleth`. Its use with Dataverse requires Apache. +- Geoconnect: :doc:`/user/data-exploration/worldmap` describes the feature, and the code can be downloaded from https://github.com/IQSS/geoconnect + +System Requirements +------------------- + +Hardware Requirements ++++++++++++++++++++++ + +A basic installation of Dataverse runs fine on modest hardware. For example, as of this writing the test installation at http://phoenix.dataverse.org is backed by a single virtual machine with two 2.8 GHz processors, 8 GB of RAM and 50 GB of disk. + +In contrast, the production installation at https://dataverse.harvard.edu is currently backed by six servers with two Intel Xeon 2.53 GHz CPUs and either 48 or 64 GB of RAM. The three servers with 48 GB of RAM are web frontends running Glassfish and Apache and are load balanced by a hardware device. The remaining three servers with 64 GB of RAM are the primary and backup database servers and a server dedicated to running Rserve. Multiple TB of storage are mounted from a SAN via NFS. The :ref:`advanced` section shows a diagram (a seventh server to host Geoconnect will probably be added). + +The Dataverse installation script will attempt to give Glassfish the right amount of RAM based on your system. + +Experimentation and testing with various hardware configurations is encouraged, of course, but do reach out as explained in the :doc:`intro` as needed for assistance.
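The "right amount of RAM" mentioned above ultimately comes down to picking a heap size (``-Xmx``) for Glassfish. As a rough illustration only (this is **not** the installer's exact formula), half of physical memory is a common starting point:

```shell
#!/bin/sh
# Rough illustration: suggest a Glassfish heap (-Xmx) of half the
# physical RAM. This is NOT the installer's exact formula.

half_ram_mb() {
    # /proc/meminfo reports MemTotal in kB; halve it and convert to MB.
    awk '/^MemTotal:/ { printf "%d", $2 / 2 / 1024 }' /proc/meminfo
}

echo "suggested JVM option: -Xmx$(half_ram_mb)m"
```

On an 8 GB machine this would suggest something in the neighborhood of ``-Xmx4096m``; such a value could then be applied with ``asadmin create-jvm-options`` if you ever need to tune the heap by hand.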
+ +Software Requirements ++++++++++++++++++++++ + +See :ref:`architecture` for an overview of required and optional components. The :doc:`prerequisites` section is oriented toward installing the software necessary to successfully run the Dataverse installation script. Pages on optional components contain more detail on the software requirements for each component. + +Clients are expected to be running a relatively modern browser. + +Decisions to Make +----------------- + +Here are some questions to keep in the back of your mind as you test and move into production: + +- How much storage do I need? +- Which features do I want based on :ref:`architecture`? +- Do I want to run Glassfish on the standard web ports (80 and 443) or do I prefer to have a proxy such as Apache in front? +- How many points of failure am I willing to tolerate? How much complexity do I want? +- How much does it cost to subscribe to a service to create persistent identifiers such as DOIs? + +Next Steps +---------- + +The :doc:`prerequisites` section will help you get ready to run the Dataverse installation script. + +.. |3webservers| image:: ./img/3webservers.png diff --git a/doc/sphinx-guides/source/installation/prerequisites.rst index 3c77683aa9e..a224c0397df 100644 --- a/doc/sphinx-guides/source/installation/prerequisites.rst +++ b/doc/sphinx-guides/source/installation/prerequisites.rst @@ -1,86 +1,100 @@ -==================================== +============= Prerequisites -==================================== +============= -.. _introduction: +Before running the Dataverse installation script, you must install and configure the following software, preferably on a distribution of Linux such as RHEL or its derivatives such as CentOS. After following all the steps below (which have been written based on CentOS 6), you can proceed to the :doc:`installation-main` section.
+ +You **may** find it helpful to look at how the configuration is done automatically by various tools such as Vagrant, Puppet, or Ansible. See the :doc:`prep` section for pointers on diving into these scripts. + +.. contents:: :local: Java ----------------------------- +---- + Dataverse requires Java 8 (also known as 1.8). +Installing Java +=============== + Dataverse should run fine with only the Java Runtime Environment (JRE) installed, but installing the Java Development Kit (JDK) is recommended so that useful tools for troubleshooting production environments are available. We recommend using Oracle JDK or OpenJDK. The Oracle JDK can be downloaded from http://www.oracle.com/technetwork/java/javase/downloads/index.html On Red Hat and similar Linux distributions, install OpenJDK with something like:: - $ yum install java-1.8.0-openjdk-devel + # yum install java-1.8.0-openjdk-devel If you have multiple versions of Java installed, Java 8 should be the default when ``java`` is invoked from the command line. You can test this by running ``java -version``. On Red Hat/CentOS you can make Java 8 the default with the ``alternatives`` command, having it prompt you to select the version of Java from a list:: - $ alternatives --config java + # alternatives --config java If you don't want to be prompted, here is an example of the non-interactive invocation:: - $ alternatives --set java /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java + # alternatives --set java /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java Glassfish ----------------------------- +--------- -Glassfish Version 4.1 is required. +Glassfish Version 4.1 is required. There are known issues with Glassfish 4.1.1 as chronicled in https://github.com/IQSS/dataverse/issues/2628, so it should be avoided until that issue is resolved. -**Important**: once Glassfish is installed, a new version of the WELD library (v2.2.10.SP1) must be downloaded and installed.
This fixes a serious issue in the library supplied with Glassfish 4.1. +Installing Glassfish +==================== +**Important**: once Glassfish is installed, a new version of the Weld library (v2.2.10.SP1) must be downloaded and installed. This fixes a serious issue in the library supplied with Glassfish 4.1 (see https://github.com/IQSS/dataverse/issues/647 for details). Please note that if you plan to front Glassfish with Apache, you must also patch Grizzly as explained in the :doc:`shibboleth` section. - Download and install Glassfish (installed in ``/usr/local/glassfish4`` in the example commands below):: - $ wget http://dlc-cdn.sun.com/glassfish/4.1/release/glassfish-4.1.zip - $ unzip glassfish-4.1.zip - $ mv glassfish4 /usr/local + # wget http://dlc-cdn.sun.com/glassfish/4.1/release/glassfish-4.1.zip + # unzip glassfish-4.1.zip + # mv glassfish4 /usr/local -- Remove the stock WELD jar; download WELD v2.2.10.SP1 and install it in the modules folder:: +- Remove the stock Weld jar; download Weld v2.2.10.SP1 and install it in the modules folder:: - $ cd /usr/local/glassfish4/glassfish/modules - $ /bin/rm weld-osgi-bundle.jar - $ wget http://central.maven.org/maven2/org/jboss/weld/weld-osgi-bundle/2.2.10.SP1/weld-osgi-bundle-2.2.10.SP1-glassfish4.jar - $ /usr/local/glassfish4/bin/asadmin start-domain domain1 + # cd /usr/local/glassfish4/glassfish/modules + # rm weld-osgi-bundle.jar + # wget http://central.maven.org/maven2/org/jboss/weld/weld-osgi-bundle/2.2.10.SP1/weld-osgi-bundle-2.2.10.SP1-glassfish4.jar + # /usr/local/glassfish4/bin/asadmin start-domain -- Verify Weld version:: +- Verify the Weld version:: - $ /usr/local/glassfish4/bin/asadmin osgi lb | grep 'Weld OSGi Bundle' + # /usr/local/glassfish4/bin/asadmin osgi lb | grep 'Weld OSGi Bundle' -PostgreSQL ----------------------------- +- Stop Glassfish and change ``-client`` to ``-server`` in ``domain.xml``:: -1.
Installation -================ + # /usr/local/glassfish4/bin/asadmin stop-domain + # vim /usr/local/glassfish4/glassfish/domains/domain1/config/domain.xml + +This recommendation comes from http://blog.c2b2.co.uk/2013/07/glassfish-4-performance-tuning.html among other places. + +Glassfish Init Script +===================== -Version 9.3 is recommended. +The Dataverse installation script will start Glassfish if necessary, but while you're configuring Glassfish, you might find the following init script helpful to have Glassfish start on boot. -1A. RedHat and similar systems: -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Adjust `this Glassfish init script <../_static/installation/files/etc/init.d/glassfish>`_ for your needs or write your own. -We recommend installing Postgres from the EPEL repository:: +It is not necessary to have Glassfish running before you execute the Dataverse installation script because it will start Glassfish for you. - $ wget http://yum.postgresql.org/9.3/redhat/rhel-6-x86_64/pgdg-centos93-9.3-1.noarch.rpm - rpm -ivh pgdg-centos93-9.3-1.noarch.rpm +PostgreSQL +---------- + +Installing PostgreSQL +======================= - $ yum install postgresql93-server.x86_64 - $ chkconfig postgresql-9.3 on - $ service postgresql-9.3 initdb - $ service postgresql-9.3 start +Version 9.x is required. Previous versions have not been tested. -1B. MacOS X: -~~~~~~~~~~~~~ +The version that ships with RHEL 6 and above is fine:: -A distribution from `http://www.enterprisedb.com `__ is recommended. Fink and MacPorts distributions are also readily available. See `http://www.postgresql.org/download/macosx/ `__ for more information. + # yum install postgresql-server + # service postgresql initdb + # service postgresql start -2. 
Configure access to PostgresQL for the installer script -========================================================== +Configure Access to PostgreSQL for the Installer Script +======================================================= -- The installer script needs to have direct access to the local PostgresQL server via Unix domain sockets. This is configured by the line that starts with "local all all" in the pg_hba.conf file. The location of this file may vary depending on the distribution. But if you followed the suggested installtion instructions above, it will be ``/var/lib/pgsql/9.3/data/pg_hba.conf`` on RedHat (and similar) and ``/Library/PostgreSQL/9.3/data/pg_hba.conf`` on MacOS. Make sure the line looks like this (it will likely be pre-configured like this already):: +- When using localhost for the database server, the installer script needs to have direct access to the local PostgreSQL server via Unix domain sockets. This is configured by the line that starts with ``local all all`` in the pg_hba.conf file. The location of this file may vary depending on the distribution. But if you followed the suggested installation instructions above, it will be ``/var/lib/pgsql/data/pg_hba.conf`` on RHEL and similar. Make sure the line looks like this (it will likely be pre-configured like this already):: local all all peer @@ -88,12 +102,12 @@ A distribution from `http://www.enterprisedb.com /proc/sys/vm/overcommit_memory - - # Set UTF8 as the default encoding: - LANG=en_US.UTF-8; export LANG - $ASADMIN start-domain domain1 - echo "." - ;; - stop) - echo -n "Stopping GlassFish server: glassfish" - - $ASADMIN stop-domain domain1 - echo "." - ;; - - *) - echo "Usage: /etc/init.d/glassfish {start|stop}" - exit 1 - esac - exit 0 - +---- + +The Dataverse search index is powered by Solr. 
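Once Solr is installed and running (per the steps that follow), a quick way to confirm it is answering is to hit its HTTP interface. This is a hypothetical helper, not part of the Dataverse tooling; it assumes the default port 8983.

```shell
#!/bin/sh
# Report whether Solr answers on its HTTP port (8983 by default).
# Hypothetical helper; not part of the Dataverse installer.

solr_up() {
    host=${1:-localhost}
    port=${2:-8983}
    if curl -sf "http://$host:$port/solr/" >/dev/null 2>&1; then
        echo "solr on $host:$port: up"
        return 0
    fi
    echo "solr on $host:$port: down"
    return 1
}

solr_up || true
```

Remember that, per the securing advice in this section, the Solr port should only ever be reachable from the Dataverse server itself.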
+ +Installing Solr +=============== + +Download and install Solr with these commands:: + + # wget https://archive.apache.org/dist/lucene/solr/4.6.0/solr-4.6.0.tgz + # tar xvzf solr-4.6.0.tgz + # rsync -auv solr-4.6.0 /usr/local/ + # cd /usr/local/solr-4.6.0/example/solr/collection1/conf/ + # cp -a schema.xml schema.xml.orig + +The reason for backing up the ``schema.xml`` file is that Dataverse requires a custom Solr schema to operate. This ``schema.xml`` file is contained in the "dvinstall" zip supplied in each Dataverse release at https://github.com/IQSS/dataverse/releases . Download this zip file, extract ``schema.xml`` from it, and put it into place (in the same directory as above):: + + # cp /tmp/schema.xml schema.xml + +With the Dataverse-specific schema in place, you can now start Solr:: + + # java -jar start.jar + +Solr Init Script +================ + +The command above will start Solr in the foreground which is good for a quick sanity check that Solr accepted the schema file, but starting Solr with an init script is recommended. You can attempt to adjust `this Solr init script <../_static/installation/files/etc/init.d/solr>`_ for your needs or write your own. + +Solr should be running before the installation script is executed. + +Securing Solr +============= + +Solr must be firewalled off from all hosts except the server(s) running Dataverse. Otherwise, any host that can reach the Solr port (8983 by default) can add or delete data, search unpublished data, and even reconfigure Solr. For more information, please see https://wiki.apache.org/solr/SolrSecurity + +You may want to poke a temporary hole in your firewall to play with the Solr GUI. More information on this can be found in the :doc:`/developers/dev-environment` section of the Developer Guide. + +jq +-- + +Installing jq +============= + +``jq`` is a command line tool for parsing JSON output that is used by the Dataverse installation script. 
https://stedolan.github.io/jq explains various ways of installing it, but a relatively straightforward method is described below. Please note that you must download the 64- or 32-bit version based on your architecture. In the example below, the 64-bit version is installed. We confirm it's executable and in our ``$PATH`` by checking the version (1.4 or higher should be fine):: + + # cd /usr/bin + # wget http://stedolan.github.io/jq/download/linux64/jq + # chmod +x jq + # jq --version + +Now that you have all the prerequisites in place, you can proceed to the :doc:`installation-main` section. diff --git a/doc/sphinx-guides/source/installation/upgrading.rst b/doc/sphinx-guides/source/installation/upgrading.rst new file mode 100644 index 00000000000..4ad14bb67d6 --- /dev/null +++ b/doc/sphinx-guides/source/installation/upgrading.rst @@ -0,0 +1,11 @@ +========= +Upgrading +========= + +When upgrading within Dataverse 4.x, you will need to follow the upgrade instructions for each intermediate version. + +Upgrades always involve deploying the latest war file but may also include running SQL scripts and updating the schema used by Solr. + +Please consult the release notes associated with each release at https://github.com/IQSS/dataverse/releases for more information. + +Upgrading from DVN 3.x is actually a migration due to the many changes. Migration scripts have been checked into the source tree but as of this writing it is expected that people will require assistance running them. Please reach out per the :doc:`intro` section.
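As a final check of the prerequisites covered earlier: beyond running ``jq --version``, you can confirm ``jq`` actually parses JSON the way the installation script relies on. The JSON document below is made up purely for illustration.

```shell
# Made-up JSON, purely to confirm jq extracts fields as expected.
echo '{"status":"OK","data":{"version":"4.0"}}' | jq -r '.data.version'
# prints: 4.0
```

The ``-r`` flag prints the raw string value rather than a JSON-quoted one, which is usually what shell scripts want.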