- About the project
- Installation
- Initialization
- Launch & develop GarganText
- Use cases
- GraphQL
- PostgreSQL
GarganText is a collaborative web-decentralized-based macro-service platform for the exploration of unstructured texts. It combines tools from natural language processing, text-data-mining bricks, complex networks analysis algorithms and interactive data visualization tools to pave the way toward new kinds of interactions with your textual and digital corpora.
This software is free (as "Libre" in French) software, developed by the CNRS Complex Systems Institute of Paris Île-de-France (ISC-PIF) and its partners.
GarganText Project: this repository builds the backend, which serves the frontend built from the purescript-gargantext repository.
Disclaimer: since this project is still in development, this document remains a work in progress. Please report any issues you encounter and help improve this documentation.
- Install:
- Clone the project.
git clone https://gitlab.iscpif.fr/gargantext/haskell-gargantext.git
cd haskell-gargantext
This project can be built with either Stack or Cabal. We keep cabal.project up to date (which allows us to build with cabal by default), but we also support stack thanks to cabal2stack, which generates a valid stack.yaml from a cabal.project. Because Gargantext requires a particular set of system dependencies (C++ libraries, toolchains, etc.), we use Nix to set up an environment with all the required system dependencies in a sandboxed and isolated fashion.
As said, Gargantext requires Nix to provide system dependencies (for example, C libraries), but its use is limited to that. In order to install Nix:
sh <(curl -L https://nixos.org/nix/install) --daemon
Verify the installation is complete with
nix-env --version
nix-env (Nix) 2.19.2
Important: Before building the project with either stack or cabal, you need to be in the correct Nix shell, which will fetch all the required system dependencies. To do so, just type inside your haskell-gargantext folder:
nix-shell
This will take a bit of time as it has to download/build the dependencies, but this will be needed only the first time.
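If you script your builds, a small guard can catch commands that are started outside the Nix shell. The sketch below relies on the IN_NIX_SHELL environment variable, which nix-shell sets inside the environment it creates. The function name in_nix_shell is hypothetical and is not part of this repository.

```shell
# Hypothetical helper (not part of the repository): succeeds only when
# the current shell was started by nix-shell, which sets IN_NIX_SHELL.
in_nix_shell() {
  [ -n "${IN_NIX_SHELL:-}" ]
}

# Usage: report whether it is safe to build from this shell.
if in_nix_shell; then
  echo "inside nix-shell: safe to run cabal or stack"
else
  echo "not in nix-shell: run 'nix-shell' from the haskell-gargantext folder first"
fi
```

The check only reads an environment variable, so it costs nothing to call at the top of a build script.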
Create a cabal.project.local file (don't commit it to git!):
package gargantext
  ghc-options: -fwrite-ide-info -hiedir=".stack-work/hiedb" -O0
package gargantext-admin
  ghc-options: -O0
package gargantext-cli
  ghc-options: -O0
package gargantext-db-obfuscation
  ghc-options: -O0
package gargantext-import
  ghc-options: -O0
package gargantext-init
  ghc-options: -O0
package gargantext-invitations
  ghc-options: -O0
package gargantext-phylo
  ghc-options: -O0
package gargantext-server
  ghc-options: -O0
package gargantext-upgrade
  ghc-options: -O0
package gargantext-graph
  ghc-options: -O0
package hmatrix
  ghc-options: -O0
package sparse-linear
  ghc-options: -O0
First, inside nix-shell:
cabal update
cabal install
Alternatively, if you want to run the command "from the outside", in your current shell:
nix-shell --run "cabal update"
nix-shell --run "cabal install"
Install Stack (or Haskell Tool Stack):
curl -sSL https://get.haskellstack.org/ | sh
Verify the installation is complete with
stack --version
Version 2.9.1
NOTE: The default build (with optimizations) requires large amounts of RAM (at least 16GB). To avoid long compilation times and heavy swapping on your machine, it is recommended to run stack build with the --fast flag, i.e.:
stack build --fast
(Section for developers using stack only)
Once you have a valid version of stack, building requires generating a valid stack.yaml. This can be obtained by installing cabal2stack:
git clone https://github.com/iconnect/cabal2stack.git
cd cabal2stack
Then, depending on what build system you are using, either build with cabal install --overwrite-policy=always or stack install.
And finally:
cabal2stack --system-ghc --allow-newer --resolver lts-21.17 --resolver-file devops/stack/lts-21.17.yaml -o stack.yaml
stack build
The good news is that you don't have to do all of this manually; during development, after modifying cabal.project, it's enough to run:
./bin/update-project-dependencies
# If docker is not installed:
# curl -sSL https://gitlab.iscpif.fr/gargantext/haskell-gargantext/raw/dev/devops/docker/install_docker | sh
cd devops/docker
docker compose up
The initialization schema should be loaded automatically (from devops/postgres/schema.sql).
stack install
cp gargantext.ini_toModify gargantext.ini
Do not worry: .gitignore prevents this file from being added to the repository by mistake, so you can safely change the passwords in gargantext.ini.
~/.local/bin/gargantext-init "gargantext.ini"
Now, user1 is created with password 1resu.
From the Backend root folder (haskell-gargantext):
git clone ssh://git@gitlab.iscpif.fr:20022/gargantext/purescript-gargantext.git
Note: here, the Cabal method is used by default.
From the Backend root folder (haskell-gargantext):
./start
# The start script runs the following commands:
# - `./bin/install` to update and build the project
# - `docker compose up` to run the Docker for postgresql from devops/docker folder
# - `cabal run gargantext-server -- --ini gargantext.ini --run Prod` to run other services through `nix-shell`
For frontend development and compilation, see the Frontend Readme.md
From nix shell:
cabal v2-test --test-show-details=streaming
Or, from "outside":
nix-shell --run "cabal v2-test --test-show-details=streaming"
When development is needed on libraries (for instance, the HAL crawler in https://gitlab.iscpif.fr/gargantext/crawlers):
- Ongoing development (on local repo):
  - In cabal.project:
    - add ../hal to packages:
    - turn off (temporarily) the hal entry in source-repository-package
  - When changes work and tests are OK, commit in repo hal
- When changes are committed / merged:
  - Get the hash id, and edit cabal.project with the new commit id
  - run ./bin/update-project-dependencies
    - get an error that the sha256 doesn't match, so update ./bin/update-project-dependencies with the new sha256 hash
    - run ./bin/update-project-dependencies again (to make sure it's a fixed point now)
Note: without stack.yaml we would only have to fix the cabal.project -> source-repository-package commit id. Sha256 is there to make sure CI reruns the tests.
~/.local/bin/stack --docker exec gargantext-server -- --ini "gargantext.ini" --run Prod
Then you can log in with user1 / 1resu.
stack --docker exec gargantext-cli -- CorpusFromGarg.csv ListFromGarg.csv Ouput.json
We store the repository in the repos directory, in the CBOR file format. To decode it to JSON and analyze it, say with jq, use the following command:
cat repos/repo.cbor.v5 | stack exec gargantext-cbor2json | jq .
To build documentation, run:
stack build --haddock --no-haddock-deps --fast
The generated documentation is placed in .stack-work/dist/x86_64-linux-nix/Cabal-3.2.1.0/doc/html/gargantext.
Some introspection information.
Playground is located at http://localhost:8008/gql
{
  __schema {
    types {
      name
    }
  }
}
{
  __type(name:"User") {
    fields {
      name
      description
      type {
        name
      }
    }
  }
}
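The same introspection queries can also be sent from a terminal. The sketch below wraps a query in the usual {"query": "..."} JSON envelope and posts it with curl. It assumes that http://localhost:8008/gql accepts POST requests with a JSON body, which this document does not confirm; the function names gql_payload and gql_query are hypothetical.

```shell
# Hypothetical helper: wrap a single-line GraphQL query in a JSON envelope.
# Only double quotes are escaped; the query must not contain backslashes
# or newlines.
gql_payload() {
  printf '{"query":"%s"}' "$(printf '%s' "$1" | sed 's/"/\\"/g')"
}

# Hypothetical helper: POST the query to the endpoint.
# Assumption: the playground URL also accepts JSON POST requests.
gql_query() {
  curl -s -X POST \
    -H 'Content-Type: application/json' \
    -d "$(gql_payload "$1")" \
    http://localhost:8008/gql
}

# Usage (requires a running server):
#   gql_query '{ __type(name:"User") { fields { name } } }' | jq .
```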
https://www.cloudytuts.com/tutorials/docker/how-to-upgrade-postgresql-in-docker-and-kubernetes/
To upgrade PostgreSQL in Docker containers, for example from 11.x to 14.x, simply run:
docker exec -it <container-id> pg_dumpall -U gargantua > 11-db.dump
Then, shut down the container and replace the image section in devops/docker/docker-compose.yaml with postgres:14. It is also good practice to create a new volume, say garg-pgdata14, and bind the new container to it. If you want to keep using the same volume, remember to remove it first, like so:
docker-compose rm postgres
docker volume rm docker_garg-pgdata
Now, start the container and execute:
# need to drop the empty DB first, since schema will be created when restoring the dump
docker exec -i <new-container-id> dropdb -U gargantua gargandbV5
# recreate the db, but empty with no schema
docker exec -i <new-container-id> createdb -U gargantua gargandbV5
# now we can restore the dump
docker exec -i <new-container-id> psql -U gargantua -d gargandbV5 < 11-db.dump
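After restoring the dump, it can be useful to confirm that the new container really runs the expected PostgreSQL version. The sketch below reuses the role and database names from the steps above. The function name pg_container_version is hypothetical, and the container id is whatever docker ps reports for your new container.

```shell
# Hypothetical helper: print the PostgreSQL server version running in the
# given container, using the gargantua role and gargandbV5 database.
# -A (unaligned) and -t (tuples only) make psql print just the value.
pg_container_version() {
  container_id="$1"
  docker exec -i "$container_id" \
    psql -U gargantua -d gargandbV5 -At -c 'SHOW server_version;'
}

# Usage:
#   pg_container_version <new-container-id>
```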
There is a solution using pg_upgradecluster, but it requires managing both the version 13 and version 14 clusters. Here is a simpler way to upgrade.
First save your data:
sudo su postgres
pg_dumpall > gargandb.dump
Upgrade postgresql:
sudo apt install postgresql-14 postgresql-client-14
sudo apt remove --purge postgresql-13
Restore your data:
sudo su postgres
psql < gargandb.dump
You may need to restore the gargantua password:
ALTER ROLE gargantua PASSWORD 'yourPasswordIn_gargantext.ini';
You may also need to change the database connection port to 5433 in your gargantext.ini file.