ScienceDynamics

The package supports parsing and extracting data from bibliometric datasets, namely:

This package is an improved version of the code used in "Over-optimization of academic publishing metrics: observing Goodhart’s Law in action". Because the datasets are large, building the full datasets may take several hours and requires at least 450GB of free disk space. We suggest running the computations on a server with plenty of memory (we used a server with 12 cores and 1TB of RAM).
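Before starting a full build, it can help to confirm that the target disk actually has the 450GB mentioned above. The snippet below is a minimal sketch that uses only the standard library (`shutil.disk_usage`); the helper names `free_space_gb` and `has_enough_space` are hypothetical and not part of ScienceDynamics. It checks the home directory by default, since the data is stored under ~/.scidyn2 unless you choose another location.

```python
import shutil
from pathlib import Path

# Minimum free space stated in this README for building the full datasets.
REQUIRED_FREE_GB = 450


def free_space_gb(path="~"):
    """Return the free space, in GB (10**9 bytes), on the filesystem holding `path`."""
    usage = shutil.disk_usage(str(Path(path).expanduser()))
    return usage.free / 1e9


def has_enough_space(path="~", required_gb=REQUIRED_FREE_GB):
    """True if the filesystem holding `path` has at least `required_gb` GB free."""
    return free_space_gb(path) >= required_gb


if __name__ == "__main__":
    free = free_space_gb()
    print("Free space: {:.1f} GB (need {} GB)".format(free, REQUIRED_FREE_GB))
    if not has_enough_space():
        print("Warning: not enough free space to build the full datasets.")
```

Pass the directory you intend to use as `dataset_dir` if it lives on a different disk than your home directory.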

Folder Structure:

examples - Code examples.
examples/Coronavirus - The code used in "Scientometric Trends for Coronaviruses and Other Emerging Viral Infections"
examples/Over-optimization - The code used in "Over-optimization of academic publishing metrics: observing Goodhart’s Law in action"
ScienceDynamics - Library source code.

System Requirements

Python 3.6, 3.7
As much RAM as possible. Smaller tables such as Affiliations should work with 16GB of RAM or less. However, larger tables such as ExtendedPapers require a lot of RAM. Since Turi Create is not memory-bound, it should still be able to load the data, but performance will drop drastically and some functions may crash the kernel with out-of-memory errors.
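A quick pre-flight check of these requirements might look like the following sketch. It is an illustration, not part of the package: `python_supported` and `total_ram_gb` are hypothetical helpers, and the RAM reading relies on `os.sysconf`, which exists on Linux and macOS (and therefore under WSL) but not on native Windows, where the helper returns None.

```python
import os
import sys

# Versions listed under System Requirements.
SUPPORTED_PYTHON = {(3, 6), (3, 7)}
# Smaller tables (e.g. Affiliations) should work with 16GB of RAM or less.
SMALL_TABLE_RAM_GB = 16


def python_supported(version_info=sys.version_info):
    """True if the (major, minor) interpreter version is one this README lists."""
    return tuple(version_info[:2]) in SUPPORTED_PYTHON


def total_ram_gb():
    """Total physical RAM in GB, or None where os.sysconf is unavailable."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    except (AttributeError, ValueError, OSError):
        return None


if __name__ == "__main__":
    if not python_supported():
        print("Warning: ScienceDynamics lists Python 3.6 and 3.7 as supported.")
    ram = total_ram_gb()
    if ram is not None and ram < SMALL_TABLE_RAM_GB:
        print("Only {:.0f} GB of RAM: large tables such as ExtendedPapers "
              "will likely be slow or run out of memory.".format(ram))
```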

Supported Platforms

macOS 10.12+
Linux (with glibc 2.10+)
Windows 10 (via WSL)

Installation

Installation from scratch:

git clone https://github.com/data4goodlab/ScienceDynamics
cd ScienceDynamics
pip install -r requirements.txt
pip install pycld2
conda install --yes pycurl  # install pycurl before wptools
pip install wptools
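To verify the installation, you can check that the packages named in these steps are importable without actually importing them. This is a minimal sketch built on `importlib.util.find_spec`; the module list is an assumption based on the packages this README mentions (the authoritative list is requirements.txt), and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Import names of packages mentioned in this README; extend with the
# contents of requirements.txt as needed (this tuple is an assumption).
EXPECTED_MODULES = ("turicreate", "pycld2", "pycurl", "wptools")


def missing_modules(names=EXPECTED_MODULES):
    """Return the subset of `names` that cannot be found on the current sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected modules are importable.")
```

`find_spec` only locates top-level modules here, so a missing compiled dependency inside a found package can still fail at import time.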

If pycld2 fails to install, make sure GCC and g++ are installed. On Debian-based distributions, run:

apt-get install -y gcc g++

Note: if you encounter libblas or liblapack errors, see Turi Create's LINUX_INSTALL.md.
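When troubleshooting the build errors above, a rough diagnostic is to check whether the compilers and the BLAS/LAPACK libraries are visible at all. The sketch below uses `shutil.which` and `ctypes.util.find_library` from the standard library; `build_prerequisites` is a hypothetical helper, and a library being found does not guarantee that Turi Create will link against it.

```python
import shutil
from ctypes.util import find_library


def build_prerequisites():
    """Report which compilers and linear-algebra libraries are visible (None if not found)."""
    return {
        "gcc": shutil.which("gcc"),
        "g++": shutil.which("g++"),
        "libblas": find_library("blas"),
        "liblapack": find_library("lapack"),
    }


if __name__ == "__main__":
    for name, location in build_prerequisites().items():
        print("{:10} {}".format(name, location or "NOT FOUND"))
```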

Docker:

Build or download the Docker image sciencedynamics.tar, then run docker load --input sciencedynamics.tar. Since Docker containers do not persist data by default, we recommend mapping the data directories to directories on the host machine. For example:

docker run -p 127.0.0.1:9000:8888 -v $(pwd)/scidyn2:/root/.scidyn2 -v $(pwd)/ScienceDynamics/examples/Coronavirus/Data/:/ScienceDynamics/examples/Coronavirus/Data/ --name corona sciencedynamics:1.2

The Jupyter notebook will be accessible at localhost:9000.
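If you launch the container from a script, the same command can be assembled as an argument list. This is a sketch, not a tool shipped with the package: `docker_run_args` and `as_shell_command` are hypothetical helpers, and `base_dir` stands in for `$(pwd)` (the directory containing the cloned repository). Creating the host directories first matters because Docker's `-v` flag creates a missing host path itself, owned by root.

```python
import os
import shlex

# Values copied from the docker run example above.
IMAGE = "sciencedynamics:1.2"
HOST_PORT = 9000       # Jupyter is then reachable at localhost:9000
CONTAINER_PORT = 8888  # port Jupyter listens on inside the container


def docker_run_args(base_dir, name="corona", create_dirs=True):
    """Build the `docker run` argument list for the example above.

    base_dir plays the role of $(pwd). When create_dirs is True, missing
    host directories are created up front so Docker does not create them as root.
    """
    base_dir = os.path.abspath(base_dir)
    mounts = [
        (os.path.join(base_dir, "scidyn2"), "/root/.scidyn2"),
        (os.path.join(base_dir, "ScienceDynamics", "examples", "Coronavirus", "Data"),
         "/ScienceDynamics/examples/Coronavirus/Data/"),
    ]
    args = ["docker", "run", "-p", "127.0.0.1:{}:{}".format(HOST_PORT, CONTAINER_PORT)]
    for host_dir, container_dir in mounts:
        if create_dirs:
            os.makedirs(host_dir, exist_ok=True)
        args += ["-v", "{}:{}".format(host_dir, container_dir)]
    args += ["--name", name, IMAGE]
    return args


def as_shell_command(args):
    """Quote each argument so the command can be pasted into a POSIX shell."""
    return " ".join(shlex.quote(a) for a in args)
```

For example, `subprocess.run(docker_run_args("."), check=True)` would start the container, while `print(as_shell_command(docker_run_args(".", create_dirs=False)))` only shows the command.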

An example of how to load the data used in "Scientometric Trends for Coronaviruses and Other Emerging Viral Infections" is available in examples/Coronavirus.

Example:

from ScienceDynamics.datasets import MicrosoftAcademicGraph
mag = MicrosoftAcademicGraph()
mag.extended_papers

The data is downloaded the first time a specific table is accessed. By default, it is saved in ~/.scidyn2. You can choose a different directory to download the data to (or load it from) with the dataset_dir= parameter.
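The download-on-first-access behaviour described above follows a common lazy-loading pattern. The sketch below illustrates that pattern only; `LazyDataset`, `table_path` and the `download_table` callback are hypothetical names, not ScienceDynamics internals, and the demo uses a fake downloader so it runs offline.

```python
import tempfile
from pathlib import Path

# Default location named in this README; callers may override it,
# mirroring the role of the library's dataset_dir= parameter.
DEFAULT_DATASET_DIR = Path("~/.scidyn2").expanduser()


class LazyDataset:
    """Hypothetical sketch of download-on-first-access (not the library's class)."""

    def __init__(self, download_table, dataset_dir=DEFAULT_DATASET_DIR):
        # download_table(name, dest_path) is a caller-supplied (hypothetical) function.
        self._download_table = download_table
        self.dataset_dir = Path(dataset_dir)

    def table_path(self, name):
        """Return the local path of table `name`, downloading it only if missing."""
        path = self.dataset_dir / name
        if not path.exists():
            self.dataset_dir.mkdir(parents=True, exist_ok=True)
            self._download_table(name, path)
        return path


def demo():
    """Access the same table twice; only the first access triggers a download."""
    downloads = []

    def fake_download(name, dest):
        # Stand-in for a real network download: record the call, write a stub file.
        downloads.append(name)
        dest.write_text("stub data")

    with tempfile.TemporaryDirectory() as tmp:
        ds = LazyDataset(fake_download, dataset_dir=tmp)
        ds.table_path("Affiliations")
        ds.table_path("Affiliations")
    return len(downloads)


if __name__ == "__main__":
    print(demo())  # prints 1
```

Checking for the file on disk, rather than keeping an in-memory flag, means a table downloaded in one session is reused in the next.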

Alternatively, you can download all the data with a single command:

from ScienceDynamics.datasets import MicrosoftAcademicGraph
mag = MicrosoftAcademicGraph(download=True)
mag.extended_papers

To Do

Refactoring
Documentation
