This repository contains notebooks for exploring a remotely hosted Gaia EDR3 x PanSTARRS dataset (1 billion rows, 1 terabyte) using vaex. This setup lets you compute multi-dimensional histograms of the full dataset in seconds without having to download the data. The GDE is a project funded by the Heising-Simons Foundation through Scialog, with PIs Joshua Peek (STScI/JHU) and Sergey Koposov (Edinburgh), and development led by Maarten Breddels.
10-introduction-vaex-gaia.ipynb
teaches you how to use vaex to connect to the remote dataset/dataframe with an API similar to Pandas. We will go through a basic workflow to show you how you can use vaex for data exploration.
20-voila-vaex-hr-diagram.ipynb
shows how to create a simple interactive dashboard with a Hertzsprung-Russell (HR) diagram.
30-voila-vaex-sky-hr-diagram.ipynb
shows how to create a fully interactive dashboard with a sky plot and a Hertzsprung-Russell (HR) diagram.
Assuming conda (otherwise see http://vaex.io/docs/installing.html), run the following commands:
$ git clone https://github.com/maartenbreddels/gde-examples
$ cd gde-examples
# assuming conda
$ conda install -c conda-forge vaex-core vaex-hdf5 vaex-viz vaex-server notebook voila
# otherwise using pip:
# $ pip install -r requirements.txt
# to run the notebook
$ jupyter notebook
# to run the dashboard
$ voila 30-voila-vaex-sky-hr-diagram.ipynb