Awesome lists about all kinds of topics and tools interesting to D-Labbers
What is an awesome list?
Only put stuff on the list that you or another D-Labber can personally recommend. It is better to leave stuff out than to include too much. Read the Awesome Manifesto to find out more about what this list is for.
Or if you'd like to check out stuff that is awesome to people outside of D-Lab, then start here:
- Datasets
- Natural Language Processing (NLP)
- Rosetta Stones
- R
- Python
- Databases
- Systems Administration
- Cloud computing
- Reproducibility
- Case.Law - all official, book-published United States case law: every volume designated as an official report of decisions by a court within the United States.
- DEA Pain Pills Database - The Washington Post published a significant portion of a database that tracks the path of every opioid pain pill, from manufacturer to pharmacy, in the United States between 2006 and 2012.
- Awesome Public Data - a list of topic-centric public data sources collected and tidied from blogs, answers, and user responses.
- tidytweetjson - R package for Turning Tweet JSON Files into a Tidyverse-ready Dataframe. The package takes 18 minutes to turn 1 million tweets into a dataframe.
- tidyethnicnews - R package for turning one of the largest databases on ethnic newspapers and magazines (Ethnic NewsWatch) into a tidyverse-ready dataframe. The package takes 0.0005 seconds to turn 100 newspaper articles into a tidy dataframe.
- California COVID Assessment Tool - This repository contains an application written in Shiny, usable with any US state, that helps assess the many different models available for understanding COVID-19 transmission and spread. It brings together several publicly available data sources and can be supplemented with your own data to improve the assessment.
- Tracking Progress in Natural Language Processing - Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
- Rosetta: Python, R, Stata Rosetta Stone. Projects implemented in each language side-by-side.
- Stata to Pandas Cross-Walk
- Data Science Rosetta Stone - A Tutorial of and Translation between Data Science Programming Languages
- Awesome R - more awesomeness related to this topic.
- rio: A Swiss-Army Knife for Data I/O - Import, Export, and Convert Data Files including web-based import, reading compressed files directly without explicit decompression, and 'convert()' function for converting between file types.
- makereproducible: R package for making a project computationally reproducible before sharing it
- Working with PDFs in Python - Describes a range of Python libraries and examples for working with PDFs: Reading and Splitting Pages; Adding Images and Watermarks; Inserting, Deleting, and Reordering Pages
- Awesome Python - more awesomeness related to this topic.
- SQLite - A completely embedded, full-featured relational database in a few 100k that you can include right into your project.
- sqlitebiter - a CLI tool to convert CSV / Excel / HTML / JSON / and many other formats to a SQLite database file.
- Awesome SQL - more awesomeness related to this topic.
- fuzzy string matching with PostgreSQL - examples of different ways to match strings using PostgreSQL and extensions.
- binder-postgres - Demo of launching a binderhub notebook server with a free running Postgres server.
- SQL Join Types Explained in Visuals - Simple, useful visual explanation of joins in SQL.
- Understanding Joins in Relational data | R for Data Science - Visual explanation of joins in SQL with the addition of R code and variables.
- miller - With Miller, you get to use named fields without needing to count positional indices, using familiar formats such as CSV, TSV, JSON, and positionally-indexed.
- q - Run SQL directly on CSV or TSV files.
- jq - jq is a lightweight and flexible command-line JSON processor.
- jid - JSON Incremental Digger to drill down interactively by using filtering queries like jq.
- Ops School - Comprehensive program that will help you learn to be an operations engineer.
- Awesome Sysadmin - more awesomeness related to this topic.
- Binder - Turns a Git repo into a collection of interactive notebooks. A great tool for teaching workshops.
- The Turing Way handbook - a handbook to reproducible, ethical and collaborative data science.
- MRAN Timemachine - For the purpose of reproducibility, MRAN hosts daily snapshots of the CRAN R packages and R releases as far back as Sept. 17, 2014.
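
To make the SQLite and SQL join entries in the Databases part of the list a bit more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, and rows (`authors`, `papers`, and so on) are made up for illustration and do not come from any of the linked resources; the function name `join_demo` is likewise hypothetical.

```python
import sqlite3


def join_demo():
    """Build a throwaway in-memory database and compare INNER and LEFT joins."""
    # ":memory:" keeps the whole database inside this process; no file is written.
    conn = sqlite3.connect(":memory:")
    # Hypothetical example tables and rows, invented for this sketch.
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE papers (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
        INSERT INTO papers VALUES
            (10, 1, 'Notes'), (11, 1, 'Sketches'), (12, 2, 'Compilers');
        """
    )
    # INNER JOIN keeps only authors that have at least one matching paper.
    inner = conn.execute(
        """
        SELECT a.name, p.title
        FROM authors AS a
        INNER JOIN papers AS p ON p.author_id = a.id
        ORDER BY a.id, p.id
        """
    ).fetchall()
    # LEFT JOIN keeps every author; authors without papers get NULL (None) titles.
    left = conn.execute(
        """
        SELECT a.name, p.title
        FROM authors AS a
        LEFT JOIN papers AS p ON p.author_id = a.id
        ORDER BY a.id, p.id
        """
    ).fetchall()
    conn.close()
    return inner, left


if __name__ == "__main__":
    inner_rows, left_rows = join_demo()
    print("INNER JOIN:", inner_rows)
    print("LEFT JOIN: ", left_rows)
```

The only difference between the two results is the author with no papers: the inner join drops that row, while the left join keeps it with an empty title.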