😎 Awesome lists about all kinds of topics and tools interesting to D-Labbers

dlab-berkeley/awesome-dlab

awesome-dlab

What is an awesome list?

Only put stuff on the list that you or another D-Labber can personally recommend. It is better to leave stuff out than to include too much. Read the Awesome Manifesto to find out more about what this list is for.

Or if you'd like to check out stuff that is awesome to people outside of D-Lab, then start here: Awesome

Contents

Datasets

  • Case.Law - all official, book-published United States case law: every volume designated as an official report of decisions by a court within the United States.
  • DEA Pain Pills Database - The Washington Post published a significant portion of a database that tracks the path of every opioid pain pill, from manufacturer to pharmacy, in the United States between 2006 and 2012.
  • Awesome Public Data - a list of topic-centric public data sources collected and tidied from blogs, answers, and user responses.
  • tidytweetjson - R package for turning Tweet JSON files into a tidyverse-ready dataframe. The package takes 18 minutes to turn 1 million tweets into a dataframe.
  • tidyethnicnews - R package for turning one of the largest databases on ethnic newspapers and magazines (Ethnic NewsWatch) into a tidyverse-ready dataframe. The package takes 0.0005 seconds to turn 100 newspaper articles into a tidy dataframe.
  • California COVID Assessment Tool - This repository contains a Shiny application, usable for any US state, that helps assess the many different models available for understanding COVID-19 transmission and spread. It brings together several publicly available data sources and can be supplemented with your own data to improve the assessment.

Natural Language Processing (NLP)

Rosetta Stones

R

  • Awesome R - more awesomeness related to this topic.

  • rio: A Swiss-Army Knife for Data I/O - Import, Export, and Convert Data Files including web-based import, reading compressed files directly without explicit decompression, and 'convert()' function for converting between file types.

  • makereproducible: R package for making a project computationally reproducible before sharing it

PDF

  • Working with PDFs in Python - Describes a range of Python libraries and examples for working with PDFs: reading and splitting pages; adding images and watermarks; inserting, deleting, and reordering pages.

Python

Databases

Bash

  • miller - With Miller, you get to use named fields without needing to count positional indices, using familiar formats such as CSV, TSV, JSON, and positionally indexed data.
  • q - Run SQL directly on CSV or TSV files.
  • jq - jq is a lightweight and flexible command-line JSON processor.
  • jid - JSON Incremental Digger to drill down interactively by using filtering queries like jq.
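
For anyone wondering what "named fields" and "SQL directly on CSV" look like in practice, here is a minimal Python sketch of the same idea using only the standard library. It is not how miller or q work internally and does not use their command-line syntax. The sample data, the table name `t`, and the function names are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a CSV file on disk.
CSV_TEXT = """name,dept,salary
Ada,eng,120
Grace,eng,130
Linus,ops,100
"""


def load_csv_into_sqlite(csv_text, table="t"):
    """Load CSV text into an in-memory SQLite table.

    The header row supplies the column names, so later queries can
    refer to fields by name instead of by position.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames  # reads the header row
    rows = list(reader)

    conn = sqlite3.connect(":memory:")
    col_defs = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')

    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row[c] for c in columns) for row in rows],
    )
    return conn


def total_salary_by_dept(csv_text):
    """Run a GROUP BY query over the CSV data and return (dept, total) rows."""
    conn = load_csv_into_sqlite(csv_text)
    # CSV values arrive as text, so cast before summing.
    query = (
        "SELECT dept, SUM(CAST(salary AS INTEGER)) "
        "FROM t GROUP BY dept ORDER BY dept"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for dept, total in total_salary_by_dept(CSV_TEXT):
        print(dept, total)
```

The tools above give you this kind of query as a one-liner in the shell, which is why they are on the list. The sketch is only meant to show the underlying idea.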

Systems Administration

  • Ops School - Comprehensive program that will help you learn to be an operations engineer.
  • Awesome Sysadmin - more awesomeness related to this topic.

Cloud Computing

  • Binder - Turns a Git repo into a collection of interactive notebooks. A great tool for teaching workshops.

Reproducibility

  • The Turing Way handbook - a handbook to reproducible, ethical and collaborative data science.
  • MRAN Timemachine - For the purpose of reproducibility, MRAN hosts daily snapshots of the CRAN R packages and R releases as far back as Sept. 17, 2014.
