
A Java-based spreadsheet system for automated data cleaning, built by modifying OpenRefine.


Joannechiao18/DataRefine

DataRefine

Abstract

Introduction

DataRefine is a Java-based spreadsheet system that lets you load data, understand it, clean it up, reconcile it, and augment it with data from the web, all from a web browser.

Features

  1. Outlier detection facet using a nearest-neighbor (NN)-based interquartile range (IQR) for numeric data, e.g. time series, image metadata
  2. Semantic facet via the inference API of a pre-trained BERT model, e.g. people's names, stocks, book titles, streets
  3. Type recommendation results sorting for non-numeric data
  4. UI Renovation
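
The README does not spell out how the NN-based IQR outlier facet is computed, so the following is only a minimal sketch of one plausible reading, not DataRefine's actual implementation. It assumes each value is scored by the distance to its k-th nearest neighbor, and a value is flagged when that distance exceeds the upper Tukey fence (Q3 + 1.5 × IQR) of all such distances. The class name `NnIqrOutlierSketch`, its method names, the parameter `k`, and the sample readings are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: NOT taken from the DataRefine source code.
public class NnIqrOutlierSketch {

    /** Distance from each value to its k-th nearest neighbor (1-D, brute force). */
    public static double[] kthNeighborDistances(double[] values, int k) {
        int n = values.length;
        if (k < 1 || k >= n) {
            throw new IllegalArgumentException("k must satisfy 1 <= k < number of values");
        }
        double[] result = new double[n];
        for (int i = 0; i < n; i++) {
            double[] distances = new double[n - 1];
            int idx = 0;
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    distances[idx++] = Math.abs(values[i] - values[j]);
                }
            }
            Arrays.sort(distances);
            result[i] = distances[k - 1];
        }
        return result;
    }

    /** Quantile of a sorted array using linear interpolation, with q in [0, 1]. */
    public static double quantile(double[] sorted, double q) {
        double pos = q * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    /** Indices whose k-NN distance lies above Q3 + 1.5 * IQR of all k-NN distances. */
    public static List<Integer> outlierIndices(double[] values, int k) {
        double[] dist = kthNeighborDistances(values, k);
        double[] sorted = dist.clone();
        Arrays.sort(sorted);
        double q1 = quantile(sorted, 0.25);
        double q3 = quantile(sorted, 0.75);
        double upperFence = q3 + 1.5 * (q3 - q1);

        List<Integer> outliers = new ArrayList<>();
        for (int i = 0; i < dist.length; i++) {
            if (dist[i] > upperFence) {
                outliers.add(i);
            }
        }
        return outliers;
    }

    public static void main(String[] args) {
        // Made-up sample readings; the last value is far from all others.
        double[] readings = {10.0, 10.5, 11.0, 9.5, 10.2, 50.0};
        System.out.println(outlierIndices(readings, 2)); // prints [5]
    }
}
```

Scoring by neighbor distance rather than by raw value means an isolated point in the middle of the range can also be flagged, which a plain IQR on the values would miss. The brute-force neighbor search is quadratic and would need an index or sorting trick for large columns.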

Visual Results

🔨 Getting Started

  1. Clone this GitHub repo:
     git clone https://github.com/Joannechiao18/DataRefine.git
  2. Install JDK 8, Apache Maven, and Eclipse.
  3. Import the cloned project into Eclipse. (Remember to uncheck extensions and packaging in the import window.)
  4. Create a run configuration with the base directory set to ${workspace_loc:/openrefine} and Goals set to exec:java.
  5. Click Run. DataRefine will be available on localhost at http://127.0.0.1:3333/.

Acknowledgement

DataRefine extends OpenRefine: https://github.com/OpenRefine/OpenRefine

Credits

This software was created by Metaweb Technologies, Inc. and originally written and conceived by David Huynh dfhuynh@google.com. Metaweb Technologies, Inc. was acquired by Google, Inc. in July 2010 and the product was renamed Google Refine. In October 2012, it was renamed OpenRefine as it transitioned to a community-supported product.
