Open-source natural language enrichments at your fingertips.

Apache 2.0 License · GitHub Discussions · Discord · Twitter · LinkedIn · YouTube · Roadmap · Playground · Website

Browse bricks to find gold nuggets for your projects; enrich your texts e.g. with sentence complexity estimations, sentiment analysis, and more.

Why bricks?

We're aiming to build a library of off-the-shelf natural language enrichments that can be used in any project as well as directly in our main project refinery. We're building bricks to make it easier for developers to build better products. That's where the name comes from. bricks is a library not in the sense that you pip install it into your project, but in the sense that you copy and paste the code from the online platform.

Demo

Click on the image or here to watch the demo.

What are classifiers, extractors and generators?

In this repository, we refer to all three collectively as modules.

  • classifiers are modules that assign a given text to a specific category. For example, a module that classifies a text as news or blog would go into this folder. Classifiers can also produce other enrichments, such as detecting the language of a text.
  • extractors are modules that retrieve specific information from a given text. For example, a module that extracts the author of a text would go into this folder.
  • generators create new content based on a given text, or filtersets for refinery with pre-defined content. For example, a module that translates one language into another language would be a generator.
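
To make the distinction concrete, here is a minimal sketch with one toy function per module type. The function names, the length threshold, and the email pattern are assumptions made up for illustration; they are not modules from bricks.

```python
import re

# Toy illustrations only: none of these functions exist in bricks.

def classify_text_length(text: str) -> str:
    """Classifier: assign the whole text to exactly one category."""
    return "short" if len(text.split()) < 10 else "long"


# Deliberately simple pattern, good enough for the sample below.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def extract_emails(text: str) -> list[str]:
    """Extractor: pull specific pieces of information out of the text."""
    return EMAIL_PATTERN.findall(text)


def generate_lowercased(text: str) -> str:
    """Generator: produce new content derived from the text."""
    return text.lower()


sample = "Contact Jane at jane.doe@example.com for details."
print(classify_text_length(sample))
print(extract_emails(sample))
print(generate_lowercased(sample))
```

A classifier returns one label per text, an extractor returns the pieces it found (possibly none), and a generator returns new text.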

Structure of modules

Each module has a folder with the following structure:

  • __init__.py: if the module can be executed as a script, this file contains the entry point.
  • README.md: a description of the module, which is displayed on the platform on the detail page of the module.
  • code_snippet_refinery.md: the code snippet based on a spaCy input. This is shown on the detail page of the module.
  • code_snippet_common.md: the code snippet for any Python environment. This is shown on the detail page of the module.
  • config.py: a config script to synchronize this repository with the online platform.
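
If you want to check a module folder against the file list above, a small helper along these lines could do it. This is a sketch: the file names come from the list, but the helper `missing_module_files` is hypothetical and not part of this repository.

```python
from pathlib import Path

# File names taken from the list above; the helper itself is hypothetical.
EXPECTED_FILES = (
    "__init__.py",
    "README.md",
    "code_snippet_refinery.md",
    "code_snippet_common.md",
    "config.py",
)


def missing_module_files(module_dir: Path) -> list[str]:
    """Return the expected files that are absent from module_dir."""
    return [name for name in EXPECTED_FILES if not (module_dir / name).is_file()]
```

Calling `missing_module_files(Path("path/to/your_module"))` returns an empty list when all five files are present; the path here is a placeholder.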

If you want to add a new module, please look into our contributing guidelines.

Getting started

You can access the modules of this repository in bricks. If you want to host the modules yourself, you can do so by following the steps below.

  1. Clone this repository
  2. (optional) Create a virtual environment
  3. Install the dependencies (pip install -r requirements.txt)
  4. Run the FastAPI server (uvicorn api:api)
  5. Go to http://localhost:8000/docs to see the documentation
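
Once the server from step 4 is running, you can also list the available endpoints from Python instead of the browser. The sketch below assumes the app keeps FastAPI's default schema location (`/openapi.json`) and runs on the default port 8000; `list_endpoints` and `fetch_schema` are illustrative helpers, not functions from this repository.

```python
import json
from urllib.request import urlopen

# HTTP methods that can appear as keys of an OpenAPI path item.
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "options", "head", "trace"}


def list_endpoints(schema: dict) -> list[str]:
    """Return sorted 'METHOD /path' strings from an OpenAPI schema dict."""
    endpoints = []
    for path, path_item in sorted(schema.get("paths", {}).items()):
        for key in sorted(path_item):
            if key in HTTP_METHODS:  # skip non-method keys such as "parameters"
                endpoints.append(f"{key.upper()} {path}")
    return endpoints


def fetch_schema(base_url: str = "http://localhost:8000") -> dict:
    """Download the OpenAPI schema (assumes FastAPI's default /openapi.json)."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)
```

With the server running, `print("\n".join(list_endpoints(fetch_schema())))` prints one line per endpoint.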

Contributing

We continuously sync modules added to this repository to the online platform. If you want to add your own module, please follow the contribution guidelines. If you have any questions, please reach out to us anytime on Discord.

If the content of this repository is helpful, please leave a star ⭐️. Also, make sure to check out refinery.

refinery

Check out our main product refinery, another open-source project that helps you scale, assess and maintain your training data. You can use the modules from bricks right away in refinery.

Regular updates and newsletter

We regularly update bricks with new modules (we aim to add two modules per week, if not more). If you want to stay up to date, you can subscribe to our newsletter.

License

This repository is licensed under the Apache License, Version 2.0. View a copy of the License file.