Add code for sentiment analysis #417

Open
wants to merge 32 commits into
base: master
from

MSanKeys963

Removed class NLPHandler() and added sentiment analysis functionality in class MLHandler().

To set up a Gramex service for performing sentiment analysis, use the following configuration:

url:
  sentiment-analysis:
    pattern: /$YAMLURL/
    handler: MLHandler
    kwargs:
      backend: transformers
      task: sentiment-analysis
      xsrf_cookies: false

Getting predictions

GET sentiments of short pieces of text as follows:

curl -G --data-urlencode "text=This movie is so bad, it's good." http://localhost:9988/

The output will be:

[
  {
    "label": "POSITIVE",
    "score": 0.9997316002845764
  }
]
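
The same GET request can be made from Python. The following is a minimal sketch using only the standard library; it assumes the service configured above is listening on http://localhost:9988/, and the helper names `build_predict_url` and `predict` are hypothetical, not part of Gramex.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the sentiment-analysis endpoint from the configuration above
# is running locally on port 9988.
BASE_URL = "http://localhost:9988/"


def build_predict_url(text, base_url=BASE_URL):
    """Return the GET URL that asks the endpoint to classify `text`.

    urlencode percent-encodes the text, which is what
    `curl --data-urlencode` does on the command line.
    """
    return base_url + "?" + urlencode({"text": text})


def predict(text, base_url=BASE_URL):
    """Send the GET request and decode the JSON response.

    The response is expected to be a list of {"label", "score"} objects,
    as in the sample output above. Requires a running service.
    """
    with urlopen(build_predict_url(text, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `predict("This movie is so bad, it's good.")` should return the parsed list of label and score pairs.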

Files containing text to be classified can also be POSTed to the endpoint, with _action=predict. Any file supported by gramex.cache.open will work. (Download a sample here.)

curl -X POST -F "file=@sentiment.csv" "http://localhost:9988/?_action=predict"

The output will be:

[
  {
    "label": "POSITIVE",
    "score": 0.9997316002845764
  },
  {
    "label": "NEGATIVE",
    "score": 0.9974692463874817
  },
  // etc.
]
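
For reference, `curl -F` sends the file as a multipart/form-data body. Below is a rough standard-library sketch of an equivalent request. The helper `build_file_post` is hypothetical, and the `text` column name in the usage note is an assumption based on the field names mentioned in this description.

```python
import uuid
from urllib.request import Request


def build_file_post(url, field, filename, content, content_type="text/csv"):
    """Build a multipart/form-data POST request carrying one file.

    `content` must be bytes. The form field name ("file" in the curl
    examples) is passed as `field`.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    # The closing boundary ends with two extra dashes.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = head + content + tail
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return Request(url, data=body, headers=headers, method="POST")
```

To send it, pass the returned request to `urllib.request.urlopen`, for example with the URL `http://localhost:9988/?_action=predict`, field `file`, filename `sentiment.csv`, and content such as `b"text\nGreat film\n"`.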

Measuring model performance

Files containing the text and label fields can be POSTed to the endpoint with _action=score to get the ROC AUC score of the model against the dataset. (Download a sample dataset here.)

curl -X POST -F "file=@sentiment.csv" "http://localhost:9988/?_action=score"

The output will be something like:

{
  "roc_auc": 0.9929
}
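
To make the metric concrete, here is a small pure-Python sketch of ROC AUC computed from true labels and predicted scores. It is an illustration of the metric, not the handler's code: how MLHandler derives the scores it feeds into the metric is not shown in this description, so the `positive_probability` conversion and the 1/0 label encoding are assumptions.

```python
def positive_probability(prediction):
    """Convert a {"label", "score"} prediction into P(POSITIVE).

    Assumption: for a NEGATIVE label, `score` is the confidence in
    NEGATIVE, so P(POSITIVE) is its complement.
    """
    score = prediction["score"]
    return score if prediction["label"] == "POSITIVE" else 1.0 - score


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` holds 1 for positive and 0 for negative examples.
    Tied scores count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

A score of 1.0 means every positive example is ranked above every negative one, and 0.5 corresponds to random ranking.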

Training the model

The model can be trained on a dataset by setting _action=train, and POSTing the file.

curl -X POST -F "file=@sentiment_score.json" "http://localhost:9988/?_action=train"

The output will show the score of the trained model on the dataset:

{
  "roc_auc": 0.8
}

Multiple training options for the transformer are supported, including the number of epochs, batch size and weight decay. These can all be specified in the POST request as follows:

# Train for three epochs instead of the default 1
curl -X POST -F "file=@sentiment.csv" "http://localhost:9988/?_action=train&num_train_epochs=3"

The output is the score of the trained model on the dataset after 3 epochs:

{
  "roc_auc": 0.98
}

# Change the batch size to 32 instead of the default 16
curl -X POST -F "file=@sentiment.csv" \
  "http://localhost:9988/?_action=train&per_device_train_batch_size=32&num_train_epochs=3"

The output is the score of the trained model on the dataset after 3 epochs and a batch size of 32:

{
  "roc_auc": 0.99
}
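
Training URLs with several options are easy to get wrong by hand, since an unquoted `&` is a control operator in most shells. A small sketch for composing them in Python follows; `build_train_url` is a hypothetical helper, and only the option names that appear in the examples above are used.

```python
from urllib.parse import urlencode


def build_train_url(base_url="http://localhost:9988/", **training_options):
    """Return a training URL with `_action=train` plus any extra options.

    Options are passed through as query parameters unchanged; which
    options the handler accepts is decided by the server, not here.
    """
    params = {"_action": "train", **training_options}
    return base_url + "?" + urlencode(params)


# Three epochs with a batch size of 32, as in the curl example above.
url = build_train_url(per_device_train_batch_size=32, num_train_epochs=3)
```

The resulting URL can be used as the target of the file upload in place of the hand-written one.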

@sanand0
Contributor

sanand0 commented Jun 21, 2021

Cool! @jaidevd could you please review? Do let me know when to merge

@sanand0 sanand0 requested a review from jaidevd June 21, 2021 15:04
@MSanKeys963 MSanKeys963 reopened this Jun 22, 2021
@jaidevd
Member

jaidevd commented Jun 26, 2021

@MSanKeys963 The target branch has to be gramener/gramex's master branch, not the jd-transformers branch.

gramex/handlers/mlhandler.py (outdated review thread)
gramex/handlers/mlhandler.py (outdated review thread)
@jaidevd
Member

jaidevd commented Jun 26, 2021

@MSanKeys963 other than these two changes, LGTM

@jaidevd jaidevd changed the base branch from jd-transformers to master June 30, 2021 09:54
@jaidevd
Member

jaidevd commented Jul 7, 2021

@MSanKeys963 this is still showing merge conflicts. Please take a look.

gramex/handlers/formhandler.py (outdated review thread)
gramex/install.py (outdated review thread)
@MSanKeys963
Author

@jaidevd I've fixed all the issues mentioned above. Please let me know if there's anything else.

@jaidevd
Member

jaidevd commented Jul 19, 2021

Thanks, @MSanKeys963

@sanand0 This is ready for merge.

gramex/dl_utils.py (outdated review thread)
gramex/handlers/mlhandler.py (outdated review thread)
@MSanKeys963
Author

@sanand0 I've fixed all the issues. Please check.

@sanand0
Contributor

sanand0 commented Jul 30, 2021

@MSanKeys963

  • Can you get this to work, please? sentiment.zip
  • Gramex should still run if PyTorch & Huggingface are not installed

For example, this is how we optionally import ElasticSearch:

def gramexlog(conf):
    try:
        from elasticsearch import Elasticsearch, helpers
    except ImportError:
        app_log.error('gramexlog: elasticsearch missing. pip install elasticsearch')
        return
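
Applied to this PR, the same pattern might look like the sketch below. This is an illustration, not the code in the PR: `optional_import` and `load_transformers_backend` are hypothetical names, and `app_log` here is a plain `logging` logger standing in for Gramex's own.

```python
import importlib
import logging

# Stand-in for Gramex's app_log, so the sketch runs on its own.
app_log = logging.getLogger("gramex")


def optional_import(module_name, feature):
    """Import `module_name` if available; log an error and return None if not.

    Assumes the pip package name matches the module name, which holds
    for torch and transformers but not for every package.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        app_log.error("%s: %s missing. pip install %s", feature, module_name, module_name)
        return None


def load_transformers_backend():
    """Return the transformers module, or None when a dependency is absent.

    Callers can then disable the sentiment-analysis route instead of
    failing at startup.
    """
    torch = optional_import("torch", "MLHandler")
    transformers = optional_import("transformers", "MLHandler")
    if torch is None or transformers is None:
        return None
    return transformers
```

Keeping the imports inside a function, rather than at module level, is what lets the rest of Gramex start when PyTorch and Hugging Face Transformers are not installed.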
