Add code for sentiment analysis #417

MSanKeys963 · 2021-06-21T14:13:18Z

Removed class NLPHandler() and added sentiment analysis functionality in class MLHandler().

To set up a Gramex service for performing sentiment analysis, use the following configuration:

url:
  sentiment-analysis:
    pattern: /$YAMLURL/
    handler: MLHandler
    kwargs:
      backend: transformers
      task: sentiment-analysis
      xsrf_cookies: false

Getting predictions

GET sentiments of short pieces of text as follows:

curl -X GET --data-urlencode "text=This movie is so bad, it's good." http://localhost:9988/

The output will be:

[
  {
    "label": "POSITIVE",
    "score": 0.9997316002845764
  }
]

Files containing text to be classified can also be POSTed to the endpoint, with _action=predict. Any file supported by gramex.cache.open will work. (Download a sample here.)

curl -X POST -F "file=@sentiment.csv" "http://localhost:9988/?_action=predict"

The output will be:

[
  {
    "label": "POSITIVE",
    "score": 0.9997316002845764
  },
  {
    "label": "NEGATIVE",
    "score": 0.9974692463874817
  },
  // etc.
]
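The prediction response is a JSON list with one label/score object per input row, so it can be summarised on the client side. A minimal sketch, assuming the response body has already been fetched as a string; the `summarize` helper and the 0.9 threshold are illustrative and not part of Gramex:

```python
import json
from collections import Counter


def summarize(response_body, min_score=0.9):
    """Count labels in a prediction response, ignoring low-confidence rows."""
    predictions = json.loads(response_body)
    confident = [p for p in predictions if p["score"] >= min_score]
    return Counter(p["label"] for p in confident)


# Sample body shaped like the output above (valid JSON, so no "// etc." line).
body = """[
  {"label": "POSITIVE", "score": 0.9997316002845764},
  {"label": "NEGATIVE", "score": 0.9974692463874817}
]"""
print(summarize(body))
```

Raising `min_score` keeps only the rows the model is most sure about, which can help when spot-checking a large file.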

Measuring model performance

Files containing the text and label fields can be POSTed to the endpoint with _action=score to get the ROC AUC score of the model against the dataset. (Download a sample dataset here.)

curl -X POST -F "file=@sentiment.csv" "http://localhost:9988/?_action=score"

The output will be something like:

{
  "roc_auc": 0.9929
}
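For reference, ROC AUC can be read as the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The following is a minimal pure-Python sketch of that definition. It is not the handler's implementation (which this description does not show), and `roc_auc`, `y_true` and `y_score` are illustrative names. It assumes binary 0/1 labels and scores for the positive class:

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outranks a negative.

    labels: iterable of 0/1 ground-truth values.
    scores: predicted score for the positive class, in the same order.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


y_true = [1, 0, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.6]
print(roc_auc(y_true, y_score))  # 3 of the 4 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic, which is fine for illustration; a rank-based computation would be the usual choice for large datasets.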

Training the model

The model can be trained on a dataset by setting _action=train, and POSTing the file.

curl -X POST -F "file=@sentiment_score.json" "http://localhost:9988/?_action=train"

The output will show the score of the trained model on the dataset:

{
  "roc_auc": 0.8
}

Multiple training options for the transformer are supported, including the number of epochs, batch size and weight decay. These can all be specified in the POST request as follows:

# Train for three epochs instead of the default 1
curl -X POST -F "file=@sentiment.csv" "http://localhost:9988/?_action=train&num_train_epochs=3"

The output is the score of the trained model on the dataset after 3 epochs:

{
  "roc_auc": 0.98
}

# Change the batch size to 32 instead of the default 16
curl -X POST -F "file=@sentiment.csv" \
	"http://localhost:9988/?_action=train&per_device_train_batch_size=32&num_train_epochs=3"

The output is the score of the trained model on the dataset after 3 epochs and a batch size of 32:

{
  "roc_auc": 0.99
}
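When several training options are combined, building the URL in code avoids shell quoting problems with `&`. A minimal sketch using only the standard library; `train_url` and `BASE_URL` are illustrative names, and only the two option names shown in the requests above are used:

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:9988/"  # endpoint used in the examples above


def train_url(**options):
    """Return the training URL with the given options as query parameters."""
    params = {"_action": "train", **options}
    return BASE_URL + "?" + urlencode(params)


url = train_url(num_train_epochs=3, per_device_train_batch_size=32)
print(url)
```

The resulting string can be passed, quoted, to curl or to any HTTP client that supports multipart file uploads.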

Required for fixing gramener#377.

Since v0.24, sklearn's column transformers need the same order of feature names between .fit and .predict. We can still send URL parameters in any order, but they need to be ordered correctly by the MLHandler. See sklearn's release notes for more: https://scikit-learn.org/stable/whats_new/v0.24.html#sklearn-compose
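The reordering idea can be sketched as follows. This is an illustration only, not the MLHandler code: `order_features`, the feature names and the sample values are all hypothetical, and it assumes the fit-time column order has been stored as a list:

```python
def order_features(params, fitted_names):
    """Return params as a dict whose keys follow the order seen at fit time."""
    missing = [name for name in fitted_names if name not in params]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return {name: params[name] for name in fitted_names}


fitted_names = ["age", "income", "city"]  # hypothetical fit-time order
url_params = {"city": "Pune", "age": 31, "income": 50000}  # arbitrary URL order
print(list(order_features(url_params, fitted_names)))
```

Because Python dicts preserve insertion order, a DataFrame built from the returned dict would have its columns in the fit-time order.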

sanand0 · 2021-06-21T15:04:28Z

Cool! @jaidevd could you please review? Do let me know when to merge

jaidevd · 2021-06-26T08:22:08Z

@MSanKeys963 The target branch has to be gramener/gramex's master branch, not the jd-transformers branch.

gramex/handlers/mlhandler.py

jaidevd · 2021-06-26T09:52:04Z

@MSanKeys963 other than these two changes, LGTM

jaidevd · 2021-07-07T08:34:38Z

@MSanKeys963 this is still showing merge conflicts. Please take a look.

gramex/handlers/formhandler.py

gramex/install.py

gramex/handlers/mlhandler.py

MSanKeys963 · 2021-07-16T20:27:12Z

@jaidevd I've fixed all the issues mentioned above. Please let me know if there's anything else.

jaidevd · 2021-07-19T15:12:02Z

Thanks, @MSanKeys963

@sanand0 This is ready for merge.

gramex/dl_utils.py

gramex/handlers/mlhandler.py

MSanKeys963 · 2021-07-23T10:32:16Z

@sanand0 I've fixed all the issues. Please check.

sanand0 · 2021-07-30T04:51:10Z

@MSanKeys963

Can you get this to work, please? sentiment.zip
Gramex should still run if PyTorch & Huggingface are not installed

For example, this is how we optionally import ElasticSearch:

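from gramex.config import app_log

def gramexlog(conf):
    # Import lazily so Gramex still starts when elasticsearch is not installed
    try:
        from elasticsearch import Elasticsearch, helpers
    except ImportError:
        app_log.error('gramexlog: elasticsearch missing. pip install elasticsearch')
        return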
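The same pattern could be applied to the PyTorch and Hugging Face dependencies. A minimal sketch, assuming the heavy import is deferred until the transformers backend is actually requested; `optional_import` and `sentiment_backend` are hypothetical helpers, not existing Gramex functions, and `print` stands in for the application logger:

```python
import importlib


def optional_import(name, feature):
    """Import `name` if available; otherwise report what is missing and return None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        print(f"{feature}: {name} missing. pip install {name}")
        return None


def sentiment_backend():
    """Resolve the optional dependency only when the backend is requested."""
    return optional_import("transformers", "MLHandler")
```

Nothing is imported at module load time, so an installation without the package starts normally and only reports the missing dependency when `sentiment_backend()` is called.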

jaidevd and others added 17 commits February 23, 2021 14:21

WIP

e1ec479

WIP: MLHandler support for Huggingface transformers

4f5b50d

WIP

52e72cb

WIP

4b01634

Ensure label IDs are long ints

1f85677

WIP

a3fa51f

Merge branch 'jd-mlhandler-featnames-order' of github.com:gramener/gr…

2322e8e

…amex into jd-transformers

WIP: MLHandler Refactoring

e6be0d5

Merge branch 'master' of github.com:gramener/gramex into jd-transformers

2a30092

ENH: Transfomers - model persistence

8317fe1

WIP

2d5b213

Merge branch 'master' of github.com:gramener/gramex into jd-transformers

168c7e1

WIP

3775939

ENH: Modify gramex.install.safe_rmtree to remove files outside $GRAME…

d446803

…XDATA

Add code for sentiment analysis and remove print statements

80b6c27

Remove print statement

13116a0

sanand0 requested a review from jaidevd June 21, 2021 15:04

MSanKeys963 closed this Jun 22, 2021

MSanKeys963 reopened this Jun 22, 2021

jaidevd requested changes Jun 26, 2021

gramex/handlers/mlhandler.py

gramex/handlers/mlhandler.py

jaidevd changed the base branch from jd-transformers to master June 30, 2021 09:54

MSanKeys963 added 2 commits July 7, 2021 02:41

Solve merge conflicts

7f40258

Add space

dac5359

jaidevd requested changes Jul 7, 2021

gramex/handlers/formhandler.py

gramex/install.py

Add new install.py

e9c6bb2

MSanKeys963 added 4 commits July 10, 2021 00:50

Remove space

07807c7

Remove merge conflicts test_mlhandler.py

4ee107d

Merge branch 'master' into jd-transformers

20ce370

Remove unnecessary space & lines

e4f59e0

jaidevd reviewed Jul 13, 2021

gramex/handlers/mlhandler.py

MSanKeys963 added 2 commits July 13, 2021 19:19

Restore formhandler.py to original version & Minor Change to mlhandle…

2f00a8e

…r.py

Remove unused function

e84a064

sanand0 reviewed Jul 20, 2021

gramex/dl_utils.py

gramex/handlers/mlhandler.py

MSanKeys963 added 3 commits July 20, 2021 20:32

Rename dl_utils.py to dl.py

dd3ec9c

Remove dl_utils.py

d6a7132

Fix imports

2d2cfb1

MSanKeys963 added 3 commits August 6, 2021 04:17

Add changes to support new gramex.yaml config.

326b93d

Remove spaces

db0b20c

Add changes to run PyTorch and Hugging Face only when requested and f…

7c59db3

…ix flake8 errors
