The GraphGrid Python SDK is a Python-based software development kit for interacting programmatically with GraphGrid services.
Currently, its primary purpose is to provide a flexible way to train NLP models on a variety of tasks.
This README covers setting up the SDK object and gives an overview of the available methods. For further documentation and tutorials, please visit https://docs.graphgrid.com/ to learn about the GraphGrid SDK and the GraphGrid CDP platform.
The first step in using the SDK is setting up the GraphGridSdk Python object.
# Set up the bootstrap config
bootstrap_conf = SdkBootstrapConfig(
    access_key='a3847750f486bd931de26c6e683b1dc4',
    secret_key='81a62cea53883f4a163a96355d47656e',
    url_base='localhost',
    is_docker_context=False)

# Initialize the SDK
sdk = GraphGridSdk(bootstrap_conf)
You create an SdkBootstrapConfig object that provides the basic configuration the SDK needs.
This example uses the default access_key and secret_key associated with GraphGrid CDP.
You can then initialize your GraphGridSdk object with that configuration and begin using the SDK.
For details on usage please see the docs on GraphGrid SDK Usage.
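Hard-coding the default keys works for a local setup, but in other environments you may prefer to read the values from environment variables. The following is a minimal sketch and not part of the SDK: the helper name bootstrap_kwargs_from_env and the GRAPHGRID_* variable names are hypothetical. The keyword names match the SdkBootstrapConfig arguments used in the example above, and the fallbacks are the default values from that example.

```python
import os


def bootstrap_kwargs_from_env(env=None):
    """Build keyword arguments for SdkBootstrapConfig from environment variables.

    The GRAPHGRID_* variable names are hypothetical; rename them to suit your setup.
    """
    env = os.environ if env is None else env
    docker_flag = env.get("GRAPHGRID_IS_DOCKER_CONTEXT", "false")
    return {
        # Fall back to the default GraphGrid CDP keys used in the example above.
        "access_key": env.get("GRAPHGRID_ACCESS_KEY", "a3847750f486bd931de26c6e683b1dc4"),
        "secret_key": env.get("GRAPHGRID_SECRET_KEY", "81a62cea53883f4a163a96355d47656e"),
        "url_base": env.get("GRAPHGRID_URL_BASE", "localhost"),
        # Interpret common truthy strings as True; anything else is False.
        "is_docker_context": docker_flag.strip().lower() in ("1", "true", "yes"),
    }
```

With the SDK classes imported, the result could be passed along as SdkBootstrapConfig(**bootstrap_kwargs_from_env()).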
There are currently seven SDK methods available for use:
Method | Description |
---|---|
nmt_train | Kick off training job |
nmt_status | Status and results of a training job |
job_run | Kick off a custom job |
job_status | Status of a custom job |
save_dataset | Save a dataset for training |
promote_model | Promote an NLP model, swapping it in for use |
nmt_train_pipeline | Kick off NLP model training pipeline |
The nmt_train and nmt_status methods are provided to trigger, monitor, and retrieve results from an nlp-model-training job run.
In contrast, the job_run and job_status methods are provided to trigger and monitor custom jobs.
The nmt_train_pipeline method is specifically for kicking off an NLP model training pipeline: it runs training jobs, monitors them, and can promote the newly trained models.
For details on specific methods please see the docs on GraphGrid SDK Method Reference.
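Because the trigger and status methods come in pairs, a common pattern is to start a job and then poll its status until it finishes. The helper below is a generic sketch of that loop and not part of the SDK: wait_for_job and its parameters are hypothetical, and the shape of a status response is not assumed. The caller supplies one function that fetches the status and one that decides whether it is final.

```python
import time


def wait_for_job(get_status, is_finished, poll_seconds=10.0,
                 timeout_seconds=3600.0, sleep=time.sleep):
    """Call get_status() until is_finished(status) is true, then return that status.

    Raises TimeoutError if the job has not finished within timeout_seconds.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if is_finished(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        # Wait before asking again so the service is not flooded with requests.
        sleep(poll_seconds)
```

Here get_status might wrap a call to nmt_status or job_status with the arguments documented in the Method Reference, and is_finished would inspect whatever that call returns.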
The TrainRequestBody is necessary for kicking off NLP model training via the nmt_train and nmt_train_pipeline SDK methods.
TrainRequestBody has the following attributes:
Attribute | Type | Description | Required |
---|---|---|---|
model | NlpModel | the type of model to train | True |
dataset_id | str | id of dataset for training | True |
no_cache | bool | flag for whether caching should be disabled | False (defaults to False) |
gpu | bool | flag for whether gpu should be used for training | False (defaults to False) |
Below is an example of how a TrainRequestBody might be defined:
request_body = TrainRequestBody(model=NlpModel.NAMED_ENTITY_RECOGNITION,
                                dataset_id="9tb98wJhuQCoPSJEDKys3WRfrUfpp3tkFpAYexGVMzGc",
                                no_cache=False,
                                gpu=True)
The value of dataset_id can be retrieved from the response of a successful call to the save_dataset SDK method.
If the dataset has already been saved, the dataset_id can also be found as an attribute of the node representing that dataset within the graph.
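To catch mistakes before a training job is submitted, the attribute table above can be turned into a small pre-flight check. This is a sketch only: normalize_train_request_kwargs is a hypothetical helper, not part of the SDK, and it checks only what the table states (model and dataset_id are required, no_cache and gpu are booleans that default to False). It does not verify that model is an NlpModel, since that would require importing the SDK.

```python
def normalize_train_request_kwargs(**kwargs):
    """Check keyword arguments against the TrainRequestBody table and fill in defaults."""
    allowed = {"model", "dataset_id", "no_cache", "gpu"}
    unknown = set(kwargs) - allowed
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")

    # model and dataset_id are marked as required in the table.
    if kwargs.get("model") is None:
        raise ValueError("model is required")
    dataset_id = kwargs.get("dataset_id")
    if not isinstance(dataset_id, str) or not dataset_id:
        raise ValueError("dataset_id is required and must be a non-empty str")

    normalized = {"model": kwargs["model"], "dataset_id": dataset_id}
    for flag in ("no_cache", "gpu"):
        value = kwargs.get(flag, False)  # both flags default to False per the table
        if not isinstance(value, bool):
            raise ValueError(f"{flag} must be a bool")
        normalized[flag] = value
    return normalized
```

The returned dictionary could then be unpacked into the constructor, for example TrainRequestBody(**normalize_train_request_kwargs(model=..., dataset_id=...)).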