-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add dataset, FHIR stores and annotations stores and related resources #71
Conversation
|
"Text" Implies a written source , https://www.dictionary.com/browse/text . Wouldn't it be more generic to call it "data" https://www.dictionary.com/browse/data? if we really need to change this? Also gets us into the arena of what is the character encoding supported for this data ? ASCII , UTF-8 , UTF-16 , etc. You mention this naming conversion is "The second motivation is for the date annotator to be used with texts that are not clinical notes" , but yet then go on to add LOINC codes ( a medical Ontology ) as the type of note ? Will these types support other non-medical data ? As for the created_by and created_at fields . One may not see use for them as a tool developer , but would be thankful to have them when your looking downstream at a final result that does not make sense and then need to track the data back to the source system. We get many notes from various systems and pipelines into a Data Warehouse that are constantly being created and updated . They all have various note id's that overlap and conflict. Only want to keep track of this is a "created_by" name for the system / it came from and created_date for when it was last updated. |
@gkowalski Here is feedback on the first part of your questions.
I don't see the need to go down to "data" as so far we only consider "text". Data could be useful the day we support DICOM image, for example if one day we decide to add the deid of DICOM image metadata. Stopping at text for now seems reasonable.
Adding an encoding property to the Text schema is something that I can definitively agree with.
I included this example in the spec: "loinc:LP29684-5". The idea behind this formatting is that it would allow us to support non-medical notes by replacing the prefix "loinc:". |
Using Google Healthcare API as reference: |
Issue regarding annotation and noteIdBackgroundThe current Annotation schema has a property IssueWe are moving away from giving an array as input to the annotator. Therefore, the is no longer the need for the annotator to link the output annotation to the note. Instead, it is the responsibility of the user to link the note sent to the annotator and the annotations received from it. Most importantly, we could not trust the value of Annotation.noteId set by the annotator. While I don't see how an incorrect |
Get all annotations in a storeCurrently I defined the spec to returned an array of object that can be a mix of Annotation objects and DateAnnotation object. The way I do it seems correct according to this comment, however the HTML doc does not seem to be able to render this yet. If this is really an issue, we could define specific endpoints to get only annotation of a given type. Also it will be interesting to see how the openapi-generator handles this spec. EDIT: The above is actually incorrect. The HTML documentation generated by redocly-cli does actually show that the output can be of mixed types. |
…dbox-schemas into add-text-schema
Health endpoint was deleted from commons in PR nlpsandbox#71
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Contributions of this PR
Summary
New Date Node API
The new API is inspired from Google Healthcare API that organizes data into three hierarchical levels: Project, Dataset, and Store. The Date Node API skips the Project level for simplicity and only keeps Dataset and Stores.
The paragraphs below describe how to push and get data, namely notes and annotations, using the Date Node API. In order to better understand the steps, we recommend that you clone the content of this PR and start the server used to preview the OpenAPI specification in HTML format. Once you have cloned this repo:
Create a Dataset
First, you need to create a dataset that has a unique name in the entire data node.
http://localhost:8080/#operation/createDataset
Add a NoteStore to the Dataset
The next step is to add a data store to the dataset. Two types of stores are available: NoteStore and AnnotationStore.
Let's start by adding a NoteStore to our Dataset using the endpoint below. This endpoint takes one path parameter named
datasetId
. The value of this parameter must be set with the name of the dataset to use. The payload contains one property namedname
, which is the name of the store. The name of a store must be unique for a given dataset.http://localhost:8080/#operation/createNoteStore
Add Notes to the NoteStore
Notes are added to a NoteStore using the endpoint below. This endpoints takes two path parameters:
datasetId
andstoreId
that take the name of the dataset and note store to use. The payload is a Note object with two properties:noteId
(unique in a given note store) andtext
.http://localhost:8080/#operation/createNote
Get notes from the NoteStore
You can retrieve the notes stored in the NoteStore with this endpoint. The results returned are paginated.
http://localhost:8080/#operation/listNotes
Push and pull annotations
The Data Node API currently has a second type of store: DataStore. So far two types of annotations can be pushed to an instance of this store: Annotation and DateAnnotation (extends Annotation).
Add an AnnotationStore to your dataset.
http://localhost:8080/#operation/createAnnotationStore
Add Annotation or DateAnnotation to this store.
http://localhost:8080/#operation/createAnnotation
Get all annotations (filters will be added soon). The results returned are paginated.
http://localhost:8080/#operation/listAnnotations
Date Annotator API Update
TODO: Decide what to do with Annotation.
noteId
. RenametextId
? :/Initial PR description
This is an attempt to simplify the note schema and add a level of abstraction to target one day text that are not clinical notes.
First, this PR proposes to add a Text schema that has one string property called
text
. The idea is to pass to the date annotator a text object rather than a Note object. The motivation is that the Note object does not provide additional useful information that can help the date annotator in its task as of now. The second motivation is for the date annotator to be used with texts that are not clinical notes. Hence, the input to the date annotator becomes very similar to the input of Amazon Comprehend deid service.Source
Another proposal is to temporarily at least get ride of the fields like "createdBy", "createdAt", "updatedBy", etc. These fields are meaningful for storing the record in a DB, but there is currently no need to expose this information to the user of most API. The data node should ideally continue to store "createdBy" value in the DB but API does not need to document this at this point. This will be revisited later.
Once again, the motivation is to simplify the API spec of the NLP tool. In the API documentation, it is more friendly to ask the input:
Rather than the current large Note object that comes with "createdBy" etc. that are not useful to the tool developer.
Regarding the type of clinical note, I proposed to use the LOINC concept used by FHIR mentioned in this page:
FHIR Clinical Notes Guidance
Finally, one proposed change is give as input to the annotators ONE text or note instead of an array. Here are the motivations:
requestBody
that is an array in the OpenAPI doc, and 2) theopenapi-generator
uses the singular form for the name of the input parameter. For example, the parameter is namednote
in this function of the data annotator in both the Python and java implementation, while we currently treat it as an array.Python: https://github.com/Sage-Bionetworks/nlp-sandbox-date-annotator-example/blob/develop/server/openapi_server/controllers/date_controller.py#L9
Java: https://github.com/nlpsandbox/date-annotator-example-java/blob/develop/server/src/main/java/org/openapitools/api/DatesApi.java#L60
If this PR is accepted