Updated ES search functions and authentication #22

vitaliyok · 2025-07-14T11:58:26Z

Hi,

Here are the proposed changes to the ES search and authentication options:

Renamed and refactored the original search function
Added a function that allows users to reuse the scroll ID if the search fails
Added a function with sorting options, as recommended by ES
Added support for API key object (with encoded API key) as generated by ES
Added some more options for users to view index mapping and available indices in the search template
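For readers unfamiliar with sort-based pagination, here is a minimal sketch of the general idea, not the code from this PR. It assumes the search call accepts `index`, `query`, `sort`, `size` and `search_after` as keyword arguments (as `Elasticsearch.search` does in the 8.x Python client) and that each hit carries a `sort` array. The name `iter_hits_search_after` and the injected `search` callable are hypothetical.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional


def iter_hits_search_after(
    search: Callable[..., Dict[str, Any]],
    index: str,
    query: Dict[str, Any],
    sort: List[Dict[str, str]],
    page_size: int = 1000,
) -> Iterator[Dict[str, Any]]:
    """Yield hits page by page, resuming from the last hit's sort values."""
    search_after: Optional[List[Any]] = None
    while True:
        kwargs: Dict[str, Any] = {
            "index": index,
            "query": query,
            "sort": sort,
            "size": page_size,
        }
        if search_after is not None:
            kwargs["search_after"] = search_after
        hits = search(**kwargs)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # The sort values of the last hit mark where the next page starts.
        search_after = hits[-1]["sort"]
```

With a real client this would be called as `iter_hits_search_after(es.search, ...)`. Because the position is carried in the sort values rather than in server-side state, a failed page can be retried by passing the same `search_after` value again. The sort should include a unique tiebreaker field so that no documents are skipped or repeated.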

mart-r

There's a few minor issues I see.
Mostly to do with the removed methods having been used in other parts of the project.

Though the gist of it seems to be fine. I haven't tested it out, but I'm sure it'll work if you've been using it at GSTT.

mart-r · 2025-07-14T15:03:30Z

cogstack.py


-    def get_docs_generator(self, index: List, query: Dict, es_gen_size: int=800, request_timeout: Optional[int] = 300):


There's a bit that's using this:

https://github.com/CogStack/working_with_cogstack/blob/main/medcat/3_run_model/run_model.py#L73

The method has been removed but can be brought back.
All of the new methods return a Pandas DataFrame, whereas this function currently returns raw JSON, which is converted to a list of tuples. It also uses the ES _source object. In the new functions, I have excluded the _source object from search results and only return "fields", as recommended by Elastic. The catch is that every field comes back as an array, so the values need to be joined for the resulting DataFrame.
I think it would be possible to change the implementation to use the new methods and create tuples from the DataFrame, instead of bringing the old function back.

Yes, I think that's the right approach here.

I'm not saying that the methods I've tagged need to be reimplemented. All I'm trying to do is make sure that the code that uses them (i.e. in the other notebooks and/or scripts) gets updated alongside the changes to cogstack.py. That is, if someone uses the scripts we provide (after this change), they shouldn't error out because the scripts are out of sync with the loaded module(s).
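To make the approach discussed in this thread concrete, here is a rough sketch of flattening the array-valued "fields" of each hit and then deriving tuples from the flattened rows. It is an illustration only: the helper names `flatten_fields` and `rows_to_tuples` are hypothetical, and the `(id, text)` tuple shape is an assumption about what the calling script expects, not something confirmed in this PR.

```python
from typing import Any, Dict, Iterable, List, Tuple


def flatten_fields(hit: Dict[str, Any], sep: str = " ") -> Dict[str, Any]:
    """Collapse each array under a hit's "fields" key into a single value."""
    row: Dict[str, Any] = {"_id": hit.get("_id")}
    for name, values in hit.get("fields", {}).items():
        # The fields option returns a list per field, even for a single value.
        if len(values) == 1:
            row[name] = values[0]
        else:
            row[name] = sep.join(str(v) for v in values)
    return row


def rows_to_tuples(
    rows: Iterable[Dict[str, Any]], text_field: str
) -> List[Tuple[Any, Any]]:
    """Build (id, text) tuples, skipping rows with no text (assumed shape)."""
    return [(row["_id"], row[text_field]) for row in rows if row.get(text_field)]
```

The list of flattened rows can also be passed straight to `pandas.DataFrame(rows)`, so the same intermediate form would serve both the DataFrame-returning methods and the scripts that expect tuples.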

mart-r · 2025-07-14T15:04:46Z

cogstack.py

-            df = pd.DataFrame(temp_results)
-        return df
-
-    def DataFrame(self, index: str, columns: Optional[List[str]] = None):


There's a few bits that are using this:
https://github.com/CogStack/working_with_cogstack/blob/main/medcat/3_run_model/run_model.py#L32
https://github.com/CogStack/working_with_cogstack/blob/main/medcat/2_train_model/1_unsupervised_training/unsupervised_medcattraining.py#L28

This is using an eland DataFrame, which offers a Pandas-like DataFrame API, and it can be re-implemented without eland.

mart-r · 2025-07-14T15:07:06Z

cogstack.py

+            The username to use when connecting to Elasticsearch. If not provided, the user will be prompted to enter a username.
+        password : str, optional
+            The password to use when connecting to Elasticsearch. If not provided, the user will be prompted to enter a password.
+        apiKey : Dict, optional


Generally, we want snake_case names for variables. So api_key would make more sense.

Ok. I have changed this.
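As an illustration of how a dict-valued `api_key` parameter might be handled, here is a minimal sketch. It assumes the dict follows the shape of the API key object that Elasticsearch returns on key creation (`id`, `api_key`, and, on recent versions, `encoded`), and that the result is passed to the Python client's `api_key` argument, which accepts either the encoded string or an `(id, api_key)` tuple. The helper name `resolve_api_key` is hypothetical and is not taken from this PR.

```python
from typing import Dict, Tuple, Union


def resolve_api_key(api_key: Dict[str, str]) -> Union[str, Tuple[str, str]]:
    """Pick the credential form to hand to the client's api_key argument."""
    if "encoded" in api_key:
        # Prefer the ready-made base64 form when the object includes it.
        return api_key["encoded"]
    if "id" in api_key and "api_key" in api_key:
        return (api_key["id"], api_key["api_key"])
    raise ValueError("api_key must contain 'encoded' or both 'id' and 'api_key'")
```

Failing early with a clear message when neither form is present keeps a malformed key from surfacing later as an opaque authentication error.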

mart-r · 2025-07-14T15:07:17Z

cogstack.py

-                                                       api_key=api_key,
-                                                       verify_certs=False,
-                                                       timeout=timeout)
+                 apiKey: Dict = None):


Generally, we want snake_case names for variables. So api_key would make more sense.

mart-r · 2025-07-14T15:08:26Z

cogstack.py

-
-        if api_key and api:
-            self.elastic = elasticsearch.Elasticsearch(hosts=hosts,
-                                                       api_key=api_key,


There were a few bits that used this:
https://github.com/CogStack/working_with_cogstack/blob/main/medcat/2_train_model/1_unsupervised_training/unsupervised_medcattraining.py#L27
https://github.com/CogStack/working_with_cogstack/blob/main/medcat/3_run_model/run_model.py#L27

This can be changed to match the new parameters. I can implement this too.

vitaliyok · 2025-07-15T14:48:11Z

It looks like the current implementation is still using the old CogStack text field: "body_analysed". It should probably be renamed to "document_Content" or not use any specific field names in the code here.

mart-r · 2025-07-15T14:53:11Z

It looks like the current implementation is still using the old CogStack text field: "body_analysed". It should probably be renamed to "document_Content" or not use any specific field names in the code here.

The expectation is generally that the user provides the correct fields they're interested in. I'm pretty sure body_analysed serves as just an example.

With that said, if there's a more relevant, up to date example, we'd be better off using that indeed.

Updated ES search functions and authentication

7e07dfc

mart-r requested changes Jul 14, 2025
