PR/image analysis #323

YetAnotherModder · 2024-07-08T04:20:25Z

This PR makes multiple changes to Mantella to allow image analysis.
The aim was to allow two things :

Direct analyses of image through a LLM ( for now I only had success with ChatGPT4-o) where the image is sent alongside the whole prompt.
Two step iterative queries where an image analysis LLM is used then another LLM (or the same one if desired) is used to process the classic Mantella prompt. This allows to avoid relying on ChatGPt4-o or it allows to only rely on it for image analysis and not the actual NPC responses.

Key points:

Adds a requirement for a new IMAGE_GPT_SECRET_KEY.txt that should hold a key to the image analysis service (may be left empty for local analysis but the file has to be there).
Adds new variables to configloader and definitions to allow to set file path for screenshots (need to be game root directory for SK and FO4 desktop and steam directory for SK and FO4 VR), it also allows to choose image LLM models, service (openAI, openrouter, local) and specs (temperature, etc).
New prompts definitions are added for image prompts both direct and iterative (two steps)
Conversation.py now creates a image manager at the start of the conversation and ask it to process_image_analysis if the conversation isn't over. If the conversation is over it will attempt to clean all the files in it's deletion array.
New abstract methods for the gameable Skyrim and FO4 to get the the filepath for the images.
Mantellaroute.py now loads a secret_key_file for the image manager and generates an image client will all the config specs at the start of Mantella (image client is only used in the iterative two steps method of image analysis)
OpenAI client was refactored to accomodate a new class : image_client that class stores all the info for the image LLM use in the iterative two step method and has it's own specific method for steaming calls : streaming_one_message_call()
Outputmanager.py has a new property 'generated_simple_result' that store the feedback of the image_LLM as well as three new methods to make one message calls to the image LLM : generate_simple_response(), process_simple_response(), run_async(). Also a new method get_image_filepath is used to pull the appropriate filepath from the gameable.
Message_thread.py has a couple new method to check, delete or replace messages of a specific type. Image placement in the message_thread is very finicky from my experience. It's better to have it as close to the end of the message_thread as possible and only have one image or image description present to make sure the LLM is taking it into account.
Messages.py has two new subclasses : image_message that contains image only messages and image_description_message that contains the text received from the image LLM in the two step method.
New image manager script, see below

The new image manager.py handles multiple tasks:

Runs process_image_analysis() : Main workhorse of the image_manager, this checks the context values to see if a screenshot is ready to analyze and will attempt to process it according to the option selected in iterative query in the config. If iterative query is disabled it will attempt to send the image directly to the LLM along with the rest of the prompt. If iterative query is enabled it will instead send a request to the image_client and once it gets the response back it will create a new image description message in the message_thread. If no image is ready it removes all encoded images and image descriptions from the message thread to avoid overloading the LLM. It also insures that there's only one image or only one image description present at any time in the message_thread to avoid overloading the LLM with info (from my testing event ChatGPT4-o gets confused when multiple images are in the thread).
The creation of image messages (contain the base64 encoded images)
The creation of image description messages (contain the feedback from the image LLM to send to the default LLM in the iterative two steps method).
Handles the tracking of the Steam screenshots (VR only)
Handles he encoding of the image to base64
Finds the most recent appropriate one within the last two minutes of gameplay (VR only)
It resizes of the image according to the game's feedback
Once the conversation is done it will remove the analyzed screenshots from the Steam folder (VR only), The steam folder management for screenshots is for VR only, for desktop the screenshots are stored in the game folder in a much more straightforward manner and are not deleted instead it's the same one that keeps being overwritten as gameplay progresses.

Potential future upgrades :

Cell scan for NPC lists, distance checks and cell name to give hints to the LLM to the location and actors where the screenshot is taken
Make the file deletion optional for screenshots in Steam
For Skyrim VR : Take the screenshot directly without having the user have to activate the screenshot using the steam overlay hotkey. This would require SKSE.
For Fallout 4 VR, not having to rely on the SUP_F4SE Steam overlay functions for screenshots. This would require making our own in F4SE.
Modify the prompts and message again so that it takes into account the player name according to the config.
Prompts could be reworked a bit too to help the LLM analyse more clearly
I found that asking some LLMs too much image analysis makes them fall into "tour guide mode" where's that's all they care about. Maybe there's a way to autocorrect that issue.

Add additional IMAGE_GPT_SECRET_KEY.txt for separate LLM access Add new Image_Manager.py to handle image generation, management and removal after use. Add new image LLM config values to config loader Add new image LLM definitions file Add new image LLM prompt definitions Add new config values definitions for image LLM functionality Modify conversation.py to generate and image_manager upon startup and use it to handle image analysis calls. Modify gameable, Skyrim and Fallout4.py to add a get_image_filepath() Add image_client_instance creation to mantella_route.py Add new functions to message_thread.py to delete, replace and add specific message types. Add new message type handling for image_message & image_description_message. Add two new classes to messages.py : image_message & image_description_message Modified openai_client.py to add a new class image_client. Add new method : streaming_one_message_call. Refactored the openai_client class to avoid redundancies when creating the subclass. Modified output_manager.py with three new methods generate_simple_response(), process_simple_response(), run_async() & get_image_filepath() to allow iterative image generation

-Fix __config initialization in Skyrim.py -Fix operator use in image_manager.py

Misc fixes

- Improve Image LLM definitions - Refactor messages.py and image_manager.py to make image_manager.py in charge of encoding. -Add functionality for image resizing according to in game variables to image_manager.py -Made the error messages a bit less obnoxious -Refactor method process_image_analysis() so it doesn't crash in case of absent images and add error messages. -Change find_most_recent_jpg() so that it excludes files ending in vr and also that it can't return the same filepath more than once in the same conversation by comparing to the deletion array. -Change delete_images_from_file() so that it checks for images ending with _vr as well for a more complete cleanup.

Updated image prompt definitions and manager to take into account the {game} variable

Add additional instructional comments

Add pillow as a requirement (for image resizing)

New config option to allow the user to select if the Steam screenshots are deleted after Mantella use or not Update definition for Skyrim to indicate the two possible filepaths (SUP_SKSE or Steam) New methods in imagemanager to allow to process hints and manage variable delay for Steam screenshot analysis. Minor bugfix to attempt_to_add_most_recent_image_to_deletion_array () to avoid errors when self.KEY_CONTEXT_CUSTOMVALUES_VISION_READY is empty. Refactored create_message_from_image() to account for non-VR Steam screenshots. Refactored process_image_analysis() to add hints support

gitignore : Updated to account for other secret keys communication_constants.py : Updated to allow compatibility with recent versions of Mantella image_llm_definitions.py : Changed definition and default value for FO4 VR since screenshots are now taken in-engine image_manager.py : attempt_to_add_most_recent_image_to_deletion_array() modified to account for Steam screenshot being enable instead of only VR being enabled due to FO4VR having in-engine screenshot capabilities resize_image() modified to take into account images with alpha channels (e.g. png) image_to_base64() modified to convert RGBA to RBG create_message_from_image() modified to take into account the customcontext variable KEY_CONTEXT_CUSTOMVALUES_VISION_ISUSINGSTEAMSCREENSHOT instead of relying on VR to select if the Steam screenshot folder will be parsed with : find_most_recent_jpg(). The function is also modified to look for png files instead of just jpgs (due to limitation with the in-engine screenshots of Skyrim)

YetAnotherModder and others added 24 commits July 1, 2024 00:49

Misc fixes to skyrim.py and image_manager.py

0100adb

-Fix __config initialization in Skyrim.py -Fix operator use in image_manager.py

Updated Image LLM definition desdription

cbc163c

Misc fixes

Allow dynamic game usage for image prompts

f845a70

Updated image prompt definitions and manager to take into account the {game} variable

Update image_manager.py

76ee4de

Add additional instructional comments

Update requirements.txt

83bc59b

Add pillow as a requirement (for image resizing)

Minor fix : remove useless print call

076eb2a

Merge branch 'main' into pr/YetAnotherModder/323

8bced2f

Merge branch 'main' into pr/YetAnotherModder/323

aa8627f

Swapped secret key order

1c0bd5a

Fixed reset button and added vision model list

b254686

Refactored OpenAI client code into abstract classes

8ffe527

Cleaned up gameable logic to check if VR

465d612

Fixed error message placement in UI

7341a9d

Moved vision prompt to Prompts tab

0021983

Added in game screenshot option to Vision

472b9c9

Added vision hints as in game events

7564ada

Added logic to use in game screenshots

8cc7162

Switched to config game path

42c40c0

Removed unused code

a96a770

Fixed class description

d40e2c1

art-from-the-machine approved these changes Jan 7, 2025

View reviewed changes

art-from-the-machine merged commit 4975400 into art-from-the-machine:main Jan 7, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PR/image analysis #323

PR/image analysis #323

YetAnotherModder commented Jul 8, 2024

PR/image analysis #323

PR/image analysis #323

Conversation

YetAnotherModder commented Jul 8, 2024

Key points:

Potential future upgrades :