Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR/image analysis #323

Conversation

YetAnotherModder
Copy link
Contributor

This PR makes multiple changes to Mantella to allow image analysis.
The aim was to allow two things :

  1. Direct analyses of image through a LLM ( for now I only had success with ChatGPT4-o) where the image is sent alongside the whole prompt.
  2. Two step iterative queries where an image analysis LLM is used then another LLM (or the same one if desired) is used to process the classic Mantella prompt. This allows to avoid relying on ChatGPt4-o or it allows to only rely on it for image analysis and not the actual NPC responses.

Key points:

  1. Adds a requirement for a new IMAGE_GPT_SECRET_KEY.txt that should hold a key to the image analysis service (may be left empty for local analysis but the file has to be there).
  2. Adds new variables to configloader and definitions to allow to set file path for screenshots (need to be game root directory for SK and FO4 desktop and steam directory for SK and FO4 VR), it also allows to choose image LLM models, service (openAI, openrouter, local) and specs (temperature, etc).
  3. New prompts definitions are added for image prompts both direct and iterative (two steps)
  4. Conversation.py now creates a image manager at the start of the conversation and ask it to process_image_analysis if the conversation isn't over. If the conversation is over it will attempt to clean all the files in it's deletion array.
  5. New abstract methods for the gameable Skyrim and FO4 to get the the filepath for the images.
  6. Mantellaroute.py now loads a secret_key_file for the image manager and generates an image client will all the config specs at the start of Mantella (image client is only used in the iterative two steps method of image analysis)
  7. OpenAI client was refactored to accomodate a new class : image_client that class stores all the info for the image LLM use in the iterative two step method and has it's own specific method for steaming calls : streaming_one_message_call()
  8. Outputmanager.py has a new property 'generated_simple_result' that store the feedback of the image_LLM as well as three new methods to make one message calls to the image LLM : generate_simple_response(), process_simple_response(), run_async(). Also a new method get_image_filepath is used to pull the appropriate filepath from the gameable.
  9. Message_thread.py has a couple new method to check, delete or replace messages of a specific type. Image placement in the message_thread is very finicky from my experience. It's better to have it as close to the end of the message_thread as possible and only have one image or image description present to make sure the LLM is taking it into account.
  10. Messages.py has two new subclasses : image_message that contains image only messages and image_description_message that contains the text received from the image LLM in the two step method.
  11. New image manager script, see below

The new image manager.py handles multiple tasks:

  1. Runs process_image_analysis() : Main workhorse of the image_manager, this checks the context values to see if a screenshot is ready to analyze and will attempt to process it according to the option selected in iterative query in the config. If iterative query is disabled it will attempt to send the image directly to the LLM along with the rest of the prompt. If iterative query is enabled it will instead send a request to the image_client and once it gets the response back it will create a new image description message in the message_thread. If no image is ready it removes all encoded images and image descriptions from the message thread to avoid overloading the LLM. It also insures that there's only one image or only one image description present at any time in the message_thread to avoid overloading the LLM with info (from my testing event ChatGPT4-o gets confused when multiple images are in the thread).
  2. The creation of image messages (contain the base64 encoded images)
  3. The creation of image description messages (contain the feedback from the image LLM to send to the default LLM in the iterative two steps method).
  4. Handles the tracking of the Steam screenshots (VR only)
  5. Handles he encoding of the image to base64
  6. Finds the most recent appropriate one within the last two minutes of gameplay (VR only)
  7. It resizes of the image according to the game's feedback
  8. Once the conversation is done it will remove the analyzed screenshots from the Steam folder (VR only), The steam folder management for screenshots is for VR only, for desktop the screenshots are stored in the game folder in a much more straightforward manner and are not deleted instead it's the same one that keeps being overwritten as gameplay progresses.

Potential future upgrades :

  1. Cell scan for NPC lists, distance checks and cell name to give hints to the LLM to the location and actors where the screenshot is taken
  2. Make the file deletion optional for screenshots in Steam
  3. For Skyrim VR : Take the screenshot directly without having the user have to activate the screenshot using the steam overlay hotkey. This would require SKSE.
  4. For Fallout 4 VR, not having to rely on the SUP_F4SE Steam overlay functions for screenshots. This would require making our own in F4SE.
  5. Modify the prompts and message again so that it takes into account the player name according to the config.
  6. Prompts could be reworked a bit too to help the LLM analyse more clearly
  7. I found that asking some LLMs too much image analysis makes them fall into "tour guide mode" where's that's all they care about. Maybe there's a way to autocorrect that issue.

YetAnotherModder and others added 24 commits July 1, 2024 00:49
Add additional IMAGE_GPT_SECRET_KEY.txt for separate LLM access
Add new Image_Manager.py to handle image generation, management and removal after use.
Add new image LLM config values to config loader
Add new image LLM definitions file
Add new image LLM prompt definitions
Add new config values definitions for image LLM functionality
Modify conversation.py to generate and image_manager upon startup and use it to handle image analysis calls.
Modify gameable, Skyrim and Fallout4.py to add a get_image_filepath()
Add image_client_instance creation to mantella_route.py
Add new functions to message_thread.py to delete, replace and add specific message types. Add new message type handling for image_message & image_description_message.
Add two new classes to messages.py : image_message & image_description_message
Modified openai_client.py to add a new class image_client. Add new method : streaming_one_message_call. Refactored the openai_client class to avoid redundancies when creating the subclass.
Modified output_manager.py with three new methods generate_simple_response(), process_simple_response(), run_async() & get_image_filepath() to allow iterative image generation
-Fix __config initialization in Skyrim.py
-Fix operator use in image_manager.py
- Improve Image LLM definitions
- Refactor messages.py and image_manager.py to make image_manager.py in charge of encoding.
-Add functionality for image resizing according to in game variables to image_manager.py
-Made the error messages a bit less obnoxious
-Refactor method process_image_analysis() so it doesn't crash in case of absent images and add error messages.
-Change find_most_recent_jpg() so that it excludes files ending in vr and also that it can't return the same filepath more than once in the same conversation by comparing to the deletion array.
-Change delete_images_from_file() so that it checks for images ending with _vr as well for a more complete cleanup.
Updated image prompt definitions and manager to take into account the {game} variable
Add additional instructional comments
Add pillow as a requirement (for image resizing)
New config option to allow the user to select if the Steam screenshots are deleted after Mantella use or not
Update definition for Skyrim to indicate the two possible filepaths (SUP_SKSE or Steam)
New methods in imagemanager to allow to process hints and manage variable delay for Steam screenshot analysis.
Minor bugfix to attempt_to_add_most_recent_image_to_deletion_array () to avoid errors when self.KEY_CONTEXT_CUSTOMVALUES_VISION_READY is empty.
Refactored create_message_from_image() to account for non-VR Steam screenshots.
Refactored process_image_analysis() to add hints support
gitignore : Updated to account for other secret keys
communication_constants.py : Updated to allow compatibility with recent versions of Mantella
image_llm_definitions.py : Changed definition and default value for FO4 VR since screenshots are now taken in-engine
image_manager.py :
	attempt_to_add_most_recent_image_to_deletion_array() modified to account for Steam screenshot being enable instead of only VR being enabled due to FO4VR having in-engine screenshot capabilities
	resize_image() modified to take into account images with alpha channels (e.g. png)
	image_to_base64() modified to convert RGBA to RBG
	create_message_from_image() modified to take into account the customcontext variable KEY_CONTEXT_CUSTOMVALUES_VISION_ISUSINGSTEAMSCREENSHOT instead of relying on VR to select if the Steam screenshot folder will be parsed with : find_most_recent_jpg(). The function is also modified to look for png files instead of just jpgs (due to limitation with the in-engine screenshots of Skyrim)
@art-from-the-machine art-from-the-machine merged commit 4975400 into art-from-the-machine:main Jan 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants