refactor: Support image_list, document_list while using swagger api #2300

shaohuzhang1 · 2025-02-17T08:06:15Z

refactor: Support image_list, document_list while using swagger api --story=1017867 --user=刘瑞斌对话API支持上传文件、语音、图像和视频，用于实现maxkb的文件解析及多模态对话#2228 https://www.tapd.cn/57709429/s/1654842

shaohuzhang1 · 2025-02-17T08:06:37Z

apps/application/serializers/chat_message_serializers.py

+                'document_list': instance.get('document_list', []),
+                'audio_list': instance.get('audio_list', []),
+            }
+        ).chat(base_to_response=OpenaiToResponse())


 class ChatMessageSerializer(serializers.Serializer):


The provided code has several potential issues and areas for improvement:

Data Structure Changes:
The instance.get('form_data', {}) is being used multiple times, but changes may have occurred to this property over time (e.g., addition of new keys like 'image_list'). Consider using self.data instead since it reflects the final serialized data that includes all fields.

Code Duplication:
The last three calls to instance.get are repeated. Use a list comprehension to avoid duplication if there are more similar operations planned.

Variable Naming:
Using short variable names (with_valid, chat_id, etc.) can make the code harder to read. Consider renaming them to meaningful descriptions, such as include_validation, generated_chat_id.

Optional Parameters:
Instead of explicitly setting the values to an empty dictionary when they aren't present ({} for k in ['image_list', 'document_list', 'audio_list']), consider checking for existence before adding each value to the serializer's data dictionary.

Here's an optimized version of the function with these improvements:

@@ -222,19 +228,26 @@ def chat(self, instance: Dict, include_validation=True): client_type = self.data.get('client_type') generated_chat_id = self.generate_chat(chat_id, application_id, message, client_id) - return ChatMessageSerializer(data={'chat_id': generated_chat_id, 'message': message, + formatted_data = { + 'chat_id': generated_chat_id, + 'message': message, 're_chat': re_chat, 'stream': stream, 'application_id': application_id, - 'client_id': client_id, - 'client_type': client_type, - 'form_data': instance.get('form_data', {})} - -).chat(base_to_response=OpenaiToResponse()) + + # Include optional lists from instance + *optional_lists, form_data = [ + instance[k] for k in ['image_list', 'document_list', 'audio_list'] +] + [None] + + return ChatMessageSerializer( + data={ + **formatted_data, + 'image_list': optional_lists[0], + 'document_list': optional_lists[1], + 'audio_list': optional_lists[2], + 'form_data': form_data or {} + } + ).chat(base_to_response=OpenaiToResponse()) class ChatMessageSerializer(serializers.Serializer):

This refactoring makes the code cleaner and easier to maintain, ensuring that only necessary data attributes are included and avoiding redundant checks for list existentials.

shaohuzhang1 · 2025-02-17T08:06:43Z

apps/dataset/serializers/file_serializers.py

@@ -62,7 +62,7 @@ class FileSerializer(serializers.Serializer):
    def upload(self, with_valid=True):
        if with_valid:
            self.is_valid(raise_exception=True)
-        meta = self.data.get('meta')
+        meta = self.data.get('meta', {})
        file_id = meta.get('file_id', uuid.uuid1())
        file = File(id=file_id, file_name=self.data.get('file').name, meta=meta)
        file.save(self.data.get('file').read())


The code looks generally correct but has a few minor points to consider:

Handling of meta Dictionary: The current implementation assumes that the 'meta' key exists in the data dictionary, which is safe if it's always present according to your use case. If there's any chance that this might not be true, you should add checks to handle such scenarios gracefully (e.g., using try-except blocks).

meta = self.data.get('meta', {})

Use of uuid3 Instead of uuid1: While both UUID versions are valid, uuid3 uses a namespace, which can make them easier to read compared to uuid4. However, the choice between uuid5, uuid, or uuid4 would depend on what specific properties you want when generating unique IDs.

file_id = meta.get('file_id', uuid.uuid3(uuid.NAMESPACE_DNS, str(file))) # Or simply: file_id = meta.get('file_id', uuid.uuid4()) depending on needs

These changes ensure robustness against missing keys and clarity in how ID generation works.

--story=1017867 --user=刘瑞斌对话API支持上传文件、语音、图像和视频，用于实现maxkb的文件解析及多模态对话#2228 https://www.tapd.cn/57709429/s/1654842

shaohuzhang1 · 2025-02-17T08:12:03Z

apps/application/serializers/chat_message_serializers.py

+                'document_list': instance.get('document_list', []),
+                'audio_list': instance.get('audio_list', []),
+            }
+        ).chat(base_to_response=OpenaiToResponse())


 class ChatMessageSerializer(serializers.Serializer):


The provided code makes a few adjustments to improve readability and maintainability:

Changes Made

Added Image List Handling: Added keys 'image_list', 'document_list', and 'audio_list' to the data dictionary passed to the serializer.

Removed Unnecessary Comma After Dictionary: Removed the trailing comma after each key-value pair in the data dictionary except the last one to align with PEP 8 style guidelines for single-line dictionaries.

Suggested Optimization

There is no significant performance issue mentioned in the current context; however, ensure that all list comprehensions and loops within this function are efficient and optimized if needed for larger datasets.

Potential Issues

Duplicated Keys: There might be redundant keys like 'form_data', 'image_list', etc., unless they have different meanings across contexts. Double-check their usage to avoid confusion or unnecessary complexity.

Error Handling: Consider adding additional error handling to manage cases where the data retrieval from instance is not always successful or of expected formats. For example, you could handle empty values gracefully by providing default placeholders.

Documentation Improvements: Ensure that the comments and docstrings throughout this section provide clear explanations of what the method does. This helps other developers understand its purpose and functionality.

By addressing these points, the code can become more robust, concise, and easier to maintain.

shaohuzhang1 · 2025-02-17T08:12:09Z

apps/dataset/serializers/file_serializers.py

@@ -62,7 +62,7 @@ class FileSerializer(serializers.Serializer):
    def upload(self, with_valid=True):
        if with_valid:
            self.is_valid(raise_exception=True)
-        meta = self.data.get('meta')
+        meta = self.data.get('meta', {'debug': True})
        file_id = meta.get('file_id', uuid.uuid1())
        file = File(id=file_id, file_name=self.data.get('file').name, meta=meta)
        file.save(self.data.get('file').read())


The code appears to be mostly correct, but there are a few improvements or notes that can be made:

Handling None Values: Currently, self.data.get('meta') will raise a KeyError if 'meta' is not present in the request data. You might want to add a check to ensure it exists before accessing its properties.

meta = self.data.get('meta', {})

Defaulting Debug Mode: The default value for 'debug' is set to True. If you want this behavior only under certain conditions (e.g., testing/debugging), consider adding another condition to check the debug mode setting.

File Handling: Ensure that any exceptions related to reading from the file are properly handled. Using context managers (with open(...) as f:) would also improve error handling.

Logging: For debugging purposes, consider logging some key information during the process of uploading the file, such as the file ID and whether the validation was successful.

Documentation: Although not shown here, make sure to document all functions and methods appropriately, especially since they modify sensitive user files.

Validation: Depending on your specific requirements, ensure that after retrieving meta, further validation logic is applied if necessary.

By making these minor enhancements, you can make the code more robust, maintainable, and future-proof.

shaohuzhang1 commented Feb 17, 2025

View reviewed changes

refactor: Support image_list, document_list while using swagger api

fc82273

--story=1017867 --user=刘瑞斌对话API支持上传文件、语音、图像和视频，用于实现maxkb的文件解析及多模态对话#2228 https://www.tapd.cn/57709429/s/1654842

liuruibin force-pushed the pr@main@refactor_image_list branch from def48ac to fc82273 Compare February 17, 2025 08:11

shaohuzhang1 commented Feb 17, 2025

View reviewed changes

liuruibin merged commit 9249c17 into main Feb 17, 2025
4 of 5 checks passed

liuruibin deleted the pr@main@refactor_image_list branch February 17, 2025 08:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: Support image_list, document_list while using swagger api #2300

refactor: Support image_list, document_list while using swagger api #2300

shaohuzhang1 commented Feb 17, 2025

shaohuzhang1 Feb 17, 2025

shaohuzhang1 Feb 17, 2025

shaohuzhang1 Feb 17, 2025

shaohuzhang1 Feb 17, 2025

refactor: Support image_list, document_list while using swagger api #2300

refactor: Support image_list, document_list while using swagger api #2300

Conversation

shaohuzhang1 commented Feb 17, 2025

shaohuzhang1 Feb 17, 2025

Choose a reason for hiding this comment

shaohuzhang1 Feb 17, 2025

Choose a reason for hiding this comment

shaohuzhang1 Feb 17, 2025

Choose a reason for hiding this comment

Changes Made

Suggested Optimization

Potential Issues

shaohuzhang1 Feb 17, 2025

Choose a reason for hiding this comment