-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
backend support for upload modal #89
Conversation
WalkthroughThe changes introduce a new video upload feature to the backend API. A new route for uploading videos to a collection is added in the Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant API
participant VideoDBHandler
participant VideoDBTool
Client->>API: POST /collection/<collection_id>/upload
API->>VideoDBHandler: upload_video(collection_id)
VideoDBHandler->>API: Check for file or source
alt File provided
VideoDBHandler->>VideoDBTool: upload(file, source_type="file")
VideoDBTool->>Server: GET /collection/{collection_id}/upload_url
Server-->>VideoDBTool: return upload_url
VideoDBTool->>Server: POST upload_url with file
Server-->>VideoDBTool: response
else Source provided
VideoDBHandler->>VideoDBTool: upload(source, source_type="url")
else No valid source
VideoDBHandler-->>API: 400 error
end
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (3)
backend/director/entrypoint/api/routes.py (2)
105-106
: Enhance error message with expected input formats.The current error message could be more helpful by specifying the accepted upload methods.
- return {"message": "No valid source provided"}, 400 + return { + "message": "No valid source provided", + "accepted_formats": { + "file_upload": "multipart/form-data with 'file' field", + "json_upload": { + "source": "string", + "source_type": "string (url|file)" + } + } + }, 400
88-107
: Consider adding rate limiting and async processing.For a production-ready upload endpoint, consider implementing:
- Rate limiting to prevent abuse
- Async processing for large files using a task queue (e.g., Celery)
- Progress tracking for large uploads
- Chunked upload support for better reliability
Would you like help implementing any of these architectural improvements?
backend/director/handler.py (1)
137-141
: Add error handling wrapperThe method should handle and properly format errors from the underlying tool.
Consider adding error handling:
def upload( self, source, source_type="url", media_type="video", name=None ): - return self.videodb_tool.upload(source, source_type, media_type, name) + try: + return self.videodb_tool.upload(source, source_type, media_type, name) + except Exception as e: + logger.error(f"Failed to upload media: {str(e)}") + raise
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
backend/director/entrypoint/api/routes.py
(1 hunks)backend/director/handler.py
(1 hunks)backend/director/tools/videodb_tool.py
(2 hunks)
🔇 Additional comments (4)
backend/director/entrypoint/api/routes.py (2)
88-91
: LGTM! Route definition follows REST conventions.
The endpoint is well-structured and consistent with other routes in the file.
98-100
: Verify VideoDBHandler.upload() method signature.
Ensure that the VideoDBHandler.upload() method accepts all the parameters being passed:
- For files: source, source_type, media_type, name
- For JSON: source, source_type
Also applies to: 104-104
✅ Verification successful
The verification is complete. I'll generate the final response based on the results:
VideoDBHandler.upload() method signature matches the usage
The method signature in backend/director/handler.py
perfectly matches the parameters being passed:
- Default parameters:
source_type="url"
,media_type="video"
,name=None
- All required parameters are supported:
source
,source_type
,media_type
,name
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for VideoDBHandler.upload method definition
ast-grep --pattern 'class VideoDBHandler:
$$$
def upload($$$):
$$$'
Length of output: 1627
backend/director/tools/videodb_tool.py (2)
2-2
: LGTM: Import addition is appropriate
The requests library import is correctly added to support the new file upload functionality.
84-93
:
Several improvements needed for robust file upload handling
While the basic implementation works, there are several areas that need attention for production-ready file upload handling:
- Add validation and security measures:
- File size limits
- File type validation
- Memory usage optimization for large files
- Enhance error handling
- Improve request handling
- Add proper resource management
Here's a suggested improvement:
elif source_type == "file":
upload_url_data = self.conn.get(
path=f"/collection/{self.collection.id}/upload_url",
params={"name": name},
)
upload_url = upload_url_data.get("upload_url")
- files = {"file": (name, source)}
- response = requests.post(upload_url, files=files)
- response.raise_for_status()
- upload_args["url"] = upload_url
+ try:
+ # Validate file size
+ if hasattr(source, 'seek') and hasattr(source, 'tell'):
+ source.seek(0, os.SEEK_END)
+ size = source.tell()
+ source.seek(0)
+ if size > MAX_UPLOAD_SIZE: # Define this constant
+ raise ValueError(f"File size exceeds maximum limit of {MAX_UPLOAD_SIZE} bytes")
+
+ # Stream the file in chunks to avoid memory issues
+ with source:
+ files = {"file": (name, source)}
+ response = requests.post(
+ upload_url,
+ files=files,
+ timeout=300, # 5 minutes timeout
+ headers={"Accept": "application/json"}
+ )
+ response.raise_for_status()
+ upload_args["url"] = upload_url
+ except requests.exceptions.RequestException as e:
+ raise ValueError(f"Failed to upload file: {str(e)}")
Let's verify the maximum file size handling in the codebase:
Consider implementing the following architectural improvements:
- Move file validation logic to a separate utility function
- Add retry logic for failed uploads
- Consider using multipart uploads for large files
- Add progress tracking for large uploads
- Implement proper cleanup of temporary upload URLs on failure
Would you like me to provide implementations for any of these improvements?
elif "source" in request.json: | ||
source = request.json["source"] | ||
source_type = request.json["source_type"] | ||
return videodb.upload(source=source, source_type=source_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add validation for JSON source uploads.
The current implementation lacks validation for JSON payload:
- Missing validation for source_type values
- No validation of source format/content
- Potential KeyError if source_type is missing
Apply these improvements:
elif "source" in request.json:
source = request.json["source"]
+ if not source:
+ return {"message": "Empty source provided"}, 400
+
+ if "source_type" not in request.json:
+ return {"message": "source_type is required"}, 400
source_type = request.json["source_type"]
+
+ # Validate source_type
+ valid_source_types = ["url", "file"] # Add other valid types
+ if source_type not in valid_source_types:
+ return {"message": f"Invalid source_type. Must be one of: {', '.join(valid_source_types)}"}, 400
+
+ # Validate source format based on source_type
+ if source_type == "url" and not source.startswith(("http://", "https://")):
+ return {"message": "Invalid URL format"}, 400
+
return videodb.upload(source=source, source_type=source_type)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
elif "source" in request.json: | |
source = request.json["source"] | |
source_type = request.json["source_type"] | |
return videodb.upload(source=source, source_type=source_type) | |
elif "source" in request.json: | |
source = request.json["source"] | |
if not source: | |
return {"message": "Empty source provided"}, 400 | |
if "source_type" not in request.json: | |
return {"message": "source_type is required"}, 400 | |
source_type = request.json["source_type"] | |
# Validate source_type | |
valid_source_types = ["url", "file"] # Add other valid types | |
if source_type not in valid_source_types: | |
return {"message": f"Invalid source_type. Must be one of: {', '.join(valid_source_types)}"}, 400 | |
# Validate source format based on source_type | |
if source_type == "url" and not source.startswith(("http://", "https://")): | |
return {"message": "Invalid URL format"}, 400 | |
return videodb.upload(source=source, source_type=source_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
backend/director/entrypoint/api/routes.py
(3 hunks)
🔇 Additional comments (4)
backend/director/entrypoint/api/routes.py (4)
97-111
:
Add file upload validations and improve memory handling.
Previous review comments about file validation remain relevant. Additionally, reading the entire file into memory could cause issues with large files.
Consider using chunked upload for large files:
if "file" in request.files:
file = request.files["file"]
- file_bytes = file.read()
+ # Stream file in chunks instead of loading entirely into memory
+ chunk_size = 8192 # 8KB chunks
+ file_chunks = []
+ while True:
+ chunk = file.read(chunk_size)
+ if not chunk:
+ break
+ file_chunks.append(chunk)
safe_filename = secure_filename(file.filename)
112-115
:
Add validation for JSON source uploads.
The JSON source handling lacks validation for required fields and source_type values.
Previous review suggestions for JSON validation should be implemented to prevent potential KeyError exceptions and invalid source types.
106-111
: Verify VideoDBHandler.upload implementation.
Need to verify the implementation of VideoDBHandler.upload()
method to ensure proper handling of different source types and error cases.
Also applies to: 114-115
✅ Verification successful
VideoDBHandler.upload implementation is properly structured and handles different source types
The implementation shows a well-structured flow for handling uploads:
- The handler delegates to VideoDBTool which properly handles both URL and file-based uploads
- File uploads use a secure upload URL obtained from the API
- The code includes proper error handling with try-catch in the route
- Secure filename validation is implemented
- Different media types and source types are supported with proper parameter passing
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for VideoDBHandler.upload implementation
ast-grep --pattern 'class VideoDBHandler {
$$$
upload($$$) {
$$$
}
$$$
}'
# Search for any error handling related to uploads
rg -A 5 "def upload" backend/director/handler.py
Length of output: 342
Script:
#!/bin/bash
# Let's check the videodb_tool implementation and initialization
rg -A 10 "class VideoDBTool" backend/director/
rg "videodb_tool =" backend/director/
# Also search for any error handling or validation related to upload
rg "def upload" backend/director/ -A 10
Length of output: 5373
94-95
:
Validate collection existence before processing upload.
The code should verify that the collection exists before attempting to upload.
Add collection validation:
try:
videodb = VideoDBHandler(collection_id)
+ if not videodb.get_collection():
+ return {"message": f"Collection {collection_id} not found"}, 404
Likely invalid or redundant comment.
SUPPORTED_MEDIA_TYPES = ["audio/mpeg", "video/mp4", "image/jpeg", "image/png"] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Implement media type validation using the SUPPORTED_MEDIA_TYPES constant.
The SUPPORTED_MEDIA_TYPES
constant is defined but not utilized. This should be used to implement the TODO comment on line 104 for media type validation.
Here's how to implement the validation:
media_type = file.content_type.split("/")[0]
+ if file.content_type not in SUPPORTED_MEDIA_TYPES:
+ return {"message": f"Unsupported media type. Supported types are: {', '.join(SUPPORTED_MEDIA_TYPES)}"}, 400
- # TODO: Check if the media type is supported, if not, return 400
return videodb.upload(
Committable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let the internal function of conn.upload handle the validation, if it not a valid type, the function should throw error, we are going to catch the error and simply pass that to client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@0xrohitgarg, understood. If the conn.upload
function handles the media type validation and raises an error for unsupported types, and you're catching the exception and passing it to the client, that approach works well.
✏️ Learnings added
Learnt from: 0xrohitgarg
PR: video-db/Director#89
File: backend/director/entrypoint/api/routes.py:0-0
Timestamp: 2024-11-29T05:47:14.334Z
Learning: In `backend/director/entrypoint/api/routes.py`, the `upload_video` function relies on `conn.upload` for media type validation. The `conn.upload` function in `backend/director/tools/videodb_tool.py` raises an error if the media type is unsupported, and exceptions are caught and passed to the client in `upload_video`.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
backend/director/entrypoint/api/routes.py
(2 hunks)
🔇 Additional comments (4)
backend/director/entrypoint/api/routes.py (4)
4-4
: LGTM: Good security practice using secure_filename
The addition of secure_filename from werkzeug.utils is appropriate for sanitizing uploaded filenames.
92-93
:
Add collection validation before processing upload
The upload process should verify that the collection exists before attempting any operations.
Add collection validation at the start:
try:
videodb = VideoDBHandler(collection_id)
+ if not videodb.get_collection():
+ return {"message": "Collection not found"}, 404
95-108
:
Enhance file upload security and validation
The file upload handling needs additional security measures:
- File size limits
- Media type validation
- Memory-efficient file handling
Add these safety checks:
if "file" in request.files:
file = request.files["file"]
+ if not file or not file.filename:
+ return {"message": "No file provided"}, 400
+
+ # Check file size before reading
+ file.seek(0, os.SEEK_END)
+ size = file.tell()
+ file.seek(0)
+ if size > app.config.get("MAX_CONTENT_LENGTH", 16 * 1024 * 1024):
+ return {"message": "File too large"}, 413
+
+ # Validate media type
+ if file.content_type not in app.config.get("SUPPORTED_MEDIA_TYPES", ["video/mp4"]):
+ return {"message": "Unsupported media type"}, 400
+
file_bytes = file.read()
109-112
:
Add validation for source-based uploads
The source upload handling lacks necessary validation and error handling.
Add validation for source-based uploads:
elif "source" in request.json:
+ if not request.json.get("source"):
+ return {"message": "Empty source provided"}, 400
+
+ if "source_type" not in request.json:
+ return {"message": "source_type is required"}, 400
source = request.json["source"]
source_type = request.json["source_type"]
+
+ valid_source_types = ["url", "file"]
+ if source_type not in valid_source_types:
+ return {"message": f"Invalid source_type. Must be one of: {', '.join(valid_source_types)}"}, 400
Summary by CodeRabbit
New Features
Bug Fixes