# Analysis: 123open Large File Upload Timeout Issue

## Problem Summary

When copying large files from Baidu Netdisk to 123open storage, the upload consistently times out with an "upload complete timeout" error, while:
- ✅ Baidu Netdisk → Aliyun works fine
- ✅ Direct upload → 123open works fine
- ✅ Baidu Netdisk → 123open (quick upload/秒传 when file exists) works fine
- ❌ Baidu Netdisk → 123open (large file, requires actual upload) always times out

**Important**: The issue persists even with:
- Concurrency reduced to 1
- Timeout extended to 2 hours

## Code Flow Analysis

### 1. Cross-Storage Transfer Flow (Baidu → 123open)

```
FileTransferTask.RunWithNextTaskCallback()
op.Get() - Get source file object from Baidu
op.Link() - Get download link from Baidu
stream.NewSeekableStream() - Create stream from Baidu link
op.Put() → Open123.Put() - Upload to 123open
[Key Point 1] stream.CacheFullAndHash() - Compute MD5 if not available
[Key Point 2] Open123.create() - Create upload session with MD5
[Key Point 3] Open123.Upload() - Upload chunks in parallel
        [Key Point 4] Open123.complete() - Poll for upload completion (up to 60 attempts, 1-second interval)
```
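
For reference, [Key Point 4] is a fixed-interval poll: up to 60 attempts, one second apart. A minimal sketch of that shape, with `poll` standing in for the driver's actual complete() call (all names below are placeholders, not the real driver API):

```go
package sketch

import (
	"context"
	"errors"
	"time"
)

// waitForComplete illustrates the fixed 60-attempt / 1-second-interval
// polling shape described in [Key Point 4]; poll stands in for the
// driver's complete() API call.
func waitForComplete(ctx context.Context, poll func() (done bool, fileID int64, err error)) (int64, error) {
	for attempt := 1; attempt <= 60; attempt++ {
		done, fileID, err := poll()
		if err != nil {
			return 0, err
		}
		if done {
			return fileID, nil
		}
		select {
		case <-ctx.Done():
			return 0, ctx.Err()
		case <-time.After(time.Second):
		}
	}
	return 0, errors.New("upload complete timeout")
}
```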

### 2. Direct Upload Flow (Local/Direct → 123open)

```
Local file/Direct stream
op.Put() → Open123.Put()
[Same as above, but stream is complete and reliable]
```

### 3. Quick Upload Flow (秒传)

```
Open123.Put()
Open123.create() - Server detects existing file by MD5
createResp.Data.Reuse = true - Quick upload, no actual data transfer needed
Return immediately with FileID
```

## Key Differences

| Aspect | Direct Upload | Cross-Storage (Baidu → 123open) |
|--------|--------------|---------------------------------|
| Data Source | Local/complete | Streaming from Baidu (may be rate-limited) |
| MD5 Calculation | From the complete file | From streaming data (CacheFullAndHash) |
| Stream Reliability | High | Depends on Baidu download stability |
| Data Integrity | Guaranteed | Potential mismatch during transfer |

## Potential Root Causes

### Hypothesis 1: Stream Data Incompleteness
**Theory**: The Baidu download stream may terminate early or be rate-limited, causing incomplete data to be uploaded to 123open.

**Evidence**:
- Direct upload works (complete data source)
- Quick upload works (no data transfer needed)
- Only fails when actually transferring data from Baidu

**What to check in logs**:
```
[Transfer] Stream created for <filename> - Size matches expected?
[123open] Upload - all chunks uploaded successfully - All chunks complete?
[123open] Complete poll #N: Completed=false, FileID=0 - What's the actual response?
```

### Hypothesis 2: MD5 Mismatch
**Theory**: The MD5 calculated during streaming (CacheFullAndHash) may not match the data actually uploaded, due to stream issues.

**Evidence**:
- Quick upload works because the MD5 match is verified by the server
- CacheFullAndHash reads the entire stream to compute the MD5, but the stream might be incomplete

**What to check in logs**:
```
[123open] File: <filename> - Computing MD5 via CacheFullAndHash
[123open] File: <filename> - CacheFullAndHash completed, MD5: <hash>
[123open] Upload - chunk <N>: MD5: <hash> - Do chunk MD5s match expectations?
```
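
To make the failure mode concrete, here is a hedged sketch of a streaming MD5 computation that also validates the byte count. If the source ends early without reporting an error, `io.Copy` returns normally and the hash covers truncated data; comparing the count against the expected size catches that. This is illustrative only, not the actual CacheFullAndHash implementation:

```go
package sketch

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
)

// hashStream computes an MD5 over a stream the way a cache-and-hash step
// would, but also checks the byte count: a source that hits EOF early
// yields a valid-looking hash over truncated data, which is exactly the
// silent failure this hypothesis describes.
func hashStream(r io.Reader, expectedSize int64) (string, error) {
	h := md5.New()
	n, err := io.Copy(h, r)
	if err != nil {
		return "", err
	}
	if n != expectedSize {
		return "", fmt.Errorf("short read: got %d bytes, expected %d", n, expectedSize)
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}
```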

### Hypothesis 3: Server-Side Processing Delay
**Theory**: The 123open server needs more time to merge and process chunks for large files, but the complete() API doesn't clearly report a "processing" state.

**Evidence**:
- Extending the timeout to 2 hours still fails
- The API returns error codes the documentation doesn't explain (e.g., 20103)

**What to check in logs**:
```
[123open] Complete poll #N: Code=<code>, Message=<msg>, Completed=<bool>, FileID=<id>
```

Look for:
- `Completed=false` with `FileID=0` consistently
- Unknown error codes in Message field
- Any error patterns in API responses

### Hypothesis 4: Chunk Upload Timing Race Condition
**Theory**: The 123open server starts processing before all chunks are fully received, leading to validation failures.

**Evidence**:
- Chunks are uploaded in parallel with a retry mechanism
- The server may start merging chunks before all are confirmed

**What to check in logs**:
```
[123open] Upload - file: <filename>, all <N> chunks uploaded successfully
[123open] Complete poll #1: <response> - How quickly after upload completion?
```
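
For context: if the client uses the standard fan-out/fan-in pattern, complete() cannot fire before every chunk upload has returned, so any race would have to be server-side (between receiving the last bytes and validating them). A sketch of that ordering, assuming an errgroup-style implementation (the `Chunk` type and all function names are placeholders, not the actual Upload() code):

```go
package sketch

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// Chunk is a placeholder for whatever per-chunk data the driver tracks.
type Chunk struct {
	Index int
	Data  []byte
}

// uploadThenComplete shows the ordering guarantee in question: complete()
// runs only after every chunk upload goroutine has returned, so the client
// cannot signal completion early.
func uploadThenComplete(ctx context.Context, chunks []Chunk,
	upload func(context.Context, Chunk) error,
	complete func(context.Context) error) error {
	g, gctx := errgroup.WithContext(ctx)
	g.SetLimit(3) // bounded parallelism, like the driver's thread count
	for _, c := range chunks {
		c := c // capture loop variable (pre-Go 1.22 semantics)
		g.Go(func() error { return upload(gctx, c) })
	}
	if err := g.Wait(); err != nil { // blocks until all chunks are done
		return err
	}
	return complete(ctx)
}
```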

### Hypothesis 5: Baidu Stream Rate Limiting
**Theory**: Baidu imposes download rate limits that cause chunks to be read slowly, leading to timeouts or partial reads.

**Evidence**:
- The failure only occurs with Baidu as the source
- Aliyun as a source works (different rate-limit policy)

**What to check in logs**:
```
[123open] Upload - chunk <N>: MD5 calculation time
[123open] Upload - chunk <N>: HTTP request time
[Transfer] Link obtained for <filename> - Link URL and expiration
```
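
A cheap way to test this hypothesis is to wrap the source stream in a reader that logs cumulative throughput. A hedged sketch (this wrapper is hypothetical, not part of the codebase): wrap the Baidu download stream with it before handing it to the uploader and watch the average rate.

```go
package sketch

import (
	"io"
	"log"
	"time"
)

// timingReader wraps a source stream and logs cumulative throughput,
// making source-side rate limiting visible in the logs.
type timingReader struct {
	r     io.Reader
	total int64
	start time.Time
}

func (t *timingReader) Read(p []byte) (int, error) {
	if t.start.IsZero() {
		t.start = time.Now()
	}
	n, err := t.r.Read(p)
	t.total += int64(n)
	if secs := time.Since(t.start).Seconds(); secs > 1 {
		log.Printf("[Transfer] %d bytes read, avg %.1f KiB/s", t.total, float64(t.total)/1024/secs)
	}
	return n, err
}
```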

## Logging Added

### 1. In `drivers/123_open/driver.go` - Put() method
- File upload start with size
- MD5 hash availability and computation timing
- Create response details (Reuse flag, FileID, PreuploadID)
- Quick upload success detection
- Upload phase timing
- **Detailed complete() polling logs** showing:
- Poll attempt number
- API response code and message
- Completed flag status
- FileID value
- Any errors
- Success/timeout final status

### 2. In `drivers/123_open/upload.go` - Upload() method
- Upload parameters (domain, total size, chunk size, thread count)
- Per-chunk progress:
- Chunk number and MD5 hash
- Upload timing
- HTTP status and API response
- Success/failure status
- Total upload completion time

### 3. In `internal/fs/copy_move.go` - Transfer task
- Source and destination storage information
- File metadata (name, size)
- Link acquisition timing and details
- Stream creation timing and characteristics
- Stream hash availability
- Upload phase timing
- Final success/failure status

## How to Use These Logs

### Step 1: Reproduce the Issue
Attempt to copy a large file (>100MB) from Baidu Netdisk to 123open storage.

### Step 2: Collect Logs
Enable debug logging and capture the full log output. Look for patterns:

```bash
# Filter for relevant logs
grep "\[123open\]\|\[Transfer\]" openlist.log > analysis.log
```

### Step 3: Analyze the Complete() Responses
This is the most critical part. Look at the complete() polling logs:

```
[123open] File: example.zip - Complete poll #1: Code=?, Message=?, Completed=?, FileID=?
[123open] File: example.zip - Complete poll #2: Code=?, Message=?, Completed=?, FileID=?
...
[123open] File: example.zip - Complete poll #60: Code=?, Message=?, Completed=?, FileID=?
```

**Key questions**:
1. What are the actual Code and Message values?
2. Is Completed always false or does it change?
3. Is FileID always 0 or does it get set?
4. Are there any error messages in the Message field?

### Step 4: Compare with Direct Upload
Perform a direct upload of the same file size to 123open and compare logs:

1. MD5 computation time
2. Chunk upload timing
3. Complete() response patterns
4. Time from "upload completed" to "complete poll #1"

### Step 5: Check Data Integrity
1. Verify that all chunks report successful upload
2. Check if chunk count matches expected value
3. Compare the MD5 computed by CacheFullAndHash with the actual file MD5, if a local copy is available (see the helper sketched below)
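
A minimal standalone helper for item 3, computing the reference MD5 of a locally available copy (standard library only):

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// Prints the MD5 of a local file, for comparison against the hash that
// CacheFullAndHash reported in the logs.
func main() {
	f, err := os.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()
	h := md5.New()
	if _, err := io.Copy(h, f); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(hex.EncodeToString(h.Sum(nil)))
}
```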

### Step 6: Examine Stream Characteristics
1. Is the download link from Baidu valid?
2. How long does stream creation take?
3. Is the stream size reported correctly?
4. Is MD5 hash available from the stream?

## Expected Findings

Based on the analysis, we expect to find one of these scenarios:

### Scenario A: Server Returns Unknown Status
```
[123open] Complete poll #N: Code=0, Message="", Completed=false, FileID=0
```
**Meaning**: The chunks were received, but the server is stuck in a processing state without providing clear feedback.

### Scenario B: Server Returns Error Code
```
[123open] Complete poll #N: Code=20103, Message="<unknown>", Completed=false, FileID=0
```
**Meaning**: The server encountered an error (possibly a validation failure), but the error code is undocumented.

### Scenario C: MD5 Mismatch
```
[123open] CacheFullAndHash computed MD5: <hash1>
[123open] Upload chunk MD5s: <various>
[123open] Complete poll: Validation failed
```
**Meaning**: The MD5 computed during streaming doesn't match the data actually uploaded.

### Scenario D: Incomplete Upload
```
[123open] Upload - Expected <N> chunks, uploaded <M> chunks (M < N)
[123open] Complete poll: Missing chunks
```
**Meaning**: Not all chunks were successfully uploaded due to stream issues.

## Next Steps After Analysis

Once the logs reveal the actual cause:

1. **If Scenario A**: Increase the polling timeout or implement exponential backoff (see the sketch after this list)
2. **If Scenario B**: Handle the specific error codes, or retry with a different strategy
3. **If Scenario C**: Verify the MD5 before upload, or implement checksum validation
4. **If Scenario D**: Improve stream reliability, or implement per-chunk verification
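
For Scenario A, a sketch of the backoff variant: the same poll contract as the fixed-interval loop sketched earlier, but the delay doubles up to a cap, and the overall budget is a wall-clock deadline rather than a fixed attempt count. All names are placeholders:

```go
package sketch

import (
	"context"
	"errors"
	"time"
)

// waitWithBackoff polls with an exponentially growing delay (capped at
// 30s) and a generous wall-clock deadline, instead of 60 fixed 1-second
// attempts.
func waitWithBackoff(ctx context.Context, poll func() (bool, error)) error {
	delay := time.Second
	const maxDelay = 30 * time.Second
	deadline := time.Now().Add(30 * time.Minute)
	for time.Now().Before(deadline) {
		done, err := poll()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(delay):
		}
		if delay < maxDelay {
			delay *= 2
		}
	}
	return errors.New("upload complete timeout (backoff exhausted)")
}
```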

## Testing Recommendations

1. **Test with different file sizes**:
- Small (< 10MB) - Does it work?
- Medium (10-100MB) - When does it start failing?
- Large (> 100MB) - Consistent failure?

2. **Test with different sources**:
- Baidu → 123open (fails)
- Aliyun → 123open (works?)
- Local → 123open (works)

3. **Test network conditions**:
- Fast network vs slow network
- With/without rate limiting

4. **Monitor server-side behavior**:
- Check 123open web interface during upload
- See if file appears in "processing" state

## Conclusion

The comprehensive logging added will help identify:
- Exactly what the complete() API returns (most critical)
- Whether all chunks upload successfully
- How long each phase takes
- Any data integrity issues
- Stream characteristics from Baidu

The user should run the code with these logs enabled and share the output for further analysis.