Add support for text paste URLs (xpaste.pro) with /txt endpoint #16
base: main
Adding CLAUDE.md with task information for AI processing. This file will be removed when the task is complete. Issue: #15
Implements comprehensive support for xpaste.pro URLs:
**New Features:**
- Added `/txt` endpoint to download text files from paste services
- Auto-detects xpaste.pro URLs and converts them to raw text endpoints
- Smart markdown conversion for text pastes:
  - Embeds content in code blocks for files <1500 lines
  - Creates zip archive with index.md and .txt file for >=1500 lines
**Implementation:**
- Added `normalizeUrlForTextContent()` utility to convert xpaste.pro URLs to /raw format
- Added `isTextPasteUrl()` utility to detect text paste service URLs
- Modified markdown handler to handle text pastes differently from HTML pages
- Added archiver dependency for zip file creation
**Tests:**
- Added comprehensive unit tests for xpaste.pro URL utilities
- Added integration tests for /txt endpoint
- Added tests for markdown conversion with both small and large text files
- All 63 tests pass (excluding docker e2e which requires docker-compose)
Fixes #15
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
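The two URL utilities named in this commit could look roughly like the sketch below. It is not the PR's actual code: the host list, the `/p/<id>` path check, and the `/p/<id>/raw` target shape are assumptions made for illustration, since the commit only says that xpaste.pro URLs are converted "to /raw format".

```javascript
// Hosts treated as text paste services. Only xpaste.pro is named in the PR;
// keeping them in a Set is an assumption made for this sketch.
const TEXT_PASTE_HOSTS = new Set(['xpaste.pro', 'www.xpaste.pro']);

// Returns true when the URL points at a paste on a known text paste host.
function isTextPasteUrl(url) {
  try {
    const { hostname, pathname } = new URL(url);
    // Assumed path shape: /p/<id>, optionally followed by more segments.
    return TEXT_PASTE_HOSTS.has(hostname) && /^\/p\/[^/]+/.test(pathname);
  } catch {
    return false; // not a parseable absolute URL
  }
}

// Converts a paste page URL to its raw text URL; other URLs pass through unchanged.
function normalizeUrlForTextContent(url) {
  if (!isTextPasteUrl(url)) return url;
  const parsed = new URL(url);
  const path = parsed.pathname.replace(/\/+$/, ''); // drop trailing slashes
  parsed.pathname = path.endsWith('/raw') ? path : `${path}/raw`;
  return parsed.toString();
}
```

Returning non-paste URLs unchanged lets a caller run every incoming URL through the normalizer without branching first.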
This reverts commit 4e17fcd.
🤖 Solution Draft Log
This log file contains the complete execution trace of the AI solution draft process.
💰 Cost estimation:
The working session has now ended; feel free to review the solution draft and add any feedback.
Please download a screenshot of the entire page using our webcapture (Playwright or Puppeteer engine) for reference, to make sure we have actual data in the test cases.
🤖 AI Work Session Started
Starting automated work session at 2025-11-14T19:30:22.072Z. The PR has been converted to draft mode while work is in progress. This comment marks the beginning of an AI work session. Please wait for the working session to finish, then provide your feedback.
…t4q0Lsp0
- Replaced mock test data with actual content from xpaste.pro/p/t4q0Lsp0
- Added screenshot and text content reference files in examples/xpaste/
- Tests now use real-world SQL query log data (63 lines)
- Ensures tests accurately reflect production behavior
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Please make sure we have a script that captures a screenshot of the entire page, not just part of it, and double-check that everything works correctly based on that new screenshot. The old partial screenshot should be replaced.
🤖 AI Work Session Started
Starting automated work session at 2025-11-15T09:27:56.525Z. The PR has been converted to draft mode while work is in progress. This comment marks the beginning of an AI work session. Please wait for the working session to finish, then provide your feedback.
…te screenshot
- Add fullPage query parameter to /image endpoint for full-page screenshots
- Fix Playwright adapter setUserAgent compatibility (no-op since UA is set at context level)
- Replace partial xpaste screenshot with full-page version (113KB -> 236KB)
- Add experiment script to capture full-page screenshots for testing
The new screenshot shows all 64 lines of SQL queries from xpaste.pro/p/t4q0Lsp0, compared to the previous partial screenshot that only showed ~30 lines.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Full-Page Screenshot Implementation Complete
I've successfully implemented the requested changes:
Changes Made
Verification
✅ All 68 tests passing (including existing xpaste tests)
Screenshot Comparison
The new full-page screenshot now shows:
This ensures our tests accurately reflect the real data from the xpaste URL.
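As a rough illustration of the `fullPage` query parameter described in this comment, the sketch below turns a parsed query object into screenshot options. The helper names and the rule that only the string "true" enables full-page capture are assumptions; the resulting object uses the `fullPage` and `type` options that both Playwright's and Puppeteer's `page.screenshot()` accept.

```javascript
// Interprets the fullPage query value. Treating only the string "true"
// (case-insensitive) as enabled is an assumption made for this sketch.
function parseFullPage(query) {
  return String(query.fullPage ?? '').toLowerCase() === 'true';
}

// Builds the options object that would be passed to page.screenshot().
// Hypothetical helper name; not taken from the PR.
function buildScreenshotOptions(query) {
  return { type: 'png', fullPage: parseFullPage(query) };
}

// Inside a route handler this would be used roughly as:
//   const buffer = await page.screenshot(buildScreenshotOptions(req.query));
```

Defaulting to viewport-only capture when the parameter is absent keeps existing `/image` callers unaffected.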
🔄 Auto-restart 1/3
Detected uncommitted changes from the previous run. Starting a new session to review and commit them.
Uncommitted files:
Auto-restart will stop after the changes are committed or after 2 more iterations. Please wait until the working session ends, then give your feedback.
Updated yarn.lock to use consistent registry URLs (registry.npmjs.org) and consolidated package versions. No functional changes. All 68 tests passing.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
✅ Pull Request Ready for Review
All uncommitted changes have been resolved and the implementation is complete.
What was done:
Requirements verification (Issue #15):
Test coverage:
The PR is ready for final review and merge.
Also add markdown extraction for the same page as the screenshot, and make sure the markdown ends up as close as possible to the content on the page, so that all text in the screenshot is also present in the markdown extraction of that page. Cover that with extensive tests at 100% coverage.
🤖 AI Work Session Started
Starting automated work session at 2025-11-15T13:17:17.577Z. The PR has been converted to draft mode while work is in progress. This comment marks the beginning of an AI work session. Please wait for the working session to finish, then provide your feedback.
…ents
Changes:
- Updated /markdown endpoint for xpaste.pro URLs to fetch HTML and convert to markdown instead of fetching raw text content
- This ensures ALL text visible in screenshots is captured in markdown extraction (header, metadata, footer, language links, etc.)
- For large files (>= 1500 lines), creates zip with markdown file instead of txt file
Tests:
- Added comprehensive screenshot-to-markdown content matching tests
- All 6 new tests verify that markdown includes every UI element visible in screenshots:
  * Header/branding (Xpaste)
  * Metadata (format, creation date, deletion date)
  * All SQL query content with line numbers
  * Footer elements (Southbridge, help links, privacy policy)
  * Language selector (Ru/En)
  * RAW button link
- Updated existing tests to work with HTML-based markdown extraction
- All 74 tests passing
Examples:
- Added t4q0Lsp0-page.html (actual HTML fetched from xpaste.pro)
- Added t4q0Lsp0-page.md (markdown conversion showing full page content)
- Added test-xpaste-html-markdown.js experiment script
This addresses the requirement to make markdown extraction match screenshot content with 100% coverage of visible text elements.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
✅ Enhancement Complete: Screenshot-to-Markdown Content Matching
I've successfully implemented the requested feature to ensure markdown extraction from xpaste.pro pages includes ALL text visible in screenshots.
What Changed
Before:
After:
Test Coverage
Added comprehensive test suite with 6 new tests in
Result: All 74 tests passing (9 test suites)
Examples Added
Verification
You can verify the markdown extraction includes all screenshot text by running:
node experiments/test-xpaste-html-markdown.js
This will show:
All requirements from the issue comment have been met with extensive test coverage! 🎉
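A content-matching check of the kind described in this comment can be sketched as a small helper that reports which expected screenshot strings are missing from the markdown. The helper names and the whitespace normalization are assumptions made for illustration; the PR's real tests may compare content differently.

```javascript
// Collapses runs of whitespace so line wrapping in the markdown does not
// cause false mismatches (an assumption about what counts as "the same text").
function normalizeWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Returns the expected snippets that do NOT appear in the markdown.
// An empty array means every visible string was captured.
function findMissingSnippets(markdown, expectedSnippets) {
  const haystack = normalizeWhitespace(markdown);
  return expectedSnippets.filter(
    (snippet) => !haystack.includes(normalizeWhitespace(snippet)),
  );
}
```

In a Jest test this would typically be asserted as `expect(findMissingSnippets(markdown, expected)).toEqual([])`, so a failure message lists exactly which strings are absent.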
In markdown version of the page |
🤖 AI Work Session Started
Starting automated work session at 2025-11-15T13:35:44.564Z. The PR has been converted to draft mode while work is in progress. This comment marks the beginning of an AI work session. Please wait for the working session to finish, then provide your feedback.
Fixes issue where xpaste.pro page heading and language selector appeared at the end of markdown instead of at the top, as shown in screenshots.
Changes:
- Modified convertHtmlToMarkdown() to reorder header/main/footer elements before conversion
- When header comes after main in DOM (CSS-positioned), it's now moved before main
- This ensures markdown reflects the visual layout, not the DOM order
- Updated t4q0Lsp0-page.md with correct element ordering
- Added test to verify heading and languages appear before metadata
- Added regenerate script for reproducibility
Test results:
- All 75 tests passing (7 xpaste screenshot tests)
- New test verifies correct ordering: heading → languages → metadata → content
- Element positions verified: heading (line 5), languages (line 7), metadata (line 10)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
✅ Fixed: Markdown Element Ordering
I've successfully fixed the issue where the heading "Упакуем пароль или код в cсылку для передачи" (roughly, "Pack a password or code into a link for sharing") and the language selector (Ru/En) were appearing at the end of the markdown document instead of at the top, as shown in the screenshot.
Problem Identified
The xpaste.pro HTML structure has an unusual DOM order:
The site uses CSS to position the header at the top visually, but in the HTML/DOM it comes after the main content. This caused the markdown conversion to place header elements at the end.
Solution
Modified
Changes Made
Verification
✅ All 75 tests passing (added 1 new test)
New test verifies:
Element positions in generated markdown:
This now matches the screenshot exactly! 🎉
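The reordering idea can be sketched on a simplified model. The PR's `convertHtmlToMarkdown` presumably works on a parsed DOM; the version below instead takes a plain array of top-level sections (`{ tag, html }`), which is an assumption made to keep the example self-contained, and the function name is hypothetical.

```javascript
// Reorders top-level sections so they follow the visual layout
// (header, then main, then footer) rather than the DOM order.
// Sections other than header/main/footer keep their positions.
function reorderForVisualLayout(sections) {
  const result = [...sections]; // do not mutate the caller's array
  const indexOfTag = (tag) => result.findIndex((s) => s.tag === tag);

  // Header placed after main in the DOM: move it directly before main.
  let mainIdx = indexOfTag('main');
  const headerIdx = indexOfTag('header');
  if (mainIdx !== -1 && headerIdx > mainIdx) {
    const [header] = result.splice(headerIdx, 1);
    result.splice(mainIdx, 0, header);
  }

  // Footer placed before main in the DOM: move it directly after main.
  mainIdx = indexOfTag('main');
  const footerIdx = indexOfTag('footer');
  if (footerIdx !== -1 && mainIdx !== -1 && footerIdx < mainIdx) {
    const [footer] = result.splice(footerIdx, 1);
    // Removing the footer shifted main left by one, so index mainIdx
    // is now the slot immediately after main.
    result.splice(mainIdx, 0, footer);
  }
  return result;
}
```

For a page whose DOM order is main, header, footer, this yields header, main, footer, which is the order the comment says the screenshot shows.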
I think useful data (examples) should go in tests/xpaste/data/ folder, and useful scripts should go to scripts/xpaste. So we can reuse them in the future to recapture/update data. And all tests related to it just in |
🤖 AI Work Session Started
Starting automated work session at 2025-11-15T15:22:23.309Z. The PR has been converted to draft mode while work is in progress. This comment marks the beginning of an AI work session. Please wait for the working session to finish, then provide your feedback.
Reorganize all xpaste-related files to improve project structure and reusability:
Changes:
- Move test data: examples/xpaste/ → tests/xpaste/data/
- t4q0Lsp0-page.html, t4q0Lsp0-page.md, t4q0Lsp0-screenshot.png, t4q0Lsp0-actual-content.txt
- Move scripts: experiments/ → scripts/xpaste/
- capture-fullpage.js (was capture-xpaste-fullpage.js)
- regenerate-markdown.js (was regenerate-xpaste-markdown.js)
- test-html-markdown.js (was test-xpaste-html-markdown.js)
- Updated all import paths to point to new locations
- Reorganize tests: tests/{unit,mock}/ → tests/xpaste/
- xpaste.unit.test.js (from tests/unit/xpaste.test.js)
- xpaste.mock.test.js (from tests/mock/xpaste.test.js)
- xpaste-screenshot-markdown.mock.test.js (from tests/mock/xpaste-screenshot-markdown.test.js)
- Tests now use proper naming convention (.unit.test.js, .mock.test.js)
- Update Jest configuration (jest.config.mjs):
- Added tests/xpaste/**/*.test.js to testMatch patterns
All tests passing (75 tests in 9 test suites).
This structure makes it easier to:
- Reuse scripts for data capture/regeneration
- Locate test data and reference files
- Understand test types at a glance
- Extend with additional test types (e2e, integration)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Summary
This PR adds comprehensive support for text paste services like xpaste.pro, implementing automatic text file detection, download, and smart markdown conversion that captures all visible page content including UI elements, metadata, and text - ensuring markdown extraction matches screenshot content 100%.
Fixes #15
Changes
New Endpoint
- `GET /txt?url=<URL>` - Downloads text content from paste services as `.txt` files
  - … `/raw` endpoint
  - … `text/plain` content type
  - … `Content-Disposition` header for file downloads
Enhanced Markdown Endpoint
The `/markdown` endpoint now handles text paste URLs by fetching the full HTML page and converting it to markdown, so that all visual elements are captured:
For files < 1500 lines:
For files >= 1500 lines:
- `index.md` - Markdown file with metadata and a link to the full markdown file
- `[filename].md` - The complete page as markdown
Key Improvement: The markdown extraction now matches screenshot content 100%: all text visible in a screenshot of the page is also present in the markdown output.
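The size-based branching described above can be sketched as a pure planning step that decides between an inline markdown response and a zip archive. The function names, the returned object shape, the contents of `index.md`, and the choice to count lines on the raw paste text are all assumptions; only the 1500-line threshold and the two archive file names come from the description.

```javascript
const LINE_THRESHOLD = 1500; // threshold stated in the PR description

// Counts lines, ignoring a single trailing newline.
function countLines(text) {
  const trimmed = text.replace(/\r?\n$/, '');
  return trimmed === '' ? 0 : trimmed.split(/\r?\n/).length;
}

// Decides how the /markdown response for a paste would be delivered.
// pageMarkdown: the full page converted to markdown
// pasteText:    the raw paste text used for line counting (assumption)
// baseName:     name used for the archive's markdown file (hypothetical)
function planMarkdownResponse(pageMarkdown, pasteText, baseName) {
  const lineCount = countLines(pasteText);
  if (lineCount < LINE_THRESHOLD) {
    return { kind: 'inline', lineCount, body: pageMarkdown };
  }
  const fullName = `${baseName}.md`;
  const index = [
    `# ${baseName}`,
    '',
    `Lines: ${lineCount}`,
    '',
    `[Full page as markdown](./${fullName})`,
    '',
  ].join('\n');
  return {
    kind: 'zip',
    lineCount,
    files: [
      { name: 'index.md', content: index },
      { name: fullName, content: pageMarkdown },
    ],
  };
}
```

Keeping this decision separate from the archiving itself means the threshold behavior can be unit-tested without creating a zip stream; the `files` list would then be handed to the `archiver` dependency mentioned in the PR.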
Image Endpoint Enhancement
- `GET /image?url=<URL>&fullPage=true` - Added `fullPage` query parameter support
  - … `fullPage=true`, captures entire scrollable page instead of just viewport
New Utilities (src/lib.js)
- `normalizeUrlForTextContent(url)` - Converts xpaste.pro URLs to raw text format
- `isTextPasteUrl(url)` - Detects if a URL is from a text paste service
Files Modified
- … `/txt` endpoint
- … `fullPage` query parameter support
- … `convertHtmlToMarkdown` to reorder header/main/footer elements to match visual layout
- … `archiver` dependency for zip creation
- … `tests/xpaste/` to test match patterns
Tests Added
All xpaste-related tests are now organized in `tests/xpaste/`:
- … `/txt` endpoint functionality
Test Data
Test data is organized in `tests/xpaste/data/`:
Scripts
Reusable scripts are organized in `scripts/xpaste/`:
Test Results
✅ 75 tests passing (9 test suites, 0 tests skipped)
Test Coverage Highlights
- … `/txt` endpoint
Usage Examples
Download text file directly
curl "http://localhost:3000/txt?url=https://xpaste.pro/p/t4q0Lsp0" -o output.txt
Get markdown with full page content (small file)
curl "http://localhost:3000/markdown?url=https://xpaste.pro/p/t4q0Lsp0"
Output includes:
Get full-page screenshot
curl "http://localhost:3000/image?url=https://xpaste.pro/p/t4q0Lsp0&fullPage=true" -o screenshot.png
Get markdown (large file - zip archive)
curl "http://localhost:3000/markdown?url=https://xpaste.pro/p/largefile" -o archive.zip
Technical Notes
- The `convertHtmlToMarkdown` function now reorders header/main/footer elements to match the visual page layout, not just the DOM order. This fixes cases where CSS positioning (as on xpaste.pro) places the header visually at the top even though it comes after main in the HTML
File Organization
This PR follows a structured organization pattern:
- … `tests/xpaste/` with proper naming (`.unit.test.js`, `.mock.test.js`)
- … `tests/xpaste/data/`
- … `scripts/xpaste/`
🤖 Generated with Claude Code