Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge #158: Sanitize torrent description in markdown
77cbdad refactor: extract functions for sanitizing torrent description (Jose Celano) 7d65d3b feat: [#105] allow only some html tags in sanitized markdown torrent description (Jose Celano) 9143736 feat: sanitize torrent description in markdown (Jose Celano) 34b9fb6 feat: add dependency: dompurify (Jose Celano) Pull request description: - Purify HTML to avoid potential XSS attacks. - Remove external URLS to protect users' privacy. ### Details The sanitize function: - Removes harmful code like the one in the example. - Remove internal links. It removes the "href" attribute. - Replace image sources with the direct base64 image data from the backend image proxy, only for valid image formats. Invalid format images are removed from the DOM. - Valid images are images whose source URL ends with a valid extension: `["png", "PNG", "jpg", "JPG", "jpeg", "JPEG"]`. The backend is supposed to support only PNG and JPG images. - Other HTML tags like `<embed>` are removed. The sanitize function is being tested with this sample torrent description: ``` Harmful script (the button show not show the alert): <p>Click the button to display an alert box.</p> <button onclick="alert('Hello! I am an alert box!')">Try it</button> Valid PNG image in markdown format (it should show the image using the proxy) ![Torrust Logo](https://raw.githubusercontent.com/torrust/torrust-index-backend/develop/docs/media/torrust_logo.png) Another valid (JPG) image in markdown format (it should show the image using the proxy) ![Mandelbrot Set](https://upload.wikimedia.org/wikipedia/commons/2/21/Mandel_zoom_00_mandelbrot_set.jpg) Valid image (PNG) in html format (it should show the image using the proxy): <img src="https://raw.githubusercontent.com/torrust/torrust-index-backend/develop/docs/media/torrust_logo.png"> Invalid image (TIF) in html format (it should remove the image): <img src="https://commons.wikimedia.org/wiki/Category:TIFF_files#/media/File:Arkansas_Constitution_of_1836,_page_1.tif"> Invalid html link (it should remove the href attribute value): <a href="https://commons.wikimedia.org/wiki/Category:TIFF_files#/media/File:Arkansas_Constitution_of_1836,_page_1.tif">Arkansas Constitution of 1836. Page 1</a> <embed type="video/webm" src="/media/cc0-videos/flower.mp4" width="250" height="200"> ``` ### Todo - [x] Basic sanitize function - [x] Remove other HTML tags that can contain external sources like `<video>` - [ ] Add tests: unit and E2E. This could be hard to test as base64 encoded images are big strings. Maybe with a very small image. - [x] Refactor: extract sanitize function from component to a service I think these are the HTML tags that can contain sources linking to external resources: - `<a>`: The href attribute can contain an external URL. - `<img>`: The src attribute can point to an external image URL. - `<script>`: The src attribute can point to an external JavaScript file. - `<link>`: Used for linking CSS files, favicons, etc., the href attribute can contain an external URL. - `<iframe>`: The src attribute can contain an external URL to embed content from another site. - `<object>` and `<embed>`: These tags are used to embed multimedia content like Flash or PDFs, and their data and src attributes respectively can contain external URLs. - `<audio>` and `<video>`: The src attribute can contain an external URL to a media file. - `<source>`: This tag, used inside <audio>, <video>, or <picture> elements, has a src or srcset attribute that can contain an external URL. - `<form>`: The action attribute can contain an external URL where the form data is sent when submitted. - `<meta http-equiv="refresh" content="0; url=http://example.com/" />`: This meta tag can be used to redirect to an external URL. The dompurifier only removes unsafe code, but we also want to [remove external links to avoid tracking the users](#67). Top commit has no ACKs. Tree-SHA512: 97bfdcf4fcf07d5940654872b8759308a12f701ac2023248878fa0bce96278f0560b6b3a38c1474da44aa885b9bb99dac4afa87c515e0d14eacd469715782ff3
- Loading branch information