This repository has been archived by the owner on Nov 25, 2024. It is now read-only.

WIP: Implement failing media API tests #1775

Closed

Conversation

DelusionalOptimist

Fixes #1303

× Can fetch images in room
× Test URL preview
× Can read configuration endpoint

Previews just about work as of now; this is a pretty tentative implementation.
The way the response body of the URL being previewed is currently stored is heavily inspired by the /upload endpoint. I'm not sure, but we probably need to make a url_cache in media_store (similar to Synapse).

Would be grateful to receive suggestions and comments. 😁

Pull Request Checklist

  • I have added any new tests that need to pass to sytest-whitelist as specified in docs/sytest.md
  • Pull request includes a sign off

Signed-off-by: Rudraksh Pareek <54525605+DelusionalOptimist@users.noreply.github.com>

@kegsay kegsay self-assigned this Jul 12, 2021
Member

@kegsay kegsay left a comment

URL previews aren't an easy thing to tackle due to the amount of paranoia you need to apply when downloading arbitrary URLs from clients. I didn't really think about this when tagging this issue as good-first-issue so that's my bad. At present, there is still a lot of work which needs to be done before this is suitable to be merged.

request := &PreviewUrlRequest{
url: *parsedUrl,
// Convert timestamp to ms
ts: types.UnixMs(ts / 1000000),
Member

Looking at https://github.com/matrix-org/synapse/blob/4b965c862dc66c0da5d3240add70e9b5f0aa720b/synapse/rest/media/v1/preview_url_resource.py#L198, tsStr will already be in milliseconds, so it should be fine to just do ts: types.UnixMs(ts).
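
A minimal sketch of the parsing this suggests, assuming the ts query parameter arrives as a decimal string of milliseconds. UnixMs here is a local stand-in for the project's timestamp type, and parseTS is a hypothetical helper name, not code from this PR:

```go
package main

import (
	"fmt"
	"strconv"
)

// UnixMs is a stand-in for a millisecond timestamp type (assumption:
// the real type is an integer count of milliseconds since the epoch).
type UnixMs int64

// parseTS parses the ts query parameter. The value is taken to be in
// milliseconds already, so no unit conversion is applied.
func parseTS(tsStr string) (UnixMs, error) {
	ts, err := strconv.ParseInt(tsStr, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid ts %q: %w", tsStr, err)
	}
	return UnixMs(ts), nil
}

func main() {
	ts, err := parseTS("1626076800000")
	fmt.Println(ts, err)
}
```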

urlString := previewReq.url.String()

// Get the URL
response, err := http.Get(urlString)
Member

Don't use the default client: it has no timeout, so a malicious preview URL could simply never return anything and tie up a file descriptor on the server. Please use a client with a sensible timeout, e.g. &http.Client{Timeout: 30 * time.Second}.
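
A rough sketch of what that could look like. newPreviewClient, fetchStatus and stalledFetch are hypothetical names, and the stalling handler is only a local stand-in for a hostile URL:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// newPreviewClient returns a client whose Timeout bounds the whole
// exchange: connecting, following redirects and reading the body.
func newPreviewClient(timeout time.Duration) *http.Client {
	return &http.Client{Timeout: timeout}
}

// fetchStatus performs a GET with the given client and returns the
// HTTP status code.
func fetchStatus(client *http.Client, url string) (int, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// stalledFetch fetches from a local server that does not answer in
// time and returns the error produced by the client.
func stalledFetch() error {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Hold the request open until the client gives up.
		select {
		case <-r.Context().Done():
		case <-time.After(2 * time.Second):
		}
	}))
	defer srv.Close()
	_, err := fetchStatus(newPreviewClient(100*time.Millisecond), srv.URL)
	return err
}

func main() {
	fmt.Println("error:", stalledFetch())
}
```

With http.Get the same request would block for as long as the server keeps the connection open.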

}

// parse the url
parsedUrl, err := url.Parse(urlToPreview)
Member


// Save the response body to the temporary dir
// TODO: should be a new directory like url_cache maybe and
hash, bytesWritten, tmpDir, err := fileutils.WriteTempFile(ctx, response.Body, cfg.AbsBasePath)
Member

We definitely want to set a cap on the number of bytes we're willing to write to file, otherwise a malicious URL can return a 10 GB file and consume lots of disk space. 1 MB should be plenty.
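
One possible way to enforce such a cap, sketched with io.LimitReader. This version reads into memory rather than going through fileutils.WriteTempFile, and readCapped, cappedLen, errTooLarge and maxPreviewBytes are hypothetical names:

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// maxPreviewBytes is 1 MiB, roughly the 1 MB suggested in the review.
const maxPreviewBytes = 1 << 20

var errTooLarge = errors.New("response body exceeds preview size limit")

// readCapped reads at most limit bytes from r and returns errTooLarge
// if the source holds more than that.
func readCapped(r io.Reader, limit int64) ([]byte, error) {
	// Read one byte past the limit so that an oversized body can be
	// told apart from one that is exactly limit bytes long.
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		return nil, errTooLarge
	}
	return data, nil
}

// cappedLen is a small helper for trying readCapped on a string.
func cappedLen(s string, limit int64) (int, error) {
	data, err := readCapped(strings.NewReader(s), limit)
	return len(data), err
}

func main() {
	fmt.Println(cappedLen("hello", maxPreviewBytes))
	fmt.Println(cappedLen("hello world", 10))
}
```

In the handler, the same wrapper could sit around response.Body before it is handed to whatever writes the file, so at most limit+1 bytes are ever pulled from the remote server.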

Origin: cfg.Matrix.ServerName,
FileSizeBytes: bytesWritten,
Base64Hash: hash,
ContentType: types.ContentType(response.Header["Content-Type"][0]),
Member

Please use response.Header.Get("Content-Type") instead. Otherwise this will panic if the response has no Content-Type header.
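
A small sketch of the difference. contentTypeOf is a hypothetical helper, and falling back to application/octet-stream is an assumption for illustration, not something this PR or the review specifies:

```go
package main

import (
	"fmt"
	"net/http"
)

// contentTypeOf returns the Content-Type header, or a generic
// fallback when the header is absent.
func contentTypeOf(h http.Header) string {
	// Header.Get returns "" for a missing key, whereas indexing
	// h["Content-Type"][0] would panic on the empty slice.
	if ct := h.Get("Content-Type"); ct != "" {
		return ct
	}
	return "application/octet-stream" // assumed fallback
}

func main() {
	h := http.Header{}
	fmt.Println(contentTypeOf(h))
	h.Set("Content-Type", "text/html; charset=utf-8")
	fmt.Println(contentTypeOf(h))
}
```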

return urlMetadata, nil
}

func (m *mediaInfo) generateMediaID(ctx context.Context, db storage.Database) (types.MediaID, error) {
Member

This function can race when there are multiple concurrent uploads. If two people upload a file at the same time, it's possible that both are allocated the same media ID; each then checks the database, sees that the ID is free, and they overwrite each other later on. This is unlikely, however, given the 32 bytes of entropy used here, so maybe that's okay.
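
If the race ever did need closing, one option is to make the check and the claim a single atomic step. The sketch below illustrates the idea with an in-memory store guarded by a mutex; idStore, reserve and newMediaID are hypothetical, and with a real database the equivalent would more likely be a uniqueness constraint on the ID column:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
)

// idStore is a hypothetical in-memory stand-in for the media database.
type idStore struct {
	mu   sync.Mutex
	used map[string]bool
}

// reserve claims id atomically: the check and the insert happen under
// one lock, so two callers cannot both see the ID as free.
func (s *idStore) reserve(id string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.used[id] {
		return false
	}
	s.used[id] = true
	return true
}

// newMediaID draws 32 random bytes, hex-encodes them and retries
// until the resulting ID has been reserved.
func newMediaID(s *idStore) (string, error) {
	buf := make([]byte, 32)
	for {
		if _, err := rand.Read(buf); err != nil {
			return "", err
		}
		id := hex.EncodeToString(buf)
		if s.reserve(id) {
			return id, nil
		}
	}
}

func main() {
	s := &idStore{used: map[string]bool{}}
	id, err := newMediaID(s)
	fmt.Println(len(id), err)
}
```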

if err != nil {
return nil, &util.JSONResponse{
Code: http.StatusInternalServerError,
JSON: jsonerror.Unknown("Couldn't get the path to the stored file " + err.Error()),
Member

Incorrect error message.


// Iterating the *html.Node, looking for og data
var f func(*html.Node)
f = func(n *html.Node) {
Member

Why not just do f := func(n *html.Node) to avoid the declaration on :371?

}
}
for c := n.FirstChild; c != nil; c = c.NextSibling {
f(c)
Member

We probably want to limit how much of the HTML tree we're willing to explore, to avoid pathological cases which consume lots of CPU.
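
A sketch of one way to bound the walk, with both a depth limit and a node budget. To stay self-contained it uses a stand-in node type that mirrors the FirstChild/NextSibling links of html.Node; countWithin, chain and the limits are hypothetical:

```go
package main

import "fmt"

// node is a stand-in mirroring the FirstChild/NextSibling linkage of
// html.Node from golang.org/x/net/html.
type node struct {
	Data                    string
	FirstChild, NextSibling *node
}

// countWithin walks the tree depth-first, skipping anything deeper
// than maxDepth and stopping once maxNodes nodes have been visited.
// It returns the number of nodes visited.
func countWithin(root *node, maxDepth, maxNodes int) int {
	visited := 0
	// Declared before assignment so the closure can call itself.
	var walk func(n *node, depth int)
	walk = func(n *node, depth int) {
		if n == nil || depth > maxDepth || visited >= maxNodes {
			return
		}
		visited++
		for c := n.FirstChild; c != nil && visited < maxNodes; c = c.NextSibling {
			walk(c, depth+1)
		}
	}
	walk(root, 0)
	return visited
}

// chain builds a degenerate tree: a root with depth nested children.
func chain(depth int) *node {
	root := &node{Data: "root"}
	cur := root
	for i := 0; i < depth; i++ {
		cur.FirstChild = &node{Data: "child"}
		cur = cur.FirstChild
	}
	return root
}

func main() {
	fmt.Println(countWithin(chain(1000), 50, 10000))
}
```

For Open Graph data specifically, the interesting meta tags normally live in the document head, so fairly small limits would probably be enough.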

@@ -0,0 +1,400 @@
// Copyright 2017 Vector Creations Ltd
Member

Copyrights are wrong.

@kegsay kegsay removed their assignment Jul 12, 2021
@kegsay kegsay added the stale label (This issue or PR is at risk of being closed without further feedback) Jul 12, 2021
@DelusionalOptimist
Author

Hey @kegsay! Thanks for taking a look at this.
I agree this is some terrible work lol.
I learnt a lot while working on this so thanks for that too but unfortunately I've also got other commitments now and I'm unable to carry this on.
Closing this : )

brianathere added a commit to HereNotThere/dendrite that referenced this pull request Mar 29, 2023
The biggest thing standing between us and efficient docker cached builds
is the `RELEASE_VERSION` arg. Since this argument is always changing,
and it's one of the first things used in the Dockerfile, our docker
build cache is always invalidated. This PR addresses that problem via:

1) Remove the -ldflags argument to the go build
2) Move the RELEASE_VERSION arg to the bottom
3) Make dendrite use the environment variable, instead of the build-time
arg
4) Make dockerfile pass the build-time version ARG as an environment
variable into the image

---------

Co-authored-by: Brian Meek <brian@hntlabs.com>