Get latest page always from wayback machine #8311

Merged
merged 1 commit into from
Mar 23, 2021

@simonhong simonhong commented Mar 20, 2021

Deleting the timestamp value from the Wayback Machine query lets us always get the latest saved page.
fix brave/brave-browser#14843
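
The idea in the description can be sketched roughly as follows. This is an illustration only, not the code changed in this PR: the helper name `clear_timestamp_value` is hypothetical, Python is used just for brevity, and it assumes the parameter is literally named `timestamp` and that blanking its value (rather than dropping the key) is what "deleting the timestamp value" means.

```python
from urllib.parse import urlsplit, urlunsplit


def clear_timestamp_value(page_url: str) -> str:
    """Return page_url with the value of every `timestamp` query key blanked.

    Hypothetical helper: works on the raw query string so that other
    parameters are passed through byte-for-byte, without re-encoding.
    """
    parts = urlsplit(page_url)
    fixed = []
    for pair in parts.query.split("&"):
        key, _sep, _value = pair.partition("=")
        if key == "timestamp":
            # Keep the key, drop whatever value the page URL carried.
            fixed.append("timestamp=")
        else:
            fixed.append(pair)
    return urlunsplit(parts._replace(query="&".join(fixed)))


if __name__ == "__main__":
    print(clear_timestamp_value("https://example.com/page?&timestamp=20160101"))
```

Working on the raw `&`-separated pairs, instead of decoding and re-encoding the query, keeps the rest of the URL exactly as the user loaded it.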

Submitter Checklist:

  • I confirm that no security/privacy review is needed, or that I have requested one
  • There is a ticket for my issue
  • Used Github auto-closing keywords in the PR description above
  • Wrote a good PR/commit description
  • Added appropriate labels (QA/Yes or QA/No; release-notes/include or release-notes/exclude; OS/...) to the associated issue
  • Checked the PR locally: npm run test -- brave_browser_tests, npm run test -- brave_unit_tests, npm run lint, npm run gn_check, npm run tslint
  • Ran git rebase master (if needed)

Reviewer Checklist:

  • A security review is not needed, or a link to one is included in the PR description
  • New files have MPL-2.0 license header
  • Adequate test coverage exists to prevent regressions
  • Major classes, functions and non-trivial code blocks are well-commented
  • Changes in component dependencies are properly reflected in gn
  • Code follows the style guide
  • Test plan is specified in PR before merging

After-merge Checklist:

Test Plan:

npm run test brave_unit_tests -- --filter=BraveWaybackMachineUtilsTest*

  1. Load https://en.wikipedia.org/wiki/1960%E2%80%9361_UE_Lleida_season?&timestamp=20160101
  2. Press "Check for saved version"
  3. Check that https://web.archive.org/web/20210315191803/https://en.wikipedia.org/wiki/1960%E2%80%9361_UE_Lleida_season is loaded
    Without this PR, https://web.archive.org/web/20161220105407/https://en.wikipedia.org/wiki/1960%E2%80%9361_UE_Lleida_season is loaded instead, because of the timestamp value in the page URL
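
When following the test plan, the two archive URLs differ only in the 14-digit snapshot timestamp in the path. A small sketch for comparing them is below; the helper name `snapshot_time` is hypothetical, and it assumes the `https://web.archive.org/web/<YYYYMMDDhhmmss>/...` layout seen in the URLs above.

```python
import re
from datetime import datetime

# Assumed layout: https://web.archive.org/web/<14-digit timestamp>/<original URL>
_SNAPSHOT_RE = re.compile(r"^https://web\.archive\.org/web/(\d{14})/")


def snapshot_time(archive_url: str) -> datetime:
    """Extract the snapshot time from a web.archive.org URL (hypothetical helper)."""
    match = _SNAPSHOT_RE.match(archive_url)
    if match is None:
        raise ValueError(f"not a recognised snapshot URL: {archive_url}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


if __name__ == "__main__":
    page = "https://en.wikipedia.org/wiki/1960%E2%80%9361_UE_Lleida_season"
    with_fix = snapshot_time("https://web.archive.org/web/20210315191803/" + page)
    without_fix = snapshot_time("https://web.archive.org/web/20161220105407/" + page)
    # The snapshot loaded with the fix should be the more recent one.
    print(with_fix > without_fix)
```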

@simonhong simonhong self-assigned this Mar 20, 2021
simonhong commented Mar 22, 2021

@diracdeltas I checked, and neither encoding the whole URL nor encoding only the query string works:
the Wayback Machine can't find a saved version for an encoded URL.
I think invalidating the timestamp is the only way to always get the latest saved page.

For example, when the query string in https://en.wikipedia.org/wiki/1960–61_UE_Lleida_season?&timestamp=20160101 is encoded,
the query URL is https://brave-api.archive.org/wayback/available?url=https://en.wikipedia.org/wiki/1960%E2%80%9361_UE_Lleida_season?%26timestamp%3D20160101
and the response to this query is:
{"url": "https://en.wikipedia.org/wiki/1960\u201361_UE_Lleida_season?&timestamp=20160101", "archived_snapshots": {}}
If the timestamp parameter had been interpreted properly, the url field would be https://en.wikipedia.org/wiki/1960\u201361_UE_Lleida_season? instead.
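
To illustrate the difference between the two forms, here is a sketch of how a generic query-string parser splits the API URL. It is an assumption that the server parses its query the same way; the helper `api_params` and the `example.com` page URL are hypothetical, and no request is sent.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Endpoint prefix as quoted in the comment above; used only as a string here.
API_PREFIX = "https://brave-api.archive.org/wayback/available?url="


def api_params(page_url: str, encode_query: bool) -> dict:
    """Show which parameters a generic parser sees in the availability query.

    Hypothetical helper: optionally percent-encodes the page URL's query
    string before appending the page URL to the API prefix.
    """
    if encode_query:
        base, sep, query = page_url.partition("?")
        page_url = base + sep + quote(query, safe="")
    return parse_qs(urlsplit(API_PREFIX + page_url).query)


if __name__ == "__main__":
    page = "https://example.com/page?&timestamp=20160101"
    # Unencoded: `timestamp` is split off as a separate parameter of the API call.
    print(api_params(page, encode_query=False))
    # Encoded: `timestamp` stays inside the `url` value.
    print(api_params(page, encode_query=True))
```

In the unencoded form the page's `timestamp` becomes a parameter of the API call itself, which is consistent with the older snapshot being returned; in the encoded form it stays inside `url`, which matches the `url` field echoed in the response above.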

@diracdeltas diracdeltas left a comment

approach lgtm

@simonhong

Merged despite the CI failures: the dist step on Windows failed because of an installer signing failure in CI,
and there was an audit failure from post-int. Neither is related to this PR.

Development

Successfully merging this pull request may close these issues.

[Security] [hackerone] wayback machine URL encoding