Classify search engine results page content #13522

Merged
merged 1 commit into from
May 31, 2022

Conversation

tmancey
Collaborator

@tmancey tmancey commented May 29, 2022

Resolves brave/brave-browser#6000

Submitter Checklist:

  • I confirm that no security/privacy review is needed, or that I have requested one
  • There is a ticket for my issue
  • Used GitHub auto-closing keywords in the PR description above
  • Wrote a good PR/commit description
  • Squashed any review feedback or "fixup" commits before merge, so that history is a record of what happened in the repo, not your PR
  • Added appropriate labels (QA/Yes or QA/No; release-notes/include or release-notes/exclude; OS/...) to the associated issue
  • Checked the PR locally: npm run test -- brave_browser_tests, npm run test -- brave_unit_tests, npm run lint, npm run gn_check, npm run tslint
  • Ran git rebase master (if needed)

Reviewer Checklist:

  • A security review is not needed, or a link to one is included in the PR description
  • New files have MPL-2.0 license header
  • Adequate test coverage exists to prevent regressions
  • Major classes, functions and non-trivial code blocks are well-commented
  • Changes in component dependencies are properly reflected in gn
  • Code follows the style guide
  • Test plan is specified in PR before merging

After-merge Checklist:

Test Plan:

Confirm that visiting a selection of the sites listed below and then performing a search classifies the page content for supported classifier languages.

Before this change, search engine results pages (SERPs) were not classified. After this change, SERPs should be classified and the following text should appear in the console logs: "Classified text with the top segment as..."

https://developer.mozilla.org/
https://duckduckgo.com/
https://en.wikipedia.org/
https://fireball.de/
https://github.com/
https://infogalactic.com/
https://ja.wikipedia.org/
https://search.brave.com/
https://search.yahoo.com/
https://stackoverflow.com/
https://swisscows.com/
https://twitter.com/explore/
https://uk.search.yahoo.com/
https://www.amazon.co.uk/
https://www.amazon.com/
https://www.baidu.com/
https://www.bing.com/
https://www.dogpile.com/
https://www.ecosia.org/
https://www.excite.com/
https://www.findx.com/
https://www.gigablast.com/
https://www.google.co.uk/
https://www.google.com/
https://www.lycos.com/
https://www.metacrawler.com/
https://www.mojeek.co.uk/
https://www.mojeek.com/
https://www.petalsearch.com/
https://www.qwant.com/
https://www.semanticscholar.org/
https://www.sogou.com/
https://www.startpage.com/
https://www.webcrawler.com/
https://www.wolframalpha.com/
https://www.youtube.com/
https://yandex.com/

Unit tests cover all of these sites, including the multilingual ones.
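
For anyone working through the manual test plan, the console check can be scripted. The following is a minimal sketch and not part of this PR: it assumes the console output has been captured as a list of text lines, and the helper names `classified_lines` and `was_classified` are hypothetical. Only the message prefix is taken from the test plan.

```python
# Hypothetical helper for the manual test plan; not part of brave-core.
# The prefix below is the console message quoted in the test plan.
EXPECTED_PREFIX = "Classified text with the top segment as"


def classified_lines(log_lines):
    """Return the log lines that contain the expected classification message."""
    return [line.rstrip("\n") for line in log_lines if EXPECTED_PREFIX in line]


def was_classified(log_lines):
    """Return True if at least one line reports a classification result."""
    return bool(classified_lines(log_lines))
```

Passing the lines captured after a search on one of the listed sites to `was_classified` should return `True` after this change and `False` before it, assuming no other page in the session was classified.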

@tmancey tmancey requested a review from a team as a code owner May 29, 2022 12:26
@tmancey tmancey self-assigned this May 29, 2022
@tmancey tmancey force-pushed the issues/6000 branch 8 times, most recently from 725ad7f to d3e6b3f Compare May 30, 2022 21:25
@tmancey tmancey requested a review from moritzhaller May 30, 2022 21:28
@tmancey tmancey force-pushed the issues/6000 branch 8 times, most recently from de000d9 to d7ce7fb Compare May 31, 2022 09:06
@tmancey tmancey merged commit 3727b5a into master May 31, 2022
@tmancey tmancey deleted the issues/6000 branch May 31, 2022 11:08
@tmancey tmancey added this to the 1.41.x - Nightly milestone May 31, 2022
@andywillis

Why are you still testing for the racist search-engine infogalactic? I thought we were done with that back in 2019?
