🐛 Add digits in searched keywords
Also preserve combining characters, i.e. characters intended to modify other characters.

Close #25: Bug with titles containing English quotation marks (" ")
lnoss committed Jan 26, 2024
1 parent a2980c5 commit 5ac0b23
Showing 1 changed file with 9 additions and 1 deletion.
10 changes: 9 additions & 1 deletion ophirofox/content_scripts/europresse_search.js
@@ -21,9 +21,17 @@ async function onLoad() {
 const search_terms = await consumeSearchTerms();
 if (!search_terms) return;
 const stopwords = new Set(['d', 'l', 'et', 'sans']);
+
+/*
+L = { Lu , Ll , Lt , Lm , Lo }
+M = { Mn , Mc , Me }
+Nd: a decimal digit
+Unicode specification: https://www.unicode.org/reports/tr44/#General_Category_Values
+Categories browser: https://www.compart.com/fr/unicode/category
+*/
 const keywords = search_terms
 .replace(/œ/g, 'oe')
-.split(/[^\p{L}]+/u)
+.split(/[^\p{L}\p{M}\p{Nd}]+/u)
 .filter(w => !stopwords.has(w))
 .join(' ');
 const keyword_field = document.getElementById("Keywords");
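
To illustrate what the new separator changes, here is a minimal sketch that runs the same pipeline with both regular expressions. The helper `extractKeywords`, the constants `OLD_SEPARATOR` and `NEW_SEPARATOR`, and the sample titles are hypothetical and not part of the commit; only the two patterns, the `œ` replacement, and the stopword list come from the diff.

```javascript
// Same stopword list as in the diff.
const stopwords = new Set(['d', 'l', 'et', 'sans']);

// The two separators from the diff: before and after this commit.
const OLD_SEPARATOR = /[^\p{L}]+/u;
const NEW_SEPARATOR = /[^\p{L}\p{M}\p{Nd}]+/u;

// Hypothetical helper wrapping the pipeline shown in the diff,
// with the separator passed in so both versions can be compared.
function extractKeywords(search_terms, separator) {
  return search_terms
    .replace(/œ/g, 'oe')
    .split(separator)
    .filter(w => !stopwords.has(w))
    .join(' ');
}

// Sample title containing a number (assumed example, not from the issue).
const title = 'Les 24 Heures du Mans et l\u2019Europe';
console.log(extractKeywords(title, OLD_SEPARATOR)); // Les Heures du Mans Europe
console.log(extractKeywords(title, NEW_SEPARATOR)); // Les 24 Heures du Mans Europe

// Sample title in decomposed form: "e" followed by U+0301 (combining acute accent).
const decomposed = 'cafe\u0301s ouverts';
console.log(extractKeywords(decomposed, OLD_SEPARATOR) === 'cafe s ouverts');       // true: the word is cut at the accent
console.log(extractKeywords(decomposed, NEW_SEPARATOR) === 'cafe\u0301s ouverts');  // true: the word stays whole
```

With the old pattern, digits and combining marks count as separators, so "24" disappears and a decomposed accent splits the word in two; with the new pattern both survive as part of the keywords.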