Wildcard search didn't work as expected #30

Open
tleyden opened this issue May 14, 2017 · 6 comments

Comments

@tleyden

tleyden commented May 14, 2017

Disclaimer: I didn't read the documentation :-)

I searched for:

find
    {name: ~= "geo*"}
return
    .

and got results:

{
  "_id": "14153",
  "cast": [],
  "episodes": [
    {
      "airdate": "2016-03-05",
      "airtime": "07:00",
      "name": "Funky Feathers",
      "number": 1,
      "runtime": 120,
      "season": 1,
      "summary": "<p>Brainteasers, wacky animal facts, hip-hopping birds, animated adventures, and a cup of Joe with your favorite vet.</p>",
      "url": "http://www.tvmaze.com/episodes/650336/nat-geo-wild-kids-1x01-funky-feathers"
    },
    {
      "airdate": "2016-03-12",
      "airtime": "07:00",
      "name": "Jungle Jamboree",
      "number": 2,
      "runtime": 120,
      "season": 1,
      "summary": "<p>Explore bizarre creatures in Wonderfully Weird; get inspired by the wildlife rescue team on Bandit Patrol; special guests in Dr. Pol Coffee Breaks; cute and cuddly animal buddies in Unlikely Animal Friends.</p>",
      "url": "http://www.tvmaze.com/episodes/650337/nat-geo-wild-kids-1x02-jungle-jamboree"
    },

Was expecting only results with "Geo*" in the name, like "George".

@vmx
Member

vmx commented May 14, 2017

First of all, wildcard search is not supported yet. Now to the details:

What you show there are the episodes. The query matched on the title of the show, not on the episode names. If we return the show name instead, you'll get:

find
    {name: ~= "geo*"}
return
    .name

Which returns

"Nat Geo Wild Kids"
"Geo Bee"

What happened with geo* is that it got stemmed to geo and hence matches those seen above.
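To make the effect concrete, here is a minimal sketch of term-based matching. It is only an illustration, not Noise's actual analysis pipeline: the `analyze` and `matches` helpers are hypothetical, real Snowball stemming is left out, and "George Lopez" is a made-up title added for contrast. It assumes the `*` is dropped during analysis, so the query term becomes plain `geo`.

```javascript
// Toy analyzer: lowercase, then split on anything that is not a letter or digit.
// This is NOT Noise's real pipeline; Snowball stemming is omitted on purpose.
function analyze(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// A field matches when every query term appears among the field's terms.
function matches(query, field) {
  const fieldTerms = new Set(analyze(field));
  return analyze(query).every((term) => fieldTerms.has(term));
}

// "George Lopez" is a hypothetical title, included only for contrast.
const names = ["Nat Geo Wild Kids", "Geo Bee", "George Lopez"];
const hits = names.filter((name) => matches("geo*", name));
console.log(hits);
```

Under these assumptions the two titles containing the whole word "Geo" match, while "George Lopez" does not, because `george` is a different term from `geo`. That is the opposite of what a prefix wildcard would do.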

@OSHistory

First off, great project. I am thinking about using it as a backend
for my mainly text-based research, so I am also interested in this
issue.

Are wildcards or regex on the roadmap? Perhaps you could also
briefly elaborate on the following:
Which stemmer is used (and for which language)?
What is the best way to proceed when trying to glob or regex?

thx

@vmx
Member

vmx commented Apr 19, 2018

@OSHistory Wildcards are on the roadmap, but sadly there's a huge lack of time, hence I don't know when this will happen.

The stemmer currently used is just a Rust wrapper around Snowball. We don't do any language specific things yet, so you get whatever Snowball does.

Adding wildcard/regex is non-trivial. Perhaps @Damienkatz could give a brief overview on what he had in mind in regards to that.
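Until wildcard support exists, one possible stopgap is to filter returned values on the client. The sketch below is an assumption and not something Noise provides: `globToRegExp` and `filterByGlob` are hypothetical helpers, and the name list is illustrative. It only works on values that a query has already returned, so it does not replace index-level wildcard support.

```javascript
// Convert a simple glob ("*" = any run of characters, "?" = one character)
// into an anchored, case-insensitive RegExp.
function globToRegExp(glob) {
  // Escape regex metacharacters other than "*" and "?".
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped.replace(/\*/g, ".*").replace(/\?/g, ".");
  return new RegExp("^" + pattern + "$", "i");
}

// Keep only the values that match the glob as a whole string.
function filterByGlob(values, glob) {
  const re = globToRegExp(glob);
  return values.filter((value) => re.test(value));
}

// Illustrative data only; in practice these would come from a query result.
const showNames = ["Nat Geo Wild Kids", "Geo Bee", "George"];
console.log(filterByGlob(showNames, "geo*"));
```

Here the glob is anchored to the whole string, so "Geo Bee" and "George" pass while "Nat Geo Wild Kids" does not.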

@OSHistory

Thanks for the reply. I can imagine that regex support is a huge task
to implement. I think I would be happy if the Snowball stemmer supported something
other than English. And indeed, in stem.rs one can simply change the language.

It compiles fine; however, having no
Rust experience, I am a little lost on how to include it in my local npm
installation to test on my sample data, which is in German. I would be grateful
for hints on how to do it or where to start.

Perhaps an option to specify a language for the stemmer on index
creation might substantially increase flexibility for non-English use cases?
Something along the lines of:

let index = noise.open("myindex", true, { "lang": "german" });

Most use cases should operate on a single language.

@vmx
Member

vmx commented Apr 19, 2018

@OSHistory Could you please open another issue for supporting other languages as an option? This way it won't get lost that easily.

@OSHistory

@vmx Sure, I was thinking the same thing while writing...
