Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable fuzzy string searching/matching for areas #101

Open
justincy opened this issue Dec 19, 2013 · 6 comments
Open

Enable fuzzy string searching/matching for areas #101

justincy opened this issue Dec 19, 2013 · 6 comments
Labels

Comments

@justincy
Copy link

Searching by string prefixes is great but not nearly as useful as fuzzy string searching would be. Fuzzy string searching makes it possible to geocode with data that includes misspellings and variations.

Read this question on GIS.SE for more background on this request.

@chris48s
Copy link
Contributor

chris48s commented May 4, 2015

I also think this would be a really useful feature and is not that difficult to implement, so I decided to add it.
I've coded a solution to this which is currently sitting on my fork of MapIt:
chris48s@e2ca3f9

I've not submitted it as a pull request (yet) because I've used the djorm-ext-pgtrgm package ( https://pypi.python.org/pypi/djorm-ext-pgtrgm/0.3 ), which adds support for queries using the pg_trgm extension to Django's ORM. Unfortunately this package doesn't yet support Django 1.8 and I don't want to submit a pull request which would break MapIt's support for Django 1.8. That said, I've submitted a pull request to the djorm-ext-pgtrgm project ( jleivaizq/djorm-ext-pgtrgm#17 ) proposing a fix to resolve the Django 1.8 compatibility issue - so I will keep an eye on that. Hopefully the author will merge it and update the package. If/when that happens, I'll submit a pull request for this feature.

A few more notes on what I've done:
My solution to this would add a couple of extra steps/dependencies to the mapit setup process:

  1. CREATE EXTENSION pg_trgm;
    This enables the PostgreSQL trigram matching extension on your DB, allowing you to run queries using the 'similarity' function.
  2. pip install djorm-ext-pgtrgm
    This installs the djorm-ext-pgtrgm package which adds support for queries using the pg_trgm extension to Django's ORM.

I've mapped this to the endpoint /fuzzyareas as a suggestion, but you could change the endpoint if you wanted.

This would allow MapIt to tolerate spelling mistakes/partial input e.g:
/fuzzyareas/romley.html
/fuzzyareas/notingham.html
/fuzzyareas/edinbrogh.html
and receive sensible suggestions.

Results are returned sorted by relevance.

Let me know if you're interested in taking this forward.

@chris48s
Copy link
Contributor

djorm-ext-pgtrgm has been updated for Django 1.8 compatibility but pypi version is some way behind version on github.
Proposed solution: pull into repository using git submobule instead of pip install djorm-ext-pgtrgm
Will submit pull request on this basis.

@dracos
Copy link
Member

dracos commented May 22, 2015

Thanks for looking at this :) Some random notes as I take a look:
Your PR appears to have a couple of unrelated commits included :)
This looks to need the postgres-contrib package on Debian/Ubuntu.
Does an index need to be manually added to the column for performance?
MapIt also works on Python 3, which going by jleivaizq/djorm-ext-pgtrgm#6 isn't yet true for djorm-ext-pgtrgm, though if that's all that's needed, that's not much.
In SayIt, rather than a submodule we uploaded our own fork (prefixed) of a third party module to PyPI, which is also then useful for other people wishing to use a more up-to-date version, so that might be an alternative way to go there.
URL - do we want a separate endpoint, or do we want a parameter (e.g. '?fuzzy=true') to the /areas/ endpoint? Not sure if one is preferable.

This would certainly be nice to have, I think :)

@chris48s
Copy link
Contributor

Cheers – once I've spent some time working through this I'll get back to you on the points you've raised here.

@chris48s
Copy link
Contributor

This looks to need the postgres-contrib package on Debian/Ubuntu.

Apologies for overlooking this. I think I missed it because postgresql-contrib is a recommended package for PostGIS ( http://packages.ubuntu.com/trusty/postgresql-9.3-postgis-2.1 ). Ubuntu installs recommended packages by default so it will usually be installed in the setup process for MapIt anyway - so I don’t think this point requires any additional action, unless I have missed your point.

Does an index need to be manually added to the column for performance?

Full text queries can be performed in Postgres without an index (unlike MySQL for example where you must create a full text index before a full text query can be performed). Obviously using an index will be faster. You can create a suitable index using a command like:

CREATE INDEX trgm_idx ON mapit_area USING gin(name gin_trgm_ops);

GIN being the more sensible choice than GiST for this situation. I don’t think you can create an index like that through Django so I guess this would be another manual step, unfortunately.

MapIt also works on Python 3, which going by jleivaizq/djorm-ext-pgtrgm#6 isn't yet true for djorm-ext-pgtrgm, though if that's all that's needed, that's not much.

I have done a quick test with a python 3 virtualenv to see if applying the change in jleivaizq/djorm-ext-pgtrgm#6 fixes this but it appears there are other problems. This may be why that PR is not merged. I need to spend a bit more time on this than I’ve been able to so far to work out the best approach from here. Perhaps I should take back that 'not that difficult to implement' comment :)

do we want a separate endpoint, or do we want a parameter (e.g. '?fuzzy=true')

A parameter is probably more sensible. I’ll modify urls.py accordingly when I next update the pull request.

@chris48s
Copy link
Contributor

Sorry it has taken a bit of time to get back to this.
Having looked a little deeper and tested things in isolation, I realise that the errors I was seeing in my Python 3.4 virtualenv were not anything to do with the djorm-ext-pgtrgm module. They were actually down to the issue that you corrected in 72716db and pulling this commit on to my branch fixed those errors.
Additionally, jleivaizq/djorm-ext-pgtrgm#6 is no longer relevant as the only call to reduce was removed in jleivaizq/djorm-ext-pgtrgm#17
As such, the djorm-ext-pgtrgm module is compatible with Django 1.8 and Python 3 as it stands in the jleivaizq/djorm-ext-pgtrgm repository and no additional work is needed on that - which is useful.

I don't know if that influences your view on whether it is best to pull the library in as a submodule or if you would still prefer to fetch the module from pypi using pip?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants