Natural Language Processing Suggestions #227

Open
bencooper222 opened this issue Jul 13, 2016 · 1 comment

Comments

@bencooper222

While it is certainly possible for Chicago to use OpenNLP to build out its own full natural language processing system, I'm not sure that's wise. With the advent of chatbots, NLP is about to make some huge strides, and it looks like progress will mostly be concentrated among Google, Microsoft, Amazon, IBM Watson, and Apple. At the moment, Microsoft Cognitive Services and IBM Watson are the most mature, and it seems wisest to use those so you can benefit from the progress they will undoubtedly make. I'm not saying Chicago couldn't build its own NLP, but it almost certainly couldn't improve it at the same rate as the big cloud providers.

@tomschenkjr
Contributor

Adding some further thoughts on this issue and how it can be tackled:

The advanced query consists of some basic parameters:

  • Range of dates
  • Selecting data sources (and criteria for each source)
  • Selecting location parameters

Natural Language Processing (e.g., OpenNLP) can identify these principal components of the query.
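To make the three components concrete, here is a minimal sketch of a container that an NLP step could fill in. The class name `ParsedQuery`, its fields, and the rendering of date ranges are all assumptions for illustration; none of them come from the application's code.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class ParsedQuery:
    """Hypothetical holder for the components extracted from a text query."""
    datasets: List[str] = field(default_factory=list)      # selected data sources
    criteria: Dict[str, str] = field(default_factory=dict)  # per-source criteria
    start_date: Optional[date] = None                       # range of dates
    end_date: Optional[date] = None
    community_area: Optional[str] = None                    # location parameter

    def describe(self) -> str:
        """Render the query in the informal notation used in this thread."""
        parts = []
        if self.datasets:
            names = " OR ".join(f"Dataset == {n}" for n in self.datasets)
            # Parenthesize only when several datasets are candidates.
            parts.append(f"({names})" if len(self.datasets) > 1 else names)
        for column, value in self.criteria.items():
            parts.append(f"{column} == {value}")
        if self.community_area:
            parts.append(f"Community Area == {self.community_area}")
        # The date-range notation below is an assumption, not from the thread.
        if self.start_date and self.end_date:
            parts.append(f"Date >= {self.start_date.isoformat()}")
            parts.append(f"Date <= {self.end_date.isoformat()}")
        return " AND ".join(parts)


query = ParsedQuery(datasets=["911p"], community_area="Rogers Park")
print(query.describe())
```

Keeping the parsed components separate from the final query string would let the NLP step be swapped out (OpenNLP or a cloud service) without touching the query builder.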

Example syntax and the resulting queries.

  • 911 calls in Rogers Park

Resulting query: Dataset == 911p AND Community Area == Rogers Park

  • Burglaries in Rogers Park

Though similar to the previous example, this can be more complex. "Burglaries" could correspond to burglaries filed in the Crimes dataset or to 911 calls received about burglaries. We should over-identify

In the absence of specific dates, the application could rely on our current protocol of displaying a fixed number (e.g., 6,000) of the most recent data points.

Resulting query: (Dataset == Crimes OR Dataset == 911p) WHERE Primary Description == Burglaries AND Community Area == Rogers Park
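The over-identification idea and the fallback for missing dates could be sketched roughly as follows. The keyword table, function names, and plain substring matching are assumptions for illustration; a real NLP component would do the matching.

```python
from typing import List, Optional

# Hypothetical keyword table: each term lists every dataset it might refer to.
KEYWORD_TO_DATASETS = {
    "911 calls": ["911p"],
    "burglaries": ["Crimes", "911p"],
    "tweets": ["Twitter"],
    "buses": ["CTA"],
}

# Fixed cap on most-recent data points, taken from the example figure above.
DEFAULT_LIMIT = 6000


def candidate_datasets(text: str) -> List[str]:
    """Return every dataset whose keyword appears in the text (over-identify)."""
    lowered = text.lower()
    found: List[str] = []
    for keyword, datasets in KEYWORD_TO_DATASETS.items():
        if keyword in lowered:
            for name in datasets:
                if name not in found:  # keep order, drop duplicates
                    found.append(name)
    return found


def result_limit(has_dates: bool) -> Optional[int]:
    """Apply the most-recent cap only when the query names no dates."""
    return None if has_dates else DEFAULT_LIMIT


print(candidate_datasets("Burglaries in Rogers Park"))
print(result_limit(has_dates=False))
```

Returning all candidate datasets leaves the choice to the user or the UI instead of guessing which source was meant.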

  • Tweets around me

Resulting query: Dataset == Twitter AND geoWithin: {center, ([current location])}

  • Tweets about Chicago Bulls on May 20, 2015

Resulting query: Dataset == Twitter WHERE Twitter.text == "Chicago Bulls" AND Date == "2015-05-20"
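As a rough illustration of the date step only, the sketch below pulls a "Month D, YYYY" date out of the text and normalizes it to the ISO form used in the resulting query. The function name and the single supported date format are assumptions; parsing month names with `strptime` also assumes an English locale.

```python
import re
from datetime import datetime
from typing import Optional

# Matches dates written like "May 20, 2015" (capitalized word, day, year).
DATE_PATTERN = re.compile(r"\b([A-Z][a-z]+ \d{1,2}, \d{4})\b")


def extract_date(text: str) -> Optional[str]:
    """Return the first 'Month D, YYYY' date in text as YYYY-MM-DD, or None."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    try:
        parsed = datetime.strptime(match.group(1), "%B %d, %Y")
    except ValueError:
        # The capitalized word was not a month name, or the day was invalid.
        return None
    return parsed.date().isoformat()


print(extract_date("Tweets about Chicago Bulls on May 20, 2015"))
```

A production version would need relative dates ("yesterday", "last week") and ranges, which is where a trained NLP model is likely to help more than a regular expression.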

  • Buses around City Hall

Resulting query: Dataset == CTA AND geoWithin: {center, (41.8657, -87.7611)}
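The "around" part of these location queries needs a radius, which the examples do not specify. Below is a minimal sketch of a radius check using the haversine formula. The 500 m default and the function names are assumptions, and the center point reads the latitude in the example above as 41.8657.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(center: Tuple[float, float],
                  point: Tuple[float, float],
                  radius_m: float = 500.0) -> bool:
    """True if point lies within radius_m of center (hypothetical default radius)."""
    return haversine_m(center[0], center[1], point[0], point[1]) <= radius_m


center = (41.8657, -87.7611)  # center point from the example query
print(within_radius(center, (41.8667, -87.7611)))  # roughly 111 m north
```

If the data store already supports geospatial queries, the filter would normally be pushed down to it; this check only illustrates what the query means.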

Developing and testing

The NLP feature can be tested against the developer API, using the corresponding API docs as a reference.
