Functions for natural language processing #472

Merged: 6 commits, Sep 5, 2016

poke1024
Contributor

@poke1024 poke1024 commented Aug 4, 2016

Opening a PR for this since I believe this is pretty complete, apart from some final QA passes.

Depends heavily on third party libraries, but thanks to require will handle this gracefully if packages are not installed.
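
A minimal sketch of what handling missing packages gracefully can look like, assuming a plain availability check before any NLP function runs. The helper name `missing_packages` is hypothetical and is not the mechanism used in this PR:

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be imported."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


# "json" ships with Python; the second name is a deliberately bogus example.
unavailable = missing_packages(["json", "surely_not_a_real_package_xyz"])
if unavailable:
    print("# Skipping NLP functions, missing: " + ", ".join(unavailable))
```

With a check like this, a function whose dependencies are absent can be skipped or can report a message instead of failing at import time.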

This whole thing is obviously somewhat experimental, but so is most of this functionality in MMA.

Everything tries to resemble MMA functions. Noteworthy functions:

`Experimental`WordSimilarity` does not exist in MMA. Yet, it's extremely powerful for semantic analysis so I couldn't help including it, though putting it in this `Experimental` context.

TextStructure gives slightly different results than MMA (it's not the same parser after all), and it currently only supports the ConstituentString mode, since everything else relies on Associations, which we do not support (yet).

I'm not sure what a good place would be for the installation notes (regarding pattern and omw) that currently sit at the top of the file. Maybe somewhere in the docs?

def _status_message(text, evaluation):
    # currently this uses "print" as everything else interferes with the test cases.
    # FIXME find a better solution that is clean and works with web based notebooks.
    print('# ' + text)
Member

evaluation.print_out?

@sn6uv
Member

sn6uv commented Aug 23, 2016

One thing to fix: if a specific NLTK dataset is missing, this crashes. For example:

In[1]:= WordFrequencyData["the"]
# Loading "English" language data. This might take a moment.
Out[1]= 0.02934108190860026

In[2]:= WordDefinition["dog"]
# Loading "English" word data. Please wait.
Traceback (most recent call last):
  File "/home/angus/venv/lib/python3.5/site-packages/nltk/corpus/util.py", line 63, in __load
    try: root = nltk.data.find('corpora/%s' % zip_name)
  File "/home/angus/venv/lib/python3.5/site-packages/nltk/data.py", line 641, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource 'corpora/omw.zip/omw/' not found.  Please use the NLTK
  Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
    - '/home/angus/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "mathics/main.py", line 289, in <module>
    main()
  File "mathics/main.py", line 277, in main
    result = evaluation.evaluate(query, timeout=settings.TIMEOUT)
  File "/home/angus/Mathics/mathics/core/evaluation.py", line 230, in evaluate
    result = run_with_timeout(evaluate, timeout)
  File "/home/angus/Mathics/mathics/core/evaluation.py", line 63, in run_with_timeout
    return request()
  File "/home/angus/Mathics/mathics/core/evaluation.py", line 213, in evaluate
    result = query.evaluate(self)
  File "/home/angus/Mathics/mathics/core/expression.py", line 849, in evaluate
    result = rule.apply(new, evaluation, fully=False)
  File "/home/angus/Mathics/mathics/core/rules.py", line 74, in apply
    yield_match, expression, {}, evaluation, fully=fully)
  File "/home/angus/Mathics/mathics/core/pattern.py", line 206, in match
    yield_head, expression.get_head(), vars, evaluation)
  File "/home/angus/Mathics/mathics/core/pattern.py", line 135, in match
    yield_func(vars, None)
  File "/home/angus/Mathics/mathics/core/pattern.py", line 198, in yield_head
    yield_choice, expression, attributes, head_vars)
  File "/home/angus/Mathics/mathics/core/pattern.py", line 321, in get_pre_choices
    yield_func(vars)
  File "/home/angus/Mathics/mathics/core/pattern.py", line 187, in yield_choice
    wrap_oneid=expression.get_head_name() != 'System`MakeBoxes')
  File "/home/angus/Mathics/mathics/core/pattern.py", line 478, in match_leaf
    include_flattened=include_flattened)
  File "/home/angus/Mathics/mathics/core/pattern.py", line 342, in get_wrappings
    yield_func(items[0])
  File "/home/angus/Mathics/mathics/core/pattern.py", line 474, in yield_wrapping
    leaf_count=leaf_count, wrap_oneid=wrap_oneid)
  File "/home/angus/Mathics/mathics/builtin/patterns.py", line 660, in match
    self.pattern.match(yield_func, expression, new_vars, evaluation)
  File "/home/angus/Mathics/mathics/builtin/patterns.py", line 843, in match
    yield_func(vars, None)
  File "/home/angus/Mathics/mathics/core/pattern.py", line 466, in match_yield
    leaf_count=leaf_count, wrap_oneid=wrap_oneid)
  File "/home/angus/Mathics/mathics/core/pattern.py", line 478, in match_leaf
    include_flattened=include_flattened)
  File "/home/angus/Mathics/mathics/core/pattern.py", line 353, in get_wrappings
    yield_func(sequence)
  File "/home/angus/Mathics/mathics/core/pattern.py", line 474, in yield_wrapping
    leaf_count=leaf_count, wrap_oneid=wrap_oneid)
  File "/home/angus/Mathics/mathics/builtin/patterns.py", line 1201, in match
    yield_func(new_vars, None)
  File "/home/angus/Mathics/mathics/core/pattern.py", line 469, in match_yield
    yield_func(new_vars, items_rest)
  File "/home/angus/Mathics/mathics/core/pattern.py", line 458, in leaf_yield
    (rest_expression[0] + items_rest[0], next_rest[1]))
  File "/home/angus/Mathics/mathics/core/rules.py", line 40, in yield_match
    new_expression = self.do_replace(vars, options, evaluation)
  File "/home/angus/Mathics/mathics/core/rules.py", line 130, in do_replace
    evaluation=evaluation, options=options, **vars_noctx)
  File "/home/angus/Mathics/mathics/builtin/natlang.py", line 1023, in apply
    wordnet, language_code = self._load_wordnet(evaluation, self._language_name(evaluation, options))
  File "/home/angus/Mathics/mathics/builtin/natlang.py", line 927, in _load_wordnet
    if language_code not in wordnet.langs():
  File "/home/angus/venv/lib/python3.5/site-packages/nltk/corpus/reader/wordnet.py", line 1098, in langs
    fileids = self._omw_reader.fileids()
  File "/home/angus/venv/lib/python3.5/site-packages/nltk/corpus/util.py", line 99, in __getattr__
    self.__load()
  File "/home/angus/venv/lib/python3.5/site-packages/nltk/corpus/util.py", line 64, in __load
    except LookupError: raise e
  File "/home/angus/venv/lib/python3.5/site-packages/nltk/corpus/util.py", line 61, in __load
    root = nltk.data.find('corpora/%s' % self.__name)
  File "/home/angus/venv/lib/python3.5/site-packages/nltk/data.py", line 641, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource 'corpora/omw' not found.  Please use the NLTK
  Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
    - '/home/angus/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************

This can be fixed by installing all missing datasets, but I think the best solution would be to wrap everything and catch and handle LookupError.
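
A rough sketch of the wrapping idea, not the code that ended up in the PR. The names `call_with_data_check` and `missing_omw` are hypothetical, and the `report` callback stands in for whatever message mechanism Mathics uses. The only thing taken from the traceback above is that a missing corpus surfaces as a `LookupError`, and that it is raised lazily on first access (here, `wordnet.langs()`), so the guard has to wrap the call itself:

```python
def call_with_data_check(loader, resource, report):
    """Run loader(); if the data is missing, report it and return None."""
    try:
        return loader()
    except LookupError:
        # NLTK raises LookupError when a corpus is not installed
        report('Resource "%s" not found. Try nltk.download("%s").'
               % (resource, resource))
        return None


# Stand-in for a call such as wordnet.langs() when "omw" is not installed.
def missing_omw():
    raise LookupError("Resource 'corpora/omw' not found.")


messages = []
result = call_with_data_check(missing_omw, "omw", messages.append)
```

Returning `None` lets the caller leave the expression unevaluated instead of letting the exception escape to `main()`.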

@poke1024
Contributor Author

Spacy is currently down and you cannot install the language packages. Here's the deal:

https://www.spacy.io/blog/announcement

They have an issue ticket on GitHub. Hope they can solve this soon.

@sn6uv
Member

sn6uv commented Sep 2, 2016

Looks good. Are you done working on this?

@poke1024
Contributor Author

poke1024 commented Sep 4, 2016

Yes, I'm done.

@sn6uv sn6uv merged commit ffa2e7a into mathics:master Sep 5, 2016