[WIP] Refactor documentation API Reference for gensim.summarization #1709
Merged
29 commits
1c6009c  Added docstrings in textcleaner.py (yurkai)
851b02c  Merge branch 'develop' into fix-1668 (menshikh-iv)
5cbb184  Added docstrings to bm25.py (yurkai)
31be095  syntactic_unit.py docstrings and typo (yurkai)
c6c608b  added doctrings for graph modules (yurkai)
d5247c1  keywords draft (yurkai)
3031cd0  keywords draft updated (yurkai)
4d7b0a9  keywords draft updated again (yurkai)
2c8ef28  keywords edited (yurkai)
254dce7  pagerank started (yurkai)
a2c2102  pagerank summarizer docstring added (yurkai)
1a87934  fixed types in docstrings in commons, bm25, graph and keywords (yurkai)
0ca8332  fixed types, examples and types in docstrings (yurkai)
ed188ae  Merge branch 'develop' into fix-1668 (menshikh-iv)
20b19d6  fix pep8 (menshikh-iv)
6ec29bf  fix doc build (menshikh-iv)
e2a2e60  fix bm25 (menshikh-iv)
d7056e4  fix graph (menshikh-iv)
400966c  fix graph[2] (menshikh-iv)
44f617c  fix commons (menshikh-iv)
d2fed6c  fix keywords (menshikh-iv)
84b0f3a  fix keywords[2] (menshikh-iv)
ba8b1b6  fix mz_entropy (menshikh-iv)
2a283d7  fix pagerank_weighted (menshikh-iv)
6bd1584  fix graph rst (menshikh-iv)
7ec89fa  fix summarizer (menshikh-iv)
fa5efce  fix syntactic_unit (menshikh-iv)
0014d88  fix textcleaner (menshikh-iv)
1a0166a  fix (menshikh-iv)
```python
@@ -199,6 +199,36 @@ def _format_results(_keywords, combined_keywords, split, scores):

def keywords(text, ratio=0.2, words=None, split=False, scores=False, pos_filter=('NN', 'JJ'),
             lemmatize=False, deacc=True):
    """.

    Parameters
    ----------
    text : str
        Sequence of values.
    ratio : float
        If no "words" option is selected, the number of sentences is
        reduced by the provided ratio, else, the ratio is ignored.
    words : list
        .
    split : bool
        .
    scores : bool
        .
    pos_filter : tuple
        Part of speech filters.
    lemmatize : bool
        Lemmatize words, optional.
    deacc : bool
        Remove accentuation, optional.

    Returns
    -------
    Graph
        Created graph.

    """
    # Gets a dict of word -> lemma
    text = to_unicode(text)
    tokens = _clean_text_by_word(text, deacc=deacc)
```

Review comment (on the `words : list` parameter): Need to add descriptions to parameters.

```python
@@ -233,6 +263,20 @@ def keywords(text, ratio=0.2, words=None, split=False, scores=False, pos_filter=

def get_graph(text):
    """Creates and returns graph with given text. Cleans, tokenizes text
    before creating a graph.

    Parameters
    ----------
    text : str
        Sequence of values.

    Returns
    -------
    Graph
        Created graph.

    """
    tokens = _clean_text_by_word(text)
    split_text = list(_tokenize_by_word(text))
```
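The docstrings above describe gensim's graph-based (TextRank-style) keyword extraction: the text is cleaned and tokenized, a word co-occurrence graph is built, and nodes are ranked with PageRank. As a rough illustration of that flow only, here is a minimal self-contained sketch; it is NOT gensim's actual implementation, and the function name `keyword_sketch`, the `window` parameter, and the toy tokenizer are all invented for this example:

```python
# Minimal TextRank-style keyword sketch (illustration only, NOT gensim's code).
# Builds a word co-occurrence graph, then scores nodes with plain PageRank.

def keyword_sketch(text, ratio=0.2, window=2, damping=0.85, iterations=50):
    # Toy tokenizer standing in for _clean_text_by_word / _tokenize_by_word.
    tokens = [w.strip('.,!?').lower() for w in text.split()]
    tokens = [w for w in tokens if w]

    # Co-occurrence edges: words appearing within `window` positions of each other.
    neighbors = {w: set() for w in tokens}
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j and tokens[j] != w:
                neighbors[w].add(tokens[j])
                neighbors[tokens[j]].add(w)

    # Plain PageRank iteration over the undirected graph.
    nodes = list(neighbors)
    score = {w: 1.0 / len(nodes) for w in nodes}
    for _ in range(iterations):
        new = {}
        for w in nodes:
            rank = sum(score[v] / len(neighbors[v]) for v in neighbors[w] if neighbors[v])
            new[w] = (1 - damping) / len(nodes) + damping * rank
        score = new

    # Keep the top `ratio` fraction of words, mirroring keywords(text, ratio=...).
    top = sorted(nodes, key=score.get, reverse=True)
    return top[:max(1, int(len(nodes) * ratio))]

sample = "graph ranking ranks graph nodes by graph structure and ranking quality"
print(keyword_sketch(sample, ratio=0.3))
```

Words with the most co-occurrence neighbors accumulate the highest PageRank mass, which is why frequent, well-connected terms surface as keywords.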
Review comment: this is optional too