
[WIP] Refactor documentation API Reference for gensim.summarization #1709

Merged: 29 commits, Dec 12, 2017
1c6009c
Added docstrings in textcleaner.py
yurkai Nov 12, 2017
851b02c
Merge branch 'develop' into fix-1668
menshikh-iv Nov 12, 2017
5cbb184
Added docstrings to bm25.py
yurkai Nov 13, 2017
31be095
syntactic_unit.py docstrings and typo
yurkai Nov 14, 2017
c6c608b
added doctrings for graph modules
yurkai Nov 16, 2017
d5247c1
keywords draft
yurkai Nov 17, 2017
3031cd0
keywords draft updated
yurkai Nov 20, 2017
4d7b0a9
keywords draft updated again
yurkai Nov 21, 2017
2c8ef28
keywords edited
yurkai Nov 22, 2017
254dce7
pagerank started
yurkai Nov 23, 2017
a2c2102
pagerank summarizer docstring added
yurkai Nov 25, 2017
1a87934
fixed types in docstrings in commons, bm25, graph and keywords
yurkai Nov 27, 2017
0ca8332
fixed types, examples and types in docstrings
yurkai Nov 28, 2017
ed188ae
Merge branch 'develop' into fix-1668
menshikh-iv Dec 11, 2017
20b19d6
fix pep8
menshikh-iv Dec 11, 2017
6ec29bf
fix doc build
menshikh-iv Dec 11, 2017
e2a2e60
fix bm25
menshikh-iv Dec 11, 2017
d7056e4
fix graph
menshikh-iv Dec 11, 2017
400966c
fix graph[2]
menshikh-iv Dec 11, 2017
44f617c
fix commons
menshikh-iv Dec 11, 2017
d2fed6c
fix keywords
menshikh-iv Dec 11, 2017
84b0f3a
fix keywords[2]
menshikh-iv Dec 11, 2017
ba8b1b6
fix mz_entropy
menshikh-iv Dec 11, 2017
2a283d7
fix pagerank_weighted
menshikh-iv Dec 12, 2017
6bd1584
fix graph rst
menshikh-iv Dec 12, 2017
7ec89fa
fix summarizer
menshikh-iv Dec 12, 2017
fa5efce
fix syntactic_unit
menshikh-iv Dec 12, 2017
0014d88
fix textcleaner
menshikh-iv Dec 12, 2017
1a0166a
fix
menshikh-iv Dec 12, 2017
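Several of the later commits touch pagerank_weighted, the ranking core shared by the summarizer and the keyword extractor. For orientation, here is a minimal pure-Python sketch of weighted PageRank by power iteration; this is illustrative only (gensim's own `pagerank_weighted` relies on scipy's sparse linear algebra, and the names below are assumptions, not gensim's internals):

```python
def pagerank_weighted(graph, damping=0.85, iters=50):
    """Weighted PageRank by power iteration over an adjacency dict.

    Illustrative sketch only; gensim solves the same problem with
    scipy routines on a sparse probability matrix.
    """
    nodes = list(graph)
    # Total outgoing edge weight per node, used to normalize rank flow.
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new_rank = {}
        for n in nodes:
            # Rank flowing into n from each neighbor m, proportional
            # to the weight of the edge m -> n.
            incoming = sum(rank[m] * graph[m].get(n, 0.0) / out_weight[m]
                           for m in nodes)
            new_rank[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank

# Tiny weighted graph: 'a' receives all rank flowing out of 'b' and 'c'.
g = {
    'a': {'b': 1.0, 'c': 2.0},
    'b': {'a': 1.0},
    'c': {'a': 2.0},
}
scores = pagerank_weighted(g)
print(max(scores, key=scores.get))  # 'a' scores highest
```

Because every node here has outgoing edges, the scores stay a probability distribution (they sum to 1), which is a quick sanity check for this kind of implementation.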
4 changes: 2 additions & 2 deletions gensim/summarization/commons.py
@@ -3,14 +3,14 @@
#
# Licensed under the GNU LGPL v2.1 - http://www.gnu.org/licenses/lgpl.html

"""This module provides functions of creatinf graph from sequence of values and
"""This module provides functions of creating graph from sequence of values and
removing of unreachable nodes.


Examples
--------

-Create simple graph and add edges. Let's kake a look at nodes.
+Create simple graph and add edges. Let's take a look at nodes.

>>> gg = build_graph(['Felidae', 'Lion', 'Tiger', 'Wolf'])
>>> gg.add_edge(("Felidae", "Lion"))
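The commons.py example above relies on `build_graph` and the module's unreachable-node removal. A self-contained pure-Python sketch of that behavior (the class and function below are simplified stand-ins, not gensim's actual implementation):

```python
class Graph:
    """Minimal undirected weighted graph, mirroring the API the
    commons.py example uses. Illustrative sketch only."""

    def __init__(self, node_list):
        self.edges = {n: {} for n in node_list}

    def nodes(self):
        return list(self.edges)

    def add_edge(self, edge, weight=1):
        u, v = edge
        self.edges[u][v] = weight
        self.edges[v][u] = weight


def remove_unreachable_nodes(graph):
    # Drop nodes whose total incident edge weight is zero,
    # which is what commons.py's cleanup step amounts to.
    for node in graph.nodes():
        if sum(graph.edges[node].values()) == 0:
            del graph.edges[node]


gg = Graph(['Felidae', 'Lion', 'Tiger', 'Wolf'])
gg.add_edge(('Felidae', 'Lion'))
remove_unreachable_nodes(gg)
# 'Tiger' and 'Wolf' have no edges, so only the connected pair survives.
print(sorted(gg.nodes()))  # ['Felidae', 'Lion']
```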
44 changes: 44 additions & 0 deletions gensim/summarization/keywords.py
@@ -199,6 +199,36 @@ def _format_results(_keywords, combined_keywords, split, scores):

def keywords(text, ratio=0.2, words=None, split=False, scores=False, pos_filter=('NN', 'JJ'),
lemmatize=False, deacc=True):
""".

Parameters
----------
text : str
Sequence of values.
ratio : float
Reviewer comment: this is optional too

If no "words" option is selected, the number of sentences is
reduced by the provided ratio, else, the ratio is ignored.
words : list
.
Reviewer comment: Need to add descriptions to parameters

split : bool
.
scores : bool
.
pos_filter : tuple
Part of speech filters.
lemmatize : bool
Lemmatize words, optional.
deacc : bool
Remove accentuation, optional.

Returns
-------
Graph
Created graph.

"""


# Gets a dict of word -> lemma
text = to_unicode(text)
tokens = _clean_text_by_word(text, deacc=deacc)
@@ -233,6 +263,20 @@ def keywords(text, ratio=0.2, words=None, split=False, scores=False, pos_filter=


def get_graph(text):
"""Creates and returns graph with given text. Cleans, tokenizes text
before creating a graph.

Parameters
----------
text : str
Sequence of values.

Returns
-------
Graph
Created graph.

"""
tokens = _clean_text_by_word(text)
split_text = list(_tokenize_by_word(text))

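Taken together, `get_graph` and `keywords` outline the TextRank pipeline: clean and tokenize the text, build a co-occurrence graph over the tokens, rank the nodes, and keep the top fraction given by `ratio`. A rough self-contained sketch of that pipeline (heavily simplified: the tokenizer below ignores part-of-speech filtering and stopwords, and degree centrality stands in for the PageRank scoring gensim actually uses):

```python
from itertools import combinations


def get_graph(text, window=2):
    # Crude tokenizer; gensim's _clean_text_by_word also strips
    # stopwords and filters by part of speech.
    tokens = text.lower().split()
    graph = {t: set() for t in tokens}
    # Connect words that co-occur within a sliding window.
    for i in range(len(tokens) - window + 1):
        for u, v in combinations(tokens[i:i + window], 2):
            if u != v:
                graph[u].add(v)
                graph[v].add(u)
    return graph


def keywords(text, ratio=0.2):
    graph = get_graph(text)
    # Degree centrality as a stand-in for PageRank scores.
    scores = {word: len(nbrs) for word, nbrs in graph.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:max(1, int(len(ranked) * ratio))]


text = "graph ranking ranks words by graph centrality in the word graph"
print(keywords(text, ratio=0.2))  # ['graph']
```

The word "graph" appears in the most co-occurrence windows, so it ends up with the highest degree and is the single keyword kept at ratio 0.2.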