Python docstring counted as code #185

Closed
adah1972 opened this issue Feb 4, 2018 · 4 comments

adah1972 commented Feb 4, 2018

For this piece of simple Python code:

def test():
    """
    Test docstring.
    """
    print('Hello World!')

sloccount generates the correct result:

Total Physical Source Lines of Code (SLOC)                = 2

tokei gives:

-------------------------------------------------------------------------------
 Language            Files        Lines         Code     Comments       Blanks
-------------------------------------------------------------------------------
 Python                  1            5            5            0            0
-------------------------------------------------------------------------------
 Total                   1            5            5            0            0
-------------------------------------------------------------------------------

For more information about Python docstring, check out:

https://www.python.org/dev/peps/pep-0257/

XAMPPRocky (Owner) commented

Thank you for this issue! Python docstrings are counted as code because syntactically they are strings, and correctly determining whether a given """hello world""" is code or a comment requires parsing the source into an Abstract Syntax Tree. This may be resolved once there is a good solution to #67, at which point there could be an option for you as the user to decide whether to treat them as code or comments. Until then I'm marking this as wontfix.
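For illustration, the AST-based check described above could look roughly like the following sketch. It uses only Python's standard `ast` module; the function name `docstring_lines` and the two sample strings are hypothetical, chosen for this example, and are not part of tokei.

```python
import ast
from typing import Set

def docstring_lines(source: str) -> Set[int]:
    """Return the 1-based line numbers occupied by docstrings in `source`."""
    lines: Set[int] = set()
    owners = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, owners) or not node.body:
            continue
        first = node.body[0]
        # A docstring is a bare expression statement holding a string constant,
        # placed first in a module, class, or function body.
        if (isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            # end_lineno is available on Python 3.8 and later.
            lines.update(range(first.lineno, first.end_lineno + 1))
    return lines

# The function from the original report (hypothetical variable name).
SAMPLE = '''def test():
    """
    Test docstring.
    """
    print('Hello World!')
'''

# A triple-quoted string used as an ordinary value, not as a docstring.
TEXT_SAMPLE = 'text = """hello\nworld\n"""\n'

print(sorted(docstring_lines(SAMPLE)))       # [2, 3, 4]
print(sorted(docstring_lines(TEXT_SAMPLE)))  # []
```

With lines 2 to 4 of the reported file classified as docstring, two lines of code remain, which matches the sloccount figure. The cost is that the whole file has to parse as valid Python for the installed interpreter version.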

olivren commented Feb 22, 2019

If docstrings are not counted as comments, then tokei is very misleading for Python files: docstrings are ubiquitous and generally make up the bulk of the comments in a file.

It is true that triple-quoted strings are used both as docstrings and as multiline string literals in code. But docstrings are very common, while multiline string literals are rare. I think the default should be to count triple-quoted strings as comments.

If you want to be more precise, parsing the AST would be overkill. A good heuristic is to treat as a docstring any triple-quoted string that starts a line (ignoring leading whitespace). That would exclude most uses of multiline string literals, like this one:

text = """hello
world
"""
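A minimal sketch of that start-of-line heuristic, assuming a simple line-based counter: the function name `count_lines` and the sample strings are hypothetical, and this is not how tokei is implemented. A string opened in the middle of a line has to be tracked as well, otherwise the closing `"""` of the literal above would itself look like the start of a docstring.

```python
def count_lines(source: str) -> dict:
    """Classify each line as code, comment, or blank using the heuristic."""
    counts = {"code": 0, "comments": 0, "blanks": 0}
    delim = None  # triple-quote delimiter of the string we are inside, if any
    kind = None   # "comments" for a docstring, "code" for a string literal
    for raw in source.splitlines():
        line = raw.strip()
        if delim is not None:
            # Still inside a multi-line string: keep its classification.
            counts[kind] += 1
            if delim in line:
                delim = None
            continue
        if not line:
            counts["blanks"] += 1
            continue
        if line.startswith("#"):
            counts["comments"] += 1
            continue
        # Heuristic: a triple-quoted string that starts the line is a docstring.
        kind = "comments" if line.startswith(('"""', "'''")) else "code"
        counts[kind] += 1
        # An odd number of delimiters means the string continues on later lines.
        for quote in ('"""', "'''"):
            if line.count(quote) % 2 == 1:
                delim = quote
                break
    return counts

# The function from the original report (hypothetical variable name).
SAMPLE = '''def test():
    """
    Test docstring.
    """
    print('Hello World!')
'''

# The multiline literal shown above.
TEXT_SAMPLE = 'text = """hello\nworld\n"""\n'

print(count_lines(SAMPLE))       # {'code': 2, 'comments': 3, 'blanks': 0}
print(count_lines(TEXT_SAMPLE))  # {'code': 3, 'comments': 0, 'blanks': 0}
```

The sketch deliberately ignores prefixed strings such as `r"""..."""`, triple quotes that appear inside `#` comments, and triple-quoted strings passed as arguments on their own line, so it would misclassify those cases.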

If you really don't want to fix this, I would advocate at least printing a warning in the output when a Python file is found.

XAMPPRocky (Owner) commented

@olivren You can now set an option in your .tokeirc configuration file to have docstrings counted as comments. I plan to keep the current default; tokei will always be somewhat inaccurate for certain languages. I don't plan to add those kinds of heuristics, as I'd prefer to keep the way tokei counts as deterministic as possible.

olivren commented Feb 22, 2019

Thanks for your answer. I missed this configuration flag, and once activated it indeed gives a correct count. I am still not convinced by the current default, but at least there is a workaround.
