Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] extractFilename("usr/lib/") should return "lib" (not "") and be called baseName (since works with dirs) #8341

Closed
timotheecour opened this issue Jul 17, 2018 · 11 comments · Fixed by #9116
Labels
OS (stdlib) Related to OS modules in standard library RFC Standard Library

Comments

@timotheecour
Copy link
Member

timotheecour commented Jul 17, 2018

/cc @dom96 this is relevant for #8315 (comment)

  • Most languages have a baseName function instead of extractFilename ; and I see extractFilename comes from Delphi, FreePascal, which are more "fringe" for Nim userbase (IMO) compared to the other ones.
    More importantly, extractFilename is not a good name because it works with directories, eg in "/home/bob", "bob" is not really a filename (it's a directory); that's why every other language uses basename.

  • path semantics can be ambiguous due to things like trailing / etc; it's annoying but we should find the most sensible semantics.

Currently extractFilename("usr/lib/") returns "" ; let's see how other languages handle this corner case:

these return "lib":

these return ".":

  • C++ boost::filesystem::basename or c++17 std::experimental::filesystem

these return "":

proposal: deprecate extractFilename (keeping its behavior so code won't break), and introduce baseName which would work as unix, C, D, node.js.

rationale:

  • directories can be specified either with trailing / or not (eg, in environment variables, and shouldn't affect their basename. eg: DESTDIR=/tmp/foo run_prog or DESTDIR=/tmp/foo/ run_prog should have same semantics
  • an actual directory name or file name can never be empty, but python's convention of returning empty "" breaks that assumption.
  • on posix, using C's basename convention, isHidden("./git/") EDIT isHidden(".git/") would be true which makes more sense; using python's convention, it would be false (with fix #8225 os.isHidden was buggy on posix #8315 implementation)

would a PR be accepted for that?

@Araq
Copy link
Member

Araq commented Jul 17, 2018

Meh, "basename" is a terrible name, might as well call it pathTail. And the fact that extractFilename "works on directories too", well it really doesn't, you're misusing it for pathTail. That's not a good reason for deprecating it.

@kaushalmodi
Copy link
Contributor

kaushalmodi commented Jul 17, 2018

baseName would be a familiar name as it's used by many shells too (bash, tcsh).

Another data point: https://www.gnu.org/software/emacs/manual/html_node/elisp/File-Name-Components.html

UPDATE: This means that it would be nice if extractFilename were renamed to basename, but really that's not a big deal. I'd prefer the current behavior of extractFilename to remain.

@timotheecour
Copy link
Member Author

not married to basename, but that what it's called in unix, C, D, node.js (and python, except with Nim's semantic) so ppl may be familiar with that already.
As for semantics, would you guys be ok w a function (whatever the name, pathTail or basename) that had semantics of C's basename?

@kaushalmodi
Copy link
Contributor

Continuing my previous comment, the Emacs basename would return "" too for "/foo/lib/".

@dom96
Copy link
Contributor

dom96 commented Jul 17, 2018

on posix, using C's basename convention, isHidden("./git/") would be true which makes more sense; using python's convention, it would be false (with #8315 implementation)

The Python result makes more sense to me. . means current directory, so to me that's not a hidden path.

More importantly, extractFilename is not a good name because it works with directories, eg in "/home/bob", "bob" is not really a filename (it's a directory); that's why every other language uses basename.

I disagree with this too. "bob" in this context could very well be a file thanks to how binaries are named on Unix. So IMO extractFilename is perfectly fine, and unless there is a real use case for basename's semantics I am rejecting this RFC.

@kaushalmodi
Copy link
Contributor

@timotheecour Emacs has a strong requirement (seen in many functions, API) that directory names always end in "/". In Unix systems, a directory is a "file" too! Semantically everything is a "file". So sticking to a "/"-ending is the best way to distinguish directories.

So in response to your rationale section:

directories can be specified either with trailing / or not (eg, in environment variables, and shouldn't affect their basename. eg: DESTDIR=/tmp/foo run_prog or DESTDIR=/tmp/foo/ run_prog should have same semantics

That's a bad idea. Then we cannot distinguish whether an input arg is a file or a directory. How about having something as file-name-as-directory? That will convert the env var parsed directory names to ones ending in "/".

an actual directory name or file name can never be empty,

If the extractFilename is returning "", it means that you are looking at a directory name, not a "Filename". So it makes perfect sense to me. It can be used to check of the input argument is a directory (if that code is using a strong convention of always ending directory names in "/").

@timotheecour
Copy link
Member Author

timotheecour commented Jul 17, 2018

So sticking to a "/"-ending is the best way to distinguish directories.
It can be used to check of the input argument is a directory (if that code is using a strong convention of always ending directory names in "/").

@kaushalmodi unfortunately there are tons of code that doesn't follow this convention, so we can't use that logic

@dom96 I just fixed a typo in top post, I meant isHidden(".git/") ; which IMO should be true (as in java.io.File.isHidden). Would you still argue this should be false?

Anyway looks like these 2 statements are contradicting:

And the fact that extractFilename "works on directories too", well it really doesn't, you're misusing it for pathTail

extractFilename is perfectly fine, and unless there is a real use case for basename's semantics [...]

If we're not introducing basename/pathTail, other possibilities are:

  • introducing stripTrailingSep (which just strips tailiing separator (if any) without further transformations on the input as normalizedPath does):
    doAssert stripTrailingSep("a/b/.///") == "a/b/."
  • add an optional arg to extractFilename: extractFilename(path: string, bool ignoreTrailingSep = false)

@Araq
Copy link
Member

Araq commented Jul 17, 2018

As for semantics, would you guys be ok w a function (whatever the name, pathTail or basename) that had semantics of C's basename?

import os

let (head, tail) = splitPath("/foo/bar")
echo head, " ", tail

@timotheecour
Copy link
Member Author

timotheecour commented Jul 17, 2018

the main issue is trailing /:

import os
let (head, tail) = splitPath("/foo/bar/")
echo head, ":", tail,":"
/foo/bar::

basename in unix, C, D, node.js would return bar, not "" as tail

@kaushalmodi
Copy link
Contributor

At least to me it seems to be a special case that can be dealt with:

import os, strformat

proc myBaseName(path: string): string =
  var (head, tail) = splitPath(path)
  if tail == "":
    (head, tail) = splitPath(head)
  return tail

let paths = ["/foo/bar/",
             "/foo/bar"]
for p in paths:
  echo fmt"path = `{p}', basename = `{p.myBaseName}'"

Outputs:

path = `/foo/bar/', basename = `bar'
path = `/foo/bar', basename = `bar'

@Araq
Copy link
Member

Araq commented Jul 18, 2018

Well we can fix/change splitPath.

timotheecour added a commit to timotheecour/Nim that referenced this issue Sep 29, 2018
timotheecour added a commit to timotheecour/Nim that referenced this issue Sep 29, 2018
@Araq Araq closed this as completed in #9116 Oct 9, 2018
Araq pushed a commit that referenced this issue Oct 9, 2018
krux02 pushed a commit to krux02/Nim that referenced this issue Oct 15, 2018
@timotheecour timotheecour added the OS (stdlib) Related to OS modules in standard library label Oct 30, 2018
narimiran pushed a commit to narimiran/Nim that referenced this issue Oct 31, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OS (stdlib) Related to OS modules in standard library RFC Standard Library
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants