Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reported directory size always 0 #4580

Closed
dokterbob opened this issue Jan 14, 2018 · 19 comments
Closed

Reported directory size always 0 #4580

dokterbob opened this issue Jan 14, 2018 · 19 comments
Labels
kind/bug A bug in existing code (including security flaws)

Comments

@dokterbob
Copy link
Contributor

go-ipfs version: 0.4.13-
Repo version: 6
System version: amd64/linux
Golang version: go1.9.2

Type: Bug

Severity: Medium

Description:

The size of directories, returned from the API server, is always 0 - instead of the sum of the directory's contents.

Example:

$ ipfs --enc=json file ls /ipfs/QmYcutogfHrrxigzVw2Vi8P1cdiuXNn9VecL9cy95wB9MP

Result:

{
  "Arguments": {
    "/ipfs/QmYcutogfHrrxigzVw2Vi8P1cdiuXNn9VecL9cy95wB9MP": "QmYcutogfHrrxigzVw2Vi8P1cdiuXNn9VecL9cy95wB9MP"
  },
  "Objects": {
    "QmYcutogfHrrxigzVw2Vi8P1cdiuXNn9VecL9cy95wB9MP": {
      "Hash": "QmYcutogfHrrxigzVw2Vi8P1cdiuXNn9VecL9cy95wB9MP",
      "Size": 0,
      "Type": "Directory",
      "Links": [
        {
          "Name": "reindex",
          "Hash": "Qmac74sj9SgTVVVHFgWrFL6wyUQ8k8sgK4LYcEPTNoz3jW",
          "Size": 18132,
          "Type": "Directory"
        }
      ]
    }
  }
}

Expected size: 18132, plus actual directory data (i.e. filenames).

@Stebalien Stebalien added the kind/bug A bug in existing code (including security flaws) label Jan 28, 2018
@schomatis
Copy link
Contributor

@Stebalien Is this a relevant issue to work on?

The help message from ipfs file ls says:

This functionality is deprecated, and will be removed in future versions. If
possible, please use 'ipfs ls' instead.

But from what I'm seeing debugging the code the reconstructed unixFSNode has its Filesize set to nil and hence it's returning 0, so this may be a problem worth investigating further.

@dokterbob
Copy link
Contributor Author

dokterbob commented Feb 27, 2018

@schomatis I'd love to, but as of now they are not functionally equivalent.

For one, ipfs ls --enc=json gives back a number as the type (which I suppose, can be overcome). For another, and this is a big point for us at ipfs-search.com, it does not tell us the type of the requested file.

Example output ipfs ls:

{
  "Objects": [
    {
      "Hash": "/ipfs/QmYcutogfHrrxigzVw2Vi8P1cdiuXNn9VecL9cy95wB9MP",
      "Links": [
        {
          "Name": "reindex",
          "Hash": "Qmac74sj9SgTVVVHFgWrFL6wyUQ8k8sgK4LYcEPTNoz3jW",
          "Size": 18132,
          "Type": 1
        }
      ]
    }
  ]
}

Related: #3081

@schomatis
Copy link
Contributor

The problem seems to be the way in which the root object size is obtained. Normally the size of the directory is obtained from its parent's link size, but the root directory doesn't have a parent with that information.

mkdir dir
echo "File contents" > dir/file.txt

# Add `dir` inside another wrapper (unnamed) directory (`-w` option)
ipfs add -r -w dir

# added QmZJe7iPsYf54aaDQroXRVuUDAJAB2nG3Vktatjnip4xyi dir/file.txt
# added QmajKHPHsR5eYTqRztZt9rqN2syktiDVLQWaLxPgitThTw dir
# added QmXHGhHmqr27yJrTm98Qi26Aozag9GBa5NuvJhCx7WF6Q8 

# List directory `dir`:
ipfs --enc=json file ls QmajKHPHsR5eYTqRztZt9rqN2syktiDVLQWaLxPgitThTw | jq

# [...]
# "Hash": "QmajKHPHsR5eYTqRztZt9rqN2syktiDVLQWaLxPgitThTw",
# "Size": 0,
# "Type": "Directory",
# [...]

# List wrapper directory that contains `dir`:
ipfs --enc=json file ls QmXHGhHmqr27yJrTm98Qi26Aozag9GBa5NuvJhCx7WF6Q8 | jq

# [...]
# "Hash": "QmXHGhHmqr27yJrTm98Qi26Aozag9GBa5NuvJhCx7WF6Q8",
# "Size": 0,
# "Type": "Directory",
# [...]
#     "Name": "dir",
#     "Hash": "QmajKHPHsR5eYTqRztZt9rqN2syktiDVLQWaLxPgitThTw",
#     "Size": 76,
# [...]

In the second listing the size of the dir is correctly obtained from its parent node (unnamed directory) link size. The root object (LsObject) size is obtained from GetFilesize, i.e., the node's pointer to Filesize, which seems to be set for files but not for directories (that's why listing a single file as the root object does display the correct size).

Later in the code the child nodes (LsLink) sizes are correctly handled analyzing whether the child is a file (looking for the size with GetFilesize) or a directory (looking for the size in the parent's link size).

@Stebalien
Copy link
Member

Stebalien commented Mar 13, 2018

Sorry for missing this. Really, it would be nice to get file types into ipfs ls but that would have severe performance implications. We're planning on adding this information to directory entries later but it'll be a while till we do that.

So, @schomatis, if you'd like to fix this, go ahead. Your diagnosis sounds correct.

edit: apparently, I need to pay more attention. We do include the type, we just return it as a number instead of as as string (for some unknown reason...).

@dokterbob
Copy link
Contributor Author

dokterbob commented Mar 13, 2018

@Stebalien Note that by removing the existing types in listings, as with ipfs file ls, there remains not a single way to figure out - through the API or command line - whether a hash is a directory, a file or some other type - other than a hardcoded test for the first 2 data bytes of the object.

I really think there should be some abstraction in the API/command line of IPFS to figure out what kind of information we're dealing with. It doesn't need to be in the ls command but could be in something like object stat or have it's own endpoint/subcommand altogether.

Lastly; as, currently, the 'file type' of an object is in the first 2 bytes of the object, where do you see an unacceptable performance impact on the ls command?

Context:

$ ipfs object get /ipfs/QmYcutogfHrrxigzVw2Vi8P1cdiuXNn9VecL9cy95wB9M
{
  "Links": [
    {
      "Name": "reindex",
      "Hash": "Qmac74sj9SgTVVVHFgWrFL6wyUQ8k8sgK4LYcEPTNoz3jW",
      "Size": 18132
    }
  ],
  "Data": "\b\u0001"
}

@Stebalien
Copy link
Member

Lastly; as, currently, the 'file type' of an object is in the first 2 bytes of the object, where do you see an unacceptable performance impact on the ls command?

The problem is that you have to actually fetch the object. Listing the names in a directory requires only fetching the directory object. Listing the types as well requires fetching both the directory and the root block of all of the files in the directory.

However, we should probably have some kind of ls -l option.

Note: we do actually have a stat you can use: ipfs files stat /ipfs/.... That will tell you if something is a directory or a file.

Also note: I agree that not having a decent ls sucks.

@schomatis
Copy link
Contributor

So, @schomatis, if you'd like to fix this, go ahead. Your diagnosis sounds correct.

I'm not sure what the fix should be. I can copy the Size() functionality from the ipfs files stat command and list the cumulative size for the directory (that right now is listing zero), but (if I understand correctly) that includes the size of the metadata encoded in the node, which is not the same as Filesize (actually ipfs files stat has the same problem as this command, it uses GetFilesize that returns zero for directories). Also I fear I'll just be duplicating code.

My main confusion here stems from the fact that I have trouble discerning when are we talking about the size of a unix file or the size of the underlying node structure that represents it.

@dokterbob Is the ipfs files stat suggestion adequate for ipfs-search.com to get the file type? If not, please create a separate issue to discuss this feature, I'm fine with adding the ls -l (with the performance implications already explained).

@dokterbob
Copy link
Contributor Author

@schomatis I would really love the ls -l as this is closest to the currently existing functionality - but ipfs files stat would be sufficient as well (the advantage being that ls -l saves an additional request in case it's a directory.

@Kubuxu
Copy link
Member

Kubuxu commented Mar 16, 2018

As a note, ipfs ls mirrors the ls unix command. The directory size in ls is size of directory inode and not cumulative size of directory tree.

@Kubuxu Kubuxu closed this as completed Mar 16, 2018
@Kubuxu Kubuxu reopened this Mar 16, 2018
@schomatis
Copy link
Contributor

@Kubuxu Thanks for the clarification, so what would be the closest to a directory inode size here? As mentioned before Filesize doesn't seem to be set for directories, one possibility is to just print the size of the underlying ProtoNode that represents the directory, WDYT?

@Kubuxu
Copy link
Member

Kubuxu commented Mar 16, 2018

There are three options: 1. nothing 2. underlying node 3. resolve file nodes and sum their CumulativeSizes. 1 and 2 seem optimal for me in case of ipfs ls command.

@schomatis
Copy link
Contributor

@Kubuxu Well, number 1 is what started this issue in the first place, so I'll go with number 2, thanks.

@dokterbob
Copy link
Contributor Author

dokterbob commented Mar 16, 2018 via email

@schomatis
Copy link
Contributor

@dokterbob

currently it displays the cumulative size for subdirectories AFAIK.

Yes. Actually, in the Node interface what Stat() calls CumulativeSize is obtained from the Size() API (at least in ProtoNode) which is rather confusing.

Wouldn't it then make sense to do the same for the directory itself? (As in, recursive consistency.)

Yes, to honor the unix API I'll have to list in both cases the underlying node's size.

In any case, the cumulative size seems more practically useful to me; I find it hard to imagine a use case for knowing the directory node size.

Agreed, please open another issue to discuss the possibility of adding another flag to this command to list cumulative sizes (or maybe the addition of a du command).

@dokterbob
Copy link
Contributor Author

dokterbob commented Mar 22, 2018 via email

@Stebalien
Copy link
Member

So my edit doesn't get lost in the noise, I read your comments about including the type as "the type is missing" not "the type is an int". Apparently, we do return the type (and do take the performance hit of fetching the root nodes). Really, I don't see why this command returns a number instead of a string.

@dokterbob
Copy link
Contributor Author

I would really really love to have the cumulative size for directories available - just like the web interface. This is what users will be expecting and it gives a lot of actual information whereas the size of the actual node is very rarely ever relevant here.

@schomatis
Copy link
Contributor

In any case, the cumulative size seems more practically useful to me; I find it hard to imagine a use case for knowing the directory node size.

Agreed, please open another issue to discuss the possibility of adding another flag to this command to list cumulative sizes (or maybe the addition of a du command).

:)

@dokterbob
Copy link
Contributor Author

Superseded by #5804, related: #3081.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug A bug in existing code (including security flaws)
Projects
None yet
Development

No branches or pull requests

4 participants