The list children api doesn't return all children #740

Closed
gilbertchen opened this issue Nov 20, 2017 · 7 comments

Comments

@gilbertchen

gilbertchen commented Nov 20, 2017

A Duplicacy user reported a bug that appears to be caused by the list children API failing to return all files under a directory. Specifically, Duplicacy tried to list all the chunk files under the chunks directory, and the sequence of API calls looked normal:

GET https://api.onedrive.com/v1.0/drive/root:/backup.pictures/chunks:/children?top=1000&select=name,size,folder
GET https://api.onedrive.com/v1.0/drives('me')/items('root%252Fbackup.pictures%252Fchunks')/children?$top=1000&$select=name,size,folder&$skiptoken=MTAwMQ
GET https://api.onedrive.com/v1.0/drives('me')/items('root%252Fbackup.pictures%252Fchunks')/children?$top=1000&$select=name,size,folder&$skiptoken=MjAwMQ
GET https://api.onedrive.com/v1.0/drives('me')/items('root%252Fbackup.pictures%252Fchunks')/children?$top=1000&$select=name,size,folder&$skiptoken=MzAwMQ
...
GET https://api.onedrive.com/v1.0/drives('me')/items('root%252Fbackup.pictures%252Fchunks')/children?$top=1000&$select=name,size,folder&$skiptoken=MzQwMDE
GET https://api.onedrive.com/v1.0/drives('me')/items('root%252Fbackup.pictures%252Fchunks')/children?$top=1000&$select=name,size,folder&$skiptoken=MzUwMDE
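As a side note on the requests above, the `$skiptoken` values look like unpadded base64 encodings of item offsets. This is only an observation about the tokens in this log, not documented API behavior, and a client should still treat the token as opaque. A minimal Python sketch (the helper name `decode_skiptoken` is made up for illustration):

```python
import base64


def decode_skiptoken(token: str) -> str:
    """Decode a skiptoken from the log, assuming it is unpadded base64 text.

    Assumption: the token format is an implementation detail of the service
    and may change; this is for inspecting logs only.
    """
    # Restore the '=' padding that was stripped from the URL.
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded).decode("ascii")


if __name__ == "__main__":
    for token in ["MTAwMQ", "MjAwMQ", "MzAwMQ", "MzQwMDE", "MzUwMDE"]:
        print(token, "->", decode_skiptoken(token))
```

Decoded this way, the tokens in the log advance by 1000 per request, which matches the page size of 1000 requested with `top`.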

However, some chunks were included in the responses more than once, while some others were never returned. Moreover, the chunk names changed significantly between pages:

hostname:~ $ cat dup-foo-7 | grep Chunk: | head -1005
[lines removed]
Chunk: baf2cbe825454f31be275edc5c42911cf162a1a4534e287d1d3da5d815330861
Chunk: baf3e384e3d5273b2cfc55a88e49caa98cafc80ab2c4767b1760609657d252de
Chunk: baf74aaf4ddbdd79902061abc02827e15a59cbc15f0f4b1be16957e1203f5454
Chunk: baf90adae45ef6ece7b474dc3a8bcbbff5d56e2aa217f380292e12c30860d753
Chunk: baf487ba6d2433679eb587ca69abf67ea308aa1083e8ff6db796f184ad48ef2f
Chunk: 7fa52d6f9a59b1227444cdf5d896f11792c62fe24d0eba9ed087d04813f6a66c
Chunk: 7fa77f452f9556777e599de78449e38c73a433be631174e6186e1f8a163ce68a
Chunk: 7fb0bbde62b8eda6488b7cb65a5f543fa86423eef8730f83167983cba2cc0f2e
Chunk: 7fb2aeb0985052b4a0840ad492d9ac757fd18fcb30f29af53aeddea4004b224e
Chunk: 7fb5e1ad1fb36d545ae1f428efb3cbf08b4f180af37bf2eec08500b33fe6e764
hostname:~ $

All chunk names are hashes of the files, so one would expect them to be returned in order, with many names between ba... and 7f....
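The symptoms described here (duplicates and names that jump out of order) can be checked mechanically on a captured listing. The following is a rough Python sketch, assuming the listing is expected in ascending name order as this post assumes; `audit_listing` is a hypothetical helper, not part of Duplicacy or the OneDrive API:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def audit_listing(names: Iterable[str]) -> Tuple[List[str], List[int]]:
    """Return (duplicated names, indexes where the order goes backwards).

    Assumption: names are expected in ascending lexicographic order.
    """
    ordered = list(names)
    counts = Counter(ordered)
    duplicates = sorted(name for name, count in counts.items() if count > 1)
    # An index i is reported when ordered[i] sorts before ordered[i - 1].
    descents = [i for i in range(1, len(ordered)) if ordered[i] < ordered[i - 1]]
    return duplicates, descents


if __name__ == "__main__":
    # Name prefixes taken from the excerpt above, shortened for readability.
    sample = ["baf2", "baf3", "baf7", "baf9", "baf4",
              "7fa5", "7fa7", "7fb0", "7fb2", "7fb5"]
    print(audit_listing(sample))
```

Run over the full set of pages, a check like this would show where the order breaks and which names repeat; finding missing names would additionally need the expected set of chunk names from the backup.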

More details can be found at https://duplicacy.com/issue?id=5715683958587392.

Is this a known issue?

@ificator
Contributor

Hi @gilbertchen, is it possible that files were being added to the chunks folder while the children were being enumerated?

@gilbertchen
Author

@ificator it is impossible. Chunk files were created during the backup command. The check command (which listed the chunks directory) ran only after the backup command had been completed.

This doesn't seem to affect everyone. At least for me the listing is still complete and in order.

@ificator
Contributor

That's definitely strange then. I wonder if the user is doing something outside of the app when this repros. Unfortunately paging of /children is not transacted, and so changes to the underlying data can manifest as missing / duplicated items across pages.

One thing you could try is to use delta on the folder instead of enumerating the children. It'll give you everything you're after (maybe more - you'll need to ignore any items returned that are not immediate children), and it'll do so in a way that can handle changes during the paging.
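A rough Python sketch of that suggestion, under stated assumptions: `fetch` is a caller-supplied function that performs the authenticated GET and returns the parsed JSON page, `start_url` is the folder's delta URL, and `folder_id` is the item ID of the chunks folder. The field names `value`, `@odata.nextLink`, `parentReference.id`, and `deleted` come from the OneDrive item and delta response format; the function name and parameters are hypothetical.

```python
from typing import Any, Callable, Dict, Optional

Page = Dict[str, Any]


def list_children_via_delta(
    start_url: str,
    folder_id: str,
    fetch: Callable[[str], Page],
) -> Dict[str, Dict[str, Any]]:
    """Enumerate immediate children of a folder by paging through delta.

    Items are keyed by ID so that an item reported more than once is
    kept only in its most recent form.
    """
    children: Dict[str, Dict[str, Any]] = {}
    url: Optional[str] = start_url
    while url:
        page = fetch(url)
        for item in page.get("value", []):
            item_id = item.get("id")
            if "deleted" in item:
                # The item was removed while paging; drop any earlier copy.
                children.pop(item_id, None)
                continue
            parent_id = item.get("parentReference", {}).get("id")
            if parent_id != folder_id:
                # Skip the folder itself and anything nested deeper.
                continue
            children[item_id] = item
        # The last page carries @odata.deltaLink instead of @odata.nextLink.
        url = page.get("@odata.nextLink")
    return children
```

Keying by ID means repeated entries across pages collapse into one, which is the property plain `/children` paging lacks when the underlying data changes.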

@grhall

grhall commented Nov 21, 2017

Hi, I am the user of gilbertchen's software. I can confirm that nothing else accessed the files or directory while the issue occurred. I've asked gilbertchen about the backup size and number of chunk files in the test where the listing was complete and in order. There are historical posts suggesting that OneDrive and/or SharePoint had a 20,000-file limit. Other information suggests these limits were removed, but could there be some legacy constraints in the use of @odata.nextLink and skiptoken? My folders have 35,000+ chunk files. It works correctly for me at 2,005 chunk files.

@gilbertchen
Author

Aha! The number of chunk files does seem to matter. In my previous test there were about 13K files under the chunks directory and everything was good. But once I doubled that to 26K, the exact same behavior appeared: missing chunks, duplicates, and out-of-order names.

@ificator let me know if you need a small test program to reproduce it.

@grhall there is a workaround in Duplicacy. Recently I changed the chunk directory structure to allow nested levels for cloud storages, so there won't be that many chunk files under one directory. If you can build from the latest source on the master branch and start a new backup you will be able to get around this issue.
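To illustrate the idea of nesting (not Duplicacy's actual layout, which this thread does not spell out), here is a small Python sketch that spreads chunk files across subdirectories named after leading characters of the chunk hash. The function name, the two-character width, and the `chunks` root are all assumptions made for the example.

```python
def nested_chunk_path(chunk_id: str, levels: int = 1, width: int = 2,
                      root: str = "chunks") -> str:
    """Map a chunk hash to a nested path, e.g. chunks/ba/f2cbe8...

    Assumption: each level uses `width` leading characters of the hash,
    and the remainder becomes the file name.
    """
    if levels < 0 or width <= 0:
        raise ValueError("levels must be >= 0 and width must be > 0")
    if len(chunk_id) <= levels * width:
        raise ValueError("chunk_id is too short for the requested nesting")
    # Take `width` characters per level from the front of the hash.
    prefixes = [chunk_id[i * width:(i + 1) * width] for i in range(levels)]
    remainder = chunk_id[levels * width:]
    return "/".join([root, *prefixes, remainder])


if __name__ == "__main__":
    print(nested_chunk_path("baf2cbe825454f31", levels=1))
    print(nested_chunk_path("baf2cbe825454f31", levels=2))
```

With one level of two hex characters there are 256 subdirectories, so 35,000 chunks would average well under 200 files per directory, far below the sizes at which the paging problem appeared in this thread.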

@ificator
Contributor

FYI we've tracked down a bug that was introduced in November that may explain this behavior. I'll update this thread when we believe the issue is fixed and hopefully you'll confirm!

@ificator ificator self-assigned this Jan 27, 2018
@ificator
Contributor

I believe this issue should be resolved now - is anyone still seeing unexpected paging?

3 participants