Prototype: Folder items scheduler - evaluation attribute sorted tree #5349
@mrow4a, thanks for your PR! By analyzing the history of the files in this pull request, we identified @ogoffart, @ckamm and @jturcotte to be potential reviewers.
This algorithm sorts the items within all folders live, according to their size or the total size of the items they contain. It CREATES a sorted tree ON THE FLY while ADDING items to a folder, using a QMap whose key is the predicate and whose value is the job (no sorting loop! That would be inefficient for a lot of files; it loops only over folders, whose number I don't expect to exceed a million). Quick reminder: QMap keeps its items sorted by key. If the key is the same, e.g. 0, items are sorted in insertion order.
This should also solve #1633, since as a side effect it balances uploads and downloads by placing files/folders of similar sizes close to each other. Our three separate threads should then give good pipe utilization. Please close the issue after merge.
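A minimal sketch of the idea in standard C++, assuming a hypothetical `Job` struct that carries only a name and a size, and using `std::multimap` as a stand-in for Qt's `QMap::insertMulti`/`QMultiMap`. Each job is inserted under its predicate (here, its size), and iterating the container yields jobs in ascending order with no separate sorting pass. The tie rule in the comments is the C++ standard's guarantee for `std::multimap`. Qt documents `QMap::values(key)` as returning values from the most recently inserted to the least recently inserted, so the equal-key order in the Qt version is worth verifying.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a propagator job: just a name and a size in bytes.
struct Job {
    std::string name;
    std::uint64_t size;
};

// Ordered by key. Since C++11 a new element whose key equals existing keys is
// placed after them, so jobs with equal predicates keep their insertion order.
using SortedJobs = std::multimap<std::uint64_t, Job *>;

// Insert the job at its sorted position (O(log n)); no sorting loop needed.
void appendJob(SortedJobs &subJobs, Job *job) {
    subJobs.emplace(job->size, job);
}

// Names in the order a scheduler walking the map would start the jobs.
std::vector<std::string> scheduleOrder(const SortedJobs &subJobs) {
    std::vector<std::string> order;
    for (const auto &entry : subJobs) {
        order.push_back(entry.second->name);
    }
    return order;
}
```

With sizes 300, 0, 100 and 0 appended in that order, iteration visits the two zero-size jobs first (in insertion order), then the 100-byte job, then the 300-byte one.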
src/libsync/owncloudpropagator.cpp
// TODO: The call for getJobPredicateValue should recursively traverse all kids in _containerJobs of _containerJobs,
// in order to update their predicates. Root job should call updatePredicates in owncloudPropagator as a last step before run,
// and save flag _folderPredicateUpdated = true
_subJobs.insertMulti(job->getJobPredicateValue(), job);
This should basically be moved to the moment before the root job gets started, and the function _root->updateContainersPredicates() should perform a left-deep traversal of all folders with changed files (_containerJobs), adding each of them to the _subJobs list as <getJobPredicateValue(), job>. It has to be done this way because files are still being added to containers, so we have to update the _subJobs list of every folder, starting from the deepest ones.
src/libsync/owncloudpropagator.h
// This uses recursion to perform a depth-first traversal of the tree of directories with changes
// If the given (this) directory contains _containerJobs, it will call updateJob on that child dir job, otherwise does nothing
void updateJobPredicateValues();
This is initiated (run first) for _root, after all jobs have been appended. It executes a post-order depth-first tree traversal to visit the directory children (_containerJobs) of the given _root, starting from the bottom, and visits their respective directory children (_containerJobs) in turn.
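A sketch of that post-order traversal under simplifying assumptions: a hypothetical `DirJob` stores files and resolved subfolders by name in one `std::multimap` keyed by size, and parks changed subfolders in `containerJobs` until their size is known. Each child is resolved before its parent, so a folder enters its parent's sorted map with the total size of all changes below it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical directory job: files (and, later, resolved subfolders) are kept
// by name in a map keyed by size; changed subfolders wait in containerJobs.
struct DirJob {
    std::string name;
    std::multimap<std::uint64_t, std::string> subJobs;   // key = predicate
    std::vector<std::unique_ptr<DirJob>> containerJobs;  // changed subfolders
    std::uint64_t predicate = 0;                         // size of all changes

    void appendFile(const std::string &file, std::uint64_t size) {
        subJobs.emplace(size, file);
        predicate += size;
    }
};

// Post-order depth-first traversal over containerJobs only: every child folder
// is resolved before its parent, so when it is inserted into the parent's
// sorted map its key already includes the changes in its deepest subfolders.
// Meant to be called once, on the root, after all jobs have been appended.
std::uint64_t updateJobPredicateValues(DirJob &dir) {
    for (auto &child : dir.containerJobs) {
        const std::uint64_t childSize = updateJobPredicateValues(*child);
        dir.subJobs.emplace(childSize, child->name);
        dir.predicate += childSize;
    }
    return dir.predicate;
}
```

Calling it a second time would insert every subfolder twice, which matches the intent of running it exactly once on `_root` just before propagation starts.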
See #5352 to fix a few comment typos.
src/libsync/owncloudpropagator.h
}

void append(PropagatorJob *subJob) {
    _subJobs.append(subJob);
    _subJobsPriority += subJob->getJobPredicateValue();
I see getJobPredicateValue is called here, but there is also updateJobPredicateValues. Doesn't this cause too much load? Maybe I misunderstand.
These are two different functions. getJobPredicateValue just returns the current value for the sync item (whatever applies to that item type, e.g. an ignored file, or the file size for a file being synced); directories are instead inserted into _containerJobs.
updateJobPredicateValues recursively traverses only the _containerJobs tree, i.e. only directories with changes. It is called once, for the _root job, and recursively visits all "changed" directories. _subJobs are left untouched, and I will keep it that way, because they can contain thousands of items, while the _containerJobs tree will never be deep.
Maybe add more comments.
@guruz OK, over the weekend I will write documentation for the algorithm in https://github.com/owncloud/documentation/tree/master/developer_manual
src/libsync/owncloudpropagator.h
if (subJob->_priority == JobPriority::ContainerItemsPriority) {
    _containerJobs.append(subJob);
} else {
    _subJobs.insertMulti(subJob->getJobPredicateValue(), subJob);
To make the intent clear, change QMap to QMultiMap?
Oh, I didn't know about this one. Nice.
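The difference between a container job and a sub job, as the `append()` diff above routes them, can be sketched like this. The enum and structs are hypothetical simplifications (only `ContainerItemsPriority` appears in the diff), and `std::multimap` stands in for `QMultiMap`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical priority classes; only ContainerItemsPriority is taken from the
// diff, NormalPriority is illustrative.
enum class JobPriority { NormalPriority, ContainerItemsPriority };

struct PropagatorJob {
    JobPriority priority;
    std::uint64_t predicateValue;  // e.g. the file size for a PUT/GET job
};

struct DirectoryJob {
    // Sub jobs: ready to be scheduled, kept sorted by predicate on insertion.
    std::multimap<std::uint64_t, PropagatorJob *> subJobs;
    // Container jobs: subdirectories whose predicate depends on items that
    // are still being appended, so they are keyed later in one traversal.
    std::vector<PropagatorJob *> containerJobs;

    void append(PropagatorJob *subJob) {
        if (subJob->priority == JobPriority::ContainerItemsPriority) {
            containerJobs.push_back(subJob);
        } else {
            subJobs.emplace(subJob->predicateValue, subJob);
        }
    }
};
```

Each `append()` costs one O(log n) insertion for a file and one vector push for a folder; no pass over the existing sub jobs is needed.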
for (int i = _firstUnfinishedSubJob; i < subJobsCount; ++i) {
    if (_subJobs.at(i)->_state == Finished) {
QMutableMapIterator<quint64, PropagatorJob *> subJobsIterator(_subJobs);
Please add a comment that this iterates by predicate and also iterates over the multiple values of a multi-map.
Doesn't it just iterate over its items in sorted order? I thought that was self-explanatory.
For me it was not...
@@ -396,9 +396,12 @@ void OwncloudPropagator::start(const SyncFileItemVector& items)
}
FYI, I added you to several issues here on GitHub; please mention them in the commit message if you think they are handled by this.
src/libsync/owncloudpropagator.h
// all the sub files or sub directories.
QVector<PropagatorJob *> _subJobs;
// all the sub files or sub directories. This map has to be updated with _containerJobs
// QMap is ordered by the key value, or in case of equal keys (e.g 0) by insertion order.
Please clarify in the comment what the key value actually is (it's the predicate value, i.e. the file size, right?)
src/libsync/owncloudpropagator.h
// QMap is ordered by the key value, or in case of equal keys (e.g 0) by insertion order.
QMap<quint64, PropagatorJob *> _subJobs;

QVector<PropagatorJob *> _containerJobs;
Please add a detailed comment on the distinction between a container job and a sub job.
Thanks, interesting.
Needs more comments. :)
What does smashbox say?
@guruz Whom should I contact about smashbox tests? How do you normally do that? @owncloud/qa, do you also automate smashbox tests for the client? EDIT: submitted corrections.
After acceptance, please close #4498 (comment).
@mrow4a That's why I wrote "please mention them in the commit message if you think they are handled by this". Then they are linked.
@guruz OK, I misunderstood you; I thought you meant just commenting here. I will correct it.
Oh, so I should just run it locally. I thought you had a QA server on which you test your branches. OK, I will test it locally against the tests from ownCloud CI.
My list of changes I'd like to see after the discussion today:
- Rename "Predicate" to something like "Function"
- Set the sort function of a folder vs. its siblings to be the most recent modification time of all its children
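One possible reading of the second point, sketched with a hypothetical `Node` type and Unix-second timestamps: a folder's sort value is the newest modification time anywhere below it, computed bottom-up, and siblings are ordered newest first by giving `std::multimap` a `std::greater` comparator. Newest-first is an assumption; the comment does not say which direction was intended.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical node: a file carries its mtime, a folder carries children.
struct Node {
    std::string name;
    std::int64_t mtime = 0;      // Unix time in seconds (files only)
    std::vector<Node> children;  // empty for files
};

// Bottom-up: a folder's sort value is the newest mtime anywhere below it.
std::int64_t latestMtime(const Node &node) {
    std::int64_t latest = node.mtime;
    for (const Node &child : node.children) {
        latest = std::max(latest, latestMtime(child));
    }
    return latest;
}

// Siblings ordered newest first; equal timestamps keep their input order.
std::vector<std::string> siblingOrder(const Node &folder) {
    std::multimap<std::int64_t, std::string, std::greater<std::int64_t>> sorted;
    for (const Node &child : folder.children) {
        sorted.emplace(latestMtime(child), child.name);
    }
    std::vector<std::string> order;
    for (const auto &entry : sorted) {
        order.push_back(entry.second);
    }
    return order;
}
```

Compared with the size predicate, only the aggregation changes (maximum instead of sum) and the comparison direction; the post-order shape of the computation stays the same.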
// FIXME! we should probably cache this result
// this will get the _container jobs within this parent directory and iterate over them
QMutableVectorIterator<PropagatorJob *> containerJobsIterator(_containerJobs);
@ogoffart Should I use a list here because of the remove?
@@ -48,7 +48,7 @@ class PropagateRemoteDelete : public PropagateItemJob {
    QPointer<DeleteJob> _job;
public:
    PropagateRemoteDelete (OwncloudPropagator* propagator,const SyncFileItemPtr& item)
        : PropagateItemJob(propagator, item) {}
        : PropagateItemJob(propagator, item, JobPriority::LastOutPriority) {}
Is it ok?
@@ -40,7 +40,8 @@ static const char checkSumAdlerC[] = "Adler32";
class PropagateLocalRemove : public PropagateItemJob {
    Q_OBJECT
public:
    PropagateLocalRemove (OwncloudPropagator* propagator,const SyncFileItemPtr& item) : PropagateItemJob(propagator, item) {}
    PropagateLocalRemove (OwncloudPropagator* propagator,const SyncFileItemPtr& item)
        : PropagateItemJob(propagator, item, JobPriority::FirstOutPriority) {}
Is it ok?
@mrow4a Please add unit tests; you are raising complexity without testing.
We have not yet established the best way to schedule files.
Yes, I think this is yet another feature, independent of other things, but worth considering.
AFAIK this only makes sense if #5440 is done?
I removed the milestone because this would still need more work, and I don't think it is worth it at all. But I'm leaving it open for the sake of discussion.
I still think that sorting by modification time and syncing that way might be a good idea. |
I think it might be. Let's see what the tests/benchmarks show.
Closing outdated pull request
Case Scenario 1:
Carlos and his Zombies have just come back from holiday. He opens his computer and starts syncing all his photos (4 GB). Over his connection, uploading all the photos will take around an hour. He starts working, since the next day is Monday, and puts his new files into various folders.
Case Scenario 2:
Carlos has not used the sync client for a long time and wants to do an initial sync of his files. He starts syncing many folders containing many files; the total sync time will be around 30 minutes. Among these are some folders in which he collaborates on documents. He wants to share the document he is editing during the sync, and that folder is small (it contains only documents).
EXPECTATIONS:
In a deeply nested structure, there is a folder with only small changes and a folder with a lot of new files whose sync will take a long time. The folder with small changes is expected to be synced first. If I pause and restart the sync, my small file should be synced first.
REALITY:
The ownCloud client is currently structured as "I just found a folder on the filesystem, and I will try to sync its contents to the cloud". This means that if you make a small change in some nested folder tree, and another nested tree with a lot of files comes before it, your document will wait until the sync reaches your folder.
Solution:
Effect 1:
I removed the caching of finished jobs; they are now simply removed from the stack. I also use an iterator instead of a for loop.
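A minimal sketch of that change, with `std::multimap` standing in for the Qt map and a hypothetical `JobState` enum: finished jobs are erased during the same pass that looks for the next job to start, instead of tracking the index of the first unfinished job. In Qt, `QMutableMapIterator::remove()` plays the role that `erase()` plays here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical job states, loosely modelled on the propagator's.
enum class JobState { NotYetStarted, Running, Finished };

struct Job {
    JobState state;
};

// One pass in predicate order: finished jobs are erased on the spot, and the
// first job that has not started yet is returned (nullptr if there is none).
// erase() returns the iterator to the next element, so removing while
// iterating is safe.
Job *pruneAndFindNext(std::multimap<std::uint64_t, Job *> &subJobs) {
    Job *next = nullptr;
    for (auto it = subJobs.begin(); it != subJobs.end();) {
        if (it->second->state == JobState::Finished) {
            it = subJobs.erase(it);
            continue;
        }
        if (next == nullptr && it->second->state == JobState::NotYetStarted) {
            next = it->second;
        }
        ++it;
    }
    return next;
}
```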
Effect 2:
ownCloud sync currently structures the sync in the following manner, regardless of folder size:
THE TREES BELOW ARE TREES OF CHANGES: each of these folders could also contain a lot of unchanged files. The tree is built based only on the size of the changes within each subtree.
Important note: all file operations (IGNORE, MOVE, etc.) except PUT and GET are assigned HighPriority, which means they are synced in insertion order (the same as before). PUT and GET predicates are based on their file sizes, and these jobs are assigned NormalPriority. Folders (subtrees) are assigned ContainerPriority, which means their predicate is the size of the changes within the subtree. REMOVED folders are HighPriority and are synced the old way.
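The key assignment described above could look like the following sketch. The `Priority` enum and `Item` struct are hypothetical, HighPriority operations are assumed to all share the key 0, and the leading rank in the pair is an extra assumption added here (not described in the PR) so that a zero-byte PUT/GET cannot interleave with HighPriority operations.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Priority classes named as in the description; the enum itself is hypothetical.
enum class Priority { HighPriority, NormalPriority, ContainerPriority };

struct Item {
    std::string name;
    Priority priority;
    std::uint64_t size;  // file size, or total size of changes for a folder
};

// Assumed composite key (rank, predicate): HighPriority items all share {0, 0}
// and therefore run first, in insertion order; PUT/GET files and folders share
// rank 1 and are ordered by size, so files and subtrees interleave.
using Key = std::pair<int, std::uint64_t>;

Key predicateKey(const Item &item) {
    if (item.priority == Priority::HighPriority) {
        return {0, 0};
    }
    return {1, item.size};
}

std::vector<std::string> scheduleOrder(const std::vector<Item> &items) {
    std::multimap<Key, std::string> sorted;
    for (const Item &item : items) {
        sorted.emplace(predicateKey(item), item.name);
    }
    std::vector<std::string> order;
    for (const auto &entry : sorted) {
        order.push_back(entry.second);
    }
    return order;
}
```

Without the leading rank, a zero-byte upload would share key 0 with the HighPriority items and be ordered among them purely by insertion time.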
However, this algorithm (which loops not over all items but only over the folders within a folder: complexity O(n), where n is the number of folders in that folder) builds it as follows, using the size predicate for files and the total size of the contained items for folders:
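A minimal sketch of how a tree of changes turns into a sync order, assuming a hypothetical `Change` node type and a sequential walk (the real propagator runs several jobs in parallel). At each level siblings are visited smallest-first, and a folder is keyed by the size of all changes below it. For brevity `changeSize` is recomputed at every level; the approach described above computes these totals once, bottom-up.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical tree of changes: a leaf is a changed file, a folder lists
// only the entries below it that contain changes.
struct Change {
    std::string path;
    std::uint64_t size = 0;        // bytes to transfer (files only)
    std::vector<Change> children;  // changed entries (folders only)
};

// Total size of the changes in a subtree (post-order).
std::uint64_t changeSize(const Change &node) {
    std::uint64_t total = node.size;
    for (const Change &child : node.children) {
        total += changeSize(child);
    }
    return total;
}

// Appends leaf paths in the order a sequential walk of the sorted tree would
// propagate them: at every level, smallest subtree or file first.
void syncOrder(const Change &folder, std::vector<std::string> &out) {
    std::multimap<std::uint64_t, const Change *> sorted;
    for (const Change &child : folder.children) {
        sorted.emplace(changeSize(child), &child);
    }
    for (const auto &entry : sorted) {
        const Change *node = entry.second;
        if (node->children.empty()) {
            out.push_back(node->path);
        } else {
            syncOrder(*node, out);
        }
    }
}
```

For example, a small document inside a deeply nested shared folder is emitted before the files of a much larger top-level photo folder, which is the behaviour Scenario 2 asks for.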
I have tested it on a similar structure and it works really nicely :> @ogoffart @DeepDiver1975 @guruz @felixboehm