Move all child entries in the cache in a single query #13956
8fb35ac to 94ec169
@@ -25,6 +25,7 @@ public function fixupStatement($statement) {
 $statement = str_replace('`', '"', $statement);
 $statement = str_ireplace('NOW()', 'CURRENT_TIMESTAMP', $statement);
 $statement = str_ireplace('UNIX_TIMESTAMP()', self::UNIX_TIMESTAMP_REPLACEMENT, $statement);
+$statement = preg_replace('/MD5\(([^)]+)\)/i', 'LOWER(DBMS_OBFUSCATION_TOOLKIT.md5 (input => UTL_RAW.cast_to_raw($1)))', $statement);
Let's all take a moment to appreciate Oracle's SQL dialect
lol
\OC_DB::executeAudited($query, array($targetPath, md5($targetPath), $child['fileid']));
}
$query = \OC_DB::prepare('UPDATE `*PREFIX*filecache` SET
`path_hash` = MD5(CONCAT(?, SUBSTR(`path`, ?))),
Why don't we do all the magic in PHP?
Also, this is using the old path, not the new one. Is that intended?
Ah okay, you are not looping anymore
Please also add a test scenario with two folders:
Then rename I can't see it clearly in your code, if that works or not.
@icewind1991 awesome finding.
94ec169 to a37de95
@@ -149,7 +149,7 @@ public function get($file) {
 $where = 'WHERE `fileid` = ?';
 $params = array($file);
 }
-$sql = 'SELECT `fileid`, `storage`, `path`, `parent`, `name`, `mimetype`, `mimepart`, `size`, `mtime`,
+$sql = 'SELECT `fileid`, `storage`, `path`, `path_hash`, `parent`, `name`, `mimetype`, `mimepart`, `size`, `mtime`,
this is not used?
reschedule for 8.2
a37de95 to a508292
rebased - just because ....
Before diving into the filecache table, I highly recommend we start educating ourselves on what the existing solutions are. In general there are 4 implementations for storing hierarchical data in a relational database:
Currently oc implements an adjacency list. Unfortunately, we lose all benefits of that model because we are also storing the full path in the db. As a result we have to propagate moves down the tree. An additional requirement is that we need to propagate mtime and etag up the tree. Furthermore, we need to keep in mind how expensive it is to resolve a fileid from a path. That is something I could not find in any comparison of the above data models. It is also the reason why we store the path, which basically breaks our necks. Maybe it is enough to cache the path->id mapping in memory? The only way to find out IMO is to implement the four data models, find out which operations are expensive and especially how they perform in our use cases. It might make sense to actually change our implementation to allow faster uploads / propagation of etags. Closure tables for example would allow us to do that in one query, preserving referential integrity. That is something we currently do not have when a request times out / is interrupted.
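To make the closure table point concrete, here is a minimal sketch of propagating an mtime up the tree with one UPDATE. It uses Python's sqlite3 purely as a test bed; the `files` and `closure` tables, their columns, and the `add_file` / `touch` helpers are hypothetical and are not the actual filecache schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE files (fileid INTEGER PRIMARY KEY, name TEXT, mtime INTEGER);
-- One row per (ancestor, descendant) pair, including each node paired with itself.
CREATE TABLE closure (ancestor INTEGER, descendant INTEGER, depth INTEGER);
""")

def add_file(fileid, name, parent=None, mtime=0):
    """Insert a node and link it to itself and to every ancestor of its parent."""
    db.execute("INSERT INTO files VALUES (?, ?, ?)", (fileid, name, mtime))
    db.execute("INSERT INTO closure VALUES (?, ?, 0)", (fileid, fileid))
    if parent is not None:
        db.execute(
            "INSERT INTO closure SELECT ancestor, ?, depth + 1 "
            "FROM closure WHERE descendant = ?",
            (fileid, parent),
        )

def touch(fileid, mtime):
    """Set mtime on the file and on all of its ancestors in a single UPDATE."""
    db.execute(
        "UPDATE files SET mtime = ? WHERE fileid IN "
        "(SELECT ancestor FROM closure WHERE descendant = ?)",
        (mtime, fileid),
    )

add_file(1, "files")
add_file(2, "photos", parent=1)
add_file(3, "cat.jpg", parent=2)
add_file(4, "notes.txt", parent=1)

touch(3, 1000)
mtimes = dict(db.execute("SELECT name, mtime FROM files"))
```

The cost moves to writes: every insert or move has to maintain the closure rows, which is the complexity concern raised later in this thread.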
that's indeed very interesting - THX for sharing - this actually cries for a 20% research time allocation. @icewind1991 @PVince81 @blizzz interested?
@felixboehm had the idea to not propagate the etag of shared folders up to the root and instead treat storages as separate trees. The etag for a mount point is then dynamically calculated by concatenating and hashing the individual etags of the separate trees. This would prevent having to propagate the etag up into multiple storages (when a file has been shared with multiple users). It would reduce our problem from graphs to trees again.
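A rough sketch of what that dynamic calculation could look like. The function name `mount_point_etag`, the sorting step, the newline separator and the choice of MD5 are all assumptions for illustration; the idea above only says "concatenating and hashing".

```python
import hashlib

def mount_point_etag(tree_etags):
    """Derive one etag from the root etags of the separate storage trees.

    Sorting (an assumption) makes the result independent of the order in
    which the storages are listed; the separator (also an assumption)
    avoids collisions such as ["ab", "c"] versus ["a", "bc"].
    """
    joined = "\n".join(sorted(tree_etags))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

# Hypothetical root etags of a home storage and two incoming shares.
etag = mount_point_etag(["5f2c", "a91b", "03de"])
```

A change in any one tree changes the combined value, so nothing has to be written into the other storages.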
I've looked into it in the past and came to the conclusion that the vast majority of operations we do are based on the file path, so that's the main case we need to optimize for. While deleting/renaming a folder isn't optimal in the current approach, those operations don't happen nearly as often as [...]. Closure tables seemed like the best way to solve recursive operations like delete/rename to me, but since we can't do triggers in all our db backends afaik, maintaining the closure table adds a significant amount of complexity and potential for bugs.
There are only two options for keeping referential integrity when inserting / moving / updating trees: Adjacency List or Closure Table. While I agree that we mostly do queries to map paths back to fileids, please keep in mind that SELECTs can be cached and scaled out to multiple db servers. UPDATEs can't. But again, the proof is in the pudding. We need to actually try this and see how it scales in our workloads.
Also found this patent on how to model a hierarchical filesystem in a relational database: https://www.google.com/patents/US6427123 It recognizes the problem of path based lookups and addresses it. But it loses referential integrity in the process.
👍 for @icewind1991's approach of doing that with one statement. This kills potential concurrency issues as well. One remark: The LIKE tends to eat kittens with big data sets as it turns slow. One trick to improve that is to shrink the actual data set on which the LIKE is performed. In this case this could be done by excluding all records where the path length is shorter than the source.
We do that in the client, and I once checked that it was improving speed considerably. We, however, have a column with pathlen in the table which is indexed. If @icewind1991 has a test setup anyhow it might be worth checking.
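The query from that remark did not survive in this thread, so the following is only a guess at its shape, run against SQLite via Python. The `pathlen` column, the index name and the `children` / `like_escape` helpers are hypothetical; whether the prefilter actually helps depends on the backend and would need measuring.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE filecache (fileid INTEGER PRIMARY KEY, path TEXT, pathlen INTEGER)")
db.execute("CREATE INDEX fc_pathlen ON filecache (pathlen)")

paths = ["files", "files/a", "files/a/b.txt", "files/ab", "files/a/c/d.txt"]
db.executemany(
    "INSERT INTO filecache (path, pathlen) VALUES (?, ?)",
    [(p, len(p)) for p in paths],
)

def like_escape(s):
    """Escape LIKE wildcards so the folder name is matched literally."""
    return s.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

def children(source):
    """List everything below `source`, discarding too-short paths before the LIKE."""
    rows = db.execute(
        "SELECT path FROM filecache "
        "WHERE pathlen > ? AND path LIKE ? ESCAPE '\\' "
        "ORDER BY path",
        (len(source), like_escape(source) + "/%"),
    )
    return [row[0] for row in rows]

result = children("files/a")
```

Any child path is necessarily longer than the source path, so the length condition never drops a real match.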
@DeepDiver1975 @butonic We need to agree as to whether this is going into 8.1. This seems required ASAP as per @butonic
To be honest with you: the system is already in an unstable state - moving more changes in will help nobody. Furthermore we froze 8.1 weeks back and this change was moved out of scope of 8.1 for a reason.
Please rebase. Would be cool to push this forward 😄 Then have something similar for etag propagation if possible.
Where's the 'vote' button? I'd very much like to see this implemented 👍
a508292 to 47af8fb
A new inspection was created.
@DeepDiver1975 @cmonteroluque And again we need to move this. Bringing in such a huge change in the current state is quite bad -> 9.0
@MorrisJobke ok. Yeah, this is definitely 9.0
@icewind1991 Please rebase this. It would be super nice to have this early in the release cycle. We have all of this covered by tests, so rebasing it now and merging it would be a good way to prove that it works.
47af8fb to 1c0903f
More conflicts ... what to do here?
Missed the mark again. Move to 9.2? Or we can merge this (solve conflicts first) before the feature freeze and iron out potential issues during the hardening phase...
It's seriously a joke to move this once more. I'm closing this now.
So... "closed" means "won't fix / won't be implemented"? That would be a real shame...
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Use some SQL magic to move the calculation of the updated fields into SQL so we can do it all in a single query.
Brings the time it takes to rename a folder with ~5k files down to ~180ms (comparison).
Adds support for using MD5() in SQL to Oracle and MSSQL, and for using CONCAT() in SQLite.

cc @DeepDiver1975 @PVince81 @MorrisJobke
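For readers who want to try the idea outside of the codebase, here is a self-contained sketch of the single-query move, using Python's sqlite3. It is not the code from this PR: the table is reduced to three columns, `move_children` and `like_escape` are hypothetical helpers, MD5() is registered as a user function because SQLite does not ship one, and `||` stands in for CONCAT().

```python
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")
# SQLite has no built-in MD5(), so register one for this sketch.
db.create_function("MD5", 1, lambda s: hashlib.md5(s.encode("utf-8")).hexdigest())
db.execute("CREATE TABLE filecache (fileid INTEGER PRIMARY KEY, path TEXT, path_hash TEXT)")

for p in ["files/old", "files/old/a.txt", "files/old/sub/b.txt", "files/older.txt"]:
    db.execute("INSERT INTO filecache (path, path_hash) VALUES (?, MD5(?))", (p, p))

def like_escape(s):
    """Escape LIKE wildcards so the folder name is matched literally."""
    return s.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

def move_children(source, target):
    """Rewrite path and path_hash of every entry below `source` in one UPDATE."""
    start = len(source) + 1  # SUBSTR is 1-based: keep everything after the old prefix
    db.execute(
        # Both SET expressions read the old `path`; only its suffix is reused.
        "UPDATE filecache SET "
        "path_hash = MD5(? || SUBSTR(path, ?)), "
        "path = ? || SUBSTR(path, ?) "
        "WHERE path LIKE ? ESCAPE '\\'",
        (target, start, target, start, like_escape(source) + "/%"),
    )

move_children("files/old", "files/new")
rows = dict(db.execute("SELECT path, path_hash FROM filecache"))
```

Only the child entries are touched here; the folder row itself (`files/old`) would still need its own update.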