Refactor cache architecture #628
@roshanshariff - FYI, I have a status section at the top of the PR, where I will briefly update where things stand as I make changes. I moved everything from
|
ELISP> (benchmark-run-compiled 1 (citar--ref-completion-table))
(0.597032375 1 0.1635932209999993) |
If we're just talking about #628 (comment), I guess that would solve the problem I raised there. But creating the completion candidates is still the slowest operation, and turning off the completion cache is noticeable, even with my not very large library. The completion cache is also brain-dead simple, so the cost seems low. So I'll be curious what you find in your experiments. Granted, keeping the caching would require a way to invalidate the cache from outside. But as I think about that, maybe it's trivial? |
Even if we can invalidate a cache from outside, it's still inefficient to regenerate the cache repeatedly for every buffer every time any metadata changes. A cache only goes so far, and if it's invalidated too often you'll lose the benefits of caching. Currently, generating the completion candidates is doing a lot of unnecessary work in the template expansion. I think it can be boiled down to just concatenating a few strings, which can in fact be done as-needed. The trick is to do most of the string formatting ahead of time per-bibliography, and only redo it whenever the bibliography changes. Then the only work that needs to be done to generate the completion candidates is to gather the list of bibliographies, get their associated formatted strings, concatenate them (possibly adjusting for the frame width), and add the has-note/has-file metadata. This last step (I believe) can be done very quickly. |
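The approach described above (format strings once per bibliography, then only concatenate at completion time) might look roughly like the following. This is a minimal sketch, not citar's actual code: every `my/` name is hypothetical, entries are assumed to be alists of (FIELD . VALUE), and the rule that earlier bibliographies shadow later ones for duplicate keys is an assumption made for the example.

```elisp
(require 'cl-lib)

;; All `my/' names are hypothetical; this is not citar's API.

(defun my/preformat-entries (entries)
  "Return a hash table mapping each key in ENTRIES to a display string.
ENTRIES maps citation keys to alists of (FIELD . VALUE).  This is the
expensive step, done once per bibliography whenever the file changes."
  (let ((formatted (make-hash-table :test 'equal)))
    (maphash
     (lambda (key entry)
       (puthash key
                (format "%s (%s) %s"
                        (or (cdr (assoc "author" entry)) "")
                        (or (cdr (assoc "year" entry)) "")
                        (or (cdr (assoc "title" entry)) ""))
                formatted))
     entries)
    formatted))

(defun my/assemble-candidates (formatted-tables has-note-p)
  "Build an alist of (CANDIDATE . KEY) from FORMATTED-TABLES.
FORMATTED-TABLES is a list of hash tables from `my/preformat-entries'.
HAS-NOTE-P is called with a key and decides whether to add a marker.
Assumption: for duplicate keys, the first table in the list wins."
  (let ((seen (make-hash-table :test 'equal))
        candidates)
    (dolist (table formatted-tables)
      (maphash
       (lambda (key string)
         (unless (gethash key seen)
           (puthash key t seen)
           ;; The cheap step: concatenate cached text with fresh metadata.
           (push (cons (concat string
                               (if (funcall has-note-p key) " [note]" ""))
                       key)
                 candidates)))
       table))
    (nreverse candidates)))
```

Because the has-note marker is computed at assembly time, it stays current without invalidating the preformatted strings.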
OIC. I guess this also goes back to the earlier discussion of as-needed display formatting, which completing-read per doesn't support yet, but its Edit: In the interim, I guess you're thinking, we can just stash the primary completion string in an entry field while parsing the bibliography? |
I don't think it's a matter of just The only thing you can do at display-time is to add non-searchable but visible metadata (like Marginalia does). This doesn't help Citar much because we pretty much want the whole completion string to be searchable, so we must necessarily generate all the completion strings every time we call
I was thinking of another hash table in the cache with the display strings, just to avoid messing with the entry too much. Otherwise, it gets hard to keep track of all the non-standard bibliography fields we're adding and they become a de facto part of our public API, which means we can't change implementation details without breaking external code. |
OIC; so basically moving something like the hash (minus the "has" stuff, and with the key-value structure inverted) that is now the completion cache into the bibliography cache? Makes sense in any case. I get the basic idea, and agree it likely makes sense. The end result is one cache, that is simple from an API standpoint, but with great functionality, and very snappy performance. The downside, I'd guess, is the internals of that cache may be complex. It also occurs to me it'd be best optimized, if one has a large number of entries, for having one large file, that doesn't change much, and maybe one or two that do. Say a main file, and a "new" file. Is that right? |
Yeah, exactly. The structure of the cache is actually not that complex, and even that complexity is localized in a few functions. And yes, it is optimized for that use case, but there's no disadvantage to having a large number of small bibliographies either. It's just that all those bibliographies will be stored in a central cache, but fetching things from that cache is extremely efficient so it doesn't really matter. |
I was realizing after I wrote that something like this would probably work to represent a file, and it's indeed simple.
("/home/test/bib/test.bib"
 :data #<hash-table equal 1/65 0x1b5b1e9>
 :completions #<hash-table equal 0/65 0x208978f>
 :checksum "1234"
 :buffers nil) |
Yeah, that's pretty much what I have now, but defined using a cl-defstruct:
(cl-defstruct (citar--bibliography
               (:constructor citar--make-bibliography (filename))
               (:copier nil))
  "Cached bibliography file."
  (filename
   nil
   :read-only t
   :documentation
   "True filename of a bibliography, as returned by `file-truename`.")
  (hash
   nil
   :documentation
   "Hash of the file's contents, as returned by `buffer-hash`.")
  (buffers
   nil
   :documentation
   "List of buffers that require this bibliography.")
  (entries
   (make-hash-table :test 'equal)
   :documentation
   "Hash table mapping citation keys to bibliography entries,
as returned by `parsebib-parse`.")
  (formatted
   (make-hash-table :test 'equal)
   :documentation
   "Formatted strings used to display bibliography entries.")) |
Oh, good; hadn't thought of that, but I like it a lot! I had just run into |
It looks like |
Ah yeah; that makes sense. BTW, another minor thing we can control ... As I'm reading the |
I was looking at that, and it looks like that's mainly for compatibility with Common Lisp? It's ignored in Elisp, and in fact there isn't really a good vocabulary for naming types in Emacs. |
This has some good tips on https://nullprogram.com/blog/2018/02/14/ He recommends, and I think I agree, for a constructor name like I also think
(cl-defstruct (citar--bibliography
               (:constructor citar--bibliography-create (filename))
               (:copier nil))
  "Cached bibliography file."
  (filename
   nil
   :read-only t
   :documentation
   "True filename of a bibliography, as returned by `file-truename`.")
  (hash
   nil
   :documentation
   "Hash of the file's contents, as returned by `buffer-hash`.")
  (buffers
   nil
   :documentation
   "List of buffers that require this bibliography.")
  (entries
   (make-hash-table :test 'equal)
   :documentation
   "Hash table mapping citation keys to bibliography entries,
as returned by `parsebib-parse`.")
  (completions
   (make-hash-table :test 'equal)
   :documentation
   "Formatted completion strings used to display bibliography entries.")) |
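To illustrate how a `hash` slot like the one above could drive selective re-parsing, here is a minimal sketch. It uses a hypothetical stand-in struct (`my/bib`) with only the slots the example needs, and a caller-supplied parse function; none of these names are citar's, and the real refresh logic may differ.

```elisp
(require 'cl-lib)

;; Hypothetical stand-in for the struct discussed above.
(cl-defstruct (my/bib (:constructor my/bib-create (filename))
                      (:copier nil))
  "Simplified cached bibliography file."
  (filename nil :read-only t)
  (hash nil)
  (entries (make-hash-table :test 'equal)))

(defun my/bib-refresh (bib parse-fn)
  "Re-parse BIB's file with PARSE-FN only if its contents changed.
PARSE-FN is called with no arguments in a temporary buffer holding
the file's contents and must return a hash table of entries.
Return non-nil if BIB was updated."
  (with-temp-buffer
    (insert-file-contents (my/bib-filename bib))
    (let ((new-hash (buffer-hash)))
      (unless (equal new-hash (my/bib-hash bib))
        ;; Store the entries first, so a failing parse leaves the old
        ;; hash in place and the next call retries.
        (setf (my/bib-entries bib) (funcall parse-fn)
              (my/bib-hash bib) new-hash)
        t))))
```

Calling `my/bib-refresh` twice on an unchanged file parses it only once; the second call just compares hashes.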
Sorry for this breaking change, but I wanted to get the foundations right before tagging 1.0.
This completely restructures the core of citar to borrow some code and ideas from the org-mode oc-basic package. In particular, it changes to using two primary caches:
- bibliography
- completion
Both of these now use hash tables, rather than lists. Caching functionality is also changed, and the API now focuses on citekeys as arguments for key functions.
Finally, citar--parse-bibliography should re-parse bibliography files upon change.
Fix #623
Close #627
This function returns all local and global bibliography files for 'citar--parse-bibliography' to parse.
Allows to independently turn off whether to do this by default, and whether to toggle the behavior.
FWIW, I just pushed a change here (rebased from main), but it was only adding the new citar-capf.el file I just merged to main. We'll need to adjust that to this branch. |
From my perspective, this idea seems pretty good. If I understand this correctly, this makes bibliography into a |
Thanks for taking a look!
It makes the cache a list of such objects, so we have finer-grained control of UI updating, and also, as a consequence the "has-note" indicators in the UI are always up-to-date with the bibliography files (edit: AND possible external note systems like org-roam). The higher level functions (like Make sense? |
Yeah I get it. This definitely sounds like a step in the right direction. And by not changing the higher level functions it also doesn't break anybody's workflow which is a good thing (unless they use the low level functions at which point they can figure out what to change). |
@roshanshariff any ETA on when you can open up at least a draft PR? Even if not all functions are working ATM, I can still weigh in on what they are, etc. |
Sure, I can do that in a couple of hours. Pretty much all the code is ready and tested, and the performance looks promising. Just need to do a bit of work to hook it up to the existing citar functions so you can use it. |
Closing in favor of #634 |
Convert the cache from a single list to a list of structured objects, each of which contains two hash tables, each keyed by citekey:
- :entries the raw bib data
- :completions the formatted completion strings
The completion candidates themselves are now no longer cached, but quickly assembled on-the-fly.
The bibliography objects also include the following additional properties, to selectively update bibliographic data:
- :hash the checksum of the file
- :buffers the associated buffers
Close #627, fix #623 #610
Status
Roshan's #634 is an improvement on this (the text above reflects it actually), so closing this in favor of that.
Caching and completion all work, as do at least some of the interactive commands (I haven't tested them all; some may break currently because completion now always returns a list of keys).
I've added keyword comments in places.
Questions and notes
- citar--parse-bibliography (and also org-cite-list-bibliography-files).
- citar-refresh etc, which I have done for now (though adding it back would basically be a wrapper for (clrhash citar--completion-cache)).
- has-notes #623 along with this? (I think I did.)
- ensure-entries?
- citar--get-candidates, or deprecate that function (as now)?
- citar-select-ref now always returns a list. Is that a mistake? My key concern is avoiding bugs in Embark and such.
- citar--ref-completion-table is conceptually the same as citar--get-candidates; should we rename it citar--get-ref-completions to be consistent?