Error importing groupstree version 3 generated by Better BibTeX for Zotero #2477
Comments
Hi @retorquere, thanks for your report and thanks for supporting an integration with JabRef. Recently, there have been considerable changes in the way information about groups is serialized. This is probably why importing your generated BibTeX no longer works. Unfortunately, the current serialization is not documented either. I am far from being an expert on groups, and hopefully our group expert @tobiasdiez can find the time to look at this. The most notable change is that the group information is now serialized together with each entry, in a field, and the metadata only contain the structure of the groups tree. Here is an example of one entry with a group:
Maybe we should take this issue as an opportunity to document JabRef's group syntax.
Oh wow, that's a very significant change. This would mean groups get some kind of unique ID then. It'd be good to know what restrictions apply to group IDs, as I would have to generate them, and the Zotero collection names I'd have to derive them from will almost certainly contain illegal characters. And the names are not necessarily unique, even under a single parent. Is this format already out in the wild, or is there still an opportunity to weigh in on it?
Unless #1495 is still at play? Would that mean that an entry belongs to any group that happens to have that exact name?
As far as I am aware, special characters in group names should not be a problem. However, somebody else should confirm this. The format is indeed out in the wild. Nevertheless, if you have a suggestion for improving it, please go ahead and let us know. The format is not set in stone, and even if we do not follow a suggestion, discussing the format here helps to clarify it. Regarding #1495: duplicate group names are still causing us massive headaches and, unfortunately, there is no solution in sight.
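To make the #1495 concern concrete, here is a minimal Python sketch, assuming (as asked above) that an entry's membership is resolved only by matching group names against the tree. The `Group` class and the `find_by_name` helper are hypothetical illustrations, not JabRef code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    name: str
    children: list[Group] = field(default_factory=list)


def find_by_name(node: Group, name: str, path: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return the path of every group whose name equals `name` (depth-first).

    Assumption: membership is resolved purely by name, so every match is a
    group the entry would be considered part of.
    """
    here = path + (node.name,)
    matches = [here] if node.name == name else []
    for child in node.children:
        matches.extend(find_by_name(child, name, here))
    return matches


# Two Zotero collections under different parents that share the name "Drafts".
tree = Group("All Entries", [
    Group("Project A", [Group("Drafts")]),
    Group("Project B", [Group("Drafts")]),
])

print(find_by_name(tree, "Drafts"))
# [('All Entries', 'Project A', 'Drafts'), ('All Entries', 'Project B', 'Drafts')]
```

An entry that only records the name "Drafts" cannot say which of the two groups it meant; under this assumption it ends up in both.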
My go-to solution for such things would have been JSON, either inside the comment or by having the braces already present on the comment block be the outer braces of a JSON construct. I know this might run into unbalanced braces if any of the values or keys contain braces, but there are several ways to tackle that:
The "magic" here would be to always format the first line that way, and to have the closing brace be the first line after it that starts with a non-space character. Since bib(la)tex ignores anything that's not a valid reference, it should be safe to put there, and by formatting it that way it would be both valid JSON (super easy to parse) and easy to pick out while scanning line by line.
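A minimal Python sketch of the line-by-line scan proposed above. The thread does not pin down the exact opening line, so the `@comment{` marker and the `extract_json_comment` name are assumptions; everything between the marker and the first unindented line is wrapped in braces and handed to `json.loads`.

```python
from __future__ import annotations

import json
from typing import Any

# Hypothetical marker: the block opens with this exact line, and the comment's
# own braces double as the outer braces of a JSON object.
OPEN_LINE = "@comment{"


def extract_json_comment(bib_text: str) -> dict[str, Any] | None:
    """Return the JSON object stored in the marked @comment block, or None.

    Inside the block every non-empty line is indented; the first line that
    starts with a non-whitespace character must be the closing brace.
    """
    lines = bib_text.splitlines()
    for start, line in enumerate(lines):
        if line.rstrip() != OPEN_LINE:
            continue
        body: list[str] = []
        for inner in lines[start + 1:]:
            if inner == "" or inner[0].isspace():
                body.append(inner)  # still inside the block
            elif inner.rstrip() == "}":
                # Re-use the comment's braces as the JSON object's braces.
                return json.loads("{" + "\n".join(body) + "}")
            else:
                raise ValueError(f"unexpected unindented line in block: {inner!r}")
        raise ValueError("block opened but never closed")
    return None


sample = """@article{smith2016,
  title = {A {braced} title}
}

@comment{
  "groups": [
    {"name": "Left { brace", "keys": ["smith2016"]}
  ]
}
"""

print(extract_json_comment(sample))
# {'groups': [{'name': 'Left { brace', 'keys': ['smith2016']}]}
```

Because the end of the block is found by indentation rather than by counting braces, the unbalanced `{` inside the group name does not derail the scan.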
One of the aims of the syntax change was to let users edit group membership by hand in the BibTeX code. This would not be possible if the identifier were a magic hash, since nobody can remember such a string (this is also one of the reasons why we have problems with duplicate group names). In any case, JabRef should still understand the old groups tree format and convert it to the new one automatically. Thus, it should be possible to open your sample bib file without any problem. For now I have no idea where the issue lies, and sadly I also do not have the time right now to debug it further. Sorry. Special characters in the group name shouldn't be a problem (in principle; I'm never sure about anything with JabRef's code 😸).
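The old-to-new conversion described above amounts to moving each group's key list into the entries. A minimal Python sketch, assuming the groupstree has already been parsed into hypothetical `(level, name, keys)` tuples; `move_membership_to_entries` is an illustration, not JabRef's actual converter.

```python
from __future__ import annotations

from collections import defaultdict


def move_membership_to_entries(
    old_tree: list[tuple[int, str, list[str]]],
) -> tuple[list[tuple[int, str]], dict[str, list[str]]]:
    """Split an old-style groups tree into (key-free tree, per-entry group names)."""
    structure: list[tuple[int, str]] = []
    membership: dict[str, list[str]] = defaultdict(list)
    for level, name, keys in old_tree:
        structure.append((level, name))   # the tree keeps only the hierarchy
        for key in keys:
            membership[key].append(name)  # the entry now carries its own groups
    return structure, dict(membership)


old_tree = [
    (1, "Reading", ["smith2016", "doe2015"]),
    (2, "Done", ["doe2015"]),
]
structure, membership = move_membership_to_entries(old_tree)
print(structure)   # [(1, 'Reading'), (2, 'Done')]
print(membership)  # {'smith2016': ['Reading'], 'doe2015': ['Reading', 'Done']}
```

Since only names survive in the entries, two groups with the same name would become indistinguishable at this step, which is where duplicate names (#1495) start to bite.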
But the JSON would then serve the dual purpose of being easier to parse and easier for the user to edit, even if we just kept the old groups tree idea with the keys in the groups.
I've only had reports that JabRef can't open that sample file without discarding the groups (although it does warn that it will). On my MacBook, I can't open the file at all: JabRef says it's importing the file but doesn't seem to do anything, even if I just let it sit for 10 minutes.
JSON is theoretically easy to parse, but the problem is that we are not starting from scratch. JabRef has a rather complex parser, and supporting a new style of format would essentially mean rewriting this parser. That doesn't mean that you don't have a point, and we should keep the suggestion in mind in case we go for that option one day. Regarding your bib file: when opening it, the error console shows an exception in the groups parser. This might be fixable:
I don't know the JabRef parser, of course, but I figured it would need to read the whole … How do I get to the error console, and what does this error tell me? Does it give me the position where it finds the offending input? I think it expects a number at the failure position, but from my reverse engineering of the groups format (and my limited use of it), there should be only two places where a number is expected:
but the sample file has those two on all of the lines. Is the line wrapping required by JabRef, BTW, or is it only a convenience feature? And should I move to the new groups format or wait until #1495 is resolved?
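For reference, a hedged Python sketch of where those two numbers would sit, assuming an ExplicitGroup line whose outer escaping has already been removed and which has the layout `<level> <Type>:<name>;<context>;<keys...>` (the two numbers being the nesting level and the hierarchical context). `parse_tree_line` and `split_unescaped` are hypothetical helpers; unlike a bare number-format exception, they report the line and the field that failed.

```python
from __future__ import annotations


def split_unescaped(text: str, sep: str = ";", esc: str = "\\") -> list[str]:
    """Split on `sep` unless it is escaped; an escaped character is kept literally."""
    parts: list[str] = []
    current: list[str] = []
    chars = iter(text)
    for ch in chars:
        if ch == esc:
            current.append(next(chars, ""))  # take the following character verbatim
        elif ch == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_tree_line(line_no: int, line: str) -> tuple[int, str, str, int]:
    """Parse one ExplicitGroup line such as '1 ExplicitGroup:Name;0;'."""
    level_text, _, group = line.partition(" ")
    if not level_text.isdecimal():
        raise ValueError(f"line {line_no}: level {level_text!r} is not a number")
    kind, _, rest = group.partition(":")
    fields = split_unescaped(rest)
    context_text = fields[1] if len(fields) > 1 else ""
    if not context_text.isdecimal():
        raise ValueError(
            f"line {line_no}: context {context_text!r} after name {fields[0]!r} is not a number"
        )
    return int(level_text), kind, fields[0], int(context_text)


print(parse_tree_line(1, r"1 ExplicitGroup:A\;B;0;key1;"))
# (1, 'ExplicitGroup', 'A;B', 0)

try:
    # Illustrative input: a name ending in an unescaped backslash swallows the next separator.
    parse_tree_line(2, "2 ExplicitGroup:C:\\;0;")
except ValueError as err:
    print(err)  # line 2: context '' after name 'C:;0' is not a number
```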
The JabRef parser was implemented 15 years ago and has evolved since then :) It is based on a PushbackReader: essentially a queue that reads the file character by character, pushing characters that have already been read back to the head of the queue if needed. You get to the error console via the menu: Help -> Show error console. To me, the error looks more like a bug in JabRef. Even if the group code is faulty, the GroupsParser should not fail with a NumberFormatException. It does not tell you the exact position that triggers the error, but it might help us debug and find that position. That is why I posted the stack trace. Regarding line wrapping in the groups format, I would again need the advice of @tobiasdiez. My hope is that we just find the error in the current parsing that triggers this exception, and after that your generated groups work again.
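The pushback idea can be sketched in a few lines. This is a hypothetical Python imitation of the `read()`/`unread()` pair on Java's `java.io.PushbackReader`, not JabRef's parser: `read_number` consumes digits and pushes back the first non-digit so the next step of the parser sees it.

```python
from __future__ import annotations


class PushbackReader:
    """Character-wise reader with an 'unread' buffer, in the spirit of java.io.PushbackReader."""

    def __init__(self, text: str) -> None:
        self._text = text
        self._pos = 0
        self._pushed: list[str] = []

    def read(self) -> str:
        """Return the next character, or '' at end of input."""
        if self._pushed:
            return self._pushed.pop()
        if self._pos >= len(self._text):
            return ""
        ch = self._text[self._pos]
        self._pos += 1
        return ch

    def unread(self, ch: str) -> None:
        """Put a character back so that the next read() returns it again."""
        if ch:
            self._pushed.append(ch)


def read_number(reader: PushbackReader) -> int:
    """Read consecutive digits, pushing back the first non-digit for the next consumer."""
    digits = []
    ch = reader.read()
    while ch.isdecimal():
        digits.append(ch)
        ch = reader.read()
    reader.unread(ch)
    if not digits:
        raise ValueError(f"expected a number, found {ch!r}")
    return int("".join(digits))


reader = PushbackReader("12 ExplicitGroup")
print(read_number(reader))   # 12
print(repr(reader.read()))   # ' '
```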
OK, so in the error console I now see this same error message, but the status remains "Importing in unknown format". I assume this does not actually mean the import is still running. I'll wait for now before doing anything in BBT, except perhaps the groups format. Should I be generating the new format? At the very least I should be parsing it.
You can go ahead with implementing the new format. We will keep support for reading the old format for quite some time (and will investigate this error, though I cannot guarantee a time frame), but we will definitely not roll back to the old format. Newer versions of JabRef write only the new format, so supporting it in your parser makes sense. You can also go ahead with generating it, if you do not mind its drawbacks. Despite those, the current format has been stable for quite a while and we will keep it stable for quite a while longer. My hope is that we can resolve the duplicates bug inside the application code (without touching the format).
So how are multiple groups separated in the current implementation? Comma, semicolon?
Extending the example above:
Ergo: semicolon. The safest way to get information about the group format is probably to install JabRef and see what it serializes ;-)
That's what I did before, and now it fails to parse in JabRef 😉. I meant to ask how to separate the groups in the reference, so that's a comma. Parsing should already work (it was a relatively minor change); writing is en route.
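So the separators live at two layers: `;` inside the metadata tree and `,` inside an entry's own group field. A minimal sketch of the entry side, assuming a plain comma-separated list with surrounding whitespace ignored; the thread does not say how a comma inside a group name would be escaped, so this sketch simply rejects one. Both function names are hypothetical.

```python
from __future__ import annotations


def parse_groups_field(value: str) -> list[str]:
    """Split an entry's group field on commas, trimming spaces and skipping empty items."""
    return [name.strip() for name in value.split(",") if name.strip()]


def format_groups_field(names: list[str]) -> str:
    """Inverse of parse_groups_field for names that contain no comma."""
    for name in names:
        if "," in name:
            # No escaping rule for commas is established here, so refuse rather than guess.
            raise ValueError(f"group name {name!r} contains a comma")
    return ", ".join(names)


print(parse_groups_field("Reading, Done"))        # ['Reading', 'Done']
print(format_groups_field(["Reading", "Done"]))   # Reading, Done
```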
I think I found why your example bib file does not import correctly. A PR is coming, hopefully this evening.
Alright, the next release of BBT will have format 4 (I suppose) parsing and writing, and will still parse format 3. I can see why you chose to go this way: the changes were minimal.
How should the …
Perhaps a name list would be better, as its behavior is a little better defined.
So the problem was that some group names in the example bib file contained unescaped backslashes. For example, … With #2488, a proper error message should be shown if the "damaged" database is imported.
I think I may know: it looks like the JabRef groups are just lists of strings, which are encoded into a single string by escaping backslashes and semicolons and then joining the items with semicolons. The hierarchy is then treated the same way, which leads to the double escaping. I'll make the change sometime today; it should be easy.
OK, I think I have this fixed now.
Yeah, I have tests confirming the fix. My implementation of the groups format was pretty whack; that should now be fixed. What is the meaning of the empty field at the end of an ExplicitGroup, BTW?
If I understand the code correctly, the last field contained the list of referenced entries. This is now empty since newer versions store this information directly in the entry (as …
As far as I can tell, the empty last cell was always there, also in groupstree format 3.
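The two-layer escaping described above can be sketched as follows. This is a minimal Python illustration assuming the rule from this thread (escape `\` and `;` with a backslash, join with `;`, then apply the same rule again to the whole tree line); the function names are hypothetical. In this sketch the group's fields are joined with an empty final item, so every encoded group ends in a separator, which is one reading consistent with the empty last cell being present in format 3 as well.

```python
from __future__ import annotations

ESC, SEP = "\\", ";"


def quote(text: str) -> str:
    """Prefix every backslash and every semicolon with a backslash."""
    return "".join(ESC + ch if ch in (ESC, SEP) else ch for ch in text)


def unquote_split(text: str) -> list[str]:
    """Split on unescaped ';' and drop the escape characters (inverse of quote + join)."""
    parts: list[str] = []
    current: list[str] = []
    chars = iter(text)
    for ch in chars:
        if ch == ESC:
            current.append(next(chars, ""))  # escaped character is taken literally
        elif ch == SEP:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def encode_explicit_group(level: int, name: str, context: int, keys: list[str]) -> str:
    """Layer 1 escapes the group's own fields; layer 2 escapes the whole tree line again."""
    inner = SEP.join(quote(f) for f in [name, str(context), *keys, ""])  # "" gives the trailing ';'
    return quote(f"{level} ExplicitGroup:{inner}") + SEP


def decode_explicit_group(line: str) -> tuple[int, str, int, list[str]]:
    """Undo both layers; empty trailing cells are dropped from the key list."""
    outer = unquote_split(line)[0]                     # remove the tree-level escaping
    level, _, group = outer.partition(" ")
    fields = unquote_split(group.partition(":")[2])    # remove the group-level escaping
    name, context, *keys = fields
    return int(level), name, int(context), [k for k in keys if k]


for name in ["Name", "a\\b", "a;b"]:
    print(encode_explicit_group(1, name, 0, []))
# 1 ExplicitGroup:Name\;0\;;
# 1 ExplicitGroup:a\\\\b\;0\;;
# 1 ExplicitGroup:a\\\;b\;0\;;
```

A single backslash in a name ends up as four in the file, so a writer that escapes only once produces exactly the kind of unescaped backslash that broke the sample file.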
I'm the author of Zotero Better BibTeX; my extension generates BibTeX including a JabRef groupstree that is intended to import cleanly into JabRef. Recently I've had reports that this import fails. This means my implementation of the groupstree is faulty, but as I cannot find any documentation on the format, I don't know in what way it is faulty.
I have a sample at https://drive.google.com/open?id=0BxFpK0V-elKWSVE1ejdLVkdiNXc. If someone could have a look at what I did wrong, it'd be hugely appreciated. The BBT-generated groupstree is very simple and only ever contains ExplicitGroup entries.