Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use "extra: EdTechHub.ItemAlsoKnownAs" instead of "archiveLocation" field. #27

Closed
bjohas opened this issue Feb 1, 2020 · 22 comments
Closed

Comments

@bjohas
Copy link
Collaborator

bjohas commented Feb 1, 2020

archiveLocation causes issues with APA and reports - the archiveLocation is printed.

While it feels more fragile to use the extra field, this is already done for some items that do not have archiveLocation.

The proposal is to use a line in "extra" to store the identifiers. I further suggest to prefix this line with "zoteroIdentifiers" rather than "archiveLocation", because from then on, the new line is unrelated to archiveLocation.

@retorquere
Copy link
Collaborator

Not just that: the upcoming zotero will migrate stuff out of the extra field if it thinks it has a corresponding "real" field or if it has the name of an existing CSL variable. You really want a clearly unambiguous field that makes it non-zotero. I'd say "zoteroIdentifiers" is still too generic. For BBT I've started using tex.FieldName.

@bjohas
Copy link
Collaborator Author

bjohas commented Feb 18, 2020

OK, great - that's important. How about

AlsoKnownAsId

Is that sufficiently cryptic?
Or

EdTechHub.ItemAlsoKnownAs

(as it's to do with the edtech hub plugin)?

@retorquere
Copy link
Collaborator

Would that have a key, a title, something else?

@retorquere
Copy link
Collaborator

Should I move what is currently in the archiveLocation to the Extra field?

@bjohas
Copy link
Collaborator Author

bjohas commented Feb 23, 2020

Yes - what's currently there should be moved. The condition would be: If there's no extra field "EdTechHub.ItemAlsoKnownAs" then move what's in the archiveLocation to "Extra -> EdTechHub.ItemAlsoKnownAs".

Previously

archiveLocation: 123:ABC;456;DEF

now:

archiveLocation: 
extra: EdTechHub.ItemAlsoKnownAs: 123:ABC;456;DEF

Not sure what you mean by "Would that have a key, a title, something else?" ?

@bjohas
Copy link
Collaborator Author

bjohas commented Feb 23, 2020

As I think I have provided the needed information, I'm removing the label. Feel free to re-add!

@bjohas bjohas changed the title Use "extra zoteroIdentifiers" instead of archiveLocation Use "extra: EdTechHub.ItemAlsoKnownAs" instead of "archiveLocation" field. Feb 23, 2020
@retorquere
Copy link
Collaborator

WRT the migration: what do you want to be the trigger for this? Scanning all the items on start to determine whether any of them need to be migrated is going to be a PITA for the user when the library grows. There is also the problem that if people start using archiveLocation for its intended purpose (whatever that may be) the migration code I would write would always move it to extra.

@bjohas
Copy link
Collaborator Author

bjohas commented Feb 24, 2020

Let's just trigger it manually. I.e., it should run when the user runs the plugin. The plugin has very limited distribution (and even for us, only covers a few libraries). Importantly, I'll get everyone to update the code, so that people don't keep putting ids into archiveLocation. In practice, few people do that.

So I would even say we leave this mirgration in the code temporarily. Let's leave it there for say a month, and then remove it.

I think there should be some light pattern matching on archiveLocation before the content is moved. I would suggest that we look whether there's a least one item of the form

1234567:ABCDEF
\d{7}\:[\w\d]{7}

in it.

@retorquere
Copy link
Collaborator

The user doesn't really run the plugin. The plugin is installed by the user, but is then ran every time Zotero is started.

@bjohas bjohas closed this as completed Feb 24, 2020
@bjohas
Copy link
Collaborator Author

bjohas commented Feb 24, 2020

The "add id to archiveLocation" function only runs manually. Items aren't scanned on Zotero start.

@bjohas bjohas reopened this Feb 24, 2020
@bjohas
Copy link
Collaborator Author

bjohas commented Feb 24, 2020

Sorry for accidentally closing!

@retorquere
Copy link
Collaborator

So far we've just stored the DOIs in archiveLocation, right? I could test the Archive Location field and corresponding likes in extra to see if they're DOIs (which follow a detectable format with reasonable accuracy), and if so, migrate that to EdTechHub.ItemAlsoKnownAs.

Scanning for a DOI in extra or archivelocation can be done reasonably fast because I can construct a single SQL query that tests for likely presence (probably just testing for "10.") after which I can do a more thorough test on the found items. I use a similar approach for my own plugin. But that all hangs on there being a simple test for likely presence. If the existing use of archiveLocation has things other than DOIs, that may not be tenable.

@bjohas
Copy link
Collaborator Author

bjohas commented Apr 12, 2020

So far we've just stored the DOIs in archiveLocation, right? I could test the Archive Location field and corresponding likes in extra to see if they're DOIs (which follow a detectable format with reasonable accuracy), and if so, migrate that to EdTechHub.ItemAlsoKnownAs.

Some item types do not have archiveLocation, so for those we've already used an extra field. I am hoping that this means that we have code that can write to EdTechHub.ItemAlsoKnownAs already. I.e., rather than making that the exception, we make it the rule.

That leaves migration. As we have identifiers in archiveLocation and EXTRA.archiveLocation , those both need to be scanned. The entries should mainly look like:

(\d+)\:([\d\w]+)

bar DOIs, and they could be chained as

(\d+)\:([\d\w]+)\;(\d+)\:([\d\w]+)\;...(\d+)\:([\d\w]+)\;?

I think you could see whether the archiveLocation or extra.archiveLocation contains one combination of (\d+)\:([\d\w]+) (perhaps of the right length for group id and item id). If yes, then the whole thing are identifiers. These should then be moved from archiveLocation and EXTRA.archiveLocation to EdTechHub.ItemAlsoKnownAs.

(I've not really seen archiveLocation used much, which is why I used it in the first place,.... but clearly that was a mistake!)

Regarding DOIs: The DOI - if available - has also been added to the archiveLocation field. I had anticipated that this could be another identifier and as the DOI field wasn't consistent across types, wanted to copy it to the archiveLocation. Maybe that was also a mistake, esp. as the universal DOI field is coming. However, for now, the DOI is part of the item anyway, we don't need to particularly migrate DOIs here: The DOI should already part of what is picked up and what is written to archiveLocation and thus to EdTechHub.ItemAlsoKnownAs. Hope that makes sense.

@bjohas
Copy link
Collaborator Author

bjohas commented Apr 12, 2020

I had wondered whether I shouldn't just do the migration with the zotero-cli script? It's a one-off process, and could be done that way. If the migration is hard, then lets just do that?

@retorquere
Copy link
Collaborator

Some item types do not have archiveLocation, so for those we've already used an extra field. I am hoping that this means that we have code that can write to EdTechHub.ItemAlsoKnownAs already. I.e., rather than making that the exception, we make it the rule.

That's the easy part.

That leaves migration. As we have identifiers in archiveLocation and EXTRA.archiveLocation , those both need to be scanned. The entries should mainly look like:

(\d+)\:([\d\w]+)

That is not possible with the 2-stage approach I had in mind. The approach I have was:

  1. do an SQL query against the zotero database to find candidate items
  2. of the candidates, get their details, and examine which are precisely the ones we want

but I can't use regexen in the sql query. I can of course fetch all items and inspect them, but for very large libraries I don't know what the impact would be.

I had wondered whether I shouldn't just do the migration with the zotero-cli script? It's a one-off process, and could be done that way. If the migration is hard, then lets just do that?

If it's truly a one-off, I'd not bother with that. I'd just make a throwaway script that would run in the Zotero javascript runner.

@bjohas
Copy link
Collaborator Author

bjohas commented Apr 12, 2020

Yes, it's truly a one-off. Basically, I'd run it against all my libraries, and that should take care of it!

@retorquere
Copy link
Collaborator

Then performance also isn't the most important bit.

@bjohas
Copy link
Collaborator Author

bjohas commented Apr 12, 2020

No, not at all.

retorquere added a commit that referenced this issue Apr 13, 2020
retorquere added a commit that referenced this issue Apr 13, 2020
retorquere added a commit that referenced this issue Apr 13, 2020
@edtechbot
Copy link
Collaborator

🤖 this is your friendly neighborhood build bot announcing test build 0.0.21.187 ("fixes #27")

Install in Zotero by downloading test build 0.0.21.187, opening the Zotero "Tools" menu, selecting "Add-ons", open the gear menu in the top right, and select "Install Add-on From File...".

@retorquere
Copy link
Collaborator

The migration script is in https://github.com/edtechhub/zotero-edtechhub/blob/gh-27/eth-migrate-archiveLocation.js

Note that .187 is not yet merged; only after it is merged will it be released as an upgrade.

@bjohas bjohas closed this as completed in e0cef62 Apr 13, 2020
bjohas added a commit that referenced this issue Apr 13, 2020
Gh 27 - change to storage of item identifiers
@edtechbot
Copy link
Collaborator

🤖 this is your friendly neighborhood build bot announcing test build 0.0.21.190 ("Update eth-migrate-archiveLocation.js\n\nAs far as I can see, the script has the problem that all archive locations will be moved to extras. However, very occasionally archive locations are used for other things. This isn't the case in the main libraries we use, however, there are many many libraries I have and some of them may have data in the archive location. I've added some pseudo-code above.\r\n\r\nAlso, it may be the case that the script has to run more than once. I've added some pseudo-code to allow that.")

Install in Zotero by downloading test build 0.0.21.190, opening the Zotero "Tools" menu, selecting "Add-ons", open the gear menu in the top right, and select "Install Add-on From File...".

bjohas added a commit that referenced this issue Apr 13, 2020
Merge pull request #43 from edtechhub/gh-27
bjohas added a commit that referenced this issue Apr 13, 2020
@edtechbot
Copy link
Collaborator

🤖 this is your friendly neighborhood build bot announcing test build 0.0.21.192 ("Merge pull request #47 from edtechhub/master\n\nMerge pull request #43 from edtechhub/gh-27")

Install in Zotero by downloading test build 0.0.21.192, opening the Zotero "Tools" menu, selecting "Add-ons", open the gear menu in the top right, and select "Install Add-on From File...".

@edtechbot edtechbot mentioned this issue Apr 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants