What issues are we running into in Backfillr? #389

Open
alexwlchan opened this issue Mar 13, 2024 · 5 comments
Labels
backfillr bot 🤖 bug Something isn't working

Comments

@alexwlchan

This is a tracking ticket to highlight files (with examples!) where the bot is getting "confused" and doesn't know how to update the SDC.

@alexwlchan alexwlchan added bug Something isn't working backfillr bot 🤖 labels Mar 13, 2024
@alexwlchan

Unable to find Flickr photographer

Example: https://commons.wikimedia.org/wiki/File:%22Air_Cav!%22.jpg

This file links to https://www.flickr.com/photos/35703177@N00/28006848539/, but the photo page is a 404 and the user page is a 410.

In this case the bot adds the P12120 (Flickr photo ID) and P7482 (source of file) statements, but it can't add any of the other Flickr metadata.

It might be useful to add the Flickr user ID in P170 (creator), but it's not essential.
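
Even when the photo page and user page are gone, both IDs can still be read out of the URL itself. Here is a minimal sketch, assuming URLs shaped like the example above; `parse_flickr_photo_url` and `FlickrPhotoRef` are hypothetical names, not existing Flickypedia functions.

```python
import re
from typing import NamedTuple, Optional


class FlickrPhotoRef(NamedTuple):
    user_id: str   # path segment after /photos/ (may be an NSID or a path alias)
    photo_id: str  # numeric photo ID


# Hypothetical pattern for URLs like
# https://www.flickr.com/photos/<user>/<photo_id>/
PHOTO_URL_RE = re.compile(
    r"^https?://(?:www\.)?flickr\.com/photos/(?P<user>[^/]+)/(?P<photo>\d+)/?$"
)


def parse_flickr_photo_url(url: str) -> Optional[FlickrPhotoRef]:
    """Return the user and photo IDs from a Flickr photo URL, or None."""
    m = PHOTO_URL_RE.match(url)
    if m is None:
        return None
    return FlickrPhotoRef(user_id=m.group("user"), photo_id=m.group("photo"))


ref = parse_flickr_photo_url(
    "https://www.flickr.com/photos/35703177@N00/28006848539/"
)
print(ref)
```

Note that the user segment is only a stable ID when it is an NSID like `35703177@N00`; a path alias would still need a lookup, which isn't possible once the account is gone.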

@alexwlchan

"Date taken" comes from the EXIF, not the Flickr metadata

Example: https://commons.wikimedia.org/wiki/File:%22Aircraft_revetments_constructed_from_empty_fuel_drums_at_Chu_Lai_-_September_1965.%22_-_49716360457.jpg

The date on WMC is 11 January 2018, 05:27:39, which is the created date in the EXIF of the JPEG file, but that's not when the photo was actually taken – more likely when it was digitised.

The Flickr photo has the actual date: September 1965.

I don't know how widespread this is, but this is the sort of thing we should fix.

@alexwlchan

alexwlchan commented Mar 13, 2024

Videos are weird

Example: https://commons.wikimedia.org/wiki/File:Strawberries_time-lapse.ogv
Example: https://commons.wikimedia.org/wiki/File:Lascar_VIDEO_-_Riding_the_Budavari_Siklo_to_the_Castle_Hill_top_(4543574073).jpg

This throws an exception when we try to retrieve the image info:

Traceback (most recent call last):
  File "/Users/alexwlchan/repos/flickypedia/.venv/bin/flickypedia", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/alexwlchan/repos/flickypedia/.venv/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexwlchan/repos/flickypedia/.venv/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/alexwlchan/repos/flickypedia/.venv/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexwlchan/repos/flickypedia/.venv/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexwlchan/repos/flickypedia/.venv/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexwlchan/repos/flickypedia/.venv/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexwlchan/repos/flickypedia/src/flickypedia/backfillr/cli.py", line 312, in update_single_file
    run_with(list_of_filenames=[filename])
  File "/Users/alexwlchan/repos/flickypedia/src/flickypedia/backfillr/cli.py", line 93, in run_with
    photo = flickr_api.get_single_photo(photo_id=photo_id)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexwlchan/repos/flickypedia/.venv/lib/python3.12/site-packages/flickr_photos_api/api.py", line 371, in get_single_photo
    "width": int(s.attrib["width"]),
             ^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: ''

This is a bug, because we don't even use the width value that's being extracted.
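
One way to avoid the crash is to treat an empty attribute as "unknown" instead of passing it straight to `int()`. This is a sketch only: `optional_int` is a hypothetical helper, and the XML below is a made-up element, not a real API response.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def optional_int(value: Optional[str]) -> Optional[int]:
    """Return int(value), or None if the attribute is missing or empty."""
    if value is None or value.strip() == "":
        return None
    return int(value)


# Illustrative XML only: a size element with empty dimensions, which is
# what the ValueError above suggests is happening for videos.
size = ET.fromstring('<size label="Example" width="" height="" />')

width = optional_int(size.attrib.get("width"))
print(width)  # None, rather than a ValueError
```

Whether the right fix is to return `None` or to skip such sizes entirely depends on how callers use the result.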

@alexwlchan

"Date taken" has an incorrect level of precision

Example: https://commons.wikimedia.org/wiki/File:%22A_Welcome_Visitor_to_Camels%27_Paradise%22.jpg

The date on Flickr is circa 1922, but it's been mapped to WMC as 1 January 1922. This looks like a bug in the original migration tool – Flickr returns a 1 Jan timestamp in the "date taken" field and stores the granularity separately. This is something we should be able to fix automatically.
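
A rough sketch of what the automatic fix could look like: combine the "date taken" timestamp with its granularity to pick a Wikibase precision. The granularity values (0, 4, 6, 8) and precision values (9, 10, 11) are as I understand them from the Flickr and Wikibase docs; `to_wikibase_date` and `DateTaken` are hypothetical names, and how "circa" gets recorded on the statement (for example as a qualifier) is left open here.

```python
from datetime import datetime
from typing import NamedTuple

# Flickr "date taken" granularity values (assumed from the Flickr API docs).
FLICKR_EXACT, FLICKR_MONTH, FLICKR_YEAR, FLICKR_CIRCA = 0, 4, 6, 8

# Wikibase time precision values.
WB_YEAR, WB_MONTH, WB_DAY = 9, 10, 11


class DateTaken(NamedTuple):
    time: str       # Wikibase-style timestamp
    precision: int  # Wikibase precision
    circa: bool     # True if Flickr marked the date as approximate


def to_wikibase_date(taken: str, granularity: int) -> DateTaken:
    """Map a Flickr date-taken string plus granularity to a Wikibase date."""
    dt = datetime.strptime(taken, "%Y-%m-%d %H:%M:%S")
    if granularity == FLICKR_EXACT:
        time = f"+{dt.year:04d}-{dt.month:02d}-{dt.day:02d}T00:00:00Z"
        return DateTaken(time, WB_DAY, False)
    if granularity == FLICKR_MONTH:
        time = f"+{dt.year:04d}-{dt.month:02d}-00T00:00:00Z"
        return DateTaken(time, WB_MONTH, False)
    if granularity in (FLICKR_YEAR, FLICKR_CIRCA):
        # The 1 Jan part of the timestamp is filler, so drop it.
        time = f"+{dt.year:04d}-00-00T00:00:00Z"
        return DateTaken(time, WB_YEAR, granularity == FLICKR_CIRCA)
    raise ValueError(f"Unrecognised granularity: {granularity!r}")


print(to_wikibase_date("1922-01-01 00:00:00", FLICKR_CIRCA))
```

For the example file, this would give a year-precision date of 1922 flagged as circa, instead of 1 January 1922.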

@alexwlchan

This seems like a fairly obvious thing to do, but I've only just added it: I'm now tracking which properties have an unknown action, along with the associated files.
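
For illustration, the tracking could be as simple as a mapping from property ID to the files that triggered it. This is a sketch of the idea only, not the code that was actually added; `record_unknown_action` and `summarise` are hypothetical names.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

# property ID -> filenames where the bot didn't know what action to take
unknown_actions: DefaultDict[str, Set[str]] = defaultdict(set)


def record_unknown_action(property_id: str, filename: str) -> None:
    """Remember that `filename` had an unknown action for `property_id`."""
    unknown_actions[property_id].add(filename)


def summarise() -> List[Tuple[str, int]]:
    """Return (property ID, file count) pairs, most affected property first."""
    counts = [(prop, len(files)) for prop, files in unknown_actions.items()]
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


record_unknown_action("P170", 'File:"Air_Cav!".jpg')
record_unknown_action("P571", "File:Example_A.jpg")
record_unknown_action("P571", "File:Example_B.jpg")
print(summarise())
```

Sorting by file count makes it easier to see which kind of confusion is worth fixing first.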
