Add --fix-file-extension to export #382

Open
RhetTbull opened this issue Feb 20, 2021 · 10 comments
Labels
cli Pertains to the command line interface feature request New feature or request

Comments

@RhetTbull
Owner

See #336 and #381

Photos and external editing apps can apply an incorrect extension, and Photos doesn't complain. It's also possible to import a photo with an incorrect extension; Photos displays it fine but gets the UTI wrong. It would be good to have an option to fix the extension on export. I think the most reliable way to do this is to use exiftool to get the file type (see comments on #381).

>>> from osxphotos.exiftool import ExifTool
>>> exif = ExifTool("/Users/rhet/Pictures/Test-10.16.0.1.photoslibrary/originals/D/D05A5FE3-15FB-49A1-A15D-AB3DA6F8B068.dng")
>>> exif.asdict()["File:FileTypeExtension"]
'DNG'
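The check itself boils down to comparing the file's suffix with the extension exiftool reports in `File:FileTypeExtension`. Below is a minimal sketch, assuming the detected extension has already been read (for example via `exif.asdict()["File:FileTypeExtension"]` as above). The helper names and the table of interchangeable extensions are hypothetical and are not part of osxphotos:

```python
from pathlib import Path
from typing import Optional

# Extensions treated as interchangeable (hypothetical table; extend as needed)
EQUIVALENT_EXTENSIONS = [
    {"jpg", "jpeg"},
    {"tif", "tiff"},
]


def normalize_ext(ext: str) -> str:
    """Lower-case an extension and strip any leading dot."""
    return ext.lower().lstrip(".")


def extension_is_wrong(filepath: str, actual_ext: Optional[str]) -> bool:
    """Return True if filepath's suffix disagrees with the detected file type.

    actual_ext is what exiftool reports as File:FileTypeExtension (e.g. 'DNG').
    None means the type could not be determined, so the file is left alone.
    """
    current = normalize_ext(Path(filepath).suffix)
    if not current or not actual_ext:
        return False
    actual = normalize_ext(actual_ext)
    if current == actual:
        return False
    # Members of the same equivalence group count as a match (.jpeg vs JPG)
    return not any(current in group and actual in group for group in EQUIVALENT_EXTENSIONS)
```

For example, a JPEG saved as `IMG_0001.png` is flagged, while `photo.jpeg` reported as `JPG` is not.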
@RhetTbull RhetTbull added feature request New feature or request cli Pertains to the command line interface labels Feb 20, 2021
@RhetTbull
Owner Author

Should there be an option to do this only for original images or only for edited images? The problem can happen on import (originals) or, more often, on edit, but it's a lot easier to just apply the fix to all images.

This would benefit from a cache for exiftool (#325).
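One way such a cache could work is to key results on the file's path, modification time, and size, so unchanged files are not re-scanned. A minimal in-memory sketch, assuming the detection function is passed in; `FileTypeCache` is a hypothetical name and not the cache tracked in #325:

```python
import os
from typing import Callable, Dict, Tuple

# Cache key: (absolute path, modification time in ns, size in bytes).
# If the file changes, the key changes and the reader runs again.
CacheKey = Tuple[str, int, int]


class FileTypeCache:
    """Memoize an expensive per-file lookup, such as running exiftool.

    `reader` takes a path and returns the detected extension; in practice it
    might wrap ExifTool(path).asdict()["File:FileTypeExtension"].
    """

    def __init__(self, reader: Callable[[str], str]):
        self._reader = reader
        self._cache: Dict[CacheKey, str] = {}
        self.misses = 0  # number of times the reader was actually called

    def _key(self, path: str) -> CacheKey:
        st = os.stat(path)
        return (os.path.abspath(path), st.st_mtime_ns, st.st_size)

    def get(self, path: str) -> str:
        key = self._key(path)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._reader(path)
        return self._cache[key]
```

A persistent version would store the same key in a database instead of a dict so results survive between runs.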

@RhetTbull
Owner Author

For this to work with --download-missing and --use-photos-export, the check will need to occur after the file is exported to the export directory. This could result in the file having a different name, which would then make the --update logic think the file was missing, resulting in the file being re-downloaded. To avoid this, the following may work:

  • Create a new ExportDB table, filetype, with columns: filepath, actual_mime_type, actual_file_ext
  • filepath would be the original (pre-fix) filepath
  • After the file is exported, check the ExportDB to see if this file has previously been checked, and if so, check that the extension is correct. If it is, do nothing. If not, rename the file.
  • If it's not in the filetype table, run exiftool to get the actual MIME type and extension, populate the table, then do the same check.
  • This could result in files with incorrect extensions being downloaded again unnecessarily, but there's no easy way around this with the current --update code.
  • This could also result in name collisions, though these should be rare.
  • --update code could be updated at some point to try to avoid a duplicate export but as these "wrong" files are expected to be rare, I'm not too worried about that.
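The steps above could be sketched with Python's built-in sqlite3 module. This is a hypothetical illustration, not the actual osxphotos ExportDB code: the table uses the proposed columns, and `detect` stands in for whatever runs exiftool and returns (MIME type, extension). For brevity it does not treat .jpeg and .jpg as equivalent.

```python
import sqlite3
from pathlib import Path
from typing import Callable, Optional, Tuple

# Hypothetical table following the proposal; the real ExportDB schema may differ
SCHEMA = """
CREATE TABLE IF NOT EXISTS filetype (
    filepath TEXT PRIMARY KEY,  -- original (pre-fix) export path
    actual_mime_type TEXT,
    actual_file_ext TEXT
)
"""

Detector = Callable[[str], Tuple[str, str]]


def get_actual_type(conn: sqlite3.Connection, filepath: str, detect: Detector) -> Tuple[str, str]:
    """Return (mime_type, ext) for filepath, running detect() only on a cache miss."""
    row = conn.execute(
        "SELECT actual_mime_type, actual_file_ext FROM filetype WHERE filepath = ?",
        (filepath,),
    ).fetchone()
    if row is not None:
        return row[0], row[1]
    mime, ext = detect(filepath)  # e.g. a wrapper around exiftool
    conn.execute(
        "INSERT INTO filetype (filepath, actual_mime_type, actual_file_ext) VALUES (?, ?, ?)",
        (filepath, mime, ext),
    )
    conn.commit()
    return mime, ext


def fixed_path(conn: sqlite3.Connection, filepath: str, detect: Detector) -> Optional[str]:
    """Return the path the file should be renamed to, or None if its extension is correct."""
    _, ext = get_actual_type(conn, filepath, detect)
    path = Path(filepath)
    if path.suffix.lower().lstrip(".") == ext.lower():
        return None
    return str(path.with_suffix("." + ext.lower()))
```

Because the table is keyed on the pre-fix path, a later --update run can look up the original name and find the renamed file without calling exiftool again.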

@PetrochukM
Contributor

I love this idea! I am running into this issue now, and I am trying to figure out a way to fix the file extensions before running an export, so that everything goes smoothly.

@RhetTbull
Owner Author

@PetrochukM how many photos do you have with this problem? Are you intending to do a one-time export or a recurring export with --update? If this is a one-time export or it's a small number of photos, you could use --post-function to call a custom function that examines the extension and the file type and renames the file if necessary. This would be relatively simple to implement. The downside is that this can't be used with --cleanup, and if you use --update, the files will be re-exported with each subsequent export because the name won't match what's in the export database. This feature is on my to-do list, but it's pretty far down the list, so it will be a while before I get to it.

If you save the following as fix_export_extension.py and run export with the flag:

--post-function fix_export_extension.py::fix_extension

it should rename files that have an incorrect extension. It doesn't check for name collisions, and some file types have more than one valid extension (for example, .jpg and .jpeg), in which case a correctly named file could be renamed unnecessarily; the function below only special-cases .jpg/.jpeg.

""" Example function for use with osxphotos export --post-function option """

import pathlib
from typing import Callable

from osxphotos import ExportResults, PhotoInfo
from osxphotos.exiftool import ExifTool


def fix_extension(
    photo: PhotoInfo, results: ExportResults, verbose: Callable, **kwargs
):
    """Call this with osxphotos export /path/to/export --post-function fix_export_extension.py::fix_extension
        This will get called immediately after the photo has been exported

    See full example here: https://github.com/RhetTbull/osxphotos/blob/master/examples/post_function.py

    Args:
        photo: PhotoInfo instance for the photo that's just been exported
        results: ExportResults instance with information about the files associated with the exported photo
        verbose: A function to print verbose output if --verbose is set; if --verbose is not set, acts as a no-op (nothing gets printed)
        **kwargs: reserved for future use; recommend you include **kwargs so your function still works if additional arguments are added in future versions

    Notes:
        Use verbose(str) instead of print if you want your function to conditionally output text depending on --verbose flag
        Any string printed with verbose that contains "warning" or "error" (case-insensitive) will be printed with the appropriate warning or error color
        Will not be called if --dry-run flag is enabled
        Will be called immediately after export and before any --post-command commands are executed
    """

    for filepath in results.exported:
        filepath = pathlib.Path(filepath)
        ext = filepath.suffix.lower()
        if not ext:
            continue
        ext = ext[1:]  # remove leading dot
        exif = ExifTool(str(filepath))
        actual_ext = exif.asdict().get("File:FileTypeExtension")
        if not actual_ext:
            # exiftool could not determine the file type; leave the file alone
            verbose(f"Warning: could not determine file type for {filepath}")
            continue
        actual_ext = actual_ext.lower()
        # .jpg and .jpeg are both valid for JPEG files (exiftool reports "JPG")
        if ext != actual_ext and (ext not in ("jpg", "jpeg") or actual_ext != "jpg"):
            # WARNING: Does not check for name collisions; left as an exercise for the reader
            verbose(f"Fixing extension for {filepath} from {ext} to {actual_ext}")
            new_filepath = filepath.with_suffix(f".{actual_ext}")
            verbose(f"Renaming {filepath} to {new_filepath}")
            filepath.rename(new_filepath)
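For the name-collision exercise, one option is to pick a free target name before renaming. A minimal sketch; `non_colliding_path` is a hypothetical helper, and the " (1)", " (2)" naming pattern is an arbitrary choice:

```python
from pathlib import Path


def non_colliding_path(path: Path) -> Path:
    """Return path unchanged if it doesn't exist, else append ' (1)', ' (2)', ... to the stem."""
    if not path.exists():
        return path
    n = 1
    while True:
        candidate = path.with_name(f"{path.stem} ({n}){path.suffix}")
        if not candidate.exists():
            return candidate
        n += 1
```

In fix_extension this would wrap the new name, e.g. `new_filepath = non_colliding_path(filepath.with_suffix(f".{actual_ext}"))`. There is still a small window between the existence check and the rename, which is acceptable for a single export process but not for concurrent writers.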

@PetrochukM
Contributor

I have 30k photos or so and had a couple of hundred issues, mostly with NEF, PNG, and JPEG files. I have been collecting my photos from across the internet, so I intend to do recurring exports as I collect more photos! I have been fixing these issues manually: downloading the images, renaming them, and uploading them again. These issues come up regularly, especially with Facebook exports and some iPhone screenshots.

Unfortunately, Apple Photos sometimes glitches or crashes when dealing with these types of files, which has made this issue more challenging to resolve. Today, I had to track down the original files because I could not export them from Apple Photos without it crashing.

Thanks for providing the script, I'll try it out!

If I use it with --cleanup, it'll just re-do this every time. I am okay with that. It's only a couple hundred photos, and this should be a pretty quick operation to rename the files, yeah?

@oPromessa
Contributor

oPromessa commented Dec 28, 2022

Hi @PetrochukM

What I do is try to fix as many problems as possible (extensions, EXIF dates/times, QuickTime dates/times, converting/rotating videos) before uploading files to Photos, so that osxphotos can then work its magic.

  • I've set up a few bash shell functions to make this job easier. They are not production-ready, but with some tweaks they may help you. Use at your own risk ;). They follow below; I keep them in my ~/.bash_login file.
  • I also bumped into this exiftool-scripts-for-takeout geared at adapting content from Google Takeout but it might be helpful for your photos collected "from across the internet" ;)

PLEASE NOTE MOST OF THESE COMMANDS USE THE -overwrite_original_in_place WHICH WRITES OVER THE FILE ITSELF. SO DO TEST IT OUT AND SAVE COPIES OF THE ORIGINALS, just in case!!!
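If you'd rather script the "save copies" step than do it by hand, here is a minimal Python sketch. `backup_files` is a hypothetical helper; it copies files flat into one directory, so two files with the same name from different folders will trigger the overwrite guard:

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def backup_files(files: Iterable[str], backup_dir: str) -> List[Path]:
    """Copy each file into backup_dir before it is edited in place.

    Refuses to overwrite an existing backup, so running it twice cannot
    replace the untouched originals with already-modified files.
    """
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copies = []
    for f in files:
        src = Path(f)
        dest = dest_dir / src.name
        if dest.exists():
            raise FileExistsError(f"backup already exists: {dest}")
        shutil.copy2(src, dest)  # copy2 also preserves the modification time
        copies.append(dest)
    return copies
```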

  • Place this file in a location of your choosing and point to it in the exif() function definition.
###############################################################################
#
# EXIFTOOL related commands
#
###############################################################################

#==============================================================================
# exif() Displays some key EXIF tags from files/directories recursively.
#------------------------------------------------------------------------------
function exif() {
	# To overcome incompatibility of xattr (loaded by osxphotos).
	# Force PATH to force xattr to be sourced from /usr/bin
	# *** Adapt/change the location of the txt file
	PATH=/usr/bin:$PATH exiftool -r -d """%Y:%m:%d %H:%M:%S""" -p """/Users/YourUser/format.txt""" -f "$@"
}
export -f exif
#------------------------------------------------------------------------------

#==============================================================================
# exiforiginaldate() Copies the original date from EXIF to all other dates in the file.  To align all.
#------------------------------------------------------------------------------
function exiforiginaldate () {
	echo Performing... "-CreateDate<DateTimeOriginal" "-FileModifyDate<DateTimeOriginal" "-ModifyDate<DateTimeOriginal"
	# 2017.04.13 Included changing the ModifyDate tag (operating system level, which could also be set via touch command)
	exiftool -d """%Y:%m:%d %H:%M:%S""" -overwrite_original_in_place -fileOrder DateTimeOriginal """-CreateDate<DateTimeOriginal""" """-FileModifyDate<DateTimeOriginal""" """-ModifyDate<DateTimeOriginal""" -v "$@"
}
export -f exiforiginaldate
#------------------------------------------------------------------------------

#==============================================================================
# exifKeysdate() Copies Keys:CreationDate into DateTimeOriginal. Some videos (QuickTime) have the correct date there, lack DateTimeOriginal, and/or have a wrong FileModifyDate.
#------------------------------------------------------------------------------
function exifKeysdate () {
	echo Performing... ""-DateTimeOriginal\<Keys:CreationDate"" 
	exiftool -overwrite_original_in_place """-DateTimeOriginal<Keys:CreationDate""" -P -v "$@"
}
export -f exifKeysdate
#------------------------------------------------------------------------------

#==============================================================================
# exifpng2jpg() Converts .PNG files into .JPG and copies EXIF fields ...
#------------------------------------------------------------------------------
function exifpng2jpg() {
# Converts .PNG files into .JPG and copies EXIF fields ...

	for a in "$@"
	do
		fname=`basename "$a" .png`
		xtension=png
		echo 1st -  ${fname}.${xtension}

		if [ \( -f "${fname}".png -o -f "${fname}".PNG \) -a ! -f "${fname}".jpg  ]
		then
			echo ok -  ${fname}.${xtension}
			sips -s format jpeg -s formatOptions 90 """${fname}".${xtension}"" --out """${fname}".jpg""
			exiftool -overwrite_original_in_place -TagsFromFile "${fname}".${xtension} "-all:all>all:all" "${fname}".jpg
		else
			fname=`basename "$a" .PNG`
			xtension=PNG
			echo 2nd -  ${fname}.${xtension}

			if [ \( -f "${fname}".png -o -f "${fname}".PNG \) -a ! -f "${fname}".jpg  ]
			then
				echo ok -  ${fname}.${xtension}
				sips -s format jpeg -s formatOptions 90 """${fname}".${xtension}"" --out """${fname}".jpg""
				exiftool -overwrite_original_in_place -TagsFromFile "${fname}".${xtension} "-all:all>all:all" "${fname}".jpg
			fi
		fi	
	done
}  
export -f exifpng2jpg
#------------------------------------------------------------------------------

#==============================================================================
# exiftif2jpg() Converts .TIF files into .JPG and copies EXIF fields ... Uses ModifyDate as the source for the DateTimeOriginal, FileModifyDate, and CreateDate fields
#------------------------------------------------------------------------------
function exiftif2jpg() {
# Converts .TIF files into .JPG and copies EXIF fields ... Uses ModifyDate as the source for the DateTimeOriginal, FileModifyDate, and CreateDate fields

	for a in "$@"
	do
		fname=`basename "$a" .tif`
		dname=`dirname "$a" `
		if [ -f "${dname}/${fname}".tif -a ! -f "${dname}/${fname}".jpg  ]
		then
			echo ok -  ${fname}
			sips -s format jpeg -s formatOptions 100 """${dname}/${fname}".tif"" --out """${dname}/${fname}".jpg""
			exiftool -overwrite_original_in_place -TagsFromFile "${dname}/${fname}".tif "-all:all>all:all" "${dname}/${fname}".jpg
			# Use the TIFF's ModifyDate as the source for all date fields
			echo Performing... "-DateTimeOriginal<ModifyDate" "-CreateDate<ModifyDate" "-FileModifyDate<ModifyDate" on "${dname}/${fname}".jpg
			exiftool -overwrite_original_in_place """-DateTimeOriginal<ModifyDate""" """-CreateDate<ModifyDate""" """-FileModifyDate<ModifyDate""" -P -v "${dname}/${fname}".jpg
		fi	
	done
}  
export -f exiftif2jpg
#------------------------------------------------------------------------------

#==============================================================================
# exifsetdate() Takes a date in the format %Y:%m:%d %H:%M:%S and sets the file's DateTimeOriginal to it.
#------------------------------------------------------------------------------
function exifsetdate() {
	
	datetoset=${1}; echo $datetoset
	shift
	exiftool -d """%Y:%m:%d %H:%M:%S""" -overwrite_original_in_place -fileOrder DateTimeOriginal -DateTimeOriginal="""${datetoset}"""  -v "$@"
}
export -f exifsetdate
#------------------------------------------------------------------------------

#==============================================================================
# exifsetalldates() Takes a date in the format %Y:%m:%d %H:%M:%S and sets ALL date fields of the file to it.
#------------------------------------------------------------------------------
function exifsetalldates() {
	
	datetoset=${1}; echo $datetoset
	shift
	exiftool -d """%Y:%m:%d %H:%M:%S""" -overwrite_original_in_place -fileOrder DateTimeOriginal -DateTimeOriginal="""${datetoset}""" -CreateDate="""${datetoset}""" -FileModifyDate="""${datetoset}""" -ModifyDate="""${datetoset}""" -v "$@"
}
export -f exifsetalldates
#------------------------------------------------------------------------------

#==============================================================================
# exifrenamefile() Renames the file based on its DateTimeOriginal. The file name format will be IMG_%Y%m%d_%H%M%S%%-c.%%e, keeping the same extension.
#------------------------------------------------------------------------------
function exifrenamefile() {
	#exiftool -v4 -r -d """%Y-%m-%d %H.%M.%S%%-c.%%le""" '-filename<DateTimeOriginal' -f "$@"
	# %%le lowers the case of extension. Hopefully %%e keeps the same extension
	exiftool -v4 -r -d """IMG_%Y%m%d_%H%M%S%%-c.%%e""" '-filename<DateTimeOriginal' -f "$@"
	echo "Note: Do you need to run exiforiginaldate?"
}
export -f exifrenamefile

#------------------------------------------------------------------------------

#==============================================================================
# exifcopytags()
#   - srcfile
#   - dstfile
# Copy all metadata from one file to another
# exiftool -TagsFromFile srcimage.jpg "-all:all>all:all" targetimage.jpg
# USE -overwrite_original_in_place if applicable
# Using it WITHOUT all:all copies everything, including GPS data (works with MP4)
#------------------------------------------------------------------------------
function exifcopytags() {
E_BADARGS=85
E_BADFILES=90

	if [ ! -n "$1" -o ! -n "$2" ]
	then
		echo "Usage: exifcopytags srcfile dstfile"
		return $E_BADARGS
	elif [ -f "$1" ]
	then
		srcfile=${1}; shift

		echo Copying all tags from "${srcfile}" to "$@"
		exiftool -v -overwrite_original_in_place -TagsFromFile "${srcfile}" "${@}"
	else
		echo "Usage: file(s) not found!"
		echo "Usage: exifcopytags srcfile dstfile"
		return $E_BADFILES
	fi  
}
export -f exifcopytags
#------------------------------------------------------------------------------

#==============================================================================
# exifcleantags()
#   - files
# Clean all metadata from the given files
# exiftool -overwrite_original -all= -gps:all= *.jpg
# USE -overwrite_original_in_place if applicable
#------------------------------------------------------------------------------
function exifcleantags() {
E_BADARGS=85
E_BADFILES=90

	if [ ! -n "$1" ]
	then
		echo "Usage: exifcleantags files"
		return $E_BADARGS
	elif [ -f "$1" ]
	then
		echo Cleaning all tags from "$@"
		exiftool -v -overwrite_original -all= -gps:all= "${@}"
	else
		echo "Usage: file(s) not found!"
		echo "Usage: exifcleantags files"
		return $E_BADFILES
	fi  
}
export -f exifcleantags
#------------------------------------------------------------------------------
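To see what exifrenamefile's format string produces before running it on real files, here is a rough Python equivalent of the naming scheme. It is a hypothetical sketch: the "-1", "-2" suffixes approximate exiftool's %-c copy number and are not a reimplementation of exiftool's exact behavior:

```python
from datetime import datetime
from typing import Set


def date_based_name(taken: datetime, ext: str, existing: Set[str]) -> str:
    """Build IMG_YYYYmmdd_HHMMSS.ext, adding -1, -2, ... if the name is already taken.

    `taken` would come from DateTimeOriginal, which exiftool prints as
    '%Y:%m:%d %H:%M:%S' with the -d format used above.
    """
    base = taken.strftime("IMG_%Y%m%d_%H%M%S")
    name = f"{base}.{ext}"
    n = 1
    while name in existing:
        name = f"{base}-{n}.{ext}"
        n += 1
    return name
```

For example, `date_based_name(datetime.strptime("2022:12:28 14:05:09", "%Y:%m:%d %H:%M:%S"), "JPG", set())` gives `IMG_20221228_140509.JPG`.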

@PetrochukM
Contributor

PetrochukM commented Dec 29, 2022

Thanks for sending that. I am not sure I am ready to copy all my photos and run that over them. I think it's small enough that I don't want to take that risk (:

That said, I did learn from this that EXIF might be hiding metadata that could be useful to me! I have many mystery files that I am struggling to pin down the creation date for.

@RhetTbull
Copy link
Owner Author

@PetrochukM I've added a script find_bad_extensions.py to the examples directory. This will scan your Photos library to find all photos that have a bad extension. It caches the results so when you re-run it, it doesn't have to scan each photo again.

If you save the file at the link above to find_bad_extensions.py you can run with osxphotos via: osxphotos run find_bad_extensions.py. You can see the help (pasted below) using osxphotos run find_bad_extensions.py --help. Info on files with bad extensions is output as CSV format to STDOUT so you can do this to capture the results: osxphotos run find_bad_extensions.py > results.csv.

Usage: osxphotos run find_bad_extensions.py [OPTIONS]

  Scan Photos library to find photos with bad (incorrect) file extensions.

  This can be run with osxphotos via: `osxphotos run find_bad_extensions.py`

  Both STDOUT and STDERR are used to output results.

  STDOUT is used to output a CSV file with the following columns:

  uuid, original_filename, version, current_extension, correct_extension, path

  Thus, to save the results to a file, run:

  osxphotos run find_bad_extensions.py > results.csv

Options:
  --library PATH  Path to Photos library to use. Default is to use default
                  Photos library.
  --recheck       Recheck all files even if previously checked and cached.
  --edited        Check edited versions of photos in addition to originals.
  --help          Show this message and exit.

You can use this to find all the bad extensions then you can export those, fix the extension and re-import them if desired. I intend to eventually add this as an osxphotos command to automatically fix the extensions by automating the export, fix, re-import (and re-apply metadata). Ref. #336

@RhetTbull
Owner Author

> I have many mystery files that I am struggling to pin down the creation date for.

You might want to check out osxphotos help timewarp to learn about the timewarp tool which can bulk-adjust the creation date for photos. It can also pull the creation date from EXIF (--pull-exif) and looks at a much wider range of EXIF fields than Photos does.

@PetrochukM
Copy link
Contributor

Thank you so much for this! This should be really helpful!


3 participants