Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add phasher #3864

Merged
merged 9 commits into from
Jul 11, 2023
Merged

add phasher #3864

merged 9 commits into from
Jul 11, 2023

Conversation

aghoulcoder
Copy link
Contributor

@aghoulcoder aghoulcoder commented Jun 29, 2023

A simple phasher program that accepts a video files as a command line arguments and calculates and prints their PHASHes.

$ phasher /tmp/test.mp4
d39239890743d9f6 /tmp/test.mp4

The goal of this separate executable is to have a simple way to calculate phashes that doesn't depend on a full stash instance so that third-party systems and tools can independently generate PHASHes which can be used for interacting with stash and stash-box APIs and data.

Currently phasher is built in the default make target along with stash by simply running make.
Build with make phasher.
Cross-platform targets have not been considered.

Concurrency is intentionally not implemented because it is simpler to use GNU Parallel. For example:

parallel phasher {} ::: *.mp4

P.S. This is the most go I've ever written and my first attempt at reading stash code, so a review and feedback would be appreciated.

A simple `phasher` program that accepts a video file as a command line
argument and calculates and prints its PHASH.

The goal of this separate executable is to have a simple way to
calculate phashes that doesn't depend on a full stash instance so that
third-party systems and tools can independently generate PHASHes which
can be used for interacting with stash and stash-box APIs and data.

Currently `phasher` is built in the default make target along with
`stash` by simply running `make`.
Cross-platform targets have not been considered.

Concurrency is intentionally not implemented because it is simpler to
use [GNU Parallel](https://www.gnu.org/software/parallel/).
For example:
```
parallel phasher {} ::: *.mp4
```
@aghoulcoder
Copy link
Contributor Author

I've been made aware that this tool by @peolic also exists: https://github.com/peolic/videohashes

But I argue that this should be included in stash because stash's PHASH is a bespoke and authoritative implementation of the algorithm for video files. Therefore, anyone who has used stash and is looking for a simple to use implementation of PHASH will be looking here first, as did I.

Makefile Outdated
go build $(OUTPUT) $(BUILD_FLAGS) ./cmd/stash
go build $(BUILD_FLAGS) ./phasher.go
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make it a separate make target?

Maybe make build-phasher

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure, it should have it's own make target. I just wanted to push early for review. I'll work on it...

@DingDongSoLong4
Copy link
Collaborator

This is definitely very useful, and I agree that having this be part of stash rather than separate is better for discovery and easier to keep up to date.

I just would suggest moving phasher.go to cmd/phasher/main.go, to keep in line with the recommended go project layout (https://github.com/golang-standards/project-layout).

If we're going to release phasher binaries then cross-compile targets would be needed, otherwise I don't think they're necessary.

For argument handling, I think spf13/cobra would be overkill. You could use spf13/pflag (which stash already uses) to parse a help flag for a simple usage info printout. Check out the Usage variable in spf13/pflag, which lets you override the usage message so you can display normal arguments in addition to flags.

I think it would also be great to add support for multiple command-line file arguments (concurrency not required), stdin input and JSON output, but that can definitely come in a later PR.

The make target still needs to be integrated into the rest of the
Makefile so it can be built as part of normal releases.
@aghoulcoder
Copy link
Contributor Author

aghoulcoder commented Jul 2, 2023

Thanks for the pointers @DingDongSoLong4

I have:

  • Separated phasher into its own make target. But it's disconnected from the reset of the Makefile which is melting my brain a bit.
  • Added usage output and a --quiet option that only prints the phash.

Here are some things that have been mentioned, which I recommend not doing:

  • Don't handle multiple file inputs. Handling multiple files is inefficient if they are just going to be processed sequentially, compared to using gnu parallel.
  • Don't impelment concurrency. If you run phasher as part of a pipeline that includes other tools with concurrency, you will need some way to balance cpu cores between the tools. Instead, you can run the entire pipeline concurrently with gnu parallel.
  • JSON output should not be implemented. The current output format is intentional and in-line with gnu coreutils hashing programs (i.e. md5sum, sha1sum, etc..): It prints the hash, followed by a space, and then the filename. With -q it prints the hash only. It does not get any simpler than this.

I'm still trying to untangle this Makefile in my mind to decide how best to restructure it so all the build options and flags are re-used in an efficient and easy to read way for both stash and phasher.

@DingDongSoLong4
Copy link
Collaborator

Here are some things that have been mentioned, which I recommend not doing:

* Don't handle multiple file inputs. Handling multiple files is inefficient if they are just going to be processed sequentially, compared to using gnu parallel.

* Don't impelment concurrency. If you run phasher as part of a pipeline that includes other tools with concurrency, you will need some way to balance cpu cores between the tools. Instead, you can run the entire pipeline concurrently with gnu parallel.

* JSON output should not be implemented. The current output format is intentional and in-line with gnu coreutils hashing programs (i.e. `md5sum`, `sha1sum`, etc..): It prints the hash, followed by a space, and then the filename. With `-q` it prints the hash only. It does not get any simpler than this.

Agreed, with the exception of multiple file inputs. I was looking at peolic's implementation and saw it had JSON output so I mentioned it, but that outputs more than just the phash so a JSON output makes sense. This implementation does not need it, the output is simple enough.

Regarding multiple inputs, I do understand your reasoning, but it shouldn't be at all complicated to get working so I don't really see a downside. I'm assuming you use Linux, and I do as well, so I'm totally on board with GNU parallel, but many (most) use Windows so would not have easy access to parallel (plus I myself haven't got around to learning it yet, as I think might be the case for many).

You mention md5sum and sha1sum, which do support hashing multiple file arguments sequentially, so if we're trying to emulate them then phasher should support this as well. Also to be a bit pedantic those utilities actually output two spaces (well most of the time, see the manpage) between the hash and filename, so if we're going for compatibility then that is a minor difference. It doesn't really affect "parseability" though, so your call.

Concurrency on the other hand would definitely add unnecessary complexity, so I agree that it shouldn't be added.

I'm still trying to untangle this Makefile in my mind to decide how best to restructure it so all the build options and flags are re-used in an efficient and easy to read way for both stash and phasher.

This has prompted me to make a few improvements to the Makefile in #3876, in order for the following to work as the phasher target:

.PHONY: build-phasher
build-phasher:
	go build -v $(GO_BUILD_FLAGS) -ldflags "$(LDFLAGS)" ./cmd/phasher

phasher doesn't actually need any of the build flags or variables set by the build-flags and pre-build targets, so it doesn't need them as prerequisites.

For consistency with the rest of the file I've changed the target name to build-phasher rather than phasher. We're basically just using the Makefile as a glorified sort-of-cross-plaform shell script so having the name match the binary isn't really necessary.

You can then just add build-phasher as another prerequisite of build-release and then build-release will build release versions of both phasher and stash itself. The cross-compilation targets should then also work, though I haven't tested that.

@aghoulcoder
Copy link
Contributor Author

Very quickly put together handling of multiple arguments. Doesn't do anything with stdin.

@aghoulcoder
Copy link
Contributor Author

aghoulcoder commented Jul 3, 2023

@DingDongSoLong4 Thanks for helping out with that Makefile. Hopefully it can be merged soon.

Let me know if there's something more you think could be improved in phasher itself at this point.

Regarding make target names, while I understand that we're quite far from the classic C programs that make was designed for, it still makes sense to name the build target exactly like the resulting executable.

  1. The commands make phasher and make stash make sense intuitively and are shorter to type than make build-phasher.

  2. With properly named targets you can use the $@ automatic variable to reference the name of the make target, reducing repetition.
    For example, this:

    build-phasher:
        go build -o phasher -trimpath -buildmode=pie -ldflags '-s -w' ./cmd/phasher

    can be written more concisely like this:

    phasher:
        go build -o $@ -trimpath -buildmode=pie -ldflags '-s -w' ./cmd/$@

    And all this:

    .PHONY: cross-compile-linux-arm32v6
    cross-compile-linux-arm32v6: export GOOS := linux
    cross-compile-linux-arm32v6: export GOARCH := arm
    cross-compile-linux-arm32v6: export GOARM := 6
    cross-compile-linux-arm32v6: export CC := arm-linux-gnueabi-gcc
    cross-compile-linux-arm32v6: OUTPUT := -o dist/stash-linux-arm32v6
    cross-compile-linux-arm32v6: build-release-static

    can be written like this:

    .PHONY: linux-arm32v6
    linux-arm32v6: export GOOS := linux
    linux-arm32v6: export GOARCH := arm
    linux-arm32v6: export GOARM := 6
    linux-arm32v6: export CC := arm-linux-gnueabi-gcc
    linux-arm32v6: OUTPUT := -o dist/stash-$@
    linux-arm32v6: build-release-static

@aghoulcoder
Copy link
Contributor Author

I just realized I did something dumb.
Because I very hastily put together the code for iterating over multiple command line arguments, I accidentally made it so that FFMPEG and FFPROBE are initialized on every loop, which is pointless and inefficient.
Working on it...

@aghoulcoder
Copy link
Contributor Author

aghoulcoder commented Jul 6, 2023

@WithoutPants just fyi, I don't intend on changing anything else on this PR for now. It's ready for review whenever you want.
Original description in the PR is updated and valid.

@WithoutPants WithoutPants added the feature Pull requests that add a new feature label Jul 11, 2023
@WithoutPants WithoutPants added this to the Version 0.22.0 milestone Jul 11, 2023
@WithoutPants WithoutPants merged commit 969af2a into stashapp:develop Jul 11, 2023
randemgame added a commit to randemgame/SexyStash that referenced this pull request Jul 12, 2023
commit 67d4f97
Author: WithoutPants <53250216+WithoutPants@users.noreply.github.com>
Date:   Wed Jul 12 11:51:52 2023 +1000

    Multiple scene URLs (stashapp#3852)

    * Add URLs scene relationship
    * Update unit tests
    * Update scene edit and details pages
    * Update scrapers to use urls
    * Post-process scenes during query scrape
    * Update UI for URLs
    * Change urls label

commit 76a4bfa
Author: chickenwingavalanche <138962341+chickenwingavalanche@users.noreply.github.com>
Date:   Tue Jul 11 19:25:24 2023 -0600

    Add keyboard shortcut to toggle video looping in scene player (stashapp#3902)

    * Use shift+L to toggle video looping in scene player

commit 3e810cf
Author: WithoutPants <53250216+WithoutPants@users.noreply.github.com>
Date:   Wed Jul 12 10:53:46 2023 +1000

    Add nil checks in identify (stashapp#3905)

commit c1352f9
Author: NodudeWasTaken <75137537+NodudeWasTaken@users.noreply.github.com>
Date:   Wed Jul 12 02:45:33 2023 +0200

    Safari video height css fix (stashapp#3882)

commit f665aa8
Author: WithoutPants <53250216+WithoutPants@users.noreply.github.com>
Date:   Wed Jul 12 10:38:52 2023 +1000

    Update make target in Dockerfile-CUDA

commit b2b52bc
Author: chickenwingavalanche <138962341+chickenwingavalanche@users.noreply.github.com>
Date:   Tue Jul 11 18:37:46 2023 -0600

    Add missing scene player shortcuts to Help -> Keyboard Shortcuts (stashapp#3903)

    Co-authored-by: chickenwingavalanche <chickenwingavalanche@example.com>

commit 96f2229
Author: DingDongSoLong4 <99329275+DingDongSoLong4@users.noreply.github.com>
Date:   Wed Jul 12 02:05:35 2023 +0200

    Improve Makefile (stashapp#3901)

    * Improve Makefile
    * Make ui targets consistent
    ---------
    Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>

commit 278a064
Author: WithoutPants <53250216+WithoutPants@users.noreply.github.com>
Date:   Tue Jul 11 19:16:22 2023 +1000

    Revert "Add AirPlay and Chromecast support (stashapp#2872)" (stashapp#3898)

    This reverts commit 8e235a2.

commit 0c0ba19
Author: Csaba Maulis <hello@senki.xyz>
Date:   Tue Jul 11 13:54:42 2023 +0800

    Add `-v/--version` flag to print version string (stashapp#3883)

    * Add `-v/--version` flag to print version string

    - Created a new flag `-v/--version` in the command-line interface to display the version number and exit.
    - Moved all version-related functions inside the config package to the new file `manager/config/version.go` to avoid circular dependencies.
    - Added a new `GetVersionString()` function to generate a formatted version string.
    - Updated references to the moved version functions.
    - Updated references in the `Makefile`.

    * Move version embeds to build package

    * Remove githash var

    ---------

    Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>

commit 969af2a
Author: A Ghoul Coder <aghouleditor@protonmail.com>
Date:   Tue Jul 11 07:53:53 2023 +0200

    add phasher (stashapp#3864)

    * add phasher

    A simple `phasher` program that accepts a video file as a command line
    argument and calculates and prints its PHASH.

    The goal of this separate executable is to have a simple way to
    calculate phashes that doesn't depend on a full stash instance so that
    third-party systems and tools can independently generate PHASHes which
    can be used for interacting with stash and stash-box APIs and data.

    Currently `phasher` is built in the default make target along with
    `stash` by simply running `make`.
    Cross-platform targets have not been considered.

    Concurrency is intentionally not implemented because it is simpler to
    use [GNU Parallel](https://www.gnu.org/software/parallel/).
    For example:
    ```
    parallel phasher {} ::: *.mp4
    ```

    * standard dir structure for phasher and separate make target

    The make target still needs to be integrated into the rest of the
    Makefile so it can be built as part of normal releases.

    * phasher: basic usage output and quiet option
    * phasher: allow and process multiple command line arguments
    * phasher: camelCase identifiers
    * phasher: initialize ffmpeg and ffprobe only once

commit cbdd4d3
Author: Flashy78 <90150289+Flashy78@users.noreply.github.com>
Date:   Mon Jul 10 21:37:00 2023 -0700

    Identify: Options to skip multiple results and single name performers (stashapp#3707)

    Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>

commit ff22577
Author: hontheinternet <121332499+hontheinternet@users.noreply.github.com>
Date:   Tue Jul 11 13:32:42 2023 +0900

    Add additional stats to the Stats page (stashapp#3812)

    * Add o_counter, play_duration, play_count, unique_play_count stats

commit 4f0e0e1
Author: hontheinternet <121332499+hontheinternet@users.noreply.github.com>
Date:   Tue Jul 11 13:02:09 2023 +0900

    Allow serving of interactive CSVs directly to Handy (stashapp#3756)

    * allow direct serve interactive CSVs to Handy
    ---------
    Co-authored-by: kermieisinthehouse <kermie@isinthe.house>

commit 8e235a2
Author: CJ <72030708+Teda1@users.noreply.github.com>
Date:   Mon Jul 10 22:47:11 2023 -0500

    Add AirPlay and Chromecast support (stashapp#2872)

    * dynamically load cast_sender.js
    * add https://www.gstatic.com to connectableOrigins
    * Add toggle for chromecast

commit c499c20
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Tue Jul 11 13:40:29 2023 +1000

    Bump semver from 5.7.1 to 5.7.2 in /ui/v2.5 (stashapp#3896)

    Bumps [semver](https://github.com/npm/node-semver) from 5.7.1 to 5.7.2.
    - [Release notes](https://github.com/npm/node-semver/releases)
    - [Changelog](https://github.com/npm/node-semver/blob/v5.7.2/CHANGELOG.md)
    - [Commits](npm/node-semver@v5.7.1...v5.7.2)

    ---
    updated-dependencies:
    - dependency-name: semver
      dependency-type: indirect
    ...

    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit f0d901a
Author: plato178 <137155614+plato178@users.noreply.github.com>
Date:   Tue Jul 11 03:45:20 2023 +0100

    Add codec filters (stashapp#3843)

    * Add video_codec and audio_codec filter criteria
    * Add Audio Codec and Video Codec UI filters

commit 93b41fb
Author: WithoutPants <53250216+WithoutPants@users.noreply.github.com>
Date:   Tue Jul 11 11:53:49 2023 +1000

    Add folder rename detection (stashapp#3817)

commit 5c38836
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Tue Jul 11 11:40:49 2023 +1000

    Bump stylelint from 15.1.0 to 15.10.1 in /ui/v2.5 (stashapp#3889)

    Bumps [stylelint](https://github.com/stylelint/stylelint) from 15.1.0 to 15.10.1.
    - [Release notes](https://github.com/stylelint/stylelint/releases)
    - [Changelog](https://github.com/stylelint/stylelint/blob/main/CHANGELOG.md)
    - [Commits](stylelint/stylelint@15.1.0...15.10.1)

    ---
    updated-dependencies:
    - dependency-name: stylelint
      dependency-type: direct:development
    ...

    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit cec9195
Author: DingDongSoLong4 <99329275+DingDongSoLong4@users.noreply.github.com>
Date:   Tue Jul 11 03:40:20 2023 +0200

    Fix scene missing flicker on scene page (stashapp#3857)

    * use useLayoutEffect
    * Remove unnecessary nullability in ScenePlayer

commit 0268565
Author: DingDongSoLong4 <99329275+DingDongSoLong4@users.noreply.github.com>
Date:   Tue Jul 11 03:36:57 2023 +0200

    Makefile cleanup (stashapp#3876)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Pull requests that add a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants