Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[reddit] ignore '/message/compose' URLs without www subdomain #4581

Closed
wants to merge 3 commits into from
Closed

[reddit] ignore '/message/compose' URLs without www subdomain #4581

wants to merge 3 commits into from

Conversation

Hrxn
Copy link
Contributor

@Hrxn Hrxn commented Sep 25, 2023

🤷‍♂️

Feel free to make any changes w.r.t formatting etc.

I stayed within 80 cols.
It's crazy low, IMHO.

@Hrxn
Copy link
Contributor Author

Hrxn commented Sep 25, 2023

Waaaaaiit...

I just saw this in my unsupported file:

[2023-09-25T03:12:26] https://www.reddit.com/message/compose?to=%2Fr%2FNSFWfashion
[2023-09-25T03:12:26] https://www.reddit.com/message/compose/?to=/r/NSFWfashion

Running latest sources, obv.
I don't get it. Is it actually working at all?!

@mikf
Copy link
Owner

mikf commented Sep 26, 2023

I've tested with your example from #4482 and it works there:

# without d7aac9fc
$ gallery-dl -g -o comments=100 https://www.reddit.com/r/NSFWfashion/comments/15mp3k7/danielle_fisser_aic/
https://cdn.imgchest.com/files/k739cpq85v7.jpg
https://www.reddit.com/message/compose?to=%2Fr%2FNSFWfashion
https://www.reddit.com/message/compose/?to=/r/NSFWfashion
https://imgchest.com/p/ej7m6nzj4dl

# with 
$ gallery-dl -g -o comments=100 https://www.reddit.com/r/NSFWfashion/comments/15mp3k7/danielle_fisser_aic/
https://cdn.imgchest.com/files/k739cpq85v7.jpg
https://imgchest.com/p/ej7m6nzj4dl

(If it doesn't appear with -g, it would also not be written to an unsupported file)

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
@Hrxn
Copy link
Contributor Author

Hrxn commented Sep 26, 2023

Yeah, I had kind of a hunch here.. I mean, it's just a small bit of code, and the logic looks absolutely sound.

Honestly, no idea what was going on there.

Could some kind of shenanigans with caching by the Python interpreter be a possible cause?

After running gallery-dl from the source directory there are those __pycache__ directories and <filename>.cpython-311.pyc files. I don't think this should be an issue, but who knows. I'll just test it again first.

@Hrxn
Copy link
Contributor Author

Hrxn commented Sep 26, 2023

What. The. Actual. Fuck.

PS D:\> python.exe 'C:\Sources\gallery-dl-master\gallery_dl' --version
1.25.8
PS D:\> python.exe 'C:\Sources\gallery-dl-master\gallery_dl\__main__.py' --version
1.26.0-dev
PS D:\>

@Hrxn
Copy link
Contributor Author

Hrxn commented Sep 26, 2023

Apparently not? #4554 (reply in thread)

@mikf
Copy link
Owner

mikf commented Sep 26, 2023

Seems that python.exe 'C:\Sources\gallery-dl-master\gallery_dl' does not run __main__.py, which would do some sys.path/PYTHONPATH adjustments to make sure import gallery_dl imports the correct gallery-dl module.

if __package__ is None and not hasattr(sys, "frozen"):
import os.path
path = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, os.path.realpath(path))

(yt-dlp/youtube-dl do the same in their __main__.py)

@Hrxn
Copy link
Contributor Author

Hrxn commented Sep 26, 2023

Surprising behaviour, to say the least..

Quoting from https://docs.python.org/3/using/cmdline.html#using-on-cmdline

<script>

Execute the Python code contained in script, which must be a filesystem path (absolute or relative) referring to either a Python file, a directory containing a main.py file, or a zipfile containing a main.py file.

If this option is given, the first element of sys.argv will be the script name as given on the command line.

If the script name refers directly to a Python file, the directory containing that file is added to the start of sys.path, and the file is executed as the main module.

If the script name refers to a directory or zipfile, the script name is added to the start of sys.path and the main.py file in that location is executed as the main module.

-I option can be used to run the script in isolated mode where sys.path contains neither the script’s directory nor the user’s site-packages directory. All PYTHON* environment variables are ignored, too.

Raises an auditing event cpython.run_file with argument filename.

See also
runpy.run_path()
Equivalent functionality directly available to Python code

If no interface option is given, -i is implied, sys.argv[0] is an empty string ("") and the current directory will be added to the start of sys.path. Also, tab-completion and history editing is automatically enabled, if available on your platform (see Readline configuration).

Not sure, based on this description, sounds like it should work?

The only other thing I've found so far with a search on this (seriously, screw the search engines here - 99% of all hits, links, stackoverflow entries etc. seem to deal with which version of python to run, not which version of a script)

In analogy to the small test / example I did in #4554
No difference in results between directory name with trailing slash, and without (unlike the test from my comment linked above)

PS D:\> $env:PYLAUNCHER_DEBUG = 1
PS D:\> py C:\Sources\gallery-dl-master\gallery_dl\ --version
argv0: C:\Users\Hrxn\AppData\Local\Programs\Python\Launcher\py.exe
version: 3.11.5
# Failed to open C:\Sources\gallery-dl-master\gallery_dl\ for shebang parsing (0x00000003)
# Reading from C:\Users\Hrxn\AppData\Local\py.ini for defaults/python
# Did not find file C:\Users\Hrxn\AppData\Local\py.ini
# Reading from C:\Users\Hrxn\AppData\Local\Programs\Python\Launcher\py.ini for defaults/python
# Did not find file C:\Users\Hrxn\AppData\Local\Programs\Python\Launcher\py.ini
SearchInfo.originalCmdLine: "C:\Users\Hrxn\AppData\Local\Programs\Python\Launcher\py.exe" C:\Sources\gallery-dl-master\gallery_dl\ --version
SearchInfo.restOfCmdLine:  C:\Sources\gallery-dl-master\gallery_dl\ --version
SearchInfo.executablePath: (null)
SearchInfo.scriptFile: C:\Sources\gallery-dl-master\gallery_dl\
SearchInfo.executable: python.exe
SearchInfo.executableArgs: (null)
SearchInfo.company: (null)
SearchInfo.tag: (null)
SearchInfo.oldStyleTag: False
SearchInfo.lowPriorityTag: False
SearchInfo.allowDefaults: True
SearchInfo.allowExecutableOverride: True
SearchInfo.windowed: False
SearchInfo.list: False
SearchInfo.listPaths: False
SearchInfo.help: False
SearchInfo.limitToCompany: (null)
 -V:3.11          C:\Users\Hrxn\AppData\Local\Programs\Python\Python311\python.exe
env.company: PythonCore
env.tag: 3.11
# about to run: C:\Users\Hrxn\AppData\Local\Programs\Python\Python311\python.exe C:\Sources\gallery-dl-master\gallery_dl\ --version
1.25.8
child process exit code: 0
PS D:\>

and

PS D:\> $env:PYLAUNCHER_DEBUG = 1
PS D:\> py C:\Sources\gallery-dl-master\gallery_dl --version
argv0: C:\Users\Hrxn\AppData\Local\Programs\Python\Launcher\py.exe
version: 3.11.5
# Failed to open C:\Sources\gallery-dl-master\gallery_dl for shebang parsing (0x00000005)
# Reading from C:\Users\Hrxn\AppData\Local\py.ini for defaults/python
# Did not find file C:\Users\Hrxn\AppData\Local\py.ini
# Reading from C:\Users\Hrxn\AppData\Local\Programs\Python\Launcher\py.ini for defaults/python
# Did not find file C:\Users\Hrxn\AppData\Local\Programs\Python\Launcher\py.ini
SearchInfo.originalCmdLine: "C:\Users\Hrxn\AppData\Local\Programs\Python\Launcher\py.exe" C:\Sources\gallery-dl-master\gallery_dl --version
SearchInfo.restOfCmdLine:  C:\Sources\gallery-dl-master\gallery_dl --version
SearchInfo.executablePath: (null)
SearchInfo.scriptFile: C:\Sources\gallery-dl-master\gallery_dl
SearchInfo.executable: python.exe
SearchInfo.executableArgs: (null)
SearchInfo.company: (null)
SearchInfo.tag: (null)
SearchInfo.oldStyleTag: False
SearchInfo.lowPriorityTag: False
SearchInfo.allowDefaults: True
SearchInfo.allowExecutableOverride: True
SearchInfo.windowed: False
SearchInfo.list: False
SearchInfo.listPaths: False
SearchInfo.help: False
SearchInfo.limitToCompany: (null)
 -V:3.11          C:\Users\Hrxn\AppData\Local\Programs\Python\Python311\python.exe
env.company: PythonCore
env.tag: 3.11
# about to run: C:\Users\Hrxn\AppData\Local\Programs\Python\Python311\python.exe C:\Sources\gallery-dl-master\gallery_dl --version
1.25.8
child process exit code: 0
PS D:\>

with full path to __main__.py

PS D:\> $env:PYLAUNCHER_DEBUG = 1
PS D:\> py C:\Sources\gallery-dl-master\gallery_dl\__main__.py --version
argv0: C:\Users\Hrxn\AppData\Local\Programs\Python\Launcher\py.exe
version: 3.11.5
# Read 563 bytes from C:\Sources\gallery-dl-master\gallery_dl\__main__.py to find shebang line
Shebang: /usr/bin/env python3
# Did not find python3.exe on PATH
# Reading from C:\Users\Hrxn\AppData\Local\py.ini for commands/python3
# Did not find file C:\Users\Hrxn\AppData\Local\py.ini
# Reading from C:\Users\Hrxn\AppData\Local\Programs\Python\Launcher\py.ini for commands/python3
# Did not find file C:\Users\Hrxn\AppData\Local\Programs\Python\Launcher\py.ini
# Treating shebang command '3' as 'py -3'
# Reading from C:\Users\Hrxn\AppData\Local\py.ini for defaults/python3
# Did not find file C:\Users\Hrxn\AppData\Local\py.ini
# Reading from C:\Users\Hrxn\AppData\Local\Programs\Python\Launcher\py.ini for defaults/python3
# Did not find file C:\Users\Hrxn\AppData\Local\Programs\Python\Launcher\py.ini
SearchInfo.originalCmdLine: "C:\Users\Hrxn\AppData\Local\Programs\Python\Launcher\py.exe" C:\Sources\gallery-dl-master\gallery_dl\__main__.py --version
SearchInfo.restOfCmdLine:  C:\Sources\gallery-dl-master\gallery_dl\__main__.py --version
SearchInfo.executablePath: (null)
SearchInfo.scriptFile: C:\Sources\gallery-dl-master\gallery_dl\__main__.py
SearchInfo.executable: python.exe
SearchInfo.executableArgs:
SearchInfo.company: (null)
SearchInfo.tag: 3
SearchInfo.oldStyleTag: True
SearchInfo.lowPriorityTag: False
SearchInfo.allowDefaults: True
SearchInfo.allowExecutableOverride: True
SearchInfo.windowed: False
SearchInfo.list: False
SearchInfo.listPaths: False
SearchInfo.help: False
SearchInfo.limitToCompany: (null)
 -V:3.11          C:\Users\Hrxn\AppData\Local\Programs\Python\Python311\python.exe
env.company: PythonCore
env.tag: 3.11
# about to run: C:\Users\Hrxn\AppData\Local\Programs\Python\Python311\python.exe C:\Sources\gallery-dl-master\gallery_dl\__main__.py --version
1.26.0-dev
child process exit code: 0
PS D:\>

The py launcher understands the shebang, okay, but what seems more interesting to me:

SearchInfo.scriptFile: C:\Sources\gallery-dl-master\gallery_dl\
vs.
SearchInfo.scriptFile: C:\Sources\gallery-dl-master\gallery_dl
vs.
SearchInfo.scriptFile: C:\Sources\gallery-dl-master\gallery_dl\__main__.py

@mikf
Copy link
Owner

mikf commented Sep 26, 2023

The problem seems to be the if __package__ is None check in __main__.py combined with the different behaviors depending on how __main__.py is run.

All of

  • py C:\Sources\gallery-dl-master\gallery_dl\__main__.py
  • py C:\Sources\gallery-dl-master\gallery_dl\
  • py C:\Sources\gallery-dl-master\gallery_dl
  • py -m gallery_dl

run __main__.py, but only launching it directly sets __package__ to None. Launching py …\gallery_dl with or without trailing \ sets __package__ to an empty string and sys.path doesn't get updated.

The solution is probably just

diff --git a/gallery_dl/__main__.py b/gallery_dl/__main__.py
index 441009fe..9832190d 100644
--- a/gallery_dl/__main__.py
+++ b/gallery_dl/__main__.py
@@ -9,7 +9,7 @@
 
 import sys
 
-if __package__ is None and not hasattr(sys, "frozen"):
+if not __package__ and not hasattr(sys, "frozen"):
     import os.path
     path = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
     sys.path.insert(0, os.path.realpath(path))

@Hrxn
Copy link
Contributor Author

Hrxn commented Sep 29, 2023

@mikf I've just tested the changes you suggested to __main__.py

Before

C:\>python.exe C:\Sources\gallery-dl-master\gallery_dl --version
1.25.8

C:\>python.exe C:\Sources\gallery-dl-master\gallery_dl\ --version
1.25.8

C:\>python.exe C:\Sources\gallery-dl-master\gallery_dl\__main__.py --version
1.26.0-dev

C:\>

After

C:\>python.exe C:\Sources\gallery-dl-master\gallery_dl --version
1.26.0-dev

C:\>python.exe C:\Sources\gallery-dl-master\gallery_dl\ --version
1.26.0-dev

C:\>python.exe C:\Sources\gallery-dl-master\gallery_dl\__main__.py --version
1.26.0-dev

C:\>

As you said...

Of course, I can't say anything about what is considered "best practice", or idiomatic Python, or considered doing this the "right" way.

I definitely don't want any changes just to work around some Windows specific issue.

Anyway, this is a PR about filtering another URL from reddit.py, so I dunno, follow proper protocols and continue this in a new issue?

@mikf
Copy link
Owner

mikf commented Oct 1, 2023

I've technically merged your reddit related changes on Tuesday (37184eb), but GitHub apparently doesn't acknowledge that, probably because you added some more commits since.

I got sick around that time, felt like shit, and decided not to do anything on GitHub until it got better.

The __main__.py related fix we talked about is in 7150c4c and I did a symlink related change in 4477808.

@mikf mikf closed this Oct 1, 2023
@Hrxn
Copy link
Contributor Author

Hrxn commented Oct 2, 2023

Not sure, I'm seeing 37184eb now, from Tuesday, as you say, but it did not appear (at least for me) in the overall commit list (https://github.com/mikf/gallery-dl/commits/master/) until Sunday.

Maybe it's just GitHub being weird again, or a normal delay between the repo update itself and the repo web page update, which can take some time occasionally, it seems 😄

Maybe it's just that one shows the date from git commit and the other only registers changes on git push, no clue, haven't tested this.

I got sick around that time, felt like shit, and decided not to do anything on GitHub until it got better.

Yeah, that sucks. I can't stand being in front of a computer if I'm feeling sick, perfectly understandable.
Just take all the time you need for yourself.

The main.py related fix we talked about is in 7150c4c and I did a symlink related change in 4477808.

Perfect, if this is indeed the optimal way to do this, regardless of platform. Again, if it were a Windows-only workaround I'd be not really happy about it.

(yt-dlp/youtube-dl do the same in their __main__.py)

Does this mean that they would be still affected by this kinda "unintuitive" (to say the least) behavior?
If yes, they maybe should be notified?

@mikf
Copy link
Owner

mikf commented Oct 2, 2023

Not sure, I'm seeing 37184eb now, from Tuesday, as you say, but it did not appear (at least for me) in the overall commit list (https://github.com/mikf/gallery-dl/commits/master/) until Sunday.

I merged this PR locally (with scripts/pull-request) last Tuesday, but I only git pushed it and some other commits to GitHub on Sunday, so it was only on GitHub and visible there since yesterday.

Usually, GitHub considers a PR merged when I merge and push like that, but not here since you pushed another commit after my local merge, I think.

I can't stand being in front of a computer if I'm feeling sick

That's fine for me, it usually even helps, but interacting with others is something I'd like to avoid then.

if this is indeed the optimal way to do this, regardless of platform

I actually don't know myself if this is how it's supposed to be done. I didn't even know you could run python with just the directory path as argument. I've only ever used python -m gallery_dl myself.

Does this mean that they would be still affected by this kinda "unintuitive" (to say the least) behavior?
If yes, they maybe should be notified?

They should be affected by this, since their code only checks for __package__ is None as gallery-dl did, but, again, I do not know whether the changes from 7150c4c have any unintended consequences and break some user scripts down the line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants