Pornhub artists from shared site #32936

Open
draconis10 opened this issue Sep 30, 2024 · 9 comments

Comments

@draconis10

  • [x] I'm reporting a broken site support
  • [x] I've verified that I'm running youtube-dl version 2021.12.17
  • [x] I've checked that all provided URLs are alive and playable in a browser
  • [x] I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • [x] I've searched the bugtracker for similar issues including closed ones

Verbose log

C:\!Scripts>youtube-dl -o "D:\Live\Test\%(uploader)s\Alina Bell - %(title)s.%(ext)s" --cookies C:\!Scripts\cookies.txt --no-post-overwrites --fixup never -f bestvideo[ext=mp4]+bestaudio/best[ext=mp4]/best --merge-output-format mp4 --add-metadata -ciw https://www.pornhub.com/view_video.php?viewkey=ph61520d582bb8a -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-o', 'D:\Live\Test\%(uploader)s\Alina Bell - %(title)s.%(ext)s', '--cookies', 'C:\!Scripts\cookies.txt', '--no-post-overwrites', '--fixup', 'never', '-f', 'bestvideo[ext=mp4]+bestaudio/best[ext=mp4]/best', '--merge-output-format', 'mp4', '--add-metadata', '-ciw', 'https://www.pornhub.com/view_video.php?viewkey=ph61520d582bb8a', '-v']
[debug] Encodings: locale cp1252, fs utf-8, out utf-8, pref cp1252
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.12.1 (CPython) - Windows-11-10.0.22631-SP0
[debug] exe versions: ffmpeg 7.0.1-full_build-www.gyan.dev, ffprobe 7.0.1-full_build-www.gyan.dev
[debug] Proxy map: {}
[PornHub] ph61520d582bb8a: Downloading pc webpage
[PornHub] ph61520d582bb8a: Downloading m3u8 information
[PornHub] ph61520d582bb8a: Downloading m3u8 information
[PornHub] ph61520d582bb8a: Downloading m3u8 information
[PornHub] ph61520d582bb8a: Downloading m3u8 information
[PornHub] ph61520d582bb8a: Downloading JSON metadata
WARNING: unable to extract view count; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
[debug] Invoking downloader on 'https://ev-h.phncdn.com/hls/videos/202109/27/395427631/1080P_4000K_395427631.mp4/index-v1-a1.m3u8?validfrom=1727736273&validto=1727743473&ipa=82.5.46.215&hdl=-1&hash=9utxoyVgeq47sNxxU5fYkcSGMMc%3D'
[download] D:\Live\PTest\Property Sex\Alina Bell - PropertySex Attractive Curvy Latina Real Estate Agent with Amazing Ass Bangs Handyman in Kitchen.mp4 has already been downloaded
[download] 100% of 554.77MiB
[ffmpeg] Adding metadata to 'D:\Live\Test\Property Sex\Alina Bell - PropertySex Attractive Curvy Latina Real Estate Agent with Amazing Ass Bangs Handyman in Kitchen.mp4'
[debug] ffmpeg command line: ffmpeg -y -loglevel "repeat+info" -i "file:D:\Live\Test\Property Sex\Alina Bell - PropertySex Attractive Curvy Latina Real Estate Agent with Amazing Ass Bangs Handyman in Kitchen.mp4" -c copy -metadata "title=PropertySex Attractive Curvy Latina Real Estate Agent with Amazing Ass Bangs Handyman in Kitchen" -metadata "date=20210927" -metadata "purl=https://www.pornhub.com/view_video.php?viewkey=ph61520d582bb8a" -metadata "artist=Property Sex" "file:D:\Live\Test\Property Sex\Alina Bell - PropertySex Attractive Curvy Latina Real Estate Agent with Amazing Ass Bangs Handyman in Kitchen.temp.mp4"

Description

The download does not add the correct metadata for Contributing artists. Instead of the performer's name, it adds the company's name.

@dirkf
Contributor

dirkf commented Oct 1, 2024

uploader is set (that's probably what you're seeing as the company) but there is no extraction of performers. In yt-dlp's extractor there is:

             'categories': extract_list('categories'),
+            'cast': extract_list('pornstars'),
             'subtitles': subtitles,

@draconis10
Author

Are we able to change it so it extracts the performers? Or is there a way to manually add performers?

@dirkf
Contributor

dirkf commented Oct 1, 2024

The change shown above should do it. You can add the extra line if running from source.

There are a few outstanding PRs, and probably some further improvements that could be pulled from the yt-dlp extractor:

    [ie/pornhub] Fix login by email address (#9914)
    feederbox826 committed May 13, 2024

    [ie/pornhub] Fix login support (#9227)
    feederbox826 committed Feb 17, 2024

    [ie/pornhub] Update access cookies for UK (#7591)
    yy-zhong committed Jul 15, 2023

    [extractor/pornhub] Set access cookies to fix extraction (#6685)
    schmoaaaaah and arobase-che committed Apr 25, 2023

@draconis10
Author

draconis10 commented Oct 1, 2024

I've tried adding the line to the extractor, but the same problem occurs. I feel like something else needs updating.

Come to think of it, I don't think extracting tags/categories works either!

This is what it now looks like, which is failing:

    def extract_list(meta_key):
        div = self._search_regex(
            r'(?s)<div[^>]+\bclass=["\'].*?\b%sWrapper[^>]*>(.+?)</div>'
            % meta_key, webpage, meta_key, default=None)
        if div:
            return re.findall(r'<a[^>]+\bhref=[^>]+>([^<]+)', div)

    info = self._search_json_ld(webpage, video_id, default={})
    # description provided in JSON-LD is irrelevant
    info['description'] = None

    return merge_dicts({
        'id': video_id,
        'uploader': video_uploader,
        'upload_date': upload_date,
        'title': title,
        'thumbnail': thumbnail,
        'duration': duration,
        'view_count': view_count,
        'like_count': like_count,
        'dislike_count': dislike_count,
        'comment_count': comment_count,
        'formats': formats,
        'age_limit': 18,
        'tags': extract_list('tags'),
        'categories': extract_list('categories'),
        'cast': extract_list('pornstars'),
        'subtitles': subtitles,
    }, info)

Update: I've managed to work around this for the time being by adding --add-metadata --postprocessor-args "-metadata artist=Alina\ Belle"

@dirkf
Contributor

dirkf commented Oct 2, 2024

The extracted Info-JSON, at least from my WIP version, contains these items:

  ...,
  "categories": [
    "Babe",
    "Big Ass",
    "Big Dick",
    "Big Tits",
    "Blowjob",
    "HD Porn",
    "Hardcore",
    "POV",
    "Pornstar"
  ],
  ...,
  "cast": [
    "Tony Rubino",
    "Alina Belle"
  ],
  ...,

If the version you're running has those, maybe the problem is with identifying the metadata items in the JSON and passing them to the post-processor.
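
To make that idea concrete, here is a minimal sketch of turning the extracted "cast" list into an artist tag, falling back to "uploader" (which is what the ffmpeg command line in the log above ended up using). The helper names artist_from_info and ffmpeg_metadata_args are hypothetical and are not part of youtube-dl; this only illustrates the mapping, not how the post-processor is actually implemented.

```python
def artist_from_info(info):
    """Build an artist string from 'cast', else fall back to 'uploader'.

    Hypothetical helper for illustration; not youtube-dl code.
    """
    # Ignore empty or whitespace-only entries
    cast = [name.strip() for name in (info.get('cast') or [])
            if name and name.strip()]
    if cast:
        return ', '.join(cast)
    return info.get('uploader')


def ffmpeg_metadata_args(info):
    """Return ffmpeg arguments that set the artist tag, or [] if unknown."""
    artist = artist_from_info(info)
    return ['-metadata', 'artist=%s' % artist] if artist else []


# Values taken from the Info-JSON excerpt and the log in this thread
info = {'uploader': 'Property Sex', 'cast': ['Tony Rubino', 'Alina Belle']}
print(ffmpeg_metadata_args(info))
```

With a whitespace-only cast list the sketch falls back to the uploader, so a broken extraction would still produce the company name rather than an empty tag.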

@draconis10
Author

draconis10 commented Oct 7, 2024

So I've tried to look into this again. I've made sure my version is up-to-date, and I've edited pornhub.py to show the following

    return merge_dicts({
        'id': video_id,
        'uploader': video_uploader,
        'upload_date': upload_date,
        'title': title,
        'thumbnail': thumbnail,
        'duration': duration,
        'view_count': view_count,
        'like_count': like_count,
        'dislike_count': dislike_count,
        'comment_count': comment_count,
        'formats': formats,
        'age_limit': 18,
        'tags': extract_list('tags'),
        'categories': extract_list('categories'),
        'cast': extract_list('pornstars'),
        'subtitles': subtitles,
    }, info)

However, the output shows "cast": ["\n\t\t\t\t\t\t\t\t\t", "\n\t\t\t\t\t\t\t\t\t"]. Any ideas?

Are you able to share the whole .py file, so I can import it and try it myself?

@dirkf
Contributor

dirkf commented Oct 7, 2024

Maybe this?

         def extract_list(meta_key):
             div = self._search_regex(
                 r'(?s)<div[^>]+\bclass=["\'].*?\b%sWrapper[^>]*>(.+?)</div>'
                 % meta_key, webpage, meta_key, default=None)
             if div:
-                return re.findall(r'<a[^>]+\bhref=[^>]+>([^<]+)', div)
+                return [clean_html(x) for x in re.findall(r'(?s)<a[^>]+\bhref=[^>]+>.+?</a>', div)]
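
A small standalone sketch of why the original pattern can return only whitespace, and what the change above does about it. The markup is an assumption made up for the demo (the real page may differ): the name sits in a nested tag, so the text right after the opening <a ...> is just a newline and tabs. strip_tags is a simplified stand-in for clean_html, not the real helper.

```python
import re

# Hypothetical markup: each name is wrapped in a nested <span>
div = (
    '<a class="item" href="/pornstar/a">\n\t\t\t'
    '<span>Performer A</span></a>'
    '<a class="item" href="/pornstar/b">\n\t\t\t'
    '<span>Performer B</span></a>'
)


def strip_tags(fragment):
    # Simplified stand-in for clean_html: drop tags, trim whitespace
    return re.sub(r'<[^>]+>', '', fragment).strip()


# Old pattern: captures only the text up to the first nested tag
old = re.findall(r'<a[^>]+\bhref=[^>]+>([^<]+)', div)

# New pattern: match the whole <a>...</a> element, then strip the tags
new = [strip_tags(x)
       for x in re.findall(r'(?s)<a[^>]+\bhref=[^>]+>.+?</a>', div)]

print(old)  # ['\n\t\t\t', '\n\t\t\t']
print(new)  # ['Performer A', 'Performer B']
```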

@draconis10
Author

WARNING: unable to extract view count; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
ERROR: name 'clean_html' is not defined

That's the error that comes up. Seeing as it works for you, are you able to attach your .py file?
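
A NameError like this usually means the name was never imported into the module. clean_html is defined in youtube_dl/utils.py, so one likely fix, assuming pornhub.py pulls its helpers from ..utils like other extractors, is to add clean_html to that import list. The sketch below only checks the helper's behaviour outside the package; the fallback function is a rough stand-in used when youtube-dl is not importable, not the real implementation.

```python
import re

try:
    # Works when youtube-dl is installed. Inside pornhub.py the equivalent
    # would be adding clean_html to the existing `from ..utils import (...)`.
    from youtube_dl.utils import clean_html
except ImportError:
    def clean_html(html):
        # Rough fallback for this demo only: strip tags, trim whitespace
        if html is None:
            return None
        return re.sub(r'<[^>]+>', '', html).strip()

# Hypothetical anchor element, similar in shape to the whitespace case above
print(clean_html('<a href="/x">\n\t<span>Performer A</span></a>'))
```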

@dirkf
Contributor

dirkf commented Oct 14, 2024

Just before the previously posted function:

         def extract_vote_count(kind, name):
             return self._extract_count(
-                (r'<span[^>]+\bclass="votes%s"[^>]*>([\d,\.]+)</span>' % kind,
+                (r'<span[^>]+\bclass="votes%s"[^>]*>(\d[\d,\.]*[kKmM]?)</span>' % kind,
                  r'<span[^>]+\bclass=["\']votes%s["\'][^>]*\bdata-rating=["\'](\d+)' % kind),
                 webpage, name)
 
         view_count = self._extract_count(
-            r'<span class="count">([\d,\.]+)</span> [Vv]iews', webpage, 'view')
+            r'<span class="count">(\d[\d,\.]*[kKmM]?)</span> [Vv]iews', webpage, 'view')
         like_count = extract_vote_count('Up', 'like')
         dislike_count = extract_vote_count('Down', 'dislike')
         comment_count = self._extract_count(
-            r'All Comments\s*<span>\(([\d,.]+)\)', webpage, 'comment')
+            r'All Comments\s*<span>\((\d[\d,\.]*[kKmM]?)\)', webpage, 'comment')
 
         def extract_list(meta_key):
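
The effect of the count-pattern change can be checked in isolation. This is a sketch with made-up sample markup, on the assumption that the page now abbreviates large counts with a k/M suffix, which would explain the "unable to extract view count" warning. It only shows which strings each pattern captures, not how the captured text is converted to a number.

```python
import re

OLD = r'<span class="count">([\d,\.]+)</span> [Vv]iews'
NEW = r'<span class="count">(\d[\d,\.]*[kKmM]?)</span> [Vv]iews'


def first_group(pattern, html):
    """Return the first capture group, or None when there is no match."""
    m = re.search(pattern, html)
    return m.group(1) if m else None


# Hypothetical samples: a plain count and an abbreviated one
plain = '<span class="count">1,234,567</span> views'
abbreviated = '<span class="count">1.2M</span> Views'

print(first_group(OLD, plain), first_group(NEW, plain))
print(first_group(OLD, abbreviated), first_group(NEW, abbreviated))
```

The old pattern fails on the abbreviated sample because the suffix letter sits between the digits and the closing </span>.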
