Soundcloud 'ascii' codec can't decode byte ... #29417

Open
6 tasks done
firmitatis opened this issue Jun 27, 2021 · 3 comments

Comments

@firmitatis

Checklist

  • I'm reporting a broken site support issue
  • I've verified that I'm running youtube-dl version 2021.06.06
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar bug reports including closed ones
  • I've read bugs section in FAQ

Verbose log

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-o', u'%(uploader)s-%(upload_date)s-%(title)s-%(id)s.%(ext)s', u'--ignore-errors', u'--write-description', u'--write-thumbnail', u'--playlist-start', u'91', u'--playlist-end', u'91', u'--verbose', u'https://soundcloud.com/tapa-da-mao-invisivel']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.06.06
[debug] Python version 2.7.18 (CPython) - Linux-5.4.0-74-generic-x86_64-with-LinuxMint-20.1-ulyssa
[debug] exe versions: ffmpeg 4.2.4, ffprobe 4.2.4
[debug] Proxy map: {}
[soundcloud:user] tapa-da-mao-invisivel: Downloading user info
[soundcloud:user] 524020590: Downloading track page 1
[soundcloud:user] 524020590: Downloading track page 2
[download] Downloading playlist: Tapa da Mão Invisível (All)
[soundcloud:user] playlist Tapa da Mão Invisível (All): Collected 141 video ids (downloading 1 of them)
[download] Downloading video 1 of 1
[soundcloud] tapa-da-mao-invisivel/episodio-051-winston-ling-sobre-china-guedes-e-hong-kong: Downloading info JSON
[soundcloud] 691375567: Downloading JSON metadata
[soundcloud] 691375567: Downloading webpage
ERROR: 'ascii' codec can't decode byte 0xf3 in position 4: ordinal not in range(128)
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/soundcloud.py", line 492, in _real_extract
    return self._extract_info_dict(info, full_title, token)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/soundcloud.py", line 340, in _extract_info_dict
    'ext': urlhandle_detect_ext(urlh) or 'mp3',
  File "/usr/local/bin/youtube-dl/youtube_dl/utils.py", line 4293, in urlhandle_detect_ext
    e = determine_ext(m.group('filename'), default_ext=None)
  File "/usr/local/bin/youtube-dl/youtube_dl/utils.py", line 3042, in determine_ext
    if url is None or '.' not in url:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf3 in position 4: ordinal not in range(128)

[download] Finished downloading playlist: Tapa da Mão Invisível (All)

Description

Hello. It's my first bug report. I hope it's ok.

I was trying to download all podcasts from this soundcloud account:

https://soundcloud.com/tapa-da-mao-invisivel

youtube-dl downloaded almost all of the episodes, but a group of 44 (playlist items 91 to 134) fails with an error:

youtube-dl -o '%(uploader)s-%(upload_date)s-%(title)s-%(id)s.%(ext)s' --ignore-errors --write-description -x --write-thumbnail --playlist-start 91 --playlist-end 134 https://soundcloud.com/tapa-da-mao-invisivel

ERROR: 'ascii' codec can't decode byte 0xf3 in position 4: ordinal not in range(128)

Is there a way to avoid this error?

Thanks,

Caio

@ghost

ghost commented Jun 28, 2021

If you can use Python 3, you will succeed.

I tried with Python 3.9 and succeeded. I don't know if there is some workaround for Python 2, I couldn't find any.

@dirkf
Contributor

dirkf commented Jun 28, 2021

The error is raised at line 3042 of utils.py in release 2021.06.06 (and, as noted above, only under Python 2):

    if url is None or '.' not in url:

So url is a byte string that Python 2 implicitly decodes as ASCII for the comparison with '.', and it contains a byte outside the ASCII range.
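To make the failure concrete, here is a minimal sketch that reproduces the same message on Python 3 by doing explicitly what Python 2 does implicitly when a byte string meets a unicode string. It assumes the filename arrives as ISO 8859-1 bytes, as the header value quoted further down suggests. The helper name `implicit_ascii_decode` is made up for this illustration and is not part of youtube-dl.

```python
# Filename bytes as they would appear in an ISO 8859-1 encoded header
# (shortened sample based on the failing episode's title).
raw_filename = b'Epis\xf3dio 051.mp3'


def implicit_ascii_decode(data):
    """Hypothetical helper: mimic the ASCII decode that Python 2 applies
    automatically when a byte string is combined with a unicode string."""
    return data.decode('ascii')


try:
    implicit_ascii_decode(raw_filename)
    error_message = None
except UnicodeDecodeError as exc:
    # Byte 0xf3 (o acute in ISO 8859-1) sits at index 4 of the filename.
    error_message = str(exc)

print(error_message)
```

The printed message names byte 0xf3 at position 4, which matches the position of the accented character in the filename rather than in the whole header.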

url comes from this function at line 4287:

def urlhandle_detect_ext(url_handle):
    getheader = url_handle.headers.get

    cd = getheader('Content-Disposition')
    if cd:
        m = re.match(r'attachment;\s*filename="(?P<filename>[^"]+)"', cd)
        if m:
            e = determine_ext(m.group('filename'), default_ext=None)
            if e:
                return e

    return mimetype2ext(getheader('Content-Type'))

The value assigned to cd (in this case 'attachment;filename="Episódio 051 - Winston Ling sobre China, Guedes e Hong Kong.mp3"', where the 5th character is '\xf3', small o acute) needs to be made into a compat_str but hasn't been. Replace line 4289 like this to fix the issue:

    cd = encode_compat_str(getheader('Content-Disposition'), encoding='iso-8859-1', errors='replace')

[edit] According to RFC 7230, header fields should be ASCII but may contain non-ASCII octets, which should be treated as opaque and may also be encoded as ISO 8859-1. Here the sender has used a non-ASCII character in the filename that is valid in ISO 8859-1. To avoid data loss, specify ISO 8859-1 when converting to compat_str (unicode in Python 2), and pass errors='replace' to avoid an exception in case any byte cannot be converted.
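The idea behind that change can be sketched outside youtube-dl, for Python 3, as follows. This is only an approximation under stated assumptions: it uses plain `bytes.decode` instead of youtube-dl's `encode_compat_str`, it takes the raw header value as bytes, and the function name `ext_from_content_disposition` is hypothetical.

```python
import re


def ext_from_content_disposition(raw_header):
    """Return the filename extension found in a raw Content-Disposition
    header value (bytes), or None. Hypothetical helper, not youtube-dl code."""
    if raw_header is None:
        return None
    # ISO 8859-1 maps every byte to a code point, so this decode cannot
    # raise; errors='replace' is kept only to mirror the suggested fix.
    text = raw_header.decode('iso-8859-1', errors='replace')
    m = re.match(r'attachment;\s*filename="(?P<filename>[^"]+)"', text)
    if not m:
        return None
    filename = m.group('filename')
    if '.' not in filename:
        return None
    # Everything after the last dot is taken as the extension.
    return filename.rpartition('.')[2]


header = (b'attachment;filename="Epis\xf3dio 051 - Winston Ling '
          b'sobre China, Guedes e Hong Kong.mp3"')
print(ext_from_content_disposition(header))  # prints: mp3
```

Decoding before the regular expression runs means every later string operation works on text, so no implicit ASCII conversion is left to fail.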

@firmitatis
Author

> If you can use Python 3, you will succeed.
>
> I tried with Python 3.9 and succeeded. I don't know if there is some workaround for Python 2, I couldn't find any.

Thanks, kikuyan. Thanks, dirkf.

I had installed youtube-dl with wget. I removed it and reinstalled it with pip3, and now it works fine.
