Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strip quotes in detect-charset #618

Open
wants to merge 1 commit into
base: 3.x
Choose a base branch
from
Open

Strip quotes in detect-charset #618

wants to merge 1 commit into from

Conversation

nathell
Copy link

@nathell nathell commented Aug 10, 2022

What this PR does

RFC 7231 Section 3.1.1.1 allows the charset specified in the Content-Type header to be a quoted string. We want detect-charset to return unquoted charset names, so that they can be passed to, say, java.nio.charset.Charset/forName without postprocessing.

This PR fixes that, adds a test for a quoted charset, and rewrites t-detect-charset-by-content-type to use clojure.test/are to reduce repetition and read more nicely.

Implementation notes

  • As noted above, I've taken the liberty of rewriting the salient test in terms of are, as I find it to boost readability. Some people disagree, however, and find are a bit too magical. I'm not sure what position of this project is (I haven't seen are anywhere else), but I can rewrite it if needed. Let me know!
  • I wanted to use {starts,ends}-with? from clojure.string, but it would have broken compatibility with Clojure pre 1.8.0. So I resorted to Java interop.

Tests

I get one failure when running lein all test :all (stacktrace elided):

ERROR in (self-signed-ssl-get) (core_test.clj:298)
expected: (thrown? SunCertPathBuilderException (client/request {:scheme :https, :server-name "localhost", :server-port 18082, :request-method :get, :uri "/get"}))
2022-08-10 19:51:40,260 | INFO  | [main] | org.apache.http.impl.conn.InMemoryDnsResolver | Resolving foo.bar.com to [/127.0.0.1]
  actual: java.lang.IllegalAccessError: class clj_http.test.core_test$fn__5138$fn__5139 (in unnamed module @0x1b0db8b1) cannot access class sun.security.provider.certpath.SunCertPathBuilderException (in module java.base) because module java.base does not export sun.security.provider.certpath to unnamed module @0x1b0db8b1
ERROR in (self-signed-ssl-get) (core_test.clj:298)
expected: (thrown? SunCertPathBuilderException (client/request {:scheme :https, :server-name "localhost", :server-port 18082, :request-method :get, :uri "/get"}))
2022-08-10 19:51:40,260 | INFO  | [main] | org.apache.http.impl.conn.InMemoryDnsResolver | Resolving foo.bar.com to [/127.0.0.1]
  actual: java.lang.IllegalAccessError: class clj_http.test.core_test$fn__5138$fn__5139 (in unnamed module @0x1b0db8b1) cannot access class sun.security.provider.certpath.SunCertPathBuilderException (in module java.base) because module java.base does not export sun.security.provider.certpath to unnamed module @0x1b0db8b1

Doesn't look like related to my changes. I think it's because I'm on OpenJDK 11 and the tests presuppose JDK 8. Plain lein test comes out all green.

RFC 7231 Section 3.1.1.1 allows the charset specified in the
Content-Type header to be a quoted string. We want detect-charset to
return unquoted charset names, so that they can be passed to, say,
java.nio.charset.Charset/forName without postprocessing.

This commit fixes that, adds a test for a quoted charset, and
rewrites t-detect-charset-by-content-type to use clojure.test/are
to reduce repetition and read more nicely.
nathell added a commit to nathell/skyscraper that referenced this pull request Sep 4, 2022
This commit works around a bug in clj-http's detect-charset (see
dakrone/clj-http#618), making sure that
even when the charset in Content-Type is wrapped in quotes, it can be
passed to Charset/forName.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant