EUC-KR charset is not parsable #87
Comments
@vanniktech Can you please mention which variant you are using?
@vanniktech I tested with the following code and it worked fine:
and this also worked fine:
Can you please share the code you use to read the bytes from the web?
This also works for me on the JVM (Desktop / Mac):
suspend fun main() {
    val url = "http://www.bodnara.co.kr/rss/rss_bodnara.xml"
    val request = HttpRequestBuilder().apply {
        url(url)
    }
    val response = HttpClient().get(request)
    val document = Ksoup.parse(
        sourceReader = response.bodyAsChannel().toByteArray().openSourceReader(),
        baseUri = url,
        charsetName = response.charset(),
        parser = Parser.xmlParser(),
    )
    println(document)
}

// Falls back to UTF-8 when the Content-Type header carries no charset parameter.
fun HttpResponse.charset() = headers[HttpHeaders.ContentType]?.asContentTypeOrNull()?.parameter("charset")
    ?: "UTF-8"

// https://youtrack.jetbrains.com/issue/KTOR-6241/Lenient-Content-Type-Parsing
internal fun String.asContentTypeOrNull() =
    runCatching { ContentType.parse(replace(", charset=", "; charset=")) }.getOrNull()
The same code crashes on Android, though, with the exception from my original issue. Did you try it on an Android emulator?
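The lenient charset lookup in the comment above can also be sketched with the Kotlin standard library alone, which makes it easy to try outside a Ktor project. This is only an illustration under assumptions: `charsetFromContentType` is a hypothetical helper, not part of Ktor or Ksoup, and it handles only the simple `charset=` parameter form.

```kotlin
// Hypothetical helper (not part of Ktor or Ksoup): extracts the charset
// parameter from a raw Content-Type header value, tolerating a "," where
// the standard ";" separator is expected.
fun charsetFromContentType(header: String?, default: String = "UTF-8"): String {
    if (header == null) return default
    // Normalise the malformed ", charset=" separator to "; charset=".
    val normalised = header.replace(", charset=", "; charset=")
    return normalised.split(';')
        .drop(1) // skip the media type itself, keep only the parameters
        .map { it.trim() }
        .firstOrNull { it.startsWith("charset=", ignoreCase = true) }
        ?.substringAfter('=')
        ?.trim()
        ?.trim('"') // the value may be quoted
        ?.takeIf { it.isNotEmpty() }
        ?: default
}

fun main() {
    println(charsetFromContentType("text/xml, charset=EUC-KR"))
    println(charsetFromContentType("text/xml; charset=\"euc-kr\""))
    println(charsetFromContentType(null))
}
```

The value is returned as written in the header, so callers that compare charset names may want to do so case-insensitively.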
@vanniktech Yes, there is an issue with the EUC-KR charset on Android with Ktor 2, but it’s working fine with Ktor 3. I’m looking into whether I can fix it on my end.
@vanniktech I’m trying to fix the issue. In the meantime, you can try this, which works fine:
Reading text from ChannelBody and parsing it works fine.
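The idea behind this workaround is to decode the bytes with the declared charset first, so the parser only ever sees an already-decoded string. Below is a minimal JVM-only sketch of that decoding step, using only `java.nio.charset`. It rests on assumptions: `decodeBody` is a hypothetical helper, the sample bytes are generated locally rather than fetched from the feed, and the sketch does not reproduce the Android failure or show the actual Ktor or Ksoup calls.

```kotlin
import java.nio.charset.Charset

// Hypothetical helper: decodes a raw response body using the charset named in
// the response headers, falling back to UTF-8 if the name is illegal or the
// charset is not supported by the runtime.
fun decodeBody(bytes: ByteArray, charsetName: String): String {
    val charset = runCatching { Charset.forName(charsetName) }
        .getOrDefault(Charsets.UTF_8)
    return String(bytes, charset)
}

fun main() {
    // Simulate an EUC-KR encoded body instead of downloading the real feed.
    val original = "<title>한글</title>"
    val bytes = original.toByteArray(Charset.forName("EUC-KR"))

    // Decode before parsing; the resulting string could then be handed to a
    // string-based parse call.
    val decoded = decodeBody(bytes, "EUC-KR")
    println(decoded == original)
}
```

Decoding up front removes the charset handling from the parser's input path, which is presumably why the text-based route sidesteps the problem reported here.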
That's neat. I've changed it. Maybe Ktor 3 has a better charset implementation due to the switch to kotlinx-io?
I'm using version 0.1.9 with the ktor module to parse the response from this website: http://www.bodnara.co.kr/rss/rss_bodnara.xml
I get my source reader via
response.bodyAsChannel().toByteArray().openSourceReader()
and then I use Ksoup.parse with an XML parser and the charset EUC-KR. However, this does not work on Android:
I saw this 0b76b21, but I'm not on Windows, so it should work?