Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for CP1252 Encoding #33

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

codepwner
Copy link

I was getting an error on a particular file and when I made this change, it allowed it to proceed.

@AndrewRybka
Copy link

AndrewRybka commented Feb 8, 2020

@codepwner, can you please attach that file? I would like to find out why the original code wasn't working correctly with it.

I that's not possible, can you please provide the strEncoding value which was failing? Current code works correctly for me with both "CP1252" and "cp1252". And even with "C̈P̈1252".

@codepwner
Copy link
Author

image

Maybe it has to do with the specific versions of .NET I am using? Currently, my downstream application is targeting .NET Core 3.1.

Working on creating a sanitized version of the input file I used to be able to share.

@codepwner
Copy link
Author

I was able to recreate the issue using this file.

Test Case.zip

The SPSS project has no other modifications than the one I checked in. I was not aiming to do any development in the project, but when the nuget version was throwing the error, I decided to pull the source and was able to make this hack of a fix to get past my issue.

@codepwner
Copy link
Author

Here are the bytes of the string in the calling method too.

image

@codepwner
Copy link
Author

I converted my project to .NET Standard 4.7 (for reasons outside this topic) and decided to run a quick additional test...

image

@AndrewRybka
Copy link

@codepwner, thank you for the file. But the library works correctly with that file on my side.

Text and byte representation of the encoding are the same as yours, but the encoding is correctly resolved by the following lines in the GetEncoding() method:

// Try to get encoding parsing codepage
int cp;
if (int.TryParse(Regex.Match(strEncoding, @"\d+").Value, out cp))
{
    info = encInfo.SingleOrDefault(ei => ei.CodePage == cp);
    if (info != null)
        return info.GetEncoding();
}

It seems that the source of the problem is in the Encoding.GetEncodings() method.
It returns 140 encodings in a project which targets .Net Framework 4.6.2.

But it seems than in projects which target .Net Core that method returns only a few default encodings (PowerShell/PowerShell#6580).

So it's better to make this fix more universal in order to support other encodings. Something like this:

// Try to get encoding parsing codepage
int cp;
if (int.TryParse(Regex.Match(strEncoding, @"\d+").Value, out cp))
{
    info = encInfo.SingleOrDefault(ei => ei.CodePage == cp);
    if (info != null)
        return info.GetEncoding();

    try
    {
        var enc = Encoding.GetEncoding(cp);
        return enc;
     }
    catch (ArgumentException)
    { }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants