
[cdac] Implement ISOSDacInterface::GetPEFileName in cDAC #106358

Merged 4 commits on Aug 14, 2024

Conversation

elinor-fung (Member)

  • Include Module path in data descriptor
  • Implement GetPath in Loader contract
  • Make cDAC implement ISOSDacInterface::GetPEFileName

Contributes to #99302

Contributor

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

Comment on lines 44 to 53
while (true)
{
    // Read characters until we find the null terminator
    char nameChar = _target.Read<char>(addr);
    if (nameChar == 0)
        break;

    name.Add(nameChar);
    addr += sizeof(char);
}
Member

I would consider doing this slightly differently. I would instead determine the entire length by finding the null terminator, and then re-read the entire block of bytes in a single operation. I think it would be slightly easier to debug. I also don't know if endianness is important here. Metadata is always encoded as UTF-16LE, but I'm not sure about non-metadata allocated strings.
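
For illustration, a minimal sketch of that two-pass suggestion, reusing _target and addr from the snippet under review. ReadBuffer is a hypothetical bulk-read helper, not an existing Target method, and the decode step assumes host and target share endianness (the very point debated below):

// Pass 1: scan for the UTF-16 null terminator to learn the length.
ulong end = addr;
while (_target.Read<char>(end) != 0)
    end += sizeof(char);

// Pass 2: re-read the entire block of bytes in a single operation.
// (ReadBuffer is hypothetical; MemoryMarshal is System.Runtime.InteropServices.)
int byteCount = checked((int)(end - addr));
byte[] buffer = new byte[byteCount];
_target.ReadBuffer(addr, buffer);
string name = new string(MemoryMarshal.Cast<byte, char>(buffer));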

davidwrighton (Member), Aug 13, 2024

non-metadata allocated strings would be big endian on a big endian machine

Member

I think this implementation is ... fine... but it would be better if we added an API to Target to read null-terminated strings of both 16-bit and 8-bit char types in target endianness. This is a thing that keeps coming up, and we shouldn't keep reinventing the wheel.

Member

> non-metadata allocated strings would be big endian on a big endian machine

Would they even be UTF-16 at all?

(To be clear, I think this is kind of academic until someone sits down and does a CoreCLR port to POWER64 and actually makes a decision about what an LPCWSTR even is in that platform's version of the CoreCLR PAL. I don't know what a reasonable answer might be. /cc @uweigand @directhex)

Do we want to stash an encoding somewhere in the runtime descriptor?

Member

> but it would be better if we added an API to Target to read null-terminated strings of both 16-bit and 8-bit char types in target endianness.

It is more complicated than that. We need to know if the target pointer is into metadata, which is always LE. I was going to suggest the same thing, but the metadata angle makes the correct answer less obvious.

elinor-fung (Member, Author)

Even with the metadata angle, I think having APIs on Target reading strings in target endianness still makes sense.

> We need to know if the target pointer is into metadata, which is always LE.

And then it is up to the caller / consumer of the target pointer to know that it shouldn't be read in target endianness and to read it their own way?

Member

> I think having APIs on Target reading strings in target endianness still makes sense.

Sure, I agree with this. My point is, it isn't as simple as "read the string". In fact, it can become very complicated if the API doesn't know the encoding. The API needs to know the encoding a priori, or else it won't be able to detect the null terminator properly. Is a zero byte the high-order byte of a UTF-16 code unit or an actual null in UTF-8? The string-reading API needs to know. The endianness question for UTF-16 is easier, but it can still confuse the caller: the API said it read UTF-16, but now the caller must consider the source and determine whether it is in machine-endian or metadata-endian form. Three APIs would be needed.

It would need to be something like:

string ReadUTF8String();
string ReadUTF16TargetEndianness();
string ReadUTF16MetadataEndianness();
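
As a rough sketch of how those entry points might differ (illustrative only: Read<T> is the existing primitive from the snippet above, and everything else here, including a Target.IsLittleEndian property, is assumed), note that the encoding must be known up front just to find the terminator:

using System.Collections.Generic;
using System.Text;

// UTF-8: the terminator is a single zero byte.
static string ReadUTF8String(Target target, ulong addr)
{
    var bytes = new List<byte>();
    byte b;
    while ((b = target.Read<byte>(addr++)) != 0)
        bytes.Add(b);
    return Encoding.UTF8.GetString(bytes.ToArray());
}

// UTF-16: the terminator is a zero 16-bit unit; a lone zero byte may just
// be the high byte of a valid code unit, so bytes must be paired first.
static string ReadUTF16String(Target target, ulong addr, bool littleEndian)
{
    var chars = new List<char>();
    while (true)
    {
        byte lo = target.Read<byte>(addr);
        byte hi = target.Read<byte>(addr + 1);
        ushort unit = littleEndian ? (ushort)(lo | hi << 8) : (ushort)(hi | lo << 8);
        if (unit == 0)
            break;
        chars.Add((char)unit);
        addr += 2;
    }
    return new string(chars.ToArray());
}

// The two UTF-16 variants would differ only in where the byte order comes from:
//   ReadUTF16TargetEndianness(t, a)   => ReadUTF16String(t, a, t.IsLittleEndian)  // IsLittleEndian assumed
//   ReadUTF16MetadataEndianness(t, a) => ReadUTF16String(t, a, littleEndian: true)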

Member

I think @AaronRobinsonMSFT's APIs here are good enough for now. As @lambdageek points out, dealing with actual big-endian machines is quite unlikely in the near term, and we can get away with a "reasonable" model now and tweak it to match reality if/when such a thing happens. I also don't think the refactor to a Target-level API should happen in this PR; it should be deferred to a follow-up PR focused on making the string handling sensible.

elinor-fung (Member, Author)

Yeah, I was expecting multiple APIs - like what @AaronRobinsonMSFT wrote. I put in a TODO for adding/using Target APIs and will do a follow-up change for that.
