-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Remove ElfReader #if ALPINE dependence #34314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7be5615
to
0dd0d2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will post the rest of the review later after some more thought on the range checking.
And macros shoot at us again 😄 You include uint64_t minAddr = std::min<uint64_t>(std::min<uint64_t>((uint64_t)m_gnuHashTableAddr,
(uint64_t)m_stringTableAddr),
(uint64_t)m_symbolTableAddr); or uint64_t minAddr = (std::min)((std::min)((uint64_t)m_gnuHashTableAddr,
(uint64_t)m_stringTableAddr),
(uint64_t)m_symbolTableAddr); |
Thanks @hoyosjs |
The cross DAC wants to be build for only Alpine or Linux not both. Remove the ElfReader's compile time dependence on TARGET_ALPINE_LINUX Use a simple heuristic to determine whether we need to add the loadbias. If all else fails look for a well known symbol g_dacTable in the symbol hash to determine whether to add the loadbias. (Should be rare.)
ea37308
to
783c6bd
Compare
Fail if symbol lookup reads memory outside of ELF
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is getting way to complicated for a simple elf reader. Isn't there any other way to do this? Some way to determine at runtime the distro?
OK. That wasn't obvious to me. It is relatively easy to refactor it your way.
I have struggled with similar thoughts. The potential to use this to read Linux and Alpine dumps in a single build feels appealing especially for SOS and dotnet-dump. The simpler model would be to treat this as two cases
It is not a perfect model, but it is highly likely to work in all the cases we care about. I have been tempted to strip it down to this for a while. I'm going to refactor it to meet requirements... |
} | ||
else | ||
{ | ||
// We cannot conclusively determine whether we need to add the load bias |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still leaning to this approach.
- This case is theoretically possible, but difficult to test because it has not occurred.
- I believe the hardening of the symbol reader,
ReadElfMemory
, makes this code reasonably safe. ReadElfMemory
is also not terribly complicated.ReadElfMemory
also may provide other benefits in cases of corrupt dumps or elf files.
There are alternatives.
- We could simply
return false;
in this case. Revert unnecessary code. - Ignore the possibility of this case. Drop lines 176-178 & 188-208. Drop other unnecessary code.
- Fallback to an
#ifdef LINUX_ALPINE
for this case. - Abandon the possibility of dynamically supporting Linux & Alpine.
- Define a mechanism for the hosting layer to inform the elf-reader the type of dump.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In preference order, these are the leading options IMO
- This approach.
- Simply
return false;
- Abandon the possibility of dynamically supporting Linux & Alpine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I believe the hardening of the symbol reader,
ReadElfMemory
, makes this code reasonably safe.
If the ranges are correct, yeah. The only way that if could fail would be a symbol falls in an address in a PT_LOAD gap. This seems unlikely.
ReadElfMemory
is also not terribly complicated.ReadElfMemory
also may provide other benefits in cases of corrupt dumps or elf files.
There's not that many cases I can think of where we'd get a dump corrupted in such a way that we'd end reading in one of those regions (we'd need a corrupt .symtab, or a missing segment). That being said ReadElfMemory
doesn't feel like the complicated piece. However, it still relies on the stored state that Mike said (I still don't know why. A comment might be warranted...)?
The other part I like of this change is not calling populate and then trying to get the symbol. Felt a little of - remember to do this. Tends to be a step that's easy to forget.
- We could simply
return false;
in this case. Revert unnecessary code.- Ignore the possibility of this case. Drop lines 176-178 & 188-208. Drop other unnecessary code.
I am still not sure 176-178 are correct, so I am not sure the condition on 188 (the comment in that branch should be (greater than loadsize + loadbias)
given that you are using the vaddr in the calculations). I expect 182-187 to be the heavily predominant branch on musl based distros, while the second one should be the base for glibc based distros (not sure what about bionic's libc)
- Fallback to an
#ifdef LINUX_ALPINE
for this case.- Abandon the possibility of dynamically supporting Linux & Alpine.
I (really) like the possibility of dropping a couple of dll's, but I was wondering what it'd buy us on the long run. I don't know if I am lacking experience or prior art here, but I don't see much of an issue creating a separate one. Or if it warrants this complexity
- Define a mechanism for the hosting layer to inform the elf-reader the type of dump.
This one is nice (no guessing) but rarely a good option in my opinion. The DAC will need to now gain more information of the environment it runs in, and so will createdump and such.
Overall, I would say I don't mind a heuristic that much. I just don't know if:
- 176 is correct.
- An idea of how likely it is to SEGV trying of reading memory (definitely a LOT less now that there's a segment constrained read).
- What does this buy us in the end in exchange for the risk.
// Rather than making a compile time decision based on OS, we try to find a required symbol. | ||
// Try to find "g_dacTable" in the hash table w/o adding the loadbias | ||
uint64_t symbolOffset; | ||
if (!InitializeGnuHashTable() || !TryLookupSymbol(symbolName, &symbolOffset)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this potentially SEGV?
} | ||
else | ||
{ | ||
// We cannot conclusively determine whether we need to add the load bias |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I believe the hardening of the symbol reader,
ReadElfMemory
, makes this code reasonably safe.
If the ranges are correct, yeah. The only way that if could fail would be a symbol falls in an address in a PT_LOAD gap. This seems unlikely.
ReadElfMemory
is also not terribly complicated.ReadElfMemory
also may provide other benefits in cases of corrupt dumps or elf files.
There's not that many cases I can think of where we'd get a dump corrupted in such a way that we'd end reading in one of those regions (we'd need a corrupt .symtab, or a missing segment). That being said ReadElfMemory
doesn't feel like the complicated piece. However, it still relies on the stored state that Mike said (I still don't know why. A comment might be warranted...)?
The other part I like of this change is not calling populate and then trying to get the symbol. Felt a little of - remember to do this. Tends to be a step that's easy to forget.
- We could simply
return false;
in this case. Revert unnecessary code.- Ignore the possibility of this case. Drop lines 176-178 & 188-208. Drop other unnecessary code.
I am still not sure 176-178 are correct, so I am not sure the condition on 188 (the comment in that branch should be (greater than loadsize + loadbias)
given that you are using the vaddr in the calculations). I expect 182-187 to be the heavily predominant branch on musl based distros, while the second one should be the base for glibc based distros (not sure what about bionic's libc)
- Fallback to an
#ifdef LINUX_ALPINE
for this case.- Abandon the possibility of dynamically supporting Linux & Alpine.
I (really) like the possibility of dropping a couple of dll's, but I was wondering what it'd buy us on the long run. I don't know if I am lacking experience or prior art here, but I don't see much of an issue creating a separate one. Or if it warrants this complexity
- Define a mechanism for the hosting layer to inform the elf-reader the type of dump.
This one is nice (no guessing) but rarely a good option in my opinion. The DAC will need to now gain more information of the environment it runs in, and so will createdump and such.
Overall, I would say I don't mind a heuristic that much. I just don't know if:
- 176 is correct.
- An idea of how likely it is to SEGV trying of reading memory (definitely a LOT less now that there's a segment constrained read).
- What does this buy us in the end in exchange for the risk.
uint64_t minAddr = std::min<uint64_t>(std::min<uint64_t>((uint64_t)m_gnuHashTableAddr, | ||
(uint64_t)m_stringTableAddr), | ||
(uint64_t)m_symbolTableAddr); | ||
uint64_t maxAddr = std::max<uint64_t>(std::max<uint64_t>((uint64_t)m_gnuHashTableAddr + sizeof(GnuHashTable) + sizeof(int32_t), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you also need to account for the size of the symbol table as well? Also, what's the additional int32 at the end of the GNU hash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have all the size info. So this is approximate. It is underestimating the maxAddr
. The hash is a hash table. A series of buckets and a series of chains. The sizeof(int)
was the absolute minimum number of buckets. It probably grossly underestimates the maxAddr
.
@mikem8361 @hoyosjs I'd like to get consensus on this.
If we still think it is valuable to pursue Alpine cross DAC == Linux Cross DAC... The patch is complicated by a special case which hasn't occurred in practice
Should we close the PR or simplify the patch |
Yes, I agree that it adds a lot of complexity to save building a couple more DACs. I think we should close the PR and save the changes for a rainy day :) |
The cross DAC wants to be build for only Alpine or Linux not both.
Remove the ElfReader's compile time dependence on TARGET_ALPINE_LINUX
Use a simple heuristic to determine whether we need to add the loadbias.
If all else fails look for a well known symbol g_dacTable in the symbol hash
to determine whether to add the loadbias. (Should be rare.)
Fixes #32756