yul literal value as struct of u256 + optional formatting hint #15112

clonker · 2024-05-16T10:34:02Z

Changed the definition of the solidity::yul::Literal to carry its value not as YulString but as LiteralValue struct, consisting of a u256 and an optional string representation hint. Upon converting from its data back to a string representation, it is first checked if the hint is not empty and in that case, whether value == parseValue(hint).

libyul/AsmJsonImporter.cpp

libyul/Utilities.cpp

Wanted to comment and not approve.

libyul/AsmJsonImporter.cpp

cameel

Not a complete review yet. Halfway through reading it I started realizing that I thought I knew how the hint and the unlimited flag worked but I actually didn't. I'm more and more convinced that this flag is not exactly the right solution here and we need to change it a bit. It's not really a property of the literal itself. But see my comments for details.

And some style nitpicking :)

libyul/AST.h

libyul/Utilities.cpp

test/tools/yulInterpreter/Interpreter.cpp

libyul/AsmJsonConverter.cpp

libyul/AsmParser.h

libyul/AsmJsonImporter.cpp

cameel · 2024-05-27T19:38:19Z

libyul/AST.h

+	LiteralValue(Data const& _data, std::optional<std::string> const& _hint = std::nullopt, bool _unlimited = false):
+		m_data(_data), m_hint(_hint ? std::make_shared<std::string>(*_hint) : nullptr), m_unlimited(_unlimited) {}


Can we assert here that the hint matches the data if present?

formatLiteral() ignores m_data when m_hint is present so we must ensure that these two are always consistent.

And I'd recommend this style:

Suggested change

LiteralValue(Data const& _data, std::optional<std::string> const& _hint = std::nullopt, bool _unlimited = false):

m_data(_data), m_hint(_hint ? std::make_shared<std::string>(*_hint) : nullptr), m_unlimited(_unlimited) {}

LiteralValue(Data const& _data, std::optional<std::string> const& _hint = std::nullopt, bool _unlimited = false):

m_data(_data),

m_hint(_hint ? std::make_shared<std::string>(*_hint) : nullptr),

m_unlimited(_unlimited)

{}

Hmm asserting it in this place is going to be difficult without knowing the LiteralKind. That is what determines the validity in the end. I have added a check to formatLiteral in the utilities, though, so that the hint isn't blindly taken but it is compared to the stored u256 value.

Alternatively, we could think about storing the LiteralKind in the LiteralValue as well. Then an assert in the constructor is possible and failure would occur earlier.

Yeah, it really feels like LiteralKind should be inside LiteralValue, because the value is ambiguous without it. If would let us get rid of a lot of redundant asserts that we now have to do at every step to ensure the values are valid.

But I think it would require changing a lot more code so perhaps better to leave at is for now and just finish the PR in the current state.

libyul/AsmAnalysis.cpp

libyul/AST.h

cameel · 2024-05-28T11:57:40Z

libyul/Utilities.cpp

+		if (_literal.kind == LiteralKind::Boolean)
+		{
+			yulAssert(_literal.value() == 0 || _literal.value() == 1, "Could not format boolean literal");
+			result = _literal.value() == 1 ? "true" : "false";
+		}
+		else
+			result = _literal.value().str();


This looks like you're formatting strings and integers the same way. How can that work? I.e. how does it preserve the distinction between 1234 and "1234"?

Actually, does the hint include the quotes? Because if it does not, then I can't see how we can distinguish values like '1234' and "1234" or '1234' and hex'1234'.

Also, if there's no distinction, the value cannot be unambiguously recovered from the hint (which matters if we want to enforce that they match).

By the way, it would be good to have those examples as test cases if we don't have them somewhere already.

I'd also add something like this:

object "A" { code { pop(datasize(hex'616263')) } data 'abc' "1234" }

I actually only now realized that we're allowing anything other than plain strings for literal arguments, not sure we even have it covered (only for arguments though, not in data).

The quoting and escaping of string literals is done in AsmPrinter, so the LiteralKind is also always required to unambiguously print and/or recover a literal. In your example you'd get

object "A" { code { pop(datasize(hex'616263')) } data 'abc' "1234" } // ---- // step: disambiguator // // object "A" { // code { pop(datasize("abc")) } // data "abc" hex"31323334" // }

as for almost ambiguous literals

{ let a := 1234 let a_hex := 0x4d2 let b := "1234" let b_hex := "0x4d2" let c := '1234' let c_hex := '0x4d2' let d := hex"1234" let d_hex := hex"04d2" } // ---- // step: disambiguator // // { // let a := 1234 // let a_hex := 0x4d2 // let b := "1234" // let b_hex := "0x4d2" // let c := "1234" // let c_hex := "0x4d2" // let d := "\x124" // let d_hex := "\x04\xd2" // }

Oh, so we're actually not preserving the representation perfectly? That's surprising. I had the impression that the AST allowed us to mostly reproduce the original source aside from comments and whitespace and that the representation hint was meant to help with that, but apparently not. I see from your output that it doesn't even keep hex numbers as hex so it seems kinda useless.

I wonder what we'd really lose by removing it. I wouldn't do it in this PR, but maybe we should consider that later?

Yeah, agree, this can probably be removed and hopefully make for a much cleaner data structure without losing too much

libyul/AST.cpp

libyul/Utilities.cpp

libyul/optimiser/SyntacticalEquality.cpp

libyul/AsmAnalysis.cpp

cameel

Here's the final batch of comments. There's quite a few of them but I haven't found anything big, so they should all be quick to deal with. I finally went through it all, including all the tedious changes resulting from the introduction of the new class so hopefully that's the last of it.

The main things are a few missing bits in validations and formatLiteral(), some tweaks to assertions and one bit missing in BlockHasher.

libyul/Utilities.cpp

test/tools/yulInterpreter/Interpreter.cpp

libyul/AsmAnalysis.cpp

libyul/optimiser/LoadResolver.cpp

libyul/optimiser/StackToMemoryMover.cpp

libyul/optimiser/SyntacticalEquality.h

libyul/optimiser/BlockHasher.cpp

test/libyul/yulOptimizerTests/equivalentFunctionCombiner/constant_representation.yul

libyul/AsmAnalysis.cpp

cameel · 2024-06-21T16:51:27Z

libyul/AsmAnalysis.cpp

 			3069_error,
 			nativeLocationOf(_literal),
-			"String literal too long (" + std::to_string(_literal.value.str().size()) + " > 32)"
+			"String literal too long (" + std::to_string(formatLiteral(_literal, true).size()) + " > 32)"


Wait, won't the assertion in formatLiteral() fail when the string is too long? How it passing tests?

Suggested change

"String literal too long (" + std::to_string(formatLiteral(_literal, true).size()) + " > 32)"

"String literal too long (" + std::to_string(formatLiteral(_literal, false).size()) + " > 32)"

this is b/c it is now assumeValidity :)

if I assume the validity is given and I trust the flag, then asserts can minimal / be skipped

Haha, I didn't expect that :) This is the exact opposite of how I was thinking about it. You assume it is valid, so you can safely assert that. Otherwise it may or may not be valid and the function is fully prepared to handle the invalid cases. Assertion just documents the assumption and does not count as a proper validation (which is part of the reason I didn't really like the idea of it being in the parameter name).

But if this wasn't clear then it's not a good name after all and we need to change it again. I still don't like _assertValidity, but I guess it would still be better than something that is ambiguous. My proposal would be to call it _validated though.

At this point I'm fine with whichever you choose, not to hold this PR any longer :)

cameel · 2024-06-21T17:11:12Z

libyul/optimiser/ForLoopConditionIntoBody.cpp

 				debugData,
 				LiteralKind::Boolean,
-				"true"_yulstring,
+				LiteralValue{true, "true"},


Suggested change

LiteralValue{true, "true"},

LiteralValue{true},

Looks like some tests are failing in CI because true gets replaced with 1 now. But it can't be this change, can it? It's a boolean literal and those should be formatted as true/false without a hint.

cameel

I went through it again and looks like it just needs a few last tweaks. I'm going to preemptively approve, meaning that none of this looks critical and you're free to merge after you make an attempt to apply those tweaks and squash the commits.

BTW, it does not have to be necessarily all squashed into a single commit. You can split it more if you prefer, as long as each commit is an atomic and self-contained change with a sensible description. We just don't usually keep review fixes as separate commits but prefer them squashed into the original commits they are meant to fixup so that history is less of a mess.

cameel · 2024-06-26T15:54:57Z

libyul/AsmAnalysis.cpp

 			5170_error,
 			nativeLocationOf(_literal),
-			"Invalid type \"" + _literal.type.str() + "\" for literal \"" + _literal.value.str() + "\"."
+			"Invalid type \"" + _literal.type.str() + "\" for literal \"" + formatLiteral(_literal, false) + "\"."


By the way, this call to formatLiteral() uses the default value for the flag so you can omit it:

Suggested change

"Invalid type \"" + _literal.type.str() + "\" for literal \"" + formatLiteral(_literal, false) + "\"."

"Invalid type \"" + _literal.type.str() + "\" for literal \"" + formatLiteral(_literal) + "\"."

Since in this function you pass in both false and true explicitly, I didn't notice that the default was swapped :) If it was omitted, it would have clued me in :)

Also, might be a good idea to add a comment with flag name in the other call above, where you'll still be passing it in explicitly.

libyul/Utilities.cpp

clonker · 2024-06-27T12:13:09Z

the tests are now failing due to the change in expression hasher. what do you think @cameel, should I undo it? or rather update test expectations? Intuitively I would say undo, feels more natural to have a true instead of 1. But either is fine for me

cameel · 2024-06-27T16:19:21Z

Well, it's definitely better to have true there, but why exactly does the BlockHasher affect that? Does the less specific hash result in some extra optimization kicking in that wasn't before? The change did seem correct so maybe the actual problem is outside of BlockHasher. If we locate that then maybe we could keep the change and still keep it at true?

In any case, it would still probably be best to have the same thing in both functions, since I don't see why we'd ever want them to behave differently. They should both hash kind or both hash unlimited(). Perhaps you could just pull the code it into a common local helper for hashing a Literal since they've been getting out of sync all the time in this PR? :)

…limited' - unlimited is a string literal for which bytes restrictions don't hold; can occur in builtin function arguments

cameel · 2024-06-28T08:14:33Z

libyul/optimiser/BlockHasher.h

 	uint64_t m_hash = fnvEmptyHash;
 };

+class ASTHasherBase: public HasherBase


I expected this to be a bit simpler, i.e. just a small helper in an unnamed namespace in BlockHasher.cpp, but I guess introducing a new base class is fine too if you prefer it that way :)

Then I'd have to change the visibility of the member functions :) I thought this was cleaner.

No, why? Maybe it's not entirely clear what I meant.

Note that there's already a namespace { ... } block at the top of BlockHasher.cpp that defines a compileTimeLiteralHash() helper. This is used by both implementations and has no issues with their visibility. What I meant was to put the helper there.

Right, without using the hash{8,16,32,64} helpers it's of course possible:)

Ah wait, those are defined in HasherBase. I was assuming they were global.

So ok, you do have a point here :)

It did itch me a little, something like this would also be possible (essentially the same thing hasher base is doing):

template<typename Component> constexpr void combine(uint64_t& _hash, Component const& _c) { auto const* data = reinterpret_cast<uint8_t const *>(&_c); for (size_t i = 0; i < sizeof(Component); ++i) { _hash *= HasherBase::fnvPrime; _hash ^= data[i]; } } template<typename LiteralComponents, auto... Index> constexpr uint64_t hashComponentsImpl(std::index_sequence<Index...>, LiteralComponents const& _components) { constexpr std::array indices{Index...}; uint64_t hashValue = HasherBase::fnvEmptyHash; (combine(hashValue, std::get<indices[Index]>(_components)),...); return hashValue; } template<typename... LiteralComponents> constexpr uint64_t hashComponents(LiteralComponents const&... _components) { return hashComponentsImpl(std::make_index_sequence<sizeof...(LiteralComponents)>(), std::tie(_components...)); } uint64_t hashLiteral(Literal const& _literal) { if (!_literal.value.unlimited()) return hashComponents(compileTimeLiteralHash("Literal"), std::hash<u256>{}(_literal.value.value()), _literal.type.hash(), _literal.kind, _literal.value.unlimited()); else return hashComponents(compileTimeLiteralHash("Literal"), std::hash<std::string>{}(_literal.value.builtinStringLiteralValue()), _literal.type.hash(), _literal.kind, _literal.value.unlimited()); }

nikola-matic previously approved these changes May 17, 2024

View reviewed changes

libyul/AsmJsonImporter.cpp Outdated Show resolved Hide resolved

libyul/AsmJsonImporter.cpp Outdated Show resolved Hide resolved

libyul/Utilities.cpp Outdated Show resolved Hide resolved

clonker force-pushed the rework_yul_literals_2 branch from e1c8cd3 to 3ca7a0a Compare May 21, 2024 06:12

nikola-matic reviewed May 22, 2024

View reviewed changes

libyul/AsmJsonImporter.cpp Outdated Show resolved Hide resolved

cameel reviewed May 27, 2024

View reviewed changes

cameel reviewed May 28, 2024

View reviewed changes

clonker force-pushed the rework_yul_literals_2 branch from b9373bf to f712d53 Compare May 29, 2024 11:02

cameel reviewed Jun 7, 2024

View reviewed changes

clonker force-pushed the rework_yul_literals_2 branch from 8658ad2 to 617efd2 Compare June 10, 2024 13:17

cameel reviewed Jun 20, 2024

View reviewed changes

libyul/AsmAnalysis.cpp Outdated Show resolved Hide resolved

cameel reviewed Jun 21, 2024

View reviewed changes

cameel previously approved these changes Jun 21, 2024

View reviewed changes

clonker dismissed cameel’s stale review via b22c14e June 24, 2024 06:30

clonker force-pushed the rework_yul_literals_2 branch 2 times, most recently from 8543f5b to 4f7fe25 Compare June 24, 2024 06:49

clonker mentioned this pull request Jun 24, 2024

Numerical Yul node id handles #15215

Closed

10 tasks

clonker force-pushed the rework_yul_literals_2 branch from 4f7fe25 to 8f2ea7b Compare June 26, 2024 05:37

cameel reviewed Jun 26, 2024

View reviewed changes

libyul/Utilities.cpp Outdated Show resolved Hide resolved

clonker force-pushed the rework_yul_literals_2 branch from 8f2ea7b to 58955cf Compare June 26, 2024 18:12

cameel previously approved these changes Jun 26, 2024

View reviewed changes

clonker dismissed cameel’s stale review via 64f3d19 June 27, 2024 16:06

clonker force-pushed the rework_yul_literals_2 branch from 58955cf to 64f3d19 Compare June 27, 2024 16:06

clonker force-pushed the rework_yul_literals_2 branch 2 times, most recently from 7d55651 to 8ef134c Compare June 28, 2024 06:18

yul literal value is struct of u256 + optional formatting hint or 'un…

477c413

…limited' - unlimited is a string literal for which bytes restrictions don't hold; can occur in builtin function arguments

clonker force-pushed the rework_yul_literals_2 branch from 8ef134c to 477c413 Compare June 28, 2024 06:49

cameel approved these changes Jun 28, 2024

View reviewed changes

clonker merged commit f3f7720 into argotorg:develop Jun 28, 2024

clonker deleted the rework_yul_literals_2 branch June 28, 2024 09:29

cameel mentioned this pull request Jul 10, 2024

replace YulString with YulName typedef #15083

Merged

cameel mentioned this pull request Sep 25, 2024

eof: Support DATALOADN (EIP-7480) #15456

Merged

		LiteralValue(Data const& _data, std::optional<std::string> const& _hint = std::nullopt, bool _unlimited = false):
		m_data(_data), m_hint(_hint ? std::make_shared<std::string>(*_hint) : nullptr), m_unlimited(_unlimited) {}

	"String literal too long (" + std::to_string(formatLiteral(_literal, true).size()) + " > 32)"
	"String literal too long (" + std::to_string(formatLiteral(_literal, false).size()) + " > 32)"

	"Invalid type \"" + _literal.type.str() + "\" for literal \"" + formatLiteral(_literal, false) + "\"."
	"Invalid type \"" + _literal.type.str() + "\" for literal \"" + formatLiteral(_literal) + "\"."

yul literal value as struct of u256 + optional formatting hint #15112

yul literal value as struct of u256 + optional formatting hint #15112

Uh oh!

Conversation

clonker commented May 16, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cameel left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

clonker May 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cameel Jun 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cameel left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cameel Jun 26, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

clonker May 29, 2024 •

edited

Loading

cameel Jun 8, 2024 •

edited

Loading

cameel left a comment •

edited

Loading

cameel Jun 26, 2024 •

edited

Loading

cameel commented Jun 27, 2024 •

edited

Loading

cameel Jun 28, 2024 •

edited

Loading