Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

luajit: add decoder #709

Merged
merged 22 commits into from
Jun 21, 2023
Merged

luajit: add decoder #709

merged 22 commits into from
Jun 21, 2023

Conversation

dlatchx
Copy link
Contributor

@dlatchx dlatchx commented Jun 20, 2023

a decoder for LuaJIT bytecode/precompiled files

Copy link
Owner

@wader wader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Did some quick review and will do some more later. I think we should be able to add some custom decode function to replace some FieldAnyFn and maybe kgcType could use some standard mapper?

Overall looks good i think and thanks for including tests!

BTW you can add a luajit.md with author add embedd it, see wasm.go/wasm.md for example. Then it will be part of the documentation and also available in the CLI help.

d.FieldValueBool("BE", flags&0x01 > 0)
d.FieldValueBool("STRIP", flags&0x02 > 0)
d.FieldValueBool("FFI", flags&0x04 > 0)
d.FieldValueBool("FR2", flags&0x08 > 0)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Field names are usually lowercase as it's the convention in jq

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can also use 0b... notation if you want to be explicit about bit positions

ok := bytes.Equal([]byte(magic), []byte{0x1b, 0x4c, 0x4a})
if !ok {
d.Errorf("invalid magic number")
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think you can replace this with d.FieldStr("magic", 3, d.AssertStr("..")) or d.FieldRawLen("magic", 3*8, d.AssertBitBuf([]byte{...})) which will do the compare and also the magic will be shown as "valid"


if !di.Strip {
namelen := d.FieldU8("namelen")
d.FieldStr("name", int(namelen), encoding.Nop)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is encoding.Nop important? can skip if it's ok to be decoded as UTF-8

func init() {
for i := 0; i < len(opcodes); i++ {
bcOpSyms[uint64(i)] = opcodes[i].Name
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if you can make the BcDef implement scalar.UintMapper, i think the formatEntries type in msgpack.go does something similar. If so you can skip init and bcOpSyms

@dlatchx
Copy link
Contributor Author

dlatchx commented Jun 20, 2023

I think we should be able to add some custom decode function to replace some FieldAnyFn

I added more decode function, but is still need FieldAnyFn, because LuaJIT's format makes a weird use of ULEB128 encoding, and I want to map the bits of the two 32bit-max ULEB128 to the resulting 64bit Field. (see LuaJITDecodeI64(), for example)
Maybe there's another way to do that ?

@dlatchx
Copy link
Contributor Author

dlatchx commented Jun 20, 2023

... and maybe kgcType could use some standard mapper?

The problem is that we need a default case: 0 to 4 are types with know size, but 5 and more indicates a string and its size:
5 is a 0-length string, 6 is a 1-byte string, ect.

I don't think we can use a UintMapSymStr here.

@@ -0,0 +1,274 @@
$ fq dv -d luajit simple.luac
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment to the .fqtest files or a README.md or Makefile how to generate the luac files?

format.LuaJIT,
&decode.Format{
Description: "LuaJIT 2.0 bytecode dump",
DecodeFn: LuaJITDecode,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The magic is quite unique so if you think it make senes you could add the format to the probe group

interp.RegisterFormat(
format.LuaJIT,
&decode.Format{
Description: "LuaJIT 2.0 bytecode dump",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly skip the "dump" part or are .luac files known as dump files?

@wader
Copy link
Owner

wader commented Jun 20, 2023

I added more decode function, but is still need FieldAnyFn, because LuaJIT's format makes a weird use of ULEB128 encoding, and I want to map the bits of the two 32bit-max ULEB128 to the resulting 64bit Field. (see LuaJITDecodeI64(), for example) Maybe there's another way to do that ?

I see, yeap that might be tricky.

The problem is that we need a default case: 0 to 4 are types with know size, but 5 and more indicates a string and its size:
5 is a 0-length string, 6 is a 1-byte string, ect.

I don't think we can use a UintMapSymStr here.

Aha that makes sense, i guess you could use a range mapper but maybe not worth it.

I added some more comments but getting a bit late here so will continue tomorrow. Think we're close to something mergable. Thanks for the patience!

Description: "LuaJIT 2.0 bytecode",
Groups: []*decode.Group{format.Probe},
DecodeFn: LuaJITDecode,
Dependencies: []decode.Dependency{
Copy link
Owner

@wader wader Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dependencies you can skip, it's if the format decoder itself want to decode something using some group

})

if !di.Strip {
d.FieldArray("debug", func(d *decode.D) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When i played around with the decoder i noticed that data in debug has some strings, is there any known structure to it or should be threaded as opaque blob? if blob should the array be replaced with d.FieldRawLen("debug", debuglen*8)?

}
}

type kgcType struct{}
Copy link
Owner

@wader wader Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One variant is to do something like:

type fallbackUintMapSymStr struct {
	fallback string
	scalar.UintMapSymStr
}

func (m fallbackUintMapSymStr) MapUint(s scalar.Uint) (scalar.Uint, error) {
	s.Sym = m.fallback
	return m.UintMapSymStr.MapUint(s)
}

and then

	kgctype := d.FieldULEB128("type", fallbackUintMapSymStr{
		fallback: "str",
		UintMapSymStr: scalar.UintMapSymStr{
			0: "child",
			1: "tab",
			2: "i64",
			3: "u64",
			4: "complex",
		},
	})

and use it for ktabType also

Maybe the scalar package should have something like this if its common hmm

}

func LuaJITDecodeI64(d *decode.D) {
d.FieldAnyFn("i64", func(d *decode.D) any {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some other format decoders that do something similar i've decode the value as a "value" field, see msgpack or bson etc. Maybe that would make sense here also? for complex maybe the value field could an extra d.FieldValueAny("value", ...)?

interp.RegisterFS(LuaJITFS)
}

// reinterpret an int as a float
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to understand how this works. So all numbers are floats but encoded i two parts where the hi part will be zero for < 32 bit integers? and hi might be > 2^32 so hi << 32 + lo will be > 2^64 and needs to be float? but looks like we only care about 8 bytes when reinterpreting? hope that made sense :)

// make it a signed integer
lo_data_i32 := int32(lo_data_u32)

// return a larger type to make fq happy
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕺💃 maybe there should be smaller number types hmm, should probably do another round of profiling memory soon

})

// TODO: find out more about how to decode these strings
d.FieldArray("anotations", func(d *decode.D) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: annotations

👍 with more debug structure

@wader
Copy link
Owner

wader commented Jun 21, 2023

Lint not happy about unused probeGroup

Think i'm happy to merge if your happy too?

@dlatchx
Copy link
Contributor Author

dlatchx commented Jun 21, 2023

it's good for me

@wader wader merged commit 5605ebe into wader:master Jun 21, 2023
@wader
Copy link
Owner

wader commented Jun 21, 2023

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants