Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement "placement new"-style constructors generation #79

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ZimM-LostPolygon
Copy link
Contributor

@ZimM-LostPolygon ZimM-LostPolygon commented Oct 3, 2024

Based on the idea and the list of types from here:
#55 (comment)

Generated code looks like this:

CIMGUI_API void cimgui::ImFontConfig_ImFontConfig(cimgui::ImFontConfig* self)
{
    IM_PLACEMENT_NEW(reinterpret_cast<::ImFontConfig*>(self)) ::ImFontConfig();
}

  • Should the generated function naming style be ImFontConfig_ImFontConfig, ImFontConfig_Construct, ImFontConfig_New, something else? How do we differentiate between placement new constructors and by-value constructors that return an already constructed value? Should we even differentiate? (personally I like the consistency of using the same ImFontConfig_ImFontConfig style for everything, but then we also already have ImVector_Construct)
  • This generates only the constructor, do we need the destructors? If yes, then for which scenario?

@ZimM-LostPolygon
Copy link
Contributor Author

Not directly related, but this assumes the C and C++ structs all have exactly the same memory layout. Which makes sense, but I've found this bit of code puzzling in that context:

# Emit code to copy each member

Why do we need to do per-member copy instead of a simply memcpy?

@ShironekoBen
Copy link
Collaborator

Thanks - this looks good to me!

On the questions front:

Should the generated function naming style be ImFontConfig_ImFontConfig, ImFontConfig_Construct, ImFontConfig_New, something else? How do we differentiate between placement new constructors and by-value constructors that return an already constructed value? Should we even differentiate? (personally I like the consistency of using the same ImFontConfig_ImFontConfig style for everything, but then we also already have ImVector_Construct)

Hmm... this is a complicated one. My thinking at the moment is:

  1. I'm not fond of ImFontConfig_ImFontConfig simply because in my eyes it looks a bit odd and isn't particularly obvious if you are coming from a language which doesn't use the "constructors take the name of the type" convention (and even in those languages the number of situations where you would manually use the fully-qualified ImFontConfig::ImFontConfig version are very limited). It also raises the question of "what would we name the destructor?" (as obviously ~ImFontConfig is a non-starter).

  2. New/Delete would be fine but since we already have one case of Construct/Destruct it seems sensible to keep them consistent (also potentially avoids minor-but-annoying issues when generating bindings in a language where those are reserved words).

  3. I pondered a lot about differentiating placement-new constructors and by-value constructors but ultimately I don't think we need to. void ImVector_Construct(void* self) and ImThing ImThing_Construct() are self-evidently different and you can't really get it wrong in any way that isn't going to result in an obvious compile error. If Intellisense wasn't a thing then maybe it would be more of a concern but when in 99% of cases your IDE is going to immediately tell you what the arguments are it seems fine to use the same name for both. This also avoids the problem that if we split those up it would be logical to split up the destructors too, but that then results in something like ImVector_Destruct(void* self) and ImThing_Delete(ImThing* self) that look different but actually do entirely the same thing...

  4. ImFontConfig_Init() feels like a strong contender as well, but only if we don't need destructors...

This generates only the constructor, do we need the destructors? If yes, then for which scenario?

That's a very good question, and it ties into the naming problem as well I think.

Basically, my feeling is that if we go for Construct/Destruct we should always have those functions as a pair, as otherwise we lead the user into the trap of "this type is constructed with Construct but doesn't have/need a Destruct call, but this otherwise identical-looking type which is also constructed with Construct will leak memory if you don't call Destruct, and you have no way of knowing which is which without looking at the header file/documentation". So I'd like a simple rule users can follow that "if you see a Construct() call there must always be a corresponding Destruct() somewhere".
(even if internally that Destruct() call does nothing)

Conversely though, that does mean that the user ends up making Destruct() calls which are entirely pointless in some situations. If we want to avoid that, we probably need to split things so that there's a clear user-facing distinction - for example, ImFoo_Init() is just a default-value setter and ImFoo can be safely discarded without any further effort, but ImThing_Construct() means that you need a ImThing_Destruct() somewhere. This avoids unnecessary overhead, but has the downside that adding something that requires destruction is an API breaking change (although that is also a positive, as it forces users to look at their code and hopefully in the process add the required Destruct() call).

So to summarize, I've personally got two preferred options, I think:

  1. Use Construct()/Destruct() for all types, keeping the API entirely consistent at the cost of some redundant code.
  2. Use Construct()/Destruct() for types that require destruction, and Init() for those that do not. Reduces code overhead now at the cost of adding some minor maintenance later if type behaviours change.

My gut feeling is that 2 is probably the better option (as there aren't really very many types that require the full Construct()/Destruct() pattern right now), but what are your thoughts?

As for this:

Why do we need to do per-member copy instead of a simply memcpy?

Hmm... to be honest, "why indeed?" is the answer that springs to mind!

I suspect when I wrote that code there was still some uncertainty in my head about how much I could rely on keeping structure layouts the same, and also the possibility that some types would need custom conversion logic at some point. It does feel like a memcpy() would do the job there, so I'll try changing it over at some point.

Thanks!

@ZimM-LostPolygon
Copy link
Contributor Author

ZimM-LostPolygon commented Oct 6, 2024

Use Construct()/Destruct() for types that require destruction, and Init() for those that do not. Reduces code overhead now at the cost of adding some minor maintenance later if type behaviours change.

I personally prefer this option, if nothing else than simply to avoid "dummy" destructor calls when using wrappers in higher-level languaegs like C#. But we really should add some metadata to mark function as constructor/destructor/initializer, for purposes of generating bindings from C to another languages. Right now, the only way to know if a function is a constructor is to check if the function's original_fully_qualified_name is equal to the name of the parent struct, which is janky at best.

at the cost of adding some minor maintenance later if type behaviours change.

Just thinking out loud - this will probably only work when used directly from C or a similar low-level language, as a typical C# wrapper would wrap the Init() method with a struct constructor anyway, so if the behavior changes, there's not a lot of ways to communicate that suddenly you also need to call an extra Dispose()-like method.

It does feel like a memcpy() would do the job there

I've tried it locally and noticed no problems! Also looking at that entire part of code, there's entire machinery to generate methods like ConvertFromCPP_ImVec2?.. Assuming the memory layout is the same, just reinterpret_cast will suffice like we do for non-by-value structs. At least I can't really think of any cases where custom logic would be needed, and if there are - those would probably be manual helpers like ImVector_Construct?

@ZimM-LostPolygon
Copy link
Contributor Author

ZimM-LostPolygon commented Oct 6, 2024

Just recalled that there are a few silly destructors in Dear ImGui that are actually something like "block finishers", and duplicate existing struct functions:

ImGuiListClipper::~ImGuiListClipper()
{
    End();
}

I'm fairly sure those should be special-cased and not emitted as destructors at all, and instead the users should be asked to just call End() themselves. After all, you have to call Begin() manually, it's not included in the constructor or anything, so why is it different for End()?
Does that make any sense, or should we try to not outsmart ourselves and just emit it anyway?

@ocornut
Copy link
Member

ocornut commented Oct 6, 2024

I believe clipper is the only case, and maybe due to legacy reason, but it is therefore ok for it to only call End.
However since it needs a Construct anyhow I believe the Destruct should call c++ constructor and no need to be smarter than that.

@@ -246,6 +245,22 @@ def convert_header(
'ImRect',
'ImGuiListClipperRange'
])

# Mark certain types as needing placement new constructors that initialize the memory block passed into them
mod_mark_placement_constructor_structs.apply(dom_root, [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this list was given by @ocornut . lets not depend on that.
how can i figuire out this list and/or knows when to update it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the ask is to provide additional detail in a comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't list we can automatically figure out what goes in the list.
I can try to notify this project if new types appear that are likely to require constructors, otherwise people will eventually request them.

@zaafar
Copy link
Contributor

zaafar commented Oct 6, 2024

  1. Use Construct()/Destruct() for all types, keeping the API entirely consistent at the cost of some redundant code.
    2) Use Construct()/Destruct() for types that require destruction, and Init() for those that do not. Reduces code overhead now at the cost of adding some minor maintenance later if type behaviours change.

Personally I like 1 as its less Cognitive load on the dev but 2 is fine as long as it's explicitly mention in the readme of this repo what does construct/Destruct and init mean. what's the difference between them. why would u use them at all.

@zaafar
Copy link
Contributor

zaafar commented Oct 6, 2024

adding some minor maintenance later if type behaviours change.

also, if this repo is gonna last for many years to come...it's not a minor maintenance. let's not add more work. When these things changes (i.e. constructor isn't required it's init now) you may not be thinking about dear_bindings....if something changes from init to constructor...that's worst...now u get memory leak.....unless imgui itself have a way of differentiating between them programmatically i.e. is it a constructor or a init i wouldn't go there.

@zaafar
Copy link
Contributor

zaafar commented Oct 6, 2024

Right now, the only way to know if a function is a constructor is to check if the function's original_fully_qualified_name is equal to the name of the parent struct, which is janky at best.

oh. so there is a way, in that case, nevermind :)

Just thinking out loud - this will probably only work when used directly from C or a similar low-level language, as a typical C# wrapper would wrap the Init() method with a struct constructor anyway, so if the behavior changes, there's not a lot of ways to communicate that suddenly you also need to call an extra Dispose()-like method.

yeah but you as a c# wrapper owner have a keep an eye on the changelog (or worst the code) to ensure c# wrapper is consistent every imgui release. they do a good job with changelog so maybe I am just being paranoid.

@ZimM-LostPolygon
Copy link
Contributor Author

yeah but you as a c# wrapper owner have a keep an eye on the changelog (or worst the code) to ensure c# wrapper is consistent every imgui release. they do a good job with changelog so maybe I am just being paranoid.

If I want to be ultra paranoid, I can take the previous version metadata and check if any _Init() functions have become _Construct. Plus, yes, the changelogs are usually excellent, I don't see this becoming a constant problem

@zaafar
Copy link
Contributor

zaafar commented Oct 7, 2024

if the behavior changes, there's not a lot of ways to communicate that suddenly you also need to call an extra Dispose()-like method.

the only way I can think of to solve this problem is to re-name the struct based on if it has intit, construct or nothing. (for those who aren't familiar with C# tldr structs over here has constructors but no destructors).....personally i like compiler error when upgrading imgui over runtime error OR memory leak issue :)

ZimM-LostPolygon, how do those struct behaves when they are passed by copy? do they call the constructor again?

EDIT: you have to do new(); in order to call your fav constructor so pass-by-copy is safe. however, this mean calling new() is a must.

@ZimM-LostPolygon
Copy link
Contributor Author

personally i like compiler error when upgrading imgui over runtime error OR memory leak issue :)

everyone does, but appending prefixes to struct based on that sounds very ugly. I'd rather do the diff on every imgui release - normally you'd still want to do that to check for any other things that need tweaking.

ZimM-LostPolygon, how do those struct behaves when they are passed by copy? do they call the constructor again?

It's simply disallowed - if a by-value struct has a destructor, dear_bindings will throw an error.

@zaafar
Copy link
Contributor

zaafar commented Oct 7, 2024

everyone does, but appending prefixes to struct based on that sounds very ugly. I'd rather do the diff on every imgui release - normally you'd still want to do that to check for any other things that need tweaking.

it's not just you (binding owner) who has to do that, it's everyone who uses your bindings....and i understand that's the right thing to do (and i do that as well) but given imgui doesn't follow semantic versioning + imgui release notes != imgui binding release notes (e.g. adding constructor+distructor might not be considered as breaking change)....it's easy to miss...at-least for new ppl.

@zaafar
Copy link
Contributor

zaafar commented Oct 7, 2024

anyway, there is no right or wrong over here, it's just personal preferences. at the end of the day if we write down something about it in readme for new ppl, i think it will be fine. also, i am assuming these structs rarely changes.

e.g. readme questions = as a project owner/binding owner what do i need to look for in releases, what's init, what's constructor, etc etc.

@ShironekoBen
Copy link
Collaborator

I don't think I'm too worried about bindings for a non-C language (e.g. C#) running into the case where a type changes from not needing a constructor/destructor to needing one because that seems pretty straightforward to automate - if we emit an is_constructor/is_destructor marker in the metadata, then the consumer of that can generate appropriate constructs in the target language, and in most cases I think that will either result in compile errors or at least warnings from code checking tools if there is now a missing Dispose() as a result.

And in C the only case that's really problematic is "class that had no init or constructor suddenly needs one or the other", as for the other cases (init becomes constructor, etc) there will be a compile error trying to call the old function. If we standardise on "all classes have some sort of init/construct function" then this basically disappears as a problem, I suspect. (which feels reasonable... whilst there are cases where for performance you might want to deliberately leave a structure uninitialised and then overwrite all the values yourself, that's still doable just by ignoring the init function, and I'm not sure we want to be encouraging that sort of behaviour anyway because it's even more hilariously brittle in the face of structure changes)

Sure, there will probably be cases that slip through the cracks but pretty much any scheme short of the suggestion above of renaming types based on constructor presence is going to run into that I suspect. And with non-C bindings the binding generator for the target language could always implement that scheme if they felt it was necessary for their use-case - there's no need for Dear Bindings to enforce it.

@zaafar
Copy link
Contributor

zaafar commented Oct 9, 2024

there's no need for Dear Bindings to enforce it.

agree

in most cases I think that will either result in compile errors or at least warnings from code checking tools if there is now a missing Dispose() as a result.

maybe there is an innovative approach with new c# language version to do that but as far as I am aware this line won't be true. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants