Add support for string constructors to the interpreter #115914

Open
wants to merge 3 commits into
base: main
from

Conversation

kg
Member

@kg kg commented May 22, 2025

  • Updates the compiler to identify NEWOBJ opcodes that operate on strings or multidimensional arrays and to generate a specialized newobj opcode for them.
  • Updates the callstub generator to know how to generate the appropriate type of stub for those constructors.
  • Adds a specialized newobj opcode for strings and md arrays (md arrays not actually implemented in this PR.)
  • Modifies InvokeCompiledMethod to accept the code address from outside.
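
As a rough illustration of the first bullet, the opcode choice could be sketched as below. This is a toy model, not the code in this PR: `kToyFlagVarObjSize`, `ToyNewobjOp` and `SelectNewobjOp` are hypothetical names, and the flag value is made up (the real attribute bit discussed later in this thread is CORINFO_FLG_VAROBJSIZE).

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for the class attribute bits. The numeric value is invented
// for this sketch and does not match any real runtime constant.
constexpr uint32_t kToyFlagVarObjSize = 1u << 0;

// Hypothetical opcode names, for illustration only.
enum class ToyNewobjOp { Newobj, NewobjVarObjSize };

// Pick the opcode a compiler could emit for a NEWOBJ on a class with the
// given attribute bits: variable-size classes get the specialized opcode.
inline ToyNewobjOp SelectNewobjOp(uint32_t clsFlags)
{
    return (clsFlags & kToyFlagVarObjSize) != 0
        ? ToyNewobjOp::NewobjVarObjSize
        : ToyNewobjOp::Newobj;
}
```

With this shape, `SelectNewobjOp(kToyFlagVarObjSize)` selects the specialized opcode and any class without the bit keeps the regular one.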

@Copilot Copilot AI review requested due to automatic review settings May 22, 2025 22:26
@kg kg requested review from BrzVlad and janvorli as code owners May 22, 2025 22:26
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds support for string constructors in the interpreter and updates the associated call stub generation.

  • Added a new test case (TestStringCtor) to verify string constructor functionality
  • Updated interpreter execution logic to correctly handle fcalls for string constructors
  • Adjusted call stub generation to account for special string constructors

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File | Description
src/tests/JIT/interpreter/Interpreter.cs | Added test method for string constructor support
src/coreclr/vm/interpexec.cpp | Updated interpreter method call to support special string constructors
src/coreclr/vm/callstubgenerator.cpp | Modified call stub generation for special constructor handling

@@ -1186,15 +1186,46 @@ void InterpExecMethod(InterpreterFrame *pInterpreterFrame, InterpMethodContextFr
callArgsOffset = ip[2];
methodSlot = ip[3];

OBJECTREF objRef = AllocateObject((MethodTable*)pMethod->pDataItems[ip[4]]);
MethodTable *pClass = (MethodTable*)pMethod->pDataItems[ip[4]];
// FIXME: Duplicated code from CALL_INTERP_SLOT

Copilot AI May 22, 2025

Consider refactoring the duplicated code block for handling string constructor invocations to improve maintainability.

Suggested change
// FIXME: Duplicated code from CALL_INTERP_SLOT

@kg
Member Author

kg commented May 23, 2025

Anyone know what's up with this crossdac failure on CI?

  [458/464] Linking CXX static library unwinder\unwinder_dac.lib
  [459/464] Building RC object dlls\mscordbi\CMakeFiles\mscordbi.dir\Native.rc.res
  [460/464] Building C object D:\a\_work\1\s\artifacts\obj\external\libunwind\CMakeFiles\libunwind_xdac.dir\D_\a\_work\1\s\src\native\external\libunwind\src\dwarf\Gparser.c.obj
  FAILED: D:/a/_work/1/s/artifacts/obj/external/libunwind/CMakeFiles/libunwind_xdac.dir/D_/a/_work/1/s/src/native/external/libunwind/src/dwarf/Gparser.c.obj 
  C:\PROGRA~1\MICROS~1\2022\ENTERP~1\VC\Tools\MSVC\1443~1.348\bin\Hostx64\x64\cl.exe  /nologo -DBUILDENV_CHECKED=1 -DCROSS_COMPILE -DDEBUG -DDISABLE_CONTRACTS -DHAVE_CONFIG_H=1 -DHAVE_DL_ITERATE_PHDR=1 -DHAVE_UNW_GET_ACCESSORS -DHAVE___THREAD=0 -DHOST_64BIT -DHOST_AMD64 -DHOST_WINDOWS -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_STRING=\"\" -DTARGET_64BIT -DTARGET_AMD64 -DTARGET_LINUX -DTARGET_UNIX -DUNW_REMOTE_ONLY -DURTBLDENV_FRIENDLY=Checked -D_CRT_DECLARE_NONSTDC_NAMES -D_CRT_SECURE_NO_WARNINGS -D_DBG -D_DEBUG -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_TIME_BITS=64 -D_Thread_local="" -D__amd64__ -D__linux__ -D__x86_64__ -ID:\a\_work\1\s\artifacts\obj\external\libunwind -ID:\a\_work\1\s\src\native\external\libunwind_extras -ID:\a\_work\1\s\src\native -ID:\a\_work\1\s\src\native\inc -ID:\a\_work\1\s\src\native\external\libunwind\include\tdep -ID:\a\_work\1\s\src\native\external\libunwind\include -ID:\a\_work\1\s\artifacts\obj\external\libunwind\include\tdep -ID:\a\_work\1\s\artifacts\obj\external\libunwind\include -ID:\a\_work\1\s\src\native\external\libunwind\include\remote -ID:\a\_work\1\s\src\native\external\libunwind\include\remote\win -ID:\a\_work\1\s\src\native\external\libunwind\src /DWIN32 /D_WINDOWS -std:c11 -MTd /O2 /nologo /W4 /WX /Oi /Oy- /Gm- /Zp8 /Gy /GS /fp:precise /FC /MP /Zm200 /Zc:strictStrings /Zc:wchar_t /Zc:inline /Zc:forScope /wd4065 /wd4100 /wd4127 /wd4131 /wd4189 /wd4200 /wd4201 /wd4206 /wd4239 /wd4245 /wd4291 /wd4310 /wd4324 /wd4366 /wd4456 /wd4457 /wd4458 /wd4459 /wd4463 /wd4505 /wd4702 /wd4706 /wd4733 /wd4815 /wd4838 /wd4918 /wd4960 /wd4961 /wd5105 /wd5205 /we4007 /we4013 /we4102 /we4551 /we4640 /we4806 /we4055 /we4146 /we4242 /we4244 /we4267 /we4302 /we4308 /we4509 /we4510 /we4532 /we4533 /we4610 /we4611 /we4700 /we4701 /we4703 /we4789 /we4995 /we4996 /w34092 /w34121 /w34125 /w34130 /w34132 /w34212 /w34530 /w35038 /w44177 /Zi /ZH:SHA_256 /source-charset:utf-8 /guard:cf /guard:ehcont /permissive- -wd4068 -wd4334 -wd4311 -wd4475 -wd4477 /TC 
/showIncludes /FoD:\a\_work\1\s\artifacts\obj\external\libunwind\CMakeFiles\libunwind_xdac.dir\D_\a\_work\1\s\src\native\external\libunwind\src\dwarf\Gparser.c.obj /FdD:\a\_work\1\s\artifacts\obj\external\libunwind\CMakeFiles\libunwind_xdac.dir\ /FS -c D:\a\_work\1\s\src\native\external\libunwind\src\dwarf\Gparser.c
  D:\a\_work\1\s\src\native\external\libunwind\src\dwarf\Gparser.c(1181): fatal error C1090: PDB API call failed, error code '23': (0x00000005)
  [461/464] Building CXX object dlls\mscordbi\CMakeFiles\mscordbi.dir\mscordbi.cpp.obj
  ninja: build stopped: subcommand failed.
##[error]BUILD: Error: native component build failed. Refer to the build log files for details.

EDIT: Looks like https://developercommunity.visualstudio.com/t/C1090-PDB-API-call-failed-error-code-2/48897

@jkotas
Member

jkotas commented May 23, 2025

Also tracked here: #48070. Build analysis should flag it for you.

// fcall that is basically a static method that returns the new instance.
if (pMD && pClass->HasComponentSize())
{
    // The compiler didn't know about this so it reserved space for a this-reference. We need to skip
Member

This sounds like a temporary workaround. The compiler can know about this (by checking the CORINFO_FLG_VAROBJSIZE flag). What needs to happen to move this logic to the compiler?

Member Author

m_compHnd didn't appear to expose the things I needed to determine this. I can take another look.

Member

That's the flag CORINFO_FLG_VAROBJSIZE that @jkotas mentioned offline yesterday:

if (opcode == CEE_NEWOBJ)
{
    if (clsFlags & CORINFO_FLG_VAROBJSIZE)
    {
        assert(!(clsFlags & CORINFO_FLG_ARRAY)); // arrays handled separately
        // This is a 'new' of a variable sized object, where
        // the constructor is to return the object. In this case
        // the constructor claims to return VOID but we know it
        // actually returns the new object
        assert(callRetTyp == TYP_VOID);
        callRetTyp = TYP_REF;
        call->gtType = TYP_REF;
        impSpillSpecialSideEff();
        impPushOnStack(call, typeInfo(clsHnd));
    }
    else

You can get it from getClassAttribs, or from getCallInfo via CORINFO_CALL_INFO::classFlags.
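
To make the consequence of that flag concrete, here is a small self-contained sketch of how a compiler might plan the constructor call once it knows the class is variable-size. It is only a model under stated assumptions: `kToyVarObjSize`, `ToyCtorCallPlan` and `PlanCtorCall` are hypothetical names, the flag value is invented, and none of this is the interpreter's actual data structure.

```cpp
#include <cassert>
#include <cstdint>

// Made-up bit standing in for CORINFO_FLG_VAROBJSIZE.
constexpr uint32_t kToyVarObjSize = 1u << 0;

enum class ToyRetType { Void, Ref };

// Hypothetical description of how a newobj call site is laid out.
struct ToyCtorCallPlan
{
    bool allocateBeforeCall; // caller allocates the object and passes it as 'this'
    bool reserveThisSlot;    // an argument slot is reserved for the this-reference
    ToyRetType effectiveRet; // what the call really produces
};

inline ToyCtorCallPlan PlanCtorCall(uint32_t clsFlags)
{
    if ((clsFlags & kToyVarObjSize) != 0)
    {
        // The constructor behaves like a static method that returns the new
        // instance: no pre-allocation, no 'this' slot, and a REF result even
        // though the signature claims void.
        return ToyCtorCallPlan{false, false, ToyRetType::Ref};
    }
    // Ordinary constructor: allocate first, pass 'this', nothing returned.
    return ToyCtorCallPlan{true, true, ToyRetType::Void};
}
```

Deciding this at compile time is what would let the interpreter avoid skipping a reserved this-reference slot at execution time.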

Member Author

Fixing this on the compiler side complicates it a lot. Could we just keep the extra slot allocation we ignore?

Member Author

Updated. 0298d2e
Not sure how I feel about it.

callArgsOffset = ip[2];
methodSlot = ip[3];

// FIXME: Duplicated code from CALL_INTERP_SLOT
Member

I don't really understand why this opcode is not a normal call like the others. Could we avoid duplicating this code here?

Member Author

generating the call and getting everything to work right given the way we do the tagged method pointer and then cache the call target looked like a pain.

right now we rely on being able to cache the call target and then use CodeInfo to figure out whether it is interp code or jit code. the helpers for this are a third category so we would need another different tag for them or would need to generate a generic helper opcode.

if we're not ok with a special opcode for arrays and strings i can figure something out, but i don't know how long it will take
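
For readers unfamiliar with the tagged-pointer scheme mentioned above, a minimal sketch follows. It assumes a single low tag bit and a slot that holds either a tagged method handle or a cached code address; `kToyMethodHandleTag`, `ToyMD` and the helper functions are hypothetical (the thread names INTERP_METHOD_HANDLE_TAG but does not give its value or the real helper signatures).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed tag bit for this sketch only.
constexpr size_t kToyMethodHandleTag = 1;

// Toy method descriptor; pointer alignment keeps its low address bit clear.
struct ToyMD { const char* name; };

using ToyCodeAddr = size_t;

inline size_t TagMethodHandle(ToyMD* pMD)
{
    size_t raw = reinterpret_cast<size_t>(pMD);
    assert((raw & kToyMethodHandleTag) == 0); // low bit must be free
    return raw | kToyMethodHandleTag;
}

inline bool IsMethodHandle(size_t slot)
{
    return (slot & kToyMethodHandleTag) != 0;
}

inline ToyMD* UntagMethodHandle(size_t slot)
{
    assert(IsMethodHandle(slot));
    return reinterpret_cast<ToyMD*>(slot & ~kToyMethodHandleTag);
}

// First call: the slot still holds a tagged handle, so resolve it and
// overwrite the slot with the code address. Later calls reuse the cache.
// The resolver must return an address whose low bit is clear.
inline ToyCodeAddr ResolveAndCache(size_t& slot, ToyCodeAddr (*resolve)(ToyMD*))
{
    if (IsMethodHandle(slot))
    {
        slot = resolve(UntagMethodHandle(slot));
    }
    return slot;
}
```

Because the slot only distinguishes "handle" from "code address", a third kind of target has nowhere to be recorded, which is the difficulty described in the comment above.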

Member Author

md arrays make it worse because you have to do a weird thing to adapt the array of dimensions to its actual call signature, so that one would need additional setup opcodes before a regular call. you can see that in the mdarray draft, look for an array called dims

// FIXME: Duplicated code from CALL_INTERP_SLOT
size_t targetMethod = (size_t)pMethod->pDataItems[methodSlot];
MethodDesc *pMD = nullptr;
if (targetMethod & INTERP_METHOD_HANDLE_TAG)
Member

Should this be an assert instead? If the tag is not set, pMD is null and the TryGetMultiCallableAddrOfCode call below will crash.

kg added 3 commits May 29, 2025 13:08
Update src/coreclr/vm/callstubgenerator.cpp

Co-authored-by: Aaron Robinson <arobins@microsoft.com>

Update isSpecialConstructor to match other parts of the runtime

Migrate some string/array ctor smarts from interpexec to compiler

Separate newobj opcode for string and mdarray
@kg kg force-pushed the interp-stringctor branch from e1e5efd to c450924 Compare May 29, 2025 22:55
@@ -1213,6 +1213,28 @@ void InterpExecMethod(InterpreterFrame *pInterpreterFrame, InterpMethodContextFr
ip += 5;
goto CALL_INTERP_SLOT;
}
case INTOP_NEWOBJ_VAROBJSIZE:
Member

What's the difference between this and INTOP_CALL?

In other words - if the JIT produced a regular INTOP_CALL targetMethod instead of this special opcode, where would it break?

Member Author

Right now, nowhere, but mdarrays are going to use this opcode and have special behavior. I'm open to generating call for this and reserving the opcode only for mdarray.

Member

generating call for this and reserving the opcode only for mdarray.

I think it would make more sense.

Member Author

I'll test generating CALL and see if anything breaks.

Member Author

We can't use INTOP_CALL because the string ctors aren't IL, they're icalls:

public extern String(char c, int count);

Right now we can only call IL methods with INTOP_CALL because of how it implements invoking native code.

I originally wanted to pierce through and find the managed method that implements the ctors and call that, but it sounded like there's no way to do that from inside the JIT (and adding a new method to the JIT to do it would have been a pain anyway, and was opposed when I suggested it).

The new opcode I added happens to be constructed in a way that works for icalls. And then the mdarrays PR will expand it to also handle the unique calling convention for mdarray ctors. IMO it makes sense to have a dedicated opcode for the two variable-size objects we have in the type system.

Member

The interpreter will need a way to call icalls. I am actually surprised that we are not hitting problems with calling icalls in more places. What would it take to make icalls work in the interpreter instead of adding a new special opcode?

unique calling convention for mdarray ctors

I agree that mdarray ctors have a unique calling convention and it makes sense to have a special opcode for those. I do not think the string ctors are special like that.

Member Author

We would need to detect whether a method will successfully PrepareInitialCode or not, either at compile time (preferable) or at execution time. I'm not sure how to do it at compile time, maybe getCallInfo or getMethodInfo. I can look into that. At execution time we can check all the relevant flags, though it looks like there are a lot of them.

I think what I would do if I had to fix this now is only attempt calls through GetNativeCode for methods with IL, and do everything via TryGetMultiCallableAddrOfCode for anything else. We know that if IsIL() then PrepareInitialCode should work.

This complicates the existing call opcodes at execution time though because we don't have anywhere to store this flag. We would need to add an additional data item to store it, or add new opcode(s) for 'non-IL calls'. This is because we use a tag bit to put the MethodDesc and native code ptr in the same data item instead of storing both. If we were to start storing both separately we could do MethodDesc->IsIL() before every call to decide what to do. Or we add a new INTOP_CALL_NATIVE opcode that is designed for targets which are not IL - would we also need an INTOP_CALLVIRT_NATIVE or anything? I can't think of a case where we would.
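
The "store both separately" option could look roughly like the sketch below. It is a toy model only: `ToyMethodDesc`, `ToyCallSite`, `ToyCallPath` and `ChoosePath` are hypothetical, and the two enum values merely label the two routes named above (GetNativeCode after PrepareInitialCode for IL methods, TryGetMultiCallableAddrOfCode for everything else) without calling any real runtime API.

```cpp
#include <cassert>

// Toy descriptor exposing only the IsIL() question asked above.
struct ToyMethodDesc
{
    bool isIL;
    bool IsIL() const { return isIL; }
};

// Hypothetical call-site data that keeps the descriptor and the cached code
// pointer in separate fields instead of sharing one tagged slot.
struct ToyCallSite
{
    const ToyMethodDesc* pMD;
    void* cachedCode;
};

enum class ToyCallPath
{
    NativeCodeAfterPrepare, // IL method: preparing initial code is expected to work
    MultiCallableAddr       // non-IL target such as an icall
};

inline ToyCallPath ChoosePath(const ToyCallSite& site)
{
    assert(site.pMD != nullptr); // the descriptor is always kept in this layout
    return site.pMD->IsIL() ? ToyCallPath::NativeCodeAfterPrepare
                            : ToyCallPath::MultiCallableAddr;
}
```

The cost of this layout is one extra data item per call site, which is the trade-off weighed against adding a dedicated opcode.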

Member

We would need to detect whether a method will successfully PrepareInitialCode or not, either at compile time

We have a temporary hack for that: https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/interpexec.cpp#L1144-L1150. You can update the hack to invoke the native code instead of EEPOLICY_HANDLE_FATAL_ERROR_WITH_MESSAGE. I think changing the condition to if (!codeInfo.IsValid() || codeInfo.GetCodeManager() != ExecutionManager::GetInterpreterCodeManager()) should do it.
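
The proposed condition can be modeled in isolation as follows. This is a sketch with stand-in types, not the runtime's classes: `ToyCodeManager`, `ToyCodeInfo` and `ClassifyTarget` are hypothetical, and only the shape of the check (invalid code info, or code owned by a manager other than the interpreter's, means "invoke native code") is taken from the comment above.

```cpp
#include <cassert>

// Stand-in for a code manager; only its identity matters in this sketch.
struct ToyCodeManager {};

// Stand-in for the code info lookup result. A null manager models "no code
// info found for this address".
struct ToyCodeInfo
{
    const ToyCodeManager* manager;
    bool IsValid() const { return manager != nullptr; }
    const ToyCodeManager* GetCodeManager() const { return manager; }
};

enum class ToyTarget { Interpreter, Native };

// Mirrors the suggested condition: anything that is not known interpreter
// code is treated as native code to invoke.
inline ToyTarget ClassifyTarget(const ToyCodeInfo& info, const ToyCodeManager* interpManager)
{
    if (!info.IsValid() || info.GetCodeManager() != interpManager)
    {
        return ToyTarget::Native;
    }
    return ToyTarget::Interpreter;
}
```

Under this model an icall, which has no interpreter code, falls into the native branch rather than the fatal-error path.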

Member

I can look into that.

There is a discussion about how this should work in one of the Teams chats. @janvorli started a design doc to describe how this should work.

Contributor

Tagging subscribers to this area: @BrzVlad, @janvorli, @kg
See info in area-owners.md if you want to be subscribed.

6 participants