Add support for string constructors to the interpreter #115914

kg · 2025-05-22T22:26:28Z

Updates the compiler to identify NEWOBJ opcodes that operate on strings or multidimensional (md) arrays, and to generate a specialized newobj opcode for them.
Updates the call stub generator so it knows how to generate the appropriate type of stub for those constructors.
Adds a specialized newobj opcode for strings and md arrays (md array support is not actually implemented in this PR).
Modifies InvokeCompiledMethod to accept the code address from the caller.

Copilot

Pull Request Overview

This PR adds support for string constructors in the interpreter and updates the associated call stub generation.

Added a new test case (TestStringCtor) to verify string constructor functionality
Updated interpreter execution logic to correctly handle fcalls for string constructors
Adjusted call stub generation to account for special string constructors

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
src/tests/JIT/interpreter/Interpreter.cs	Added test method for string constructor support
src/coreclr/vm/interpexec.cpp	Updated interpreter method call to support special string constructors
src/coreclr/vm/callstubgenerator.cpp	Modified call stub generation for special constructor handling

Copilot · 2025-05-22T22:26:59Z

src/coreclr/vm/interpexec.cpp

@@ -1186,15 +1186,46 @@ void InterpExecMethod(InterpreterFrame *pInterpreterFrame, InterpMethodContextFr
                    callArgsOffset = ip[2];
                    methodSlot = ip[3];

-                    OBJECTREF objRef = AllocateObject((MethodTable*)pMethod->pDataItems[ip[4]]);
+                    MethodTable *pClass = (MethodTable*)pMethod->pDataItems[ip[4]];
+                    // FIXME: Duplicated code from CALL_INTERP_SLOT


Consider refactoring the duplicated code block for handling string constructor invocations to improve maintainability.

Suggested change

// FIXME: Duplicated code from CALL_INTERP_SLOT

src/coreclr/vm/callstubgenerator.cpp

kg · 2025-05-23T00:06:30Z

Anyone know what's up with this crossdac failure on CI?

  [458/464] Linking CXX static library unwinder\unwinder_dac.lib
  [459/464] Building RC object dlls\mscordbi\CMakeFiles\mscordbi.dir\Native.rc.res
  [460/464] Building C object D:\a\_work\1\s\artifacts\obj\external\libunwind\CMakeFiles\libunwind_xdac.dir\D_\a\_work\1\s\src\native\external\libunwind\src\dwarf\Gparser.c.obj
  FAILED: D:/a/_work/1/s/artifacts/obj/external/libunwind/CMakeFiles/libunwind_xdac.dir/D_/a/_work/1/s/src/native/external/libunwind/src/dwarf/Gparser.c.obj 
  C:\PROGRA~1\MICROS~1\2022\ENTERP~1\VC\Tools\MSVC\1443~1.348\bin\Hostx64\x64\cl.exe  /nologo -DBUILDENV_CHECKED=1 -DCROSS_COMPILE -DDEBUG -DDISABLE_CONTRACTS -DHAVE_CONFIG_H=1 -DHAVE_DL_ITERATE_PHDR=1 -DHAVE_UNW_GET_ACCESSORS -DHAVE___THREAD=0 -DHOST_64BIT -DHOST_AMD64 -DHOST_WINDOWS -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_STRING=\"\" -DTARGET_64BIT -DTARGET_AMD64 -DTARGET_LINUX -DTARGET_UNIX -DUNW_REMOTE_ONLY -DURTBLDENV_FRIENDLY=Checked -D_CRT_DECLARE_NONSTDC_NAMES -D_CRT_SECURE_NO_WARNINGS -D_DBG -D_DEBUG -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_TIME_BITS=64 -D_Thread_local="" -D__amd64__ -D__linux__ -D__x86_64__ -ID:\a\_work\1\s\artifacts\obj\external\libunwind -ID:\a\_work\1\s\src\native\external\libunwind_extras -ID:\a\_work\1\s\src\native -ID:\a\_work\1\s\src\native\inc -ID:\a\_work\1\s\src\native\external\libunwind\include\tdep -ID:\a\_work\1\s\src\native\external\libunwind\include -ID:\a\_work\1\s\artifacts\obj\external\libunwind\include\tdep -ID:\a\_work\1\s\artifacts\obj\external\libunwind\include -ID:\a\_work\1\s\src\native\external\libunwind\include\remote -ID:\a\_work\1\s\src\native\external\libunwind\include\remote\win -ID:\a\_work\1\s\src\native\external\libunwind\src /DWIN32 /D_WINDOWS -std:c11 -MTd /O2 /nologo /W4 /WX /Oi /Oy- /Gm- /Zp8 /Gy /GS /fp:precise /FC /MP /Zm200 /Zc:strictStrings /Zc:wchar_t /Zc:inline /Zc:forScope /wd4065 /wd4100 /wd4127 /wd4131 /wd4189 /wd4200 /wd4201 /wd4206 /wd4239 /wd4245 /wd4291 /wd4310 /wd4324 /wd4366 /wd4456 /wd4457 /wd4458 /wd4459 /wd4463 /wd4505 /wd4702 /wd4706 /wd4733 /wd4815 /wd4838 /wd4918 /wd4960 /wd4961 /wd5105 /wd5205 /we4007 /we4013 /we4102 /we4551 /we4640 /we4806 /we4055 /we4146 /we4242 /we4244 /we4267 /we4302 /we4308 /we4509 /we4510 /we4532 /we4533 /we4610 /we4611 /we4700 /we4701 /we4703 /we4789 /we4995 /we4996 /w34092 /w34121 /w34125 /w34130 /w34132 /w34212 /w34530 /w35038 /w44177 /Zi /ZH:SHA_256 /source-charset:utf-8 /guard:cf /guard:ehcont /permissive- -wd4068 -wd4334 -wd4311 -wd4475 -wd4477 /TC 
/showIncludes /FoD:\a\_work\1\s\artifacts\obj\external\libunwind\CMakeFiles\libunwind_xdac.dir\D_\a\_work\1\s\src\native\external\libunwind\src\dwarf\Gparser.c.obj /FdD:\a\_work\1\s\artifacts\obj\external\libunwind\CMakeFiles\libunwind_xdac.dir\ /FS -c D:\a\_work\1\s\src\native\external\libunwind\src\dwarf\Gparser.c
  D:\a\_work\1\s\src\native\external\libunwind\src\dwarf\Gparser.c(1181): fatal error C1090: PDB API call failed, error code '23': (0x00000005)
  [461/464] Building CXX object dlls\mscordbi\CMakeFiles\mscordbi.dir\mscordbi.cpp.obj
  ninja: build stopped: subcommand failed.
##[error]BUILD: Error: native component build failed. Refer to the build log files for details.

EDIT: Looks like https://developercommunity.visualstudio.com/t/C1090-PDB-API-call-failed-error-code-2/48897

jkotas · 2025-05-23T00:19:04Z

Also tracked here: #48070. Build analysis should flag it for you.

jkotas · 2025-05-23T00:55:36Z

src/coreclr/vm/interpexec.cpp

+                    //  fcall that is basically a static method that returns the new instance.
+                    if (pMD && pClass->HasComponentSize())
+                    {
+                        // The compiler didn't know about this so it reserved space for a this-reference. We need to skip


This sounds like a temporary workaround. The compiler can know about this (by checking CORINFO_FLG_VAROBJSIZE flag). What needs to happen to move this logic to the compiler?

m_compHnd didn't appear to expose the things I needed to determine this. I can take another look.

That's the flag CORINFO_FLG_VAROBJSIZE that @jkotas mentioned offline yesterday:

runtime/src/coreclr/jit/importercalls.cpp

Lines 1010 to 1026 in fc85a87

    if (opcode == CEE_NEWOBJ)
    {
        if (clsFlags & CORINFO_FLG_VAROBJSIZE)
        {
            assert(!(clsFlags & CORINFO_FLG_ARRAY)); // arrays handled separately
            // This is a 'new' of a variable sized object, wher
            // the constructor is to return the object. In this case
            // the constructor claims to return VOID but we know it
            // actually returns the new object
            assert(callRetTyp == TYP_VOID);
            callRetTyp = TYP_REF;
            call->gtType = TYP_REF;
            impSpillSpecialSideEff();

            impPushOnStack(call, typeInfo(clsHnd));
        }
        else

You can get it by getClassAttribs or by the getCallInfo in the CORINFO_CALL_INFO::classFlags.
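To make the suggestion concrete, here is a minimal sketch of classifying a newobj target from its class flags at compile time. The flag names echo CORINFO_FLG_VAROBJSIZE and CORINFO_FLG_ARRAY from the snippet above, but the enum values, `NewObjKind`, and `ClassifyNewObj` are hypothetical stand-ins for illustration, not the JIT-EE interface or the interpreter compiler's actual code.

```cpp
#include <cstdint>

// Hypothetical flag values; the real constants are defined by the runtime.
enum ClassFlags : std::uint32_t {
    FLG_VAROBJSIZE = 1u << 0,  // object size depends on ctor arguments
    FLG_ARRAY      = 1u << 1,  // any array type
};

enum class NewObjKind { Regular, VarSizeNonArray, Array };

// Arrays are checked first, mirroring the "arrays handled separately"
// assert in the quoted JIT code. What remains with the variable-size
// flag set is the string-like case, where the ctor returns the object.
NewObjKind ClassifyNewObj(std::uint32_t clsFlags) {
    if (clsFlags & FLG_ARRAY) {
        return NewObjKind::Array;
    }
    if (clsFlags & FLG_VAROBJSIZE) {
        return NewObjKind::VarSizeNonArray;
    }
    return NewObjKind::Regular;
}
```

With a check like this, the compiler could skip reserving a this-reference slot for the variable-size case instead of leaving that to the executor.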

Fixing this on the compiler side complicates it a lot. Could we just keep the extra slot allocation we ignore?

Updated. 0298d2e
Not sure how I feel about it.

src/coreclr/vm/callstubgenerator.cpp

src/coreclr/vm/interpexec.cpp

BrzVlad · 2025-05-27T16:51:32Z

src/coreclr/vm/interpexec.cpp

+                    callArgsOffset = ip[2];
+                    methodSlot = ip[3];
+
+                    // FIXME: Duplicated code from CALL_INTERP_SLOT


I don't really understand why this opcode is not a normal call like the others. Could we avoid having this code duplication here?

Generating the call and getting everything to work right, given the way we tag the method pointer and then cache the call target, looked like a pain.

Right now we rely on being able to cache the call target and then use CodeInfo to figure out whether it is interp code or JIT code. The helpers for this are a third category, so we would need another tag for them or would need to generate a generic helper opcode.

If we're not OK with a special opcode for arrays and strings I can figure something out, but I don't know how long it will take.

Md arrays make it worse because you have to do a weird thing to adapt the array of dimensions to its actual call signature, so that one would need additional setup opcodes before a regular call. You can see that in the mdarray draft; look for an array called dims.
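For readers unfamiliar with the tagged-pointer caching mentioned in this thread, the following is a rough sketch of the idea: one data item holds either a tagged method handle (not yet resolved) or the cached call target. `MethodStub`, `CodeStub`, `kMethodHandleTag`, and the helper functions are all hypothetical; the runtime's actual constant is INTERP_METHOD_HANDLE_TAG and its real layout may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for MethodDesc and a resolved code address.
struct MethodStub { int token; };
struct CodeStub { int entry; };

// Assumed tag value, chosen for illustration only.
constexpr std::uintptr_t kMethodHandleTag = 1;

// Store an unresolved method in a data item by setting the low bit.
inline std::uintptr_t TagMethod(MethodStub* method) {
    auto raw = reinterpret_cast<std::uintptr_t>(method);
    assert((raw & kMethodHandleTag) == 0);  // alignment keeps the tag bit free
    return raw | kMethodHandleTag;
}

inline bool IsTaggedMethod(std::uintptr_t item) {
    return (item & kMethodHandleTag) != 0;
}

inline MethodStub* UntagMethod(std::uintptr_t item) {
    return reinterpret_cast<MethodStub*>(item & ~kMethodHandleTag);
}

// First call: resolve the method and overwrite the slot with the code address.
// Later calls: the slot already holds the cached target, so resolve is skipped.
inline CodeStub* ResolveAndCache(std::uintptr_t& item,
                                 CodeStub* (*resolve)(MethodStub*)) {
    if (IsTaggedMethod(item)) {
        item = reinterpret_cast<std::uintptr_t>(resolve(UntagMethod(item)));
    }
    return reinterpret_cast<CodeStub*>(item);
}
```

Because the slot has only two states, a third kind of target (such as a native helper) has nowhere to record what it is, which is the difficulty described above.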

janvorli · 2025-05-28T18:23:38Z

src/coreclr/vm/interpexec.cpp

+                    // FIXME: Duplicated code from CALL_INTERP_SLOT
+                    size_t targetMethod = (size_t)pMethod->pDataItems[methodSlot];
+                    MethodDesc *pMD = nullptr;
+                    if (targetMethod & INTERP_METHOD_HANDLE_TAG)


Should this be an assert instead? If the tag is not set, pMD is null and the TryGetMultiCallableAddrOfCode call below will crash.

Update src/coreclr/vm/callstubgenerator.cpp
Co-authored-by: Aaron Robinson <arobins@microsoft.com>
Update isSpecialConstructor to match other parts of the runtime
Migrate some string/array ctor smarts from interpexec to compiler
Separate newobj opcode for string and mdarray

jkotas · 2025-05-29T23:43:07Z

src/coreclr/vm/interpexec.cpp

@@ -1213,6 +1213,28 @@ void InterpExecMethod(InterpreterFrame *pInterpreterFrame, InterpMethodContextFr
                    ip += 5;
                    goto CALL_INTERP_SLOT;
                }
+                case INTOP_NEWOBJ_VAROBJSIZE:


What's the difference between this and INTOP_CALL?

In other words - if the JIT produced a regular INTOP_CALL targetMethod instead of this special opcode, where would it break?

Right now, nowhere, but mdarrays are going to use this opcode and have special behavior. I'm open to generating call for this and reserving the opcode only for mdarray.

> generating call for this and reserving the opcode only for mdarray.

I think it would make more sense.

I'll test generating CALL and see if anything breaks.

We can't use INTOP_CALL because the string ctors aren't IL, they're icalls:

runtime/src/libraries/System.Private.CoreLib/src/System/String.cs

Line 287 in 60e9340

public extern String(char c, int count);

Right now we can only call IL methods with INTOP_CALL because of how it implements invoking native code.

I originally wanted to pierce through and find the managed method that implements the ctors and call that, but it sounded like there's no way to do that from inside the JIT (and adding a new method to the JIT to do it would have been a pain anyway, and was opposed when I suggested it).

The new opcode I added happens to be constructed in a way that works for icalls. And then the mdarrays PR will expand it to also handle the unique calling convention for mdarray ctors. IMO it makes sense to have a dedicated opcode for the two variable-size objects we have in the type system.
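The difference between the two newobj shapes can be sketched as follows. This is only an analogy in plain C++ under assumed names (`Point`, `NewObjRegular`, `NewObjVarSize`, `PointCtor`, `StringCtor` are all hypothetical): a regular newobj allocates first and passes the object as `this`, whereas a variable-size ctor behaves like a static function that allocates and returns the new instance, because the allocation size depends on the arguments.

```cpp
#include <cstddef>
#include <memory>
#include <string>

struct Point { int x = 0; int y = 0; };

// Regular shape: the caller allocates, then the ctor initializes `self`.
using RegularCtor = void (*)(Point* self, int x, int y);

std::unique_ptr<Point> NewObjRegular(RegularCtor ctor, int x, int y) {
    auto obj = std::make_unique<Point>();  // size known before the call
    ctor(obj.get(), x, y);
    return obj;
}

// Variable-size shape: no `this` argument; the "ctor" returns the object.
using VarSizeCtor = std::unique_ptr<std::string> (*)(char c, int count);

std::unique_ptr<std::string> NewObjVarSize(VarSizeCtor ctor, char c, int count) {
    return ctor(c, count);  // allocation happens inside the callee
}

// Example callees for the two shapes.
void PointCtor(Point* self, int x, int y) {
    self->x = x;
    self->y = y;
}

std::unique_ptr<std::string> StringCtor(char c, int count) {
    return std::make_unique<std::string>(static_cast<std::size_t>(count), c);
}
```

In this picture, `NewObjVarSize(StringCtor, 'a', 3)` plays the role of `new String('a', 3)`: the caller never sees an uninitialized object.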

The interpreter will need a way to call icalls. I am actually surprised that we are not hitting problems with calling icalls in more places. What would it take to make icalls work in the interpreter instead of adding a new special opcode?

> unique calling convention for mdarray ctors

I agree that mdarray ctors have a unique calling convention and it makes sense to have a special opcode for those. I do not think the string ctors are special like that.

We would need to detect whether a method will successfully PrepareInitialCode or not, either at compile time (preferable) or at execution time. I'm not sure how to do it at compile time, maybe getCallInfo or getMethodInfo. I can look into that. At execution time we can check all the relevant flags, though it looks like there are a lot of them.

I think what I would do if I had to fix this now is only attempt calls through GetNativeCode for methods with IL, and do everything via TryGetMultiCallableAddrOfCode for anything else. We know that if IsIL() then PrepareInitialCode should work.

This complicates the existing call opcodes at execution time, though, because we don't have anywhere to store this flag. We would need to add an additional data item to store it, or add new opcode(s) for 'non-IL calls'. This is because we use a tag bit to put the MethodDesc and native code ptr in the same data item instead of storing both. If we were to start storing both separately we could do MethodDesc->IsIL() before every call to decide what to do. Or we add a new INTOP_CALL_NATIVE opcode that is designed for targets which are not IL. Would we also need an INTOP_CALLVIRT_NATIVE or anything? I can't think of a case where we would.
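A minimal sketch of the "store both separately" option, assuming hypothetical types: `MethodFacts` models only the IsIL() answer, `CallSiteData` is an invented two-field data item, and `ChooseInvokePath` is not a runtime function. It only illustrates that keeping the method alongside the cached code lets each call pick between the GetNativeCode path and the TryGetMultiCallableAddrOfCode path.

```cpp
// Hypothetical model of the one fact queried on a MethodDesc.
struct MethodFacts {
    bool isIL;  // models MethodDesc::IsIL()
};

// Invented layout: method and cached code kept in separate fields
// instead of sharing one tagged data item.
struct CallSiteData {
    const MethodFacts* method;  // always kept, so IsIL can be checked per call
    const void* cachedCode;     // filled in after the first call
};

enum class InvokeVia { GetNativeCode, MultiCallableAddr };

// IL methods go through the prepared-code path; everything else
// (icalls and other non-IL targets) goes through the multi-callable address.
InvokeVia ChooseInvokePath(const CallSiteData& site) {
    return site.method->isIL ? InvokeVia::GetNativeCode
                             : InvokeVia::MultiCallableAddr;
}
```

The cost is one extra data item per call site, which is the tradeoff weighed against a dedicated INTOP_CALL_NATIVE opcode.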

> We would need to detect whether a method will successfully PrepareInitialCode or not, either at compile time

We have a temporary hack for that (https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/interpexec.cpp#L1144-L1150). You can update the hack to invoke the native code instead of calling EEPOLICY_HANDLE_FATAL_ERROR_WITH_MESSAGE. I think changing the condition to if (!codeInfo.IsValid() || codeInfo.GetCodeManager() != ExecutionManager::GetInterpreterCodeManager()) should do it.
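The proposed condition can be read as a small predicate. The sketch below models it with hypothetical stub types (`CodeManagerStub`, `CodeInfoStub`, `ShouldInvokeNative`); only the shape of the boolean expression is taken from the comment above, and the real EECodeInfo and ExecutionManager APIs are not reproduced here.

```cpp
// Hypothetical stand-in for a code manager identity.
struct CodeManagerStub {};

// Hypothetical stand-in for code lookup results; a null manager models
// the case where no valid code info was found for the target address.
struct CodeInfoStub {
    const CodeManagerStub* manager;

    bool IsValid() const { return manager != nullptr; }
    const CodeManagerStub* GetCodeManager() const { return manager; }
};

// Mirrors the suggested condition: anything that is not known interpreter
// code is treated as native code and invoked as such.
bool ShouldInvokeNative(const CodeInfoStub& info,
                        const CodeManagerStub* interpreterManager) {
    return !info.IsValid() || info.GetCodeManager() != interpreterManager;
}
```

Under this reading, both unknown addresses (such as icall entry points) and JIT-compiled code fall on the native side, and only code owned by the interpreter's manager stays on the interpreted path.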

> I can look into that.

There is a discussion about how this should work in one of the Teams chats. @janvorli started a design doc to describe how this should work.

dotnet-policy-service · 2025-05-31T14:13:09Z

Tagging subscribers to this area: @BrzVlad, @janvorli, @kg
See info in area-owners.md if you want to be subscribed.

Copilot AI review requested due to automatic review settings May 22, 2025 22:26

kg requested review from BrzVlad and janvorli as code owners May 22, 2025 22:26

github-actions bot added the area-Interop-coreclr label May 22, 2025

dotnet-policy-service bot assigned kg May 22, 2025

Copilot AI reviewed May 22, 2025

AaronRobinsonMSFT reviewed May 22, 2025 (src/coreclr/vm/callstubgenerator.cpp, outdated)

kg mentioned this pull request May 22, 2025: Add multi-dim array support to the interpreter #115916 (Draft)

jkotas reviewed May 23, 2025

janvorli reviewed May 23, 2025 (src/coreclr/vm/callstubgenerator.cpp, outdated)

BrzVlad reviewed May 23, 2025 (src/coreclr/vm/interpexec.cpp, outdated)

BrzVlad reviewed May 27, 2025

janvorli reviewed May 28, 2025

kg added 3 commits May 29, 2025 13:08

Address pr feedback

4cc83dc

Revert unnecessary changes

c450924

kg force-pushed the interp-stringctor branch from e1e5efd to c450924 Compare May 29, 2025 22:55

jkotas reviewed May 29, 2025

build-analysis bot mentioned this pull request May 30, 2025: Test failure: baseservices/threading/regressions/115178/115178/115178.cmd #116060 (Open)

filipnavara added area-CodeGen-Interpreter-coreclr and removed area-Interop-coreclr labels May 31, 2025
