Skip to content
This repository has been archived by the owner on Nov 1, 2020. It is now read-only.

GC stress #4822

Open
jkotas opened this issue Oct 27, 2017 · 17 comments
Open

GC stress #4822

jkotas opened this issue Oct 27, 2017 · 17 comments

Comments

@jkotas
Copy link
Member

jkotas commented Oct 27, 2017

Mode that triggers GC on every N-th allocation and/or PInvoke transition.

@dlewis-arcontech
Copy link

dlewis-arcontech commented Oct 31, 2017

Hi I've changed the code as suggested:

GetGCgen0size () const { return 64 * 1024; }

But the problem callstack still looks the same. I'll try and see if I can recreate the problem on windows and start to debug RhpAssignRef as you suggested.

@dlewis-arcontech
Copy link

Hello, I have managed to reproduce the issue in an isolated code set. The code works for 'dotnet run' on linux but fails in the way mentioned in #4676 for native compilation after a short time (running on Ubuntu 14.04.5 LTS). I've zipped up C/C++ the share object code and the C# code to get the issue to happen and attached them to this post. The zip file contains:

\arc-lib\ - run build.sh to produce libarc.so and set up your LD_LIBRARY_PATH to point at this shared object. I used g++ 4.8.4 to build this.

ConsoleApp2 - build it as native code and run and pretty soon you should get the following Segmentation fault:

Process 12773 stopped

  • thread Add .gitattributes and .gitignore files #3: tid = 12777, 0x00000000004161ec ConsoleApp2EEType::get_BaseSize(this=0x0000000000317fe8) + 12 at eetype.h:365, name = 'ConsoleApp2', stop reason = signal SIGSEGV: invalid address (fault address: 0x317fec) frame #0: 0x00000000004161ec ConsoleApp2EEType::get_BaseSize(this=0x0000000000317fe8) + 12 at eetype.h:365
    362
    363 #ifndef RHDUMP
    364 UInt32 get_BaseSize()
    -> 365 { return m_uBaseSize; }
    366
    367 UInt16 get_ComponentSize()
    368 { return m_usComponentSize; }
    (lldb) bt
  • thread Add .gitattributes and .gitignore files #3: tid = 12777, 0x00000000004161ec ConsoleApp2`EEType::get_BaseSize(this=0x0000000000317fe8) + 12 at eetype.h:365, name = 'ConsoleApp2', stop reason = signal SIGSEGV: invalid address (fault address: 0x317fec)
    • frame #0: 0x00000000004161ec ConsoleApp2EEType::get_BaseSize(this=0x0000000000317fe8) + 12 at eetype.h:365 frame #1: 0x00000000004ec7e5 ConsoleApp2MethodTable::GetBaseSize(this=0x0000000000317fe8) + 21 at gcenv.h:91
      frame Add basic build scripts #2: 0x00000000004936a9 ConsoleApp2WKS::gc_heap::mark_object_simple(unsigned char**) [inlined] WKS::my_get_size(ob=0x00007fffdf199c68) + 34 at gc.cpp:8897 frame #3: 0x0000000000493687 ConsoleApp2WKS::gc_heap::mark_object_simple(po=0x00007fffeec59450) + 535 at gc.cpp:18171
      frame Improve .gitignore #4: 0x00000000004a1d32 ConsoleApp2WKS::gc_heap::mark_through_cards_for_large_objects(void (*)(unsigned char**), unsigned int) [inlined] WKS::gc_heap::mark_through_cards_helper(poo=0x00007fffeec59450, n_gen=0x00007fffddfb3cc0, cg_pointers_found=0x00007fffddfb3cd0, fn=(ConsoleApp2WKS::gc_heap::mark_object_simple(unsigned char**) at gc.cpp:18157), nhigh="\xa0q����, next_boundary="p\x01�)(unsigned char**), unsigned char*, unsigned char*) + 79 at gc.cpp:27545
      frame Make GC and GCSample compile and run on Linux #5: 0x00000000004a1ce3 ConsoleApp2WKS::gc_heap::mark_through_cards_for_large_objects(fn=(ConsoleApp2WKS::gc_heap::mark_object_simple(unsigned char**) at gc.cpp:18157), relocating=NO)(unsigned char**), unsigned int) + 7539 at gc.cpp:31857
      frame CoreLib doesn't implement Environment.GetEnvironmentVariable and friends #6: 0x000000000047887f ConsoleApp2WKS::gc_heap::mark_phase(condemned_gen_number=0, mark_only_p=NO) + 3631 at gc.cpp:19662 frame #7: 0x000000000046fbe7 ConsoleApp2WKS::gc_heap::gc1() + 519 at gc.cpp:15262
      frame Add type system source files #8: 0x000000000048cbe3 ConsoleApp2WKS::gc_heap::garbage_collect(n=0) + 3395 at gc.cpp:16853 frame #9: 0x0000000000461861 ConsoleApp2WKS::GCHeap::GarbageCollectGeneration(this=0x0000000000dfb1e0, gen=0, reason=reason_alloc_soh) + 2049 at gc.cpp:35188
      frame Add ILToNative source files #10: 0x0000000000466a0f ConsoleApp2WKS::gc_heap::try_allocate_more_space(acontext=0x00007fffddfb55e8, size=56, gen_number=0) + 1503 at gc.cpp:13066 frame #11: 0x0000000000466b83 ConsoleApp2WKS::gc_heap::allocate_more_space(acontext=0x00007fffddfb55e8, size=56, alloc_generation_number=0) + 35 at gc.cpp:13362
      frame Make simple test work #12: 0x00000000004e0295 ConsoleApp2WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) + 277 at gc.cpp:13393 frame #13: 0x00000000004e0180 ConsoleApp2WKS::GCHeap::Alloc(this=0x0000000000dfb1e0, context=0x00007fffddfb55e8, size=56, flags=0) + 96 at gc.cpp:34534
      frame Implement simple casting #14: 0x0000000000413f63 ConsoleApp2::RhpGcAlloc(pEEType=0x0000000000d48088, uFlags=0, cbSize=56, pTransitionFrame=0x00007fffddfb4940) + 211 at gcrhenv.cpp:273 frame #15: 0x00000000005e31e0 ConsoleApp2RhpNewArrayRare + 58 at unixasmmacrosamd64.inc:650
      frame Fix build warnings around NuGet dependencies #16: 0x00000000006074a5 ConsoleApp2String__FastAllocateString + 77 at String.cs:632 frame #17: 0x000000000063f870 ConsoleApp2String__CreateStringFromEncoding + 204 at String.cs:606
      frame Delete project.lock.json and add it to .gitignore instead #18: 0x000000000061f083 ConsoleApp2S_P_CoreLib_System_Text_Encoding__GetString + 215 at Encoding.cs:1084 frame #19: 0x0000000000619d3f ConsoleApp2String__CreateStringForSByteConstructor + 147 at String.cs:294
      frame Move thread safe flag manipulation into a helper #20: 0x000000000060deaf ConsoleApp2String__Ctor_3 + 183 at String.cs:256 frame #21: 0x0000000000640651 ConsoleApp2ConsoleApp2_RecordUpdate__get_Field + 69 at Program.cs:47
      frame Add CppCodeGen prototype sources #22: 0x00000000006313a9 ConsoleApp2ConsoleApp2_ConsoleApp2_CallbackProcessors__OnFeedUpdated + 157 at Program.cs:101 frame #23: 0x000000000064040a ConsoleApp2 at ThrowHelper.cs:74 frame #24: 0x00007fffddfbfdbf libarc.soProcessCallbacks + 63
      frame Make build.cmd more like the build.cmd in CoreCLR repo #25: 0x00007fffddfc1bf7 libarc.sovoid std::_Bind_simple<void (*())()>::_M_invoke<>(std::_Index_tuple<>) + 37 frame #26: 0x00007fffddfc1b51 libarc.sostd::_Bind_simple<void (*())()>::operator()() + 27
      frame Enhance virtual function resolution logic to be roughly correct. #27: 0x00007fffddfc1aea libarc.sostd::thread::_Impl<std::_Bind_simple<void (*())()> >::_M_run() + 28 frame #28: 0x00007ffff7b87a60 libstdc++.so.6___lldb_unnamed_symbol191$$libstdc++.so.6 + 32
      frame Fix build break #29: 0x00007ffff6d93184 libpthread.so.0start_thread(arg=0x00007fffddfb5700) + 196 at pthread_create.c:312 frame #30: 0x00007ffff6ac037d libc.so.6__clone + 109 at clone.S:111

If I can help any further just let me know.
GCIssue.zip

@dlewis-arcontech
Copy link

Seen (maybe 1 out of 5 runs) a slight different callstack, not sure if this helps in term of debugging:

(lldb) bt

  • thread Add .gitattributes and .gitignore files #3: tid = 13711, 0x00007ffff69fcc37 libc.so.6`__GI_raise(sig=6) + 55 at raise.c:56, name = 'ConsoleApp2', stop reason = signal SIGABRT
    • frame #0: 0x00007ffff69fcc37 libc.so.6__GI_raise(sig=6) + 55 at raise.c:56 frame #1: 0x00007ffff6a00028 libc.so.6__GI_abort + 328 at abort.c:89
      frame Add basic build scripts #2: 0x00007ffff69f5bf6 libc.so.6__assert_fail_base(fmt=<unavailable>, assertion=<unavailable>, file=<unavailable>, line=<unavailable>, function=<unavailable>) + 294 at assert.c:92 frame #3: 0x00007ffff69f5ca2 libc.so.6__GI___assert_fail(assertion=, file=, line=, function=) + 66 at assert.c:101
      frame Improve .gitignore #4: 0x000000000047b79c ConsoleApp2WKS::gc_heap::plan_phase(condemned_gen_number=0) + 9980 at gc.cpp:21793 frame #5: 0x000000000046fbf9 ConsoleApp2WKS::gc_heap::gc1() + 537 at gc.cpp:15265
      frame CoreLib doesn't implement Environment.GetEnvironmentVariable and friends #6: 0x000000000048cbe3 ConsoleApp2WKS::gc_heap::garbage_collect(n=0) + 3395 at gc.cpp:16853 frame #7: 0x0000000000461861 ConsoleApp2WKS::GCHeap::GarbageCollectGeneration(this=0x0000000000dfb1e0, gen=0, reason=reason_alloc_soh) + 2049 at gc.cpp:35188
      frame Add type system source files #8: 0x0000000000466a0f ConsoleApp2WKS::gc_heap::try_allocate_more_space(acontext=0x00007fffddfb55e8, size=56, gen_number=0) + 1503 at gc.cpp:13066 frame #9: 0x0000000000466b83 ConsoleApp2WKS::gc_heap::allocate_more_space(acontext=0x00007fffddfb55e8, size=56, alloc_generation_number=0) + 35 at gc.cpp:13362
      frame Add ILToNative source files #10: 0x00000000004e0295 ConsoleApp2WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) + 277 at gc.cpp:13393 frame #11: 0x00000000004e0180 ConsoleApp2WKS::GCHeap::Alloc(this=0x0000000000dfb1e0, context=0x00007fffddfb55e8, size=56, flags=0) + 96 at gc.cpp:34534
      frame Make simple test work #12: 0x0000000000413f63 ConsoleApp2::RhpGcAlloc(pEEType=0x0000000000d48088, uFlags=0, cbSize=56, pTransitionFrame=0x00007fffddfb4940) + 211 at gcrhenv.cpp:273 frame #13: 0x00000000005e31e0 ConsoleApp2RhpNewArrayRare + 58 at unixasmmacrosamd64.inc:650
      frame Implement simple casting #14: 0x00000000006074a5 ConsoleApp2String__FastAllocateString + 77 at String.cs:632 frame #15: 0x000000000063f870 ConsoleApp2String__CreateStringFromEncoding + 204 at String.cs:606
      frame Fix build warnings around NuGet dependencies #16: 0x000000000061f083 ConsoleApp2S_P_CoreLib_System_Text_Encoding__GetString + 215 at Encoding.cs:1084 frame #17: 0x0000000000619d3f ConsoleApp2String__CreateStringForSByteConstructor + 147 at String.cs:294
      frame Delete project.lock.json and add it to .gitignore instead #18: 0x000000000060deaf ConsoleApp2String__Ctor_3 + 183 at String.cs:256 frame #19: 0x00000000006406b1 ConsoleApp2ConsoleApp2_RecordUpdate__get_Value + 69 at Program.cs:58
      frame Move thread safe flag manipulation into a helper #20: 0x00000000006313b6 ConsoleApp2ConsoleApp2_ConsoleApp2_CallbackProcessors__OnFeedUpdated + 170 at Program.cs:101 frame #21: 0x000000000064040a ConsoleApp2 at ThrowHelper.cs:74 frame #22: 0x00007fffddfbfdbf libarc.soProcessCallbacks + 63
      frame Make ThreadSafeFlags public #23: 0x00007fffddfc1bf7 libarc.sovoid std::_Bind_simple<void (*())()>::_M_invoke<>(std::_Index_tuple<>) + 37 frame #24: 0x00007fffddfc1b51 libarc.sostd::_Bind_simple<void (*())()>::operator()() + 27
      frame Make build.cmd more like the build.cmd in CoreCLR repo #25: 0x00007fffddfc1aea libarc.sostd::thread::_Impl<std::_Bind_simple<void (*())()> >::_M_run() + 28 frame #26: 0x00007ffff7b87a60 libstdc++.so.6___lldb_unnamed_symbol191$$libstdc++.so.6 + 32
      frame Enhance virtual function resolution logic to be roughly correct. #27: 0x00007ffff6d93184 libpthread.so.0start_thread(arg=0x00007fffddfb5700) + 196 at pthread_create.c:312 frame #28: 0x00007ffff6ac037d libc.so.6__clone + 109 at clone.S:111

@jkotas
Copy link
Member Author

jkotas commented Nov 7, 2017

Thank you! We will take a look.

@dlewis-arcontech
Copy link

No problems, I have been trying out the test code and initially I was getting it to fail within the first 5-10 seconds every time (a good 20+ runs), I currently have a run going that has survived around a minute of running and no crash as of yet. Depending on your hardware (I'm running linux on a VM on windows on a pretty recent i7) it could be a little bit more difficult to recreate but hopefully the test will hit it the issue for you. I'll try change some of the parameters in the testing code see if I can increase the chances of it happening.

@dlewis-arcontech
Copy link

dlewis-arcontech commented Nov 7, 2017

After that one successful run that I killed in the end and after a bit of playing around, it seems what I sent is a pretty good attempt at recreating the issue, the next 5 runs all crashed in a few seconds.

@jkotas
Copy link
Member Author

jkotas commented Nov 9, 2017

Here is the minimal repro:

using System;
using System.Collections.Generic;

internal class Program
{
    static public void Main()
    {
        for (;;)
        {
            List<KeyValuePair<string, string>> fields = new List<KeyValuePair<string, string>>();

            for (int i = 0; i < 50000; i++)
            {
                fields.Add(new KeyValuePair<string,string>(new string('x', 10), new string('y', 10)));
            }
         }
     }
 }

@dlewis-arcontech
Copy link

dlewis-arcontech commented Nov 9, 2017

Hi @jkotas , I've tried the code here and it's crashing every time, so nothing to do with the PInvoke! I was wondering if it's too early in the project to ask about optimisation? I found the issue while running some of our system stress tests on the natively compiled output. If I reduce the iterations in stress test to a very small number so the gc issue isn't always hit, I'm seeing a big different in the time the test takes to complete for native against JIT, here are the results from 3 runs of the same test on various frameworks/operating systems:

Framework Run 1 (ms) Run 2 (ms) Run 3 (ms) Average (ms)
.Net Framework 2.0 978 976 969 974
.Net Framework 4.0 910 895 870 892
.Net Core 2.0 Window 7 882 883 869 878
.Net Core 2.0 Ubuntu 14.04 878 885 858 874
.Net CoreRT Ubuntu 14.04 2483 2464 2461 2469

The .Net code is calling a tcpip unmanaged protocol handling .dll/.so, then in .Net there are basically several dispatching threads and lots of collection/string based operations. I'm not sure if at this stage such a discrepancy is expected with CoreRT or not?

@MichalStrehovsky
Copy link
Member

MichalStrehovsky commented Nov 9, 2017

@dlewis-arcontech Could you open another issue for the perf findings? Some slowdown is expected because of known perf issues (such as #2640, #2393, or #2394), but providing a profile could help us rule out if this is known or unknown. Just to make sure - this is from a Release build of CoreRT, right? (The repo was built as Release, and your test project was published as Release and compiled with optimizations enabled.)

@dlewis-arcontech
Copy link

@MichalStrehovsky good spot this was indeed a debug build! I did note a difference initially when I started to run the tests in Release build but then I quickly moved to Debug build to help investigate the issue we are discussing here and didn't change back when I took the above results. I've re-run the last 2 columns above in Release build and managed to increase the iterations count in the tests before hitting any issues:

Framework Run 1 (ms) Run 2 (ms) Run 3 (ms) Average (ms)
.Net Core 2.0 Ubuntu 14.04 1470 1458 1457 1462
.Net CoreRT Ubuntu 14.04 1773 1792 1779 1781

CoreRT a little higher but not dramatically, could well be one of the issues you have already noted. How should I proceed, should I still create an issue?

@dlewis-arcontech
Copy link

I forgot to mention, just to check in case there are any optimizations I'm missing, to build CoreRT I am running:

.\build.sh Release

Then reference IlcPath to the release build, then my console app is using a dependent .Net dll, so as well as having in the main csproj:

<ItemGroup> <IlcReference = "xxx\dll.csproj" />

I also have:

<ItemGroup> <IlcReference Include="[release build path].dll />

Then I run:

dotnet build /t:LinkNative --configuration Release

Or for the JIT run above I use:

dotnet run Release

Regards.

@jkotas
Copy link
Member Author

jkotas commented Nov 9, 2017

Given stacktraces for some of the crashes reported by @dlewis-arcontech (e.g. #4676 (comment)), the perf bottlenect is likely in the un-optimized comparers (#763). We should have all parts required to implement the optimization now. @MichalStrehovsky feel free to take it if you have some time - I am unlikely to find time for it soon.

@sergiy-k
Copy link
Contributor

sergiy-k commented Nov 9, 2017

@dlewis-arcontech, thank you so much for all the data that you provided and your help with diagnosing the issues! :)

@dlewis-arcontech
Copy link

@sergiy-k my pleasure. As things progress with CoreRT and WebAssembly I'll be committing more of our resource to porting our .Net code, Unit/System tests to .Net Standard/.Net Core and getting it natively compiled and running under Windows and Linux. If I encounter any problems I'll raise an issue and I'm always happy to help to investigate.

@jkotas
Copy link
Member Author

jkotas commented Nov 10, 2017

@dlewis-arcontech #4907 should fix the GC crash.

@MichalStrehovsky
Copy link
Member

@MichalStrehovsky feel free to take it if you have some time - I am unlikely to find time for it soon.

OK, I'll have a look

@dlewis-arcontech
Copy link

I have updated and re-run the problem test and can confirm it now passes, thanks! I'll move on to attempting to run the entire test suite.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants