Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source generator POC #65

Closed
wants to merge 10 commits into from
Closed

Source generator POC #65

wants to merge 10 commits into from

Conversation

gasparnagy
Copy link
Contributor

@gasparnagy gasparnagy commented Mar 3, 2024

Two projects have been added:

  1. Reqnroll.FeatureSourceGenerator.NUnit - produces a NuGet package containing the source generator.
  2. Reqnroll.FeatureSourceGenerator.NUnit.Example - consumes the NuGet package to host the source generator in a sample project.

The following functions are demonstrated:

  • Invoking the Gherkin parser to process feature files into generated source.
  • Gherkin syntax errors added to compiler output as errors.
  • NUnit test-fixture generation based on scenario names.

Not demonstrated:

  • Unit testing the generator using Microsoft's Roslyn testing libraries.
  • Hooking into the Reqnroll infrastructure to execute scenario steps.

Created by @Code-Grump related to the discussion: https://github.com/orgs/reqnroll/discussions/11

@Code-Grump
Copy link
Contributor

You can see the source contributed to the build by the analyser in Visual Studio:
image

@Code-Grump
Copy link
Contributor

I've dropped in a simple example of a behavioural test for the source generator. It invokes the Roslyn source-generation system and updates a compilation with the generated sources; this is what happens within the build process without the overhead of actually performing the build.

Right now the test just checks the output was produced and the code generated matches a simple string. We could opt to compare normalized syntax trees instead, or do a full compilation and examine or execute the produced assembly.

@Code-Grump
Copy link
Contributor

image
Totally forgot to show how Gherkin syntax errors appear in VS.

@Code-Grump
Copy link
Contributor

A few thoughts from doing this work:

  1. Due to how they're loaded into Roslyn, we ideally want a single package that contains all the analysers and code-generators with as much self-contained dependencies as possible. Quite reasonable for us to ship these components in the primary Reqnroll package. The extreme version of this approach means we'd be baking-in all the supported test-runners (xunit, NUnit, MSTest) without a good extension-point for additional runners. Maybe that's ok, but it'd be a change from the current open model.
  2. Syntax-checking and other analysers might get in the way of existing tooling doing the same work. There are a few mechanism for doing this and we might want to provide documentation to help guide users.
  3. Loading plugins still feels like a tricky problem. We have a lot of power available to us - we could compile and run code defined within the project itself as part of the build process, if we so choose (could be useful for configuration-as-code approaches) - the question will be around how users are expected to package their plug-ins and especially how their dependencies are managed. I haven't seen a solution to this problem that doesn't also feel prone to lots of difficult-to-diagnose failures.

@gasparnagy
Copy link
Contributor Author

Due to how they're loaded into Roslyn, we ideally want a single package that contains all the analysers and code-generators with as much self-contained dependencies as possible.

@Code-Grump Isn't there a .NET framework mismatch then? If I compile my test assembly to .NET 6 on .NET 8 SDK, will the generator be running in .NET 6 or .NET 8? So with other words: should the generator's .NET version match the target framework's .NET version or the SDK's version?

@gasparnagy
Copy link
Contributor Author

Syntax-checking and other analysers might get in the way of existing tooling doing the same work.

@Code-Grump Do you have an example in your mind about what other analyzers might cause a problem here?

@Code-Grump
Copy link
Contributor

Isn't there a .NET framework mismatch then? If I compile my test assembly to .NET 6 on .NET 8 SDK, will the generator be running in .NET 6 or .NET 8? So with other words: should the generator's .NET version match the target framework's .NET version or the SDK's version?

The analyzers run in the SDK's runtime. If you're building using a .NET 8 SDK, I want to believe your analyzer runs on the .NET 8 runtime. If you target .NET 6 but build on the .NET 8 SDK, you will still be on the .NET 8 runtime.

This is part of the motivation behind why analyzers must target .NET Standard 2.0.

Syntax-checking and other analysers might get in the way of existing tooling doing the same work.

No concrete examples, but it seems plausible that an IDE could have some Gherkin language support which emits its own syntax errors and would "double-up" on errors in the output. It's worth keeping in mind that analyzers run in just about any .NET context, either as a build-time tool, or during the live editing process, so there are chances of "bumping into" other tooling we may not be immediately aware of.

@Code-Grump
Copy link
Contributor

Next steps:

  • Going to look at replicating the output of the current generator for NUnit+CSharp projects.
  • Will add add some tests to validate the behaviour.
  • Will begin looking at ways to organise the codebase to support multiple runners and languages.
  • Will be keeping in mind how we might incorporate customisations like additional attributes and external data sources

@gasparnagy
Copy link
Contributor Author

I don't know whether this is the right time to drop this in, so feel free to ignore, but I would like to change the generator in a way that everything should be in a single method. The current implementation depends (?) on the tear-down events of the test runners and the "ITestRunner" class level instance field, but this causes problems with parallel execution, so the best would be to put everything into a single test method and use a local variable for the testRunner and a try-finally for the tear-down logic.

If this causes too much complication right now, feel free to ignore it for the POC.

@Code-Grump
Copy link
Contributor

@gasparnagy The only complication would be determining the desired output. If you can give me an desired output for an example input, I can drop it onto a test and make it happen.

@gasparnagy
Copy link
Contributor Author

@Code-Grump I will make a sample tomorrow.

@gasparnagy
Copy link
Contributor Author

gasparnagy commented Mar 11, 2024

I made a hard coded sample. I reviewed and added comments so that it should be easier to understand. It is for MsTest, but I tried to mark the parts that will be different for other test runners. As you will see, I used all type reference with global::. This is required to avoid conflicts with types users made with same name. I have also left in the #line pragmas that are needed for scenario debugging (can be ignored for the POC).

Tested it and it works (*).

The feature file was:

#language: en
@featureTag1
Feature: Calculator

@mytag
Scenario: Add two numbers
	Given the first number is 50
	And the second number is 70
	When the two numbers are added
	Then the result should be 120

The class that we should generate is:

namespace MyProject.Features;

[global::Microsoft.VisualStudio.TestTools.UnitTesting.TestClass]
public class HardCodedCalculatorFeature
{
    // start: MSTest Specific part

    private global::Microsoft.VisualStudio.TestTools.UnitTesting.TestContext? _testContext;

    public virtual global::Microsoft.VisualStudio.TestTools.UnitTesting.TestContext? TestContext
    {
        get
        {
            return this._testContext;
        }
        set
        {
            this._testContext = value;
        }
    }

    // end: MSTest Specific part

    // start: shared service method & consts, NO STATE!

#line 1 "Calculator.feature"
#line hidden

    private static string[] featureTags = new string[] { "featureTag1" };

    private static global::Reqnroll.FeatureInfo featureInfo = new global::Reqnroll.FeatureInfo(new System.Globalization.CultureInfo("en"), "Features", "Calculator", null, global::Reqnroll.ProgrammingLanguage.CSharp, featureTags);

    public async global::System.Threading.Tasks.Task ScenarioInitialize(global::Reqnroll.ITestRunner testRunner, global::Reqnroll.ScenarioInfo scenarioInfo)
    {
        // handle feature initialization
        if (testRunner.FeatureContext == null || !object.ReferenceEquals(testRunner.FeatureContext.FeatureInfo, featureInfo))
            await testRunner.OnFeatureStartAsync(featureInfo);

        // handle scenario initialization
        testRunner.OnScenarioInitialize(scenarioInfo);

        // MsTest specific customization:
        testRunner.ScenarioContext.ScenarioContainer.RegisterInstanceAs(_testContext);
    }

    // end: shared service method & consts, NO STATE!


    // start: MSTest Specific test method decoration
    [global::Microsoft.VisualStudio.TestTools.UnitTesting.TestMethod()]
    [global::Microsoft.VisualStudio.TestTools.UnitTesting.Description("Add two numbers")]
    [global::Microsoft.VisualStudio.TestTools.UnitTesting.TestProperty("FeatureTitle", "Calculator")]
    [global::Microsoft.VisualStudio.TestTools.UnitTesting.TestCategory("featureTag1")]
    [global::Microsoft.VisualStudio.TestTools.UnitTesting.TestCategory("mytag")]
    // end: MSTest Specific test method decoration
    public async Task AddTwoNumbers()
    {
        // getting test runner
        string testWorkerId = global::System.Threading.Thread.CurrentThread.ManagedThreadId.ToString(); // this might be different with other test runners
        var testRunner = global::Reqnroll.TestRunnerManager.GetTestRunnerForAssembly(null, testWorkerId);

        // start: calculate ScenarioInfo
        string[] tagsOfScenario = new string[] { "mytag" };
        var argumentsOfScenario = new global::System.Collections.Specialized.OrderedDictionary(); // needed for scenario outlines
        var inheritedTags = featureTags; // will be more complex if there are rules
        var scenarioInfo = new global::Reqnroll.ScenarioInfo("Add two numbers", null, tagsOfScenario, argumentsOfScenario, inheritedTags);
        // end: calculate ScenarioInfo

        try
        {
#line 6
            await ScenarioInitialize(testRunner, scenarioInfo);
#line hidden

            if (global::Reqnroll.TagHelper.ContainsIgnoreTag(scenarioInfo.CombinedTags))
            {
                testRunner.SkipScenario();
            }
            else
            {
                await testRunner.OnScenarioStartAsync();

                // start: invocation of scenario steps
#line 7
                await testRunner.GivenAsync("the first number is 50", null, null, "Given ");
#line hidden
#line 8
                await testRunner.AndAsync("the second number is 70", null, null, "And ");
#line hidden
#line 9
                await testRunner.WhenAsync("the two numbers are added", null, null, "When ");
#line hidden
#line 10
                await testRunner.ThenAsync("the result should be 120", null, null, "Then ");
#line hidden
                // end: invocation of scenario steps
            }

            // finishing the scenario
            await testRunner.CollectScenarioErrorsAsync();
        }
        finally
        {
            await testRunner.OnScenarioEndAsync();
        }
    }
}

(*) With this approach the very last "AfterFeature" hook is not invoked, but we can work this out in Reqnroll itself. We can ignore it for now.

@Code-Grump
Copy link
Contributor

I've added an implementation of an MSTest generator using the example provided as a specification. I made a few alterations to target text, but kept it as close as possible, including comments.

In this implementation, I focused on creating a strategy for test-framework specific and language-specific classes. After several attempts, this is the strategy I felt worked best:

  • A base class for generating the syntax of a test-fixture in C#
  • Test-framework specific implementations inherit from the base class and override accordingly

Some things I learned doing this

Testing a code-generator's output as a string is a bit more painful than I had imagined.

  • There's not a great deal of support for highlighting the discrepancies between a generated source and the expected output. I frequently took the strings in the test output and dropped them into a diffing tool to get better help. The next version of Fluent Assertions promises to help and provide a diff-like message.
  • Whitespace "errors" are exceedingly frustrating to track down. Doubly so because generally the whitespace has no bearing on whether the code is proper.

We don't take advantage of MSTest specific details when creating our generated code.
MSTest creates a new instance of a class in every test run, so we could use constructors and IDisposable to implement before- and after-scenario hooks. There are other assembly and class-scoped attributes available for before- and after-test-run and -feature hooks that would be easy to build into.

When I was looking at implementing my own Roslyn-based Gherkin test generator I was interested in leveraging xunit's similar features. It felt like leaning into the framework was the right thing to do.

@Code-Grump
Copy link
Contributor

Forgot to mention: I put in a test-generator selector mechanism that looks for test framework assemblies to determine which framework is being used. We could reasonably put all the code generation and framework adaptation into the generator and package Reqnroll as a single NuGet package.

@gasparnagy
Copy link
Contributor Author

Thank you, @Code-Grump for the improvements. Sounds promising.

What should be the next steps? I think feature-wise the POC is complete. Do you have anything else you would like to experiment with before we start planning how to integrate this to the product?

We don't take advantage of MSTest specific details when creating our generated code.

That was intentional. (The current prod version of Reqnroll does use the fw specific features.) I would like to rely to the fw specific stuff as little as possible for the following reasons:

  • make the generator simpler (less difference between test fw implementations)
  • make the implementation more robust for test framework changes (e.g. if a later version of MsTest decides to reuse test class instances)
  • no need to hassle with storing the test runner to an instance field
  • more control on scenario preparation/execution/teardown (e.g. forcefully fail a test during preparation - support for an own @ignore-like tag; dry-run mode), but I can also imagine an option, where the entire method will be just a single call to Reqnroll passing in the scenario as Gherkin.

Of course if we find any benefit of using the test fw infrastructure for handling these, we can revert it, but as long this is only a try-finally, I think we can manage it.

@Code-Grump
Copy link
Contributor

What should be the next steps? I think feature-wise the POC is complete. Do you have anything else you would like to experiment with before we start planning how to integrate this to the product?

I'm not happy with the testing model at present. I've experimented with comparing output source to an expected string and interrogating the C# syntax tree more directly. In both cases it felt like a ton of fiddly work to prove the output met a very specific definition of "expected". However, I don't have any other good ideas for a "unit testing" approach that only considers the raw output of the generator.

That makes me feel like there's no good unit test for these generators without compiling and executing the generated output. This is something we could do, but it's starts getting closer to the new suite of spec tests that I can see being assembled.

So while I'm happy to try and put in some work to build a test of this kind, I'd like to know if it's suitable within the broader scope of the project and worth the effort.

Of course if we find any benefit of using the test fw infrastructure for handling these, we can revert it, but as long this is only a try-finally, I think we can manage it.

I don't have the depth of knowledge on this kind of integration that you do, however my instinct is that we could hide some of the "magic instance resolving" required for methods like global::Reqnroll.TestRunnerManager.GetTestRunnerForAssembly by letting the test-framework do all the "managing" and we would tie our object lifecycles into the test-framework lifecycles.

For example, the initialize and destroy methods:

[TestClass]
public class ReqnrolMSTestInitialize
{
    [AssemblyInitialize]
    public static void AssemblyInitialize(TestContext context)
    {
        // Bootstrap Reqnroll here and add to the TestContext to access within the test framework.
    }

    [AssemblyCleanup]
    public static void AssemblyCleanup()
    {
        // Any teardown we need to do.
    }
}

One of the long-standing issue with SpecFlow was its inability to fully parallelise execution. Running test methods in parallel (on separate test class instances) is the default for xunit and MSTest, and NUnit can be made to do so with a pair of attributes. My gut feel is that letting test-framework control the lifecycle of Reqnroll objects affords more guarantees about execution and state, and more confident integration with other test-related libraries than trying to run "alongside" the test-framework lifecycle and looping into its events.

The Roslyn generator has to be a clean break from the existing generator and affords the opportunity to re-think how we should integrate with test frameworks. I feel confident I can produce any output that we want to see, so the big questions now should be about what we want to see!

@ajeckmans
Copy link
Contributor

ajeckmans commented Apr 8, 2024

That makes me feel like there's no good unit test for these generators without compiling and executing the generated output. This is something we could do, but it's starts getting closer to the new suite of spec tests that I can see being assembled.

There is no need to test everything in one go. If you only verify that the code generated is syntactically valid and has some marker stuff such as the correct class names etc. (e.g. by passing it to Roslyn and checking the AST), then that is a valuable fast test on its own (and probably less brittle than a full source text comparison). A full-blown test where you're actually generating and executing the tests generated from a feature file has its own merits, but can be done as an addition on top of the other tests, IMO.

Also like you said whitespace has little meaning and can be very frustrating. Unless we're accidentally adding significant(ly more) whitespace it does not matter and we should avoid failing on the test.

Comment on lines +39 to +61
diagnostics.Should().BeEmpty();

generatedCompilation.SyntaxTrees.Should().HaveCount(1);
var generatedSyntaxTree = generatedCompilation.SyntaxTrees.Single().Should().BeAssignableTo<CSharpSyntaxTree>().Subject!;

generatedSyntaxTree.GetDiagnostics().Should().BeEmpty();

var generatedSyntaxRoot = (CompilationUnitSyntax)generatedSyntaxTree.GetRoot();
generatedSyntaxRoot.ChildNodes().Should().HaveCount(1);

var generatedNamespace = generatedSyntaxRoot.ChildNodes().Single().Should().BeAssignableTo<NamespaceDeclarationSyntax>().Subject;
generatedNamespace.Name.ToString().Should().Be("test");

generatedNamespace.Members.Should().HaveCount(1);
var generatedClass = generatedNamespace.Members.Single().Should().BeAssignableTo<ClassDeclarationSyntax>().Subject;

generatedClass.AttributeLists.Should().HaveCount(1);
var generatedClassAttributes = generatedClass.AttributeLists.Single().Attributes;

generatedClassAttributes.Should().HaveCount(1);
var generatedClassAttribute = generatedClassAttributes.Single();

generatedClassAttribute.Name.ToString().Should().Be("global::Microsoft.VisualStudio.TestTools.UnitTesting.TestClass");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this required to assert:

  • There were no diagnostic issues raised during generation
  • The generated file contains a class in a "test" namespace
  • The class is decorated with the MSTest attribute

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we wanted to check for other attributes, or the test methods were present with their attributes, it's probably triple the line-count.

I do prefer this to checking an exact string output, but it's significantly harder to read and I think presents an immediate maintenance headache.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recently discovered the Subject / Which properties of fluent assertions that can make this code slightly better:

generatedClass.AttributeList.Should().ContainSingle()
  .Which.Should().ContainSingle()
  .Which.Name.ToString().Should().Be("global::Microsoft.VisualStudio.TestTools.UnitTesting.TestClass");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should().ContainSingle() does help reduce the bloat. This could just be a case of needing some syntactic sugar to make the statements less hurtful.

var generatedClass = generatedSyntaxRoot.Should().ContainNamespaceDeclaration("test")
    .Which.Should().ContainClassDeclaration("Calculator");

generatedClass.Should().HaveAttribute("global::Microsoft.VisualStudio.TestTools.UnitTesting.TestClass");

@gasparnagy
Copy link
Contributor Author

I don't have the depth of knowledge on this kind of integration that you do, however my instinct is that we could hide some of the "magic instance resolving" required for methods like global::Reqnroll.TestRunnerManager.GetTestRunnerForAssembly by letting the test-framework do all the "managing" and we would tie our object lifecycles into the test-framework lifecycles.

This itself has no direct relation to the parallel issues and it cannot be replaced by the fw infrastructure unfortunately.

Let me briefly explain, because this might be relevant and of course you might be able to spot out some logical errors in this.

  • Processing all the Given/When/Then and other attributes is time consuming, therefore Reqnroll ensures that this is processed only once and then shared with all tests. This information is stored in a BindingRegistry class instance. The necessary locking mechanism is in TestRunnerManager.CreateTestRunner. This worked fine so far, I'm not aware of any problems with this.
  • Reqnroll also has a concept of "test thread" that is basically a logical chain of scenario executions that are not running in parallel to each-other. This is primarily for allowing features, like before/after feature & to allow storing expensive resources in TestThreadContext (e.g. a browser driver). This is purely for optimal usage, in principle we could create a new test-thread for every single scenario execution, but in that case the before/after feature would be called for every scenario and you would not be able to cache the expensive resource (you would need to start a new browser for every scenario).
  • The implementation of the "test thread" concept is difficult, because we need to know if the scenarios are executed in the same chain, but for example NUnit provides this info (TestContext.CurrentContext.WorkerId) and for MsTest and xUnit we also found a suitable way to know on which worker thread we are currently on. (In the POC you have now Thread.CurrentThread.ManagedThreadId that works with MsTest, but it is different for all other.) This is difficult and has some risks, but regarding my experience it is quite stable and haven't caused troubles.
  • In principle, Reqnroll would be able to run tests from multiple test assemblies within the same process. This feature is not used by the usual test runners, because they always create a new test execution process but ReSharper has (had?) an option to share the execution process. Obviously each test assembly has its own step definitions and other bindings, so it has to have its own binding registry as well.
  • The TestRunner class is basically an abstraction to access the Reqnroll resources of a test execution. There is one and only one TestRunner instance created for each test assembly and test thread. This is ensured by a locking mechanism implemented by ConcurrentDictionary objects that are stored in the _testRunnerManagerRegistry static and _testRunnerRegistry instance fields of the TestRunnerManager. It is a shared resource, but shared only within the same test-thread, where scenario executions are chained, therefore you can safely call it's methods, like ScenarioStartAsync or GivenAsync, they will maintain the right sequence. Again, I haven't seen problems with this so far.
  • The TestRunnerManager.GetTestRunnerForAssembly method has two parameters: an assembly, that defaults to the calling assembly and a string (testWorkerId) that identifies the logical test thread. It returns the test runner has been created for this combination if it has been created already otherwise creates & registers it.

The parallel issues that we would like to get rid of (and also enable test-level parallelization) are not caused by these, but mainly by two factors:

  1. Originally we (I 😄) thought that since the tests within the same test thread are not running in parallel, therefore anything we do there (e.g. resolving objects from containers) does not have to be thread safe. Unfortunately this turned out to be wrong, because tests might explicitly (worker threads) or implicitly (async) create new threads and those might also want to access the Reqnroll infrastructure. This has been improved a lot, but there might be non-thread safe operations still and unfortunately they only show up if someone uses the Reqnroll infrastructure from threads in a specific way. Too bad, but the test framework infrastructure cannot help in this.
  2. We currently store the test runner in an instance field, but it seems that in some circumstances the test frameworks manage the test class instances differently and this can cause issues. This might vary for the different test frameworks and also it's versions, so hard to get a grip of all combinations. By keeping the the test runner as a local variable we can get rid of this risk. But if we use the test class ctor or some decorated "test initialize" method, we need to pass the test runner to the actual test, so we would be forced to use class level fields for this (as we do currently in the prod version).

@gasparnagy
Copy link
Contributor Author

// Bootstrap Reqnroll here and add to the TestContext to access within the test framework.

Regarding my knowledge the MsTest TestContext is unique for every test execution (it has properties, like TestName) so it cannot be initialized in an [AssemblyInitialize]. (We do use the [AssemblyCleanup] already, it is added to the project by the test fw plugins (see here). This is also something we need to incorporate to our new generator solution, but since it is a single file, this will be easy - I don't think we need to deal with it in the POC.)

@Code-Grump
Copy link
Contributor

The example provided by MSTest includes a TestContext argument. Importantly it contains a "Properties" bag that can be leveraged to route information into all tests.

@Code-Grump
Copy link
Contributor

I am thinking about execution in terms of scopes, analogous to how a DI container scopes resources: if you were to define a singleton at the Test Run scope, that same instance is shared across all child scopes. Information in child scopes is isolated from siblings, but available to descendants. At a scenario scope, instances are unique to that scenario, not shared by anything else.
image

If this mental model is correct, then the trick is making the Reqnroll runtime match those scopes, when some/all of it is tied to the test-framework's lifecycles. In my head, with my knowledge of the three test frameworks supported, that seems straight-forward: they already provide hooks that match the lifetime of these scopes, or whole context-management systems.

The only sticky part I sense (because I haven't attempted this in every framework) is propagating our scopes where they need to be - essentially what the "test thread" concept is trying to solve. I think it could be achieved via the test frameworks' own mechanisms. xunit (for instance) provides a particularly good mechanism for supplying context to all test methods in a class, or to all tests in an assembly: https://xunit.net/docs/shared-context

I am proposing only a series of thin shims that map these scopes to the test frameworks; execution within the Reqnroll runtime would just continue as normal (assuming my mental model holds up.)

@gasparnagy
Copy link
Contributor Author

The example provided by MSTest includes a TestContext argument. Importantly it contains a "Properties" bag that can be leveraged to route information into all tests.

Basically that could be used to replace the static field we have in TestRunnerManager, but that won't improve the situation much, I think.

I am thinking about execution in terms of scopes, analogous to how a DI container scopes resources: if you were to define a singleton at the Test Run scope, that same instance is shared across all child scopes. [...] If this mental model is correct [...]

This is how Reqnroll works. We have a container hierarchy, quite much how you described. But there is the "test thread" container in between the test run (called "global") and the feature one. See https://github.com/reqnroll/Reqnroll/blob/main/docs/extend/available-containers.md

In my head, with my knowledge of the three test frameworks supported, that seems straight-forward: they already provide hooks that match the lifetime of these scopes, or whole context-management systems.

Because of the test thread container, I don't see such an easy match. The shared context stuff you mention for xUnit is a pre-defined grouping of tests that are dependent on a resource. But the test thread is an ad-hoc grouping of test executions that happen to run in the same logical thread. Or maybe I got your thoughts wrong.

@Code-Grump
Copy link
Contributor

You are correct, I think the "Test Thread" container is unique to Reqnroll's architecture, not easily married to any of the test frameworks.

Since the advent of Task in .Net we've been encouraged to not think about threads and thread-affinity. I appreciate this has become more of an "executor context", than being specific to a thread. I suppose I'm posing the question: does it make sense to have this container going forward? The easiest path to concurrency is to have a model that is largely ignorant of its execution specifics - either you're in "local" state and can assume no concurrency, or your in shared state and you must assume concurrency.

These thoughts might be too radical; I have been experimenting about how to make Gherkin-to-C# kind of thing work in xunit for a while, and would consider affinity to an execution context to be something to generally avoid. I'm prepared to believe it's useful in Reqnroll, but it goes against my instincts and experience.

@gasparnagy
Copy link
Contributor Author

gasparnagy commented Apr 8, 2024

I appreciate this has become more of an "executor context", than being specific to a thread.

I absolutely agree. Maybe the name of the "test thread" concept is not ideal. That can also be seen more as an "execution context" and it is not specific to a particular thread.

[...] would consider affinity to an execution context to be something to generally avoid. I'm prepared to believe it's useful in Reqnroll, but it goes against my instincts and experience.

I agree. Test automation, however (especially integration testing that BDD is naturally connected with) is in many cases game of trying to find the right compromises. I have not seen many projects where all the dependencies were easily testable. In contrast in many cases we had some dependencies that needed special smart solutions for testing. The test thread concept is just one option to be used in this tricky situations but that should be avoided generally (similarly to the global or the feature context, or static fields). So this is not necessarily contradicting. Whether this is really the responsibility of Reqnroll to give help in this is a valid question. Maybe not, but we have it already and it has been useful in many situations. (E.g. it makes much more sense than the feature container and the before/after feature concept, that is totally senseless in a parallel & random order execution.) I liked your comment on the step-retry plugin about Polly. That is a nice example of accepting the need but not taking the responsibility with Reqnroll. On the other hand I have also seen teams that does not have such a broad overview of the possibilities and they are overwhelmed with the number of different libraries and tools that they need to combine to solve their tasks. It's a tough decision in either way.

I think in the current situation, removing it (and handling all related support cases) would be more costly then keeping it, even in long term. So let's keep it for now. Once we have the new gen model and better testing infrastructure we can reopen the next big pain point of parallel execution problems. If we will find that the test thread is part of the root causes, we can think it over once again.

In any case, the new gen model will be simpler, cleaner, better tested and better testable then the current solution is (👏), so any changes will be easier to make. Let's go step by step.

@Code-Grump
Copy link
Contributor

Quite happy to proceed this way. As I said, my perspective has been influenced by trying to build something considerably more minimalist and leveraging test framework features. I appreciate you taking the time to consider my opinions, despite my lack of experience with Reqnroll's inner workings.

To take this from a prototype to something shippable, I think there are a couple of important decisions to make, because they're expensive to pivot on later:

  1. Should we have one generator for all test frameworks and no longer have test-framework packages? Essentially: add Reqnroll to your test project and it just picks up the test framework and emits the desired code.
  2. What test strategy should we follow for code-generation? Are we comfortable with performing a limited set of key checks on emitted code and relying on other tests to exercise the generated code fully? Having a set of guidelines here would be beneficial for all involved.
  3. Are we happy with "string concatenation" (appending strings to a buffer) as the primary means of composing code? Are there any alternatives we would like to consider before investing in a strategy? We could, for example, leverage templates, either a templating language or just some string interpolation. We might get easier-to-maintain code, we might hurt our execution speed.
  4. How do we want to handle config? The way Roslyn works, properties from project files and .editorconfig files are made available to the compilation (when present.) We can read any values we like from here "for free". If we want to include our own config file in the mix, (especially if we need it to be extensible) that will require some extra effort and strategy to avoid performance hits whilst also enabling extension.

And although this is something we should be comfortable to pivot on, it would be good to have a clear target up-front: what does the scope of generator need to include for its initial version? A concrete list of what it has to achieve to be "good enough" would be ideal.

@Code-Grump
Copy link
Contributor

My thoughts on these questions.

  1. Should we have one generator for all test frameworks and no longer have test-framework packages?

It's a big change, but it seems feasible. It creates a new problem: determining which test-framework is present. I think it would make sense to have a configuration option which allows manual selection of a test-framework, overriding any logic.

It makes shipping Reqnroll and installing it into a project significantly simpler than at present. The generator can be shipped in the same NuGet package as the runtime - it has no actual dependencies on test-framework libraries and only requires them to be present in the final compilation. We end up with one package that does all the things required for the vanilla Reqnroll experience.

  1. What test strategy should we follow for code-generation?

My gut feel is to want to test the behaviour of the generator: to produce code that invokes the Reqnroll runtime in the expected way which is decorated correctly to run within the specific test-framework. I would create tests to prove:

  • classes are generated with the expected names and attributes
  • methods are created with the expected names and attributes
  • invoking the methods gets the Reqnroll runtime and feeds it the steps to execute

Some of this will overlap with the integration tests, however these feel like useful unit tests, because discovering a failure in an integration test will be much harder to diagnose.

  1. Are we happy with "string concatenation" (appending strings to a buffer) as the primary means of composing code?

Using a series of Append calls to a string builder is probably the most performant way to build code, both in terms of memory-use and execution time, but it's definitely not the easiest to read and maintain. The more dynamic the output needs to be, the harder those are to read.

I've tried to break down the composition process into small chunks and I think it's made it pretty good, so I'd be happy to live with this model, however there are templating languages that could be "fast enough" and provide a much better editing experience. If anybody had experience with getting high performance out of a templating engine, I think it should be a consideration.

  1. How do we want to handle config?

If we want to use the Reqnroll config files for code-generation, we'll need to make them an input to the compilation and figure out a way to efficiently represent the relevant content within the generator. I'm a little worried about a need for extensibility in this space - ideally we'd read from a config file once and provide the structured data to everything that needs to know about it. This is pretty easy when you can represent the output as an object-graph, but if we need to be able to provide values without knowing their structure, we might have to expose some kind of IConfiguration variant that gives a normalized view of all the configuration systems as a single model correctly works with the caching of the generator system.

Alternatively, we could move all generation concerns to the built-in configuration mechanisms: .editorconfig and properties from project files or passed in from the command-line. This requires a lot less code and infrastructure to support, but it does represent a shift from current thinking.

What does the scope of generator need to include for its initial version?

I think this should initially be:

  • Creation of source for MSTest, xunit and NUnit runtimes (we can define a target syntax for each)
  • An ability to add additional attributes to all generated methods from configuration
  • An ability to add additional attributes to generated methods with specific tags

@gasparnagy
Copy link
Contributor Author

My comments:

  1. Should we have one generator for all test frameworks and no longer have test-framework packages?

I would suggest a mixed approach: move all code to the main package as you suggested, but keep the test fw specific packages that would just list the dependencies (Reqnroll and the test fw) and would configure the new infra for the selected the test fw (e.g. by setting an MsBuild prop that we could read out in the generator). (In this way they would be optional, as anyone could set the prop manually as well.)

The benefit of this would be that it would make the configuration easier for the simple cases and all the documentation / videos etc that we have on Reqnroll (and SpecFlow) setup would still remain valid.

  1. What test strategy should we follow for code-generation?

For verifying that the generated stuff really calls Reqnroll correctly, we already have the SystemTests project (soon finished, but already tests a lot). Those tests will remain the same (as they are end-to-end anyway), so they will server as regression tests as well. So we don't need to worry about that part.

What we need to include here is unit tests, so basically from our list the first two:

  • classes are generated with the expected names and attributes
  • methods are created with the expected names and attributes

For these unit tests I would probably make assertions on the dom as you shown earlier, not a string assertion.

You are right that the integration tests (in our case the SystemTests) are harder to diagnose, but I see no other option to test the actual behavior, because you need to get the test fw involved, that you can only do with an e2e test.

  1. Are we happy with "string concatenation" (appending strings to a buffer) as the primary means of composing code?

As this generation will not be open directly to plugins anymore, we have more freedom here. So I would just follow some evolutionary design. As long as "string concatenation" is good, we use it, if we run into troubles with it, we will refactor. It will be our internal stuff anyway. So as you said "live with this model" is fine for me.

I don't have much experience with such low-level perf improvements. I don't think a templating engine would improve much and at the end, the code gen was already part of the build process (in a slightly different way) and it was not catastrophically slow, our new solution must be already significantly faster by not using code dom. So I would not deal with perf opt yet.

  1. How do we want to handle config?

I don't think we have any extensibility options in reqnroll.json config, so reading it up and store the deserialized version should be fine.

Maybe for selecting the test fw we should use MsBuild props and not the reqnroll.config (see above)

  1. What does the scope of generator need to include for its initial version?

I agree to your suggestion, depending on the rollout strategy (see below), we can make it even simpler by not even supporting additional attributes.

  1. Rollout strategy

I think the best & safest would be if we would be able to make a two-phase rollout: we somehow keep the old generation strategy (for a limited time) but also allow to use the new one. (Maybe we could have the new as the default and keep the old as a backup.)

This could be achieved in a way that the Reqnroll.Tools.MsBuild.Generation package would contain the old stuff, but it would not be included as a dependency of the Reqnroll package by default anymore.

If someone would like to go back to the old model, they can manually add the Reqnroll.Tools.MsBuild.Generation package as dependency. The new one would check some main switch MsBuild property (that would be set by Reqnroll.Tools.MsBuild.Generation) and not do anything if the main switch is on.

@ajeckmans
Copy link
Contributor

ajeckmans commented Apr 10, 2024

I agree to your suggestion, depending on the rollout strategy (see below), we can make it even simpler by not even supporting additional attributes.

The Verify plugin as of now requires the attribute because it depends on an old version of Verify and xunit. If we were to update the dependencies to the latest versions that requirement drops as well.

@gasparnagy
Copy link
Contributor Author

If we were to update the dependencies to the latest versions that requirement drops as well.

Good news. We can upgrade then to the latest Verify once we get closer to roll out of the new gen.

@Code-Grump
Copy link
Contributor

Thanks for the input, everybody.

You are right that the integration tests (in our case the SystemTests) are harder to diagnose, but I see no other option to test the actual behavior, because you need to get the test fw involved, that you can only do with an e2e test.

I can see a path to making a small test that mocks the Reqnroll runtime so we can verify we make the proper sequence of calls with the right values. We'd just need the generated code to reference a "root" that we can mock during a test.

  1. Generate code
  2. Compile code to assembly and load into running test
  3. Replace Reqnroll runtime root with a mock
  4. Invoke generated test method
  5. Verify mock runtime was invoked correctly

This would require a small change to the runtime so there was some internal root we could mock, and then small changes to the code to reference the new root. I could add this kind of test to the POC if we want to get an idea of the effort required or if we like the test style, but I'm quite prepared to put this on the backburner if we'd rather lean on the system test for now and focus on getting a 1.0 version done. 😁

The Verify plugin as of now requires the attribute because it depends on an old version of Verify and xunit. If we were to update the dependencies to the latest versions that requirement drops as well.

If we're happy to, dropping the requirements around custom attributes would considerably shorten the path to a release version; they can always be added later.

@gasparnagy
Copy link
Contributor Author

I can see a path to making a small test that mocks the Reqnroll runtime so we can verify we make the proper sequence of calls with the right values. We'd just need the generated code to reference a "root" that we can mock during a test.

Yes, that could be done indeed. Good point. But that would not involve the test fw so it would only be able to test half of it. And my feeling is that it would require quite a lot of efforts. So I would let that go for now.

We will need something similar in my envisioned renovation plan for the "Specs" project, maybe together with that we can re-discuss the idea.

@Code-Grump
Copy link
Contributor

I've started on a robust implementation based on our POC. I have opened this issue in the Gherkin project that would be useful to have cucumber/gherkin#246 @gasparnagy

@gasparnagy
Copy link
Contributor Author

I have opened this issue in the Gherkin project that would be useful to have.

Makes sense. I commented it there.

@Code-Grump Code-Grump closed this Jul 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants