Test data as code feature concept #2724

Closed
dsankouski opened this issue Feb 4, 2022 · 10 comments · Fixed by #2739

Comments

@dsankouski
Contributor

dsankouski commented Feb 4, 2022

TestNG Version

Note: only the latest version is supported
7.5.0

The point of this concept is to write test data files as code in Kotlin
(or any other language). This makes it possible to put dynamic data
(i.e. data generated on each launch, for example today's date or a random string)
into data files.

The problem:

I was involved in a project where test data was stored in YAML files.
The team parsed those YAML files and used the data in tests.
The problem was that they needed dynamic values,
like today's date or a random sequence of characters.
Such data cannot be placed in YAML test data files by default.
So they used a custom markup language to define, for example, a random alphabetic sequence
of length 6. This custom markup was processed by a custom YAML deserializer,
which replaced the markup with calculated values. In that case, why can't we write test data
in code?
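
As a rough illustration of the question above, the two dynamic values mentioned (today's date and a random alphabetic sequence of length 6) need no markup or custom deserializer when the data is plain code. This is only a minimal Java sketch; the class and method names (DynamicTestData, randomAlphabetic, personRow) are made up for the example and are not part of TestNG or of the project described.

```java
import java.time.LocalDate;
import java.util.Random;

public class DynamicTestData {

    /** Returns a random string of ASCII letters with the given length. */
    static String randomAlphabetic(int length, Random random) {
        String letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(letters.charAt(random.nextInt(letters.length())));
        }
        return sb.toString();
    }

    /** One row of test data: a random 6-letter name and today's date, computed on every call. */
    static Object[] personRow() {
        return new Object[] { randomAlphabetic(6, new Random()), LocalDate.now() };
    }

    public static void main(String[] args) {
        Object[] row = personRow();
        System.out.println(row[0] + " " + row[1]);
    }
}
```

Each run produces a different name and the current date, which is exactly what the custom YAML markup had to emulate.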

What might test data as Kotlin code look like?
Kotlin code can read like a declarative language when constructors are called with named arguments,
for example:

data class Person(
    val name: String,
    val age: Int,
    val profession: String
)

val person = Person(
    name = "Cris",
    age = 18,
    profession = "student"
) 

The problem with this approach is that all data-generation code, and consequently
the data (because it's essentially hardcoded), remains in RAM and may lead to OOM.
We may overcome this with a custom classloader, for example:

data class Person(
    val name: String,
    val age: Int,
    val profession: String
)

val person = getPerson()

fun getPerson(): Person {
    /**
     * This custom classloader is used to load the code that provides the data.
     * Once the loaded code has run and this function has returned, the classloader
     * and all classes loaded by it become eligible for garbage collection.
     * Assumption: DataClassLoader delegates Person to its parent loader,
     * otherwise the cast below would fail.
     */
    val dataClassLoader = DataClassLoader()
    val providerClass = dataClassLoader.loadClass("PersonProvider")
    val providerMethod = providerClass.getDeclaredMethod("provide")
    val person = providerMethod.invoke(
        providerClass.getDeclaredConstructor().newInstance()
    )

    return person as Person
}


class PersonProvider {
    fun provide() = Person(
        name = "Cris",
        age = 18,
        profession = "student"
    )
}
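
The DataClassLoader used above is not shown in this issue. One possible shape for it, as a hedged Java sketch: a loader that defines a single named provider class itself (reading the class bytes through its parent) and delegates everything else. All names here are hypothetical, and whether the JVM actually unloads the class afterwards depends on the implementation. The provider returns only types owned by the parent loader (Object[], String, Integer), because any returned object whose class was defined by the throwaway loader would keep that loader reachable.

```java
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Method;

public class UnloadableProviderSketch {

    /** Hypothetical stand-in for the PersonProvider from the issue. */
    public static class PersonProvider {
        public Object[] provide() {
            return new Object[] {"Cris", 18, "student"};
        }
    }

    /** Defines exactly one class itself and delegates every other name to the parent. */
    static final class DataClassLoader extends ClassLoader {
        private final String target;

        DataClassLoader(String target, ClassLoader parent) {
            super(parent);
            this.target = target;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(target)) {
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> loaded = findLoadedClass(name);
                if (loaded == null) {
                    // Read the class file via the parent, but define the class in this loader.
                    String resource = name.replace('.', '/') + ".class";
                    try (InputStream in = getParent().getResourceAsStream(resource)) {
                        if (in == null) {
                            throw new ClassNotFoundException(name);
                        }
                        byte[] bytes = in.readAllBytes();
                        loaded = defineClass(name, bytes, 0, bytes.length);
                    } catch (IOException e) {
                        throw new ClassNotFoundException(name, e);
                    }
                }
                if (resolve) {
                    resolveClass(loaded);
                }
                return loaded;
            }
        }
    }

    /** Loads the provider class in a fresh, otherwise unreferenced loader. */
    static Class<?> loadProviderClass(String name) {
        try {
            ClassLoader parent = UnloadableProviderSketch.class.getClassLoader();
            return new DataClassLoader(name, parent).loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Invokes provide() reflectively; after returning, nothing here references the loader. */
    static Object[] loadData(String providerClassName) {
        try {
            Class<?> providerClass = loadProviderClass(providerClassName);
            Method provide = providerClass.getDeclaredMethod("provide");
            Object provider = providerClass.getDeclaredConstructor().newInstance();
            return (Object[]) provide.invoke(provider);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the class compiled into the default package, a call such as loadData("UnloadableProviderSketch$PersonProvider") passes the provider by name, so the calling code never references the provider class directly.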

What might a test class look like?

class TestClass {
    @Test(dataAsCode = [["PersonProvider#provide"]])
    fun test(person: Person) {
        
    }
}

What do you guys think? Would you accept such a feature in TestNG?

@cbeust
Collaborator

cbeust commented Feb 4, 2022

Sounds like a .kts file would be a better fit than a .kt but either way, I agree it sounds useful.

@juherr
Member

juherr commented Feb 5, 2022

I don't see the difference between data as code and a data provider using code.

It sounds to me like an SPI on top of the data provider logic.
Something like https://junit.org/junit5/docs/current/user-guide/#writing-tests-parameterized-tests

Am I right?

@dsankouski
Contributor Author

@cbeust, .kts files would require the Kotlin embedded compiler and script runtime dependencies in tests. That consumes a lot of RAM and takes some time to load, which may slow down a single test run when debugging.

@juherr A data provider's code is never unloaded from the heap, so it's not really scalable to put data-generation code there: it will sit in RAM for the entire test run and may eventually lead to OOM. As for the JUnit implementation, I don't know whether they unload their method providers after use. I will look at it later.

Maybe we could dynamically load and unload data providers when that's possible, i.e. when the data provider is in a different class?

@juherr
Member

juherr commented Feb 5, 2022

Ok. I'm just a bit surprised, because our known memory problem is with the test results.

If we go the way you describe, is there any reason not to implement the current data provider feature on top of it?

If you plan to try something, please share your results here.

@dsankouski
Contributor Author

Yes, if we make data providers unloadable, test data may be written as a data provider in a separate class, loaded on demand, and unloaded after the data is generated.

I have never met teams that code any sophisticated data in data providers. I think there were some reasons for that:

  • developers are taught that hardcoding is bad and that data should be stored in files.
  • Java code looks really imperative, and defining data in it is not pretty.
  • memory consumption: data provider code stays in memory for the whole test run.

But once memory consumption no longer threatens scalability, and with a scripting language, it makes sense to me.

@krmahadevan
Member

krmahadevan commented Feb 7, 2022

@dsankouski - I am still not sure what problem we are trying to solve here. Can you please help me understand that?

I ask this because TestNG always remembers the parameters that were passed to a @Test method or to a constructor (in the case of @Factory methods). So irrespective of whether you get rid of the references held by data provider classes (the Kotlin sample didn't make it clear to me how you are going to trigger a GC by removing references to the data provider parameters), GC is still not going to happen, because, as @juherr mentioned, TestNG still holds references to the parameters via the ITestResult object.

So can you please break down the problem statement in a bit more detail so that I can understand it?

Maybe a Java example would be more explanatory (if possible, that is).

@dsankouski
Contributor Author

dsankouski commented Feb 7, 2022

@krmahadevan

Suppose you've decided to use a data provider, as in this example:

import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

class DataProviders {
    @DataProvider
    public static Object[][] dp() {
        return new Object[][] {
                {"Mike", 34, "student"},
                {"Mike", 23, "driver"},
                {"Paul", 20, "director"}
        };
    }
}

// Named SampleTest so that the class does not shadow the @Test annotation.
class SampleTest {
    @Test(dataProvider = "dp", dataProviderClass = DataProviders.class)
    public void test(String name, int age, String prof) {
        // test code
    }
}

All data in the DataProviders class will be placed in the constant pool (JVM spec §4.4) upon compilation.
When DataProviders.class is loaded (§5.3), its internal binary representation
is created in the method area (§2.5.4).
This binary representation contains the per-class run-time constant pool (§2.5.5),
which contains all the data we need for the test.
So our code basically "copies" data from the run-time constant pool into an Object[][] instance:
§4.4: Java Virtual Machine instructions do not rely on the run-time layout of classes, interfaces, class instances, or arrays. Instead, instructions refer to symbolic information in the constant_pool table

When the dp method is invoked, an Object[][] instance is created in
the heap area, thus duplicating the data contained in the method area.

With the current DP implementation we end up with the data being duplicated
in the heap and in the method area.

Since we only need the data itself, i.e. the Object[][] instance,
we may load the DataProviders class on demand with a custom classloader.
The classloader will be garbage collected, along with all classes it loaded,
when there are no references left to them.

A few words about interned strings:
Though there's a string literal pool in Java,
it's located in the heap and holds String objects.
String literals in the class constant pool are also "interned",
in the sense that each unique literal is contained only once in a
class file, and bytecode instructions like LDC "Mike"
refer to the string's index in the constant pool.
See also §4.4.3
§4.4.7
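
The heap side of this can be observed from plain Java. The sketch below (class name LiteralPoolDemo is made up for the illustration) relies only on the language guarantee that string literals are interned: both "Mike" entries refer to the same String object, while every call to dp() allocates a new array that points at it.

```java
public class LiteralPoolDemo {

    /** Same shape as the data provider above, trimmed to two rows. */
    static Object[][] dp() {
        return new Object[][] {
            {"Mike", 34, "student"},
            {"Mike", 23, "driver"}
        };
    }

    public static void main(String[] args) {
        Object[][] first = dp();
        Object[][] second = dp();

        // Both rows reference one interned String object.
        System.out.println(first[0][0] == first[1][0]);      // true

        // Each invocation allocates a fresh array on the heap...
        System.out.println(first != second);                 // true
        // ...but the arrays share the same literal.
        System.out.println(first[0][0] == second[0][0]);     // true

        // A String created at run time is a distinct object until interned.
        String built = new String("Mike");
        System.out.println(built == first[0][0]);            // false
        System.out.println(built.intern() == first[0][0]);   // true
    }
}
```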

Note that §2.5.4 states: Although the method area is logically part of the heap, simple implementations may choose not to either garbage collect or compact it. This specification does not mandate the location of the method area or the policies used to manage compiled code
This means that garbage collection of the method area is implementation
dependent.

@krmahadevan
Member

@dsankouski - Wow that's an amazing insight into the internals of the JVM. Thank you for taking the time to share that. I will go through it in detail and come back.

In the meantime, if I understand your requirement, you are proposing that a data-driven test can have the following:

  1. The test should be able to specify a class-loader-powered data provider class, which TestNG would load on demand.
  2. After TestNG has finished loading the class via the class loader and reading the data via the data provider method, TestNG at some point offloads the loaded class via some mechanism (I don't know what that mechanism is yet).

Is my understanding correct?

If it's correct, then this is a very specialised use case for data providers, because people usually build data providers that feed off an external data source (xml|xls|json|yaml|db|csv), so there's no need for loading/unloading a class, because it wouldn't have any fields that are constants as in your example.

PS: My knowledge on the JVM internals is very primitive, so please bear with me if my questions sound naive.. Excited to understand this even more from a personal learning perspective :)

@dsankouski
Contributor Author

@krmahadevan

1 - correct
2 - correct. The easiest way to offload a data provider would be to rely on GC, I think. When there are no references left to the user's classloader and the classes loaded by it, GC may collect them, depending on the implementation (although the JVM spec doesn't guarantee that, §2.5.4).

So what I'm thinking of is to add a dataProviderClass string parameter to the @Test annotation and, if it's not empty, load the data provider via a custom classloader when resolving its data.

This may indeed be a specialized case, because when the dataProviderClass param is given as a .class value in the @Test annotation, the mentioned class is loaded automatically when the test class is loaded. The class should be specified as a String to be able to load it dynamically.

@krmahadevan
Member

@dsankouski

> So, what I'm thinking of, is to add dataProviderClass string parameter to @Test annotation, and if that's not empty, load dataprovider via custom classloader, when resolving its data.

Yes. It has to be a string parameter so that we can use reflection, backed by a custom class loader, to load the class. Then again, I think you would also need to consider providing a means to inject the custom class loader itself. So would that mean we now need to add two string parameters (one that specifies the data provider class and another that specifies the custom class loader we would like to use)? I say this because if we end up specifying the class loader as a class parameter, then we are back to square one.
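
To make the two-string-parameter idea concrete, here is a rough Java sketch of how such metadata could be resolved. Nothing in it is TestNG API: the LazyData annotation, its attributes, PassThroughLoader, and resolve are all hypothetical names for illustration, and the class names in the annotation assume the file is compiled into the default package. The loader shown only delegates to its parent; a real one would have to define the provider class itself for unloading to be possible.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

public class StringProviderSketch {

    /** Hypothetical annotation: every attribute is a String, so no Class literal is referenced. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface LazyData {
        String providerClass();
        String method();
        String loaderClass() default ""; // empty: use the test class's own loader
    }

    /** Hypothetical loader that simply delegates to its parent. */
    public static class PassThroughLoader extends ClassLoader {
        public PassThroughLoader(ClassLoader parent) {
            super(parent);
        }
    }

    public static class Providers {
        public static Object[][] people() {
            return new Object[][] { {"Mike", 34, "student"}, {"Paul", 20, "director"} };
        }
    }

    public static class SampleTest {
        @LazyData(providerClass = "StringProviderSketch$Providers",
                  method = "people",
                  loaderClass = "StringProviderSketch$PassThroughLoader")
        public void test(String name, int age, String profession) {
            // test code
        }
    }

    /** Reads the annotation on the named test method and resolves its data by class name. */
    static Object[][] resolve(Class<?> testClass, String testMethodName) {
        Method testMethod = null;
        for (Method m : testClass.getDeclaredMethods()) {
            if (m.getName().equals(testMethodName)) {
                testMethod = m;
            }
        }
        if (testMethod == null || testMethod.getAnnotation(LazyData.class) == null) {
            throw new IllegalArgumentException("No @LazyData method named " + testMethodName);
        }
        LazyData meta = testMethod.getAnnotation(LazyData.class);
        try {
            ClassLoader parent = testClass.getClassLoader();
            ClassLoader loader = parent;
            if (!meta.loaderClass().isEmpty()) {
                // The loader itself is also named by a String and created reflectively.
                loader = (ClassLoader) Class.forName(meta.loaderClass(), true, parent)
                        .getDeclaredConstructor(ClassLoader.class)
                        .newInstance(parent);
            }
            Class<?> provider = Class.forName(meta.providerClass(), true, loader);
            return (Object[][]) provider.getMethod(meta.method()).invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling resolve(StringProviderSketch.SampleTest.class, "test") returns the two rows from Providers.people() without the test class ever naming Providers or the loader as a Class literal.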
