Test data as code feature concept #2724

dsankouski · 2022-02-04T13:13:56Z

TestNG Version

Note: only the latest version is supported
7.5.0

The point of this concept is to write test data files as code in Kotlin
(or any other language). This makes it possible to put dynamic data
(i.e. data generated on each launch, such as today's date or a random string)
into data files.

The problem:

I was involved in a project where test data was stored in YAML files.
The team parsed those YAML files and used the results in tests.
The problem was that they needed dynamic values,
like today's date or a random sequence of characters.
Such data cannot be placed in YAML test data files by default.
So they used a custom markup language to define, for example, a random alphabetic sequence
of length 6. This custom markup was processed by a custom YAML deserializer
that replaced the markup with calculated values. In this case, why can't we write test data
in code?
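As a rough illustration of the idea, here is a minimal Java sketch of test data defined in code with the two dynamic values mentioned above (today's date and a random alphabetic sequence of length 6). The names OrderData, OrderDataFactory, customerCode and orderDate are made up for this example and do not come from the project described.

```java
import java.time.LocalDate;
import java.util.Random;

/** Hypothetical test data holder; the field names are illustrative only. */
final class OrderData {
    final String customerCode;
    final LocalDate orderDate;

    OrderData(String customerCode, LocalDate orderDate) {
        this.customerCode = customerCode;
        this.orderDate = orderDate;
    }
}

/** Hypothetical factory that computes the dynamic values in plain code. */
final class OrderDataFactory {
    private static final String LETTERS = "abcdefghijklmnopqrstuvwxyz";

    /** Returns a random lowercase alphabetic sequence of the given length. */
    static String randomAlphabetic(int length, Random random) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(LETTERS.charAt(random.nextInt(LETTERS.length())));
        }
        return sb.toString();
    }

    /** Builds a record whose values are generated anew on each call. */
    static OrderData freshOrder() {
        return new OrderData(randomAlphabetic(6, new Random()), LocalDate.now());
    }
}
```

No custom markup or deserializer is involved here: the "random alphabetic sequence of length 6" is an ordinary method call.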

What might test data as Kotlin code look like?
Kotlin code can read like a declarative language when constructors are called with named arguments,
for example:

data class Person(
    val name: String,
    val age: Int,
    val profession: String
)

val person = Person(
    name = "Cris",
    age = 18,
    profession = "student"
)

The problem with this approach is that all data-generation code, and consequently
the data (because it's essentially hardcoded), remains in RAM and may lead to OOM.
We may overcome this with a custom classloader, for example:

data class Person(
    val name: String,
    val age: Int,
    val profession: String
)

val person = getPerson()

fun getPerson(): Person {
    /**
     * This custom classloader is used to load the code that provides data.
     * Once the loaded code has executed and the method has returned, the classloader
     * and all classes loaded by it become eligible for garbage collection.
     * DataClassLoader must delegate Person to its parent, otherwise the cast below fails.
     */
    val dataClassLoader = DataClassLoader()
    val providerClass = dataClassLoader.loadClass("PersonProvider")
    val providerMethod = providerClass.getDeclaredMethod("provide")
    val person = providerMethod.invoke(
        providerClass.getDeclaredConstructor().newInstance()
    )

    return person as Person
}


class PersonProvider {
    fun provide() = Person(
        name = "Cris",
        age = 18,
        profession = "student"
    )
}

How a test class may look like?

class TestClass {
    @Test(dataAsCode = [["PersonProvider#provide"]])
    fun test(person: Person) {
        
    }
}

What do you think? Would you accept such a feature in TestNG?

cbeust · 2022-02-04T17:32:21Z

Sounds like a .kts file would be a better fit than a .kt but either way, I agree it sounds useful.

juherr · 2022-02-05T11:57:27Z

I don't see the difference between data as code and a data provider using code.

It sounds to me like a SPI on top of the data provider logic.
Something like https://junit.org/junit5/docs/current/user-guide/#writing-tests-parameterized-tests

Am I right?

dsankouski · 2022-02-05T14:04:01Z

@cbeust, .kts files would require the Kotlin embedded compiler and script runtime dependencies in tests. That consumes a lot of RAM and takes some time to load, which may slow down a single test run when debugging.

@juherr A data provider's code is never unloaded, so it's not really scalable to put data-generation code there: it will sit in RAM for the entire test run and may eventually lead to OOM. As for the JUnit implementation, I don't know whether they unload their method providers after use. I will look at it later.

Maybe we could load and unload data providers dynamically when that's possible, i.e. when the data provider is in a different class?

juherr · 2022-02-05T15:37:25Z

OK. I'm just a bit surprised, because our known memory problem is with the test results.

If we go the way you describe, is there any reason not to implement the current data provider feature on top of it?

If you plan to try something, please share your results here.

dsankouski · 2022-02-05T17:34:05Z

Yes, if we make data providers unloadable, test data may be written as a data provider in a separate class, loaded on demand, and unloaded after the data is generated.

I have never met teams that code any sophisticated data in data providers. I think there are some reasons for that:

- developers are taught that hardcoding is bad and that data should be stored in files;
- Java code reads as very imperative, and defining data in it is not pretty;
- memory consumption: data provider code stays in memory for the whole test run.

But once memory consumption no longer threatens scalability, and with a scripting language, it makes sense to me.

krmahadevan · 2022-02-07T04:16:49Z

@dsankouski - I am still not sure what problem we are trying to solve here. Can you please help me understand that?

I ask this because TestNG always remembers the parameters that were passed to a @Test method or to a constructor (in the case of @Factory methods). So irrespective of whether you get rid of the references held by data provider classes (the Kotlin sample didn't make it clear to me how you are going to trigger a GC by removing references to the data provider parameters), GC is still not going to happen because, as @juherr mentioned, TestNG still has references to the parameters via the ITestResult object.

So can you please break the problem statement down in a bit more detail so that I can understand it?

Maybe a Java example would be more explanatory (if possible, that is).

dsankouski · 2022-02-07T19:49:06Z

@krmahadevan

Suppose you decide to use a data provider, as in this example:

import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

class DataProviders {
    @DataProvider
    public static Object[][] dp() {
        return new Object[][] {
                {"Mike", 34, "student"},
                {"Mike", 23, "driver"},
                {"Paul", 20, "director"}
        };
    }
}

// Named PersonTest so that the class does not clash with the imported @Test annotation.
class PersonTest {
    @Test(dataProvider = "dp", dataProviderClass = DataProviders.class)
    public void test(String name, int age, String prof) {
        // test code
    }
}

The literal data in the DataProviders class (the string literals in particular) is placed in the class file's constant pool (JVMS §4.4) at compile time.
When DataProviders.class is loaded (§5.3), its internal binary representation
is created in the method area (§2.5.4).
This binary representation contains a per-class run-time constant pool (§2.5.5),
which holds the data we need for the test.
So our code basically "copies" data from the run-time constant pool into the Object[][] instance.
§4.4: "Java Virtual Machine instructions do not rely on the run-time layout of classes, interfaces, class instances, or arrays. Instead, instructions refer to symbolic information in the constant_pool table."

When the dp method is invoked, an Object[][] instance is created in
the heap, thus duplicating data that is already held in the method area.

With the current data provider implementation we end up with data being duplicated
in the heap and in the method area.

Since we only need the data itself, i.e. the Object[][] instance,
we may load the DataProviders class on demand with a custom classloader.
The classloader will be garbage collected, along with all classes it loaded,
once there are no references to them.
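To make the on-demand loading concrete, here is a minimal Java sketch under some assumptions: DataClassLoader, IsolatedProvider and OnDemandData are hypothetical names invented for this example (they are not TestNG classes), and the provider's class file is assumed to be readable as a resource from the parent classloader. The loader defines the provider class itself instead of delegating, so the class belongs to a throwaway loader and nothing but the returned rows is kept.

```java
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Method;

/** Hypothetical provider whose class is loaded on demand and discarded afterwards. */
class IsolatedProvider {
    public static Object[][] dp() {
        return new Object[][] {{"Mike", 34, "student"}, {"Paul", 20, "director"}};
    }
}

/** Hypothetical loader: defines one named class itself, delegates everything else. */
class DataClassLoader extends ClassLoader {
    private final String isolatedName;

    DataClassLoader(String isolatedName, ClassLoader parent) {
        super(parent);
        this.isolatedName = isolatedName;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (!name.equals(isolatedName)) {
            return super.loadClass(name, resolve); // normal parent-first delegation
        }
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                c = defineFromParentResource(name);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    /** Reads the class file bytes via the parent and defines the class in this loader. */
    private Class<?> defineFromParentResource(String name) throws ClassNotFoundException {
        String path = name.replace('.', '/') + ".class";
        try (InputStream in = getParent().getResourceAsStream(path)) {
            if (in == null) {
                throw new ClassNotFoundException(name);
            }
            byte[] bytes = in.readAllBytes();
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}

class OnDemandData {
    /** Loads the provider in a throwaway loader and returns only the rows it produced. */
    static Object[][] load(String providerClass, String method) {
        ClassLoader loader = new DataClassLoader(providerClass, OnDemandData.class.getClassLoader());
        try {
            Method m = Class.forName(providerClass, true, loader).getDeclaredMethod(method);
            m.setAccessible(true); // the provider lives in a different run-time package
            return (Object[][]) m.invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load data from " + providerClass, e);
        }
    }
}
```

A call such as OnDemandData.load(IsolatedProvider.class.getName(), "dp") returns the rows; once it returns, nothing in this sketch references the loader or the class it defined, so both become eligible for collection. Whether and when the JVM actually reclaims them is implementation dependent, and the returned objects themselves stay alive for as long as the caller holds them.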

A few words about interned strings:
Although there is a string literal pool in Java,
it is located in the heap and holds String objects.
String literals in the class constant pool are also "interned"
in the sense that each unique literal is stored only once in a
class file, and bytecode instructions like LDC "Mike"
refer to the string's index in the constant pool.
See also §4.4.3 and §4.4.7.
Note that §2.5.4 states: "Although the method area is logically part of the heap, simple implementations may choose not to either garbage collect or compact it. This specification does not mandate the location of the method area or the policies used to manage compiled code."
This means that garbage collection of the method area is implementation
dependent.

krmahadevan · 2022-02-08T16:00:41Z

@dsankouski - Wow that's an amazing insight into the internals of the JVM. Thank you for taking the time to share that. I will go through it in detail and come back.

In the meantime, if I understand your requirement, you are proposing the following for a data-driven test:

1. The test should be able to specify a class-loader-backed data provider class, which TestNG would load on demand.
2. After TestNG has finished loading the class via the class loader and reading the data via the data provider method, TestNG at some point offloads the loaded class via some mechanism (I don't know what that mechanism is yet).

Is my understanding correct ?

If it's correct, then this is a very specialised use case for data providers, because people usually build data providers that feed off an external data source (xml|xls|json|yaml|db|csv), so there's no need for loading/unloading a class because it wouldn't have any constant fields as in your example.

PS: My knowledge on the JVM internals is very primitive, so please bear with me if my questions sound naive.. Excited to understand this even more from a personal learning perspective :)

dsankouski · 2022-02-11T13:33:00Z

@krmahadevan

1 - correct
2 - correct. The easiest way to offload a data provider would be to rely on GC, I think. When there are no references left to the user's classloader and the classes loaded by it, the GC may collect them, depending on the implementation (although the JVM spec doesn't guarantee that, §2.5.4).

So what I'm thinking of is to add a dataProviderClass string parameter to the @Test annotation and, if it's not empty, to load the data provider via a custom classloader when resolving its data.

This may indeed be a specialized case, because when the dataProviderClass parameter is given as a .class literal in the @Test annotation, the mentioned class is loaded automatically when the test class is loaded. The class has to be specified as a String to be able to load it dynamically.

krmahadevan · 2022-02-15T05:18:21Z

@dsankouski

> So, what I'm thinking of, is to add dataProviderClass string parameter to @Test annotation, and if that's not empty, load dataprovider via custom classloader, when resolving it's data.

Yes. It has to be a string parameter so that we can use reflection, backed by a custom class loader, to load the class. Then again, I think you would also need to consider providing a means to inject the custom class loader itself. So would that mean we now need to add two string parameters (one that specifies the data provider class and another that specifies the custom class loader to be used)? I say this because if we end up specifying the class loader as a class parameter, then we are back to square one.
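The two-string-parameter idea could be sketched roughly as follows. Everything here is hypothetical and only illustrates the shape of the proposal: the @DynamicData annotation, its attributes, PassThroughLoader, PeopleProvider, SampleTest and DynamicDataResolver are invented for this example and are not TestNG API. The sketch also assumes that the loader class has a public constructor taking the parent ClassLoader and that the provider method is static.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

/** Hypothetical annotation (not TestNG): both classes are referenced by name only. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface DynamicData {
    String providerClass();
    String providerMethod();
    String loaderClass() default "";
}

/** Hypothetical user-supplied loader; this one simply delegates to its parent. */
class PassThroughLoader extends ClassLoader {
    public PassThroughLoader(ClassLoader parent) {
        super(parent);
    }
}

/** Hypothetical provider class. */
class PeopleProvider {
    public static Object[][] people() {
        return new Object[][] {{"Paul", 20, "director"}};
    }
}

class SampleTest {
    // Plain strings: loading SampleTest does not force the other two classes to load.
    @DynamicData(providerClass = "PeopleProvider", providerMethod = "people",
            loaderClass = "PassThroughLoader")
    void test(String name, int age, String profession) { }
}

class DynamicDataResolver {
    /** Reads the hypothetical annotation from a test method and resolves its data. */
    static Object[][] resolve(Method testMethod) {
        DynamicData meta = testMethod.getAnnotation(DynamicData.class);
        return resolve(meta.providerClass(), meta.providerMethod(), meta.loaderClass());
    }

    /** Instantiates the named loader (if any), then loads and invokes the provider. */
    static Object[][] resolve(String providerClass, String providerMethod, String loaderClass) {
        try {
            ClassLoader parent = DynamicDataResolver.class.getClassLoader();
            ClassLoader loader = loaderClass.isEmpty()
                    ? parent
                    : (ClassLoader) Class.forName(loaderClass, true, parent)
                            .getDeclaredConstructor(ClassLoader.class)
                            .newInstance(parent);
            Method m = Class.forName(providerClass, true, loader).getDeclaredMethod(providerMethod);
            m.setAccessible(true);
            return (Object[][]) m.invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot resolve data from " + providerClass, e);
        }
    }
}
```

Because both attributes are strings, neither the provider nor the loader is referenced as a class literal from the test class, which is the "back to square one" concern raised above. A loader that actually isolates the provider would replace PassThroughLoader in practice.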

dsankouski mentioned this issue Mar 10, 2022

DataProvider: possibility to unload dataprovider class, when done with it #2739

Merged

3 tasks

krmahadevan added this to the 7.6.0 milestone Apr 21, 2022

krmahadevan added the Feature: data-provider label Apr 21, 2022

krmahadevan closed this as completed in #2739 Apr 21, 2022
