
Not all versions of job being imported. #256

Open
sgokaram-saagie opened this issue Jul 14, 2020 · 11 comments
@sgokaram-saagie
Contributor

Tried to export a Spark job with ID 4708 from V1. This job has 7 versions.
The export and import succeeded, but when checked in V2 only 3 versions had been imported.

@sgokaram-saagie sgokaram-saagie added the bug Something isn't working label Jul 14, 2020
@medamineziraoui
Contributor

After exporting versions from V1 to V2, we check whether some versions are redundant (contain the same data).
Because we don't export all the properties, some versions end up containing exactly the same data after mapping from V1 to V2, so we filtered out the versions that are exactly the same to improve export and import performance.
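The duplicate-filtering behaviour described above can be sketched roughly like this (a minimal illustration; the `Version` record and its field set are simplified stand-ins for the plugin's real types, not its actual code):

```java
import java.util.ArrayList;
import java.util.List;

public class VersionDedup {
    // Hypothetical record holding only the exported fields; the real plugin types differ.
    record Version(String commandLine, String runtimeVersion,
                   String packageName, String techVersion) {}

    // Keep a version only if it differs from the previously kept one,
    // mirroring the "filter exact duplicates" behaviour described above.
    static List<Version> dropRedundant(List<Version> versions) {
        List<Version> kept = new ArrayList<>();
        for (Version v : versions) {
            if (kept.isEmpty() || !kept.get(kept.size() - 1).equals(v)) {
                kept.add(v);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Version> in = List.of(
            new Version("ls", "2.4", "empty.R", "8"),
            new Version("spark-submit", "2.4", "empty.R", "8"),
            new Version("spark-submit --class=x {file}", "2.4", "a.jar", "8"),
            new Version("spark-submit --class=x {file}", "2.4", "a.jar", "8")
        );
        System.out.println(dropRedundant(in).size()); // prints 3
    }
}
```

This is how 6 exported versions above collapse to 3: versions 3 through 6 map to identical data once the non-exported properties are dropped.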

@medamineziraoui
Contributor

medamineziraoui commented Jul 14, 2020

These are the exported versions of the file without the redundant data removed:

"versions": [
        {
            "commandLine": "ls",
            "number": "1",
            "runtimeVersion": "2.4",
            "packageInfo": {
                "downloadUrl": null,
                "name": "1493307571-empty.R"
            },
            "extraTechnology": {
                "language": "Java/Scala",
                "version": "8"
            }
        },
        {
            "commandLine": "spark-submit",
            "number": "2",
            "runtimeVersion": "2.4",
            "packageInfo": {
                "downloadUrl": null,
                "name": "1493307571-empty.R"
            },
            "extraTechnology": {
                "language": "Java/Scala",
                "version": "8"
            }
        },
        {
            "commandLine": "spark-submit  --class=wine_clustering {file} 192.168.54.10",
            "number": "3",
            "runtimeVersion": "2.4",
            "packageInfo": {
                "downloadUrl": null,
                "name": "1493308229-extraction-plcs-assembly-1.0.jar"
            },
            "extraTechnology": {
                "language": "Java/Scala",
                "version": "8"
            }
        },
        {
            "commandLine": "spark-submit  --class=wine_clustering {file} 192.168.54.10",
            "number": "4",
            "runtimeVersion": "2.4",
            "packageInfo": {
                "downloadUrl": null,
                "name": "1493308229-extraction-plcs-assembly-1.0.jar"
            },
            "extraTechnology": {
                "language": "Java/Scala",
                "version": "8"
            }
        },
        {
            "commandLine": "spark-submit  --class=wine_clustering {file} 192.168.54.10",
            "number": "5",
            "runtimeVersion": "2.4",
            "packageInfo": {
                "downloadUrl": null,
                "name": "1493308229-extraction-plcs-assembly-1.0.jar"
            },
            "extraTechnology": {
                "language": "Java/Scala",
                "version": "8"
            }
        },
        {
            "commandLine": "spark-submit  --class=wine_clustering {file} 192.168.54.10",
            "number": "6",
            "runtimeVersion": "2.4",
            "packageInfo": {
                "downloadUrl": null,
                "name": "1493308229-extraction-plcs-assembly-1.0.jar"
            },
            "extraTechnology": {
                "language": "Java/Scala",
                "version": "8"
            }
        }
    ]

@medamineziraoui
Contributor

medamineziraoui commented Jul 14, 2020

These are the versions with the redundant-version removal feature enabled:

"versions": [
        {
            "commandLine": "ls",
            "number": "1",
            "runtimeVersion": "2.4",
            "packageInfo": {
                "downloadUrl": null,
                "name": "1493307571-empty.R"
            },
            "extraTechnology": {
                "language": "Java/Scala",
                "version": "8"
            }
        },
        {
            "commandLine": "spark-submit",
            "number": "2",
            "runtimeVersion": "2.4",
            "packageInfo": {
                "downloadUrl": null,
                "name": "1493307571-empty.R"
            },
            "extraTechnology": {
                "language": "Java/Scala",
                "version": "8"
            }
        },
        {
            "commandLine": "spark-submit  --class=wine_clustering {file} 192.168.54.10",
            "number": "3",
            "runtimeVersion": "2.4",
            "packageInfo": {
                "downloadUrl": null,
                "name": "1493308229-extraction-plcs-assembly-1.0.jar"
            },
            "extraTechnology": {
                "language": "Java/Scala",
                "version": "8"
            }
        }
    ]

@medamineziraoui
Contributor

I'm awaiting the call on whether we keep or remove the redundant versions.

@sgokaram-saagie
Contributor Author

@youenchene Can you confirm whether it's OK to remove the redundant versions? I would rather make it an option for users to choose, since doing it automatically might lead to confusion.

@medamineziraoui
Contributor

Roger that 👍. Can you provide me with the option name?
I propose remove_redundant_versions = true | false
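If adopted, the flag would presumably sit next to include_all_versions in the job block of the export build file (a hypothetical sketch; the flag name is only a proposal in this thread and may not exist in any release):

```groovy
saagie {
    job {
        ids = ["21864"]
        include_all_versions = true
        // Hypothetical flag proposed above; not (yet) part of the plugin.
        remove_redundant_versions = false
    }
}
```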

@medamineziraoui
Contributor

In the current release, there is a case I didn't consider when comparing versions to filter them.
Currently I compare jobVersion properties, and for the file part I only compare the name, which is not sufficient if the user changes the content of the file without changing the name: our release will consider such files equal.
To fix that, I will use a HEAD request that fetches only the headers without the body, so it will not download the binary, and do that for all the artifacts so I can check their size. Then I will compare both the name and the size.
Do you agree with my suggestion?
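The HEAD-request idea above could look roughly like this (a sketch using Java 11's `java.net.http.HttpClient`; the method names and the name-plus-size comparison are illustrations of the proposal, not the plugin's actual code):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.OptionalLong;

public class ArtifactSize {
    // Issue a HEAD request so only the headers come back, never the binary body.
    static OptionalLong remoteSize(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest head = HttpRequest.newBuilder(URI.create(url))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<Void> resp = client.send(head, HttpResponse.BodyHandlers.discarding());
        // Content-Length gives the artifact size without downloading it.
        return resp.headers().firstValueAsLong("Content-Length");
    }

    // The proposed comparison: two artifacts are treated as equal
    // when both the name and the size match.
    static boolean sameArtifact(String nameA, long sizeA, String nameB, long sizeB) {
        return nameA.equals(nameB) && sizeA == sizeB;
    }
}
```

Note that name plus size still cannot distinguish two files of equal length with different content, which is the weakness raised in the next comment.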

@sgokaram-saagie
Contributor Author

@medamineziraoui - I think we have overcomplicated this. I don't want to implement any smartness; just do a straight 1-to-1 and be done with it. The way to compare files is not just by name/size but to do a full hash and compare it. For now, just comment out that piece of logic and do a simple 1-to-1.
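The full-hash comparison suggested above could be sketched like this (a minimal illustration using the JDK's `MessageDigest`; class and method names are hypothetical, not the plugin's code):

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;

public class FileHash {
    // Hash the full file content, so files that differ anywhere produce
    // different digests even when name and size are identical.
    static String sha256(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return HexFormat.of().formatHex(md.digest());
    }

    static boolean sameContent(Path a, Path b) throws Exception {
        return sha256(a).equals(sha256(b));
    }
}
```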

@ZouhairBear
Contributor

ZouhairBear commented Sep 1, 2020

How to test:

Version to test: 2.1.4

The export phase:

Create a build file:
Example of build.projectsExport.gradle:

plugins {
    id "io.saagie.gradle-saagie-dataops-plugin" version "2.1.4"
}

saagie {
    server {
        url = "REPLACE_SAAGIEURL"
        login = "REPLACE_SAAGIELOGIN"
        password = "REPLACE_SAAGIEPASSWORD"
        environment = "REPLACE_SAAGIE"
        jwt = true
    }

    job {
        ids = ["21864"]                // <=== make sure this job ID exists on the Saagie platform with multiple versions
        include_all_versions = true
    }
    exportArtifacts {
        export_file = "./job/bash.zip" // <=== the directory "./job/" should exist
        overwrite = true               // <=== the temporary directory "./tmp" should exist
    }
}

Export the job using this command:
The command line: gradle -b build.projectsExport.gradle projectsExportV1

The import phase:

Create a new build file:
Example: build.projectsImport.gradle:

plugins {
    id 'groovy'
    id 'io.saagie.gradle-saagie-dataops-plugin' version '2.1.4'
}

saagie {
    server {
        url = "REPLACE_SAAGIEURL"
        login = "REPLACE_SAAGIELOGIN"
        password = "REPLACE_SAAGIEPASSWORD"
        environment = "REPLACE_SAAGIE"
        jwt = true
        acceptSelfSigned = true
    }
    project {
        id = "REPLACE_SAAGIE_PROJECT_ID"
    }
    importArtifacts {
        import_file = "./job/export.zip"
        temporary_directory = './tmp'
    }
}

Then import using this command:
The command line: gradle -b build.projectsImport.gradle projectsImport

Bug behavior

You should get a success message, but not all job versions are imported.

Expected behavior

A success message and all job versions imported.

@ZouhairBear
Contributor

ZouhairBear commented Sep 1, 2020

  • I have tested the export V1 of a job with multiple versions (with include_all_versions = true): it works as expected

265exportV1AllVersion

  • I have tested the import V2 of a job with multiple versions: it works as expected

Import artifact that has a job with multiple versions

@ZouhairBear
Contributor

Test for release 2.1.5

  • I have tested the export V1 of a job with multiple versions (with include_all_versions = true): it works as expected

256exportNew

  • I have tested the import V2 of a job with multiple versions: it works as expected

256importNew
