
feat(c++,spark): support json payload file format #518

Merged
merged 16 commits into from
Jun 25, 2024

Conversation

amygbAI
Copy link
Contributor

@amygbAI amygbAI commented Jun 11, 2024

Reason for this PR

This is, so to speak, a rebased pull request (due to the issue with JSONOptions being private in Spark 3.2 and 3.3). Also added the test artifacts separately in incubator-graphar-testing. It compiles with Spark 3.2 and 3.3 (in both datasources-32 and datasources-33).

What changes are included in this PR?

Changes in datasources-32 and datasources-33
Changes in cpp
Changes in pyspark

Are these changes tested?

yes, added the generated ldbc samples in incubator-graphar-testing/ldbc_sample/json

Are there any user-facing changes?

None that I am aware of.

@SemyonSinchenko
Copy link
Member

SemyonSinchenko commented Jun 11, 2024

@amygbAI I see that all the tests pass except the Neo4j example, and the error is about missing test data:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: file:/datadrive/APACHE_GRAPHAR/incubator-graphar/testing/ldbc_sample/person_0_0.csv

I mean that the code compiles and there is no sign of the problem you mentioned in #488.

@amygbAI
Copy link
Contributor Author

amygbAI commented Jun 12, 2024

thanks @SemyonSinchenko .. pulled and removed the local reference .. is there some way to run all of these builds before opening a PR? As per the directions on the thread and the READMEs, I built all the directories where I had made changes and tested ..

cpp/src/filesystem.cc (review thread, resolved)
cpp/test/test_arrow_chunk_reader.cc (review thread, resolved)
cpp/test/test_arrow_chunk_reader.cc (review thread, resolved)
maven-projects/target/.plxarc (review thread, resolved)
@acezen
Copy link
Contributor

acezen commented Jun 12, 2024

thanks @SemyonSinchenko .. pulled and removed the local reference .. is there some way to run all of these builds before opening a PR? As per the directions on the thread and the READMEs, I built all the directories where I had made changes and tested ..

Hi, @amygbAI, for the C++ library you can follow https://github.com/apache/incubator-graphar/tree/main/cpp#building to build and run the tests; for format errors, remember that we use clang-format-8 for the format check.

@amygbAI
Copy link
Contributor Author

amygbAI commented Jun 12, 2024

Resolved all comments, formatted, linted, and built using the instructions .. also removed the "build" folder from cpp since it wasn't in the repository (much like the target folders).

@SemyonSinchenko
Copy link
Member

May we also add a test for graphar spark + json?

@amygbAI
Copy link
Contributor Author

amygbAI commented Jun 12, 2024 via email

@acezen
Copy link
Contributor

acezen commented Jun 12, 2024

please send me an example for spark + csv and i'll refactor it

You can refer to the existing test cases and just add a test with the updated JSON test data, like:

  • Spark

test("read vertex chunks") {
  // construct the vertex information
  val prefix = testData + "/ldbc_sample/parquet/"
  val vertex_yaml = prefix + "person.vertex.yml"
  val vertex_info = VertexInfo.loadVertexInfo(vertex_yaml, spark)
  // construct the vertex reader
  val reader = new VertexReader(prefix, vertex_info, spark)
  // test reading the number of vertices
  assert(reader.readVerticesNumber() == 903)
  val property_group = vertex_info.getPropertyGroup("gender")
  // test reading a single property chunk
  val single_chunk_df = reader.readVertexPropertyChunk(property_group, 0)
  assert(single_chunk_df.columns.length == 4)
  assert(single_chunk_df.count() == 100)
  val cond = "gender = 'female'"
  var df_pd = single_chunk_df.select("firstName", "gender").filter(cond)
}

json test can be:

test("read vertex chunks") {
  // construct the vertex information
  val prefix = testData + "/ldbc_sample/json/"
  val vertex_yaml = prefix + "Person.vertex.yml"
  val vertex_info = VertexInfo.loadVertexInfo(vertex_yaml, spark)

  // construct the vertex reader
  val reader = new VertexReader(prefix, vertex_info, spark)

  // test reading the number of vertices
  assert(reader.readVerticesNumber() == 903)
  val property_group = vertex_info.getPropertyGroup("gender")

  // test reading a single property chunk
  val single_chunk_df = reader.readVertexPropertyChunk(property_group, 0)
  assert(single_chunk_df.columns.length == 4)
  assert(single_chunk_df.count() == 100)
  val cond = "gender = 'female'"
  var df_pd = single_chunk_df.select("firstName", "gender").filter(cond)
}
  • pyspark

def test_vertex_reader(spark):
    initialize(spark)
    vertex_info = VertexInfo.load_vertex_info(
        GRAPHAR_TESTS_EXAMPLES.joinpath("modern_graph")
        .joinpath("person.vertex.yml")
        .absolute()
        .__str__()
    )
    vertex_reader = VertexReader.from_python(
        GRAPHAR_TESTS_EXAMPLES.joinpath("modern_graph").absolute().__str__(),
        vertex_info,
    )
    assert VertexReader.from_scala(vertex_reader.to_scala()) is not None
    assert vertex_reader.read_vertices_number() > 0
    assert (
        vertex_reader.read_vertex_property_group(
            vertex_info.get_property_group("name")
        ).count()
        > 0
    )
    assert (
        vertex_reader.read_vertex_property_chunk(
            vertex_info.get_property_groups()[0], 0
        ).count()
        > 0
    )
    assert (
        vertex_reader.read_all_vertex_property_groups().count()
        >= vertex_reader.read_vertex_property_group(
            vertex_info.get_property_group("age")
        ).count()
    )
    assert (
        vertex_reader.read_multiple_vertex_property_groups(
            [vertex_info.get_property_group("name")]
        ).count()
        > 0
    )

json test can be:

def test_vertex_reader_with_json(spark):
    initialize(spark)

    vertex_info = VertexInfo.load_vertex_info(
        GRAPHAR_TESTS_EXAMPLES.joinpath("ldbc_sample/json")
        .joinpath("Person.vertex.yml")
        .absolute()
        .__str__()
    )
    vertex_reader = VertexReader.from_python(
        GRAPHAR_TESTS_EXAMPLES.joinpath("ldbc_sample/json").absolute().__str__(),
        vertex_info,
    )
    assert VertexReader.from_scala(vertex_reader.to_scala()) is not None
    assert vertex_reader.read_vertices_number() > 0
    assert (
        vertex_reader.read_vertex_property_group(
            vertex_info.get_property_group("name")
        ).count()
        > 0
    )
    assert (
        vertex_reader.read_vertex_property_chunk(
            vertex_info.get_property_groups()[0], 0
        ).count()
        > 0
    )
    assert (
        vertex_reader.read_all_vertex_property_groups().count()
        >= vertex_reader.read_vertex_property_group(
            vertex_info.get_property_group("age")
        ).count()
    )
    assert (
        vertex_reader.read_multiple_vertex_property_groups(
            [vertex_info.get_property_group("name")]
        ).count()
        > 0
    )

@amygbAI
Copy link
Contributor Author

amygbAI commented Jun 12, 2024 via email

@acezen
Copy link
Contributor

acezen commented Jun 17, 2024

alritey folks .. done

Hi, @amygbAI, can you add me as a collaborator on your graphar fork repo? I can help you fix the formatting and tests.

@amygbAI
Copy link
Contributor Author

amygbAI commented Jun 17, 2024 via email

@acezen
Copy link
Contributor

acezen commented Jun 17, 2024

done .. kindly let me know what changes u make .. i am compiling a document of all the communication we have had so that it becomes a good starting point for noobs like me

The GraphAr C++ CI failed the format check. Can you double-check that clang-format is version 8.x.x with the command

clang-format --version

cpp/test/tmp (review thread, outdated)
@@ -0,0 +1,465 @@
/*
Copy link
Contributor


What's this file for? It seems to be a temporary file.

std::string vertex_info_path =
test_data_dir + "ldbc_sample/json/Person.vertex.yml";
std::cout << "Vertex info path: " << vertex_info_path << std::endl;
auto fs = FileSystemFromUriOrPath(prefix).value();
Copy link
Contributor


You have not defined the prefix variable; see the CI report.

auto new_pg = CreatePropertyGroup({new_property}, pg->GetFileType(),
pg->GetPrefix());
auto maybe_reader =
VertexPropertyArrowChunkReader::Make(vertex_info, new_pg, prefix);
Copy link
Contributor


Ditto, see report

@acezen
Copy link
Contributor

acezen commented Jun 17, 2024

BTW, you can open the Details link; it leads to the CI/CD reports, where you can find out why the CI failed.
(screenshot: the checks list with its Details links, 2024-06-17)

@amygbAI
Copy link
Contributor Author

amygbAI commented Jun 17, 2024 via email

@acezen acezen force-pushed the 170-feat-support-json-payload-file-format branch from 04518a2 to 16b61bc Compare June 20, 2024 09:57
acezen added 2 commits June 20, 2024 18:05
Signed-off-by: acezen <qiaozi.zwb@alibaba-inc.com>
@acezen
Copy link
Contributor

acezen commented Jun 20, 2024

Hi, @amygbAI, I have fixed the C++ formatting with clang-format-8, and made some changes:

  • simplified the tests: we do not need to repeat the tests of the other formats; just testing that a JSON chunk can be read is enough
  • completed the JSON code of the datasource.
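A small standalone sketch of what "just test reading a JSON chunk" boils down to: Spark's json datasource reads newline-delimited JSON, so a vertex chunk in the JSON payload format is one record per line. The file layout, schema, and values below are invented for illustration, and plain Python stands in for PySpark so it runs anywhere:

```python
import json
import os
import tempfile

# Write a tiny stand-in for a JSON vertex chunk: newline-delimited JSON,
# one record per line. The schema and values are invented for illustration.
records = [
    {"firstName": "Ada", "gender": "female"},
    {"firstName": "Alan", "gender": "male"},
]
chunk_dir = tempfile.mkdtemp()
chunk_path = os.path.join(chunk_dir, "chunk0")
with open(chunk_path, "w") as f:
    f.write("\n".join(json.dumps(r) for r in records))

# "Reading the chunk" is then just parsing one JSON object per line,
# which is what Spark's json datasource does under the hood.
with open(chunk_path) as f:
    rows = [json.loads(line) for line in f]

print(len(rows))          # 2
print(rows[0]["gender"])  # female
```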

@acezen acezen changed the title 170 feat support json payload file format feat(c++,spark): support json payload file format Jun 20, 2024
@acezen
Copy link
Contributor

acezen commented Jun 21, 2024

Hi, @amygbAI, I think the PR is overall LGTM; you may fix the remaining problems based on Sem's comments.

amygbAI pushed a commit to amygbAI/incubator-graphar that referenced this pull request Jun 21, 2024
@amygbAI amygbAI force-pushed the 170-feat-support-json-payload-file-format branch from b6c5cfa to 04c9026 Compare June 21, 2024 05:41
amygbAI pushed a commit to amygbAI/incubator-graphar that referenced this pull request Jun 21, 2024
@amygbAI
Copy link
Contributor Author

amygbAI commented Jun 21, 2024

After making the changes, when I tried to push, git showed me some conflicts; after some GPT + Google searching I followed the steps below

git stash
git pull origin 170-feat-support-json-payload-file-format
git stash pop
git add .
git commit -m "msg"
git push origin 170-feat-support-json-payload-file-format

For some reason it now shows conflicts in the scripts file AND also in the cpp file filesystem.cc .. totally confused

@acezen
Copy link
Contributor

acezen commented Jun 21, 2024

After making the changes, when I tried to push, git showed me some conflicts ... now it shows conflicts in the scripts file AND also in the cpp file filesystem.cc .. totally confused

That's because your branch and main modified the same file. You need to rebase onto main first and fix the conflicts.
refer: https://www.atlassian.com/git/tutorials/merging-vs-rebasing
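The rebase flow being suggested can be sketched end-to-end in a throwaway repository. All names below (branches, files) are invented for illustration, and git is driven from Python via subprocess so the demo is self-contained:

```python
import pathlib
import subprocess
import tempfile

repo = pathlib.Path(tempfile.mkdtemp())

def git(*args):
    """Run a git command inside the scratch repo and return its stdout."""
    return subprocess.run(["git", "-C", str(repo), *args], check=True,
                          capture_output=True, text=True).stdout

git("init", "-q")
git("checkout", "-qb", "main")
git("config", "user.email", "demo@example.com")
git("config", "user.name", "demo")

# Base commit on main.
(repo / "lib.txt").write_text("base\n")
git("add", "lib.txt")
git("commit", "-qm", "base")

# A feature branch diverges...
git("checkout", "-qb", "feature")
(repo / "feature.txt").write_text("feature work\n")
git("add", "feature.txt")
git("commit", "-qm", "feature work")

# ...while main moves on independently.
git("checkout", "-q", "main")
(repo / "main.txt").write_text("main work\n")
git("add", "main.txt")
git("commit", "-qm", "main work")

# Replay the feature commits on top of the updated main. Any conflicts
# are resolved once, here, instead of resurfacing at push time.
git("checkout", "-q", "feature")
git("rebase", "main")

log = git("log", "--oneline")
print(log)  # feature work now sits on top of main work and base
```

This is the "rebase onto main first" step; after it, a regular push to the feature branch (force-push if it was already published) carries a linear history with no merge conflicts left to resolve.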

@acezen
Copy link
Contributor

acezen commented Jun 21, 2024

After making the changes, when I tried to push, git showed me some conflicts ... now it shows conflicts in the scripts file AND also in the cpp file filesystem.cc .. totally confused

Or just revert maven-projects/spark/scripts/run-ldbc-sample2graphar.sh back to csv; we don't need to change that file.
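Reverting a single file to another branch's version is `git checkout <branch> -- <path>`. A scratch-repo sketch, with invented branch and file names, driven from Python so it is self-contained:

```python
import pathlib
import subprocess
import tempfile

repo = pathlib.Path(tempfile.mkdtemp())

def git(*args):
    # Run a git command inside the scratch repo.
    subprocess.run(["git", "-C", str(repo), *args],
                   check=True, capture_output=True)

script = repo / "run-sample.sh"  # hypothetical stand-in for the real script

git("init", "-q")
git("checkout", "-qb", "main")
git("config", "user.email", "demo@example.com")
git("config", "user.name", "demo")
script.write_text("FORMAT=csv\n")
git("add", "run-sample.sh")
git("commit", "-qm", "csv version")

# On a feature branch the file was changed to json.
git("checkout", "-qb", "feature")
script.write_text("FORMAT=json\n")
git("commit", "-qam", "switch to json")

# Take main's copy of just this one file; the rest of the branch is untouched.
git("checkout", "main", "--", "run-sample.sh")
print(script.read_text().strip())  # FORMAT=csv
```

The restored file lands in both the working tree and the index, so a plain `git commit` afterwards records the revert on the feature branch.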

@amygbAI
Copy link
Contributor Author

amygbAI commented Jun 22, 2024 via email

@acezen
Copy link
Contributor

acezen commented Jun 24, 2024


Hi, @amygbAI, seems that you have reverted the changes that I made ╥﹏╥

@amygbAI
Copy link
Contributor Author

amygbAI commented Jun 24, 2024 via email

@acezen acezen force-pushed the 170-feat-support-json-payload-file-format branch from 86727fd to ccfe05e Compare June 24, 2024 03:03
@acezen
Copy link
Contributor

acezen commented Jun 24, 2024

dang .. can u just give me the exact version on which u made the changes, i ll do a hard reset to that version and then add my final changes on it ..


It's OK now. I have reset to that commit and applied your final change on top of it.

Copy link
Contributor

@acezen acezen left a comment


LGTM, thanks for the work!
Hi, @SemyonSinchenko, this change is OK to me, could you give a review again?

@amygbAI
Copy link
Contributor Author

amygbAI commented Jun 24, 2024 via email

@acezen
Copy link
Contributor

acezen commented Jun 24, 2024

appreciate all your patience .. i think i made it harder than it actually should have been


Don't be bothered by that. Since this is your first time contributing to an open source project, your continued contributions and the final work are excellent. Hope you join the community and keep contributing to GraphAr. :)

@amygbAI
Copy link
Contributor Author

amygbAI commented Jun 24, 2024 via email

@SemyonSinchenko
Copy link
Member

I will review it again in the evening.

Copy link
Member

@SemyonSinchenko SemyonSinchenko left a comment


LGTM. Thanks for the contribution @amygbAI !

@acezen acezen merged commit 73e0702 into apache:main Jun 25, 2024
7 checks passed
Elssky pushed a commit to Elssky/incubator-graphar that referenced this pull request Oct 8, 2024

---------

Signed-off-by: amygbAI <80807752+amygbAI@users.noreply.github.com>
Signed-off-by: acezen <qiaozi.zwb@alibaba-inc.com>
Co-authored-by: Ubuntu <bithika@amygb.ai>
Co-authored-by: acezen <qiaozi.zwb@alibaba-inc.com>