Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Spark] Move datasources under org.apache.spark package #493

Closed
SemyonSinchenko opened this issue May 23, 2024 · 8 comments · Fixed by #514
Closed

[Spark] Move datasources under org.apache.spark package #493

SemyonSinchenko opened this issue May 23, 2024 · 8 comments · Fixed by #514
Assignees
Labels
Component:Spark enhancement New feature or request

Comments

@SemyonSinchenko
Copy link
Member

Describe the enhancement requested

Currently datasources implementation are under org.apache.graphar space. I suggest to move these classes to org.apache.spark package. It allows us, for example, to use spark internals. For example, it resolve the problem in #488 when JSONOptions is private[sql].

Component(s)

Spark, PySpark

@acezen
Copy link
Contributor

acezen commented May 23, 2024

There is a problem if we move to org.apache.spark is that we can not release the datasource maven packages to the org.apache.spark in the future.

@SemyonSinchenko
Copy link
Member Author

It may also simplify the full support of v2 datasources. Last time when I tried to switch to v2 I faced a problem that few important methods are private[sql]

@SemyonSinchenko
Copy link
Member Author

There is a problem if we move to org.apache.spark is that we can not release the datasource maven packages to the org.apache.spark in the future.

But do we need it? We may release graphar itself and we may release PyPI package that already include JARs from datasources. Should datasources be released separately?

@acezen
Copy link
Contributor

acezen commented May 23, 2024

There is a problem if we move to org.apache.spark is that we can not release the datasource maven packages to the org.apache.spark in the future.

But do we need it? We may release graphar itself and we may release PyPI package that already include JARs from datasources. Should datasources be released separately?

The graphar package relys on datasource, I don't know how can we release graphar without release the datasource package?

@SemyonSinchenko
Copy link
Member Author

SemyonSinchenko commented May 23, 2024

The graphar package relys on datasource, I don't know how can we release graphar without release the datasource package?

It will be a compile time dependency. For example, like in delta-spark: https://github.com/delta-io/delta/tree/master/spark/src/main/scala/org/apache/spark/sql

@acezen
Copy link
Contributor

acezen commented May 24, 2024

The graphar package relys on datasource, I don't know how can we release graphar without release the datasource package?

It will be a compile time dependency. For example, like in delta-spark: https://github.com/delta-io/delta/tree/master/spark/src/main/scala/org/apache/spark/sql

It seems that as a compile time dependency, we need to put datasource back to maven-projects/graphar? Or do you know is there a way that datasource still be separated from graphar and can be a compile time dependency too? That would be perfect for this issue.

@SemyonSinchenko
Copy link
Member Author

The graphar package relys on datasource, I don't know how can we release graphar without release the datasource package?

It will be a compile time dependency. For example, like in delta-spark: https://github.com/delta-io/delta/tree/master/spark/src/main/scala/org/apache/spark/sql

It seems that as a compile time dependency, we need to put datasource back to maven-projects/graphar? Or do you know is there a way that datasource still be separated from graphar and can be a compile time dependency too? That would be perfect for this issue.

I can experiment with it. It seems to me that yes, it should be possible.

@acezen
Copy link
Contributor

acezen commented May 24, 2024

The graphar package relys on datasource, I don't know how can we release graphar without release the datasource package?

It will be a compile time dependency. For example, like in delta-spark: https://github.com/delta-io/delta/tree/master/spark/src/main/scala/org/apache/spark/sql

It seems that as a compile time dependency, we need to put datasource back to maven-projects/graphar? Or do you know is there a way that datasource still be separated from graphar and can be a compile time dependency too? That would be perfect for this issue.

I can experiment with it. It seems to me that yes, it should be possible.

Great, thanks Sem, Can I assign this issue to you?

@SemyonSinchenko SemyonSinchenko self-assigned this May 24, 2024
acezen pushed a commit that referenced this issue Jun 7, 2024
### Reason for this PR
By moving datasources under `org.apache.spark.sql` we are able to access private Spark API. Last time when I was trying to fully migrate datasources to V2 it was a blocker. Detailed motivation is in #493 

### What changes are included in this PR?
Mostly refactoring.

### Are these changes tested?
Unit tests are passed

I manually checked the generated JARs:
![image](https://github.com/apache/incubator-graphar/assets/29755009/1b094516-88b1-490a-a2ea-8dcd092a3b1d)

### Are there any user-facing changes?
Mostly not because `GarDataSource` was left under the same package.


Close #493
SemyonSinchenko added a commit that referenced this issue Jun 13, 2024
* feat(spark): Refactoring datasources (#514)

### Reason for this PR
By moving datasources under `org.apache.spark.sql` we are able to access private Spark API. Last time when I was trying to fully migrate datasources to V2 it was a blocker. Detailed motivation is in #493 

### What changes are included in this PR?
Mostly refactoring.

### Are these changes tested?
Unit tests are passed

I manually checked the generated JARs:
![image](https://github.com/apache/incubator-graphar/assets/29755009/1b094516-88b1-490a-a2ea-8dcd092a3b1d)

### Are there any user-facing changes?
Mostly not because `GarDataSource` was left under the same package.


Close #493

* feat(dev): Add release and verify scripts (#507)

Reason for this PR
Add scripts for developer or release manager to easily release version or verify a version.

What changes are included in this PR?
Add release and verify scripts
related document is updated to website, see Update the release and verify document, and add development document incubator-graphar-website#18
Are these changes tested?
yes

Are there any user-facing changes?
no
---------

Signed-off-by: acezen <qiaozi.zwb@alibaba-inc.com>

* chore: Bump to version v0.12.0 (Round 1) (#517)


Signed-off-by: acezen <qiaozi.zwb@alibaba-inc.com>

* chore: Add CHANGELOG.md (#513)


Signed-off-by: acezen <qiaozi.zwb@alibaba-inc.com>

* Introduce buf

- v2
- buf.gen
- buf

 On branch format-definition-dev
 Your branch is up to date with 'origin/format-definition-dev'.

 Changes to be committed:
	new file:   buf.gen.yaml
	new file:   buf.yaml
	modified:   format/adjacent_list.proto
	modified:   format/edge_info.proto
	modified:   format/graph_info.proto
	modified:   format/property_group.proto
	modified:   format/types.proto
	modified:   format/vertex_info.proto

---------

Signed-off-by: acezen <qiaozi.zwb@alibaba-inc.com>
Co-authored-by: Weibin Zeng <qiaozi.zwb@alibaba-inc.com>
acezen added a commit that referenced this issue Jun 13, 2024
* feat(spark): Refactoring datasources (#514)

### Reason for this PR
By moving datasources under `org.apache.spark.sql` we are able to access private Spark API. Last time when I was trying to fully migrate datasources to V2 it was a blocker. Detailed motivation is in #493 

### What changes are included in this PR?
Mostly refactoring.

### Are these changes tested?
Unit tests are passed

I manually checked the generated JARs:
![image](https://github.com/apache/incubator-graphar/assets/29755009/1b094516-88b1-490a-a2ea-8dcd092a3b1d)

### Are there any user-facing changes?
Mostly not because `GarDataSource` was left under the same package.


Close #493

* feat(dev): Add release and verify scripts (#507)

Reason for this PR
Add scripts for developer or release manager to easily release version or verify a version.

What changes are included in this PR?
Add release and verify scripts
related document is updated to website, see Update the release and verify document, and add development document incubator-graphar-website#18
Are these changes tested?
yes

Are there any user-facing changes?
no
---------

Signed-off-by: acezen <qiaozi.zwb@alibaba-inc.com>

* chore: Bump to version v0.12.0 (Round 1) (#517)


Signed-off-by: acezen <qiaozi.zwb@alibaba-inc.com>

* chore: Add CHANGELOG.md (#513)


Signed-off-by: acezen <qiaozi.zwb@alibaba-inc.com>

* Introduce buf

- v2
- buf.gen
- buf

 On branch format-definition-dev
 Your branch is up to date with 'origin/format-definition-dev'.

 Changes to be committed:
	new file:   buf.gen.yaml
	new file:   buf.yaml
	modified:   format/adjacent_list.proto
	modified:   format/edge_info.proto
	modified:   format/graph_info.proto
	modified:   format/property_group.proto
	modified:   format/types.proto
	modified:   format/vertex_info.proto

---------

Signed-off-by: acezen <qiaozi.zwb@alibaba-inc.com>
Co-authored-by: Weibin Zeng <qiaozi.zwb@alibaba-inc.com>
Elssky pushed a commit to Elssky/incubator-graphar that referenced this issue Oct 8, 2024
### Reason for this PR
By moving datasources under `org.apache.spark.sql` we are able to access private Spark API. Last time when I was trying to fully migrate datasources to V2 it was a blocker. Detailed motivation is in apache#493 

### What changes are included in this PR?
Mostly refactoring.

### Are these changes tested?
Unit tests are passed

I manually checked the generated JARs:
![image](https://github.com/apache/incubator-graphar/assets/29755009/1b094516-88b1-490a-a2ea-8dcd092a3b1d)

### Are there any user-facing changes?
Mostly not because `GarDataSource` was left under the same package.


Close apache#493
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component:Spark enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants