Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support PostgreSQL data source #62

Merged
merged 8 commits into from
Jan 27, 2022
Merged

support PostgreSQL data source #62

merged 8 commits into from
Jan 27, 2022

Conversation

ianhhhhhhhhe
Copy link
Contributor

@ianhhhhhhhhe ianhhhhhhhhe commented Jan 22, 2022

spark_v3.0

@CLAassistant
Copy link

CLAassistant commented Jan 22, 2022

CLA assistant check
All committers have signed the CLA.

@wey-gu wey-gu requested a review from Nicole00 January 24, 2022 03:24
@wey-gu
Copy link
Contributor

wey-gu commented Jan 24, 2022

Thanks so much @ianhhhhhhhhe for the amazing work ;-)

@wey-gu
Copy link
Contributor

wey-gu commented Jan 24, 2022

One test failed.

Results :

Failed tests:   configsSuite(scala.com.vesoft.nebula.exchange.config.ConfigsSuite): assertion failed

Tests run: 25, Failures: 1, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for nebula-exchange 3.0-SNAPSHOT:
[INFO] 
[INFO] nebula-exchange .................................... SUCCESS [  0.076 s]
[INFO] exchange-common .................................... FAILURE [05:31 min]
[INFO] nebula-exchange_spark_2.2 .......................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  05:31 min
[INFO] Finished at: 2022-01-24T03:31:48Z
[INFO] ------------------------------------------------------------------------
Error:  Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12.4:test (default-test) on project exchange-common: There are test failures.
Error:  
Error:  Please refer to /home/runner/work/nebula-exchange/nebula-exchange/exchange-common/target/surefire-reports for the individual test results.
Error:  -> [Help 1]
Error:  
Error:  To see the full stack trace of the errors, re-run Maven with the -e switch.
Error:  Re-run Maven using the -X switch to enable full debug logging.
Error:  
Error:  For more information about the errors and possible solutions, please read the following articles:
Error:  [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
Error:  
Error:  After correcting the problems, you can resume the build with the command
Error:    mvn <args> -rf :exchange-common
Error: Process completed with exit code 1.

@codecov-commenter
Copy link

codecov-commenter commented Jan 24, 2022

Codecov Report

Merging #62 (0dd8046) into master (1fc042b) will increase coverage by 0.62%.
The diff coverage is 84.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master      #62      +/-   ##
============================================
+ Coverage     54.51%   55.14%   +0.62%     
  Complexity       76       76              
============================================
  Files            17       17              
  Lines          1317     1342      +25     
  Branches        250      254       +4     
============================================
+ Hits            718      740      +22     
+ Misses          474      473       -1     
- Partials        125      129       +4     
Impacted Files Coverage Δ
...la/com/vesoft/exchange/common/config/Configs.scala 66.66% <81.81%> (+0.51%) ⬆️
.../vesoft/exchange/common/config/SourceConfigs.scala 62.79% <85.71%> (+2.79%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1fc042b...0dd8046. Read the comment docs.

assert(postgresql.port == 5432)
assert(postgresql.user.equals("root"))
assert(postgresql.password.equals("nebula"))
assert(postgresql.database.equals("database"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate test

PulsarReader
}
import com.vesoft.exchange.common.config.{ClickHouseConfigEntry, Configs, DataSourceConfigEntry, FileBaseSourceConfigEntry, HBaseSourceConfigEntry, HiveSourceConfigEntry, JanusGraphSourceConfigEntry, KafkaSourceConfigEntry, MaxComputeConfigEntry, MySQLSourceConfigEntry, Neo4JSourceConfigEntry, PostgresSQLSourceConfigEntry, PulsarSourceConfigEntry, SinkCategory, SourceCategory}
import com.vesoft.nebula.exchange.reader.{CSVReader, ClickhouseReader, HBaseReader, HiveReader, JSONReader, JanusGraphReader, KafkaReader, MaxcomputeReader, MySQLReader, Neo4JReader, ORCReader, ParquetReader, PostgreSQLReader, PulsarReader}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please format the import

Neo4JSourceConfigEntry,
ServerDataSourceConfigEntry
}
import com.vesoft.exchange.common.config.{ClickHouseConfigEntry, HBaseSourceConfigEntry, HiveSourceConfigEntry, JanusGraphSourceConfigEntry, MaxComputeConfigEntry, MySQLSourceConfigEntry, Neo4JSourceConfigEntry, PostgresSQLSourceConfigEntry, ServerDataSourceConfigEntry}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please format the import

.option("user", postgreConfig.user)
.option("password", postgreConfig.password)
.load()
df.createOrReplaceTempView(postgreConfig.table)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you test the postgresql datasource? If your table is configed as table but the table name in sentence is db.table, can this sentence be executed successfully?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested with

df.createOrReplaceTempView("table")
 session.sql("select * from db.table")

and got exceptions like :

Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view not found: `db`.`table`; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation `db`.`table`

So, please change the table name in config sentence to keep the same with config table in config file.

* @param password
* @param sentence
*/
case class PostgresSQLSourceConfigEntry(override val category: SourceCategory.Value,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we refactor PostgresSQLSourceConfigEntry to PostgreSQLSourceConfigEntry?

@Nicole00
Copy link
Contributor

can you also add postgresql datasource for spark2.2 and spark2.4 ?

@Nicole00
Copy link
Contributor

add postgresql dependency to avoid exception:

Exception in thread "main" java.sql.SQLException: No suitable driver
	at java.sql.DriverManager.getDriver(DriverManager.java:315)

available postgresql versions are: 9.4.1207 - 9.4.1212, for example:

<dependency>
            <groupId>org.postgresql</groupId>
            <artifactId>postgresql</artifactId>
            <version>9.4.1212</version>
</dependency>

@ianhhhhhhhhe
Copy link
Contributor Author

can you also add postgresql datasource for spark2.2 and spark2.4 ?

Sure ^ - ^

.option("password", postgreConfig.password)
.load()
df.createOrReplaceTempView(postgreConfig.table)
session.sql(sentence)
Copy link
Contributor

@Nicole00 Nicole00 Jan 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in configs, if user does not config sentence, default sentence value is "", which will throw park.sql.catalyst.parser.ParseException when execute session.sql(sentence)

So, please add check for session.sql:

if(!"".equals(sentence)) 
     session.sql(sentence)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if I check it when user set the configs ?

require(
    host.trim.length != 0 && 
    port > 0 && 
    database.trim.length > 0 && 
    table.trim.length > 0 && 
    user.trim.length > 0 &&
    sentence.trim.length > 0
)

It will save many if

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's contradictory with the default sentence value:

getOrElse(config, "sentence", "")
  1. for default sentence value, I prefer to use null but not ""
  2. for users, they are allowed to not config the sentence, so we still need to process the two situations for sentence.

Copy link
Contributor Author

@ianhhhhhhhhe ianhhhhhhhhe Jan 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If sentence is not configured, executing select * from ${table} might be better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If sentence is not configured, executing select * from ${table} might be better.

No need to execute select * from table, default read from db for sparksql is reading all data.

.option("password", postgreConfig.password)
.load()
df.createOrReplaceTempView(postgreConfig.table)
session.sql(sentence)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

.option("password", postgreConfig.password)
.load()
df.createOrReplaceTempView(postgreConfig.table)
session.sql(sentence)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor

@Nicole00 Nicole00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gread work, thanks~

@Nicole00 Nicole00 merged commit 2559667 into vesoft-inc:master Jan 27, 2022
@Nicole00 Nicole00 added the doc affected PR: improvements or additions to documentation label Jan 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc affected PR: improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants