Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Nov 5, 2018

What changes were proposed in this pull request?

In the PR, I propose to add new option locale into CSVOptions/JSONOptions to make parsing date/timestamps in local languages possible. Currently the locale is hard coded to Locale.US.

How was this patch tested?

Added two tests for parsing a date from CSV/JSON - ноя 2018.

@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 5, 2018

I will update docs soon.

@SparkQA
Copy link

SparkQA commented Nov 5, 2018

Test build #98489 has finished for PR 22951 at commit 41154bd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 6, 2018

@HyukjinKwon @dongjoon-hyun Please, review the changes.

@SparkQA
Copy link

SparkQA commented Nov 6, 2018

Test build #98507 has finished for PR 22951 at commit 93da760.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Looks good. I or someone else should take a closer look before getting this in.

@dongjoon-hyun
Copy link
Member

Could you rebase this once again, @MaxGekk ?

# Conflicts:
#	sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala
@SparkQA
Copy link

SparkQA commented Nov 7, 2018

Test build #98541 has finished for PR 22951 at commit 6ab8501.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Nov 7, 2018

Test build #98543 has finished for PR 22951 at commit 6ab8501.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 7, 2018

Test build #98567 has finished for PR 22951 at commit 759bca6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

@dongjoon-hyun
Copy link
Member

Could you take a look once more, @HyukjinKwon ?

@HyukjinKwon
Copy link
Member

OMG, what does ноя 2018 mean BTW? haha

the default value, empty string.
:param locale: sets a locale as language tag in IETF BCP 47 format. If None is set,
it uses the default value, ``en-US``. For instance, ``locale`` is used while
parsing dates and timestamps.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ideally we should apply to decimal parsing too actually. But yea we can leave it separate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems parsing decimals using locale will be slightly tricky in JSON case because we leave this to Jackson by calling its method getCurrentToken and getDecimalValue, and I haven't found how to pass locale to it. Probably we will need a custom deserialiser?

In the CSV case, it should be easier since we convert strings ourselves. I will try to do that for CSV first of all when this PR be merged.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the PR for parsing decimals from CSV: #22979

HyukjinKwon
HyukjinKwon approved these changes Nov 8, 2018
maxCharsPerColumn=None, maxMalformedLogPerPartition=None, mode=None,
columnNameOfCorruptRecord=None, multiLine=None, charToEscapeQuoteEscaping=None,
samplingRatio=None, enforceSchema=None, emptyValue=None):
samplingRatio=None, enforceSchema=None, emptyValue=None, locale=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add emptyValue in streaming.py in the same separate PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems it exists in streaming.py:

enforceSchema=None, emptyValue=None):

@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 8, 2018

OMG, what does ноя 2018 mean BTW? haha

It is 3 letters prefix of Ноябрь which is November in Russian.

# Conflicts:
#	sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala
#	sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala
@SparkQA
Copy link

SparkQA commented Nov 8, 2018

Test build #98583 has finished for PR 22951 at commit 8834b4b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 8, 2018

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Nov 8, 2018

Test build #98587 has finished for PR 22951 at commit 8834b4b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 8, 2018

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Nov 8, 2018

Test build #98591 has finished for PR 22951 at commit 8834b4b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Nov 8, 2018

Test build #98598 has finished for PR 22951 at commit 8834b4b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Merged to master.

@HyukjinKwon
Copy link
Member

Actually let me leave a cc for @srowen. I remember we talked about it before.

@asfgit asfgit closed this in 79551f5 Nov 9, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…SV/JSON

## What changes were proposed in this pull request?

In the PR, I propose to add new option `locale` into CSVOptions/JSONOptions to make parsing date/timestamps in local languages possible. Currently the locale is hard coded to `Locale.US`.

## How was this patch tested?

Added two tests for parsing a date from CSV/JSON - `ноя 2018`.

Closes apache#22951 from MaxGekk/locale.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
@MaxGekk MaxGekk deleted the locale branch August 17, 2019 13:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants