Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Give message with line and column of CSV file if data conversion fails. Allow parsing of DateTime data, and do the basic arithmetic on them #5791

Closed
wants to merge 33 commits into from

Conversation

derekdiamond
Copy link
Contributor

@derekdiamond derekdiamond commented May 11, 2021

We are excited to review your PR.

So we can do the best job, please check:

  • There's a descriptive title that will make sense to other developers some time from now.
  • There's associated issues. All PR's should have issue(s) associated - unless a trivial self-evident change such as fixing a typo. You can use the format Fixes #nnnn in your description to cause GitHub to automatically close the issue(s) when your PR is merged.
  • Your change description explains what the change does, why you chose your approach, and anything else that reviewers should know.
  • You have included any necessary tests in the same PR.

@dnfadmin
Copy link

dnfadmin commented May 11, 2021

CLA assistant check
All CLA requirements met.

@michaelgsharp michaelgsharp self-assigned this May 11, 2021
@michaelgsharp
Copy link
Member

Hey @derekdiamond, thanks for taking the time to submit this PR!

It looks like somehow some of the compiled native files were placed in src/native/out. Probably a build mess up, but can you remove those files from the PR?

@michaelgsharp michaelgsharp removed their assignment May 11, 2021
@michaelgsharp michaelgsharp requested a review from pgovind May 11, 2021 23:19
@derekdiamond
Copy link
Contributor Author

Hey @derekdiamond, thanks for taking the time to submit this PR!

It looks like somehow some of the compiled native files were placed in src/native/out. Probably a build mess up, but can you remove those files from the PR?

Sure will do

@derekdiamond derekdiamond changed the title Give message with line and column of CSV file if data conversion fails Give message with line and column of CSV file if data conversion fails. May 25, 2021
@derekdiamond derekdiamond changed the title Give message with line and column of CSV file if data conversion fails. Give message with line and column of CSV file if data conversion fails. Allow parsing of DateTime data, and do the basic arithmetic on them May 25, 2021
@codecov
Copy link

codecov bot commented May 25, 2021

Codecov Report

Merging #5791 (d59f439) into main (43c49f6) will decrease coverage by 0.05%.
The diff coverage is 56.73%.

@@            Coverage Diff             @@
##             main    #5791      +/-   ##
==========================================
- Coverage   68.35%   68.30%   -0.06%     
==========================================
  Files        1131     1132       +1     
  Lines      241210   241750     +540     
  Branches    25039    25103      +64     
==========================================
+ Hits       164887   165135     +248     
- Misses      69819    70106     +287     
- Partials     6504     6509       +5     
Flag Coverage Δ
Debug 68.30% <56.73%> (-0.06%) ⬇️
production 62.89% <26.37%> (-0.09%) ⬇️
test 89.26% <94.17%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/Microsoft.Data.Analysis/DateTimeComputation.cs 22.60% <22.60%> (ø)
src/Microsoft.Data.Analysis/DataFrame.IO.cs 76.69% <33.33%> (-1.42%) ⬇️
...a.Analysis/PrimitiveDataFrameColumnComputations.cs 45.69% <66.66%> (-0.01%) ⬇️
...st/Microsoft.Data.Analysis.Tests/DataFrameTests.cs 99.43% <92.66%> (-0.52%) ⬇️
...Microsoft.Data.Analysis.Tests/DataFrame.IOTests.cs 98.50% <98.21%> (+0.12%) ⬆️
src/Microsoft.Data.Analysis/DataFrame.cs 86.57% <100.00%> (+0.17%) ⬆️
src/Microsoft.Data.Analysis/Strings.Designer.cs 44.20% <100.00%> (+1.23%) ⬆️
...c/Microsoft.ML.FastTree/Utils/ThreadTaskManager.cs 79.48% <0.00%> (-20.52%) ⬇️
src/Microsoft.ML.Core/Data/ProgressReporter.cs 70.95% <0.00%> (-6.99%) ⬇️
src/Microsoft.ML.FastTree/FastTreeRanking.cs 50.79% <0.00%> (-4.28%) ⬇️
... and 23 more

@michaelgsharp
Copy link
Member

@pgovind since this is in DataAnalysis can you take a look? I'm still not very familiar with the new code.

@pgovind
Copy link

pgovind commented May 25, 2021

Yup, this just popped up in my notifications. Will review in a bit

Copy link

@pgovind pgovind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment. This is great work, this PR is almost ready to go in. I would just like to see 2 things:

  1. Some unit tests. Since we're now adding DateTime to be detected automatically by LoadCsv, we should test that path. I think duplicating one of the existing LoadCsv tests and adding a DateTime column to it will work fine.
  2. There are still some CMAKE build artifacts in the changed files. I think you can just get rid of them.

@derekdiamond
Copy link
Contributor Author

Cool will do

@pgovind
Copy link

pgovind commented May 26, 2021

Cool will do

Yup I see that the CMAKE artifacts are gone now. Not sure if you intended to fix the other comments with the last commit. Just calling it out in case you has commits locally that didn't get pushed

@derekdiamond
Copy link
Contributor Author

Thanks, all done added tests as well for IO and Computations. So hope all is good now!

@pgovind
Copy link

pgovind commented May 26, 2021

So hope all is good now!

Almost :)

  1. Looks like the cmake stuff creeped in again. (not related to this PR: it looks like all the CMAKE stuff is build artifacts. Maybe consider adding out/* to your .gitignore? But remember not to push the .gitignore file itself!)
  2. Copy paste the license text into DateTimeComputation.cs from any other file that has it?

@derekdiamond
Copy link
Contributor Author

I'm getting there!. Hope all the out* artifacts are gone!

@pgovind pgovind added the Microsoft.Data.Analysis All DataFrame related issues and PRs label Jun 3, 2021
@pgovind pgovind mentioned this pull request Jun 3, 2021
@pgovind
Copy link

pgovind commented Jun 3, 2021

Since we're closing to having a new preview release of ML, I cleaned up the remaining 4 build artifacts in #5834. Going to close this PR in favor of that one.

Awesome work here! Thanks @derekdiamond

@pgovind pgovind closed this Jun 3, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Apr 12, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Microsoft.Data.Analysis All DataFrame related issues and PRs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants