Currency parser #444

Closed
jimhester opened this issue Jun 26, 2016 · 9 comments
Labels
feature (a feature request or enhancement)

Comments

@jimhester
Collaborator

As mentioned in #308 (comment), we do not support negative currency values in parse_number():

parse_number("-$3")
#> Warning: 1 parsing failure.
#> row col expected actual
#>   1  -- a number     -$
#> [1] NA
#> attr(,"problems")
#> # A tibble: 1 x 4
#>     row   col expected actual
#>   <int> <int>    <chr>  <chr>
#> 1     1    NA a number     -$

# Putting the minus sign directly in front of the number works, but this does not seem to be the standard way to format negative currency amounts in the US
parse_number("$-3")
#> [1] -3

# Should we also support accounting-style parentheses for this case?
parse_number("($3)")
#> Warning: 1 parsing failure.
#> row col expected actual
#>   1  -- a number     3)
#> [1] NA
#> attr(,"problems")
#> # A tibble: 1 x 4
#>     row   col expected actual
#>   <int> <int>    <chr>  <chr>
#> 1     1    NA a number     3)

Perhaps we should add parse_currency() and col_currency() functions that are locale-aware, to handle this and other cases.
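To make the proposal concrete, here is a minimal base R sketch of what a `parse_currency()` might do. It is hypothetical (not a readr function) and hard-codes US conventions: a `$` symbol, `,` as the grouping mark, and `.` as the decimal mark. It treats a minus sign anywhere before the first digit, or parentheses around the whole value, as negative.

```r
# Hypothetical parse_currency(): NOT part of readr.
# Assumes US formatting: "$" symbol, "," grouping mark, "." decimal mark.
parse_currency <- function(x) {
  x <- trimws(x)
  # Accounting style: a value wrapped in parentheses, e.g. "($3)", is negative
  in_parens <- grepl("^\\(.*\\)$", x)
  # A minus sign before the first digit, e.g. "-$3" or "$-3", is negative
  has_minus <- grepl("^[^0-9]*-", x)
  # Keep only digits and the decimal mark, then convert to a number
  magnitude <- suppressWarnings(as.numeric(gsub("[^0-9.]", "", x)))
  ifelse(in_parens | has_minus, -magnitude, magnitude)
}

parse_currency(c("-$3", "$-3", "($3)", "$2,000", "-$1.80"))
```

Because the marks are hard-coded, a European-style value such as `"1.234,56 €"` would be misread as 1.23456, which is exactly the kind of case a locale-aware version would need to handle.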

@hadley
Member

hadley commented Jun 26, 2016

I looked into this a little, and full currency parsing is tricky because of the many possible combinations of options. Worth considering, though.

@hadley hadley changed the title Negative currency support Currency parser Jul 15, 2016
@hadley hadley added the collector and feature (a feature request or enhancement) labels Dec 22, 2016
@jennybc
Member

jennybc commented Oct 29, 2017

I just needed this when importing CSVs of my kids' transit usage. Most of the transactions are negative and are formatted like "-$1.80".

@hammer

hammer commented Apr 20, 2018

I, too, would like to parse negative currency values. Perhaps there's an intermediate solution that handles a minus sign placed before the currency symbol without requiring full support for international currency standards?
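One possible intermediate approach, sketched in base R under the assumption that the only problem is a minus sign written before the `$`: rewrite `-$1.80` as `$-1.80`, a form the first comment shows `parse_number()` already accepts. `move_minus_inside()` is a hypothetical helper name, not part of readr.

```r
# Hypothetical helper: move a leading minus sign inside the currency symbol,
# turning "-$1.80" into "$-1.80". Other strings are returned (trimmed) as-is.
move_minus_inside <- function(x, symbol = "$") {
  x <- trimws(x)
  prefix <- paste0("-", symbol)
  # Only touch non-missing values that start with "-$"
  swap <- !is.na(x) & startsWith(x, prefix)
  x[swap] <- paste0(symbol, "-", substring(x[swap], nchar(prefix) + 1))
  x
}

move_minus_inside(c("-$1.80", "$2,000", "-$12,500", NA))
```

The rewritten strings could then be passed to `readr::parse_number()`. Since this only reorders characters, it does not help with accounting-style parentheses or with symbols written after the number.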

@mine-cetinkaya-rundel
Member

mine-cetinkaya-rundel commented Oct 24, 2019

I also just needed this when pulling data from a website where values are stored like -$12,500.

I also wondered whether it would be possible to leave the value as is when conversion can't be done, rather than converting it to NA and losing the information.
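Because a column holds a single type, a numeric column cannot also keep the unparsed text. One way to avoid losing information, sketched here in base R, is to keep the raw text in its own column next to the parsed values and inspect the rows that failed. The sample values, including `"N/A"`, are made up for illustration.

```r
# Made-up example values; "N/A" stands in for anything that fails to parse.
raw <- c("-$12,500", "$2,000", "N/A", "$1,000")

# Convert what we can; failures become NA (coercion warnings suppressed here).
parsed <- suppressWarnings(as.numeric(gsub("[$,]", "", raw)))

# Keep the original text next to the number so nothing is lost.
df <- data.frame(raw = raw, parsed = parsed, stringsAsFactors = FALSE)

# Rows where the raw value was present but conversion failed.
failed <- is.na(df$parsed) & !is.na(df$raw)
df[failed, ]
```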

@jimhester
Collaborator Author

The best bet for now is to read these columns as character and do the conversion in R yourself. Something like parse_currency <- function(x) as.numeric(gsub("[$,]", "", x)) should do it for the -$12,500 case.
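For reference, here is that one-liner as a runnable base R snippet. `parse_currency()` is only the illustrative name from the comment, not a readr function, and the snippet assumes US formatting.

```r
# Workaround from the comment: strip "$" and "," and let as.numeric()
# handle the minus sign and the decimal point. Not a readr function.
parse_currency <- function(x) as.numeric(gsub("[$,]", "", x))

parse_currency(c("-$12,500", "$2,000", "-$1.80"))
```

This returns -12500, 2000 and -1.8. Accounting-style values are not covered: `"($3)"` becomes `"(3)"`, which `as.numeric()` turns into NA with a coercion warning.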

@mine-cetinkaya-rundel
Member

Right, there are workarounds once you realize what the issue is. A more realistic situation where this comes up is something like the following.

library(tidyverse)

df <- tibble(
  x = c("-$12,500", "$2,000", "-$5,000", "$1,000", "-$3,000")
)

df %>%
  mutate(x = parse_number(x))
#> Warning: 3 parsing failures.
#> row col expected actual
#>   1  -- a number      -
#>   3  -- a number      -
#>   5  -- a number      -
#> # A tibble: 5 x 1
#>       x
#>   <dbl>
#> 1    NA
#> 2  2000
#> 3    NA
#> 4  1000
#> 5    NA

But imagine the dataset is much larger. If the warning showed the original values, it would be easier to spot the issue.

Of course, any change to the warning here would require similar changes to the other parse_* functions at a minimum (and perhaps to other functions with a similar purpose that I can't think of), so I'm not sure whether that's feasible.

@jimhester
Collaborator Author

Oh, that was actually due to a (likely long-standing) bug: the - shown above was only part of the actual field, but the intent was always to include the whole field in the problems data.frame. It is now fixed and looks like this:

library(tidyverse)

df <- tibble(
  x = c("-$12,500", "$2,000", "-$5,000", "$1,000", "-$3,000")
)

df %>%
  mutate(x = parse_number(x))
#> Warning: 3 parsing failures.
#> row col expected   actual
#>   1  -- a number -$12,500
#>   3  -- a number -$5,000 
#>   5  -- a number -$3,000
#> # A tibble: 5 x 1
#>       x
#>   <dbl>
#> 1    NA
#> 2  2000
#> 3    NA
#> 4  1000
#> 5    NA

Created on 2019-10-24 by the reprex package (v0.3.0)

@mine-cetinkaya-rundel
Member

Thank you @jimhester!

@jimhester jimhester reopened this Oct 24, 2019
@jimhester
Collaborator Author

This seems unlikely to be done by us in the near future, so I will close this issue.


5 participants