Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

is_missing() working properly? #27

Closed
rwhitworth opened this issue Jun 13, 2015 · 13 comments
Closed

is_missing() working properly? #27

rwhitworth opened this issue Jun 13, 2015 · 13 comments

Comments

@rwhitworth
Copy link

I notice there are no tests for the is_missing() function, so I created a few, but it looks like is_missing returns undef when it shouldn't. Am I misunderstanding the is_missing or getline_hr functions?

The test is also my fork of this repo in case Markdown eats the code..

!/usr/bin/perl

use strict;
$^W = 1;

use Test::More tests => 9;

BEGIN {
$ENV{PERL_TEXT_CSV} = 0;
use_ok "Text::CSV", ();
plan skip_all => "Cannot load Text::CSV" if $@;
}

open FH, ">_99test.csv";
print FH <<EOC;

2
EOC
close FH;

ok (my $csv = Text::CSV->new (), "new");
is ($csv->is_missing(0), undef, "is_missing()");

open FH, "<_99test.csv";
ok ($csv->column_names ('code'));
ok (my $hr = $csv->getline_hr (_FH));
is ($csv->is_missing(0), undef, "is_missing()");
ok ($hr = $csv->getline_hr (_FH));
is (int $hr->{code}, 2, "code==2");
isnt ($csv->is_missing(0), undef, "isn't is_missing()");
close FH;

@Tux
Copy link

Tux commented Sep 6, 2016

At least in Text::CSV_XS this is not a bug: when using getline_hr and column_names are set, none of the columns can be missing in a _hr call, so is_missing is forced to be false for every column.
This is documented:

   is_missing
        my $missing = $csv->is_missing ($column_idx);

       Where  $column_idx is the  (zero-based)  index of the column in the
       last result of "getline_hr".

        $csv->keep_meta_info (1);
        while (my $hr = $csv->getline_hr ($fh)) {
            $csv->is_missing (0) and next; # This was an empty line
            }

       When using  "getline_hr",  it is impossible to tell if the  parsed
       fields are "undef" because they where not filled in the "CSV" stream
       or because they were not read at all, as all the fields defined by
       "column_names" are set in the hash-ref.    If you still need to know if
       all fields in each row are provided, you should enable "keep_meta_info"
       so you can check the flags.

I admit that the docs can be more clear about that. Suggestions welcome.

@charsbar
Copy link
Collaborator

charsbar commented Sep 6, 2016

@Tux, is that true when we set "keep_meta_info" to true as I did in https://rt.cpan.org/Public/Bug/Display.html?id=117495 ?

I suspect the following line that sets 0x0010 (CSV_FLAGS_MIS) does not work correctly when $#{$fr} is 0. (ie. because we got nothing, shouldn't we mark everything as CSV_FLAGS_MIS, instead of from 1 to $#_COLUMN_NAMES ? )

$self->{_FFLAGS}[$_] = 0x0010 for ($#{$fr} + 1) .. $#{$self->{_COLUMN_NAMES}};

(https://github.com/Tux/Text-CSV_XS/blob/master/CSV_XS.pm#L852 ; Same can be said for https://github.com/makamaka/Text-CSV/blob/master/lib/Text/CSV_PP.pm#L814 )

@Tux
Copy link

Tux commented Sep 6, 2016

I think you are right. Checking …

Uh, no: if you have an empty line, there is still ONE field: an empty field. That field is not missing.

So, if you set just one single column name, that will never be missing, unless you have a parse error on the first field. This however is most likely the most correct code:

@@ -849,7 +849,7 @@ sub getline_hr {
     $self->{_COLUMN_NAMES} or croak ($self->SetDiag (3002));
     my $fr = $self->getline (@args) or return;
     if (ref $self->{_FFLAGS}) {
-       $self->{_FFLAGS}[$_] = 0x0010 for ($#{$fr} + 1) .. $#{$self->{_COLUMN_NAMES}};
+       $self->{_FFLAGS}[$_] = 0x0010 for (@$fr ? $#{$fr} + 1 : 0) .. $#{$self->{_COLUMN_NAMES}};
        }
     @hr{@{$self->{_COLUMN_NAMES}}} = @$fr;
     \%hr;

@charsbar
Copy link
Collaborator

charsbar commented Sep 6, 2016

@Tux, yeah, your patch in the above message seems most reasonable to me, especially because the example code in the pod has been saying you can skip an empty line if is_missing(0) returns true.

$csv->keep_meta_info (1);
while (my $hr = $csv->getline_hr ($fh)) {
    $csv->is_missing (0) and next; # This was an empty line
    }

If we consider there is always at least one field, that is_missing(0) returns always false and the line will never be skipped, contrary to what's said there.

@Tux
Copy link

Tux commented Sep 6, 2016

Thanks for pointing there. That example should be changed. I'll reconsider the docs and the current behavior. Maybe a complete empty line should set missing, even if it is legal.

@rwhitworth
Copy link
Author

Any thoughts on if this would be a bug when Text::CSV_PP is in use?

@Tux
Copy link

Tux commented Sep 6, 2016

@rwhitworth your assumption was buggy in the first place (even though the docs are vague)
An empty line is legal and thus returns a single empty field
As the current behavior is the same in both _PP and _XS, I cannot guarantee that if I change the behavor in _XS it will also be changed in _PP (which I do not maintain)
I thank you for pointing at the vagueness though! Anything that can be improved should be improved (eventually)

I think that detection an empty line using getline_hr would take some serious extra steps from a user point of view, something like

# for CSV, CSV_PP and CSV_XS
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1, keep_meta_info => 1, blank_is_undef => 1 });
$csv->column_names ("code", "desc");
while (my $row = $csv->getline_hr ($fh)) {
    $csv->is_missing (1) && !defined $row->{code} and say "EMPTY!";
    }

if you have just one single column, checking for definedness with blank_is_undef should be sufficient

@charsbar
Copy link
Collaborator

charsbar commented Sep 6, 2016

@Tux, the above code is hard to write, and actually doesn't work (says "EMPTY!" for the first two lines) if the csv looks like the following, and empty_is_undef is set to true:

""
(empty line)
1,2
3

I agree an empty line is legal in general, but I'd prefer to keep the doc and change the behavior.

@rwhitworth, your test code you've sent us as a PR has a few issues: 1) it doesn't set keep_meta_info to true, so is_missing() doesn't work as you might expect (this has been documented, though _PP's doc is vague; I'd update it when this settles); 2) your assumption that is_missing() returns either undef or non-undef is not correct. Actually it returns undef only if $csv object hasn't called getline and its friends, and it returns 0 or 1 afterwards (as of this writing).

@rwhitworth
Copy link
Author

I don't fully follow, but I do appreciate the time you both (@Tux and @charsbar) have spent. Feel free to close this issue and/or my PR when it is appropriate.

@charsbar
Copy link
Collaborator

charsbar commented Sep 6, 2016

@rwhitworth , OK, thanks for your kindness. But please don't hesitate to say which or how you prefer to write when you parse your csv. Like or unlike is an important input for both of us! :)

@Tux
Copy link

Tux commented Oct 3, 2016

I still have not made up my mind. So hard to choose between multiple versions of correct :)

@Tux
Copy link

Tux commented Nov 22, 2016

@rwhitworth I have now committed this: Tux/Text-CSV_XS@6b71f39

   is_missing
        my $missing = $csv->is_missing ($column_idx);

       Where  $column_idx is the  (zero-based)  index of the column in the
       last result of "getline_hr".

        $csv->keep_meta_info (1);
        while (my $hr = $csv->getline_hr ($fh)) {
            $csv->is_missing (0) and next; # This was an empty line
            }

       When using  "getline_hr",  it is impossible to tell if the  parsed
       fields are "undef" because they where not filled in the "CSV" stream
       or because they were not read at all, as all the fields defined by
       "column_names" are set in the hash-ref.    If you still need to know if
       all fields in each row are provided, you should enable "keep_meta_info"
       so you can check the flags.

       If  "keep_meta_info"  is "false",  "is_missing"  will always return
       "undef", regardless of $column_idx being valid or not. If this
       attribute is "true" it will return either 0 (the field is present) or 1
       (the field is missing).

       A special case is the empty line.  If the line is completely empty -
       after dealing with the flags - this is still a valid CSV line:  it is a
       record of just one single empty field. However, if "keep_meta_info" is
       set, invoking "is_missing" with index 0 will now return true.

And added this to the test suite:

open $fh, ">", $tfn or die "$tfn: $!\n";
print $fh <<"EOC";
a,b

2
EOC
close $fh;

ok ($csv = Text::CSV_XS->new (), "new");

open $fh, "<", $tfn or die "$tfn: $!\n";
ok ($csv->column_names ("code", "foo"), "set column names");
ok ($hr = $csv->getline_hr ($fh), "get header line");
is ($csv->is_missing (0), undef, "not is_missing () - no meta");
is ($csv->is_missing (1), undef, "not is_missing () - no meta");
ok ($hr = $csv->getline_hr ($fh), "get empty line");
is ($csv->is_missing (0), undef, "not is_missing () - no meta");
is ($csv->is_missing (1), undef, "not is_missing () - no meta");
ok ($hr = $csv->getline_hr ($fh), "get partial data line");
is (int $hr->{code}, 2, "code == 2");
is ($csv->is_missing (0), undef, "not is_missing () - no meta");
is ($csv->is_missing (1), undef, "not is_missing () - no meta");
close $fh;

open $fh, "<", $tfn or die "$tfn: $!\n";
$csv->keep_meta_info (1);
ok ($csv->column_names ("code", "foo"), "set column names");
ok ($hr = $csv->getline_hr ($fh), "get header line");
is ($csv->is_missing (0), 0, "not is_missing () - with meta");
is ($csv->is_missing (1), 0, "not is_missing () - with meta");
ok ($hr = $csv->getline_hr ($fh), "get empty line");
is ($csv->is_missing (0), 1, "not is_missing () - with meta");
is ($csv->is_missing (1), 1, "not is_missing () - with meta");
ok ($hr = $csv->getline_hr ($fh), "get partial data line");
is (int $hr->{code}, 2, "code == 2");
is ($csv->is_missing (0), 0, "not is_missing () - with meta");
is ($csv->is_missing (1), 1, "not is_missing () - with meta");
close $fh;

charsbar added a commit that referenced this issue Jan 11, 2017
- and now is_missing(0) on empty line return 1 for keep_meta_info = true (#27)
@charsbar
Copy link
Collaborator

And Text::CSV_PP passes the test above now, with the above commit. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants