tokenize.pl: Disallow more chars that we can't encode for sed
solardiz committed Oct 31, 2024
1 parent b061b4a commit ff9cb26
Showing 1 changed file with 1 addition and 3 deletions.
4 changes: 1 addition & 3 deletions run/tokenize.pl
@@ -39,7 +39,7 @@
 	for (my $pos = 0; $pos <= $maxpos; $pos++) {
 		my $sub = substr($_, $pos, $len);
 		# Disallow chars we currently can't encode for sed
-		next if ($sub =~ '[/.\\\]');
+		next if ($sub =~ '[\'/.*\\\]');
 		$subcnt{$sub} += $len;
 	}
 }
@@ -49,8 +49,6 @@
 
 my @tokens;
 foreach my $sub (@subtop) {
-	# print "$sub", "\t", $subcnt{$sub}, "\n";
-	next if $subcnt{$sub} < 2;
 	$tokens[$#tokens + 1] = $sub;
 	last if $#tokens >= $maxtok - 1;
 }
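
For illustration only, here is a minimal sketch of what the widened character class rejects. The helper name `sed_safe` and the sample substrings are hypothetical and not part of tokenize.pl. The likely reason for each exclusion is an assumption based on how sed expressions are usually written: `/` is the usual `s///` delimiter, `.` and `*` are regex metacharacters, `\` is the escape character, and `'` would end a single-quoted shell argument.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in tokenize.pl): returns 1 if $sub contains none
# of the characters the commit excludes, 0 otherwise.
sub sed_safe {
	my ($sub) = @_;
	# Same character set as the new pattern in the diff: ' / . * and backslash
	return ($sub !~ m{['/.*\\]}) ? 1 : 0;
}

# Made-up sample substrings, one per excluded character plus a plain one
foreach my $sub ("ing", "a/b", "1.0", "x*", "it's", "c:\\") {
	printf "%-6s %s\n", $sub, sed_safe($sub) ? "kept" : "skipped";
}
```

Only the plain substring passes the check here. Each of the others contains one excluded character and would be skipped before being counted.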