Skip to content

The Python implementation of Huffman encoding fails on 1-character-long strings and empty strings #739

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
3 of 4 tasks
thecnoNSMB opened this issue Aug 3, 2020 · 2 comments · Fixed by #756
Closed
3 of 4 tasks
Assignees
Labels
Problem This is a problem in the archive or an implementation.

Comments

@thecnoNSMB
Copy link
Contributor

thecnoNSMB commented Aug 3, 2020

Bug Report

Description

There are two bugs here, both in huffman.py. First, if you call build_huffman_tree with an empty string, it throws an IndexError on line 32.

Secondly, if you call build_huffman_tree on a string 1 character long (Hypothesis suggested '0'), and then call build_codebook with the resulting tree, it throws ValueError: not enough values to unpack (expected 2, got 1) on line 41. The tree that build_huffman_tree builds in this case is actually just a string that's one character long, both in this case and for #659, which I'm pretty sure is unintended behavior.

Environment

This was tested in Python 3.8, but is likely reproducible on every Python version the script supports.

Additional context

These bugs were found using the Python property-based testing library Hypothesis. This issue may be a duplicate of, or related to, #659. Apologies if I trimmed too many sections from the template.

For Algorithm Archive Developers

  • The bug can be reproduced
  • The bug can be fixed (if not, please explain why not in a comment below)
  • There is a timeline to fix the bug
  • The bug has been fixed (Please link the PR)
@thecnoNSMB thecnoNSMB added the Problem This is a problem in the archive or an implementation. label Aug 3, 2020
@Amaras
Copy link
Member

Amaras commented Aug 6, 2020

On a related note, the Coconut implementation (still in PR form) also has the same problems (also discovered by Hypothesis), so I will commit the first fixes soon.

However, once I tried to fix them, I uncovered the exact same edge case as issue #659. While we don't decide how to handle the single character case, the problem will remain.

@Liikt Liikt self-assigned this Oct 1, 2020
@Liikt
Copy link

Liikt commented Oct 1, 2020

PR 756 should address both issues mentioned here. Thank you for the submission!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Problem This is a problem in the archive or an implementation.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants