Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal to extend the format #1

Open
maximaximal opened this issue Aug 26, 2024 · 2 comments
Open

Proposal to extend the format #1

maximaximal opened this issue Aug 26, 2024 · 2 comments

Comments

@maximaximal
Copy link

Very interesting project! I'm happy to see more work being done on exact-cover solving.

Some time ago I also created an exact-cover solver supporting Algorithm X, C, C$ (a bit), and M: miniexact. The two solvers have very similar file formats already, although we have slightly different ways to specify primary and secondary items.

I think it would be good to build a more common standard that may be used by many solvers, so that implementations get more comparable. I think your idea of just listing primary items in the first line is very good, as it still does not require doing multiple passes over the whole input - I think I could directly integrate this into Miniexact. I don't know if this format also scales to multiplicities though, as it complicates the input a bit.

Another possibility is the DIMACS-esque input format from my README. This optimizes the parsing step, similar to how SAT solvers work. If one uses Python anyway to generate the problem and then interpret the answer, such a specialized format might make the whole process more efficient, compared to forcing the use of many hash-sets in-between. I would be happy to hear about your opinion!

In general, I think it would be nice to create a collection of problems in a common format so that implementations and new ideas may be compared in a good way! :)

All the best, Max

@johnrudge
Copy link
Owner

Thanks for your interest, Max! The very simple file reading I put in the code

def read_xcover_from_file(filename):
was to enable reading of Knuth-formatted files. As you say, the Knuth format is very similar to the text input format you have, and it would be very straightforward to add a similar parser to parse your text files. The only differences are that you denote the primary items with angle brackets and the secondary items with square brackets whereas Knuth puts them on one line separated by |. You end options with a semicolon, whereas Knuth uses a newline character only.

You're right that these kinds of text files are not the most efficient input format for running the solver algorithm due to the need for the various hash-sets. However, in most cases parsing input is not the time-critical part of the solve (particularly for large problems), so I think most users don't need to worry about optimising the parsing step. For the xcover package the ideal input format would be close to what

def algorithm_c(options, options_ptr, colors, n_items, n_secondary_items):
uses in solving, namely the three arrays options, options_ptr, and colors, which represent the options and their colourings in a sparse array format of unsigned integers (like how a CSC sparse matrix is stored) and the two unsigned integers representing the numbers of primary and secondary items. Note that xcover differs from miniexact in using dancing cells rather than dancing links to solve. I think the integer-based DIMACS format you use can straightforwardly be parsed to give the three arrays and two integers. I'm rather busy with other things right now, but when I get chance I'll see if I can have a play with writing a simple parser for your DIMACS format to produce the sparse arrays directly.

For a user interested in getting absolutely the best speed out of xcover I'd encourage them to call xcover.dancing_cells.algorithm_c directly and assemble the relevant arrays in their code.

@maximaximal
Copy link
Author

Thank you for your reply!

Yes, the parsing is not the most performance-critical part of the code. I forgot to mention that I also meant to decrease the complexity of the parsers and the formats themselves - less different decisions during implementing the parsing mean different solvers are more aligned and process the inputs more predictably. I think this would help proliferate using Exact Cover to encode problems, as the barrier to entry decreases.

Thank you for pointing out dancing cells. I now read a bit through the new algorithm and think about integrating it into miniexact, let's see how hackable it really is. Please don't stress yourself about your parser, I would have made a PR myself if I already had had the time to implement it in Python. I am happy you also think the integer-based DIMACS format could be easily parsed and that it would be a good addition!

If more solvers support our common format, we could aim to build an Exact Cover Gallery at some point, showcasing different solvers and algorithms and maybe showing their different strengths, inspired by what the SAT competition is contributing. I am sure there would be interest in such an overview.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants