Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add make_set function for DisjointSets #38692

Merged
merged 11 commits into from
Sep 29, 2024

Conversation

thecaligarmo
Copy link
Contributor

This fixes #35599 by adding a make_set function to DisjointSet using OrbitPartitions. The documentation links to wikipedia.
From wikipedia, the method should be done in place and therefore there is no return

📝 Checklist

  • The title is concise and informative.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation and checked the documentation preview.

⌛ Dependencies

#35599

Copy link
Contributor

@dcoudert dcoudert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting new functionality. However, why limiting to one extra element ?

cdef int *new_parent, *new_rank, *new_mcr, *new_size

cdef int *int_array = <int *> sig_malloc( 4*(n+1) * sizeof(int) )
if int_array is NULL:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really want to call free on NULL ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, I'll fix that.

new_rank = int_array + (n + 1)
new_mcr = int_array + (2*n + 2)
new_size = int_array + (3 * n + 3)
for i from 0 <= i < n:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for i in range(n)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will that work for c? If you look at all the other for loops in the .pxd file they all have the same format:
for i from 0 <= i < n:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it will work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K, I've gone ahead and changed it.

For future: is there a reason why all the other ones use for i from 0 <= i < n? Because every single other for loop in the file uses this syntax rather than the for i in range syntax.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods have been written long time ago and Cython has evolved (improved) since. We have not update all the code yet, it's too big.

@@ -834,6 +854,43 @@ cdef class DisjointSet_of_hashables(DisjointSet_class):
cdef int j = <int> self._el_to_int[f]
OP_join(self._nodes, i, j)

def make_set(self, new_elt = None):
r"""
Return a new disjoint set with an additional item.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are not returning a new set, you are adding one new element to the set

new_size = int_array + (3 * n + 3)
for i from 0 <= i < n:
new_parent[i] = OP.parent[i]
new_rank[i] = OP.rank[i]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not using low level memcpy method ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not perfect with c, but if I understand correctly, memcpy will also copy the amount of memory allocated for the object. Since we are working with arrays, that would mean that the new copy would not have enough memory allocated to it since we need to add an additional entry into the array.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first option is to make 4 different mallocs for parent, rank, etc. when first creating the data structure.
Then, in make_set, you can use realloc to extend the size of the arrays. Realloc should copy the data for you and so you only have to allocate the last value. In case of error, one of these arrays will get a NULL pointer (unless I'm mistaken). Last, when deallocating, you have to do 4 free.

The other option is to use memcpy instead of the for loop to put the good values in new_.... It should be a call like memcpy(new_parent, OP.parent, n). It remains to assign the last cell of new_parent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm not as comfortable with C, so I'll see if I can try and use these methods.

@thecaligarmo
Copy link
Contributor Author

This is an interesting new functionality. However, why limiting to one extra element ?

From what I understand from reading wikipedia, DisjointSet is a set of singleton sets where you can put sets together (union) and find where certain elements are located (find). make_set then allows you to add a new singleton set to your DisjointSet. From what I understand, you wouldn't want to add a set of multiple elements (as this defeats the point of DisjointSet) and if you want to add multiple singleton sets, you can call make_set multiple times. In essence, I was trying to use wiki's implementation idea for make_set.

@dcoudert
Copy link
Contributor

This is an interesting new functionality. However, why limiting to one extra element ?

From what I understand from reading wikipedia, DisjointSet is a set of singleton sets where you can put sets together (union) and find where certain elements are located (find). make_set then allows you to add a new singleton set to your DisjointSet. From what I understand, you wouldn't want to add a set of multiple elements (as this defeats the point of DisjointSet) and if you want to add multiple singleton sets, you can call make_set multiple times. In essence, I was trying to use wiki's implementation idea for make_set.

I agree that make_set is designed to add a new singleton set. But we could also add several new singleton sets at once to avoid multiple calls to make_set.

@thecaligarmo
Copy link
Contributor Author

This is an interesting new functionality. However, why limiting to one extra element ?

From what I understand from reading wikipedia, DisjointSet is a set of singleton sets where you can put sets together (union) and find where certain elements are located (find). make_set then allows you to add a new singleton set to your DisjointSet. From what I understand, you wouldn't want to add a set of multiple elements (as this defeats the point of DisjointSet) and if you want to add multiple singleton sets, you can call make_set multiple times. In essence, I was trying to use wiki's implementation idea for make_set.

I agree that make_set is designed to add a new singleton set. But we could also add several new singleton sets at once to avoid multiple calls to make_set.

Yes, but it becomes ambiguous once you do that. For example, if I were to do: d.make_set([1, 2]) is this adding the set [1, 2] or is it adding 2 sets 1 and 2? The function make_set makes it sound like I would be adding only 1 set, not multiple sets, so my initial assumption would be that make_set([1, 2]) would add a exactly 1 new set with entry [1, 2].

We could still make a function that does allow making multiple sets (say make_sets) which could then go through and do all the set creations at once (change OP_make_set to allow an optional second parameter). Or if you want to keep it as make_set, we can add an additional parameter called multiple that's a boolean that people can specify to show that they are adding multiple things instead of just one? so make_set([1, 2]) would add 1 set [1, 2] and make_set([1, 2], multiple=True) would create two sets 1 and 2.

@dcoudert
Copy link
Contributor

Let's keep this for a future PR.

@thecaligarmo
Copy link
Contributor Author

Sounds good.

I also tried to use the memcpy approach you had mentioned above. I ran the tests on my end and they seemed to work, but if there's an error, let me know. I wasn't sure if the 3rd parameter needed changing for the other 3 variables, but due to the previous memory allocation, I'm assuming not.

Fix memcpy allocation
cdef int *new_parent, *new_rank, *new_mcr, *new_size

cdef int *int_array = <int *> sig_malloc( 4*(n+1) * sizeof(int) )
if int_array is not NULL:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if int_array is NULL ? you should certainly raise an error.

new_size[n] = 1

OP.parent = new_parent
OP.rank = new_rank
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please avoid the extra spaces before =. This is not compatible with Python coding style.


INPUT:

- ``new_elt`` -- (optional) element to add. If `None`, then an integer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the second line, you must add 2 extra spaces, like:

        - ``new_elt`` -- (optional) element to add. If `None`, then an integer
          is added.

this currently breaks building the pdf documentation (reported by CI).

@@ -834,6 +854,43 @@ cdef class DisjointSet_of_hashables(DisjointSet_class):
cdef int j = <int> self._el_to_int[f]
OP_join(self._nodes, i, j)

def make_set(self, new_elt = None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't put spaces around = here. It should be def make_set(self, new_elt=None):

cdef int *int_array = <int *> sig_malloc(4*(n+1) * sizeof(int))
if int_array is NULL:
raise MemoryError("MemoryError allocating int_array in make_set method")
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, you can remove the else and reduce the indentation of the code. Indeed, the code will be executed only if int_array is not NULL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. I'll switch it over.

Copy link

github-actions bot commented Sep 23, 2024

Documentation preview for this PR (built with commit 598ccfd; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

new_mcr[n] = n
new_size[n] = 1

OP.parent = new_parent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before this statement, you must release the memory used by the previous arrays, so sig_free(OP.parent).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I'm not so good with memory allocation stuff, but I think we only need sig_free(OP.parent) correct? I tried adding the other ones and it started throwing errors, so I'm thinking that OP.parent holds the entire memory for all 4 variables so freeing up the memory from that element does it for all 4. If I did that wrong, just let me know.

Copy link
Contributor

@dcoudert dcoudert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

vbraun pushed a commit to vbraun/sage that referenced this pull request Sep 27, 2024
    
<!-- ^ Please provide a concise and informative title. -->
<!-- ^ Don't put issue numbers in the title, do this in the PR
description below. -->
<!-- ^ For example, instead of "Fixes sagemath#12345" use "Introduce new method
to calculate 1 + 2". -->
<!-- v Describe your changes below in detail. -->
<!-- v Why is this change required? What problem does it solve? -->
<!-- v If this PR resolves an open issue, please link to it here. For
example, "Fixes sagemath#12345". -->

This fixes sagemath#35599  by adding a `make_set` function to `DisjointSet`
using `OrbitPartitions`. The documentation links to wikipedia.
From wikipedia, the method should be done in place and therefore there
is no return

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->

- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [x] I have created tests covering the changes.
- [x] I have updated the documentation and checked the documentation
preview.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on. For example,
-->
<!-- - sagemath#12345: short description why this is a dependency -->
<!-- - sagemath#34567: ... -->

sagemath#35599
    
URL: sagemath#38692
Reported by: Aram Dermenjian
Reviewer(s): Aram Dermenjian, David Coudert
@vbraun vbraun merged commit 40cb8c2 into sagemath:develop Sep 29, 2024
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DisjointSet() misses MakeSet method
3 participants