Bug in construct_W.py #58

Open
mary-design-testing opened this issue Apr 12, 2020 · 0 comments

Hi there,

I'm trying to construct the W weight matrix to use with lap_score on the following simple dataset: employes-region.txt. I tried the following code, which is given as an example in test_lap_score.py:

    kwargs_W = {"metric": "euclidean", "neighbor_mode": "knn", "weight_mode": "heat_kernel", "k": 5, 't': 1}
    W = construct_W.construct_W(X, **kwargs_W)

Unfortunately, it fails with the following exception at line 152 of file construct_W.py:

could not broadcast input array from shape (25) into shape (30) 

I've gone through the code, and I think the problem is that the dimensions of G are wrong. This is the code involved in the exception:

            t = kwargs['t']
            # compute pairwise euclidean distances
            D = pairwise_distances(X)
            D **= 2
            # sort the distance matrix D in ascending order
            dump = np.sort(D, axis=1)
            idx = np.argsort(D, axis=1)  #  *** 1
            idx_new = idx[:, 0:k+1]  #  *** 2
            dump_new = dump[:, 0:k+1] #  *** 2
            # compute the pairwise heat kernel distances
            dump_heat_kernel = np.exp(-dump_new/(2*t*t))
            G = np.zeros((n_samples*(k+1), 3)) #  *** 2
            G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1) #  *** 2
            G[:, 1] = np.ravel(idx_new, order='F') # *** EXCEPTION HERE!!
            G[:, 2] = np.ravel(dump_heat_kernel, order='F')
            # build the sparse affinity matrix W
            W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
            bigger = np.transpose(W) > W
            W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
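
The shape mismatch can be reproduced in isolation. This is a minimal sketch, not the library code: it assumes the dataset has 5 samples (inferred from the error message, since 30 = n_samples * (k+1) with k = 5), and it uses random stand-in data instead of employes-region.txt.

```python
import numpy as np

# Assumed sizes, inferred from the error message rather than from the dataset:
# 30 = n_samples * (k + 1) with k = 5 implies n_samples = 5.
n_samples, k = 5, 5

rng = np.random.default_rng(0)
X = rng.random((n_samples, 2))  # stand-in data, not employes-region.txt

# Squared pairwise Euclidean distances, shape (n_samples, n_samples)
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
idx = np.argsort(D, axis=1)

# Slicing past the last column silently truncates: only 5 columns exist
idx_new = idx[:, 0:k + 1]

G = np.zeros((n_samples * (k + 1), 3))
print(idx_new.shape)  # (5, 5)
print(G[:, 1].shape)  # (30,)

try:
    # 25 values cannot fill a column of length 30
    G[:, 1] = np.ravel(idx_new, order='F')
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Under these assumptions, idx only has 5 columns, so the 0:k+1 slice returns 25 values while G was allocated for 30.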

I think there's a problem at line *** 1. Should it compute idx using dump? I mean:

            idx = np.argsort(dump, axis=1)  #  *** 1
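
A small check of what the two variants return may help answer this question. The sketch below uses made-up distances for a single sample (an assumption for illustration, not values from the dataset) and compares np.argsort on D with np.argsort on the already sorted dump.

```python
import numpy as np

# Made-up squared distances from one sample to four samples
D_row = np.array([[4.0, 0.0, 9.0, 1.0]])

dump = np.sort(D_row, axis=1)

# Indices into the original columns, i.e. which samples are nearest
idx_from_D = np.argsort(D_row, axis=1)

# dump is already sorted, so this only returns positions 0..n-1
idx_from_dump = np.argsort(dump, axis=1)

print(idx_from_D)     # [[1 3 0 2]]
print(idx_from_dump)  # [[0 1 2 3]]
```

In this example the argsort of D carries the neighbour indices, while the argsort of dump does not depend on which samples are nearest, so it seems worth confirming which of the two the affinity matrix needs before changing line *** 1.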

The other problem is at the lines marked *** 2. Shouldn't they use k instead of k+1? That is:

            idx_new = idx[:, 0:k]  #  *** 2
            dump_new = dump[:, 0:k] #  *** 2
            # compute the pairwise heat kernel distances
            dump_heat_kernel = np.exp(-dump_new/(2*t*t))
            G = np.zeros((n_samples*(k), 3)) #  *** 2
            G[:, 0] = np.tile(np.arange(n_samples), (k, 1)).reshape(-1) #  *** 2
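
For reference, the lines above can be sketched as a standalone function. The function name is hypothetical, the distances are computed with plain NumPy instead of pairwise_distances, and the sketch applies only the k-for-k+1 change while keeping idx computed from D; these are all assumptions made for illustration, not the library's code.

```python
import numpy as np

def knn_heat_kernel_triplets(X, k, t):
    """Hypothetical helper: (row, col, weight) triplets built from k columns."""
    n_samples = X.shape[0]
    # Squared pairwise Euclidean distances
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    dump = np.sort(D, axis=1)
    idx = np.argsort(D, axis=1)  # kept as in the original code (assumption)
    idx_new = idx[:, 0:k]
    dump_new = dump[:, 0:k]
    # Heat kernel weights for the k smallest distances of each sample
    dump_heat_kernel = np.exp(-dump_new / (2 * t * t))
    G = np.zeros((n_samples * k, 3))
    G[:, 0] = np.tile(np.arange(n_samples), (k, 1)).reshape(-1)
    G[:, 1] = np.ravel(idx_new, order='F')
    G[:, 2] = np.ravel(dump_heat_kernel, order='F')
    return G

rng = np.random.default_rng(0)
X = rng.random((5, 2))  # stand-in data with 5 samples
G = knn_heat_kernel_triplets(X, k=5, t=1)
print(G.shape)  # (25, 3)
```

With 5 samples and k = 5 the shapes now agree. Note that the first sorted column is normally each sample's zero distance to itself, so slicing k columns keeps the sample plus k-1 other neighbours; whether that is the intended meaning of k is a question for the maintainers.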

I've fixed my local installation using this patch and run it on a large collection of 200+ datasets. It works correctly now.

I've seen that there are many other lines where a similar patch might apply, but I haven't tried other configuration options.

Thanks! Regards
