error in results due to negative sampling data leakage #28
I noticed this too. So the paper's results are in error?
Yes.
Sorry, have you tried more epochs in your experiments?
I experimented with more epochs on other models such as TuckER and saw that the learning approach is not right.
Sorry, I noticed this too, and I found that another paper, 'A Re-evaluation of Knowledge Graph Completion Methods', also mentions this issue. But I suppose the model structure, i.e. the attention mechanism and the encoder-decoder structure, is still valid, and the problem is only in the training and evaluation protocol, right? I ask because I find the idea in this paper interesting and plan to modify and apply it in my work.
Not sure what you mean by
Many thanks for your reply. Could you please tell me how you ran the experiment? In this code: `while (random_entities[current_index], self.batch_indices[last_index + current_index, 1], ...` do you change `valid_triples_dict` so that it only includes training data?
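(For concreteness, the change being asked about might look like the sketch below. `build_training_filter` is a hypothetical name and the snippet is untested; it only illustrates restricting the filter to training triples.)

```python
def build_training_filter(train_triples):
    """Hypothetical sketch of the proposed change: build the dictionary
    used to reject corrupted triples from training triples ONLY, so
    negative sampling never peeks at validation/test data."""
    return {triple: idx for idx, triple in enumerate(train_triples)}

# Hypothetical usage mirroring create_batch.py (untested):
#   self.valid_triples_dict = build_training_filter(self.train_triples)
train_triples = [(0, 0, 1), (1, 0, 2)]
filter_dict = build_training_filter(train_triples)
print((0, 0, 1) in filter_dict)  # True  -> resample this corruption
print((0, 0, 3) in filter_dict)  # False -> keep it as a negative sample
```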
Hi @wbaz4321 and @hshahbazi,
So, just to be sure, as we can conclude by now (can we?):
I don't see how this should be "rectified" by not using ConvKB as the decoder. Is there a bug in ConvKB, then? Can you describe the experiments that "rectify the results" more closely? I mean, the two points above just suggest that the bug was in fact in the training and caused by the data leakage. I am sorry for the inconvenience, but I am working with your code and as of now I am very confused.
As you can see in this paper (https://www.aclweb.org/anthology/2020.acl-main.489.pdf), we are dealing with two different issues here.
The above-mentioned paper removes the test leakage and uses the Random evaluation strategy to get updated results for our paper; however, they do not experiment with the hyperparameters. After re-evaluating our model with updated hyperparameters we were able to get ~44 Hits@10 using ConvKB (these results are after removing the test data leakage and using the correct evaluation protocol). The paper also notes that ConvE does not suffer from this problem and that ConvE's results are independent of the evaluation protocol. More specifically, many of the neurons in ConvKB become zero after applying the ReLU activation function, as a result of which ConvKB predicts the same score for numerous triples, and this causes the stark difference in results under different evaluation protocols. Following this reasoning we decided to experiment with ConvE instead of ConvKB. I hope this is helpful to you.
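(To illustrate why tied scores make the metric protocol-dependent, here is a toy sketch of the 'top' / 'bottom' / 'random' tie-breaking strategies described in the re-evaluation paper; it is not this repository's evaluation code.)

```python
import random

def rank_of_correct(scores, correct_idx, protocol="random"):
    """1-based rank of the correct candidate among all scores, under the
    'top' / 'bottom' / 'random' tie-breaking protocols from the
    re-evaluation paper. Toy illustration, not the repo's eval code."""
    s = scores[correct_idx]
    higher = sum(1 for x in scores if x > s)
    ties = sum(1 for x in scores if x == s) - 1  # candidates tied with the correct one
    if protocol == "top":
        return higher + 1                 # correct placed above all tied candidates
    if protocol == "bottom":
        return higher + ties + 1          # correct placed below all tied candidates
    return higher + 1 + random.randint(0, ties)  # ties broken at random

# If ReLU zeroes most ConvKB scores, the correct triple ties with
# everything, and the protocol alone decides the metric:
scores = [0.0] * 1000
print(rank_of_correct(scores, 42, "top"))     # 1    -> Hits@10 looks great
print(rank_of_correct(scores, 42, "bottom"))  # 1000 -> Hits@10 looks terrible
print(rank_of_correct(scores, 42, "random"))  # anywhere in between
```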
Thanks for the quick response! But I wonder: if it is now 53 with the random evaluation strategy, after you freshly tuned the hyperparameters, it would still fall behind TuckER (53.6) and be the same as RotatE (53.0), in contrast to what is reported in the original results table in https://www.aclweb.org/anthology/2020.acl-main.489.pdf. Can you also report new results under the original task formulation (without the randomized evaluation strategy), where the paper reported 62.2 Hits@10?
The reported results (with ConvE as the decoder) use the randomized evaluation strategy and are independent of the evaluation strategy. The task formulation is still the same; only the evaluation strategy previously used was wrong, so we do not think you should compare against results obtained with the wrong evaluation strategy.
Thanks for your efforts to adjust the model. I also personally think the encoding process is useful, and I am currently trying to modify it for my work, so thanks very much for providing a good reference. I have another question about the classes SpecialSpmmFinal() and SpecialSpmmFunctionFinal(): could you please tell me what these functions do? In the paper, once we have obtained the exp values of the absolute attention values, we can sum them up to get the value of e_rowsum. Also, I found that someone else mentions this issue as well, so if you can help us understand this, we would be very grateful.
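(My understanding, for what it's worth: those classes implement a sparse-dense matrix multiplication so that the exp'd attention value on every edge can be summed per node. The sketch below shows the idea with plain torch.sparse ops; it is an assumption about the intent, not the authors' implementation.)

```python
import torch

def rowsum_via_spmm(edge_index, edge_exp, num_nodes):
    """Sum the exp'd attention value of every edge per node via a
    sparse-dense matmul: a sketch of what (to my understanding) the
    special spmm classes compute for e_rowsum, i.e. the softmax
    denominator. Not the authors' exact implementation."""
    sparse_att = torch.sparse_coo_tensor(edge_index, edge_exp,
                                         (num_nodes, num_nodes))
    return torch.sparse.mm(sparse_att, torch.ones(num_nodes, 1))

edge_index = torch.tensor([[0, 0, 1],   # edge sources
                           [1, 2, 2]])  # edge targets
edge_exp = torch.exp(torch.tensor([0.5, 1.0, -0.2]))
print(rowsum_via_spmm(edge_index, edge_exp, 3))
# row i = sum of exp(attention) over the edges attached to node i's row
```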
Hi,
I've noticed an issue in this work: in the module create_batch.py, negative samples are not allowed to be taken from the test data. valid_triples_dict is defined as:

```python
self.valid_triples_dict = {j: i for i, j in enumerate(
    self.train_triples + self.validation_triples + self.test_triples)}
```

and later used in:

```python
while (random_entities[current_index], self.batch_indices[last_index + current_index, 1],
       self.batch_indices[last_index + current_index, 2]) in self.valid_triples_dict.keys():
```

Because this dictionary includes validation and test triples, the model is effectively taught during training that the test triples are valid. We are not allowed to use test data during training; this makes the performance go higher and higher with more epochs for no good reason.
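(To make the leakage concrete: filtering candidate negatives against all splits is standard for the filtered evaluation metric, but doing it during training-time negative sampling tells the sampler which test triples are true. A toy illustration with made-up triples:)

```python
# Toy (head, relation, tail) triples; made up for illustration only.
train_triples = [(0, 0, 1)]
test_triples = [(0, 0, 3)]

# Filter as in create_batch.py: train + valid + test are all "known true".
leaky_filter = {t: i for i, t in enumerate(train_triples + test_triples)}
# Leak-free alternative for training: only training triples are known true.
train_only_filter = {t: i for i, t in enumerate(train_triples)}

# Corrupt the tail of the training triple (0, 0, 1) to entity 3:
candidate = (0, 0, 3)
print(candidate in leaky_filter)       # True  -> resampled, never a negative:
                                       #          the sampler leaks that this
                                       #          test triple is true
print(candidate in train_only_filter)  # False -> a legitimate negative sample
```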