when "create_inverse_triples" is set to false for ConvE, the evaluation becomes extremely slow. #1482

g-yit · 2024-11-08T03:00:36Z (edited by mberr)

Describe the bug

In PyKEEN, when "create_inverse_triples" is set to false for ConvE, the evaluation becomes extremely slow.

How to reproduce

def replicate_conve():
    import pykeen.datasets
    import pykeen.models
    import pykeen.training
    import pykeen.optimizers
    import pykeen.evaluation
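    from pykeen.losses import BCEAfterSigmoidLoss
    # User's own evaluation callback class (not part of PyKEEN)
    from tests.ztest.callback.training_callback import ZTrainingCallback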

    # Load the FB15K dataset
    dataset = pykeen.datasets.FB15k(create_inverse_triples=False)
    # Set up the loss function
    loss = BCEAfterSigmoidLoss(reduction='mean')
    # Initialize the ConvE model with the specified parameters
    model = pykeen.models.ConvE(
        embedding_dim=200,
        # input_channels=1,
        output_channels=32,
        embedding_height=10,
        embedding_width=20,
        kernel_height=3,
        kernel_width=3,
        input_dropout=0.2,
        feature_map_dropout=0.2,
        output_dropout=0.3,
        apply_batch_normalization=True,
        entity_initializer='xavier_normal',
        relation_initializer='xavier_normal',
        triples_factory=dataset.training,
        loss=loss
    ).to("cuda")

    # Set up the optimizer
    optimizer = pykeen.optimizers.Adam(
        params=model.get_grad_params(),
        lr=0.001
    )
    # Configure the training loop
    training_loop = pykeen.training.LCWATrainingLoop(
        model=model,
        triples_factory=dataset.training,
        optimizer=optimizer,
    )
    eval_callback = ZTrainingCallback(evaluation_triples=dataset.validation.mapped_triples,
                                      full_test_evaluation_triples=dataset.testing.mapped_triples,
                                      additional_filter_triples=dataset.training.mapped_triples)

    # Train the model
    training_loop.train(
        triples_factory=dataset.training,
        num_epochs=3,
        batch_size=128,
        label_smoothing=0.1,
        use_tqdm_batch=False,
        callbacks=[eval_callback],
    )

    # Evaluate the model
    evaluator = pykeen.evaluation.RankBasedEvaluator(filtered=True)
    results = evaluator.evaluate(
        model=model,
        mapped_triples=dataset.testing.mapped_triples,
        additional_filter_triples=[
            dataset.training.mapped_triples,
            dataset.validation.mapped_triples
        ]
    )

    # Output the results
    print(results.to_dict())

Environment

1.11.1-dev

Additional information

No response

Issue Template Checks

This is not a feature request (use a different issue template if it is)
This is not a question (use the discussions forum instead)
I've read the text explaining why including environment information is important and understand if I omit this information that my issue will be dismissed

mberr · 2024-11-08T07:51:34Z

Hi @g-yit,

this is unfortunately to be expected since ConvE's interaction function is designed to allow fast 1:n tail scoring while being quite inefficient for head scoring. In the original paper, they use inverse relations, so they only need tail scoring 😉

Background:

ConvE uses an interaction function of the form $\langle f(h, r), t \rangle$, where $f(h, r)$ is an expensive operation. When you score all tails, you can compute $f(h, r)$ just once and reuse it for every candidate $t$; for head scoring, you need to recompute $f(h', r)$ for every candidate head $h'$, i.e., (number of entities) many times.
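To make the asymmetry concrete, here is a toy NumPy sketch. It is not PyKEEN's ConvE implementation: the class `ToyConvELike` is hypothetical, and its method `f` is only a stand-in (a ReLU projection) for ConvE's expensive convolution-plus-dense block. Calls to `f` are counted, so you can see that scoring all tails of `(h, r)` needs one call while scoring all heads of `(r, t)` needs one call per entity.

```python
import numpy as np


class ToyConvELike:
    """Hypothetical ConvE-style scorer: score(h, r, t) = <f(h, r), e_t>.

    `f` stands in for ConvE's expensive conv + dense block; here it is just a
    ReLU projection, but every call is counted to expose the cost asymmetry.
    """

    def __init__(self, num_entities: int, num_relations: int, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.entities = rng.normal(size=(num_entities, dim))
        self.relations = rng.normal(size=(num_relations, dim))
        self.weight = rng.normal(size=(2 * dim, dim))
        self.f_calls = 0

    def f(self, h: int, r: int) -> np.ndarray:
        """The 'expensive' part: depends only on the head and the relation."""
        self.f_calls += 1
        x = np.concatenate([self.entities[h], self.relations[r]])
        return np.maximum(x @ self.weight, 0.0)

    def score_all_tails(self, h: int, r: int) -> np.ndarray:
        # f(h, r) is computed once and reused for every candidate tail.
        return self.entities @ self.f(h, r)

    def score_all_heads(self, r: int, t: int) -> np.ndarray:
        # f(h', r) must be recomputed for every candidate head h'.
        return np.array([self.f(h, r) @ self.entities[t] for h in range(len(self.entities))])


model = ToyConvELike(num_entities=1_000, num_relations=10, dim=16)

tail_scores = model.score_all_tails(h=3, r=2)
tail_calls = model.f_calls

model.f_calls = 0
head_scores = model.score_all_heads(r=2, t=7)
head_calls = model.f_calls

print(tail_calls, head_calls)  # prints "1 1000"
```

Both directions produce the same score for the triple (3, 2, 7); only the number of `f` evaluations differs, which is exactly what inverse relations avoid.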

g-yit · 2024-11-08T15:21:20Z (edited by cthoyt)

import pykeen.models
import pykeen.training
import pykeen.optimizers
import pykeen.evaluation
from pykeen.datasets import FB15k237
from pykeen.losses import SoftplusLoss
from pykeen.regularizers import PowerSumRegularizer
from pykeen.sampling import BernoulliNegativeSampler

# Load the FB15k-237 dataset
dataset = FB15k237()

# Set up the regularizer
regularizer = PowerSumRegularizer(
    weight=0.0005,
    p=2.0,
    apply_only_once=True,
    normalize=False
)
# Set up the loss function
loss = SoftplusLoss(reduction='mean')
# Initialize the ConvKB model with the specified parameters
model = pykeen.models.ConvKB(
    embedding_dim=100,
    num_filters=50,
    hidden_dropout_rate=0.0,
    entity_initializer='xavier_uniform',
    relation_initializer='xavier_uniform',
    triples_factory=dataset.training,
    regularizer=regularizer,
    loss=loss,
).to("cuda")

# Set up the optimizer
optimizer = pykeen.optimizers.Adam(
    params=model.get_grad_params(),
    lr=5e-06
)

# Set up the negative sampler
negative_sampler = BernoulliNegativeSampler(
    mapped_triples=dataset.training.mapped_triples
)

# Configure the training loop
training_loop = pykeen.training.SLCWATrainingLoop(
    model=model,
    triples_factory=dataset.training,
    optimizer=optimizer,
    negative_sampler=negative_sampler,
    negative_sampler_kwargs=dict(num_negs_per_pos=1)
)

# Train the model
training_loop.train(
    num_epochs=1,
    batch_size=256,
    triples_factory=dataset.training,
    use_tqdm_batch=False,
)

# Evaluate the model
evaluator = pykeen.evaluation.RankBasedEvaluator(filtered=True)
results = evaluator.evaluate(
    model=model,
    mapped_triples=dataset.testing.mapped_triples,
    additional_filter_triples=[
        dataset.training.mapped_triples,
        dataset.validation.mapped_triples
    ]
)
print(results.to_dict())

When I run the code above to train ConvKB, evaluation is also very slow.

The graphics card is an NVIDIA RTX 4090 with 24 GB of memory.

mberr · 2024-11-08T19:27:05Z

ConvKB is also very slow in 1:n scoring (even worse than ConvE, where at least one direction is fast). The default evaluation protocol uses 1:n scoring. For validation metrics, you could try switching to SampledRankBasedEvaluator, which does not score against all entities but only against some sampled negatives. Note, however, that most metrics cannot be compared directly between the sampled and the full setting. These two papers [0] [1] describe a few metrics that are comparable, and those are also implemented in PyKEEN. Basically, you need to look for metrics that have "adjusted" in their name.
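A toy illustration of why raw mean ranks from sampled and full evaluation cannot be compared, while an adjusted one can. This is pure Python, not PyKEEN's `SampledRankBasedEvaluator`: the helper names are hypothetical, every score comes from a uniformly random scorer, and the adjusted index follows the definition AMRI = 1 - (MR - 1) / (E[MR] - 1) with E[rank] = (n + 1) / 2 for n candidates, which is our understanding of what PyKEEN reports as `adjusted_arithmetic_mean_rank_index`.

```python
import random


def rank_of_true(true_score, negative_scores):
    """Rank of the true candidate (1 = best); ties count half (a 'realistic' rank)."""
    higher = sum(s > true_score for s in negative_scores)
    ties = sum(s == true_score for s in negative_scores)
    return 1 + higher + ties / 2


def mean_rank(ranks):
    return sum(ranks) / len(ranks)


def adjusted_mean_rank_index(ranks, num_candidates):
    """AMRI = 1 - (MR - 1) / (E[MR] - 1); a random scorer has E[rank] = (n + 1) / 2."""
    expected_mr = sum((n + 1) / 2 for n in num_candidates) / len(num_candidates)
    return 1 - (mean_rank(ranks) - 1) / (expected_mr - 1)


def rank_with_random_scorer(num_negatives, num_queries, rng):
    """Rank a true candidate among `num_negatives` negatives; every score is random."""
    ranks = []
    for _ in range(num_queries):
        true_score = rng.random()
        negatives = [rng.random() for _ in range(num_negatives)]
        ranks.append(rank_of_true(true_score, negatives))
    # n = number of candidates, including the true one
    return ranks, [num_negatives + 1] * num_queries


rng = random.Random(0)
# "Full" setting: score against all other entities of a toy 500-entity graph.
full_ranks, full_n = rank_with_random_scorer(499, 1_000, rng)
# Sampled setting: score against only 49 sampled negatives.
sampled_ranks, sampled_n = rank_with_random_scorer(49, 1_000, rng)

print("MR:  ", mean_rank(full_ranks), mean_rank(sampled_ranks))
print("AMRI:", adjusted_mean_rank_index(full_ranks, full_n),
      adjusted_mean_rank_index(sampled_ranks, sampled_n))
```

The full-setting mean rank should come out roughly ten times larger than the sampled one, while both AMRI values stay close to 0, the value of a random scorer. The sampled run also scores only 50 instead of 500 candidates per query, which is where the speed-up comes from.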

g-yit · 2024-11-09T12:07:11Z

Thank you. I used the configuration file from the experiments to build the ConvKB pipeline. There are two issues:

The evaluation time is particularly long (1023.29 s), while the training time is only 566.77 s.
The training results differ significantly from those reported in the paper.

Did I configure something incorrectly? (You can also try the code mentioned above.) Is there any guidance on ConvKB training code that reproduces the reported results?

The training code is as follows:
from pykeen.pipeline import replicate_pipeline_from_path

replicate_pipeline_from_path(r'E:\code\python\kge\pykeen_zyt\pykeen\src\pykeen\experiments\convkb\nguyen2018_convkb_fb15k237.json', directory="./", replicates=1)
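Windows paths written as ordinary (non-raw) Python string literals are fragile: backslash sequences are parsed as escapes, so the `\n` in `...\convkb\nguyen2018...` would become a newline character and the file would not be found. A raw string (`r'...'`) or forward slashes avoid this. A minimal check, using only the end of that path:

```python
from pathlib import PureWindowsPath

# Only the tail of the experiment path, to keep the example short.
plain = "convkb\nguyen2018_convkb_fb15k237.json"     # "\n" is parsed as a newline!
raw = r"convkb\nguyen2018_convkb_fb15k237.json"      # raw string keeps the backslash
forward = "convkb/nguyen2018_convkb_fb15k237.json"   # forward slashes also work on Windows

for label, value in [("plain", plain), ("raw", raw), ("forward", forward)]:
    # With the plain literal there is no separator left, so the "file name" is the whole string.
    print(label, repr(PureWindowsPath(value).name))
```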

"metrics": {
"both": {
"optimistic": {
"adjusted_arithmetic_mean_rank": 0.49161833830752394,
"adjusted_arithmetic_mean_rank_index": 0.5084529168082013,
"adjusted_geometric_mean_rank_index": 0.8129274434993866,
"adjusted_hits_at_k": 0.04409761990771625,
"adjusted_inverse_harmonic_mean_rank": 0.028990788367274135,
"arithmetic_mean_rank": 3508.025711909189,
"count": 40876.0,
"geometric_mean_rank": 981.909951047591,
"harmonic_mean_rank": 33.689574794872044,
"hits_at_1": 0.00988355024953518,
"hits_at_10": 0.04476954692239945,
"hits_at_3": 0.037723847734611994,
"hits_at_5": 0.03953420099814072,
"inverse_arithmetic_mean_rank": 0.00028506062444330415,
"inverse_geometric_mean_rank": 0.0010184233278551754,
"inverse_harmonic_mean_rank": 0.029682772967268557,
"inverse_median_rank": 0.00039816842524387816,
"median_absolute_deviation": 3584.190863237293,
"median_rank": 2511.5,
"standard_deviation": 3545.3494154993555,
"variance": 12569502.477981621,
"z_arithmetic_mean_rank": 177.82301884691216,
"z_geometric_mean_rank": 164.6389409176076,
"z_hits_at_k": 336.15763297564826,
"z_inverse_harmonic_mean_rank": 545.9140874982392
},
"pessimistic": {
"adjusted_arithmetic_mean_rank": 0.4916184068763812,
"adjusted_arithmetic_mean_rank_index": 0.5084528482297332,
"adjusted_geometric_mean_rank_index": 0.812927421312853,
"adjusted_hits_at_k": 0.04409761990771625,
"adjusted_inverse_harmonic_mean_rank": 0.02899078831481532,
"arithmetic_mean_rank": 3508.0262011938544,
"count": 40876.0,
"geometric_mean_rank": 981.9100673820868,
"harmonic_mean_rank": 33.68957485436971,
"hits_at_1": 0.00988355024953518,
"hits_at_10": 0.04476954692239945,
"hits_at_3": 0.037723847734611994,
"hits_at_5": 0.03953420099814072,
"inverse_arithmetic_mean_rank": 0.0002850605846842533,
"inverse_geometric_mean_rank": 0.0010184232071946706,
"inverse_harmonic_mean_rank": 0.029682772914847125,
"inverse_median_rank": 0.00039816842524387816,
"median_absolute_deviation": 3584.190863237293,
"median_rank": 2511.5,
"standard_deviation": 3545.349755856654,
"variance": 12569504.891352836,
"z_arithmetic_mean_rank": 177.82299486272447,
"z_geometric_mean_rank": 164.63893642425776,
"z_hits_at_k": 336.15763297564826,
"z_inverse_harmonic_mean_rank": 545.9140865104079
},
"realistic": {
"adjusted_arithmetic_mean_rank": 0.4916183617106642,
"adjusted_arithmetic_mean_rank_index": 0.5084528934017807,
"adjusted_geometric_mean_rank_index": 0.8129274160047766,
"adjusted_hits_at_k": 0.04409761990771625,
"adjusted_inverse_harmonic_mean_rank": 0.02899078763356493,
"arithmetic_mean_rank": 3508.02587890625,
"count": 40876.0,
"geometric_mean_rank": 981.9100952148438,
"harmonic_mean_rank": 33.68957562702935,
"hits_at_1": 0.00988355024953518,
"hits_at_10": 0.04476954692239945,
"hits_at_3": 0.037723847734611994,
"hits_at_5": 0.03953420099814072,
"inverse_arithmetic_mean_rank": 0.0002850606106221676,
"inverse_geometric_mean_rank": 0.0010184231214225292,
"inverse_harmonic_mean_rank": 0.029682772234082225,
"inverse_median_rank": 0.0003981684276368469,
"median_absolute_deviation": 3584.190863237293,
"median_rank": 2511.5,
"standard_deviation": 3545.349609375,
"variance": 12569504.0,
"z_arithmetic_mean_rank": 177.82301066090278,
"z_geometric_mean_rank": 164.63893534923432,
"z_hits_at_k": 336.15763297564826,
"z_inverse_harmonic_mean_rank": 545.9140736820508
}
},
"head": {
"optimistic": {
"adjusted_arithmetic_mean_rank": 0.6321478767554082,
"adjusted_arithmetic_mean_rank_index": 0.3679044073597997,
"adjusted_geometric_mean_rank_index": 0.5339996208323922,
"adjusted_hits_at_k": 0.0022226369361912695,
"adjusted_inverse_harmonic_mean_rank": 0.002845870988414374,
"arithmetic_mean_rank": 4448.195958508661,
"count": 20438.0,
"geometric_mean_rank": 2407.67342488841,
"harmonic_mean_rank": 280.3423290052399,
"hits_at_1": 0.0,
"hits_at_10": 0.0029357079949114393,
"hits_at_3": 0.002593208728838438,
"hits_at_5": 0.0027889225951658676,
"inverse_arithmetic_mean_rank": 0.00022481023977533317,
"inverse_geometric_mean_rank": 0.000415338720634983,
"inverse_harmonic_mean_rank": 0.0035670674619433193,
"inverse_median_rank": 0.0002520478890989288,
"median_absolute_deviation": 4028.9715287889735,
"median_rank": 3967.5,
"standard_deviation": 3455.2146644943596,
"variance": 11938508.37773687,
"z_arithmetic_mean_rank": 90.88571043736795,
"z_geometric_mean_rank": 76.47646472821779,
"z_hits_at_k": 11.88184816121705,
"z_inverse_harmonic_mean_rank": 37.5819869458943
},
"pessimistic": {
"adjusted_arithmetic_mean_rank": 0.6321479880095942,
"adjusted_arithmetic_mean_rank_index": 0.3679042960898008,
"adjusted_geometric_mean_rank_index": 0.533999549519833,
"adjusted_hits_at_k": 0.0022226369361912695,
"adjusted_inverse_harmonic_mean_rank": 0.0028458709528019705,
"arithmetic_mean_rank": 4448.196741364126,
"count": 20438.0,
"geometric_mean_rank": 2407.673793184336,
"harmonic_mean_rank": 280.3423318020593,
"hits_at_1": 0.0,
"hits_at_10": 0.0029357079949114393,
"hits_at_3": 0.002593208728838438,
"hits_at_5": 0.0027889225951658676,
"inverse_arithmetic_mean_rank": 0.0002248102002101037,
"inverse_geometric_mean_rank": 0.0004153386571016426,
"inverse_harmonic_mean_rank": 0.0035670674263566728,
"inverse_median_rank": 0.0002520478890989288,
"median_absolute_deviation": 4028.9715287889735,
"median_rank": 3967.5,
"standard_deviation": 3455.2150784679957,
"variance": 11938511.238472598,
"z_arithmetic_mean_rank": 90.88568294964912,
"z_geometric_mean_rank": 76.47645451522848,
"z_hits_at_k": 11.88184816121705,
"z_inverse_harmonic_mean_rank": 37.581986475604225
},
"realistic": {
"adjusted_arithmetic_mean_rank": 0.6321479237315103,
"adjusted_arithmetic_mean_rank_index": 0.3679043603770207,
"adjusted_geometric_mean_rank_index": 0.5339999209348403,
"adjusted_hits_at_k": 0.0022226369361912695,
"adjusted_inverse_harmonic_mean_rank": 0.002845870967054809,
"arithmetic_mean_rank": 4448.1962890625,
"count": 20438.0,
"geometric_mean_rank": 2407.671875,
"harmonic_mean_rank": 280.3423306827129,
"hits_at_1": 0.0,
"hits_at_10": 0.0029357079949114393,
"hits_at_3": 0.002593208728838438,
"hits_at_5": 0.0027889225951658676,
"inverse_arithmetic_mean_rank": 0.00022481022460851818,
"inverse_geometric_mean_rank": 0.0004153389891143888,
"inverse_harmonic_mean_rank": 0.0035670674405992027,
"inverse_median_rank": 0.0002520479029044509,
"median_absolute_deviation": 4028.9715287889735,
"median_rank": 3967.5,
"standard_deviation": 3455.21484375,
"variance": 11938509.0,
"z_arithmetic_mean_rank": 90.88569883092029,
"z_geometric_mean_rank": 76.47650770722639,
"z_hits_at_k": 11.88184816121705,
"z_inverse_harmonic_mean_rank": 37.58198666382428
}
},
"tail": {
"optimistic": {
"adjusted_arithmetic_mean_rank": 0.35493601752362436,
"adjusted_arithmetic_mean_rank_index": 0.6451531573132943,
"adjusted_geometric_mean_rank_index": 0.9249687689860338,
"adjusted_hits_at_k": 0.085971619378208,
"adjusted_inverse_harmonic_mean_rank": 0.0551351505596238,
"arithmetic_mean_rank": 2567.855465309717,
"count": 20438.0,
"geometric_mean_rank": 400.4476446015376,
"harmonic_mean_rank": 17.92163563189566,
"hits_at_1": 0.01976710049907036,
"hits_at_10": 0.08660338584988747,
"hits_at_3": 0.07285448674038555,
"hits_at_5": 0.07627947940111557,
"inverse_arithmetic_mean_rank": 0.00038943001797002886,
"inverse_geometric_mean_rank": 0.0024972053487667344,
"inverse_harmonic_mean_rank": 0.055798478472593796,
"inverse_median_rank": 0.0015503875968992248,
"median_absolute_deviation": 951.8306242805965,
"median_rank": 645.0,
"standard_deviation": 3381.221139328889,
"variance": 11432656.39304455,
"z_arithmetic_mean_rank": 159.73652598187994,
"z_geometric_mean_rank": 132.46303022548187,
"z_hits_at_k": 467.33221826444293,
"z_inverse_harmonic_mean_rank": 740.3253305927404
},
"pessimistic": {
"adjusted_arithmetic_mean_rank": 0.3549360445757312,
"adjusted_arithmetic_mean_rank_index": 0.6451531302574478,
"adjusted_geometric_mean_rank_index": 0.9249687626685364,
"adjusted_hits_at_k": 0.085971619378208,
"adjusted_inverse_harmonic_mean_rank": 0.05513515049031892,
"arithmetic_mean_rank": 2567.8556610235837,
"count": 20438.0,
"geometric_mean_rank": 400.4476782343243,
"harmonic_mean_rank": 17.921635654139724,
"hits_at_1": 0.01976710049907036,
"hits_at_10": 0.08660338584988747,
"hits_at_3": 0.07285448674038555,
"hits_at_5": 0.07627947940111557,
"inverse_arithmetic_mean_rank": 0.0003894299882889001,
"inverse_geometric_mean_rank": 0.0024972051390315317,
"inverse_harmonic_mean_rank": 0.05579847840333757,
"inverse_median_rank": 0.0015503875968992248,
"median_absolute_deviation": 951.8306242805965,
"median_rank": 645.0,
"standard_deviation": 3381.221266794729,
"variance": 11432657.255024955,
"z_arithmetic_mean_rank": 159.73651928299472,
"z_geometric_mean_rank": 132.46302932076503,
"z_hits_at_k": 467.33221826444293,
"z_inverse_harmonic_mean_rank": 740.3253296621515
},
"realistic": {
"adjusted_arithmetic_mean_rank": 0.3549360179991497,
"adjusted_arithmetic_mean_rank_index": 0.6451531568377032,
"adjusted_geometric_mean_rank_index": 0.9249687885809247,
"adjusted_hits_at_k": 0.085971619378208,
"adjusted_inverse_harmonic_mean_rank": 0.055135150511545356,
"arithmetic_mean_rank": 2567.85546875,
"count": 20438.0,
"geometric_mean_rank": 400.4475402832031,
"harmonic_mean_rank": 17.921635647326898,
"hits_at_1": 0.01976710049907036,
"hits_at_10": 0.08660338584988747,
"hits_at_3": 0.07285448674038555,
"hits_at_5": 0.07627947940111557,
"inverse_arithmetic_mean_rank": 0.00038943003164604306,
"inverse_geometric_mean_rank": 0.002497205976396799,
"inverse_harmonic_mean_rank": 0.0557984784245491,
"inverse_median_rank": 0.001550387591123581,
"median_absolute_deviation": 951.8306242805965,
"median_rank": 645.0,
"standard_deviation": 3381.22119140625,
"variance": 11432656.0,
"z_arithmetic_mean_rank": 159.73652586412607,
"z_geometric_mean_rank": 132.46303303162918,
"z_hits_at_k": 467.33221826444293,
"z_inverse_harmonic_mean_rank": 740.3253299471687
}
}
},
"times": {
"evaluation": 1023.292813539505,
"training": 566.7723512649536
}
}
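The adjusted metrics in this output can be cross-checked by hand. Assuming PyKEEN's definitions AMR = MR / E[MR] and AMRI = 1 - (MR - 1) / (E[MR] - 1), where E[MR] is the mean rank a uniformly random scorer would be expected to reach, the snippet below recovers E[MR] from the reported `arithmetic_mean_rank` and `adjusted_arithmetic_mean_rank` of the "both" / "optimistic" block and recomputes `adjusted_arithmetic_mean_rank_index`.

```python
# Numbers copied from the "both" -> "optimistic" block of the results above.
mr = 3508.025711909189               # arithmetic_mean_rank
amr = 0.49161833830752394            # adjusted_arithmetic_mean_rank, assumed = MR / E[MR]
reported_amri = 0.5084529168082013   # adjusted_arithmetic_mean_rank_index

# Mean rank a uniformly random scorer would be expected to reach on these ranking tasks.
expected_mr = mr / amr
# Adjusted mean rank index: 1 for a perfect model, 0 for a random one.
amri = 1 - (mr - 1) / (expected_mr - 1)

print(f"E[MR] = {expected_mr:.1f}, recomputed AMRI = {amri:.6f}, reported = {reported_amri:.6f}")
```

An AMRI of about 0.51 means the model's mean rank sits only about halfway between a random scorer (0) and a perfect one (1); the head-side AMRI (about 0.37) is much lower than the tail-side one (about 0.65).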

g-yit · 2024-11-09T14:48:45Z

When I train ConvE with create_inverse_triples set to False, I understand that evaluation will be slow, but it only reaches 2.83 triples/s.

mberr · 2024-11-10T09:49:07Z

The training results differ significantly from those reported in the paper.

Did you first train a TransE model and use its weights to initialize the ConvKB ones? That one is easy to miss, and unfortunately not easy to set as default initialization for ConvKB (since it requires training another model first). You can find a config for this training on FB15k237 here. Note that there are some best guesses in there because the paper did not report all hyperparameters for this first run. Once you have the weights, you can use pykeen.nn.init.PretrainedInitializer as initializer.

Even with those, we were not able to reproduce the paper's results, cf. https://arxiv.org/pdf/2006.13365, Table 9 in the appendix.

mberr · 2024-11-10T09:50:04Z

When I trained using conve and set create_inverse_triples to False, I can understand that the evaluation would be slow, but it is only 2.83 triples/s.

For each triple it needs to score ~15k entities, i.e., it is running about 2.83 × 15k ≈ 42.5k score evaluations per second.
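Putting the numbers from this exchange together gives a rough estimate of the evaluation cost. The inputs are FB15k's 14,951 entities (the "~15k" above), the 59,071 triples of its standard test split, and the 2.83 triples/s throughput reported in this thread; the estimate ignores any overhead beyond the scoring itself.

```python
# Back-of-the-envelope evaluation cost for ConvE on FB15k without inverse triples.
num_entities = 14_951        # entities in FB15k (the "~15k" above)
triples_per_second = 2.83    # evaluation throughput reported in this thread
num_test_triples = 59_071    # size of the standard FB15k test split

# Each evaluated triple is scored against every candidate entity.
scores_per_second = triples_per_second * num_entities
# Wall-clock time for one pass over the test split, in hours.
test_hours = num_test_triples / triples_per_second / 3600

print(f"{scores_per_second:,.0f} scores/s, ~{test_hours:.1f} h for the test split")
```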

mberr · 2024-11-10T09:51:14Z

btw, judging from your method name replicate_conve, it looks like you are trying to reproduce ConvE. If so, you may want to take a look at https://github.com/pykeen/pykeen/blob/master/src/pykeen/experiments/conve/dettmers2018_conve_fb15k237.json. In particular, reproducing the ConvE paper requires the use of inverse triples.
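The inverse-triple trick can be sketched in plain Python: every training triple (h, r, t) gets a companion (t, r_inv, h), so a head query (?, r, t) becomes a tail query (t, r_inv, ?), which is the direction ConvE scores cheaply. Roughly, this is what `create_inverse_triples=True` is for. The helpers below are hypothetical, and the id convention (inverse of r is r + num_relations) is an assumption for illustration that need not match PyKEEN's internal numbering.

```python
def add_inverse_triples(triples, num_relations):
    """Return the triples plus (t, r + num_relations, h) for every (h, r, t).

    Hypothetical id convention: the inverse of relation r gets id r + num_relations.
    """
    inverse = [(t, r + num_relations, h) for h, r, t in triples]
    return list(triples) + inverse


def tails_of(triples, h, r):
    """Answer a tail query (h, r, ?) by lookup; a model would score all entities instead."""
    return sorted(t for hh, rr, t in triples if hh == h and rr == r)


num_relations = 2
train = [(0, 0, 1), (2, 0, 1), (1, 1, 3)]
augmented = add_inverse_triples(train, num_relations)

# The head query (?, 0, 1) is rewritten as the tail query (1, 0 + num_relations, ?).
heads_via_inverse = tails_of(augmented, h=1, r=0 + num_relations)
print(heads_via_inverse)  # [0, 2]
```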

g-yit added the "bug" (Something isn't working) label on Nov 8, 2024
