
Questions about Table18/19 #7

Closed
beyondguo opened this issue Nov 14, 2024 · 2 comments

@beyondguo

Thanks for your nice work. I have some questions about the results in Table 18 & 19 in the paper.
[Screenshots of Table 18 and Table 19 from the paper]

  1. Why are the numbers in the two tables different for the same method and the same dataset?
  2. Could you briefly explain why PAttn is better than PatchTST?
  3. Time-LLM still looks quite good in these two tables and seems to be the best one in Table 19, better than PAttn and PatchTST. However, your ablation study shows that Time-LLM without the LLM can be even better. So what part of Time-LLM really matters?
@BennyTMT
Owner

BennyTMT commented Nov 14, 2024

Thank you for your interest!

  1. The numbers are the same; you might notice that the order of the dataset names in the first column differs.

  2. For an explanation of PAttn, if you don't mind, please allow me to reference this response:
    This is an excellent question!

    • One possible answer is that PAttn, unlike PatchTST, does not use the full Transformer structure but instead opts for a simple attention mechanism (a minimal sketch contrasting the two is included at the end of this reply). It has been shown that Transformers do not have significant advantages in time series tasks (as noted in "Are Transformers Effective for Time Series Forecasting?").
    • Another perspective, which I am not entirely sure about but is worth sharing, is that these eight datasets have been studied extensively over the years. The differences shown by current architectures are minimal and not enough to indicate which one is superior; it's possible that these models are merely racing to fit the noise in the data.
      (Therefore, even if our numbers appear better, we have never claimed to be the superior model. It is just simpler.)
  3. If you look closely, you will see that the performance of Time-LLM is very similar to that of simpler methods, making it difficult to state definitively that it has a significant advantage. We simply note that their performances are comparable.
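
To make the PAttn vs. PatchTST contrast above concrete, here is a minimal, self-contained sketch (not the authors' actual code; the class name and hyperparameters are illustrative assumptions) of a PAttn-style model: patch the series, embed each patch, run a single self-attention pass, and project to the forecast horizon. PatchTST would instead push the same patch embeddings through a stack of full Transformer encoder blocks (attention + feed-forward + norms).

```python
# Illustrative sketch only -- not the paper's implementation.
import torch
import torch.nn as nn

class PAttnSketch(nn.Module):
    """Patch the series, embed each patch, apply one attention layer, project to the horizon."""
    def __init__(self, seq_len=512, pred_len=96, patch_len=16, stride=8, d_model=128, n_heads=8):
        super().__init__()
        self.patch_len, self.stride = patch_len, stride
        n_patches = (seq_len - patch_len) // stride + 1
        self.embed = nn.Linear(patch_len, d_model)                    # per-patch embedding
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(n_patches * d_model, pred_len)          # flatten patches -> forecast

    def forward(self, x):                                             # x: (batch, seq_len), one channel
        patches = x.unfold(-1, self.patch_len, self.stride)           # (batch, n_patches, patch_len)
        z = self.embed(patches)
        z, _ = self.attn(z, z, z)                                     # single attention pass, no encoder stack
        return self.head(z.flatten(1))                                # (batch, pred_len)

# A PatchTST-style model would replace the single attention call with a stack of
# full encoder blocks, e.g.:
#   layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
#   encoder = nn.TransformerEncoder(layer, num_layers=3)

x = torch.randn(32, 512)          # 32 series of length 512
print(PAttnSketch()(x).shape)     # torch.Size([32, 96])
```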

@beyondguo
Author

Thanks for your clarification!
