
Questions about Table18/19 #7

Closed
beyondguo opened this issue Nov 14, 2024 · 2 comments

@beyondguo

Thanks for your nice work. I have some questions about the results in Table 18 & 19 in the paper.
[Screenshots of Table 18 and Table 19 from the paper]

  1. Why are the numbers in the two tables different for the same method and the same dataset?
  2. Could you briefly explain why PAttn is better than PatchTST?
  3. Time-LLM still looks quite good in these two tables and seems to be the best one in Table 19, better than PAttn and PatchTST. However, your ablation study shows that Time-LLM without the LLM can be even better. So what part of Time-LLM really matters?
@BennyTMT
Owner

BennyTMT commented Nov 14, 2024

Thank you for your interest!

  1. The numbers are the same; you might notice that the order of the dataset names in the first column differs.

  2. For an explanation of PAttn, if you don't mind, please allow me to reference this response:
    This is an excellent question!

    • One possible answer is that PAttn, unlike PatchTST, does not use the full Transformer structure but instead opts for a simple attention mechanism (a minimal sketch contrasting the two is included at the end of this reply). It has been shown that Transformers do not have significant advantages in time series tasks (as noted in "Are Transformers Effective for Time Series Forecasting?").
    • Another perspective, which I am not entirely sure about but is worth sharing, is that these eight datasets have been studied extensively over the years. The differences shown by current architectures are minimal and not enough to indicate which one is superior; it's possible that these models are merely racing to fit the noise in the data.
      (Therefore, even if our numbers appear better, we have never claimed to be the superior model. It is just simpler.)
  3. If you look closely, you will see that the performance of Time-LLM is very similar to that of simpler methods, making it difficult to state definitively that it has a significant advantage. We simply note that their performances are comparable.
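
To make the PAttn vs. PatchTST contrast above concrete, here is a minimal, self-contained sketch (not the authors' actual code; the class name and hyperparameters are illustrative assumptions) of a PAttn-style model: patch the series, embed each patch, run a single self-attention pass, and project to the forecast horizon. PatchTST would instead push the same patch embeddings through a stack of full Transformer encoder blocks (attention + feed-forward + norms).

```python
# Illustrative sketch only -- not the paper's implementation.
import torch
import torch.nn as nn

class PAttnSketch(nn.Module):
    """Patch the series, embed each patch, apply one attention layer, project to the horizon."""
    def __init__(self, seq_len=512, pred_len=96, patch_len=16, stride=8, d_model=128, n_heads=8):
        super().__init__()
        self.patch_len, self.stride = patch_len, stride
        n_patches = (seq_len - patch_len) // stride + 1
        self.embed = nn.Linear(patch_len, d_model)                    # per-patch embedding
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(n_patches * d_model, pred_len)          # flatten patches -> forecast

    def forward(self, x):                                             # x: (batch, seq_len), one channel
        patches = x.unfold(-1, self.patch_len, self.stride)           # (batch, n_patches, patch_len)
        z = self.embed(patches)
        z, _ = self.attn(z, z, z)                                     # single attention pass, no encoder stack
        return self.head(z.flatten(1))                                # (batch, pred_len)

# A PatchTST-style model would replace the single attention call with a stack of
# full encoder blocks, e.g.:
#   layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
#   encoder = nn.TransformerEncoder(layer, num_layers=3)

x = torch.randn(32, 512)          # 32 series of length 512
print(PAttnSketch()(x).shape)     # torch.Size([32, 96])
```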

@beyondguo
Author

Thanks for your clarification!
