-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integer overflow: how to track more than 2 billion actions with PiwiK? #3288
Comments
Replying to matt: I think this is the same issue I have run in to. what tracking stats will be lost if I drop and recreate the corrupt table, attempting repair now, but if it's the row limit wont it continue to fail?
|
Replying to Davidj:
Sorry I missed your earlier comment! how did you go with repairing the table ? I haven't tested the procedure myself, but I think this should work:
Have you tested it, or something else? I'd really like to hear your experience, cheers |
I think you all should allow it to be an option in the install if a person wants primary keys of int or bigint. If we use the int primary key, then I will be limited to 10 years of data. I can update it myself, but other people should know about the limitation. |
The piwik database can't handle more than 4.294.967.295 loglines in total due to the fact that numeric fields are integers. And that number is the max. I have hit now for the second time, and the complete Piwik setup is useless. The reports now show that there are no page views anymore. So there are no updates anymore, and everything stays zero. Please consider the change all numeric fields to either bit integers or longs. So that the limit is theoretically 18.446.744.073.709.551.615 loglines. I am doing about 10.000.000 loglines a day, and with that amount, it will take me about 430 days to hit the max. That is almost 1,5 year. It cannot be that I have to delete the complete Piwik install every 1,5 year. With bigint the period will be 1.844.674.400.000 days. The solution of making backup, dropping table and create it again, is not handy. Because you will lose the option to regenerate the reports, due to missing hits. I am now trying to change the fields to Bigint, in order to preserve my history. But it takes very long time with 800.000.000+ loglines in the table. Also I am surprised that this is already known for two years, and still not fixed... |
Maybe we could only add the index for new users, and do not run the ALTER for existing users? |
Yeah, I think that an upgrade is taken to much time if you have much data. On the other hand, you could make a FAQ item about which fields you should change from INT to BIGINT. So it is to the administrator to decide if he changes the table structure. And warn for the downtime For newly installs, I would suggest to use BIGINT for auto-number fields that could be big. I am not sure which fields that could be, so I have make almost every INT field to BIGINT :) Maybe a bit to much. The overhead of BIGINT compared to INT is for me not a reason not to do it. I have space enough on disk. But there is also a code change needed. The code for cleaning up the log hits, uses a temporary table. That temporary table is created with a INT field. But if the auto-number fields are BIGINT, the temporary table should also use a BIGINT field. So for now, I would suggest to alter the code for creating the temporary table in file core/DataAccess/RawLogDao.php: private function createTempTableForStoringUsedActions()
{
$sql = "CREATE TEMPORARY TABLE " . Common::prefixTable(self::DELETE_UNUSED_ACTIONS_TEMP_TABLE_NAME) . " (
idaction BIGINT,
PRIMARY KEY (idaction)
)";
Db::query($sql);
} I changed INT(11) to BIGINT |
Adding to 3.0.0 milestone. Steps:
Optional
|
Adding to 3.0.0-b1: my suggestion is to make Piwik by default use BIGINT fields, to prevent any issues in the future, for all users. Downside is that more disk space and index space will be used, but I think this should be acceptable? |
I think that the more space it needs is less of an issue than recreating the tables when the max INT has been hit. I have already changed to bigint more then a year ago, and I don't notice any down site effects. At this time, my auto increment value is 11,763,937,301 for the piwik_log_link_visit_action table :) A side step. Make the INT unsigned, and you can have the double amount of data. Because auto increment fields are never negative, so there is no need for a sign bit. This will increase the max from 2+ billion, to 4+ billion. So maby an extra option, use unsigned intergers of big integers for auto increment fields |
We are very impressed @theyosh - this validates the power and scalability of Piwik 🚀 |
Extra info: this is on a TokuDB enabled MySQL server. Which is faster then the regular MySQL server. |
Currently in Piwik, we have many fields set to auto increment. But the auto increment fields which are likely to hit the 4 billion unsigned int limit, are located in the
We need to
I propose that in Piwik 3 we actually change the fields even for existing users, so to have a clean slate. It will take forever to run the upgrade, but there are also many other schema changes that will take a while, so it may not add much overhead to the upgrade process. I'll work on this improvement next week 👍 |
Fixed in PR: #10548 |
We just realised that InnoDB will book the row space (ie. 8 bytes per BIGINT) even for NULLable columns. therefore we decide that it's not needed for now, to make |
Second PR reverting partially the first PR #10569 |
Piwik Mysql tables use INT primary key, which cap at: 2^31-1=2147483647 when signed.
There was an ommission in log_link_visit_action where the primary key is not unsigned causing overflow to happen at 2 billion instead of 4 billion pages.
But, as part of this ticket, we should also figure out a strategy for "beyond 4294967295 page views".
The text was updated successfully, but these errors were encountered: