Feature/content extension #62
Hey @linzhp, this PR is (sadly) not runnable or depends on another PR.
Sorry, I failed to mention that this PR depends on MetricsGrimoire/RepositoryHandler#4.
@@ -394,7 +395,7 @@ def create_tables (self, cursor):
")")
cursor.execute ("CREATE TABLE actions (" +
"id integer primary key," +
"type varchar(1)," +
Which action (Added, Modified, Deleted, etc.) got two characters?
I don't see any change, SELECT, or INSERT on the actions table in this PR.
In merge commits, the actions often have two characters.
@@ -64,7 +64,7 @@ def set_tail(self, tail):
patterns['committer'] = re.compile("^Commit:[ \t]+(.*)[ \t]+<(.*)>$")
patterns['date'] = re.compile(
"^CommitDate: (.* [0-9]+ [0-9]+:[0-9]+:[0-9]+ [0-9][0-9][0-9][0-9]) ([+-][0-9][0-9][0-9][0-9])$")
-patterns['file'] = re.compile("^([MAD])[ \t]+(.*)$")
+patterns['file'] = re.compile("^([MAD]+)[ \t]+(.*)$")
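To see what the regex change does, here is a minimal Python sketch that runs the old and new `patterns['file']` expressions over two sample lines. The sample paths are made up, and `parse_file_line` is a hypothetical helper for illustration, not part of CVSAnalY.

```python
import re

# The file-line pattern before and after the change in this PR.
OLD_FILE_PATTERN = re.compile("^([MAD])[ \t]+(.*)$")
NEW_FILE_PATTERN = re.compile("^([MAD]+)[ \t]+(.*)$")


def parse_file_line(pattern, line):
    """Hypothetical helper: return (action, path) if the line matches, else None."""
    match = pattern.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Made-up sample lines; 'MM' is the kind of two-letter action seen in merge commits.
samples = ["M\tsrc/foo.py", "MM\tsrc/bar.py"]

for line in samples:
    print(repr(line),
          parse_file_line(OLD_FILE_PATTERN, line),
          parse_file_line(NEW_FILE_PATTERN, line))
```

With the old pattern, `[MAD]` consumes only the first 'M' and the following `[ \t]+` then fails on the second 'M', so the 'MM' line is silently dropped; the `+` quantifier lets the whole 'MM' through.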
I don't see why this is needed. I wouldn't change the database schema without a good explanation, because it can cause side effects.
I've tested your code with a CVSAnalY repo and it works perfectly, but before merging it, I would like to know why the database schema needs to change. Is there any action that needs more than one letter to categorize it?
Run this command on the master branch of CVSAnalY and you will see actions with two letters:
Thanks for the clarification, @linzhp. From my point of view, we should keep the table as is. The idea of the 'type' field was to homogenize, or establish a mapping for, all the kinds of actions we can find in SCM systems. I know you didn't need that in MininGit because it only analyzes git repositories, which is why the table structure changed there. But I don't think the same approach is valid for CVSAnalY. If there are actions with two letters, we should try to map them onto an existing type, if possible. If not, find another letter. If that is not possible either, then we will change the type field. My personal reason here is that I have a huge number of databases that follow the current schema. A schema change would mean I couldn't use the latest version of CVSAnalY anymore (unless I changed all the databases myself). Other users of CVSAnalY will have the same problem. This doesn't mean that we cannot change the schema; what I am saying is that we need strong arguments to do it. By the way, I think the code that changes the schema should be taken out of this patch: reviewing the code of the extension, I didn't find any place where it is needed. Anyway, @linzhp, thanks for your contribution and your patience with us.
@sduenas Here is why this extension needs the schema change: it is meant to store all revisions of all files in a source repository. However, taking a snapshot of every file at every commit would blow up the database, so it only stores the content of a file when the file gets a new revision, i.e., when it is changed in the commit. In a merge commit, a file can get a new revision that differs from its revisions before the merge, indicated by actions such as 'MM'. If you are reluctant to modify the database schema and the regular expression, those actions will be ignored by CVSAnalY when it is populating the Is the argument strong enough?
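The storage rule described above (store content only when a file gets a new revision, then look up the latest stored revision when needed) can be sketched in a few lines of Python. This is a hypothetical in-memory illustration, not the extension's actual code: `record_commit`, `content_at`, and the dict layout are made up, commits are assumed to be numbered in one linear order and recorded in that order, treating any action containing 'A' or 'M' as a new revision is an assumption, and deletions are not handled.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical store: path -> [(commit_id, content)], appended in commit order.
ContentStore = Dict[str, List[Tuple[int, str]]]


def record_commit(store: ContentStore, commit_id: int,
                  changes: List[Tuple[str, str, str]]) -> None:
    """Store content only for files that get a new revision in this commit.

    `changes` holds (action, path, new_content) triples. Assumption: any
    action containing 'A' or 'M' (including merge actions such as 'MM')
    means a new revision; unchanged files are not stored at all.
    """
    for action, path, content in changes:
        if "A" in action or "M" in action:
            store.setdefault(path, []).append((commit_id, content))


def content_at(store: ContentStore, path: str, commit_id: int) -> Optional[str]:
    """Return the latest stored revision of `path` at or before `commit_id`."""
    latest = None
    for stored_id, content in store.get(path, []):
        if stored_id <= commit_id:
            latest = content
    return latest


store: ContentStore = {}
record_commit(store, 1, [("A", "a.py", "v1"), ("A", "b.py", "x1")])
record_commit(store, 2, [("M", "a.py", "v2")])
record_commit(store, 3, [("MM", "b.py", "x2")])  # merge-style action

print(content_at(store, "b.py", 2))                 # x1: b.py not stored again at commit 2
print(content_at(store, "b.py", 3))                 # x2: new revision from the merge
print(sum(len(revs) for revs in store.values()))    # 4 stored rows vs. 6 full snapshots
```

If the parser dropped the 'MM' line, commit 3 would never be recorded for b.py and `content_at(store, "b.py", 3)` would wrongly return the pre-merge content.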
I'm jumping into this discussion, but I'm not sure I have all the elements to make up my mind, so please ignore it if you feel like it is just random noise... Anyway, if I understand the patch and the comments, it seems that:
Could we explore the solution @sduenas proposes, using some other one-character action for MM? If that were the only problem, we could easily use, e.g., "2" or "X" for those "change in revision number but no change in content" actions. That said, another possibility, which might break less of the way CVSAnalY currently works, would be to create another table, say actions_git, with all the information in actions plus these "changes". The extension could either recreate it with the current code, or just copy actions into actions_git and then populate the added information. I find it very valuable to have this extension in CVSAnalY, since it solves a recurring problem, so let's explore the options...
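The actions_git idea can be sketched with Python's built-in `sqlite3` module: `actions` keeps its current schema, and a separate table receives a copy of it plus the raw git action. Only the `id` and `type varchar(1)` columns come from this PR; `file_id`, `commit_id`, and `git_type` are assumptions for illustration, and this is not CVSAnalY's actual schema code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Simplified actions table; only `id` and `type varchar(1)` appear in the PR,
# the other columns are assumed for illustration.
cur.execute("CREATE TABLE actions (id integer primary key, type varchar(1), "
            "file_id integer, commit_id integer)")
cur.executemany("INSERT INTO actions VALUES (?, ?, ?, ?)",
                [(1, "A", 10, 100), (2, "M", 10, 101)])

# Separate git-specific table: everything in actions plus the raw git action.
cur.execute("CREATE TABLE actions_git (id integer primary key, type varchar(1), "
            "file_id integer, commit_id integer, git_type varchar(2))")
# Copy the existing rows; for them the raw git action equals the one-letter type.
cur.execute("INSERT INTO actions_git (id, type, file_id, commit_id, git_type) "
            "SELECT id, type, file_id, commit_id, type FROM actions")

# A merge action is recorded only in actions_git; actions is left untouched.
cur.execute("INSERT INTO actions_git (type, file_id, commit_id, git_type) "
            "VALUES ('M', 11, 102, 'MM')")
conn.commit()

rows_git = cur.execute("SELECT type, git_type FROM actions_git ORDER BY id").fetchall()
print(rows_git)
```

Existing databases and queries against `actions` keep working unchanged; only tools that want the merge detail need to read `actions_git`. Note that SQLite does not enforce `varchar` lengths, so a MySQL backend is where the 1-vs-2 character limit actually bites.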
Alright, the database schema is now preserved. As for merge actions, since I am only interested in modified files for now, I convert all merge actions containing 'M' into the single letter 'M'. You guys can decide what to do with actions such as 'AD', 'AA', and 'DD' in the future. Please take another look.
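The conversion described above amounts to a small normalization step before an action is written to the one-letter `type` column. Here is a minimal Python sketch of that rule, assuming single-letter actions pass through unchanged; `normalize_action` is a hypothetical name, and returning `None` for the undecided cases ('AD', 'AA', 'DD') is this sketch's own choice, not something the PR specifies.

```python
from typing import Optional


def normalize_action(action: str) -> Optional[str]:
    """Collapse a raw git action into the one-letter `type` stored in actions.

    Hypothetical sketch of the rule in the comment above:
    - single-letter actions ('A', 'M', 'D') are kept as they are;
    - any multi-letter merge action containing 'M' (e.g. 'MM') becomes 'M';
    - other combinations ('AD', 'AA', 'DD') are still undecided -> None.
    """
    if len(action) == 1:
        return action
    if "M" in action:
        return "M"
    return None


for raw in ["A", "M", "MM", "AD", "DD"]:
    print(raw, "->", normalize_action(raw))
```

Returning `None` makes the undecided cases explicit so a caller can skip or log them instead of guessing a mapping.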
@linzhp Thanks for your work.
Hey, I love to see discussions like this and I've wanted to jump in for days, but you know how it is: time flies. Thanks for mentioning me, @sduenas! In my view, we should change the database schema and support more than one action character (e.g. 2 in this change). Why?
I understand @sduenas's point that he has a huge number of installations which would have to be updated as well if he switches to the new version of CVSAnalY. And of course this point is valid as hell :D
Both proposals can be combined. Another way to support multi-letter actions would be to add a second column to the actions table. This is quite similar to the author date commit (364f67f): that change is valid for git but not for Subversion. We should not fear database changes; changes like this are necessary for the development of a tool like CVSAnalY. And of course the same problem applies to Bicho and others, too. Now I'd like to hear your feedback on these proposals :)
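The second-column proposal can be sketched with Python's built-in `sqlite3` module: the one-letter `type` column stays exactly as `create_tables()` defines it, and a nullable column holds the full multi-letter action. The column name `full_type` is hypothetical, and the table is reduced to the two columns shown in this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Existing schema, reduced to the columns visible in this PR.
cur.execute("CREATE TABLE actions (id integer primary key, type varchar(1))")
cur.execute("INSERT INTO actions (type) VALUES ('A')")  # a row from an old database

# Proposed extension: an additional nullable column for the full git action.
# Old rows get NULL, and queries that only read `type` keep working.
cur.execute("ALTER TABLE actions ADD COLUMN full_type varchar(2)")
cur.execute("INSERT INTO actions (type, full_type) VALUES ('M', 'MM')")
conn.commit()

rows = cur.execute("SELECT id, type, full_type FROM actions ORDER BY id").fetchall()
print(rows)
```

Existing databases would still need this one `ALTER TABLE`, but the change is purely additive, which is what makes it easier to combine with keeping `type` at a single letter.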
I prefer the schema change too (that's why I implemented it that way initially), but I would like to see this PR merged as soon as possible. As this extension can run without the schema change, you could merge this PR and at the same time open another issue to revert the changes in af146b5 regarding action types, and do whatever it takes to change the schema.
@andygrunwald
This is the refreshed pull request for the content extension.
If #60 is accepted, I will update this pull request to call
fr.get_path(repo, path or repo.get_uri())
instead. Please review.