Description
When upgrading from 0.12 to 0.13-rc8, I noticed that partial indexing was harvesting all the files under the .hg directories of Mercurial repositories. In fact, it was harvesting all the files under directories that would normally be ignored, such as SCCS. This caused a huge increase in indexing times.
While tracking down what was happening, I found this ignoredNames object dump in configuration.xml, as written by the 0.13-rc8 indexer when run with -W /var/opengrok/etc/configuration.xml -i bar.txt:
<void id="IgnoredNames0" property="ignoredNames">
<void id="ArrayList0" property="items">
<void index="26">
<string>bar.txt</string>
</void>
<void index="27">
<string>SCCS</string>
</void>
<void index="28">
<string>CVS</string>
</void>
<void index="29">
<string>CVSROOT</string>
</void>
<void index="30">
<string>RCS</string>
</void>
<void index="31">
<string>Codemgr_wsdata</string>
</void>
<void index="32">
<string>deleted_files</string>
</void>
<void index="33">
<string>.svn</string>
</void>
<void index="34">
<string>.repo</string>
</void>
<void index="35">
<string>.git</string>
</void>
<void index="36">
<string>.hg</string>
</void>
<void index="37">
<string>.razor</string>
</void>
<void method="add">
<string>.bzr</string>
</void>
</void>
<void property="items">
<object idref="ArrayList0"/>
</void>
</void>
This does not look correct. Even if this could somehow work with all the index numbers, the default elements (i.e. anything but bar.txt) should not be present, as they are not meant to be configurable.
In 0.12, i.e. prior to the changes made for #508, the section looked like this:
<void id="IgnoredNames0" property="ignoredNames">
<void property="items">
<void method="add">
<string>bar.txt</string>
</void>
</void>
</void>
So there is something wrong with the serialization of ignoredNames. The class delegates to two members, ignoredFiles and ignoredDirs, which provide the functionality for files and directories respectively. It has a getItems() method that looks like this:
public List<String> getItems() {
    // Concatenates the items of both members into a new list on every call.
    List<String> twoLists = new ArrayList<>();
    twoLists.addAll(ignoredFiles.getItems());
    twoLists.addAll(ignoredDirs.getItems());
    return twoLists;
}
This method is used when env.writeConfiguration() is called from the Indexer's main(), so whatever it returns determines what ends up in configuration.xml.
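The dumps above are in java.beans.XMLEncoder format, and XMLEncoder only sees what the getter returns. The following minimal sketch assumes that writeConfiguration() ultimately goes through XMLEncoder, and uses hypothetical stand-in classes (MergedNames, UserOnlyNames, and the helper roundTrip) rather than OpenGrok's real ones, with only two of the default names. XMLEncoder writes the getter's list as a diff against a freshly constructed bean (the index/add statements) and then replays it through the setter (the idref statement), so when the XML is read back the setter receives the defaults as well, which is the same pattern visible in the 0.13-rc8 dump.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class IgnoredNamesRoundTrip {
    // Hypothetical built-in names, standing in for SCCS, .hg, CVS, ...
    static final List<String> DEFAULTS =
            Collections.unmodifiableList(Arrays.asList("SCCS", ".hg"));

    /** Hypothetical bean mimicking a merged getter: defaults + user entries. */
    public static class MergedNames {
        private final List<String> userItems = new ArrayList<>();

        public void add(String name) { userItems.add(name); }

        // Returns a fresh list on every call, with the defaults always first.
        public List<String> getItems() {
            List<String> all = new ArrayList<>(DEFAULTS);
            all.addAll(userItems);
            return all;
        }

        // Treats every incoming item as user-configured, defaults included.
        public void setItems(List<String> items) {
            userItems.clear();
            userItems.addAll(items);
        }

        // Not a bean property (no "get" prefix), so XMLEncoder ignores it.
        public List<String> userItems() { return new ArrayList<>(userItems); }
    }

    /** Hypothetical alternative: only user-configured names form the property. */
    public static class UserOnlyNames {
        private final List<String> userItems = new ArrayList<>();

        public void add(String name) { userItems.add(name); }

        public List<String> getItems() { return new ArrayList<>(userItems); }

        public void setItems(List<String> items) {
            userItems.clear();
            userItems.addAll(items);
        }

        public List<String> userItems() { return new ArrayList<>(userItems); }
    }

    /** Serializes a bean with XMLEncoder and reads it back with XMLDecoder. */
    @SuppressWarnings("unchecked")
    public static <T> T roundTrip(T bean) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(bytes)) {
            // Surface encoding problems instead of printing and continuing.
            encoder.setExceptionListener(e -> { throw new IllegalStateException(e); });
            encoder.writeObject(bean);
        }
        ClassLoader loader = bean.getClass().getClassLoader();
        try (XMLDecoder decoder = new XMLDecoder(
                new ByteArrayInputStream(bytes.toByteArray()), null,
                e -> { throw new IllegalStateException(e); }, loader)) {
            return (T) decoder.readObject();
        }
    }

    public static void main(String[] args) {
        MergedNames merged = new MergedNames();
        merged.add("bar.txt"); // analogous to "-i bar.txt"
        System.out.println("merged:    " + roundTrip(merged).userItems());

        UserOnlyNames userOnly = new UserOnlyNames();
        userOnly.add("bar.txt");
        System.out.println("user-only: " + roundTrip(userOnly).userItems());
    }
}
```

Running main should show SCCS and .hg coming back in the user-configured list of MergedNames next to bar.txt, while UserOnlyNames round-trips only bar.txt, which is closer to the 0.12 output. The UserOnlyNames variant only illustrates one possible direction and is not an actual OpenGrok fix.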
Now, for some reason, the problem with ignored files/directories seems to happen only with partial indexing.