ignoredNames serialization problem causes indexing of hidden directories/files #1383

@vladak

When upgrading from 0.12 to 0.13-rc8, I noticed that partial indexing is harvesting all the files under the .hg directories of Mercurial repositories. In fact, it was harvesting all the files under directories that would normally be ignored, such as SCCS. This caused a huge increase in indexing times.

Tracking down what was happening, I found this ignoredNames object dump in configuration.xml, as written by the 0.13-rc8 indexer when run with -W /var/opengrok/etc/configuration.xml -i bar.txt:

  <void id="IgnoredNames0" property="ignoredNames">
   <void id="ArrayList0" property="items">
    <void index="26">
     <string>bar.txt</string>
    </void>
    <void index="27">
     <string>SCCS</string>
    </void>
    <void index="28">
     <string>CVS</string>
    </void>
    <void index="29">
     <string>CVSROOT</string>
    </void>
    <void index="30">
     <string>RCS</string>
    </void>
    <void index="31">
     <string>Codemgr_wsdata</string>
    </void>
    <void index="32">
     <string>deleted_files</string>
    </void>
    <void index="33">
     <string>.svn</string>
    </void>
    <void index="34">
     <string>.repo</string>
    </void>
    <void index="35">
     <string>.git</string>
    </void>
    <void index="36">
     <string>.hg</string>
    </void>
    <void index="37">
     <string>.razor</string>
    </void>
    <void method="add">
     <string>.bzr</string>
    </void>
   </void>
   <void property="items">
    <object idref="ArrayList0"/>
   </void>
  </void>

This does not look correct. Even if this could somehow work with all the index numbers, the default elements (i.e. anything but bar.txt) should not be present as they are not meant to be configurable.

In 0.12, i.e. prior to the changes done for #508, the section would have looked like this:

  <void id="IgnoredNames0" property="ignoredNames">
   <void property="items">
    <void method="add">
     <string>bar.txt</string>
    </void>
   </void>
  </void>

So something is wrong with the serialization of ignoredNames. The class has 2 subclasses to provide functionality for files and directories.

There is a getItems() method which looks like this:

  public List<String> getItems() {
      List<String> twoLists = new ArrayList<>();
      twoLists.addAll(ignoredFiles.getItems());
      twoLists.addAll(ignoredDirs.getItems());
      return twoLists;
  }

This method is used when env.writeConfiguration() is called from Indexer's main() so whatever it spits out has consequences on what will end up in configuration.xml.
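
To illustrate why the merged getter matters, here is a minimal sketch, not the actual OpenGrok code. It assumes the built-in defaults live in the directory sub-list and that user-supplied names from -i go into the file sub-list. The class name IgnoredNamesSketch, the DEFAULT_DIRS constant and the addFile() helper are hypothetical. It only shows that whatever reads getItems() sees the defaults and the user entries mixed together, which would be consistent with the dump above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IgnoredNamesSketch {
    // Hypothetical stand-in for a few of the built-in defaults (assumption).
    static final List<String> DEFAULT_DIRS = Arrays.asList("SCCS", "CVS", ".hg");

    // User-supplied names start empty; directory names are seeded with defaults.
    private final List<String> ignoredFiles = new ArrayList<>();
    private final List<String> ignoredDirs = new ArrayList<>(DEFAULT_DIRS);

    // Hypothetical helper standing in for handling of the -i option.
    void addFile(String name) {
        ignoredFiles.add(name);
    }

    // Same shape as the getItems() quoted above: a fresh list merging both sub-lists.
    public List<String> getItems() {
        List<String> twoLists = new ArrayList<>();
        twoLists.addAll(ignoredFiles);
        twoLists.addAll(ignoredDirs);
        return twoLists;
    }

    public static void main(String[] args) {
        IgnoredNamesSketch names = new IgnoredNamesSketch();
        names.addFile("bar.txt");
        // The single user entry comes back together with every default.
        System.out.println(names.getItems());
    }
}
```

In this sketch a caller cannot tell from getItems() alone which entries were configured by the user and which are defaults, so a serializer relying on that getter has no way to write out only bar.txt.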

For some reason, the problem with ignored files/directories seems to occur only with partial indexing.
