Skip to content

Commit

Permalink
Fix slowness of source indexing and other problems
Browse files Browse the repository at this point in the history
In late June the source indexing step on Windows builds went from taking
about 15 minutes to taking several hours. This is because a rust
toolchain directory was added and its files are not mapped in as a git
repo. Therefore every one of these files that was referenced in the
source information had "git log" run on it, and this command is slow,
especially on non-existent files. Source indexing of chrome.dll went
from 45 seconds to 4,500 s - 100x slower. Source indexing of
gaia1_0.dll was similarly slowed. Nothing else seemed to be affected.

The output and C++ toolchain directories are already excluded from
source indexing because of this issue, so all we need to do is add
another excluded directory. This time I'm adding a --excluded-dirs
switch to make it generic and more obvious. I'm also making other
changes:
1) The toolchain and output directories used to have to be specified as
absolute paths. Passing a relative path would make the script take hours
longer. The script should make them absolute, and should fail if the
specified directory does not exist.
2) The lack of an encoding passed to subprocess was causing various
decode exceptions that presumably caused information to be lost, and
also cluttered the output.
3) Some Python 2/3 compatibility hacks can now be removed.
4) Since calling git log when a file doesn't exist is expensive this
change adds a message whenever this happens. In my tests this message is
never hit. If it starts showing up thousands of times then the cause of
the slowdown will be obvious next time.

While this CL does make this script handle relative paths, and does fix
encoding exceptions, it does not by itself fix the performance issue.
For that a change in how this script is called will be needed, after
this CL lands.

Test commands:
#1: reproduce the problem and make sure no exclusion-dirs is
handled:
vpython3 tools/symsrc/source_index.py out\official_test\chrome.dll.pdb --build-dir=out\official_test --toolchain-dir=third_party\depot_tools\win_toolchain\vs_files\27370823e7

#2: use exclusion-dirs to get a 100x speedup:
vpython3 tools/symsrc/source_index.py out\official_test\chrome.dll.pdb --build-dir=out\official_test --toolchain-dir=third_party\depot_tools\win_toolchain\vs_files\27370823e7 --exclusion-dirs=third_party\rust-toolchain

#3: use exclusion-dirs with semi-colon separation to replace using
--build-dir (still fast):
vpython3 tools/symsrc/source_index.py out\official_test\chrome.dll.pdb --toolchain-dir=third_party\depot_tools\win_toolchain\vs_files\27370823e7 --exclusion-dirs=third_party\rust-toolchain;out\official_test

A clean copy of chrome.dll.pdb is needed for each test. The freshly
built DLL was copied elsewhere and then copied back before each test.

See README.chromium for more testing ideas.

Bug: 1471723
Change-Id: I9d28a44ba2bed9d52258c42961e14ddf432ac4bd
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4771880
Reviewed-by: Sébastien Marchand <sebmarchand@chromium.org>
Commit-Queue: Bruce Dawson <brucedawson@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1182418}
  • Loading branch information
randomascii authored and Chromium LUCI CQ committed Aug 11, 2023
1 parent 00359e0 commit 139fc68
Showing 1 changed file with 31 additions and 29 deletions.
60 changes: 31 additions & 29 deletions tools/symsrc/source_index.py
Original file line number Diff line number Diff line change
Expand Up @@ -46,17 +46,6 @@
dealt with as win32 paths, since we have to interact with the Microsoft tools.
"""

from __future__ import print_function

try:
# Python 3.x
from urllib.parse import urlparse
# Replace the Python 2 unicode function with str when running Python 3.
unicode = str
except ImportError:
# Python 2.x
from urlparse import urlparse

import optparse
import os
import re
Expand All @@ -67,6 +56,7 @@
import win32api

from collections import namedtuple
from urllib.parse import urlparse

# Map that associates Git repository URLs with the URL that should be used to
# retrieve individual files from this repo. Entries in this map should have the
Expand Down Expand Up @@ -122,7 +112,7 @@

def GetCasedFilePath(filename):
"""Return the correctly cased path for a given filename"""
return win32api.GetLongPathName(win32api.GetShortPathName(unicode(filename)))
return win32api.GetLongPathName(win32api.GetShortPathName(str(filename)))


def FindSrcSrvFile(filename, toolchain_dir):
Expand Down Expand Up @@ -153,6 +143,7 @@ def RunCommand(*cmd, **kwargs):
kwargs.setdefault('stdout', subprocess.PIPE)
kwargs.setdefault('stderr', subprocess.PIPE)
kwargs.setdefault('universal_newlines', True)
kwargs.setdefault('encoding', 'utf-8')
raise_on_failure = kwargs.pop('raise_on_failure', True)

proc = subprocess.Popen(cmd, **kwargs)
Expand Down Expand Up @@ -252,6 +243,8 @@ def ExtractGitInfo(local_filename):
cwd=local_file_dir, raise_on_failure=False)

if not file_info:
print('No results from running git log %s in %s. This can be expensive.' %
(local_file_basename, local_file_dir))
return

# Get the revision of the master branch.
Expand Down Expand Up @@ -453,26 +446,29 @@ def DirectoryIsPartOfPublicGitRepository(local_dir):
return False


def UpdatePDB(pdb_filename, verbose=True, build_dir=None, toolchain_dir=None,
def UpdatePDB(pdb_filename,
verbose=True,
build_dir=None,
toolchain_dir=None,
exclusion_dirs=None,
follow_junctions=False):
"""Update a pdb file with source information."""
dir_exclusion_list = { }

if build_dir:
# Excluding the build directory allows skipping the generated files, for
# Chromium this makes the indexing ~10x faster.
build_dir = (os.path.normpath(build_dir)).lower()
for directory, _, _ in os.walk(build_dir):
dir_exclusion_list[directory.lower()] = True
dir_exclusion_list[build_dir.lower()] = True

if toolchain_dir:
# Exclude the directories from the toolchain as we don't have revision info
# for them.
toolchain_dir = (os.path.normpath(toolchain_dir)).lower()
for directory, _, _ in os.walk(toolchain_dir):
dir_exclusion_list[directory.lower()] = True
dir_exclusion_list[toolchain_dir.lower()] = True
# We want to exclude files that aren't in git repos. This includes generated
# files, toolchain files, and possibly others.

dirs = [build_dir, toolchain_dir]
if exclusion_dirs:
dirs.extend(exclusion_dirs.split(';'))
for dir in dirs:
if dir:
if not os.path.exists(dir):
raise Exception('Exclusion directory %s does not exist.' % dir)
dir = (os.path.abspath(dir)).lower()
for directory, _, _ in os.walk(dir):
dir_exclusion_list[directory.lower()] = True
dir_exclusion_list[dir.lower()] = True

# Writes the header of the source index stream.
#
Expand Down Expand Up @@ -577,6 +573,12 @@ def main():
parser.add_option('--toolchain-dir', help='The directory containing the VS '
'toolchain that has been used for this build. All the files present in '
'this directory (or one of its subdirectories) will be skipped.')
parser.add_option(
'--exclusion-dirs',
help='A semicolon separated list of '
'directories containing files that should be excluded from looking up '
'source control information. Note that the build-dir and toolchain-dir '
'are automatically excluded.')
parser.add_option('--follow-junctions', action='store_true',help='Indicates '
'if the junctions should be followed while doing the indexing.',
default=False)
Expand All @@ -590,7 +592,7 @@ def main():

for pdb in args:
UpdatePDB(pdb, options.verbose, options.build_dir, options.toolchain_dir,
options.follow_junctions)
options.exclusion_dirs, options.follow_junctions)

return 0

Expand Down

0 comments on commit 139fc68

Please sign in to comment.