@wchargin wchargin commented Jan 23, 2020

Summary:
The Pip HTTP cache stores all downloaded wheels and is never evicted.
With new tf-nightly wheels every day, this adds up quickly. We last
cleared our Travis caches about a month ago, and they’re up to 14.3 GB.

Investigation shows that the Pip HTTP cache accounts for the majority of
the cache (about 70% after about a month of cache accrual), and also
that jobs with larger caches have significantly longer startup times,
with delta on the order of 8 minutes (again, after about a month). Also,
uploading large caches at the end of a job can take minutes, and Travis
doesn’t report success until this finishes. Fetching tf-nightly should
be comparatively cheap.
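
For anyone wanting to repeat this kind of measurement, here is a minimal sketch that compares `du` totals for a parent directory and one subdirectory. The function name `dir_share_percent` is made up for illustration, and the example paths assume pip's default Linux cache location (`~/.cache/pip/http`); this is not necessarily how the 70% figure above was obtained.

```shell
#!/bin/sh
# Print the integer percentage of a parent directory's disk usage that
# one subdirectory accounts for. Hypothetical helper, not part of this PR.
dir_share_percent() {
    parent="$1"
    child="$2"
    # `du -sk` reports the total usage of a tree in 1024-byte units.
    parent_kb=$(du -sk "$parent" | awk '{print $1}')
    child_kb=$(du -sk "$child" | awk '{print $1}')
    if [ "$parent_kb" -eq 0 ]; then
        echo 0
        return 0
    fi
    echo $(( child_kb * 100 / parent_kb ))
}

# Example (assumed paths; adjust to whatever the CI cache actually holds):
#   dir_share_percent "$HOME/.cache" "$HOME/.cache/pip/http"
```

The result is rounded down and reflects on-disk usage, so it can differ slightly from the size of the compressed cache archive that Travis uploads.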

This reverts part of #2278.

Test Plan:
This PR reduces the “before install” time (i.e., time spent by Travis
internals before it gets to our script, including restoring cache) from
9m40s to 4m59s, a 48% improvement. The “install” time is increased from
3m36s to 3m56s, which seems acceptable.

wchargin-branch: ci-drop-pip-http-cache

Summary:
The Pip HTTP cache stores all downloaded wheels and is never evicted.
With new `tf-nightly` wheels every day, this adds up quickly. We last
cleared our Travis caches about a month ago, and they’re up to 14.3 GB.

Investigation shows that the Pip HTTP cache accounts for the majority of
the cache (about 70% after about a month of cache accrual), and also
that jobs with larger caches have significantly longer startup times,
with delta on the order of 8 minutes (again, after about a month). Also,
uploading large caches at the end of a job can take minutes, and Travis
doesn’t report success until this finishes. Fetching `tf-nightly` should
be comparatively cheap.

Caches may need to be cleared for this to take effect. We’ll find out.

Test Plan:
See what Travis thinks.

wchargin-branch: ci-drop-pip-http-cache
@wchargin
Contributor Author

This PR reduces the “before install” time (i.e., time spent by Travis
internals before it gets to our script, including restoring cache) from
9m40s to 4m59s, a 48% improvement.

The “install” time is increased from 3m36s to 3m56s, which seems
acceptable. It takes longer to install tf-nightly than to download it,
and a full download-and-install from clean cache takes just 48 seconds
on my machine.
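
The percentages can be checked from the timings quoted in this comment. A small sketch follows; the helper names `to_seconds` and `percent_change` are made up for illustration, and it assumes durations are always written as `<minutes>m<seconds>s`.

```shell
#!/bin/sh
# Convert a duration like "9m40s" to seconds.
to_seconds() {
    minutes=${1%%m*}      # text before the "m"
    rest=${1#*m}          # text after the "m", e.g. "40s"
    seconds=${rest%s}     # drop the trailing "s"
    # Strip one leading zero so values like "08" are not parsed as octal.
    minutes=${minutes#0}; minutes=${minutes:-0}
    seconds=${seconds#0}; seconds=${seconds:-0}
    echo $(( minutes * 60 + seconds ))
}

# Integer percent change from $1 to $2, truncated toward zero.
# Negative means the phase got faster.
percent_change() {
    before=$(to_seconds "$1")
    after=$(to_seconds "$2")
    echo $(( (after - before) * 100 / before ))
}

percent_change 9m40s 4m59s   # "before install" phase: prints -48
percent_change 3m36s 3m56s   # "install" phase: prints 9
```

In absolute terms, the two phases together go from 796 seconds to 535 seconds for this one pair of runs, a saving of 4m21s.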

Travis is a bit deceptive about the time that it takes for it to restore
caches—it deletes this portion of the build log after it completes, and
backdates the elapsed timestamp—so I can’t give precise numbers about
improvements there.


cache:
  # Reuse the pip cache directory across build machines.
  pip: true
Contributor

@stephanwlee stephanwlee Jan 23, 2020

`pip: false`, then explain why we opted out? Some stupid person like me may flip it back on, thinking that it will improve our CI (and save the world) :)

Contributor Author

Good idea; thanks. Done.

@wchargin
Contributor Author

@stephanwlee pointed out offline that while this is an improvement, it
could be even better to persist a Pip cache that we evict daily. We came
up with two alternative approaches:

  • Move the Travis cron job to ~04:30 PT*, and `rm` the cache
    directory during cron jobs only.
  • Create a cache structure `~/.cache/tb-pip/YYYYMMDD`, start each
    build with `ln -s ~/.cache/tb-pip/TODAY ~/.cache/pip`, and end each
    build by running `rm ~/.cache/pip` and also deleting any caches from
    prior days.
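
The second approach could be sketched roughly as follows. This is an untested-in-CI illustration, not something this PR implements: the function names and the `TB_PIP_CACHE_ROOT` and `PIP_CACHE_LINK` variables are invented here, with defaults taken from the paths in the bullet above.

```shell
#!/bin/sh
# Hypothetical settings; the defaults mirror the paths described above.
: "${TB_PIP_CACHE_ROOT:=$HOME/.cache/tb-pip}"
: "${PIP_CACHE_LINK:=$HOME/.cache/pip}"

# Start of build: point the pip cache at today's dated directory.
tb_pip_cache_begin() {
    today=$(date +%Y%m%d)
    mkdir -p "$TB_PIP_CACHE_ROOT/$today"
    # Only replace a symlink (or nothing); never clobber a real directory.
    if [ -e "$PIP_CACHE_LINK" ] && [ ! -L "$PIP_CACHE_LINK" ]; then
        echo "refusing to replace non-symlink $PIP_CACHE_LINK" >&2
        return 1
    fi
    rm -f "$PIP_CACHE_LINK"
    ln -s "$TB_PIP_CACHE_ROOT/$today" "$PIP_CACHE_LINK"
}

# End of build: remove the symlink and delete caches from other days.
tb_pip_cache_end() {
    today=$(date +%Y%m%d)
    if [ -L "$PIP_CACHE_LINK" ]; then
        rm "$PIP_CACHE_LINK"
    fi
    for dir in "$TB_PIP_CACHE_ROOT"/*; do
        [ -d "$dir" ] || continue
        [ "$(basename "$dir")" = "$today" ] || rm -rf "$dir"
    done
}
```

For this to help, Travis would need to persist the dated root through its `cache.directories` setting instead of `pip: true`. Two caveats with the sketch: `date` uses the build machine's time zone, and a build that crosses midnight would delete the directory it just populated, so a real version might record the date once at the start of the build.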

The cron job deletion is kind of ugly (to have effectful state in the
cron, and to rely on that particular time). It’d also be annoying to set
up, because the only way to do this is to wait until it’s actually 04:30
and start the job. You can’t set the scheduled time. :-/

The cache structure requires a bit of extra logic and thinking, but
seems pretty nice to me.

If we run into issues where we decide that we want one-day caches back,
we can consider doing something like this. For now, this PR chops
minutes off the CI latency and doesn’t increase the install time by
much, so I’ll propose that we merge it as is.

wchargin-branch: ci-drop-pip-http-cache
wchargin-source: 87d25ebd4ffd6db7bff21fa755b897892df1488d
@wchargin wchargin changed the title ci: don’t store Pip HTTP cache ci: don’t store Pip HTTP cache Jan 23, 2020
@wchargin wchargin merged commit 3849abe into master Jan 23, 2020
@wchargin wchargin deleted the wchargin-ci-drop-pip-http-cache branch January 23, 2020 19:48
@wchargin wchargin mentioned this pull request Jan 23, 2020
@wchargin
Contributor Author

wchargin commented Feb 6, 2020

Following up: Caches appear to be holding steady at around 3500 MB now.
That’s still fairly large, but at least it’s not spiraling out of
control.
