Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating devices takes an inordinate amount of time #11775

Closed
kkthxbye-code opened this issue Feb 17, 2023 · 7 comments · Fixed by #11785
Closed

Creating devices takes an inordinate amount of time #11775

kkthxbye-code opened this issue Feb 17, 2023 · 7 comments · Fixed by #11785
Assignees
Labels
status: accepted This issue has been accepted for implementation type: bug A confirmed report of unexpected behavior in the application

Comments

@kkthxbye-code
Copy link
Contributor

kkthxbye-code commented Feb 17, 2023

NetBox version

v3.4.4

Python version

3.10

Steps to Reproduce

This issue will be used to collect what I find while investigating the issue, as such the scenario is based on my test database and test devicetype.

  1. Create a devicetype with 62 interfaces
  2. Create a device based on the interface
  3. Notice the responsetime

Expected Behavior

The device should be create within a reasonable time period.

Observed Behavior

The device takes many seconds to create.

Do note, that while the performance was never great, it was made worse by both the search indexing and the fact that we fixed a bug where changelog entries were not created for instantiated components.

@kkthxbye-code kkthxbye-code added type: bug A confirmed report of unexpected behavior in the application status: under review Further discussion is needed to determine this issue's scope and/or implementation labels Feb 17, 2023
@kkthxbye-code
Copy link
Contributor Author

kkthxbye-code commented Feb 17, 2023

Some background around my test database:

Row counts:

extras_cachedvalue: 798217
dcim_device: 9622
dcim_interface: 97056

Rough response time (without debug) for a devicetype with 62 interfaces: 6.01 seconds

Initial findings:

A lot of the processing time is spent deleting from extras_cachedvalue:
image

It's slow because there's no index on object_id. Adding an index on object_id:

create index extras_cachedvalue_object_id_sdfsdf ON extras_cachedvalue(object_id);

Response time after adding the index: 2.36 s (~60% decrease).

Before index:
image

After index:
image

Insertions did not seem to be noticeably effected in my testing, but reindexing might be slower, will need to test. Otherwise this seems like a very easy win.

@kkthxbye-code
Copy link
Contributor Author

I tested reindexing with and with an index on object_id and it seems like there's no difference.

With index:
python manage.py reindex 76,35s user 1,80s system 74% cpu 1:44,20 total

Without index:
python manage.py reindex 76,78s user 1,79s system 76% cpu 1:42,92 total

@DanSheps
Copy link
Member

DanSheps commented Feb 17, 2023

The device takes many seconds to create.

Do note, that while the performance was never great, it was made worse by both the search indexing and the fact that we fixed a bug where changelog entries were not created for instantiated components.

Wonder if we could push changelogging to the RQWorker (maybe include the option to spin up additional workers as well) for certain execution paths (anything in a RQ worker already can be completed there, anything web/api can go through the RQ).

This might also offer additional performance improvements, you just wouldn't get real-time change logging.

@jeremystretch jeremystretch added status: accepted This issue has been accepted for implementation and removed status: under review Further discussion is needed to determine this issue's scope and/or implementation labels Feb 18, 2023
@jeremystretch jeremystretch self-assigned this Feb 18, 2023
@jeremystretch
Copy link
Member

Thanks for tackling this @kkthxbye-code!

As we discussed earlier today, we can sidestep the root issue here and avoid clearing any existing cache entries altogether when a new object is created (for which it's impossible to have any pre-existing entries). Of course, we should still add the index to serve cases where we do need to clear them.

Response time after adding the index: 2.36 s (~60 decrease).

I'm not able to reproduce such a stark reduction, but it's likely my cache just isn't large enough to impose that much of a delay. Can you say roughly how many object your test cache contains?

Wonder if we could push changelogging to the RQWorker

I opened #11202 to explore that idea recently, but haven't done any work on it yet.

@jeremystretch
Copy link
Member

@kkthxbye-code can you test PR #11785 when you get a chance? I haven't added the index yet, but it should result in the same improvement as we're sidestepping the call to clear() entirely.

If this is deemed acceptable to resolve this specific issue, I'd like to do an audit to add indexes everywhere we use generic foreign keys.

@kkthxbye-code
Copy link
Contributor Author

@jeremystretch

I'm not able to reproduce such a stark reduction, but it's likely my cache just isn't large enough to impose that much of a delay. Can you say roughly how many object your test cache contains?

I noted it in my first reply, but around 800k entries. Important to note that the 60% reduction is without DEBUG enabled. What does the reduction look like for you when just looking at the SQL tab in the debug toolbar?

I'll try the PR out!

@jeremystretch
Copy link
Member

I noted it in my first reply, but around 800k entries.

Sorry, missed that. I'll need to generate a bunch of objects to increase the cache to test properly on my end.

What does the reduction look like for you when just looking at the SQL tab in the debug toolbar?

It shaves some time off, but I don't think I have a large enough cache yet for it to be noticeable.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
status: accepted This issue has been accepted for implementation type: bug A confirmed report of unexpected behavior in the application
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants