Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds index on long url (and domain id) #2133

Closed

Conversation

marijnvandevoorde
Copy link
Contributor

@marijnvandevoorde marijnvandevoorde commented May 21, 2024

Fixes #2131

This drastically improves the performance when using findIfExists when dealing with a lot of urls.
Below the results when working on a local db of 450K URLs:
Without index:


          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

     execution: local
        script: script.js
        output: -

     scenarios: (100.00%) 1 scenario, 10 max VUs, 40s max duration (incl. graceful stop):
              * default: 10 looping VUs for 10s (gracefulStop: 30s)


     data_received..................: 36 kB 3.2 kB/s
     data_sent......................: 24 kB 2.1 kB/s
     http_req_blocked...............: avg=199.86µs min=2µs      med=4µs   max=1.21ms p(90)=1.14ms  p(95)=1.17ms  
     http_req_connecting............: avg=72.25µs  min=0s       med=0s    max=457µs  p(90)=414µs   p(95)=442.2µs 
     http_req_duration..............: avg=1.81s    min=633.91ms med=1.81s max=2.68s  p(90)=2.12s   p(95)=2.21s   
       { expected_response:true }...: avg=1.81s    min=633.91ms med=1.81s max=2.68s  p(90)=2.12s   p(95)=2.21s   
     http_req_failed................: 0.00% ✓ 0        ✗ 59  
     http_req_receiving.............: avg=87.27µs  min=40µs     med=83µs  max=300µs  p(90)=111.2µs p(95)=130.89µs
     http_req_sending...............: avg=77.66µs  min=19µs     med=37µs  max=389µs  p(90)=255µs   p(95)=282.79µs
     http_req_tls_handshaking.......: avg=0s       min=0s       med=0s    max=0s     p(90)=0s      p(95)=0s      
     http_req_waiting...............: avg=1.81s    min=633.37ms med=1.81s max=2.68s  p(90)=2.12s   p(95)=2.21s   
     http_reqs......................: 59    5.263379/s
     iteration_duration.............: avg=1.81s    min=636.58ms med=1.81s max=2.68s  p(90)=2.12s   p(95)=2.21s   
     iterations.....................: 59    5.263379/s
     vus............................: 5     min=5      max=10
     vus_max........................: 10    min=10     max=10

With index:


          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

     execution: local
        script: script.js
        output: -

     scenarios: (100.00%) 1 scenario, 10 max VUs, 40s max duration (incl. graceful stop):
              * default: 10 looping VUs for 10s (gracefulStop: 30s)


     data_received..................: 90 kB 8.5 kB/s
     data_sent......................: 61 kB 5.7 kB/s
     http_req_blocked...............: avg=51.89µs  min=1µs      med=4µs      max=747µs    p(90)=5µs      p(95)=696.2µs 
     http_req_connecting............: avg=26.75µs  min=0s       med=0s       max=429µs    p(90)=0s       p(95)=379.4µs 
     http_req_duration..............: avg=691ms    min=274.01ms med=689.2ms  max=935.62ms p(90)=803.77ms p(95)=837.59ms
       { expected_response:true }...: avg=691ms    min=274.01ms med=689.2ms  max=935.62ms p(90)=803.77ms p(95)=837.59ms
     http_req_failed................: 0.00% ✓ 0         ✗ 149 
     http_req_receiving.............: avg=83.63µs  min=21µs     med=78µs     max=668µs    p(90)=97.2µs   p(95)=124.19µs
     http_req_sending...............: avg=34.2µs   min=10µs     med=31µs     max=129µs    p(90)=45µs     p(95)=70.79µs 
     http_req_tls_handshaking.......: avg=0s       min=0s       med=0s       max=0s       p(90)=0s       p(95)=0s      
     http_req_waiting...............: avg=690.88ms min=273.81ms med=689.1ms  max=935.51ms p(90)=803.65ms p(95)=837.48ms
     http_reqs......................: 149   14.113973/s
     iteration_duration.............: avg=691.32ms min=276.25ms med=689.43ms max=935.72ms p(90)=803.92ms p(95)=837.8ms 
     iterations.....................: 149   14.113973/s
     vus............................: 10    min=10      max=10
     vus_max........................: 10    min=10      max=10


This drastically improves the performance when using findIfExists
@marijnvandevoorde
Copy link
Contributor Author

Ugh, I guess I need to figure out how to deal with prefix indexes on MS SQL here... I'll see what I can do.

@acelaya
Copy link
Member

acelaya commented May 22, 2024

Ugh, I guess I need to figure out how to deal with prefix indexes on MS SQL here... I'll see what I can do.

Before you spend a lot more time on this, let's discuss the general problem and potential solutions a bit further in #2131, because I think it is a bit harder than what's proposed here.

See #2131 (comment) for context

@marijnvandevoorde
Copy link
Contributor Author

I'm going to close this one as it won't help us right now. We may end up doing a migration on our own infra, but that will also involve changing the original_url field to a varchar(255) or something similar. I fully understand that will never be merged upstream, that's why I'm closing this.

@acelaya
Copy link
Member

acelaya commented May 28, 2024

I'm going to close this one as it won't help us right now

Have you considered not using findIfExists at all? If I'm not wrong, you are planning to create many temp short URLs with an expiration date (validUntil). When this is provided, findIfExists will also take it into consideration, and create a different short URL if none of existing ones match the long URL + validUntil.

If you don't pass findIfExists the short URL creation should be much faster, and they can be pruned later via short-url:delete-expired command.

@marijnvandevoorde
Copy link
Contributor Author

Aaah, it makes sense of course, that it has to use both. So the index would also be a bit more complex than I thought. In fact, the whole idea of trying to insert without checking if it's present would fail for this reason: the combo domain + short code might exist, but be expired...

Either way, we'll move forward without using findIfExists, that was our backup plan anyway. All good from our end!

@acelaya
Copy link
Member

acelaya commented Jun 6, 2024

So the index would also be a bit more complex than I thought. In fact, the whole idea of trying to insert without checking if it's present would fail for this reason: the combo domain + short code might exist, but be expired...

Correct 😅

That's more or less what I meant in #2131 (comment), when I pointed out to the method doing multiple checks in multiple fields, not just long URL + domain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve performance when using findIfExists
2 participants