359 index tags by key #361

Merged
lemon24 merged 5 commits into master from 359-index-tags-by-key on Nov 3, 2024

lemon24 (Owner) commented Nov 2, 2024

Approach described in #359 (comment).

Timings below, summary:

  • slight improvement for feeds, as expected given how few feeds there are
  • huge improvement for entries when only a few of the entries have the tag (i.e. the index is clearly working), but worse if almost all entries have the tag – I expect the new version to be better as long as fewer than 10% of the entries have the tag
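
To illustrate what "the index clearly is working" means, here is a minimal sketch using Python's built-in sqlite3 module. The table, column, and index names (entry_tags, entry, key, value, entry_tags_by_key) are hypothetical and chosen for illustration only; they are not taken from reader's actual schema or from this PR's diff. The sketch only shows that SQLite switches from scanning to an index search for a lookup by tag key once such an index exists; it does not attempt to reproduce the slower case where almost every entry has the tag.

```python
import sqlite3

# Hypothetical schema for illustration; NOT reader's real tables.
db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE entry_tags (
        entry INTEGER NOT NULL,
        key TEXT NOT NULL,
        value TEXT,
        PRIMARY KEY (entry, key)
    )
    """
)
# A rare tag on 2 entries and a common tag on 990 entries.
db.executemany("INSERT INTO entry_tags VALUES (?, 'rare', NULL)", [(1,), (2,)])
db.executemany(
    "INSERT INTO entry_tags VALUES (?, 'common', NULL)",
    [(i,) for i in range(990)],
)

QUERY = "SELECT entry FROM entry_tags WHERE key = ?"


def plan(query):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = db.execute("EXPLAIN QUERY PLAN " + query, ("rare",)).fetchall()
    return [row[3] for row in rows]


plan_before = plan(QUERY)
# Index tags by key (hypothetical index name).
db.execute("CREATE INDEX entry_tags_by_key ON entry_tags (key, entry)")
plan_after = plan(QUERY)

rare_count = len(db.execute(QUERY, ("rare",)).fetchall())
common_count = len(db.execute(QUERY, ("common",)).fetchall())

print("before:", plan_before)
print("after: ", plan_after)
print(rare_count, "rare,", common_count, "common")
```

Printing the two plans side by side is a quick way to check whether a query can use a new index at all, before measuring anything with timeit.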

Before:

177 total feeds
1 feeds tagged '.update'
4 feeds tagged 'corp'
8 feeds tagged 'webcomic'
64 feeds tagged 'main'

get_feeds(tags=...)
['.update']               min 0.000598  avg 0.000666
['corp']                  min 0.000659  avg 0.000692
['webcomic']              min 0.000733  avg 0.000764
['main']                  min 0.001843  avg 0.001880
[True]                    min 0.003799  avg 0.003851
[['corp', 'webcomic']]    min 0.001007  avg 0.001047
[['-webcomic']]           min 0.003975  avg 0.004035
[['corp'], ['webcomic']]  min 0.000621  avg 0.000658

get_feed_counts(tags=...)
['.update']               min 0.000438  avg 0.000454
[True]                    min 0.000385  avg 0.000431
[['corp', 'webcomic']]    min 0.000634  avg 0.000667
[['corp'], ['webcomic']]  min 0.000496  avg 0.000524

19117 total entries
2 entries tagged '.comments'
18989 entries tagged '.readtime'

get_entry_counts(tags=...)
['.comments']             min 0.145794  avg 0.147272
['.readtime']             min 0.172323  avg 0.174604

After:

177 total feeds
1 feeds tagged '.update'
4 feeds tagged 'corp'
8 feeds tagged 'webcomic'
64 feeds tagged 'main'

get_feeds(tags=...)
['.update']               min 0.000421  avg 0.000481
['corp']                  min 0.000491  avg 0.000527
['webcomic']              min 0.000570  avg 0.000599
['main']                  min 0.001692  avg 0.001735
[True]                    min 0.003749  avg 0.003809
[['corp', 'webcomic']]    min 0.000662  avg 0.000690
[['-webcomic']]           min 0.003915  avg 0.003982
[['corp'], ['webcomic']]  min 0.000638  avg 0.000668

get_feed_counts(tags=...)
['.update']               min 0.000251  avg 0.000258
[True]                    min 0.000435  avg 0.000470
[['corp', 'webcomic']]    min 0.000284  avg 0.000292
[['corp'], ['webcomic']]  min 0.000515  avg 0.000549

19117 total entries
2 entries tagged '.comments'
18989 entries tagged '.readtime'

get_entry_counts(tags=...)
['.comments']             min 0.000929  avg 0.000966
['.readtime']             min 0.415308  avg 0.421528

Timings generated with:

import timeit
from reader import make_reader

reader = make_reader('db.sqlite')
url = 'https://death.andgravity.com/_feed/index.xml'
tags = ".update corp webcomic main".split()

def time(stmt, label='', repeat=100):
    # Run stmt `repeat` times (one execution per run); report the fastest and the mean run.
    times = timeit.repeat(stmt, repeat=repeat, number=1, globals=globals())
    print(f"{label or stmt:24}  min {min(times):.6f}  avg {sum(times)/len(times):.6f}")

print(reader.get_feed_counts().total, 'total feeds')
for tag in tags:
    print(reader.get_feed_counts(tags=[tag]).total, f'feeds tagged {tag!r}')
print()

def time_get_feeds(tags):
    time(f"for _ in reader.get_feeds(tags={tags!r}): ...", f"{tags}")
    
print("get_feeds(tags=...)")
for tag in tags:
    time_get_feeds([tag])
time_get_feeds([True])
time_get_feeds([['corp', 'webcomic']])
time_get_feeds([['-webcomic']])
time_get_feeds([['corp'], ['webcomic']])
print()
    
def time_get_feed_counts(tags):
    time(f"reader.get_feed_counts(tags={tags!r})", f"{tags}")

print("get_feed_counts(tags=...)")
time_get_feed_counts(['.update'])
time_get_feed_counts([True])
time_get_feed_counts([['corp', 'webcomic']])
time_get_feed_counts([['corp'], ['webcomic']])
print()


entry_tags = ".comments .readtime".split()
print(reader.get_entry_counts().total, 'total entries')
for tag in entry_tags:
    print(reader.get_entry_counts(tags=[tag]).total, f'entries tagged {tag!r}')
print()

def time_get_entry_counts(tags):
    time(f"reader.get_entry_counts(tags={tags!r})", f"{tags}", 10)

print("get_entry_counts(tags=...)")
for tag in entry_tags:
    time_get_entry_counts([tag])
print()

reader.close()
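
As a rough reading aid for the get_entry_counts() numbers, the sketch below computes what share of entries carries each tag and the before/after ratio of the average timings. All figures are copied from the "Before" and "After" output above; the dictionary layout and the summarize() helper are made up for this example.

```python
# Average timings in seconds, copied from the Before/After output in this PR.
total_entries = 19117
runs = {
    ".comments": {"tagged": 2, "before": 0.147272, "after": 0.000966},
    ".readtime": {"tagged": 18989, "before": 0.174604, "after": 0.421528},
}


def summarize(name):
    """Return (share of entries with the tag, before/after timing ratio)."""
    run = runs[name]
    share = run["tagged"] / total_entries
    ratio = run["before"] / run["after"]  # > 1 means the new version is faster
    return share, ratio


for name in runs:
    share, ratio = summarize(name)
    print(f"{name:10}  {share:8.3%} of entries  before/after = {ratio:.2f}x")
```

For '.comments' (about 0.01% of entries) the new version is roughly 150 times faster; for '.readtime' (about 99% of entries) it is roughly 2.4 times slower, which matches the summary at the top of the description.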

lemon24 linked an issue Nov 2, 2024 that may be closed by this pull request

codecov bot commented Nov 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.16%. Comparing base (97a0c98) to head (7c57706).
Report is 5 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #361      +/-   ##
==========================================
+ Coverage   95.14%   95.16%   +0.01%     
==========================================
  Files          96       96              
  Lines       12147    12193      +46     
  Branches      825      837      +12     
==========================================
+ Hits        11557    11603      +46     
  Misses        516      516              
  Partials       74       74              


lemon24 (Owner, Author) commented Nov 2, 2024

@nobrowser, this exists now (feel free to leave feedback if you want to); I will merge it and make a release in the next days.

nobrowser commented Nov 3, 2024

> @nobrowser, this exists now (feel free to leave feedback if you want to); I will merge it and make a release in the next days.

Thank you, I'm taking a step back and thinking if I should use python for this at all. But looks like this change would indeed be helpful.

lemon24 merged commit 3702de9 into master Nov 3, 2024
18 checks passed
lemon24 deleted the 359-index-tags-by-key branch November 3, 2024 05:46
Development

Successfully merging this pull request may close these issues.

Feed and entry tags should be indexed by key
2 participants