Skip to content
This repository has been archived by the owner on Aug 11, 2024. It is now read-only.

Improve indexing performance #1123

Merged
merged 8 commits into from
Feb 28, 2024
Merged

Conversation

bfollington
Copy link
Collaborator

@bfollington bfollington commented Feb 21, 2024

  • Batch reads and database calls in tight loops
  • Reduce re-parsing cost in MemoRecord init
  • Index peers in TaskGroup
  • Reduce re-parsing again
  • Adjust task priority

Speeds up indexing between 4x and 6x.

Before

image

After

image

@bfollington bfollington changed the base branch from main to 2024-02-16-peer-discovery February 21, 2024 05:28
Comment on lines +87 to +103
var idxA = currentIndex
var idxB = subsequence.startIndex

// iterating through both indices is MUCH faster than constructing and comparing
// a substring slice, especially since we get early exist on the first mismatched
// character.
repeat {
if rest[idxA] != subsequence[idxB] {
return false
}

idxA = rest.index(after: idxA)
idxB = subsequence.index(after: idxB)
} while idxB < subsequence.endIndex

self.currentIndex = endIndex
return true
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this one speedup to be equally effective vs. storing an array of indices (a la https://maximeremenko.com/string-random-access)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting!

There may still be an advantage to storing indices in an array once at Tape creation IFF it has the same performance characteristics.

this.indices = Array(base.indices)

The reason is that consumeMatch is just one place where we want to do this kind of operation. I think we use it for most of the parser, but there are other cases where we might want to use peek or other helpers.

We also use Substring and Substring.Index in a bookkeeping role, for example, to keep track of the rest position and startIndex. I'm wondering if we could get a performance improvement by storing indices in an array, and thus avoiding the manipulation of Substring indexes altogether.

  • rest() could become a Range<Int> that indexes into indices
  • advance() coud increment an Int counter by one
  • Etc

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave that for a future exploration.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did test exactly this and found it didn't significantly affect indexing time, but I agree it may be worth further investigation. Specifically: if we could speed up the attributed string renderer our UI would feel significantly more responsive in list views.

@bfollington bfollington marked this pull request as ready for review February 27, 2024 03:53
@bfollington bfollington force-pushed the 2024-02-21-indexing-performance branch from e4486e8 to 0991a48 Compare February 27, 2024 03:53
@bfollington bfollington changed the base branch from 2024-02-16-peer-discovery to main February 27, 2024 03:53
Copy link
Collaborator

@gordonbrander gordonbrander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one small request to update comment. Otherwise LGTM!

Great work.

Comment on lines +87 to +103
var idxA = currentIndex
var idxB = subsequence.startIndex

// iterating through both indices is MUCH faster than constructing and comparing
// a substring slice, especially since we get early exist on the first mismatched
// character.
repeat {
if rest[idxA] != subsequence[idxB] {
return false
}

idxA = rest.index(after: idxA)
idxB = subsequence.index(after: idxB)
} while idxB < subsequence.endIndex

self.currentIndex = endIndex
return true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting!

There may still be an advantage to storing indices in an array once at Tape creation IFF it has the same performance characteristics.

this.indices = Array(base.indices)

The reason is that consumeMatch is just one place where we want to do this kind of operation. I think we use it for most of the parser, but there are other cases where we might want to use peek or other helpers.

We also use Substring and Substring.Index in a bookkeeping role, for example, to keep track of the rest position and startIndex. I'm wondering if we could get a performance improvement by storing indices in an array, and thus avoiding the manipulation of Substring indexes altogether.

  • rest() could become a Range<Int> that indexes into indices
  • advance() coud increment an Int counter by one
  • Etc

Comment on lines +87 to +103
var idxA = currentIndex
var idxB = subsequence.startIndex

// iterating through both indices is MUCH faster than constructing and comparing
// a substring slice, especially since we get early exist on the first mismatched
// character.
repeat {
if rest[idxA] != subsequence[idxB] {
return false
}

idxA = rest.index(after: idxA)
idxB = subsequence.index(after: idxB)
} while idxB < subsequence.endIndex

self.currentIndex = endIndex
return true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave that for a future exploration.

xcode/Subconscious/Shared/Services/DataService.swift Outdated Show resolved Hide resolved
xcode/Subconscious/Shared/Services/DataService.swift Outdated Show resolved Hide resolved
@bfollington bfollington merged commit d2e2c8a into main Feb 28, 2024
1 check passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants