Version 1.3
Two significant performance improvements.
- Row magnitudes for a VectorSpaceModel object are now cached in an environment, which allows some pass-by-reference editing. This means that the most time-consuming part of any comparison query is done only once for any given vector set; subsequent queries are at least an order of magnitude (perhaps 10-20x) faster.
Although this is a big performance improvement, certain edge cases might not clear the cache. In particular, assigning to elements inside a VSM object might leave stale magnitudes behind and cause incorrect calculations. That said, I can't see why anyone would be in the habit of manually tweaking a row or block (rather than replacing a whole matrix).
- Access to rows in a VectorSpaceModel object is now handled through callNextMethod() rather than by accessing the element's .Data slot. For reasons opaque to me, hitting the .Data slot seems to require copying the whole huge matrix internally. That no longer happens.
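
To illustrate the caching described in the first item, here is a minimal sketch of the general technique, not the package's actual code. The names new_model, row_magnitudes, vectors, and cache are hypothetical, and a plain list stands in for the real VectorSpaceModel class. It shows why an environment lets a function store results by reference, and how an in-place row assignment can leave a stale cache behind.

```r
# Hypothetical stand-in for a VectorSpaceModel: a matrix plus an environment
# that holds cached row magnitudes. None of these names come from the package.
new_model <- function(m) {
  list(vectors = m, cache = new.env(parent = emptyenv()))
}

row_magnitudes <- function(model) {
  # Environments have reference semantics, so a value stored in model$cache
  # here is still there on the next call; no modified copy has to be returned.
  if (is.null(model$cache$magnitudes)) {
    model$cache$magnitudes <- sqrt(rowSums(model$vectors^2))
  }
  model$cache$magnitudes
}

m <- matrix(c(3, 4, 6, 8), nrow = 2, byrow = TRUE)
model <- new_model(m)
first <- row_magnitudes(model)   # computed and stored in the cache
second <- row_magnitudes(model)  # served from the cache, no recomputation

# The edge case: editing a row in place does not touch the environment,
# so the cached magnitudes no longer match the data.
model$vectors[1, ] <- c(0, 1)
stale <- row_magnitudes(model)
fresh <- sqrt(rowSums(model$vectors^2))
```

In this sketch, stale still holds the magnitudes of the original rows while fresh reflects the edited row. One way to avoid the mismatch, under the same assumptions, is to build a new model with new_model() instead of assigning into an existing one.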