-
"It's a wrench in the gears of the common response to string searching: sorting the word list in some fashion to reduce the number of comparisons made. If we do not retain inverse word definition order when calling FIND, Forth doesn't work correctly." Actually, that does not need to be a problem. If a stable sort algorithm is used, like merge sort, then words with the same name could keep their relative order. |
-
Similarly, I considered making search faster by storing a 16-bit string hash instead of the full string.
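Something in that direction, sketched in C (the hash choice and all names are mine, not a proposal for the kernel; the collision caveat is the interesting part):

```c
#include <stdint.h>
#include <stddef.h>

/* A simple 16-bit string hash (FNV-1a folded to 16 bits); illustrative only. */
static uint16_t hash16(const char *s, size_t len) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (uint8_t)s[i];
        h *= 16777619u;
    }
    return (uint16_t)(h ^ (h >> 16));
}

/* Each header stores just two bytes instead of the name. Caveat: two
   different names can hash alike, so a collision silently finds the wrong
   word unless the source text is kept around for a confirming compare. */
typedef struct WordHdr {
    struct WordHdr *link;   /* next-older definition, as in a Forth dictionary */
    uint16_t name_hash;
} WordHdr;

const WordHdr *find(const WordHdr *latest, const char *name, size_t len) {
    uint16_t h = hash16(name, len);
    for (const WordHdr *w = latest; w; w = w->link)
        if (w->name_hash == h)
            return w;       /* first (newest) hash match */
    return NULL;
}
```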
-
Not yet mentioned is the effect of this sorting on item 1: string comparison.
-
The design mentioned by whammo (the editor does compilation instantly) is actually not new. Maybe colorforth does something similar? I'm sure OKAD does.
…On Tue, 11 Jan 2022 at 06:35, Whammo ***@***.***> wrote:
I see now that the context of this thread was set plainly in the beginning
of this thread and in that context my comments WERE intrusive and
off-topic. I will try to do better in the future and humbly apprentice.
-
One way to reduce comparisons when searching for a word is to compare the lengths first. EDIT: That's why FIND-NAME does that.
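As a minimal sketch in C, for counted strings laid out the way Forth keeps them (length byte first; the helper name is invented):

```c
#include <string.h>

typedef unsigned char byte;

/* Counted string: length byte, then the characters.
   One compare on the length byte rejects most candidates
   before any character is touched. */
int names_equal(const byte *a, const byte *b) {
    if (a[0] != b[0]) return 0;
    return memcmp(a + 1, b + 1, a[0]) == 0;
}
```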
-
There's an order-preserving hash table in CPython 3.6's dict type. The essence of it is fairly simple: separate the hash-indexed array from the key array. Such an index could grant hash-speed lookups into a dictionary, and you could fall back to linear search if the hash index didn't fit. E.g. we could build the hash table by traversing the words and not overwriting hash entries that are already filled. Then, if the lookup by hash finds a mismatched word, we can search on from that position anyway, as the word we want is either missing or deeper. Alternatively, we could traverse misses from `latest` as now, and update the hash table to get a most-recently-used symbol cache. Best of all, these methods could be patched on and transient. Something like:
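(Sketched in C rather than Forth, and every name below is invented, but it shows the shape:)

```c
#include <stddef.h>
#include <string.h>

typedef struct WordHdr {
    struct WordHdr *link;            /* next-older definition */
    unsigned char len;
    char name[31];
} WordHdr;

#define SLOTS 64                     /* power of two, so the hash can be masked */
static WordHdr *slot[SLOTS];         /* separate index; the dictionary itself is untouched */

static unsigned hash(const char *s, size_t n) {
    unsigned h = 0;
    while (n--) h = h * 31 + (unsigned char)*s++;
    return h & (SLOTS - 1);
}

static int same(const WordHdr *w, const char *s, size_t n) {
    return w->len == n && memcmp(w->name, s, n) == 0;
}

/* Build by walking newest to oldest and never overwriting, so each slot
   keeps the newest word with that hash -- that is what preserves FIND order. */
void build_index(WordHdr *latest) {
    memset(slot, 0, sizeof slot);
    for (WordHdr *w = latest; w; w = w->link) {
        unsigned h = hash(w->name, w->len);
        if (!slot[h]) slot[h] = w;
    }
}

WordHdr *hash_find(WordHdr *latest, const char *s, size_t n) {
    WordHdr *w = slot[hash(s, n)];
    if (!w) w = latest;              /* empty slot: fall back to a full linear search */
    /* On a mismatch we keep scanning from the slot: the word we want
       is either missing or deeper, never newer than the slot entry. */
    for (; w; w = w->link)
        if (same(w, s, n)) return w;
    return NULL;
}
```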
It's just a rough outline, not tested. `dowords` will be helpful in building the hash table. The suggested `latest` check is vulnerable to specific forget/marker/define combinations that land on the same dictionary length again. Workaround: always rehash after `forget` or `marker`. An alternative for a static hash table: an `update-hash` step along with `define`. Actually, invalidate the whole table on any `forget`, because it changes the name tokens for everything newer.

Another method to reduce comparisons is not to search the full dictionary at all. `forget` does this by pruning internal words, and search-order word lists can do so for larger sets. I think that extension is reasonably implementable. Our current `find` (or `find-name`) stops when it encounters a 0. If we add the pointer back in to that 0-length entry, we can use it to indicate the next word list to search. That's 2 bytes per word list, and a small extra check making the difference between `find` and `search-wordlist`. `set-order` can then rearrange these word-list links. The word-list identifiers can simply hold the beginning of each dictionary. The current compilation word list can be the latest, or we could make `set-current` actually move the word list to the lowest address. `latest` will have different values for define and find.
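The sentinel-link idea, modeled in C (names invented; the kernel's actual header layout will differ):

```c
#include <stddef.h>
#include <string.h>

typedef struct WordHdr {
    struct WordHdr *link;   /* next-older entry; for a sentinel: the next word list */
    unsigned char len;      /* 0 marks the sentinel that ends a word list */
    char name[31];
} WordHdr;

/* One routine serves both: SEARCH-WORDLIST stops at the sentinel,
   FIND follows the sentinel's link on to the next list in the order. */
WordHdr *search(WordHdr *first, const char *s, size_t n, int whole_order) {
    for (WordHdr *w = first; w; w = w->link) {
        if (w->len == 0) {                    /* end of this word list */
            if (!whole_order) return NULL;    /* search-wordlist: stop here */
            continue;                         /* find: chase link into the next list */
        }
        if (w->len == n && memcmp(w->name, s, n) == 0)
            return w;
    }
    return NULL;
}
```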
-
How would we search and maintain the dictionary if we had unlimited memory and wanted compiling to be as fast as possible?
-
Just to loop back into the conversation: hashing is a fun idea, but realistically I do not think it will give any dramatic improvement. This is because what we already have, string length + first character, is already a fairly decent 16-bit hash. Implementing the Search-Order word set, like Whammo suggested, should give a very good improvement: the assembler words alone take up more than half of the word list, so switching them out should give a notable speed-up.
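In other words (a trivial C sketch; the name is invented): the two bytes a header already starts with behave like a 16-bit key when compared as a unit.

```c
#include <stdint.h>

/* Length byte and first character, packed into one 16-bit compare. */
static uint16_t quick_key(uint8_t len, char first) {
    return (uint16_t)((uint16_t)len << 8 | (uint8_t)first);
}
```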
-
Here's a silly little proof of concept that can hide the assembly words. The reason it has a lift-words routine rather than a bury-words is to perform the entire dictionary change in one move. I'm too tired to figure out the proper sequence of it all now.
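For contrast, here is a link-splice variant of the same goal in C, not the move-based lift-words approach described above (all names are invented):

```c
#include <stddef.h>

typedef struct WordHdr {
    struct WordHdr *link;   /* next-older definition */
    /* name, code, ... */
} WordHdr;

/* Splice the dictionary links around a run of entries, here the assembler
   words: `first_asm` is the newest entry of the run, `last_asm` the oldest.
   find never visits them afterwards, though they still occupy memory,
   which is why moving them (as lift-words does) is the stronger fix. */
void hide_range(WordHdr **latest, WordHdr *first_asm, WordHdr *last_asm) {
    for (WordHdr **p = latest; *p; p = &(*p)->link)
        if (*p == first_asm) {
            *p = last_asm->link;    /* skip the whole run in one splice */
            return;
        }
}
```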
-
FWIW, I think I now have some evidence that
I expect
-
Proof of concept verification for simpler
By restricting the problem domain to the symbols we discussed, this reduced the lowercase routine core to four instructions. A few PETSCII graphics code points were converted into arrows, brackets, fetch and comment characters. As a bonus, this converts the ASCII lowercase range also. The main decision is where to store the $c0 constant for BIT; it would be more efficient to have a permanent value in zeropage, but anywhere could do. Not using BIT means we'd need to save the original value, which the current CHAR_TO_LOWERCASE does. Then again, putting it out in code memory and still using BIT just costs two bytes and one cycle per call, far faster than any save/restore.
-
Searching the dictionary will inevitably be a number of string comparisons. We can control 2 things about this process, performance-wise:

1. the speed of each string comparison
2. the number of comparisons made

Currently, our string comparison function is relatively fast. It filters first by length, then character-by-character until a branch on mismatch, or a fall-through match. Not much could be done here to improve the speed of string matching, if anything.

We have many comparisons to make, however. If `wdepth` is a word's distance from `latest` in the dictionary, then we have to make `wdepth` comparisons to find that word. New words look up quickly, while older words look up slowly. This has bad effects for common, builtin words, as the most fundamental words take the longest to find. Complicating the issue is the Forth Standard, which specifies that newer entries must be found before older entries in the case of duplicate words.

It's a wrench in the gears of the common response to string searching: sorting the word list in some fashion to reduce the number of comparisons made. If we do not retain inverse word definition order when calling `FIND`, Forth doesn't work correctly. So, if we are to employ sorting at all, we must include data about the definition sequence in the `COMPARE` operation of whichever algorithm we might seek to employ, growing the dictionary size.

Here we see illustrated the trade-off we face with dictionary search: speed for size. There may be a much faster dictionary structure, but it will be larger.