-
"It's a wrench in the gears of the common response to string searching: sorting the word list in some fashion to reduce the number of comparisons made. If we do not retain inverse word definition order when calling FIND, Forth doesn't work correctly." Actually, that does not need to be a problem. If a stable sort algorithm is used, like merge sort, then words with the same name could keep their relative order. |
-
Similarly, I considered making search faster by storing a 16-bit string hash instead of the full string.
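Something in that direction, sketched in C (the hash choice and all names are mine, not a proposal for the kernel; the collision caveat is the interesting part):

```c
#include <stdint.h>
#include <stddef.h>

/* A simple 16-bit string hash (FNV-1a folded to 16 bits); illustrative only. */
static uint16_t hash16(const char *s, size_t len) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (uint8_t)s[i];
        h *= 16777619u;
    }
    return (uint16_t)(h ^ (h >> 16));
}

/* Each header stores just two bytes instead of the name. Caveat: two
   different names can hash alike, so a collision silently finds the wrong
   word unless the source text is kept around for a confirming compare. */
typedef struct WordHdr {
    struct WordHdr *link;   /* next-older definition, as in a Forth dictionary */
    uint16_t name_hash;
} WordHdr;

const WordHdr *find(const WordHdr *latest, const char *name, size_t len) {
    uint16_t h = hash16(name, len);
    for (const WordHdr *w = latest; w; w = w->link)
        if (w->name_hash == h)
            return w;       /* first (newest) hash match */
    return NULL;
}
```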
-
Not yet mentioned is the effect of this sorting on item 1: string comparison.
-
The design mentioned by whammo (the editor does compilation instantly) is actually not new. Maybe colorforth does something similar? I'm sure OKAD does.
…On Tue, 11 Jan 2022 at 06:35, Whammo ***@***.***> wrote:
I see now that the context of this thread was set plainly in the beginning
of this thread and in that context my comments WERE intrusive and
off-topic. I will try to do better in the future and humbly apprentice.
-
One way to reduce comparisons when searching for a word is to compare the lengths first. EDIT: That's why FIND-NAME does that.
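As a minimal sketch in C, for counted strings laid out the way Forth keeps them (length byte first; the helper name is invented):

```c
#include <string.h>

typedef unsigned char byte;

/* Counted string: length byte, then the characters.
   One compare on the length byte rejects most candidates
   before any character is touched. */
int names_equal(const byte *a, const byte *b) {
    if (a[0] != b[0]) return 0;
    return memcmp(a + 1, b + 1, a[0]) == 0;
}
```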
-
There's an order-preserving hash table in CPython 3.6's dict type. The essence of it is fairly simple: separate the hash-indexed array from the key array. Such an index could grant hash-speed lookups into a dictionary, and you could fall back to linear search if the hash index didn't fit. E.g. we could build the hash table by traversing the words and not overwriting hash entries that are already filled. Then, if the lookup by hash finds a mismatched word, we can search on from that position anyway, as the word we want is either missing or deeper. Alternatively, we could traverse misses from `latest` as now, and update the hash table to get a most-recently-used symbol cache. Best of all, these methods could be patched on and transient. Something like:
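(Sketched in C rather than Forth, and every name below is invented, but it shows the shape:)

```c
#include <stddef.h>
#include <string.h>

typedef struct WordHdr {
    struct WordHdr *link;            /* next-older definition */
    unsigned char len;
    char name[31];
} WordHdr;

#define SLOTS 64                     /* power of two, so the hash can be masked */
static WordHdr *slot[SLOTS];         /* separate index; the dictionary itself is untouched */

static unsigned hash(const char *s, size_t n) {
    unsigned h = 0;
    while (n--) h = h * 31 + (unsigned char)*s++;
    return h & (SLOTS - 1);
}

static int same(const WordHdr *w, const char *s, size_t n) {
    return w->len == n && memcmp(w->name, s, n) == 0;
}

/* Build by walking newest to oldest and never overwriting, so each slot
   keeps the newest word with that hash -- that is what preserves FIND order. */
void build_index(WordHdr *latest) {
    memset(slot, 0, sizeof slot);
    for (WordHdr *w = latest; w; w = w->link) {
        unsigned h = hash(w->name, w->len);
        if (!slot[h]) slot[h] = w;
    }
}

WordHdr *hash_find(WordHdr *latest, const char *s, size_t n) {
    WordHdr *w = slot[hash(s, n)];
    if (!w) w = latest;              /* empty slot: fall back to a full linear search */
    /* On a mismatch we keep scanning from the slot: the word we want
       is either missing or deeper, never newer than the slot entry. */
    for (; w; w = w->link)
        if (same(w, s, n)) return w;
    return NULL;
}
```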
It's just a rough outline, not tested. `dowords` will be helpful in building the hash table. The suggested `latest` check is vulnerable to specific forget/marker/define combinations that land on the same dictionary length again. Workaround: always rehash after `forget` or `marker`. An alternative for a static hash table: an `update-hash` step along with `define`. Actually, invalidate the whole table on any `forget`, because it changes the name tokens for everything newer.

Another method to reduce comparisons is not to search the full dictionary at all. `forget` does this by pruning internal words, and search-order word lists can do so for larger sets. I think that extension is reasonably implementable. Our current `find` (or `find-name`) stops when it encounters a 0. If we add the pointer back in to that 0-length entry, we can use it to indicate the next word list to search. That's 2 bytes per word list, and a small extra check making the difference between `find` and `search-wordlist`. `set-order` can then rearrange these word-list links. The word-list identifiers can simply hold the beginning of each dictionary. The current compilation word list can be the latest, or we could make `set-current` actually move the word list to the lowest address. `latest` will have different values for define and find.
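The sentinel-link idea, modeled in C (names invented; the kernel's actual header layout will differ):

```c
#include <stddef.h>
#include <string.h>

typedef struct WordHdr {
    struct WordHdr *link;   /* next-older entry; for a sentinel: the next word list */
    unsigned char len;      /* 0 marks the sentinel that ends a word list */
    char name[31];
} WordHdr;

/* One routine serves both: SEARCH-WORDLIST stops at the sentinel,
   FIND follows the sentinel's link on to the next list in the order. */
WordHdr *search(WordHdr *first, const char *s, size_t n, int whole_order) {
    for (WordHdr *w = first; w; w = w->link) {
        if (w->len == 0) {                    /* end of this word list */
            if (!whole_order) return NULL;    /* search-wordlist: stop here */
            continue;                         /* find: chase link into the next list */
        }
        if (w->len == n && memcmp(w->name, s, n) == 0)
            return w;
    }
    return NULL;
}
```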
-
How would we search and maintain the dictionary if we had unlimited memory and wanted compiling to be as fast as possible?
-
Just to loop back into the conversation: hashing is a fun idea, but realistically I do not think it will give any dramatic improvement. This is because what we already have, string length + first character, is already a fairly decent 16-bit hash. Implementing the Search-Order word set, like Whammo suggested, should give a very good improvement: the assembler words alone take up more than half of the word list, so switching them out should give a notable speed-up.
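In other words (a trivial C sketch; the name is invented): the two bytes a header already starts with behave like a 16-bit key when compared as a unit.

```c
#include <stdint.h>

/* Length byte and first character, packed into one 16-bit compare. */
static uint16_t quick_key(uint8_t len, char first) {
    return (uint16_t)((uint16_t)len << 8 | (uint8_t)first);
}
```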
-
Here's a silly little proof of concept that can hide the assembly words. The reason it has a lift-words routine rather than a bury-words is to perform the entire dictionary change in one move. I'm too tired to figure out the proper sequence of it all now.
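For contrast, here is a link-splice variant of the same goal in C, not the move-based lift-words approach described above (all names are invented):

```c
#include <stddef.h>

typedef struct WordHdr {
    struct WordHdr *link;   /* next-older definition */
    /* name, code, ... */
} WordHdr;

/* Splice the dictionary links around a run of entries, here the assembler
   words: `first_asm` is the newest entry of the run, `last_asm` the oldest.
   find never visits them afterwards, though they still occupy memory,
   which is why moving them (as lift-words does) is the stronger fix. */
void hide_range(WordHdr **latest, WordHdr *first_asm, WordHdr *last_asm) {
    for (WordHdr **p = latest; *p; p = &(*p)->link)
        if (*p == first_asm) {
            *p = last_asm->link;    /* skip the whole run in one splice */
            return;
        }
}
```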
-
FWIW, I think I now have some evidence that
I expect
-
Proof of concept verification for simpler
By restricting the problem domain to the symbols we discussed, this reduced the lowercase routine core to four instructions. A few PETSCII graphics code points were converted into arrows, brackets, fetch and comment characters. As a bonus, this converts the ASCII lowercase range also. The main decision is where to store the $c0 constant for BIT; it would be more efficient to have a permanent value in zeropage, but anywhere could do. Not using BIT means we'd need to save the original value, which the current CHAR_TO_LOWERCASE does. Then again, putting it out in code memory and still using BIT just costs two bytes and one cycle per call, far faster than any save/restore.
-
Searching the dictionary will inevitably be a number of string comparisons. We can control 2 things about this process, performance-wise:

1. the speed of each string comparison
2. the number of comparisons made

Currently, our string comparison function is relatively fast. It filters first by length, then character-by-character until a branch on mismatch, or a fall-through match. Not much could be done here to improve the speed of string matching, if anything.

We have many comparisons to make, however. If `wdepth` is a word's distance from `latest` in the dictionary, then we have to make `wdepth` comparisons to find that word. New words look up quickly, while older words look up slowly. This has bad effects for common, builtin words, as the most fundamental words take the longest to find. Complicating the issue is the Forth Standard, which specifies that newer entries must be found before older entries in the case of duplicate words.

It's a wrench in the gears of the common response to string searching: sorting the word list in some fashion to reduce the number of comparisons made. If we do not retain inverse word definition order when calling `FIND`, Forth doesn't work correctly. So, if we are to employ sorting at all, we must include data about the definition sequence in the `COMPARE` operation of whichever algorithm we might seek to employ, growing the dictionary size.

Here we see illustrated the trade-off we face with dictionary search: speed for size. There may be a much faster dictionary structure, but it will be larger.