@ZERICO2005 ZERICO2005 commented Sep 30, 2025

Added multiply-high signed and unsigned routines, which can be used to optimize division by a constant. __smulhu is optimized; the rest are not yet well optimized. They use exactly the same calling convention as the regular multiplication routines. We can optimize these routines in later PRs.
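
As an illustration of the division-by-constant use case, here is a minimal C sketch. The helper names `mulhu16` and `div10_u16` are hypothetical and are not part of this PR; `mulhu16` only models what `__smulhu` computes. It relies on the standard reciprocal trick: 0xCCCD = ceil(2^19 / 10), so x / 10 equals (x * 0xCCCD) >> 19 for every 16-bit unsigned x.

```c
#include <stdint.h>

/* Hypothetical C model of __smulhu: high 16 bits of a 16x16 unsigned product. */
static uint16_t mulhu16(uint16_t a, uint16_t b) {
    return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}

/* x / 10 without a divide: the multiply-high performs the ">> 16",
 * and the remaining ">> 3" completes the total shift of 19. */
static uint16_t div10_u16(uint16_t x) {
    return (uint16_t)(mulhu16(x, 0xCCCDu) >> 3);
}
```

A compiler could emit one call to the multiply-high routine plus a short shift in place of a call to a general division routine; the magic constant and shift depend on the divisor and operand width.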

__smulhu   :         HL = ((uint32_t)         HL * (uint32_t)      BC) >> 16
__imulhu   :        UHL = ((uint48_t)        UHL * (uint48_t)     UBC) >> 24
__lmulhu   :      E:UHL = ((uint64_t)      E:UHL * (uint64_t)   A:UBC) >> 32
__i48mulhu :    UDE:UHL = ((uint96_t)    UDE:UHL * (uint96_t) UIY:UBC) >> 48
__llmulhu  : BC:UDE:UHL = ((uint128_t)BC:UDE:UHL * (uint128_t) (SP64)) >> 64
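
The unsigned definitions above can be modelled in portable C for the widths that fit in standard types. This is only a reference sketch for checking results; the `ref_*` names are hypothetical, and 24-bit values are assumed to be carried in the low 24 bits of a `uint32_t`.

```c
#include <stdint.h>

/* Model of __smulhu: 16x16 -> upper 16 bits of the 32-bit product. */
static uint16_t ref_smulhu(uint16_t a, uint16_t b) {
    return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}

/* Model of __imulhu: 24x24 -> upper 24 bits of the 48-bit product.
 * Inputs are masked to 24 bits, so the product fits in 48 bits. */
static uint32_t ref_imulhu(uint32_t a, uint32_t b) {
    uint64_t p = (uint64_t)(a & 0xFFFFFFu) * (uint64_t)(b & 0xFFFFFFu);
    return (uint32_t)(p >> 24);
}

/* Model of __lmulhu: 32x32 -> upper 32 bits of the 64-bit product. */
static uint32_t ref_lmulhu(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}
```

For example, 0x1234 * 0x5678 = 0x06260060, so the 16-bit model returns 0x0626.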

__smulhs   :         HL = ((int32_t)          HL * (int32_t)       BC) >> 16
__imulhs   :        UHL = ((int48_t)         UHL * (int48_t)      UBC) >> 24
__lmulhs   :      E:UHL = ((int64_t)       E:UHL * (int64_t)    A:UBC) >> 32
__i48mulhs :    UDE:UHL = ((int96_t)     UDE:UHL * (int96_t)  UIY:UBC) >> 48
__llmulhs  : BC:UDE:UHL = ((int128_t) BC:UDE:UHL * (int128_t)  (SP64)) >> 64
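
The signed and unsigned results differ only by a correction term, which the following C sketch demonstrates for the 16-bit case. The function names are hypothetical, and this is not necessarily how the routines in this PR are implemented; it is the well-known identity mulhs(a, b) = mulhu(a, b) - (a < 0 ? b : 0) - (b < 0 ? a : 0), taken modulo 2^16. Values are passed as 16-bit patterns to keep the code free of implementation-defined conversions.

```c
#include <stdint.h>

/* High 16 bits of the unsigned 16x16 product (models __smulhu). */
static uint16_t mulhu16(uint16_t a, uint16_t b) {
    return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}

/* Sign-extend a 16-bit pattern to int32_t without relying on casts. */
static int32_t sext16(uint16_t v) {
    return (v & 0x8000u) ? (int32_t)v - 65536 : (int32_t)v;
}

/* Direct model of __smulhs: bits 16..31 of the signed 32-bit product. */
static uint16_t mulhs16(uint16_t a, uint16_t b) {
    int32_t p = sext16(a) * sext16(b);
    return (uint16_t)((uint32_t)p >> 16);
}

/* Same result derived from the unsigned high product plus a correction. */
static uint16_t mulhs16_via_mulhu(uint16_t a, uint16_t b) {
    uint16_t hi = mulhu16(a, b);
    uint16_t ca = (a & 0x8000u) ? b : 0; /* subtract b when a is negative */
    uint16_t cb = (b & 0x8000u) ? a : 0; /* subtract a when b is negative */
    return (uint16_t)(hi - ca - cb);     /* wraps modulo 2^16 */
}
```
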

Code size and timing:

__smulhu   :  32 bytes |  33F +  12R +   9W +  17
__imulhu   : 117 bytes | 118F +  39R +  38W +  37
__lmulhu   : 1 call to __llmulu
__i48mulhu :  93 bytes | 902F + 246R + 182W + 344
__llmulhu  : (disables interrupts to use exx) slightly faster than 2 calls to __llmulu

__bmulhu was not added since it is just mlt bc \ ld a, b (and the 8-bit calling convention is not well defined).

@ZERICO2005 ZERICO2005 marked this pull request as draft September 30, 2025 03:02
@ZERICO2005 ZERICO2005 added the crt label Oct 10, 2025
@ZERICO2005 ZERICO2005 changed the title from "added mulhu CRT routines" to "added mulhu and mulhs CRT routines" Oct 13, 2025