Add ability to process PNG icons for perceptual hash calculation #1090

Merged
merged 2 commits into from
Jul 24, 2022

@HoundThe (Member) commented Jul 22, 2022

I've integrated stb_image.h from https://github.com/nothings/stb to add loading of PNG icons for perceptual hash calculation. I've flipped the decoded PNG image upside down, because the existing DIB format is stored upside down (bottom-up), and I wanted to stay consistent with the current implementation.
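For illustration, flipping a decoded image vertically just reverses the order of its rows. The minimal sketch below assumes the image is held as a vector of rows of RGBA pixels; the `Pixel` and `Image` types and `flipVertically` are hypothetical, not RetDec's actual types. stb_image.h also offers `stbi_set_flip_vertically_on_load()`, which makes the loader itself return rows bottom-up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical pixel/image types for this sketch.
struct Pixel {
    std::uint8_t r, g, b, a;
};
using Image = std::vector<std::vector<Pixel>>;  // image[row][column], top row first

// Convert a top-down image (as decoded from PNG) into bottom-up row order,
// the layout in which DIB bitmaps are usually stored.
void flipVertically(Image& image) {
    std::reverse(image.begin(), image.end());
}
```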

I've added some handcrafted test samples to the appropriate repository, with a mention of this PR. I took a sample with a PNG icon, extracted the icon at a different resolution, converted it to DIB format, and put it inside another file. The test then checks whether the perceptual hashes of both files match.

this->image.push_back(row);
}

stbi_image_free(data);
Review comment (Member):

We have a ScopeExitGuard in utils/scope_exit.h; check it out. I would use it here, as it should provide more safety with respect to exceptions etc.
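A scope-exit guard runs a cleanup callable when the enclosing scope ends, including during stack unwinding caused by an exception. The sketch below is a generic C++17 illustration of the idea; it is not the contents of RetDec's `utils/scope_exit.h`, and `ScopeGuard` and `processAndThrow` are hypothetical names. `std::free` stands in for `stbi_image_free` so the sketch compiles without stb_image.h.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <utility>

// Generic scope guard: invokes the stored callable when destroyed.
template <typename F>
class ScopeGuard {
public:
    explicit ScopeGuard(F f) : f_(std::move(f)) {}
    ~ScopeGuard() { f_(); }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    F f_;
};

int freedCount = 0;  // counts cleanups, for demonstration only

// Allocates a buffer, then throws; the guard still releases the buffer.
void processAndThrow() {
    void* data = std::malloc(16);
    ScopeGuard guard([data] {
        std::free(data);
        ++freedCount;
    });
    throw std::runtime_error("decoding failed");
}
```

Without the guard, the `throw` would skip a trailing `free` call and leak the buffer.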

Reply (Member, Author):

Thanks, TIL. I am not 100% sure I used it correctly, but I've tried to use it.

@metthal metthal merged commit 0749a46 into master Jul 24, 2022
@metthal metthal deleted the png-icons branch July 24, 2022 19:09
PeterMatula pushed a commit that referenced this pull request Dec 5, 2022
* Add ability to process PNG icons for perceptual hash calculation

* Use SCOPE_EXIT for deallocation
PeterMatula added a commit that referenced this pull request Dec 5, 2022
* Update Capstone to v4.0

* [Capstone-next] Update to capstone-next branch

* [Capstone-next] Update to Capstone-Next Branch
-[ARM]
    -Add ARM_INS_MOVS support
-[ARM64]
    -Remove vess.
        -It overlaps with ARM64_VAS
    -Fix A64SysReg_* into ARM64_SYSREG_*
-[PowerPC]
    -Fix PPC_REG_X2 into PPC_REG_XER
-[X86]
    -Remove X86_INS_FADDP
        -In capstone-next, faddp is actually fadd, both belong to
            "ID 15(fadd)"

* [tests][capstone2llvmir][arm] Fix MOVW Unit Test
- In the test, "movw r0, #0xabcd" does not read any register,
    and the result is 0xabcd, not 0x1234abcd.
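In ARM, MOVW writes its 16-bit immediate zero-extended into the destination register, so the old register value is neither read nor preserved; MOVT, by contrast, replaces only the upper halfword. The hypothetical helpers below model those semantics with the register reduced to a single `uint32_t`. A result like 0x1234abcd would only arise if an old upper halfword were kept, which MOVW does not do.

```cpp
#include <cassert>
#include <cstdint>

// MOVW Rd, #imm16: Rd = zero-extended imm16; the old value of Rd is not read.
std::uint32_t movw(std::uint16_t imm16) {
    return imm16;
}

// MOVT Rd, #imm16: replaces the upper halfword and keeps the lower halfword of Rd.
std::uint32_t movt(std::uint32_t rd, std::uint16_t imm16) {
    return (static_cast<std::uint32_t>(imm16) << 16) | (rd & 0xFFFFu);
}
```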

* [tests][capstone2llvmir][arm] Fix Nop test
- In ARM, the NOP instruction is a HINT instruction.
- Also, in Capstone, the cs_insn->id of NOP points to
    HINT (ID: 63).
- Therefore, an error occurred when looking up the translation
    method for the instruction, because it pointed to nullptr.

* [Capstone2llvmir][arm64] Add ADDCS Support

* [capstone2llvmir][arm64] Add ADDS Support

* [capstone2llvmir][arm64] Add ANDS Support

* [capstone2llvmir][arm64] Add SUP Support

* [capstone2llvmir][arm64] Add BICS Support

* [capstone2llvmir][PowerPC] Update Register Name

* [capstone2llvmir][PowerPC] Update Register Name

* [capstone2llvmir][PowerPC] Fix CMP Support

* [capstone2llvmir][PowerPC] Add CMPL Support

* [capstone2llvmir][PowerPC] Fix CMPL

* [capstone2llvmir][PowerPC] Add BLT Support

* [capstone2llvmir][PowerPC] Add support for branch mnemonics incorporating
conditions

* [capstone2llvmir][PowerPC] Fix RLWINM
- RLWINM and clrlwi have the same ID

* [tests][capstone2llvmir][PowerPC] Fix Crand Tests

* [capstone2llvmir][PowerPC] Fix bdzla BUG

* [capstone2llvmir][PowerPC] Remove BDZLA TODO

* [capstone2llvmir][x86] Fix ud2b

* [capstone2llvmir][X86] Fix FADD/FADDP

* [capstone2llvmir][x86] Fix FADD/FADDP

* [capstone2llvmir][x86] Fix FXCH
- When translating the FXCH instruction, the "top" value in the
    loadOpFloatingBinaryTop function is equal to idx, which causes the
    value to be written to top twice when exchanging data.
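FXCH ST(i) exchanges ST(0) with ST(i), where ST(i) is the physical register (TOP + i) mod 8. One general way to avoid double-write bugs in an exchange is to read both operands before writing either; the sketch below shows that pattern over a hypothetical register-stack model and does not reproduce RetDec's actual translation code.

```cpp
#include <array>
#include <cassert>

// Hypothetical x87 register stack: physical registers plus the TOP pointer.
struct FpuStack {
    std::array<double, 8> regs{};
    unsigned top = 0;

    // Physical slot of ST(i).
    double& st(unsigned i) { return regs[(top + i) % 8]; }
};

// FXCH ST(i): load both operands first, then store both. If i == 0,
// both stores target the same slot and the value is left unchanged.
void fxch(FpuStack& fpu, unsigned i) {
    double st0 = fpu.st(0);
    double sti = fpu.st(i);
    fpu.st(0) = sti;
    fpu.st(i) = st0;
}
```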

* clean code

* Update Capstone to v5.0

* [capstone2llvmir][x86][PowerPC] Clean code

* [capstone2llvmir][PowerPC] Clean code

* [capstone2llvmir][PowerPC] Remove BUN* and BNU*
- In Capstone v5, they are equivalent to BSO* and BNS*, respectively.

* [capstone2llvmir][PowerPC] Fix rlwinm
- In Capstone v5, rlwinm is equivalent to clrlwi
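For reference, `clrlwi rA, rS, n` is an extended mnemonic for `rlwinm rA, rS, 0, n, 31`, which is why both share one instruction ID. The sketch below models rlwinm's rotate-and-mask semantics using PowerPC's bit numbering (bit 0 is the most significant bit); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value left by sh (0..31) without undefined shifts.
std::uint32_t rotl32(std::uint32_t x, unsigned sh) {
    sh &= 31;
    return sh == 0 ? x : (x << sh) | (x >> (32 - sh));
}

// Mask with bits mb..me set (bit 0 = MSB); wraps around if mb > me.
std::uint32_t ppcMask(unsigned mb, unsigned me) {
    std::uint32_t fromMb = 0xFFFFFFFFu >> mb;       // bits mb..31
    std::uint32_t toMe = 0xFFFFFFFFu << (31 - me);  // bits 0..me
    return mb <= me ? (fromMb & toMe) : (fromMb | toMe);
}

// rlwinm rA, rS, SH, MB, ME
std::uint32_t rlwinm(std::uint32_t rs, unsigned sh, unsigned mb, unsigned me) {
    return rotl32(rs, sh) & ppcMask(mb, me);
}

// clrlwi rA, rS, n == rlwinm rA, rS, 0, n, 31 (clears the high-order n bits)
std::uint32_t clrlwi(std::uint32_t rs, unsigned n) {
    return rlwinm(rs, 0, n, 31);
}
```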

* [capstone2llvmir][PowerPC] Fix BNL*

* [capstone2llvmir][PowerPC] Add PPC_REG_ZERO

* [capstone2llvmir][PowerPC] Add comment

* Fix merge conflict

* Update YARA to 4.2.X

* Add dll_name from export directory to output

* llvm/CMakeLists: Manually-specified variables were not used by the project.

The following variables were set in CMakeLists; however, they
were not used by the LLVM project build:

LLVM_USE_CRT_DEBUG
LLVM_USE_CRT_RELEASE

* CHANGELOG.md: add entries for #1060 #1061 PRs

* Fixed loading import directory that is modified by relocations

* Fixed comment

* Remove useless trailing whitespace

There is absolutely no reason for it to be in the code.

* pelib: Fix a typo in a comment in PeLib::ImageLoader::Load()

* Add a CHANGELOG entry for #1063

* Move signing certificate to separate object

* Updated authenticode parser to the newest version

* Fix uninitialized free, use finer sanity checks in auth. parser

* Add a directory for RetDec-related publications

The list of publications was originally placed on
https://retdec.com/publications/ (https://retdec.com/ has been redirected
to https://github.com/avast/retdec, and we wanted to keep the list somewhere).

* Fix the wording for an invalid max-memory error in scripts/retdec-unpacker.py

There are the following two reasons for the fix:
- The check only verifies whether the passed value is an integer.
- The parameter can be 0 (i.e. a non-negative integer). It does not have to be a
  positive integer.

* Never try to limit memory on macOS

We can't limit memory on macOS. Before macOS 12,
limitSystemMemoryOnPOSIX() did not actually do anything on macOS; it
simply succeeded. Since macOS 12, it returns an error and RetDec
can't start.

Apple can control memory limits via the so-called ledger() system
call, which is private. An old version that was released as open
source (10.9-10.10?) used setrlimit(), but at some point setrlimit()
broke while ledger() did not. Probably as of macOS 12, setrlimit() is
completely broken.

Because we have no other choice, we just return true without
changing anything.

See: #379
Fixes: #1045
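A minimal sketch of the resulting behavior, assuming a POSIX helper that lowers the soft address-space limit with `setrlimit(RLIMIT_AS, ...)`. The name `limitMemory` and the choice of `RLIMIT_AS` are assumptions; this is not RetDec's actual `limitSystemMemoryOnPOSIX()`. On macOS, it reports success without changing anything.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/resource.h>

// Hypothetical helper: limit the process's address space to `bytes`.
bool limitMemory(std::size_t bytes) {
#if defined(__APPLE__)
    // setrlimit() cannot be relied on here (it fails on macOS 12+),
    // so report success without changing anything.
    (void)bytes;
    return true;
#else
    struct rlimit limit;
    if (getrlimit(RLIMIT_AS, &limit) != 0) {
        return false;
    }
    limit.rlim_cur = static_cast<rlim_t>(bytes);  // keep the hard limit as is
    return setrlimit(RLIMIT_AS, &limit) == 0;
#endif
}
```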

* Remove a redundant period from CHANGELOG

* utils: Improve the wording of a comment in getTotalSystemMemoryOnMacOS()

* Add a CHANGELOG entry for #1074 and #1045

* Update authenticode-parser, use-after-free, signedness issues

* Use a multistage build for the Dockerfile, reducing container size by ~1.5G

* Check for possible overflow when checking for segment overlaps. Fix incorrect range exception message
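Computing a segment's end as `start + size` can wrap around for crafted headers, which makes overlapping segments look disjoint. Below is a sketch of an overflow-aware overlap check; the `Segment` type and function names are hypothetical. Treating a segment whose end overflows as overlapping (and thus rejecting it) is a design choice made for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

struct Segment {
    std::uint64_t start;
    std::uint64_t size;
};

// Returns false if start + size would overflow 64 bits; otherwise stores the end.
bool segmentEnd(const Segment& s, std::uint64_t& end) {
    if (s.size > std::numeric_limits<std::uint64_t>::max() - s.start) {
        return false;
    }
    end = s.start + s.size;
    return true;
}

// Half-open ranges [start, end) overlap iff each starts before the other ends.
// A segment whose end overflows is treated as overlapping.
bool segmentsOverlap(const Segment& a, const Segment& b) {
    std::uint64_t aEnd = 0, bEnd = 0;
    if (!segmentEnd(a, aEnd) || !segmentEnd(b, bEnd)) {
        return true;
    }
    return a.start < bEnd && b.start < aEnd;
}
```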

* Fix parameter and return types for dynamically called functions

Calls to dynamically-linked functions go through the procedure linkage
table (PLT).  RetDec turns a PLT entry into a function, say
malloc@plt, that appears to do nothing but call the external function,
say malloc (though the assembly code will do a jump rather than a
call). User code that logically wants to call malloc instead calls
malloc@plt (and sets up arguments as if calling malloc). The
malloc@plt code first jumps to the dynamic linker which modifies it so
that subsequent calls to malloc@plt will jump directly to malloc. We
say that malloc@plt wraps malloc.  The call to malloc in malloc@plt
will not have any arguments set up, so malloc will appear to have
no parameters or returns (unless that information is provided by
link-time information, debug information, or name demangling), but it
needs to have the same parameter types and return type as
malloc@plt. The propagateWrapped methods copy the argument information
from the DataFlowEntry of the wrapping function to the wrapped
function. Then, when the calls to the wrapping function are inlined
(in connectWrappers), effectively the call to the wrapping function is
changed into a call to the wrapped function.

The motivation for this change is that programs that analyze the
output of RetDec (either the C code or the LLVM code) want to
recognize library functions and treat them specially. This
change makes it so that the library function names are used
directly (rather than the plt version) and they are passed
their parameters correctly.
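The propagation step can be pictured with a small model: each function has a recorded signature, and a wrapper such as `malloc@plt` donates its signature to the function it wraps when the latter has none. The `Signature` type, the maps, and `propagateWrappedSignatures` are hypothetical simplifications, not RetDec's `DataFlowEntry` or its `propagateWrapped` methods.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified function signature.
struct Signature {
    std::string returnType;
    std::vector<std::string> paramTypes;
    bool known = false;  // false if nothing was recovered for this function
};

// wrappers maps a wrapper name (e.g. "malloc@plt") to the function it wraps.
void propagateWrappedSignatures(std::map<std::string, Signature>& sigs,
                                const std::map<std::string, std::string>& wrappers) {
    for (const auto& [wrapper, wrapped] : wrappers) {
        auto from = sigs.find(wrapper);
        if (from == sigs.end() || !from->second.known) {
            continue;  // nothing to propagate
        }
        Signature& to = sigs[wrapped];  // std::map insertion keeps `from` valid
        if (!to.known) {
            to = from->second;  // the wrapped function inherits the wrapper's types
        }
    }
}
```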

* Upgrade to Capstone release 4.0.2

* Add additional patch on capstone 4.0.2 for PPC Signed 16 bit immediates

Capstone version 4.0.2 has a bug when disassembling a powerpc instruction
with a signed 16-bit immediate.
See capstone-engine/capstone#1746 and
capstone-engine/capstone#1746 (comment).

This change adds to the capstone patch to fix this problem.
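The core of handling such operands is sign extension: when an instruction treats its 16-bit immediate field as signed (for example `addi`), a raw field value of 0xFFF0 must be read as -16, not 65520. Below is a minimal sketch that is independent of the actual Capstone patch; `signExtend16` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Interpret the low 16 bits of a raw instruction field as a signed
// immediate and sign-extend it, using only well-defined arithmetic.
std::int64_t signExtend16(std::uint32_t field) {
    std::int64_t v = field & 0xFFFFu;
    return (v ^ 0x8000) - 0x8000;
}
```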

* Treat endbr32/endbr64 instructions as NOPs
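ENDBR64 and ENDBR32 are the CET indirect-branch-tracking markers. They do not modify registers or memory, so treating them as NOPs loses nothing for decompilation. Their encodings are F3 0F 1E FA (endbr64) and F3 0F 1E FB (endbr32). Below is a byte-level recognition sketch; `isEndbr` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if the bytes at `code` start with ENDBR64 or ENDBR32.
bool isEndbr(const std::uint8_t* code, std::size_t size) {
    if (size < 4) {
        return false;
    }
    return code[0] == 0xF3 && code[1] == 0x0F && code[2] == 0x1E &&
           (code[3] == 0xFA || code[3] == 0xFB);  // FA: endbr64, FB: endbr32
}
```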

* capstone2llvmir/powerpc: remove PPC_INS_BDZLA hack fix

As Capstone was updated, the fix in capstone-engine/capstone#968 took effect and the original RetDec fix is not needed - in fact, it caused problems.

* Handle Procedure Linkage calls for 32bit x86 from gcc

This case is for 32-bit x86 code compiled with GCC. Its PLT entries are in
the .plt.sec or .plt.got sections. An entry has the form:

jmp *offset(%ebx)

When this code is reached, register %ebx has already been loaded with the
address of the start of the Global Offset Table (.got) section.
This change handles that case.
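Resolving such an entry amounts to computing the GOT slot address as %ebx plus the displacement, then reading the 32-bit pointer stored in that slot. Below is a sketch over a hypothetical in-memory view of the section; `GotView` and `resolvePltTarget` are illustrative names, not RetDec's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical view of the .got section: its load address and raw bytes.
struct GotView {
    std::uint32_t address;
    std::vector<std::uint8_t> data;
};

// For "jmp *offset(%ebx)", return the 32-bit pointer stored at ebx + offset.
std::optional<std::uint32_t> resolvePltTarget(const GotView& got, std::uint32_t ebx,
                                              std::int32_t offset) {
    std::int64_t slot = static_cast<std::int64_t>(ebx) + offset;  // GOT slot address
    std::int64_t begin = got.address;
    std::int64_t end = begin + static_cast<std::int64_t>(got.data.size());
    if (slot < begin || slot + 4 > end) {
        return std::nullopt;  // slot lies outside the section
    }
    // Decode the little-endian pointer explicitly, independent of host endianness.
    std::size_t i = static_cast<std::size_t>(slot - begin);
    return static_cast<std::uint32_t>(got.data[i]) |
           (static_cast<std::uint32_t>(got.data[i + 1]) << 8) |
           (static_cast<std::uint32_t>(got.data[i + 2]) << 16) |
           (static_cast<std::uint32_t>(got.data[i + 3]) << 24);
}
```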

* Add ability to process PNG icons for perceptual hash calculation (#1090)

* Add ability to process PNG icons for perceptual hash calculation

* Use SCOPE_EXIT for deallocation

* In generated C, add prototypes for dynamically-linked functions without headers

When the program involves dynamically-linked functions like _Znwj
(operator new) that return a pointer, it is necessary to have
prototypes for them, since otherwise they will be implicitly deduced
to return "int", which cannot be dereferenced.

Previously RetDec was emitting comments telling which functions were
dynamically linked. This change moves them up before the functions are
emitted and instead emits prototypes for the functions. However,
RetDec also inserts includes of headers for functions with known
headers. We do not emit prototypes for functions with headers as that
would be redundant.  As a result, some dynamically-linked functions
that used to show in the comments no longer appear as the included
header will declare them.

The section header comment for dynamically-linked functions is only
produced if some prototypes are written for dynamically-linked
functions.

A related PR adds tests as well as the changes needed for
existing tests.
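The selection rule can be sketched as a filter: emit a prototype only for dynamically-linked functions that no included header declares, and emit the section comment only when at least one prototype remains. The `DynFunc` type, `dynamicPrototypes`, the prototype strings, and the comment text are all hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct DynFunc {
    std::string name;
    std::string prototype;  // hypothetical declaration text
};

// Returns the lines to emit before the function definitions.
std::vector<std::string> dynamicPrototypes(const std::vector<DynFunc>& funcs,
                                           const std::set<std::string>& declaredByHeaders) {
    std::vector<std::string> out;
    for (const auto& f : funcs) {
        if (declaredByHeaders.count(f.name) == 0) {
            out.push_back(f.prototype);  // no header declares it, so declare it here
        }
    }
    if (!out.empty()) {
        // Section comment only when at least one prototype is written.
        out.insert(out.begin(), "// ------ Dynamically Linked Functions Without Header ------");
    }
    return out;
}
```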

* Add printing of analysis time to retdec-fileinfo output

* Yara: inherits linker flags

* Use provided libtool via `CMAKE_LIBTOOL`

* Added missing `${RETDEC_INSTALL_BIN_DIR}` to `pat2yara`

* Added sanity check for page index when loading pages from broken samples

In certain samples, the page index might go beyond the available
pages when loading them; this patch prevents that.
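A bounds check of this kind compares the computed page index against the number of pages actually present before indexing. Below is a sketch with a hypothetical page table; the 4 KiB page size and the names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kPageSize = 0x1000;  // assumed page size

using Page = std::vector<std::uint8_t>;

// Returns the page covering `rva`, or nullptr if the index is out of range.
const Page* pageForRva(const std::vector<Page>& pages, std::uint32_t rva) {
    std::size_t index = rva / kPageSize;
    if (index >= pages.size()) {
        return nullptr;  // broken sample: index beyond the available pages
    }
    return &pages[index];
}
```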

* Virtual Size overflow is now handled properly

* Fixed error code

* Updated yaramod

* Fix removeZeroSequences

* README.md: add "limited maintenance mode" note

Co-authored-by: Peter Kubov <peter.kubov@avast.com>
Co-authored-by: houndthe <houndthe@protonmail.com>
Co-authored-by: Peter Matula <peter.matula@avast.com>
Co-authored-by: Ladislav Zezula <ladislav.zezula@avast.com>
Co-authored-by: Petr Zemek <petr.zemek@avast.com>
Co-authored-by: Marek Milkovič <marek.milkovic@avast.com>
Co-authored-by: Kirill A. Korinsky <kirill@korins.ky>
Co-authored-by: me <me>
Co-authored-by: Richard L Ford <richardlford@gmail.com>
Co-authored-by: 未赢 <26459963+neverwin@users.noreply.github.com>
PeterMatula added a commit that referenced this pull request Dec 5, 2022
2 participants