-
-
Notifications
You must be signed in to change notification settings - Fork 0
Add duplicate file analysis for Android #51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9c786cf
to
e5ecf87
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost all of this is Cursor generated but I did to back and check that stuff makes sense but I’m no python expert. Most of it looks a lot like iOS
e5ecf87
to
10d97d9
Compare
tests/unit/test_android_analyzer.py
Outdated
assert file_info.hash_md5 is not None | ||
assert len(file_info.hash_md5) > 0 | ||
|
||
def test_duplicate_detection_algorithm(self, test_apk_path: Path, android_analyzer: AndroidAnalyzer) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure about this test. we can remove it.
assert isinstance(duplicate_insight.total_savings, int) | ||
assert isinstance(duplicate_insight.duplicate_count, int) | ||
assert duplicate_insight.total_savings == 51709 | ||
assert duplicate_insight.duplicate_count == 52 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turns out there were some dupes in hn.apk. it is mostly duplicated META-INF files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be curious how users can remove those, if they can't remove them or mitigate we should try to add ignores on our end to not show. Any idea behind these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! Here’s how you can remove them: EmergeTools/hackernews#489
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Appears this would just filter out the META-INF/ files, not just duplicates? I don't think that's what we'd want to do here as I know some tools package files in there for tooling & runtime analysis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yeah you're right. the kotlin .module
files are important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the PR to just remove the duplicate .version
files and the duplicate LICENSE.txt
files. This way it keeps the .module
files which are needed. (and also the .module
files were not duplicates)
10d97d9
to
e42c373
Compare
src/launchpad/insights/common.py
Outdated
file_analysis: FileAnalysis | ||
treemap: TreemapResults | None | ||
binary_analysis: List[MachOBinaryAnalysis] | ||
binary_analysis: Sequence[BaseBinaryAnalysis] | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the change to use Sequence
? I understand the None
change, but not following the Sequence
one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason when it was List
it would throw a casting error during testing and I asked Cursor to fix it and this is what it gave me. I’m honestly not an expert on the subtleties of python here so open to suggestions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here’s the error:
.venv/bin/python -m mypy src
src/launchpad/analyzers/apple.py:127: error: Argument "binary_analysis" to "InsightsInput" has incompatible type "list[MachOBinaryAnalysis]"; expected "list[BaseBinaryAnalysis] | None" [arg-type]
src/launchpad/analyzers/apple.py:127: note: "list" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
src/launchpad/analyzers/apple.py:127: note: Consider using "Sequence" instead, which is covariant
Found 1 error in 1 file (checked 46 source files)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you try List
(from typing
) instead of list
? I think that might be the issue based on the casing here 'note: "list" is invariant'.
edit: oh I see that's what it was in the old diff 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I adjusted it to use List
from typing
and it works now after the switch to uv
and ruff
. going to merge
32a46b2
to
a31906a
Compare
a31906a
to
b91beba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shared code LGTM
src/launchpad/insights/insight.py
Outdated
file_analysis: FileAnalysis | ||
treemap: TreemapResults | None | ||
binary_analysis: list[MachOBinaryAnalysis] | ||
binary_analysis: Sequence[BaseBinaryAnalysis] | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this get rid of the None
? We can use an empty list in that case. Also can this change back to List
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i got rid of the None.
Regarding the List
, I’m not python expert but this was the error I got when I tried to use List
. #51 (comment)
abbce43
to
3e67611
Compare
3e67611
to
e7d0d9f
Compare
No description provided.