Add duplicate file analysis for Android #51

Merged
merged 1 commit into from
Jun 25, 2025

runningcode
Contributor

No description provided.

Contributor Author

runningcode commented Jun 19, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

@runningcode runningcode force-pushed the no/add_duplicate_file_analysis_for_android branch from 9c786cf to e5ecf87 Compare June 19, 2025 09:18
Contributor Author

@runningcode runningcode left a comment

Almost all of this is Cursor generated, but I did go back and check that stuff makes sense. I’m no Python expert, though. Most of it looks a lot like iOS.

@runningcode runningcode requested review from rbro112 and trevor-e June 19, 2025 09:20
@runningcode runningcode force-pushed the no/add_duplicate_file_analysis_for_android branch from e5ecf87 to 10d97d9 Compare June 19, 2025 12:09
assert file_info.hash_md5 is not None
assert len(file_info.hash_md5) > 0

def test_duplicate_detection_algorithm(self, test_apk_path: Path, android_analyzer: AndroidAnalyzer) -> None:
Contributor Author

Not sure about this test. We can remove it.

assert isinstance(duplicate_insight.total_savings, int)
assert isinstance(duplicate_insight.duplicate_count, int)
assert duplicate_insight.total_savings == 51709
assert duplicate_insight.duplicate_count == 52
Contributor Author

Turns out there were some duplicates in hn.apk. They are mostly duplicated META-INF files.
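
For readers following along, the kind of check these assertions exercise can be sketched roughly as below. This is only an illustration, not the analyzer's actual code: `FileEntry`, `entry`, and `find_duplicates` are hypothetical names, and the savings rule (one copy is kept, every additional identical copy counts as reclaimable bytes) is an assumption. Only the `hash_md5` field name comes from the test above.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for the analyzer's per-file record."""

    path: str
    size: int
    hash_md5: str


def entry(path: str, data: bytes) -> FileEntry:
    """Build a FileEntry from raw bytes (helper for this sketch only)."""
    return FileEntry(path=path, size=len(data), hash_md5=hashlib.md5(data).hexdigest())


def find_duplicates(files: list[FileEntry]) -> tuple[int, int]:
    """Return (total_savings, duplicate_count) under the assumed savings rule."""
    groups: dict[str, list[FileEntry]] = defaultdict(list)
    for f in files:
        # Files with the same content hash are treated as duplicates.
        groups[f.hash_md5].append(f)

    total_savings = 0
    duplicate_count = 0
    for group in groups.values():
        if len(group) < 2:
            continue
        # Keep the first copy; every extra copy is counted as wasted space.
        extras = group[1:]
        total_savings += sum(f.size for f in extras)
        duplicate_count += len(extras)
    return total_savings, duplicate_count


files = [
    entry("META-INF/a.version", b"1.0"),
    entry("META-INF/b.version", b"1.0"),
    entry("META-INF/c.version", b"1.0"),
    entry("classes.dex", b"dex bytes"),
]
print(find_duplicates(files))  # (6, 2)
```

Whether the real insight counts extra copies or whole duplicate groups is not stated in this thread, so the numbers asserted in the test should not be read off this sketch.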

Member

I'd be curious how users can remove those. If they can't remove or mitigate them, we should try to add ignores on our end so they aren't shown. Any idea what's behind these?

Contributor Author

Yup! Here’s how you can remove them: EmergeTools/hackernews#489

Member

It appears this would filter out all of the META-INF/ files, not just duplicates? I don't think that's what we'd want to do here, as I know some tools package files in there for tooling & runtime analysis.

Contributor Author

@runningcode runningcode Jun 20, 2025

Ah yeah, you're right. The Kotlin .module files are important.

Contributor Author

@runningcode runningcode Jun 24, 2025

I updated the PR to remove only the duplicate .version files and the duplicate LICENSE.txt files. This way it keeps the .module files, which are needed (and the .module files were not duplicates anyway).

@runningcode runningcode force-pushed the no/add_duplicate_file_analysis_for_android branch from 10d97d9 to e42c373 Compare June 19, 2025 15:01
  file_analysis: FileAnalysis
  treemap: TreemapResults | None
- binary_analysis: List[MachOBinaryAnalysis]
+ binary_analysis: Sequence[BaseBinaryAnalysis] | None = None
Member

Why the change to use Sequence? I understand the None change, but not following the Sequence one?

Contributor Author

For some reason, when it was List, it would throw a casting error during testing, so I asked Cursor to fix it, and this is what it gave me. I’m honestly not an expert on the subtleties of Python here, so I'm open to suggestions!

Contributor Author

Here’s the error:

.venv/bin/python -m mypy src
src/launchpad/analyzers/apple.py:127: error: Argument "binary_analysis" to "InsightsInput" has incompatible type "list[MachOBinaryAnalysis]"; expected "list[BaseBinaryAnalysis] | None"  [arg-type]
src/launchpad/analyzers/apple.py:127: note: "list" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
src/launchpad/analyzers/apple.py:127: note: Consider using "Sequence" instead, which is covariant
Found 1 error in 1 file (checked 46 source files)
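
The mypy note above comes down to variance, which a small sketch may make clearer. The classes below are stand-ins that only borrow their names from the error message (the real `BaseBinaryAnalysis` and `MachOBinaryAnalysis` are not shown in this thread), and the two helper functions are hypothetical. `list` is invariant because it is mutable, while the read-only `Sequence` is covariant, so a list of the subclass is accepted where `Sequence[BaseBinaryAnalysis]` is expected. For type checking, `typing.List` behaves like `list`, so it is invariant as well.

```python
from typing import List, Sequence


class BaseBinaryAnalysis:
    """Stand-in base class; the real one is not shown in this thread."""

    def __init__(self, name: str) -> None:
        self.name = name


class MachOBinaryAnalysis(BaseBinaryAnalysis):
    """Stand-in subclass."""


def names_from_list(items: List[BaseBinaryAnalysis]) -> List[str]:
    return [item.name for item in items]


def names_from_sequence(items: Sequence[BaseBinaryAnalysis]) -> List[str]:
    return [item.name for item in items]


macho: List[MachOBinaryAnalysis] = [MachOBinaryAnalysis("app")]

# mypy rejects the following call with [arg-type], because List is invariant:
#     names_from_list(macho)
# The call would still run; the complaint is purely a static-typing one.

# Accepted by mypy: Sequence is read-only and therefore covariant.
print(names_from_sequence(macho))

# Also accepted: declare the list with the base element type up front.
as_base: List[BaseBinaryAnalysis] = [MachOBinaryAnalysis("app")]
print(names_from_list(as_base))
```

So there appear to be two ways out: widen the parameter to `Sequence`, or keep `List` and annotate the value at the call site with the base element type.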

Member

@trevor-e trevor-e Jun 24, 2025

Can you try List (from typing) instead of list? I think that might be the issue based on the casing here 'note: "list" is invariant'.

edit: oh I see that's what it was in the old diff 🤔

Contributor Author

I adjusted it to use List from typing, and it works now after the switch to uv and ruff. Going to merge.

@runningcode runningcode force-pushed the no/add_duplicate_file_analysis_for_android branch 7 times, most recently from 32a46b2 to a31906a Compare June 23, 2025 09:11
@runningcode runningcode force-pushed the no/add_duplicate_file_analysis_for_android branch from a31906a to b91beba Compare June 23, 2025 09:16
Member

@trevor-e trevor-e left a comment

Shared code LGTM

  file_analysis: FileAnalysis
  treemap: TreemapResults | None
- binary_analysis: list[MachOBinaryAnalysis]
+ binary_analysis: Sequence[BaseBinaryAnalysis] | None = None
Member

Can this get rid of the None? We can use an empty list in that case. Also, can this change back to List?

Contributor Author

I got rid of the None.
Regarding the List, I’m no Python expert, but this was the error I got when I tried to use List: #51 (comment)
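
A minimal sketch of the "empty list instead of None" suggestion, assuming a plain dataclass. The real `InsightsInput` may be a different kind of model with more fields; the trimmed-down class, the stand-in `BaseBinaryAnalysis`, and `count_binaries` here are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


class BaseBinaryAnalysis:
    """Stand-in for the real base class, which is not shown in this thread."""


@dataclass
class InsightsInput:
    """Hypothetical, trimmed-down version with only the field under discussion."""

    # default_factory gives every instance its own fresh list; dataclasses
    # reject a bare `= []` default precisely to avoid a shared mutable list.
    binary_analysis: List[BaseBinaryAnalysis] = field(default_factory=list)


def count_binaries(data: InsightsInput) -> int:
    # No `is None` branch needed: an empty list simply yields zero.
    return len(data.binary_analysis)


first = InsightsInput()
second = InsightsInput()
first.binary_analysis.append(BaseBinaryAnalysis())
print(count_binaries(first), count_binaries(second))  # 1 0
```

The upside of this shape is that callers never have to branch on `None`; the trade-off is that "no binaries analyzed" and "analysis not run" become indistinguishable.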

@runningcode runningcode force-pushed the no/add_duplicate_file_analysis_for_android branch 3 times, most recently from abbce43 to 3e67611 Compare June 25, 2025 06:19
@runningcode runningcode force-pushed the no/add_duplicate_file_analysis_for_android branch from 3e67611 to e7d0d9f Compare June 25, 2025 06:21
@runningcode runningcode merged commit f297371 into main Jun 25, 2025
13 checks passed
@runningcode runningcode deleted the no/add_duplicate_file_analysis_for_android branch June 25, 2025 06:34
3 participants