---
title: "GSoC: Week8"
date: 2023-07-30T12:54:09+05:30
Description: "In this blog, I would like to share my progress of Google Summer of Code 2023, for week 8"
thumbnail: "images/post11/week8.png"
Tags: ["OpenSUSE", "Open Source", "Google", "Summer of Code"]
Categories: ["Open source", "Programming"]
Series:
- "gsoc-weekly-report"
---

Hello, and welcome back. In this post, I share my week 8 progress on Google Summer of Code with openSUSE, working on the RPMLint project.

As mentioned in the previous post, I started working on `DuplicatesCheck.py`. Here is a detailed overview of my progress.

## Progress [########........]

This week I decided to focus on `DuplicatesCheck.py`. However, I soon realized that its tests need some capabilities that the `FakePkg` class does not yet have.

Here is the current interface we use to create a `mockPkg` with `FakePkg`:

```python3
def get_tested_mock_package(files=None, real_files=None, header=None):
    mockPkg = FakePkg('mockPkg')
    if files is not None:
        mockPkg.create_files(files, real_files)
    if header is not None:
        mockPkg.add_header(header)
    return mockPkg
```

The `header` argument alone doesn't provide all the information that some tests need. Certain tests require data that cannot be passed directly through the `header` parameter.

For the current discussion of `DuplicatesCheck`, the test function I am considering is `test_unexpanded_macros` in the file `test_files.py`. This test needs the `md5` hash values, which are the hashes of the files inside a binary RPM file.

To learn more about **MD5**, see its [Wikipedia article](https://en.wikipedia.org/wiki/MD5).

For example, consider this test function:

```python3
@pytest.mark.parametrize('package', ['binary/duplicates'])
def test_duplicates(tmp_path, package, duplicatescheck):
    output, test = duplicatescheck
    test.check(get_tested_package(package, tmp_path))
    out = output.print_results(output.results)

    assert 'E: hardlink-across-partition /var/foo /etc/foo' in out
    assert 'E: hardlink-across-config-files /var/foo2 /etc/foo2' in out
    assert 'W: files-duplicate /etc/bar3 /etc/bar:/etc/bar2' in out
    assert 'W: files-duplicate /etc/strace2.txt /etc/strace1.txt' in out
    assert 'W: files-duplicate /etc/small2 /etc/small' not in out
    assert 'E: files-duplicated-waste 270544' in out
```

Here, the binary file we are checking is `test/binary/duplicates-0-0.x86_64.rpm`. Every file in it is hashed with md5, and files that share a hash are treated as duplicates. The following sample output of an `rpm` query lists all the files in the binary RPM file:

```bash
$ rpm -qlp test/binary/duplicates-0-0.x86_64.rpm

/etc/bar
/etc/bar2
/etc/bar3
/etc/foo
/etc/foo2
/etc/small
/etc/small2
/etc/strace1.txt
/etc/strace2.txt
/var/foo
/var/foo2
```

After these files were hashed using md5 during the pytest run, I obtained their hash values. Because the package contains duplicate files, some hash values are shared. They are stored in a key-value data structure (a dictionary) that maps each hash to the set of files with that hash. See the sample output below:

```python3
==> (Pdb) p md5s

{
    'b3ab937fbdc55ae7bf96749074e816056f0605491d419f9f5b97dc00c8c04aae':
    {
        <rpmlint.pkgfile.PkgFile object at 0x7f69c1189430>,
        <rpmlint.pkgfile.PkgFile object at 0x7f69c1189640>,
        <rpmlint.pkgfile.PkgFile object at 0x7f69c11894e0>
    },
    'bc1a4e47244cdf6b4c4735453cf55503a995334f1735458ab2e3c01455e159e3':
    {
        <rpmlint.pkgfile.PkgFile object at 0x7f69c118a090>,
        <rpmlint.pkgfile.PkgFile object at 0x7f69c11897a0>
    },
    'e18b816e748b2366af6cb7281bcf8fca7f65be8a41e456e562f6acd8b267fc32':
    {
        <rpmlint.pkgfile.PkgFile object at 0x7f69c1189900>,
        <rpmlint.pkgfile.PkgFile object at 0x7f69c118a1f0>
    },
    '1618780f802ed0571225ec155527f82a0eaa540d16983c387baede6208ced745':
    {
        <rpmlint.pkgfile.PkgFile object at 0x7f69c1189f30>,
        <rpmlint.pkgfile.PkgFile object at 0x7f69c1189dd0>
    }
}
```

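The grouping seen in the Pdb output can be reproduced in isolation. The following is a minimal sketch, not rpmlint's actual code: `file_digest`, `group_by_digest`, and `duplicates_only` are hypothetical helpers, and SHA-256 is an assumption, chosen only because the 64-character hex values above match the length of a SHA-256 digest (an MD5 digest would be 32 hex characters).

```python
import hashlib
from collections import defaultdict


def file_digest(path, algorithm="sha256"):
    """Return the hex digest of a file's contents (the algorithm is an assumption)."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def group_by_digest(paths):
    """Map each digest to the set of paths whose contents produce it."""
    groups = defaultdict(set)
    for path in paths:
        groups[file_digest(path)].add(str(path))
    return dict(groups)


def duplicates_only(groups):
    """Keep only the digests that are shared by more than one file."""
    return {digest: files for digest, files in groups.items() if len(files) > 1}
```

Calling `duplicates_only(group_by_digest(paths))` on a list of file paths gives a dictionary with the same shape as `md5s` above, except that it holds path strings instead of `PkgFile` objects.
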
As shown, 9 files are hashed, and some of them share a hash value, while the `rpm` query lists 11 files in total. The difference arises because rpmlint ignores very small files: the minimum file size is defined in the configuration file, and every file smaller than that limit is skipped.

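The size cut-off described above can be pictured as a simple filter applied before hashing. This is only an illustration: `MIN_SIZE` and `files_worth_hashing` are hypothetical names, and both the threshold and the file sizes are made up; the real limit comes from rpmlint's configuration.

```python
MIN_SIZE = 100  # hypothetical threshold in bytes, not rpmlint's real value


def files_worth_hashing(sizes, min_size=MIN_SIZE):
    """Return the paths whose size reaches min_size; smaller files are skipped."""
    return [path for path, size in sizes.items() if size >= min_size]


# Made-up sizes for a few paths from the package listing.
sizes = {
    "/etc/bar": 4096,
    "/etc/bar2": 4096,
    "/etc/small": 3,
    "/etc/small2": 3,
}
kept = files_worth_hashing(sizes)
```

With these made-up sizes, `/etc/small` and `/etc/small2` are dropped before hashing, which would be consistent with the test asserting that no `files-duplicate` warning is reported for them.
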
And yes, I found these hash values using the Python debugger (Pdb). They are stored at runtime in the `md5s` variable in the `DuplicatesCheck.py` file; see it [Here].

I believe this would work, provided that we implement the `md5` variable within the `PkgFile` class and pass header information while creating a mock package using `FakePkg`. I am not sure whether I should hard-code these hash values into the header object that is passed to the test function. I also tried creating real files with the `real_files=True` argument, but that didn't work.

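One way to picture the hard-coding idea is a stand-in file object that carries a precomputed digest, so that a duplicate check can run without real files. The sketch below does not use rpmlint's classes: `StubFile`, `make_stub_files`, and `find_duplicates` are hypothetical, and the digest strings are placeholders rather than real hashes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StubFile:
    """Hypothetical stand-in for a package file with a hard-coded digest."""
    name: str
    md5: str


def make_stub_files(digests):
    """Build stub files from a {path: digest} mapping supplied by the test."""
    return [StubFile(name, digest) for name, digest in digests.items()]


def find_duplicates(files):
    """Group file names by digest and keep groups with more than one member."""
    groups = defaultdict(set)
    for f in files:
        groups[f.md5].add(f.name)
    return {digest: names for digest, names in groups.items() if len(names) > 1}


files = make_stub_files({
    "/etc/bar": "digest-a",   # placeholder digests, not real hashes
    "/etc/bar2": "digest-a",
    "/etc/bar3": "digest-a",
    "/etc/foo": "digest-b",
})
duplicates = find_duplicates(files)
```

Hard-coded digests like these would keep a test fast and free of binary fixtures, but they would no longer verify that the hashing step itself works, so that trade-off is worth weighing.
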
## Misc.

In addition to `DuplicatesCheck.py`, I have also made some progress on `FilesCheck.py`, which likewise needs more capabilities from `FakePkg`. However, I haven't yet explored all the possible ways to create files and test them. I plan to do that in the coming week.

As mentioned in my last [post], I will be visiting the SUSE office in Bangalore. I am planning to visit around the 3rd week of August and will share the exact date, along with other details, on my LinkedIn page. Do follow me on <i class="fa-brands fa-linkedin"></i> [LinkedIn].

---

Links:
- [post]
- [md5 ref]
- [LinkedIn]
- [MD5](https://en.wikipedia.org/wiki/MD5)

[post]: /post/week7-at-gsoc/
[Here]: https://github.com/afrid18/rpmlint/blob/2494367319ad2603023aaa4ffd6a6c6330dca28d/rpmlint/checks/DuplicatesCheck.py#L31
[md5 ref]: https://github.com/afrid18/rpmlint/blob/2494367319ad2603023aaa4ffd6a6c6330dca28d/rpmlint/checks/DuplicatesCheck.py#L31
[LinkedIn]: https://www.linkedin.com/in/afridhussain/

<h1 style="text-align: center"> Thank You </h1>

___