Leo/bsi b soup #94

Closed
wants to merge 14 commits into from
Conversation

adamjanovsky
Collaborator

No description provided.

KeleranV added 2 commits June 3, 2021 09:59
…ems to work properly, now working on a good demo script
@adamjanovsky
Collaborator Author

adamjanovsky commented Jun 9, 2021

As of 9 June, the following is done:

  • Traversal of BSI webpage, only active certificates

As of 9 June, the following remains to be done:

  • Full traversal of archived certificates
  • Extraction of all information about a single certificate
  • Feed the BSI outputs into CommonCriteriaCert class
  • Output the mismatch between BSI source and CommonCriteria.org source
  • Select 3 certificates and verify (using hash function) that their reports are the same
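For the last item, a minimal sketch of the hash comparison could look like the following. It assumes both reports have already been downloaded; the function names are hypothetical, and SHA-256 is just one reasonable choice of hash function.

```python
import hashlib
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of raw report bytes."""
    return hashlib.sha256(data).hexdigest()


def reports_match(bsi_report: bytes, cc_report: bytes) -> bool:
    """True when both reports hash to the same digest (hypothetical helper)."""
    return sha256_hex(bsi_report) == sha256_hex(cc_report)


def report_files_match(bsi_path: Path, cc_path: Path) -> bool:
    """Same check for reports stored on disk (paths are supplied by the caller)."""
    return reports_match(bsi_path.read_bytes(), cc_path.read_bytes())
```

Comparing digests rather than raw bytes makes it easy to log or store the values for the three selected certificates.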

Feedback from 1 July

@KeleranV see the subtasks here:

  • Fix retrieval of duplicate archived certificates
  • Change relative import to absolute import in BSI_parse.py
  • root_url could be made ClassVar of BSIBrowser and a default parameter as well
  • Retrieve categories of handlers and store the certificates in dictionaries instead of lists, in the form {'category_name': list_of_certificate_objects}
  • Split the collected certificates into two dictionaries: one for active, the other for archived certificates
  • Work on code serialization, define serialized_attributes and fix constructors to comply with ComplexSerializableType convention.
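A rough sketch of two of the subtasks above (root_url as a ClassVar with a default parameter, and the split into per-category dictionaries). The Certificate dataclass and split_by_status are hypothetical stand-ins, not the classes from this PR, and the serialization convention is left out on purpose.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import ClassVar, Dict, Iterable, List, Tuple


@dataclass
class Certificate:
    """Hypothetical stand-in for the certificate objects in this PR."""
    cert_id: str
    category: str
    archived: bool


class BSIBrowser:
    # Class-level default shared by all instances (host taken from the diff).
    ROOT_URL: ClassVar[str] = "https://www.bsi.bund.de/"

    def __init__(self, root_url: str = ROOT_URL) -> None:
        self.root_url = root_url


CertsByCategory = Dict[str, List[Certificate]]


def split_by_status(certs: Iterable[Certificate]) -> Tuple[CertsByCategory, CertsByCategory]:
    """Return (active, archived), each shaped as {'category_name': [certificates]}."""
    active: CertsByCategory = defaultdict(list)
    archived: CertsByCategory = defaultdict(list)
    for cert in certs:
        target = archived if cert.archived else active
        target[cert.category].append(cert)
    # Convert to plain dicts so missing keys raise KeyError for callers.
    return dict(active), dict(archived)
```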

@adamjanovsky adamjanovsky marked this pull request as draft June 9, 2021 14:23
Collaborator Author

@adamjanovsky adamjanovsky left a comment

Some intermediate review. Once done with your work, be sure to get rid of non-descriptive comments and make the code (at least somewhat) PEP-8 compliant.


def process(root_url: str):
browser = BsiBrowser(root_url)
def process(base_url: str):
Collaborator Author

It feels like this whole function should be part of the BsiBrowser.parse() method.

def process(root_url: str):
browser = BsiBrowser(root_url)
def process(base_url: str):
browser = BsiBrowser(base_url)
browser.parse()
tmp_list = []
for handler in browser.handler_list:
Collaborator Author

This whole for loop can be written as a list comprehension and you can call tmp_list.extend([the_comprehension])
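To illustrate, here is a small sketch of the loop-to-comprehension rewrite. FakeCertificate and the example.org URLs are placeholders, since the loop body is not shown in this hunk.

```python
from typing import List


class FakeCertificate:
    """Placeholder for whatever object the loop builds per URL."""

    def __init__(self, url: str) -> None:
        self.url = url


link_list: List[str] = ["https://example.org/a", "https://example.org/b"]

# Loop version: one append per iteration.
tmp_loop: List[FakeCertificate] = []
for url in link_list:
    tmp_loop.append(FakeCertificate(url))

# Comprehension version: build the items once and extend the existing list.
tmp_comp: List[FakeCertificate] = []
tmp_comp.extend([FakeCertificate(url) for url in link_list])
```

Since extend() accepts any iterable, a generator expression without the square brackets works as well.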



# To end the processing, a last class will be used with final links
# to retrieve simple data written on the page


class Bsitmp(ComplexSerializableType):
Collaborator Author

The class name is not really descriptive. It should be named something like BSICertificate.

Collaborator Author

Eventually, you probably won't be calling BSICertificate(link) but some alternative constructor BSICertificate.from_url(url)
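A possible shape for such an alternative constructor, as a sketch only. The fields and the way the ID is derived from the URL are assumptions for illustration; the real from_url would download and parse the page.

```python
from dataclasses import dataclass


@dataclass
class BSICertificate:
    cert_id: str
    url: str

    @classmethod
    def from_url(cls, url: str) -> "BSICertificate":
        """Alternative constructor: build a certificate from its page URL."""
        # Assumption: take the last path segment without its extension as the ID.
        # A real implementation would fetch the page and extract the fields.
        page_name = url.rstrip("/").rsplit("/", 1)[-1]
        cert_id = page_name.split(".", 1)[0]
        return cls(cert_id=cert_id, url=url)
```

This keeps __init__ a plain field assignment, which tends to make serialization and testing simpler.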

Comment on lines 129 to 132
for arch_handler in handler.handler_list:
    for arch_url in arch_handler.link_list:
        arch_temp = Bsitmp(arch_url)
        tmp_list.append(arch_temp)
Collaborator Author

Same here, use comprehension and .extend() notation
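For the nested loops, the comprehension needs two for clauses in the same order as the original loops. A sketch with hypothetical dataclasses that only mirror the handler_list and link_list attributes:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArchiveHandler:
    """Stand-in exposing only the link_list attribute."""
    link_list: List[str] = field(default_factory=list)


@dataclass
class CategoryHandler:
    """Stand-in exposing only the nested handler_list attribute."""
    handler_list: List[ArchiveHandler] = field(default_factory=list)


def collect_archived_urls(handler: CategoryHandler) -> List[str]:
    """Flatten every archive link under one category handler."""
    return [
        arch_url
        for arch_handler in handler.handler_list  # outer loop first
        for arch_url in arch_handler.link_list    # then the inner loop
    ]
```

The result can then be passed straight to tmp_list.extend(...), wrapping each URL in the certificate class as needed.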

def process(root_url: str):
browser = BsiBrowser(root_url)
def process(base_url: str):
browser = BsiBrowser(base_url)
browser.parse()
tmp_list = []
Collaborator Author

This could then be an instance variable.

[BsiHandler("https://www.bsi.bund.de/" + a['href'])
for a in self.soup.find_all('a', href=True, recursive=True, title=re.compile('Archive'))]

if len(self.handler_list) != 0:
Collaborator Author

if self.handler_list:
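The suggestion relies on empty sequences being falsy in Python, which is also what PEP 8 recommends over an explicit length check. A tiny sketch with a hypothetical helper:

```python
from typing import List


def has_handlers(handler_list: List[str]) -> bool:
    """Truthiness check: an empty list is falsy, a non-empty one is truthy."""
    if handler_list:
        return True
    return False


def has_handlers_verbose(handler_list: List[str]) -> bool:
    """Equivalent explicit form used in the diff."""
    return len(handler_list) != 0
```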

Comment on lines 24 to 27
test_url = "https://www.bsi.bund.de/SharedDocs/Zertifikate_CC/CC/Digitale_Signatur_Kartenlesegeraete/1046.html" \
";jsessionid=AF4A9C0B7D27992808B8EA8C426454BE.internet461?nn=513452 "

root_url = "https://www.bsi.bund.de/EN/Topics/Certification/certified_products/digital_signature" \
Collaborator Author

Declare these with the typing.Final type and uppercase names with underscores to denote constants. Furthermore, these can be moved to the BSIParser class afaik.
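A sketch of what that could look like, assuming Python 3.8 or newer (where typing.Final is available). The constant name and the class body are illustrative; only the host string comes from the diff.

```python
from typing import Final

# Module-level constant: uppercase name, marked Final for type checkers.
BSI_BASE_URL: Final[str] = "https://www.bsi.bund.de/"


class BSIParser:
    # The same constant scoped to the class; Final here implies a class-level value.
    BASE_URL: Final[str] = BSI_BASE_URL

    def absolute(self, href: str) -> str:
        """Prefix a relative link with the base URL."""
        return self.BASE_URL + href.lstrip("/")
```

Note that Final is checked by static type checkers such as mypy, not enforced at runtime.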

valid_until: str
soup: BeautifulSoup
id: str
pdf_links: list[str]
Collaborator Author

@adamjanovsky adamjanovsky Jun 22, 2021

This won't work on Python versions older than 3.9, where the built-in list cannot be subscripted. You need to use List[str] from the typing module.
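A minimal sketch of the annotations with typing.List, which also runs on interpreters older than Python 3.9. The constructor and the sample values in the usage are placeholders, and the soup attribute is omitted to keep the sketch free of third-party imports.

```python
from typing import List


class BSICertificate:
    # Class-level annotations as in the diff, with List[str] from typing.
    valid_until: str
    id: str
    pdf_links: List[str]

    def __init__(self, valid_until: str, cert_id: str, pdf_links: List[str]) -> None:
        self.valid_until = valid_until
        self.id = cert_id
        self.pdf_links = pdf_links
```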

self.handler_list = \
[BsiHandler("https://www.bsi.bund.de/" + a['href'])
for a in self.soup.find_all('a', href=True, recursive=True)
if 'c-navigation-teaser' in str(a.get('class'))]
Collaborator Author

@adamjanovsky adamjanovsky Jun 23, 2021

You can search directly by link class with find_all(class_='c-navigation-teaser')

KeleranV added 5 commits June 28, 2021 13:01
…e use the same logic as bsi_parse.py, but adapt it to the layout

- attempt to use a dictionary in bsi_parse.py, last fix in the constructors to match the requirement of the serialization.

- issues encountered:
   - the website seems to block some requests; after adding a delay between requests, it seems to work better
   - the last field used as a key in the dictionary wasn't retrieved, so a new one is being tested
…g this kind of key : 'BSI - smart cards and similar devices - BSI-DSZ-CC-0954-2015'

todo: do the same thing on ANSSI
…ictionary, with the key being the reference of the product
@J08nY
Member

J08nY commented Jul 13, 2022

This is now done in #247.

@J08nY J08nY closed this Jul 13, 2022
@adamjanovsky adamjanovsky deleted the leo/BSI_BSoup branch July 27, 2022 10:28