Support PCI IDs #3420
return device_type_list

@retry(KeyError, tries=10, delay=20)
Why would a retry fix key errors?
Sometimes the outputs of "lspci -n" and "lspci -m", both used in this routine, differ.
This usually occurs when checking for VFs right after the VM boots with a VF count of 8.
All VFs become available only a couple of minutes after the VM boots.
The retry handles such cases instead of simply failing.
Can you check which one of "-n" or "-m" is incorrect, and why a rerun fixes it? 10*20 means 200 seconds; if this method is called many times, it slows down the test cases significantly.
Both -n and -m are correct. It's not an issue with the lspci commands. The exception is caused by PCI devices being newly added between the two command executions. Without this wait, test cases would deal with fewer VF devices than are actually attached.
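A minimal sketch of the race described above, assuming the routine joins two slot-keyed listings taken at different moments. `merge_by_slot`, `retry_on_key_error`, and the sample slot and ID values are hypothetical and are not taken from the PR; the helper is a stdlib stand-in, not the `retry` decorator used in the diff.

```python
import time
from typing import Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")


def merge_by_slot(
    numeric: Dict[str, str], named: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Join two slot-keyed snapshots; raises KeyError if `numeric` lacks a slot."""
    return [(slot, numeric[slot], name) for slot, name in sorted(named.items())]


def retry_on_key_error(
    func: Callable[[], T],
    tries: int,
    delay: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func` up to `tries` times, sleeping `delay` seconds between attempts."""
    for attempt in range(1, tries + 1):
        try:
            return func()
        except KeyError:
            if attempt == tries:
                raise  # out of attempts: surface the original error
            sleep(delay)
    raise ValueError("tries must be at least 1")


# Simulated snapshots: on the first attempt the second listing already sees
# a VF that the first listing missed; on the second attempt both agree.
snapshots = iter(
    [
        ({"00:02.0": "15b3:1016"}, {"00:02.0": "VF", "00:03.0": "VF"}),
        (
            {"00:02.0": "15b3:1016", "00:03.0": "15b3:1016"},
            {"00:02.0": "VF", "00:03.0": "VF"},
        ),
    ]
)
waits: List[float] = []  # records requested delays instead of really sleeping
result = retry_on_key_error(
    lambda: merge_by_slot(*next(snapshots)), tries=10, delay=20, sleep=waits.append
)
```

With this helper, the worst case for a call that never succeeds is `(tries - 1) * delay` seconds of sleeping, because no sleep follows the final attempt. The `retry` library used in the PR may behave differently.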
if use_pci_ids:
    for device in devices_list:
        if (
            device.controller_id in CONTROLLER_ID_DICT[device_type.upper()]
Define a class instead of three dicts that share the same keys. The class can define a method that accepts a device and checks whether it exists in one of the dicts.
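One way the suggestion could look, as a rough sketch only. `PciDevice`, `DeviceTypeIds`, `DEVICE_TYPE_IDS`, and the field names other than `controller_id` are hypothetical, and the ID values are illustrative placeholders rather than the PR's actual tables.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class PciDevice:
    """Stand-in for the device object in the diff (only the IDs are modeled)."""

    vendor_id: str
    device_id: str
    controller_id: str


@dataclass(frozen=True)
class DeviceTypeIds:
    """All allowed IDs for one device type, replacing three parallel dicts."""

    vendor_ids: FrozenSet[str]
    device_ids: FrozenSet[str]
    controller_ids: FrozenSet[str]

    def matches(self, device: PciDevice) -> bool:
        # A device matches only if all three of its IDs are allowed.
        return (
            device.vendor_id in self.vendor_ids
            and device.device_id in self.device_ids
            and device.controller_id in self.controller_ids
        )


# Illustrative table keyed by device type; the values are placeholders.
DEVICE_TYPE_IDS: Dict[str, DeviceTypeIds] = {
    "SRIOV": DeviceTypeIds(
        vendor_ids=frozenset({"15b3"}),
        device_ids=frozenset({"1016"}),
        controller_ids=frozenset({"0200"}),
    ),
}

vf = PciDevice(vendor_id="15b3", device_id="1016", controller_id="0200")
is_sriov = DEVICE_TYPE_IDS["SRIOV"].matches(vf)
```

Keeping the three ID sets in one object means a device type cannot be present in one table and missing from another, which is the failure mode that parallel dicts invite.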
@@ -142,47 +165,107 @@ def _install(self) -> bool:
         return self._check_exists()

     def get_device_names_by_type(
-        self, device_type: str, force_run: bool = False
+        self, device_type: str, force_run: bool = False, use_pci_ids: bool = False
It looks like the new argument is there to filter devices by name? If so, you can add a new method called get_devices_by_allowed_list that calls get_device_names_by_type and filters the result.
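A sketch of the proposed split, assuming the filter is an allow-list of controller IDs. `LspciSketch` and `Device` are hypothetical stand-ins for the real tool class, the parameter name `allowed_controller_ids` is an assumption, and the sample names and IDs are illustrative.

```python
from typing import Iterable, List, NamedTuple


class Device(NamedTuple):
    name: str
    device_type: str
    controller_id: str


class LspciSketch:
    """Hypothetical stand-in for the tool class touched by the diff."""

    def __init__(self, devices: Iterable[Device]) -> None:
        self._devices = list(devices)

    def get_device_names_by_type(
        self, device_type: str, force_run: bool = False
    ) -> List[str]:
        # Existing signature stays unchanged; force_run is unused in this sketch.
        return [d.name for d in self._devices if d.device_type == device_type]

    def get_devices_by_allowed_list(
        self,
        device_type: str,
        allowed_controller_ids: Iterable[str],
        force_run: bool = False,
    ) -> List[str]:
        # Reuse the existing lookup, then narrow it with the allow-list.
        names = set(self.get_device_names_by_type(device_type, force_run))
        allowed = set(allowed_controller_ids)
        return [
            d.name
            for d in self._devices
            if d.name in names and d.controller_id in allowed
        ]


lspci = LspciSketch(
    [
        Device("disk-local", "NVME", "0108"),
        Device("disk-remote", "NVME", "0100"),
        Device("nic0", "SRIOV", "0200"),
    ]
)
local_only = lspci.get_devices_by_allowed_list("NVME", ["0108"])
```

Existing callers of `get_device_names_by_type` are unaffected, and only callers that need the narrower result opt into the new method.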
if matched_pci_device_info_list:
    matched_pci_device_info = matched_pci_device_info_list[0]
    self.slot = matched_pci_device_info.get("slot", "").strip()
def parse(self, raw_str: str, pci_ids: Dict[str, Any]) -> None:
This method doesn't look like it needs to change. The filtering already happens in the caller, so there is no need to filter here as well.
@@ -274,6 +275,8 @@ def stress_sriov_with_max_nics_reboot_from_platform(
 for node in environment.nodes.list():
     start_stop = node.features[StartStop]
     start_stop.restart()
+    # Add delay to wait for the network interface ready.
+    sleep(120)
Don't add a sleep, especially such a long one. Find a way to wait for a signal instead.
Make sure you have tested it. like
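A minimal sketch of waiting on a condition instead of a fixed `sleep(120)`, assuming the test can query how many VFs are currently visible. `wait_for_vf_count` and the `get_count` callable are hypothetical; how the count is actually obtained is not shown here.

```python
import time
from typing import Callable


def wait_for_vf_count(
    get_count: Callable[[], int],
    expected: int,
    timeout: float,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `get_count` until it reaches `expected` or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        if get_count() >= expected:
            return True  # all expected VFs are visible
        if clock() >= deadline:
            return False  # gave up; the caller decides how to fail
        sleep(interval)
```

A test could then call something like `wait_for_vf_count(count_vfs, expected=8, timeout=120)` and fail with a clear message when it returns `False`. The 120 seconds becomes an upper bound rather than a cost paid on every run.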
Support PCI IDs
This helps identify devices more precisely.
Example: NVMe local devices vs. remote storage devices when the disk controller type == NVMe.
This PR also fixes an issue detected in the test cases below:
stress_sriov_with_max_nics_reboot_from_platform
stress_sriov_with_max_nics_stop_start_from_platform
Issue:
It takes a while to populate/detect all VFs after VM boot.
These tests were passing even if not all VF PCI devices were detected. Added a 120s wait in the test cases above so that all VFs are populated correctly.