OCP tftp install fails with 100 GbitE Mellanox adapter #226
Comments
@theresax: Is your bootstrap node in this state? Can you provide access details to the bastion?
@manojnkumar, my team has to move quickly to complete our perf plan, so the machine is no longer in this state. I used virtual storage via the VIOS server during node activation to do the install, and the cluster is now set up. This proved that our dhcp, dns, firewall and httpd + haproxy configurations are all good (not the cause of the problem).
I have provided a "test" LPAR under the same cluster and recreated the tftp problem: HMC login:
I have opened a bugzilla defect per request from Brian King:
@theresax this is a public GitHub repository, and it might not be appropriate to mention internal system details, including passwords.
@bpradipt, thank you so much for pointing this out. Even though the issue is marked internal, I should not have included passwords. I have updated the ticket.
Defect 194377 has been rejected, and a new defect, 194410 (https://bugzilla.linux.ibm.com/show_bug.cgi?id=194410), has been opened to replace it. It is assigned to the same people as the previous defect.
This is not a new issue in OCP 4.9; we have seen it since OCP 4.6. It is very likely not specific to the OCP install, given that it happens when the first OCP bootstrap node is installed.
The OCP bare metal install with a 100 GbitE Mellanox adapter is likely a usage scenario that exposes the problem. We used 3 S922 systems, each with a 100 GbitE Mellanox adapter and a 1 GbitE adapter. The OCP cluster's private network is defined on the SR-IOV shared 100 GbitE network interface.
After setting up DHCP, dnsmasq, httpd, haproxy and the firewall rules, I activated the OCP node from the HMC's System Management Services (SMS) shell and could see that the node (bootstrap) can't reach the tftp servers. The install on that node ends with error "!BA017021". I have tried layouts where the bootstrap node is on the same server as the bastion (the node running dhcp, dnsmasq, httpd and haproxy) and on a different server; neither works. When I switched from tftp boot to virtual media on a VIOS server (for the ISO image), the install worked. The node is able to pull the other files from the httpd server over the private network (without any firewall, DNS or httpd changes).
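To help separate a tftp server problem from an adapter or firmware problem, one option is to send a TFTP read request by hand from another host on the same private network and see whether anything comes back. The sketch below is only an illustration, not something from our setup: it builds a read request as laid out in RFC 1350, sends it over UDP port 69, and reports the first reply. The host and file name in the usage line are placeholders and must be replaced with the bastion's address and a file that really exists under the tftp root.

```python
import socket
import struct

TFTP_PORT = 69  # standard TFTP port (RFC 1350)


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request: opcode 1, filename, NUL, mode, NUL."""
    return (
        struct.pack("!H", 1)
        + filename.encode("ascii") + b"\x00"
        + mode.encode("ascii") + b"\x00"
    )


def describe_reply(packet: bytes) -> str:
    """Summarize the first packet the server sent back."""
    if len(packet) < 4:
        return "short or malformed reply"
    opcode, value = struct.unpack("!HH", packet[:4])
    if opcode == 3:  # DATA: value is the block number
        return f"DATA block {value}, {len(packet) - 4} bytes"
    if opcode == 5:  # ERROR: value is the error code, then a NUL-terminated message
        message = packet[4:].split(b"\x00", 1)[0].decode("ascii", "replace")
        return f"ERROR {value}: {message}"
    return f"unexpected opcode {opcode}"


def probe_tftp(host: str, filename: str, timeout: float = 5.0) -> str:
    """Send one read request and report the first reply, or a timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (host, TFTP_PORT))
        try:
            # The server answers from a new ephemeral port, not from port 69.
            packet, _ = sock.recvfrom(65536)
        except socket.timeout:
            return "no reply (timeout)"
    return describe_reply(packet)


if __name__ == "__main__":
    # Placeholder host and file name; both are assumptions for illustration.
    print(probe_tftp("192.0.2.10", "path/to/bootfile"))
```

A DATA or ERROR reply from a regular host would suggest the tftp service itself is reachable on the private network, which would point back at the netboot path on the 100 GbitE adapter. Because the reply comes from an ephemeral port, a timeout can also be caused by a firewall on the probing host, so it is not conclusive on its own. The probe never acknowledges the data, so the server just retransmits a few times and gives up.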