Benchmark 128-core System76 Thelio Astra #44
Comments
On my first run, it compiled and started the benchmark, but a few minutes in, after the system had been drawing 430W or so continuously (my UPS beeped a bit as I passed its 600W threshold), I saw power draw drop to 38W, and the system seemed to be locked up. Even a reboot from OpenBMC didn't restore it: it stayed stuck in the power-off state even when I tried powering it on via the BMC, and I had to manually power cycle the machine using the power button. I'm also wondering... I never heard the fans spin up at all; they just stayed at their idle RPM AFAICT. Maybe the fan curve or the fan control on the little breakout adapter isn't running correctly? I'll ask System76 if that could be the case. |
Press 'o' for options, press '2' for the cpu tab and scroll down to 'Cpu sensor'. |
Ha! didn't even think of that. |
Just got the system back—this time things were intact, but the system still seemed to get warmer and warmer until hitting above 90°C and locking up—I saw DIMM and SoC overheat errors in OpenBMC, and it would hard lock, requiring a physical power button hold to shut down, or an 'immediate' poweroff in the BMC (an immediate reset wouldn't work). The SoC gets back down to 35°C pretty quickly, as the idle fan PWM seems to be fine (something like a silent 50% duty cycle). According to this commit, the But maybe |
On my Thelio Astra:
If I set Ps=1, Qs=128 and run the HPL benchmark from this repo, the SoC temperature quickly rises to 60°C, then over the course of 10-15 minutes rises to 71°C, and is now reporting 72°C. I'm concerned that if I were to leave it overnight it might continue rising.
Edit: I'm now occasionally seeing |
@geerlingguy Could you post the output of the |
@bexcran - my On my system the temperature rose pretty quickly from 35°C to 90°C (in the course of 10 minutes or so), and I never heard the fans move any faster listening closely out the exhaust fan (the air was nice and toasty though!). |
Ah, I left it plugged in so I could boot up remotely, yay!
Their power daemon is not running:
Are there any instructions for setting it up on Ubuntu? And separately—since it seems that must run to get the fan to work at different SoC temps... is there any way to guarantee it will work if running other OSes on the box, which might not be supported by |
I found System76 Driver (Install), which tells me to install the driver with:
That succeeded, and now:
I didn't see any fan speeds in |
With the daemon running, temps are more controlled—SoC goes up between 75-80°C, and fan speeds ramp between 1500-1800 RPM (intake around 1100-1200 RPM), while the CPU's burning 200-210W. I'm using the default OpenBLAS installation, not sure if it's picking the right profile for ampere or just generic arm64. |
Using the defaults: 1,147.2 Gflops at 415W, for 2.76 Gflops/W (total system power draw; CPU + IO was using around 225W). I may see about running the Ampere-optimized BLIS: https://github.com/AmpereComputing/HPL-on-Ampere-Altra . As a point of comparison, the M128-28X got 1.265 Tflops, so I'm guessing this will get at least 1.3 or 1.4 Tflops, possibly more since we have two more memory channels on this system...
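For anyone checking the efficiency figure, here is a minimal sketch of the arithmetic in Python. The function and variable names are made up for illustration; the only inputs are the HPL score and the total system power draw quoted above.

```python
def gflops_per_watt(gflops: float, watts: float) -> float:
    """Return efficiency as HPL Gflops divided by total system power in watts."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return gflops / watts


# Figures quoted in the comment above (total system power, not CPU-only).
hpl_gflops = 1147.2
system_watts = 415.0

efficiency = gflops_per_watt(hpl_gflops, system_watts)
print(f"{efficiency:.2f} Gflops/W")
```

Note that this uses wall power for the whole system; dividing by the roughly 225W CPU + IO figure instead would give a different (higher) number, so comparisons should use the same power measurement on both sides.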
|
Re-testing with these instructions, the system certainly ramps up hotter, hitting the high 80s (°C)... The fan curve (currently |
@geerlingguy I think that's why OpenBMC uses a PID control loop for its fan controller - so that it keeps them running faster than needed for a few seconds as the temperature drops. https://github.com/openbmc/phosphor-pid-control/blob/master/tuning.md:
This page seems to have a good description of the process: https://www.west-cs.com/products/l2/pid-temperature-controller/
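To illustrate the idea, here is a minimal PID sketch in Python. It is not the phosphor-pid-control implementation: the class name, gains, setpoint, and duty-cycle limits are all hypothetical values picked for the example. It only shows why the integral term keeps the fan above its idle duty for a while after the temperature has dropped back to the setpoint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FanPID:
    """Toy PID fan controller; all gains and limits are illustrative assumptions."""
    kp: float
    ki: float
    kd: float
    setpoint_c: float
    out_min: float = 30.0        # assumed idle fan duty, percent
    out_max: float = 100.0       # full fan duty, percent
    integral_max: float = 100.0  # simple anti-windup clamp on the integral
    _integral: float = 0.0
    _prev_error: Optional[float] = None

    def update(self, measured_c: float, dt_s: float) -> float:
        """Return a fan duty (percent) for the given temperature reading."""
        error = measured_c - self.setpoint_c  # positive when too hot
        # Accumulate error over time, clamped so it cannot grow without bound.
        self._integral = max(0.0, min(self.integral_max,
                                      self._integral + error * dt_s))
        if self._prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self._prev_error) / dt_s
        self._prev_error = error
        raw = (self.out_min + self.kp * error
               + self.ki * self._integral + self.kd * derivative)
        return max(self.out_min, min(self.out_max, raw))


# Hypothetical scenario: 10 seconds at 85°C, then back at the 70°C setpoint.
pid = FanPID(kp=2.0, ki=0.2, kd=0.0, setpoint_c=70.0)
for _ in range(10):
    hot_duty = pid.update(85.0, 1.0)
cooled_duty = pid.update(70.0, 1.0)
print(f"duty while hot: {hot_duty:.0f}%, duty just after cooling: {cooled_duty:.0f}%")
```

A plain temperature-to-duty fan curve would drop straight back to idle once the reading hits the setpoint; here the accumulated integral holds the duty above `out_min` until it decays, which is the behaviour described above.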
|
Happy to see this system get better than the expected result of 1597 Gflops according to the HPL-on-Ampere repo :) |
The Thelio Astra has an M128-30 Ampere Altra Max CPU, and the configuration I was sent includes 512 GB of ECC DDR4-3200 RAM. See: geerlingguy/sbc-reviews#53