MBP 14,3 runs extremely hot #70

Closed
whereswaldon opened this issue Aug 3, 2018 · 20 comments

Comments

@whereswaldon

I frequently see messages like:

[ 1895.960304] CPU7: Core temperature above threshold, cpu clock throttled (total events = 10)
[ 1895.960305] CPU3: Core temperature above threshold, cpu clock throttled (total events = 10)
[ 1895.960308] CPU0: Package temperature above threshold, cpu clock throttled (total events = 10)
[ 1895.960308] CPU4: Package temperature above threshold, cpu clock throttled (total events = 10)
[ 1895.960317] CPU5: Package temperature above threshold, cpu clock throttled (total events = 10)
[ 1895.960318] CPU3: Package temperature above threshold, cpu clock throttled (total events = 10)
[ 1895.960319] CPU6: Package temperature above threshold, cpu clock throttled (total events = 10)
[ 1895.960320] CPU2: Package temperature above threshold, cpu clock throttled (total events = 10)
[ 1895.960325] CPU7: Package temperature above threshold, cpu clock throttled (total events = 10)
[ 1895.960396] CPU1: Package temperature above threshold, cpu clock throttled (total events = 10)
[ 1895.963293] CPU7: Core temperature/speed normal
[ 1895.963293] CPU3: Core temperature/speed normal
[ 1895.963295] CPU3: Package temperature/speed normal
[ 1895.963296] CPU7: Package temperature/speed normal
[ 1895.963336] CPU0: Package temperature/speed normal
[ 1895.963337] CPU1: Package temperature/speed normal
[ 1895.963338] CPU5: Package temperature/speed normal
[ 1895.963338] CPU4: Package temperature/speed normal
[ 1895.963339] CPU6: Package temperature/speed normal
[ 1895.963340] CPU2: Package temperature/speed normal

These messages appear while running almost no workload (browser, slack, maybe one other thing). When I look, I see temps approaching 100C on the CPUs.
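To confirm readings like these, the kernel's thermal zones can be read directly from sysfs. This is a minimal sketch, assuming the usual `/sys/class/thermal/thermal_zone*/type` and `temp` files, where `temp` is in millidegrees Celsius. The function name `print_zone_temps` and its optional directory argument are made up for illustration.

```shell
# Print "<zone type> <temperature in degrees C>" for every thermal zone.
# The directory argument is optional and exists so the function can be
# pointed at test data (an assumption of this sketch, not a kernel feature).
print_zone_temps() {
    dir="${1:-/sys/class/thermal}"
    for zone in "$dir"/thermal_zone*; do
        # Skip the unexpanded glob or zones without a readable temp file.
        [ -r "$zone/temp" ] || continue
        ztype=$(cat "$zone/type" 2>/dev/null || echo unknown)
        millic=$(cat "$zone/temp")
        # sysfs reports millidegrees Celsius; convert to degrees.
        awk -v t="$ztype" -v m="$millic" 'BEGIN { printf "%s %.1f\n", t, m / 1000 }'
    done
}
```

Running `print_zone_temps` with no arguments reads the live zones. Which zone types show up depends on the machine and the loaded drivers.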

Does anyone else have this? Are there tricks for dealing with it? I'm researching the available applesmc options right now, but I haven't had tons of success with that in the past.

@myrgy

myrgy commented Aug 3, 2018

Hi,

Try checking your CPU usage. There are some issues with interrupts. I was able to work around it using
https://askubuntu.com/questions/1029745/ubuntu-18-04-w-macbook-pro-kworker-keeps-hogging-up-my-cpu-solved

@whereswaldon
Author

I was able to get things under control with:

echo disable | sudo tee /sys/firmware/acpi/interrupts/gpe07

Thanks for your help @myrgy !
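Before disabling a GPE, it can help to check which one is actually firing, since `gpe07` may not be the culprit on other machines. This is a rough sketch, assuming each file under `/sys/firmware/acpi/interrupts/` begins with a cumulative count followed by status words. The helper name `busiest_gpes` and its directory argument are hypothetical.

```shell
# List the five GPE counters with the highest cumulative counts,
# as "<count> <name>" lines, busiest first.
# The directory argument is optional and only there for testing.
busiest_gpes() {
    dir="${1:-/sys/firmware/acpi/interrupts}"
    for f in "$dir"/gpe[0-9A-F][0-9A-F]; do
        [ -r "$f" ] || continue
        # The first whitespace-separated field is the interrupt count.
        printf '%s %s\n' "$(awk '{ print $1; exit }' "$f")" "$(basename "$f")"
    done | sort -rn | head -n 5
}
```

Note that a value written to this sysfs file does not survive a reboot, so the `echo disable` workaround has to be reapplied after each boot.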

@roadrunner2
Contributor

Interesting - it looks like this is related to https://bugzilla.kernel.org/show_bug.cgi?id=198169 , though I've never seen the cpu usage of the kworker go above 5% or so. Can you show some sample stack traces? (sudo cat /proc/<pid-of-kworker>/stack)
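One possible way to find the kworker PID for that command is to filter `ps` output. This is a sketch, assuming a procps-style `ps` that accepts `-eo pid=,pcpu=,comm=`. The helper name `busiest_kworker_pid` is made up here, and the `pcpu` column is an average over the process lifetime rather than an instantaneous reading.

```shell
# Read "PID %CPU COMMAND" rows on stdin and print the PID of the
# kworker thread with the highest CPU share (nothing if none is found).
busiest_kworker_pid() {
    awk '$3 ~ /^kworker/ && (pid == "" || $2 + 0 > max) { max = $2 + 0; pid = $1 }
         END { if (pid != "") print pid }'
}

# Example use (needs root to read the stack):
#   pid=$(ps -eo pid=,pcpu=,comm= | busiest_kworker_pid)
#   sudo cat "/proc/$pid/stack"
```

Because a kworker's stack changes from moment to moment, repeating the `cat` a few times gives a better picture than a single sample.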

@myrgy

myrgy commented Aug 10, 2018

Will do.

@roadrunner2
Contributor

Btw., I've found the GPE07 issue - see the last two patches attached to the above bug.

@roadrunner2 roadrunner2 mentioned this issue Sep 20, 2018
@Dunedan
Owner

Dunedan commented Sep 20, 2018

@whereswaldon: Does the patch mentioned by @roadrunner2 fix your problems?

@whereswaldon
Author

I'm sorry that it has taken so long for me to circle back to this. I've applied the following as patches:

From 9f575eb923df20a83fb25f1d0c7ae15df1422a95 Mon Sep 17 00:00:00 2001
From: Zhang Rui <rui.zhang@intel.com>
Date: Wed, 29 Aug 2018 10:26:51 +0800
Subject: [PATCH] acpi/sbshc: handle acpi notification

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
---
 drivers/acpi/sbshc.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/acpi/sbshc.c b/drivers/acpi/sbshc.c
index 7a34310..252268b 100644
--- a/drivers/acpi/sbshc.c
+++ b/drivers/acpi/sbshc.c
@@ -20,6 +20,7 @@
 
 #define ACPI_SMB_HC_CLASS	"smbus_host_ctl"
 #define ACPI_SMB_HC_DEVICE_NAME	"ACPI SMBus HC"
+#define ACPI_SMB_HC_STATUS_CHANGE	0x80
 
 struct acpi_smb_hc {
 	struct acpi_ec *ec;
@@ -34,6 +35,7 @@ struct acpi_smb_hc {
 
 static int acpi_smbus_hc_add(struct acpi_device *device);
 static int acpi_smbus_hc_remove(struct acpi_device *device);
+static void acpi_smbus_notify(struct acpi_device *device, u32 event);
 
 static const struct acpi_device_id sbs_device_ids[] = {
 	{"ACPI0001", 0},
@@ -50,6 +52,7 @@ static struct acpi_driver acpi_smb_hc_driver = {
 	.ops = {
 		.add = acpi_smbus_hc_add,
 		.remove = acpi_smbus_hc_remove,
+		.notify = acpi_smbus_notify,
 		},
 };
 
@@ -245,6 +248,14 @@ extern int acpi_ec_add_query_handler(struct acpi_ec *ec, u8 query_bit,
 			      acpi_handle handle, acpi_ec_query_func func,
 			      void *data);
 
+static void acpi_smbus_notify(struct acpi_device *device, u32 event)
+{
+	struct acpi_smb_hc *hc = acpi_driver_data(device);
+
+	if (event == ACPI_SMB_HC_STATUS_CHANGE)
+		smbus_alarm(hc);
+}
+
 static int acpi_smbus_hc_add(struct acpi_device *device)
 {
 	int status;
-- 
2.7.4

From 7fa985b355d4721e4e63b14d8e7e4e585048d310 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ronald=20Tschal=C3=A4r?= <ronald@innovation.ch>
Date: Tue, 4 Sep 2018 01:03:14 -0700
Subject: [PATCH 2/2] ACPI/sbs: Fix GPE storm on recent MacBookPro's.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On Apple machines, plugging-in or unplugging the power triggers a GPE
for the EC. Since these machines expose an SBS device, this GPE ends
up triggering the acpi_sbs_callback(). This in turn tries to get the
status of the SBS charger. However, on MBP13,* and MBP14,* machines,
performing the smbus-read operation to get the charger's status triggers
the EC's GPE again. The result is an endless re-triggering and handling
of that GPE, consuming significant CPU resources (> 50% in irq).

In the end this is quite similar to commit 3031cddea633 (ACPI / SBS:
Don't assume the existence of an SBS charger), except that on the above
machines a status of all 1's is returned. And like there, we just want
ignore the charger here.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=198169
Signed-off-by: Ronald Tschalär <ronald@innovation.ch>
---
 drivers/acpi/sbs.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/sbs.c b/drivers/acpi/sbs.c
index 045369c431de..dc2017991e91 100644
--- a/drivers/acpi/sbs.c
+++ b/drivers/acpi/sbs.c
@@ -441,9 +441,13 @@ static int acpi_ac_get_present(struct acpi_sbs *sbs)
 
 	/*
 	 * The spec requires that bit 4 always be 1. If it's not set, assume
-	 * that the implementation doesn't support an SBS charger
+	 * that the implementation doesn't support an SBS charger.
+	 *
+	 * And on some MacBooks a status of 0xffff is always returned, no
+	 * matter whether the charger is plugged in or not, which is also
+	 * wrong, so ignore the SBS charger for those too.
 	 */
-	if (!((status >> 4) & 0x1))
+	if (!((status >> 4) & 0x1) || status == 0xffff)
 		return -ENODEV;
 
 	sbs->charger_present = (status >> 15) & 0x1;
-- 
2.17.1

@@ -, +, @@ 
---
 drivers/acpi/osl.c   | 1 +
 drivers/acpi/sbshc.c | 2 ++
 2 files changed, 3 insertions(+)
--- a/drivers/acpi/osl.c	
+++ a/drivers/acpi/osl.c	
@@ -1129,6 +1129,7 @@ void acpi_os_wait_events_complete(void)
 	flush_workqueue(kacpid_wq);
 	flush_workqueue(kacpi_notify_wq);
 }
+EXPORT_SYMBOL(acpi_os_wait_events_complete);
 
 struct acpi_hp_work {
 	struct work_struct work;
--- a/drivers/acpi/sbshc.c	
+++ a/drivers/acpi/sbshc.c	
@@ -196,6 +196,7 @@ int acpi_smbus_unregister_callback(struct acpi_smb_hc *hc)
 	hc->callback = NULL;
 	hc->context = NULL;
 	mutex_unlock(&hc->lock);
+	acpi_os_wait_events_complete();
 	return 0;
 }
 
@@ -292,6 +293,7 @@ static int acpi_smbus_hc_remove(struct acpi_device *device)
 
 	hc = acpi_driver_data(device);
 	acpi_ec_remove_query_handler(hc->ec, hc->query_bit);
+	acpi_os_wait_events_complete();
 	kfree(hc);
 	device->driver_data = NULL;
 	return 0;
-- 

I still get a ton of GPE07 interrupts, but there isn't a kworker chewing through them and making the machine melt, so I guess it's fixed?

@roadrunner2
Contributor

@whereswaldon The first patch (from Zhang Rui) is unnecessary (the notifications are already sent, but via a different path); only my patch is needed. Btw., this patch is now in master, i.e. will be in 4.20/5.0.

How many interrupts/second are you still getting, roughly? And how much cpu is it using? (it could show up in sys and/or irq) Does the battery indicator update appropriately within a few seconds when you plug/unplug the power now, or does it take longer? Also, how many adapters do you see in /sys/class/power_supply/?

@Dunedan
Owner

Dunedan commented Dec 28, 2018

Closing this, as there is no feedback from the reporter. I'd be happy though to re-open this issue if the problem isn't resolved yet.

@Dunedan Dunedan closed this as completed Dec 28, 2018
@whereswaldon
Author

Sorry that it's taken me so long to circle back here. I've been using a different machine most of the time. I no longer see much CPU utilization from it, so I can't be sure that GPE07 is the problem, but my machine does still overheat a few times a day. On the off chance that this GPE is the problem, I'll answer @roadrunner2's questions:

Adapters:

$ ls /sys/class/power_supply/
ADP1@  BAT0@

I've attached a CSV of the GPE07 interrupts every second for 100 seconds (while AC adapter plugged in). I don't have much of a frame of reference for how many is "a lot".
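A log like this could be collected with a small sampling loop. This is only a sketch: the format of the attached `gpe07.txt` is not shown in the thread, so the `timestamp,count` layout, the function name `sample_gpe` and its arguments are all assumptions.

```shell
# Print "<seconds since epoch>,<cumulative GPE count>" once per interval.
# Arguments (all optional, all hypothetical): counter file, number of
# samples, seconds between samples.
sample_gpe() {
    file="${1:-/sys/firmware/acpi/interrupts/gpe07}"
    samples="${2:-100}"
    interval="${3:-1}"
    i=0
    while [ "$i" -lt "$samples" ]; do
        # The first field of the counter file is the cumulative count.
        printf '%s,%s\n' "$(date +%s)" "$(awk '{ print $1; exit }' "$file")"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}

# Example use: sample_gpe > gpe07.txt
```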
gpe07.txt

 $ uname -a
Linux dhcp-9-27-202-67 4.20.3-200.fc29.x86_64 #1 SMP Thu Jan 17 15:19:35 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

I'm having trouble testing battery status right at this second (not showing up as charging, but it's pretty full). I'll try to replicate later, but I'll post this data now so that I don't forget to again.

Thanks everyone for your support! Work is so much more pleasant from Linux, and you're the reason that I can even do that on this hardware.

@Dunedan Dunedan reopened this Feb 11, 2019
@roadrunner2
Contributor

Looking at those interrupt counts it looks like there are 161 interrupts exactly every 2 seconds (and a few times every second). That's odd, both in that it's exactly 161 every time, and that there are any interrupts at all - on my system I only see any interrupts when I unplug or plug in the power (and then it's something like 800 interrupts spread over around 8 seconds). In any case, I sort of doubt this is the cause of your overheating.
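A pattern like this is easier to spot when the cumulative counts are turned into per-sample differences. This is a minimal sketch, assuming the log has one `timestamp,count` pair per line (the attachment's real layout is not shown here). The helper name `gpe_deltas` is hypothetical.

```shell
# Read "timestamp,count" lines on stdin and print "timestamp,delta",
# where delta is the number of interrupts since the previous sample.
gpe_deltas() {
    awk -F, 'NR > 1 { print $1 "," ($2 - prev) } { prev = $2 }'
}

# Example use: gpe_deltas < gpe07.txt
```

With one-second samples, a burst every two seconds shows up as alternating zero and non-zero deltas.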

@Dunedan
Owner

Dunedan commented Sep 16, 2019

@whereswaldon Did you have any additional insights regarding this issue? Do you still have this problem?

@marc-git
Contributor

I have this laptop, but I'm no genius. How can I check this?
sudo cat /proc/<pid-of-kworker>/stack
How do I find out the PID of a kworker?

@whereswaldon
Author

whereswaldon commented Sep 20, 2019 via email

@marc-git
Contributor

So is this solved, or should I have a look?

@whereswaldon
Author

whereswaldon commented Sep 23, 2019 via email

@marc-git
Contributor

I was experiencing non-extreme overheating before I disabled the GPU. I mean it would get hot idling on my desk. Since disabling it, the laptop runs warmer in Linux than OSX but it isn't extraordinarily hot.

@ClashTheBunny
Contributor

I wonder if this is just a hardware issue:
https://apple.stackexchange.com/a/363933

@Dunedan
Owner

Dunedan commented Nov 10, 2019

Even under macOS those MacBook Pros have horrible thermal management, because Apple squeezed too many high-power devices into such a small form factor. Under Linux this is worse, because Linux still lacks several power management features (such as Thunderbolt power management, whose absence also prevents the CPU package from reaching power-saving states). Under Linux I have a warm case the whole time, no matter what I do. As it's still not too hot to touch, and as the fans crank up as soon as I apply some load, I believe that's usual behavior.

What would be unusual is a case so hot that touching it hurts, but in that case I'd assume a hardware defect.

If that's what you experience as well, let's close this issue.

@marc-git
Contributor

Yes, let's close it. I was just trying to see if I could boost the fans with mbpfan, but that hasn't been developed for the 14,x models, so maybe it's not worth it. Anyway, extreme overheating is not something I am experiencing.

@Dunedan Dunedan closed this as completed Nov 23, 2019