Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vchiq: Missing commits from downstream #63

Closed
wants to merge 5 commits into from

Conversation

popcornmix
Copy link

Just tested vchiq in 4.9 and it works fine with downstream kernel.
I notices some commits present in 4.4 tree were missing.

Phil Elwell added 5 commits October 18, 2016 14:31
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Reading through this code looking for another problem (now found in userland)
the use of dequeue_pending outside a lock didn't seem safe.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Service callbacks are not allowed to return an error. The internal callback
that delivers events and messages to user tasks does not enqueue them if
the service is closing, but this is not an error and should not be
reported as such.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
An issue was observed when flushing openmax components
which generate a large number of messages returning
buffers to host.

We occasionally found a duplicate message from 16
messages prior, resulting in a buffer returned twice.

While only one thread adds completions, without the
mutex you don't get the protection of the automatic
memory barrier you get with synchronisation objects.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Claiming the completion_mutex within add_completion did prevent some
messages appearing twice, but provokes a deadlock caused by vcsm using
vchiq within a page fault handler.

Revert the use of completion_mutex, and instead fix the original
problem using more memory barriers.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
@anholt
Copy link
Owner

anholt commented Nov 11, 2016

Note: I'm not allowed to merge pull requests; all commits must go through the email workflow. I'm barely keeping up with reviewing/acking vchiq patches as they're sent in, and I don't have time to manage being the one forwarding this set upstream.

@lategoodbye
Copy link

AFAIK these patches hasn't been merged yet. I could take care of them, but they aren't ready for submission yet.

Copy link

@lategoodbye lategoodbye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a commit message which explain why this is necessary.

@popcornmix
Copy link
Author

Are you referring to the first logging tweak commit, or something else?

@lategoodbye
Copy link

Yes. Sorry, i'm using the review feature for the first time and i didn't find a delete button.

@popcornmix
Copy link
Author

@pelwell can you provide a suitable commit comment for the logging tweak commit?

Copy link

@lategoodbye lategoodbye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are my notes before i'm able to send the patches upstream.

}

if (SRVTRACE_ENABLED(service,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a commit message which explain why this complete patch is necessary.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Move the trace message outside the gather loop so fragmentation
    doesn't mask significant bytes.
  • Reduce the size of the data logged - 16 bytes is sufficient.

Signed-off-by: Phil Elwell phil@raspberrypi.org

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. It seems that the first part has already been applied.

@@ -210,18 +210,26 @@ add_completion(VCHIQ_INSTANCE_T instance, VCHIQ_REASON_T reason,
VCHIQ_COMPLETION_DATA_T *completion;
DEBUG_INITIALISE(g_state.local)

mutex_lock(&instance->completion_mutex);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If i get it right, the following patch reverts this patch, so this should be dropped completely.

@@ -64,10 +64,10 @@
#define VCHIQ_MINOR 0

/* Some per-instance constants */
#define MAX_COMPLETIONS 16
#define MAX_COMPLETIONS 128

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit messages doesn't explain why it's increased. So this should be a separate patch.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pelwell Thanks for the other comment, but the changes to MAX_COMPLETIONS and MSG_QUEUE_SIZE needs a separate commit message, too.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vchiq_arm: Avoid premature stalls

The constants MAX_COMPLETIONS and MSG_QUEUE_SIZE control the number of
messages that can be outstanding to each client before the system as a
whole stalls. If the numbers are too small then unnecessary thread
switching will occur while waiting for a (potentially low priority)
client thread to consume some data; badly written clients can even
lead to deadlock.

For services that carry many short messages, 16 messages can represent
a very small amount of data. Since the resources are small - 16 bytes
for a completion, 4 bytes for a message pointer - increase the limits
so they are unlikely to be hit except in exceptional circumstances.

Signed-off-by: Phil Elwell phil@raspberrypi.org

while (slot_queue_available != local->slot_queue_recycle) {
unsigned int pos;
int slot_index = local->slot_queue[slot_queue_available++ &
VCHIQ_SLOT_QUEUE_MASK];
char *data = (char *)SLOT_DATA_FROM_INDEX(state, slot_index);
int data_found = 0;

rmb();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pelwell I've rebased and rearranged this series. But checkpatch complains about a missing comment why this read memory barrier is necessary.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

	/* Beware of the address dependency - data is calculated
	** using an index written by the other side. */

@@ -741,6 +743,8 @@ process_free_queue(VCHIQ_STATE_T *state)
up(&state->data_quota_event);
}

mb();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pelwell checkpatch complains here, too

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

		/* Don't allow the slot to be reused until we are no
		** longer interested in it. */

@Electron752
Copy link

This is just a heads up that I submitted a set of patches for the ioctl compatibility cleanup without realizing that these changes modify some of the same areas. It looks like it's mostly the await for completion ioctl that has a conflict.

I don't know what set may get applied first...

@lategoodbye
Copy link

Yes, i didn't expect such a patch set one day after mine. I will add you to CC next time.

Btw real ARM64 test results for this patch series are appreciated.

@Electron752
Copy link

Sure thing on the testing. I'm going to be away from the computer for awhile, so I won't be able to get to test it until later in the day.

It doesn't look very likely to break ARM64 though but I'll test it anyway.

@lategoodbye
Copy link

Thanks, but no need to hurry. In best case the patch set gets applied next week.

@Electron752
Copy link

OK, I ran some tests on arm64 with the patches and it appears to work.

I ran multiple instances of arm64 vchiq_test concurrently running "-f 200" and "-p 5000". As far as I can tell nothing is broken.

It is a bit difficult to tests the upstream tree well of course since none of the downstream drivers that use vchiq are in the tree.

@lategoodbye
Copy link

Thanks again. I think it's not a big problem having no vchiq user driver upstream as long as the downstream drivers could use it.

@lategoodbye
Copy link

The patch series has been merged to staging-next, so i think this pull request could be closed.

@popcornmix
Copy link
Author

Thanks!

@popcornmix popcornmix closed this Jan 20, 2017
anholt pushed a commit that referenced this pull request Jan 23, 2017
commit 1c7de2b upstream.

There is at least one Chelsio 10Gb card which uses VPD area to store some
non-standard blocks (example below).  However pci_vpd_size() returns the
length of the first block only assuming that there can be only one VPD "End
Tag".

Since 4e1a635 ("vfio/pci: Use kernel VPD access functions"), VFIO
blocks access beyond that offset, which prevents the guest "cxgb3" driver
from probing the device.  The host system does not have this problem as its
driver accesses the config space directly without pci_read_vpd().

Add a quirk to override the VPD size to a bigger value.  The maximum size
is taken from EEPROMSIZE in drivers/net/ethernet/chelsio/cxgb3/common.h.
We do not read the tag as the cxgb3 driver does as the driver supports
writing to EEPROM/VPD and when it writes, it only checks for 8192 bytes
boundary.  The quirk is registered for all devices supported by the cxgb3
driver.

This adds a quirk to the PCI layer (not to the cxgb3 driver) as the cxgb3
driver itself accesses VPD directly and the problem only exists with the
vfio-pci driver (when cxgb3 is not running on the host and may not be even
loaded) which blocks accesses beyond the first block of VPD data.  However
vfio-pci itself does not have quirks mechanism so we add it to PCI.

This is the controller:
Ethernet controller [0200]: Chelsio Communications Inc T310 10GbE Single Port Adapter [1425:0030]

This is what I parsed from its VPD:
===
b'\x82*\x0010 Gigabit Ethernet-SR PCI Express Adapter\x90J\x00EC\x07D76809 FN\x0746K'
 0000 Large item 42 bytes; name 0x2 Identifier String
	b'10 Gigabit Ethernet-SR PCI Express Adapter'
 002d Large item 74 bytes; name 0x10
	#00 [EC] len=7: b'D76809 '
	#0a [FN] len=7: b'46K7897'
	#14 [PN] len=7: b'46K7897'
	#1e [MN] len=4: b'1037'
	#25 [FC] len=4: b'5769'
	#2c [SN] len=12: b'YL102035603V'
	#3b [NA] len=12: b'00145E992ED1'
 007a Small item 1 bytes; name 0xf End Tag

 0c00 Large item 16 bytes; name 0x2 Identifier String
	b'S310E-SR-X      '
 0c13 Large item 234 bytes; name 0x10
	#00 [PN] len=16: b'TBD             '
	#13 [EC] len=16: b'110107730D2     '
	#26 [SN] len=16: b'97YL102035603V  '
	#39 [NA] len=12: b'00145E992ED1'
	#48 [V0] len=6: b'175000'
	#51 [V1] len=6: b'266666'
	#5a [V2] len=6: b'266666'
	#63 [V3] len=6: b'2000  '
	#6c [V4] len=2: b'1 '
	#71 [V5] len=6: b'c2    '
	#7a [V6] len=6: b'0     '
	#83 [V7] len=2: b'1 '
	#88 [V8] len=2: b'0 '
	#8d [V9] len=2: b'0 '
	#92 [VA] len=2: b'0 '
	#97 [RV] len=80: b's\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'...
 0d00 Large item 252 bytes; name 0x11
	#00 [VC] len=16: b'122310_1222 dp  '
	#13 [VD] len=16: b'610-0001-00 H1\x00\x00'
	#26 [VE] len=16: b'122310_1353 fp  '
	#39 [VF] len=16: b'610-0001-00 H1\x00\x00'
	#4c [RW] len=173: b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'...
 0dff Small item 0 bytes; name 0xf End Tag

10f3 Large item 13315 bytes; name 0x62
!!! unknown item name 98: b'\xd0\x03\x00@`\x0c\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00'
===

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
lategoodbye pushed a commit that referenced this pull request Jan 6, 2018
load_bpf_file() should fail if ioctl with command
PERF_EVENT_IOC_ENABLE and PERF_EVENT_IOC_SET_BPF fails.
When they do fail, proper error messages are printed.

With this change, the below "syscall_tp" run shows that
the maximum number of bpf progs attaching to the same
perf tracepoint is indeed enforced.
  $ ./syscall_tp -i 64
  prog #0: map ids 4 5
  ...
  prog #63: map ids 382 383
  $ ./syscall_tp -i 65
  prog #0: map ids 4 5
  ...
  prog #64: map ids 388 389
  ioctl PERF_EVENT_IOC_SET_BPF failed err Argument list too long

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants