diff --git a/doc/extending/creating_an_op.rst b/doc/extending/creating_an_op.rst
index 746342ad4a..41a3bc3b89 100644
--- a/doc/extending/creating_an_op.rst
+++ b/doc/extending/creating_an_op.rst
@@ -74,32 +74,21 @@ possibilities you may encounter or need. For that refer to
         # Properties attribute
         __props__ = ()
 
-        #itypes and otypes attributes are
-        #compulsory if make_node method is not defined.
-        #They're the type of input and output respectively
+        # itypes and otypes attributes are compulsory if the make_node method is not defined.
+        # They're the types of the inputs and outputs respectively
         itypes = None
         otypes = None
 
-        #Compulsory if itypes and otypes are not defined
+        # make_node is compulsory if itypes and otypes are not defined
+        # make_node is more flexible: output types can be determined
+        # based on the input types and Op properties.
        def make_node(self, *inputs):
             pass
 
-        # Python implementation:
+        # Method that evaluates the Op in Python:
         def perform(self, node, inputs_storage, output_storage):
             pass
 
-        # Other type of implementation
-        # C implementation: [see pytensor web site for other functions]
-        def c_code(self, node, inputs, outputs, sub):
-            pass
-
-        # Other implementations:
-        def make_thunk(self, node, storage_map, _, _2, impl=None):
-            pass
-
-        # optional:
-        check_input = True
-
         def __init__(self, *args):
             pass
 
@@ -114,9 +103,7 @@ possibilities you may encounter or need. For that refer to
 
 An :class:`Op` has to implement some methods defined in the the interface of
 :class:`Op`. More specifically, it is mandatory for an :class:`Op` to define either
-the method :meth:`Op.make_node` or :attr:`Op.itypes`, :attr:`Op.otypes` and one of the
-implementation methods, either :meth:`Op.perform`, :meth:`COp.c_code`
-or :meth:`Op.make_thunk`.
+the method :meth:`Op.make_node` or :attr:`Op.itypes`, :attr:`Op.otypes`, and :meth:`Op.perform`.
 
 :meth:`Op.make_node` method creates an Apply node representing the application of
 the :class:`Op` on the inputs provided. This method is responsible for three things:
@@ -130,8 +117,7 @@ or :meth:`Op.make_thunk`.
   symbolic :class:`Type` to serve as the outputs of this :class:`Op`'s
   application.
 
 - it creates an :class:`Apply` instance with the input and output :class:`Variable`, and
-  return the :class:`Apply` instance.
-
+  itself as the :class:`Op`. It must return the :class:`Apply` instance.
 
 :meth:`Op.perform` method defines the Python implementation of an :class:`Op`.
@@ -165,35 +151,6 @@ or :meth:`Op.make_thunk`.
 :meth:`COp.c_code` and other related ``c_**`` methods. Note that an
 :class:`Op` can provide both Python and C implementations.
 
-  :meth:`Op.make_thunk` method is another alternative to :meth:`Op.perform`.
-  It returns a thunk. A thunk is defined as a zero-arguments
-  function which encapsulates the computation to be performed by an
-  :class:`Op` on the arguments of its corresponding node. It takes several parameters:
-
-  - ``node`` is the :class:`Apply` instance for which a thunk is requested,
-  - ``storage_map`` is a ``dict`` of lists which maps variables to a one-element
-    lists holding the variable's current value. The one-element list acts as
-    pointer to the value and allows sharing that "pointer" with other nodes
-    and instances.
-  - ``compute_map`` is also a dict of lists.
-    It maps variables to one-element lists holding booleans. If
-    the value is 0 then the variable has not been computed and the
-    value should not be considered valid. If the value is 1 the
-    variable has been computed and the value is valid. If the value
-    is 2 the variable has been garbage-collected and is no longer
-    valid, but shouldn't be required anymore for this call.
-    The returned function must ensure that it sets the computed
-    variables as computed in the :obj:`compute_map`.
-  - ``impl`` allow to select between multiple implementation.
-    It should have a default value of ``None``.
-
-  :meth:`Op.make_thunk` is useful if you want to generate code and compile
-  it yourself.
-
-  If :meth:`Op.make_thunk` is defined by an :class:`Op`, it will be used by PyTensor
-  to obtain the :class:`Op`'s implementation.
-  :meth:`Op.perform` and :meth:`COp.c_code` will be ignored.
-
 If :meth:`Op.make_node` is not defined, the :attr:`Op.itypes` and :attr:`Op.otypes`
 are used by the :class:`Op`'s :meth:`Op.make_node` method to implement the
 functionality of :meth:`Op.make_node` method mentioned above.
@@ -238,7 +195,7 @@ There are other methods that can be optionally defined by the :class:`Op`:
   procedures.
 
   The :meth:`Op.grad` method is required if you want to differentiate some cost
-  whose expression includes your :class:`Op`. The gradient may be
+  whose expression includes your :class:`Op`. The gradient is
   specified symbolically in this method. It takes two arguments ``inputs`` and
   ``output_gradients``, which are both lists of :class:`Variable`\s, and
   those must be operated on using PyTensor's symbolic language. The :meth:`Op.grad`
@@ -261,61 +218,33 @@ There are other methods that can be optionally defined by the :class:`Op`:
   Jacobian of :math:`f` and right-multiplying it by :math:`v`, the evaluation
   point, namely: :math:`\frac{\partial f}{\partial x} v`.
 
-  The optional boolean :attr:`check_input` attribute is used to specify
-  if you want the types used in your :class:`COp` to check their inputs in their
-  :meth:`COp.c_code`. It can be used to speed up compilation, reduce overhead
-  (particularly for scalars) and reduce the number of generated C files.
-
 Example: :class:`Op` definition
 -------------------------------
 
 .. testcode:: example
 
-    import pytensor
     from pytensor.graph.op import Op
     from pytensor.graph.basic import Apply
-
+    from pytensor.tensor import as_tensor_variable
 
     class DoubleOp1(Op):
         __props__ = ()
 
         def make_node(self, x):
-            x = pytensor.tensor.as_tensor_variable(x)
-            # Note: using x_.type() is dangerous, as it copies x's broadcasting
-            # behaviour
-            return Apply(self, [x], [x.type()])
-
-        def perform(self, node, inputs, output_storage):
-            x = inputs[0]
-            z = output_storage[0]
-            z[0] = x * 2
-
-        def infer_shape(self, fgraph, node, i0_shapes):
-            return i0_shapes
+            # Convert (and require) x to be a TensorVariable
+            x = as_tensor_variable(x)
 
-        def grad(self, inputs, output_grads):
-            return [output_grads[0] * 2]
+            if not (x.type.ndim == 2 and x.type.dtype == "float64"):
+                raise TypeError("x must be a float64 matrix")
 
-        def R_op(self, inputs, eval_points):
-            # R_op can receive None as eval_points.
-            # That mean there is no diferientiable path through that input
-            # If this imply that you cannot compute some outputs,
-            # return None for those.
-            if eval_points[0] is None:
-                return eval_points
-            return self.grad(inputs, eval_points)
-
-    doubleOp1 = DoubleOp1()
-
-    #Using itypes and otypes
+            # Create an output variable of the same type as x
+            z = x.type()
-
-
-    class DoubleOp2(Op):
-        __props__ = ()
-
-        itypes = [pytensor.tensor.dmatrix]
-        otypes = [pytensor.tensor.dmatrix]
+            # A TensorVariable's type includes shape and dtype, so this is equivalent to the following
+            # z = pytensor.tensor.TensorType(dtype=x.type.dtype, shape=x.type.shape)()
+            # z = pytensor.tensor.tensor(dtype=x.type.dtype, shape=x.type.shape)
+            return Apply(self, [x], [z])
 
         def perform(self, node, inputs, output_storage):
             x = inputs[0]
@@ -323,6 +252,7 @@ Example: :class:`Op` definition
             z[0] = x * 2
 
         def infer_shape(self, fgraph, node, i0_shapes):
+            # The output shape is the same as the input shape
             return i0_shapes
 
         def grad(self, inputs, output_grads):
@@ -337,7 +267,7 @@ Example: :class:`Op` definition
                 return eval_points
             return self.grad(inputs, eval_points)
 
-    doubleOp2 = DoubleOp2()
+    doubleOp1 = DoubleOp1()
 
 At a high level, the code fragment declares a class (e.g., ``DoubleOp1``)
 and then creates one instance of it (e.g., ``doubleOp1``).
@@ -348,32 +278,9 @@ subclass of :class:`Op`).
 You can call ``doubleOp1(tensor.vector())`` on a ``Variable`` to build an
 expression, and in the expression there will be a ``.op`` attribute that
 refers to ``doubleOp1``.
 
-.. The first two methods in the :class:`Op` are relatively boilerplate: ``__eq__``
-.. and ``__hash__``.
-.. When two :class:`Op`\s are equal, PyTensor will merge their outputs if they are applied to the same inputs.
-.. The base class says two objects are equal if (and only if)
-.. they are the same object.
-.. Writing these boilerplate definitions ensures that the logic of the equality comparison is always explicit.
-
-.. It is an essential part of the :ref:`op_contract` that if two :class:`Op`\s compare
-.. equal, then they must compute the same result when presented with the same
-.. inputs. Here, if we allocated another instance of ``Fibby`` by typing ``fibby2
-.. = Fibby()`` then we would have two :class:`Op`\s that behave identically.
-..
-.. When should the implementation of ``__eq__`` be more complicated?
-.. If ``Fibby.__init__`` had parameters, then we could
-.. have configured ``fibby2`` differently from ``fibby`` by passing different
-.. arguments to the constructor. If we had done that, and if that different
-.. configuration made ``fibby2`` compute different results from ``fibby`` (for the
-.. same inputs) then we would have to add logic to the ``__eq__`` and ``__hash__``
-.. function so that he two ``Fibby`` :class:`Op`\s would *not be equal*. The reason why: PyTensor's merge
-.. optimization looks for :class:`Op`\s comparing equal and merges them. If two :class:`Op`\s compare
-.. equal but don't always produce equal results from equal inputs, then you might
-.. see wrong calculation.
-
 The ``make_node`` method creates a node to be included in the expression graph.
-It runs when we apply our :class:`Op` (``doubleOp1``) to the ``Variable`` (``x``), as
-in ``doubleOp1(tensor.vector())``.
+It runs when we ``__call__`` our :class:`Op` (``doubleOp1``) on the ``Variable`` (``x``), as
+in ``doubleOp1(pytensor.tensor.matrix())``.
 When an :class:`Op` has multiple inputs, their order in the inputs argument to ``Apply``
 is important: PyTensor will call ``make_node(*inputs)`` to copy the graph, so it is
 important not to change the semantics of the expression by changing
@@ -389,7 +296,7 @@ appropriate :class:`Type` for all output variables.
 
 The :func:`perform` method implements the :class:`Op`'s mathematical logic in
 Python. The inputs (here ``x``) are passed by value, but a single output is
 returned indirectly as the first element of single-element lists. If ``doubleOp1`` had
-a second output, it would be stored in ``output_storage[1][0]``.
+a second output, it should be stored in ``output_storage[1][0]``.
 
 In some execution modes, the output storage might contain the return value of
 a previous call. That old value can be reused to avoid memory re-allocation,
@@ -401,66 +308,89 @@ You can try the new :class:`Op` as follows:
 
     import numpy as np
     import pytensor
+    import pytensor.tensor as pt
 
-    x = pytensor.tensor.matrix()
-    f = pytensor.function([x], DoubleOp1()(x))
-    inp = np.random.random_sample((5, 4))
-    out = f(inp)
-    assert np.allclose(inp * 2, out)
-    print(inp)
-    print(out)
+    doubleOp1 = DoubleOp1()
+    x = pt.matrix("x")
+    out = doubleOp1(x)
+    fn = pytensor.function([x], out)
 
-.. testoutput:: example
-   :hide:
-   :options: +ELLIPSIS, +SKIP
+    x_np = np.random.normal(size=(5, 4))
+    out_np = fn(x_np)
+    np.testing.assert_allclose(x_np * 2, out_np)
 
-
-.. code-block:: none
+It's also a good idea to test the ``infer_shape`` implementation.
+To do this we will request a graph of only the shape:
 
-    [[ 0.08257206 0.34308357 0.5288043 0.06582951]
-    [ 0.65977826 0.10040307 0.5402353 0.55472296]
-    [ 0.82358552 0.29502171 0.97387481 0.0080757 ]
-    [ 0.77327215 0.65401857 0.76562992 0.94145702]
-    [ 0.8452076 0.30500101 0.88430501 0.95818655]]
-    [[ 0.16514411 0.68616713 1.0576086 0.13165902]
-    [ 1.31955651 0.20080613 1.08047061 1.10944593]
-    [ 1.64717104 0.59004341 1.94774962 0.0161514 ]
-    [ 1.5465443 1.30803715 1.53125983 1.88291403]
-    [ 1.6904152 0.61000201 1.76861002 1.9163731 ]]
+.. testcode::
 
+    out_shape = out.shape
+    shape_f = pytensor.function([x], out_shape)
+    assert tuple(shape_f(x_np)) == out_np.shape
+
+    # We can introspect the compiled function to confirm the Op is not evaluated
+    shape_f.dprint()
+
+.. testoutput::
+
+    MakeVector{dtype='int64'} [id A] 2
+     ├─ Shape_i{0} [id B] 1
+     │  └─ x [id C]
+     └─ Shape_i{1} [id D] 0
+        └─ x [id C]
+
+
+Finally, we should test the gradient implementation.
+For this we can use the ``pytensor.gradient.verify_grad`` utility, which compares the output of a gradient function with finite differences.
+
+.. testcode::
 
     import numpy as np
-    import pytensor
+    from pytensor.gradient import verify_grad
 
-    x = pytensor.tensor.matrix()
-    f = pytensor.function([x], DoubleOp2()(x))
-    inp = np.random.random_sample((5, 4))
-    out = f(inp)
-    assert np.allclose(inp * 2, out)
-    print(inp)
-    print(out)
+    rng = np.random.default_rng(42)
+    test_x = rng.normal(size=(5, 4))
+    # Raises if the gradient output is sufficiently different from the finite difference approximation.
+    verify_grad(doubleOp1, [test_x], rng=rng)
 
-.. testoutput:: example
-   :hide:
-   :options: +ELLIPSIS, +SKIP
-
+
+Example: :attr:`itypes` and :attr:`otypes` definition
+-----------------------------------------------------
-.. code-block:: none
+
+Since the Op has a very strict type signature, we can use :attr:`itypes` and :attr:`otypes` instead of :meth:`make_node`:
+
+.. testcode:: example with itypes and otypes
+
+    from pytensor.tensor import dmatrix
+    from pytensor.graph.op import Op
+
+
+    class DoubleOp2(Op):
+        __props__ = ()
+
+        # input and output types must be float64 matrices
+        itypes = [dmatrix]
+        otypes = [dmatrix]
+
+        def perform(self, node, inputs, output_storage):
+            x = inputs[0]
+            z = output_storage[0]
+            z[0] = x * 2
+
+        def infer_shape(self, fgraph, node, i0_shapes):
+            return i0_shapes
-    [[ 0.02443785 0.67833979 0.91954769 0.95444365]
-    [ 0.60853382 0.7770539 0.78163219 0.92838837]
-    [ 0.04427765 0.37895602 0.23155797 0.4934699 ]
-    [ 0.20551517 0.7419955 0.34500905 0.49347629]
-    [ 0.24082769 0.49321452 0.24566545 0.15351132]]
-    [[ 0.04887571 1.35667957 1.83909538 1.90888731]
-    [ 1.21706764 1.55410779 1.56326439 1.85677674]
-    [ 0.08855531 0.75791203 0.46311594 0.9869398 ]
-    [ 0.41103034 1.48399101 0.69001811 0.98695258]
-    [ 0.48165539 0.98642904 0.4913309 0.30702264]]
+
+        def grad(self, inputs, output_grads):
+            return [output_grads[0] * 2]
+
+        def R_op(self, inputs, eval_points):
+            # R_op can receive None as eval_points.
+            # That means there is no differentiable path through that input
+            # If this implies that you cannot compute some outputs,
+            # return None for those.
+            if eval_points[0] is None:
+                return eval_points
+            return self.grad(inputs, eval_points)
+
+    doubleOp2 = DoubleOp2()
 
 
 Example: :attr:`__props__` definition
 -------------------------------------
@@ -475,11 +405,10 @@ and ``b`` are equal.
 
 .. testcode:: properties
 
-    import pytensor
+    import pytensor.tensor as pt
     from pytensor.graph.op import Op
     from pytensor.graph.basic import Apply
 
-
    class AXPBOp(Op):
         """
         This creates an Op that takes x to a*x+b.
@@ -492,7 +421,7 @@ and ``b`` are equal.
             super().__init__()
 
         def make_node(self, x):
-            x = pytensor.tensor.as_tensor_variable(x)
+            x = pt.as_tensor_variable(x)
             return Apply(self, [x], [x.type()])
 
         def perform(self, node, inputs, output_storage):
@@ -500,8 +429,8 @@ and ``b`` are equal.
             z = output_storage[0]
             z[0] = self.a * x + self.b
 
-        def infer_shape(self, fgraph, node, i0_shapes):
-            return i0_shapes
+        def infer_shape(self, fgraph, node, input_shapes):
+            return input_shapes
 
         def grad(self, inputs, output_grads):
             return [self.a * output_grads[0]]
@@ -516,6 +445,9 @@ We can test this by running the following segment:
 
 .. testcode:: properties
 
+    import pytensor
+    import numpy as np
+
     mult4plus5op = AXPBOp(4, 5)
     another_mult4plus5op = AXPBOp(4, 5)
     mult2plus3op = AXPBOp(2, 3)
@@ -523,7 +455,7 @@ We can test this by running the following segment:
     assert mult4plus5op == another_mult4plus5op
     assert mult4plus5op != mult2plus3op
 
-    x = pytensor.tensor.matrix()
+    x = pt.matrix("x")
     f = pytensor.function([x], mult4plus5op(x))
     g = pytensor.function([x], mult2plus3op(x))
 
@@ -532,6 +464,211 @@ We can test this by running the following segment:
     assert np.allclose(2 * inp + 3, g(inp))
 
 
+To demonstrate the use of equality, we will define the following graph: ``mult4plus5op(x) + another_mult4plus5op(x) + mult2plus3op(x)``,
+and confirm that PyTensor infers it can reuse the first term in place of the second ``another_mult4plus5op(x)``.
+
+.. testcode:: exploiting equality
+
+    from pytensor.graph import rewrite_graph
+
+    graph = mult4plus5op(x) + another_mult4plus5op(x) + mult2plus3op(x)
+    print("Before:")
+    graph.dprint()
+
+    print("\nAfter:")
+    rewritten_graph = rewrite_graph(graph)
+    rewritten_graph.dprint()
+
+
+.. testoutput::
+
+    Before:
+    Add [id A]
+     ├─ Add [id B]
+     │  ├─ AXPBOp{a=4, b=5} [id C]
+     │  │  └─ x [id D]
+     │  └─ AXPBOp{a=4, b=5} [id E]
+     │     └─ x [id D]
+     └─ AXPBOp{a=2, b=3} [id F]
+        └─ x [id D]
+
+    After:
+    Add [id A]
+     ├─ AXPBOp{a=4, b=5} [id B]
+     │  └─ x [id C]
+     ├─ AXPBOp{a=4, b=5} [id B]
+     │  └─ ···
+     └─ AXPBOp{a=2, b=3} [id D]
+        └─ x [id C]
+
+Note how after rewriting, the same variable [id B] is used twice.
+Also, the string representation of the Op shows the values of the properties.
+
+
+Example: More complex :class:`Op`
+---------------------------------
+
+As a final example, we will create a multi-output :class:`Op` that takes a matrix and a vector and returns the matrix transposed and the sum of the vector.
+
+Furthermore, this :class:`Op` will work with batched dimensions, meaning we can pass in a 2D tensor, a 3D tensor, or more, and it will work as expected.
+To achieve this behavior we cannot use ``itypes`` and ``otypes``, as those encode a specific number of dimensions.
+Instead, we will have to define the ``make_node`` method.
+
+We need to be careful in the :meth:`grad` method, as one of the output gradients may be disconnected from the cost, in which case we should ignore its contribution.
+If both outputs are disconnected, PyTensor will not bother calling the :meth:`grad` method, so we don't need to worry about that case.
+
+.. testcode::
+
+    import numpy as np
+    import pytensor.tensor as pt
+
+    from pytensor.graph.op import Op
+    from pytensor.graph.basic import Apply
+    from pytensor.gradient import DisconnectedType
+
+    class TransposeAndSumOp(Op):
+        __props__ = ()
+
+        def make_node(self, x, y):
+            # Convert to TensorVariables (and fail if not possible)
+            x = pt.as_tensor_variable(x)
+            y = pt.as_tensor_variable(y)
+
+            # Validate input dimensions
+            if x.type.ndim < 2:
+                raise TypeError("x must be at least a matrix")
+            if y.type.ndim < 1:
+                raise TypeError("y must be at least a vector")
+
+            # Create output variables
+            out1_static_shape = (*x.type.shape[:-2], x.type.shape[-1], x.type.shape[-2])
+            out1_dtype = x.type.dtype
+            out1 = pt.tensor(dtype=out1_dtype, shape=out1_static_shape)
+
+            out2_static_shape = y.type.shape[:-1]
+            out2_dtype = "float64"  # hard-coded regardless of the input
+            out2 = pt.tensor(dtype=out2_dtype, shape=out2_static_shape)
+
+            return Apply(self, [x, y], [out1, out2])
+
+        def perform(self, node, inputs, output_storage):
+            x, y = inputs
+            out_1, out_2 = output_storage
+            out_1[0] = np.swapaxes(x, -1, -2)
+            out_2[0] = y.sum(-1).astype("float64")
+
+        def infer_shape(self, fgraph, node, input_shapes):
+            x_shapes, y_shapes = input_shapes
+            out1_shape = (*x_shapes[:-2], x_shapes[-1], x_shapes[-2])
+            out2_shape = y_shapes[:-1]
+            return [out1_shape, out2_shape]
+
+        def grad(self, inputs, output_grads):
+            x, y = inputs
+            out1_grad, out2_grad = output_grads
+
+            if isinstance(out1_grad.type, DisconnectedType):
+                x_grad = DisconnectedType()()
+            else:
+                # Transpose the last two dimensions of the output gradient
+                x_grad = pt.swapaxes(out1_grad, -1, -2)
+
+            if isinstance(out2_grad.type, DisconnectedType):
+                y_grad = DisconnectedType()()
+            else:
+                # Broadcast the output gradient to the same shape as y
+                y_grad = pt.broadcast_to(pt.expand_dims(out2_grad, -1), y.shape)
+
+            return [x_grad, y_grad]
+
+Let's test the Op evaluation:
+
+.. testcode::
+
+    import numpy as np
+    from pytensor import function
+
+    transpose_and_sum_op = TransposeAndSumOp()
+
+    x = pt.tensor("x", shape=(5, None, 3), dtype="float32")
+    y = pt.matrix("y", shape=(2, 1), dtype="float32")
+    x_np = np.random.normal(size=(5, 4, 3)).astype(np.float32)
+    y_np = np.random.normal(size=(2, 1)).astype(np.float32)
+
+    out1, out2 = transpose_and_sum_op(x, y)
+
+    # Test the output types
+    assert out1.type.shape == (5, 3, None)
+    assert out1.type.dtype == "float32"
+    assert out2.type.shape == (2,)
+    assert out2.type.dtype == "float64"
+
+    # Test the perform method
+    f = function([x, y], [out1, out2])
+    out1_np, out2_np = f(x_np, y_np)
+    np.testing.assert_allclose(out1_np, x_np.swapaxes(-1, -2))
+    np.testing.assert_allclose(out2_np, y_np.sum(-1))
+
+
+We should also test the shape inference:
+
+.. testcode::
+
+    out1_shape = out1.shape
+    out2_shape = out2.shape
+    shape_fn = function([x, y], [out1_shape, out2_shape])
+
+    out1_shape_np, out2_shape_np = shape_fn(x_np, y_np)
+    assert tuple(out1_shape_np) == out1_np.shape
+    assert tuple(out2_shape_np) == out2_np.shape
+
+    # We can introspect the compiled function to confirm the Op is not needed
+    shape_fn.dprint()
+
+.. testoutput::
+
+    MakeVector{dtype='int64'} [id A] 1
+     ├─ 5 [id B]
+     ├─ 3 [id C]
+     └─ Shape_i{1} [id D] 0
+        └─ x [id E]
+    DeepCopyOp [id F] 2
+     └─ [2] [id G]
+
+Finally, we can test the gradient implementation.
+Again, we can use the pytensor ``verify_grad`` utility, which compares the gradient against finite differences.
+Due to the presence of multiple outputs we need to pass a callable instead of the :class:`Op` instance.
+There are different cases we want to test: when both or just one of the outputs is connected to the cost.
+
+
+.. testcode::
+
+    import numpy as np
+    from pytensor.gradient import verify_grad
+
+    transpose_and_sum_op = TransposeAndSumOp()
+
+    def both_outs_connected(x, y):
+        out1, out2 = transpose_and_sum_op(x, y)
+        return out1.sum() + out2.sum()
+
+    def only_out1_connected(x, y):
+        out1, _ = transpose_and_sum_op(x, y)
+        return out1.sum()
+
+    def only_out2_connected(x, y):
+        _, out2 = transpose_and_sum_op(x, y)
+        return out2.sum()
+
+    rng = np.random.default_rng(seed=37)
+    x_np = rng.random((5, 4, 3)).astype(np.float32)
+    y_np = rng.random((2, 1)).astype(np.float32)
+    verify_grad(both_outs_connected, [x_np, y_np], rng=rng)
+    verify_grad(only_out1_connected, [x_np, y_np], rng=rng)
+    verify_grad(only_out2_connected, [x_np, y_np], rng=rng)
+
+You will see a warning about a :class:`DisconnectedType` being returned by the gradient method.
+PyTensor would like to know how the outputs of the :class:`Op` are connected to the inputs,
+which can be declared by implementing the :meth:`connection_pattern` method, as sketched below.
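+A minimal sketch of such a method for ``TransposeAndSumOp`` (assuming the outputs keep the order
+in which they are defined in ``make_node``) could simply declare that ``x`` only affects the
+transposed output and ``y`` only affects the summed output, returning one list of booleans per
+input with one entry per output:
+
+.. code-block:: python
+
+    class TransposeAndSumOp(Op):
+        ...
+
+        def connection_pattern(self, node):
+            # One list per input, one boolean per output:
+            # x (input 0) only feeds the transposed output,
+            # y (input 1) only feeds the summed output.
+            return [[True, False], [False, True]]
+
 How To Test it
 --------------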