Merge pull request #1 from kabilar/quality

JaerongA · web-flow · commit f0388961b667 · 2023-04-21T15:19:06.000-05:00
Update quality notebook
diff --git a/README.md b/README.md
@@ -26,17 +26,18 @@ The easiest way to learn about DataJoint Elements is to use the tutorial noteboo
 
 Here are some options that provide a great experience:
 
-- **Cloud-based IDE**: (*recommended*)
+- Cloud-based Development Environment: (*recommended*)
   - Launch using [GitHub Codespaces](https://github.com/features/codespaces) using the `+` option which will `Create codespace on main` in the codebase repository on your fork with default options. For more control, see the `...` where you may create `New with options...`.
   - Build time for a codespace is **~7m**. This is done infrequently and cached for convenience.
   - Start time for a codespace is **~30s**. This will pull the built codespace from cache when you need it.
   - *Tip*: Each month, GitHub renews a [free-tier](https://docs.github.com/en/billing/managing-billing-for-github-codespaces/about-billing-for-github-codespaces#monthly-included-storage-and-core-hours-for-personal-accounts) quota of compute and storage. Typically we run into the storage limits before anything else since Codespaces consume storage while stopped. It is best to delete Codespaces when not actively in use and recreate when needed. We'll soon be creating prebuilds to avoid larger build times. Once any portion of your quota is reached, you will need to wait for it to be reset at the end of your cycle or add billing info to your GitHub account to handle overages.
   - *Tip*: GitHub auto names the codespace but you can rename the codespace so that it is easier to identify later.
-- **Local IDE**:
-  - Ensure you have [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
-  - Ensure you have [Docker](https://docs.docker.com/get-docker/)
-  - Ensure you have [VSCode](https://code.visualstudio.com/)
-  - Install the [Dev Containers extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers)
+- Local Development Environment:
+  - Note: On Windows, running the tutorial notebook with the example data in a Dev Container is not currently possible due to an s3fs mounting issue. Please use the `Cloud-based Development Environment` option above.
+  - Install [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
+  - Install [Docker](https://docs.docker.com/get-docker/)
+  - Install [VSCode](https://code.visualstudio.com/)
+  - Install the VSCode [Dev Containers extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers)
   - `git clone` the codebase repository and open it in VSCode
   - Use the `Dev Containers extension` to `Reopen in Container` (More info in the `Getting started` included with the extension)
 
diff --git a/notebooks/quality_metrics.ipynb b/notebooks/quality_metrics.ipynb
@@ -6,7 +6,7 @@
    "source": [
     "## Quality Metrics\n",
     "\n",
-    "Visualize the spike sorting quality metrics that are generated from Kilosort (`metrics.csv`) and stored in the DataJoint pipeline (`element-array-ephys`).\n",
+    "Visualize the spike sorting quality metrics that are generated from the [Kilosort](https://github.com/MouseLand/Kilosort) results with the [ecephys_spike_sorting](https://github.com/datajoint/ecephys_spike_sorting) package (i.e. `metrics.csv`) and stored in the DataJoint pipeline (i.e. `element-array-ephys`).\n",
     "\n",
     "If you are new to using this DataJoint pipeline for analyzing electrophysiology recordings from Neuropixels probes, please see the [tutorial](./tutorial.ipynb) notebook for an in-depth explanation to set up and run the workflow.\n",
     "\n",
@@ -59,10 +59,10 @@
     "\n",
     "| Metric | Description |\n",
     "| --- | --- |\n",
-    "| Firing rates (Hz) | Total number of spikes per time in seconds. |\n",
+    "| Firing rate (Hz) | Total number of spikes per second. |\n",
     "| Signal-to-noise ratio | Ratio of the maximum amplitude of the mean spike waveform to the standard deviation of the background noise on a given channel. |\n",
     "| Presence ratio | Proportion of time during a session that a unit is spiking, ranging from 0 to 0.99. |\n",
-    "| ISI (Interspike interval) violation | Rate of ISI violation as a fraction of overall rate. |\n",
+    "| Interspike interval (ISI) violation | Rate of ISI violation as a fraction of overall rate. |\n",
     "| Number violation | Total number of ISI violations. |\n",
     "| Amplitude cut-off | False negative rate of a unit measured by the degree to which its distribution of spike amplitudes is truncated, indicating the fraction of missing spikes. An amplitude cutoff of 0.1 indicates approximately 10% missing spikes. |\n",
     "| Isolation distance | A metric that uses the principal components (PCs) of a unit's waveforms, which are projected into a lower-dimensional PC space after spike sorting. This quantifies how well-isolated the unit is from other potential clusters. |\n",
@@ -488,7 +488,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### Plot histograms of cluster metrics. "
+    "Plot histograms of the cluster metrics."
    ]
   },
   {
@@ -497,13 +497,11 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Plotting function\n",
-    "def plot_metric(ax, data, bins, x_axis_label=None, title=None, color='k',       max_value=-1, smoothing=True, density=False):\n",
+    "def plot_metric(ax, data, bins, x_axis_label=None, title=None, color='k', smoothing=True, density=False):\n",
     "    \"\"\"A function modified from https://allensdk.readthedocs.io/en/latest/_static/examples/nb/ecephys_quality_metrics.html\n",
     "    \"\"\"\n",
     "    from scipy.ndimage import gaussian_filter1d\n",
-    "    if any(data) and np.nansum(data) :\n",
-    "        # Plot data\n",
+    "    if any(data) and np.nansum(data):\n",
     "        h, b = np.histogram(data, bins=bins, density=density)\n",
     "        x = b[:-1]\n",
     "\n",
@@ -532,7 +530,6 @@
     }
    ],
    "source": [
-    "# Plot the results\n",
     "fig, axes = plt.subplots(4, 4, figsize=(12, 9))\n",
     "axes = axes.flatten()\n",
     "plt.suptitle(f\"Cluster Quality Metrics for {key}\", y=.99, fontsize=12)\n",
@@ -561,13 +558,13 @@
     "# Number Violation\n",
     "data = query.fetch(\"number_violation\")\n",
     "bins = np.linspace(0, 1000, 100)\n",
-    "plot_metric(axes[4], data, bins, title=\"Number violation\")\n",
+    "plot_metric(axes[4], data, bins, title=\"Number Violation\")\n",
     "axes[4].set_ylabel(\"Count\")\n",
     "\n",
     "# Amplitude Cutoff\n",
     "data = query.fetch(\"amplitude_cutoff\")\n",
     "bins = np.linspace(0, 0.5, 100)\n",
-    "plot_metric(axes[5], data, bins, title=\"Amplitude cutoff\")\n",
+    "plot_metric(axes[5], data, bins, title=\"Amplitude Cutoff\")\n",
     "\n",
     "# Isolation Distance\n",
     "data = query.fetch(\"isolation_distance\")\n",
@@ -585,15 +582,15 @@
     "plot_metric(axes[8], data, bins, title=\"d-Prime\")\n",
     "axes[8].set_ylabel(\"Count\")\n",
     "\n",
-    "# Nearest-Neighbor Hit Rate\n",
+    "# Nearest-Neighbors Hit Rate\n",
     "data = query.fetch(\"nn_hit_rate\")\n",
     "bins = np.linspace(0, 1, 100)\n",
-    "plot_metric(axes[9], data, bins, title=\"Nearest Neighbor Hit Rate\")\n",
+    "plot_metric(axes[9], data, bins, title=\"Nearest-Neighbors Hit Rate\")\n",
     "\n",
-    "# Nearest-Neighbor Miss Rate\n",
+    "# Nearest-Neighbors Miss Rate\n",
     "data = query.fetch(\"nn_miss_rate\")\n",
     "bins = np.linspace(0, 1, 100)\n",
-    "plot_metric(axes[10], data, bins, title=\"Nearest Neighbor Miss Rate\")\n",
+    "plot_metric(axes[10], data, bins, title=\"Nearest-Neighbors Miss Rate\")\n",
     "\n",
     "# Silhouette Score\n",
     "data = query.fetch(\"silhouette_score\")\n",
@@ -624,14 +621,14 @@
     "\n",
     "| Metric | Description |\n",
     "| -- | -- |\n",
-    "| `Amplitude (μV)` | Absolute difference between the waveform peak and trough. |\n",
-    "| `Duration (ms)` | Time interval between the waveform peak and trough. |\n",
-    "| `Peak-to-Trough (PT)  Ratio` | Absolute amplitude of the peak divided by the absolute amplitude of the trough relative to 0. |\n",
-    "| `Repolarization Slope` | Slope of the fitted regression line to the first 30μs from trough to peak. |\n",
-    "| `Recovery Slope` | Slope of the fitted regression line to the first 30μs from peak to tail. |\n",
-    "| `Spread (μm)` | Spatial extent of channels where the waveform amplitude exceeds 12% of the peak amplitude. |\n",
-    "| `Velocity Above (s/m)` | Inverse velocity of waveform propagation from the soma toward the top of the probe. |\n",
-    "| `Velocity Below (s/m)` | Inverse velocity of waveform propagation from the soma toward the bottom of the probe. |"
+    "| Amplitude (μV) | Absolute difference between the waveform peak and trough. |\n",
+    "| Duration (ms) | Time interval between the waveform peak and trough. |\n",
+    "| Peak-to-Trough (PT)  Ratio | Absolute amplitude of the peak divided by the absolute amplitude of the trough relative to 0. |\n",
+    "| Repolarization Slope | Slope of the fitted regression line to the first 30μs from trough to peak. |\n",
+    "| Recovery Slope | Slope of the fitted regression line to the first 30μs from peak to tail. |\n",
+    "| Spread (μm) | Spatial extent of channels where the waveform amplitude exceeds 12% of the peak amplitude. |\n",
+    "| Velocity Above (s/m) | Inverse velocity of waveform propagation from the soma toward the top of the probe. |\n",
+    "| Velocity Below (s/m) | Inverse velocity of waveform propagation from the soma toward the bottom of the probe. |"
    ]
   },
   {
@@ -944,6 +941,14 @@
     "query"
    ]
   },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Plot histograms of the waveform metrics."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 8,
@@ -976,7 +981,7 @@
     "bins = np.linspace(0, 3, 100)\n",
     "plot_metric(axes[1], data, bins, title=\"Duration (ms)\")\n",
     "\n",
-    "# PT Ratio\n",
+    "# Peak-to-Trough Ratio\n",
     "data = query.fetch(\"pt_ratio\")\n",
     "bins = np.linspace(0, 1, 100)\n",
     "plot_metric(axes[2], data, bins, title=\"Peak-to-Trough Ratio\")\n",
diff --git a/tests/conftest.py b/tests/conftest.py
@@ -336,8 +336,8 @@ def ingest_data(setup, pipeline, test_data):
 def testdata_paths():
     """Paths for test data 'subjectX/sessionY/probeZ/etc'"""
     return {
-        "npx3A-p1-ks": "subject5/session1/probe_1/ks2.1_01",
-        "npx3A-p2-ks": "subject5/session1/probe_2/ks2.1_01",
+        "npx3A-p1-ks": "subject5/session1/probe_1/kilosort2-5_1",
+        "npx3A-p2-ks": "subject5/session1/probe_2/kilosort2-5_1",
         "oe_npx3B-ks": "subject4/experiment1/recording1/continuous/"
         + "Neuropix-PXI-100.0/ks",
         "sglx_npx3A-p1": "subject5/session1/probe_1",