`+` delimited dual index barcodes, and arbitrary raw barcodes #62
@theJasonFan minor nitpicks, please fixup, then merge at your leisure.
src/lib/metrics.rs
Outdated
@@ -436,10 +436,11 @@ impl DemuxedGroupSampleMetrics {
best_barcode_template_count: usize,
sample_metadata: &SampleMetadata,
) -> SampleMetricsProcessed {
let barcode = Self::get_metrics_barcode(sample_metadata);
any reason not to inline this below?
Inlined. Please re-review.
src/lib/metrics.rs
Outdated
@@ -457,6 +458,19 @@ impl DemuxedGroupSampleMetrics {
/ self.base_qual_counter.index_bases_total_seen as f64,
}
}

fn get_metrics_barcode(sample_metadata: &SampleMetadata) -> String {
- might as well add some function documentation
- why not make this a method on `SampleMetadata`?
Moved. Please re-review. Note that moving this to `SampleMetadata` highlights how we should be better about validating the struct.
.map(|(barcode, count)| {
let b1 = BString::from(&barcode[..b1_len]);
let b2 = BString::from(&barcode[b1_len..]);
let barcode = BString::from(format!("{}+{}", b1, b2));
@nh13 @jiangweiyao, this is where hopped barcodes get delimited.
- Now we have two places where this formatting occurs, once in the sample metadata and once in the metrics module. Can you refactor it into one place, please, and call it from both?
- What happens if `b2` is empty? I don't think this is doing the right thing.
- I strongly agree that this is a code smell arising from the different barcode representations. Finding a solution might take more time and effort.
- For hop metrics, we know that the reported barcodes can be split and delimited, because hop metrics are only used for dual index barcodes. The `SampleBarcodeHopTracker` only updates counts for delimited and hopped barcodes. See singular-demux/src/lib/metrics.rs, line 655 in e9706b6:
let index1a = &barcode[0..self.checker.index1_length];
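A minimal sketch of the single shared helper requested above, which also covers the empty-`b2` case. The name `delimit_barcode` and its signature are hypothetical and not part of singular-demux; it uses only the standard library rather than `BString`, and it assumes a single-index barcode should be reported without a delimiter.

```rust
/// Hypothetical helper (not from the PR): delimit a concatenated dual-index
/// barcode with '+'. `index1_len` is the length of the first index.
/// If either segment is empty (e.g. a single-index run), no delimiter is added.
fn delimit_barcode(barcode: &[u8], index1_len: usize) -> String {
    // Clamp the split point so an oversized length cannot panic when slicing.
    let split = index1_len.min(barcode.len());
    let (b1, b2) = barcode.split_at(split);
    let b1 = String::from_utf8_lossy(b1);
    let b2 = String::from_utf8_lossy(b2);
    if b1.is_empty() || b2.is_empty() {
        // Only one segment is present: return it undelimited.
        format!("{}{}", b1, b2)
    } else {
        format!("{}+{}", b1, b2)
    }
}

fn main() {
    println!("{}", delimit_barcode(b"TTTAAA", 3)); // dual index
    println!("{}", delimit_barcode(b"TTT", 3)); // single index, empty second segment
}
```

Calling one function like this from both the sample metadata and the metrics module would keep the two outputs consistent, whatever the final representation turns out to be.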
src/lib/run.rs
Outdated
let delim = DelimFile::default();
let hop_metrics: Vec<BarcodeCount> = delim.read_tsv(&hop_metrics).unwrap();
assert_eq!(hop_metrics.len(), 1);
assert_eq!(hop_metrics[0].barcode, "TTT+AAA");
@nh13 @jiangweiyao This test demultiplexes a small example of dual indexed data in which a single read hops a pair of sample barcodes. We check for that barcode in the output `barcode_hop_metrics.tsv`.
assert_eq!(per_sample_metrics.len(), 3);

assert_eq!(per_sample_metrics[0].barcode_name, "s1");
assert_eq!(per_sample_metrics[0].barcode, "TTT+AAA");
@nh13 @jiangweiyao This test demultiplexes a small example of dual indexed data. We check for the barcode in the output `per_sample_metrics`. Admittedly, the test does not exhaustively check all the metrics (a test for that is above in the source). The focus here is on (a) the actual delimited barcode and (b) the number of matches.
assert_eq!(per_sample_metrics[1].templates, 0);

assert_eq!(per_sample_metrics[2].barcode_name, "Undetermined");
assert_eq!(per_sample_metrics[2].barcode, "NNNNNN");
@nh13 @jiangweiyao note that the "undetermined" barcode is not delimited.
/// **Note**: this expects that `self` is a valid `SampleMetadata` struct, i.e., one obtained after
/// an update and sanitization with `update_with_and_set_demux_barcode()`, or a sentinel
/// value where the barcode is all Ns.
pub fn get_semantic_barcode(&self) -> BString {
@nh13 refactored this into `SampleMetadata` and added a docstring per your previous comment.
You will want to use this in metrics.rs, given that you are creating the unmatched barcode there. Sorry, you may have to revert this so that it is no longer a method on `SampleMetadata`.
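For illustration, a rough sketch of what a free function usable from both `SampleMetadata` and metrics.rs might look like. The name `semantic_barcode`, the slice-of-segments input, and the use of `String` instead of `BString` are all assumptions made for this example and are not taken from the PR. It leaves the all-N sentinel undelimited, matching the "Undetermined" behavior noted earlier in this thread.

```rust
/// Hypothetical free function (not from the PR): build a '+'-delimited
/// barcode from its segments. An all-N barcode is treated as the
/// "undetermined" sentinel and returned without a delimiter.
fn semantic_barcode(segments: &[&str]) -> String {
    let joined: String = segments.concat();
    // Sentinel check: every base is 'N' (assumed representation of undetermined).
    if !joined.is_empty() && joined.bytes().all(|b| b == b'N') {
        return joined;
    }
    // Skip empty segments so a single-index sample gets no stray '+'.
    segments
        .iter()
        .filter(|s| !s.is_empty())
        .copied()
        .collect::<Vec<&str>>()
        .join("+")
}

fn main() {
    println!("{}", semantic_barcode(&["CCCCC", "AAAAA"]));
    println!("{}", semantic_barcode(&["NNN", "NNN"]));
    println!("{}", semantic_barcode(&["GGGG"]));
}
```

Because it takes plain segments rather than a struct, the same function could be called when formatting unmatched barcodes, where no `SampleMetadata` exists.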
@@ -788,4 +811,51 @@ Sample2,GGGG
];
assert_eq!(actual, expected);
}
#[test]
fn test_get_semantic_barcode() {
@nh13 Moved and renamed this test after your suggestion to move the method into SampleMetadata
@nh13 @jiangweiyao ready for re-review
@jiangweiyao Also updated to use delimited barcodes in |
src/lib/run.rs
Outdated
let delim = DelimFile::default();
let unmatched_metrics: Vec<BarcodeCount> = delim.read_tsv(&unmatched_metrics).unwrap();
assert_eq!(unmatched_metrics.len(), 1);
assert_eq!(unmatched_metrics[0].barcode, "TTT+AAA");
We also check the unmatched barcode
Probably need to test unmatched metrics for a single-barcode (not dual) given the above comment
I think the unmatched delimiting of the barcode has a bug. Can you take a look?
@nh13 added a test for unmatched barcodes in a single-index run. The other end-to-end tests should cover the remaining single-index metrics with matched barcodes.
LGTM!
tests: simple test for per sample and barcode hop metrics
tests: add test for unmatched single index barcode
0588ccd to a48d6c9
Addresses #60.
We now call a barcode delimited by "+" a "semantic" barcode: a representation that maps one-to-one between a fixed-length barcode (e.g., `AAA+CCC`) and its read structure `3B3B`.
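The mapping from a semantic barcode to its read structure can be sketched as follows. The function name `read_structure_of` is hypothetical and not part of the PR; the sketch assumes fixed-length ASCII segments separated by `+`, so it is not meaningful for arbitrary raw barcodes that may themselves contain `+`.

```rust
/// Hypothetical helper (not from the PR): derive a read structure such as
/// "3B3B" from a '+'-delimited semantic barcode such as "AAA+CCC".
fn read_structure_of(semantic_barcode: &str) -> String {
    semantic_barcode
        .split('+')
        // Ignore empty segments, e.g. from a trailing delimiter.
        .filter(|segment| !segment.is_empty())
        // Each segment contributes "<length>B" (B = sample barcode bases).
        .map(|segment| format!("{}B", segment.len()))
        .collect()
}

fn main() {
    println!("{}", read_structure_of("AAA+CCC"));
    println!("{}", read_structure_of("GGGG"));
}
```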
Tasks
Details for per-sample metrics
When using sample sheets, dual index barcodes are reported in metrics with a `+` delimiter. For example, the sample sheet:
yields a barcode for `s1` that appears as `CCCCC+AAAAA`.
When using the two-column format, arbitrary barcodes are allowed. For example:
yields a barcode for `s1` that appears as `TTT+=!-AAA`.
Details for sample barcode hop and unmatched barcode metrics
Barcodes now also appear in their delimited semantic representations.
Tests
`SampleMetadata`, the above-mentioned behaviors are now tested and illustrated by explicit test cases.
Future Todos
- Design a schema for displaying and reading multi-segment barcodes.
- Validate multi-segment barcodes. As reflected in the tests and implementation, we currently allow arbitrary barcodes.