Two-step hybrid refactoring #34

maximskorik · 2022-04-20T14:52:28Z

added test data for two-step hybrid
switched most of data handling functionality to dplyr framework
refactored two-step hybrid into individual steps
extracted repeating methods in adjust.time.R and feature.align.R
reformatted semi.sup.R and two.step.hybrid.R

hechth

Amazing work! Couple small changes to fix some possible sources of error and this can be merged!

I'll run some additional tests manually later as well and see if everything runs smoothly.

R/peak.characterize.R

R/two.step.hybrid.R

hechth · 2022-04-25T06:32:15Z

R/two.step.hybrid.R

+  for (batch_id in batches_idx)
+  {
+    this.fake <- step_one_features[[batch_id]]
+    this.fake.medians <- distinct(this.fake, mz, rt, intensity)$intensity


Are these really the medians? If so, how do the distinct intensities form the medians?

maximskorik · 2022-05-04

At this step, this.fake is a long table in which the intensity column holds the median intensities. Each feature is repeated across multiple rows (one row per sample), and the value in the intensity column is the same for all samples of a given feature, so distinct selects the median intensity of each unique feature.
In retrospect, naming a column that stores median intensities intensity is confusing. Fixed that in 976e416.

hechth · 2022-05-05

Cool. How come the distinct(...) function actually computes the median intensities? I thought it only selects the unique values in the list?

maximskorik · 2022-05-05

It doesn't compute the medians. The medians are already computed and stored in the intensity column. distinct(...) only selects the median intensity of each unique feature.

maximskorik · 2022-05-07

Refactored this part in 99ae4f5.
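
The point of this thread can be illustrated with a minimal sketch on a made-up long table. The data and the column names sample, sample_intensity, and median_intensity are assumptions for illustration only, not the package's actual schema; only the dplyr::distinct call pattern is taken from the diff above.

```r
library(dplyr)

# Hypothetical long-format feature table: one row per (feature, sample).
# The per-feature median was computed upstream, so it repeats across samples.
long_features <- data.frame(
  mz               = c(100.05, 100.05, 100.05, 250.10, 250.10, 250.10),
  rt               = c(12.3, 12.3, 12.3, 48.7, 48.7, 48.7),
  sample           = rep(c("s1", "s2", "s3"), times = 2),
  sample_intensity = c(900, 1000, 1300, 400, 500, 450),
  median_intensity = c(1000, 1000, 1000, 450, 450, 450)
)

# distinct() performs no aggregation. It keeps the first row of each unique
# (mz, rt, median_intensity) combination, which here is one row per feature.
feature_medians <- distinct(long_features, mz, rt, median_intensity)$median_intensity

# Applied to a column that differs between samples, nothing is collapsed.
per_sample_rows <- nrow(distinct(long_features, mz, rt, sample_intensity))
```

In this sketch feature_medians has one entry per feature, while per_sample_rows equals the number of rows in the input, which shows that distinct only deduplicates values that were already stored.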


hechth · 2022-04-25T06:36:26Z

R/two.step.hybrid.R

+    colnames(final.ftrs) <- stringr::str_remove_all(colnames(final.ftrs), ".mzml")
+    colnames(final.times) <- stringr::str_remove_all(colnames(final.times), ".mzml")


Is this required, to strip the file ending? If so, this should be done more robustly.

maximskorik · 2022-05-04

No, it's not required. Having a file extension appended to a sample name in the table just doesn't seem pleasant or valuable.

hechth · 2022-05-05

Then this should be changed to strip any file extension, since there could be extensions other than ".mzml".

maximskorik · 2022-05-07

Done in 4e89fa3.


hechth

Looks good!

hechth · 2022-05-09T06:26:03Z

R/two.step.hybrid.R

+    colnames(final.ftrs)[sample_cols_idx] <- tools::file_path_sans_ext(colnames(final.ftrs)[sample_cols_idx])
+    colnames(final.times)[sample_cols_idx] <- tools::file_path_sans_ext(colnames(final.times)[sample_cols_idx])


Nice way to exclude the file endings!
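
For readers unfamiliar with tools::file_path_sans_ext, here is a minimal sketch of the approach on a made-up table. The table contents, the sample file names, and the way sample_cols_idx is derived are assumptions for illustration; the diff does not show how the package computes that index.

```r
# Hypothetical wide feature table with sample columns named after input files.
final_ftrs <- data.frame(
  c(100.05, 250.10),
  c(12.3, 48.7),
  c(1000, 450),
  c(980, 470),
  c(1020, 440)
)
colnames(final_ftrs) <- c("mz", "rt", "sample1.mzML", "sample2.mzXML", "sample3.raw.cdf")

# Assumption: every column other than mz and rt is a sample column.
sample_cols_idx <- which(!colnames(final_ftrs) %in% c("mz", "rt"))

# file_path_sans_ext() drops the last alphanumeric extension, whatever it is,
# so the call does not depend on one particular file type or letter case.
colnames(final_ftrs)[sample_cols_idx] <-
  tools::file_path_sans_ext(colnames(final_ftrs)[sample_cols_idx])
```

Only the final extension is removed, so a name with two dots keeps its first part plus the inner suffix. Restricting the call to the sample columns presumably avoids altering metadata column names that happen to contain a dot.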

maximskorik added 30 commits February 9, 2022 13:05

init two-step hybrid test

1a4fab9

update library calls to rcx aplcms

0ce91c9

complete two-step-hybrid test

01d7d65

reformat two.step.hybrid.R

e8239d4

add all test data to two-step-hybrid test

8d72735

reformat inputs and comments

603e2be

refactor semi.sup arg list

61b49dc

add colnames to metadata table

2870cc7

add first step

17069ae

add get_sample_names to NAMESPACE

a26cc0c

add vim

9c4fdce

add cmd-line tools

24ba0db

capitalize dockerfile name

6149939

reformat indentation

ef5d7fa

add batch-splitting; add dplyr to dependencies

1d056f9

extract intensity-median computation

7306bff

reformat function calls

a940dd3

fit function for tibble dataframes

f4ba921

refactor mz, rt, and batch label binding

d5bbd38

remove feature indicies from tables

7ccdfe7

adjust for varying rt colname

d47e7ce

add common colname to cbind to ensure proper rbind

b3e97e8

extract repeating method

3b33103

rename function

d5064ef

put feature.align in wrapper

8d1971d

omit warnings

196f45b

move median-intensity computation inside loop

ec08b5e

reformat table

571dc7a

edit stdout message

ee18f6b

change variable names for iteration

f02abea

maximskorik added 3 commits April 21, 2022 11:32

save feature table in long format

03935cc

bump pkg version

dc7e3f5

add function to namespace

d886329

hechth requested changes Apr 25, 2022


maximskorik added 18 commits May 2, 2022 16:19

remove redundant parallelization

527f01c

make tibble tables for proper naming

93efa2a

move metadata csv-reader out

1ec4211

move table reformatting to utils

a1106cc

remove utility methods from NAMESPACE

3970822

expose a function for testthat

0e79bc9

remove deprecated test

fd80dd9

clear cache after test

8703326

rename median intensity column

976e416

replace indicies with column names

0b2fed0

add descriptive method name

c6af1df

iterate through batch indicies

000aa86

add sorting back to NAMESPACE

81886ef

add corrected features to the output

78b2a54

add aligned table to the output

d122cc0

make robust extention stripping

4e89fa3

reformat aligned_features output

dd29e1b

reformat adjust.time input and revert 7d0656d

99ae4f5

hechth approved these changes May 9, 2022


xtrojak approved these changes May 9, 2022


add known table to the output

e2142ab

maximskorik mentioned this pull request May 9, 2022

Add apLCMS Two-Step Hybrid wrapper RECETOX/galaxytools#264

Merged

2 tasks

maximskorik added 2 commits May 9, 2022 19:57

complete identifiers list

2a4df15

bump correct version

78065ba

maximskorik merged commit 1301763 into RECETOX:master May 10, 2022

maximskorik mentioned this pull request Jun 2, 2022

Implement two.step.hybrid method #23

Closed
