minor improvements #3067

Merged: 3 commits into qiita-spots:dev on Jan 29, 2021

Conversation

antgonza
Member

Two minor improvements:

  1. Improve the performance of `_common_to_dataframe_steps` (~16 seconds faster for AGP):

     In [50]: %timeit SampleTemplate(10317).to_dataframe()
     1min 8s ± 224 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

     In [51]: %timeit ST(10317).to_dataframe()
     51.8 s ± 259 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

  2. It turns out that jobs append the node where they were submitted, so the previous list comprehension missed jobs submitted from the workers.
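A minimal sketch of the second point (the job records and node names here are hypothetical, not Qiita's actual code): if each job carries the node it was submitted from, a comprehension that only matches the main node silently drops jobs submitted from workers.

```python
# Hypothetical job records; 'node' is where each job was submitted from.
jobs = [
    {'id': 'j1', 'node': 'main'},
    {'id': 'j2', 'node': 'worker-1'},  # submitted from a worker
    {'id': 'j3', 'node': 'main'},
]

# Buggy pattern: filtering on a single known node misses 'j2'.
main_only = [j['id'] for j in jobs if j['node'] == 'main']

# Fixed pattern: do not filter on the submitting node at all.
all_jobs = [j['id'] for j in jobs]
```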

@antgonza antgonza requested a review from ElDeveloper January 26, 2021 19:25
@codecov-io commented Jan 26, 2021

Codecov Report

Merging #3067 (a5b9be5) into dev (36faa57) will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##              dev    #3067   +/-   ##
=======================================
  Coverage   95.02%   95.03%           
=======================================
  Files          74       74           
  Lines       14279    14295   +16     
=======================================
+ Hits        13569    13585   +16     
  Misses        710      710           
Impacted Files Coverage Δ
qiita_db/metadata_template/constants.py 100.00% <ø> (ø)
qiita_db/metadata_template/prep_template.py 99.11% <ø> (+0.04%) ⬆️
qiita_db/__init__.py 100.00% <100.00%> (ø)
qiita_db/meta_util.py 88.62% <100.00%> (ø)
...ita_db/metadata_template/base_metadata_template.py 96.25% <100.00%> (-0.08%) ⬇️
qiita_db/metadata_template/sample_template.py 100.00% <100.00%> (ø)
...ta_db/metadata_template/test/test_prep_template.py 99.55% <100.00%> (+<0.01%) ⬆️
..._db/metadata_template/test/test_sample_template.py 99.86% <100.00%> (+<0.01%) ⬆️
qiita_db/processing_job.py 73.80% <100.00%> (ø)
qiita_db/study.py 94.11% <100.00%> (ø)
... and 4 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e6e5d80...a5b9be5.

# Make sure that we are replacing np.NaN with None
df = pd.DataFrame(qdb.sql_connection.TRN.execute_fetchindex())
df = pd.concat([df.drop(1, axis=1),
                pd.DataFrame(df[1].tolist())], axis=1)
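As an aside on the `np.NaN`-to-`None` comment above, a generic way to do that conversion in pandas (a sketch, not necessarily the code Qiita uses) is to cast to object dtype and use `DataFrame.where`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan], 'b': ['x', None]})
# Cast to object first so None can be stored, then keep non-null values
# and substitute None everywhere the frame is null.
clean = df.astype(object).where(df.notnull(), None)
```

After this, `clean` holds real `None` values instead of `NaN`, which is what the comment in the diff above is after.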
Contributor

Would it be possible to do pd.concat([df.drop(1, axis=1), df[1]])? It's not really obvious why the new DataFrame construction or the tolist is needed. Oh wait, is this supposed to be a reorder operation moving the first column to the end of the frame? If so, then I think this does it:

cols = [df.columns[0]] + df.columns[2:].tolist() + [df.columns[1]]
df = df[cols]

Member Author

Good question. The metadata tables are basically 2 columns: sample_id, metadata (json, which is returned as a dict). In other words, the SQL will return [[sample_id, {'col_a': 'a', 'col_b': 'b'}, ...], [sample_id, {'col_a': 'a', 'col_b': 'b'}, ...], ...] so that line basically expands the dict into actual columns in the dataframe, like this:

In [61]: data = ['x', {'a': 1, 'b': 2}], ['y', {'a': 1, 'c': 2}]                                                                                                                                                                                                                            

In [62]: df = pd.DataFrame(data)                                                                                                                                                                                                                                                            

In [63]: df                                                                                                                                                                                                                                                                                 
Out[63]: 
   0                 1
0  x  {'a': 1, 'b': 2}
1  y  {'a': 1, 'c': 2}

In [64]: pd.concat([df.drop(1, axis=1), pd.DataFrame(df[1].tolist())], axis=1)                                                                                                                                                                                                              
Out[64]: 
   0  a    b    c
0  x  1  2.0  NaN
1  y  1  NaN  2.0

Contributor

Okay, thanks. What about this? It's a little less juggling and you get the index "for free"?

>>> import pandas as pd
>>> data = ['x', {'a': 1, 'b': 2}], ['y', {'a': 1, 'c': 2}]
>>> pd.DataFrame([d for _, d in data], index=[i for i, _ in data])
   a    b    c
x  1  2.0  NaN
y  1  NaN  2.0

Member Author

Thanks, this made the code a bit faster too. If you agree with the changes, and the tests pass, could you merge?

@ElDeveloper
Contributor

These changes look good. @antgonza, I'm not familiar with the pgsql JSON side of things, but this seems like the kind of thing that pgsql could probably do much faster than Pandas/Python. Did you happen to look into querying the data directly into table form, rather than querying a JSON string and then converting it to a table?
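For reference, the server-side approach being suggested could look roughly like this (a hypothetical sketch using PostgreSQL's `jsonb_each_text`; the table and column names are made up and are not Qiita's actual schema):

```sql
-- Expand each sample's JSON metadata into (sample_id, key, value) rows
-- inside PostgreSQL, instead of shipping the JSON to Python first.
SELECT s.sample_id, m.key, m.value
FROM sample_metadata s,
     jsonb_each_text(s.metadata) AS m(key, value);
```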

I'm looking into it now, but just want to make sure I shared this observation.

@ElDeveloper
Contributor

Following up on this @antgonza shared some benchmarks showing that the pgsql implementation was in fact slower than the changes in this PR.

@ElDeveloper ElDeveloper merged commit f3fefc6 into qiita-spots:dev Jan 29, 2021
@ElDeveloper
Contributor

Thanks @wasade and @antgonza

antgonza added a commit that referenced this pull request Mar 19, 2021
* adding creation and modification timestamps to prep_templates (#3066)

* adding creation and modification timestamps to prep_templates

* Apply suggestions from code review [skip ci]

Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>

Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>

* fix #3068 (#3069)

* minor improvements (#3067)

* minor improvements

* fix test

* adding @wasade suggestion

* minor fixes (#3073)

* fix #3072 (#3074)

* fix #3072

* flake8

* fix #3076 (#3077)

* fix #3076

* addressing @ElDeveloper comments

* fix #3070 (#3075)

* fix #3070

* Apply suggestions from code review

Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>

* flake8

Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>

* fix-3064 (#3080)

* Create qiita-ci.yml

* add passwd to psql

* nginx + redis

* redis via conda

* redis-server - -> redis-server --

* daemonize -> daemonize

* conda libevent

* apt-get libevent-dev

* adding multiple sections

* rm travis_retry

* add missing activate

* onfig_test.cfg

* rm indent

* add 1 indent

* pgport

* add drop

* change db name

* redis version

* redis -> redbiom version

* split tests

* source .profile

* adding ssh command

* mkdir ~/.ssh/

* only ssh

* pushing to retrigger

* add run

* split ssh conf/run

* 60367309

* add bash

* fix ident

* rm ssh cmd

* strict=no

* localhost

* mod permissions

* user ssh

* rm keys

* echos

* start ssh service

* ssh -vvv

* split setup/test

* localhost -> 127

* localhost -> 127

* rm ssh setup

* -o StrictHostKeyChecking=no

* change key creation

* echo sshd_config

* readd ssh config

* cp sshd_config and then move

* rm echos

* extra qiita folder

* ls test_data

* rm qiita folder

* use key in .ssh

* more ssh params

* tee -> >

* chmod

* fullpath

* ls key

* rm localhost

* params

* rsa -> rsa-sha2-256

* rm root login

* readd -o StrictHostKeyChecking=no

* minimizing params

* permissions

* cleaning up

* comments => echo

* -o StrictHostKeyChecking=no

* export REDBIOM_HOST=http://127.0.0.1:7379

* localhost -> 127.0.0.1

* add runner

* cover package

* cleaning env vars

* renaming jobs

* redbiom host

* add env vars via export

* changing echo

* add env COVER_PACKAGE

* rm user sftp

* just qiita_db

* open_sftp -> SFTPClient

* +1 worker

* only SCPClient

* ssh.SCPClient -> SCPClient

* return sftp/ssh back

* just qiita_db

* rm sftp

* add all tests

* BASE_URL = https://127.0.0.1:8383

* bash -c

* start_biom

* start_biom 127

* populate QIITA_SERVER_CERT

* nginx 127

* adding ports to conf

* lowering ports

* service qiita

* 8383 postgres

* curl test

* mkdir nginx

* moving tests around

* moving curl up

* echos for curl

* sleep 10

* 127.0.0.1 -> localhost

* returning some localhost to 127...

* qiita_pet!

* final clean up

* kill qiita first

* addressing @ElDeveloper comments

* checkout repo for flake8

* split cover_package

* Workflows GUI (#3078)

* fix #3076

* addressing @ElDeveloper comments

* adding workflows GUI

* improve HTML

* db changes and improvements

* comment out new db changes

* rm default_workflows from PluginHandler.get

* Apply suggestions from code review [skip ci]

Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>

Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>

* fix #3081 (#3082)

* fix #3081

* generate_analysis_list_per_study to speed up retrieval

* minor modifications

* split sql

* addressing @ElDeveloper comments

* add Command.processing_jobs (#3085)

* add Command.processing_jobs

* Update qiita_db/software.py [skip ci]

Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>

Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>

* Improve workflows (#3083)

* fix #3076

* addressing @ElDeveloper comments

* adding workflows GUI

* improve HTML

* db changes and improvements

* comment out new db changes

* rm default_workflows from PluginHandler.get

* improving workflows after deployment

* fix error

* clean up

* minor fixes

* improving html

* comments from @justinshaffer

* addressing @ElDeveloper comments

* 2021.03 (#3084)

* 2021.03

* add Command.processing_jobs to changelog

* codecov

* codecov required

* codecov token

Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>