Fix test after merge
albertvillanova committed Sep 16, 2022
1 parent 80f4fbc commit dddb47c
Showing 1 changed file with 0 additions and 1 deletion.
1 change: 0 additions & 1 deletion tests/test_load.py
@@ -476,7 +476,6 @@ def test_CachedMetricModuleFactory(self):
[
CachedDatasetModuleFactory,
CachedMetricModuleFactory,
GithubDatasetModuleFactory,
GithubMetricModuleFactory,
HubDatasetModuleFactoryWithoutScript,
HubDatasetModuleFactoryWithScript,

1 comment on commit dddb47c

@github-actions

PyArrow==6.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.009640 / 0.011353 (-0.001713) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004562 / 0.011008 (-0.006447) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.035715 / 0.038508 (-0.002793) |
| read_batch_unformated after write_array2d | 0.041812 / 0.023109 (0.018703) |
| read_batch_unformated after write_flattened_sequence | 0.352047 / 0.275898 (0.076149) |
| read_batch_unformated after write_nested_sequence | 0.432517 / 0.323480 (0.109037) |
| read_col_formatted_as_numpy after write_array2d | 0.007177 / 0.007986 (-0.000809) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.003908 / 0.004328 (-0.000420) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.008102 / 0.004250 (0.003852) |
| read_col_unformated after write_array2d | 0.061871 / 0.037052 (0.024818) |
| read_col_unformated after write_flattened_sequence | 0.368701 / 0.258489 (0.110212) |
| read_col_unformated after write_nested_sequence | 0.419163 / 0.293841 (0.125322) |
| read_formatted_as_numpy after write_array2d | 0.037298 / 0.128546 (-0.091249) |
| read_formatted_as_numpy after write_flattened_sequence | 0.010867 / 0.075646 (-0.064779) |
| read_formatted_as_numpy after write_nested_sequence | 0.309030 / 0.419271 (-0.110242) |
| read_unformated after write_array2d | 0.061160 / 0.043533 (0.017627) |
| read_unformated after write_flattened_sequence | 0.353078 / 0.255139 (0.097939) |
| read_unformated after write_nested_sequence | 0.376040 / 0.283200 (0.092840) |
| write_array2d | 0.121055 / 0.141683 (-0.020628) |
| write_flattened_sequence | 1.736045 / 1.452155 (0.283890) |
| write_nested_sequence | 1.785961 / 1.492716 (0.293244) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.249010 / 0.018006 (0.231003) |
| get_batch_of_1024_rows | 0.518023 / 0.000490 (0.517533) |
| get_first_row | 0.008854 / 0.000200 (0.008654) |
| get_last_row | 0.000594 / 0.000054 (0.000540) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.028182 / 0.037411 (-0.009230) |
| shard | 0.121759 / 0.014526 (0.107233) |
| shuffle | 0.135150 / 0.176557 (-0.041406) |
| sort | 0.189098 / 0.737135 (-0.548038) |
| train_test_split | 0.140605 / 0.296338 (-0.155733) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.468494 / 0.215209 (0.253284) |
| read 50000 | 4.522392 / 2.077655 (2.444738) |
| read_batch 50000 10 | 2.087989 / 1.504120 (0.583869) |
| read_batch 50000 100 | 1.881883 / 1.541195 (0.340688) |
| read_batch 50000 1000 | 1.915531 / 1.468490 (0.447041) |
| read_formatted numpy 5000 | 0.477992 / 4.584777 (-4.106785) |
| read_formatted pandas 5000 | 4.690758 / 3.745712 (0.945046) |
| read_formatted tensorflow 5000 | 4.338306 / 5.269862 (-0.931556) |
| read_formatted torch 5000 | 2.159140 / 4.565676 (-2.406537) |
| read_formatted_batch numpy 5000 10 | 0.060752 / 0.424275 (-0.363523) |
| read_formatted_batch numpy 5000 1000 | 0.013428 / 0.007607 (0.005821) |
| shuffled read 5000 | 0.581114 / 0.226044 (0.355070) |
| shuffled read 50000 | 6.182210 / 2.268929 (3.913281) |
| shuffled read_batch 50000 10 | 2.599676 / 55.444624 (-52.844948) |
| shuffled read_batch 50000 100 | 2.212345 / 6.876477 (-4.664132) |
| shuffled read_batch 50000 1000 | 2.373625 / 2.142072 (0.231553) |
| shuffled read_formatted numpy 5000 | 0.617091 / 4.805227 (-4.188137) |
| shuffled read_formatted_batch numpy 5000 10 | 0.132614 / 6.500664 (-6.368050) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.070380 / 0.075469 (-0.005090) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.765909 / 1.841788 (-0.075879) |
| map fast-tokenizer batched | 16.988651 / 8.074308 (8.914343) |
| map identity | 29.597825 / 10.191392 (19.406433) |
| map identity batched | 1.067834 / 0.680424 (0.387410) |
| map no-op batched | 0.637602 / 0.534201 (0.103401) |
| map no-op batched numpy | 0.441709 / 0.579283 (-0.137574) |
| map no-op batched pandas | 0.522549 / 0.434364 (0.088185) |
| map no-op batched pytorch | 0.343650 / 0.540337 (-0.196687) |
| map no-op batched tensorflow | 0.354669 / 1.386936 (-1.032267) |

PyArrow==latest

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.006453 / 0.011353 (-0.004900) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004498 / 0.011008 (-0.006510) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.031672 / 0.038508 (-0.006836) |
| read_batch_unformated after write_array2d | 0.037639 / 0.023109 (0.014530) |
| read_batch_unformated after write_flattened_sequence | 0.420403 / 0.275898 (0.144505) |
| read_batch_unformated after write_nested_sequence | 0.495881 / 0.323480 (0.172402) |
| read_col_formatted_as_numpy after write_array2d | 0.004436 / 0.007986 (-0.003550) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.003958 / 0.004328 (-0.000371) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.005780 / 0.004250 (0.001530) |
| read_col_unformated after write_array2d | 0.050085 / 0.037052 (0.013033) |
| read_col_unformated after write_flattened_sequence | 0.427943 / 0.258489 (0.169454) |
| read_col_unformated after write_nested_sequence | 0.458481 / 0.293841 (0.164640) |
| read_formatted_as_numpy after write_array2d | 0.033926 / 0.128546 (-0.094621) |
| read_formatted_as_numpy after write_flattened_sequence | 0.010850 / 0.075646 (-0.064797) |
| read_formatted_as_numpy after write_nested_sequence | 0.308908 / 0.419271 (-0.110364) |
| read_unformated after write_array2d | 0.063894 / 0.043533 (0.020361) |
| read_unformated after write_flattened_sequence | 0.420175 / 0.255139 (0.165036) |
| read_unformated after write_nested_sequence | 0.454774 / 0.283200 (0.171574) |
| write_array2d | 0.120020 / 0.141683 (-0.021663) |
| write_flattened_sequence | 1.723984 / 1.452155 (0.271830) |
| write_nested_sequence | 1.832878 / 1.492716 (0.340161) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.250354 / 0.018006 (0.232348) |
| get_batch_of_1024_rows | 0.486532 / 0.000490 (0.486042) |
| get_first_row | 0.001200 / 0.000200 (0.001000) |
| get_last_row | 0.000096 / 0.000054 (0.000041) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.027981 / 0.037411 (-0.009430) |
| shard | 0.126642 / 0.014526 (0.112116) |
| shuffle | 0.147033 / 0.176557 (-0.029523) |
| sort | 0.207239 / 0.737135 (-0.529897) |
| train_test_split | 0.156767 / 0.296338 (-0.139571) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.495318 / 0.215209 (0.280109) |
| read 50000 | 4.925092 / 2.077655 (2.847437) |
| read_batch 50000 10 | 2.354238 / 1.504120 (0.850118) |
| read_batch 50000 100 | 2.157259 / 1.541195 (0.616064) |
| read_batch 50000 1000 | 2.240825 / 1.468490 (0.772335) |
| read_formatted numpy 5000 | 0.506391 / 4.584777 (-4.078386) |
| read_formatted pandas 5000 | 4.756637 / 3.745712 (1.010925) |
| read_formatted tensorflow 5000 | 4.717589 / 5.269862 (-0.552273) |
| read_formatted torch 5000 | 1.886880 / 4.565676 (-2.678796) |
| read_formatted_batch numpy 5000 10 | 0.065213 / 0.424275 (-0.359062) |
| read_formatted_batch numpy 5000 1000 | 0.013296 / 0.007607 (0.005689) |
| shuffled read 5000 | 0.619463 / 0.226044 (0.393418) |
| shuffled read 50000 | 6.167164 / 2.268929 (3.898235) |
| shuffled read_batch 50000 10 | 2.952303 / 55.444624 (-52.492321) |
| shuffled read_batch 50000 100 | 2.572860 / 6.876477 (-4.303616) |
| shuffled read_batch 50000 1000 | 2.771813 / 2.142072 (0.629740) |
| shuffled read_formatted numpy 5000 | 0.624316 / 4.805227 (-4.180912) |
| shuffled read_formatted_batch numpy 5000 10 | 0.140624 / 6.500664 (-6.360040) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.072448 / 0.075469 (-0.003021) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.977567 / 1.841788 (0.135779) |
| map fast-tokenizer batched | 16.800770 / 8.074308 (8.726462) |
| map identity | 29.713986 / 10.191392 (19.522594) |
| map identity batched | 1.106918 / 0.680424 (0.426494) |
| map no-op batched | 0.715892 / 0.534201 (0.181691) |
| map no-op batched numpy | 0.462551 / 0.579283 (-0.116732) |
| map no-op batched pandas | 0.545351 / 0.434364 (0.110988) |
| map no-op batched pytorch | 0.334683 / 0.540337 (-0.205655) |
| map no-op batched tensorflow | 0.340873 / 1.386936 (-1.046063) |
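
Each entry in the benchmark results above has the form `new / old (diff)`, where the diff is the new value minus the old one. The sketch below is one possible way to parse such a row and list the metrics whose value went up. The helper names `parse_cells` and `increased` are hypothetical and not part of the benchmark tooling. It assumes the diff was computed before rounding, which the data suggests (0.249010 - 0.018006 is 0.231004, yet the reported diff is 0.231003), so it checks consistency with a small tolerance.

```python
import re

# One "new / old (diff)" entry, e.g. "0.249010 / 0.018006 (0.231003)".
CELL = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_cells(row):
    """Return a list of (new, old, diff) float triples found in a results row."""
    return [tuple(float(x) for x in m.groups()) for m in CELL.finditer(row)]


def increased(names, row, tol=2e-6):
    """Return (name, new / old ratio) for every metric whose diff is positive.

    Assumption: diff == new - old up to rounding of the printed values.
    """
    result = []
    for name, (new, old, diff) in zip(names, parse_cells(row)):
        if abs((new - old) - diff) > tol:
            raise ValueError(f"inconsistent entry for {name}")
        if diff > 0:
            result.append((name, new / old))
    return result


# Row copied from benchmark_getitem_100B.json (PyArrow==6.0.0) above.
names = [
    "get_batch_of_1024_random_rows",
    "get_batch_of_1024_rows",
    "get_first_row",
    "get_last_row",
]
row = (
    "0.249010 / 0.018006 (0.231003) 0.518023 / 0.000490 (0.517533) "
    "0.008854 / 0.000200 (0.008654) 0.000594 / 0.000054 (0.000540)"
)

for name, ratio in increased(names, row):
    print(f"{name}: new is {ratio:.1f}x old")
```

The comment does not say what the `old` values are measured against, so a positive diff here only means that the new value is larger than the old one.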
