Use `ruff` for formatting #6434

mariosasko · 2023-11-17T16:53:22Z

Use `ruff` instead of `black` for formatting, for consistency with `transformers` (PR) and `huggingface_hub` (PR 1 and PR 2).
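For readers unfamiliar with the switch, here is a minimal sketch of what replacing `black` with `ruff` can look like from a script. It is not taken from this PR's diff: the helper names `ruff_commands` and `run_ruff` and the example paths are hypothetical, and it assumes a `ruff` release that ships the `ruff format` subcommand is installed and on `PATH`.

```python
import subprocess


def ruff_commands(paths, check_only=False):
    """Build the lint and format invocations for the given paths.

    Hypothetical helper for illustration only. With check_only=True nothing
    is rewritten on disk: `ruff check` only reports, and `ruff format --check`
    only says which files would be reformatted.
    """
    paths = list(paths)
    lint = ["ruff", "check"] + ([] if check_only else ["--fix"]) + paths
    fmt = ["ruff", "format"] + (["--check"] if check_only else []) + paths
    return [lint, fmt]


def run_ruff(paths, check_only=False):
    """Run both commands in order and return the first non-zero exit code."""
    for cmd in ruff_commands(paths, check_only):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0


# Example (paths are assumptions, adjust to the repository layout):
#   run_ruff(["src", "tests", "setup.py"], check_only=True)
```

Keeping command construction separate from execution makes the helper easy to inspect or test without having `ruff` installed.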

github-actions · 2023-11-17T16:54:09Z

PyArrow==8.0.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.004293 / 0.011353 (-0.007060)	0.002953 / 0.011008 (-0.008055)	0.063712 / 0.038508 (0.025204)	0.029963 / 0.023109 (0.006854)	0.248574 / 0.275898 (-0.027324)	0.272757 / 0.323480 (-0.050723)	0.003878 / 0.007986 (-0.004108)	0.002456 / 0.004328 (-0.001872)	0.047959 / 0.004250 (0.043709)	0.043277 / 0.037052 (0.006224)	0.255071 / 0.258489 (-0.003418)	0.283934 / 0.293841 (-0.009907)	0.022870 / 0.128546 (-0.105676)	0.007224 / 0.075646 (-0.068422)	0.221595 / 0.419271 (-0.197677)	0.053468 / 0.043533 (0.009935)	0.249906 / 0.255139 (-0.005233)	0.274894 / 0.283200 (-0.008305)	0.017246 / 0.141683 (-0.124437)	1.112440 / 1.452155 (-0.339714)	1.167293 / 1.492716 (-0.325424)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.092684 / 0.018006 (0.074677)	0.301721 / 0.000490 (0.301231)	0.000220 / 0.000200 (0.000020)	0.000050 / 0.000054 (-0.000005)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.018289 / 0.037411 (-0.019122)	0.061898 / 0.014526 (0.047372)	0.072904 / 0.176557 (-0.103653)	0.118515 / 0.737135 (-0.618621)	0.074000 / 0.296338 (-0.222338)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.287044 / 0.215209 (0.071835)	2.818091 / 2.077655 (0.740436)	1.502401 / 1.504120 (-0.001719)	1.374688 / 1.541195 (-0.166506)	1.410254 / 1.468490 (-0.058236)	0.407519 / 4.584777 (-4.177258)	2.379199 / 3.745712 (-1.366513)	2.585745 / 5.269862 (-2.684117)	1.562336 / 4.565676 (-3.003341)	0.045977 / 0.424275 (-0.378299)	0.004809 / 0.007607 (-0.002798)	0.347942 / 0.226044 (0.121897)	3.383318 / 2.268929 (1.114390)	1.844784 / 55.444624 (-53.599841)	1.561949 / 6.876477 (-5.314528)	1.571082 / 2.142072 (-0.570990)	0.482469 / 4.805227 (-4.322758)	0.099357 / 6.500664 (-6.401307)	0.041039 / 0.075469 (-0.034430)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	0.944236 / 1.841788 (-0.897551)	11.519623 / 8.074308 (3.445315)	10.353829 / 10.191392 (0.162437)	0.137530 / 0.680424 (-0.542894)	0.014454 / 0.534201 (-0.519747)	0.268657 / 0.579283 (-0.310626)	0.265165 / 0.434364 (-0.169199)	0.302626 / 0.540337 (-0.237712)	0.426923 / 1.386936 (-0.960013)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.004711 / 0.011353 (-0.006641)	0.002504 / 0.011008 (-0.008504)	0.047671 / 0.038508 (0.009163)	0.051147 / 0.023109 (0.028037)	0.272848 / 0.275898 (-0.003050)	0.291705 / 0.323480 (-0.031775)	0.004002 / 0.007986 (-0.003984)	0.002382 / 0.004328 (-0.001947)	0.047583 / 0.004250 (0.043332)	0.038203 / 0.037052 (0.001150)	0.278536 / 0.258489 (0.020047)	0.305872 / 0.293841 (0.012031)	0.023890 / 0.128546 (-0.104657)	0.006954 / 0.075646 (-0.068693)	0.053716 / 0.419271 (-0.365556)	0.032158 / 0.043533 (-0.011375)	0.273939 / 0.255139 (0.018800)	0.290722 / 0.283200 (0.007522)	0.016946 / 0.141683 (-0.124737)	1.102726 / 1.452155 (-0.349429)	1.169356 / 1.492716 (-0.323360)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.092520 / 0.018006 (0.074514)	0.301949 / 0.000490 (0.301459)	0.000248 / 0.000200 (0.000048)	0.000061 / 0.000054 (0.000007)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.021013 / 0.037411 (-0.016399)	0.069965 / 0.014526 (0.055439)	0.080105 / 0.176557 (-0.096451)	0.119802 / 0.737135 (-0.617334)	0.081615 / 0.296338 (-0.214724)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.301170 / 0.215209 (0.085960)	2.884817 / 2.077655 (0.807162)	1.596376 / 1.504120 (0.092256)	1.471205 / 1.541195 (-0.069990)	1.499061 / 1.468490 (0.030571)	0.407729 / 4.584777 (-4.177048)	2.432824 / 3.745712 (-1.312888)	2.561905 / 5.269862 (-2.707957)	1.535364 / 4.565676 (-3.030313)	0.046592 / 0.424275 (-0.377683)	0.004773 / 0.007607 (-0.002834)	0.350872 / 0.226044 (0.124828)	3.474874 / 2.268929 (1.205945)	1.963114 / 55.444624 (-53.481510)	1.688213 / 6.876477 (-5.188263)	1.686325 / 2.142072 (-0.455748)	0.487151 / 4.805227 (-4.318076)	0.104253 / 6.500664 (-6.396411)	0.043499 / 0.075469 (-0.031970)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	0.980395 / 1.841788 (-0.861393)	11.907393 / 8.074308 (3.833085)	10.983688 / 10.191392 (0.792296)	0.142875 / 0.680424 (-0.537549)	0.015375 / 0.534201 (-0.518826)	0.270043 / 0.579283 (-0.309240)	0.295092 / 0.434364 (-0.139272)	0.309466 / 0.540337 (-0.230871)	0.409812 / 1.386936 (-0.977124)
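Each cell in the benchmark tables above has the form `new / old (diff)`, where the figures are consistent with `diff = new - old` up to rounding in the last printed digit. The sketch below parses one header/value pair and lists the metrics whose new value is larger than the old one. The function names are hypothetical, and the tolerance of `2e-6` is an assumption chosen to absorb that rounding; the sample row is copied from the first `benchmark_getitem_100B.json` table.

```python
import re

# One cell looks like "0.092684 / 0.018006 (0.074677)".
CELL = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_cell(cell):
    """Return (new, old, diff) as floats for a single table cell."""
    match = CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    new, old, diff = (float(group) for group in match.groups())
    return new, old, diff


def parse_row(header_line, value_line):
    """Map each metric name to its (new, old, diff) tuple.

    Both lines are tab-separated; the first field of each line is a label
    ("metric" and "new / old (diff)") and is skipped.
    """
    names = header_line.split("\t")[1:]
    cells = value_line.split("\t")[1:]
    return {name: parse_cell(cell) for name, cell in zip(names, cells)}


def is_consistent(new, old, diff, tol=2e-6):
    """Check that the printed diff matches new - old within rounding."""
    return abs((new - old) - diff) <= tol


def slower_metrics(row):
    """Names of metrics whose new value exceeds the old one (diff > 0)."""
    return [name for name, (_, _, diff) in row.items() if diff > 0]


header = "metric\tget_batch_of_1024_random_rows\tget_batch_of_1024_rows\tget_first_row\tget_last_row"
values = (
    "new / old (diff)\t0.092684 / 0.018006 (0.074677)\t0.301721 / 0.000490 (0.301231)"
    "\t0.000220 / 0.000200 (0.000020)\t0.000050 / 0.000054 (-0.000005)"
)

row = parse_row(header, values)
print(slower_metrics(row))
```

A positive diff only means the new number is larger than the old one; the log itself does not say how much of that is noise.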

HuggingFaceDocBuilderDev · 2023-11-17T16:58:25Z

The documentation is not available anymore as the PR was closed or merged.

lhoestq

Thanks!

github-actions · 2023-11-21T14:19:20Z

PyArrow==8.0.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.004703 / 0.011353 (-0.006650)	0.002767 / 0.011008 (-0.008241)	0.063162 / 0.038508 (0.024654)	0.052241 / 0.023109 (0.029132)	0.237138 / 0.275898 (-0.038760)	0.262793 / 0.323480 (-0.060687)	0.003873 / 0.007986 (-0.004113)	0.002433 / 0.004328 (-0.001896)	0.048647 / 0.004250 (0.044397)	0.037887 / 0.037052 (0.000834)	0.244939 / 0.258489 (-0.013551)	0.304015 / 0.293841 (0.010174)	0.022859 / 0.128546 (-0.105688)	0.006763 / 0.075646 (-0.068883)	0.202728 / 0.419271 (-0.216544)	0.035369 / 0.043533 (-0.008164)	0.240785 / 0.255139 (-0.014354)	0.255109 / 0.283200 (-0.028091)	0.017951 / 0.141683 (-0.123732)	1.096103 / 1.452155 (-0.356052)	1.167662 / 1.492716 (-0.325054)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.092285 / 0.018006 (0.074279)	0.300201 / 0.000490 (0.299711)	0.000222 / 0.000200 (0.000022)	0.000049 / 0.000054 (-0.000005)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.018271 / 0.037411 (-0.019140)	0.062306 / 0.014526 (0.047780)	0.072615 / 0.176557 (-0.103942)	0.119357 / 0.737135 (-0.617779)	0.073365 / 0.296338 (-0.222974)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.278763 / 0.215209 (0.063554)	2.714943 / 2.077655 (0.637288)	1.426318 / 1.504120 (-0.077802)	1.313296 / 1.541195 (-0.227898)	1.330920 / 1.468490 (-0.137570)	0.391466 / 4.584777 (-4.193311)	2.380521 / 3.745712 (-1.365191)	2.545042 / 5.269862 (-2.724819)	1.549696 / 4.565676 (-3.015980)	0.044661 / 0.424275 (-0.379614)	0.005269 / 0.007607 (-0.002338)	0.331112 / 0.226044 (0.105068)	3.241120 / 2.268929 (0.972192)	1.783771 / 55.444624 (-53.660853)	1.506205 / 6.876477 (-5.370272)	1.521062 / 2.142072 (-0.621010)	0.462339 / 4.805227 (-4.342888)	0.097646 / 6.500664 (-6.403018)	0.041365 / 0.075469 (-0.034104)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	0.939653 / 1.841788 (-0.902135)	11.415472 / 8.074308 (3.341164)	10.338961 / 10.191392 (0.147569)	0.128543 / 0.680424 (-0.551881)	0.013997 / 0.534201 (-0.520204)	0.270034 / 0.579283 (-0.309249)	0.266766 / 0.434364 (-0.167598)	0.305290 / 0.540337 (-0.235047)	0.395969 / 1.386936 (-0.990967)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.004869 / 0.011353 (-0.006484)	0.002445 / 0.011008 (-0.008563)	0.051256 / 0.038508 (0.012748)	0.050871 / 0.023109 (0.027761)	0.271044 / 0.275898 (-0.004854)	0.294138 / 0.323480 (-0.029342)	0.003974 / 0.007986 (-0.004012)	0.002423 / 0.004328 (-0.001906)	0.048277 / 0.004250 (0.044027)	0.039685 / 0.037052 (0.002632)	0.277092 / 0.258489 (0.018603)	0.302097 / 0.293841 (0.008256)	0.024515 / 0.128546 (-0.104031)	0.006892 / 0.075646 (-0.068754)	0.053528 / 0.419271 (-0.365744)	0.032243 / 0.043533 (-0.011290)	0.272098 / 0.255139 (0.016959)	0.291678 / 0.283200 (0.008479)	0.018368 / 0.141683 (-0.123315)	1.160151 / 1.452155 (-0.292004)	1.193643 / 1.492716 (-0.299073)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.096669 / 0.018006 (0.078663)	0.299043 / 0.000490 (0.298553)	0.000227 / 0.000200 (0.000027)	0.000048 / 0.000054 (-0.000006)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.021557 / 0.037411 (-0.015855)	0.069875 / 0.014526 (0.055349)	0.080952 / 0.176557 (-0.095605)	0.119509 / 0.737135 (-0.617626)	0.082030 / 0.296338 (-0.214308)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.303062 / 0.215209 (0.087853)	2.943823 / 2.077655 (0.866169)	1.607816 / 1.504120 (0.103696)	1.479773 / 1.541195 (-0.061422)	1.482663 / 1.468490 (0.014173)	0.411923 / 4.584777 (-4.172854)	2.450138 / 3.745712 (-1.295574)	2.466111 / 5.269862 (-2.803751)	1.543852 / 4.565676 (-3.021825)	0.046256 / 0.424275 (-0.378019)	0.004787 / 0.007607 (-0.002820)	0.353673 / 0.226044 (0.127628)	3.528218 / 2.268929 (1.259289)	1.984663 / 55.444624 (-53.459962)	1.675785 / 6.876477 (-5.200691)	1.775646 / 2.142072 (-0.366426)	0.483277 / 4.805227 (-4.321950)	0.097781 / 6.500664 (-6.402883)	0.040291 / 0.075469 (-0.035178)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	0.975458 / 1.841788 (-0.866330)	11.961966 / 8.074308 (3.887658)	10.558559 / 10.191392 (0.367167)	0.131372 / 0.680424 (-0.549052)	0.016156 / 0.534201 (-0.518045)	0.269254 / 0.579283 (-0.310029)	0.274896 / 0.434364 (-0.159468)	0.304672 / 0.540337 (-0.235665)	0.517652 / 1.386936 (-0.869284)

mariosasko added 4 commits November 17, 2023 16:58

Use ruff for formatting

de25f2c

Update quality dependencies and lint setup.py

3570dc2

Update pre-commit-config

3be6c4b

Small fix

17f97ca

mariosasko requested a review from lhoestq November 17, 2023 17:15

lhoestq approved these changes Nov 20, 2023

mariosasko merged commit 1a1e741 into main Nov 21, 2023
13 checks passed

mariosasko deleted the ruff-format branch November 21, 2023 14:13

mariosasko mentioned this pull request Nov 21, 2023

Update ruff version in pre-commit config #6049

Closed
