Update MatrixTable.show #14250

Will-Tyler · 2024-02-05T18:46:45Z

Description

This pull request changes the implementation of the MatrixTable.show method. Previously, show would create a table from the matrix table and then display it by reusing the table implementation of show.

The problem with the existing implementation is that it creates a separate row field in the table for every entry in the matrix table, so the cost grows with the number of columns.

This new implementation directly shows the matrix table by creating a table and displaying the information in the table.
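To make the cost of the old approach concrete, here is a toy pure-Python sketch. It is not Hail's code, and the names `flattened_field_names` and `flatten_row` are hypothetical. It assumes a matrix row is modeled as a list of per-column entry dicts, and it borrows the `<column>.<entry field>` naming that appears in the headers of the sample output later in this thread. The point is only that flattening entries into row fields produces one field per (column, entry field) pair.

```python
from typing import Dict, List

# Toy model only: each matrix row is a list of per-column entry dicts.
Entry = Dict[str, object]


def flattened_field_names(col_keys: List[int], entry_fields: List[str]) -> List[str]:
    """Return one flat field name per (column, entry field) pair, e.g. '0.x'."""
    return [f"{col}.{field}" for col in col_keys for field in entry_fields]


def flatten_row(row_idx: int, entries: List[Entry], entry_fields: List[str]) -> Dict[str, object]:
    """Copy every entry value into its own top-level field of a flat row."""
    flat: Dict[str, object] = {"row_idx": row_idx}
    for col, entry in enumerate(entries):
        for field in entry_fields:
            flat[f"{col}.{field}"] = entry[field]
    return flat


# Showing 1000 columns with a single entry field already needs 1000 row fields.
names = flattened_field_names(list(range(1000)), ["x"])
print(len(names))

# A three-column row flattened into one record.
row = flatten_row(0, [{"x": 1} for _ in range(3)], ["x"])
print(row)
```

In this toy model the number of flat fields is the number of columns shown times the number of entry fields, which is the growth the description above refers to.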

I also make some refactoring changes in this pull request that helped me understand the code and will hopefully help others too.

Testing

I add some unit tests.

danking

This PR scales much better in the number of columns shown than current main. It now seems we need to show
about 100,000 columns before the cost of rendering exceeds the basic overhead of Spark & Hail. Fantastic.

A recent main commit:

In [24]: %%time
    ...: hl.utils.range_matrix_table(1, 1_000_000).annotate_entries(x=1).show(1, 10, handler=lambda x: isinstance(str(x), str))
CPU times: user 29.9 ms, sys: 2.84 ms, total: 32.7 ms
Wall time: 289 ms
Out[24]: True

In [25]: %%time
    ...: hl.utils.range_matrix_table(1, 1_000_000).annotate_entries(x=1).show(1, 100, handler=lambda x: isinstance(str(x), str))
CPU times: user 171 ms, sys: 9.37 ms, total: 180 ms
Wall time: 775 ms
Out[25]: True

In [26]: %%time
    ...: hl.utils.range_matrix_table(1, 1_000_000).annotate_entries(x=1).show(1, 1000, handler=lambda x: isinstance(str(x), str))
CPU times: user 978 ms, sys: 64.7 ms, total: 1.04 s
Wall time: 10.7 s
Out[26]: True

This PR:

In [4]: %%time
   ...: hl.utils.range_matrix_table(1, 1_000_000).annotate_entries(x=1).show(1, 10, handler=lambda x: isinstance(str(x), str))
CPU times: user 19.1 ms, sys: 2.76 ms, total: 21.8 ms
Wall time: 283 ms

In [5]: %%time
   ...: hl.utils.range_matrix_table(1, 1_000_000).annotate_entries(x=1).show(1, 10, handler=lambda x: isinstance(str(x), str))
CPU times: user 67.3 ms, sys: 7.44 ms, total: 74.8 ms
Wall time: 293 ms

In [6]: %%time
   ...: hl.utils.range_matrix_table(1, 1_000_000).annotate_entries(x=1).show(1, 100, handler=lambda x: isinstance(str(x), str))
CPU times: user 18.8 ms, sys: 2.46 ms, total: 21.3 ms
Wall time: 318 ms

In [7]: %%time
   ...: hl.utils.range_matrix_table(1, 1_000_000).annotate_entries(x=1).show(1, 1000, handler=lambda x: isinstance(str(x), str))
CPU times: user 24.6 ms, sys: 1.98 ms, total: 26.6 ms
Wall time: 315 ms

In [8]: %%time
   ...: hl.utils.range_matrix_table(1, 1_000_000).annotate_entries(x=1).show(1, 10000, handler=lambda x: isinstance(str(x), str))
CPU times: user 75.2 ms, sys: 3.65 ms, total: 78.9 ms
Wall time: 351 ms

In [9]: %%time
   ...: hl.utils.range_matrix_table(1, 1_000_000).annotate_entries(x=1).show(1, 100000, handler=lambda x: isinstance(str(x), str))
CPU times: user 740 ms, sys: 16 ms, total: 756 ms
Wall time: 1.17 s
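As a rough summary of the two sessions above, the following sketch computes the wall-time ratio between main and this PR for the column counts measured on both. The numbers are copied from the transcripts; the helper name `speedups` is hypothetical, and where the PR session has two runs at 10 columns the first one (283 ms) is used. These are single runs, so the ratios are indicative only.

```python
from typing import Dict

# Wall-clock times in seconds, keyed by number of columns shown,
# copied from the IPython transcripts in this comment.
main_wall: Dict[int, float] = {10: 0.289, 100: 0.775, 1000: 10.7}
pr_wall: Dict[int, float] = {10: 0.283, 100: 0.318, 1000: 0.315}


def speedups(before: Dict[int, float], after: Dict[int, float]) -> Dict[int, float]:
    """Ratio before/after for every column count measured in both sessions."""
    return {n: before[n] / after[n] for n in before if n in after}


ratios = speedups(main_wall, pr_wall)
for n_cols, ratio in sorted(ratios.items()):
    print(f"{n_cols:>5} columns: {ratio:.1f}x")
```

On these figures the two branches are about even at 10 columns, and the PR is roughly 2.4x faster at 100 columns and roughly 34x faster at 1000 columns.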

With the repr change this all seems copacetic to me.

In [2]: hl.utils.range_matrix_table(10,1000).annotate_entries(a='a\n', b=3, c=4.0, d=hl.dict([(3, "a")])).show(5, 10)
+---------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+
| row_idx | 0.a   |   0.b |     0.c | 0.d              | 1.a   |   1.b |     1.c | 1.d              | 2.a   |   2.b |     2.c | 2.d              | 3.a   |   3.b |     3.c | 3.d              | 4.a   |   4.b |     4.c | 4.d              | 5.a   |   5.b |     5.c |
+---------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+
|   int32 | str   | int32 | float64 | dict<int32, str> | str   | int32 | float64 | dict<int32, str> | str   | int32 | float64 | dict<int32, str> | str   | int32 | float64 | dict<int32, str> | str   | int32 | float64 | dict<int32, str> | str   | int32 | float64 |
+---------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+
|       0 | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 |
|       1 | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 |
|       2 | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 |
|       3 | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 |
|       4 | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 |
+---------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+

+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+
| 5.d              | 6.a   |   6.b |     6.c | 6.d              | 7.a   |   7.b |     7.c | 7.d              | 8.a   |   8.b |     8.c | 8.d              | 9.a   |   9.b |     9.c | 9.d              |
+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+
| dict<int32, str> | str   | int32 | float64 | dict<int32, str> | str   | int32 | float64 | dict<int32, str> | str   | int32 | float64 | dict<int32, str> | str   | int32 | float64 | dict<int32, str> |
+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+
| {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         |
| {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         |
| {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         |
| {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         |
| {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         | 'a\n' |     3 |     4.0 | {3: 'a'}         |
+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+-------+-------+---------+------------------+
showing top 5 rows
showing the first 10 of 1000 columns

danking · 2024-02-05T23:20:21Z