[MetaSchedule][UX] Make Database with-able #12520

Merged

@junrushao junrushao commented Aug 21, 2022

`ApplyHistoryBest` currently acts as an adaptor for querying the database. The logic can be simplified so that users only deal with `Database` instead of this extra object.

  • Add `EnterWithScope`/`ExitWithScope`/`Current` to `Database`
  • Migrate `te_filter_func` => `"tir_filter"` in Relay's pass context
  • Migrate `f_take_tuning_record` => `Database.query_tuning_record`
  • Migrate `TECompiler` to use `Database`
  • Remove `ApplyHistoryBest`

Next PR:

  • Migrate `f_direct_dispatch` (potentially unify with `apply_fixed_schedule`?)

cc @Hzfengsy @junrushao1994
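
To make the "with-able" idea concrete, here is a minimal pure-Python sketch of a scoped database, assuming the scope is kept as a per-thread stack. The `Database` class below is a toy stand-in written for illustration, not TVM's implementation; only the enter/exit/current roles come from the description above.

```python
import threading
from typing import List, Optional


class Database:
    """Toy stand-in for a tuning database (hypothetical, not TVM's class)."""

    _scopes = threading.local()  # per-thread stack of active databases

    def __init__(self, name: str) -> None:
        self.name = name

    @classmethod
    def _stack(cls) -> List["Database"]:
        # Lazily create the stack for the calling thread.
        if not hasattr(cls._scopes, "stack"):
            cls._scopes.stack = []
        return cls._scopes.stack

    def __enter__(self) -> "Database":
        # Plays the role of EnterWithScope: push this database.
        self._stack().append(self)
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Plays the role of ExitWithScope: pop this database.
        popped = self._stack().pop()
        assert popped is self, "scopes must be exited in reverse order"

    @classmethod
    def current(cls) -> Optional["Database"]:
        # Plays the role of Current: the innermost active database, if any.
        stack = cls._stack()
        return stack[-1] if stack else None


tuned = Database("tuned")
with tuned:
    inside = Database.current()   # the database that is in scope
outside = Database.current()      # nothing is in scope any more
```

Code that runs inside the `with` body can look up the active database through `Database.current()` instead of receiving an adaptor object explicitly.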

@junrushao junrushao force-pushed the feature/2022-08-20/query-in-database branch 9 times, most recently from d2b1e90 to e506714 Compare August 26, 2022 04:35
@junrushao junrushao marked this pull request as ready for review August 26, 2022 16:20
@junrushao
Member Author

Also CC @sunggg @YuchenJin for updates on Relax side

@github-actions github-actions bot requested a review from Hzfengsy August 26, 2022 18:52
Contributor

@MasterJH5574 MasterJH5574 left a comment

Some tiny nits :-)

include/tvm/meta_schedule/database.h Outdated
include/tvm/meta_schedule/database.h Outdated
Contributor

@MasterJH5574 MasterJH5574 left a comment

Another problem I’m thinking about: what will happen if we have code like

with database_0, database_1:
    ...

If we query the best record in the “with” body, it looks like the current implementation will only look for records in database_1, rather than in both databases?
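
Python treats `with A, B:` as two nested `with` statements, so the question comes down to which scope was entered last. A minimal sketch, assuming the database scope is a simple stack where only the top entry counts as "current" (the `scope` and `current` helpers are hypothetical, not TVM APIs):

```python
from contextlib import contextmanager
from typing import Iterator, List, Optional

_active: List[str] = []  # hypothetical scope stack, innermost entry last


@contextmanager
def scope(name: str) -> Iterator[str]:
    # Entering pushes the name; leaving pops it, even on error.
    _active.append(name)
    try:
        yield name
    finally:
        _active.pop()


def current() -> Optional[str]:
    # Only the innermost scope is visible to queries.
    return _active[-1] if _active else None


with scope("database_0"), scope("database_1"):
    innermost = current()    # the scope entered last
    entered = list(_active)  # both scopes were entered, in order
```

Under this assumption a query inside the body sees only `database_1`; `database_0` is on the stack but never consulted.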

@masahi
Member

masahi commented Aug 26, 2022

> Another problem I’m thinking about: what will happen if we have code like
>
> with database_0, database_1:
>     ...
>
> If we query the best record in the “with” body, it looks like the current implementation will only look for records in database_1, rather than in both databases?

I think if we have a way to merge several databases, we can agree to only support `with database:` (one database).

Member

@masahi masahi left a comment

I really like this change! Modulo the issue raised by @MasterJH5574, LGTM.

Member

@zxybazh zxybazh left a comment

Generally LGTM. We may want to add a function to distinguish between targets, but that should be a minor problem for now.

include/tvm/meta_schedule/database.h Outdated
src/meta_schedule/database/database.cc
Contributor

@sunggg sunggg left a comment

Thank you for the great work!
Overall, LGTM. A few questions and nits.

python/tvm/meta_schedule/relay_integration.py
python/tvm/meta_schedule/database/database.py
src/relay/backend/te_compiler_cache.cc
src/relay/backend/te_compiler_cache.cc
src/relay/backend/utils.cc
@junrushao junrushao force-pushed the feature/2022-08-20/query-in-database branch from e506714 to 2221492 Compare August 26, 2022 22:54
@junrushao
Member Author

> > Another problem I’m thinking about: what will happen if we have code like
> >
> > with database_0, database_1:
> >     ...
> >
> > If we query the best record in the “with” body, it looks like the current implementation will only look for records in database_1, rather than in both databases?
>
> I think if we have a way to merge several databases, we can agree to only support `with database:` (one database).

@MasterJH5574 @masahi Great discussion here! We will introduce `JoinedDatabase` in the near future to cater to such needs.

@junrushao junrushao force-pushed the feature/2022-08-20/query-in-database branch from 2221492 to 8f74866 Compare August 26, 2022 23:08
@MasterJH5574
Contributor

> > > Another problem I’m thinking about: what will happen if we have code like
> > >
> > > with database_0, database_1:
> > >     ...
> > >
> > > If we query the best record in the “with” body, it looks like the current implementation will only look for records in database_1, rather than in both databases?
> >
> > I think if we have a way to merge several databases, we can agree to only support `with database:` (one database).
>
> @MasterJH5574 @masahi Great discussion here! We will introduce `JoinedDatabase` in the near future to cater to such needs.

@junrushao Sounds like a great idea. Thanks in advance!

@MasterJH5574 MasterJH5574 merged commit 370abe6 into apache:main Aug 27, 2022
@tqchen
Member

tqchen commented Aug 27, 2022

One minor nit on naming :) "Join" has a separate meaning, and in this case perhaps "concat" (or "merge") is more appropriate. Try looking up the terminology in pandas.

junrushao added a commit to junrushao/tvm that referenced this pull request Aug 27, 2022
Following apache#12520, this PR introduces `ScheduleFnDatabase`, a mocked
database to allow injecting handcrafted schedules provided by a schedule
function.

The schedule function comes with the following signature:

```python
def schedule_fn(
  sch: tir.Schedule,
) -> bool:
  task_name = sch.mod.attrs["task_name"]
  # ^^^ provides an optional name of the task queried
  ...
```

This mocked database helps incorporate the existing testing utility
`apply_fixed_schedule` more formally into the MetaSchedule-Relay build
pipeline, and allows further extension to Relax with the same interface.

Next, as another follow-up, we will introduce `ConcatDatabase`, which allows
mixing multiple databases, including the mocked one and ones loaded from JSON
files.
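
As a rough illustration of the idea in this commit message, the sketch below models a database that answers queries by calling a schedule function instead of reading stored records. It is pure Python and makes several assumptions: `FakeSchedule` stands in for `tir.Schedule`, and `ScheduleFnBackedDatabase` and its `query_schedule` method are hypothetical names chosen for the example, not TVM's `ScheduleFnDatabase`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FakeSchedule:
    """Hypothetical stand-in for tir.Schedule: a task name plus a trace."""
    task_name: str
    trace: List[str] = field(default_factory=list)


class ScheduleFnBackedDatabase:
    """Answers queries by running a user function, not by reading records."""

    def __init__(self, schedule_fn: Callable[[FakeSchedule], bool]) -> None:
        self.schedule_fn = schedule_fn

    def query_schedule(self, task_name: str) -> Optional[FakeSchedule]:
        sch = FakeSchedule(task_name)
        # True means the function scheduled this task; False means no answer.
        return sch if self.schedule_fn(sch) else None


def schedule_fn(sch: FakeSchedule) -> bool:
    # Handcraft only conv2d tasks; decline everything else.
    if "nn_conv2d" in sch.task_name:
        sch.trace.append("handcrafted conv2d tiling")
        return True
    return False


db = ScheduleFnBackedDatabase(schedule_fn)
hit = db.query_schedule("fused_nn_conv2d_add")   # handled by schedule_fn
miss = db.query_schedule("fused_nn_dense")       # declined by schedule_fn
```

The boolean return value is what lets such a database coexist with others: a declined task looks the same as a missing record.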
@junrushao
Member Author

@tqchen Sure, "join" doesn't look like a proper name. Perhaps `ms.database.MergedDatabase` sounds better?

junrushao added a commit to junrushao/tvm that referenced this pull request Aug 28, 2022
Following up apache#12520 and apache#12626, this PR introduces `MergedDatabase`,
which allows users to compose multiple databases so that the high-level
IR can select the best tuning records among them.

`MergedDatabase` also comes with an extra field, `preferred`, which lets
users override tuning records from other databases. A classic use case
of the `preferred` parameter is handcrafted schedule functions:

```python
from tvm import meta_schedule as ms, relay, tir

def schedule_fn(sch: tir.Schedule) -> bool:
  # Handcraft only conv2d tasks; return False for everything else.
  if "nn_conv2d" in sch.mod.attrs["task_name"]:
    handcrafted_scheduling(sch)
    return True
  return False

with ms.database.MergedDatabase(
  databases=[database],
  preferred=ms.database.ScheduleFnDatabase(schedule_fn),
):
  lib = relay.build(...)
```
masahi pushed a commit that referenced this pull request Aug 29, 2022
junrushao added a commit to junrushao/tvm that referenced this pull request Aug 29, 2022
Following up apache#12520 and apache#12626, this PR introduces `MergedDatabase`,
which allows users to compose multiple databases so that the high-level
IR can select the best tuning records among them.

`MergedDatabase` also comes with an extra field, `preferred`, which lets
users override tuning records from other databases. A classic use case
of the `preferred` parameter is handcrafted schedule functions:

```python
from tvm import meta_schedule as ms, relay, tir

def schedule_fn(sch: tir.Schedule) -> bool:
  # Handcraft only conv2d tasks; return False for everything else.
  if "nn_conv2d" in sch.mod.attrs["task_name"]:
    handcrafted_scheduling(sch)
    return True
  return False

with ms.database.MergedDatabase(
  preferred=ms.database.ScheduleFnDatabase(schedule_fn),
  # ^^^^ override scheduling decisions
  databases=[database],
  fallback=libtorch_database,
  # ^^^^ fallback to libtorch
):
  lib = relay.build(...)
```
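
One plausible reading of the `preferred` / `databases` / `fallback` arguments is a three-tier lookup. The sketch below is pure Python and rests on assumptions: databases are modelled as plain dicts, "best" means lowest run time, and the lookup order (preferred first, then the best regular record, then fallback) is inferred from the comments above rather than taken from the implementation. `query_merged` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, float]        # (label, measured run time in ms)
ToyDatabase = Dict[str, Record]   # workload name -> record


def query_merged(
    workload: str,
    databases: List[ToyDatabase],
    preferred: Optional[ToyDatabase] = None,
    fallback: Optional[ToyDatabase] = None,
) -> Optional[Record]:
    # 1. A hit in `preferred` overrides everything else.
    if preferred is not None and workload in preferred:
        return preferred[workload]
    # 2. Otherwise take the fastest record among the regular databases.
    hits = [db[workload] for db in databases if workload in db]
    if hits:
        return min(hits, key=lambda rec: rec[1])
    # 3. Consult `fallback` only when nothing else answered.
    if fallback is not None:
        return fallback.get(workload)
    return None


tuned_a = {"conv2d": ("tuned_a", 1.2), "matmul": ("tuned_a", 0.9)}
tuned_b = {"matmul": ("tuned_b", 0.7)}
handcrafted = {"conv2d": ("handcrafted", 2.0)}
library = {"softmax": ("library", 0.3)}

conv = query_merged("conv2d", [tuned_a, tuned_b], handcrafted, library)
mm = query_merged("matmul", [tuned_a, tuned_b], handcrafted, library)
sm = query_merged("softmax", [tuned_a, tuned_b], handcrafted, library)
```

Note that the preferred record wins for `conv2d` even though a faster tuned record exists; that is the point of an override.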
junrushao added a commit to junrushao/tvm that referenced this pull request Aug 31, 2022
Following up apache#12520 and apache#12626, this PR introduces two database classes:
`UnionDatabase` and `OrderedUnionDatabase`, both of which allow users to
organically compose multiple databases together, so that the high-level
IR (Relay, Relax) could select the best tuning records according to
running time or a preferred order given by users.

For each query, `UnionDatabase` returns the best record among all the
given databases, whereas `OrderedUnionDatabase` returns the record from
the first database that responds to the query.

Used together, they let users specify complicated dispatching patterns
like the one below:

```python
from tvm import meta_schedule as ms, relay, tir

def schedule_fn(sch: tir.Schedule) -> bool:
  if "nn_conv2d" in sch.mod.attrs["task_name"]:
    if some_other_tir_conditions(sch.mod):
      handcrafted_scheduling(sch)
      return True
  return False

with ms.database.OrderedUnionDatabase(
  ms.database.ScheduleFnDatabase(schedule_fn),  # hand-override some scheduling
  ms.database.UnionDatabase(                    # existing databases
    db_for_matmul,
    db_for_conv2d,
    db_for_softmax,
  ),
  libtorch_database,                            # fallback to libtorch
):
  lib = relay.build(...)
```
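
The two composition rules can be sketched in a few lines of plain Python. This is a toy model under stated assumptions: records are `(label, run_time)` tuples where lower is better, and `DictDatabase`, `ToyUnion`, and `ToyOrderedUnion` are hypothetical classes that only mimic the query semantics described above, not TVM's implementation.

```python
from typing import Dict, Optional, Tuple

Record = Tuple[str, float]  # (label, run time in ms); lower is better


class DictDatabase:
    """Toy leaf database: workload name -> record."""

    def __init__(self, records: Dict[str, Record]) -> None:
        self.records = records

    def query(self, workload: str) -> Optional[Record]:
        return self.records.get(workload)


class ToyUnion:
    """Returns the fastest record among all children that respond."""

    def __init__(self, *children) -> None:
        self.children = children

    def query(self, workload: str) -> Optional[Record]:
        answers = (child.query(workload) for child in self.children)
        hits = [rec for rec in answers if rec is not None]
        return min(hits, key=lambda rec: rec[1]) if hits else None


class ToyOrderedUnion:
    """Returns the record from the first child that responds."""

    def __init__(self, *children) -> None:
        self.children = children

    def query(self, workload: str) -> Optional[Record]:
        for child in self.children:
            record = child.query(workload)
            if record is not None:
                return record
        return None


db1 = DictDatabase({})                    # no record
db3 = DictDatabase({"w": ("r3", 5.0)})
db4 = DictDatabase({"w": ("r4", 3.0)})
db5 = DictDatabase({"w": ("r5", 1.0)})

best = ToyUnion(db1, db3, db4).query("w")          # fastest of r3 and r4
first = ToyOrderedUnion(db1, db3, db4).query("w")  # first responder
mixed = ToyOrderedUnion(db1, ToyUnion(db3, db4), db5).query("w")
```

In the `mixed` case the nested union answers before `db5` is consulted, so the faster `r5` is never considered; ordering takes priority over run time.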
junrushao added a commit to junrushao/tvm that referenced this pull request Sep 1, 2022
Following up apache#12520 and apache#12626, this PR introduces two database classes:
`UnionDatabase` and `OrderedUnionDatabase`, both of which allow users to
organically compose multiple databases together, so that the high-level
IR (Relay, Relax) could select the best tuning records according to
running time or a preferred order given by users.

For each query, `UnionDatabase` returns the best record among all the
given databases, whereas `OrderedUnionDatabase` returns the record from
the first database that responds to the query.

Used together, they let users specify complicated dispatching patterns.
The examples below demonstrate the use cases of, and the difference
between, `UnionDatabase` and `OrderedUnionDatabase`.

Assumptions:
* db1 and db2 do not have tuning records for the target workload.
* db3, db4, and db5 have tuning records r3, r4, and r5 for the target
  workload, respectively.

```python
merged_db = ms.database.UnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    db4  # has r4
)
merged_db.query_tuning_record(..., target_workload)

merged_db = ms.database.OrderedUnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    db4  # has r4
)
merged_db.query_tuning_record(..., target_workload)

merged_db = ms.database.UnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    ms.database.OrderedUnionDatabase( # returns r4
        db4,  # has r4
        db5,  # has r5
    )
)
merged_db.query_tuning_record(..., target_workload)

merged_db = ms.database.UnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    ms.database.UnionDatabase( # returns best one between r4 and r5
        db4,  # has r4
        db5,  # has r5
    )
)
merged_db.query_tuning_record(..., target_workload)

merged_db = ms.database.OrderedUnionDatabase(
    db1, # no record
    db2, # no record
    ms.database.UnionDatabase( # returns best one between r3 and r4
        db3, # has r3
        db4,  # has r4
    ),
    db5,  # has r5
)
merged_db.query_tuning_record(..., target_workload)
```

Co-authored-by: sunggg <49998730+sunggg@users.noreply.github.com>
junrushao added a commit to junrushao/tvm that referenced this pull request Sep 1, 2022
Following up apache#12520 and apache#12626, this PR introduces two database classes:
`UnionDatabase` and `OrderedUnionDatabase`, both of which allow users to
organically compose multiple databases together, so that the high-level
IR (Relay, Relax) could select the best tuning records according to
running time or a preferred order given by users.

For each query, `UnionDatabase` returns the best record among all the
given databases, whereas `OrderedUnionDatabase` returns the record from
the first database that responds to the query.

Used together, they let users specify complicated dispatching patterns.
The examples below demonstrate the use cases of, and the difference
between, `UnionDatabase` and `OrderedUnionDatabase`.

Assumptions:
* db1 and db2 do not have tuning records for the target workload.
* db3, db4, and db5 have tuning records r3, r4, and r5 for the target
  workload, respectively.

```python
### Case 1. `UnionDatabase`
merged_db = ms.database.UnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    db4  # has r4
)
# returns the better one between r3 and r4
merged_db.query_tuning_record(..., target_workload)

### Case 2. `OrderedUnionDatabase`
merged_db = ms.database.OrderedUnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    db4  # has r4
)
# returns r3
merged_db.query_tuning_record(..., target_workload)

### Case 3. Mix-use scenario
merged_db = ms.database.UnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    ms.database.OrderedUnionDatabase( # returns r4
        db4,  # has r4
        db5,  # has r5
    )
)
# returns the better one between r3 and r4
merged_db.query_tuning_record(..., target_workload)

### Case 4. Another mix-use scenario
merged_db = ms.database.UnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    ms.database.UnionDatabase( # returns the better one between r4 and r5
        db4,  # has r4
        db5,  # has r5
    )
)
# returns the best one among r3, r4 and r5
merged_db.query_tuning_record(..., target_workload)

### Case 5. Yet another mix-use scenario
merged_db = ms.database.OrderedUnionDatabase(
    db1, # no record
    db2, # no record
    ms.database.UnionDatabase( # returns the better one between r3 and r4
        db3, # has r3
        db4, # has r4
    ),
    db5,  # has r5
)
# returns the better one between r3 and r4
merged_db.query_tuning_record(..., target_workload)
```

Co-authored-by: sunggg <49998730+sunggg@users.noreply.github.com>

# Please enter the commit message for your changes. Lines starting
# with '#' will be kept; you may remove them yourself if you want to.
# An empty message aborts the commit.
#
# Date:      Sat Aug 27 17:36:54 2022 -0700
#
# On branch feature/2022-08-27/merged-database
# Your branch and 'origin/feature/2022-08-27/merged-database' have diverged,
# and have 1 and 1 different commits each, respectively.
#   (use "git pull" to merge the remote branch into yours)
#
# Changes to be committed:
#	modified:   include/tvm/meta_schedule/database.h
#	modified:   python/tvm/meta_schedule/database/__init__.py
#	new file:   python/tvm/meta_schedule/database/ordered_union_database.py
#	new file:   python/tvm/meta_schedule/database/union_database.py
#	modified:   src/meta_schedule/database/json_database.cc
#	new file:   src/meta_schedule/database/ordered_union_database.cc
#	new file:   src/meta_schedule/database/union_database.cc
#	modified:   src/meta_schedule/utils.h
#	modified:   tests/python/unittest/test_link_params.py
#	modified:   tests/python/unittest/test_meta_schedule_database.py
#

# Please enter the commit message for your changes. Lines starting
# with '#' will be kept; you may remove them yourself if you want to.
# An empty message aborts the commit.
#
# Date:      Sat Aug 27 17:36:54 2022 -0700
#
# On branch feature/2022-08-27/merged-database
# Your branch and 'origin/feature/2022-08-27/merged-database' have diverged,
# and have 1 and 1 different commits each, respectively.
#   (use "git pull" to merge the remote branch into yours)
#
# Changes to be committed:
#	modified:   include/tvm/meta_schedule/database.h
#	modified:   python/tvm/meta_schedule/database/__init__.py
#	new file:   python/tvm/meta_schedule/database/ordered_union_database.py
#	new file:   python/tvm/meta_schedule/database/union_database.py
#	modified:   src/meta_schedule/database/json_database.cc
#	new file:   src/meta_schedule/database/ordered_union_database.cc
#	new file:   src/meta_schedule/database/union_database.cc
#	modified:   src/meta_schedule/utils.h
#	modified:   tests/python/unittest/test_link_params.py
#	modified:   tests/python/unittest/test_meta_schedule_database.py
#

# Please enter the commit message for your changes. Lines starting
# with '#' will be kept; you may remove them yourself if you want to.
# An empty message aborts the commit.
#
# Date:      Sat Aug 27 17:36:54 2022 -0700
#
# On branch feature/2022-08-27/merged-database
# Your branch and 'origin/feature/2022-08-27/merged-database' have diverged,
# and have 1 and 1 different commits each, respectively.
#   (use "git pull" to merge the remote branch into yours)
#
# Changes to be committed:
#	modified:   include/tvm/meta_schedule/database.h
#	modified:   python/tvm/meta_schedule/database/__init__.py
#	new file:   python/tvm/meta_schedule/database/ordered_union_database.py
#	new file:   python/tvm/meta_schedule/database/union_database.py
#	modified:   src/meta_schedule/database/json_database.cc
#	new file:   src/meta_schedule/database/ordered_union_database.cc
#	new file:   src/meta_schedule/database/union_database.cc
#	modified:   src/meta_schedule/utils.h
#	modified:   tests/python/unittest/test_link_params.py
#	modified:   tests/python/unittest/test_meta_schedule_database.py
#
junrushao added a commit to junrushao/tvm that referenced this pull request Sep 1, 2022
Following up apache#12520 and apache#12626, this PR introduces two database classes:
`UnionDatabase` and `OrderedUnionDatabase`, both of which allow users to
organically compose multiple databases together, so that the high-level
IR (Relay, Relax) could select the best tuning records according to
running time or a preferred order given by users.

To each query, `UnionDatabase` returns the best record among all the
databases given; Instead, `OrderedUnionDatabase` returns he record from
the first database that responds to the query.

Used together, users may specify complicated dispatching patterns like
below:

Examples below demonstrate the usecases of and difference between
UnionDatabase and OrderDatabase.

Assumption:
* db1, db2 do not have tuning records for the target workload.
* Each of db3, db4, db5 has tuning records r3, r4, r5 for target
workload respectively.

```python
#### Case 1. `UnionDatabase`:
merged_db = ms.database.UnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    db4  # has r4
)
# returns the better one between r3 and r4
merged_db.query_tuning_record(..., target_workload)

### Case 2. `OrderedUnionDatabase`
merged_db = ms.database.OrderedUnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    db4  # has r4
)
# returns r3
merged_db.query_tuning_record(..., target_workload)

### Case 3. Mix-use scenario
merged_db = ms.database.UnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    ms.database.OrderedUnionDatabase( # returns r4
        db4,  # has r4
        db5,  # has r5
    )
)
# returns the better one between r3 and r4
merged_db.query_tuning_record(..., target_workload)

### Case 4. Another mix-use scenario
merged_db = ms.database.UnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    ms.database.UnionDatabase( # returns the better one between r4 and r5
        db4,  # has r4
        db5,  # has r5
    )
)
# returns the best one among r3, r4 and r5
merged_db.query_tuning_record(..., target_workload)

### Case 5. Yet another mix-use scenario
merged_db = ms.database.OrderedUnionDatabase(
    db1, # no record
    db2, # no record
    ms.database.UnionDatabase( # returns the better one between r3 and r4
        db3, # has r3
        db4, # has r4
    ),
    db5,  # has r5
)
# returns the better one between r3 and r4
merged_db.query_tuning_record(..., target_workload)
```
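
The dispatching rules in the five cases above can be checked with a small pure-Python model. This is only a sketch: `ToyRecord`, `ToyDatabase`, `ToyUnion`, and `ToyOrderedUnion` are hypothetical stand-ins and not the TVM classes, and the run times assigned to r3, r4, and r5 are assumed values chosen so that r5 is the fastest and r3 the slowest.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ToyRecord:
    name: str
    run_ms: float  # assumed measured latency; lower is better


class ToyDatabase:
    """Hypothetical leaf database: at most one record per workload key."""

    def __init__(self, records: Optional[Dict[str, ToyRecord]] = None):
        self.records = records or {}

    def query(self, workload: str) -> Optional[ToyRecord]:
        return self.records.get(workload)


class ToyUnion:
    """Returns the fastest record among all child databases."""

    def __init__(self, *dbs):
        self.dbs = dbs

    def query(self, workload: str) -> Optional[ToyRecord]:
        hits = [r for r in (db.query(workload) for db in self.dbs) if r is not None]
        return min(hits, key=lambda r: r.run_ms) if hits else None


class ToyOrderedUnion:
    """Returns the record from the first child database that responds."""

    def __init__(self, *dbs):
        self.dbs = dbs

    def query(self, workload: str) -> Optional[ToyRecord]:
        for db in self.dbs:
            record = db.query(workload)
            if record is not None:
                return record
        return None


w = "target_workload"
db1, db2 = ToyDatabase(), ToyDatabase()          # no record
db3 = ToyDatabase({w: ToyRecord("r3", 3.0)})     # assumed slowest
db4 = ToyDatabase({w: ToyRecord("r4", 2.0)})
db5 = ToyDatabase({w: ToyRecord("r5", 1.0)})     # assumed fastest

case1 = ToyUnion(db1, db2, db3, db4).query(w)
case2 = ToyOrderedUnion(db1, db2, db3, db4).query(w)
case3 = ToyUnion(db1, db2, db3, ToyOrderedUnion(db4, db5)).query(w)
case4 = ToyUnion(db1, db2, db3, ToyUnion(db4, db5)).query(w)
case5 = ToyOrderedUnion(db1, db2, ToyUnion(db3, db4), db5).query(w)
print([c.name for c in (case1, case2, case3, case4, case5)])
```

Under these assumed timings, Case 5 never reaches db5 even though r5 is the fastest record, because the nested union responds first. That is the practical difference between ordering by preference and ordering by run time.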

Co-authored-by: sunggg <49998730+sunggg@users.noreply.github.com>
junrushao added a commit that referenced this pull request Sep 1, 2022
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
`ApplyHistoryBest` currently acts as an adaptor for querying the database.
The logic can be simplified so that users deal only with `Database` rather than this
extra object.

- [x] Add `EnterWithScope`/`ExitWithScope`/`Current` to Database
- [x] Migrate `te_filter_func` => "tir_filter" in Relay's pass context
- [x] Migrate `f_take_tuning_record` => "Database.query_tuning_record"
- [x] Migrate `TECompiler` to use `Database`
- [x] Remove apply-history-best

Next PR:
- Migrate `f_direct_dispatch` (potentially unify with `apply_fixed_schedule`?)
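
The effect of adding `EnterWithScope`/`ExitWithScope`/`Current` can be sketched in plain Python as a scope stack. This is a toy model under stated assumptions, not the TVM implementation: `ScopedDatabase` and `compile_task` are hypothetical names, and the only behavior modeled is that the innermost `with` block determines which database a consumer sees.

```python
from typing import List, Optional


class ScopedDatabase:
    """Hypothetical stand-in for a database usable in a `with` statement."""

    _scope_stack: List["ScopedDatabase"] = []  # innermost active database is last

    def __init__(self, name: str):
        self.name = name

    def __enter__(self) -> "ScopedDatabase":
        # Plays the role of EnterWithScope: make this database the current one.
        ScopedDatabase._scope_stack.append(self)
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Plays the role of ExitWithScope: restore the previous database.
        ScopedDatabase._scope_stack.pop()

    @staticmethod
    def current() -> Optional["ScopedDatabase"]:
        # Plays the role of Current: None when no scope is active.
        stack = ScopedDatabase._scope_stack
        return stack[-1] if stack else None


def compile_task() -> str:
    """Hypothetical consumer that looks up the database implicitly."""
    db = ScopedDatabase.current()
    return "default schedule" if db is None else f"schedule from {db.name}"


outside = compile_task()
with ScopedDatabase("tuned_db"):
    inside = compile_task()
    with ScopedDatabase("override_db"):
        nested = compile_task()
    restored = compile_task()
after = compile_task()
print(outside, inside, nested, restored, after, sep=" | ")
```

With this pattern the consumer no longer needs a separate adaptor object passed in explicitly; it asks for the current database at the point of use.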
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
Following apache#12520, this PR introduces `ScheduleFnDatabase`, a mocked
database to allow injecting handcrafted schedules provided by a schedule
function.

The schedule function comes with the following signature:

```python
from tvm import tir


def schedule_fn(
    sch: tir.Schedule,
) -> bool:
    task_name = sch.mod.attrs["task_name"]
    # ^^^ provides an optional name of the task queried
    ...
```

This mocked database helps incorporate the existing testing utility
`apply_fixed_schedule` more formally into the MetaSchedule-Relay build
pipeline, and allows further extension to Relax with the same interface.
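
How a schedule function can stand in for a database is sketched below in plain Python. Everything here is hypothetical: `ToySchedule`, `ToyScheduleFnDatabase`, `query_schedule`, and the trace string are illustrative names rather than TVM API, and the sketch assumes the boolean return value signals whether a handcrafted schedule was applied.

```python
from typing import Callable, List, Optional


class ToySchedule:
    """Hypothetical stand-in for a schedule; records the steps applied to it."""

    def __init__(self, task_name: str):
        self.task_name = task_name
        self.trace: List[str] = []


class ToyScheduleFnDatabase:
    """Hypothetical mocked database backed by a schedule function, not by records."""

    def __init__(self, schedule_fn: Callable[[ToySchedule], bool]):
        self.schedule_fn = schedule_fn

    def query_schedule(self, task_name: str) -> Optional[ToySchedule]:
        # Build a fresh schedule and let the user function try to handle it.
        sch = ToySchedule(task_name)
        return sch if self.schedule_fn(sch) else None


def schedule_fn(sch: ToySchedule) -> bool:
    # Assumed convention: return True only when a schedule was applied.
    if sch.task_name == "dense":
        sch.trace.append("handcrafted step for dense")  # illustrative only
        return True
    return False


db = ToyScheduleFnDatabase(schedule_fn)
hit = db.query_schedule("dense")
miss = db.query_schedule("conv2d")
print(hit.trace, miss)
```

Because the mocked database answers queries through the same interface as a record-backed one, callers do not need to know whether a schedule came from tuning or was written by hand.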

As the next follow-up, we will introduce `ConcatDatabase`, which allows
mixing multiple databases, including the mocked one and ones loaded from
JSON files.
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
…he#12628)

6 participants