Squashed commit
[Meta Schedule][M3c] Schedule Rules, Mutator & Postprocs (apache#485)

[Meta Schedule][M3c] PostOrderApply (apache#486)

Fix Post Order Apply (apache#490)

[MetaSchedule] Relay Integration (apache#489)

[M3c][Meta Schedule] Add Trace Correctness Test for PostOrderApply (apache#492)

Fix replay trace. (apache#493)

[M3c][Meta Schedule] Implement the Replay Func class. (apache#495)

[PR] Test script for meta-schedule task extraction. Interface to load… (apache#494)

[Meta Schedule Refactor] Get child blocks (apache#500)

Read-at && Write-at (apache#497)

[M3c][Meta Schedule] Measure Callbacks (apache#498)

[Bug] Fix Infinite Loop Caused When Calling Methods Not Overridden In PyClass (apache#496)

[MetaSchedule] Sample-Perfect-Tile (apache#501)

[MetaSchedule] TE Workloads (apache#502)

[TensorIR] GetProducer, GetConsumer (apache#506)

[MetaScheduleRefactor] Annotate&Unannotate (apache#505)

[MetaSchedule] Multi-Level-Tiling & Auto-Inline (apache#503)

[Tests] Add unittests for auto-inline and multi-level-tiling (apache#508)

[Meta Schedule] Minor Fixes (apache#507)

[MetaSchedule] Rewrite Cooperative-Fetching / Unbound-Block / Reduction-Block (apache#509)

[MetaSchedule] Rewrite Parallel-Vectorize-Unroll / Verify-GPU / Disallow-Dynamic-Loops (apache#499)

[Meta Schedule] Add Helper Function & Minor Modification (apache#512)

[MetaSchedule] Test for Rewrite Parallel-Vectorize-Unroll  (apache#513)

[Meta Schedule] Feature Extractor & Cost Model (apache#510)

Blockize & Tensorize (apache#514)

Layout Rewriting: Suggest-Index-Map (apache#520)

[MetaSchedule] Parallel-Vectorize-Unroll & Random-Compute-Location (apache#516)

[Meta Schedule] Per-Store-Feature (apache#521)

Add traced schedule for blockize & tensorize (apache#526)

[Meta Schedule] Add XGBoost Model & Random Model (apache#519)

User-Interface: Tune-TIR (apache#525)

User-Interface: Tune-TE (apache#527)

[Minor] More logging on python (apache#528)

Get CUDA tuning working (apache#529)

[MetaSchedule] TensorRT BYOC (apache#518)

[BugFix] LocalBuilder API (apache#531)

[Meta Schedule] Add Cost Model Update Measure Callback (apache#530)

[Bugfix] BuilderInput with default params (apache#532)

[MetaSchedule] Mutator-Tile-Size, Mutate-Parallel, Mutate-Unroll (apache#534)

[Meta Schedule] Evolutionary Search (apache#522)

[BugFix] Remove duplicated definition of MakeMultinomialSampler (apache#535)

[Meta Schedule] Fix some bugs (apache#537)

Initiate Experiments for CPU Performance Alignment with Ansor (apache#538)

[Meta Schedule] Tweak experiment scripts (apache#539)

[Meta Schedule] Initiate experiments on CUDA (apache#540)

[TIR][Schedule] Buffer transform (apache#523)

Auto Tensor Core (apache#524)

Working on Evo Search (apache#542)

[Meta Schedule] Add Replay Tuning Interface (apache#543)

Evolutionary Search on CPU (apache#544)

Misc improvement over the error message (apache#545)

[TIR][Schedule] Software pipelining (apache#533)

[Meta Schedule Refactor] fixing unit tests (apache#547)

[MetaSchedule] Mutator-Compute-Location (apache#548)

Misc Improvement of Evolutionary Search (apache#549)

Hotfix for software pipeline (apache#552)

Misc Improvement (apache#550)

[Cherry-Pick][TensorIR] Primitive "SetScope" (apache#9738) (apache#555)

Rule RFactor (apache#551)

[MemHammer] Rewrite Rules (apache#554)

[MetaSchedule] Schedule Rule: Cross-Thread Reduction (apache#556)

[MetaSchedule] Performance Alignment - NRM and SFM (CUDA) (apache#559)

[MetaSchedule] Perf Alignment - NRM on CUDA (apache#560)

[TIR] Reorder the block iters of the blocks generated by RFactor (apache#561)

Removing 2 unit tests for software pipelining (apache#562)

[MemHammer] Lower Pass + Unittests (apache#557)

Perf Align: Remove Auto-inline before Multi-level-tiling (apache#564)

Fix Sketch Generation Unittests (apache#565)

speed up VerifyGpuCode (apache#568)

[Performance Align] fixing codegen problems (apache#569)

[Meta schedule] improve search space (apache#1)

Hot fix for bound predicate (apache#3)

[Meta Schedule] Update Tune Relay (apache#4)

[Performance Align] fixing codegen problems (apache#5)

[PerfAlign] NRM & SFM on Raspi Aligned (apache#6)

[BugFix] Apply bound predicate directly to loops when possible (apache#12)

[BugFix] Fix CrossThreadReduction on CUDA (apache#13)

[MetaSchedule] Enable BertTuning with MetaScheduler (apache#11)

[Minor][MemHammer] Minor tweaks in code review (apache#14)

[Meta Schedule] Add customizable search space to PostOrderApply. (apache#16)

Fix cooperative fetching (apache#17)

Fixes for codegen (apache#18)

[Hotfix] A unittest (apache#19)

Fix for GRP sketch gen (apache#21)

Add threadIdx filtering in Multi-Level-Tiling and Verify-GPU-Code (apache#20)

[BugFix][TIR] Fix cross-thread reduction when single reduction loop with predicate (apache#10016) (apache#22)

[MemHammer][Refactor] Code Review (apache#15)

[Meta Schedule] Add Winograd Test for Customizable Search Space (apache#24)

Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Junru Shao <junrushao1994@gmail.com>
Co-authored-by: Wuwei Lin <wuwei@apache.org>
Co-authored-by: Sunghyun Park <49998730+sunggg@users.noreply.github.com>
Co-authored-by: Xiyou Zhou <xiyou@octoml.ai>
9 people committed Feb 20, 2022
1 parent 46a98f1 commit eb8def7
Showing 38 changed files with 693 additions and 209 deletions.
1 change: 1 addition & 0 deletions include/tvm/meta_schedule/database.h
@@ -237,6 +237,7 @@ class PyDatabaseNode : public DatabaseNode {
// The PackedFuncs are not visited, because the reflection system doesn't take care of them,
// so they are not accessible on the Python side. If such a need arises in the future,
// we can add corresponding accessor methods to help access them from Python.
//
// `f_has_workload` is not visited
// `f_commit_workload` is not visited
// `f_commit_tuning_record` is not visited
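For context, the `f_*` PackedFuncs above are populated from a Python subclass of the database. Below is a minimal in-memory sketch of how that looks from Python, assuming the Python-side `PyDatabase` mirror exposes `has_workload` / `commit_workload` / `commit_tuning_record` / `get_top_k` / `__len__` (names inferred from the `f_*` members; treat the whole class as illustrative, not the canonical implementation):

```python
from typing import List

import tvm
from tvm.ir import IRModule
from tvm.meta_schedule.database import PyDatabase, TuningRecord, Workload


class InMemoryDatabase(PyDatabase):
    """A toy database kept entirely in Python lists."""

    def __init__(self):
        super().__init__()
        self.workloads: List[Workload] = []
        self.records: List[TuningRecord] = []

    def has_workload(self, mod: IRModule) -> bool:
        return any(tvm.ir.structural_equal(w.mod, mod) for w in self.workloads)

    def commit_workload(self, mod: IRModule) -> Workload:
        for workload in self.workloads:
            if tvm.ir.structural_equal(workload.mod, mod):
                return workload
        workload = Workload(mod)
        self.workloads.append(workload)
        return workload

    def commit_tuning_record(self, record: TuningRecord) -> None:
        self.records.append(record)

    def get_top_k(self, workload: Workload, top_k: int) -> List[TuningRecord]:
        # Sort by mean measured latency; records without measurements go last.
        def mean_cost(record: TuningRecord) -> float:
            if not record.run_secs:
                return 1e9
            return sum(float(s) for s in record.run_secs) / len(record.run_secs)

        candidates = [r for r in self.records if r.workload.same_as(workload)]
        candidates.sort(key=mean_cost)
        return candidates[:top_k]

    def __len__(self) -> int:
        return len(self.records)
```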
2 changes: 1 addition & 1 deletion include/tvm/meta_schedule/tune_context.h
@@ -53,7 +53,7 @@ class TuneContextNode : public runtime::Object {
/*! \brief The probability of using certain mutator. */
Map<Mutator, FloatImm> mutator_probs;
/*! \brief The name of the tuning task. */
Optional<String> task_name;
String task_name;
/*! \brief The random state. */
support::LinearCongruentialEngine::TRandState rand_state;
/*! \brief The number of threads to be used. */
8 changes: 4 additions & 4 deletions include/tvm/tir/schedule/schedule.h
@@ -500,28 +500,28 @@ class ScheduleNode : public runtime::Object {
/******** Schedule: Annotation ********/
/*!
* \brief Annotate a loop with a key value pair
* \param loop_rv The loop to be annotated
* \param loop The loop to be annotated
* \param ann_key The annotation key
* \param ann_val The annotation value, a string or a ExprRV
*/
virtual void Annotate(const LoopRV& loop_rv, const String& ann_key, const ObjectRef& ann_val) = 0;
/*!
* \brief Annotate a block with a key value pair
* \param block_rv The block to be annotated
* \param loop The block to be annotated
* \param ann_key The annotation key
* \param ann_val The annotation value, a string or a ExprRV
*/
virtual void Annotate(const BlockRV& block_rv, const String& ann_key,
const ObjectRef& ann_val) = 0;
/*!
* \brief Unannotate a loop's annotation with key ann_key
* \param loop_rv The loop to be unannotated
* \param loop The loop to be unannotated
* \param ann_key The annotation key
*/
virtual void Unannotate(const LoopRV& loop_rv, const String& ann_key) = 0;
/*!
* \brief Unannotate a block's annotation with key ann_key
* \param block_rv The block to be unannotated
* \param loop The block to be unannotated
* \param ann_key The annotation key
*/
virtual void Unannotate(const BlockRV& block_rv, const String& ann_key) = 0;
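On the Python side these primitives map to `tvm.tir.Schedule.annotate` and `Schedule.unannotate`. A short usage sketch (the module, block name, and annotation values are illustrative, not part of this diff):

```python
from tvm import tir

# `MyModule` stands in for any IRModule containing a PrimFunc with a block "C".
sch = tir.Schedule(MyModule)
block = sch.get_block("C")
i, j, k = sch.get_loops(block)

# Annotate a block with a string value, e.g. the tiling structure consumed by
# the Multi-Level-Tiling rule.
sch.annotate(block, ann_key="meta_schedule.tiling_structure", ann_val="SSRSRS")

# Annotate a loop; the documented value type is a string or an ExprRV, so an
# integer may need wrapping depending on the revision.
sch.annotate(i, ann_key="pragma_auto_unroll_max_step", ann_val="64")

# Remove the block annotation again.
sch.unannotate(block, ann_key="meta_schedule.tiling_structure")
```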
87 changes: 87 additions & 0 deletions include/tvm/tir/stmt.h
@@ -1442,6 +1442,93 @@ constexpr const char* nested_software_pipeline_stage = "nested_software_pipeline
*/
constexpr const char* nested_software_pipeline_order = "nested_software_pipeline_order";

/*!
 * \brief Mark that the block needs to add a predicate for block var bounds during lowering
*/
constexpr const char* require_block_var_bound_predicate = "require_bound_predicate";

/*!
 * \brief Mark that the loop should be further split and bound to environment threads to enable
 * cooperative fetching.
*/
constexpr const char* meta_schedule_cooperative_fetch = "meta_schedule.cooperative_fetch";

/*!
* \brief Mark that the block should be further rewritten using tensorization.
*/
constexpr const char* meta_schedule_auto_tensorize = "meta_schedule.auto_tensorize";

/*! \brief Mark that tensor core is enabled in the PrimExpr */
constexpr const char* meta_schedule_tensor_core_enabled = "meta_schedule.tensor_core_enabled";

/*! \brief The allowed range of thread extent in thread bindings */
constexpr const char* meta_schedule_thread_extent_low_inclusive =
"meta_schedule.thread_extent_low_inclusive";

/*! \brief The allowed range of thread extent in thread bindings */
constexpr const char* meta_schedule_thread_extent_high_inclusive =
"meta_schedule.thread_extent_high_inclusive";

/*!
 * \brief Mark a block as generated by cache_read or cache_write.
* 0 means cache_read; 1 means cache_write.
* \sa meta_schedule_cache_type_read
* \sa meta_schedule_cache_type_write
*/
constexpr const char* meta_schedule_cache_type = "meta_schedule.cache_type";

/*! \sa meta_schedule_cache_type */
constexpr const int meta_schedule_cache_type_read = 0;

/*! \sa meta_schedule_cache_type */
constexpr const int meta_schedule_cache_type_write = 1;

/*! \brief Mark the tiling structure of blocks that are applied by rule Multi-Level-Tiling */
constexpr const char* meta_schedule_tiling_structure = "meta_schedule.tiling_structure";

/*! \brief Mark the block whose producer needs to be applied by rule Random-Compute-Location */
constexpr const char* meta_schedule_random_compute_producer =
"meta_schedule.random_compute_producer";

/*! \brief Mark auto-parallel setting on the block. */
constexpr const char* meta_schedule_parallel = "meta_schedule.parallel";

/*! \brief Mark auto-vectorize setting on the block. */
constexpr const char* meta_schedule_vectorize = "meta_schedule.vectorize";

/*! \brief Mark auto-unroll setting on the block. */
constexpr const char* meta_schedule_unroll_explicit = "meta_schedule.unroll_explicit";

/*! \brief Mark auto-unroll setting on the block. */
constexpr const char* meta_schedule_unroll_implicit = "meta_schedule.unroll_implicit";

/*! \brief Pragma: auto-unroll, max_step */
constexpr const char* pragma_auto_unroll_max_step = "pragma_auto_unroll_max_step";

/*! \brief Pragma: unroll explicit */
constexpr const char* pragma_unroll_explicit = "pragma_unroll_explicit";

/*! \brief Mark the scope of the software pipeline */
constexpr const char* software_pipeline_scope = "software_pipeline_scope";

/*! \brief Mark the stage of a statement in the software pipeline */
constexpr const char* software_pipeline_stage = "software_pipeline_stage";

/*! \brief Mark the order of a statement in the software pipeline */
constexpr const char* software_pipeline_order = "software_pipeline_order";

/*! \brief Mark the stage of the result of the software pipeline lowering. This is used to specify
* the behavior of nested software pipelines. Should be a 3-tuple consisting of the stage of the
* prologue, the body, and the epilogue of the software pipeline.
*/
constexpr const char* nested_software_pipeline_stage = "nested_software_pipeline_stage";

/*! \brief Mark the order of the result of the software pipeline lowering. This is used to specify
 * the behavior of nested software pipelines. Should be a 3-tuple consisting of the order of the
* prologue, the body, and the epilogue of the software pipeline.
*/
constexpr const char* nested_software_pipeline_order = "nested_software_pipeline_order";

/*!
* \brief Check if attr_key is a pragma key extension
* \param attr_key The attr key to be compared
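These keys surface in TIR as plain string-keyed annotations on blocks and loops. A hedged TVMScript sketch of what an annotated block can look like after scheduling (buffer shapes and the chosen key/value are made up for illustration):

```python
from tvm.script import tir as T


@T.prim_func
def annotated_copy(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (128,), "float32")
    B = T.match_buffer(b, (128,), "float32")
    for i in T.serial(0, 128):
        with T.block("copy"):
            vi = T.axis.spatial(128, i)
            # The attribute keys declared above are attached as block annotations
            # and later consumed by schedule rules and postprocessors.
            T.block_attr({"meta_schedule.cooperative_fetch": 1})
            B[vi] = A[vi]
```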
14 changes: 14 additions & 0 deletions include/tvm/tir/transform.h
@@ -383,6 +383,20 @@ TVM_DLL Pass LowerInitBlock();
*/
TVM_DLL Pass PlanAndUpdateBufferAllocationLocation();

/*!
* \brief Narrow the extents of some loops by checking whether some constraints in the block iter
* bound predicates can be directly applied on the loops.
* \return The pass.
*/
TVM_DLL Pass ApplyBlockBoundPredicate();

/*!
* \brief Substitute all the block vars with the PrimExprs they are bound to, indicated by the
* corresponding iter_values in BlockRealize, for opaque blocks by removing all
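A sketch of where the ApplyBlockBoundPredicate pass declared above could sit in a lowering sequence. It assumes a Python binding named `tvm.tir.transform.ApplyBlockBoundPredicate` is registered for it (the binding itself is not part of this hunk), and `mod` stands in for a scheduled IRModule:

```python
import tvm
from tvm import tir

seq = tvm.transform.Sequential(
    [
        tir.transform.PlanAndUpdateBufferAllocationLocation(),
        tir.transform.ApplyBlockBoundPredicate(),  # assumed Python binding
        tir.transform.ConvertBlocksToOpaque(),
    ]
)
lowered = seq(mod)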
3 changes: 2 additions & 1 deletion python/tvm/auto_scheduler/search_task.py
@@ -543,7 +543,8 @@ def print_best(self, log_file, print_mode="schedule"):
code: str
The best schedule code in python API or CUDA source code
"""
inp, _ = load_best_record(log_file, self.workload_key)
inp, res = load_best_record(log_file, self.workload_key)
print("Best codes (ms):", [float(c) * 1000.0 for c in res.costs])
if inp is None:
raise RuntimeError(
"Cannot find any valid schedule for %s in file %s" % (self.workload_key, log_file)
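For context, a typical call site for the tweaked `print_best`, assuming a workload function registered elsewhere with `@auto_scheduler.register_workload` and an existing log file (both names are illustrative):

```python
from tvm import auto_scheduler

# "matmul" is assumed to be a previously registered workload function.
task = auto_scheduler.SearchTask(
    func="matmul", args=(1024, 1024, 1024, "float32"), target="llvm"
)
# Shows the best schedule; with this change it also prints the measured costs in ms.
task.print_best("matmul_tuning.json", print_mode="schedule")
```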
5 changes: 4 additions & 1 deletion python/tvm/auto_scheduler/workload_registry.py
@@ -194,7 +194,10 @@ def workload_key_to_tensors(workload_key):
assert callable(value)

args = deserialize_args(workload[1:])
return value(*args)
result = value(*args)
if isinstance(result, tuple):
result = list(result)
return result


def serialize_workload_registry_entry(workload_key):
18 changes: 16 additions & 2 deletions python/tvm/meta_schedule/builder/local_builder.py
@@ -22,13 +22,28 @@

from tvm._ffi import register_func
from tvm.ir import IRModule
from tvm.runtime import Module, NDArray, load_param_dict, save_param_dict
from tvm.runtime import NDArray
from tvm.runtime import Module, load_param_dict, save_param_dict
from tvm.target import Target

from ...contrib.popen_pool import MapResult, PopenPoolExecutor, StatusKind
from ..utils import cpu_count, get_global_func_with_default_on_worker
from .builder import BuilderInput, BuilderResult, PyBuilder

logger = logging.getLogger(__name__)


def _serialize_params(params: Optional[Dict[str, NDArray]]) -> Optional[bytearray]:
if params is None:
return None
return save_param_dict(params)


def _deserialize_params(params: Optional[bytearray]) -> Optional[Dict[str, NDArray]]:
if params is None:
return None
return load_param_dict(params)


logger = logging.getLogger(__name__) # pylint: disable=invalid-name

@@ -127,7 +142,6 @@ def __init__(
The initializer to be used for the worker processes.
"""
super().__init__()

if max_workers is None:
max_workers = cpu_count(logical=True)
logger.info("LocalBuilder: max_workers = %d", max_workers)
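The `_serialize_params` / `_deserialize_params` helpers added above simply wrap `save_param_dict` / `load_param_dict` so that a params dict can cross the PopenPoolExecutor process boundary as a bytearray. A quick round-trip check of that idea:

```python
import numpy as np

import tvm
from tvm.runtime import load_param_dict, save_param_dict

params = {"weight": tvm.nd.array(np.ones((4, 4), dtype="float32"))}
blob = save_param_dict(params)     # bytearray, cheap to send to a worker process
restored = load_param_dict(blob)   # dict of str -> tvm.nd.NDArray
assert np.allclose(restored["weight"].numpy(), 1.0)
```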
8 changes: 5 additions & 3 deletions python/tvm/meta_schedule/cost_model/cost_model.py
@@ -15,17 +15,19 @@
# specific language governing permissions and limitations
# under the License.
"""Meta Schedule CostModel."""
import ctypes

from typing import List
import ctypes

import numpy as np

import numpy as np # type: ignore
from tvm._ffi import register_object
from tvm.runtime import Object

from .. import _ffi_api
from ..runner import RunnerResult
from ..search_strategy import MeasureCandidate
from ..tune_context import TuneContext
from ..search_strategy import MeasureCandidate
from ..utils import _get_hex_address, check_override


9 changes: 5 additions & 4 deletions python/tvm/meta_schedule/cost_model/metric.py
@@ -15,10 +15,11 @@
# specific language governing permissions and limitations
# under the License.
"""Cost model metrics for meta schedule"""
import numpy as np # type: ignore
from typing import List
import numpy as np


def max_curve(trial_scores: np.ndarray) -> np.ndarray:
def max_curve(trial_scores: np.ndarray) -> List[float]:
"""f(n) = max([s[i] fo i < n])
Parameters
@@ -28,8 +29,8 @@ def max_curve(trial_scores: np.ndarray) -> np.ndarray:
Returns
-------
curve : np.ndarray
A vector, the max-curve function values
curve : List[float]
function values
"""
ret = np.empty(len(trial_scores))
keep = -1e9
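In other words, `max_curve` returns the running maximum of the scores, i.e. the best result observed up to each trial:

```python
import numpy as np

from tvm.meta_schedule.cost_model.metric import max_curve

scores = np.array([0.2, 0.5, 0.3, 0.9, 0.7])
print(max_curve(scores))  # [0.2  0.5  0.5  0.9  0.9]
```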
12 changes: 6 additions & 6 deletions python/tvm/meta_schedule/cost_model/random_model.py
@@ -17,14 +17,14 @@
"""
Random cost model
"""
from typing import List, Optional, Tuple, Union
from typing import List, Union, Tuple, Optional

import numpy as np # type: ignore
import numpy as np

from ..cost_model import PyCostModel
from ..runner import RunnerResult
from ..search_strategy import MeasureCandidate
from ..tune_context import TuneContext
from ..search_strategy import MeasureCandidate
from ..cost_model import PyCostModel


class RandomModel(PyCostModel):
@@ -70,7 +70,7 @@ def load(self, path: str) -> None:
path : str
The file path.
"""
self.random_state = tuple(np.load(path, allow_pickle=True)) # type: ignore
self.random_state = tuple(np.load(path, allow_pickle=True))

def save(self, path: str) -> None:
"""Save the cost model to given file location.
@@ -116,7 +116,7 @@ def predict(self, context: TuneContext, candidates: List[MeasureCandidate]) -> n
The predicted running results.
"""
np.random.set_state(self.random_state)
# TODO(@zxybazh): Use numpy's RandState object:
# todo(@zxybazh): Use numpy's RandState object:
# https://numpy.org/doc/1.16/reference/generated/numpy.random.RandomState.html#numpy.random.RandomState
result = np.random.rand(len(candidates)) * self.max_range
self.random_state = np.random.get_state()
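The `predict` method above keeps the model reproducible by saving and restoring NumPy's global RNG state around each draw. A stripped-down illustration of that pattern, independent of the meta schedule classes:

```python
import numpy as np


class StatefulRandom:
    """Owns a private RNG state while still using NumPy's global generator."""

    def __init__(self, seed: int = 42):
        np.random.seed(seed)
        self.state = np.random.get_state()

    def draw(self, n: int) -> np.ndarray:
        np.random.set_state(self.state)     # restore the private state
        out = np.random.rand(n)
        self.state = np.random.get_state()  # stash it again for the next call
        return out
```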
@@ -17,7 +17,7 @@
"""Random Feature Extractor."""
from typing import List, Union, Tuple

import numpy as np # type: ignore
import numpy as np
from tvm.runtime.ndarray import NDArray, array

from ..tune_context import TuneContext
2 changes: 1 addition & 1 deletion python/tvm/meta_schedule/runner/local_runner.py
@@ -33,7 +33,7 @@
run_evaluator_common,
)

logger = logging.getLogger(__name__) # pylint: disable=invalid-name
logger = logging.getLogger(__name__)


class LocalRunnerFuture(RunnerFuture):
@@ -32,5 +32,5 @@ class PostOrderApply(SpaceGenerator):
def __init__(self):
"""Constructor"""
self.__init_handle_by_constructor__(
_ffi_api.SpaceGeneratorPostOrderApply, # type: ignore # pylint: disable=no-member
_ffi_api.SpaceGeneratorPostOrderApply, # pylint: disable=no-member
)
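
A hedged sketch of plugging the generator into a tuning context; the module name is a placeholder and the exact TuneContext keyword set may differ between revisions of this branch:

```python
from tvm.meta_schedule import TuneContext
from tvm.meta_schedule.space_generator import PostOrderApply
from tvm.target import Target

context = TuneContext(
    mod=MyModule,                    # an IRModule produced elsewhere
    target=Target("llvm"),
    space_generator=PostOrderApply(),
    task_name="example_task",
)
design_spaces = context.space_generator.generate_design_space(MyModule)
```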