[WIP][DRAFT] Is this the right track? Generating Dense Quantized op #4

Draft · wants to merge 67 commits into main

Commits (67), showing changes from all commits:
84e850d  example moving things around (Apr 6, 2021)
d655405  dense operator example (Apr 8, 2021)
8ae43b5  dense stub (Apr 9, 2021)
d20343f  clean up old file (Apr 9, 2021)
8efce58  move common to common.py (Apr 9, 2021)
3198516  add numpy quantization calculator (Apr 9, 2021)
8557397  make dense creators create fp32 -> fp32 real or affine domain dependi… (Apr 9, 2021)
2df25eb  lily's comments (Apr 9, 2021)
ddd45df  renamed common.py -> utils.py (Apr 9, 2021)
5ece075  rename common -> utils in files (Apr 9, 2021)
a417c37  add more utils to simplify quantization/dequant code (Apr 9, 2021)
aa67d38  cleanup interface calls (Apr 9, 2021)
0575996  add arithmetic operators (AndrewZhaoLuo, Apr 10, 2021)
ee1487d  rename arithmetic to add (AndrewZhaoLuo, Apr 12, 2021)
e9dd073  make add default dequantize (AndrewZhaoLuo, Apr 12, 2021)
0c99a46  option to give output qparams to add (AndrewZhaoLuo, Apr 12, 2021)
9eb6f12  fix examples with new interface (AndrewZhaoLuo, Apr 12, 2021)
2f88d14  fix proper scale values (AndrewZhaoLuo, Apr 12, 2021)
03ae810  new multiply operator (AndrewZhaoLuo, Apr 12, 2021)
28b643c  add subtraction operator (AndrewZhaoLuo, Apr 12, 2021)
d4a60ee  fix rounding bug (AndrewZhaoLuo, Apr 13, 2021)
4def50b  add short circuits to some point termsg (AndrewZhaoLuo, Apr 13, 2021)
63fbf7e  fix bug with calculating zero point when symmetric (AndrewZhaoLuo, Apr 14, 2021)
3d1cb02  very ugly working conv (AndrewZhaoLuo, Apr 14, 2021)
81abd8c  more cool convs (AndrewZhaoLuo, Apr 14, 2021)
07f8fee  works with groups' (AndrewZhaoLuo, Apr 14, 2021)
b5883fb  example of dynamic pad not working (AndrewZhaoLuo, Apr 14, 2021)
0742d77  interface refresh for add and multiply for now (AndrewZhaoLuo, Apr 30, 2021)
4add45a  rewrite dense operator interface (AndrewZhaoLuo, May 4, 2021)
6ffa4b8  convs are cool, fixed interface (AndrewZhaoLuo, May 4, 2021)
0b8ec6d  add shape inference for args (AndrewZhaoLuo, May 5, 2021)
524b1f0  add todo (AndrewZhaoLuo, May 5, 2021)
a2a392a  move from transform --> qnn folder (AndrewZhaoLuo, May 5, 2021)
2658df3  add and multiply tests (AndrewZhaoLuo, May 5, 2021)
920b0ff  clean up utils, add comments, name QParams -> AffineQParams (AndrewZhaoLuo, May 5, 2021)
44609b7  example moving things around (Apr 6, 2021)
dc8f980  dense operator example (Apr 8, 2021)
22981ec  dense stub (Apr 9, 2021)
ba85ecc  clean up old file (Apr 9, 2021)
1a2583e  move common to common.py (Apr 9, 2021)
dd51d9b  add numpy quantization calculator (Apr 9, 2021)
c45a31e  make dense creators create fp32 -> fp32 real or affine domain dependi… (Apr 9, 2021)
ab665b8  lily's comments (Apr 9, 2021)
a993ee9  renamed common.py -> utils.py (Apr 9, 2021)
773860f  rename common -> utils in files (Apr 9, 2021)
879b3b9  add more utils to simplify quantization/dequant code (Apr 9, 2021)
26849d8  cleanup interface calls (Apr 9, 2021)
62ea876  add arithmetic operators (AndrewZhaoLuo, Apr 10, 2021)
252c416  rename arithmetic to add (AndrewZhaoLuo, Apr 12, 2021)
52638ac  make add default dequantize (AndrewZhaoLuo, Apr 12, 2021)
6597498  option to give output qparams to add (AndrewZhaoLuo, Apr 12, 2021)
3325f5e  fix examples with new interface (AndrewZhaoLuo, Apr 12, 2021)
999c234  fix proper scale values (AndrewZhaoLuo, Apr 12, 2021)
3c20289  new multiply operator (AndrewZhaoLuo, Apr 12, 2021)
b372b48  add subtraction operator (AndrewZhaoLuo, Apr 12, 2021)
e53040f  fix rounding bug (AndrewZhaoLuo, Apr 13, 2021)
9767479  add short circuits to some point termsg (AndrewZhaoLuo, Apr 13, 2021)
3bbb008  fix bug with calculating zero point when symmetric (AndrewZhaoLuo, Apr 14, 2021)
b13efc2  very ugly working conv (AndrewZhaoLuo, Apr 14, 2021)
9cb6f59  more cool convs (AndrewZhaoLuo, Apr 14, 2021)
737187b  works with groups' (AndrewZhaoLuo, Apr 14, 2021)
467de8b  example of dynamic pad not working (AndrewZhaoLuo, Apr 14, 2021)
81cf33a  Merge pull request #2 from AndrewZhaoLuo/quantization-dev-main-add-re… (AndrewZhaoLuo, May 5, 2021)
7d29017  change interface to be more explicit when simulated (AndrewZhaoLuo, May 6, 2021)
3b11404  extended quantize add/sub tests (AndrewZhaoLuo, May 6, 2021)
73a985d  more multiplication tests (AndrewZhaoLuo, May 6, 2021)
3f1c4ba  quantized util tests (AndrewZhaoLuo, May 6, 2021)

125 changes: 125 additions & 0 deletions python/tvm/relay/qnn/python_operators/add.py
@@ -0,0 +1,125 @@
from typing import Optional, Tuple

import numpy as np
import tvm
from tvm import relay
from tvm.relay.qnn.python_operators import utils


def generate_generic_quantized_add_or_subtract(
    input1: tvm.relay.Expr,
    input2: tvm.relay.Expr,
    output_qparams: Optional[utils.AffineQParams],
    simulated_dtype: Optional[utils.SimulatedDTypes] = None,
    accumulation_dtype: str = "int32",
    dequantize: bool = True,
    mode: str = "add",
) -> Tuple[tvm.relay.Expr, utils.AffineQParams]:
    simulated = simulated_dtype is not None
    internal_accumulation_dtype = simulated_dtype.value if simulated else accumulation_dtype

    # Quantize both inputs directly into the output's affine space, so the
    # integer terms can be combined without any rescaling.
    input1, input2 = utils.quantize_inputs(
        simulated, internal_accumulation_dtype, input1, output_qparams, input2, output_qparams
    )

    if mode == "add":
        # q1 + q2 carries two copies of the zero point; subtract one back out.
        output_term = (
            input1 + input2 - utils.cast_all(internal_accumulation_dtype, output_qparams.zero_point)
        )
    elif mode == "sub":
        # q1 - q2 cancels the zero point entirely; add one copy back in.
        output_term = (
            input1 - input2 + utils.cast_all(internal_accumulation_dtype, output_qparams.zero_point)
        )
    else:
        raise ValueError("Only addition and subtraction are supported")

    if dequantize:
        output_term = utils.dequantize_expr(simulated, output_term, output_qparams)

    # TODO: simulate the effects of overflow
    return output_term, output_qparams


def generate_quantized_add(
    input1: tvm.relay.Expr,
    input2: tvm.relay.Expr,
    output_qparams: Optional[utils.AffineQParams],
    simulated_dtype: Optional[utils.SimulatedDTypes] = None,
    accumulation_dtype: str = "int32",
    dequantize: bool = True,
) -> Tuple[tvm.relay.Expr, utils.AffineQParams]:
    return generate_generic_quantized_add_or_subtract(
        input1=input1,
        input2=input2,
        output_qparams=output_qparams,
        simulated_dtype=simulated_dtype,
        accumulation_dtype=accumulation_dtype,
        dequantize=dequantize,
        mode="add",
    )


def generate_quantized_sub(
    input1: tvm.relay.Expr,
    input2: tvm.relay.Expr,
    output_qparams: Optional[utils.AffineQParams],
    simulated_dtype: Optional[utils.SimulatedDTypes] = None,
    accumulation_dtype: str = "int32",
    dequantize: bool = True,
) -> Tuple[tvm.relay.Expr, utils.AffineQParams]:
    return generate_generic_quantized_add_or_subtract(
        input1=input1,
        input2=input2,
        output_qparams=output_qparams,
        simulated_dtype=simulated_dtype,
        accumulation_dtype=accumulation_dtype,
        dequantize=dequantize,
        mode="sub",
    )


def example_add_simulated(seed=42):
    np.random.seed(seed=seed)
    a_arr = np.random.uniform(-10, 10, size=(5, 10)).astype("float32")
    b_arr = np.random.uniform(-10, 10, size=(5, 10)).astype("float32")

    var_creator = utils.AffineQuantizationVarCreator()
    a = relay.var("a")
    b = relay.var("b")
    output_qparams = var_creator.get_qparams("output")

    add_output, output_qparams = generate_quantized_add(
        a, b, output_qparams, simulated_dtype=utils.SimulatedDTypes.FLOAT32
    )
    f = relay.Function(
        [a, b, output_qparams.scale_factor, output_qparams.zero_point],
        add_output,
    )
    print(f)

    actual_output_qparams = utils.get_quantization_parameters(a_arr + b_arr, True, 8)

    mod = tvm.ir.IRModule.from_expr(f)
    intrp = relay.create_executor(kind="debug", mod=mod)
    result = intrp.evaluate(f)(
        a_arr, b_arr, actual_output_qparams.scale_factor, actual_output_qparams.zero_point
    ).asnumpy()

    print("Quantized result:")
    print(result)
    print()
    print("FP32 result:")
    print(a_arr + b_arr)


if __name__ == "__main__":
    # Test that the sim_q and static_q get the same results
    example_add_simulated(seed=42)
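
For reference, the arithmetic above follows from the affine encoding q = round(x / s) + zp (assuming the utils helpers implement the usual affine scheme): when both inputs are quantized with the output qparams, q1 + q2 - zp dequantizes back to approximately x1 + x2. A minimal self-contained NumPy sketch of that identity, with illustrative quantize/dequantize stand-ins rather than the utils module above:

import numpy as np

def quantize(x, scale, zp):
    # Affine quantization: real values onto the integer grid.
    return np.round(x / scale).astype(np.int32) + zp

def dequantize(q, scale, zp):
    # Affine dequantization: integer grid back to real values.
    return (q - zp).astype(np.float32) * scale

x1 = np.random.uniform(-10, 10, size=(5, 10)).astype("float32")
x2 = np.random.uniform(-10, 10, size=(5, 10)).astype("float32")
scale, zp = 0.1, 3  # example output qparams

q1, q2 = quantize(x1, scale, zp), quantize(x2, scale, zp)
q_sum = q1 + q2 - zp  # the extra copy of the zero point cancels here
approx = dequantize(q_sum, scale, zp)
assert np.allclose(approx, x1 + x2, atol=scale)  # exact up to rounding error
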
252 changes: 252 additions & 0 deletions python/tvm/relay/qnn/python_operators/conv.py
@@ -0,0 +1,252 @@
from typing import Optional, Tuple

import numpy as np
import torch
import tvm
from torch import nn
from tvm import relay
from tvm.relay.qnn.python_operators import utils


def get_axis_from_layout(dimension_name: str, layout: str):
    # A valid layout must be a permutation of NCHW (data) or OIHW (kernel).
    if sorted(layout) != sorted("NCHW") and sorted(layout) != sorted("OIHW"):
        raise ValueError(f"Unknown layout {layout}, need a permutation of NCHW or OIHW")

    try:
        return layout.index(dimension_name)
    except ValueError:
        raise ValueError(f"Unknown dimension {dimension_name} for layout {layout}")


def generate_generic_quantized_conv2d(
    data: tvm.relay.Expr,
    weight: tvm.relay.Expr,
    data_qparams: utils.AffineQParams,
    weight_qparams: utils.AffineQParams,
    in_channels: Optional[int] = None,
    out_channels: Optional[int] = None,
    strides: Tuple[int, int] = (1, 1),
    padding: Tuple[int, int] = (0, 0),
    dilation: Tuple[int, int] = (1, 1),
    groups: int = 1,
    kernel_size: Optional[Tuple[int, int]] = None,
    data_layout: str = "NCHW",
    kernel_layout: str = "OIHW",
    out_layout: str = "",
    simulated_dtype: Optional[utils.SimulatedDTypes] = None,
    accumulation_dtype: str = "int32",
    dequantize: bool = True,
    bias: Optional[tvm.relay.Expr] = None,
) -> Tuple[tvm.relay.Expr, utils.AffineQParams]:
    simulated = simulated_dtype is not None

    # Infer any channel/kernel information not given explicitly from the
    # weight's checked type.
    if in_channels is None:
        in_channels_axis = get_axis_from_layout("C", data_layout)
        in_channels = weight.checked_type.shape[in_channels_axis]
    if out_channels is None:
        out_channels_axis = get_axis_from_layout("O", kernel_layout)
        out_channels = weight.checked_type.shape[out_channels_axis]
    if kernel_size is None:
        kernel_height_axis = get_axis_from_layout("H", kernel_layout)
        kernel_width_axis = get_axis_from_layout("W", kernel_layout)
        kernel_size = (
            weight.checked_type.shape[kernel_height_axis],
            weight.checked_type.shape[kernel_width_axis],
        )

    internal_accumulation_dtype = simulated_dtype.value if simulated else accumulation_dtype

    data, weight = utils.quantize_inputs(
        simulated, internal_accumulation_dtype, data, data_qparams, weight, weight_qparams
    )

    # Pad explicitly with the data zero point (not 0), so the convolutions
    # below can all run with padding=(0, 0).
    pad_n = (0, 0)
    pad_c = (0, 0)
    pad_h = (padding[0], padding[0])
    pad_w = (padding[1], padding[1])

    if sorted(data_layout) != sorted("NCHW"):
        raise ValueError(f"Unknown layout {data_layout}, need dimensions N, C, H, W")

    padding = []
    for dimension_name in data_layout:
        if dimension_name == "N":
            padding.append(pad_n)
        elif dimension_name == "C":
            padding.append(pad_c)
        elif dimension_name == "H":
            padding.append(pad_h)
        else:
            padding.append(pad_w)

    padded_data = relay.nn.pad(
        data,
        tuple(padding),
        pad_value=data_qparams.zero_point,
    )

    # conv(q_d, q_w): the raw integer convolution.
    first_term = relay.nn.conv2d(
        padded_data,
        weight,
        strides=strides,
        padding=(0, 0),
        dilation=dilation,
        groups=groups,
        channels=out_channels,
        kernel_size=kernel_size,
        data_layout=data_layout,
        kernel_layout=kernel_layout,
        out_layout=out_layout,
        out_dtype=internal_accumulation_dtype,
    )

    # zp_w * conv(q_d, 1): the weight-zero-point correction.
    # TODO: consider sum/avg pooling and then reduction along channels instead of this
    second_term = (
        relay.nn.conv2d(
            padded_data,
            relay.ones_like(weight),
            strides=strides,
            padding=(0, 0),
            dilation=dilation,
            groups=groups,
            channels=out_channels,
            kernel_size=kernel_size,
            data_layout=data_layout,
            kernel_layout=kernel_layout,
            out_layout=out_layout,
            out_dtype=internal_accumulation_dtype,
        )
        * utils.cast_all(internal_accumulation_dtype, weight_qparams.zero_point)
    )

    # zp_d * sum(q_w) per output channel: the data-zero-point correction,
    # reshaped so it broadcasts along the output-channel axis.
    if kernel_layout == "OIHW":
        third_term = relay.sum(weight, axis=0, keepdims=False, exclude=True)
    else:
        third_term = relay.sum(weight, axis=1, keepdims=False, exclude=True)
    third_term *= utils.cast_all(internal_accumulation_dtype, data_qparams.zero_point)

    if data_layout == "NCHW":
        third_term = relay.reshape(third_term, (1, out_channels, 1, 1))
    else:
        third_term = relay.reshape(third_term, (1, 1, 1, out_channels))

    data, weight, data_zero_point, weight_zero_point = utils.cast_all(
        internal_accumulation_dtype,
        data,
        weight,
        data_qparams.zero_point,
        weight_qparams.zero_point,
    )

    # zp_d * zp_w * kh * kw * (C_in / groups): the constant correction term.
    fourth_term = (
        data_zero_point
        * weight_zero_point
        * relay.const(kernel_size[0], dtype=internal_accumulation_dtype)
        * relay.const(kernel_size[1], dtype=internal_accumulation_dtype)
        * relay.const(in_channels, dtype=internal_accumulation_dtype)
        / relay.const(groups, dtype=internal_accumulation_dtype)
    )

    output_qparams = utils.AffineQParams(
        data_qparams.scale_factor * weight_qparams.scale_factor,
        relay.const(0, dtype=accumulation_dtype),
        accumulation_dtype,
    )

    # Group the terms pairwise rather than subtracting sequentially, to
    # expose more parallelism.
    output_term = (first_term - second_term) - (third_term - fourth_term)

    if bias is not None:
        bias = utils.quantize_inputs(simulated, internal_accumulation_dtype, bias, output_qparams)
        output_term += bias

    if dequantize:
        output_term = utils.dequantize_expr(simulated, output_term, output_qparams)

    return output_term, output_qparams


def example_conv_no_zp(in_channels, out_channels, img_height, img_width, groups=2, seed=42):
    np.random.seed(seed=seed)
    kernel_size = 1
    padding = 1

    # NCHW tensors, OIHW kernel
    data_arr = np.random.uniform(-5, 10, size=(1, in_channels, img_height, img_width)).astype(
        "float32"
    )
    weight_arr = np.random.uniform(
        -1, 2, size=(out_channels, in_channels // groups, kernel_size, kernel_size)
    ).astype("float32")

    # bias_arr = np.random.uniform(-100, 100, size=(n, out_units)).astype("float32")

    var_creator = utils.AffineQuantizationVarCreator()
    data = relay.var("data")
    weight = relay.var("weight")
    data_qparams = var_creator.get_qparams("conv_data")
    weight_qparams = var_creator.get_qparams("conv_weight")
    output_tensor, output_qparams = generate_generic_quantized_conv2d(
        data,
        weight,
        data_qparams,
        weight_qparams,
        kernel_size=(kernel_size, kernel_size),
        padding=(padding, padding),
        out_channels=out_channels,
        in_channels=in_channels,
        groups=groups,
        simulated_dtype=utils.SimulatedDTypes.FLOAT32,
        dequantize=True,
    )

    f = relay.Function(
        [
            data,
            weight,
            data_qparams.scale_factor,
            data_qparams.zero_point,
            weight_qparams.scale_factor,
            weight_qparams.zero_point,
        ],
        output_tensor,
    )
    print(f)

    actual_data_qparams = utils.get_quantization_parameters(data_arr, True, 8)
    actual_weight_qparams = utils.get_quantization_parameters(weight_arr, True, 8)

    print(actual_data_qparams)
    print(actual_weight_qparams)

    mod = tvm.ir.IRModule.from_expr(f)
    intrp = relay.create_executor(kind="debug", mod=mod)
    result = intrp.evaluate(f)(
        data_arr,
        weight_arr,
        actual_data_qparams.scale_factor,
        actual_data_qparams.zero_point,
        actual_weight_qparams.scale_factor,
        actual_weight_qparams.zero_point,
    ).asnumpy()

    print("Quantized result:")
    q_result = torch.Tensor(result)
    print(q_result)
    print()

    print("FP32 result:")
    fp32_result = nn.functional.conv2d(
        torch.Tensor(data_arr), torch.Tensor(weight_arr), padding=(padding, padding), groups=groups
    )
    print(fp32_result)
    print()
    print("Difference:")
    print(q_result - fp32_result)


if __name__ == "__main__":
    example_conv_no_zp(4, 2, 5, 5, groups=2)
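
The four terms assembled in generate_generic_quantized_conv2d come from expanding the zero-point-corrected convolution: conv(q_d - zp_d, q_w - zp_w) = conv(q_d, q_w) - zp_w * conv(q_d, 1) - zp_d * sum(q_w) + zp_d * zp_w * kh * kw * (C_in / groups), where sum(q_w) runs over each output channel's receptive field. A minimal NumPy check of this identity for a 1x1 kernel and a single group (shapes, values, and the conv1x1 helper are illustrative only):

import numpy as np

rng = np.random.default_rng(0)
n, cin, cout, h, w = 1, 4, 2, 5, 5
zp_d, zp_w = 3, 2  # example zero points

q_d = rng.integers(0, 255, size=(n, cin, h, w)).astype(np.int64)        # NCHW data
q_w = rng.integers(-127, 127, size=(cout, cin, 1, 1)).astype(np.int64)  # OIHW 1x1 kernel

def conv1x1(x, k):
    # A 1x1 convolution with stride 1 and no padding is a per-pixel
    # matmul over the channel dimension.
    return np.einsum("nchw,oc->nohw", x, k[:, :, 0, 0])

lhs = conv1x1(q_d - zp_d, q_w - zp_w)              # the zero-point-corrected conv

first = conv1x1(q_d, q_w)                          # conv(q_d, q_w)
second = conv1x1(q_d, np.ones_like(q_w)) * zp_w    # zp_w * conv(q_d, 1)
third = (q_w.sum(axis=(1, 2, 3)) * zp_d).reshape(1, cout, 1, 1)  # zp_d * sum(q_w)
fourth = zp_d * zp_w * 1 * 1 * cin                 # zp_d * zp_w * kh * kw * C_in / groups

assert np.array_equal(lhs, (first - second) - (third - fourth))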