Suggest: Add Bayesian optimization support for ratio search #104

Open · trotsky1997 wants to merge 2 commits into main
Conversation

@trotsky1997 (Author)

No description provided.

trotsky1997@qq.com added 2 commits October 26, 2023 20:40
@casper-hansen (Contributor)

Hi @trotsky1997, this looks very interesting! Have you conducted any experiments to measure perplexity after using Bayesian optimization?

@trotsky1997 (Author)

> Hi @trotsky1997, this looks very interesting! Have you conducted any experiments to measure perplexity after using Bayesian optimization?

You can check my results at
https://trotsky1997.notion.site/f49dcb79ab6245a7b689beed086e4c7b?pvs=4

@casper-hansen (Contributor)

@trotsky1997 does this code include different alpha values for X and W? You observed better perplexity with it.

@trotsky1997 (Author)

> @trotsky1997 does this code include different alpha values for X and W? You observed better perplexity with it.

That's very easy to modify: add a new parameter called ratio_b to the get_loss function, replace 1 - ratio with ratio_b, and then define ratio_b with its bounds in the parameter space definition (see the sketch below; the full function is in the next comment).
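For reference, a minimal sketch of that change, assuming get_loss currently uses the single-exponent formulation from AWQ's grid search (weights exponentiated by the complement 1 - ratio):

```python
# Before: a single exponent; the weight statistics use the complement.
scales = (x_max.pow(ratio) / w_max.pow(1 - ratio)).clamp(min=1e-4).view(-1)

# After: independent exponents for activations (ratio) and weights (ratio_b).
scales = (x_max.pow(ratio) / w_max.pow(ratio_b)).clamp(min=1e-4).view(-1)
```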

@trotsky1997 (Author)

```python
@scheduler.serial
def get_loss(ratio, ratio_b):
    nonlocal best_error, best_ratio, best_scales
    # Rescale the sampled ratio by 1/n_grid.
    ratio = ratio * 1 / n_grid
    # Independent exponents: ratio for the activation statistics,
    # ratio_b for the weight statistics.
    scales = (x_max.pow(ratio) / w_max.pow(ratio_b)).clamp(min=1e-4).view(-1)
    scales = scales / (scales.max() * scales.min()).sqrt()
    for fc in linears2scale:
        # Scale the weights, quantize, then undo the scaling.
        fc.weight.mul_(scales.view(1, -1).to(fc.weight.device))
        fc.weight.data = w_quantize_func(fc.weight.data) / scales.view(1, -1)
    out = block(x, **kwargs)
    if isinstance(out, tuple):
        out = out[0]

    loss = (org_out - out).float().pow(2).mean().item()  # float prevents overflow
    history.append(loss)
    if loss < best_error:
        best_error = loss
        best_ratio = ratio
        best_scales = scales
    # Restore the original weights before the next trial.
    block.load_state_dict(org_sd)
    return loss

param_space = dict(ratio=uniform(0, 1), ratio_b=uniform(0, 1))
```
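The @scheduler.serial decorator and the scipy-style uniform(0, 1) distributions match ARM's mango Bayesian-optimization library; assuming that is the optimizer in use here, the objective above could be wired up roughly like this (the iteration budget of 40 is an arbitrary choice for illustration):

```python
from mango import Tuner, scheduler  # assumed optimizer: ARM mango
from scipy.stats import uniform

# get_loss and param_space are defined as in the snippet above.
tuner = Tuner(param_space, get_loss, dict(num_iteration=40))
results = tuner.minimize()  # Bayesian optimization over (ratio, ratio_b)
print(results['best_params'], results['best_objective'])
```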

@trotsky1997 (Author)

> @trotsky1997 does this code include different alpha values for X and W? You observed better perplexity with it.

I have talked with Dr. Tang; it performs a little better than grid search on Vicuna, but about the same as grid search on Llama-2-7B.
