diff --git a/content/_index.md b/content/_index.md index 1b00fb2..d17ac90 100644 --- a/content/_index.md +++ b/content/_index.md @@ -4,16 +4,14 @@ date: 2023-10-24 type: landing sections: - - block: hero + - block: markdown content: - title: SV-Gen. Analyzing the Generalization and Reliability of Steering Vectors + title: Analyzing the Generalization and Reliability of Steering Vectors text: We find that steering vectors can often fail to work in- and out-of-distribution. We propose "steerability", a new metric for steering vectors, and extensively evaluate it across 40 datasets. We find that steerability is highly variable across different inputs. Depending on the concept, spurious biases can substantially contribute to how effective steering is for each input. Overall, our findings show that while steering can work well in the right circumstances, there remain many technical difficulties of applying steering vectors to guide models' behaviour at scale, and higher standards of evidence are required when applying steering vectors to models on novel tasks. design: + css_class: dark background: - gradient_end: '#1976d2' - gradient_start: '#004ba0' - text_color_light: true + color: black - block: cta-button-list content: # Need a custom icon? 
@@ -22,14 +20,18 @@ sections: - text: Read our paper icon: academicons/arxiv url: https://drive.google.com/file/d/10DDi0wPFlw9yItmTaY03LPJptuFyTG8P/view?usp=sharing - - text: Use our steering-vectors library - icon: academicons/github - url: https://github.com/steering-vectors/steering-vectors/ - text: View our poster icon: brands/google url: https://drive.google.com/file/d/1xCMGCExBfyGivAhTV3-piU8CxVVPkC_5/view?usp=sharing + - text: Use our steering-vectors library + icon: brands/github + url: https://github.com/steering-vectors/steering-vectors/ - text: Reproduce our experiments - icon: academicons/github + icon: brands/github url: https://github.com/dtch1997/repepo + design: + css_class: dark + background: + color: black --- diff --git a/hugo_stats.json b/hugo_stats.json index cc0f445..0e669cb 100644 --- a/hugo_stats.json +++ b/hugo_stats.json @@ -32,6 +32,7 @@ "bg-primary-100", "blox-cta-button-list", "blox-hero", + "blox-markdown", "container", "dark", "dark:bg-hb-dark", @@ -49,6 +50,7 @@ "flex-wrap", "font-bold", "font-semibold", + "gap-3", "gap-6", "h-10", "h-[24px]", @@ -73,6 +75,7 @@ "max-w-prose", "mb-3", "mb-4", + "mb-6", "mt-24", "mt-4", "mt-6", @@ -100,6 +103,7 @@ "rounded-sm", "sm:py-48", "sm:text-6xl", + "text-3xl", "text-4xl", "text-center", "text-gray-600", @@ -122,6 +126,7 @@ "page-bg", "section-cta-button-list", "section-hero", + "section-markdown", "sun", "top" ] diff --git a/public/index.html b/public/index.html index 52273e4..e24c7a4 100644 --- a/public/index.html +++ b/public/index.html @@ -400,10 +400,9 @@ + - - @@ -445,8 +444,8 @@ -
-
+
+
@@ -460,25 +459,17 @@ -
- -
- - - -
-

SV-Gen. Analyzing the Generalization and Reliability of Steering Vectors

-

We find that steering vectors can often fail to work in- and out-of-distribution. We propose “steerability”, a new metric for steering vectors, and extensively evaluate it across 40 datasets. We find that steerability is highly variable across different inputs. Depending on the concept, spurious biases can substantially contribute to how effective steering is for each input. Overall, our findings show that while steering can work well in the right circumstances, there remain many technical difficulties of applying steering vectors to guide models’ behaviour at scale, and higher standards of evidence are required when applying steering vectors to models on novel tasks.

- - -
+
-
+
+ Analyzing the Generalization and Reliability of Steering Vectors
+ +
We find that steering vectors can often fail to work in- and out-of-distribution. We propose “steerability”, a new metric for steering vectors, and extensively evaluate it across 40 datasets. We find that steerability is highly variable across different inputs. Depending on the concept, spurious biases can substantially contribute to how effective steering is for each input. Overall, our findings show that while steering can work well in the right circumstances, there remain many technical difficulties of applying steering vectors to guide models’ behaviour at scale, and higher standards of evidence are required when applying steering vectors to models on novel tasks.
@@ -527,6 +518,8 @@

-