[docs] Spanish translation of attention.md #29681
Merged

Commits (6)
745472e add attention to es/ and edit es/_toctree.yml (aaronjimv)
c9e043d translate attention.md (aaronjimv)
e1a03bf Merge branch 'main' into translate_attention (aaronjimv)
155928c fix transformers (aaronjimv)
69a33bf fix transformers (aaronjimv)
68d2a8b Merge branch 'main' into translate_attention (aaronjimv)
@@ -0,0 +1,41 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Attention mechanisms
Most transformer models use full attention, in the sense that the attention matrix is square. This can be a big computational bottleneck when you have long texts. `Longformer` and `reformer` are models that try to be more efficient and use a sparse version of the attention matrix to speed up training.
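For reference, here is a minimal NumPy sketch of full (dense) attention, just to make the square `softmax(QK^t)` score matrix and its quadratic cost concrete; the shapes and names are illustrative, not the library's API:

```python
import numpy as np

def full_attention(Q, K, V):
    """Dense attention: softmax(QK^t / sqrt(d)) @ V. The score matrix is
    square (seq_len x seq_len), which is the quadratic bottleneck."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d = 1024, 64
Q, K, V = (rng.normal(size=(seq_len, d)) for _ in range(3))
out = full_attention(Q, K, V)  # the scores alone hold seq_len**2 floats
```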
## LSH attention
[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer) uses LSH attention. In softmax(QK^t), only the largest elements (in the softmax dimension) of the matrix QK^t will give useful contributions. So for each query q in Q, we can consider only the keys k in K that are close to q. A hash function is used to determine whether q and k are close. The attention mask is modified to mask the current token (except at the first position), because it would give a query and a key that are equal (and thus very similar to each other). Since the hash can be a bit random, several hash functions are used in practice (determined by an n_rounds parameter) and are then averaged together.
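As a rough illustration of the bucketing idea only (Reformer's actual implementation ties queries and keys together and processes sorted buckets in chunks), a toy LSH attention could be sketched like this:

```python
import numpy as np

def bucket(rotated):
    """Angular LSH: argmax over projections onto +/- random directions,
    so nearby vectors tend to fall into the same bucket."""
    return np.argmax(np.concatenate([rotated, -rotated], axis=-1), axis=-1)

def lsh_attention(Q, K, V, n_buckets=8, n_rounds=4, seed=0):
    d = Q.shape[-1]
    rng = np.random.default_rng(seed)
    out = np.zeros_like(V, dtype=float)
    for _ in range(n_rounds):                        # n_rounds hash functions
        proj = rng.normal(size=(d, n_buckets // 2))  # one random hash per round
        q_buckets, k_buckets = bucket(Q @ proj), bucket(K @ proj)
        for i in range(len(Q)):
            idx = np.nonzero(k_buckets == q_buckets[i])[0]  # keys hashed near q_i
            if idx.size == 0:
                continue
            scores = K[idx] @ Q[i] / np.sqrt(d)
            w = np.exp(scores - scores.max())
            out[i] += (w / w.sum()) @ V[idx]
    return out / n_rounds                            # average over the rounds
```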
## Local attention
[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer) uses local attention: often, the local context (e.g., what are the two tokens to the left and right?) is enough to take action for a given token. Also, by stacking attention layers that each have a small window, the last layer will have a receptive field larger than just the tokens in the window, which lets the model build a representation of the whole sentence.
Some preselected input tokens also receive global attention: for those few tokens, the attention matrix can access all tokens, and this process is symmetric: all other tokens have access to those specific tokens (on top of the ones in their local window). This is shown in Figure 2d of the paper, where you can see a sample attention mask:
<div class="flex justify-center">
    <img scale="50 %" align="center" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/local_attention_mask.png"/>
</div>
Using such attention matrices with fewer parameters allows the model to take inputs with a larger sequence length.
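A small sketch of how such a combined local-plus-global mask could be built, assuming a symmetric window and an arbitrary choice of global tokens (here the first token, as for a `[CLS]`-style token):

```python
import numpy as np

def local_global_mask(seq_len, window, global_idx):
    """True where attention is allowed: a sliding local window around the
    diagonal, plus full rows/columns for the globally-attending tokens."""
    i = np.arange(seq_len)
    mask = np.abs(i[:, None] - i[None, :]) <= window  # local sliding window
    mask[global_idx, :] = True  # global tokens attend to everything
    mask[:, global_idx] = True  # and everything attends to them (symmetric)
    return mask

mask = local_global_mask(seq_len=16, window=2, global_idx=[0])
print(mask.astype(int))  # banded matrix with a dense first row and column
```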
## Other tricks
### Axial positional encodings
[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer) uses axial positional encodings: in traditional transformer models, the positional encoding E is a matrix of size \\(l\\) by \\(d\\), where \\(l\\) is the sequence length and \\(d\\) is the dimension of the hidden state. If you have very long texts, this matrix can be huge and take up way too much space on the GPU. To alleviate that, axial positional encodings consist of factorizing that big matrix E into two smaller matrices E1 and E2, with dimensions \\(l_{1} \times d_{1}\\) and \\(l_{2} \times d_{2}\\), such that \\(l_{1} \times l_{2} = l\\) and \\(d_{1} + d_{2} = d\\) (with the product of the lengths, this ends up being much smaller). The embedding for time step \\(j\\) in E is obtained by concatenating the embeddings for time step \\(j \% l_{1}\\) in E1 and \\(j // l_{1}\\) in E2.
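A minimal sketch of this factorized lookup, with illustrative shapes chosen so that \\(l_{1} \times l_{2} = l\\) and \\(d_{1} + d_{2} = d\\):

```python
import numpy as np

l1, l2, d1, d2 = 64, 64, 32, 96  # so l = l1*l2 = 4096 and d = d1+d2 = 128
rng = np.random.default_rng(0)
E1 = rng.normal(size=(l1, d1))   # first factor of E
E2 = rng.normal(size=(l2, d2))   # second factor of E

def axial_position_embedding(j):
    """Embedding for time step j: concatenate E1[j % l1] and E2[j // l1].
    Storage drops from l*d = 524288 floats to l1*d1 + l2*d2 = 8192."""
    return np.concatenate([E1[j % l1], E2[j // l1]])

vec = axial_position_embedding(1234)  # shape (128,), i.e. (d,)
```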
Conversations
When writing about transformers (specifically in the phrase "modelos de transformadores"), I believe that using 'transformadores' may not be the most accurate translation. In my opinion, it is preferable to consistently refer to them as "transformers," regardless of the language being used.
@aaronjimv @stevhliu I just added this comment. I'm not sure if it's correct, but I wanted to express my thoughts on this.
hi @tadeodonegana thanks, I think it's a good point. I'm going to use "transformers".
I think when referring to the library, it is good to refer to it as just Transformers (no translation), but when referring to the models in a more generic way then it is ok to translate it.
Sounds good to me! Happy to help review future translations! Great work @aaronjimv!