Lecture word embeddings (#68)
pkeilbach authored Jan 15, 2024
1 parent d27e2a0 commit eacd897
Showing 10 changed files with 1,950 additions and 0 deletions.
612 changes: 612 additions & 0 deletions docs/img/word-embeddings-cbow-architecture.drawio.svg
83 changes: 83 additions & 0 deletions docs/img/word-embeddings-cbow-schema.drawio.svg
132 changes: 132 additions & 0 deletions docs/img/word-embeddings-cbow-transformation.drawio.svg
136 changes: 136 additions & 0 deletions docs/img/word-embeddings-cbow.drawio.svg
330 changes: 330 additions & 0 deletions docs/img/word-embeddings-process.drawio.svg
312 changes: 312 additions & 0 deletions docs/img/word-embeddings.drawio.svg
2 changes: 2 additions & 0 deletions docs/lectures/language_models.md
@@ -49,6 +49,8 @@ A text corpus is a large and structured **set of texts**, such as:
- Blog posts
- Tweets

A corpus can be **general**, such as Wikipedia or news articles, or it can be **domain specific**, such as medical texts or legal documents.

!!! note "Vocabulary size vs. corpus size"

Note that the **vocabulary size** $|V|$ is the number of unique words in the corpus, whereas the **corpus size** $|C|$ is the total number of words in the corpus.
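
A minimal sketch of the distinction drawn in the note above, assuming naive lowercase whitespace tokenization. The sample sentence and the variable names are illustrative only and do not come from the lecture notes.

```python
from collections import Counter

# Hypothetical toy corpus; the sentence is illustrative only.
corpus = "the cat sat on the mat and the dog sat too"

# Naive whitespace tokenization (an assumption; real pipelines
# usually normalize and tokenize more carefully).
tokens = corpus.lower().split()

# Count how often each distinct word occurs.
counts = Counter(tokens)

corpus_size = len(tokens)      # |C|: total number of words (tokens)
vocabulary_size = len(counts)  # |V|: number of unique words (types)

print(f"|C| = {corpus_size}, |V| = {vocabulary_size}")
```

Repeated words such as "the" add to $|C|$ each time they occur but contribute only once to $|V|$, so $|V| \le |C|$ always holds.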
342 changes: 342 additions & 0 deletions docs/lectures/word_embeddings.md

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions mkdocs.yml
@@ -62,6 +62,7 @@ nav:
- lectures/vector_space_models.md
- lectures/minimum_edit_distance.md
- lectures/language_models.md
- lectures/word_embeddings.md
- Assignments: assignments.md
- Presentations:
- presentations/presentations.md
