diff --git a/content/resources/git-version-control-via-command-line/index.mdx b/content/resources/git-version-control-via-command-line/index.mdx index 817de75fc..a20fbe2f4 100644 --- a/content/resources/git-version-control-via-command-line/index.mdx +++ b/content/resources/git-version-control-via-command-line/index.mdx @@ -1,34 +1,44 @@ --- +title: Git version control via command line +summary: >- + This article introduces the main concepts in Git and basic Git commands that + can be used from the command line. Understanding these commands will help you + with using git in a code editor, the git desktop and other options, like + GitHub online. +locale: en authors: - siam-omar -license: cc-by-4-0 -locale: en +editors: [] publicationDate: 2023-11-05 -summary: This article introduces the main concepts in Git and basic Git commands that can be used from the command line. Understanding these commands will help you with using git in a code editor, the git desktop and other options, like GitHub online. +version: 1.0.0 tags: - data-management - git -title: Git version control via command line +license: cc-by-4-0 toc: false -version: 1.0.0 --- ## Learning outcomes -- be familiar with git terminology -- be familiar with the git workflow -- install git -- understand essential git commands -- use git commands in the terminal / command line +* be familiar with git terminology + +* be familiar with the git workflow + +* install git + +* understand essential git commands + +* use git commands in the terminal / command line ## Version control and tracking changes -**Git**'s main purpose is to track changes in files and folders of a project, either for yourself or when working collaboratively in a team. It works best when used with **plaintext file formats** such as **source code, XML/TEI documents** or **Markdown content**. 
While Git will happily store _images_, _audio files_, or `.doc` and `.pdf` documents, it is not able to help you with changes in documents in so-called binary formats. +**Git**'s main purpose is to track changes in files and folders of a project, either for yourself or when working collaboratively in a team. It works best when used with **plaintext file formats** such as **source code, XML/TEI documents** or **Markdown content**. While Git will happily store *images*, *audio files*, or `.doc` and `.pdf` documents, it is not able to help you with changes in documents in so-called binary formats. **Git** -- solves the problem of keeping versions of text documents in sync among sometimes thousands of +* solves the problem of keeping versions of text documents in sync among sometimes thousands of collaborators working on a software product -- it helps integrate changes by multiple collaborators and also solves situations where two people + +* it helps integrate changes by multiple collaborators and also solves situations where two people edit the same part of a document Git can record which changes, to which documents, have been made when, and by whom. It allows to keep a detailed revision history of a project, because it can save snapshots of a project at specific points in time, allowing to review them any time in the future. @@ -36,7 +46,7 @@ Git can record which changes, to which documents, have been made when, and by wh Version control allows to save different versions of content, restores previous versions, and compares different versions. This is especially beneficial when working with multiple documents, and when working in teams (potentially working on the same document).
-Git's version control works with branches. + Git's version control works with branches.
Git lets you create separate branches for changes to any document. For example, in the image above, the document is represented by the **main branch**. When a change is to be made, a copy, the branch "feature A", is **pulled** from the main branch. Once the change is made, the feature branch is ready to be transferred back into the main branch. Similarly, we can **fetch** a branch "feature B" and make another change. When the changes have been **committed**, they can be **pushed** back into the main branch, too. Git's version control features make it possible to trace those changes and review them before they are **merged** into the main branch. @@ -48,154 +58,164 @@ The paragraph above features **terminology** (in bold) that will become essentia This diagram is a great summary of how content moves in Git between the “working area”, which is simply the files and folders in a project as they exist on your computer, the “staging area”, which we can call a “holding zone”, and the “repository”, which holds the permanently recorded commit snapshots.
-Git workflow + Git workflow
-- **Working area:** This is where you manipulate your files. For example, you can make changes to the text in an MDX document. The working area usually resides on your local machine. -- **Staging area:** This "holding zone" allows to **stage** changes in your documents, and **commit** those changes once you are comfortable, and **push** these changes to your repository. -- **Repository:** A repository is a central location where the data, files and documents are stored and managed. You can be the sole user of a repository or you can be a collaborator in a team. +* **Working area:** This is where you manipulate your files. For example, you can make changes to the text in an MDX document. The working area usually resides on your local machine. + +* **Staging area:** This "holding zone" allows to **stage** changes in your documents, and **commit** those changes once you are comfortable, and **push** these changes to your repository. + +* **Repository:** A repository is a central location where the data, files and documents are stored and managed. You can be the sole user of a repository or you can be a collaborator in a team. Let's look at how change tracking with Git works in practice, in a local project on your computer (no network connection required). We'll walk through how to work with Git in the terminal, because that is where Git was originally meant to be used, and because it helps to understand what is actually going on. This involves learning a handful of Git commands, and while that might seem intimidating at first, you'll see that it becomes second-nature with a bit of practice very quickly. However, if you prefer to work with Git via a **graphical user interface (GUI)**, take a look at the bonus section at the end of this introduction, which lists some popular editor or operating system integrations. -The commands you learn for working in a local project also apply for working collaboratively. 
Working in the terminal helps to understand what is going on behind the scenes. + The commands you learn for working in a local project also apply for working collaboratively. Working in the terminal helps to understand what is going on behind the scenes. ## Installing git -Many users, particularly Windows users, depend on using Graphical User Interfaces (GUI) in their daily lives and work schedules. GUIs have tabs, menus and windows that you can click on. GUIs are perceived as more user-friendly because they graphically represent the actions you take when you interact with the machine in front of you. + Many users, particularly Windows users, depend on using Graphical User Interfaces (GUI) in their daily lives and work schedules. GUIs have tabs, menus and windows that you can click on. GUIs are perceived as more user-friendly because they graphically represent the actions you take when you interact with the machine in front of you. -The command line interface (CLI) relies on text input. The user needs to know syntax and commands to use it. So instead of clicking on symbols, you type what you would like your machine to do. To open the command line, search for _terminal_ or _command prompt_. When you open your CLI, it will look something like this: + The command line interface (CLI) relies on text input. The user needs to know syntax and commands to use it. So instead of clicking on symbols, you type what you would like your machine to do. To open the command line, search for *terminal* or *command prompt*. When you open your CLI, it will look something like this: -```bash -C:\Users\user_name>path_to_your_directory -``` + ```bash + C:\Users\user_name>path_to_your_directory + ``` -When using a Mac, you will see the **$** character instead of **\>**. + When using a Mac, you will see the **$** character instead of **>**. -Here's an example for a basic command to use in your CLI: to navigate to a directory, type **cd** and the directory name. 
Now you can navigate down the directory hierarchy. To go back up, type **cd ..** + Here's an example for a basic command to use in your CLI: to navigate to a directory, type **cd** and the directory name. Now you can navigate down the directory hierarchy. To go back up, type **cd ..** -```bash -C:\Users\user_name>cd path_to_your_directory -C:\Users\user_name\path_to_your_directory> cd a_small_folder -C:\Users\user_name\path_to_your_directory\a_small_folder> cd .. -C:\Users\user_name\path_to_your_directory> cd .. -C:\Users\user_name> -``` + ```bash + C:\Users\user_name>cd path_to_your_directory + C:\Users\user_name\path_to_your_directory> cd a_small_folder + C:\Users\user_name\path_to_your_directory\a_small_folder> cd .. + C:\Users\user_name\path_to_your_directory> cd .. + C:\Users\user_name> + ``` -There are multiple tutorials available that help you to familiarize yourself with the command line. + There are multiple tutorials available that help you to familiarize yourself with the command line. - - -If you haven't yet installed Git on your Windows computer, follow the instructions on https://git-scm.com/downloads, or https://gitforwindows.org/. + + + If you haven't yet installed Git on your Windows computer, follow the instructions on [https://git-scm.com/downloads](https://git-scm.com/downloads), or [https://gitforwindows.org/](https://gitforwindows.org/). -You should also provide some initial configuration. The setup program runs as administrator as well as normal user. In general hit next and don't change anything as most options are only relevant for advanced usage. + You should also provide some initial configuration. The setup program runs as administrator as well as normal user. In general hit next and don't change anything as most options are only relevant for advanced usage. -[One exception might be the editor](https://git-intro-wboe.acdh-dev.oeaw.ac.at/simple_windows_editors). But if you follow this howto, its use is not essential.
+ [One exception might be the editor](https://git-intro-wboe.acdh-dev.oeaw.ac.at/simple_windows_editors). But if you follow this howto, its use is not essential. -### Editors with git-support + ### Editors with git-support -In order to have an editor that supports us when working with git version control, we would suggest you to use [Visual Studio Code.](https://code.visualstudio.com/) Just download the suggested stable build and install it. + In order to have an editor that supports us when working with git version control, we would suggest you to use [Visual Studio Code.](https://code.visualstudio.com/) Just download the suggested stable build and install it. -We would suggest you to let the setup program add an icon on your desktop. You can of course just search the start menu for (Visual Studio)Code. + We would suggest you to let the setup program add an icon on your desktop. You can of course just search the start menu for (Visual Studio)Code. -Note that other specialized editors also support managing git repositories themselves. One notable example would be OxygenXML. - - -There are several ways to install git on MacOS. Depending on your needs and your familiarity with the command line we suggest one of the following two ways: + Note that other specialized editors also support managing git repositories themselves. One notable example would be OxygenXML. + -### 1. Using the Git bundled with XCode Command Line Tools + + There are several ways to install git on MacOS. Depending on your needs and your familiarity with the command line we suggest one of the following two ways: -Apple ships an older version of git with some add on tools for their MacOS. To get this git version you only have to + ### 1. Using the Git bundled with XCode Command Line Tools -- open a "terminal" (type "terminal" in Spotlight search, start the respective app) -- type `git` and press enter and try to run git + Apple ships an older version of git with some add on tools for their MacOS. 
To get this git version you only have to -Doing this will tell MacOS to get a few more programs from the internet. The bundle is called "Xcode Command Line Tools". So if this already happened for some reason you are ready. If not, you have to go through a few dialogs and after about 10 minutes everything is installed. + * open a "terminal" (type "terminal" in Spotlight search, start the respective app) -### 2. Using an installation package + * type `git` and press enter and try to run git -The easiest way to install git on a Mac is to use an installer-package provided by Tim Harper. -However + Doing this will tell MacOS to get a few more programs from the internet. The bundle is called "Xcode Command Line Tools". So if this already happened for some reason you are ready. If not, you have to go through a few dialogs and after about 10 minutes everything is installed. -- it will not give you the latest version of Git -- it will be harder to update -- it will complain that the software might be risky (which is highly unlikely) + ### 2. Using an installation package -If these three downsides do not matter to your workflow, here is a step by step guide: + The easiest way to install git on a Mac is to use an installer-package provided by Tim Harper. + However -1. Download the installer package at [https://sourceforge.net/projects/git-osx-installer/](https://sourceforge.net/projects/git-osx-installer/). -1. Open the downloaded _.dmg file and run the file with the ending_ .pkg. - ![](/assets/content/resources/git-version-control-via-command-line/macos_installerpackage.jpg) -1. MacOS is likely to give you a warning, because it does not initially trust sourceforge.com as a - source, as it is not a "verified developer". - ![](/assets/content/resources/git-version-control-via-command-line/macos-warning.png) - If this happens, please press ok and go to your "System Settings" - Security - General. 
- ![](/assets/content/resources/git-version-control-via-command-line/macos-securitysettings.jpg) - There you will see at the bottom of the window a button that should prompt "open anyways" to be able to open and run the *.pkg. Also prompt "open" at the following warning and the installer-package will start. -1. Please follow the instructions in the standard installation dialogue to install Git. + * it will not give you the latest version of Git -![](/assets/content/resources/git-version-control-via-command-line/macos_installationdialogue.png) + * it will be harder to update -### 3. Using Homebrew + * it will complain that the software might be risky (which is highly unlikely) -If you are not afraid of the command line, you can use the package manager "Homebrew" to install the latest version of Git. + If these three downsides do not matter to your workflow, here is a step by step guide: -#### Installing Homebrew + 1. Download the installer package at [https://sourceforge.net/projects/git-osx-installer/](https://sourceforge.net/projects/git-osx-installer/). -If you do not have Homebrew installed yet, please paste the following line into your MacOS Terminal: -`/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)".` + 2. Open the downloaded *.dmg file and run the file with the ending* .pkg. + ![](/assets/content/resources/git-version-control-via-command-line/macos_installerpackage.jpg) -By pressing enter and starting the script the installation procedure will run in the terminal. At each step the installer will pause and let you confirm, before it continues. + 3. MacOS is likely to give you a warning, because it does not initially trust sourceforge.com as a + source, as it is not a "verified developer". + ![](/assets/content/resources/git-version-control-via-command-line/macos-warning.png) + If this happens, please press ok and go to your "System Settings" - Security - General. 
+ ![](/assets/content/resources/git-version-control-via-command-line/macos-securitysettings.jpg) + There you will see at the bottom of the window a button that should prompt "open anyways" to be able to open and run the \*.pkg. Also prompt "open" at the following warning and the installer-package will start. -#### Installing git using Homebrew + 4. Please follow the instructions in the standard installation dialogue to install Git. -After having installed Homebrew, please paste the following into your MacOS terminal: `$ brew install git`. This prompt will run the git-installation. + ![](/assets/content/resources/git-version-control-via-command-line/macos_installationdialogue.png) -### Editors with git-support + ### 3. Using Homebrew -To have an editor that supports us when working with git version control, we would suggest you to use -[Visual Studio Code](https://code.visualstudio.com/). Just download the suggested stable build and install it. + If you are not afraid of the command line, you can use the package manager "Homebrew" to install the latest version of Git. -Note that other specialized editors also support managing git repositories themselves. One notable -example would be OxygenXML. - - -If you haven't yet installed Git on your computer, follow the instructions on [https://git-scm.com/ Download for Linux and Unix](https://git-scm.com/download/linux). + #### Installing Homebrew -Git is part of every Linux Distribution nowadays and you install it using the usual package management tools (like `apt`, `dnf`, `pacman`, `yum`, `zypper` etc.). + If you do not have Homebrew installed yet, please paste the following line into your MacOS Terminal: + `/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)".` -### Editors with git-support + By pressing enter and starting the script the installation procedure will run in the terminal. At each step the installer will pause and let you confirm, before it continues. 
-To have an editor that supports us when working with git version control we would suggest you to use -[Visual Studio Code](https://code.visualstudio.com/). Just download the suggested stable build and install it. + #### Installing git using Homebrew -Note that other specialized editors also support managing git repositories themselves. One notable -example would be OxygenXML. + After having installed Homebrew, please paste the following into your MacOS terminal: `$ brew install git`. This prompt will run the git-installation. -## Save your username and password, so type it only once + ### Editors with git-support -You also should make sure the `git-credential-libsecret` package is installed. This package exists -on fedora based distributions. On Ubuntu and debian you need to execute these commands in a terminal: + To have an editor that supports us when working with git version control, we would suggest you to use + [Visual Studio Code](https://code.visualstudio.com/). Just download the suggested stable build and install it. -```bash -sudo apt-get install build-essential libsecret-1-0 libsecret-1-dev -cd /usr/share/doc/git/contrib/credential/libsecret -sudo make -sudo git config --system credential.helper /usr/share/doc/git/contrib/credential/libsecret/git-credential-libsecret -``` - - + Note that other specialized editors also support managing git repositories themselves. One notable + example would be OxygenXML. + + + + If you haven't yet installed Git on your computer, follow the instructions on [https://git-scm.com/ Download for Linux and Unix](https://git-scm.com/download/linux). + + Git is part of every Linux Distribution nowadays and you install it using the usual package management tools (like `apt`, `dnf`, `pacman`, `yum`, `zypper` etc.). + + ### Editors with git-support + + To have an editor that supports us when working with git version control we would suggest you to use + [Visual Studio Code](https://code.visualstudio.com/). 
Just download the suggested stable build and install it. + + Note that other specialized editors also support managing git repositories themselves. One notable + example would be OxygenXML. + + ## Save your username and password, so type it only once + + You also should make sure the `git-credential-libsecret` package is installed. This package exists + on fedora based distributions. On Ubuntu and debian you need to execute these commands in a terminal: + + ```bash + sudo apt-get install build-essential libsecret-1-0 libsecret-1-dev + cd /usr/share/doc/git/contrib/credential/libsecret + sudo make + sudo git config --system credential.helper /usr/share/doc/git/contrib/credential/libsecret/git-credential-libsecret + ``` + + ## Initialize a Git project in your terminal -First, we need to tell Git that it should start to manage a project directory and keep an eye on changes to documents there. On a **Windows** PC, run the program **git shell** **(Git Bash)**, on **Mac** or **Linux** use the terminal of your choice. Navigate to the folder which contains the data you want to version (usually done by typing `cd {folder-name}`, e.g. _cd Documents_), and afterwards type: +First, we need to tell Git that it should start to manage a project directory and keep an eye on changes to documents there. On a **Windows** PC, run the program **git shell** **(Git Bash)**, on **Mac** or **Linux** use the terminal of your choice. Navigate to the folder which contains the data you want to version (usually done by typing `cd {folder-name}`, e.g. *cd Documents*), and afterwards type: ```bash git init @@ -204,10 +224,10 @@ git init This will initially set up Git's internal bookkeeping metadata, which is stored in a hidden .git folder. Git will respond with:
-Initialize an empty Git repository on your local machine. + Initialize an empty Git repository on your local machine.
-Let's also create some new content. In a real research project this would mean editing or adding an XML/TEI document or similar. To demonstrate the mechanics we'll keep it basic here and use the terminal to create a simple text file. In the command below, the text in quotation marks after _echo_ becomes the content of the file, and the name after `>` is the file to create; `.txt` is the extension for plain text files. +Let's also create some new content. In a real research project this would mean editing or adding an XML/TEI document or similar. To demonstrate the mechanics we'll keep it basic here and use the terminal to create a simple text file. In the command below, the text in quotation marks after *echo* becomes the content of the file, and the name after `>` is the file to create; `.txt` is the extension for plain text files. ```bash echo "This is my text document." > my-document.txt @@ -224,7 +244,7 @@ git status Note that Git informs us that changes have been made to a file called `my-document.txt`, but that file is currently "untracked", which means it is not currently managed by Git's versioning. Git also tells us that, in order to keep track of changes to that file, we should "use `git add` to track". Generally, if you find yourself in a situation where you're unsure how to proceed, `git status` will usually show helpful hints. It's probably the Git command you'll be using most often.
-In the image you can see that a text file was created and that the status was checked. There are no commits yet, but some changes are already staged, others yet untracked. + In the image you can see that a text file was created and that the status was checked. There are no commits yet, but some changes are already staged, others yet untracked.
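The steps so far can be replayed in a throwaway directory. This is only a minimal sketch, assuming Git is installed and a Unix-like shell (Git Bash, or a Mac or Linux terminal) is used; the scratch directory created with `mktemp -d` and the `status_output` variable are illustrative and not part of the tutorial.

```bash
# Hypothetical scratch directory, so the sketch does not touch a real project.
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Let Git start managing this directory (creates the hidden .git folder).
git init --quiet

# Create a simple text file, as in the tutorial.
echo "This is my text document." > my-document.txt

# --short prints one line per changed file; "??" marks an untracked file.
status_output="$(git status --short)"
echo "$status_output"
```

The short status format is easier to scan than the full `git status` output, but it leaves out the hints about what to do next, so the long form is probably the better choice while learning.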
## Mark content changes to be included in the version history @@ -236,17 +256,17 @@ git add my-document.txt ``` -To add multiple files, it is possible to list them individually: + To add multiple files, it is possible to list them individually: -```bash -git add first-document.txt second-document.txt -``` + ```bash + git add first-document.txt second-document.txt + ``` -Or, to include _all_ changes to the project: + Or, to include *all* changes to the project: -```bash -git add -A -``` + ```bash + git add -A + ``` Note that `git add` does not automatically **commit** the changes, but places them in the **staging area**. @@ -266,9 +286,9 @@ git commit -m "Add test document" Every project snapshot we commit to history should include a semantically meaningful set of changes, and this may involve edits to different documents. `git add` allows to granularly choose which changes to which files should be part of the next snapshot, while `git commit` will label that set of changes, and actually save a new snapshot. -Commit messages are usually written in imperative language. For example it is customary to say “Change document title”, not “Changed document title”. Nevertheless, it's perfectly fine to agree on different project-specific conventions, just try to be consistent. + Commit messages are usually written in imperative language. For example it is customary to say “Change document title”, not “Changed document title”. Nevertheless, it's perfectly fine to agree on different project-specific conventions, just try to be consistent. -When in a hurry, it's tempting to write non-descriptive commit messages like “Added changes”. Try to come up with something that describes the change, you'll thank yourself later! + When in a hurry, it's tempting to write non-descriptive commit messages like “Added changes”. Try to come up with something that describes the change, you'll thank yourself later! 
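Staging and committing can be tried end to end in a scratch directory. The following is a sketch under stated assumptions rather than a prescribed workflow: the directory from `mktemp -d`, the variables `staged_before` and `commit_count`, and the author name and e-mail passed with `git -c` are placeholders chosen for illustration (Git needs some identity to record a commit).

```bash
# Hypothetical scratch project, separate from any real repository.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
echo "This is my text document." > my-document.txt

# Place the new file in the staging area; nothing is committed yet.
git add my-document.txt
staged_before="$(git status --short)"   # staged files are listed with an "A"

# Save the snapshot. The identity is a placeholder supplied only for this command.
git -c user.name="Example Author" -c user.email="author@example.org" \
    commit --quiet -m "Add test document"

# Count the snapshots recorded so far.
commit_count="$(git rev-list --count HEAD)"
echo "$commit_count"
```

After the commit, `git status --short` prints nothing, because the working area, the staging area and the latest snapshot all hold the same content.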
### View history of changes @@ -318,17 +338,17 @@ git diff 7f733ac ab9b27f ``` -To compare working directory to staging area: + To compare working directory to staging area: -```bash -git diff -``` + ```bash + git diff + ``` -To compare staging area to repository, i.e. last commit to next commit: + To compare staging area to repository, i.e. last commit to next commit: -```bash -git diff --staged -``` + ```bash + git diff --staged + ``` The format in which the changes are displayed can be a bit hard to read in the terminal, especially for larger changesets, so it's best to view them in a real text editor. @@ -347,7 +367,7 @@ Once you're done looking around, don't forget to return to the present! The easi git checkout main ``` -The `main` identifier is just a shortcut way to refer to the default timeline (it's actually the default _branch_ of the timeline, because there can be multiple parallel timelines 🤯. Branches were mentioned in the beginning and we'll have a look at these branches in the post about [keeping repositories in sync](/resources/git-collaboration#keeping-repositories-in-sync-fetch-pull-push). +The `main` identifier is just a shortcut way to refer to the default timeline (it's actually the default *branch* of the timeline, because there can be multiple parallel timelines 🤯. Branches were mentioned in the beginning and we'll have a look at these branches in the post about [keeping repositories in sync](/resources/git-collaboration#keeping-repositories-in-sync-fetch-pull-push). ### Undo changes @@ -359,7 +379,7 @@ There are three possible ways to do this. git revert ab9b27f ``` -Reverting a commit will keep that snapshot in the version history, and create a _new_ snapshot with the changes removed. This is useful when you want to keep a record of the initial changes, and the fact that they have been reverted. +Reverting a commit will keep that snapshot in the version history, and create a *new* snapshot with the changes removed. 
This is useful when you want to keep a record of the initial changes, and the fact that they have been reverted. ```bash git reset 7f733ac @@ -373,7 +393,7 @@ git reset --hard 7f733ac The most brute-force way to undo changes is with a "hard reset". Be aware that this will not only "rewind the clock" to a specific commit, but nuke any changes that have been made in the project since that point in time. Those changes will be lost. -Lastly, if you only want to quickly change the _message_ of the last commit, for example because you made a typo, you can: +Lastly, if you only want to quickly change the *message* of the last commit, for example because you made a typo, you can: ```bash git commit --amend @@ -383,46 +403,56 @@ git commit --amend - - Which commands move changes to the "waiting area"? - - - `git stage` - - - `git add` - - - `git commit` - - - Correct! - - - Try again. Officially, the "waiting area" is called "staging area". - - + + Which commands move changes to the "waiting area"? + + + + `git stage` + + + + `git add` + + + + `git commit` + + + + Correct! + + + + Try again. Officially, the "waiting area" is called "staging area". + + - - Select all correct statements. - - - Git commits are snapshots of the status of document or file at a specific point in time. - - - `git reset --hard "commit id"` will remove all changes that have been created in a project after the commit which you pass to the command. - - - You can add various changes and then bundle them into one meaningful commit. - - - Exactly, all of the above are correct. - - - Try again - - + + Select all correct statements. + + + + Git commits are snapshots of the status of document or file at a specific point in time. + + + + `git reset --hard "commit id"` will remove all changes that have been created in a project after the commit which you pass to the command. + + + + You can add various changes and then bundle them into one meaningful commit. + + + + Exactly, all of the above are correct. 
+ + + + Try again + + ## Training task @@ -466,6 +496,6 @@ Run `git log` to see the identifier and the commit message. ### Cheatsheets -Git commands Github: https://education.github.com/git-cheat-sheet-education.pdf +Git commands Github: [https://education.github.com/git-cheat-sheet-education.pdf](https://education.github.com/git-cheat-sheet-education.pdf) -Git commands Gitlab: https://about.gitlab.com//assets/content/resources/git-version-control-via-command-line/press/git-cheat-sheet.pdf +Git commands Gitlab: [https://about.gitlab.com/images/press/git-cheat-sheet.pdf](https://about.gitlab.com/images/press/git-cheat-sheet.pdf) diff --git a/content/resources/voice-3-0-tutorial/index.mdx b/content/resources/voice-3-0-tutorial/index.mdx index 4d21c899c..a9d45e277 100644 --- a/content/resources/voice-3-0-tutorial/index.mdx +++ b/content/resources/voice-3-0-tutorial/index.mdx @@ -1,20 +1,27 @@ --- +title: Tutorial for VOICE 3.0 +summary: >- + This tutorial explains how to navigate in and use the new VOICE 3.0 Online + interface for the Vienna-Oxford International Corpus of English, developed by + the VOICE CLARIAH project team and released in September 2021. The tutorial + introduces the web interface, explains how to run search queries, apply + filters for the creation of sub-corpora and set bookmarks. In addition, it + provides short quizzes and links to short videos explaining the design and + functions of the VOICE 3.0 interface. +locale: en authors: - pitzl-marie-luise - riegler-stefanie - osimk-teasdale-ruth editors: - zhanial-susanne -license: cc-by-4-0 -locale: en publicationDate: 2022-07-18 -summary: This tutorial explains how to navigate in and use the new VOICE 3.0 Online interface for the Vienna-Oxford International Corpus of English, developed by the VOICE CLARIAH project team and released in September 2021. The tutorial introduces the web interface, explains how to run search queries, apply filters for the creation of sub-corpora and set bookmarks. 
In addition, it provides short quizzes and links to short videos explaining the design and functions of the VOICE 3.0 interface. +version: 2.0.0 tags: - linguistics - corpus -title: Tutorial for VOICE 3.0 +license: cc-by-4-0 toc: false -version: 2.0.0 --- ## What is VOICE 3.0 Online? @@ -27,7 +34,7 @@ More detailed information on the compilation and history of the corpus can be fo In addition, you can check out the recordings of the ACDH-CH Tool Gallery 8.1 on “[Spoken corpora and open access: Usability and Technology of VOICE 3.0 Online](https://www.youtube.com/playlist?list=PLN0wiGwlUlbem5euvpMLxpnkDljOZ6k_2)” from April 2022. These provide further information on the compilation of the VOICE corpus, the pros and cons of open access corpora and detailed information on the Open Access technologies, like the local NoSketch Engine set up to run queries, its technology stacks and software packages. ## Accessing VOICE 3.0 @@ -37,14 +44,16 @@ To access VOICE 3.0 Online, go to [https://voice3.acdh.oeaw.ac.at](https://voice Once you have selected your preferences for cookies, the content in the blue area of the **landing page** **changes**. It now gives you the option to explore VOICE, either by **typing in a search query** or by simply clicking "**Browse**" - this takes you to the actual VOICE 3.0 Online interface.
-VOICE 3.0 Web Interface + VOICE 3.0 Web Interface
The standard design of the VOICE 3.0 web interface is made up of three main areas: -- The area on the **left-hand side** contains the **corpus tree**. The first order of organization is domain. By activating the **SPET** (**SP**eech **E**vent **T**ype) shifter above the corpus tree, the second layer of organization according to speech event types is made available. By clicking on the small arrow next to a domain, a list with the speech events in this domain appears and displays the unique ID of each speech event. An audio symbol next to an ID indicates that a sound file is available for this event. When you click on a particular speech event, it will be opened on the right-hand side. -- The **middle area** contains the **search field** and will display the **search results** once a query has been run. -- On the **right-hand side**, users are initially greeted by a **welcome text**. As soon as you have started using the corpus, the **entire transcripts** or corpus information on the speech events you have selected, as well as **metadata** from TEI headers will be displayed here. Once a particular speech event has been opened, you **can switch between different styles** (VOICE, PLAIN, POS and XML). If a sound file is available, you can use the **audio player at the bottom**. Several speech events can be opened next to each other and can be navigated via separate tabs. +* The area on the **left-hand side** contains the **corpus tree**. The first order of organization is domain. By activating the **SPET** (**SP**eech **E**vent **T**ype) shifter above the corpus tree, the second layer of organization according to speech event types is made available. By clicking on the small arrow next to a domain, a list with the speech events in this domain appears and displays the unique ID of each speech event. An audio symbol next to an ID indicates that a sound file is available for this event. When you click on a particular speech event, it will be opened on the right-hand side. 
+ +* The **middle area** contains the **search field** and will display the **search results** once a query has been run. + +* On the **right-hand side**, users are initially greeted by a **welcome text**. As soon as you have started using the corpus, the **entire transcripts** or corpus information on the speech events you have selected, as well as **metadata** from TEI headers will be displayed here. Once a particular speech event has been opened, you **can switch between different styles** (VOICE, PLAIN, POS and XML). If a sound file is available, you can use the **audio player at the bottom**. Several speech events can be opened next to each other and can be navigated via separate tabs. The **buttons above the right-hand area** are especially useful: the big button “**Corpus information**” gives you access to more extensive PDF manuals, like the search manual and the VOICE transcription conventions. The **two symbols** next to "Corpus information" allow you to **adjust the display settings**: with the left icon, you can **merge colons**, the right icon allows to adjust the display to a **narrow screen** (e.g. for a mobile phone). Clicking on the same icon again brings you back to the default view. @@ -53,7 +62,7 @@ The new VOICE 3.0 Online interface provides many integrated tool tips, pop-ups a The following clip offers an introductory tour of the VOICE 3.0 Online interface and explains its main areas and buttons. #### Quiz: It's your turn! @@ -62,49 +71,60 @@ Introduction to VOICE 3.0 Online Interface - - For a research paper, you are looking for interactions between ELF speakers of English in academic contexts. Which of the five domains available in VOICE 3.0 Online might contain such speech events? Tick the correct answer. - - - PB - - - PR and LE - - - PR and ED - - - ED and LE - - - Correct! In order to find appropriate events, look into the two domains PR (professional and research/science) and ED (educational). 
Tip: Hovering over the domain abbreviation with your mouse reveals a short definition of the domain. - - - Sorry! Have a look at the domain descriptions again, either by hovering over the domains with your mouse in the corpus tree on the left or by going to Corpus information and VOICE Header. - - + + For a research paper, you are looking for interactions between ELF speakers of English in academic contexts. Which of the five domains available in VOICE 3.0 Online might contain such speech events? Tick the correct answer. + + + + PB + + + + PR and LE + + + + PR and ED + + + + ED and LE + + + + Correct! In order to find appropriate events, look into the two domains PR (professional and research/science) and ED (educational). Tip: Hovering over the domain abbreviation with your mouse reveals a short definition of the domain. + + + + Sorry! Have a look at the domain descriptions again, either by hovering over the domains with your mouse in the corpus tree on the left or by going to Corpus information and VOICE Header. + + - - You are really satisfied with your search results and you want to cite the VOICE corpus in your research paper. Where can you find the biographical information in the VOICE 3.0 Online interface? Tick ALL options that lead to the desired result. - - - I go to "Corpus information" and click on "How to cite VOICE". - - - At the bottom of the website, in the middle, I can immediately find the short citation for VOICE 3.0 Online. The long version appears when hovering over the small "i" icon. - - - I go to the VOICE project homepage at voice.acdh.oeaw.ac.at and look for the necessary information there. - - - Well done! All answers are correct. In fact, there are various options to find the information for your bibliography. - - - Your chosen option was definitely correct, but what about the other options stated? - - + + You are really satisfied with your search results and you want to cite the VOICE corpus in your research paper. 
Where can you find the biographical information in the VOICE 3.0 Online interface? Tick ALL options that lead to the desired result. + + + + I go to "Corpus information" and click on "How to cite VOICE". + + + + At the bottom of the website, in the middle, I can immediately find the short citation for VOICE 3.0 Online. The long version appears when hovering over the small "i" icon. + + + + I go to the VOICE project homepage at voice.acdh.oeaw\.ac.at and look for the necessary information there. + + + + Well done! All answers are correct. In fact, there are various options to find the information for your bibliography. + + + + Your chosen option was definitely correct, but what about the other options stated? + + ## Searches in VOICE 3.0 @@ -115,19 +135,19 @@ Searches can be easily carried out with the help of the **search field** at the #### Token search (word form) -In order to search for a word or word form (i.e. token queries), enter the word using lower-case characters, e.g. _speak_. Please note that all queries are case-sensitive and tokens are searched for with lower case characters, e.g. _i speak french,_ as this is how they are represented in VOICE transcripts. You can, of course, search for phrases (i.e. token token) as well. +In order to search for a word or word form (i.e. token queries), enter the word using lower-case characters, e.g. *speak*. Please note that all queries are case-sensitive and tokens are searched for with lower case characters, e.g. *i speak french,* as this is how they are represented in VOICE transcripts. You can, of course, search for phrases (i.e. token token) as well. -If you want to search for contracted forms, like _wanna, gonna, don't,_ etc., you need to insert a space before the contracted part in your query in VOICE 3.0 Online, i.e.: _wan na, gon na, do n't, it 's_. 
+If you want to search for contracted forms, like *wanna, gonna, don't,* etc., you need to insert a space before the contracted part in your query in VOICE 3.0 Online, i.e.: *wan na, gon na, do n't, it 's*. #### Lemma search -A lemma is the basic form of a word, which represents all declensions and inflected forms of a word, e.g. _walk_ is the lemma of _walk, walks, walking_. To search for all tokens of a lemma, use the form “l:lemma”, e.g. _l:walk_. +A lemma is the basic form of a word, which represents all declensions and inflected forms of a word, e.g. *walk* is the lemma of *walk, walks, walking*. To search for all tokens of a lemma, use the form “l:lemma”, e.g. *l:walk*. #### POS search -POS, or **P**art-**o**f **S**peech annotations, allow searching for the morphosyntactic categories of tokens. Each token in VOICE has been annotated with an individual POS tag for morphological form, and, in parentheses, for syntactic function. Often, these are identical, as in _professional_JJ(JJ)_, but they may also diverge. +POS, or **P**art-**o**f **S**peech annotations, allow searching for the morphosyntactic categories of tokens. Each token in VOICE has been annotated with an individual POS tag for morphological form, and, in parentheses, for syntactic function. Often, these are identical, as in *professional\_JJ(JJ)*, but they may also diverge. -If a POS tag is searched for without further specification in VOICE 3.0 Online, both positions (i.e. form and function) are searched. If you want to search them separately, use _p:POS_ for form position or _f:POS_ for function position. +If a POS tag is searched for without further specification in VOICE 3.0 Online, both positions (i.e. form and function) are searched. If you want to search them separately, use *p:POS* for form position or *f:POS* for function position. For POS searches, enter the POS tag in capital letters. 
For further details, please go to the [POS tagging manual](https://voice.acdh.oeaw.ac.at/wp-content/uploads/2021/04/POS-tagging-and-lemmatization-manual.pdf) and consult the [VOICE Tagset](https://voice.acdh.oeaw.ac.at/wp-content/uploads/2021/04/Short-POS-tagset.pdf). @@ -135,17 +155,20 @@ For POS searches, enter the POS tag in capital letters. For further details, ple As an entirely novel feature of VOICE 3.0 Online, conversational mark-up can now be searched for and retrieved in different ways in the new interface. Users can: -- search for **tokenized mark-up**, such as **pauses** (e.g.: _1, \_2, etc.) or **laughter** (e.g.:_ @, @@). The numbers 1, 2, etc. or number of symbols indicate the length of pauses or laughter represented in the transcripts. -- or search for **POS tags** **indicating mark-up**, such as **PVC** (pronunciation variations & coinages), **ONO** (onomatopoeic noises), etc. -- In addition, VOICE 3.0 Online offers the **possibility to search for** words, POS and lemmas that occur within and between **stretches of conversational mark-up** in the corpus, such as stretches of s**peaking modes, non-English speech,** or **overlapping speech**, by using pointed brackets, e.g. `, , or
    .` -- Furthermore, to search within mark-up, the search-words “within” or “containing” can be used to search for tokens, POS or lemma. For example, the search phrase _really within `
      `_ gives you all results of the word really within overlapping speech. +* search for **tokenized mark-up**, such as **pauses** (e.g.: *1, \_2, etc.) or ****laughter**** (e.g.:* @, @@). The numbers 1, 2, etc. or number of symbols indicate the length of pauses or laughter represented in the transcripts. + +* or search for **POS tags** **indicating mark-up**, such as **PVC** (pronunciation variations & coinages), **ONO** (onomatopoeic noises), etc. + +* In addition, VOICE 3.0 Online offers the **possibility to search for** words, POS and lemmas that occur within and between **stretches of conversational mark-up** in the corpus, such as stretches of s**peaking modes, non-English speech,** or **overlapping speech**, by using pointed brackets, e.g. `, , or
        .` + +* Furthermore, to search within mark-up, the search-words “within” or “containing” can be used to search for tokens, POS or lemma. For example, the search phrase *really within **`
          `* gives you all results of the word really within overlapping speech. Detailed examples for mark-up searches are provided in section 6 of the VOICE 3.0 Online [search manual](https://voice.acdh.oeaw.ac.at/wp-content/uploads/2021/09/Search-manual-VOICE-3.0-Online.pdf). The VOICE tagging scheme has been said to be especially strong in displaying features typical of spoken language. This is possible because already during the early stages of the corpus compilation, it was decided to add additional tags to the [PENN Treebank tagset](https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html) to represent spoken features in the corpus. The following illustration shows some examples of spoken features that have been tagged and thus allow powerful searches in VOICE:
          -VOICE POS tagging - spoken features + VOICE POS tagging - spoken features
          Detailed examples for mark-up searches are provided in section 6 of the VOICE 3.0 Online [search manual](https://voice.acdh.oeaw.ac.at/wp-content/uploads/2021/09/Search-manual-VOICE-3.0-Online.pdf). @@ -156,99 +179,123 @@ Detailed examples for mark-up searches are provided in section 6 of the VOICE 3. - - How many instances of the word _English_ does a simple search in VOICE 3.0 Online yield? - - - 14 - - - 1329 - - - 5031 - - - 406 - - - Well done! If you got 1329 results, then you have correctly entered the token English in lowercase letters. - - - Sorry! Try again and check how you need to search for a token in VOICE 3.0 Online. - - + + How many instances of the word *English* does a simple search in VOICE 3.0 Online yield? + + + + 14 + + + + 1329 + + + + 5031 + + + + 406 + + + + Well done! If you got 1329 results, then you have correctly entered the token English in lowercase letters. + + + + Sorry! Try again and check how you need to search for a token in VOICE 3.0 Online. + + - - Humour and laughter are important topics in ELF research. VOICE 3.0 Online allows you to search for laughter in VOICE data in various forms. Which of these queries gives you stretches of laughingly spoken sequences? - - - `_@+` - - - `<@/>` - - - `_@` - - - `Q` - - - Well done! By using pointed brackets, you can search for stretches of conversational mark-up, such as laughter. Laughter is represented by the symbol @ in VOICE. - - - The first answer option searches for strings of tokenized laughter of any length, while with the third answer option you search for a particular number of tokenized laughter. The fourth and final answer option is not a valid search. - + + Humour and laughter are important topics in ELF research. VOICE 3.0 Online allows you to search for laughter in VOICE data in various forms. Which of these queries gives you stretches of laughingly spoken sequences? + + + + `_@+` + + + + `<@/>` + + + + `_@` + + + + `Q` + + + + Well done!
By using pointed brackets, you can search for stretches of conversational mark-up, such as laughter. Laughter is represented by the symbol @ in VOICE. + + + + The first answer option searches for strings of tokenized laughter of any length, while with the third answer option you search for a particular number of tokenized laughter. The fourth and final answer option is not a valid search. + - - You want to find out if word coinages are frequently uttered softly or whispered. Which of the options below will **not** yield meaningful results for this search? Tip: consult the POS tagging and search manual! - - - PVC within `` - - - PVC `` - - - `` containing PVC - - - PVC within `` - - - Correct. This query searches for a string of PVC (pronunciation variation and coinage) followed by a softly spoken sequence. - - - Check your results and try out the other stated options in the VOICE 3.0 Online interface – which of them provides the correct answer to your search for softly or whispered word coinages? - + + You want to find out if word coinages are frequently uttered softly or whispered. Which of the options below will **not** yield meaningful results for this search? Tip: consult the POS tagging and search manual! + + + + PVC within `` + + + + PVC `` + + + + `` containing PVC + + + + PVC within `` + + + + Correct. This query searches for a string of PVC (pronunciation variation and coinage) followed by a softly spoken sequence. + + + + Check your results and try out the other stated options in the VOICE 3.0 Online interface – which of them provides the correct answer to your search for softly or whispered word coinages? + - - You are interested in the use of code-switches into French, which is not the L1 of a speaker. How do you search for this? - - - `` - - - `` - - - `` - - - `` - - - Correct! - - - Have a look at the search manual in the Corpus Information and check your search query.
- + + You are interested in the use of code-switches into French, which is not the L1 of a speaker. How do you search for this? + + + + `` + + + + `` + + + + `` + + + + `` + + + + Correct! + + + + Have a look at the search manual in the Corpus Information and check your search query. + @@ -258,33 +305,40 @@ Detailed examples for mark-up searches are provided in section 6 of the VOICE 3. In order to adjust the search results to the needs of your research question, the following **placeholders** might be useful: -- **. Full stop:** matches any single character. You can perceive this as a kind of universal joker. Example: Searching for _hi._ results in: _him, his, hit_, etc. -- **`[...]` Character class:** matches any character contained in the brackets, e.g. _`h[ai]t`_ – _hat, hit_ -- **`[^...]` Inverted character class:** matches any character not contained in the bracket, e.g. _`h[^ai]` – hot, hut_ -- **`?` Question mark:** the preceding element can appear 0 or 1 times, i.e. it is optional. Example: _houses?_ – _house, houses_ -- **`+` Plus:** the preceding element must appear 1 or more times, i.e. it is not optional and might be repeated. Example: _house.+_ results in _houses, household, housewives_, i.e. all words that start with house plus at least one more character. -- **`*` Asterisk:** particularly useful, since the preceding element can appear 0 or more times, i.e. it is optional and might be repeated. -- **`(...)` Brackets:** these can be used to group characters (and even regular expressions) to form new elements. In addition, we can combine them with the quantifiers `?`, `+`, and `*` and let them operate on specified groups. Example: _(wo)?man - man, woman_ +* **. Full stop:** matches any single character. You can perceive this as a kind of universal joker. Example: Searching for *hi.* results in: *him, his, hit*, etc. + +* **`[...]`**** Character class:** matches any character contained in the brackets, e.g. 
*`h[ai]t`* – *hat, hit* + +* **`[^...]`**** Inverted character class:** matches any character not contained in the bracket, e.g. *`h[^ai]`** – hot, hut* + +* **`?`**** Question mark:** the preceding element can appear 0 or 1 times, i.e. it is optional. Example: *houses?* – *house, houses* + +* **`+`**** Plus:** the preceding element must appear 1 or more times, i.e. it is not optional and might be repeated. Example: *house.+* results in *houses, household, housewives*, i.e. all words that start with house plus at least one more character. + +* **`*`**** Asterisk:** particularly useful, since the preceding element can appear 0 or more times, i.e. it is optional and might be repeated. + +* **`(...)`**** Brackets:** these can be used to group characters (and even regular expressions) to form new elements. In addition, we can combine them with the quantifiers `?`, `+`, and `*` and let them operate on specified groups. Example: *(wo)?man - man, woman* -You might be familiar with the usage of wildcards, i.e. plain ?, +, or * from other tools. In wildcard syntax, the asterisk, for example denotes “zero or more characters”. **VOICE 3.0 Online**, however, **uses regex** (regular expressions). Here, the symbols work as quantifiers which operate on the preceding element. In regex, the use of the asterisk in _hous*_ will only quantify the final “e” and thus will only match _hous, house, housee, houseeee_, etc. + You might be familiar with the usage of wildcards, i.e. plain ?, +, or \* from other tools. In wildcard syntax, the asterisk, for example denotes “zero or more characters”. **VOICE 3.0 Online**, however, **uses regex** (regular expressions). Here, the symbols work as quantifiers which operate on the preceding element. In regex, the use of the asterisk in *hous\** will only quantify the final “e” and thus will only match *hous, house, housee, houseeee*, etc. 
-If you want to use the **symbols as wildcards**, you have to place the **placeholder character** “.” (full stop) in front of them. Thus, searching for _house.+_ will result in _house, houses, household, housewives_, etc., or _.+ize_ in _organize, apologize, harmonize,_ etc. + If you want to use the **symbols as wildcards**, you have to place the **placeholder character** “.” (full stop) in front of them. Thus, searching for *house.+* will result in *house, houses, household, housewives*, etc., or *.+ize* in *organize, apologize, harmonize,* etc. -To gain even more precise control over the number of allowed and necessary character repetitions, you can use curly brackets with _min,max_. Leaving the max empty means there is no upper limit (see section 3.2 of the [search manual](https://voice.acdh.oeaw.ac.at/wp-content/uploads/2021/09/Search-manual-VOICE-3.0-Online.pdf)). +To gain even more precise control over the number of allowed and necessary character repetitions, you can use curly brackets with *min,max*. Leaving the max empty means there is no upper limit (see section 3.2 of the [search manual](https://voice.acdh.oeaw.ac.at/wp-content/uploads/2021/09/Search-manual-VOICE-3.0-Online.pdf)). #### Boolean Operators in VOICE 3.0 -- **AND** is represented by a **comma**. Note that there is no space between the conditions. Thus, entering condition1,condition2 will yield results which matches both conditions. Any sequence of items before and after the comma is possible, e.g.: _walk,NN_ - finds tokens of _walk_ tagged as noun, as in "a five minute walk". -- **OR** is represented by a **vertical line |**. It finds any options to the left or the right of the vertical line. It can be used for any sequence of tokens, lemmas or POS tags before and after the line, and more than two options can be specified. For example: _mean | say_ that - finds: _mean that, say that_. +* **AND** is represented by a **comma**. Note that there is no space between the conditions. 
Thus, entering condition1,condition2 will yield results which matches both conditions. Any sequence of items before and after the comma is possible, e.g.: *walk,NN* - finds tokens of *walk* tagged as noun, as in "a five minute walk". + +* **OR** is represented by a **vertical line |**. It finds any options to the left or the right of the vertical line. It can be used for any sequence of tokens, lemmas or POS tags before and after the line, and more than two options can be specified. For example: *mean | say* that - finds: *mean that, say that*. More details on searches with wildcards and placeholders can be found in the [search manual](https://voice.acdh.oeaw.ac.at/wp-content/uploads/2021/09/Search-manual-VOICE-3.0-Online.pdf). In the following video clip, VOICE project member Ruth Osimk-Teasdale demonstrates the combinations of tokens, lemma, and POS tags in a number of searches, and shows how to display your search results in the different style options (VOICE, plain, POS, XML, and KWIC) available in VOICE 3.0. #### **Quiz - It's your turn!** @@ -293,51 +347,63 @@ VOICE 3.0 Online-Searches - - You are interested in vague expressions and looking for the classifier _some sort of_ followed by a singular noun with an **asterisk wildcard** (to include also plural noun tags). Which domain yields **no** instances of this construction? Tip: Consult the POS tagset in the Corpus information if necessary. - - - LE - - - ED - - - PB - - - PO - - - Correct! - - - For this query, you should type in `some sort of NN*`. With the tag NN plus the asterisk, you search for single nouns and plural nouns. - + + You are interested in vague expressions and looking for the classifier *some sort of* followed by a singular noun with an **asterisk wildcard** (to include also plural noun tags). Which domain yields **no** instances of this construction? Tip: Consult the POS tagset in the Corpus information if necessary. + + + + LE + + + + ED + + + + PB + + + + PO + + + + Correct! 
+ + + + For this query, you should type in `some sort of NN*`. With the tag NN plus the asterisk, you search for single nouns and plural nouns. + - - - You want to find out how state verbs are used in VOICE. Search for all verb forms of _taste_. What does your query look like? - - - `tast.*,VVP` - - - `taste,VV` - - - `tast._,V._` - - - `taste` - - - Correct - this yields 24 hits in 23 utterances - - - Consult the POS tagset in the Corpus information and check! (Explanation: VVP searches for present tense verb taste, the second answer option searches for taste as base form and the fourth option searches for the token "taste", including taste as noun, but not the verb forms tasting, tasted, etc.). - + + + You want to find out how state verbs are used in VOICE. Search for all verb forms of *taste*. What does your query look like? + + + + `tast.*,VVP` + + + + `taste,VV` + + + + `tast.*,V.*` + + + + `taste` + + + + Correct - this yields 24 hits in 23 utterances + + + + Consult the POS tagset in the Corpus information and check! (Explanation: VVP searches for present tense verb taste, the second answer option searches for taste as base form and the fourth option searches for the token "taste", including taste as noun, but not the verb forms tasting, tasted, etc.). + @@ -348,7 +414,7 @@ In VOICE 3.0 Online, users have the possibility to create their own sub-corpora In the following short clip, you will learn how to apply filters and bookmarks. In order to create your own corpus, first of all navigate to the tab "**Filter**" in the left-hand area and turn on the filter options. You can then narrow down the corpus by applying criteria such as number of speakers or interactants, power relations, duration of speech events, or L1 language. After you have set your desired filters, navigate back to the corpus tree. It will now highlight in **bold** those speech events to which your filters apply. 
If you like, you can hide all other speech events by using the respective toggle above the tree. @@ -356,17 +422,17 @@ In order to create your own corpus, first of all navigate to the tab "**Filter** Once you have set your filters, you can use the search field in the middle area and search your subcorpus. All speech events from your corpus which yield results for your search will then be highlighted in **bold** in the corpus tree. In addition, events which would also yield results for your query but are not part of your subcorpus will be marked in **grey and bold**, as can be seen in the following illustration:
          -Search results for corpus and subcorpus, with opened transcripts on the right-hand side + Search results for corpus and subcorpus, with opened transcripts on the right-hand side
          If you would like to add particular speech events to your subcorpus, tick the box next to the speech event. **Please note** that in order to do so, the function "**Manual selection**", which can be found in the "**Filter**" tab, has to be turned on. -The function "**Manual selection**" in the filter area gives you the option to manually choose speech events for your subcorpus. In order to do so, you must turn on the function and go to the corpus tree. Speech events can be added by ticking the respective box. This function can be used before you run a search or after you have run a search. + The function "**Manual selection**" in the filter area gives you the option to manually choose speech events for your subcorpus. In order to do so, you must turn on the function and go to the corpus tree. Speech events can be added by ticking the respective box. This function can be used before you run a search or after you have run a search. -
          -Activating manual selection -
          +
          + Activating manual selection +
          #### Setting bookmarks @@ -374,7 +440,7 @@ Activating manual selection Bookmarks can be easily set with the help of the third tab in the left-hand area. First of all, activate icons and local storage. Once you have done so, small icons appear next to the search results in the middle area. You now have the possibility to select a search result (i.e. a particular utterance) and create a bookmark for it. You can add a short description, and then save the bookmark (as URL, .txt or .xlsx). Saved bookmarks will appear on the left if you click on the tab "**Bookmarks**".
          -Working with bookmarks + Working with bookmarks
          #### Download Function @@ -389,76 +455,94 @@ After you have carried out a search, the download button (i.e. arrow) can be fou - - Let's start with an easy task: You are looking for transcripts from dyadic interactions between unacquainted speakers. How many of these can you find in VOICE 3.0 Online? - - - 14 - - - 12 - - - 10 - - - 8 - - - Correct! - - - If your answer was not 12 but 14, then you used the filter for interactants instead of speakers. The difference might be small, but important: Interactants also include researchers or persons not involved in the conversation, like for example a waiter who brings coffee to a table of speakers. The small "i" icon next to the filters provides more information on the filter categories when hovering over it. 10 and 8 are wrong. - - - - - - Due to the limited scope of research projects in university courses, you would like to identify the shortest speech event for which an audio file is available in the corpus. Which one is the correct speech event? - - - `PRqas409` - - - `EDsve422` - - - `PBqas411` - - - `LEcon420` - - - Correct, the file's duration is 10 minutes, 53 seconds. - - - Turn on the filter for audio files so that only speech events with an audio file are shown; furthermore, there are two filter categories which will help you narrow down the events, namely "duration of speech event" and "number of words". They can either be used on their own or in combination. - - - - - - You want to bookmark and save the correct answer from Question 2, PRqas409, for later. Bookmark the event and download it as .xlsx file. Which information is NOT provided in the file? - - - Category for bookmark - - - Date and time of export - - - Utterance ID - - - the bookmarked utterance - - - Correct! - - - Use the explanation above to set a bookmark and then download it as .xlsx file. Then download it and check! 
- - + + Let's start with an easy task: You are looking for transcripts from dyadic interactions between unacquainted speakers. How many of these can you find in VOICE 3.0 Online? + + + + 14 + + + + 12 + + + + 10 + + + + 8 + + + + Correct! + + + + If your answer was not 12 but 14, then you used the filter for interactants instead of speakers. The difference might be small, but important: Interactants also include researchers or persons not involved in the conversation, like for example a waiter who brings coffee to a table of speakers. The small "i" icon next to the filters provides more information on the filter categories when hovering over it. 10 and 8 are wrong. + + + + + + Due to the limited scope of research projects in university courses, you would like to identify the shortest speech event for which an audio file is available in the corpus. Which one is the correct speech event? + + + + `PRqas409` + + + + `EDsve422` + + + + `PBqas411` + + + + `LEcon420` + + + + Correct, the file's duration is 10 minutes, 53 seconds. + + + + Turn on the filter for audio files so that only speech events with an audio file are shown; furthermore, there are two filter categories which will help you narrow down the events, namely "duration of speech event" and "number of words". They can either be used on their own or in combination. + + + + + + You bookmark an utterance from PRqas409 indicating the category and including a comment for later. Then export your bookmark(s). Which information is NOT provided in the file? + + + + Number of bookmarks + + + + Date and time of export + + + + Utterance ID + + + + Speaker information + + + + Correct! + + + + Use the explanation above to set a bookmark and then download it as .xlsx file. Then download it and check! 
+ + ### Conclusion - Advanced Searches @@ -473,81 +557,102 @@ Tip: Before you start the quiz, it might be helpful to download the search manua - - To resources from which language do the speakers in the two business meetings including L1 speakers of Serbian draw on that is neither English nor any of the speakers' L1? - - - Spanish - - - Italian - - - French - - - Croatian - - - Correct! - - - Have a close look at the question and your filters: first of all, select the correct domain PB, then narrow down your search to mtg = meeting. Select _Serbian_ as L1. Now, you need to type in the correct search query: ``. Finally, compare `` tags with **Speaker Information** in text header. - - - - - - How many of the speech events in your subcorpus of transcripts from working group discussions in an educational setting feature non-English tokens? - - - 1 - - - 2 - - - 4 - - - 5 - - - Correct - select ED in domain, select wdg, type FW in search field, navigate to the tree and check which events are highlighted in bold. - - - Check your filters. You should have selected ED in domain and wdg for SPET. Then type FW in the search field and navigate to the tree. Check the events highlighted in bold. Alternatively, you can search for ``. - - - - - - How many response particles are produced by speakers while someone else is speaking in leisure conversations in VOICE? - - - 20652 - - - 2080 - - - 2179 - - - 1808 - - - Correct! - - - If your results was 20562, you searched without setting filters; if you had 2179, then you have only set the filter for leisure, but not for conversation. If you got 1808 results, then you searched for `
            ` containing, which counts the number of overlaps but not the number of response particles. - - + + To resources from which language do the speakers in the two business meetings including L1 speakers of Serbian refer that is neither English nor any of the speakers' L1? + + + + Spanish + + + + Italian + + + + French + + + + Croatian + + + + Correct! + + + + Have a close look at the question and your filters: first of all, select the correct domain PB, then narrow down your search to mtg = meeting. Select *Serbian* as L1. Now, you need to type in the correct search query: ``. Finally, compare `` tags with **Speaker Information** in text header. + + + + + + How many of the speech events in your subcorpus of transcripts from working group discussions in an educational setting feature non-English tokens? + + + + 1 + + + + 2 + + + + 4 + + + + 5 + + + + Correct - select ED in domain, select wdg, type FW in search field, navigate to the tree and check which events are highlighted in bold. + + + + Check your filters. You should have selected ED in domain and wdg for SPET. Then type FW in the search field and navigate to the tree. Check the events highlighted in bold. Alternatively, you can search for ``. + + + + + + How many response particles are produced by speakers while someone else is speaking in leisure conversations in VOICE? + + + + 20652 + + + + 2080 + + + + 2179 + + + + 1808 + + + + Correct! + + + + If your results was 20562, you searched without setting filters; if you had 2179, then you have only set the filter for leisure, but not for conversation. If you got 1808 results, then you searched for `
              ` containing, which counts the number of overlaps but not the number of response particles. + + ### Links: -- Osimk-Teasdale, Ruth; Pirker, Hannes; Pitzl, Marie-Luise. 2021. [Search manual for VOICE 3.0 Online. https://voice.acdh.oeaw.ac.at/wp-content/uploads/2021/09/Search-manual-VOICE-3.0-Online.pdf](https://voice.acdh.oeaw.ac.at/wp-content/uploads/2021/09/Search-manual-VOICE-3.0-Online.pdf>). (14 March 2022). -- Pitzl, Marie-Luise. [VOICE: Vienna-Oxford-International Corpus of English](https://voice.acdh.oeaw.ac.at/). Homepage. [https:/voice.acdh.oeaw.ac.at](https://voice.acdh.oeaw.ac.at/). (14 March 2022). -- [VOICE](https://voice3.acdh.oeaw.ac.at/). 2021. The Vienna-Oxford International Corpus of English (version VOICE 3.0 Online). Founding director: Barbara Seidlhofer; Principal investigators VOICE 3.0: Marie-Luise Pitzl, Daniel Schopper; Researchers: Angelika Breiteneder, Hans-Christian Breuer, Nora Dorn, Theresa Klimpfinger, Stefan Majewski, Ruth Osimk-Teasdale, Hannes Pirker, Marie-Luise Pitzl, Michael Radeka, Stefanie Riegler, Barbara Seidlhofer, Omar Siam, Daniel Stoxreiter. [https://voice3.acdh.oeaw.ac.at](https://voice3.acdh.oeaw.ac.at) (14 March 2022). -- Pitzl, Marie-Luise, Ruth Osimk-Teasdale, Stefanie Riegler, Hannes Pirker, Omar Siam. _ACDH-CH Tool Gallery 8.1.: Spoken Corpus Linguistics and Open Access: Usability and Technology of VOICE 3.0 Online_. Youtube. April 2022. [https://www.youtube.com/playlist?list=PLN0wiGwlUlbem5euvpMLxpnkDljOZ6k_2](https://www.youtube.com/playlist?list=PLN0wiGwlUlbem5euvpMLxpnkDljOZ6k_2). +* Osimk-Teasdale, Ruth; Pirker, Hannes; Pitzl, Marie-Luise. 2021. [Search manual for VOICE 3.0 Online. https://voice.acdh.oeaw.ac.at/wp-content/uploads/2021/09/Search-manual-VOICE-3.0-Online.pdf](https://voice.acdh.oeaw.ac.at/wp-content/uploads/2021/09/Search-manual-VOICE-3.0-Online.pdf>). (14 March 2022). + +* Pitzl, Marie-Luise. [VOICE: Vienna-Oxford-International Corpus of English](https://voice.acdh.oeaw.ac.at/). 
Homepage. [https://voice.acdh.oeaw.ac.at](https://voice.acdh.oeaw.ac.at/). (14 March 2022). + +* [VOICE](https://voice3.acdh.oeaw.ac.at/). 2021. The Vienna-Oxford International Corpus of English (version VOICE 3.0 Online). Founding director: Barbara Seidlhofer; Principal investigators VOICE 3.0: Marie-Luise Pitzl, Daniel Schopper; Researchers: Angelika Breiteneder, Hans-Christian Breuer, Nora Dorn, Theresa Klimpfinger, Stefan Majewski, Ruth Osimk-Teasdale, Hannes Pirker, Marie-Luise Pitzl, Michael Radeka, Stefanie Riegler, Barbara Seidlhofer, Omar Siam, Daniel Stoxreiter. [https://voice3.acdh.oeaw.ac.at](https://voice3.acdh.oeaw.ac.at) (14 March 2022). + +* Pitzl, Marie-Luise, Ruth Osimk-Teasdale, Stefanie Riegler, Hannes Pirker, Omar Siam. *ACDH-CH Tool Gallery 8.1.: Spoken Corpus Linguistics and Open Access: Usability and Technology of VOICE 3.0 Online*. YouTube. April 2022. [https://www.youtube.com/playlist?list=PLN0wiGwlUlbem5euvpMLxpnkDljOZ6k\_2](https://www.youtube.com/playlist?list=PLN0wiGwlUlbem5euvpMLxpnkDljOZ6k_2).