diff --git a/docs/docs/.vuepress/theme/styles/fonts.styl b/docs/docs/.vuepress/theme/styles/fonts.styl index 4f5fcb80e..cce9bb4ef 100644 --- a/docs/docs/.vuepress/theme/styles/fonts.styl +++ b/docs/docs/.vuepress/theme/styles/fonts.styl @@ -2,18 +2,18 @@ font-family: 'Basis Grotesque Pro' font-style: normal font-weight: normal - src: local('Basis Grotesque Pro'), url('/biome-text/master/assets/fonts/BasisGrotesquePro-Regular.woff') format('woff') + src: local('Basis Grotesque Pro'), url('/biome-text/v3.3.0/assets/fonts/BasisGrotesquePro-Regular.woff') format('woff') @font-face font-family: 'Basis Grotesque Pro Bold' font-style: normal font-weight: normal - src: local('Basis Grotesque Pro Bold'), url('/biome-text/master/assets/fonts/BasisGrotesquePro-Bold.woff') format('woff') + src: local('Basis Grotesque Pro Bold'), url('/biome-text/v3.3.0/assets/fonts/BasisGrotesquePro-Bold.woff') format('woff') @font-face font-family: 'Basis Grotesque Pro Light' font-style: normal font-weight: normal - src: local('Basis Grotesque Pro Light'), url('/biome-text/master/assets/fonts/BasisGrotesquePro-Light.woff') format('woff') + src: local('Basis Grotesque Pro Light'), url('/biome-text/v3.3.0/assets/fonts/BasisGrotesquePro-Light.woff') format('woff') diff --git a/docs/docs/documentation/tutorials/1-Training_a_text_classifier.ipynb b/docs/docs/documentation/tutorials/1-Training_a_text_classifier.ipynb index ac42c0f18..600e4aa2f 100644 --- a/docs/docs/documentation/tutorials/1-Training_a_text_classifier.ipynb +++ b/docs/docs/documentation/tutorials/1-Training_a_text_classifier.ipynb @@ -11,14 +11,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\n", - "[View on recogn.ai](https://https://recognai.github.io/biome-text/master/documentation/tutorials/1-Training_a_text_classifier.html)\n", + "\n", + "[View on recogn.ai](https://recognai.github.io/biome-text/v3.3.0/documentation/tutorials/1-Training_a_text_classifier.html)\n", "\n", - "\n", - "[Run in Google Colab](https://colab.research.google.com/github/recognai/biome-text/blob/master/docs/docs/documentation/tutorials/1-Training_a_text_classifier.ipynb)\n", + "\n", + "[Run in Google Colab](https://colab.research.google.com/github/recognai/biome-text/blob/v3.3.0/docs/docs/documentation/tutorials/1-Training_a_text_classifier.ipynb)\n", "\n", - "\n", - "[View source on GitHub](https://github.com/recognai/biome-text/blob/master/docs/docs/documentation/tutorials/1-Training_a_text_classifier.ipynb)" + "\n", + "[View source on GitHub](https://github.com/recognai/biome-text/blob/v3.3.0/docs/docs/documentation/tutorials/1-Training_a_text_classifier.ipynb)" ] }, { @@ -35,7 +35,7 @@ "outputs": [], "source": [ "!pip install -U pip\n", - "!pip install -U git+https://github.com/recognai/biome-text.git\n", + "!pip install -U biome-text\n", "exit(0) # Force restart of the runtime" ] }, @@ -91,7 +91,7 @@ "source": [ "## Explore the training data\n", "\n", - "Let's take a look at the data we will use for training. For this we will use the [`Dataset`](https://recognai.github.io/biome-text/master/api/biome/text/dataset.html#dataset) class that is a very thin wrapper around HuggingFace's awesome [datasets.Dataset](https://huggingface.co/docs/datasets/master/package_reference/main_classes.html#datasets.Dataset).\n", + "Let's take a look at the data we will use for training. 
For this we will use the [`Dataset`](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/dataset.html#dataset) class that is a very thin wrapper around HuggingFace's awesome [datasets.Dataset](https://huggingface.co/docs/datasets/master/package_reference/main_classes.html#datasets.Dataset).\n", "We will download the data first to create `Dataset` instances.\n", "\n", "Apart from the training data we will also download an optional validation data set to estimate the generalization error." @@ -157,7 +157,7 @@ "source": [ "::: tip Tip\n", "\n", - "The [TaskHead](https://recognai.github.io/biome-text/master/api/biome/text/modules/heads/task_head.html#taskhead) of our model below will expect a *text* and a *label* column to be present in the `Dataset`. In our data set this is already the case, otherwise we would need to change or map the corresponding column names via `Dataset.rename_column_()` or `Dataset.map()`.\n", + "The [TaskHead](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/modules/heads/task_head.html#taskhead) of our model below will expect a *text* and a *label* column to be present in the `Dataset`. In our data set this is already the case, otherwise we would need to change or map the corresponding column names via `Dataset.rename_column_()` or `Dataset.map()`.\n", "\n", ":::" ] @@ -196,12 +196,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "A typical [Pipeline](https://recognai.github.io/biome-text/master/api/biome/text/pipeline.html#pipeline) consists of tokenizing the input, extracting features, applying a language encoding (optionally) and executing a task-specific head in the end.\n", + "A typical [Pipeline](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/pipeline.html#pipeline) consists of tokenizing the input, extracting features, applying a language encoding (optionally) and executing a task-specific head in the end.\n", "\n", "After training a pipeline, you can use it to make predictions.\n", "\n", "As a first step we must define a configuration for our pipeline. \n", - "In this tutorial we will create a configuration dictionary and use the `Pipeline.from_config()` method to create our pipeline, but there are [other ways](https://recognai.github.io/biome-text/master/api/biome/text/pipeline.html#pipeline).\n", + "In this tutorial we will create a configuration dictionary and use the `Pipeline.from_config()` method to create our pipeline, but there are [other ways](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/pipeline.html#pipeline).\n", "\n", "A *biome.text* pipeline has the following main components:\n", "\n", @@ -218,7 +218,7 @@ "\n", "```\n", "\n", - "See the [Configuration section](https://recognai.github.io/biome-text/master/documentation/user-guides/2-configuration.html) for a detailed description of how these main components can be configured.\n", + "See the [Configuration section](https://recognai.github.io/biome-text/v3.3.0/documentation/user-guides/2-configuration.html) for a detailed description of how these main components can be configured.\n", "\n", "Our complete configuration for this tutorial will be following:" ] @@ -297,9 +297,9 @@ "The default behavior of *biome.text* is to add all tokens from the training data set to the pipeline's vocabulary. 
\n", "This is done automatically when training the pipeline for the first time.\n", "\n", - "If you want to have more control over this step, you can define a `VocabularyConfiguration` and pass it to the [`Trainer`](https://recognai.github.io/biome-text/master/api/biome/text/trainer.html) later on.\n", + "If you want to have more control over this step, you can define a `VocabularyConfiguration` and pass it to the [`Trainer`](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/trainer.html) later on.\n", "In our business name classifier we only want to include words with a general meaning to our word feature vocabulary (like \"Computer\" or \"Autohaus\", for example), and want to exclude specific names that will not help to generally classify the kind of business.\n", - "This can be achieved by including only the most frequent words in our training set via the `min_count` argument. For a complete list of available arguments see the [VocabularyConfiguration API](https://recognai.github.io/biome-text/master/api/biome/text/configuration.html#vocabularyconfiguration)." + "This can be achieved by including only the most frequent words in our training set via the `min_count` argument. For a complete list of available arguments see the [VocabularyConfiguration API](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/configuration.html#vocabularyconfiguration)." ] }, { @@ -317,12 +317,12 @@ "source": [ "## Configure the trainer\n", "\n", - "As a next step we have to configure the [`Trainer`](https://recognai.github.io/biome-text/master/api/biome/text/trainer.html), which in essentially is a light wrapper around the amazing [Pytorch Lightning Trainer](https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html).\n", + "As a next step we have to configure the [`Trainer`](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/trainer.html), which in essentially is a light wrapper around the amazing [Pytorch Lightning Trainer](https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html).\n", "\n", "The default trainer has sensible defaults and should work alright for most of your cases.\n", "In this tutorial, however, we want to tune a bit the learning rate and limit the training time to three epochs only.\n", "We also want to modify the monitored validation metric (by default it is the `validation_loss`) that is used to rank the checkpoints, as well as for the early stopping mechanism and to load the best model weights at the end of the training.\n", - "For a complete list of available arguments see the [TrainerConfiguration API](https://recognai.github.io/biome-text/master/api/biome/text/configuration.html#trainerconfiguration).\n", + "For a complete list of available arguments see the [TrainerConfiguration API](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/configuration.html#trainerconfiguration).\n", "\n", "::: tip Tip\n", "\n", diff --git a/docs/docs/documentation/tutorials/2-Training_a_sequence_tagger_for_Slot_Filling.ipynb b/docs/docs/documentation/tutorials/2-Training_a_sequence_tagger_for_Slot_Filling.ipynb index c28fd6658..16afd7e48 100644 --- a/docs/docs/documentation/tutorials/2-Training_a_sequence_tagger_for_Slot_Filling.ipynb +++ b/docs/docs/documentation/tutorials/2-Training_a_sequence_tagger_for_Slot_Filling.ipynb @@ -11,14 +11,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\n", - "[View on 
recogn.ai](https://recognai.github.io/biome-text/master/documentation/tutorials/2-Training_a_sequence_tagger_for_Slot_Filling.html)\n", + "\n", + "[View on recogn.ai](https://recognai.github.io/biome-text/v3.3.0/documentation/tutorials/2-Training_a_sequence_tagger_for_Slot_Filling.html)\n", " \n", - "\n", - "[Run in Google Colab](https://colab.research.google.com/github/recognai/biome-text/blob/master/docs/docs/documentation/tutorials/2-Training_a_sequence_tagger_for_Slot_Filling.ipynb)\n", + "\n", + "[Run in Google Colab](https://colab.research.google.com/github/recognai/biome-text/blob/v3.3.0/docs/docs/documentation/tutorials/2-Training_a_sequence_tagger_for_Slot_Filling.ipynb)\n", " \n", - "\n", - "[View source on GitHub](https://github.com/recognai/biome-text/blob/master/docs/docs/documentation/tutorials/2-Training_a_sequence_tagger_for_Slot_Filling.ipynb)" + "\n", + "[View source on GitHub](https://github.com/recognai/biome-text/blob/v3.3.0/docs/docs/documentation/tutorials/2-Training_a_sequence_tagger_for_Slot_Filling.ipynb)" ] }, { @@ -35,7 +35,7 @@ "outputs": [], "source": [ "!pip install -U pip\n", - "!pip install -U git+https://github.com/recognai/biome-text.git\n", + "!pip install -U biome-text\n", "exit(0) # Force restart of the runtime" ] }, @@ -135,7 +135,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The [Dataset](https://recognai.github.io/biome-text/master/api/biome/text/dataset.html#dataset) class is a very thin wrapper around HuggingFace's awesome [datasets.Dataset](https://huggingface.co/docs/datasets/master/package_reference/main_classes.html#datasets.Dataset).\n", + "The [Dataset](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/dataset.html#dataset) class is a very thin wrapper around HuggingFace's awesome [datasets.Dataset](https://huggingface.co/docs/datasets/master/package_reference/main_classes.html#datasets.Dataset).\n", "Most of HuggingFace's `Dataset` API is exposed and you can checkout their nice [documentation](https://huggingface.co/docs/datasets/master/processing.html) on how to work with data in a `Dataset`. 
For example, let's quickly check the size of our training data and print the first 10 examples as a pandas DataFrame:" ] }, @@ -178,7 +178,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Since the the [TaskHead](https://recognai.github.io/biome-text/master/api/biome/text/modules/heads/task_head.html#taskhead) of our model (the [TokenClassification](https://recognai.github.io/biome-text/master/api/biome/text/modules/heads/token_classification.html#tokenclassification) head) expects a *text* and a *tags* column to be present in the Dataset, we need to rename the *labels* column:" + "Since the [TaskHead](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/modules/heads/task_head.html#taskhead) of our model (the [TokenClassification](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/modules/heads/token_classification.html#tokenclassification) head) expects a *text* and a *tags* column to be present in the Dataset, we need to rename the *labels* column:" ] }, { @@ -197,7 +197,7 @@ "source": [ "::: tip Tip\n", "\n", - "The [TokenClassification](https://recognai.github.io/biome-text/master/api/biome/text/modules/heads/token_classification.html#tokenclassification) head also supports a *entities* column instead of a *tags* column, in which case the entities have to be a list of python dictionaries with a `start`, `end` and `label` key that correspond to the char indexes and the label of the entity, respectively.\n", + "The [TokenClassification](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/modules/heads/token_classification.html#tokenclassification) head also supports an *entities* column instead of a *tags* column, in which case the entities have to be a list of python dictionaries with `start`, `end` and `label` keys that correspond to the char indexes and the label of the entity, respectively.\n", "\n", ":::" ] }, { @@ -213,7 +213,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "A typical [Pipeline](https://recognai.github.io/biome-text/master/api/biome/text/pipeline.html#pipeline) consists of tokenizing the input, extracting features, applying a language encoding (optionally) and executing a task-specific head in the end.\n", + "A typical [Pipeline](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/pipeline.html#pipeline) consists of tokenizing the input, extracting features, applying a language encoding (optionally) and executing a task-specific head in the end.\n", "After training a pipeline, you can use it to make predictions\n", "\n", "A *biome.text* pipeline has the following main components:\n", @@ -231,12 +231,12 @@ "\n", "```\n", "\n", - "See the [Configuration section](https://recognai.github.io/biome-text/master/documentation/user-guides/2-configuration.html) for a detailed description of how these main components can be configured.\n", + "See the [Configuration section](https://recognai.github.io/biome-text/v3.3.0/documentation/user-guides/2-configuration.html) for a detailed description of how these main components can be configured.\n", "\n", - "In this tutorial we will create a [PipelineConfiguration](https://recognai.github.io/biome-text/master/api/biome/text/configuration.html#pipelineconfiguration) programmatically, and use it to initialize the [Pipeline](https://recognai.github.io/biome-text/master/api/biome/text/pipeline.html#pipeline).\n", - "You can also create your pipelines by providing a [python dictionary](https://recognai.github.io/biome-text/master/api/biome/text/pipeline.html#from-config) (see the text classification [tutorial](https://recognai.github.io/biome-text/master/documentation/tutorials/1-Training_a_text_classifier.html)), a YAML [configuration file](https://recognai.github.io/biome-text/master/api/biome/text/pipeline.html#from-yaml) or a [pretrained model](https://recognai.github.io/biome-text/master/api/biome/text/pipeline.html#from-pretrained).\n", + "In this tutorial we will create a [PipelineConfiguration](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/configuration.html#pipelineconfiguration) programmatically, and use it to initialize the [Pipeline](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/pipeline.html#pipeline).\n", + "You can also create your pipelines by providing a [python dictionary](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/pipeline.html#from-config) (see the text classification [tutorial](https://recognai.github.io/biome-text/v3.3.0/documentation/tutorials/1-Training_a_text_classifier.html)), a YAML [configuration file](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/pipeline.html#from-yaml) or a [pretrained model](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/pipeline.html#from-pretrained).\n", "\n", - "A pipeline configuration is composed of several other [configuration classes](https://recognai.github.io/biome-text/master/api/biome/text/configuration.html#biome-text-configuration), each one corresponding to one of the main components." + "A pipeline configuration is composed of several other [configuration classes](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/configuration.html#biome-text-configuration), each one corresponding to one of the main components." ] }, { @@ -329,16 +329,16 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The final configuration belongs to our [TaskHead](https://recognai.github.io/biome-text/master/api/biome/text/modules/heads/task_head.html#taskhead).\n", + "The final configuration belongs to our [TaskHead](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/modules/heads/task_head.html#taskhead).\n", "It reflects the task our problem belongs to and can be easily exchanged with other types of heads keeping the same features and encoder.\n", "\n", "::: tip Tip\n", "\n", - "Exchanging the heads you can easily pretrain a model on a certain task, such as [language modelling](https://recognai.github.io/biome-text/master/api/biome/text/modules/heads/language_modelling.html#languagemodelling), and use its pretrained features and encoder for training the model on another task.\n", + "By exchanging the heads, you can easily pretrain a model on a certain task, such as [language modelling](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/modules/heads/language_modelling.html#languagemodelling), and use its pretrained features and encoder for training the model on another task.\n", "\n", ":::\n", "\n", - "For our task we will use a [TokenClassification](https://recognai.github.io/biome-text/master/api/biome/text/modules/heads/token_classification.html#tokenclassification) head that allows us to tag each token individually:" + "For our task we will use a [TokenClassification](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/modules/heads/token_classification.html#tokenclassification) head that allows us to tag each token individually:" ] }, { @@ -371,7 +371,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now we can create a [PipelineConfiguration](https://recognai.github.io/biome-text/master/api/biome/text/configuration.html#pipelineconfiguration) and finally initialize our [Pipeline](https://recognai.github.io/biome-text/master/api/biome/text/pipeline.html#pipeline)." + "Now we can create a [PipelineConfiguration](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/configuration.html#pipelineconfiguration) and finally initialize our [Pipeline](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/pipeline.html#pipeline)." ] }, { @@ -413,7 +413,7 @@ "\n", "For this tutorial we get rid of the rarest words by adding the `min_count` argument and set it to 2 for the word feature vocabulary.\n", "Since we use pretrained word embeddings we will not only consider the training data, but also the validation data when creating the vocabulary by setting `include_valid_data=True`. \n", - "For a complete list of available arguments see the [VocabularyConfiguration API](https://recognai.github.io/biome-text/master/api/biome/text/configuration.html#vocabularyconfiguration)." + "For a complete list of available arguments see the [VocabularyConfiguration API](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/configuration.html#vocabularyconfiguration)." ] }, { @@ -440,13 +440,13 @@ "- training data set\n", "- pipeline\n", "\n", - "We will use the default configuration for our [`Trainer`](https://recognai.github.io/biome-text/master/api/biome/text/trainer.html) that has sensible values and works alright for our experiment.\n", - "[This tutorial](https://recognai.github.io/biome-text/master/documentation/tutorials/1-Training_a_text_classifier.html) provides more information about the `Trainer` and gives you an example how to configure it.\n", + "We will use the default configuration for our [`Trainer`](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/trainer.html) that has sensible values and works alright for our experiment.\n", + "[This tutorial](https://recognai.github.io/biome-text/v3.3.0/documentation/tutorials/1-Training_a_text_classifier.html) provides more information about the `Trainer` and gives you an example of how to configure it.\n", "\n", "::: tip Tip\n", "\n", "If you want to configure the trainer you can pass on a `TrainerConfiguration` instance to the `Trainer`s init. 
\n", - "See the [TrainerConfiguration API](https://recognai.github.io/biome-text/master/api/biome/text/configuration.html#trainerconfiguration) for a complete list of available configurations.\n", + "See the [TrainerConfiguration API](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/configuration.html#trainerconfiguration) for a complete list of available configurations.\n", "\n", ":::" ] diff --git a/docs/docs/documentation/tutorials/3-Hyperparameter_optimization_with_Ray_Tune.ipynb b/docs/docs/documentation/tutorials/3-Hyperparameter_optimization_with_Ray_Tune.ipynb index e4be32760..f77a96228 100644 --- a/docs/docs/documentation/tutorials/3-Hyperparameter_optimization_with_Ray_Tune.ipynb +++ b/docs/docs/documentation/tutorials/3-Hyperparameter_optimization_with_Ray_Tune.ipynb @@ -11,14 +11,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\n", - "[View on recogn.ai](https://recognai.github.io/biome-text/master/documentation/tutorials/3-Hyperparameter_optimization_with_Ray_Tune.html)\n", + "\n", + "[View on recogn.ai](https://recognai.github.io/biome-text/v3.3.0/documentation/tutorials/3-Hyperparameter_optimization_with_Ray_Tune.html)\n", "\n", - "\n", - "[Run in Google Colab](https://colab.research.google.com/github/recognai/biome-text/blob/master/docs/docs/documentation/tutorials/3-Hyperparameter_optimization_with_Ray_Tune.ipynb)\n", + "\n", + "[Run in Google Colab](https://colab.research.google.com/github/recognai/biome-text/blob/v3.3.0/docs/docs/documentation/tutorials/3-Hyperparameter_optimization_with_Ray_Tune.ipynb)\n", "\n", - "\n", - "[View source on GitHub](https://github.com/recognai/biome-text/blob/master/docs/docs/documentation/tutorials/3-Hyperparameter_optimization_with_Ray_Tune.ipynb)" + "\n", + "[View source on GitHub](https://github.com/recognai/biome-text/blob/v3.3.0/docs/docs/documentation/tutorials/3-Hyperparameter_optimization_with_Ray_Tune.ipynb)" ] }, { @@ -35,7 +35,7 @@ "outputs": [], "source": [ "!pip install -U pip\n", - "!pip install -U git+https://github.com/recognai/biome-text.git\n", + "!pip install -U biome-text\n", "exit(0) # Force restart of the runtime" ] }, @@ -79,7 +79,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Here we will optimize the hyperparameters of the short-text classifier from [this tutorial](https://recognai.github.io/biome-text/master/documentation/tutorials/1-Training_a_text_classifier.html), hence we recommend to have a look at it first before going through this tutorial.\n", + "Here we will optimize the hyperparameters of the short-text classifier from [this tutorial](https://recognai.github.io/biome-text/v3.3.0/documentation/tutorials/1-Training_a_text_classifier.html), hence we recommend to have a look at it first before going through this tutorial.\n", "For the Hyper-Parameter Optimization (HPO) we rely on the awesome [Ray Tune library](https://docs.ray.io/en/latest/tune.html#tune-index).\n", "\n", "For a short introduction to HPO with Ray Tune you can have a look at this nice [talk](https://www.youtube.com/watch?v=VX7HvEoMrsA) by Richard Liaw. 
\n", @@ -153,7 +153,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "As mentioned in the introduction we will use the same pipeline configuration as used in the [base tutorial](https://recognai.github.io/biome-text/master/documentation/tutorials/1-Training_a_text_classifier.html)).\n", + "As mentioned in the introduction we will use the same pipeline configuration as used in the [base tutorial](https://recognai.github.io/biome-text/v3.3.0/documentation/tutorials/1-Training_a_text_classifier.html)).\n", "\n", "To perform a random hyperparameter search (as well as a grid search) we simply have to replace the parameters we want to optimize with methods from the [Random Distributions API](https://docs.ray.io/en/latest/tune/api_docs/search_space.html#random-distributions-api) or the [Grid Search API](https://docs.ray.io/en/latest/tune/api_docs/search_space.html#grid-search-api) in our configuration dictionaries.\n", "For a complete description of both APIs and how they interplay with each other, see the corresponding section in the [Ray Tune docs](https://docs.ray.io/en/latest/tune/api_docs/search_space.html). \n", @@ -167,7 +167,7 @@ "- and the learning rate\n", "\n", "For most of the parameters we will provide discrete values from which Tune will sample randomly, while for the dropout and learning rate we will provide a continuous linear and logarithmic range, respectively.\n", - "Since we want to directly compare the outcome of the optimization with the base configuration of the [underlying tutorial](https://recognai.github.io/biome-text/master/documentation/tutorials/1-Training_a_text_classifier.html), we will fix the number of epochs to 3.\n", + "Since we want to directly compare the outcome of the optimization with the base configuration of the [underlying tutorial](https://recognai.github.io/biome-text/v3.3.0/documentation/tutorials/1-Training_a_text_classifier.html), we will fix the number of epochs to 3.\n", "\n", "Not all of the parameters above are worth tuning, but we want to stress the flexibility that *Ray Tune* and *biome.text* offers you.\n", "\n", @@ -438,7 +438,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Event though with 50 trials we visit just a small space of our possible configurations, we should have achieved an accuracy of ~0.94, an increase of roughly 3 points compared to the original configuration of the [base tutorial](https://recognai.github.io/biome-text/master/documentation/tutorials/1-Training_a_text_classifier.html).\n", + "Event though with 50 trials we visit just a small space of our possible configurations, we should have achieved an accuracy of ~0.94, an increase of roughly 3 points compared to the original configuration of the [base tutorial](https://recognai.github.io/biome-text/v3.3.0/documentation/tutorials/1-Training_a_text_classifier.html).\n", "\n", "In a real-life example, though, you probably should increase the number of epochs, since the validation loss in general seems to be decreasing further.\n", "\n", diff --git a/docs/docs/documentation/tutorials/4-Using_Transformers_in_biome_text.ipynb b/docs/docs/documentation/tutorials/4-Using_Transformers_in_biome_text.ipynb index 2aa26a1c0..a12ad45ea 100644 --- a/docs/docs/documentation/tutorials/4-Using_Transformers_in_biome_text.ipynb +++ b/docs/docs/documentation/tutorials/4-Using_Transformers_in_biome_text.ipynb @@ -15,14 +15,14 @@ "id": "qgfpoXsuVoso" }, "source": [ - "\n", - "[View on 
recogn.ai](https://recognai.github.io/biome-text/master/documentation/tutorials/4-Using_Transformers_in_biome_text.html)\n", + "\n", + "[View on recogn.ai](https://recognai.github.io/biome-text/v3.3.0/documentation/tutorials/4-Using_Transformers_in_biome_text.html)\n", " \n", - "\n", - "[Run in Google Colab](https://colab.research.google.com/github/recognai/biome-text/blob/master/docs/docs/documentation/tutorials/4-Using_Transformers_in_biome_text.ipynb)\n", + "\n", + "[Run in Google Colab](https://colab.research.google.com/github/recognai/biome-text/blob/v3.3.0/docs/docs/documentation/tutorials/4-Using_Transformers_in_biome_text.ipynb)\n", " \n", - "\n", - "[View source on GitHub](https://github.com/recognai/biome-text/blob/master/docs/docs/documentation/tutorials/4-Using_Transformers_in_biome_text.ipynb)" + "\n", + "[View source on GitHub](https://github.com/recognai/biome-text/blob/v3.3.0/docs/docs/documentation/tutorials/4-Using_Transformers_in_biome_text.ipynb)" ] }, { @@ -43,7 +43,7 @@ "outputs": [], "source": [ "!pip install -U pip\n", - "!pip install -U git+https://github.com/recognai/biome-text.git\n", + "!pip install -U biome-text\n", "exit(0) # Force restart of the runtime" ] }, @@ -143,7 +143,7 @@ "\n", "We preprocessed the data in a separate [notebook](https://drive.google.com/file/d/1zUSz81x15RH5mL5GoN7i7xqiNGEqclU0/view?usp=sharing) producing three csv files (train, validate and test datasets) that contain the title, the abstract and the category of the corresponding paper. \n", "\n", - "Our NLP task will be to classify the papers into the given categories based on the title and abstract. Below we download the preprocessed data and create our [Datasets](https://recognai.github.io/biome-text/master/api/biome/text/dataset.html#dataset) with it." + "Our NLP task will be to classify the papers into the given categories based on the title and abstract. Below we download the preprocessed data and create our [Datasets](https://recognai.github.io/biome-text/v3.3.0/api/biome/text/dataset.html#dataset) with it." ] }, { @@ -205,7 +205,7 @@ "id": "-UlEwBj4C6Zj" }, "source": [ - "Our pipeline defined in the next section, or to be more precise the `TaskClassification` task [head](https://recognai.github.io/biome-text/master/documentation/basics.html#head), will expect a *text* and *label* column to be present in our data.\n", + "Our pipeline defined in the next section, or to be more precise the `TaskClassification` task [head](https://recognai.github.io/biome-text/v3.3.0/documentation/basics.html#head), will expect a *text* and *label* column to be present in our data.\n", "Therefore, we need to map our input to these two columns:" ] }, @@ -236,7 +236,7 @@ "source": [ "## Configuring and training the pipeline\n", "\n", - "As we have seen in [previous tutorials](https://recognai.github.io/biome-text/master/documentation/tutorials/1-Training_a_text_classifier.html#explore-the-training-data), a *biome.text* [`Pipeline`](https://recognai.github.io/biome-text/master/documentation/basics.html#pipeline) consists of tokenizing the input, extracting text features, applying a language encoding (optionally) and executing a task-specific head in the end. 
In *biome.text* the pre-trained transformers by Hugging Face are treated as a text feature, just like the *word* and *char* feature.\n", + "As we have seen in [previous tutorials](https://recognai.github.io/biome-text/v3.3.0/documentation/tutorials/1-Training_a_text_classifier.html#explore-the-training-data), a *biome.text* [`Pipeline`](https://recognai.github.io/biome-text/v3.3.0/documentation/basics.html#pipeline) consists of tokenizing the input, extracting text features, applying a language encoding (optionally) and executing a task-specific head in the end. In *biome.text* the pre-trained transformers by Hugging Face are treated as a text feature, just like the *word* and *char* feature.\n", "\n", "In this section we will configure and train 3 different pipelines to showcase the usage of transformers in *biome.text*." ] @@ -660,7 +660,7 @@ "source": [ "## Optimizing the trainer configuration\n", "\n", - "As described in the [HPO tutorial](https://recognai.github.io/biome-text/master/documentation/tutorials/3-Hyperparameter_optimization_with_Ray_Tune.html#imports), *biome.text* relies on the [Ray Tune library](https://docs.ray.io/en/latest/tune.html#tune-index) to perform hyperparameter optimization.\n", + "As described in the [HPO tutorial](https://recognai.github.io/biome-text/v3.3.0/documentation/tutorials/3-Hyperparameter_optimization_with_Ray_Tune.html#imports), *biome.text* relies on the [Ray Tune library](https://docs.ray.io/en/latest/tune.html#tune-index) to perform hyperparameter optimization.\n", "We recommend to go through that tutorial first, as we will be skipping most of the implementation details here." ] },
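Note on the install change applied across all four notebooks: the unpinned git install (`pip install -U git+https://github.com/recognai/biome-text.git`) is replaced by the PyPI release (`pip install -U biome-text`), matching the docs' move from `master` to the `v3.3.0` tag. The sketch below is a minimal illustration of the train flow these tutorials build on, using only names the docs above link to (`Dataset`, `Pipeline.from_config()`, `Trainer`); the version pin, file name, pipeline name, and label values are illustrative assumptions, not part of this change.

```python
# Minimal sketch of the flow the updated notebooks assume; not part of this diff.
# Assumed/illustrative: the exact version pin, the CSV file name, and the
# config values. Class and method names come from the API links in the docs above.

# pip install "biome-text==3.3.0"   # pinned variant of `!pip install -U biome-text`

from biome.text import Dataset, Pipeline, Trainer

# Tutorial 1 expects *text* and *label* columns to be present in the data.
train_ds = Dataset.from_csv("business.cat.train.csv")  # hypothetical file name

# Build the pipeline from a config dict, as with `Pipeline.from_config()` above.
pipeline = Pipeline.from_config({
    "name": "german_business_names",  # illustrative pipeline name
    "head": {
        "type": "TextClassification",
        "labels": ["unternehmensberatungen", "friseure"],  # illustrative labels
    },
})

# The default Trainer has sensible defaults; pass a TrainerConfiguration to tune it.
trainer = Trainer(pipeline=pipeline, train_dataset=train_ds)
trainer.fit()
```

Pinning the package version in the install cell, rather than tracking the latest release, would keep Colab runs reproducible against the v3.3.0 docs; whether to pin is a judgment call this diff leaves open.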