Polish the tutorial materials #161

Merged
merged 1 commit into from
Aug 4, 2022
@@ -11,7 +11,19 @@
"\n",
"<div> <br/><img src=\"imgs/pecos_xmr_framework.png\" width=\"80%\"/> </div>\n",
"\n",
"As shown in the above figure, to address the XMR problem, PECOS conceptually consists of three stages, including semantic label indexing, machine-learned matching, and ranking. For more details about XMR problem and model formulation, please refer to presentations in the PECOS Day. In this part of the tutorial, we will use XR-Linear as an example to demonstrate how to use PECOS to tackle real-world problems and understand the model architecture in PECOS."
"As shown in the above figure, to address the XMR problem, PECOS conceptually consists of three stages, including semantic label indexing, machine-learned matching, and ranking. In this part of the tutorial, we will use XR-Linear as an example to demonstrate how to use PECOS to tackle real-world problems and understand the model architecture in PECOS.\n",
"\n",
"### Install PECOS through Python PIP"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "6d9fa78b",
"metadata": {},
"outputs": [],
"source": [
"! pip install libpecos"
]
},
{
@@ -232,7 +244,7 @@
"\n",
"In PECOS, numerical features of instances can be in either a [dense NumPy matrix](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html) or a [Compressed Sparse Row (CSR) matrix](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html) of shape `(nr_inst, nr_feat)`, where `nr_inst` and `nr_feat` are the numbers of instances and features. Similarly, labels of instances can also be represented as a dense or a sparse matrix of shape `(nr_inst, nr_labels)`, where `nr_labels` is the number of labels in the XMR problem. Note that for the sparse format, training labels should be a [Compressed Sparse Column (CSC) matrix](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.html) while testing labels should be a CSR matrix for the purpose of computational efficiency. For convenience, PECOS also provides APIs for loading features and labels from binary files in arbitrary formats.\n",
"\n",
"In addition to numerical features, PECOS also supports handling text data with transformers. Please refer to [Part 2](Part%202%20-%20Text%20Processing.ipynb) in this tutorial for more details about text processing in PECOS."
"In addition to numerical features, PECOS also supports handling text data with transformers."
]
},
{
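The matrix formats described in the hunk above can be illustrated with a small sketch. It uses only NumPy and SciPy, not the PECOS API, and the toy sizes and the variable names `X_dense`, `X_csr`, `Y_trn`, and `Y_tst` are assumptions made for illustration.

```python
import numpy as np
import scipy.sparse as smat

# Toy problem sizes (made up for illustration).
nr_inst, nr_feat, nr_labels = 4, 3, 5

# Features: a dense matrix of shape (nr_inst, nr_feat) and its CSR equivalent.
X_dense = np.array(
    [
        [1.0, 0.0, 2.0],
        [0.0, 0.0, 3.0],
        [4.0, 5.0, 0.0],
        [0.0, 6.0, 0.0],
    ],
    dtype=np.float32,
)
X_csr = smat.csr_matrix(X_dense)

# Labels: entry (i, j) is 1 when instance i is tagged with label j.
rows = np.array([0, 0, 1, 2, 3])
cols = np.array([1, 4, 0, 2, 4])
vals = np.ones(len(rows), dtype=np.float32)
Y = smat.coo_matrix((vals, (rows, cols)), shape=(nr_inst, nr_labels))

Y_trn = Y.tocsc()  # training labels in CSC format
Y_tst = Y.tocsr()  # testing labels in CSR format

print(X_csr.shape, Y_trn.format, Y_tst.format)
```

Both label matrices hold the same entries; only the storage layout differs, which is what the text refers to when it distinguishes the training and testing formats.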
@@ -330,7 +342,7 @@
"source": [
"### Training XR-Linear Negative Sampling and Sparsification\n",
"\n",
"Negative sampling plays an important role in solving the XMR problem. PECOS currently provides two negative sampling schemes, including Teacher Forcing Negatives (TFN) and Matcher Aware Negatives (MAN). Please refer to [our report](https://arxiv.org/pdf/2010.05878.pdf)) and presentations in the [PECOS Day](https://w.amazon.com/bin/view/Search/MIDAS/Projects/PECOS/PecosDay/) for more details about negative sampling schemes.\n",
"Negative sampling plays an important role in solving the XMR problem. PECOS currently provides two negative sampling schemes, including Teacher Forcing Negatives (TFN) and Matcher Aware Negatives (MAN). Please refer to [our report](https://arxiv.org/pdf/2010.05878.pdf) for more details about negative sampling schemes.\n",
"\n",
"To reduce model size and improve efficiency, PECOS conducts model sparsification with a hyper-parameter `threshold`. Model weights with absolute values smaller than the threshold are discarded."
]
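The thresholding rule described in the hunk above can be sketched in a few lines of NumPy. This is only an illustration of magnitude-based pruning under the stated rule (strictly smaller than the threshold), not PECOS's internal implementation; the helper name `sparsify` and the example weights are hypothetical.

```python
import numpy as np


def sparsify(weights, threshold):
    """Return a copy of `weights` with entries whose absolute value
    is strictly smaller than `threshold` set to zero."""
    pruned = weights.copy()
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned


# Hypothetical weight matrix, for illustration only.
W = np.array(
    [
        [0.50, -0.05, 0.00],
        [-0.20, 0.09, 0.10],
    ]
)
W_pruned = sparsify(W, threshold=0.1)

print(W_pruned)
print("non-zeros before:", np.count_nonzero(W), "after:", np.count_nonzero(W_pruned))
```

A larger `threshold` removes more weights, trading some accuracy for a smaller model, which is the trade-off the tutorial text points at.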

@@ -53,6 +53,24 @@
"* building the indexer (training)\n",
"* inference (testing).\n",
"\n",
"### Install PECOS through Python PIP"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f6df49a3",
"metadata": {},
"outputs": [],
"source": [
"! pip install libpecos"
]
},
{
"cell_type": "markdown",
"id": "abb5ff7e",
"metadata": {},
"source": [
"### Data Loading"
]
},
14 changes: 13 additions & 1 deletion tutorials/kdd22/Session 4 Utilities in PECOS.ipynb
@@ -7,7 +7,19 @@
"source": [
"# Utilities in PECOS\n",
"\n",
"PECOS provides various useful interfaces and utility functions for XMR problems and related tasks. In this session, we will introduce how to tackle arbitrary data formats for XMR, and then present some utilities in PECOS for efficient matrix operations and hierarchical clustering."
"PECOS provides various useful interfaces and utility functions for XMR problems and related tasks. In this session, we will introduce how to tackle arbitrary data formats for XMR, and then present some utilities in PECOS for efficient matrix operations and hierarchical clustering.\n",
"\n",
"### Install PECOS through Python PIP"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4eba0f0b",
"metadata": {},
"outputs": [],
"source": [
"! pip install libpecos"
]
},
{

@@ -9,7 +9,18 @@
"In many XMC applications, XR-Transformer is able to yield better performance than XR-Linear due to better extraction of semantic information. However, unlike the linear models, the training hyper-parameters need to be carefully set to achieve the best performance. Naively using the default setting will often lead to sub-optimal results.\n",
"\n",
"In this section, we will discuss the crucial components of training a good XR-Transformer model.\n",
"\n"
"\n",
"### Install PECOS through Python PIP"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f7f4acc8",
"metadata": {},
"outputs": [],
"source": [
"! pip install libpecos"
]
},
{
@@ -25,7 +36,7 @@
"* **Step2**: Fine-tune the transformer encoder on the chosen levels of the preliminary HLT.\n",
"* **Step3**: Concatenate final instance embeddings and sparse features and train the linear rankers on the refined HLT.\n",
"\n",
"<div> <br/><img src=\"https://assets.amazon.science/dims4/default/d1bf545/2147483647/strip/true/crop/2670x1502+0+0/resize/1200x675!/format/webp/quality/90/?url=http%3A%2F%2Famazon-topics-brightspot.s3.amazonaws.com%2Fscience%2F20%2F20%2Ffe5f61184e0ea535f2ae054f9d42%2Fxrtransformer.png\" width=\"80%\"/> </div>\n",
"<div> <br/><img src=\"imgs/pecos_xrtransformer.png\" width=\"80%\"/> </div>\n",
"\n"
]
},
Binary file added tutorials/kdd22/imgs/pecos_xrtransformer.png