Commit: Generate en docs

Milvus-doc-bot authored and committed Sep 19, 2024
1 parent 1ccede7 commit 4680df6
Showing 4 changed files with 84 additions and 32 deletions.
@@ -1 +1 @@
{"codeList":["pip install --upgrade pymilvus\npip install \"pymilvus[model]\"\n","from pymilvus.model.dense import JinaEmbeddingFunction\n\njina_ef = JinaEmbeddingFunction(\n model_name=\"jina-embeddings-v2-base-en\", # Defaults to `jina-embeddings-v2-base-en`\n api_key=JINAAI_API_KEY # Provide your Jina AI API key\n)\n","docs = [\n \"Artificial intelligence was founded as an academic discipline in 1956.\",\n \"Alan Turing was the first person to conduct substantial research in AI.\",\n \"Born in Maida Vale, London, Turing was raised in southern England.\",\n]\n\ndocs_embeddings = jina_ef.encode_documents(docs)\n\n# Print embeddings\nprint(\"Embeddings:\", docs_embeddings)\n# Print dimension and shape of embeddings\nprint(\"Dim:\", jina_ef.dim, docs_embeddings[0].shape)\n","Embeddings: [array([-4.88487840e-01, -4.28095880e-01, 4.90086500e-01, -1.63274320e-01,\n 3.43437800e-01, 3.21476880e-01, 2.83173790e-02, -3.10403670e-01,\n 4.76985040e-01, -1.77410420e-01, -3.84803180e-01, -2.19224200e-01,\n -2.52898000e-01, 6.62411900e-02, -8.58173100e-01, 1.05221800e+00,\n...\n -2.04462400e-01, 7.14229800e-01, -1.66823000e-01, 8.72551440e-01,\n 5.53560140e-01, 8.92506300e-01, -2.39408610e-01, -4.22413560e-01,\n -3.19551350e-01, 5.59153850e-01, 2.44338100e-01, -8.60452100e-01])]\nDim: 768 (768,)\n","queries = [\"When was artificial intelligence founded\", \n \"Where was Alan Turing born?\"]\n\nquery_embeddings = jina_ef.encode_queries(queries)\n\nprint(\"Embeddings:\", query_embeddings)\nprint(\"Dim\", jina_ef.dim, query_embeddings[0].shape)\n","Embeddings: [array([-5.99164660e-01, -3.49827350e-01, 8.22405160e-01, -1.18632730e-01,\n 5.78107540e-01, 1.09789170e-01, 2.91604200e-01, -3.29306450e-01,\n 2.93779640e-01, -2.17880800e-01, -6.84535440e-01, -3.79752000e-01,\n -3.47541800e-01, 9.20846100e-02, -6.13804400e-01, 6.31312800e-01,\n...\n -1.84993740e-02, 9.38629150e-01, 2.74858470e-02, 1.09396360e+00,\n 3.96270750e-01, 7.44445800e-01, -1.95404050e-01, -6.08383200e-01,\n -3.75076300e-01, 3.87512200e-01, 8.11889650e-01, -3.76407620e-01])]\nDim 768 (768,)\n"],"headingContent":"Jina AI","anchorList":[{"label":"Jina AI","href":"Jina-AI","type":1,"isActive":false}]}
{"codeList":["pip install --upgrade pymilvus\npip install \"pymilvus[model]\"\n","from pymilvus.model.dense import JinaEmbeddingFunction\n\njina_ef = JinaEmbeddingFunction(\n model_name=\"jina-embeddings-v3\", # Defaults to `jina-embeddings-v3`\n api_key=JINAAI_API_KEY, # Provide your Jina AI API key\n task=\"retrieval.passage\", # Specify the task\n dimensions=1024, # Defaults to 1024\n)\n","\n```python\ndocs = [\n \"Artificial intelligence was founded as an academic discipline in 1956.\",\n \"Alan Turing was the first person to conduct substantial research in AI.\",\n \"Born in Maida Vale, London, Turing was raised in southern England.\",\n]\n\ndocs_embeddings = jina_ef.encode_documents(docs)\n\n# Print embeddings\nprint(\"Embeddings:\", docs_embeddings)\n# Print dimension and shape of embeddings\nprint(\"Dim:\", jina_ef.dim, docs_embeddings[0].shape)\n","Embeddings: [array([9.80641991e-02, -8.51697400e-02, 7.36531913e-02, 1.42558888e-02,\n -2.23589484e-02, 1.68494112e-03, -3.50753777e-02, -3.11530549e-02,\n -3.26012149e-02, 5.04568312e-03, 3.69836427e-02, 3.48948985e-02,\n 8.19722563e-03, 5.88679723e-02, -6.71099266e-03, -1.82369724e-02,\n...\n 2.48654783e-02, 3.43279652e-02, -1.66154150e-02, -9.90478322e-03,\n -2.96043139e-03, -8.57473817e-03, -7.39028037e-04, 6.25024503e-03,\n -1.08831357e-02, -4.00776342e-02, 3.25369164e-02, -1.42691191e-03])]\nDim: 1024 (1024,)\n","queries = [\"When was artificial intelligence founded\", \n \"Where was Alan Turing born?\"]\n\nquery_embeddings = jina_ef.encode_queries(queries)\n\nprint(\"Embeddings:\", query_embeddings)\nprint(\"Dim\", jina_ef.dim, query_embeddings[0].shape)\n","Embeddings: [array([8.79201014e-03, 1.47551354e-02, 4.02722731e-02, -2.52991207e-02,\n 1.12719582e-02, 3.75947170e-02, 3.97946090e-02, -7.36681819e-02,\n -2.17952449e-02, -1.16298944e-02, -6.83426252e-03, -5.12507409e-02,\n 5.26071340e-02, 6.75181448e-02, 3.92445624e-02, -1.40817231e-02,\n...\n 8.81703943e-03, 4.24629413e-02, -2.32944116e-02, -2.05193572e-02,\n -3.22035812e-02, 2.81896023e-03, 3.85326855e-02, 3.64372656e-02,\n -1.65050142e-02, -4.26847413e-02, 2.02664156e-02, -1.72684863e-02])]\nDim 1024 (1024,)\n","from pymilvus.model.dense import JinaEmbeddingFunction\n\njina_ef = JinaEmbeddingFunction(\n model_name=\"jina-embeddings-v3\", # Defaults to `jina-embeddings-v3`\n api_key=JINA_API_KEY, # Provide your Jina AI API key\n task=\"text-matching\",\n dimensions=1024, # Defaults to 1024\n)\n\ntexts = [\n \"Follow the white rabbit.\", # English\n \"Sigue al conejo blanco.\", # Spanish\n \"Suis le lapin blanc.\", # French\n \"跟着白兔走。\", # Chinese\n \"اتبع الأرنب الأبيض.\", # Arabic\n \"Folge dem weißen Kaninchen.\", # German\n]\n\nembeddings = jina_ef(texts)\n\n# Compute similarities\nprint(embeddings[0] @ embeddings[1].T)\n"],"headingContent":"Jina AI","anchorList":[{"label":"Jina AI","href":"Jina-AI","type":1,"isActive":false}]}
85 changes: 63 additions & 22 deletions localization/v2.4.x/site/en/embeddings/embed-with-jina.md
@@ -31,19 +31,36 @@ pip install <span class="hljs-string">&quot;pymilvus[model]&quot;</span>
<pre><code translate="no" class="language-python"><span class="hljs-keyword">from</span> pymilvus.model.dense <span class="hljs-keyword">import</span> JinaEmbeddingFunction

jina_ef = JinaEmbeddingFunction(
model_name=<span class="hljs-string">&quot;jina-embeddings-v2-base-en&quot;</span>, <span class="hljs-comment"># Defaults to `jina-embeddings-v2-base-en`</span>
api_key=JINAAI_API_KEY <span class="hljs-comment"># Provide your Jina AI API key</span>
model_name=<span class="hljs-string">&quot;jina-embeddings-v3&quot;</span>, <span class="hljs-comment"># Defaults to `jina-embeddings-v3`</span>
api_key=JINAAI_API_KEY, <span class="hljs-comment"># Provide your Jina AI API key</span>
task=<span class="hljs-string">&quot;retrieval.passage&quot;</span>, <span class="hljs-comment"># Specify the task</span>
dimensions=<span class="hljs-number">1024</span>, <span class="hljs-comment"># Defaults to 1024</span>
)
<button class="copy-code-btn"></button></code></pre>
<p><strong>Parameters</strong>:</p>
<ul>
<li><p><code translate="no">model_name</code> (<em>string</em>)</p>
<p>The name of the Jina AI embedding model to use for encoding. You can specify any of the available Jina AI embedding model names, for example, <code translate="no">jina-embeddings-v2-base-en</code>, <code translate="no">jina-embeddings-v2-small-en</code>, etc. If you leave this parameter unspecified, <code translate="no">jina-embeddings-v2-base-en</code> will be used. For a list of available models, refer to <a href="https://jina.ai/embeddings">Jina Embeddings</a>.</p></li>
<p>The name of the Jina AI embedding model to use for encoding. You can specify any of the available Jina AI embedding model names, for example, <code translate="no">jina-embeddings-v3</code>, <code translate="no">jina-embeddings-v2-base-en</code>, etc. If you leave this parameter unspecified, <code translate="no">jina-embeddings-v3</code> will be used. For a list of available models, refer to <a href="https://jina.ai/embeddings">Jina Embeddings</a>.</p></li>
<li><p><code translate="no">api_key</code> (<em>string</em>)</p>
<p>The API key for accessing the Jina AI API.</p></li>
<li><p><code translate="no">task</code> (<em>string</em>)</p>
<p>The type of input passed to the model. Required for embedding models v3 and higher.</p>
<ul>
<li><code translate="no">&quot;retrieval.passage&quot;</code>: Used to encode large documents in retrieval tasks at indexing time.</li>
<li><code translate="no">&quot;retrieval.query&quot;</code>: Used to encode user queries or questions in retrieval tasks.</li>
<li><code translate="no">&quot;classification&quot;</code>: Used to encode text for text classification tasks.</li>
<li><code translate="no">&quot;text-matching&quot;</code>: Used to encode text for similarity matching, such as measuring similarity between two sentences.</li>
<li><code translate="no">&quot;clustering&quot;</code>: Used for clustering or reranking tasks.</li>
</ul></li>
<li><p><code translate="no">dimensions</code> (<em>int</em>)</p>
<p>The number of dimensions the resulting output embeddings should have. Defaults to 1024. Only supported for embedding models v3 and higher.</p></li>
<li><p><code translate="no">late_chunking</code> (<em>bool</em>)</p>
<p>This parameter controls whether to use the <a href="https://arxiv.org/abs/2409.04701">late chunking</a> method that Jina AI recently introduced for encoding a batch of sentences. Defaults to <code translate="no">False</code>. When set to <code translate="no">True</code>, the Jina AI API concatenates all sentences in the input field and feeds them to the model as a single string. Internally, the model embeds this long concatenated string and then performs late chunking, returning a list of embeddings that matches the size of the input list. A usage sketch follows this list.</p></li>
</ul>
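<p>The following is a minimal sketch of how <code translate="no">late_chunking</code> might be enabled. It is not part of the official example set; it assumes the same placeholder <code translate="no">JINAAI_API_KEY</code> used elsewhere on this page.</p>
<pre><code translate="no" class="language-python">from pymilvus.model.dense import JinaEmbeddingFunction

# Sketch: enable late chunking so that sentences from the same document are
# encoded with shared context. JINAAI_API_KEY is a placeholder for your key.
jina_ef_late = JinaEmbeddingFunction(
    model_name="jina-embeddings-v3",
    api_key=JINAAI_API_KEY,
    task="retrieval.passage",
    dimensions=1024,
    late_chunking=True,  # concatenate the batch, embed once, then return per-chunk embeddings
)

# Chunks that belong to one document and benefit from each other's context
chunks = [
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, he was raised in southern England.",
]

chunk_embeddings = jina_ef_late.encode_documents(chunks)
print(len(chunk_embeddings), chunk_embeddings[0].shape)  # one embedding per input chunk
</code></pre>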
<p>To create embeddings for documents, use the <code translate="no">encode_documents()</code> method:</p>
<pre><code translate="no" class="language-python">docs = [
<p>To create embeddings for documents, use the <code translate="no">encode_documents()</code> method. This method is designed for document embeddings in asymmetric retrieval tasks, such as indexing documents for search or recommendation, and uses <code translate="no">retrieval.passage</code> as the task.</p>
<pre><code translate="no" class="language-python">docs = [
<span class="hljs-string">&quot;Artificial intelligence was founded as an academic discipline in 1956.&quot;</span>,
<span class="hljs-string">&quot;Alan Turing was the first person to conduct substantial research in AI.&quot;</span>,
<span class="hljs-string">&quot;Born in Maida Vale, London, Turing was raised in southern England.&quot;</span>,
@@ -57,17 +74,17 @@ docs_embeddings = jina_ef.encode_documents(docs)
<span class="hljs-built_in">print</span>(<span class="hljs-string">&quot;Dim:&quot;</span>, jina_ef.dim, docs_embeddings[<span class="hljs-number">0</span>].shape)
<button class="copy-code-btn"></button></code></pre>
<p>The expected output is similar to the following:</p>
<pre><code translate="no" class="language-python">Embeddings: [array([-4.88487840e-01, -4.28095880e-01, 4.90086500e-01, -1.63274320e-01,
3.43437800e-01, 3.21476880e-01, 2.83173790e-02, -3.10403670e-01,
4.76985040e-01, -1.77410420e-01, -3.84803180e-01, -2.19224200e-01,
-2.52898000e-01, 6.62411900e-02, -8.58173100e-01, 1.05221800e+00,
<pre><code translate="no" class="language-python">Embeddings: [array([9.80641991e-02, -8.51697400e-02, 7.36531913e-02, 1.42558888e-02,
-2.23589484e-02, 1.68494112e-03, -3.50753777e-02, -3.11530549e-02,
-3.26012149e-02, 5.04568312e-03, 3.69836427e-02, 3.48948985e-02,
8.19722563e-03, 5.88679723e-02, -6.71099266e-03, -1.82369724e-02,
...
-2.04462400e-01, 7.14229800e-01, -1.66823000e-01, 8.72551440e-01,
5.53560140e-01, 8.92506300e-01, -2.39408610e-01, -4.22413560e-01,
-3.19551350e-01, 5.59153850e-01, 2.44338100e-01, -8.60452100e-01])]
Dim: 768 (768,)
2.48654783e-02, 3.43279652e-02, -1.66154150e-02, -9.90478322e-03,
-2.96043139e-03, -8.57473817e-03, -7.39028037e-04, 6.25024503e-03,
-1.08831357e-02, -4.00776342e-02, 3.25369164e-02, -1.42691191e-03])]
Dim: 1024 (1024,)
<button class="copy-code-btn"></button></code></pre>
<p>To create embeddings for queries, use the <code translate="no">encode_queries()</code> method:</p>
<p>To create embeddings for queries, use the <code translate="no">encode_queries()</code> method. This method is designed for query embeddings in asymmetric retrieval tasks, such as search queries or questions, and uses <code translate="no">retrieval.query</code> as the task.</p>
<pre><code translate="no" class="language-python">queries = [<span class="hljs-string">&quot;When was artificial intelligence founded&quot;</span>,
<span class="hljs-string">&quot;Where was Alan Turing born?&quot;</span>]

@@ -77,13 +94,37 @@ query_embeddings = jina_ef.encode_queries(queries)
<span class="hljs-built_in">print</span>(<span class="hljs-string">&quot;Dim&quot;</span>, jina_ef.dim, query_embeddings[<span class="hljs-number">0</span>].shape)
<button class="copy-code-btn"></button></code></pre>
<p>The expected output is similar to the following:</p>
<pre><code translate="no" class="language-python">Embeddings: [array([-5.99164660e-01, -3.49827350e-01, 8.22405160e-01, -1.18632730e-01,
5.78107540e-01, 1.09789170e-01, 2.91604200e-01, -3.29306450e-01,
2.93779640e-01, -2.17880800e-01, -6.84535440e-01, -3.79752000e-01,
-3.47541800e-01, 9.20846100e-02, -6.13804400e-01, 6.31312800e-01,
<pre><code translate="no" class="language-python">Embeddings: [array([8.79201014e-03, 1.47551354e-02, 4.02722731e-02, -2.52991207e-02,
1.12719582e-02, 3.75947170e-02, 3.97946090e-02, -7.36681819e-02,
-2.17952449e-02, -1.16298944e-02, -6.83426252e-03, -5.12507409e-02,
5.26071340e-02, 6.75181448e-02, 3.92445624e-02, -1.40817231e-02,
...
-1.84993740e-02, 9.38629150e-01, 2.74858470e-02, 1.09396360e+00,
3.96270750e-01, 7.44445800e-01, -1.95404050e-01, -6.08383200e-01,
-3.75076300e-01, 3.87512200e-01, 8.11889650e-01, -3.76407620e-01])]
Dim 768 (768,)
8.81703943e-03, 4.24629413e-02, -2.32944116e-02, -2.05193572e-02,
-3.22035812e-02, 2.81896023e-03, 3.85326855e-02, 3.64372656e-02,
-1.65050142e-02, -4.26847413e-02, 2.02664156e-02, -1.72684863e-02])]
Dim 1024 (1024,)
<button class="copy-code-btn"></button></code></pre>
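<p>As a hedged illustration of how the two embedding sets can be used together, the short sketch below ranks the documents encoded earlier against each query with a dot product. It reuses <code translate="no">docs</code>, <code translate="no">queries</code>, <code translate="no">docs_embeddings</code>, and <code translate="no">query_embeddings</code> from the snippets above and assumes NumPy is available.</p>
<pre><code translate="no" class="language-python">import numpy as np

# Stack the per-document embeddings into a matrix: one row per document
doc_matrix = np.stack(docs_embeddings)

for query, q_emb in zip(queries, query_embeddings):
    scores = doc_matrix @ q_emb        # one similarity score per document
    best = int(np.argmax(scores))      # index of the best-matching document
    print(query, "->", docs[best])
</code></pre>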
<p>To create embeddings for similarity matching (such as STS or symmetric retrieval), text classification, clustering, or reranking tasks, set the appropriate <code translate="no">task</code> parameter value when instantiating the <code translate="no">JinaEmbeddingFunction</code> class.</p>
<pre><code translate="no" class="language-python"><span class="hljs-keyword">from</span> pymilvus.model.dense <span class="hljs-keyword">import</span> JinaEmbeddingFunction

jina_ef = JinaEmbeddingFunction(
model_name=<span class="hljs-string">&quot;jina-embeddings-v3&quot;</span>, <span class="hljs-comment"># Defaults to `jina-embeddings-v3`</span>
api_key=JINA_API_KEY, <span class="hljs-comment"># Provide your Jina AI API key</span>
task=<span class="hljs-string">&quot;text-matching&quot;</span>,
dimensions=<span class="hljs-number">1024</span>, <span class="hljs-comment"># Defaults to 1024</span>
)

texts = [
<span class="hljs-string">&quot;Follow the white rabbit.&quot;</span>, <span class="hljs-comment"># English</span>
<span class="hljs-string">&quot;Sigue al conejo blanco.&quot;</span>, <span class="hljs-comment"># Spanish</span>
<span class="hljs-string">&quot;Suis le lapin blanc.&quot;</span>, <span class="hljs-comment"># French</span>
<span class="hljs-string">&quot;跟着白兔走。&quot;</span>, <span class="hljs-comment"># Chinese</span>
<span class="hljs-string">&quot;اتبع الأرنب الأبيض.&quot;</span>, <span class="hljs-comment"># Arabic</span>
<span class="hljs-string">&quot;Folge dem weißen Kaninchen.&quot;</span>, <span class="hljs-comment"># German</span>
]

embeddings = jina_ef(texts)

<span class="hljs-comment"># Compute similarities</span>
<span class="hljs-built_in">print</span>(embeddings[<span class="hljs-number">0</span>] @ embeddings[<span class="hljs-number">1</span>].T)
<button class="copy-code-btn"></button></code></pre>
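<p>To extend the example above, the following hedged sketch stacks all six embeddings and prints the full pairwise similarity matrix, normalizing defensively in case the vectors are not unit-length. It assumes NumPy is installed and reuses <code translate="no">embeddings</code> from the previous block.</p>
<pre><code translate="no" class="language-python">import numpy as np

emb = np.stack(embeddings)                              # shape: (6, 1024)
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # normalize each row
similarity = emb @ emb.T                                # pairwise cosine similarities
print(np.round(similarity, 3))                          # 6 x 6 matrix, one row per language
</code></pre>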