{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":642608852,"defaultBranch":"main","name":"gorilla","ownerLogin":"ShishirPatil","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2023-05-19T00:46:45.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/30296397?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1724739830.0","currentOid":""},"activityList":{"items":[{"before":"46c3e855c51f4d0f222f08d41818e72b619cadea","after":"3f5ace7de63c37dc76d191f4471dfdb318c25f8e","ref":"refs/heads/main","pushedAt":"2024-09-15T07:32:29.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"[BFCL] Add New Model `o1-preview-2024-09-12` and `o1-mini-2024-09-12` (#635)\n\nThis PR adds new models `o1-preview-2024-09-12` and `o1-mini-2024-09-12`\r\nto the leaderboard.","shortMessageHtmlLink":"[BFCL] Add New Model o1-preview-2024-09-12 and o1-mini-2024-09-12 ("}},{"before":"bc421d209092651b54b59a127d9c2d8a8e569497","after":"2bc9c4a47b2687b83c35c9e8e9171609eb5b6235","ref":"refs/heads/gh-pages","pushedAt":"2024-09-15T06:49:06.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"[BFCL] Leaderboard Update, in sync with #600, #608, #616, #623, #626, #627, #635, and #638. (#639)\n\nThis PR updates the leaderboard to reflect the change in score due to\r\nthe following PR merge:\r\n\r\n1. #608\r\n2. #600\r\n3. #616 \r\n4. #623\r\n5. #626\r\n6. #627\r\n7. #635 \r\n8. #638\r\n\r\n---------\r\n\r\nCo-authored-by: Charlie Cheng-Jie Ji ","shortMessageHtmlLink":"[BFCL] Leaderboard Update, in sync with #600, #608, #616, #623, #626, #…"}},{"before":"12f641c44e9835bbb4458a77f14d8fc0171a202b","after":"46c3e855c51f4d0f222f08d41818e72b619cadea","ref":"refs/heads/main","pushedAt":"2024-09-15T06:48:24.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"fix: bug for glm prompt format (#638)\n\nfixed a bug in the GLM prompt formatting. #637\r\n\r\n**The previous formatted prompt:**\r\n`[gMASK]<|user|>\\nI'm playing a dice game and want to calculate my\r\nchances. I roll the die 20 times, and I'm trying to figure out the\r\nprobability of landing on a 6 exactly five times, considering each roll\r\nhas a one in six chance of being a 6. Could you help me with\r\nthat?<|assistant|>`\r\nthe function calling result:\r\n\r\n![image](https://github.com/user-attachments/assets/c95d4d19-8033-4ad1-84b3-9ee35ce61d61)\r\n\r\n**The fixed formatted prompt:**\r\nSingle role: user\r\n```\r\n[gMASK]<|system|>\r\n你是一个名为 ChatGLM 的人工智能助手。你是基于智谱AI训练的语言模型 GLM-4 模型开发的,你的任务是针对用户的问题和要求提供适当的答复和支持\r\n\r\n# 可用工具\r\n\r\n## calculate_ROI\r\n\r\n{\r\n \"name\": \"calculate_ROI\",\r\n \"description\": \"Calculate the Return on Investment (ROI) for a given investment amount and net profit. Note that the provided function is in Python 3 syntax.\",\r\n \"parameters\": {\r\n \"type\": \"object\",\r\n \"properties\": {\r\n \"investment_amount\": {\r\n \"type\": \"number\",\r\n \"description\": \"The initial amount of money invested. This is a float type value.\",\r\n \"format\": \"float\"\r\n },\r\n \"net_profit\": {\r\n \"type\": \"number\",\r\n \"description\": \"The profit made from the investment. This is a float type value.\",\r\n \"format\": \"float\"\r\n },\r\n \"duration_years\": {\r\n \"type\": \"integer\",\r\n \"description\": \"The duration of the investment in years.\",\r\n \"default\": 1\r\n }\r\n },\r\n \"required\": [\r\n \"investment_amount\",\r\n \"net_profit\"\r\n ]\r\n }\r\n}\r\n在调用上述函数时,请使用 Json 格式表示调用的参数。<|user|>\r\nCalculate the profit margin of a company with revenue of $200,000 and expenses of $150,000.<|assistant|>\r\n```\r\nMultiple roles, system and user\r\n```\r\n[gMASK]<|system|>\r\n你是一个名为 ChatGLM 的人工智能助手。你是基于智谱AI训练的语言模型 GLM-4 模型开发的,你的任务是针对用户的问题和要求提供适当的答复和支持\r\n\r\n# 可用工具\r\n\r\n## find_flights\r\n\r\n{\r\n \"name\": \"find_flights\",\r\n \"description\": \"Searches for available flights between an origin and a destination on a specified date for a given number of passengers. Note that the provided function is in Python 3 syntax.\",\r\n \"parameters\": {\r\n \"type\": \"object\",\r\n \"required\": [\r\n \"origin\",\r\n \"destination\",\r\n \"date\",\r\n \"passengers\"\r\n ],\r\n \"properties\": {\r\n \"origin\": {\r\n \"type\": \"string\",\r\n \"description\": \"The three-letter IATA code of the origin airport, such as 'SFO' for San Francisco International Airport.\"\r\n },\r\n \"destination\": {\r\n \"type\": \"string\",\r\n \"description\": \"The three-letter IATA code of the destination airport, similar to the origin format.\"\r\n },\r\n \"date\": {\r\n \"type\": \"string\",\r\n \"description\": \"The departure date of the flight in the format YYYY-MM-DD, such as '2023-04-15'.\"\r\n },\r\n \"passengers\": {\r\n \"type\": \"integer\",\r\n \"description\": \"The total number of passengers traveling.\"\r\n }\r\n }\r\n }\r\n}\r\n在调用上述函数时,请使用 Json 格式表示调用的参数。\r\n\r\n## book_flight\r\n\r\n{\r\n \"name\": \"book_flight\",\r\n \"description\": \"Registers a specified flight for the given list of passengers on a particular date. Note that the provided function is in Python 3 syntax.\",\r\n \"parameters\": {\r\n \"type\": \"object\",\r\n \"required\": [\r\n \"flight\",\r\n \"passengers\",\r\n \"date\"\r\n ],\r\n \"properties\": {\r\n \"flight\": {\r\n \"type\": \"string\",\r\n \"description\": \"The unique identifier of the flight to be booked.\"\r\n },\r\n \"passengers\": {\r\n \"type\": \"array\",\r\n \"items\": {\r\n \"type\": \"string\"\r\n },\r\n \"description\": \"A list of names of the passengers traveling.\"\r\n },\r\n \"date\": {\r\n \"type\": \"string\",\r\n \"description\": \"The departure date of the flight in the format YYYY-MM-DD.\"\r\n }\r\n }\r\n }\r\n}\r\n在调用上述函数时,请使用 Json 格式表示调用的参数。<|system|>\r\nYou are very powerful travel agent, you are able to search for flights and book them for your customers.<|user|>\r\nI need to travel from Berlin to New York on 2021-10-10 with 2 passengers. Can you help me find available flights and also assist with booking once we choose the right one?<|assistant|>\r\n```\r\nthe function calling result:\r\n\r\n![image](https://github.com/user-attachments/assets/eadff246-f20f-4d1e-8096-c0a9ed33b790)\r\n\r\n---------\r\n\r\nCo-authored-by: ai_user \r\nCo-authored-by: Huanzhi Mao ","shortMessageHtmlLink":"fix: bug for glm prompt format (#638)"}},{"before":"5726008476e30408c06d97166851d9386f44e6fb","after":"12f641c44e9835bbb4458a77f14d8fc0171a202b","ref":"refs/heads/main","pushedAt":"2024-09-14T08:09:16.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"[BFCL] Hot Fix to Remove Extra Parameters for NoAPIKeyError (#636)\n\nThis PR won't impact the leaderboard score.","shortMessageHtmlLink":"[BFCL] Hot Fix to Remove Extra Parameters for NoAPIKeyError (#636)"}},{"before":"d804675e66b52ef13426b97130b4e7803874fcd1","after":"5726008476e30408c06d97166851d9386f44e6fb","ref":"refs/heads/main","pushedAt":"2024-09-11T22:57:46.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"[BFCL] Refactor Model Handler into OSS and Proprietary Components (#612)\n\nThis PR reorganizes the model handler by splitting it into two distinct\r\ncomponents: an Open Source (OSS) model handler and a Proprietary model\r\nhandler. This change is part of a series of updates that address the\r\ntasks outlined in issue #510.\r\n\r\n---------\r\n\r\nCo-authored-by: Huanzhi Mao ","shortMessageHtmlLink":"[BFCL] Refactor Model Handler into OSS and Proprietary Components (#612)"}},{"before":"cddb4af7ced19b505bbf649b76f459a35fffc7c0","after":"d804675e66b52ef13426b97130b4e7803874fcd1","ref":"refs/heads/main","pushedAt":"2024-09-09T20:11:48.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"HuanzhiMao","name":"Huanzhi (Hans) Mao","path":"/HuanzhiMao","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/93459410?s=80&v=4"},"commit":{"message":"[BFCL] add MadeAgents/Hammer-7b handler (#627)\n\nThis PR add MadeAgents/Hammer-7b. Here's the CSV table converted to\r\nmarkdown format:\r\n\r\n| | Overall Acc | Model | AST Summary | Exec Summary | Simple AST |\r\nMultiple AST | Parallel AST | Parallel Multiple AST | Simple Exec |\r\nMultiple Exec | Parallel Exec | Parallel Multiple Exec | Irrelevance\r\nDetection | Relevance Detection |\r\n\r\n|:-:|:-----------:|:------------------------------:|:-----------:|:------------:|:----------:|:------------:|:------------:|:---------------------:|:-----------:|:-------------:|:-------------:|:----------------------:|:---------------------:|:-------------------:|\r\n| 1 | 83.92% | Hammer-7b (fc) | 78.70% | 89.71% | 69.31% | 82.52% |\r\n78.88% | 84.08% | 91.86% | 94.00% | 88.00% | 85.00% | 72.87% | 92.68% |\r\n\r\n---------\r\n\r\nCo-authored-by: Huanzhi Mao ","shortMessageHtmlLink":"[BFCL] add MadeAgents/Hammer-7b handler (#627)"}},{"before":"8ba9479920a4b9b08daee7f0bfd5734660332f5b","after":"cddb4af7ced19b505bbf649b76f459a35fffc7c0","ref":"refs/heads/main","pushedAt":"2024-09-09T07:44:54.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"[BFCL] Fix Llama Handler (#626)\n\nAccording to the [llama chat template on\r\nhuggingface](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct/blob/1480bb72e06591eb87b0ebe2c8853127f9697bae/tokenizer_config.json#L2053),\r\nthe `_format_prompt` method in `LlamaHandler` is missing two `\\n` after\r\neach `<|end_header_id|>` tag. This PR fixes it.\r\n\r\nThis will affect the leaderboard score for modell\r\n`meta-llama/Meta-Llama-3-8B-Instruct` and\r\n`meta-llama/Meta-Llama-3-70B-Instruct`.","shortMessageHtmlLink":"[BFCL] Fix Llama Handler (#626)"}},{"before":"7056b284711ddd078d19782cccb39b766087c07f","after":"bc421d209092651b54b59a127d9c2d8a8e569497","ref":"refs/heads/gh-pages","pushedAt":"2024-09-09T07:42:46.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Fix data augmentation examples (#624)\n\nFix data augmentation examples in Blog 7: Gorilla OpenFunctions v2.","shortMessageHtmlLink":"Fix data augmentation examples (#624)"}},{"before":"cec2bfce1ca7a1e4bedb4a67dec6d562c2055b7a","after":"8ba9479920a4b9b08daee7f0bfd5734660332f5b","ref":"refs/heads/main","pushedAt":"2024-09-09T07:42:13.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"[BFCL] Fix Decoding Issue in Nvidia Handler (#623)\n\nIn this PR:\r\n\r\n1. Fix decoding issue in the `NvidiaHandler`. An unnecessary whitespace\r\nin the decode method caused many model responses fail to get decoded\r\n(should be `[` instead of `[ `, and `]` instead of ` ]`).\r\n3. Remove duplicate `ArcticHandler` class. It is the exact same as\r\n`NvidiaHandler`, so we just let `snowflake/arctic` use `NvidiaHandler`\r\ninstead.\r\n\r\nThis will affect the leaderboard score for\r\n`nvidia/nemotron-4-340b-instruct` and `snowflake/arctic`. We will update\r\nit shortly.\r\n\r\nCredit to @mattf for pointing these out. \r\n\r\n---------\r\n\r\nCo-authored-by: Matthew Farrellee \r\n\r\n---------\r\n\r\nCo-authored-by: Shishir Patil <30296397+ShishirPatil@users.noreply.github.com>","shortMessageHtmlLink":"[BFCL] Fix Decoding Issue in Nvidia Handler (#623)"}},{"before":"9dec19208c22a80ffcc8452ed1d6a67301b15359","after":"cec2bfce1ca7a1e4bedb4a67dec6d562c2055b7a","ref":"refs/heads/main","pushedAt":"2024-09-09T07:39:44.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Update gemini_handler.py to better handle NL+FC model output (#617)\n\nGemini model is capable of outputting NL/text (for reasoning or\r\nverbosity) for the corresponding FC as different parts. Simply\r\nconcatenation of text with FC will result in decode failures. So we\r\nshould only extract FC parts for decoding and eval when it exists in the\r\ncontent.\r\n\r\nSee below example:\r\n\r\nOUTPUT: [\"Okay, I can help you with that. Let's calculate the future\r\nvalue for each investment option:\\n\\n\\n**Bond:**\\n\\n\",\r\n{\"calculate_future_value\": \"{\\\"present_value\\\": 5000, \\\"periods\\\": 10,\r\n\\\"interest_rate\\\": 0.05}\"}, \"\\n\\n**Mutual Fund:**\\n\\n\",\r\n{\"calculate_future_value\": \"{\\\"periods\\\": 15, \\\"interest_rate\\\": 0.07,\r\n\\\"present_value\\\": 2000}\"}, \"\\n\\n\\n**Stocks:**\\n\\n\",\r\n{\"calculate_future_value\": \"{\\\"periods\\\": 20, \\\"interest_rate\\\": 0.1,\r\n\\\"present_value\\\": 1000}\"}\"]","shortMessageHtmlLink":"Update gemini_handler.py to better handle NL+FC model output (#617)"}},{"before":"def80a440b62fd3fc23f4d09077a5ff9c2098b63","after":"9dec19208c22a80ffcc8452ed1d6a67301b15359","ref":"refs/heads/main","pushedAt":"2024-09-09T07:39:19.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Add Salesforce xLAM model series (#616)\n\nThis PR add `Salesforce/xLAM-7b-r`, `Salesforce/xLAM-8x7b-r`,\r\n`Salesforce/xLAM-8x22b-r` models.\r\n\r\nNote that the `Salesforce/xLAM-8x7b-r` model requires 8x40GB GPUs like\r\nA100 for inference, while the `Salesforce/xLAM-8x22b-r` model requires\r\n8x80GB GPUs to run.\r\n\r\nTested these models with 8 H100 GPUs locally. The reference combined\r\nperformance is as follows:\r\n\r\nHere's the CSV table converted to markdown format:\r\n\r\nHere's the CSV table converted to markdown format:\r\n\r\n| Rank | Overall Acc | Model | AST Summary | Exec Summary | Irrelevance\r\nDetection | Relevance Detection |\r\n\r\n|------|-------------|-------|-------------|--------------|------------------------|---------------------|\r\n| 1 | 87.31% | xLAM-8x22b-r (FC) | 82.76% | 92.39% | 74.96% | 97.56% |\r\n| 2 | 83.38% | xLAM-8x7b-r (FC) | 78.18% | 89.02% | 72.35% | 92.68% |\r\n| 3 | 80.33% | xLAM-7b-r (FC) | 74.00% | 85.43% | 72.88% | 92.68% |\r\n| 4 | 80.18% | xLAM-7b-fc-r (FC) | 72.78% | 87.68% | 79.54% | 80.49% |\r\n| 5 | 75.43% | xLAM-1b-fc-r (FC) | 65.72% | 83.30% | 60.65% | 97.56% |\r\n\r\n\r\n@HuanzhiMao @CharlieJCJ Would you please take a look? Thanks for the\r\ngreat benchmark!\r\n\r\n---------\r\n\r\nCo-authored-by: Huanzhi Mao ","shortMessageHtmlLink":"Add Salesforce xLAM model series (#616)"}},{"before":"d6657ac51a5a4a02eed702a4f3bbc90792778ca1","after":"7056b284711ddd078d19782cccb39b766087c07f","ref":"refs/heads/gh-pages","pushedAt":"2024-09-09T07:38:31.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Optimization for Wagon Wheel Visualization with Search Dropdown (#611)\n\nEnhanced the wagon wheel chart visualization by implementing search,\r\nselection, and deletion features for comparing different models.\r\nImproved usability and saved space by replacing the full models display\r\nwith an interactive search dropdown.\r\n\r\n---------\r\n\r\nCo-authored-by: Huanzhi Mao ","shortMessageHtmlLink":"Optimization for Wagon Wheel Visualization with Search Dropdown (#611)"}},{"before":"eca516a1df30699eeec090dc27b23aa5923b1d26","after":"def80a440b62fd3fc23f4d09077a5ff9c2098b63","ref":"refs/heads/main","pushedAt":"2024-09-09T07:37:53.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"[BFCL] Dataset and Possible Answer Fix (#600)\n\nFix #581.\r\n\r\n- Total number of entries affected: 24\r\n- Simple: 12 entry\r\n- `simple_182, simple_196, simple_203, simple_263, simple_266,\r\nsimple_277, simple_286, simple_287, simple_319, simple_322, simple_389,\r\nsimple_398`\r\n- Multiple: 3 entry\r\n - `multiple_64, multiple_160, multiple_194`\r\n- Parallel: 3 entry\r\n - `parallel_45, parallel_49, parallel_185`\r\n- Parallel Multiple: 6 entry\r\n- `parallel_multiple_10, parallel_multiple_141, parallel_multiple_171,\r\nparallel_multiple_174, parallel_multiple_177, parallel_multiple_183`\r\n\r\nThis will affect the leaderboard score. We will update it in a separate\r\nPR.\r\n\r\n\r\n---------\r\n\r\nCo-authored-by: Jason Huang ","shortMessageHtmlLink":"[BFCL] Dataset and Possible Answer Fix (#600)"}},{"before":"9035ac710edc004badc771a7a5f32e1523c1fd57","after":"d6657ac51a5a4a02eed702a4f3bbc90792778ca1","ref":"refs/heads/gh-pages","pushedAt":"2024-08-29T16:30:11.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"[BFCL] Leaderboard Update, in sync with #593. (#603)\n\nThis PR updates the leaderboard to reflect the change in score due to\r\n#593 .\r\n\r\n---------\r\n\r\nCo-authored-by: Charlie Cheng-Jie Ji ","shortMessageHtmlLink":"[BFCL] Leaderboard Update, in sync with #593. (#603)"}},{"before":"b97910214b1205c056338f38dac72b1d1696ece2","after":"eca516a1df30699eeec090dc27b23aa5923b1d26","ref":"refs/heads/main","pushedAt":"2024-08-29T16:28:35.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Fix some bugs in test case prompts/ground truths (#608)\n\nSummary of the changes:\r\n\r\n| ID | Reason for Change |\r\n|----|-------------------|\r\n| simple_5 | Allowing 'all' roots as a valid answer, as it includes real\r\nroots |\r\n| simple_96 | Clarification needed in the prompt |\r\n| simple_238 | Expanding acceptable answers to include all years of the\r\nCivil War (1861-1865) |\r\n| simple_122 | Mismatch between specification (array of integers) and\r\nground truth (array of array of integers) |\r\n| simple_309 | Adding Tampa Bay Buccaneers to ground truth due to Tom\r\nBrady's 2020 NFL season |\r\n| simple_235 | Including Lisbon, Portugal as an acceptable answer |\r\n| simple_267 | Accepting \"New York\" as per the location parameter in the\r\nprompt |\r\n| simple_308 | Addressing grammar issues in the prompt |\r\n| simple_316 | Adding \"female\" as an acceptable answer for Serena\r\nWilliams |\r\n| simple_375 | Removing LA from ground truth as it's not mentioned in\r\nthe prompt |\r\n| simple_379 | Including \"Australia/Sydney\" as a valid timezone\r\nidentifier |\r\n| multiple_88 | Accepting PC as a platform for the WOW game |\r\n| multiple_119 | Same issue as simple_96, needs clarification in the\r\nprompt |\r\n| multiple_153 | Same issue as simple_235, including Lisbon, Portugal as\r\nan acceptable answer |\r\n\r\n\r\nThis PR does change the leaderboard values and will be updated in a\r\nseparate PR.\r\n\r\n---------\r\n\r\nCo-authored-by: Vinay Bagade \r\nCo-authored-by: Huanzhi Mao ","shortMessageHtmlLink":"Fix some bugs in test case prompts/ground truths (#608)"}},{"before":"fc6bd6084f5e3769de840d99cd1b02afef194720","after":"b97910214b1205c056338f38dac72b1d1696ece2","ref":"refs/heads/main","pushedAt":"2024-08-29T04:05:29.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"HuanzhiMao","name":"Huanzhi (Hans) Mao","path":"/HuanzhiMao","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/93459410?s=80&v=4"},"commit":{"message":"Fix issue #614: [BFCL] ModuleNotFoundError after commit 70d6722 (#615)\n\nBug fix of #614 [](url)\r\n\r\nThis pull request resolves the `ModuleNotFoundError` reported in issue\r\n#614 by updating the import paths affected by the directory\r\nrestructuring in commit 70d6722. The adjustments ensure that the LLM\r\ninference functionality works as expected.\r\n\r\n**Changes:**\r\n- `oss_handler.py` and `gorilla_handler.py` import paths updated to\r\ninclude the new `bfcl` directory prefix.\r\n\r\n**Testing:**\r\nRun the following command to verify that the issue has been resolved:\r\n`python openfunctions_evaluation.py --model gorilla-openfunctions-v2\r\n--test-category multiple --num-threads 1\r\n`","shortMessageHtmlLink":"Fix issue #614: [BFCL] ModuleNotFoundError after commit 70d6722 (#615)"}},{"before":"05f83fda565b4c9af962e2f6a1a3619a308a0ebd","after":null,"ref":"refs/heads/fix/merge-commit-597","pushedAt":"2024-08-27T06:23:50.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"}},{"before":"fa3bf8c1506208b562feff5e7dc899835ece964b","after":"fc6bd6084f5e3769de840d99cd1b02afef194720","ref":"refs/heads/main","pushedAt":"2024-08-27T06:23:48.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Fix/merge commit #605 and #604 (#609)\n\nPR #605 and #604 had conflicting requirements.txt. This should fix it.","shortMessageHtmlLink":"Fix/merge commit #605 and #604 (#609)"}},{"before":"11bdc518d81530142dc34242af9e95c37d4d474c","after":"05f83fda565b4c9af962e2f6a1a3619a308a0ebd","ref":"refs/heads/fix/merge-commit-597","pushedAt":"2024-08-27T06:23:16.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Merge branch 'main' into fix/merge-commit-597","shortMessageHtmlLink":"Merge branch 'main' into fix/merge-commit-597"}},{"before":"41fdee66562dc1c69de1c687fb60a65f27577aba","after":"fa3bf8c1506208b562feff5e7dc899835ece964b","ref":"refs/heads/main","pushedAt":"2024-08-27T06:21:55.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"RAFT Enhancements: Improved robustness, logging, checkpointing, threading, Llama support, Azure auth and eval (#604)\n\nThis pull request introduces a comprehensive set of updates and\r\nimprovements to the RAFT project, enhancing robustness, logging,\r\nprogress monitoring, checkpointing, multi-threading, Llama support,\r\nAzure authentication, and evaluation processes.\r\n\r\n**Note**: Those updates where developed for the most part to prepare the\r\nMS Build 2024 talk [Practicalities of Fine-Tuning Llama 2 with AI\r\nStudio](https://aka.ms/build24-ft-practical) with @ShishirPatil and Bala\r\nVenkataraman.\r\n\r\nKey updates include:\r\n\r\n### RAFT Script Improvements:\r\n\r\nThis PR introduces significant updates to the `raft.py` script,\r\nexpanding its functionality, improving its configurability, and removing\r\ndeprecated options. Below is a summary of the key changes:\r\n\r\n- **Logging Enhancements:** Improved logging configuration, including\r\nmore granular logging for various operations.\r\n- **Checkpointing Overhaul:** Significant refactoring of checkpointing\r\nlogic in `raft.py`, including the introduction of multi-threading,\r\nbetter directory handling, and optimization of chunk processing. The\r\n`--fast` mode, which deactivated checkpointing, was removed in favor of\r\na more efficient implementation that allows checkpointing to remain\r\nactivated at all times.\r\n- **Multi-Worker Support:** Added a `--workers` parameter to enable\r\nparallel processing, improving efficiency and reliability during various\r\noperations.\r\n- **Llama Instruction Support:** Added support for Llama instructions in\r\naddition to GPT instructions, enhancing the versatility of the script\r\nfor different model types.\r\n- **Dataset Processing:** Added more robust handling and filtering of\r\ndatasets, including support for customized field names, empty row\r\nfiltering, and threshold-based early stopping.\r\n- **Authentication Updates:** Added support for Azure OpenAI Keyless and\r\nManaged Identity authentication, along with related environment variable\r\nhandling.\r\n- **Content Safety Handling:** Updated the content generation process to\r\nskip chunks that fail content safety compliance checks, allowing the\r\nprocess to continue without interruption.\r\n- **Progress Logging Enhancements:** Improved progress logging with\r\n`tqdm`, including enhanced stats support in `client_utils.py`, providing\r\nbetter insights into the process flow.\r\n- **Bug Fixes and Cleanup:** Fixed various bugs across the project,\r\ncleaned up help messages, and removed outdated or redundant components.\r\n\r\n#### New Features and Options\r\n\r\n1. **Output Format Expansion:**\r\n- Added a new output format option: `eval`. This format is intended for\r\nevaluation purposes, providing an additional way to format datasets.\r\n\r\n2. **Enhanced Output Configuration:**\r\n- Introduced `--output-completion-prompt-column` and\r\n`--output-completion-completion-column` options to allow users to\r\nspecify custom column names for prompts and completions when using the\r\n`completion` format.\r\n\r\n3. **System Prompt Customization:**\r\n- Added the `--system-prompt-key` option to allow users to select\r\nbetween different system prompt keys (`gpt` or `llama`) based on the\r\nmodel they intend to use for dataset generation.\r\n\r\n4. **Worker Thread Management:**\r\n- Introduced the `--workers` option to allow parallel processing by\r\nspecifying the number of worker threads, improving the script’s\r\nefficiency in handling large datasets.\r\n\r\n5. **Checkpoint Management:**\r\n- Added the `--auto-clean-checkpoints` option, giving users the ability\r\nto automatically clean up checkpoints after dataset generation, reducing\r\nthe need for manual intervention.\r\n\r\n6. **Question/Answer Sample Threshold:**\r\n- Introduced the `--qa-threshold` option, which allows users to specify\r\na threshold for the number of Question/Answer samples to generate before\r\nstopping. This provides more control over the dataset generation\r\nprocess, particularly in large-scale operations.\r\n\r\n#### Removed Options\r\n\r\n1. **`--fast`:**\r\n- The `--fast` option has been removed. This option was previously used\r\nto run the script in a fast mode with no recovery implemented. The\r\nscript has been optimized to improve performance without the need for a\r\nseparate fast mode, rendering this option obsolete.\r\n\r\n#### Default Value Updates\r\n\r\n- Several options now have default values set, including\r\n`--output-type`, `--output-format`, `--doctype`, `--embedding_model`,\r\n`--completion_model`, `--workers`, and more. These defaults aim to make\r\nthe script more user-friendly by reducing the need for extensive\r\nconfiguration.\r\n\r\n---\r\n\r\n### Evaluation Script Improvements:\r\n\r\n- **Stop Keyword:** Added a stop keyword functionality to allow\r\ncontrolled early termination of evaluation processes when specific\r\nconditions are met.\r\n- **Retry Mechanism:** Introduced a retry mechanism for failed tasks,\r\nimproving reliability during evaluations.\r\n- **Improved Robustness:** Enhanced the script’s robustness,\r\nparticularly in handling errors and edge cases, ensuring a smoother\r\nevaluation process.\r\n- **Logging Retry Statistics:** Implemented logging for retry attempts,\r\nproviding detailed insights and transparency into the evaluation\r\nprocess.\r\n- **Main Thread Exception Handling:** Fixed an issue where exceptions in\r\nthe main thread could cause silent failures, ensuring that all errors\r\nare properly reported and handled.\r\n- **Support for Chat and Completion Models:** Extended the script to\r\nsupport both chat and completion models, increasing its versatility\r\nacross different use cases.\r\n- **Environment Prefix Handling:** Enabled the script to accept an\r\nenvironment prefix as a parameter, enhancing its adaptability to\r\ndifferent deployment environments.\r\n- **Progress Monitoring:** Integrated progress monitoring with `tqdm`,\r\nallowing for real-time tracking of the evaluation process.\r\n- **Configurable Workers:** Made the number of workers configurable\r\nusing the `--workers` option, allowing for fine-tuned parallel\r\nprocessing during evaluations.\r\n\r\nHere's the PR message formatted in Markdown:\r\n\r\n#### Enhanced CLI Options for `eval.py`\r\n\r\nThis PR introduces several new command-line options to the `eval.py`\r\nscript, providing enhanced functionality and flexibility for model\r\nevaluation. The following changes have been made:\r\n\r\n- **`--model MODEL`**: Added support for specifying the model to be\r\nevaluated.\r\n- **`--mode MODE`**: Introduced a new option to select the API mode,\r\neither 'chat' or 'completion'. The default mode is set to 'chat'.\r\n- **`--input-prompt-key INPUT_PROMPT_KEY`**: Added the ability to define\r\nwhich column in the dataset should be used as the input prompt.\r\n- **`--output-answer-key OUTPUT_ANSWER_KEY`**: Added the ability to\r\ndefine which column in the dataset should be used as the output answer.\r\n- **`--workers WORKERS`**: Introduced multi-threading support, allowing\r\nusers to specify the number of worker threads for evaluating the\r\ndataset, improving processing efficiency.\r\n- **`--env-prefix ENV_PREFIX`**: Added an option to customize the prefix\r\nfor environment variables used for API keys and base URLs. The default\r\nprefix is set to `EVAL`.\r\n\r\nThese enhancements provide greater control over the evaluation process,\r\nallowing for more customized and efficient use of the `eval.py` script.\r\n\r\n## Testing\r\n\r\n```\r\npytest\r\n```","shortMessageHtmlLink":"RAFT Enhancements: Improved robustness, logging, checkpointing, threa…"}},{"before":"b2b41f8aaaa7bb3e246438b85643b747c1626b22","after":"11bdc518d81530142dc34242af9e95c37d4d474c","ref":"refs/heads/fix/merge-commit-597","pushedAt":"2024-08-27T06:20:55.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Add requirements from PR #597","shortMessageHtmlLink":"Add requirements from PR #597"}},{"before":null,"after":"b2b41f8aaaa7bb3e246438b85643b747c1626b22","ref":"refs/heads/fix/merge-commit-597","pushedAt":"2024-08-27T06:20:54.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Merge branch 'main' into upstream-merge-prep","shortMessageHtmlLink":"Merge branch 'main' into upstream-merge-prep"}},{"before":"70d6722e2e78621802b2810a2ec2213d52e8d4bc","after":"41fdee66562dc1c69de1c687fb60a65f27577aba","ref":"refs/heads/main","pushedAt":"2024-08-27T06:17:48.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Added python script named as raft_local.py to raft directory to run script completely locally using HF models (#605)\n\nThere are many users who want to use this `raft.py` script locally, but\r\ndue to dependency of **OpenAI** it ask for some extra bucks to get it\r\ndone, so for those like me who have good GPU available can use it\r\nlocally using `hugging-face` models, I also modified the `README.md` for\r\nshowing how to use the new script named as `raft_local.py` in same\r\ndirectory.\r\nModified the `requirements.txt` in raft directory for `torch` related\r\ndependencies.\r\n\r\n---------\r\n\r\nCo-authored-by: Shukla ","shortMessageHtmlLink":"Added python script named as raft_local.py to raft directory to run s…"}},{"before":"83ab06d33f80b37c26a7e2ea462fc5a772bd618e","after":"9035ac710edc004badc771a7a5f32e1523c1fd57","ref":"refs/heads/gh-pages","pushedAt":"2024-08-27T06:15:03.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Sticky effects with top row and first three columns on the left while scrolling. (#590)\n\nHi. I restructured the code to make sure the top row and left three\r\ncolumns be sticky while scrolling in x, y direction.","shortMessageHtmlLink":"Sticky effects with top row and first three columns on the left while…"}},{"before":"3850c2b6227a318101f2d11f0c4442398e39ea29","after":"70d6722e2e78621802b2810a2ec2213d52e8d4bc","ref":"refs/heads/main","pushedAt":"2024-08-27T06:13:40.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"[BFCL] Package the Codebase (#565)\n\nThis PR aims to improve the organization and distribution of the\r\ncodebase by packaging the BFCL codebase. This PR is part of a series of\r\nchanges that break down the tasks outlined in #510.\r\n\r\n---------\r\n\r\nCo-authored-by: Huanzhi Mao ","shortMessageHtmlLink":"[BFCL] Package the Codebase (#565)"}},{"before":"984883dae74ff214618f10d7307c88633bc715e7","after":"3850c2b6227a318101f2d11f0c4442398e39ea29","ref":"refs/heads/main","pushedAt":"2024-08-25T04:08:10.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"[BFCL] Relocate Formatting Instructions and Function Documentation to System Prompt (#593)\n\nPreviously, formatting instructions and function documentation were\r\nincluded in the user prompt when interacting with models in prompting\r\nmode. However, these details are better suited for the system prompt,\r\nwhere they can more effectively guide the model's behaviour. This PR\r\nupdates the model prompting process by moving the formatting\r\ninstructions and function documentation to the system prompt, ensuring\r\nthey are appropriately positioned for optimal model performance.\r\n\r\nThis **will affect** the leaderboard score.\r\n\r\n----\r\n\r\nAlso in this PR:\r\n\r\n1. Update the model handlers to record the processed prompt/message and\r\ntools (if FC mode) in the result file when inference. This helps to\r\nidentify if there are any issues with the pre-processing phase.\r\n2. Fix 6 dataset issues: `irrelevance_49, live_irrelevance_157-18-1,\r\nlive_simple_79-40-0, live_parallel_0-0-0, live_parallel_4-1-0,\r\nlive_parallel_5-2-0`","shortMessageHtmlLink":"[BFCL] Relocate Formatting Instructions and Function Documentation to…"}},{"before":"545f312d9a600cce1e0d95b9f3423ad8ca145a1c","after":null,"ref":"refs/heads/bfcl-issue-template","pushedAt":"2024-08-23T21:41:52.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"HuanzhiMao","name":"Huanzhi (Hans) Mao","path":"/HuanzhiMao","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/93459410?s=80&v=4"}},{"before":"30124c4399a82356d00b2114111dea889d99151a","after":"984883dae74ff214618f10d7307c88633bc715e7","ref":"refs/heads/main","pushedAt":"2024-08-23T21:41:51.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"HuanzhiMao","name":"Huanzhi (Hans) Mao","path":"/HuanzhiMao","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/93459410?s=80&v=4"},"commit":{"message":"Create an issue template for BFCL (#599)\n\nCreates a new issue template. \r\n\r\nDOES not change BFCL leaderboard values.","shortMessageHtmlLink":"Create an issue template for BFCL (#599)"}},{"before":null,"after":"545f312d9a600cce1e0d95b9f3423ad8ca145a1c","ref":"refs/heads/bfcl-issue-template","pushedAt":"2024-08-23T21:34:26.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Create an issue template for BFCL","shortMessageHtmlLink":"Create an issue template for BFCL"}},{"before":"0a40be4aac39217da3b4f3c1bdb98e24d633f70d","after":"83ab06d33f80b37c26a7e2ea462fc5a772bd618e","ref":"refs/heads/gh-pages","pushedAt":"2024-08-19T19:02:37.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"HuanzhiMao","name":"Huanzhi (Hans) Mao","path":"/HuanzhiMao","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/93459410?s=80&v=4"},"commit":{"message":"[BFCL Blog] Hot Fix Plotly Key Name for Simple Function (#589)\n\nFixed simple function plotly key name from `Python Simple Function AST`\r\nto `Simple Function AST`","shortMessageHtmlLink":"[BFCL Blog] Hot Fix Plotly Key Name for Simple Function (#589)"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOS0xNVQwNzozMjoyOS4wMDAwMDBazwAAAAS2k23a","startCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOS0xNVQwNzozMjoyOS4wMDAwMDBazwAAAAS2k23a","endCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOC0xOVQxOTowMjozNy4wMDAwMDBazwAAAASeW85J"}},"title":"Activity · ShishirPatil/gorilla"}