BFCL April 9th Release (ShishirPatil#338)
This PR is for the BFCL April 9th release:

1. Bug fix in the evaluation dataset. This involves modifying both
prompts and function docs.
2. Bug fix for possible answers.

The detailed breakdown is attached below. If you spot any issue with our
evaluation dataset and/or possible answers, please feel free to raise an
issue!

| Test Category     | Prompt/Func Doc Correction Count | Possible Answer Correction Count |
|-------------------|----------------------------------|----------------------------------|
| Simple            | 3                                | 16                               |
| Parallel          | 1                                | 16                               |
| Multiple          | 1                                | 11                               |
| Parallel Multiple | 10                               | 43                               |

This PR **DOES** change the leaderboard score. We will update the
leaderboard website shortly in PR ShishirPatil#341.

---------

Co-authored-by: Charlie Cheng-Jie Ji <charliechengjieji@berkeley.edu>
Co-authored-by: Fanjia Yan <fanjiayan@berkeley.edu>

HuanzhiMao and CharlieJCJ authored Apr 11, 2024
1 parent 19baa9a commit 98eefa2
Showing 6 changed files with 105 additions and 103 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -217,6 +217,8 @@ For inferencing `Databrick-DBRX-instruct`, you need to create a Databrick Azure


## Changelog

* [April 9, 2024] [#338](https://github.com/ShishirPatil/gorilla/pull/338): Bug fix in the evaluation datasets (including both prompts and function docs). Bug fix for possible answers as well.
* [April 8, 2024] [#330](https://github.com/ShishirPatil/gorilla/pull/330): Fixed an oversight that was introduced in [#299](https://github.com/ShishirPatil/gorilla/pull/299). For function-calling (FC) models that cannot take `float` type in input, when the parameter type is a `float`, the evaluation procedure will convert that type to `number` in the model input and mention in the parameter description that `This is a float type value.`. An additional field `format: float` will also be included in the model input to make it clear about the type. Updated the model handler for Claude, Mistral, and OSS to better parse the model output.
* [April 3, 2024] [#309](https://github.com/ShishirPatil/gorilla/pull/309): Bug fix for evaluation dataset possible answers. Implement **string standardization** for the AST evaluation pipeline, i.e. removing white spaces and a subset of punctuations (`,./-_*^`) to make the AST evaluation more robust and accurate. Fixed AST evaluation issue for type `tuple`. Add 2 new models `meetkai/functionary-small-v2.4 (FC)`, `meetkai/functionary-medium-v2.4 (FC)` to the leaderboard.
* [April 1, 2024] [#299](https://github.com/ShishirPatil/gorilla/pull/299): Leaderboard update with new models (`Claude-3-Haiku`, `Databrick-DBRX-Instruct`), more advanced AST evaluation procedure, and updated evaluation datasets. Cost and latency statistics during evaluation are also measured. We also released the manual that our evaluation procedure is based on, available [here](https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html#metrics).
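
The string standardization described in the April 3 entry can be illustrated with a minimal Python sketch. The function name `standardize_string` is hypothetical and this is not the repository's implementation. It only strips whitespace and the punctuation characters listed in that entry; any further normalization the real pipeline may perform (such as case handling) is not described in the changelog text and is left out here.

```python
# Punctuation characters named in the April 3 changelog entry.
PUNCTUATION_TO_STRIP = ",./-_*^"

# Translation table that deletes every character in PUNCTUATION_TO_STRIP.
_DELETE_TABLE = str.maketrans("", "", PUNCTUATION_TO_STRIP)


def standardize_string(value: str) -> str:
    """Remove all whitespace and the listed punctuation from `value`.

    Hypothetical helper for illustration only.
    """
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces drops spaces, tabs, and newlines.
    without_whitespace = "".join(value.split())
    return without_whitespace.translate(_DELETE_TABLE)
```

Under this sketch, `Jan.1,2022` and `Jan. 1, 2022` compare equal after standardization, while `2022-01-01` and `2022-1-1` still differ, which is consistent with date variants being listed separately in the possible answers.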
@@ -14,8 +14,8 @@
{"corporate_finance.revenue_forecast":{"company":["XYZ"],"product":["A", "Product A"],"sales_units_increase_percentage":[10]}}
{"finance.property_depreciation":{"initial_cost":[200000],"depreciation_rate":[3],"years":[5],"monthly":[false,true,""]}}
{"solarFarm.potential":{"coordinates":[[43.653225,-79.383186]],"panelArea":[80000],"month":["December","Dec"]}}
-{"population_genetics.calculate_ne":{"species":["tiger"],"generations":[100],"probability":[0.95]}}
-{"currency_conversion.get_rate":{"from_currency":["EUR"],"to_currency":["Dollar","USD"],"date":["2022-01-01","01/01/2022","1/1/2022","Jan.1,2022","January 1, 2022"]}}
+{"population_genetics.calculate_ne":{"species":["wild tiger", "tiger"],"generations":[100],"probability":[0.95]}}
+{"currency_conversion.get_rate":{"from_currency":["EUR", "Euro"],"to_currency":["Dollar","USD"],"date":["2022-01-01","01/01/2022","1/1/2022","Jan.1,2022","January 1, 2022","2022-1-1"]}}
{"european_history.battle_details":{"battle":["Battle of Stalingrad", "Stalingrad"]}}
{"religion_history.get_schisms":{"religion":["Christianity"],"count":[3]}}
{"sculpture_price.calculate":{"material":["marble"],"size":[3],"complexity":["medium",""]}}
@@ -66,18 +66,18 @@
{"geodistance.find":{"origin":["New York City","NYC"],"destination":["Los Angeles","LA"],"unit":["miles",""]}}
{"traffic_estimate":{"start_location":["Las Vegas"],"end_location":["Los Angeles"],"time_period":["weekend"]}}
{"translate":{"text":["Hello, how are you?"],"source_language":["English"],"target_language":["French"]}}
-{"library.search_books":{"location":["New York","New York, NY","New York City","NYC"],"genre":["Historical Fiction","historical fiction"],"title":[""]}}
+{"library.search_books":{"location":["New York","New York, NY","New York City","New York City, NY","NYC", "New York public library"],"genre":["Historical Fiction","historical fiction"],"title":[""]}}
{"five_factor_model.analyse":{"talkative":[true],"nervous":[true],"artistic_interests":[false],"lazy":[true],"forgiving":[true]}}
{"european_history.get_monarchs":{"country":["France"],"century":[18]}}
{"get_population":{"year":[1954],"category":["veterans"]}}
{"us_history.population_by_state_year":{"state":["California","CA"],"year":[1970]}}
{"religion.get_origin":{"religion":["Buddhism"]}}
-{"art_auction.fetch_artwork_price":{"artwork_name":["Starry Night"],"artist":["Van Gogh"],"platform":["auction_platform"]}}
+{"art_auction.fetch_artwork_price":{"artwork_name":["Starry Night"],"artist":["Van Gogh"],"platform":["all", ""]}}
{"paint_color.trends":{"room":["living room","Living room"],"period":["","Daily"]}}
{"sculpture.create_custom":{"item":["horse","Horse"],"material":["Bronze","bronze"],"size":["",12]}}
{"artwork_search.find":{"type":["sculpture"],"location":["New York","New York, NY","New York City","NYC"],"era":["contemporary",""]}}
{"museum_info":{"museum":["Natural History Museum"],"city":["London"],"features":[["timings","exhibitions","accessibility"],["exhibitions","timings","accessibility"],["exhibitions","accessibility","timings"],["accessibility","timings","exhibitions"],["accessibility","exhibitions","timings"],["timings","accessibility","exhibitions"]]}}
-{"exhibition_info":{"museum_name":["Museum of Modern Art","MOMA"],"month":["",1]}}
+{"exhibition_info":{"museum_name":["Museum of Modern Art","MOMA", "Museum of Modern Art, New York"],"month":["",1]}}
{"music_shop.find_nearby":{"location":["Nashville, TN","Nashville"],"services":[["Violin Lessons"]],"instruments":[["Guitars"]]}}
{"concert.book_ticket":{"artist":["Eminem"],"location":["New York City","NYC"],"add_ons":[["Backstage Pass"]]}}
{"music.generate":{"key":["C Major"],"tempo":[120],"time_signature":["","4/4"]}}
@@ -89,7 +89,7 @@
{"video_games.get_player_count":{"game_title":["World of Warcraft"],"year":[2020],"platform":[""]}}
{"recipe_search":{"ingredients":[["chicken","mushrooms"],["mushrooms","chicken"]],"calories":[500],"meal":["lunch",""]}}
{"restaurant.find_group":{"location":["Seattle","Seattle, WA"],"cuisine":[["Seafood"]],"group_size":[5]}}
-{"recipe.find":{"mainIngredient":["apple pie"],"ingredientLimit":[4]}}
+{"recipe.find":{"mainIngredient":["apple pie", "apple"],"ingredientLimit":[4]}}
{"walmart.vegan_products":{"location":["Denver, CO","Denver"],"categories":[["vegan","gluten-free"],["gluten-free","vegan"]]}}
{"hotel.book":{"location":["New York","New York, NY","NYC"],"roomType":["deluxe","Deluxe"],"nights":[2],"additional_services":[["breakfast"]]}}
{"hotel_room_pricing.get":{"hotelName":["Hilton New York"],"roomType":["suite with queen size bed"],"nights":[3]}}
@@ -112,7 +112,7 @@
{"calculate_genotype_frequency":{"allele_frequency":[0.3],"genotype":["AA"]}}
{"forest_growth_forecast":{"location":["Yellowstone National Park"],"years":[5],"include_human_impact":[true]}}
{"calculate_fitness":{"trait_values":[[0.8,0.7]],"trait_contributions":[[0.4,0.6]]}}
-{"prediction.evolution":{"species":["Homo Sapiens","Homo sapiens"],"years":[50],"model":["Darwin"]}}
+{"prediction.evolution":{"species":["Homo Sapiens","Homo sapiens"],"years":[50],"model":["Darwin", ""]}}
{"find_restaurants":{"location":["Manhattan"],"food_type":["Thai"],"number":[5],"dietary_requirements":[["vegan"]]}}
{"calculate_bmi":{"weight":[85],"height":[180],"unit":["","metric"]}}
{"calculate_BMI":{"weight_kg":[70],"height_m":[1.75]}}
@@ -143,11 +143,11 @@
{"weather.humidity_forecast":{"location":["Miami","Miami, Florida","FL"],"days":[7],"min_humidity":["",0]}}
{"calculate_slope_gradient":{"point1":[[40.7128,-74.006]],"point2":[[34.0522,-118.2437]],"unit":["degree",""]}}
{"air_quality":{"location":["London"],"date":["2022-08-16","16/08/2022","Aug.16,2022","2022/08/16","16\\08\\2022"]}}
-{"calculate_emissions":{"distance":[12000],"fuel_type":["gas","gasoline"],"fuel_efficiency":[25],"efficiency_reduction":["",0.0]}}
+{"calculate_emissions":{"distance":[12000],"fuel_type":["gas","gasoline"],"fuel_efficiency":[20],"efficiency_reduction":["",0.0]}}
{"restaurant.find_nearby":{"location":["Seattle","Seattle, WA"],"cuisine":["Chinese"],"max_distance":[10]}}
{"map_service.get_directions":{"start":["New York","New York, NY","NYC"],"end":["Los Angeles","LA"],"avoid":[["highways","tolls"],["tolls","highways"]]}}
{"get_stock_info":{"company_name":["Apple Inc.","Apple"],"detail_level":["detailed"],"market":["NASDAQ",""]}}
-{"sentiment_analysis":{"text":["I love the food here! It is always fresh and delicious."],"language":["english","English"]}}
+{"sentiment_analysis":{"text":["I love the food here! It's always fresh and delicious."],"language":["english","English"]}}
{"calculate_neuronal_activity":{"input_synaptic_rate":[200],"weight":[0.5],"decay_rate":[0.1]}}
{"social_media_analytics.most_followed":{"topic":["psychology", "Psychology"],"sub_topics":[["behaviour","group dynamics"],["group dynamics","behaviour"]],"region":["","global"]}}
{"history.get_key_events":{"country":["Germany"],"start_year":[1871],"end_year":[1945],"event_type":[["War"]]}}
@@ -156,10 +156,10 @@
{"get_discoverer":{"discovery":["neutron"],"detail":[true]}}
{"historical_contrib.get_contrib":{"scientist":["Albert Einstein"],"date":["1915-03-17","03/17/1915","Mar.17,1915"],"category":["","all"]}}
{"get_earliest_reference":{"name":["Jesus Christ"],"source":["historical records"]}}
-{"religious_history.get_papal_biography":{"papal_name":["Innocent III"],"include_contributions":[true]}}
+{"religious_history.get_papal_biography":{"papal_name":["Innocent III","Pope Innocent III"],"include_contributions":[true]}}
{"calculate_paint_needed":{"coverage_rate":[400],"length":[30],"height":[12]}}
{"get_sculpture_info":{"artist_name":["James Plensa"],"detail":[true],"year":[2000,""]}}
-{"find_exhibition":{"location":["New York","New York, NY","New York City","NYC","NY"],"art_form":["sculpture", "modern sculpture"],"month":["upcoming","next month","upcoming month","next"],"user_ratings":["high",""]}}
+{"find_exhibition":{"location":["New York","New York, NY","New York City","NYC","NY"],"art_form":["sculpture", "modern sculpture"],"month":["upcoming","next month","upcoming month","next",""],"user_ratings":["high",""]}}
{"analyze_structure":{"building_id":["B1004"],"floors":[[2,3,4]],"mode":["dynamic"]}}
{"metropolitan_museum.get_top_artworks":{"number":[5],"sort_by":["popularity"]}}
{"instrument_price.get":{"brand":["Fender"],"model":["American Professional II Stratocaster"],"finish":["Rosewood"]}}
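
Each line in the diff above maps one function name to its parameters, and each parameter to a list of accepted values. A rough sketch of how a model's call might be checked against such an entry follows. The helper `matches_possible_answer` is hypothetical and is not the repository's checker. Treating an empty string `""` in a list as "this parameter may be omitted" is an assumption about the dataset convention, not something stated in this commit.

```python
import json


def matches_possible_answer(call: dict, possible_answer: dict) -> bool:
    """Check a call of the form {"func": {param: value}} against one entry.

    Hypothetical helper for illustration only.
    """
    # Each possible-answer entry holds exactly one function name.
    func_name, accepted = next(iter(possible_answer.items()))
    if list(call.keys()) != [func_name]:
        return False

    args = call[func_name]
    # Reject parameters the entry does not know about.
    if any(param not in accepted for param in args):
        return False

    for param, choices in accepted.items():
        if param not in args:
            # ASSUMPTION: "" among the choices means the parameter is optional.
            if "" not in choices:
                return False
            continue
        if args[param] not in choices:
            return False
    return True


# Updated entry taken from the diff above.
entry = json.loads(
    '{"prediction.evolution":{"species":["Homo Sapiens","Homo sapiens"],'
    '"years":[50],"model":["Darwin", ""]}}'
)

omits_model = {"prediction.evolution": {"species": "Homo sapiens", "years": 50}}
wrong_model = {"prediction.evolution": {"species": "Homo sapiens", "years": 50, "model": "Lamarck"}}

print(matches_possible_answer(omits_model, entry))  # True
print(matches_possible_answer(wrong_model, entry))  # False
```

Under that assumption, adding `""` to the `model` list is what lets a call that leaves `model` out pass, whereas the pre-change entry would have required `"Darwin"` explicitly.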