There were a lot of events recorded by gharchive.org, of which 2,505,949 were push events containing 4,025,367 commit messages amounting to 314,298,771 characters, filtered with words.py@e23d022007... down to these 62 messages:
i hate my life
i am not a computer scientist, i am a failure at life for whom nothing worked out and nothing ever went right
Adds a Colony Synthetic variant, with bug fixes (#4760)
- should fix fax machine problem(thx forest)
- gives trucker synth the frontier jumpsuit(Thwomplert)
- adds Freelancer Synthetic. This Synth is one that was bought off a civi market and reprogrammed, or stolen and reprogrammed, or hacked - you get the point - it's going with a band of freelancers. The idea behind it is that this synth's team is dead and they are just programmed as a merc for pay, hoping to someday find their boss and give the money as set up. I thought about this one for a long time and decided to put him in the civilian category, where it's hard to roll and which also gives you freedom to choose your allegiance. In this case I hope that a freelancer synthetic will open up a unique avenue of RP and allegiance. I've only explored it once ingame, but it was very good for RP! Hopefully people can recreate this success.
It was hard to make this guy look cool and I also wasn't sure what his loadout would be. I ended up giving him random generic stuff while looking like a beat-up freelancer (missing the armor especially hurt his look, since that's the largest piece of a freelancer - the cuirass - but I don't want to give armor for balance reasons) and no beret because it's for SLs only.
as usual, if a synth wants to change RP avenues and don different clothes for different RP, no one would know the difference
- bug bad
- a beat up UA laborer that so happens to be synthetic. you wouldn't expect it because there's so many similar looking people! exactly the job of a synth - to blend in.
- Freelancer colony synth hopefully will open up a unique avenue of RP. If they don't want to they can always ditch it - but its on a relatively rare and uncommon roll anyways.
[Screenshots & Videos](https://cdn.discordapp.com/attachments/490668342357786645/1166307813719556187/image.png?ex=654a03cb&is=65378ecb&hm=7108218bbaab61c78c0bedcecbfdcc07bdf9db87a3fefe9fb94b28d3430cc815&)
🆑 add: adds another Colony Synthetic variant, changes up some existing ones(trucker,CMB) fix: fixes a small problem with WY-Colony Synthetic access(thx forest), adds WY subtype of Synthetics for admin building/faxes fix: fixes problems with organic spawning ferret industries Trucker Synthetic /🆑
Reworks The Visuals Of Independent And Nanotrasen Captains (#2453)
Does what it says in the title. This is a demented PR that touches a lot of things, but its main benefit is that now regular independent captains, cowboy independent captains, and nanotrasen captains have a unique identity.
Of those changed, it includes:
- The Nanotrasen Captain (parade)
- The Nanotrasen Captain (regular)
- The Independent Captain (regular/parade)
- The Independent Captain (western)
The PR also axes a bunch of unused, or frankly quite basic lieutenant outfits that were nothing more than set dressing with not much substance behind them. The roles were not removed for now, and they have appropriate outfits as a placeholder pending a full removal.
This also means that the Head of Personnel was slightly touched up, mostly by having a coat and hat similar to the western captain's when appropriate. The role itself is pending a full visual rework for later that is beyond the scope of this PR.
Speaking of removals, this also means that captain outfits/roles that were there as a legacy of removed ships, were finally axed for good. Goodbye deserter captain for Riggs variant number 4, you will not be missed.
This PR also touches several (a lot) of maps, mostly adding/removing outfits that were either missing, or didn't fit with the dress code of the vessel.
Also the PR fixes an oversight by @MarkSuckerberg by making the BYOND version warning an actual warning, instead of an error when compiling. Etto bleh.
Visual cohesion is important, and dear fucking god if I see one more independent western captain not wearing the duster because it wasn't in the ship, I will weep, and my weeping will cause a biblical deluge.
🆑 PositiveEntropy imageadd: Outfits for independent and Nanotrasen captains have been violently reworked. /🆑
new space ruin, the biological research outpost (#79149)
adds this ruin to the space ruin pool. This is a shady (as NT always is) bioresearch outpost that got fucked up by an experiment. It has like some puzzle aspect to it, since you gotta find keycards and shit and press buttons to unlock shield gates. This ends with you fighting a heart which, if you defeat it, destroys the blockade that prevents you from entering the outpost vault.
also you can no longer literally just cut indestructible grilles or unanchor indestructible windows
a variant of pressure plate that you cannot remove and that sends a puzzle signal; cooler red puzzle doors that look very foreboding or something, idk, they're for this ruin; also puzzle blockades, which are indestructible dense objects that are destroyed if they receive a puzzle signal; and also buttons and keycard pads for puzzles
2023-10-21.18-17-07.mp4
2023-10-21.18-19-20.mp4
stuff that throws electric shocks in a pattern, ignores insuls and only knocks down, and no you cannot just run past
2023-10-21.18-21-05.mp4
living floor: it can only attack stuff on top of it, and it attacks until the victim is dead. It is invincible to all but a crowbar, it cannot move, and it remains hidden until a victim is in range.
2023-10-21.18-23-15.mp4
living flesh: it can replace your limbs with itself. The conditions for that are: the limb must have 20 or more brute, the victim must be alive and dismemberable, the limb may not be torso or head, and the limb may not already be living flesh; alternatively it can replace a missing limb. These are all checked with every attack. They have 20 hp. The limbs in question will sometimes act up while passively draining nutrition: arms will randomly start pulling nearby stuff, legs may step randomly. Limbs, when detached, turn into mobs and reactivate their AI 2 seconds later. If the host is shocked, all living flesh limbs will detach; they will also do that if the host dies.
2023-10-21.18-29-10.mp4
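As a rough sketch of the limb-replacement conditions described above (the actual implementation is DM game code; every name below is made up for illustration):

```python
BRUTE_THRESHOLD = 20                  # limb must have taken at least this much brute
PROTECTED_ZONES = {"head", "chest"}   # torso and head are never replaced

def can_replace_limb(victim, limb):
    """Checked on every attack, per the description above."""
    if limb is None:
        return True  # a missing limb can always be replaced by living flesh
    return (
        victim.is_alive
        and victim.is_dismemberable
        and limb.zone not in PROTECTED_ZONES
        and not limb.is_living_flesh
        and limb.brute_damage >= BRUTE_THRESHOLD
    )
```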
ruin variety is cool i think also the other things i added should be useful for other mappers for bitrunning or whatever
also bug bad for that one fix
🆑 add: living floor, living flesh, and other stuff for the bioresearch outpost ruin add: bioresearch outpost ruin fix: you may not defeat indestructible grilles and windows with mere tools /🆑
Co-authored-by: Jacquerel hnevard@gmail.com
Basic Pirate NPCs (#79284)
Converts hostile pirate NPCs to basic mobs - specifically, a subtype of trooper. As their behavior is not meaningfully distinct from other troopers, this conversion mostly just sticks them on the existing AI behavior while keeping the rest the same.
Pirates do have one new thing going for them, though, to differentiate them from other troopers. They use the new plundering attacks component, which means that every time they land a melee attack, they steal money from the bank account of whoever they hit. This requires the target to be wearing an ID with a linked bank account, so it's not the hardest thing in the world to hide your money from them - but it's still something to be wary of! If killed, any mob with this component will drop everything they've stolen in a convenient holochip.
Takes down 5 more simplemobs, and (I think) converts the last remaining trooper-type enemy to be a basic trooper. (It's possible there's more I've forgotten that could use the same AI, though.)
The money-stealing behavior is mostly good because I think it's funny, but it also makes the pirates something a little distinct from "yet another mob that runs at you and punches you until you die". They still do that, but now there's a little twist! This can be placed on other mobs too, if we want to make any other sorts of thieves or brigands.
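Sketched in Python rather than the actual DM component, the plundering behaviour described above looks roughly like this (class and attribute names are illustrative, and the per-hit amount is a guess, not taken from the PR):

```python
class PlunderingAttacks:
    """Attach to a mob: melee hits siphon money, death drops the loot as a holochip."""

    def __init__(self, amount_per_hit=10):  # per-hit amount is a guess, not from the PR
        self.amount_per_hit = amount_per_hit
        self.stolen = 0

    def on_melee_hit(self, target):
        # Only works if the target wears an ID card with a linked bank account.
        id_card = getattr(target, "id_card", None)
        account = getattr(id_card, "bank_account", None)
        if account is None:
            return
        taken = min(self.amount_per_hit, account.balance)
        account.balance -= taken
        self.stolen += taken

    def on_death(self, drop_location):
        # Everything stolen comes back out in one convenient holochip.
        if self.stolen:
            drop_location.append(("holochip", self.stolen))
            self.stolen = 0
```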
🆑 refactor: Pirate NPCs now use the basic mob framework. They'll be a little smarter in combat, and if you're wearing your ID they'll siphon your bank account with every melee attack! Beware! Please report any bugs. /🆑
refactor(proxy-wasm) improve pwexec resurrection and instance lifecycle
The main goal of this overhaul is to simplify on_context_create, make it fully re-entrant and properly handle instance recycling at the same time.
The way to do so, in my opinion, was to move pwexec creation to where rexec already was. In other words, always look up the context id in the instance rbtree, and if not found, create it. This means that surrounding code also needed big overhauls. It also removes the reference-counting poor man's GC of the older implementation. The code became really ugly by then, so I took the time to also review this module's code structure instead of making a very ugly commit.
This new ngx_proxy_wasm.c file should be much easier to read and follow now.
One change I do not fully like is moving the next_id to a global counter, but we do not have a "global proxy-wasm conf" object yet. I also started thinking about pre-allocating a number of pwexecs (like worker_connections) and using a free/busy queue that all filter chains can dip into to get a context id + context memory zone. Perhaps for a later time.
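The core of the change is a lookup-or-create on the context id; a toy Python version of that pattern (the real code is C operating on an nginx rbtree, and these names are illustrative):

```python
class FilterChain:
    """Toy stand-in: context executions keyed by context id, created on demand."""

    def __init__(self):
        self._pwexecs = {}  # stands in for the per-instance rbtree

    def get_or_create_pwexec(self, context_id):
        # Re-entrant by construction: callers never pre-create contexts,
        # they just ask for one and it is created on first use.
        pwexec = self._pwexecs.get(context_id)
        if pwexec is None:
            pwexec = {"id": context_id, "state": {}}  # placeholder context state
            self._pwexecs[context_id] = pwexec
        return pwexec
```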
Update to Sentry v4 (#1780)
- Bump minimum PHP version to 8.1
- Missed these too
- Shit's broke and stupid as fuck
- Update SentryHelper.php
- Update package-lock.json
- I did the fixing
Co-authored-by: Belle Aerni belleaerni@gmail.com
Revert "Merge remote-tracking branch 'upstream/master' into fuck-you"
This reverts commit 02e475c4ef5ea4fba3d50a5990a4f069233507a3, reversing changes made to 7e98858138b6ef5a32c21ed93456203720740a13.
Revert "Revert "Merge remote-tracking branch 'upstream/master' into fuck-you""
This reverts commit 237900bddf859c8be3f457816caaac1a17b627eb.
Revert "Revert "Revert "Merge remote-tracking branch 'upstream/master' into fuck-you"""
This reverts commit d6c100992e65f6a9b87f6e60db83cc0cb54fab53.
Cait Sith Avatar:
- Cait Sith now has the proper name prefix and is named "Cait Sith" instead of "The CaitSith"
- BPs Implemented
- Regal Slash (BP:Rage): 3-hit physical
- Level ? Holy (BP:Rage): aoe magical
- Rolls a die and does dmg proportional to roll
- Only does damage if the target's level is divisible by the roll
- Mewing Lullaby (BP:Ward): AoE lullaby that resets TP
- Eerie Eye (BP:Ward): conal silence/amnesia with appropriate elemental resist check for amnesia, but retail does light check for silence
- Reraise II (BP:Ward): single-target 60-minute reraise II buff for any party member
- Raise II (BP:Ward): single-target raise II for any party member
- Altana's Favor (BP:Ward): 2-hour ability gives arise to all party members in range (Arise and reraise III with infinite duration)
Implemented Pretty Basic Map Weather Simulation, 11/7/2023
Implemented a pretty basic map weather simulation. Got more tracking stuff for climate values based on the current location on the map, as well as a pretty basic "weather simulation" for when on the world map. It needs a lot of work honestly, but it mostly works, at least. One of the main issues, which I don't know if I'll be able to fix at the moment, is that when fast-traveling the weather changes, but often not to the value I'm expecting. That is likely due to the vanilla DFU weather-changing logic, which is probably just time-based when it decides to change the current weather, but it seems like my attempt at changing this does not work, so yeah. Other than that, I think I'm happy enough with the basic weather simulation stuff. So the next thing I'm likely going to work on is refining the view range/radius thing (line of sight, whatever you want to call it): basically altering this value depending on stuff like the travel mode and current weather and such. That will take some work but hopefully it will work out; after that I'm not sure, but we'll see, etc. 11/7/2023.
People listen up don't stand so close, I got somethin that you all should know. Holy matrimony is not for me, I'd rather die alone in misery.
Adds Red Shoes (#901)
Mr. Heavenly's Abnormality Jam Entry #1
Records
uncommented weapon
Finishing touches
Design rework
adds ego gift and inhands
New sprites!
uncommented sfx
insanity fix
quieter sound loop
Fixes some shit
fix linters
requested changes
Adds Red Shoes
Mr. Heavenly's Abnormality Jam Entry #1
Records
uncommented weapon
Finishing touches
Design rework
adds ego gift and inhands
New sprites!
uncommented sfx
insanity fix
quieter sound loop
Fixes some shit
fix linters
requested changes
Update code/modules/mob/living/simple_animal/abnormality/he/red_shoes.dm
fixes suit check in assimilate() proc
Co-authored-by: [̸R̵e̵d̴a̴c̶t̸e̸d̴]̵ 61567407+LanceSmites328@users.noreply.github.com
Update code/modules/mob/living/simple_animal/abnormality/he/red_shoes.dm
fixes dismembering
Co-authored-by: [̸R̵e̵d̴a̴c̶t̸e̸d̴]̵ 61567407+LanceSmites328@users.noreply.github.com
Update code/modules/mob/living/simple_animal/abnormality/he/red_shoes.dm
Co-authored-by: [̸R̵e̵d̴a̴c̶t̸e̸d̴]̵ 61567407+LanceSmites328@users.noreply.github.com
breach is more dangerous
compiles
bug fix
fixes simple mob
bug fixes
Panic fixed!!!!
stuff
wayward records
Update code/modules/paperwork/records/info/he.dm
Co-authored-by: [̸R̵e̵d̴a̴c̶t̸e̸d̴]̵ 61567407+LanceSmites328@users.noreply.github.com
Update code/modules/mob/living/simple_animal/abnormality/he/red_shoes.dm
Co-authored-by: [̸R̵e̵d̴a̴c̶t̸e̸d̴]̵ 61567407+LanceSmites328@users.noreply.github.com
attribute bonus
requested changes
Co-authored-by: Mr.Heavenly davidx3adamhunt@gmail.com
Correct copy/move for union
By writing separate construction and assignment, plus the new feature of suppressing assignment to a member by writing member = _; (now allowed only in assignment operators).
I do realize that's an "opt-out" which I normally prefer to avoid, but:
- I considered and decided against (for now) the alternative of not having assignment be memberwise by default. I want to keep the (new to Cpp2) default of memberwise semantics for assignment as with construction. I think that's a useful feature, and normally if you do assign to a member it doesn't arise, and so I think it makes sense to explicitly call out when we're choosing not to do any assignment at all to a member before doing other assignment processing. We'll get experience with how it goes.
- _ is arguably natural here, since it's pronounced "don't care." There too, we'll see if that generalizes naturally, or feels strained. For now it feels natural to me.
Hey what if I made Sleeping Carp better at nonlethal takedowns and also deflect with combat mode instead of throw mode (but cost more) (#79517)
It's been a hot minute hasn't it?
When I initially reworked Sleeping Carp, we didn't have combat mode. Now that we do, and that Sleeping Carp has substantially less defensive power to justify having to make a choice between deflection and attacking, it's probably about time we updated this aspect back to what it was before my rework. Sorta.
Now, we can have all the deniability of the previous method, while also letting you reliably protect yourself from ranged attacks whenever it matters. Because of this change, I increased the price to 17 TC just to be on the safe side. The higher uptime of projectile immunity while also being able to attack during that time makes this a lot stronger overall.
Secondly, Sleeping Carp presently just isn't as good as a good ol' baton. It takes a lot more hits to accomplish the same task that a baton can. Many people feel like they can't even reasonably fight anyone for fear of the baton, or they would rather use a baton and kill someone at their leisure. So we've updated some of the moves in order to facilitate Sleeping Carp as a substantial contender for 1v1 fighting, and lessen the need for a baton by adding a lot more Stamina damage overall to the various attacks;
Keelhaul: Now a Shove Shove combo. Does literally zero lethal damage, but now temporarily blinds and dizzies the target as well as its previous effects. The amount of lethal damage it did was...extremely small, so this isn't a particularly big loss.
Grabs and Shoves: Deal some amount of stamina damage (20). You need to be in combat mode in order to perform these special attacks (more deniability). Grabbing someone while they have 80 Stamina damage or more will cause them to fall unconscious. Yes, I really did just want to add a Vulcan Nerve Pinch, what do you want from me?
That's it actually. Oh, I guess they are heavy sleepers now too. Because its funny.
I often get told (read: thrown various insults and slurs at me while mentioning this as the justification) that Sleeping Carp is not very strong anymore since it lost all that invisible armor I added way back + I removed the stuns in my initial rework. This made some people upset (I think at least one person wished for my death).
So, having given it at least 2 years, I wanted to recapture parts of what made the older Sleeping Carp (before my rework) strong, some of the benefits of the new version, and introduce a brand new aspect; nonlethal takedowns. This makes it beneficial for pacifists, as well as for kidnapping.
This should not meaningfully make Sleeping Carp any stronger against the things that typically ruin its day. I suspect in a straight joust with a baton, Sleeping Carp will still struggle. But against what should be its strong points (lone targets and ranged weapons), it will be strong once again rather than clumsily unable to do very much at all.
🆑 balance: Harnessing Shoreline Quay (bluespace energy, probably), a mystical energy (total bullshit) that permeates the Astral Waterways (bluespace quantum dimensions, probably), Sleeping Carp users can now once again deflect projectiles with their bare hands when focused in on battle (in combat mode). balance: The Keelhaul technique is now nonlethal (a philosophical acknowledgement of the familial bond of sleep and death), but causes the target to become temporarily blind and dizzy along with its previous effects. balance: Sleeping carp users, while in combat mode, deal Stamina damage with their grabs and shoves. If the target of their grab has enough Stamina damage (80), they are knocked unconscious from a well placed nerve pinch. balance: Sleeping carp users find it very hard to wake up once they fall asleep.... /🆑
Do NOT make a new accent. Worst mistake of my life, holy fuck. (#20685)
- empty template so i can work from another PC
- holy fuck
- essential
- idk how accents are handled. its poorly documented
- sure
- augh
- amtbe
- Update fugitive_outfits.dm
fuk that
[MIRROR] swaps one of the fridges in snowcabin to be in line with the rest [MDB IGNORE] (#24754)
- swaps one of the fridges in snowcabin to be in line with the rest (#79414)
In truth, this is an IDED PR (this is not at all sarcasm, and as we all know nobody would lie on the internet) that came about from a round I just got done playing, wherein I was in snowcabin trying to cook up some food for fun. Well, wouldn't you know it, I couldn't open one of the fridges. What gives? Well, I got to thinking it has to do with the fridge type used: for some reason the fridge that holds the universal enzyme uses the freezer/fridge/kitchen type instead of the fridge/open type that the other two do, so I went ahead and changed it to the other fridge type, so now anyone can open it.
It's a bit stupid to have a single fridge that's different from the rest for no discernible reason; I can't think of any reason universal enzyme would ever need to be guarded. You could just say "well, why not go back onto the station and grab some if the fridge is locked", but if for some reason I'm barred from the station I want to be able to use as many tools within my reach as possible, preferably without many hoops, and this one's unnecessary.
fix: changes the type of fridge used to hold the universal enzyme in the snowcabin gateway's kitchen, letting everyone access it like the rest of the fridges.
/:cl:
- swaps one of the fridges in snowcabin to be in line with the rest
Co-authored-by: Donglesplonge 120208006+Donglesplonge@users.noreply.github.com
M707 "Vulture" Anti-Materiel Rifle (#4253)
Adds the M707 "Vulture" anti-materiel rifle to the game. Design doc here.
The M707 is meant to take the place of a heavy support weapon, not unlike the mortar. It is a 20mm bolt-action rifle, capable of loading from 4-round magazines. Each round does 400 damage with full AP (50), but it is not a simple task to fire the weapon. The gun, being as high-caliber as it is, will immediately break your arm & hand if you fire it without using the built-in bipod. In addition, its accuracy is massively reduced below its ideal range (10 tiles), which makes the scope a necessity.
The scope does not function like a regular scope. (see screenshot section for details) Instead, it shows a 5x5 area (the rest is blacked out) 12 tiles ahead, with an aiming reticle in the center. The aiming reticle will drift over time within the 5x5, requiring you to re-adjust or use the Hold Breath ability to temporarily stop the sway. If you open up the scope's UI, you will be able to modify the scope and the reticle's location, one tile at a time, very slowly.
To assist with this, the Vulture comes with a spotting scope & tripod. A secondary user is able to assemble and use the spotting scope. The scope is a complement to the Vulture's, allowing a communicative team to become far more effective. The spotter's view, on use, will be locked to the location of the Vulture scope. However, the spotter's view is not locked to a 5x5 area, instead getting a view of the full area, on top of an extra 2 tiles (in each direction) of view range. Finally, both the spotter and sniper's scopes have night vision equivalent to an SG's goggles.
The bullet itself is a powerful beast. Powerful enough to pierce walls, people, and barricades, but with 2 caveats. The first is that every wall/cade penetration removes 75 damage from the round, and any cades/tables that the round passes over will be immediately destroyed by the round. In addition, anyone in a large range will hear the report of the rifle sound and the direction it came from.
Update as of 8/31: Vulture and its spotter scope now require a pamphlet to use (a pamphlet gives the trait needed to use both), guncase spawns with 2.
It's a unique weapon that encourages communication inside a team, while simultaneously not contributing to the IFF ungaball. The weapon promotes thoughtful gameplay and repositioning to be able to hit a target without friendlies getting in the way or getting overrun.
🆑 Zonepace, Thwomper add: Added the M707 "Vulture" anti-materiel rifle. Not currently player-obtainable. Credit to Tophat and Kaga for the lore description. /🆑
Co-authored-by: harryob me@harryob.live
[Security Solution][Detection Engine] improves new terms rule for multiple fields (#157413)
As described in our README for new terms rule type:
Runtime field supports only 100 emitted values. So for large arrays or combination of values greater than 100, results may not be exhaustive. This applies only to new terms with multiple fields. Following edge cases possible:
- false negatives (alert is not generated) if too many fields were emitted and actual new values are not getting evaluated if it happened in document in rule run window.
- false positives (wrong alert generated) if too many fields were emitted in historical document and some old terms are not getting evaluated against values in new documents.
To avoid this and deliver a better experience for our customers, this PR moves from the current implementation (emitting aggregated values for multiple new terms fields) towards using a composite aggregation for each page from phase 1, split into chunks of 500. This is possible because of the ordering of composite aggregation results.
NOTE: the implementation for a single new terms field is unchanged, for performance reasons.
| Implementation | Shards | Docs per shard | Simultaneous Rule Executions | Fields cardinality | Rule Execution Time: Runtime field (current implementation) | Rule Execution Time: On week work |
| -- | -- | -- | -- | -- | -- | -- |
| array of unique values length 10 | | | | | | |
| Terms 1 field | 10 | 900,000 | 1 | 100,000 | | |
| Terms 2 fields | 10 | 900,000 | 1 | 100,000 | 30s | 41s |
| Terms 3 fields | 10 | 900,000 | 1 | 100,000 | 40s | 56s |

| Implementation | Shards | Docs per shard | Simultaneous Rule Executions | Fields cardinality | Rule Execution Time: Runtime field (current implementation) | Rule Execution Time: On week work, 1,000 per batch | Rule Execution Time: On week work, 500 per batch |
| -- | -- | -- | -- | -- | -- | -- | -- |
| Terms 2 fields | 10 | 9,000,000 | 1 | 100,000 | 19s | 41s | 35s |
| Terms 3 fields | 10 | 9,000,000 | 1 | 100,000 | 21s | 52s | 47s |
| CPU % | | | | | 400-450% | 500-600% | 400-450% |
I selected a chunk size of 500, since it's a bit faster and puts less load on the CPU.
When running composite search requests in parallel, I noticed a significant CPU increase in Elasticsearch: ~1,000% for 2 requests in parallel against ~500% for a single request, while the performance win was not that big: ~35s for 2 in parallel vs. 43s for a single request. I think having only one request is the better option to go with; it prevents unnecessary CPU usage.
I've added several functional test cases that ensure no missing or false-positive alerts occur. Applied to the old implementation, they would fail.
Because we create a query that can have a few thousand clauses, it may fail due to the maximum number of allowed clauses. I implemented a retry for that: if a request fails with the default batch size of 500, we halve the batch size on each retried request, down to 125. Per the ES documentation, the minimum value of max_clause_count is 1,000, so with 125 we should be able to execute the query below the max_clause_count value.
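A minimal Python sketch of that retry-with-smaller-batches idea (the real implementation is TypeScript inside the rule executor; the function and exception names here are hypothetical):

```python
MIN_BATCH_SIZE = 125  # 500 -> 250 -> 125, then give up, per the description

class TooManyClausesError(Exception):
    """Stand-in for Elasticsearch rejecting a query over max_clause_count."""

def search_with_retry(run_composite_search, batch_size=500):
    while True:
        try:
            return run_composite_search(batch_size)
        except TooManyClausesError:
            if batch_size <= MIN_BATCH_SIZE:
                raise  # already at the floor, nothing left to shrink
            batch_size = max(batch_size // 2, MIN_BATCH_SIZE)
```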
- Unit or functional tests were updated or added to match the most common scenarios
Co-authored-by: kibanamachine 42973632+kibanamachine@users.noreply.github.com
The fix everything update
Because... WHOO boy did I have a lot to explain for.
Like the fact that throwing in all those button checks destroyed performance on most (single core) boards! We not only brought the special input handling into its own function (more on that), but fixed it so it wasn't spamming return signals every frame, which noticeably (and artificially) lowered the camera tracking rate. Now it should be as smooth as the original fork, with little (if any) difference between single and dual-core devices.
The main thing though is the buttons handling. Notes in the code, but to the point, it's not LightgunButtons' fault; it's the keyboard (and ONLY the keyboard btw) inputs getting "lost in traffic" alongside camera updates. We now handle those in a slightly hacky, but infinitely more consistent manner without breaking performance, at the cost of potential jitter while pressing keyboard buttons in motion. The values chosen have been picked to mitigate this pause as much as possible.
And with that, the dual core update is rendered kinda moot? Lol. But it handles button inputs anyways and has a chance of reducing latency, so it's maintained and shouldn't cause issues either way.
FUN FACT: didjuknow that the reason the original dual core input handling caused the camera to die is because the second core is so fast that it completely jammed the USB interface? Yeah, it's that stupid quick. Even with the delay-based timing, there's no noticeable pause, so still a benefit (if a theoretical one) to having it.
[SQUASHED] core: Blacklist pixel system feature from Google Photos
We want to include the P21 experience flag to enable new features,
however it seems like Google Photos uses it to decide whether to use the
TPU tflite delegate. There doesn't seem to be any fallback so we need to
make sure the feature is not exposed to the app so that a normal
NNAPI/GPU delegate can be used instead.
Test: Google Photos editor with PIXEL_2021_EXPERIENCE feature in product
Signed-off-by: Kuba Wojciechowski <nullbytepl@gmail.com>
Change-Id: I51a02f8347324c7a85f3136b802dce4cc4556ac5
commit 67eb31b3bb43d06fcc7f6fdb2f92eb486451cae6
Author: kondors1995 normandija1945@gmail.com
Date: Thu Jun 9 17:39:25 2022 +0530
Core: Extend Pixel experience Blacklist For Google Photos
Turns out having these breaks Original Quality backups, since these indicate that the device is a Pixel 4, which in turn breaks device spoofing as the OG Pixel.
Change-Id: I336facff7b55552f094997ade337656461a0ea1d
commit 508a99cde60b73dc3f1e843d569bca31def35988
Author: ReallySnow reallysnow233@gmail.com
Date: Fri Dec 31 16:40:23 2021 +0800
base: core: Blacklist Pixel 2017 and 2018 exclusive for Google Photos
* In this way we can use PixelPropsUtils to simulate the Pixel XL prop method, to use the unlimited storage space of Google Photos
* Thanks nullbytepl for the idea
Change-Id: I92d472d319373d648365c8c63e301f1a915f8de9
commit aaf07f6ccc89c2747b97bc6dc2ee4cb7bd2c6727
Author: Akash Srivastava akashniki@gmail.com
Date: Sat Aug 20 19:04:32 2022 +0700
core: Pixel experience Blacklist For Google Photos for Android 13
* See, in Android 13 pixel_experience_2022_midyear was added, which needs to be blacklisted as well
Change-Id: Id36d12afeda3cf6b39d01a0dbe7e3e9058659b8e
commit 9d6e5749a988c9051b1d47c11bb02daa7b1b36fd
Author: spezi77 spezi7713@gmx.net
Date: Mon Jan 31 19:17:34 2022 +0100
core: Rework the ph0t0s features blacklist
* Moving the flags to an array feels more like a blacklist :P
* Converted the flags into fully qualified package names, while at it
Signed-off-by: spezi77 <spezi7713@gmx.net>
Change-Id: I4b9e925fc0b8c01204564e18b9e9ee4c7d31c123
commit d7201c0cff326a6374e29aa79c6ce18828f96dc6
Author: Joey Huab joey@evolution-x.org
Date: Tue Feb 15 17:32:11 2022 +0900
core: Refactor Pixel features
* Magic Eraser is wonky and hard to
enable and all this mess isn't really worth
the trouble so just stick to the older setup.
* Default Pixel 5 spoof for Photos and only switch
to Pixel XL when spoof is toggled.
* We will try to bypass 2021 features and Raven
props for non-Pixel 2021 devices as apps usage
requires TPU.
* Remove P21 experience system feature check
Change-Id: Iffae2ac87ce5428daaf6711414b86212814db7f2
Signed-off-by: Hưng Phan phandinhhungvp2001@gmail.com
sched/core: Fix ttwu() race
Paul reported rcutorture occasionally hitting a NULL deref:
sched_ttwu_pending()
  ttwu_do_wakeup()
    check_preempt_curr() := check_preempt_wakeup()
      find_matching_se()
        is_same_group()
          if (se->cfs_rq == pse->cfs_rq) <-- BOOM
Debugging showed that this only appears to happen when we take the new code-path from commit:
2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")
and only when @cpu == smp_processor_id(). Something which should not be possible, because p->on_cpu can only be true for remote tasks. Similarly, without the new code-path from commit:
c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
this would've unconditionally hit:
smp_cond_load_acquire(&p->on_cpu, !VAL);
and if: 'cpu == smp_processor_id() && p->on_cpu' is possible, this would result in an instant live-lock (with IRQs disabled), something that hasn't been reported.
The NULL deref can be explained however if the task_cpu(p) load at the beginning of try_to_wake_up() returns an old value, and this old value happens to be smp_processor_id(). Further assume that the p->on_cpu load accurately returns 1, it really is still running, just not here.
Then, when we enqueue the task locally, we can crash in exactly the observed manner because p->se.cfs_rq != rq->cfs_rq, because p's cfs_rq is from the wrong CPU, therefore we'll iterate into the non-existant parents and NULL deref.
The closest semi-plausible scenario I've managed to contrive is somewhat elaborate (then again, actual reproduction takes many CPU hours of rcutorture, so it can't be anything obvious):
X->cpu = 1
rq(1)->curr = X
CPU1:                               // switch away from X
    LOCK rq(1)->lock
    smp_mb__after_spinlock
    dequeue_task(X)
    X->on_rq = 9
    switch_to(Z)
    X->on_cpu = 0
    UNLOCK rq(1)->lock

CPU2:                               // migrate X to cpu 0
    LOCK rq(1)->lock
    dequeue_task(X)
    set_task_cpu(X, 0)
    X->cpu = 0
    UNLOCK rq(1)->lock
    LOCK rq(0)->lock
    enqueue_task(X)
    X->on_rq = 1
    UNLOCK rq(0)->lock

CPU0:                               // switch to X
    LOCK rq(0)->lock
    smp_mb__after_spinlock
    switch_to(X)
    X->on_cpu = 1
    UNLOCK rq(0)->lock

CPU0:                               // X goes sleep
    X->state = TASK_UNINTERRUPTIBLE
    smp_mb();

CPU1:                               // wake X
    ttwu()
    LOCK X->pi_lock
    smp_mb__after_spinlock
    if (p->state)
        cpu = X->cpu; // =? 1
    smp_rmb()

CPU0:                               // X calls schedule()
    LOCK rq(0)->lock
    smp_mb__after_spinlock
    dequeue_task(X)
    X->on_rq = 0

CPU1:
    if (p->on_rq)
    smp_rmb();
    if (p->on_cpu && ttwu_queue_wakelist(..)) [*]
    smp_cond_load_acquire(&p->on_cpu, !VAL)
    cpu = select_task_rq(X, X->wake_cpu, ...)
    if (X->cpu != cpu)

CPU0:
    switch_to(Y)
    X->on_cpu = 0
    UNLOCK rq(0)->lock
However I'm having trouble convincing myself that's actually possible on x86_64 -- after all, every LOCK implies an smp_mb() there, so if ttwu observes ->state != RUNNING, it must also observe ->cpu != 1.
(Most of the previous ttwu() races were found on very large PowerPC)
Nevertheless, this fully explains the observed failure case.
Fix it by ordering the task_cpu(p) load after the p->on_cpu load, which is easy since nothing actually uses @cpu before this.
Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Reported-by: Paul E. McKenney paulmck@kernel.org
Tested-by: Paul E. McKenney paulmck@kernel.org
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Signed-off-by: Ingo Molnar mingo@kernel.org
Link: https://lkml.kernel.org/r/20200622125649.GC576871@hirez.programming.kicks-ass.net
Change-Id: Idd54334615da4c78698ca8b3b12b514ae9d8360f
Signed-off-by: Alexander Winkowski dereference23@outlook.com
Update base for Update on "AOTAutograd: handle set_(), detect metadata mutations that cancel out"
This should be enough to get voznesenskym's FSDP branch to plumb set_() through AOTAutograd properly and have everything properly no-op out. Main changes are:
(1) graph break on aten::set_.source_Tensor_storage_offset (we could support it but it isn't needed, seems safer to graph break)
(2) Functionalization: add a "proper" functionalization kernel for aten::set_.source_Tensor. The previous one we had was codegen'd and it was wrong (it would just clone() and call set_(), which does not do the right thing). I also manually mark on the FunctionalTensorWrapper when a given tensor has been mutated by a set_() call.
(3) AOTAutograd: I added a new field, InputAliasInfo.mutates_storage_metadata, so we can distinguish between "regular" metadata mutations, and metadata mutations due to set_() calls. This is mainly because at runtime, one requires calling as_strided_() to fix up metadata, while the other requires calling set_().
(4) Made AOTAutograd's detection for metadata mutations / set_() mutations smarter and detect no-ops (if the storage and metadata are all the same).
I also killed was_updated() and was_metadata_updated(), and replaced them with (existing) has_data_mutation() and (new) has_data_mutation(), which can more accurately distinguish between data-mutation vs. set_() calls vs. metadata-mutation.
This PR is still silently incorrect in one case though, which I'd like to discuss more. In particular, this example:
def f(x):
    x_view = x.view(-1)
    x.set_(torch.ones(2))
    x_view.mul_(2)
    return
If you have an input that experiences both a data-mutation and a x_old.set_(x_new) call, there are two cases:
(a) the data mutation happened on the storage of x_new. This case should be handled automatically: if x_new is a graph intermediate then we will functionalize the mutation. If x_new is a different graph input, then we will perform the usual copy_() on that other graph input
(b) the data mutation happened on the storage of x_old. This is more of a pain to handle, and doesn't currently work. At runtime, the right thing to do is probably something like:
def functionalized_f(x):
    x_view = x.view(-1)
    # set_() desugars into a no-op; later usages of x will use x_output
    x_output = torch.ones(2)
    # functionalize the mutation on x_view
    x_view_updated = x.mul(2)
    x_updated = x_view_updated.view(x.shape)
    # x experienced TWO TYPES of mutations; a data mutation and a metadata mutation
    # We need to return both updated tensors in our graph
    return x_updated, x_output

def runtime_wrapper(x):
    x_data_mutation_result, x_set_mutation_result = compiled_graph(x)
    # First, perform the data mutation on x's old storage
    x.copy_(x_data_mutation_result)
    # Then, swap out the storage of x with the new storage
    x.set_(x_set_mutation_result)
There are two things that make this difficult to do though:
(1) Functionalization: the functionalization rule for set_() will fully throw away the old FunctionalStorageImpl on the graph input. So if there are any mutations to that FunctionalStorageImpl later on in the graph, the current graph input won't know about it. Maybe we can have a given FunctionalTensorWrapper remember all previous storages that it had, and track mutations on all of them - although this feels pretty complicated.
(2) AOTAutograd now needs to know that we might have two graph outputs that correspond to a single "mutated input", which is annoying.
It's worth pointing out that this issue is probably extremely unlikely for anyone to run into - can we just detect it and error? This feels slightly easier than solving it, although not significantly easier. We would still need FunctionalTensorWrapper to keep track of mutations on any of its "previous" storages, so it can report this info back to AOTAutograd so we can raise an error.
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 aakhundov kadeng
[ghstack-poisoned]
Makes the Regal Condor realistically simulate being shot dead with a high caliber hand cannon by making it HITSCAN [MDB IGNORE] (#24149)
- Makes the Regal Condor realistically simulate being shot dead with a high caliber hand cannon by making it HITSCAN (#78674)
The Regal Condor come with a magazine and ammo already inside.
The recipe for the magazine now no longer needs TC, but does need donk pockets (sponsored murder gear, you see) and a hell of a lot more materials per magazine (you're looking at like 40 sheets of various materials all up). It also needs you to make the Condor first. But it comes preloaded with ammo.
The Condor is 1 whole TC more expensive. Also needs some metal. The old recipe is there in spirit.
The Regal Condor and the magazines come with 10mm Reaper bullets. They're high damage. They're high AP. They are also hitscan.
Apparently people don't like the Condor. Too much effort for not enough reward. After all, revolvers exist. 'It must be a joke' they say! 'It's joke content! I went to all that effort to make it for nothing! That slut Anne tricked us!'
Wrong, bitch.
If you want the Condor to make you shit yourself the moment someone with it appears on the screen, then fine!
🆑 balance: Despite earlier reports suggesting that the famous lethality of the Regal Condor was largely a myth, there has been rumors that the gun has once again started to display its true killing potential on any station that it 'manifests'. /🆑
- Makes the Regal Condor realistically simulate being shot dead with a high caliber hand cannon by making it HITSCAN
Co-authored-by: necromanceranne 40847847+necromanceranne@users.noreply.github.com
AboutPage routing
plz work fuck you gitignore idea balls
[DNM][HACK] telephony: Force Class 0 SMS to Class 1
This kills Flash SMS messages. Fuck you airtel
Change-Id: Ifb0c9e8bae5c12868d178fbdaeceb2cc72a0ffb6
Signed-off-by: Sageofd6path mail2anirban95@gmail.com
Adds distorted form
adds some basic features
new 1% sprite dropped
text update
Finished work mechanics
adds basic breaching
should fix linters a bit
It works!!!! Kinda...
adds crumbling armor and hammer of light (beta)
adds cool and important stuff
does a thing
adds apostle and tutorial abnorms
adds the stuff
might fix linters
adds a console proc
adds crumbling armor's proper attack and red queen
does some things
should fix linters
adds a blubbering toad transformation
adds more attacks
brings the tier up
adds big boy attacks
updates some sfx, fixes bugs
adds jump attacks
why does linters care about indentation on comments?
adds suggested changes
should fix some stuff
adds info
adjusts damage numbers
updates an effects and fixes transformations
updates blacklist
lowers stack damage
lowers max qlip to 3
adds bloodbath
adds a new AOE attack
adds halberd apostle
blacklists DF from pink midnight
fixes weirdness
requested changes and sound design improvement
removes armortype
removes armortype for real
damage coeff update
makes suggested changes
updates comments
adds procs
adds stuff
The Hive Awakens (#5940)
There are actually fewer paths for the hivebots; they are some of the most primitive mobs in the codebase, so it was high time they were given a facelift. As I said with my previous mob update, robots are a good alternative as mobs compared to humanoids, and with the hivebots we can present a threat of hostile machine intelligence to round out the existing threats of pirates, mercs, aliens, beasts and the supernatural. Once more, these robots are also far more generalist than the existing robot varieties, and as most types of them are not very dangerous they can be released on civilian crew without fear of them causing extreme damage.
🆑 add: A couple new varieties of both melee and ranged hivebots removed: redundant hivebot varieties tweak: siegebots now have sniper range fitting their name, their attack has been nerfed (holy fuck, the one-shot explode-on-contact grenades with a base attack of 10... that's 1 frag grenade a second!!!) fix: hivebots now use their various cataloguer entries sprites: hivebot types are now more visually distinct /🆑
API: Allow comparisons with and between any python integers
This implements comparisons between NumPy integer arrays and arbitrary valued Python integers when weak promotion is enabled.
To achieve this:
- I allow abstract DTypes (with small bug fixes) to register as loops (ArrayMethods). This is fine, you just need to take more care. It does muddy the waters between promotion and not a bit if the result DType would also be abstract. (For the specific case it doesn't, but in general it does.)
- A new resolve_descriptors_raw function, which does the same job as resolve_descriptors but I pass it this scalar argument (can be expanded, but starting small).
  - This only happens when available, so there are some niche paths where this cannot be used (ufunc.at and the explicit resolution function right now); we can deal with those by keeping the previous rules (things will just raise trying to convert).
  - The function also gets the actual arrays.dtype instances while I normally ensure that we pass in dtypes already cast to the correct DType class. (The reason is that we don't define how to cast the abstract DTypes as of now, and even if we did, it would not be what we need unless the dtype instance actually had the value information.)
- There are new loops added (for combinations!), which:
  - Use the new resolve_descriptors_raw (a single function dealing with everything)
  - Return the current legacy loop when that makes sense.
  - Return an always true/false loop when that makes sense.
  - To achieve this, they employ a hack/trick: get_loop() needs to know the value, but only resolve_descriptors_raw() does right now, so this is encoded on whether we use the np.dtype("object") singleton or a fresh instance! (Yes, probably ugly, but avoids channeling things to more places.)
Additionally, there is a promoter to say that Python integer comparisons can just use object dtype (in theory weird if the input then wasn't a Python int,
[fix] add missing line of code in values_equal()
In the earlier commit I said over 50 tests were failing. Turns out that was because I missed a return statement after an if statement, and if that statement evaluated to false the program would reach a call to unreachable(), which was causing SIGILLs and making the tests fail. I kinda wish Odin would say something like "You told me this code was unreachable you idiot" instead of an ambiguous error, but then again it's mainly my stupidity anyways.
Problem fixed now though, and all the tests are passing!
Added main menu page
New main menu page and styling has been added -- this template (the stuff surrounding the actual menu items) is going to be what we probably use for the whole shopping/menu experience up until the cart stage. TODO -- I fucked up and I think we should move the logo into the header along with the "Menu" title. I set up a grid to do that but haven't had a chance to do it yet, but that's what it's there for. I'll have to refactor a lot of the code to fix the layout, but it'll look way better once that's done.
Adds eight vox hairstyles because why not and stuff (#22573)
- god i hate myself
- donedone
- fixxxxx
Removes Beach Bums, Adds Althland Excavation Pit (#22315)
- replace
- Update lavaland_biodome_beach.dmm
- fixes
- we are so BACK bros
- oh yeah, now were cookin
- turf
- oops!
- Update lavaland.dm
- work you fuck
- donedonedoneeeeeee
Adds the WT-551, Unskyrats the WT-550 ammo (#655)
This adds the WT-551. A remade version of the WT-550 that is worse in every way. Fortunately, that means that it is balanced enough to be put in NanoTrasen armories.
Compared to the WT-550, it is bulkier and slightly slower (0.3 second fire delay compared to 0.2). Additionally, it is commonly used with rubber-tipped rounds or FlatHead rounds, which are a special surplus of ammo that deals less damage and has no wounding, embedding, or penetrative power. Regular ammo can be purchased from cargo or researched later, with special ammo also being available later.
Note that this does not replace the WT-550.
Flathead ammo deals 18 brute damage (compared to the original 20), and 5 stamina damage per hit. It is extremely weak against armor, has no embed chance, has virtually no wounding chance. It's perfect for cheap corporate companies dealing with cheaper personnel. This is the type of lethal ammo that security will use for the gun, unless someone speedruns weapon research.
Flathead rounds and Rubber rounds for the WT-550/WT-551 can be researched for 2500 points after unlocking the "Weapon Development Technology" node.
Regular rounds and AP rounds for the WT-550/WT-551 can be researched for 5000 points after unlocking the "Advanced Weapon Development Technology" and "Basic WT-550/WT-551 Ammunition" nodes.
Incendiary rounds for the WT-550/WT-551 can be researched for 7500 points after unlocking the "Illegal Technology", "Exotic Ammo", and "Advanced WT-550/WT-551 Ammunition" nodes.
Removes the WT-550 ammo from syndicate research since it is now redundant.
WT-551 rifles can be ordered in pairs (2) for the cost of a parrot, a grilling starter pack, or a crab rocket (1600 credits). This value was chosen because it is slightly higher than the thermal pistols, and the traitor-ordered WT-550 rifle pack (which contains lethal ammo + spare lethal ammo).
Additional FlatHead, Rubber, and Regular ammo can be ordered from cargo as well.
Cargo techs no longer get WT-550s in the mail, but instead WT-551s (why was this a thing holy shit).
2 WT-551s can be found in the armory. For balance purposes one (1) laser rifle was removed.
Unfucks the WT-550 ammo types by removing their dumb names and changed caliber types.
Unfucks the WT-550 ammo in the ammo printer so that rubber rounds can be printed at T0 and everything else (except incendiary rounds) can be printed with the adv munitions disk.
The bullets for the WT-550 have been forcibly changed to /tg/ balance, which means that any and all future Skyrat PRs cannot touch the damage values for it (unless some fuckery occurs, idk).
Security is in dire need of actual ballistics. /tg/ removed ballistics from security because of reasons I legitimately don't think are valid. It's also a huge balance concern for security not to have at least 1 ballistic weapon (other than the shotgun) because it doesn't stop antags from hoarding laser immunity or meds.
Also guns are cool.
🆑 BurgerBB add: Adds the WT-551 rifle, a redesign of the WT-550 rifle that is balanced (citation needed) for security use. add: Makes WT-550 ammo researchable and orderable from cargo. Removes WT-550 ammo from syndicate research, and gives them their own categories. /🆑
Co-authored-by: StrangeWeirdKitten 95130227+StrangeWeirdKitten@users.noreply.github.com Co-authored-by: ReturnToZender donwest947@gmail.com
my fucking god please work fully for the love of god, i wanna go back and become a looser and play genshin ffs
Fixes rock sprites ingame [WHOOPS] (#2332)
Rocks were invisible in game due to a recently merged PR of mine. this is why we testmerge PRs! anyways this should fix them.
Adds flora and rock missing texture sprites to most flora files to prevent something like this from ever happening again.
invisible things that block movement bad yeah. i want to fix my mistakes.
🆑 fix: Most rocks are now visible again add: Most flora files now have missing texture sprites to make it easier to spot when something has gone wrong. /🆑
feat(app): Update robots from USB flash drive (#13923)
- feat(app-shell-odd): watch for USB drives
The Flex operating system automatically mounts the filesystems of well-formatted USB drives (FAT and ext4 and maybe ntfs but that's a bit iffy) to /media when those USB drives are inserted on the robot. In theory it will in fact do this for any kind of media that presents a filesystem interface.
To that end, add a node task that will use a node filesystem watch to keep an eye on /media, and
- when something that looks like a USB drive (/media/sd\w\d+) appears, notify via redux actions
- then enumerate all the files on it and notify those via redux actions
- when something we were keeping an eye on disappears, notify via redux actions
The redux actions don't alter state and so don't need new reducers or selectors; they exist because it's a handy mechanism to talk between our components.
This code is very tightly coupled to the way the node fs interfaces work and so I don't see a lot of point in unit tests for it; it's almost entirely fs calls originating everything and providing all of the data, and all the complexity is from working around weirdnesses in those calls and in the underlying system. For instance,
- There's a little bit of time in between when the fs watch on /media fires and when you can actually find the contents of the newly-present directory; if you readdir before that you'll get an empty list, so we wait a second
- The node fs.watch interface looks very fully featured but is absolutely chock-full of warnings about various features not being reliable. A lot of that unreliability is probably across systems and everything works as we expect on linux, but just in case we have a lot of fallbacks for when our callback doesn't get filepaths, etc.
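A toy Python version of the wait-then-enumerate workaround for the /media watch described above (the real code is a Node fs.watch task; the regex and the one-second delay mirror the description, everything else is illustrative):

```python
import os
import re
import time

MEDIA_ROOT = "/media"
USB_DIR_RE = re.compile(r"sd\w\d+")  # mirrors the /media/sd\w\d+ pattern above

def enumerate_usb_drive(dirname):
    """Return the files on a newly-appeared drive, or None if it isn't one / vanished."""
    if not USB_DIR_RE.fullmatch(dirname):
        return None
    time.sleep(1)  # contents aren't readable immediately after the watch fires
    try:
        return sorted(os.listdir(os.path.join(MEDIA_ROOT, dirname)))
    except FileNotFoundError:
        return None  # the drive was yanked between the event and the readdir
```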
- fix(app-shell-odd): handle errors in readstreams in http.post
We have our custom http interface that wraps around node-fetch that provides things like "doing your own read stream when posting a file", and "mapping everything into the promise interface", which is nice, but has an issue specifically for that read stream: we don't monitor errors on it. Read streams surface errors by emitting an 'error' event; we hook up a listener to that error event while we're creating the stream, but then we disconnect it. So if you have an error in the stream - for instance, you're reading from a file on a USB flash drive and the user unplugs the flash drive - then the error will never get surfaced.
Unfortunately the fix to this is a bit fiddly. We can hook up an error listener fine, but it needs to do something; specifically, it needs to turn the error from a callback into a promise rejection. That means it needs to have a promise to reject that has the same lifetime as the stream itself. http.post didn't provide that because it returns a whole big promise chain, and each time you move a link in that chain the old promise is gone and a new one happens, so we'd need to move the listener around.
Since promises are monadic, a better fix is to have post return a single promise and do all the promise chaining inside that promise; then, the read stream error handler can reject the outer promise directly, while relying on promises bubbling up rejections to preserve error handling capability for the promises in the internal chain.
- fix(app): Poll for updates on the ODD
Though we have everything set up to automatically fetch, prompt for, and execute robot updates from the ODD, we weren't actually checking for those updates except once on boot (which then wouldn't work if the robot wasn't internet-connected during boot). This means in particular that the software updates during onboarding were guaranteed to fail.
We can use the same hook in the ODD app root that we do in the desktop app route, but if we're going to do that then we better remove a log message that suddenly becomes extremely spammy.
- feat(app-shell-odd): Supply "system updates" from flash drives
Adds the capability to provide system updates from flash drives to the ODD app-shell.
These are "system updates" in that the app-shell determines their availability and provides it to the app, rather than the user indicating the presence of a file alongside their intent to update. The app-shell will advertise the flash drive updates in the same way it advertises internet-discovered updates, with a RobotUpdateInfo redux message; since those now provide the path to the file they mean, it will be easy for the app to specify the system update to load.
We can duplicate the logic that we use for system updates by adding a second let cache for the "current update"; the system-updates code will then prefer an update in the mass storage update cache to an update in the old system updates cache, and send new robot update info messages in all the state changes between neither cache being full; either cache being full; and both caches being full.
The determination that a flash drive system update is present is triggered by a mass storage enumerated message; when that flash drive gets removed, we'll get a removal message.
To figure out whether updates are actually present, we can check the list of files that just got enumerated for things that end with .zip, and then try to open them as zip files and read the VERSION.json information out of them. This is a somewhat fraught process; the file could not be a zip file, it could be a zip file but corrupted, it could be a zip file but not an update, it could be an update but it's for an OT-2, and we need to handle all that, so there's a pretty excessive amount of error handling in here. Once we're sure that there are one or more zip files containing robot system updates, we can provide something to redux; we provide the highest-version update present.
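Roughly, that flash-drive scan could look like the following Python sketch (the app-shell itself is TypeScript; the VERSION.json field names here are assumptions):

```python
import json
import zipfile

def find_best_update(file_paths):
    """Return (version, path) of the highest-version update zip, or None."""
    candidates = []
    for path in file_paths:
        if not path.endswith(".zip"):
            continue
        try:
            with zipfile.ZipFile(path) as zf:
                info = json.loads(zf.read("VERSION.json"))
        except (OSError, zipfile.BadZipFile, KeyError, json.JSONDecodeError):
            continue  # not a zip, corrupt, or not an update bundle: ignore it
        if info.get("robot_type") == "OT-2":  # field name is an assumption
            continue  # OT-2 bundles don't apply to this robot
        version = info.get("version")
        if version:
            candidates.append((version, path))
    # naive max on the version string; a real implementation would parse it properly
    return max(candidates) if candidates else None
```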
There is one way in which updates from flash drives differ from system updates found on the internet, however: plugging in a flash drive requires user intent, while checking for updates on the internet doesn't. Therefore, if the user plugs in a flash drive with an update file, we always want to make that update file available no matter the relative versions of the robot and the update file. So we can add a bool to the system update message (and then to the update state) that shows that this is a "forced notification" update, and the app can know to display it without caring about the upgrade/downgrade/reinstall state.
Since there's a lot of duplication, we can also factor out some common logic to make it feel a little better.
That process of duplication also fixes a bug that would have prevented the ODD from ever prompting for updates. The function that gets information about updates used the same promise to read the release notes and provide the update information; but we overrode the downloaded release files to null out the release notes, meaning that promise would always fail, and we'd never get the notification. We no longer override the release notes to be null, and we also treat reading the release notes separately from reading the rest of the update.
- feat(app): allow robot updates from USB files
Now that the ODD app-shell provides us with notifications of updates from USB flash drives, we can allow the user to install them. While the redux mechanisms allow this pretty easily - a system update is a system update, after all, and with the force mechanism the app wouldn't even know whether the update was a downgrade - we ran into a problem: the general robot update machinery in the ODD was very tightly bound to the onboarding experience, since that's the context in which it was developed.
This commit extracts the robot update mechanisms from onboarding by
- Hoisting onboarding-related logic out of lower-level components and instead injecting it into the organisms from the top-level page
- Moving the current update page to a new route focused on onboarding, and copying just the update-related code to a generic RobotUpdate page
This means that the two pages - RobotUpdate and RobotUpdateDuringOnboarding - share most of the same code but are bound to different routes, and can have different top-level behavior because different contexts are injected into the finish and error handling of the update. RobotUpdateDuringOnboarding sets the unfinished-onboarding breadcrumbs appropriately, uses display language that treats the update as just one component of the larger workflow, and moves on to estop handling when cancelled; RobotUpdate doesn't touch any of that, goes back to the settings page when cancelled, and uses wording more appropriate to being its own top-level flow.
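A sketch of how the two routes can wrap the same flow with different injected behavior (component, prop, and route names here are hypothetical):

```typescript
// Hypothetical sketch: the same update flow bound to two routes, with
// different finish/cancel behavior injected from the page level.
import * as React from 'react'
import { useNavigate } from 'react-router-dom'

interface RobotUpdateFlowProps {
  phrasing: 'onboarding' | 'standalone'
  onCancel: () => void
  onFinish: () => void
}

// Stand-in for the shared update organism; the real component lives elsewhere.
declare function RobotUpdateFlow(props: RobotUpdateFlowProps): JSX.Element

export function RobotUpdate(): JSX.Element {
  const navigate = useNavigate()
  return (
    <RobotUpdateFlow
      phrasing="standalone"
      onCancel={() => navigate('/robot-settings')}
      onFinish={() => navigate('/robot-settings')}
    />
  )
}

export function RobotUpdateDuringOnboarding(): JSX.Element {
  const navigate = useNavigate()
  return (
    <RobotUpdateFlow
      phrasing="onboarding"
      onCancel={() => navigate('/emergency-stop')} // onboarding moves on to estop handling
      onFinish={() => navigate('/onboarding/next-step')}
    />
  )
}
```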
Closes RAUT-829
Fixes Shaving Beards + Mirror Code Improvement (#79529)
Fixes #79519
Basically, we made a lot of assumptions that we really shouldn't have made in the whole magical mirror framework (like having a boolean value for magical mirrors, what?). Anyway, I made the UX a lot better for bearded characters with feminine physiques: there's now an additional confirmatory prompt (with details) for easily shaving off the beard, while the nature of the magical mirror (giving you a swagadocious beard due to magic:tm:) stays intact.
There was a lot of convoluted code that skipped past the quality filter checks (it was me, I think), so let's both make the code far easier to grasp and ensure that people who legitimately acquire beards and wish to keep them can keep them.
We were also doing some FUCK shit on attack_hand and the like (overriding a FALSE return signal to return TRUE is not what we should be doing there)- so that's also cleaned up.
🆑 fix: Both magic mirrors and regular mirrors are far better at respecting the choice of the beard you wish to wear (within reason, of course). /🆑
I HATE FUCKING CSS. ROT IN HELL CSS!!! Deleted the styles for header links because of React's NavLink isActive (the className callback actually receives an object with properties, which isn't spelled out in the DOCUMENTATION) and conflicts with other styles; I couldn't add highlighting, and honestly my way of doing highlighting is shit anyway.
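For reference, React Router v6 passes an object (not a bare boolean) to the NavLink className callback, which appears to be the behavior being complained about; a minimal working pattern looks like this:

```typescript
// React Router v6: NavLink's className prop accepts a callback that receives
// an object with an isActive property, not a bare boolean.
import { NavLink } from 'react-router-dom'

export function HeaderLink({ to, label }: { to: string; label: string }) {
  return (
    <NavLink
      to={to}
      className={({ isActive }) =>
        isActive ? 'header-link header-link--active' : 'header-link'
      }
    >
      {label}
    </NavLink>
  )
}
```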
commit: give a hint when a commit message has been abandoned
If we launch an editor for the user to create a commit message, they may put significant work into doing so. Typically we try to check common mistakes that could cause the commit to fail early, so that we die before the user goes to the trouble.
We may still experience some errors afterwards, though; in this case, the user is given no hint that their commit message has been saved. Let's tell them where it is.
Signed-off-by: Jeff King peff@peff.net
New turret sprites, Patcher gun! and stuff
(4 - 16 / 1 / 2022)
*) The tracing of the advanced Plasma Rifle will now stop chasing after 5 seconds of flight.
*) The Hell-trigger powerup is now added to the shop.
*) Refactored the weapon names and descriptions, creating the Language.weapons file.
!+) Finally a new we-TOOL, yes tool! The Patcher Gun!
// Fixes turrets/dispensers/drones, paying 10 credits per valid shot!
// Shows the building's health at the aim of this tool!
// Also capable of stunning enemies!
// 5% chance of an Experience Point for each turret fix!
+) New sprites for the bullet, plasma, rocket and shotgun turrets!
// I'm fixing the disappearance of the turret bases and other bugs with these sprites; I'm tired.
+) New sprites for the Temperance rune!
[Eval] Add Chinese Homophonic (#1169)
🚨 Please make sure your PR follows these guidelines, failure to follow the guidelines below will result in the PR being closed automatically. Note that even if the criteria are met, that does not guarantee the PR will be merged nor GPT-4 access be granted. 🚨
PLEASE READ THIS:
In order for a PR to be merged, it must fail on GPT-4. We are aware that right now, users do not have access, so you will not be able to tell if the eval fails or not. Please run your eval with GPT-3.5-Turbo, but keep in mind as we run the eval, if GPT-4 gets higher than 90% on the eval, we will likely reject it since GPT-4 is already capable of completing the task.
We plan to roll out a way for users submitting evals to see the eval performance on GPT-4 soon. Stay tuned! Until then, you will not be able to see the eval performance on GPT-4. Starting April 10, the minimum eval count is 15 samples, we hope this makes it easier to create and contribute evals.
Also, please note that we're using Git LFS for storing the JSON files, so please make sure that you move the JSON file to Git LFS before submitting a PR. Details on how to use Git LFS are available here.
Understand Chinese Homophonic
We collected some popular homophonic sentences from the Internet, covering both Chinese renderings of English-word pronunciations and Chinese-Chinese homophones, and provide several options so the model can determine which one best matches the homophonic sentence.
Chinese homophonic puns are a widely popular internet cultural phenomenon that generates humor by utilizing the homophonic relationships between Chinese characters. These puns are typically spread in text form on social media, forums, and messaging applications, and they are extremely common in China's online culture.
Homophonic puns have a wide range of applications, encompassing ordinary daily life scenarios as well as hot news events, entertainment gossip, and political current affairs. These puns frequently appear in internet memes, jokes, advertising slogans, and short videos, garnering significant popularity among young people and internet users.
For those unfamiliar with them, homophonic puns may seem like encrypted text, making it difficult to grasp the true intention behind them. However, understanding them allows for the establishment of strong connections between individuals and facilitates smooth communication.
Below are some of the criteria we look for in a good eval. In general, we are seeking cases where the model does not do a good job despite being capable of generating a good response (note that there are some things large language models cannot do, so those would not make good evals).
Your eval should be:
- Thematically consistent: The eval should be thematically consistent. We'd like to see a number of prompts all demonstrating some particular failure mode. For example, we can create an eval on cases where the model fails to reason about the physical world.
- Contains failures where a human can do the task, but either GPT-4 or GPT-3.5-Turbo could not.
- Includes good signal around what is the right behavior. This means either a correct answer for Basic evals or the Fact Model-graded eval, or an exhaustive rubric for evaluating answers for the Criteria Model-graded eval.
- Include at least 15 high-quality examples.
If there is anything else that makes your eval worth including, please document it below.
Insert what makes your eval high quality that was not mentioned above. (Not required)
Your eval should
- Check that your data is in evals/registry/data/{name}
- Check that your YAML is registered at evals/registry/evals/{name}.yaml
- Ensure you have the right to use the data you submit via this eval
(For now, we will only be approving evals that use one of the existing eval classes. You may still write custom eval classes for your own cases, and we may consider merging them in the future.)
By contributing to Evals, you are agreeing to make your evaluation logic and data under the same MIT license as this repository. You must have adequate rights to upload any data used in an Eval. OpenAI reserves the right to use this data in future service improvements to our product. Contributions to OpenAI Evals will be subject to our usual Usage Policies (https://platform.openai.com/docs/usage-policies).
- I agree that my submission will be made available under an MIT license and complies with OpenAI's usage policies.
If your submission is accepted, we will be granting GPT-4 access to a limited number of contributors. Access will be given to the email address associated with the commits on the merged pull request.
- I acknowledge that GPT-4 access will only be granted, if applicable, to the email address used for my merged pull request.
We know that you might be excited to contribute to OpenAI's mission, help improve our models, and gain access to GPT-4. However, due to the requirements mentioned above and the high volume of submissions, we will not be able to accept all submissions and thus not grant everyone who opens a PR GPT-4 access. We know this is disappointing, but we hope to set the right expectation before you open this PR.
- I understand that opening a PR, even if it meets the requirements above, does not guarantee the PR will be merged nor GPT-4 access be granted.
- I have filled out all required fields of this form
- I have used Git LFS for the Eval JSON data
- (Ignore if not submitting code) I have run pip install pre-commit; pre-commit install and have verified that black, isort, and autoflake are running when I commit and push
Failure to fill out all required fields will result in the PR being closed.
Since we are using Git LFS, we are asking eval submitters to add in as many Eval Samples (at least 5) from their contribution here:
View evals in JSON
{"input": [{"role": "system", "content": "The following are multiple
choice questions (with answers) about Chinese homonym. Answer the
question with english letter \"A\", \"B\" only, without explanation.
Reply with only the option letter."}, {"role": "user", "content":
"一天小鸭对小鸡表白:小鸡,我爱你。小鸡:你duck不必。这句话中的\"duck\"是什么意思?\nA. 鸭子\nB. 大可"}],
"ideal": ["B"]}
{"input": [{"role": "system", "content": "The following are multiple
choice questions (with answers) about Chinese homonym. Answer the
question with english letter \"A\", \"B\" only, without explanation.
Reply with only the option letter."}, {"role": "user", "content":
"丑的人才有对象,美的卖空调。这句话中的\"美的\"是什么意思?\nA. 漂亮的\nB. 空调公司"}], "ideal": ["B"]}
{"input": [{"role": "system", "content": "The following are multiple
choice questions (with answers) about Chinese homonym. Answer the
question with english letter \"A\", \"B\" only, without explanation.
Reply with only the option letter."}, {"role": "user", "content":
"我是一只小绵羊,我今天剪毛了,我失绵了。这句话中的\"失绵\"表达意思?\nA. 失眠\nB. 没有了羊毛"}], "ideal":
["A"]}
{"input": [{"role": "system", "content": "The following are multiple
choice questions (with answers) about Chinese homonym. Answer the
question with english letter \"A\", \"B\" only, without explanation.
Reply with only the option letter."}, {"role": "user", "content":
"以后我的吉祥物决定就是你了,螃蟹!——因为,你有钱(钳)。这句话中的\"钳\"是什么意思?\nA. 有钱\nB. 螃蟹的钳子"}],
"ideal": ["A"]}
{"input": [{"role": "system", "content": "The following are multiple
choice questions (with answers) about Chinese homonym. Answer the
question with english letter \"A\", \"B\" only, without explanation.
Reply with only the option letter."}, {"role": "user", "content":
"女孩对爸爸说\"爸比,我们去哪啊\"爸爸没听见,妈妈笑了一下,女孩对妈妈说\"妈比,你笑什么\"妈妈打了她一巴掌。妈妈为什么打她?\nA.
她提出了不合理的要求\nB. 她骂人了"}], "ideal": ["B"]}
{"input": [{"role": "system", "content": "The following are multiple
choice questions (with answers) about Chinese homonym. Answer the
question with english letter \"A\", \"B\" only, without explanation.
Reply with only the option letter."}, {"role": "user", "content":
"天气这么热,我们总会熟的。这句话中的\"熟的\"是什么意思?\nA. 热熟了\nB. 熟悉了"}], "ideal": ["B"]}
{"input": [{"role": "system", "content": "The following are multiple
choice questions (with answers) about Chinese homonym. Answer the
question with english letter \"A\", \"B\" only, without explanation.
Reply with only the option letter."}, {"role": "user", "content":
"我好像胖了,没事我陪你减肥,我们戒荤叭。这句话中的\"戒荤\"是什么意思?\nA. 吃素食\nB. 结婚"}], "ideal":
["B"]}
Co-authored-by: oscar oscar@hellotalk.com
Add Korean honorific sentence classification eval (#1181)
🚨 Please make sure your PR follows these guidelines, failure to follow the guidelines below will result in the PR being closed automatically. Note that even if the criteria are met, that does not guarantee the PR will be merged nor GPT-4 access be granted. 🚨
PLEASE READ THIS:
In order for a PR to be merged, it must fail on GPT-4. We are aware that right now, users do not have access, so you will not be able to tell if the eval fails or not. Please run your eval with GPT-3.5-Turbo, but keep in mind as we run the eval, if GPT-4 gets higher than 90% on the eval, we will likely reject it since GPT-4 is already capable of completing the task.
We plan to roll out a way for users submitting evals to see the eval performance on GPT-4 soon. Stay tuned! Until then, you will not be able to see the eval performance on GPT-4. Starting April 10, the minimum eval count is 15 samples, we hope this makes it easier to create and contribute evals.
Also, please note that we're using Git LFS for storing the JSON files, so please make sure that you move the JSON file to Git LFS before submitting a PR. Details on how to use Git LFS are available here.
korean-honorific
Evaluates LLMs on the task of classifying Korean honorific/non-honorific sentences.
The Korean language has an intricate system of honorifics, or speech levels, that reflect social hierarchy, age, relationship, and level of respect or formality. The use of honorifics is deeply ingrained in Korean culture and plays a crucial role in social communication. Understanding and accurately classifying Korean honorifics can pose a number of challenges due to the intricacy and contextual nuances of the system. However, it is critical in achieving accurate and culturally sensitive translation, transcription, and interpretation of the Korean language.
Currently, even the most advanced GPT-4 model struggles to correctly classify honorific and non-honorific sentences: for example, "어머니께서 잘 계시는지 말해줘" has a casual, non-honorific tone, but it is misclassified as "honorific", presumably due to the intermediate postposition "께서".
Tracking the ability of evolving language models on this task would help estimate the degree of progress over time, and the task itself could help non-Koreans figure out the nuances of Korean conversation.
Below are some of the criteria we look for in a good eval. In general, we are seeking cases where the model does not do a good job despite being capable of generating a good response (note that there are some things large language models cannot do, so those would not make good evals).
Your eval should be:
- Thematically consistent: The eval should be thematically consistent. We'd like to see a number of prompts all demonstrating some particular failure mode. For example, we can create an eval on cases where the model fails to reason about the physical world.
- Contains failures where a human can do the task, but either GPT-4 or GPT-3.5-Turbo could not.
- Includes good signal around what is the right behavior. This means either a correct answer for Basic evals or the Fact Model-graded eval, or an exhaustive rubric for evaluating answers for the Criteria Model-graded eval.
- Include at least 15 high-quality examples.
If there is anything else that makes your eval worth including, please document it below.
Insert what makes your eval high quality that was not mentioned above. (Not required)
Your eval should
- Check that your data is in evals/registry/data/{name}
- Check that your YAML is registered at evals/registry/evals/{name}.yaml
- Ensure you have the right to use the data you submit via this eval
(For now, we will only be approving evals that use one of the existing eval classes. You may still write custom eval classes for your own cases, and we may consider merging them in the future.)
By contributing to Evals, you are agreeing to make your evaluation logic and data under the same MIT license as this repository. You must have adequate rights to upload any data used in an Eval. OpenAI reserves the right to use this data in future service improvements to our product. Contributions to OpenAI Evals will be subject to our usual Usage Policies (https://platform.openai.com/docs/usage-policies).
- I agree that my submission will be made available under an MIT license and complies with OpenAI's usage policies.
If your submission is accepted, we will be granting GPT-4 access to a limited number of contributors. Access will be given to the email address associated with the commits on the merged pull request.
- I acknowledge that GPT-4 access will only be granted, if applicable, to the email address used for my merged pull request.
We know that you might be excited to contribute to OpenAI's mission, help improve our models, and gain access to GPT-4. However, due to the requirements mentioned above and the high volume of submissions, we will not be able to accept all submissions and thus not grant everyone who opens a PR GPT-4 access. We know this is disappointing, but we hope to set the right expectation before you open this PR.
- I understand that opening a PR, even if it meets the requirements above, does not guarantee the PR will be merged nor GPT-4 access be granted.
- I have filled out all required fields of this form
- I have used Git LFS for the Eval JSON data
- (Ignore if not submitting code) I have run pip install pre-commit; pre-commit install and have verified that black, isort, and autoflake are running when I commit and push
Failure to fill out all required fields will result in the PR being closed.
Since we are using Git LFS, we are asking eval submitters to add in as many Eval Samples (at least 5) from their contribution here:
View evals in JSON
{"input": [{"role": "system", "content": "You'll be prompted a Korean
sentence that is either honorific or non-honorific. Identify whether the
given one is honorific or not. If you think it is honorific, type
'honorific'. If you think it is not honorific, type 'non-honorific'. Do
not type anything else."}, {"role": "user", "content": "그분이 잘 계시는지 물어봐
줘."}], "ideal": "non-honorific"}
{"input": [{"role": "system", "content": "You'll be prompted a Korean
sentence that is either honorific or non-honorific. Identify whether the
given one is honorific or not. If you think it is honorific, type
'honorific'. If you think it is not honorific, type 'non-honorific'. Do
not type anything else."}, {"role": "user", "content": "이 공원에서 자주
걷습니다."}], "ideal": "honorific"}
{"input": [{"role": "system", "content": "You'll be prompted a Korean
sentence that is either honorific or non-honorific. Identify whether the
given one is honorific or not. If you think it is honorific, type
'honorific'. If you think it is not honorific, type 'non-honorific'. Do
not type anything else."}, {"role": "user", "content": "자주 드시나요?"}],
"ideal": "honorific"}
{"input": [{"role": "system", "content": "You'll be prompted a Korean
sentence that is either honorific or non-honorific. Identify whether the
given one is honorific or not. If you think it is honorific, type
'honorific'. If you think it is not honorific, type 'non-honorific'. Do
not type anything else."}, {"role": "user", "content": "아니요, 접점은 없지만
개인적으로 관심이 있습니다."}], "ideal": "honorific"}
{"input": [{"role": "system", "content": "You'll be prompted a Korean
sentence that is either honorific or non-honorific. Identify whether the
given one is honorific or not. If you think it is honorific, type
'honorific'. If you think it is not honorific, type 'non-honorific'. Do
not type anything else."}, {"role": "user", "content": "당신의 취미가
무엇인가요?"}], "ideal": "honorific"}
{"input": [{"role": "system", "content": "You'll be prompted a Korean
sentence that is either honorific or non-honorific. Identify whether the
given one is honorific or not. If you think it is honorific, type
'honorific'. If you think it is not honorific, type 'non-honorific'. Do
not type anything else."}, {"role": "user", "content": "꼭 모으길 바랄게."}],
"ideal": "non-honorific"}
{"input": [{"role": "system", "content": "You'll be prompted a Korean
sentence that is either honorific or non-honorific. Identify whether the
given one is honorific or not. If you think it is honorific, type
'honorific'. If you think it is not honorific, type 'non-honorific'. Do
not type anything else."}, {"role": "user", "content": "그러면 나도
준비해야겠다."}], "ideal": "non-honorific"}
[Eval] Chinese lantern riddles (#1176)
🚨 Please make sure your PR follows these guidelines, failure to follow the guidelines below will result in the PR being closed automatically. Note that even if the criteria are met, that does not guarantee the PR will be merged nor GPT-4 access be granted. 🚨
PLEASE READ THIS:
In order for a PR to be merged, it must fail on GPT-4. We are aware that right now, users do not have access, so you will not be able to tell if the eval fails or not. Please run your eval with GPT-3.5-Turbo, but keep in mind as we run the eval, if GPT-4 gets higher than 90% on the eval, we will likely reject it since GPT-4 is already capable of completing the task.
We plan to roll out a way for users submitting evals to see the eval performance on GPT-4 soon. Stay tuned! Until then, you will not be able to see the eval performance on GPT-4. Starting April 10, the minimum eval count is 15 samples, we hope this makes it easier to create and contribute evals.
Also, please note that we're using Git LFS for storing the JSON files, so please make sure that you move the JSON file to Git LFS before submitting a PR. Details on how to use Git LFS are available here.
chinese-lantern-riddles
This evaluation tests the model's performance in solving Chinese lantern riddles, which are based on the shape, pronunciation, and meaning of Chinese characters.
Lantern riddles are a traditional Chinese festive activity that involves multiple participants guessing riddles together. Apart from being a part of festival celebrations, lantern riddles can also serve as an educational tool to help Chinese language learners enhance their vocabulary and language reasoning. Through the process of unraveling the riddles, students can also develop their logical thinking and reasoning skills, as well as nurture their imagination and creativity. Lantern riddles can also spark students' interest in language learning and make the learning experience more enjoyable.
Although LLMs can, to some extent, decompose Chinese characters into parts, as mentioned in #511, they still face challenges when it comes to solving riddles. In most cases, GPT-3.5 cannot reason correctly about the structure of Chinese characters. For instance, the riddle "上下一体(打一字)" can be interpreted as a combination ("一体") of "上" and "下", giving the answer "卡"; however, GPT-3.5 gives the wrong answer, "升", with a reason that makes no sense. A similar situation occurs when GPT-3.5 reasons about the pronunciation of Chinese characters: one of its explanations states that the pronunciation of "盼(pàn)" is similar to that of "俄(é)", which is entirely incorrect. On the positive side, GPT-3.5 performs well when a riddle only encodes meaning and does not require reasoning about structure or pronunciation.
Below are some of the criteria we look for in a good eval. In general, we are seeking cases where the model does not do a good job despite being capable of generating a good response (note that there are some things large language models cannot do, so those would not make good evals).
Your eval should be:
- Thematically consistent: The eval should be thematically consistent. We'd like to see a number of prompts all demonstrating some particular failure mode. For example, we can create an eval on cases where the model fails to reason about the physical world.
- Contains failures where a human can do the task, but either GPT-4 or GPT-3.5-Turbo could not.
- Includes good signal around what is the right behavior. This means either a correct answer for Basic evals or the Fact Model-graded eval, or an exhaustive rubric for evaluating answers for the Criteria Model-graded eval.
- Include at least 15 high-quality examples.
If there is anything else that makes your eval worth including, please document it below.
Insert what makes your eval high quality that was not mentioned above. (Not required)
Your eval should
- Check that your data is in evals/registry/data/{name}
- Check that your YAML is registered at evals/registry/evals/{name}.yaml
- Ensure you have the right to use the data you submit via this eval
(For now, we will only be approving evals that use one of the existing eval classes. You may still write custom eval classes for your own cases, and we may consider merging them in the future.)
By contributing to Evals, you are agreeing to make your evaluation logic and data under the same MIT license as this repository. You must have adequate rights to upload any data used in an Eval. OpenAI reserves the right to use this data in future service improvements to our product. Contributions to OpenAI Evals will be subject to our usual Usage Policies (https://platform.openai.com/docs/usage-policies).
- I agree that my submission will be made available under an MIT license and complies with OpenAI's usage policies.
If your submission is accepted, we will be granting GPT-4 access to a limited number of contributors. Access will be given to the email address associated with the commits on the merged pull request.
- I acknowledge that GPT-4 access will only be granted, if applicable, to the email address used for my merged pull request.
We know that you might be excited to contribute to OpenAI's mission, help improve our models, and gain access to GPT-4. However, due to the requirements mentioned above and the high volume of submissions, we will not be able to accept all submissions and thus not grant everyone who opens a PR GPT-4 access. We know this is disappointing, but we hope to set the right expectation before you open this PR.
- I understand that opening a PR, even if it meets the requirements above, does not guarantee the PR will be merged nor GPT-4 access be granted.
- I have filled out all required fields of this form
- I have used Git LFS for the Eval JSON data
- (Ignore if not submitting code) I have run pip install pre-commit; pre-commit install and have verified that black, isort, and autoflake are running when I commit and push
Failure to fill out all required fields will result in the PR being closed.
Since we are using Git LFS, we are asking eval submitters to add in as many Eval Samples (at least 5) from their contribution here:
View evals in JSON
{"input": [{"role": "user", "content":
"以下灯谜的谜底是什么(请从汉字的形、音、意等角度考虑)?请给出答案,并给出依据。\n一撇(打一字)。"}], "ideal": ["厂"]}
{"input": [{"role": "user", "content":
"以下灯谜的谜底是什么(请从汉字的形、音、意等角度考虑)?请给出答案,并给出依据。\n内里有人(打一字)。"}], "ideal":
["肉"]}
{"input": [{"role": "user", "content":
"以下灯谜的谜底是什么(请从汉字的形、音、意等角度考虑)?请给出答案,并给出依据。\n二三四五六七八九(打一成语)。"}], "ideal":
["缺衣少食"]}
{"input": [{"role": "user", "content":
"以下灯谜的谜底是什么(请从汉字的形、音、意等角度考虑)?请给出答案,并给出依据。\n谜底在山东(打一国家名)。"}], "ideal":
["秘鲁"]}
{"input": [{"role": "user", "content":
"以下灯谜的谜底是什么(请从汉字的形、音、意等角度考虑)?请给出答案,并给出依据。\n身穿红衣,常年哨放,遇紧急事,往火里闯(打一日常用品)。"}],
"ideal": ["灭火器"]}
[eval] Chinese Idioms evulation (#1163)
🚨 Please make sure your PR follows these guidelines, failure to follow the guidelines below will result in the PR being closed automatically. Note that even if the criteria are met, that does not guarantee the PR will be merged nor GPT-4 access be granted. 🚨
PLEASE READ THIS:
In order for a PR to be merged, it must fail on GPT-4. We are aware that right now, users do not have access, so you will not be able to tell if the eval fails or not. Please run your eval with GPT-3.5-Turbo, but keep in mind as we run the eval, if GPT-4 gets higher than 90% on the eval, we will likely reject it since GPT-4 is already capable of completing the task.
We plan to roll out a way for users submitting evals to see the eval performance on GPT-4 soon. Stay tuned! Until then, you will not be able to see the eval performance on GPT-4. Starting April 10, the minimum eval count is 15 samples, we hope this makes it easier to create and contribute evals.
Also, please note that we're using Git LFS for storing the JSON files, so please make sure that you move the JSON file to Git LFS before submitting a PR. Details on how to use Git LFS are available here.
chinese_idioms
Checks the model's ability to recognize Chinese idioms, which are words whose meanings differ from their literal ones.
The Chinese idioms on the website are interesting and commonly used by a lot of Chinese people. However, GPT-4 and GPT-3.5 fail to explain the meaning of the idioms correctly.
Below are some of the criteria we look for in a good eval. In general, we are seeking cases where the model does not do a good job despite being capable of generating a good response (note that there are some things large language models cannot do, so those would not make good evals).
Your eval should be:
- [x] Thematically consistent: The eval should be thematically consistent. We'd like to see a number of prompts all demonstrating some particular failure mode. For example, we can create an eval on cases where the model fails to reason about the physical world.
- [x] Contains failures where a human can do the task, but either GPT-4 or GPT-3.5-Turbo could not.
- [x] Includes good signal around what is the right behavior. This means either a correct answer for Basic evals or the Fact Model-graded eval, or an exhaustive rubric for evaluating answers for the Criteria Model-graded eval.
- [x] Include at least 15 high-quality examples.
If there is anything else that makes your eval worth including, please document it below.
Insert what makes your eval high quality that was not mentioned above. (Not required)
Your eval should
- [x] Check that your data is in evals/registry/data/{name}
- [x] Check that your YAML is registered at evals/registry/evals/{name}.yaml
- [x] Ensure you have the right to use the data you submit via this eval
(For now, we will only be approving evals that use one of the existing eval classes. You may still write custom eval classes for your own cases, and we may consider merging them in the future.)
By contributing to Evals, you are agreeing to make your evaluation logic and data under the same MIT license as this repository. You must have adequate rights to upload any data used in an Eval. OpenAI reserves the right to use this data in future service improvements to our product. Contributions to OpenAI Evals will be subject to our usual Usage Policies (https://platform.openai.com/docs/usage-policies).
- [x] I agree that my submission will be made available under an MIT license and complies with OpenAI's usage policies.
If your submission is accepted, we will be granting GPT-4 access to a limited number of contributors. Access will be given to the email address associated with the commits on the merged pull request.
- [x] I acknowledge that GPT-4 access will only be granted, if applicable, to the email address used for my merged pull request.
We know that you might be excited to contribute to OpenAI's mission, help improve our models, and gain access to GPT-4. However, due to the requirements mentioned above and the high volume of submissions, we will not be able to accept all submissions and thus not grant everyone who opens a PR GPT-4 access. We know this is disappointing, but we hope to set the right expectation before you open this PR.
- [x] I understand that opening a PR, even if it meets the requirements above, does not guarantee the PR will be merged nor GPT-4 access be granted.
- [x] I have filled out all required fields of this form
- [x] I have used Git LFS for the Eval JSON data
- (Ignore if not submitting code) I have run pip install pre-commit; pre-commit install and have verified that black, isort, and autoflake are running when I commit and push
Failure to fill out all required fields will result in the PR being closed.
Since we are using Git LFS, we are asking eval submitters to add in as many Eval Samples (at least 5) from their contribution here:
View evals in JSON
{"input": [{"role": "user", "content":
"请解释下面词语的意思,请使用英文回答。\n---\n伟光正"}], "ideal": ["From the idiomatic phrase
'the great, glorious and correct Chinese Communist Party', it can also
refer to a person associated with the Chinese Communist Party."]}
{"input": [{"role": "user", "content":
"请解释下面词语的意思,请使用英文回答。\n---\n赵家人"}], "ideal": ["From Lu Xun's famous
middle-grade novel 'A Q Zhengzhuan', it generally refers to the powerful
and noble class of the Chinese Communist Party. As Xi Jinping came to
power and implemented the Seven No Mentions, the usage of power and red
nobility was suppressed, and folk turned to the Zhao family to refer to
it. Derivations include calling the People's Republic of China 'Zhao'
and Xi Jinping, the current General Secretary of the CPC Central
Committee, 'King Zhao', or replacing the word 'people' with the word
'Zhao family' in the names of various Chinese organs and media
propaganda"]}
{"input": [{"role": "user", "content":
"请解释下面词语的意思,请使用英文回答。\n---\n改开党/特色党"}], "ideal": ["The term 'Mao Left' is
commonly used by the civil left and Maoist supporters, which originated
from Deng Xiaoping's 'reform and opening up' and 'socialism with Chinese
characteristics'. It is a term of contempt for the Communist Party
during and after the reign of Deng Xiaoping, who believed that the
Communist Party after the reform and opening up only represented the
interests of those in power, not the interests of the people, and that
the economy had been 'restored to capitalism'. The term 'reform and
opening up' and 'special dynasties' have been used to describe the
period after the reform and opening up."]}
{"input": [{"role": "user", "content":
"请解释下面词语的意思,请使用英文回答。\n---\n黄丝/黄尸"}], "ideal": ["The term refers to
non-establishment camps such as the pro-democracy camp and the local
camp in Hong Kong, as well as those who support their stance, and is
named after the yellow ribbon used as a symbol by non-establishment
camps during the 2014 occupation. Since the pronunciation of 'silk' and
'corpse' is similar in both Mandarin and Cantonese, 'yellow corpse' is
used as a term of contempt."]}
{"input": [{"role": "user", "content":
"请解释下面词语的意思,请使用英文回答。\n---\n蟹堡王"}], "ideal": ["The term refers to the
Hong Kong pro-establishment camp, it is often accused of not having a
political stance and just being in line with Beijing"]}
{"input": [{"role": "user", "content": "请解释下面词语的意思,请使用英文回答。\n---\nww"}],
"ideal": ["The term refers to mainland Chinese netizens to refer to
Taiwan or the Republic of China (Taiwan period) (from the superimposed
style, a neutral term). In January 2022, Taiwan Affairs Office
spokesperson Zhu Fenglian said that the word Wanwan is a nickname for
the Taiwanese people 'Mengmeng' by the Chinese mainlanders"]}
Ordering Randomised VersionList (#1164)
🚨 Please make sure your PR follows these guidelines, failure to follow the guidelines below will result in the PR being closed automatically. Note that even if the criteria are met, that does not guarantee the PR will be merged nor GPT-4 access be granted. 🚨
PLEASE READ THIS:
In order for a PR to be merged, it must fail on GPT-4. We are aware that right now, users do not have access, so you will not be able to tell if the eval fails or not. Please run your eval with GPT-3.5-Turbo, but keep in mind as we run the eval, if GPT-4 gets higher than 90% on the eval, we will likely reject it since GPT-4 is already capable of completing the task.
We plan to roll out a way for users submitting evals to see the eval performance on GPT-4 soon. Stay tuned! Until then, you will not be able to see the eval performance on GPT-4. Starting April 10, the minimum eval count is 15 samples, we hope this makes it easier to create and contribute evals.
Also, please note that we're using Git LFS for storing the JSON files, so please make sure that you move the JSON file to Git LFS before submitting a PR. Details on how to use Git LFS are available here.
Ordering Randomised VersionList
This evaluation tests prompt-engineered failure cases for ordering a randomised version-history list. The model makes chronological-ordering mistakes such as 7.5.2 -> 7.4.2 -> 7.5.1 -> 7.4.1 (incorrectly inserting 7.4.2 between 7.5.2 and 7.5.1, and incorrectly skipping over release 7.5.0 in the explainable-AI chain of thought) and 7.5.2 -> 7.5.1 -> 7.5.0 -> 7.4.1 (incorrectly skipping over 7.4.2 in the explainable-AI chain of thought).
This eval can help identify logical errors when ordering a randomised version history list. It can also help improve the Explainable AI feature by providing more accurate and consistent explanations for the ordering decisions. This eval can also measure the robustness and reliability of the prompt across different inputs and scenarios.
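For clarity, the correct procedure is simply to sort the list chronologically and step back three entries from the target; a short sketch under that assumption:

```typescript
// Sketch: given version/date pairs in arbitrary order, find the version
// released three versions before a target version.
interface Release {
  version: string
  date: string // ISO-style date, e.g. "2019-12-02"
}

function releasedNBefore(releases: Release[], target: string, n: number): string | null {
  const ordered = [...releases].sort(
    (a, b) => new Date(a.date).getTime() - new Date(b.date).getTime()
  )
  const idx = ordered.findIndex(r => r.version === target)
  return idx >= n ? ordered[idx - n].version : null
}

// Example from the eval: chronological order is 7.4.1, 7.4.2, 7.5.0, 7.5.1, 7.5.2,
// so three versions before 7.5.2 is 7.4.2.
const releases: Release[] = [
  { version: '7.5.0', date: '2019-12-02' },
  { version: '7.4.1', date: '2019-10-23' },
  { version: '7.5.1', date: '2019-12-18' },
  { version: '7.5.2', date: '2020-01-21' },
  { version: '7.4.2', date: '2019-10-31' },
]
console.log(releasedNBefore(releases, '7.5.2', 3)) // "7.4.2"
```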
Below are some of the criteria we look for in a good eval. In general, we are seeking cases where the model does not do a good job despite being capable of generating a good response (note that there are some things large language models cannot do, so those would not make good evals).
Your eval should be:
- Thematically consistent: The eval should be thematically consistent. We'd like to see a number of prompts all demonstrating some particular failure mode. For example, we can create an eval on cases where the model fails to reason about the physical world.
- Contains failures where a human can do the task, but either GPT-4 or GPT-3.5-Turbo could not.
- Includes good signal around what is the right behavior. This means either a correct answer for Basic evals or the Fact Model-graded eval, or an exhaustive rubric for evaluating answers for the Criteria Model-graded eval.
- Include at least 15 high-quality examples.
If there is anything else that makes your eval worth including, please document it below.
This eval is high quality because it drops the success rate on a five-option (A-E) multiple-choice quiz from the 20% expected of random guessing to only 0-6% correct for GPT-3.5-Turbo. These are prompt-engineered failures with larger failure rates than prior work; performing so much worse than random is striking for such an easy task.
Your eval should
- Check that your data is in evals/registry/data/{name}
- Check that your YAML is registered at evals/registry/evals/{name}.yaml
- Ensure you have the right to use the data you submit via this eval
(For now, we will only be approving evals that use one of the existing eval classes. You may still write custom eval classes for your own cases, and we may consider merging them in the future.)
By contributing to Evals, you are agreeing to make your evaluation logic and data under the same MIT license as this repository. You must have adequate rights to upload any data used in an Eval. OpenAI reserves the right to use this data in future service improvements to our product. Contributions to OpenAI Evals will be subject to our usual Usage Policies (https://platform.openai.com/docs/usage-policies).
- I agree that my submission will be made available under an MIT license and complies with OpenAI's usage policies.
If your submission is accepted, we will be granting GPT-4 access to a limited number of contributors. Access will be given to the email address associated with the commits on the merged pull request.
- I acknowledge that GPT-4 access will only be granted, if applicable, to the email address used for my merged pull request.
We know that you might be excited to contribute to OpenAI's mission, help improve our models, and gain access to GPT-4. However, due to the requirements mentioned above and the high volume of submissions, we will not be able to accept all submissions and thus not grant everyone who opens a PR GPT-4 access. We know this is disappointing, but we hope to set the right expectation before you open this PR.
- I understand that opening a PR, even if it meets the requirements above, does not guarantee the PR will be merged nor GPT-4 access be granted.
- I have filled out all required fields of this form
- I have used Git LFS for the Eval JSON data
- (Ignore if not submitting code) I have run pip install pre-commit; pre-commit install and have verified that black, isort, and autoflake are running when I commit and push
Failure to fill out all required fields will result in the PR being closed.
Since we are using Git LFS, we are asking eval submitters to add in as many Eval Samples (at least 5) from their contribution here:
View evals in JSON
{"input":[{"role":"user","content":"Here's a list of software versions:
Version 7.5.0 Release Date: December 02, 2019 Version 7.4.1 Release
Date: October 23, 2019 Version 7.5.1 Release Date: December 18, 2019
Version 7.5.2 Release Date: January 21, 2020 Version 7.4.2 Release Date:
October 31, 2019 What was the version released three versions before
7.5.2? A. 7.4.2 B. 7.5.2 C. 7.5.1 D. 7.4.1 E. 7.5.0"}],"ideal":"A.
7.4.2"}
{"input":[{"role":"user","content":"Here's a list of software versions:
Version 7.5.2 Release Date: January 21, 2020 Version 7.4.1 Release Date:
October 23, 2019 Version 7.5.0 Release Date: December 02, 2019 Version
7.4.2 Release Date: October 31, 2019 Version 7.5.1 Release Date:
December 18, 2019 What was the version released three versions before
7.5.2? A. 7.5.2 B. 7.5.1 C. 7.4.1 D. 7.4.2 E. 7.5.0"}],"ideal":"D.
7.4.2"}
{"input":[{"role":"user","content":"Here's a list of software versions:
Version 7.5.1 Release Date: December 18, 2019 Version 7.5.0 Release
Date: December 02, 2019 Version 7.4.1 Release Date: October 23, 2019
Version 7.5.2 Release Date: January 21, 2020 Version 7.4.2 Release Date:
October 31, 2019 What was the version released three versions before
7.5.2? A. 7.5.0 B. 7.4.2 C. 7.5.1 D. 7.4.1 E. 7.5.2"}],"ideal":"B.
7.4.2"}
{"input":[{"role":"user","content":"Here's a list of software versions:
Version 7.5.0 Release Date: December 02, 2019 Version 7.5.1 Release
Date: December 18, 2019 Version 7.4.2 Release Date: October 31, 2019
Version 7.4.1 Release Date: October 23, 2019 Version 7.5.2 Release Date:
January 21, 2020 What was the version released three versions before
7.5.2? A. 7.5.1 B. 7.4.1 C. 7.5.2 D. 7.5.0 E. 7.4.2"}],"ideal":"E.
7.4.2"}
{"input":[{"role":"user","content":"Here's a list of software versions:
Version 7.4.2 Release Date: October 31, 2019 Version 7.5.1 Release Date:
December 18, 2019 Version 7.5.0 Release Date: December 02, 2019 Version
7.5.2 Release Date: January 21, 2020 Version 7.4.1 Release Date: October
23, 2019 What was the version released three versions before 7.5.2? A.
7.4.1 B. 7.5.2 C. 7.4.2 D. 7.5.0 E. 7.5.1"}],"ideal":"C. 7.4.2"}
- The task of ordering a randomised version history list is relatively simple and straightforward for humans, but the AI system fails to follow the basic rules of chronological ordering.
- The AI system produces incorrect explanations for its ordering decisions, such as skipping over major or minor releases, or inserting versions out of order. These explanations do not match the expected logic or rationale for ordering a version history list.
- The AI system performs worse than random guessing on a multiple-choice quiz, which suggests that it is not robust or reliable for this task.
Co-authored-by: jjyuhub tdq459rcfm@privaterelay.appleid.com
[Eval] Determine a gear rotation given a layout (#1136)
🚨 Please make sure your PR follows these guidelines, failure to follow the guidelines below will result in the PR being closed automatically. Note that even if the criteria are met, that does not guarantee the PR will be merged nor GPT-4 access be granted. 🚨
PLEASE READ THIS:
In order for a PR to be merged, it must fail on GPT-4. We are aware that right now, users do not have access, so you will not be able to tell if the eval fails or not. Please run your eval with GPT-3.5-Turbo, but keep in mind as we run the eval, if GPT-4 gets higher than 90% on the eval, we will likely reject it since GPT-4 is already capable of completing the task.
We plan to roll out a way for users submitting evals to see the eval performance on GPT-4 soon. Stay tuned! Until then, you will not be able to see the eval performance on GPT-4. Starting April 10, the minimum eval count is 15 samples, we hope this makes it easier to create and contribute evals.
Also, please note that we're using Git LFS for storing the JSON files, so please make sure that you move the JSON file to Git LFS before submitting a PR. Details on how to use Git LFS are available here.
gears_rotation
Checks the model's ability to determine the rotation of a gear given a disposition of multiple gears and the rotation of one of them.
Tests whether the model is able to "visualize" the arrangement of objects (in this case gears) and reason logically about how the rotation of one specific gear in the grid affects the rotation of the others. GPT-3.5 had an accuracy of 0.16 (4/25 right). GPT-4 (ChatGPT Plus subscription) seems to fail in the same places as 3.5. Both seem able to place the gears in the correct positions in the grid but fail at the logical part. Among the many prompts, both were even asked about the direction of a gear whose rotation had already been stated, and they still got it wrong.
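The underlying reasoning is just parity: meshed gears alternate direction, so a gear an even number of steps along the chain from the reference gear spins the same way, a gear an odd number of steps away spins the opposite way, and a gear that isn't connected to the chain doesn't move. A small sketch of that rule:

```typescript
// Sketch: direction of a gear in a simple chain, based on parity of its
// distance from a reference gear with a known direction.
type Direction = 'clockwise' | 'counterclockwise'

function oppositeOf(d: Direction): Direction {
  return d === 'clockwise' ? 'counterclockwise' : 'clockwise'
}

// stepsAlongChain: number of meshed contacts between the reference gear and
// the target gear (0 means it's the same gear).
function gearDirection(referenceDirection: Direction, stepsAlongChain: number): Direction {
  return stepsAlongChain % 2 === 0 ? referenceDirection : oppositeOf(referenceDirection)
}

// Example from the eval: in the chain A-B-C, if C runs clockwise then B (one
// step away) runs counterclockwise and A (two steps away) runs clockwise.
console.log(gearDirection('clockwise', 1)) // counterclockwise (gear B)
console.log(gearDirection('clockwise', 2)) // clockwise (gear A)
```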
Below are some of the criteria we look for in a good eval. In general, we are seeking cases where the model does not do a good job despite being capable of generating a good response (note that there are some things large language models cannot do, so those would not make good evals).
Your eval should be:
- Thematically consistent: The eval should be thematically consistent. We'd like to see a number of prompts all demonstrating some particular failure mode. For example, we can create an eval on cases where the model fails to reason about the physical world.
- Contains failures where a human can do the task, but either GPT-4 or GPT-3.5-Turbo could not.
- Includes good signal around what is the right behavior. This means either a correct answer for Basic evals or the Fact Model-graded eval, or an exhaustive rubric for evaluating answers for the Criteria Model-graded eval.
- Include at least 15 high-quality examples.
If there is anything else that makes your eval worth including, please document it below.
Insert what makes your eval high quality that was not mentioned above. (Not required)
Your eval should
- Check that your data is in evals/registry/data/{name}
- Check that your YAML is registered at evals/registry/evals/{name}.yaml
- Ensure you have the right to use the data you submit via this eval
(For now, we will only be approving evals that use one of the existing eval classes. You may still write custom eval classes for your own cases, and we may consider merging them in the future.)
By contributing to Evals, you are agreeing to make your evaluation logic and data under the same MIT license as this repository. You must have adequate rights to upload any data used in an Eval. OpenAI reserves the right to use this data in future service improvements to our product. Contributions to OpenAI Evals will be subject to our usual Usage Policies (https://platform.openai.com/docs/usage-policies).
- I agree that my submission will be made available under an MIT license and complies with OpenAI's usage policies.
If your submission is accepted, we will be granting GPT-4 access to a limited number of contributors. Access will be given to the email address associated with the commits on the merged pull request.
- I acknowledge that GPT-4 access will only be granted, if applicable, to the email address used for my merged pull request.
We know that you might be excited to contribute to OpenAI's mission, help improve our models, and gain access to GPT-4. However, due to the requirements mentioned above and the high volume of submissions, we will not be able to accept all submissions and thus not grant everyone who opens a PR GPT-4 access. We know this is disappointing, but we hope to set the right expectation before you open this PR.
- I understand that opening a PR, even if it meets the requirements above, does not guarantee the PR will be merged nor GPT-4 access be granted.
- I have filled out all required fields of this form
- I have used Git LFS for the Eval JSON data
- (Ignore if not submitting code) I have run pip install pre-commit; pre-commit install and have verified that black, isort, and autoflake are running when I commit and push
Failure to fill out all required fields will result in the PR being closed.
Since we are using Git LFS, we are asking eval submitters to add in as many Eval Samples (at least 5) from their contribution here:
View evals in JSON
{"input": [{"role": "system", "content": "You possess expertise in
visualizing the rotation direction of gears that are of the same
dimensions."}, {"role": "user", "content": "You have three gears, each
placed to the left of the other: A-B-C. If gear C is running clockwise,
how is gear C running? Answer clockwise or counterclockwise. No
explanations."}], "ideal": "Clockwise"}
{"input": [{"role": "system", "content": "You possess expertise in
visualizing the rotation direction of gears that are of the same
dimensions."}, {"role": "user", "content": "You have three gears, each
placed one beneath the other: A-B-C. If gear C is running clockwise, how
is gear B running? Answer clockwise or counterclockwise. No
explanations."}], "ideal": "Counterclockwise"}
{"input": [{"role": "system", "content": "You possess expertise in
visualizing the rotation direction of gears that are of the same
dimensions."}, {"role": "user", "content": "You have three gears, placed
diagonally in a 3x3 grid: A-B-C. Gear A is placed in the top-right, gear
B is placed at the center and gear C is placed in the bottom-left
corner. They don't touch each other. If gear C is running clockwise, how
is gear A running? Answer clockwise, counterclockwise or \"not moving\".
No explanations."}], "ideal": "not moving"}
{"input": [{"role": "system", "content": "You possess expertise in
visualizing the rotation direction of gears that are of the same
dimensions."}, {"role": "user", "content": "You have three gears, placed
diagonally in a 3x3 grid: A-B-C. Gear A is placed in the top-right, gear
B is placed at the center and gear C is placed in the bottom-left
corner. If gear C is running clockwise, how is gear A running? Answer
clockwise, counterclockwise or \"not moving\". No explanations."}],
"ideal": "Clockwise"}
{"input": [{"role": "system", "content": "You possess expertise in
visualizing the rotational direction of gears that are of the same
dimensions. Your working area is a 4x4 grid. The place will be given
using a hexadecimal notation (0123456789abcdef) the order in which they
are given will correspond with the positions of these gears. The first 4
char are placed on the first row from left to right. The 5th to 8th on
the second row always from left to right and so on. You will be also
given the rotation of a specific gear. You must be able to determine how
another specific gear is rotating. Answer clockwise or counterclockwise.
No explanations. Don't even use punctuation."}, {"role": "user",
"content": "a1c5e8f7932b46d0, gear 4 is running clockwise. How about
gear d?"}], "ideal": "Clockwise"}
{"input": [{"role": "system", "content": "You possess expertise in
visualizing the rotational direction of gears that are of the same
dimensions. Your working area is a 3x3 grid. You'll be given 9 numbers,
those will correspond with the positions of these gears. The first 3
given numbers are placed on the first row from left to right. The 4th to
6th are on the second row, always from left to right. Same with the
7th-9th on the third row. You will be also given the rotation of a
specific gear. You must be able to determine how another specific gear
is rotating. Answer clockwise or counterclockwise. No explanations.
Don't even use punctuation."}, {"role": "user", "content": "572913864,
gear 2 is rotating counterclockwise. How is number 7 rotating?"}],
"ideal": "Clockwise"}
Simple block puzzles (#1167)
🚨 Please make sure your PR follows these guidelines, failure to follow the guidelines below will result in the PR being closed automatically. Note that even if the criteria are met, that does not guarantee the PR will be merged nor GPT-4 access be granted. 🚨
PLEASE READ THIS:
In order for a PR to be merged, it must fail on GPT-4. We are aware that right now, users do not have access, so you will not be able to tell if the eval fails or not. Please run your eval with GPT-3.5-Turbo, but keep in mind as we run the eval, if GPT-4 gets higher than 90% on the eval, we will likely reject it since GPT-4 is already capable of completing the task.
We plan to roll out a way for users submitting evals to see the eval performance on GPT-4 soon. Stay tuned! Until then, you will not be able to see the eval performance on GPT-4. Starting April 10, the minimum eval count is 15 samples, we hope this makes it easier to create and contribute evals.
Also, please note that we're using Git LFS for storing the JSON files, so please make sure that you move the JSON file to Git LFS before submitting a PR. Details on how to use Git LFS are available here.
Simple 2-Block Arrangement Puzzles
Two Tetris shapes are given, along with a desired arrangement of those shapes. The model must arrange the blocks to match the desired shape outline.
Here's an example of what a prompt/answer would look like:
This kind of spatial reasoning is trivial for a human to do. It should also be a piece of cake for a generally-intelligent AI model.
Below are some of the criteria we look for in a good eval. In general, we are seeking cases where the model does not do a good job despite being capable of generating a good response (note that there are some things large language models cannot do, so those would not make good evals).
Your eval should be:
- Thematically consistent: The eval should be thematically consistent. We'd like to see a number of prompts all demonstrating some particular failure mode. For example, we can create an eval on cases where the model fails to reason about the physical world.
- Contains failures where a human can do the task, but either GPT-4 or GPT-3.5-Turbo could not.
- Includes good signal around what is the right behavior. This means either a correct answer for Basic evals or the Fact Model-graded eval, or an exhaustive rubric for evaluating answers for the Criteria Model-graded eval.
- Include at least 15 high-quality examples.
If there is anything else that makes your eval worth including, please document it below.
This eval was programmatically generated and thus can easily be tweaked to be more difficult, to test different aspects of spatial reasoning, or to generate more cases. I wrote a script to generate this eval that anyone can come in and adjust.
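As a hedged illustration (not the author's actual generation script): once an answer grid with the two shapes' letters has been constructed, the "Please create" target can be derived by masking every letter to an X:

```typescript
// Sketch: derive the puzzle target from a known answer arrangement by
// masking every shape letter to 'X' while preserving the empty cells.
function targetFromAnswer(answer: string[]): string[] {
  return answer.map(row => row.replace(/[A-Za-z]/g, 'X'))
}

// Example using the answer from the prompt's worked example:
const answer = [' AB', 'AABB', 'A B']
console.log(targetFromAnswer(answer))
// [" XX", "XXXX", "X X"]
```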
Your eval should
- Check that your data is in evals/registry/data/{name}
- Check that your YAML is registered at evals/registry/evals/{name}.yaml
- Ensure you have the right to use the data you submit via this eval
(For now, we will only be approving evals that use one of the existing eval classes. You may still write custom eval classes for your own cases, and we may consider merging them in the future.)
By contributing to Evals, you are agreeing to make your evaluation logic and data under the same MIT license as this repository. You must have adequate rights to upload any data used in an Eval. OpenAI reserves the right to use this data in future service improvements to our product. Contributions to OpenAI Evals will be subject to our usual Usage Policies (https://platform.openai.com/docs/usage-policies).
- I agree that my submission will be made available under an MIT license and complies with OpenAI's usage policies.
If your submission is accepted, we will be granting GPT-4 access to a limited number of contributors. Access will be given to the email address associated with the commits on the merged pull request.
- I acknowledge that GPT-4 access will only be granted, if applicable, to the email address used for my merged pull request.
We know that you might be excited to contribute to OpenAI's mission, help improve our models, and gain access to GPT-4. However, due to the requirements mentioned above and the high volume of submissions, we will not be able to accept all submissions and thus cannot grant GPT-4 access to everyone who opens a PR. We know this is disappointing, but we hope to set the right expectations before you open this PR.
- I understand that opening a PR, even if it meets the requirements above, does not guarantee the PR will be merged nor GPT-4 access be granted.
- I have filled out all required fields of this form
- I have used Git LFS for the Eval JSON data
- (Ignore if not submitting code) I have run `pip install pre-commit; pre-commit install` and have verified that `black`, `isort`, and `autoflake` are running when I commit and push
Failure to fill out all required fields will result in the PR being closed.
Since we are using Git LFS, we are asking eval submitters to paste as many Eval Samples as possible (at least 5) from their contribution here:
View evals in JSON
{"input": [{"role": "system", "content": "Arrange the two shapes you'll
be given to match the desired final shape."}, {"role": "user",
"content": "It's time to play a shape game! Your goal is to use arrange
shapes you'll be given into a predefined form. If you can arrange them
into the final form, you win! You may not rotate the shapes. Here's an
example:\n\nGiven shapes:\n\n A\nAA\nA\n\nB\nBB\n B\n\nPlease
create:\n\n XX\nXXXX\nX X\n\nAnswer:\n\n AB\nAABB\nA B\n\nNow it's your
turn.\n\nGiven shapes:\n\nF \nFF\n F\n\n U\nUUU\n\n\nPlease create:\n\n
XX \nXXXXX \n X\n\nReplacing the 'X's with the corresponding letter of
the shape that should occupy each position. Only respond with the final
shape, no commentary."}], "ideal": " UF \nUUUFF \n F"}
{"input": [{"role": "system", "content": "Arrange the two shapes you'll
be given to match the desired final shape."}, {"role": "user",
"content": "It's time to play a shape game! Your goal is to use arrange
shapes you'll be given into a predefined form. If you can arrange them
into the final form, you win! You may not rotate the shapes. Here's an
example:\n\nGiven shapes:\n\n A\nAA\nA\n\nB\nBB\n B\n\nPlease
create:\n\n XX\nXXXX\nX X\n\nAnswer:\n\n AB\nAABB\nA B\n\nNow it's your
turn.\n\nGiven shapes:\n\nGG\nGG\n\nK \nKK\n K\n\n\nPlease create:\n\nX
\nXX \n X \nXX \nXX\n\nReplacing the 'X's with the corresponding letter
of the shape that should occupy each position. Only respond with the
final shape, no commentary."}], "ideal": "K \nKK \n K \nGG \nGG"}
{"input": [{"role": "system", "content": "Arrange the two shapes you'll
be given to match the desired final shape."}, {"role": "user",
"content": "It's time to play a shape game! Your goal is to use arrange
shapes you'll be given into a predefined form. If you can arrange them
into the final form, you win! You may not rotate the shapes. Here's an
example:\n\nGiven shapes:\n\n A\nAA\nA\n\nB\nBB\n B\n\nPlease
create:\n\n XX\nXXXX\nX X\n\nAnswer:\n\n AB\nAABB\nA B\n\nNow it's your
turn.\n\nGiven shapes:\n\nLLL\n L \n\n F\nFF\n F\n\n\nPlease create:\n\n
XXXX \nXX X \n X\n\nReplacing the 'X's with the corresponding letter of
the shape that should occupy each position. Only respond with the final
shape, no commentary."}], "ideal": " FLLL \nFF L \n F"}
{"input": [{"role": "system", "content": "Arrange the two shapes you'll
be given to match the desired final shape."}, {"role": "user",
"content": "It's time to play a shape game! Your goal is to use arrange
shapes you'll be given into a predefined form. If you can arrange them
into the final form, you win! You may not rotate the shapes. Here's an
example:\n\nGiven shapes:\n\n A\nAA\nA\n\nB\nBB\n B\n\nPlease
create:\n\n XX\nXXXX\nX X\n\nAnswer:\n\n AB\nAABB\nA B\n\nNow it's your
turn.\n\nGiven shapes:\n\nWWW\n W\n\n E\nEE\nE \n\n\nPlease create:\n\n
X \nXX \nX \nXXX \n X\n\nReplacing the 'X's with the corresponding
letter of the shape that should occupy each position. Only respond with
the final shape, no commentary."}], "ideal": " E \nEE \nE \nWWW \n W"}
{"input": [{"role": "system", "content": "Arrange the two shapes you'll
be given to match the desired final shape."}, {"role": "user",
"content": "It's time to play a shape game! Your goal is to use arrange
shapes you'll be given into a predefined form. If you can arrange them
into the final form, you win! You may not rotate the shapes. Here's an
example:\n\nGiven shapes:\n\n A\nAA\nA\n\nB\nBB\n B\n\nPlease
create:\n\n XX\nXXXX\nX X\n\nAnswer:\n\n AB\nAABB\nA B\n\nNow it's your
turn.\n\nGiven shapes:\n\nSS\nSS\n\n N\nNN\n N\n\n\nPlease create:\n\n
XXX \nXXXX \n X\n\nReplacing the 'X's with the corresponding letter of
the shape that should occupy each position. Only respond with the final
shape, no commentary."}], "ideal": " NSS \nNNSS \n N"}
THE GRAND RENAMING HAS BEGUN (#481)
- THE GRAND RENAMING HAS BEGUN but holy crap it still doesn't work because of some nbsphinx thing that I don't know how to even begin troubleshooting
- Update .github/PULL_REQUEST_TEMPLATE.md
  I am the goo0dest typer
  Co-authored-by: Benjamin Pedigo benjamindpedigo@gmail.com
- Update README.md
  Co-authored-by: Benjamin Pedigo benjamindpedigo@gmail.com
- Make the build status badge less obnoxious
- Made a sentence actually make sense
- Ah, the last merge from dev must have overwritten some of the changes I made. This should be fixed now.
- Found another instance of graspy in the issue template
- Some last-second changes, including a fix to the utils init file, because the `__all__` value was being populated by identifier names, not string representations of those identifier names
- I approve of black hating the single quotes for a string because I also hate it, but it's still pythonic even if I wish it weren't so
Co-authored-by: Benjamin Pedigo benjamindpedigo@gmail.com
Suitably dynamic versioning (#467)
- Suitably dynamic versioning
The following versioning code bypasses a few problems with Python module versions. The following scenarios are plausible:
- A user clones `graspologic` and runs `pip install -r requirements.txt`, then executes `python` in the project directory, accessing the graspologic library through Python's local folder structure.
- A user clones `graspologic` and runs `python setup.py install` in the environment of their choice, accessing the graspologic library either by the local folder structure or the .egg in their site-packages, depending on their current working directory.
- A user clones no repository and wants to install the library solely via pip using the `pip install ...` command, which has 2 wings to consider:
  - The user wishes to try out the latest prerelease, which is going to be published with an X.Y.ZdevYYYYMMDDBUILDNUMBER style version and can be installed via `pip install graspologic --pre`
  - The user wishes to try out the latest release, which will be published as `X.Y.Z`.

This PR supports those 4 cases (notably, it does not support `pip install .` from the root project directory, which does some super weird stuff and I gave up on trying to solve it a long time ago).
The concept is this: the actual version is materialized upon a build action, which can be undertaken by:
- CI building a snapshot build
- CI building a release build
- A local user building a local build

These states all require the same thing: a materialized version in a file. This version should be created at the time of the build action.
In the case of CI, we can populate the file in our CI build process and move on. It's the case of not being in CI where we need to consider what to do next, which leaves the local user building a local build (and the local user using the filesystem as the library).
In these cases, the solution is the following: if we have a populated version.txt file, we use it. If we do not, we materialize a new version based on the `__semver` in version.py and the current time in YYYYMMDDHHmmSS format. This means that if you are running on the filesystem and you say `import graspy; print(graspy.__version__)`, it will actually tell you the version is, for example, 0.1.0dev20200926120000. However, when you close the interpreter and do it again, it will tell you that the version is 0.1.0dev20200926120500, because it will create a version for you at the time of import.
However, if you were to run `python setup.py install`, the setup.py file actually takes it on itself to either get a version number or use the materialized version described above, then write it to version.txt. This means that installing the graspologic library from setuptools will actually lock in the version number in perpetuity.
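Roughly, the fallback behaves like the sketch below. This is not the PR's actual code; `__semver`, the version.txt location, and the function names are illustrative assumptions.

```python
# A minimal sketch of the version-materialization logic described above.
import os
from datetime import datetime

__semver = "0.1.0"  # hypothetical base version kept alongside this logic in version.py
_VERSION_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "version.txt")

def _materialized_version():
    """Return the version CI or setup.py already wrote to version.txt, if any."""
    if os.path.isfile(_VERSION_FILE):
        with open(_VERSION_FILE) as f:
            contents = f.read().strip()
            if contents:
                return contents
    return None

def version():
    """Use the materialized version, or mint a fresh dev version at call time."""
    materialized = _materialized_version()
    if materialized is not None:
        return materialized
    # No populated version.txt: derive a snapshot version from the base semver and the clock.
    return f"{__semver}dev{datetime.now().strftime('%Y%m%d%H%M%S')}"

__version__ = version()
```

Under this scheme, importing from the raw filesystem mints a new timestamped dev version each interpreter session, while a CI or setup.py build that writes version.txt pins the version permanently, matching the behavior described above.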
Gotchas
- version.txt must always be empty in the source tree
- `pip install .` does some weird thing where it registers an entry in site-packages that is like a symlink to the local filesystem, so it doesn't actually make an egg, which means you get a new version each time. I gave up caring at this point since we got the three primary use cases covered: developers, users of pre-releases, and users of releases. Users who install by cloning and running pip install are just going to get a weird behavior that probably isn't that important to track down, and regardless they'll get a clear X.Y.Zdev<timestamp> in their `graspologic.__version__`, which is enough for us to go on if there are any issues raised.
- My testing resulted in filling this file and committing it, like I said not to do
- Updated conf.py for sphinx to be able to find a version it likes. Or kinda likes. Maybe likes?
- Forgot I had to add the recursive-include for the version file.
- Making black happy
Changed name of floor to what it's supposed to be (sorry if that fucked shit up). Also added wall texture.
identity: default to RS256 for new workload ids (#18882)
OIDC mandates support of the RS256 signing algorithm, so in order to maximize workload identity's usefulness, this change switches from the EdDSA signing algorithm to RS256.
Old keys will continue to use EdDSA but new keys will use RS256. The EdDSA generation code was left in place because it's fast and cheap, and I'm not going to lie, I hope we get to use it again.
Test Updates
Most of our Variables and Keyring tests had a subtle assumption in them that the keyring would be initialized by the time the test server had elected a leader. ed25519 key generation is so fast that the fact that it was happening asynchronously with server startup didn't seem to cause problems. Sadly rsa key generation is so slow that basically all of these tests failed.
I added a new `testutil.WaitForKeyring` helper to replace `testutil.WaitForLeader` in cases where the keyring must be initialized before the test may continue. However, this is mostly used in the `nomad/` package.
In the `api` and `command/agent` packages I decided to switch their helpers to wait for keyring initialization by default. This will slow down tests a bit, but allow those packages to not be as concerned with subtle server readiness details. On my machine RSA key generation takes 63ms, so hopefully the difference isn't significant on CI runners.
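For a rough sense of the gap, here is a quick, hypothetical timing sketch (in Python using the `cryptography` package, not Nomad's Go test suite); exact numbers will vary by machine.

```python
# Compare how long it takes to generate ed25519 vs. RSA-2048 keys.
import time
from cryptography.hazmat.primitives.asymmetric import ed25519, rsa

def time_keygen(label, generate, runs=5):
    """Time a key-generation callable and print the average in milliseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        generate()
    elapsed_ms = (time.perf_counter() - start) * 1000 / runs
    print(f"{label}: {elapsed_ms:.1f} ms per key")

time_keygen("ed25519", ed25519.Ed25519PrivateKey.generate)
time_keygen("RSA-2048", lambda: rsa.generate_private_key(public_exponent=65537,
                                                         key_size=2048))
```

ed25519 generation typically finishes in well under a millisecond, while RSA-2048 takes tens of milliseconds, which is why tests that implicitly assumed an already-initialized keyring started failing.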
TODO
- Docs and changelog entries.
- Upgrades - right now upgrades won't get RS256 keys until their root key rotates either manually or after ~30 days.
- Observability - I'm not sure there's a way for operators to see if they're using EdDSA or RS256 unless they inspect a key. The JWKS endpoint can be inspected to see if EdDSA will be used for new identities, but it doesn't technically define which key is active. If upgrades can be fixed to automatically rotate keys, we probably don't need to worry about this.
Requiem for ed25519
When workload identities were first implemented we did not immediately consider OIDC compliance. Consul, Vault, and many other third parties support JWT auth methods without full OIDC compliance. For the machine<-->machine use cases workload identity is intended to fulfill, OIDC seemed like a bigger risk than an asset.
EdDSA/ed25519 is the signing algorithm we chose for workload identity JWTs because of all these lovely properties:
- Deterministic keys that can be derived from our preexisting root keys. This was perhaps the biggest factor since we already had a root encryption key around from which we could derive a signing key.
- Wonderfully compact: 64 byte private key, 32 byte public key, 64 byte signatures. Just glorious.
- No parameters. No choices of encodings. It's all well-defined by RFC 8032.
- Fastest performing signing algorithm! We don't even care that much about the performance of our chosen algorithm, but what a free bonus!
- Arguably one of the most secure signing algorithms widely available. Not just from a cryptanalysis perspective, but from an API and usage perspective too.
Life was good with ed25519, but sadly it could not last.
IDPs, such as AWS's IAM OIDC Provider, love OIDC. They have OIDC implemented for humans, so why not reuse that OIDC support for machines as well? Since OIDC mandates RS256, many implementations don't bother implementing other signing algorithms (or at least not advertising their support). A quick survey of OIDC Discovery endpoints revealed only 2 out of 10 OIDC providers advertised support for anything other than RS256:
RS256 only:
Catching File Exceptions in openpower-vpd-parser
In this commit, I have added code to handle file exceptions more effectively. By implementing proper exception handling, we can improve the robustness and reliability of the file operations within our codebase.
Here are the key changes made in this commit:
- Introduced a try-catch block around the file operation sections.
- Within the try block, added code to perform the necessary file operations.
- Implemented catch blocks to handle specific file exceptions.
- In each catch block, included appropriate error handling logic, such as logging the error message or displaying a user-friendly error message.
- Ensured that the catch blocks gracefully handle the exceptions and prevent the program from crashing or behaving unexpectedly.
By adding this exception handling code, we can anticipate and handle potential file-related errors gracefully, providing a smoother experience for users and preventing any unexpected crashes or data loss. This would also aid in debugging issues.
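The actual change is C++ inside openpower-vpd-parser, but the pattern described above is generic. A minimal Python analogue of the try/catch-around-file-operations idea (with a hypothetical helper name) might look like this:

```python
# Illustrative only: wrap a file read so file-related errors are logged instead of crashing.
import logging

def read_vpd_file(path):  # hypothetical helper, not from the actual codebase
    """Read a VPD data file, handling file-related errors gracefully."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        logging.error("VPD file not found: %s", path)
    except PermissionError:
        logging.error("No permission to read VPD file: %s", path)
    except OSError as err:
        # Catch-all for other I/O failures so the caller never crashes outright.
        logging.error("Failed to read %s: %s", path, err)
    return None
```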
Change-Id: I621a7f0ba68d2c298e4fea0a9d3e21d1939cd090 Signed-off-by: jinuthomas jinu.joy.thomas@in.ibm.com
Call Me Jane Bethesda Because This Is Some Environmental Storytelling
Good god man, the tooling for testing ruins is super unfriendly. Map template place is really not the kind of tool I should be relying on. Also, did you know that /ruin/powered doesn't mean it SUPPORTS power, but means that it actually has MAGIC power? What the fuck?? Who thought this was okay, and who HURT them?
Add files via upload
Here is an intuitive explanation of the jump algorithm, broken down into steps. This explanation aims to be memorable so that you can recall the logic at a glance:
Algorithm: Jump Game (Finding Minimum Jumps to Reach the End)
Imagine you're on a track with numbered tiles lined up in a row. Each number tells you the maximum number of tiles you can leap forward from that tile. Your goal is to reach the end of the track in as few jumps as possible.
Here's the strategy in memorable steps:
- Starting Block: You're standing on the first tile, ready to start jumping.
- Look Ahead: Before you jump, you check how far you could potentially leap from your current position and all the positions up to where you'd land. This is your scanning phase.
- Plan Your Jump: Now, you decide where to land by picking the tile that gives you the furthest reach on your next jump. That doesn't mean you jump there directly. It just means you know it's the best tile to aim for.
- Leap of Faith: You make the jump, but only to the next tile. You're not actually leaping all the way to the furthest tile you spotted. It's a controlled, one-tile hop.
- Counting Hops: Every time you reach the furthest tile you previously noted, you count a hop. This is because you've committed to a sequence that you predicted would carry you forward optimally.
- Repeat: You keep doing this (scanning, planning, hopping, and counting) until you're about to land on or pass the final tile.
- Finish Line: When you've made the jump that takes you to or beyond the last tile, you've finished the game, and the number of hops you've counted is the minimum needed to get there.
Key Points to Remember:
- Scan and Plan: Always look ahead and plan your jumps based on potential future leaps, not just the immediate next step.
- Incremental Hops: You move forward one tile at a time, but you're always thinking several tiles ahead.
- Count Wisely: You only count a hop when you've landed on the furthest tile you planned to reach from your last count.
- Optimize Each Step: Every step you take is calculated to extend your reach, ensuring efficiency.
By keeping these memorable steps and key points in mind, you should be able to recall the essence of the jump algorithm anytime you need to.
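For reference, here is a short sketch of the same greedy strategy in Python (this is the classic minimum-jumps formulation; the variable names are my own framing of the steps above):

```python
def min_jumps(tiles):
    """Return the minimum number of jumps needed to reach the last tile."""
    if len(tiles) <= 1:
        return 0
    jumps = 0
    current_end = 0   # furthest tile the current number of jumps can reach
    furthest = 0      # furthest tile reachable from anything scanned so far
    for i in range(len(tiles) - 1):
        # Scan and plan: track the best reach from every tile up to current_end.
        furthest = max(furthest, i + tiles[i])
        if i == current_end:
            # We've walked to the edge of the current jump's range: count a hop
            # and commit to the furthest tile spotted while scanning.
            jumps += 1
            current_end = furthest
            if current_end >= len(tiles) - 1:
                break
    return jumps

print(min_jumps([2, 3, 1, 1, 4]))  # 2
```

Each tile is visited once, so the strategy runs in linear time with constant extra space.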
Add Server Context deprecation warning (#27424)
As agreed, we're removing Server Context. This was never officially documented.
We've found that it's not that useful in practice. Often the better options are:
- Read things off the url or global scope like params or cookies.
- Use the module system for global dependency injection.
- Use `React.cache()` to dedupe multiple things instead of computing once and passing down.
There are still legit use cases for Server Context, but you have to be very careful not to pass any large data, so in general we recommend against it anyway.
Yes, prop drilling is annoying, but it's not impossible for the cases where this is needed. I would personally always pick it over Server Context anyway.
Semantically, Server Context also blocks object deduping due to how it plays out with Server Components that can't be deduped. That is a much more important feature.
Since it's already in canary along with the rest of RSC, we're adding a warning for a few versions before removing completely to help migration.
Co-authored-by: Josh Story josh.c.story@gmail.com
Fixes Space Dragon Attacking (#78964)
Fixes #78953
Basically the gist is that Space Dragon's special attack code was on
AttackingTarget()
rather than whatever the hell simple animals
controlled by clients use (I didn't bother enough to look into the chain
to remember this). This was the complete wrong proc to use, and it NEVER
got executed. Anyways, we just hook into the signal for whatever the
simple animal proc is as well as clean up all the code, make everything
pretty, and most importantly:
MAKE THE DAMN CODE WORK
Either someone did not test their code at all, or some weird esoteric change in the attack chain changed this somehow? I'm not sure when or why this happened but it is guaranteed to be fixed now.
The code cleanup and tinkering I did means that it's gonna be about 10% easier to port this over to a basic mob eventually (not doing a full refactor when this shit is this broken, the code added here is modular enough to the point where it's plug-n-play).
🆑 fix: Space Dragons can now, once again, tear down walls and eat corpses. They also have regained their special damage modifier when attacking mechs. /🆑
Removed sourcehut aka sr.ht for banning me (fuck you), improved and fixed errors in helper scripts, etc
Modifiers november 2023 (#3579)
- Chaos modifier: Modifiers that your hero didn't have before will now be prioritized when you random a modifier on respawn.
- Hyper Active modifier now provides only 5% cooldown reduction for all items.
- Hyper Active modifier now provides only 5% cooldown reduction for Dazzle Bad Juju, Earth Spirit Rolling Boulder and Faceless Void Time Walk.
- Hyper Lifesteal lifesteal and spell lifesteal against creeps reduced from 25% to 15%.
- Hyper Lifesteal: Fixed lifesteal and spell lifesteal getting amplified by healing amplification instead of lifesteal amplification and spell lifesteal amplification respectively.
- Octarine Soul cooldown reduction per point of Intelligence increased from 0.08% to 0.1%
- Octarine Soul modifier no longer stacks with Hyper Active and Pro-Active modifiers.
- Octarine Soul modifier no longer works for Dazzle Bad Juju.
- Octarine Soul modifier no longer works for items.
- Pro-Active modifier now provides only 10% cooldown reduction for Dazzle Bad Juju, Earth Spirit Rolling Boulder, Faceless Void Time Walk, Slark Shadow Dance, Terrorblade Sunder and Ursa Enrage.
The Brawlening: Unarmed fighting interactions for shoving, grabbing and nonlethal takedowns (not martial arts) (#79362)
I've tweaked some elements of unarmed fighting to give it additional interactions between the various components, bridging them into a more coherent system and focusing more strongly on it as a tool for disabling opponents nonlethally.
Shoving guarantees that unarmed attacks will land while knocked off-balance (AKA when slowed by a shove).
Being off-balance means that you can be knocked down from a punch if you have taken enough brute and stamina damage combined (at least above 40).
Being off-balance makes you vulnerable to grabs while you have a moderate amount of stamina damage (30 damage), forcing you to have to resist even passive grabs. This pairs exceptionally well with tackling.
Grappling someone makes your unarmed attacks penetrate armor based on a new limb value called `unarmed_effectiveness`. This is something shared by kicking.
`unarmed_effectiveness` has also taken over the functionality of `unarmed_stun_threshold`, as well as accuracy calculations. Human-equivalent limbs (pretty much all of them except mushrooms and golems) have a value of 10.
Now, `unarmed_effectiveness` determines how accurately a given limb makes unarmed attacks. Unarmed attacks have a base inaccuracy of 20%, with effectiveness acting as a reduction to this value (so for humans, that's 20% - 10% before any value changes from brute and stamina damage). It is also capped at 75% miss chance, just to avoid those weird instances of two brawling fighters being incapable of finishing each other off at a certain amount of damage and it being real awkward, like it does currently.
It also determines the base probability of landing a knockdown punch. For humans, this is 10%.
For the most part, these two particular changes are roughly equivalent to the current values, just handled in a way that is more straightforward to understand from a code perspective.
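As a rough illustration of that arithmetic (the game itself is written in DM; the damage-penalty scaling in this Python sketch is an assumption for illustration, not the PR's exact formula):

```python
def unarmed_miss_chance(unarmed_effectiveness=10, brute=0, stamina=0,
                        penalty_per_damage=0.25):
    """Estimate the miss chance (%) of an unarmed strike from a given limb."""
    base_inaccuracy = 20                                     # base 20% inaccuracy
    damage_penalty = (brute + stamina) * penalty_per_damage  # hypothetical scaling
    miss = base_inaccuracy - unarmed_effectiveness + damage_penalty
    return max(0, min(miss, 75))                             # capped at a 75% miss chance

print(unarmed_miss_chance())                      # healthy human limb: 10%
print(unarmed_miss_chance(brute=40, stamina=40))  # badly hurt: higher, but never above 75
```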
In addition to the above, human equivalent limbs have higher damage floors for unarmed attacks. Arms deal 5-10 damage, while legs deal 7-15 damage. In addition, kicks also deal stamina damage, like punches do.
Golems and Mushroom People (who don't even use their limbs for their unarmed strikes because mushroom people start with a martial art) have very accurate punches, and their punches penetrate quite a bit of armor when they are entitled to that. They also have a high knockdown probability. This is partially because they previously already had these features due to the wonky math at play, but also because this is their big thing they are good at.
Carp mutation also got a big win out of this as well. If and when you actually manage to get that to work and matter.
My favorite thing in this game is the robustness of unarmed fighting. It's the part of the game that actually acknowledges the sandbox and environmental interaction in a big way. The only problem with the unarmed combat is that it is a bit disjointed, and often much weaker than using even the most pathetic weapon you can get your hands on unless you're using the stun loops available. Those loops get a bit boring, even if they're mostly all environmental (except for the lucky neckgrab finish). Giving more options generally means that even when not in an ideal position, you still have some options.
It also has some internal inconsistencies in design even in the same proc, like accuracy calculations and knockdowns, as well as weird splits in damage. So I decided to resolve that.
Now, every part of unarmed fighting has some relevance in the other parts. Predominantly, it is heavily favoured towards dealing stamina damage, making unarmed combat very favourable as a nonlethal method of taking someone down, which is something we currently lack considerably. While people may still opt to simply beat someone into actual crit rather than stop at stamina crit, the possibility is actually entirely available and supported now. No more just banking on a lucky neckgrab after a shove.
Paying attention to damage dealt and thinking intelligently about how you apply combinations of effects allows even someone on the significant back foot an opportunity for a comeback if they know what they're doing against even armed opponents.
Separating accuracy and knockdown effectiveness from damage allows for more consistent design and readability, and also prevents weirdness like tighter damage spreads increasing knockdown probabilities as well as increasing accuracy without the coder knowing why. This also lets us make unarmed attacks just that little bit stronger. Since unarmed attacks require more complicated combinations to work, I think this won't make them stronger than weapons necessarily, but it will make for more interesting swung fights.
🆑 add: With the flood of Chi within the Spinward Sector receding, various masters of The Tunnel Arts, colloquially known as 'Maint-fu Masters', have started to refine the basics of their martial techniques. New forms have started to develop within Spacestation 13's hidden maintenance dojos. add: Shoving someone off-balance makes them vulnerable to more guaranteed unarmed strikes, knockdowns from a successful punch, and harder-to-escape grabs. add: Grabbing someone (as well as kicking them while they're on the floor) makes them more vulnerable to taking unarmed attack damage, even if they have armor. balance: Unarmed strikes made with human-equivalent limbs have higher damage floors, meaning you overall do more damage on average while not increasing the overall damage potential. It's more consistent! refactor: Significantly changed how punching accuracy and knockdowns are calculated. balance: Golem and mushroom limbs are a lot more effective at punching as a result of these various changes. As they should be. /🆑
Merge #113809
113809: kvstreamer: add limit to how many eager batches are issued r=yuzefovich a=yuzefovich
kvstreamer: add limit to how many eager batches are issued
This commit fixes extremely suboptimal behavior of the streamer in the InOrder mode in some cases. In particular, previously it was possible for the buffered responses to consume most of the working budget, so the streamer would degrade to processing all requests effectively one BatchRequest with one Get / Scan at a time, significantly increasing the latency. For example, the query added as a regression test that performs 30k Gets across 10 ranges would usually take on the order of 1.5s (which is not great already since in the non-streamer path it takes 400ms), but in the degenerate cases it could be on the order of 20-30s.
Similar behavior could occur in the OutOfOrder mode too where we would issue more BatchRequests in which only one request could be satisfied (although in OutOfOrder mode the problem is not as severe - we don't buffer any results since we can always return them right away).
This problem is now fixed by imposing a limit on the budget's usage at which point the streamer stops issuing "eager" requests. Namely, now, when there is at least one request in flight, the streamer won't issue any more requests once `limit * eagerFraction` is exceeded. This effectively reserves a portion of the budget for the "head-of-the-line" batch.
The "eager fraction" is controlled by a session variable, separate for each mode. The defaults of 0.5 for InOrder and 0.8 for OutOfOrder modes were chosen after running TPCH queries and the query that inspired this commit. These values bring the number of gRPC calls for the reproduction query from 1.5k-2k range to below 200 and the query latency to be reliably around 400ms.
I don't really see any significant downsides to this change - in the worst case, we'd be utilizing less of the available memory budget which is not that big of a deal, so I intend to backport this change. Also, setting the eager fractions to large values (greater than 1.0 is allowed) would disable this limiting behavior and revert to the previous behavior if we need it.
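As a toy illustration of that check (CockroachDB itself is Go; the function and parameter names below are simplifications, not the actual streamer code):

```python
def should_issue_eager_batch(budget_used, budget_limit, requests_in_flight,
                             eager_fraction=0.5):
    """Decide whether the streamer may issue another 'eager' BatchRequest."""
    if requests_in_flight == 0:
        # The head-of-the-line batch must always be allowed to proceed.
        return True
    # With work already in flight, stop once the eager share of the budget is consumed,
    # reserving the remainder for the head-of-the-line batch.
    return budget_used < budget_limit * eager_fraction

print(should_issue_eager_batch(budget_used=40, budget_limit=100, requests_in_flight=3))  # True
print(should_issue_eager_batch(budget_used=60, budget_limit=100, requests_in_flight=3))  # False
```

Setting the fraction above 1.0 makes the check a no-op, which matches the note above about reverting to the previous behavior.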
Fixes: #113729.
Release note (bug fix): Previously, when executing queries with index / lookup joins where the ordering needs to be maintained, CockroachDB could in some cases get into a pathological behavior leading to increased query latency, possibly by 1 or 2 orders of magnitude. This bug was introduced in 22.2 and is now fixed.
kvstreamer: increase default avg response multiple
This commit increases the default value for the `sql.distsql.streamer.avg_response_size_multiple` cluster setting from 1.5 to 3.0. This setting controls the factor by which the current "avg response size" estimate is multiplied and allows for the `TargetBytes` parameter to grow over time. In the reproduction query from the previous commit it was determined that the growth might not be as quick as desirable.
The effect of this change is as follows:
- if we have responses of varying sizes, then we're now likely to be more effective since we'll end up issuing fewer BatchRequests
- if we have responses of similar sizes, then we might pre-reserve too much budget upfront, so we'll end up with lower concurrency across ranges.
Thus, we don't want to increase the multiple by too much; however, keeping it at 1.5 can be quite suboptimal in some cases - 3.0 seems like a decent middle ground. This number was chosen based on running TPCH queries (both via InOrder and OutOfOrder modes of the streamer) and the reproduction query. (For the latter this change reduces the number of gRPC calls by a factor of 3 or so.)
Release note: None
Co-authored-by: Yahor Yuzefovich yahor@cockroachlabs.com