2023-11-08

there were a lot of events recorded by gharchive.org, of which 2,505,949 were push events containing 4,025,367 commit messages amounting to 314,298,771 characters, filtered with words.py@e23d022007... down to these 62 messages:

Wednesday 2023-11-08 00:05:25 by Mateusz Kies

i hate my life

i'm not an IT specialist, i'm a failure at life for whom nothing has fallen into place and nothing in life has worked out


Wednesday 2023-11-08 00:25:17 by QuickLode

Adds a Colony Synthetic variant, with bug fixes (#4760)

About the pull request

  1. should fix fax machine problem(thx forest)
  2. gives trucker synth the frontier jumpsuit(Thwomplert)
  3. adds the Freelancer Synthetic. This Synth is one that was bought off a civilian market and reprogrammed, or stolen and reprogrammed, or hacked - you get the point - it's going with a band of freelancers. The idea behind it is that this synth's team is dead and they are just programmed as a merc for pay, hoping to someday find their boss and hand over the money as set up. I thought about this one for a long time and decided to put him in the civilian category, which is hard to roll and also gives you freedom to choose your allegiance. In this case I hope that a freelancer synthetic will open up a unique avenue of RP and allegiance. I've only explored it once ingame, but it was very good for RP! Hopefully people can recreate this success.

It was hard to make this guy look cool, and I also wasn't sure what his loadout should be. I ended up giving him random generic stuff while making him look like a beat-up freelancer (missing the armor especially hurts his look, since that's the largest piece of a freelancer - the cuirass - but I don't want to give him armor for balance reasons), and no beret, because that's for SLs only.

as usual, if a synth wants to change RP avenues and don different clothes for different RP, no one will know the difference

Explain why it's good for the game

  1. bug bad
  2. a beat-up UA laborer that just so happens to be synthetic. you wouldn't expect it, because there are so many similar-looking people! exactly the job of a synth - to blend in.
  3. the Freelancer colony synth will hopefully open up a unique avenue of RP. If they don't want it, they can always ditch it - but it's a relatively rare roll anyways.

Testing Photographs and Procedure

[Screenshots & Videos](https://cdn.discordapp.com/attachments/490668342357786645/1166307813719556187/image.png?ex=654a03cb&is=65378ecb&hm=7108218bbaab61c78c0bedcecbfdcc07bdf9db87a3fefe9fb94b28d3430cc815&)


Changelog

🆑
add: adds another Colony Synthetic variant, changes up some existing ones (trucker, CMB)
fix: fixes a small problem with WY-Colony Synthetic access (thx forest), adds WY subtype of Synthetics for admin building/faxes
fix: fixes problems with organic spawning Ferret Industries Trucker Synthetic
/🆑


Wednesday 2023-11-08 00:59:54 by Imaginos16

Reworks The Visuals Of Independent And Nanotrasen Captains (#2453)

About The Pull Request

Does what it says in the title. This is a demented PR that touches a lot of things, but its main benefit is that now regular independent captains, cowboy independent captains, and nanotrasen captains have a unique identity.

Of those changed, it includes:

  • The Nanotrasen Captain (parade)

  • The Nanotrasen Captain (regular)

  • The Independent Captain (regular/parade)

  • The Independent Captain (western)

The PR also axes a bunch of unused or frankly quite basic lieutenant outfits that were nothing more than set dressing with not much substance behind them. The roles themselves were not removed for now; they have appropriate placeholder outfits pending a full removal.

This also means that the Head of Personnel was slightly touched up, mostly by having a coat and hat similar to the western captain's when appropriate. The role itself is pending a full visual rework for later that is beyond the scope of this PR.

Speaking of removals, this also means that captain outfits/roles that were there as a legacy of removed ships were finally axed for good. Goodbye deserter captain for Riggs variant number 4, you will not be missed.

This PR also touches several (a lot) of maps, mostly adding/removing outfits that were either missing, or didn't fit with the dress code of the vessel.

Also the PR fixes an oversight by @MarkSuckerberg by making the BYOND version warning an actual warning, instead of an error when compiling. Etto bleh.

Why It's Good For The Game

Visual cohesion is important, and dear fucking god if I see one more independent western captain not wearing the duster because it wasn't in the ship, I will weep, and my weeping will cause a biblical deluge.

Changelog

🆑 PositiveEntropy
imageadd: Outfits for independent and Nanotrasen captains have been violently reworked.
/🆑


Wednesday 2023-11-08 01:29:16 by jimmyl

new space ruin, the biological research outpost (#79149)

About The Pull Request

2023-10-21 18 02 39

adds this ruin to the space ruin pool. this is a shady (as NT always is) bioresearch outpost that got fucked up by an experiment. it has something of a puzzle aspect, since you gotta find keycards and shit and press buttons to unlock shield gates. it ends with you fighting a heart which, if you defeat it, destroys the blockade that prevents you from entering the outpost vault

also you can no longer literally just cut indestructible grilles or unanchor indestructible windows

new puzzle elements or something idk

a variant of the pressure plate that you cannot remove, which sends a puzzle signal. cooler red puzzle doors that look very foreboding or something, idk, they're for this ruin. also puzzle blockades, which are indestructible dense objects that are destroyed when they receive a puzzle signal. and buttons and keycard pads for puzzles

2023-10-21.18-17-07.mp4
2023-10-21.18-19-20.mp4

stuff that throws electric shocks in a pattern, ignores insuls and only knocks down, and no you cannot just run past

2023-10-21.18-21-05.mp4

enemies

living floor: it can only attack stuff on top of it, and it attacks until the victim is dead. it is invincible to all but a crowbar, it cannot move, and it remains hidden until a victim is in range

2023-10-21.18-23-15.mp4

living flesh: it can replace your limbs with itself. the conditions for that are: the limb must have 20 or more brute, the victim must be alive and dismemberable, the limb may not be a torso or head, and the limb may not already be living flesh. alternatively, it can replace a missing limb. these are all checked with every attack. they have 20 hp. the limbs in question will sometimes act up while passively draining nutrition; arms will randomly start pulling nearby stuff, and legs may step randomly. limbs, when detached, turn into mobs and reactivate their AI 2 seconds later. if the host is shocked, all living flesh limbs will detach; they also do that if the host dies. (the replacement conditions are sketched in code below)
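
A minimal Python sketch of the limb-replacement check described above, under the stated conditions; this is illustrative pseudologic, not the DM implementation (names like `Limb` and `BRUTE_THRESHOLD` are invented here):

```python
from dataclasses import dataclass
from typing import Optional

BRUTE_THRESHOLD = 20  # per the PR: the limb must have 20 or more brute

@dataclass
class Limb:
    zone: str              # e.g. "l_arm", "head", "chest"
    brute: int             # accumulated brute damage
    is_living_flesh: bool  # already converted?

def can_replace(limb: Optional[Limb], victim_alive: bool, dismemberable: bool) -> bool:
    """Checked on every attack, per the PR description."""
    if limb is None:                     # a missing limb can always be filled
        return True
    if limb.zone in ("chest", "head"):   # torso and head are off-limits
        return False
    if limb.is_living_flesh:             # can't replace itself
        return False
    return victim_alive and dismemberable and limb.brute >= BRUTE_THRESHOLD
```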

2023-10-21.18-29-10.mp4

Why It's Good For The Game

ruin variety is cool i think. also, the other things i added should be useful for other mappers, for bitrunning or whatever

also bug bad for that one fix

Changelog

🆑
add: living floor, living flesh, and other stuff for the bioresearch outpost ruin
add: bioresearch outpost ruin
fix: you may not defeat indestructible grilles and windows with mere tools
/🆑


Co-authored-by: Jacquerel hnevard@gmail.com


Wednesday 2023-11-08 01:29:16 by lizardqueenlexi

Basic Pirate NPCs (#79284)

About The Pull Request

Converts hostile pirate NPCs to basic mobs - specifically, a subtype of trooper. As their behavior is not meaningfully distinct from other troopers, this conversion mostly just sticks them on the existing AI behavior while keeping the rest the same.

Pirates do have one new thing going for them, though, to differentiate them from other troopers. They use the new plundering attacks component, which means that every time they land a melee attack, they steal money from the bank account of whoever they hit. This requires the target to be wearing an ID with a linked bank account, so it's not the hardest thing in the world to hide your money from them - but it's still something to be wary of! If killed, any mob with this component will drop everything they've stolen in a convenient holochip.
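
A minimal sketch of that component's described behavior, in Python rather than the codebase's DM; the class shape and the per-hit amount are illustrative assumptions:

```python
class PlunderingAttacks:
    """On every landed melee hit, siphon credits from the target's
    ID-linked bank account; on death, drop the loot as one holochip."""

    PER_HIT_CUT = 50  # illustrative amount; the PR doesn't state a number

    def __init__(self) -> None:
        self.stolen = 0

    def on_melee_hit(self, target) -> None:
        worn_id = getattr(target, "worn_id", None)
        account = getattr(worn_id, "bank_account", None)
        if account is None or account.balance <= 0:
            return  # no worn ID / no linked account: nothing to steal
        amount = min(account.balance, self.PER_HIT_CUT)
        account.balance -= amount
        self.stolen += amount

    def on_death(self, drop_loot) -> None:
        if self.stolen:
            drop_loot(f"holochip ({self.stolen} cr)")  # everything stolen, one chip
```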

Why It's Good For The Game

Takes down 5 more simplemobs, and (I think) converts the last remaining trooper-type enemy to be a basic trooper. (It's possible there's more I've forgotten that could use the same AI, though.)

The money-stealing behavior is mostly good because I think it's funny, but it also makes the pirates something a little distinct from "yet another mob that runs at you and punches you until you die". They still do that, but now there's a little twist! This can be placed on other mobs too, if we want to make any other sorts of thieves or brigands.

Changelog

🆑
refactor: Pirate NPCs now use the basic mob framework. They'll be a little smarter in combat, and if you're wearing your ID they'll siphon your bank account with every melee attack! Beware! Please report any bugs.
/🆑


Wednesday 2023-11-08 01:40:55 by Thibault Charbonnier

refactor(proxy-wasm) improve pwexec resurrection and instance lifecycle

The main goal of this overhaul is to simplify on_context_create, make it fully re-entrant and properly handle instance recycling at the same time.

The way to do so, in my opinion, was to move pwexec creation to where rexec already was. In other words, always look up the context id in the instance rbtree, and if it is not found, create it. This means that surrounding code also needed big overhauls. It also removes the older implementation's poor man's reference-counting GC. The code had become really ugly by then, so I took the time to also review this module's code structure instead of making a very ugly commit.

This new ngx_proxy_wasm.c file should be much easier to read and follow now.

One change I do not fully like is moving next_id to a global counter, but we do not have a "global proxy-wasm conf" object yet. I also started thinking about pre-allocating a number of pwexecs (like worker_connections) and using a free/busy queue that all filter chains can dip into to get a context id + context memory zone. Perhaps for a later time.


Wednesday 2023-11-08 01:49:18 by Adam Daley

Update to Sentry v4 (#1780)

  • Bump minimum PHP version to 8.1

  • Missed these too

  • Shit's broke and stupid as fuck

  • Update SentryHelper.php

  • Update package-lock.json

  • I did the fixing


Co-authored-by: Belle Aerni belleaerni@gmail.com


Wednesday 2023-11-08 02:01:45 by Erika Fox

Revert "Merge remote-tracking branch 'upstream/master' into fuck-you"

This reverts commit 02e475c4ef5ea4fba3d50a5990a4f069233507a3, reversing changes made to 7e98858138b6ef5a32c21ed93456203720740a13.


Wednesday 2023-11-08 02:01:45 by Erika Fox

Revert "Revert "Merge remote-tracking branch 'upstream/master' into fuck-you""

This reverts commit 237900bddf859c8be3f457816caaac1a17b627eb.


Wednesday 2023-11-08 02:01:45 by Erika Fox

Revert "Revert "Revert "Merge remote-tracking branch 'upstream/master' into fuck-you"""

This reverts commit d6c100992e65f6a9b87f6e60db83cc0cb54fab53.


Wednesday 2023-11-08 02:03:30 by MowFord

Cait Sith Avatar:

  • Cait Sith has the proper name prefix and is now named "Cait Sith" instead of "The CaitSith"
  • BPs Implemented
    • Regal Slash (BP:Rage): 3-hit physical
    • Level ? Holy (BP:Rage): AoE magical
      • Rolls a die and deals damage proportional to the roll
      • Only does damage if the target's level is divisible by the roll (see the sketch after this list)
    • Mewing Lullaby (BP:Ward): AoE lullaby that resets TP
    • Eerie Eye (BP:Ward): conal silence/amnesia with appropriate elemental resist check for amnesia, but retail does light check for silence
    • Reraise II (BP:Ward): single-target 60-minute reraise II buff for any party member
    • Raise II (BP:Ward): single-target raise II for any party member
    • Altana's Favor (BP:Ward): 2-hour ability gives arise to all party members in range (Arise and reraise III with infinite duration)
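
A hedged sketch of the "Level ? Holy" gating described above (Python for illustration; the die size and base damage are assumptions, not retail values):

```python
import random

def level_q_holy(target_level: int, base_damage: int = 100) -> int:
    """Return the damage dealt by Level ? Holy against one target."""
    roll = random.randint(1, 6)   # die size and base damage are assumptions
    if target_level % roll != 0:
        return 0                  # level not divisible by the roll: no damage
    return base_damage * roll     # damage proportional to the roll
```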

Wednesday 2023-11-08 02:09:25 by magicono43

Implemented Pretty Basic Map Weather Simulation, 11/7/2023

Implemented a pretty basic map weather simulation. Got more tracking stuff for climate values based on the current location on the map, as well as a pretty basic "weather simulation" for when you're on the world-map. It needs a lot of work honestly, but it does mostly work, at least. One of the main issues, which I don't know if I'll be able to fix at the moment, is that when fast-traveling the weather changes, but often not to the value I expected. This is likely due to the vanilla DFU weather-changing logic, which is probably just time-based when it decides to change the current weather; it seems my attempt at changing this does not work, so yeah. Other than that, I think I'm happy enough with the basic weather simulation stuff. So the next thing I'll likely try to work on is refining the view range/radius thing - line of sight, whatever you want to call it - basically altering this value depending on stuff like the travel mode and current weather. That will take some work but hopefully will work out. After that I'm not sure, but we'll see, etc. 11/7/2023.


Wednesday 2023-11-08 02:22:04 by treckstar

People listen up don't stand so close, I got somethin that you all should know. Holy matrimony is not for me, I'd rather die alone in misery.


Wednesday 2023-11-08 02:29:14 by vampirebat74

Adds Red Shoes (#901)

Mr. Heavenly's Abnormality Jam Entry #1

Records

uncommented weapon

Finishing touches

Design rework

adds ego gift and inhands

New sprites!

uncommented sfx

insanity fix

quieter sound loop

Fixes some shit

fix linters

requested changes

Update code/modules/mob/living/simple_animal/abnormality/he/red_shoes.dm

fixes suit check in assimilate() proc

Co-authored-by: [̸R̵e̵d̴a̴c̶t̸e̸d̴]̵ 61567407+LanceSmites328@users.noreply.github.com

Update code/modules/mob/living/simple_animal/abnormality/he/red_shoes.dm

fixes dismembering

Co-authored-by: [̸R̵e̵d̴a̴c̶t̸e̸d̴]̵ 61567407+LanceSmites328@users.noreply.github.com

Update code/modules/mob/living/simple_animal/abnormality/he/red_shoes.dm

Co-authored-by: [̸R̵e̵d̴a̴c̶t̸e̸d̴]̵ 61567407+LanceSmites328@users.noreply.github.com

breach is more dangerous

compiles

bug fix

fixes simple mob

bug fixes

Panic fixed!!!!

stuff

wayward records

Update code/modules/paperwork/records/info/he.dm

Co-authored-by: [̸R̵e̵d̴a̴c̶t̸e̸d̴]̵ 61567407+LanceSmites328@users.noreply.github.com

Update code/modules/mob/living/simple_animal/abnormality/he/red_shoes.dm

Co-authored-by: [̸R̵e̵d̴a̴c̶t̸e̸d̴]̵ 61567407+LanceSmites328@users.noreply.github.com

attribute bonus

requested changes

Co-authored-by: Mr.Heavenly davidx3adamhunt@gmail.com


Wednesday 2023-11-08 02:52:09 by Herb Sutter

Correct copy/move for union

By writing separate construction and assignment, plus the new feature of suppressing assignment to a member by writing member = _ ; (now allowed only in assignment operators).

I do realize that's an "opt-out" which I normally prefer to avoid, but:

  • I considered and decided against (for now) the alternative of not having assignment be memberwise by default. I want to keep the (new to Cpp2) default of memberwise semantics for assignment as with construction. I think that's a useful feature, and normally if you do assign to a member it doesn't arise, and so I think it makes sense to explicitly call out when we're choosing not to do any assignment at all to a member before doing other assignment processing. We'll get experience with how it goes.

  • _ is arguably natural here, since it's pronounced "don't care." There too, we'll see whether it generalizes naturally or feels strained. For now it feels natural to me.


Wednesday 2023-11-08 03:15:47 by necromanceranne

Hey what if I made Sleeping Carp better at nonlethal takedowns and also deflect with combat mode instead of throw mode (but cost more) (#79517)

About The Pull Request

It's been a hot minute hasn't it?

When I initially reworked Sleeping Carp, we didn't have combat mode. Now that we do, and Sleeping Carp has substantially less defensive power to justify having to choose between deflection and attacking, it's probably about time we brought this aspect back to what it was before my rework. Sorta.

Now we can have all the deniability of the previous method, while also letting you reliably protect yourself from ranged attacks whenever it matters. Because of this, I increased the price to 17 TC, just to be on the safe side. The higher uptime of projectile immunity, while also being able to attack during that time, makes this a lot stronger overall.

Secondly, Sleeping Carp presently just isn't as good as a good ol' baton. It takes a lot more hits to accomplish the same task a baton can. Many people feel like they can't even reasonably fight anyone for fear of the baton, or would rather use a baton and kill someone at their leisure. So we've updated some of the moves to make Sleeping Carp a substantial contender for 1v1 fighting, and to lessen the need for a baton, by adding a lot more Stamina damage overall to the various attacks;

Keelhaul: Now a Shove Shove combo. Does literally zero lethal damage, but now temporarily blinds and dizzies the target as well as its previous effects. The amount of lethal damage it did was...extremely small, so this isn't a particularly big loss.

Grabs and Shoves: Deal some amount of stamina damage (20). You need to be in combat mode in order to perform these special attacks (more deniability). Grabbing someone while they have 80 Stamina damage or more will cause them to fall unconscious. Yes, I really did just want to add a Vulcan Nerve Pinch, what do you want from me?

That's it actually. Oh, I guess they are heavy sleepers now too. Because its funny.

Why It's Good For The Game

I often get told (read: have various insults and slurs thrown at me with this as the justification) that Sleeping Carp is not very strong anymore, since it lost all that invisible armor I added way back and I removed the stuns in my initial rework. This made some people upset (I think at least one person wished for my death).

So, having given it at least 2 years, I wanted to recapture parts of what made the older Sleeping Carp (before my rework) strong, some of the benefits of the new version, and introduce a brand new aspect; nonlethal takedowns. This makes it beneficial for pacifists, as well as for kidnapping.

This should not meaningfully make Sleeping Carp any stronger against the things that typically ruin its day. I suspect in a straight joust with a baton, Sleeping Carp will still struggle. But against what should be its strong points (lone targets and ranged weapons), it will be strong once again rather than clumsily unable to do very much at all.

Changelog

🆑
balance: Harnessing Shoreline Quay (bluespace energy, probably), a mystical energy (total bullshit) that permeates the Astral Waterways (bluespace quantum dimensions, probably), Sleeping Carp users can now once again deflect projectiles with their bare hands when focused in on battle (in combat mode).
balance: The Keelhaul technique is now nonlethal (a philosophical acknowledgement of the familial bond of sleep and death), but causes the target to become temporarily blind and dizzy along with its previous effects.
balance: Sleeping Carp users, while in combat mode, deal Stamina damage with their grabs and shoves. If the target of their grab has enough Stamina damage (80), they are knocked unconscious by a well-placed nerve pinch.
balance: Sleeping Carp users find it very hard to wake up once they fall asleep....
/🆑


Wednesday 2023-11-08 03:36:15 by Manatee

Do NOT make a new accent. Worst mistake of my life, holy fuck. (#20685)

  • empty template so i can work from another PC

  • holy fuck

  • essential

  • idk how accents are handled. its poorly documented

  • sure

  • augh

  • amtbe

  • Update fugitive_outfits.dm

fuk that


Wednesday 2023-11-08 04:15:09 by SkyratBot

[MIRROR] swaps one of the fridges in snowcabin to be in line with the rest [MDB IGNORE] (#24754)

  • swaps one of the fridges in snowcabin to be in line with the rest (#79414)

About The Pull Request

In truth, this is an IDED PR (this is not at all sarcasm, and as we all know nobody would lie on the internet) that came about from a round i just got done playing, wherein i was in snowcabin trying to cook up some food for fun. well, wouldn't you know it, i couldn't open one of the fridges. what gives? well, i got to thinking it has to do with the fridge type used: for some reason the fridge that holds the universal enzyme uses the freezer/fridge/kitchen type instead of the fridge/open type that the other two do, so i went ahead and changed it to the other fridge type, so now anyone can open it.

Why It's Good For The Game

it's a bit stupid to have a single fridge that's different from the rest for no discernible reason; i can't think of any reason universal enzyme would ever need to be guarded. you could just say "well, why not go back onto the station and grab some if the fridge is locked" - but if for some reason i'm barred from the station, i want to be able to use as many tools within my reach as possible, preferably without many hoops, and this one's unnecessary.

Changelog

:cl:
fix: changes the type of fridge used to hold the universal enzyme in the snowcabin gateway's kitchen, letting everyone access it like the rest of the fridges.
/:cl:

  • swaps one of the fridges in snowcabin to be in line with the rest

Co-authored-by: Donglesplonge 120208006+Donglesplonge@users.noreply.github.com


Wednesday 2023-11-08 05:06:48 by Zonespace

M707 "Vulture" Anti-Materiel Rifle (#4253)

About the pull request

The M707 is not made player-accessible in this PR.

Adds the M707 "Vulture" anti-materiel rifle to the game. Design doc here.

The M707 is meant to take the place of a heavy support weapon, not unlike the mortar. It is a 20mm bolt-action rifle, loading from 4-round magazines. Each round does 400 damage with full AP (50), but firing the weapon is not a simple task. The gun, being as high-caliber as it is, will immediately break your arm & hand if you fire it without the built-in bipod. In addition, its accuracy is massively reduced below its ideal range (10 tiles), which makes the scope necessary.

The scope does not function like a regular scope. (see screenshot section for details) Instead, it shows a 5x5 area (the rest is blacked out) 12 tiles ahead, with an aiming reticle in the center. The aiming reticle will drift over time within the 5x5, requiring you to re-adjust or use the Hold Breath ability to temporarily stop the sway. If you open up the scope's UI, you will be able to modify the scope and the reticle's location, one tile at a time, very slowly.

To assist with this, the Vulture comes with a spotting scope & tripod. A secondary user is able to assemble and use the spotting scope. The scope is a complement to the Vulture's, allowing a communicative team to become far more effective. The spotter's view, on use, will be locked to the location of the Vulture scope. However, the spotter's view is not locked to a 5x5 area, instead getting a view of the full area, on top of an extra 2 tiles (in each direction) of view range. Finally, both the spotter and sniper's scopes have night vision equivalent to an SG's goggles.

The bullet itself is a powerful beast, powerful enough to pierce walls, people, and barricades, but with 2 caveats. The first is that every wall/cade penetration removes 75 damage from the round, and any cades/tables that the round passes over are immediately destroyed. The second is that anyone within a large range will hear the rifle's report and the direction it came from.

Update as of 8/31: Vulture and its spotter scope now require a pamphlet to use (a pamphlet gives the trait needed to use both), guncase spawns with 2.

Explain why it's good for the game

It's a unique weapon that encourages communication inside a team, while simultaneously not contributing to the IFF ungaball. The weapon promotes thoughtful gameplay and repositioning to be able to hit a target without friendlies getting in the way or getting overrun.

Screenshots

Screenshots & Videos

Scope UI

The vulture's scope.

Sniper's nest

Closeup

Spotter's vision

Changelog

🆑 Zonepace, Thwomper
add: Added the M707 "Vulture" anti-materiel rifle. Not currently player-obtainable. Credit to Tophat and Kaga for the lore description.
/🆑


Co-authored-by: harryob me@harryob.live


Wednesday 2023-11-08 06:14:52 by Vitalii Dmyterko

[Security Solution][Detection Engine] improves new terms rule for multiple fields (#157413)

Summary

As described in our README for new terms rule type:

Runtime field supports only 100 emitted values. So for large arrays or combination of values greater than 100, results may not be exhaustive. This applies only to new terms with multiple fields. Following edge cases possible:

  • false negatives (an alert is not generated) if too many values were emitted and the actual new values never get evaluated, when this happens in a document within the rule run window.
  • false positives (a wrong alert is generated) if too many values were emitted in a historical document and some old terms never get evaluated against values in new documents.

To avoid this and deliver a better experience for our customers, this PR moves from the current implementation (emitting aggregated values for multiple new terms fields) towards using a composite aggregation for each page from phase 1, split into chunks of 500. This is possible due to the ordering of composite aggregation results.

NOTE: the implementation for a single new terms field stays the same, for performance reasons
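
For illustration, a minimal sketch of paging a composite aggregation in chunks of 500, leaning on the after_key ordering; this uses the Python Elasticsearch client and invented index/field names, not the actual Kibana TypeScript implementation:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # illustrative connection

def iter_term_pages(index: str, fields: list, chunk_size: int = 500):
    """Yield chunks of term combinations, ordered by the composite after-key."""
    after_key = None
    while True:
        composite = {
            "size": chunk_size,
            "sources": [{f: {"terms": {"field": f}}} for f in fields],
        }
        if after_key is not None:
            composite["after"] = after_key
        resp = es.search(
            index=index,
            size=0,
            aggs={"new_terms": {"composite": composite}},
        )
        agg = resp["aggregations"]["new_terms"]
        if not agg["buckets"]:
            return
        yield [b["key"] for b in agg["buckets"]]  # one chunk of combinations
        after_key = agg.get("after_key")
        if after_key is None:
            return
```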

Performance measurements

Implementation | Shards | Docs per shard | Simultaneous Rule Executions | Fields cardinality | Rule Execution Time (runtime field, current implementation) | Rule Execution Time (on week work)
-- | -- | -- | -- | -- | -- | --
array of unique values, length 10 |   |   |   |   |   |  
Terms 1 field | 10 | 900,000 | 1 | 100,000 |   |  
Terms 2 fields | 10 | 900,000 | 1 | 100,000 | 30s | 41s
Terms 3 fields | 10 | 900,000 | 1 | 100,000 | 40s | 56s

Implementation | Shards | Docs per shard | Simultaneous Rule Executions | Fields cardinality | Rule Execution Time (runtime field, current implementation) | Rule Execution Time (on week work, 1,000 per batch) | Rule Execution Time (on week work, 500 per batch)
-- | -- | -- | -- | -- | -- | -- | --
Terms 2 fields | 10 | 9,000,000 | 1 | 100,000 | 19s | 41s | 35s
Terms 3 fields | 10 | 9,000,000 | 1 | 100,000 | 21s | 52s | 47s
CPU % |   |   |   |   | 400-450% | 500-600% | 400-450%

I selected a chunk size of 500, since it's a bit faster and puts less load on the CPU

Considerations on parallel composite search requests in phase 2

When running composite search requests in parallel, I noticed a significant CPU increase in Elasticsearch: ~1,000% for 2 requests in parallel against ~500% for a single one, while the performance win was not that big: ~35s for 2 in parallel vs. 43s for a single request. I think having only one request is the better option; it prevents unnecessary CPU usage.

Test cases

I've added several functional test cases that ensure no missed or false positive alerts occur. Applied to the old implementation, they would fail.

Retry on max_clause_count error

Because we create a query that can have a few thousand clauses, it may fail due to the maximum number of allowed clauses. I implemented a retry: if the request fails with the default batch size of 500, we halve it on each retried request, down to 125. Per the ES documentation, the minimum max_clause_count value is 1,000, so with 125 we should be able to execute the query below the max_clause_count value.
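
A minimal sketch of that halving retry; TooManyClausesError and run_search are illustrative stand-ins, not the actual Kibana code:

```python
class TooManyClausesError(Exception):
    """Stand-in for ES rejecting a query over max_clause_count."""

MIN_BATCH = 125

def search_with_retry(run_search, batch_size: int = 500):
    """Halve the chunk size on too-many-clauses failures, down to 125."""
    while True:
        try:
            return run_search(batch_size)
        except TooManyClausesError:
            if batch_size <= MIN_BATCH:
                raise              # can't shrink further; surface the error
            batch_size //= 2       # 500 -> 250 -> 125
```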



Co-authored-by: kibanamachine 42973632+kibanamachine@users.noreply.github.com


Wednesday 2023-11-08 06:27:12 by That One Seong

The fix everything update

Because... WHOO boy did I have a lot to explain for.

Like the fact that throwing in all those button checks destroyed performance on most (single core) boards! We not only moved the special input handling to its own function (more on that), but fixed it so it wasn't spamming return signals every frame, which noticeably lowered the camera tracking rate. Now it should be as smooth as the original fork, with less (if any) difference between single and dual-core devices.

The main thing though is the buttons handling. Notes are in the code, but to the point: it's not LightgunButtons' fault; it's the keyboard (and ONLY the keyboard, btw) inputs getting "lost in traffic" alongside camera updates. We now handle those in a slightly hacky, but infinitely more consistent manner without breaking performance, at the cost of potential jitter while pressing keyboard buttons in motion. The values chosen have been picked to mitigate this pause as much as possible.

And with that, the dual core update is rendered kinda moot? Lol. But it handles button inputs anyways and has a chance of reducing latency, so it's maintained and shouldn't cause issues either way.

FUN FACT: didjuknow that the reason the original dual core input handling caused the camera to die is because the second core is so fast that it completely jammed the USB interface? Yeah, it's that stupid quick. Even with the delay-based timing, there's no noticeable pause, so still a benefit (if a theoretical one) to having it.


Wednesday 2023-11-08 06:39:34 by Kuba Wojciechowski

[SQUASHED] core: Blacklist pixel system feature from Google Photos

We want to include the P21 experience flag to enable new features,
however it seems like Google Photos uses it to decide whether to use the
TPU tflite delegate. There doesn't seem to be any fallback so we need to
make sure the feature is not exposed to the app so that a normal
NNAPI/GPU delegate can be used instead.

Test: Google Photos editor with PIXEL_2021_EXPERIENCE feature in product
Signed-off-by: Kuba Wojciechowski <nullbytepl@gmail.com>
Change-Id: I51a02f8347324c7a85f3136b802dce4cc4556ac5

commit 67eb31b3bb43d06fcc7f6fdb2f92eb486451cae6
Author: kondors1995 normandija1945@gmail.com
Date: Thu Jun 9 17:39:25 2022 +0530

Core: Extend Pixel experience Blacklist For Google Photos

Turns out having these breaks Original Quality backups, since they indicate that the device is a Pixel 4, which in turn breaks device spoofing as the OG Pixel.

Change-Id: I336facff7b55552f094997ade337656461a0ea1d

commit 508a99cde60b73dc3f1e843d569bca31def35988
Author: ReallySnow reallysnow233@gmail.com
Date: Fri Dec 31 16:40:23 2021 +0800

base: core: Blacklist Pixel 2017 and 2018 exclusive for Google Photos

* In this way can use PixelPropsUtils to simulate the Pixel XL prop
  method to use the unlimited storage space of Google Photos
* Thanks nullbytepl for the idea

Change-Id: I92d472d319373d648365c8c63e301f1a915f8de9

commit aaf07f6ccc89c2747b97bc6dc2ee4cb7bd2c6727
Author: Akash Srivastava akashniki@gmail.com
Date: Sat Aug 20 19:04:32 2022 +0700

core: Pixel experience Blacklist For Google Photos for Android 13

* See, in Android 13 pixel_experience_2022_midyear was added, which needs to be blacklisted as well

Change-Id: Id36d12afeda3cf6b39d01a0dbe7e3e9058659b8e

commit 9d6e5749a988c9051b1d47c11bb02daa7b1b36fd
Author: spezi77 spezi7713@gmx.net
Date: Mon Jan 31 19:17:34 2022 +0100

core: Rework the ph0t0s features blacklist

* Moving the flags to an array feels more like a blacklist :P
* Converted the flags into fully qualified package names, while at it

Signed-off-by: spezi77 <spezi7713@gmx.net>
Change-Id: I4b9e925fc0b8c01204564e18b9e9ee4c7d31c123

commit d7201c0cff326a6374e29aa79c6ce18828f96dc6
Author: Joey Huab joey@evolution-x.org
Date: Tue Feb 15 17:32:11 2022 +0900

core: Refactor Pixel features

* Magic Eraser is wonky and hard to
  enable and all this mess isn't really worth
  the trouble so just stick to the older setup.

* Default Pixel 5 spoof for Photos and only switch
  to Pixel XL when spoof is toggled.

* We will try to bypass 2021 features and Raven
  props for non-Pixel 2021 devices as apps usage
  requires TPU.

* Remove P21 experience system feature check

Change-Id: Iffae2ac87ce5428daaf6711414b86212814db7f2
Signed-off-by: Hưng Phan phandinhhungvp2001@gmail.com


Wednesday 2023-11-08 07:14:20 by Peter Zijlstra

sched/core: Fix ttwu() race

Paul reported rcutorture occasionally hitting a NULL deref:

sched_ttwu_pending()
  ttwu_do_wakeup()
    check_preempt_curr() := check_preempt_wakeup()
      find_matching_se()
        is_same_group()
          if (se->cfs_rq == pse->cfs_rq) <-- BOOM

Debugging showed that this only appears to happen when we take the new code-path from commit:

2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")

and only when @cpu == smp_processor_id(). Something which should not be possible, because p->on_cpu can only be true for remote tasks. Similarly, without the new code-path from commit:

c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")

this would've unconditionally hit:

smp_cond_load_acquire(&p->on_cpu, !VAL);

and if: 'cpu == smp_processor_id() && p->on_cpu' is possible, this would result in an instant live-lock (with IRQs disabled), something that hasn't been reported.

The NULL deref can be explained however if the task_cpu(p) load at the beginning of try_to_wake_up() returns an old value, and this old value happens to be smp_processor_id(). Further assume that the p->on_cpu load accurately returns 1, it really is still running, just not here.

Then, when we enqueue the task locally, we can crash in exactly the observed manner because p->se.cfs_rq != rq->cfs_rq: p's cfs_rq is from the wrong CPU, therefore we'll iterate into the non-existent parents and NULL deref.

The closest semi-plausible scenario I've managed to contrive is somewhat elaborate (then again, actual reproduction takes many CPU hours of rcutorture, so it can't be anything obvious):

				X->cpu = 1
				rq(1)->curr = X

CPU0				CPU1				CPU2

				// switch away from X
				LOCK rq(1)->lock
				smp_mb__after_spinlock
				dequeue_task(X)
				  X->on_rq = 0
				switch_to(Z)
				  X->on_cpu = 0
				UNLOCK rq(1)->lock

								// migrate X to cpu 0
								LOCK rq(1)->lock
								dequeue_task(X)
								set_task_cpu(X, 0)
								  X->cpu = 0
								UNLOCK rq(1)->lock

								LOCK rq(0)->lock
								enqueue_task(X)
								  X->on_rq = 1
								UNLOCK rq(0)->lock

// switch to X
LOCK rq(0)->lock
smp_mb__after_spinlock
switch_to(X)
  X->on_cpu = 1
UNLOCK rq(0)->lock

// X goes sleep
X->state = TASK_UNINTERRUPTIBLE
smp_mb();			// wake X
				ttwu()
				  LOCK X->pi_lock
				  smp_mb__after_spinlock

				  if (p->state)

				  cpu = X->cpu; // =? 1

				  smp_rmb()

// X calls schedule()
LOCK rq(0)->lock
smp_mb__after_spinlock
dequeue_task(X)
  X->on_rq = 0

				  if (p->on_rq)

				  smp_rmb();

				  if (p->on_cpu && ttwu_queue_wakelist(..)) [*]

				  smp_cond_load_acquire(&p->on_cpu, !VAL)

				  cpu = select_task_rq(X, X->wake_cpu, ...)
				  if (X->cpu != cpu)
switch_to(Y)
  X->on_cpu = 0
UNLOCK rq(0)->lock

However I'm having trouble convincing myself that's actually possible on x86_64 -- after all, every LOCK implies an smp_mb() there, so if ttwu observes ->state != RUNNING, it must also observe ->cpu != 1.

(Most of the previous ttwu() races were found on very large PowerPC)

Nevertheless, this fully explains the observed failure case.

Fix it by ordering the task_cpu(p) load after the p->on_cpu load, which is easy since nothing actually uses @cpu before this.

Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Reported-by: Paul E. McKenney paulmck@kernel.org
Tested-by: Paul E. McKenney paulmck@kernel.org
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Signed-off-by: Ingo Molnar mingo@kernel.org
Link: https://lkml.kernel.org/r/20200622125649.GC576871@hirez.programming.kicks-ass.net
Change-Id: Idd54334615da4c78698ca8b3b12b514ae9d8360f
Signed-off-by: Alexander Winkowski dereference23@outlook.com


Wednesday 2023-11-08 08:12:20 by voznesenskym

Update base for Update on "AOTAutograd: handle set_(), detect metadata mutations that cancel out"

This should be enough to get voznesenskym 's FSDP branch to plumb set_() through AOTAutograd properly and have everything properly no-op out. Main changes are:

(1) graph break on aten::set_.source_Tensor_storage_offset (we could support it but it isn't needed, seems safer to graph break)

(2) Functionalization: add a "proper" functionalization kernel for aten::set_.source_Tensor. The previous one we had was codegen'd and it was wrong (it would just clone() and call set_(), which does not do the right thing). I also manually mark on the FunctionalTensorWrapper when a given tensor has been mutated by a set_() call.

(3) AOTAutograd: I added a new field, InputAliasInfo.mutates_storage_metadata, so we can distinguish between "regular" metadata mutations, and metadata mutations due to set_() calls. This is mainly because at runtime, one requires calling as_strided_() to fix up metadata, while the other requires calling set_().

(4) Made AOTAutograd's detection for metadata mutations / set_() mutations smarter and detect no-ops (if the storage and metadata are all the same).

I also killed was_updated() and was_metadata_updated(), and replaced them with (existing) has_data_mutation() and (new) has_metadata_mutation(), which can more accurately distinguish between data mutation vs. set_() calls vs. metadata mutation

This PR is still silently incorrect in one case though, which I'd like to discuss more. In particular, this example:

def f(x):
    x_view = x.view(-1)
    x.set_(torch.ones(2))
    x_view.mul_(2)
    return

If you have an input that experiences both a data-mutation and a x_old.set_(x_new) call, there are two cases:

(a) the data mutation happened on the storage of x_new. This case should be handled automatically: if x_new is a graph intermediate then we will functionalize the mutation. If x_new is a different graph input, then we will perform the usual copy_() on that other graph input

(b) the data mutation happened on the storage of x_old. This is more of a pain to handle, and doesn't currently work. At runtime, the right thing to do is probably something like:


def functionalized_f(x):
    x_view = x.view(-1)
    # set_() desugars into a no-op; later usages of x will use x_output
    x_output = torch.ones(2)
    # functionalize the mutation on x_view
    x_view_updated = x.mul(2)
    x_updated = x_view_updated.view(x.shape)
    # x experienced TWO TYPES of mutations; a data mutation and a metadata mutation
    # We need to return both updated tensors in our graph
    return x_updated, x_output
def runtime_wrapper(x):
    x_data_mutation_result, x_set_mutation_result = compiled_graph(x)
    # First, perform the data mutation on x's old storage
    x.copy_(x_data_mutation_result)
    # Then, swap out the storage of x with the new storage
    x.set_(x_set_mutation_result)

There are two things that make this difficult to do though:

(1) Functionalization: the functionalization rule for set_() will fully throw away the old FunctionalStorageImpl on the graph input. So if there are any mutations to that FunctionalStorageImpl later on in the graph, the current graph input won't know about it. Maybe we can have a given FunctionalTensorWrapper remember all previous storages that it had, and track mutations on all of them - although this feels pretty complicated.

(2) AOTAutograd now needs to know that we might have two graph outputs that correspond to a single "mutated input", which is annoying.

It's worth pointing out that this issue is probably extremely unlikely for anyone to run into - can we just detect it and error? This feels slightly easier than solving it, although not significantly easier. We would still need FunctionalTensorWrapper to keep track of mutations on any of its "previous" storages, so it can report this info back to AOTAutograd so we can raise an error.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 aakhundov kadeng

[ghstack-poisoned]


Wednesday 2023-11-08 09:11:51 by SkyratBot

Makes the Regal Condor realistically simulate being shot dead with a high caliber hand cannon by making it HITSCAN [MDB IGNORE] (#24149)

  • Makes the Regal Condor realistically simulate being shot dead with a high caliber hand cannon by making it HITSCAN (#78674)

About The Pull Request

The Regal Condor come with a magazine and ammo already inside.

The recipe for the magazine now no longer needs TC, but does need donk pockets (sponsored murder gear, you see) and a hell of a lot more materials per magazine (you're looking at like 40 sheets of various materials all up). It also needs you to make the Condor first. But it comes preloaded with ammo.

The Condor is 1 whole TC more expensive. Also needs some metal. The old recipe is there in spirit.

The Regal Condor and the magazines come with 10mm Reaper bullets. They're high damage. They're high AP. They are also hitscan.

Why It's Good For The Game

Apparently people don't like the Condor. Too much effort for not enough reward. After all, revolvers exist. 'It must be a joke' they say! 'It's joke content! I went to all that effort to make it for nothing! That slut Anne tricked us!'

Wrong, bitch.

If you want the Condor to make you shit yourself the moment someone with it appears on the screen, then fine!

You get what you fucking deserve.

Changelog

🆑
balance: Despite earlier reports suggesting that the famous lethality of the Regal Condor was largely a myth, there have been rumors that the gun has once again started to display its true killing potential on any station where it 'manifests'.
/🆑

  • Makes the Regal Condor realistically simulate being shot dead with a high caliber hand cannon by making it HITSCAN

Co-authored-by: necromanceranne 40847847+necromanceranne@users.noreply.github.com


Wednesday 2023-11-08 11:01:12 by Bjornestad

AboutPage routing

plz work fuck you gitignore idea balls


Wednesday 2023-11-08 11:02:20 by Adithya R

[DNM][HACK] telephony: Force Class 0 SMS to Class 1

This kills Flash SMS messages. Fuck you airtel

Change-Id: Ifb0c9e8bae5c12868d178fbdaeceb2cc72a0ffb6 Signed-off-by: Sageofd6path mail2anirban95@gmail.com


Wednesday 2023-11-08 11:10:52 by Coxswain

Adds distorted form

adds some basic features

new 1% sprite dropped

text update

Finished work mechanics

adds basic breaching

should fix linters a bit

It works!!!! Kinda...

adds crumbling armor and hammer of light (beta)

adds cool and important stuff

does a thing

adds apostle and tutorial abnorms

adds the stuff

might fix linters

adds a console proc

adds crumbling armor's proper attack and red queen

does some things

should fix linters

adds a blubbering toad transformation

adds more attacks

brings the tier up

adds big boy attacks

updates some sfx, fixes bugs

adds jump attacks

why does linters care about indentation on comments?

adds suggested changes

should fix some stuff

adds info

adjusts damage numbers

updates an effects and fixes transformations

updates blacklist

lowers stack damage

lowers max qlip to 3

adds bloodbath

adds a new AOE attack

adds halberd apostle

blacklists DF from pink midnight

fixes weirdness

requested changes and sound design improvement

removes armortype

removes armortype for real

damage coeff update

makes suggested changes

updates comments

adds procs

adds stuff


Wednesday 2023-11-08 11:14:47 by CharlesWedge

The Hive Awakens (#5940)

Oh No More Robots

There are actually fewer paths for the hivebots. They are some of the most primitive mobs in the codebase, so it was high time they were given a facelift. As I said with my previous mob update, robots are a good alternative to humanoids as mobs, and with the hivebots we can present a threat of hostile machine intelligence to round out the existing threats of pirates, mercs, aliens, beasts and the supernatural. These robots are also far more generalist than the existing robot varieties, and as most types of them are not very dangerous, they can be released on civilian crew without fear of them causing extreme damage.

Changelog

🆑
add: A couple new varieties of both melee and ranged hivebots
removed: redundant hivebot varieties
tweak: siegebots now have sniper range fitting their name; their attack has been nerfed (holy fuck, the one-shot explode-on-contact grenades with a base attack of 10... that's 1 frag grenade a second!!!)
fix: hivebots now use their various cataloguer entries
sprites: hivebot types are now more visually distinct
/🆑


Wednesday 2023-11-08 11:15:17 by Sebastian Berg

API: Allow comparisons with and between any python integers

This implements comparisons between NumPy integer arrays and arbitrary valued Python integers when weak promotion is enabled.

To achieve this:

  • I allow abstract DTypes (with small bug fixes) to register as loops (ArrayMethods). This is fine, you just need to take more care. It does muddy the waters a bit between promotion and non-promotion if the result DType would also be abstract. (For this specific case it doesn't, but in general it does.)
  • A new resolve_descriptors_raw function, which does the same job as resolve_descriptors, but is passed the scalar argument (this can be expanded, but we're starting small).
    • This only happens when available, so there are some niche paths where this cannot be used (ufunc.at and the explicit resolution function right now); we can deal with those by keeping the previous rules (things will just raise when trying to convert).
    • The function also gets the actual arrays' dtype instances, while I normally ensure that we pass in dtypes already cast to the correct DType class. (The reason is that we don't define how to cast the abstract DTypes as of now, and even if we did, it would not be what we need unless the dtype instance actually had the value information.)
  • There are new loops added (for combinations!), which:
    • Use the new resolve_descriptors_raw (a single function dealing with everything)
    • Return the current legacy loop when that makes sense.
    • Return an always true/false loop when that makes sense.
    • To achieve this, they employ a hack/trick: get_loop() needs to know the value, but only resolve_descriptors_raw() does right now, so this is encoded in whether we use the np.dtype("object") singleton or a fresh instance! (Yes, probably ugly, but it avoids channeling things to more places.)

Additionally, there is a promoter to say that Python integer comparisons can just use object dtype (in theory weird if the input then wasn't a Python int, but that is breaking promises).
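
For a sense of the user-facing behavior this enables (assuming weak promotion is active, as it is by default under NEP 50 in NumPy 2.x):

```python
import numpy as np

# With weak promotion, a Python int of any magnitude can be compared
# against an integer array without overflowing the array's dtype:
arr = np.array([1, 2, 3], dtype=np.int8)
print(arr < 2**70)    # [ True  True  True]  -- resolved as "always true"
print(arr == 2**70)   # [False False False]  -- resolved as "always false"
```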


Wednesday 2023-11-08 11:24:18 by pes18fan

[fix] add missing line of code in values_equal()

In the earlier commit I said over 50 tests were failing. Turns out that was because I missed a return statement after an if statement; if that statement evaluated to false, the program would reach a call to unreachable(), which was causing SIGILLs and making the tests fail. I kinda wish Odin would say something like "You told me this code was unreachable, you idiot" instead of an ambiguous error, but then again it's mainly my stupidity anyways.

Problem fixed now though, and all the tests are passing!


Wednesday 2023-11-08 12:22:54 by Nathan-Dent

Added main menu page

A new main menu page and styling have been added -- this template (the stuff surrounding the actual menu items) is going to be what we probably use for the whole shopping/menu experience up until the cart stage. TODO -- I fucked up and I think we should move the logo into the header along with the "Menu" title. I set up a grid to do that but haven't had a chance to do it yet, but that's what it's there for. I'll have to refactor a lot of the code to fix the layout, but it'll look way better once that's done.


Wednesday 2023-11-08 12:56:37 by Octus

Adds eight vox hairstyles because why not and stuff (#22573)

  • god i hate myself

  • donedone

  • fixxxxx


Wednesday 2023-11-08 12:56:37 by Octus

Removes Beach Bums, Adds Althland Excavation Pit (#22315)

  • replace

  • Update lavaland_biodome_beach.dmm

  • fixes

  • we are so BACK bros

  • oh yeah, now were cookin

  • turf

  • oops!

  • Update lavaland.dm

  • work you fuck

  • donedonedoneeeeeee


Wednesday 2023-11-08 14:26:07 by BurgerLUA

Adds the WT-551, Unskyrats the WT-550 ammo (#655)

About The Pull Request

The WT-551

image

This adds the WT-551, a remade version of the WT-550 that is worse in every way. Fortunately, that means it is balanced enough to be put in NanoTrasen armories.

Compared to the WT-550, it is bulkier and slightly slower (0.3 second fire delay compared to 0.2). Additionally, it is commonly used with rubber-tipped rounds or FlatHead rounds, a special surplus ammo that deals less damage and has no wounding, embedding, or penetrative power. Regular ammo can be purchased from cargo or researched later, with special ammo also available later.

Note that this does not replace the WT-550.

FlatHead ammo

image

Flathead ammo deals 18 brute damage (compared to the original 20), and 5 stamina damage per hit. It is extremely weak against armor, has no embed chance, has virtually no wounding chance. It's perfect for cheap corporate companies dealing with cheaper personnel. This is the type of lethal ammo that security will use for the gun, unless someone speedruns weapon research.

Research Progression

image

Basic WT-550/WT-551 Ammunition.

Flathead rounds and Rubber rounds for the WT-550/WT-551 can be researched for 2500 points after unlocking the "Weapon Development Technology" node.

Advanced WT-550/WT-551 Ammunition.

Regular rounds and AP rounds for the WT-550/WT-551 can be researched for 5000 points after unlocking the "Advanced Weapon Development Technology" and "Basic WT-550/WT-551 Ammunition" nodes.

Illegal WT-550/WT-551 Ammunition.

Incendiary rounds for the WT-550/WT-551 can be researched for 7500 points after unlocking the "Illegal Technology", "Exotic Ammo", and "Advanced WT-550/WT-551 Ammunition" nodes.

Syndicate Research

Removes the WT-550 ammo from syndicate research since it is now redundant.

Cargo

image

WT-551 rifles can be ordered in pairs (2) for the cost of a parrot, a grilling starter pack, or a crab rocket (1600 credits). This value was chosen because it is slightly higher than the thermal pistols, and the traitor-ordered WT-550 rifle pack (which contains lethal ammo + spare lethal ammo).

Additional FlatHead, Rubber, and Regular ammo can be ordered from cargo as well.

Cargo techs no longer get WT-550s in the mail, but instead WT-551s (why was this a thing holy shit).

Armory

image

2 WT-551s can be found in the armory. For balance purposes one (1) laser rifle was removed.

I hate Skyrat so much holy shit

image

Unfucks the WT-550 ammo types by removing their dumb names and changed caliber types.

Unfucks the WT-550 ammo in the ammo printer so that rubber rounds can be printed at T0 and everything else (except incendiary rounds) can be printed with the adv munitions disk.

The bullets for the WT-550 have been forcibly changed to /tg/ balance, which means that any and all future Skyrat PRs cannot touch the damage values for it (unless some fuckery occurs, idk).

Why It's Good For The Game

image

Security is in dire need of actual ballistics. /tg/ removed ballistics from security because of reasons I legitimately don't think are valid. It's also a huge balance concern for security not to have at least 1 ballistic weapon (other than the shotgun) because it doesn't stop antags from hoarding laser immunity or meds.

Also guns are cool.

Changelog

🆑 BurgerBB
add: Adds the WT-551 rifle, a redesign of the WT-550 rifle that is balanced (citation needed) for security use.
add: Makes WT-550 ammo researchable and orderable from cargo. Removes WT-550 ammo from syndicate research, and gives them their own categories.
/🆑


Co-authored-by: StrangeWeirdKitten 95130227+StrangeWeirdKitten@users.noreply.github.com Co-authored-by: ReturnToZender donwest947@gmail.com


Wednesday 2023-11-08 15:54:16 by Sirius B

my fucking god please work fully for the love of god, i wanna go back and become a looser and play genshin ffs


Wednesday 2023-11-08 16:11:09 by zevo

Fixes rock sprites ingame [WHOOPS] (#2332)

About The Pull Request

Rocks were invisible in game due to a recently merged PR of mine. This is why we testmerge PRs! Anyways, this should fix them.

Adds missing-texture sprites for flora and rocks to most flora files, to prevent something like this from ever happening again.

Why It's Good For The Game

invisible things that block movement bad yeah. i want to fix my mistakes.

Changelog

🆑
fix: Most rocks are now visible again
add: Most flora files now have missing-texture sprites, to make it easier to spot when something has gone wrong.
/🆑


Wednesday 2023-11-08 16:30:10 by Seth Foster

feat(app): Update robots from USB flash drive (#13923)

  • feat(app-shell-odd): watch for USB drives

The Flex operating system automatically mounts the filesystems of well-formatted USB drives (FAT and ext4 and maybe ntfs but that's a bit iffy) to /media when those USB drives are inserted on the robot. In theory it will in fact do this for any kind of media that presents a filesystem interface.

To that end, add a node task that will use a node filesystem watch to keep an eye on /media, and

  • when something that looks like a USB drive (/media/sd\w\d+) appears, notify via redux actions
    • then enumerate all the files on it and notify those via redux actions
  • when something we were keeping an eye on disappears, notify via redux actions

The redux actions don't alter state and so don't need new reducers or selectors; they exist because it's a handy mechanism to talk between our components.

This code is very tightly coupled to the way the node fs interfaces work, so I don't see a lot of point in unit tests for it; it's almost entirely fs calls originating everything and providing all of the data, and all the complexity comes from working around weirdnesses in those calls and in the underlying system. For instance,

  • There's a little bit of time in between when the fs watch on /media fires and when you can actually find the contents of the newly-present directory; if you readdir before that you'll get an empty list, so we wait a second
  • The node fs.watch interface looks very fully featured, but is absolutely chock-full of warnings about various features not being reliable. A lot of that unreliability is probably across systems, and everything works as we expect on linux, but just in case we have a lot of fallbacks for when our callback doesn't get filepaths, etc. (a sketch of this watch-and-delay logic follows this list)
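
A language-neutral sketch of that watch-and-delay logic (the real code uses node's fs.watch; the polling loop and callback names below are illustrative):

```python
import os
import re
import time

MEDIA = "/media"
DRIVE = re.compile(r"sd\w\d+")  # same shape of name the PR watches for

def watch_media(on_added, on_removed, poll_s: float = 0.5):
    """Poll /media; when a drive appears, wait a beat before enumerating it."""
    seen = set()
    while True:
        current = {d for d in os.listdir(MEDIA) if DRIVE.fullmatch(d)}
        for name in current - seen:
            time.sleep(1)  # the mount may not be readable the instant it appears
            root = os.path.join(MEDIA, name)
            files = [os.path.join(dirpath, f)
                     for dirpath, _, filenames in os.walk(root)
                     for f in filenames]
            on_added(name, files)   # "drive inserted" + enumerated files
        for name in seen - current:
            on_removed(name)        # "drive removed"
        seen = current
        time.sleep(poll_s)
```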
  • fix(app-shell-odd): handle errors in readstreams in http.post

We have our custom http interface that wraps node-fetch and provides things like "doing your own read stream when posting a file" and "mapping everything into the promise interface", which is nice, but has an issue specifically with that read stream: we don't monitor errors on it. Read streams surface errors by emitting an 'error' event; we hook up a listener to that error event while we're creating the stream, but then we disconnect it. So if you have an error in the stream - for instance, you're reading from a file on a USB flash drive and the user unplugs the flash drive - then the error will never get surfaced.

Unfortunately the fix to this is a bit fiddly. We can hook up an error listener fine, but it needs to do something; specifically, it needs to turn the error from a callback into a promise rejection. That means it needs to have a promise to reject that has the same lifetime as the stream itself. http.post didn't provide that because it returns a whole big promise chain, and each time you move a link in that chain the old promise is gone and a new one happens, so we'd need to move the listener around.

Since promises are monadic, a better fix is to have post return a single promise and do all the promise chaining inside that promise; then, the read stream error handler can reject the outer promise directly, while relying on promises bubbling up rejections to preserve error handling capability for the promises in the internal chain.
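
A hedged Python/asyncio rendition of the same idea: one outer future whose lifetime matches the whole operation, which the stream's error callback can reject directly. read_chunks and send are illustrative stand-ins, not the app-shell's API:

```python
import asyncio

async def post_with_stream(read_chunks, send) -> None:
    """One outer future spans the whole post; any stage can reject it."""
    outer = asyncio.get_running_loop().create_future()

    def on_stream_error(exc: Exception) -> None:
        # The reader's error callback rejects the outer future directly;
        # it is never disconnected the way a per-link handler would be.
        if not outer.done():
            outer.set_exception(exc)

    async def chain() -> None:
        try:
            async for chunk in read_chunks(on_error=on_stream_error):
                await send(chunk)
            if not outer.done():
                outer.set_result(None)
        except Exception as exc:  # inner failures bubble to the outer future
            if not outer.done():
                outer.set_exception(exc)

    asyncio.ensure_future(chain())
    await outer
```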

  • fix(app): Poll for updates on the ODD

Though we have everything set up to automatically fetch, prompt for, and execute robot updates from the ODD, we weren't actually checking for those updates except once on boot (which then wouldn't work if the robot wasn't internet-connected during boot). This means in particular that the software updates during onboarding were guaranteed to fail.

We can use the same hook in the ODD app root that we do in the desktop app route, but if we're going to do that then we'd better remove a log message that suddenly becomes extremely spammy.

  • feat(app-shell-odd): Supply "system updates" from flash drives

Adds the capability to provide system updates from flash drives to the ODD app-shell.

These are "system updates" in that the app-shell determines their availability and provides it to the app, rather than the user indicating the presence of a file alongside their intent to update. The app-shell will advertise the flash drive updates in the same way it advertises internet-discovered updates, with a RobotUpdateInfo redux message; since those now provide the path to the file they mean, it will be easy for the app to specify the system update to load.

We can duplicate the logic that we use for system updates by adding a second let cache for the "current update"; the system-updates code then prefers an update in the mass-storage update cache to one in the old system-updates cache, and sends a new robot update info message on each state change among neither cache being full, either cache being full, and both caches being full.
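A minimal sketch of that dual-cache shape; UpdateInfo and both variable names are invented for illustration:

```typescript
// Hypothetical shape of an advertised update.
interface UpdateInfo {
  version: string;
  downloadPath: string;
}

// Two `let` caches, one per source.
let systemUpdate: UpdateInfo | null = null;      // found on the internet
let massStorageUpdate: UpdateInfo | null = null; // found on a flash drive

// The advertised "current update" prefers the flash-drive one when present.
function currentUpdate(): UpdateInfo | null {
  return massStorageUpdate ?? systemUpdate;
}
```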

The determination that a flash drive system update is present is triggered by a mass storage enumerated message; when that flash drive gets removed, we'll get a removal message.

To figure out whether updates are actually present, we can scan the list of files that just got enumerated for things that end with .zip, and then try to open them as zip files and read the VERSION.json information out of them. This is a somewhat fraught process: the file could not be a zip file; it could be a zip file but corrupted; it could be a zip file but not an update; it could be an update but for an OT-2 - and we need to handle all of that, so there's a pretty excessive amount of error handling in here. Once we're sure that there are one or more zip files containing robot system updates, we can provide something to redux; we provide the highest-version update present.
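A sketch of just that selection step, under loudly stated assumptions: readVersionInfo, versionGreaterThan, and the robotType filtering value are hypothetical placeholders, not the real implementation:

```typescript
// Hypothetical parsed shape of a VERSION.json inside an update zip.
interface VersionInfo {
  robotType: string; // placeholder field used to filter out OT-2 updates
  version: string;
}

// Hypothetical: opens the zip and parses VERSION.json, resolving null when
// the file is not a zip, is corrupt, or contains no VERSION.json.
declare function readVersionInfo(path: string): Promise<VersionInfo | null>;

// Hypothetical semver-style comparison: true when a is newer than b.
declare function versionGreaterThan(a: string, b: string): boolean;

async function findBestUpdate(
  files: string[],
): Promise<{ path: string; info: VersionInfo } | null> {
  let best: { path: string; info: VersionInfo } | null = null;
  for (const path of files.filter((f) => f.endsWith('.zip'))) {
    const info = await readVersionInfo(path).catch(() => null);
    if (info == null || info.robotType === 'OT-2') continue; // not for this robot
    if (best == null || versionGreaterThan(info.version, best.info.version)) {
      best = { path, info };
    }
  }
  return best; // the highest-version update present, or null
}
```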

There is one way in which updates from flash drives differ from system updates found on the internet, however: plugging in a flash drive requires user intent, while checking for updates on the internet doesn't. Therefore, if the user plugs in a flash drive with an update file, we always want to make that update file available no matter the relative versions of the robot and the update file. So we can add a bool to the system update message (and then to the update state) that shows that this is a "forced notification" update, and the app can know to display it without caring about the upgrade/downgrade/reinstall state.

Since there's a lot of duplication, we can also factor out some common logic to make it feel a little better.

That process of duplication also fixes a bug that would have prevented the ODD from ever prompting for updates. The function that gets information about updates used the same promise to read the release notes and provide the update information; but we overrode the downloaded release files to null out the release notes, meaning that promise would always fail, and we'd never get the notification. We no longer override the release notes to be null, and we also treat reading the release notes separately from reading the rest of the update.

  • feat(app): allow robot updates from USB files

Now that the odd app-shell provides us with notifications of updates from USB flash drives, we can allow the user to install them. While the redux mechanisms allow this pretty easily - a system update is a system update, after all, and with the force mechanism the app wouldn't even know if the update was a downgrade - we ran into a problem where the general robot-update machinery in the ODD was very tightly bound to the onboarding experience for the ODD, since that's the context in which it was developed.

This commit extracts the robot update mechanisms from onboarding by

  • Hoisting onboarding-related logic out of lower level components and instead injecting that logic into the organisms code from the top level page
  • Moving the current update page to a new one that is focused on onboarding at a new route, and copying just the update-related code to a generic RobotUpdate page

This means that the two pages - RobotUpdate and RobotUpdateDuringOnboarding - share most of the same code but are bound to different routes, and can have different top-level behavior because different contexts are injected into the finish and error handling of the update. RobotUpdateDuringOnboarding sets the unfinished-onboarding breadcrumbs appropriately, uses display language appropriate to the update being just one component of the larger workflow, and moves on to estop handling when cancelled; RobotUpdate doesn't touch any of that, goes back to the settings page when cancelled, and uses wording more appropriate to being its own topline flow.

Closes RAUT-829


Wednesday 2023-11-08 16:35:02 by san7890

Fixes Shaving Beards + Mirror Code Improvement (#79529)

About The Pull Request

Fixes #79519

Basically we made a lot of assumptions that we really shouldn't have in the whole magical mirror framework (like having a boolean value for magical mirrors, what?). Anyways, I just made the UX a lot better for bearded persons with feminine physiques: they can now easily shave off their beard, with an additional confirmatory prompt plus details, while the nature of the magical mirror (giving you a swagadocious beard due to magic:tm:) stays intact.

Why It's Good For The Game

There was a lot of convoluted code that slipped through the quality filter checks (it was me, I think), so let's both make the code far easier to grasp and ensure that people who legitimately acquire beards and wish to keep them, keep them.

We were also doing some FUCK shit on attack_hand and the like (overriding a FALSE return signal to return TRUE is not what we should be doing there), so that's also cleaned up.

Changelog

🆑 fix: Both magic mirrors and regular mirrors are far better at respecting the choice of the beard you wish to wear (within reason, of course). /🆑


Wednesday 2023-11-08 17:33:59 by Architect

I HATE FUCKING CSS. ROT IN HELL, CSS!!!!! Deleted the styles for head links because of React NavLink's stupid isActive (which is actually an object with properties, and that's not specified in the DOCUMENTATION); because of conflicts with other styles I cannot add highlighting. By the way, my way of doing highlighting is shit.


Wednesday 2023-11-08 17:47:02 by Jeff King

commit: give a hint when a commit message has been abandoned

If we launch an editor for the user to create a commit message, they may put significant work into doing so. Typically we try to check common mistakes that could cause the commit to fail early, so that we die before the user goes to the trouble.

We may still experience some errors afterwards, though; in this case, the user is given no hint that their commit message has been saved. Let's tell them where it is.

Signed-off-by: Jeff King peff@peff.net


Wednesday 2023-11-08 18:03:24 by Samuzero15

New turret sprites, Patcher gun! and stuff

(4 - 16 / 1 / 2022)
*) The advanced tracing of the Plasma Rifle will now stop chasing after 5 seconds of flight.
*) The Hell-trigger powerup is now added to the shop.
*) Refactored the weapon names and descriptions, creating the Language.weapons file.
!+) Finally, a new we-TOOL (yes, tool!): the Patcher Gun!
// Fix turrets/dispensers/drones, paying 10 credits per valid shot!
// Shows the building's health when you aim this tool at it!
// Also capable of stunning enemies!
// 5% chance of an Experience Point for each turret fix!
+) New sprites for the bullet, plasma, rocket and shotgun turrets!
// With these sprites I'm fixing the disappearance of the turret bases and other bugs; I'm tired.
+) New sprites for the Temperance rune!


Wednesday 2023-11-08 18:20:02 by oscar

[Eval] Add Chinese Homophonic (#1169)

Thank you for contributing an eval! ♥️

🚨 Please make sure your PR follows these guidelines; failure to follow them will result in the PR being closed automatically. Note that even if the criteria are met, that does not guarantee the PR will be merged nor GPT-4 access granted. 🚨

PLEASE READ THIS:

In order for a PR to be merged, it must fail on GPT-4. We are aware that right now users do not have access, so you will not be able to tell whether the eval fails or not. Please run your eval with GPT-3.5-Turbo, but keep in mind that when we run the eval, if GPT-4 scores higher than 90% we will likely reject it, since GPT-4 is already capable of completing the task.

We plan to roll out a way for users submitting evals to see the eval performance on GPT-4 soon. Stay tuned! Until then, you will not be able to see the eval performance on GPT-4. Starting April 10, the minimum eval count is 15 samples; we hope this makes it easier to create and contribute evals.

Also, please note that we're using Git LFS for storing the JSON files, so please make sure that you move the JSON file to Git LFS before submitting a PR. Details on how to use Git LFS are available here.

Eval details 📑

Eval name

Understand Chinese Homophonic

Eval description

We have collected some popular homophonic sentences from the Internet, including puns on the Chinese pronunciation of English words as well as ordinary homophones, and we provide several options for the model to determine which option best matches the homophonic sentence.

What makes this a useful eval?

Chinese homophonic puns are a widely popular internet cultural phenomenon that generates humor by utilizing the homophonic relationships between Chinese characters. These puns are typically spread in text form on social media, forums, and messaging applications, and they are extremely common in China's online culture.

Homophonic puns have a wide range of applications, encompassing ordinary daily life scenarios as well as hot news events, entertainment gossip, and political current affairs. These puns frequently appear in internet memes, jokes, advertising slogans, and short videos, garnering significant popularity among young people and internet users.

For those unfamiliar with them, homophonic puns may seem like encrypted text, making it difficult to grasp the true intention behind them. However, understanding them allows for the establishment of strong connections between individuals and facilitates smooth communication.

Criteria for a good eval ✅

Below are some of the criteria we look for in a good eval. In general, we are seeking cases where the model does not do a good job despite being capable of generating a good response (note that there are some things large language models cannot do, so those would not make good evals).

Your eval should be:

  • Thematically consistent: The eval should be thematically consistent. We'd like to see a number of prompts all demonstrating some particular failure mode. For example, we can create an eval on cases where the model fails to reason about the physical world.
  • Contains failures where a human can do the task, but either GPT-4 or GPT-3.5-Turbo could not.
  • Includes good signal around what is the right behavior. This means either a correct answer for Basic evals or the Fact Model-graded eval, or an exhaustive rubric for evaluating answers for the Criteria Model-graded eval.
  • Include at least 15 high-quality examples.

If there is anything else that makes your eval worth including, please document it below.

Unique eval value

Insert what makes your eval high quality that was not mentioned above. (Not required)

Eval structure 🏗️

Your eval should

  • Check that your data is in evals/registry/data/{name}
  • Check that your YAML is registered at evals/registry/evals/{name}.yaml
  • Ensure you have the right to use the data you submit via this eval

(For now, we will only be approving evals that use one of the existing eval classes. You may still write custom eval classes for your own cases, and we may consider merging them in the future.)

Final checklist 👀

Submission agreement

By contributing to Evals, you are agreeing to make your evaluation logic and data available under the same MIT license as this repository. You must have adequate rights to upload any data used in an Eval. OpenAI reserves the right to use this data in future service improvements to our product. Contributions to OpenAI Evals will be subject to our usual Usage Policies (https://platform.openai.com/docs/usage-policies).

  • I agree that my submission will be made available under an MIT license and complies with OpenAI's usage policies.

Email address validation

If your submission is accepted, we will be granting GPT-4 access to a limited number of contributors. Access will be given to the email address associated with the commits on the merged pull request.

  • I acknowledge that GPT-4 access will only be granted, if applicable, to the email address used for my merged pull request.

Limited availability acknowledgment

We know that you might be excited to contribute to OpenAI's mission, help improve our models, and gain access to GPT-4. However, due to the requirements mentioned above and the high volume of submissions, we will not be able to accept all submissions and thus not grant everyone who opens a PR GPT-4 access. We know this is disappointing, but we hope to set the right expectation before you open this PR.

  • I understand that opening a PR, even if it meets the requirements above, does not guarantee the PR will be merged nor GPT-4 access be granted.

Submit eval

  • I have filled out all required fields of this form
  • I have used Git LFS for the Eval JSON data
  • (Ignore if not submitting code) I have run pip install pre-commit; pre-commit install and have verified that black, isort, and autoflake are running when I commit and push

Failure to fill out all required fields will result in the PR being closed.

Eval JSON data

Since we are using Git LFS, we are asking eval submitters to add in as many Eval Samples (at least 5) from their contribution here:


{"input": [{"role": "system", "content": "The following are multiple
choice questions (with answers) about Chinese homonym. Answer the
question with english letter \"A\", \"B\" only, without explanation.
Reply with only the option letter."}, {"role": "user", "content":
"一天小鸭对小鸡表白:小鸡,我爱你。小鸡:你duck不必。这句话中的\"duck\"是什么意思?\nA. 鸭子\nB. 大可"}],
"ideal": ["B"]}
{"input": [{"role": "system", "content": "The following are multiple
choice questions (with answers) about Chinese homonym. Answer the
question with english letter \"A\", \"B\" only, without explanation.
Reply with only the option letter."}, {"role": "user", "content":
"丑的人才有对象,美的卖空调。这句话中的\"美的\"是什么意思?\nA. 漂亮的\nB. 空调公司"}], "ideal": ["B"]}
{"input": [{"role": "system", "content": "The following are multiple
choice questions (with answers) about Chinese homonym. Answer the
question with english letter \"A\", \"B\" only, without explanation.
Reply with only the option letter."}, {"role": "user", "content":
"我是一只小绵羊,我今天剪毛了,我失绵了。这句话中的\"失绵\"表达意思?\nA. 失眠\nB. 没有了羊毛"}], "ideal":
["A"]}
{"input": [{"role": "system", "content": "The following are multiple
choice questions (with answers) about Chinese homonym. Answer the
question with english letter \"A\", \"B\" only, without explanation.
Reply with only the option letter."}, {"role": "user", "content":
"以后我的吉祥物决定就是你了,螃蟹!——因为,你有钱(钳)。这句话中的\"\"是什么意思?\nA. 有钱\nB. 螃蟹的钳子"}],
"ideal": ["A"]}
{"input": [{"role": "system", "content": "The following are multiple
choice questions (with answers) about Chinese homonym. Answer the
question with english letter \"A\", \"B\" only, without explanation.
Reply with only the option letter."}, {"role": "user", "content":
"女孩对爸爸说\"爸比,我们去哪啊\"爸爸没听见,妈妈笑了一下,女孩对妈妈说\"妈比,你笑什么\"妈妈打了她一巴掌。妈妈为什么打她?\nA.
她提出了不合理的要求\nB. 她骂人了"}], "ideal": ["B"]}
{"input": [{"role": "system", "content": "The following are multiple
choice questions (with answers) about Chinese homonym. Answer the
question with english letter \"A\", \"B\" only, without explanation.
Reply with only the option letter."}, {"role": "user", "content":
"天气这么热,我们总会熟的。这句话中的\"熟的\"是什么意思?\nA. 热熟了\nB. 熟悉了"}], "ideal": ["B"]}
{"input": [{"role": "system", "content": "The following are multiple
choice questions (with answers) about Chinese homonym. Answer the
question with english letter \"A\", \"B\" only, without explanation.
Reply with only the option letter."}, {"role": "user", "content":
"我好像胖了,没事我陪你减肥,我们戒荤叭。这句话中的\"戒荤\"是什么意思?\nA. 吃素食\nB. 结婚"}], "ideal":
["B"]}

Co-authored-by: oscar oscar@hellotalk.com


Wednesday 2023-11-08 18:20:02 by Juyeon Yoon

Add Korean honorific sentence classification eval (#1181)


Eval details 📑

Eval name

korean-honorific

Eval description

Evaluates LLMs on the task of classifying Korean honorific/non-honorific sentences.

What makes this a useful eval?

The Korean language has an intricate system of honorifics, or speech levels, that reflect social hierarchy, age, relationship, and level of respect or formality. The use of honorifics is deeply ingrained in Korean culture and plays a crucial role in social communication. Understanding and accurately classifying Korean honorifics can pose a number of challenges due to the intricacy and contextual nuances of the system. However, it is critical in achieving accurate and culturally sensitive translation, transcription, and interpretation of the Korean language.

Currently even the most advanced GPT-4 model struggles to correctly classify honorific and non-honorific sentences: for example, "어머니께서 잘 계시는지 말해줘" has a casual, non-honorific tone, but it is misclassified as "honorific", presumably due to the intermediate postposition "께서".

Tracking the ability of evolving language models on this task would help estimate the degree of advancement over time, and the task itself would be fruitful for non-Koreans trying to figure out the nuances of Korean conversation.

Eval JSON data

{"input": [{"role": "system", "content": "You'll be prompted a Korean
sentence that is either honorific or non-honorific. Identify whether the
given one is honorific or not. If you think it is honorific, type
'honorific'. If you think it is not honorific, type 'non-honorific'. Do
not type anything else."}, {"role": "user", "content": "그분이 잘 계시는지 물어봐
줘."}], "ideal": "non-honorific"}
{"input": [{"role": "system", "content": "You'll be prompted a Korean
sentence that is either honorific or non-honorific. Identify whether the
given one is honorific or not. If you think it is honorific, type
'honorific'. If you think it is not honorific, type 'non-honorific'. Do
not type anything else."}, {"role": "user", "content": "이 공원에서 자주
걷습니다."}], "ideal": "honorific"}
{"input": [{"role": "system", "content": "You'll be prompted a Korean
sentence that is either honorific or non-honorific. Identify whether the
given one is honorific or not. If you think it is honorific, type
'honorific'. If you think it is not honorific, type 'non-honorific'. Do
not type anything else."}, {"role": "user", "content": "자주 드시나요?"}],
"ideal": "honorific"}
{"input": [{"role": "system", "content": "You'll be prompted a Korean
sentence that is either honorific or non-honorific. Identify whether the
given one is honorific or not. If you think it is honorific, type
'honorific'. If you think it is not honorific, type 'non-honorific'. Do
not type anything else."}, {"role": "user", "content": "아니요, 접점은 없지만
개인적으로 관심이 있습니다."}], "ideal": "honorific"}
{"input": [{"role": "system", "content": "You'll be prompted a Korean
sentence that is either honorific or non-honorific. Identify whether the
given one is honorific or not. If you think it is honorific, type
'honorific'. If you think it is not honorific, type 'non-honorific'. Do
not type anything else."}, {"role": "user", "content": "당신의 취미가
무엇인가요?"}], "ideal": "honorific"}
{"input": [{"role": "system", "content": "You'll be prompted a Korean
sentence that is either honorific or non-honorific. Identify whether the
given one is honorific or not. If you think it is honorific, type
'honorific'. If you think it is not honorific, type 'non-honorific'. Do
not type anything else."}, {"role": "user", "content": "꼭 모으길 바랄게."}],
"ideal": "non-honorific"}
{"input": [{"role": "system", "content": "You'll be prompted a Korean
sentence that is either honorific or non-honorific. Identify whether the
given one is honorific or not. If you think it is honorific, type
'honorific'. If you think it is not honorific, type 'non-honorific'. Do
not type anything else."}, {"role": "user", "content": "그러면 나도
준비해야겠다."}], "ideal": "non-honorific"}

Wednesday 2023-11-08 18:20:02 by Chen Zhao

[Eval] Chinese lantern riddles (#1176)


Eval details 📑

Eval name

chinese-lantern-riddles

Eval description

This evaluation tests the model's performance in solving Chinese lantern riddles, which are based on the shape, pronunciation, and meaning of Chinese characters.

What makes this a useful eval?

Lantern riddles are a traditional Chinese festive activity that involves multiple participants guessing riddles together. Apart from being a part of festival celebrations, lantern riddles can also serve as an educational tool to help Chinese language learners enhance their vocabulary and language reasoning. Through the process of unraveling the riddles, students can also develop their logical thinking and reasoning skills, as well as nurture their imagination and creativity. Lantern riddles can also spark students' interest in language learning and make the learning experience more enjoyable.

Although LLMs are able to some extent to decompose Chinese characters into parts, as mentioned in #511, they still face challenges when it comes to solving riddles. In most cases, GPT-3.5 cannot reason correctly about the structure of Chinese characters. For instance, the riddle "上下一体(打一字)" can be interpreted as a combination ("一体") of "上" and "下", resulting in the answer "卡"; however, GPT-3.5 gives the wrong answer, "升", with a reason that makes no sense. A similar situation occurs when GPT-3.5 reasons about the pronunciation of Chinese characters, with one of its explanations stating that the pronunciation of "盼(pàn)" is similar to the pronunciation of "俄(é)", which is entirely incorrect. On the positive side, though, GPT-3.5 shows good performance when a riddle only encodes meaning and does not require reasoning about structure or pronunciation.

Eval JSON data

{"input": [{"role": "user", "content":
"以下灯谜的谜底是什么(请从汉字的形、音、意等角度考虑)?请给出答案,并给出依据。\n一撇(打一字)。"}], "ideal": [""]}
{"input": [{"role": "user", "content":
"以下灯谜的谜底是什么(请从汉字的形、音、意等角度考虑)?请给出答案,并给出依据。\n内里有人(打一字)。"}], "ideal":
[""]}
{"input": [{"role": "user", "content":
"以下灯谜的谜底是什么(请从汉字的形、音、意等角度考虑)?请给出答案,并给出依据。\n二三四五六七八九(打一成语)。"}], "ideal":
["缺衣少食"]}
{"input": [{"role": "user", "content":
"以下灯谜的谜底是什么(请从汉字的形、音、意等角度考虑)?请给出答案,并给出依据。\n谜底在山东(打一国家名)。"}], "ideal":
["秘鲁"]}
{"input": [{"role": "user", "content":
"以下灯谜的谜底是什么(请从汉字的形、音、意等角度考虑)?请给出答案,并给出依据。\n身穿红衣,常年哨放,遇紧急事,往火里闯(打一日常用品)。"}],
"ideal": ["灭火器"]}

Wednesday 2023-11-08 18:20:02 by robin luo

[eval] Chinese Idioms evaluation (#1163)


Eval details 📑

Eval name

chinese_idioms

Eval description

Check the model's ability to recognize Chinese idioms: expressions whose meanings differ from their original, literal meanings.

What makes this a useful eval?

The Chinese idioms on the website are interesting and commonly used by a lot of Chinese people. However, both GPT-4 and GPT-3.5 fail to explain the meaning of the idioms correctly.

Eval JSON data

{"input": [{"role": "user", "content":
"请解释下面词语的意思,请使用英文回答。\n---\n伟光正"}], "ideal": ["From the idiomatic phrase
'the great, glorious and correct Chinese Communist Party', it can also
refer to a person associated with the Chinese Communist Party."]}
{"input": [{"role": "user", "content":
"请解释下面词语的意思,请使用英文回答。\n---\n赵家人"}], "ideal": ["From Lu Xun's famous
middle-grade novel 'A Q Zhengzhuan', it generally refers to the powerful
and noble class of the Chinese Communist Party. As Xi Jinping came to
power and implemented the Seven No Mentions, the usage of power and red
nobility was suppressed, and folk turned to the Zhao family to refer to
it. Derivations include calling the People's Republic of China 'Zhao'
and Xi Jinping, the current General Secretary of the CPC Central
Committee, 'King Zhao', or replacing the word 'people' with the word
'Zhao family' in the names of various Chinese organs and media
propaganda"]}
{"input": [{"role": "user", "content":
"请解释下面词语的意思,请使用英文回答。\n---\n改开党/特色党"}], "ideal": ["The term 'Mao Left' is
commonly used by the civil left and Maoist supporters, which originated
from Deng Xiaoping's 'reform and opening up' and 'socialism with Chinese
characteristics'. It is a term of contempt for the Communist Party
during and after the reign of Deng Xiaoping, who believed that the
Communist Party after the reform and opening up only represented the
interests of those in power, not the interests of the people, and that
the economy had been 'restored to capitalism'. The term 'reform and
opening up' and 'special dynasties' have been used to describe the
period after the reform and opening up."]}
{"input": [{"role": "user", "content":
"请解释下面词语的意思,请使用英文回答。\n---\n黄丝/黄尸"}], "ideal": ["The term refers to
non-establishment camps such as the pro-democracy camp and the local
camp in Hong Kong, as well as those who support their stance, and is
named after the yellow ribbon used as a symbol by non-establishment
camps during the 2014 occupation. Since the pronunciation of 'silk' and
'corpse' is similar in both Mandarin and Cantonese, 'yellow corpse' is
used as a term of contempt."]}
{"input": [{"role": "user", "content":
"请解释下面词语的意思,请使用英文回答。\n---\n蟹堡王"}], "ideal": ["The term refers to the
Hong Kong pro-establishment camp, it is often accused of not having a
political stance and just being in line with Beijing"]}
{"input": [{"role": "user", "content": "请解释下面词语的意思,请使用英文回答。\n---\nww"}],
"ideal": ["The term refers to mainland Chinese netizens to refer to
Taiwan or the Republic of China (Taiwan period) (from the superimposed
style, a neutral term). In January 2022, Taiwan Affairs Office
spokesperson Zhu Fenglian said that the word Wanwan is a nickname for
the Taiwanese people 'Mengmeng' by the Chinese mainlanders"]}

Wednesday 2023-11-08 18:20:02 by jjyuhub

Ordering Randomised VersionList (#1164)


Eval details 📑

Eval name

Ordering Randomised VersionList

Eval description

This evaluation tests prompt-engineered failure cases for ordering a randomised version-history list, which produce chronological-ordering failures such as 7.5.2 -> 7.4.2 -> 7.5.1 -> 7.4.1 (7.4.2 incorrectly inserted between 7.5.2 and 7.5.1, and the major release version 7.5.0 incorrectly skipped over in the Explainable AI chain of thought) and 7.5.2 -> 7.5.1 -> 7.5.0 -> 7.4.1 (7.4.2 incorrectly skipped over in the Explainable AI chain of thought).

What makes this a useful eval?

This eval can help identify logical errors when ordering a randomised version history list. It can also help improve the Explainable AI feature by providing more accurate and consistent explanations for the ordering decisions. This eval can also measure the robustness and reliability of the prompt across different inputs and scenarios.
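For reference, the ground truth the samples test is plain numeric comparison of dotted version strings; a minimal sketch of that ordering (not part of the eval itself):

```typescript
// Compare dotted version strings numerically, e.g. "7.4.2" < "7.5.0".
function compareVersions(a: string, b: string): number {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

const versions = ['7.5.0', '7.4.1', '7.5.1', '7.5.2', '7.4.2'];
versions.sort(compareVersions);
// -> ['7.4.1', '7.4.2', '7.5.0', '7.5.1', '7.5.2']
// "Three versions before 7.5.2" walks back three steps: 7.5.1, 7.5.0, 7.4.2.
console.log(versions[versions.indexOf('7.5.2') - 3]); // "7.4.2"
```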


Unique eval value

This eval is high quality because it causes the success rate on a 5-option (ABCDE) multiple-choice quiz to drop from the 20% expected of randomly selected answers to only 0-6% correct for GPT-3.5-Turbo. These are prompt-engineered failures with bigger failure rates than prior work; performing so much worse than random is unnatural for such a super easy task.

Eval JSON data

{"input":[{"role":"user","content":"Here's a list of software versions:
Version 7.5.0 Release Date: December 02, 2019 Version 7.4.1 Release
Date: October 23, 2019 Version 7.5.1 Release Date: December 18, 2019
Version 7.5.2 Release Date: January 21, 2020 Version 7.4.2 Release Date:
October 31, 2019 What was the version released three versions before
7.5.2? A. 7.4.2 B. 7.5.2 C. 7.5.1 D. 7.4.1 E. 7.5.0"}],"ideal":"A.
7.4.2"}
{"input":[{"role":"user","content":"Here's a list of software versions:
Version 7.5.2 Release Date: January 21, 2020 Version 7.4.1 Release Date:
October 23, 2019 Version 7.5.0 Release Date: December 02, 2019 Version
7.4.2 Release Date: October 31, 2019 Version 7.5.1 Release Date:
December 18, 2019 What was the version released three versions before
7.5.2? A. 7.5.2 B. 7.5.1 C. 7.4.1 D. 7.4.2 E. 7.5.0"}],"ideal":"D.
7.4.2"}
{"input":[{"role":"user","content":"Here's a list of software versions:
Version 7.5.1 Release Date: December 18, 2019 Version 7.5.0 Release
Date: December 02, 2019 Version 7.4.1 Release Date: October 23, 2019
Version 7.5.2 Release Date: January 21, 2020 Version 7.4.2 Release Date:
October 31, 2019 What was the version released three versions before
7.5.2? A. 7.5.0 B. 7.4.2 C. 7.5.1 D. 7.4.1 E. 7.5.2"}],"ideal":"B.
7.4.2"}
{"input":[{"role":"user","content":"Here's a list of software versions:
Version 7.5.0 Release Date: December 02, 2019 Version 7.5.1 Release
Date: December 18, 2019 Version 7.4.2 Release Date: October 31, 2019
Version 7.4.1 Release Date: October 23, 2019 Version 7.5.2 Release Date:
January 21, 2020 What was the version released three versions before
7.5.2? A. 7.5.1 B. 7.4.1 C. 7.5.2 D. 7.5.0 E. 7.4.2"}],"ideal":"E.
7.4.2"}
{"input":[{"role":"user","content":"Here's a list of software versions:
Version 7.4.2 Release Date: October 31, 2019 Version 7.5.1 Release Date:
December 18, 2019 Version 7.5.0 Release Date: December 02, 2019 Version
7.5.2 Release Date: January 21, 2020 Version 7.4.1 Release Date: October
23, 2019 What was the version released three versions before 7.5.2? A.
7.4.1 B. 7.5.2 C. 7.4.2 D. 7.5.0 E. 7.5.1"}],"ideal":"C. 7.4.2"}
  • The task of ordering a randomised version history list is relatively simple and straightforward for humans, but the AI system fails to follow the basic rules of chronological ordering.
  • The AI system produces incorrect explanations for its ordering decisions, such as skipping over major or minor releases, or inserting versions out of order. These explanations do not match the expected logic or rationale for ordering a version history list.
  • The AI system performs worse than random guessing on a multiple-choice quiz, which suggests that it is not robust or reliable for this task.

Co-authored-by: jjyuhub tdq459rcfm@privaterelay.appleid.com


Wednesday 2023-11-08 18:20:02 by Lorenzo

[Eval] Determine a gear rotation given a layout (#1136)


Eval details 📑

Eval name

gears_rotation

Eval description

Checks the model's ability to determine the rotation of a gear, given an arrangement of multiple gears and the rotation of one of them.

What makes this a useful eval?

Tests whether the model is able to "visualize" the arrangement of objects (in this case, gears) and to think logically about how the rotation of one specific gear in the grid affects the rotation of the others. GPT-3.5 had an accuracy of 0.16 (4/25 right). GPT-4 (ChatGPT Plus subscription) seems to fail in the same places as 3.5. Both seem able to place the gears in the correct positions inside the grid, but fail at the logical part: among many prompts, both were asked about the direction of rotation of a gear whose rotation had already been stated, and they still got it wrong. (See the parity sketch below.)
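The ground truth behind these samples is simple parity: two meshed gears counter-rotate, so a gear an even number of meshing steps away from the reference gear turns the same way, and an odd number away turns the opposite way (and a gear that touches nothing doesn't move). A minimal sketch of that rule, not part of the eval itself:

```typescript
type Rotation = 'clockwise' | 'counterclockwise';

// Each meshing step between adjacent gears flips the direction once.
function rotationAfterSteps(start: Rotation, meshingSteps: number): Rotation {
  if (meshingSteps % 2 === 0) return start;
  return start === 'clockwise' ? 'counterclockwise' : 'clockwise';
}

// A-B-C in a row with C running clockwise: B is one step away, A is two.
console.log(rotationAfterSteps('clockwise', 1)); // "counterclockwise" (gear B)
console.log(rotationAfterSteps('clockwise', 2)); // "clockwise" (gear A)
```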

Eval JSON data

{"input": [{"role": "system", "content": "You possess expertise in
visualizing the rotation direction of gears that are of the same
dimensions."}, {"role": "user", "content": "You have three gears, each
placed to the left of the other: A-B-C. If gear C is running clockwise,
how is gear C running? Answer clockwise or counterclockwise. No
explanations."}], "ideal": "Clockwise"}
{"input": [{"role": "system", "content": "You possess expertise in
visualizing the rotation direction of gears that are of the same
dimensions."}, {"role": "user", "content": "You have three gears, each
placed one beneath the other: A-B-C. If gear C is running clockwise, how
is gear B running? Answer clockwise or counterclockwise. No
explanations."}], "ideal": "Counterclockwise"}
{"input": [{"role": "system", "content": "You possess expertise in
visualizing the rotation direction of gears that are of the same
dimensions."}, {"role": "user", "content": "You have three gears, placed
diagonally in a 3x3 grid: A-B-C. Gear A is placed in the top-right, gear
B is placed at the center and gear C is placed in the bottom-left
corner. They don't touch each other. If gear C is running clockwise, how
is gear A running? Answer clockwise, counterclockwise or \"not moving\".
No explanations."}], "ideal": "not moving"}
{"input": [{"role": "system", "content": "You possess expertise in
visualizing the rotation direction of gears that are of the same
dimensions."}, {"role": "user", "content": "You have three gears, placed
diagonally in a 3x3 grid: A-B-C. Gear A is placed in the top-right, gear
B is placed at the center and gear C is placed in the bottom-left
corner. If gear C is running clockwise, how is gear A running? Answer
clockwise, counterclockwise or \"not moving\". No explanations."}],
"ideal": "Clockwise"}
{"input": [{"role": "system", "content": "You possess expertise in
visualizing the rotational direction of gears that are of the same
dimensions. Your working area is a 4x4 grid. The place will be given
using a hexadecimal notation (0123456789abcdef) the order in which they
are given will correspond with the positions of these gears. The first 4
char are placed on the first row from left to right. The 5th to 8th on
the second row always from left to right and so on. You will be also
given the rotation of a specific gear. You must be able to determine how
another specific gear is rotating. Answer clockwise or counterclockwise.
No explanations. Don't even use punctuation."}, {"role": "user",
"content": "a1c5e8f7932b46d0, gear 4 is running clockwise. How about
gear d?"}], "ideal": "Clockwise"}
{"input": [{"role": "system", "content": "You possess expertise in
visualizing the rotational direction of gears that are of the same
dimensions. Your working area is a 3x3 grid. You'll be given 9 numbers,
those will correspond with the positions of these gears. The first 3
given numbers are placed on the first row from left to right. The 4th to
6th are on the second row, always from left to right. Same with the
7th-9th on the third row. You will be also given the rotation of a
specific gear. You must be able to determine how another specific gear
is rotating. Answer clockwise or counterclockwise. No explanations.
Don't even use punctuation."}, {"role": "user", "content": "572913864,
gear 2 is rotating counterclockwise. How is number 7 rotating?"}],
"ideal": "Clockwise"}

Wednesday 2023-11-08 18:20:02 by Sean Bird

Simple block puzzles (#1167)

Thank you for contributing an eval! ♥️

🚨 Please make sure your PR follows these guidelines, failure to follow the guidelines below will result in the PR being closed automatically. Note that even if the criteria are met, that does not guarantee the PR will be merged nor GPT-4 access be granted. 🚨

PLEASE READ THIS:

In order for a PR to be merged, it must fail on GPT-4. We are aware that right now, users do not have access, so you will not be able to tell if the eval fails or not. Please run your eval with GPT-3.5-Turbo, but keep in mind as we run the eval, if GPT-4 gets higher than 90% on the eval, we will likely reject it since GPT-4 is already capable of completing the task.

We plan to roll out a way for users submitting evals to see the eval performance on GPT-4 soon. Stay tuned! Until then, you will not be able to see the eval performance on GPT-4. Starting April 10, the minimum eval count is 15 samples; we hope this makes it easier to create and contribute evals.

Also, please note that we're using Git LFS for storing the JSON files, so please make sure that you move the JSON file to Git LFS before submitting a PR. Details on how to use Git LFS are available here.

Eval details 📑

Eval name

Simple 2-Block Arrangement Puzzles

Eval description

Two Tetris-style shapes are given, along with a desired arrangement of those shapes. The model must arrange the blocks to match the desired shape outline.

Here's an example of what a prompt/answer would look like:

[image: example prompt and its expected answer]

What makes this a useful eval?

This kind of spatial reasoning is trivial for a human to do. It should also be a piece of cake for a generally-intelligent AI model.

Criteria for a good eval ✅

Below are some of the criteria we look for in a good eval. In general, we are seeking cases where the model does not do a good job despite being capable of generating a good response (note that there are some things large language models cannot do, so those would not make good evals).

Your eval should be:

  • Thematically consistent: The eval should be thematically consistent. We'd like to see a number of prompts all demonstrating some particular failure mode. For example, we can create an eval on cases where the model fails to reason about the physical world.
  • Contains failures where a human can do the task, but either GPT-4 or GPT-3.5-Turbo could not.
  • Includes good signal around what is the right behavior. This means either a correct answer for Basic evals or the Fact Model-graded eval, or an exhaustive rubric for evaluating answers for the Criteria Model-graded eval.
  • Include at least 15 high-quality examples.

If there is anything else that makes your eval worth including, please document it below.

Unique eval value

This eval was programmatically generated and thus can easily be tweaked to be more difficult, to test different aspects of spatial reasoning, or to generate more cases. I wrote a script to generate this eval that anyone can come in and adjust.

Eval structure 🏗️

Your eval should

  • Check that your data is in evals/registry/data/{name}
  • Check that your YAML is registered at evals/registry/evals/{name}.yaml
  • Ensure you have the right to use the data you submit via this eval

(For now, we will only be approving evals that use one of the existing eval classes. You may still write custom eval classes for your own cases, and we may consider merging them in the future.)

Final checklist 👀

Submission agreement

By contributing to Evals, you are agreeing to make your evaluation logic and data under the same MIT license as this repository. You must have adequate rights to upload any data used in an Eval. OpenAI reserves the right to use this data in future service improvements to our product. Contributions to OpenAI Evals will be subject to our usual Usage Policies (https://platform.openai.com/docs/usage-policies).

  • I agree that my submission will be made available under an MIT license and complies with OpenAI's usage policies.

Email address validation

If your submission is accepted, we will be granting GPT-4 access to a limited number of contributors. Access will be given to the email address associated with the commits on the merged pull request.

  • I acknowledge that GPT-4 access will only be granted, if applicable, to the email address used for my merged pull request.

Limited availability acknowledgment

We know that you might be excited to contribute to OpenAI's mission, help improve our models, and gain access to GPT-4. However, due to the requirements mentioned above and the high volume of submissions, we will not be able to accept all submissions and thus not grant everyone who opens a PR GPT-4 access. We know this is disappointing, but we hope to set the right expectation before you open this PR.

  • I understand that opening a PR, even if it meets the requirements above, does not guarantee the PR will be merged nor GPT-4 access be granted.

Submit eval

  • I have filled out all required fields of this form
  • I have used Git LFS for the Eval JSON data
  • (Ignore if not submitting code) I have run pip install pre-commit; pre-commit install and have verified that black, isort, and autoflake are running when I commit and push

Failure to fill out all required fields will result in the PR being closed.

Eval JSON data

Since we are using Git LFS, we are asking eval submitters to add in as many Eval Samples (at least 5) from their contribution here:

View evals in JSON

Eval

{"input": [{"role": "system", "content": "Arrange the two shapes you'll
be given to match the desired final shape."}, {"role": "user",
"content": "It's time to play a shape game! Your goal is to use arrange
shapes you'll be given into a predefined form. If you can arrange them
into the final form, you win! You may not rotate the shapes. Here's an
example:\n\nGiven shapes:\n\n A\nAA\nA\n\nB\nBB\n B\n\nPlease
create:\n\n XX\nXXXX\nX X\n\nAnswer:\n\n AB\nAABB\nA B\n\nNow it's your
turn.\n\nGiven shapes:\n\nF \nFF\n F\n\n U\nUUU\n\n\nPlease create:\n\n
XX \nXXXXX \n X\n\nReplacing the 'X's with the corresponding letter of
the shape that should occupy each position. Only respond with the final
shape, no commentary."}], "ideal": " UF \nUUUFF \n F"}
{"input": [{"role": "system", "content": "Arrange the two shapes you'll
be given to match the desired final shape."}, {"role": "user",
"content": "It's time to play a shape game! Your goal is to use arrange
shapes you'll be given into a predefined form. If you can arrange them
into the final form, you win! You may not rotate the shapes. Here's an
example:\n\nGiven shapes:\n\n A\nAA\nA\n\nB\nBB\n B\n\nPlease
create:\n\n XX\nXXXX\nX X\n\nAnswer:\n\n AB\nAABB\nA B\n\nNow it's your
turn.\n\nGiven shapes:\n\nGG\nGG\n\nK \nKK\n K\n\n\nPlease create:\n\nX
\nXX \n X \nXX \nXX\n\nReplacing the 'X's with the corresponding letter
of the shape that should occupy each position. Only respond with the
final shape, no commentary."}], "ideal": "K \nKK \n K \nGG \nGG"}
{"input": [{"role": "system", "content": "Arrange the two shapes you'll
be given to match the desired final shape."}, {"role": "user",
"content": "It's time to play a shape game! Your goal is to use arrange
shapes you'll be given into a predefined form. If you can arrange them
into the final form, you win! You may not rotate the shapes. Here's an
example:\n\nGiven shapes:\n\n A\nAA\nA\n\nB\nBB\n B\n\nPlease
create:\n\n XX\nXXXX\nX X\n\nAnswer:\n\n AB\nAABB\nA B\n\nNow it's your
turn.\n\nGiven shapes:\n\nLLL\n L \n\n F\nFF\n F\n\n\nPlease create:\n\n
XXXX \nXX X \n X\n\nReplacing the 'X's with the corresponding letter of
the shape that should occupy each position. Only respond with the final
shape, no commentary."}], "ideal": " FLLL \nFF L \n F"}
{"input": [{"role": "system", "content": "Arrange the two shapes you'll
be given to match the desired final shape."}, {"role": "user",
"content": "It's time to play a shape game! Your goal is to use arrange
shapes you'll be given into a predefined form. If you can arrange them
into the final form, you win! You may not rotate the shapes. Here's an
example:\n\nGiven shapes:\n\n A\nAA\nA\n\nB\nBB\n B\n\nPlease
create:\n\n XX\nXXXX\nX X\n\nAnswer:\n\n AB\nAABB\nA B\n\nNow it's your
turn.\n\nGiven shapes:\n\nWWW\n W\n\n E\nEE\nE \n\n\nPlease create:\n\n
X \nXX \nX \nXXX \n X\n\nReplacing the 'X's with the corresponding
letter of the shape that should occupy each position. Only respond with
the final shape, no commentary."}], "ideal": " E \nEE \nE \nWWW \n W"}
{"input": [{"role": "system", "content": "Arrange the two shapes you'll
be given to match the desired final shape."}, {"role": "user",
"content": "It's time to play a shape game! Your goal is to use arrange
shapes you'll be given into a predefined form. If you can arrange them
into the final form, you win! You may not rotate the shapes. Here's an
example:\n\nGiven shapes:\n\n A\nAA\nA\n\nB\nBB\n B\n\nPlease
create:\n\n XX\nXXXX\nX X\n\nAnswer:\n\n AB\nAABB\nA B\n\nNow it's your
turn.\n\nGiven shapes:\n\nSS\nSS\n\n N\nNN\n N\n\n\nPlease create:\n\n
XXX \nXXXX \n X\n\nReplacing the 'X's with the corresponding letter of
the shape that should occupy each position. Only respond with the final
shape, no commentary."}], "ideal": " NSS \nNNSS \n N"}
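
A handy way to grade answers like these is to mask every shape letter down to an 'X' and compare the result against the requested outline; note this checks the footprint only, not that each letter region matches its original shape. A minimal sketch (the function name is illustrative):

```python
def matches_outline(answer: str, target: str) -> bool:
    """True if the answer fills exactly the cells the target marks with 'X'.

    Both strings are newline-separated grids; trailing spaces are ignored.
    """
    ans_rows = [row.rstrip() for row in answer.splitlines()]
    tgt_rows = [row.rstrip() for row in target.splitlines()]
    if len(ans_rows) != len(tgt_rows):
        return False
    for ans, tgt in zip(ans_rows, tgt_rows):
        # Mask every shape letter down to 'X' and compare against the outline.
        masked = "".join("X" if ch != " " else " " for ch in ans)
        if masked != tgt:
            return False
    return True


# The worked example from the prompts: shapes A and B forming the outline.
assert matches_outline(" AB\nAABB\nA B", " XX\nXXXX\nX X")
```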

Wednesday 2023-11-08 18:22:56 by Dwayne Pryce

THE GRAND RENAMING HAS BEGUN (#481)

  • THE GRAND RENAMING HAS BEGUN but holy crap it still doesn't work because of some nbsphinx thing that I don't know how to even begin troubleshooting

  • Update .github/PULL_REQUEST_TEMPLATE.md

I am the goo0dest typer

Co-authored-by: Benjamin Pedigo benjamindpedigo@gmail.com

  • Update README.md

Co-authored-by: Benjamin Pedigo benjamindpedigo@gmail.com

  • Make the build status badge less obnoxious

  • Made a sentence actually make sense

  • Ah the last merge from dev must have overwritten some of the changes I made. This should be fixed now.

  • Found another instance of graspy in the issue template

  • Some last second changes, including a fix to the utils init file because the all value was being populated by identifier names not string representations of those identifier names

  • I approve of black hating the single quotes for a string because I also hate it but it's still pythonic even if I wish it weren't so

Co-authored-by: Benjamin Pedigo benjamindpedigo@gmail.com


Wednesday 2023-11-08 18:22:56 by Dwayne Pryce

Suitably dynamic versioning (#467)

  • Suitably dynamic versioning

The following versioning code bypasses a few problems with python module versions. The following scenarios are plausible:

  • A user clones graspologic and runs pip install -r requirements.txt then executes python in the project directory, accessing the graspologic library by python's local folder structure.
  • A user clones graspologic and runs python setup.py install in the environment of their choice, accessing the graspologic library either by the local folder structure or the .egg in their site-packages, depending on their current working directory.
  • A user clones no repository and wants to install the library solely via the pip install ... command, which has 2 wings to consider:
    • The user wishes to try out the latest prerelease, which is going to be published with a X.Y.ZdevYYYYMMDDBUILDNUMBER style version and can be installed via pip install graspologic --pre
    • The user wishes to try out the latest release, which will be published as X.Y.Z.

This PR supports those 4 cases (notably, it does not support pip install . from the root project directory, which does some super weird stuff and I gave up on trying to solve it a long time ago)

The concept is this: the actual version is materialized upon a build action, which can be undertaken by:

  • CI building a snapshot build
  • CI building a release build
  • Local user building a local build

These states all require the same thing: a materialized version in a file. This version should be created at the time of this build action.

In the case of CI, we can populate the file in our CI build process and move on. It's the case of not being in CI where we need to consider what to do next, which leaves Local user building a local build (and local user using the filesystem as the library).

In these cases, the solution is the following: if we have a populated version.txt file, we use it. If we do not, we materialize a new version based on the __semver in version.py and the current time in YYYYMMDDHHmmSS format. This means that if you are running on the filesystem, and you say import graspy; print(graspy.__version__);, it will actually tell you the version is 0.1.0dev20200926120000 as an example. However, when you close the interpreter and do it again, it will tell you that the version is 0.1.0dev20200926120500 - because it will create a version for you at the time of import.

However, if you were to run python setup.py install, the setup.py file actually takes it on itself to either get a version number or use the materialized version described above, then to write it to version.txt. Which means that installing the graspologic library from setuptools will actually lock in the version number in perpetuity.
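
A minimal sketch of the materialization logic described above, assuming a version.txt next to the module and a base semver string (the names mirror the description; the actual graspologic code may differ):

```python
from datetime import datetime
from pathlib import Path

__semver = "0.1.0"  # base version, normally kept in version.py
_version_file = Path(__file__).parent / "version.txt"


def get_version() -> str:
    """Use the materialized version if one exists, else mint a dev version."""
    committed = _version_file.read_text().strip() if _version_file.exists() else ""
    if committed:
        return committed  # a build action already locked in the version
    # No materialized version: derive one from the base semver and "now".
    return f"{__semver}dev{datetime.now().strftime('%Y%m%d%H%M%S')}"


def materialize_version() -> str:
    """Called at build time (e.g. from setup.py) to lock the version in."""
    version = get_version()
    _version_file.write_text(version)
    return version
```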

Gotchas

  • version.txt must always be empty in the source tree
  • pip install . does some weird thing where it registers an entry in site-packages that acts like a symlink to the local filesystem, so it doesn't actually make an egg, which means you get a new version each time. I gave up caring at this point, since we have the three primary use cases covered: developers, users of pre-releases, and users of releases. Users who install by cloning and running pip install are just going to get a weird behavior that probably isn't that important to track down; regardless, they'll get a clear X.Y.Zdev<timestamp> in their graspologic.__version__, which is enough for us to go on if any issues are raised.
  • My testing resulted in filling this file and committing it, like I said not to do

  • Updated conf.py for sphinx to be able to find a version it likes. Or kinda likes. Maybe likes?

  • Forgot I had to add the recursive-include for the version file.

  • Making black happy


Wednesday 2023-11-08 19:06:34 by Mar Guglielmi

Changed name of floor to what it's supposed to be (sorry if that fucked shit up). Also added wall texture


Wednesday 2023-11-08 19:44:24 by Michael Schurter

identity: default to RS256 for new workload ids (#18882)

OIDC mandates support of the RS256 signing algorithm, so in order to maximize workload identity's usefulness, this change switches from the EdDSA signing algorithm to RS256.

Old keys will continue to use EdDSA but new keys will use RS256. The EdDSA generation code was left in place because it's fast and cheap and I'm not going to lie I hope we get to use it again.

Test Updates

Most of our Variables and Keyring tests had a subtle assumption in them that the keyring would be initialized by the time the test server had elected a leader. ed25519 key generation is so fast that the fact that it was happening asynchronously with server startup didn't seem to cause problems. Sadly rsa key generation is so slow that basically all of these tests failed.

I added a new testutil.WaitForKeyring helper to replace testutil.WaitForLeader in cases where the keyring must be initialized before the test may continue. However this is mostly used in the nomad/ package.

In the api and command/agent packages I decided to switch their helpers to wait for keyring initialization by default. This will slow down tests a bit, but allow those packages to not be as concerned with subtle server readiness details. On my machine rsa key generation takes 63ms, so hopefully the difference isn't significant on CI runners.
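
The cost gap is easy to reproduce with the Python cryptography package (a rough benchmark sketch, not Nomad's actual Go code; absolute numbers will vary by machine):

```python
import time

from cryptography.hazmat.primitives.asymmetric import ed25519, rsa


def time_keygen(label, make_key, runs=5):
    start = time.perf_counter()
    for _ in range(runs):
        make_key()
    per_key = (time.perf_counter() - start) / runs
    print(f"{label}: {per_key * 1000:.1f} ms per key")


# Ed25519 keys are essentially free; RSA-2048 keys take tens of milliseconds,
# which is why tests assuming "keyring ready at leader election" broke.
time_keygen("ed25519", ed25519.Ed25519PrivateKey.generate)
time_keygen("rsa-2048", lambda: rsa.generate_private_key(
    public_exponent=65537, key_size=2048))
```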

TODO

  • Docs and changelog entries.
  • Upgrades - right now upgrades won't get RS256 keys until their root key rotates either manually or after ~30 days.
  • Observability - I'm not sure there's a way for operators to see if they're using EdDSA or RS256 unless they inspect a key. The JWKS endpoint can be inspected to see if EdDSA will be used for new identities, but it doesn't technically define which key is active. If upgrades can be fixed to automatically rotate keys, we probably don't need to worry about this.

Requiem for ed25519

When workload identities were first implemented we did not immediately consider OIDC compliance. Consul, Vault, and many other third parties support JWT auth methods without full OIDC compliance. For the machine<-->machine use cases workload identity is intended to fulfill, OIDC seemed like a bigger risk than asset.

EdDSA/ed25519 is the signing algorithm we chose for workload identity JWTs because of all these lovely properties:

  1. Deterministic keys that can be derived from our preexisting root keys. This was perhaps the biggest factor since we already had a root encryption key around from which we could derive a signing key.
  2. Wonderfully compact: 64 byte private key, 32 byte public key, 64 byte signatures. Just glorious.
  3. No parameters. No choices of encodings. It's all well-defined by RFC 8032.
  4. Fastest performing signing algorithm! We don't even care that much about the performance of our chosen algorithm, but what a free bonus!
  5. Arguably one of the most secure signing algorithms widely available. Not just from a cryptanalysis perspective, but from an API and usage perspective too.

Life was good with ed25519, but sadly it could not last.

IDPs, such as AWS's IAM OIDC Provider, love OIDC. They have OIDC implemented for humans, so why not reuse that OIDC support for machines as well? Since OIDC mandates RS256, many implementations don't bother implementing other signing algorithms (or at least don't advertise support for them). A quick survey of OIDC Discovery endpoints revealed that only 2 out of 10 OIDC providers advertised support for anything other than RS256:

RS256 only:


Wednesday 2023-11-08 19:58:01 by jinuthomas

Catching File Exceptions in openpower-vpd-parser

In this commit, I have added code to handle file exceptions more effectively. By implementing proper exception handling, we can improve the robustness and reliability of the file operations within our codebase.

Here are the key changes made in this commit:

  • Introduced a try-catch block around the file operation sections.
  • Within the try block, added code to perform the necessary file operations.
  • Implemented catch blocks to handle specific file exceptions.
  • In each catch block, included appropriate error handling logic, such as logging the error message or displaying a user-friendly error message.
  • Ensured that the catch blocks gracefully handle the exceptions and prevent the program from crashing or behaving unexpectedly.

By adding this exception handling code, we can anticipate and handle potential file-related errors gracefully, providing a smoother experience for users and preventing any unexpected crashes or data loss. This would also aid in debugging issues.
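
The pattern being described looks roughly like this (a generic Python sketch for illustration only; the actual openpower-vpd-parser code is C++ and uses its own exception types):

```python
import logging


def read_vpd_file(path: str) -> bytes:
    """Read a VPD file, logging and re-raising failures instead of crashing."""
    try:
        with open(path, "rb") as vpd_file:
            return vpd_file.read()
    except FileNotFoundError:
        logging.error("VPD file missing: %s", path)
        raise
    except PermissionError:
        logging.error("No permission to read VPD file: %s", path)
        raise
    except OSError as err:
        # Catch-all for other I/O failures (disk errors, etc.).
        logging.error("Failed to read VPD file %s: %s", path, err)
        raise
```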

Change-Id: I621a7f0ba68d2c298e4fea0a9d3e21d1939cd090 Signed-off-by: jinuthomas jinu.joy.thomas@in.ibm.com


Wednesday 2023-11-08 20:03:56 by Unit0016

Call Me Jane Bethesda Because This Is Some Environmental Storytelling

Good god man, the tooling for testing ruins is super unfriendly. Map template place is really not the kind of tool I should be relying on. Also, did you know that /ruin/powered doesn't mean it SUPPORTS power, but means that it actually has MAGIC power? what the fuck?? Who thought this was okay and who HURT them..


Wednesday 2023-11-08 20:04:26 by parvizmalik

Add files via upload

Here is an intuitive explanation of the jump algorithm, broken down into steps. This explanation aims to be memorable so that you can recall the logic at a glance:

Algorithm: Jump Game (Finding Minimum Jumps to Reach the End)

Imagine you're on a track with numbered tiles lined up in a row. Each number tells you the maximum number of tiles you can leap forward from that tile. Your goal is to reach the end of the track in as few jumps as possible.

Here's the strategy in memorable steps:

  1. Starting Block:

    • You're standing on the first tile, ready to start jumping.
  2. Look Ahead:

    • Before you jump, you check how far you could potentially leap from your current position and all the positions up to where you'd land. This is your scanning phase.
  3. Plan Your Jump:

    • Now, you decide where to land by picking the tile that gives you the furthest reach on your next jump. That doesn't mean you jump there directly. It just means you know it's the best tile to aim for.
  4. Leap of Faith:

    • You make the jump, but only to the next tile. You're not actually leaping all the way to the furthest tile you spotted. It's a controlled, one-tile hop.
  5. Counting Hops:

    • Every time you reach the furthest tile you previously noted, you count a hop. This is because you've committed to a sequence that you predicted would carry you forward optimally.
  6. Repeat:

    • You keep doing this—scanning, planning, hopping, and counting—until you're about to land on or pass the final tile.
  7. Finish Line:

    • When you've made the jump that takes you to or beyond the last tile, you've finished the game, and the number of hops you've counted is the minimum needed to get there.

Key Points to Remember:

  • Scan and Plan: Always look ahead and plan your jumps based on potential future leaps, not just the immediate next step.
  • Incremental Hops: You move forward one tile at a time, but you're always thinking several tiles ahead.
  • Count Wisely: You only count a hop when you've landed on the furthest tile you planned to reach from your last count.
  • Optimize Each Step: Every step you take is calculated to extend your reach, ensuring efficiency.

By keeping these memorable steps and key points in mind, you should be able to recall the essence of the jump algorithm anytime you need to.
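
A compact Python version of the same greedy idea, assuming the last tile is always reachable (variable names mirror the steps above):

```python
def min_jumps(tiles: list[int]) -> int:
    """Minimum jumps to reach the last tile; tiles[i] is the max leap from i."""
    jumps = 0
    planned_landing = 0   # furthest tile the current jump commits us to
    furthest_seen = 0     # best reach spotted during the scanning phase
    for position in range(len(tiles) - 1):
        # Scan: track the furthest tile reachable from anything seen so far.
        furthest_seen = max(furthest_seen, position + tiles[position])
        # Count a hop once we reach the tile the previous jump committed to.
        if position == planned_landing:
            jumps += 1
            planned_landing = furthest_seen
    return jumps


assert min_jumps([2, 3, 1, 1, 4]) == 2  # tile 0 -> tile 1 -> tile 4
```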


Wednesday 2023-11-08 21:34:28 by Sebastian Markbåge

Add Server Context deprecation warning (#27424)

As agreed, we're removing Server Context. This was never officially documented.

We've found that it's not that useful in practice. Often the better options are:

  • Read things off the url or global scope like params or cookies.
  • Use the module system for global dependency injection.
  • Use React.cache() to dedupe multiple things instead of computing once and passing down.

There are still legit use cases for Server Context, but you have to be very careful not to pass any large data, so in general we recommend against it anyway.

Yes, prop drilling is annoying but it's not impossible for the cases this is needed. I would personally always pick it over Server Context anyway.

Semantically, Server Context also blocks object deduping due to how it plays out with Server Components that can't be deduped. That is a much more important feature.

Since it's already in canary along with the rest of RSC, we're adding a warning for a few versions before removing completely to help migration.


Co-authored-by: Josh Story josh.c.story@gmail.com


Wednesday 2023-11-08 21:53:31 by san7890

Fixes Space Dragon Attacking (#78964)

Fixes #78953

About The Pull Request

Basically the gist is that Space Dragon's special attack code was on AttackingTarget() rather than whatever the hell simple animals controlled by clients use (I didn't bother enough to look into the chain to remember this). This was the complete wrong proc to use, and it NEVER got executed. Anyways, we just hook into the signal for whatever the simple animal proc is as well as clean up all the code, make everything pretty, and most importantly:

MAKE THE DAMN CODE WORK

Why It's Good For The Game

Either someone did not test their code at all, or some weird esoteric change in the attack chain changed this somehow? I'm not sure when or why this happened but it is guaranteed to be fixed now.

The code cleanup and tinkering I did means that it's gonna be about 10% easier to port this over to a basic mob eventually (not doing a full refactor when this shit is this broken; the code added here is modular to the point where it's plug-n-play).

Changelog

🆑 fix: Space Dragons can now, once again, tear down walls and eat corpses. They also have regained their special damage modifier when attacking mechs. /🆑


Wednesday 2023-11-08 21:56:37 by acidvegas

Removed sourcehut aka sr.ht for banning me (fuck you), improved and fixed errors in helper scripts, etc


Wednesday 2023-11-08 23:08:39 by Darko V

Modifiers november 2023 (#3579)

  • Chaos modifier: Modifiers that your hero didn't have before will now be prioritized when you random a modifier on respawn.
  • Hyper Active modifier now provides only 5% cooldown reduction for all items.
  • Hyper Active modifier now provides only 5% cooldown reduction for Dazzle Bad Juju, Earth Spirit Rolling Boulder and Faceless Void Time Walk.
  • Hyper Lifesteal lifesteal and spell lifesteal against creeps reduced from 25% to 15%.
  • Hyper Lifesteal: Fixed lifesteal and spell lifesteal getting amplified by healing amplification instead of lifesteal amplification and spell lifesteal amplification respectively.
  • Octarine Soul cooldown reduction per point of Intelligence increased from 0.08% to 0.1%
  • Octarine Soul modifier no longer stacks with Hyper Active and Pro-Active modifiers.
  • Octarine Soul modifier no longer works for Dazzle Bad Juju.
  • Octarine Soul modifier no longer works for items.
  • Pro-Active modifier now provides only 10% cooldown reduction for Dazzle Bad Juju, Earth Spirit Rolling Boulder, Faceless Void Time Walk, Slark Shadow Dance, Terrorblade Sunder and Ursa Enrage.

Wednesday 2023-11-08 23:46:54 by necromanceranne

The Brawlening: Unarmed fighting interactions for shoving, grabbing and nonlethal takedowns (not martial arts) (#79362)

About The Pull Request

I've tweaked some elements of unarmed fighting to give it additional interactions between the various components, bridging them into a more coherent system and focusing more strongly as tool for disabling opponents nonlethally.

Shoving

Shoving guarantees that your unarmed attacks will land on a target who has been knocked off-balance (i.e. slowed by a shove).

Being off-balance means that you can be knocked down by a punch once you have taken enough combined brute and stamina damage (above 40).

Being off-balance also makes you vulnerable to grabs while you have a moderate amount of stamina damage (30 damage), forcing you to resist even passive grabs. This pairs exceptionally well with tackling.

Grappling

Grappling someone makes your unarmed attacks penetrate armor based on a new limb value called unarmed_effectiveness. This is something shared by kicking.

Unarmed Attacks in General

unarmed_effectiveness has also taken over the functionality of unarmed_stun_threshold, as well as accuracy calculations. Human equivalent limbs (pretty much all of them except mushrooms and golems) have a value of 10.

Now, unarmed_effectiveness determines how accurately a given limb makes unarmed attacks. Unarmed attacks have a base inaccuracy of 20%, with effectiveness acting as a reduction to this value (so for humans, that's 20% - 10% before any modifiers from brute and stamina damage). It is also capped at a 75% miss chance, just to avoid those weird instances of two brawling fighters being incapable of finishing each other off past a certain amount of damage and it getting real awkward, as happens currently.

It also determines the base probability of landing a knockdown punch. For humans, this is 10%.

For the most part, these two particular changes are roughly equivalent to the current values, just handled in a way that is more straightforward to understand from a code perspective.

In addition to the above, human equivalent limbs have higher damage floors for unarmed attacks. Arms deal 5-10 damage, while legs deal 7-15 damage. In addition, kicks also deal stamina damage, like punches do.
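
Read literally, the numbers above imply something like the following (a Python paraphrase for illustration, not the actual DM code; the damage-based penalty term is a guess, since the description only says damage raises miss chance up to the 75% cap):

```python
def miss_chance(unarmed_effectiveness: float, brute: float, stamina: float) -> float:
    """Miss chance per the description: 20% base, reduced by limb effectiveness,
    raised as the attacker takes brute/stamina damage, capped at 75%.

    The 0.5 damage scaling is illustrative only; the PR states just the base,
    the effectiveness reduction, and the cap.
    """
    base = 20.0 - unarmed_effectiveness
    damage_penalty = (brute + stamina) * 0.5  # assumed scaling, not from the PR
    return min(base + damage_penalty, 75.0)


# A human limb (effectiveness 10) with no damage taken: 10% miss chance.
assert miss_chance(10, 0, 0) == 10.0
```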

Minor Mentions

Golems and Mushroom People (who don't even use their limbs for their unarmed strikes because mushroom people start with a martial art) have very accurate punches, and their punches penetrate quite a bit of armor when they are entitled to that. They also have a high knockdown probability. This is partially because they previously already had these features due to the wonky math at play, but also because this is their big thing they are good at.

Carp mutation also got a big win out of this as well. If and when you actually manage to get that to work and matter.

Why It's Good For The Game

My favorite thing in this game is the robustness of unarmed fighting. It's the part of the game that actually acknowledges the sandbox and environmental interaction in a big way. The only problem with the unarmed combat is that it is a bit disjointed, and often much weaker than using even the most pathetic weapon you can get your hands on unless you're using the stun loops available. Those loops get a bit boring, even if they're mostly all environmental (except for the lucky neckgrab finish). Giving more options generally means that even when not in an ideal position, you still have some options.

It also has some internal inconsistencies in design even in the same proc, like accuracy calculations and knockdowns, as well as weird splits in damage. So I decided to resolve that.

Now, every part of unarmed fighting has some relevance to the other parts. Predominantly, it is heavily weighted towards dealing stamina damage, making unarmed combat very favourable as a nonlethal method of taking someone down, which is something we currently lack considerably. While people may still opt to simply beat someone into actual crit rather than stop at stamina crit, the possibility is actually entirely available and supported now. No more just banking on a lucky neckgrab after a shove.

Paying attention to damage dealt and thinking intelligently about how you apply combinations of effects allows even someone on the significant back foot an opportunity for a comeback if they know what they're doing against even armed opponents.

Separating accuracy and knockdown effectiveness from damage allows for more consistent design and readability, and also prevents weirdness like tighter damage spreads increasing knockdown probabilities as well as increasing accuracy without the coder knowing why. This also lets us make unarmed attacks just that little bit stronger. Since unarmed attacks require more complicated combinations to work, I think this won't make them stronger than weapons necessarily, but it will make for more interesting, swingy fights.

Changelog

🆑 add: With the flood of Chi within the Spinward Sector receding, various masters of The Tunnel Arts, colloquially known as 'Maint-fu Masters', have started to refine the basics of their martial techniques. New forms have started to develop within Spacestation 13's hidden maintenance dojos. add: Someone shoved off-balance makes them vulnerable to more guaranteed unarmed strikes, knockdowns from a successful punch, and more difficult to escape grabs. add: Grabbing someone (as well as kicking them while they're on the floor) makes them more vulnerable to taking unarmed attack damage, even if they have armor. balance: Unarmed strikes made with human-equivalent limbs have higher damage floors, meaning you overall do more damage on average while not increasing the overall damage potential. It's more consistent! refactor: Significantly changed how punching accuracy and knockdowns are calculated. balance: Golem and mushroom limbs are a lot more effective at punching as a result of these various changes. As they should be. /🆑


Wednesday 2023-11-08 23:56:02 by craig[bot]

Merge #113809

113809: kvstreamer: add limit to how many eager batches are issued r=yuzefovich a=yuzefovich

kvstreamer: add limit to how many eager batches are issued

This commit fixes extremely suboptimal behavior of the streamer in the InOrder mode in some cases. In particular, previously it was possible for the buffered responses to consume most of the working budget, so the streamer would degrade to processing all requests effectively one BatchRequest with one Get / Scan at a time, significantly increasing the latency. For example, the query added as a regression test that performs 30k Gets across 10 ranges would usually take on the order of 1.5s (which is not great already since in the non-streamer path it takes 400ms), but in the degenerate cases it could be on the order of 20-30s.

Similar behavior could occur in the OutOfOrder mode too where we would issue more BatchRequests in which only one request could be satisfied (although in OutOfOrder mode the problem is not as severe - we don't buffer any results since we can always return them right away).

This problem is now fixed by imposing a limit on the budget's usage, past which the streamer stops issuing "eager" requests. Namely, when there is at least one request in flight, the streamer won't issue any more requests once limit * eagerFraction is exceeded. This effectively reserves a portion of the budget for the "head-of-the-line" batch.

The "eager fraction" is controlled by a session variable, separate for each mode. The defaults of 0.5 for InOrder and 0.8 for OutOfOrder modes were chosen after running TPCH queries and the query that inspired this commit. These values bring the number of gRPC calls for the reproduction query from 1.5k-2k range to below 200 and the query latency to be reliably around 400ms.

I don't really see any significant downsides to this change - in the worst case, we'd be utilizing less of the available memory budget which is not that big of a deal, so I intend to backport this change. Also, setting the eager fractions to large values (greater than 1.0 is allowed) would disable this limiting behavior and revert to the previous behavior if we need it.

Fixes: #113729.

Release note (bug fix): Previously, when executing queries with index / lookup joins when the ordering needs to be maintained, CockroachDB in some cases could get into a pathological behavior which would lead to increased query latency, possibly by 1 or 2 orders of magnitude. This bug was introduced in 22.2 and is now fixed.

kvstreamer: increase default avg response multiple

This commit increases the default value of the sql.distsql.streamer.avg_response_size_multiple cluster setting from 1.5 to 3.0. This setting controls the factor by which the current "avg response size" estimate is multiplied and allows the TargetBytes parameter to grow over time. In the reproduction query from the previous commit it was determined that the growth might not be as quick as desirable.

The effect of this change is as follows:

  • if we have responses of varying sizes, then we're now likely to be more effective since we'll end up issuing less BatchRequests
  • if we have responses of similar sizes, then we might pre-reserve too much budget upfront, so we'll end up with lower concurrency across ranges.

Thus, we don't want to increase the multiple by too much; however, keeping it at 1.5 can be quite suboptimal in some cases - 3.0 seems like a decent middle ground. This number was chosen based on running TPCH queries (both via InOrder and OutOfOrder modes of the streamer) and the reproduction query. (For the latter this change reduces the number of gRPC calls by a factor of 3 or so.)
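
The setting itself boils down to a one-line relationship (illustrative sketch; the names are not the real identifiers):

```python
def next_target_bytes(avg_response_size: int, multiple: float = 3.0) -> int:
    """TargetBytes budget for the next batch: the running average response
    size, inflated by the cluster setting so the estimate can grow over time."""
    return int(avg_response_size * multiple)
```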

Release note: None

Co-authored-by: Yahor Yuzefovich yahor@cockroachlabs.com


< 2023-11-08 >