Merge mc-1.18.x/1.18.2 to mc-1.18.x/stable for release #100

Merged
merged 71 commits into from
May 8, 2022

Merith-TK
Collaborator

No description provided.

Possseidon and others added 30 commits January 19, 2022 17:39
Codes for right and middle mouse buttons were swapped.
Fix table with mouse button codes in documentation.
- Fixes #1026
- The remaining bytes counter wasn't being decremented, so the code that
  splits off smaller packets was unreachable. Thus all file slices were
  being put into a single UploadFileMessage packet.
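
The failure mode described above can be sketched as follows. This is a minimal illustration, not the mod's actual upload code: `PacketPacker`, `pack` and `maxPacketSize` are hypothetical names, and slices are modelled as plain byte arrays.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; names and types are assumptions for illustration. */
final class PacketPacker {
    /** Groups slices into packets, starting a new packet when the current one is full. */
    static List<List<byte[]>> pack(List<byte[]> slices, int maxPacketSize) {
        List<List<byte[]>> packets = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        int remaining = maxPacketSize;

        for (byte[] slice : slices) {
            // Split off a new packet once the current one cannot hold this slice.
            if (slice.length > remaining && !current.isEmpty()) {
                packets.add(current);
                current = new ArrayList<>();
                remaining = maxPacketSize;
            }
            current.add(slice);
            // Without this decrement the branch above is never taken, and
            // every slice ends up in one oversized packet.
            remaining -= slice.length;
        }

        if (!current.isEmpty()) packets.add(current);
        return packets;
    }
}
```

With three 4-byte slices and a budget of 8 bytes, this yields two packets; dropping the `remaining -= slice.length` line would yield one.
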
This was added in the 1.13 update and I'm still not sure why. Other mods
seem to get away without it, so I think it's fine to remove.

Also remove the fake net manager, as that's part of Forge nowadays.

Fixes #1044.
Fix large file uploads producing oversized packets.
Holding off until Forge releases for 1.18.2
Did not enjoy, would not recommend.
It's now impossible to run the client tests (tests are still there, but
none of the other infrastructure is). We've not run these for months now
due to their severe flakiness :(.
There are a couple of alternative ways to solve this. Ideally we'd send
our network messages at the same time as MC does
(ChunkManager.playerLoadedChunk), but this'd require a mixin.

Instead we just rely on the fact that if the chunk isn't loaded,
monitors won't have done anything and so we don't need to send their
contents!

Fixes #1047, probably doesn't cause any regressions. I've not seen any
issues on 1.16, but I also hadn't before so ¯\_(ツ)_/¯.
Forge 40.0.18 deprecated a lot of methods and moved where
RegistryEvent.NewRegistry lives, so we needed to update. This does break
the CC API a little bit (sorry!) though given Forge 1.18.2 is still in
flux, that's probably inevitable.
Documentation fix for rednet.broadcast
 - Remove the POSITION_COLOR render type. Instead we just render a
   background terminal quad as the pocket computer light - it's a little
   (lot?) more cheaty, but saves having to create a render type.

 - Use the existing position_color_tex shader instead of our copy. I
   looked at using RenderType.text, but had a bunch of problems with GUI
   terminals. It's possible we can fix it, but I didn't want to spend too
   much time on it.

 - Remove some methods from FixedWidthFontRenderer, inlining them into
   the call site.

 - Switch back to using GL_QUADS rather than GL_TRIANGLES. I know Lig
   will shout at me for this, but the rest of MC uses QUADS, so I don't
   think best practice really matters here.

 - Fix the TBO backend monitor not rendering monitors with fog.
 
   Unfortunately we can't easily do this to the VBO one without writing
   a custom shader (which defeats the whole point of the VBO backend!),
   as the distance calculation of most render types expects an
   already-transformed position (camera-relative I think!) while we pass
   a world-relative one.

 - When rendering to a VBO we push vertices to a ByteBuffer directly,
   rather than going through MC's VertexConsumer system. This removes
   the overhead which comes with VertexConsumer, significantly improving
   performance.

 - Pre-convert palette colours to bytes, storing both the coloured and
   greyscale versions as a byte array. This allows us to remove the
   multiple casts and conversions (double -> float -> (greyscale) ->
   byte), offering noticeable performance improvements (multiple ms per
   frame).

   We're using a byte[] here rather than a record of three bytes as
   notionally it provides better performance when writing to a
   ByteBuffer directly compared to calling .put() four times. [^1]

 - Memoize getRenderBoundingBox. This was taking about 5% of the total
   time on the render thread[^2], so worth doing.

   I don't actually think the allocation is the heavy thing here -
   VisualVM says it's toWorldPos being slow. I'm not sure why - possibly
   just all the block property lookups? [^2]
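
The palette pre-conversion bullet above could look roughly like the sketch below. It is an assumption-laden illustration rather than the mod's code: `PaletteBytes` and `emit` are hypothetical names, the palette is modelled as RGB doubles in [0, 1], the greyscale value is assumed to be the plain channel average, and alpha is fixed at 255.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: convert the palette once, then write four bytes in one put. */
final class PaletteBytes {
    final byte[][] colour;
    final byte[][] greyscale;

    PaletteBytes(double[][] palette) {
        colour = new byte[palette.length][];
        greyscale = new byte[palette.length][];
        for (int i = 0; i < palette.length; i++) {
            double r = palette[i][0], g = palette[i][1], b = palette[i][2];
            double grey = (r + g + b) / 3; // assumed greyscale formula
            colour[i] = new byte[] { toByte(r), toByte(g), toByte(b), (byte) 255 };
            greyscale[i] = new byte[] { toByte(grey), toByte(grey), toByte(grey), (byte) 255 };
        }
    }

    /** Maps a channel in [0, 1] to an unsigned byte stored in Java's signed byte. */
    static byte toByte(double channel) {
        return (byte) (int) (channel * 255);
    }

    /** Writes one RGBA colour with a single bulk put instead of four single-byte puts. */
    void emit(ByteBuffer buffer, int index, boolean useGreyscale) {
        buffer.put(useGreyscale ? greyscale[index] : colour[index]);
    }
}
```

The double-to-byte conversion happens once per palette change here, so the per-vertex work is reduced to an array lookup and one `put(byte[])`.
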

Note that none of these changes improve compatibility with Optifine.
Right now there are some serious issues where monitors are writing _over_
blocks in front of them. To fix this, we probably need to remove the
depth blocker and just render characters with a z offset. Will do that
in a separate commit, as I need to evaluate how well that change will
work first.

The main advantage of this commit is the improved performance. In my 
stress test with 120 monitors updating every tick, I'm getting 10-20fps
[^3] (still much worse than TBOs, which manage a solid 60-100).

In practice, we'll actually be much better than this. Our network
bandwidth limits mean only 40 monitors can change in a single tick, and so
FPS is much more reasonable (60+ fps).

[^1]: In general, put(byte[]) is faster than calling put(byte) multiple
times. It's just not clear whether this holds for a small (and loop
unrolled) number of bytes.

[^2]: To be clear, this is with 120 monitors and no other block entities
with custom renderers, so not really representative.

[^3]: I wish I could provide a narrower range, but it varies so much
between me restarting the game. Makes it impossible to benchmark
anything!
Somehow this hits a happier path in the JVM. I guess it has trouble
inlining the VertexEmitter.vertex() calls because there are multiple
implementations, so reducing the number of calls and giving it a
chunkier function to JIT down helps? This is all conjecture because
I haven't figured out JitWatch yet :)

Anyways, this gives about a 9% improvement in my tests.
This gives about a 3% improvement in VBO rebuild stress tests, for the
cost of a little more memory.

getVertexCount() was showing up heavy in my profiles. Changing it to a
simple upper bound calculation melts that time away. If there's a
max size 0.5 text scale monitor in the scene, the buffer will grow to
~3 MB. For comparison's sake, the images in the "blit" program were
already growing the buffer to ~2.1 MB.
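
As a rough sketch of what an upper-bound calculation can look like: the class below assumes each terminal cell contributes at most one background quad and one foreground quad, plus a caller-supplied number of extra quads. `VertexBudget`, its parameters and the per-cell quad count are assumptions for illustration, not the actual getVertexCount() implementation.

```java
/** Hypothetical sketch of a cheap worst-case vertex count. */
final class VertexBudget {
    static final int VERTICES_PER_QUAD = 4;

    /** Worst case: every cell draws a background and a foreground quad. */
    static int maxVertices(int width, int height, int extraQuads) {
        int cellQuads = width * height * 2;
        return (cellQuads + extraQuads) * VERTICES_PER_QUAD;
    }

    /** Buffer size needed for that worst case, given the vertex format's size. */
    static int maxBytes(int width, int height, int extraQuads, int bytesPerVertex) {
        return maxVertices(width, height, extraQuads) * bytesPerVertex;
    }
}
```

The trade-off is the one the message describes: no per-cell inspection of the terminal, at the cost of sizing the buffer for cells that may turn out to be blank.
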
SquidDev and others added 29 commits April 26, 2022 21:43
Like #455, this sets our uniforms via a UBO rather than having separate
ones for each value. There are a couple of small differences:

 - Have a UBO for each monitor, rather than sharing one and rewriting it
   for every monitor. This means we only need to update the buffer when the
   monitor changes.

 - Use std140 rather than the default layout. This means we don't have
   to care about location/stride in the buffer.

Also like #455, this doesn't actually seem to result in any performance
improvements for me. However, it does make it a bit easier to handle a
large number of uniforms.
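
To illustrate why std140 removes the need to query offsets and strides: its alignment rules are fixed by the GLSL specification, so offsets can be computed ahead of time on the CPU side. The helper below is a sketch covering only a few member types; `Std140Layout` and the example block members are hypothetical, not the mod's actual uniform block.

```java
/** Hypothetical sketch: computes std140 byte offsets for a handful of member types. */
final class Std140Layout {
    private int offset = 0;

    /** Rounds the running offset up to the given alignment and returns it. */
    private int align(int alignment) {
        offset = (offset + alignment - 1) / alignment * alignment;
        return offset;
    }

    int addInt()   { int at = align(4);  offset += 4;  return at; }
    int addFloat() { int at = align(4);  offset += 4;  return at; }
    int addVec3()  { int at = align(16); offset += 12; return at; }
    int addVec4()  { int at = align(16); offset += 16; return at; }
    int addMat4()  { int at = align(16); offset += 64; return at; }

    /** Array elements are padded to a 16-byte stride in std140. */
    int addVec3Array(int length) { int at = align(16); offset += 16 * length; return at; }

    /** Total block size, rounded up to a multiple of 16 bytes. */
    int size() { return align(16); }
}
```

For example, a block holding a mat4, a 16-entry vec3 palette and an int would place them at offsets 0, 64 and 320 under these rules.
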

Also cleans up the generation of the main monitor texture buffer:

 - Move buffer generation into a separate method - just ensures that it
   shows up separately in profilers.
 - Explicitly pass the position when setting bytes, rather than
   incrementing the internal one. This saves some memory reads/writes (I
   thought Java optimised them out, evidently not!). Saves a few fps
   when updating.
 - Use DSA when possible. Unclear if it helps at all, but nice to do :).
 - For TBOs, we now pass cursor position, colour and blink state as
   variables to the shader, and use them to overlay the cursor texture
   in the right place.

   As we no longer need to render the cursor, we can skip the depth
   buffer, meaning we have to do one fewer upload+draw cycle.

 - For VBOs, we bake the cursor into the main VBO, and switch between
   rendering n and n+1 quads. We still need the depth blocker, but can
   save one upload+draw cycle when the cursor is visible.
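
The VBO approach in the last bullet amounts to choosing a draw count rather than re-uploading. A minimal sketch, assuming the cursor quad is stored directly after the text quads in the same buffer (`CursorDrawCount` and its parameters are hypothetical names):

```java
/** Hypothetical sketch: the cursor is the final quad in the VBO, drawn only when needed. */
final class CursorDrawCount {
    static final int VERTICES_PER_QUAD = 4;

    /** Returns how many vertices to draw: n quads normally, n + 1 when the cursor shows. */
    static int vertexCount(int textQuads, boolean cursorVisible, boolean blinkOn) {
        int quads = (cursorVisible && blinkOn) ? textQuads + 1 : textQuads;
        return quads * VERTICES_PER_QUAD;
    }
}
```

Because the buffer contents stay the same across blink states, toggling the cursor changes only the count passed to the draw call.
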

This saves significant time on the TBO renderer - somewhere between 4
and 7ms/frame, which bumps us up from 35 to 47fps on my test world (480
full-sized monitors, changing every tick). [Taken on 1.18, but should be
similar on 1.16]
Historically I've been reluctant to do this as people might be running
Optifine for performance rather than shaders, and the VBO renderer was
significantly slower when monitors were changing.

With the recent performance optimisations, the difference isn't as bad.
Given how many people ask/complain about the TBO renderer and shaders, I
think it's worth doing this, even if it's not as granular as I'd like.

Also changes how we do the monitor backend check. We now only check for
compatibility if BEST is selected - if there's an override, we assume
the user knows what they're doing (a bold assumption, if I may say so
myself).
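
The selection rule described above can be sketched as below. The enum values mirror the BEST/TBO/VBO names used in these messages; `RendererSelection`, `resolve` and the two boolean inputs are assumptions for illustration, not the mod's actual API.

```java
/** Renderer options as named in the commit messages. */
enum MonitorRenderer { BEST, TBO, VBO }

/** Hypothetical sketch of the backend check. */
final class RendererSelection {
    /**
     * An explicit override is returned untouched; compatibility is only
     * checked when the user left the setting on BEST.
     */
    static MonitorRenderer resolve(MonitorRenderer configured,
                                   boolean tboSupported,
                                   boolean optifinePresent) {
        if (configured != MonitorRenderer.BEST) return configured;
        return (tboSupported && !optifinePresent) ? MonitorRenderer.TBO : MonitorRenderer.VBO;
    }
}
```

Note that an explicit TBO override is honoured even when the inputs say it is unsupported, which is the "user knows what they're doing" assumption.
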
This /significantly/ improves performance of the VBO renderer (3fps to
80fps with 120 constantly changing monitors) and offers some minor FPS
improvements to the TBO renderer.

This also makes the 1.16 rendering code a little more consistent with
the 1.18 code, cleaning it up a little in the process.

Closes #1065 - this is a backport of those changes for 1.16. I will
merge these changes into 1.18, as with everything else (oh boy, that'll
be fun).

Please note this is only tested on my machine right now - any help
testing on other CPU/GPU configurations is much appreciated.
I was right: I did not enjoy this.
GlStateManager.glDeleteBuffers clears a buffer before deleting it on
Linux - I assume otherwise there's memory leaks on some drivers? - which
clobbers BufferUploader's cache. Roll our own version which resets the
cache when needed.

Also always reset the cache when deleting/creating a DirectVertexBuffer.
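
The general shape of the problem is a cache of "what is currently bound" going stale when something else rebinds behind its back. The model below has no GL calls and is only a sketch: `BufferBindingCache` and the `deleteRebinds` flag are hypothetical stand-ins for BufferUploader's cache and the platform-specific delete path.

```java
/** Hypothetical model of a bound-buffer cache that must be reset around deletes. */
final class BufferBindingCache {
    private int cachedBuffer = 0; // buffer we believe is bound; 0 means unknown
    int actualBinds = 0;          // counts the binds that would reach the driver

    /** Skips the bind when the cache says the buffer is already bound. */
    void bind(int buffer) {
        if (buffer == cachedBuffer) return;
        actualBinds++; // stands in for the real bind call
        cachedBuffer = buffer;
    }

    /** Forgets the cached binding so the next bind is always issued. */
    void invalidate() {
        cachedBuffer = 0;
    }

    /**
     * Deletes a buffer. If the delete path binds the buffer to clear it first,
     * or the deleted buffer is the cached one, the cache is no longer valid.
     */
    void delete(int buffer, boolean deleteRebinds) {
        if (deleteRebinds || buffer == cachedBuffer) invalidate();
    }
}
```

Without the `invalidate()` call in `delete`, a later `bind` of the previously cached buffer would be skipped even though a different buffer is actually bound.
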
I hate doing this, but I have too many merges in progress to rebase.
 - Bump Forge version to latest RB.
 - Generate an 8-bit audio stream again, as we no longer need to be
   compatible with MC's existing streams.

No functionality changes, just mildly less hacky.
> Modulo any game-breaking bugs [...] this will be the last CC: Tweaked
> release.

Terrible performance is game-breaking right? Or am I just petty?
Fixes printouts being drawn slightly offset to the left in all cases,
noticeable mainly when in item frames.
This allows us to sync the position to the entity immediately, rather
than the sound jumping about.

Someone has set up rick-rolling pocket computers (<3 to whoever did
this), and the lag on them irritates me enough to fix this.

Fixes #1074
 - Bump Loom and Fabric API version.
 - Update MixinSoundEngine - I believe the lambda is mapped differently
   on the latest Loom version.
 - Fix several deprecation warnings.
  - Remove most conventional tag definitions - those are built into
    Fabric now.
  - Port CC:T's data generators to Fabric. This covers loot tables,
    advancements/recipes and block/item tags. As with CC:T, some of our
    custom recipes are not covered.
This implements everything that CC:T does aside from turtles. Other
blocks could also be converted (printers, disk drives, etc...), but
I've never got round to them.
Port CC:T's data generators to Fabric
See #5. No gametests yet, those are more work.
Port across CC:T's Java and Lua tests
I noticed that the release builder never actually pulled the submodules into it, so it never contained the Overhaul resource pack, 

I hope
@Merith-TK Merith-TK merged commit d03b68e into mc-1.18.x/stable May 8, 2022

7 participants