Skip to content

Commit

Permalink
fix: correctly handle FieldReference-special characters in field names.
Browse files Browse the repository at this point in the history
Keys passed to most methods of `ConvertedMap`, based on `IdentityHashMap`
depend on identity and not equivalence, and therefore rely on the keys being
_interned_ strings. In order to avoid hitting the JVM's global String intern
pool (which can have performance problems), operations to normalize a string
to its interned counterpart have traditionally relied on the behaviour of
`FieldReference#from` returning a likely-cached `FieldReference`, that had
an interned `key` and an empty `path`.

This is problematic on two points.

First, when `ConvertedMap` was given data with keys that _were_ valid string
field references representing a nested field (such as `[host][geo][location]`),
the implementation of `ConvertedMap#put` effectively silently discarded the
path components because it assumed them to be empty, and only the key was
kept (`location`).

Second, when `ConvertedMap` was given a map whose keys contained what the
field reference parser considered special characters but _were NOT_
valid field references, the resulting `FieldReference.IllegalSyntaxException`
caused the operation to abort.

Instead of using the `FieldReference` cache, which sits on top of objects whose
`key` and `path`-components are known to have been interned, we introduce an
internment helper on our `ConvertedMap` that is also backed by the global string
intern pool, and ensure that our field references are primed through this pool.

In addition to fixing the `ConvertedMap#newFromMap` functionality, this has
three net effects:

 - Our ConvertedMap operations still use strings
   from the global intern pool
 - We have a new, smaller cache of individual field
   names, improving lookup performance
 - Our FieldReference cache no longer is flooded
   with fragments and therefore is more likely to
   remain performant

NOTE: this does NOT create isolated intern pools, as doing so would require
      a careful audit of the possible code-paths to `ConvertedMap#putInterned`.
      The new cache is limited to 10k strings, and when more are used only
      the FIRST 10k strings will be primed into the cache, leaving the
      remainder to always hit the global String intern pool.

NOTE: by fixing this bug, we alow events to be created whose fields _CANNOT_
      be referenced with the existing FieldReference implementation.

Resolves: elastic#13606
Resolves: elastic#11608
  • Loading branch information
yaauie committed Apr 20, 2022
1 parent 77bb8c0 commit c19cf6e
Showing 1 changed file with 7 additions and 0 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -27,10 +27,12 @@
import java.util.Map;
import org.assertj.core.api.Assertions;
import org.hamcrest.CoreMatchers;
import org.jruby.RubyBoolean;
import org.jruby.RubyHash;
import org.jruby.RubyString;
import org.jruby.exceptions.RuntimeError;
import org.jruby.javasupport.JavaUtil;
import org.jruby.runtime.Block;
import org.jruby.runtime.ThreadContext;
import org.jruby.runtime.builtin.IRubyObject;
import org.junit.Assert;
Expand Down Expand Up @@ -108,6 +110,11 @@ public void correctlyCreatesEventWhenGivenMapWithKeysThatHaveFieldReferenceSpeci
event.ruby_set_field(context, key, value);
IRubyObject retrievedValue = event.ruby_get_field(context, key);
Assert.assertThat(retrievedValue, CoreMatchers.equalTo(value));

RubyHash eventHash = (RubyHash) event.ruby_to_hash_with_metadata(context);
RubyBoolean eventIncludesField = eventHash.has_key_p(context, rubyString("il[[]]]legal"));
Assert.assertTrue(eventIncludesField.toJava(Boolean.class));
Assert.assertEquals(rubyString("okay"), eventHash.fetch(context, rubyString("il[[]]]legal"), Block.NULL_BLOCK));
}

private static RubyString rubyString(final String java) {
Expand Down

0 comments on commit c19cf6e

Please sign in to comment.