Refactor stanzareceiver to allow splitting into separate receivers #2265

Closed
tigrannajaryan opened this issue Feb 1, 2021 · 1 comment · Fixed by #2306 or #2336

@tigrannajaryan

We would like to separate the "source" of the log from the "operators" (they are currently a single list in `stanzareceiver`). The goal is to be able to write this in the config:

filelog:
 include: [ receiver/stanzareceiver/testdata/simple.log ]
 start_at: beginning
 operators:
   - type: regex_parser
     regex: '^(?P<time>\d{4}-\d{2}-\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
     timestamp:
       parse_from: time
       layout: '%Y-%m-%d'
     severity:
       parse_from: sev
winlog:
 channels: [Application, System]
 # other winlog options.
 operators:
   # similar to filelog.
journald:
 # optional settings.
 directory:
 files:
 operators:
   # similar to filelog.
tigrannajaryan pushed a commit that referenced this issue Feb 11, 2021
**Link to tracking Issue:** 
This PR partially addresses the following issues:
- Resolves: #2265 
- Related:  #2268, #2282.

**Description:**

The main idea here is to convert `stanzareceiver` into a helper package for building various other stanza-based receivers. Each of these other receivers will only vary by input operator. Functionality pulled out of `stanzareceiver` was moved into a new `filelogreceiver`. `stanzareceiver` should most likely be renamed and/or moved, but is left in its previous package for this initial PR. 

`stanzareceiver` defines an interface called `LogReceiverType` which each stanza-based receiver must implement and pass to `stanzareceiver.NewFactory(LogReceiverType) component.ReceiverFactory`. 

With this interface, each stanza-based receiver should need only a small amount of work to become fully functional. Support for parsing operations, emission from stanza's internal pipeline, and conversion to pdata format are all handled in the helper package, so these are standardized across the full set of stanza-based receivers.

**Next Steps**
Input operators are _not yet_ isolated to the top level of the configuration. The end goal is: 
```
filelog:
 include: [ receiver/stanzareceiver/testdata/simple.log ]
 start_at: beginning
 operators:
   - type: regex_parser
     regex: '^(?P<time>\d{4}-\d{2}-\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
     timestamp:
       parse_from: time
       layout: '%Y-%m-%d'
     severity:
       parse_from: sev
```

but the current state is still:
```
filelog:
 operators:
   - type: file_input
     include: [ receiver/stanzareceiver/testdata/simple.log ]
     start_at: beginning
   - type: regex_parser
     regex: '^(?P<time>\d{4}-\d{2}-\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
     timestamp:
       parse_from: time
       layout: '%Y-%m-%d'
     severity:
       parse_from: sev
```

The primary requirement of #2265 is to promote the input operator to the top level of the receiver config. This will be the focus of the next PR. This PR is mostly concerned with splitting up the package. The configuration changes might be a little messy, so I wanted to address those separately.
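
One way to picture the promotion: the receiver's top-level settings are folded into an input operator that is prepended to the user's `operators` list, which yields the flat list the pipeline consumes today. The sketch below is illustrative only. `promoteInput` and the `operator` alias are hypothetical names, and the real change may be structured quite differently.

```go
package main

import "fmt"

// operator is a generic operator config, as decoded from YAML.
type operator = map[string]interface{}

// promoteInput builds a flat operator list from a receiver config whose
// input settings sit at the top level. Every key except "operators" is
// treated as a setting of the input operator.
func promoteInput(inputType string, receiverCfg map[string]interface{}) ([]operator, error) {
	input := operator{}
	var rest []operator
	for key, val := range receiverCfg {
		if key != "operators" {
			input[key] = val
			continue
		}
		ops, ok := val.([]operator)
		if !ok {
			return nil, fmt.Errorf("operators must be a list of operators, got %T", val)
		}
		rest = ops
	}
	// Set the type last so a stray top-level "type" key cannot override it.
	input["type"] = inputType
	return append([]operator{input}, rest...), nil
}

func main() {
	ops, err := promoteInput("file_input", map[string]interface{}{
		"include":   []string{"testdata/simple.log"},
		"start_at":  "beginning",
		"operators": []operator{{"type": "regex_parser"}},
	})
	if err != nil {
		panic(err)
	}
	for _, op := range ops {
		fmt.Println(op["type"])
	}
}
```

Run as written, this prints `file_input` followed by `regex_parser`, which matches the ordering of the "current state" configuration.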

On the subject of configuration - the interface defined by `stanzareceiver` has a method `Decode(configmodels.Receiver) (pipeline.Config, error)` which is in my opinion much too loosely defined. Too much responsibility is delegated to each stanza-based receiver. The main reason this is left this way for now is that `stanza` operators do not currently use `mapstructure` for config unmarshaling. There is currently a workaround in place, but once stanza operators are migrated to `mapstructure`, more responsibility for unmarshaling should be extracted back into the helper package, and this interface method should end up a lot cleaner. I'm planning to look into this in the next PR.

**Open questions** (which can be addressed in this PR or the next):
- Should the helper package be completely standalone, or does it belong in `receivercreator` or similar?
- If the helper package should be standalone, what should it be called? (probably not `stanzareceiver`)

**Temporarily removed functionality**
This functionality will be implemented in the near future. There is some design to do on how exactly this should work when used by multiple receivers:
- Offsets database (tracked by #2287)
- Plugins (tracked as item on #2264)

**Testing:** 
Unit tests are roughly the same as before. A few cases were dropped because they no longer applied. Certainly more tests will be added as this pattern is solidified. 

Testbed scenario is unchanged and still passing:
```
> make run-tests
./runtests.sh
=== RUN   TestLog10kDPS
=== RUN   TestLog10kDPS/OTLP
... (abbreviated)
=== RUN   TestLog10kDPS/Stanza
... (abbreviated)
--- PASS: TestLog10kDPS (30.73s)
    --- PASS: TestLog10kDPS/OTLP (15.32s)
    --- PASS: TestLog10kDPS/Stanza (15.41s)
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/testbed/tests_unstable_exe    31.406s
# Test PerformanceResults
Started: Mon, 08 Feb 2021 13:35:08 -0500

Test                                    |Result|Duration|CPU Avg%|CPU Max%|RAM Avg MiB|RAM Max MiB|Sent Items|Received Items|
----------------------------------------|------|-------:|-------:|-------:|----------:|----------:|---------:|-------------:|
Log10kDPS/OTLP                          |PASS  |     15s|    19.9|    20.6|         39|         47|    149900|        149900|
Log10kDPS/Stanza                        |PASS  |     15s|    28.4|    29.3|         40|         48|    150000|        150000|

Total duration: 31s
```
@djaglowski

Reopening because this was only partially addressed by #2306

@djaglowski djaglowski reopened this Feb 11, 2021
tigrannajaryan pushed a commit that referenced this issue Feb 17, 2021
**Link to tracking Issue:**

Resolves #2265, Resolves #2268, and Resolves #2282.

**External Changes**

Moves the file input operator to the top level of the `filelog` configuration:
```
receivers:
  filelog:
    include: [ testdata/simple.log ]
    start_at: beginning
    operators:
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
        timestamp:
          parse_from: time
          layout: '%Y-%m-%d'
        severity:
          parse_from: sev
```

---

Updates the `stanzareceiver.LogReceiverType`. It pulls more standard functionality into the `stanzareceiver` package, such as decoding the parser configs. However, I couldn't find a way to avoid adding a fairly "dumb" method, `BaseConfig(configmodels.Receiver) BaseConfig`, which just needs to grab `stanzareceiver.BaseConfig` from the individual implementation. I'm 99% sure there's a better way to do this, but I'm struggling to find it at the moment.
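
For readers unfamiliar with the pattern, the accessor can be pictured roughly as follows. This is a sketch under assumptions: the `Receiver` interface stands in for `configmodels.Receiver`, the fields of `BaseConfig` and `FileLogConfig` are invented for illustration, and returning a zero value on a type mismatch is a choice made here, not necessarily what the PR does.

```go
package main

import "fmt"

// BaseConfig is a hypothetical stand-in for stanzareceiver.BaseConfig:
// the settings shared by every stanza-based receiver.
type BaseConfig struct {
	Operators []map[string]interface{}
}

// Receiver is a hypothetical stand-in for configmodels.Receiver.
type Receiver interface {
	Name() string
}

// FileLogConfig embeds the shared settings next to its own input settings.
type FileLogConfig struct {
	BaseConfig
	Include []string
	StartAt string
}

func (c *FileLogConfig) Name() string { return "filelog" }

// fileLogType plays the role of one LogReceiverType implementation.
type fileLogType struct{}

// BaseConfig only knows which concrete config type to assert; the helper
// package cannot do this itself because it sees just the Receiver interface.
func (fileLogType) BaseConfig(r Receiver) BaseConfig {
	cfg, ok := r.(*FileLogConfig)
	if !ok {
		return BaseConfig{} // zero value for an unexpected config type
	}
	return cfg.BaseConfig
}

func main() {
	cfg := &FileLogConfig{
		BaseConfig: BaseConfig{
			Operators: []map[string]interface{}{{"type": "regex_parser"}},
		},
		Include: []string{"testdata/simple.log"},
		StartAt: "beginning",
	}
	base := fileLogType{}.BaseConfig(cfg)
	fmt.Println(len(base.Operators))
}
```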


**Next Steps**

The `filelog` readme is badly out of date. Captured in #2335

---

The stanza operators still do not make use of `mapstructure`, which forces the configuration to work with generic `map[string]interface{}` and `map[interface{}]interface{}`. This is messy and makes it difficult to write good tests. I want to clean this up soon. Captured in several [issues](https://github.com/open-telemetry/opentelemetry-log-collection/issues) on the `opentelemetry-log-collection` repo.
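
To illustrate why the generic maps are awkward: YAML decoders can produce `map[interface{}]interface{}` for nested mappings, so code that expects string keys has to walk the whole tree and convert it first. The helper below is a hypothetical sketch of that kind of normalization, not the workaround that is actually in place.

```go
package main

import "fmt"

// normalize recursively converts map[interface{}]interface{} values into
// map[string]interface{} and fails on any non-string key.
func normalize(v interface{}) (interface{}, error) {
	switch val := v.(type) {
	case map[interface{}]interface{}:
		out := make(map[string]interface{}, len(val))
		for k, item := range val {
			key, ok := k.(string)
			if !ok {
				return nil, fmt.Errorf("non-string key %v (%T)", k, k)
			}
			n, err := normalize(item)
			if err != nil {
				return nil, err
			}
			out[key] = n
		}
		return out, nil
	case map[string]interface{}:
		out := make(map[string]interface{}, len(val))
		for k, item := range val {
			n, err := normalize(item)
			if err != nil {
				return nil, err
			}
			out[k] = n
		}
		return out, nil
	case []interface{}:
		out := make([]interface{}, len(val))
		for i, item := range val {
			n, err := normalize(item)
			if err != nil {
				return nil, err
			}
			out[i] = n
		}
		return out, nil
	default:
		// Scalars and anything else pass through unchanged.
		return v, nil
	}
}

func main() {
	raw := map[interface{}]interface{}{
		"type": "regex_parser",
		"timestamp": map[interface{}]interface{}{
			"parse_from": "time",
			"layout":     "%Y-%m-%d",
		},
	}
	got, err := normalize(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T\n", got)
}
```

With typed config structs decoded through `mapstructure`, this kind of tree walking and the type assertions that follow it would not be needed.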



**Testing:**
Merged `filelogreceiver/e2e_test.go` into `filelogreceiver/filelog_test.go`, but the set of tests is the same.
Testbed still passes: 
```
=== RUN   TestLog10kDPS/filelog
2021/02/11 20:22:54 Starting mock backend...
... (abbreviated)
2021/02/11 20:23:10 Stopped backend. Received:   150,000 items (9,573/sec)
--- PASS: TestLog10kDPS (32.22s)
    --- PASS: TestLog10kDPS/OTLP (16.55s)
    --- PASS: TestLog10kDPS/filelog (15.67s)
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/testbed/tests_unstable_exe    32.855s
# Test PerformanceResults
Started: Thu, 11 Feb 2021 20:22:37 -0500

Test                                    |Result|Duration|CPU Avg%|CPU Max%|RAM Avg MiB|RAM Max MiB|Sent Items|Received Items|
----------------------------------------|------|-------:|-------:|-------:|----------:|----------:|---------:|-------------:|
Log10kDPS/OTLP                          |PASS  |     17s|    11.6|    12.7|         37|         46|    150000|        150000|
Log10kDPS/filelog                       |PASS  |     16s|    23.9|    25.9|         40|         48|    150000|        150000|
```
@tigrannajaryan tigrannajaryan added this to the Basic Logs Support milestone Mar 3, 2021
pmatyjasek-sumo referenced this issue in pmatyjasek-sumo/opentelemetry-collector-contrib Apr 28, 2021
pmatyjasek-sumo referenced this issue in pmatyjasek-sumo/opentelemetry-collector-contrib Apr 28, 2021
ljmsc referenced this issue in ljmsc/opentelemetry-collector-contrib Feb 21, 2022
#2265)

* lock accesses to encoder

fixes #2264

* move locking outside loop to avoid deadlock

* Update CHANGELOG.md