Add support for XML entity expansion limitation in SAX and pull parsers #187

Merged: 1 commit, Aug 1, 2024

@naitoh (Contributor) commented Jul 31, 2024

  • Supported REXML::Security.entity_expansion_limit= in SAX and Pull
  • Supported REXML::Security.entity_expansion_text_limit= in SAX and Pull
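
As a rough illustration of the two setters listed above, the following sketch lowers `REXML::Security.entity_expansion_limit` and then drains a pull parser. The sample document and the `drain` helper are hypothetical and not part of this pull request. Whether parsing is aborted, and with which message, depends on the installed REXML version, so the sketch only reports the outcome instead of assuming one.

```ruby
require "rexml/document"            # loads REXML::Security along with the rest of REXML
require "rexml/parsers/pullparser"

# Illustrative document (an assumption): one internal entity referenced several times.
xml = <<~XML
  <!DOCTYPE root [
    <!ENTITY greeting "hello">
  ]>
  <root>&greeting; &greeting; &greeting;</root>
XML

# Hypothetical helper: pull every event and report how parsing ended.
def drain(xml)
  parser = REXML::Parsers::PullParser.new(xml)
  events = 0
  while parser.has_next?
    parser.pull
    events += 1
  end
  [:ok, events]
rescue StandardError => e
  # A parser that enforces the limit is expected to raise here.
  [:aborted, e.message]
end

original_limit = REXML::Security.entity_expansion_limit
begin
  # Deliberately tiny limit so that a parser enforcing it has a chance to trip.
  REXML::Security.entity_expansion_limit = 1
  status, detail = drain(xml)
  puts "#{status}: #{detail}"
ensure
  # The limit is global, so restore it for other code in the process.
  REXML::Security.entity_expansion_limit = original_limit
end
```

Because both limits are process-wide settings, the sketch restores the previous value in `ensure` rather than leaving the lowered limit in place.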

@naitoh marked this pull request as ready for review July 31, 2024 08:50
@kou changed the title from "fix: XML Entity Expansion is available in REXML(SAX or Pull)" to "Add support for XML entity expansion limitation in SAX and pull parsers" Aug 1, 2024
@kou merged commit 033d190 into ruby:master Aug 1, 2024
61 checks passed
@kou (Member) commented Aug 1, 2024

Thanks.

kou pushed a commit that referenced this pull request Aug 1, 2024
`REXML::Parsers::BaseParser` uses `REXML::Security` since #187. But
`rexml/parsers/baseparser.rb` doesn't require `rexml/security`
explicitly.

This doesn't cause a problem in normal usage because `require "rexml"`
requires `rexml/security` implicitly. But if a user requires a specific
parser such as `rexml/parsers/streamparser` directly, `REXML::Security`
is never loaded and parsing fails.

We should require `rexml/security` explicitly in
`rexml/parsers/baseparser.rb` because `REXML::Parsers::BaseParser`
uses `REXML::Security`.

## How to reproduce

When `lib/rexml/parsers/baseparser.rb` is loaded without the rest of
REXML (for example, via `rexml/parsers/streamparser`), the
`REXML::Security` module is not loaded. This causes the following error:

```ruby
require "rexml/parsers/streamparser"
require "rexml/streamlistener"

class Listener
  include REXML::StreamListener
end

REXML::Parsers::StreamParser.new("<root>&gt;</root>", Listener.new).parse
```

```console
$ ruby test.rb
lib/rexml/parsers/baseparser.rb:558:in 'block in REXML::Parsers::BaseParser#unnormalize': uninitialized constant REXML::Parsers::BaseParser::Security (NameError)

                if sum > Security.entity_expansion_text_limit
                         ^^^^^^^^
Did you mean?  SecurityError
	from <internal:array>:54:in 'Array#each'
	from rexml/parsers/baseparser.rb:551:in 'REXML::Parsers::BaseParser#unnormalize'
	from rexml/parsers/streamparser.rb:39:in 'REXML::Parsers::StreamParser#parse'
	from test.rb:8:in '<main>'
```
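
On a REXML version that still lacks the explicit require, one possible workaround (a sketch, not something this commit prescribes) is to load `rexml/security` yourself before the parser. The `TextCollector` listener below is a hypothetical class used only to show that the text event arrives once the constant is defined.

```ruby
require "rexml/security"             # workaround: define REXML::Security up front
require "rexml/parsers/streamparser"
require "rexml/streamlistener"

# Hypothetical listener that records every text event it receives.
class TextCollector
  include REXML::StreamListener

  attr_reader :texts

  def initialize
    @texts = []
  end

  # Called by the stream parser for character data.
  def text(value)
    @texts << value
  end
end

listener = TextCollector.new
REXML::Parsers::StreamParser.new("<root>&gt;</root>", listener).parse
puts listener.texts.join
```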
otegami added a commit to otegami/red-datasets that referenced this pull request Aug 1, 2024
…it during XML parsing

Using `Datasets::WikipediaKyotoJapaneseEnglish#each` raised `entity expansion has grown too large (RuntimeError)`.
This error occurs because REXML enforces the entity expansion limit in its SAX and pull parsers since ruby/rexml#187,
and `Datasets::WikipediaKyotoJapaneseEnglish#each` exceeds that limit.

In Red Datasets, increasing the entity expansion limit is not a problem because we want to handle large datasets.
Therefore, we temporarily increase the limit.
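
A minimal sketch of what "temporarily increase the limit" could look like, assuming the limit involved is `REXML::Security.entity_expansion_text_limit` (the `unnormalize` frame in the backtrace suggests this, but the actual Red Datasets change may differ). The helper name `with_entity_expansion_text_limit` is hypothetical.

```ruby
require "rexml/security"

# Hypothetical helper: raise the global text limit for the duration of the
# block, then restore the previous value even if the block raises.
def with_entity_expansion_text_limit(limit)
  original = REXML::Security.entity_expansion_text_limit
  REXML::Security.entity_expansion_text_limit = limit
  yield
ensure
  REXML::Security.entity_expansion_text_limit = original
end

default_limit = REXML::Security.entity_expansion_text_limit
raised_limit = with_entity_expansion_text_limit(default_limit * 10) do
  # Parsing of the large dataset would go here; this sketch only reads the limit.
  REXML::Security.entity_expansion_text_limit
end
puts "inside block: #{raised_limit}, afterwards: #{REXML::Security.entity_expansion_text_limit}"
```

The setting is global to the process, so scoping the change to a block keeps unrelated XML parsing under the default limit.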

How to reproduce:

```console
$ cd red-datasets && bundle
$ bundle exec ruby example/wikipedia-kyoto-japanese-english.rb
...
/home/otegami/.rbenv/versions/3.3.3/lib/ruby/gems/3.3.0/gems/rexml-3.3.4/lib/rexml/parsers/baseparser.rb:560:in `block in unnormalize': entity expansion has grown too large (RuntimeError)
...
```
kou pushed a commit to red-data-tools/red-datasets that referenced this pull request Aug 5, 2024
…it during XML parsing (#198)

Using `Datasets::WikipediaKyotoJapaneseEnglish#each` raised `entity
expansion has grown too large (RuntimeError)`. This error occurs because
REXML enforces the entity expansion limit in its SAX and pull parsers
since ruby/rexml#187, and
`Datasets::WikipediaKyotoJapaneseEnglish#each` exceeds that limit.

In Red Datasets, increasing the entity expansion limit is not a problem
because we want to handle large datasets.
Therefore, we temporarily increase the limit.

## How to reproduce

```console
$ cd red-datasets && bundle
$ bundle exec ruby example/wikipedia-kyoto-japanese-english.rb
...
/home/otegami/.rbenv/versions/3.3.3/lib/ruby/gems/3.3.0/gems/rexml-3.3.4/lib/rexml/parsers/baseparser.rb:560:in `block in unnormalize': entity expansion has grown too large (RuntimeError)
...
```
otegami added a commit to otegami/red-datasets that referenced this pull request Aug 5, 2024
Using `Datasets::Wikipedia#each` raised `entity expansion has grown too large (RuntimeError)`.
This error occurs because REXML enforces the entity expansion limit in its SAX and pull parsers since ruby/rexml#187,
and `Datasets::Wikipedia#each` exceeds that limit.

In Red Datasets, increasing the entity expansion limit is not a problem because we want to handle large datasets.
Therefore, we temporarily increase the limit.

```ruby
require 'datasets'

wikipedia = Datasets::Wikipedia.new
wikipedia.each do |wiki|
  pp wiki
end
```

```console
$ cd red-datasets && bundle && bundle exec ruby wiki
/home/otegami/.rbenv/versions/3.3.3/lib/ruby/gems/3.3.0/gems/rexml-3.3.4/lib/rexml/parsers/baseparser.rb:560:in `block in unnormalize': entity expansion has grown too large (RuntimeError)
```
naitoh added a commit to naitoh/rexml that referenced this pull request Aug 20, 2024
## Why?

See:
- ruby#187
- ruby#195

## Change
- Supported `REXML::Security.entity_expansion_limit=` in Stream parser
- Supported `REXML::Security.entity_expansion_text_limit=` in Stream parser
@naitoh deleted the fix_entity_expansion branch September 27, 2024 20:29
otegami added a commit to otegami/red-datasets that referenced this pull request Oct 6, 2024