3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Fixed
- Propagate bozo flag from entry-level fields: thread bozo signal through `parse_item`, `parse_entry`, and `parse_rss10_item` (#70)

## [0.4.5] - 2026-02-20

### Fixed
89 changes: 47 additions & 42 deletions README.md
@@ -6,27 +6,28 @@
[![npm](https://img.shields.io/npm/v/feedparser-rs)](https://www.npmjs.com/package/feedparser-rs)
[![CI](https://img.shields.io/github/actions/workflow/status/bug-ops/feedparser-rs/ci.yml?branch=main)](https://github.com/bug-ops/feedparser-rs/actions)
[![codecov](https://codecov.io/gh/bug-ops/feedparser-rs/graph/badge.svg)](https://codecov.io/gh/bug-ops/feedparser-rs)
[![MSRV](https://img.shields.io/badge/MSRV-1.88.0-blue)](https://blog.rust-lang.org/)
[![License](https://img.shields.io/badge/license-MIT%2FApache--2.0-blue)](LICENSE-MIT)

High-performance RSS/Atom/JSON Feed parser written in Rust, with Python and Node.js bindings.
High-performance RSS/Atom/JSON Feed parser written in Rust, with Python and Node.js bindings. A drop-in replacement for Python [feedparser](https://github.com/kurtmckee/feedparser) that is 90-100x faster.

## Features

- **Multi-format support** RSS 0.9x, 1.0, 2.0 / Atom 0.3, 1.0 / JSON Feed 1.0, 1.1
- **Tolerant parsing** Handles malformed feeds gracefully with `bozo` flag pattern
- **HTTP fetching** Built-in URL fetching with compression (gzip, deflate, brotli)
- **Conditional GET** — ETag/Last-Modified support for bandwidth-efficient polling
- **Podcast support** — iTunes and Podcast 2.0 namespace extensions
- **Multi-language bindings** Native Python (PyO3) and Node.js (napi-rs) bindings
- **feedparser drop-in** Dict-style access, field aliases, same API patterns as Python feedparser
- **Multi-format support** -- RSS 0.9x, 1.0, 2.0 / Atom 0.3, 1.0 / JSON Feed 1.0, 1.1
- **Tolerant parsing** -- Handles malformed feeds gracefully with the `bozo` flag pattern, propagated at both feed and entry level
- **HTTP fetching** -- Built-in URL fetching with compression (gzip, deflate, brotli) and conditional GET (ETag/Last-Modified)
- **Podcast support** -- iTunes and Podcast 2.0 namespace extensions
- **Security** -- DoS protection via `ParserLimits`, SSRF protection, input size validation
- **Multi-language bindings** -- Native Python (PyO3) and Node.js (napi-rs) bindings
- **feedparser drop-in** -- Dict-style access, field aliases, same API patterns as Python feedparser

## Supported Formats

| Format | Versions | Status |
|--------|----------|--------|
| RSS | 0.90, 0.91, 0.92, 1.0, 2.0 | Full support |
| Atom | 0.3, 1.0 | Full support |
| JSON Feed | 1.0, 1.1 | Full support |
| RSS | 0.90, 0.91, 0.92, 1.0, 2.0 | Full support |
| Atom | 0.3, 1.0 | Full support |
| JSON Feed | 1.0, 1.1 | Full support |

### Namespace Extensions

@@ -53,27 +54,31 @@ Or add to your `Cargo.toml`:

```toml
[dependencies]
feedparser-rs = "0.2"
feedparser-rs = "0.4"
```

> [!IMPORTANT]
> Requires Rust 1.88.0 or later (edition 2024).

### Python

```bash
pip install feedparser-rs
```

> [!NOTE]
> Requires Python 3.10 or later.

### Node.js

```bash
npm install feedparser-rs
# or
yarn add feedparser-rs
# or
pnpm add feedparser-rs
```

### Python

```bash
pip install feedparser-rs
```
> [!NOTE]
> Requires Node.js 18 or later.

## Usage

@@ -126,23 +131,6 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
> [!TIP]
> Use `fetch_and_parse` for URL fetching with automatic compression handling (gzip, deflate, brotli).

### Node.js

```javascript
import { parse, fetchAndParse } from 'feedparser-rs';

// Parse from string
const feed = parse('<rss version="2.0">...</rss>');
console.log(feed.version); // 'rss20'
console.log(feed.feed.title);
console.log(feed.entries.length);

// Fetch from URL
const remoteFeed = await fetchAndParse('https://example.com/feed.xml');
```

See [Node.js API documentation](crates/feedparser-rs-node/README.md) for complete reference.

### Python

```python
@@ -162,13 +150,30 @@ print(d['feed']['title'])
print(d['entries'][0]['link'])

# Deprecated field aliases work
print(d.feed.description) # d.feed.subtitle
print(d.channel.title) # d.feed.title
print(d.feed.description) # -> d.feed.subtitle
print(d.channel.title) # -> d.feed.title
```

> [!NOTE]
> Python bindings provide full feedparser compatibility: dict-style access, field aliases, and `time.struct_time` for date fields.

### Node.js

```javascript
import { parse, fetchAndParse } from 'feedparser-rs';

// Parse from string
const feed = parse('<rss version="2.0">...</rss>');
console.log(feed.version); // 'rss20'
console.log(feed.feed.title);
console.log(feed.entries.length);

// Fetch from URL
const remoteFeed = await fetchAndParse('https://example.com/feed.xml');
```

See [Node.js API documentation](crates/feedparser-rs-node/README.md) for complete reference.

## Cargo Features

| Feature | Description | Default |
@@ -179,7 +184,7 @@ To disable HTTP support and reduce dependencies:

```toml
[dependencies]
feedparser-rs = { version = "0.2", default-features = false }
feedparser-rs = { version = "0.4", default-features = false }
```

## Workspace Structure
@@ -218,9 +223,9 @@ Measured on Apple M1 Pro, parsing real-world RSS feeds:

| Feed Size | Time | Throughput |
|-----------|------|------------|
| Small (2 KB) | **10.7 µs** | 187 MB/s |
| Medium (20 KB) | **93.6 µs** | 214 MB/s |
| Large (200 KB) | **939 µs** | 213 MB/s |
| Small (2 KB) | **10.7 us** | 187 MB/s |
| Medium (20 KB) | **93.6 us** | 214 MB/s |
| Large (200 KB) | **939 us** | 213 MB/s |

Format detection: **128 ns** (near-instant)

33 changes: 27 additions & 6 deletions crates/feedparser-rs-core/src/parser/atom.rs
@@ -225,8 +225,23 @@ fn parse_feed_element(
entry_ctx.update_base(&xml_base);
}

match parse_entry(reader, &mut buf, limits, depth, &entry_ctx) {
Ok(entry) => feed.entries.push(entry),
let mut entry_bozo = false;
match parse_entry(
reader,
&mut buf,
limits,
depth,
&entry_ctx,
&mut entry_bozo,
) {
Ok(entry) => {
if entry_bozo && !feed.bozo {
feed.bozo = true;
feed.bozo_exception =
Some("Unresolvable entity in entry field".to_string());
}
feed.entries.push(entry);
}
Err(e) => {
feed.bozo = true;
feed.bozo_exception = Some(e.to_string());
@@ -284,6 +299,7 @@ fn parse_entry(
limits: &ParserLimits,
depth: &mut usize,
base_ctx: &BaseUrlContext,
bozo: &mut bool,
) -> Result<Entry> {
let mut entry = Entry::with_capacity();

@@ -327,7 +343,9 @@ fn parse_entry(
}
}
b"id" if !is_empty => {
entry.id = Some(read_text_str(reader, buf, limits)?.into());
let (text, had_bozo) = read_text(reader, buf, limits)?;
*bozo |= had_bozo;
entry.id = Some(text.into());
}
b"updated" if !is_empty => {
let text = read_text_str(reader, buf, limits)?;
@@ -383,14 +401,16 @@ fn parse_entry(
let handled = if let Some(dc_element) = is_dc_tag(tag) {
let dc_elem = dc_element.to_string();
if !is_empty {
let text = read_text_str(reader, buf, limits)?;
let (text, had_bozo) = read_text(reader, buf, limits)?;
*bozo |= had_bozo;
dublin_core::handle_entry_element(&dc_elem, &text, &mut entry);
}
true
} else if let Some(content_element) = is_content_tag(tag) {
let content_elem = content_element.to_string();
if !is_empty {
let text = read_text_str(reader, buf, limits)?;
let (text, had_bozo) = read_text(reader, buf, limits)?;
*bozo |= had_bozo;
content::handle_entry_element(&content_elem, &text, &mut entry);
}
true
@@ -423,7 +443,8 @@ fn parse_entry(
} else {
let media_elem = media_element.to_string();
if !is_empty {
let text = read_text_str(reader, buf, limits)?;
let (text, had_bozo) = read_text(reader, buf, limits)?;
*bozo |= had_bozo;
media_rss::handle_entry_element(&media_elem, &text, &mut entry);
}
}
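
The pattern in this diff (a `&mut bool` out-parameter threaded through `parse_entry`, OR-ed with each field's flag, then lifted to the feed by the caller) can be sketched in isolation. This is a minimal, self-contained illustration, not the crate's code: `Entry`, `Feed`, the single-argument `read_text`, and the `&unknown;` trigger are simplified, hypothetical stand-ins; only the flag handling mirrors the hunks above.

```rust
// Hypothetical, simplified stand-ins for the crate's types (not the real definitions).
#[derive(Debug, Default)]
struct Entry {
    id: Option<String>,
}

#[derive(Debug, Default)]
struct Feed {
    bozo: bool,
    bozo_exception: Option<String>,
    entries: Vec<Entry>,
}

/// Stand-in for `read_text`: returns the text plus a flag that is true when
/// an entity could not be resolved. The "&unknown;" trigger is an assumption
/// made purely for this sketch.
fn read_text(raw: &str) -> (String, bool) {
    let had_bozo = raw.contains("&unknown;");
    (raw.replace("&unknown;", ""), had_bozo)
}

/// Stand-in for `parse_entry`: accumulates the field-level signal into the
/// caller's flag with `|=`, so an earlier `true` is never overwritten.
fn parse_entry(raw_id: &str, bozo: &mut bool) -> Result<Entry, String> {
    let (text, had_bozo) = read_text(raw_id);
    *bozo |= had_bozo;
    Ok(Entry { id: Some(text) })
}

/// Caller side: a fresh flag per entry, lifted to the feed only if the feed
/// is not already marked, so the first recorded exception is preserved.
fn parse_feed(raw_ids: &[&str]) -> Feed {
    let mut feed = Feed::default();
    for raw in raw_ids {
        let mut entry_bozo = false;
        match parse_entry(raw, &mut entry_bozo) {
            Ok(entry) => {
                if entry_bozo && !feed.bozo {
                    feed.bozo = true;
                    feed.bozo_exception =
                        Some("Unresolvable entity in entry field".to_string());
                }
                feed.entries.push(entry);
            }
            Err(e) => {
                feed.bozo = true;
                feed.bozo_exception = Some(e);
            }
        }
    }
    feed
}

fn main() {
    let feed = parse_feed(&["urn:a", "urn:b&unknown;"]);
    println!("bozo = {}, entries = {}", feed.bozo, feed.entries.len());
}
```

As in the diff, the entry is still pushed when its flag is set: the bozo signal marks the feed as malformed without discarding the recovered data.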