Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider detaching entity while consuming it from Stream #3295

Closed
quaff opened this issue Jan 3, 2024 · 6 comments
Closed

Consider detaching entity while consuming it from Stream #3295

quaff opened this issue Jan 3, 2024 · 6 comments
Labels
status: declined A suggestion or change that we don't feel we should currently apply

Comments

@quaff
Copy link
Contributor

quaff commented Jan 3, 2024

The main purpose of Stream is avoiding return a large List which may lead to OOM, but before the transaction finished, the entities still exist in persistence context and cannot be recycled by GC, we need to evict entity manually after it be consumed. Here is a test case verify that:

package stream;

import java.util.stream.Stream;

import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.TestInstance;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.autoconfigure.domain.EntityScan;
import org.springframework.boot.test.autoconfigure.orm.jpa.DataJpaTest;
import org.springframework.data.domain.Sort;
import org.springframework.data.jpa.domain.AbstractPersistable;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.JpaSpecificationExecutor;
import org.springframework.data.jpa.repository.config.EnableJpaRepositories;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.TestPropertySource;

import jakarta.persistence.Entity;
import jakarta.persistence.EntityManager;

@TestInstance(TestInstance.Lifecycle.PER_CLASS)
@DataJpaTest(showSql = false)
@TestPropertySource(properties = "spring.jpa.properties.hibernate.jdbc.fetch_size=100")
@EnableJpaRepositories(basePackageClasses = StreamTests.TestEntityRepository.class, considerNestedRepositories = true)
@EntityScan(basePackageClasses = StreamTests.TestEntity.class)
@ContextConfiguration(classes = StreamTests.class)
class StreamTests {

	@Autowired
	TestEntityRepository repository;

	@Autowired
	EntityManager entityManager;

	@BeforeAll
	void prepare() {
		int size = 100000;
		for (int i = 0; i < size; i++) {
			TestEntity entity = new TestEntity();
			this.repository.save(entity);
		}
	}

	@ParameterizedTest
	@ValueSource(booleans = { false, true })
	void testStreamMemoryUsage(boolean detaching) {
		long usedMemory = usedMemory();
		Stream<TestEntity> stream = repository.stream();
		if (detaching) {
			stream = stream.peek(entityManager::detach);
		}

		stream.forEach(entity -> {
			// TODO do something with entity
		});

		System.out.println(
				"Memory usage increased " + (detaching ? "with	" : "without	") + " detaching: " + (usedMemory() - usedMemory));
	}

	private long usedMemory() {
		System.gc();
		return Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
	}

	interface TestEntityRepository extends JpaRepository<TestEntity, Long>, JpaSpecificationExecutor<TestEntity> {

		default Stream<TestEntity> stream() {
			return findBy((root, query, cb) -> null, q -> q.sortBy(Sort.by("id"))).stream();
		}
	}

	@Entity
	static class TestEntity extends AbstractPersistable<Long> {

	}

}

it will output like this:

Memory usage increased without	 detaching: 3060128
Memory usage increased with	 detaching: 302464

Spring Data JPA could return Stream with .peek(entityManager::detach).
Or use a StatelessSession to fetch stream, It's Hibernate specific, I think there is an equivalent with EclipseLink, we should migrate if standard StatelessEntityManager is out.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Jan 3, 2024
@quaff quaff changed the title Consider evicting entity from persistence context while consuming Stream Consider detach entity from persistence context while consuming Stream Jan 3, 2024
@quaff quaff changed the title Consider detach entity from persistence context while consuming Stream Consider detaching entity while consuming it from Stream Jan 3, 2024
quaff added a commit to quaff/spring-data-jpa that referenced this issue Jan 3, 2024
@mp911de mp911de added status: declined A suggestion or change that we don't feel we should currently apply and removed status: waiting-for-triage An issue we've not yet triaged labels Jan 3, 2024
@mp911de
Copy link
Member

mp911de commented Jan 3, 2024

That isn't going to happen as detaching entities breaks how JPA works. Registering an aspect on JPA's Query to post-process getResultList() and other methods producing results if you want to detach entities could be reasonable. That would however have to happen in a component closer to Spring Framework.

@mp911de mp911de closed this as not planned Won't fix, can't repro, duplicate, stale Jan 3, 2024
@quaff
Copy link
Contributor Author

quaff commented Jan 3, 2024

That isn't going to happen as detaching entities breaks how JPA works. Registering an aspect on JPA's Query to post-process getResultList() and other methods producing results if you want to detach entities could be reasonable. That would however have to happen in a component closer to Spring Framework.

Could you explain why detaching entities breaks how JPA works and how to register an aspect on Query to post-process getResultList()?

@mp911de
Copy link
Member

mp911de commented Jan 3, 2024

Probably stating the obvious, JPA expects attached entities to work as designed (dirty tracking, lazy loading, persisting on transactional commit)

@quaff
Copy link
Contributor Author

quaff commented Jan 4, 2024

Probably stating the obvious, JPA expects attached entities to work as designed (dirty tracking, lazy loading, persisting on transactional commit)

I think it's worthy to treat specially, because the purpose of Stream is for fetching large ScrollableResults, we should do that in readonly transaction to prevent OOM, and If someone insist on that, they can merge entity manually.
It's a reasonable tradeoff, please rethink.

@mp911de
Copy link
Member

mp911de commented Jan 4, 2024

If you want a reasonable fix, please fix the JPA spec.

@quaff
Copy link
Contributor Author

quaff commented Jan 4, 2024

If you want a reasonable fix, please fix the JPA spec.

I'm not saying something is wrong have to fix, I'm proposing an enhancement, It's kind of contributing, I apologize if made noise here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: declined A suggestion or change that we don't feel we should currently apply
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants