refactor: hybrid incremental solver and performance improvements (#349)
This is a big PR that does two things:

**Incremental solver**

The solver is now incremental by default. This means that the solver
only requests information from the `DependencyProvider` when it is
reasonably sure that the information is needed to find a solution.
Previously, the solver greedily requested all information about the
problem space up front, which made it poorly suited to cases where
requesting information about packages is expensive.
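The lazy, on-demand behaviour described above can be sketched as a memoizing lookup. This is a simplified illustration under stated assumptions, not the crate's code: `Provider`, `CountingProvider` and `LazyCache` are hypothetical names, candidates are plain strings, and a `RefCell<HashMap>` stands in for the arena and frozen maps used by `SolverCache`. The provider is only consulted the first time a package's candidates are actually needed.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a `DependencyProvider`: it can only
/// list candidate versions for a package name.
trait Provider {
    fn get_candidates(&self, name: &str) -> Vec<String>;
}

/// Hypothetical provider that counts how often it is queried, so the
/// laziness of the cache can be observed.
struct CountingProvider {
    calls: Cell<usize>,
}

impl Provider for CountingProvider {
    fn get_candidates(&self, name: &str) -> Vec<String> {
        self.calls.set(self.calls.get() + 1);
        vec![format!("{name}-1.0"), format!("{name}-2.0")]
    }
}

/// Hypothetical memoizing cache: a package is fetched from the provider the
/// first time it is requested and served from the cache afterwards.
struct LazyCache<P: Provider> {
    provider: P,
    candidates: RefCell<HashMap<String, Vec<String>>>,
}

impl<P: Provider> LazyCache<P> {
    fn new(provider: P) -> Self {
        Self {
            provider,
            candidates: RefCell::new(HashMap::new()),
        }
    }

    fn get_or_cache_candidates(&self, name: &str) -> Vec<String> {
        // Serve from the cache if this package was already requested.
        if let Some(cached) = self.candidates.borrow().get(name) {
            return cached.clone();
        }
        // Otherwise ask the provider once and remember the answer.
        let fetched = self.provider.get_candidates(name);
        self.candidates
            .borrow_mut()
            .insert(name.to_string(), fetched.clone());
        fetched
    }
}
```

Requesting the same package twice calls the provider only once, and a package that the solver never asks about is never fetched at all.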

Optionally, a `DependencyProvider` can hint to the solver that it
already has dependency information available for some candidates. The
solver uses these hints to expand its knowledge of the problem space
without requiring network operations. The conda solver uses this hint.
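One way to track such hints, assuming solvables are identified by dense integer ids, is a growable bitset. In this sketch `DependencyHints` is a hypothetical name and a `Vec<bool>` stands in for the `bitvec::BitVec` this commit adds as a dependency; the grow-then-set logic mirrors what `SolverCache::get_or_cache_candidates` does with `hint_dependencies_available`.

```rust
/// Hypothetical record of which solvables have cheaply available
/// dependencies. A `Vec<bool>` stands in for a `BitVec`.
#[derive(Default)]
struct DependencyHints {
    available: Vec<bool>,
}

impl DependencyHints {
    /// Marks every id in `hinted` as "dependencies available", growing the
    /// vector on demand so ids can arrive in any order.
    fn record(&mut self, hinted: &[usize]) {
        for &idx in hinted {
            if self.available.len() <= idx {
                self.available.resize(idx + 1, false);
            }
            self.available[idx] = true;
        }
    }

    /// Ids that were never hinted, including ids past the end, report `false`.
    fn is_available(&self, idx: usize) -> bool {
        self.available.get(idx).copied().unwrap_or(false)
    }
}
```

A bitset keeps the per-solvable cost at one bit (one byte with `Vec<bool>`), which matters when a package index contains many candidates.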

In general, it is better to have all the information available up
front: a solver that has "all" the information reaches a solution more
quickly because it does not have to guess about what it doesn't know.

**Performance improvements**

This PR also improves performance by being more deliberate about which
decision to make next in the algorithm. Instead of picking the first
available decision, the solver considers all possible choices and picks
the one that involves the fewest candidate versions.

This is similar to what pubgrub does:


https://github.com/dart-lang/pub/blob/master/doc/solver.md#decision-making

In the future we might look at different heuristics, but this
implementation is good enough for now.
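The "fewest candidates first" heuristic can be sketched in a few lines, assuming each open requirement already knows how many candidate versions could still satisfy it. `OpenRequirement` and `pick_next` are hypothetical names for illustration, not the solver's internal API.

```rust
/// Hypothetical view of one open requirement: the package it refers to and
/// how many candidate versions could still satisfy it.
#[derive(Debug, Clone, PartialEq)]
struct OpenRequirement {
    package: &'static str,
    remaining_candidates: usize,
}

/// Picks the requirement with the fewest remaining candidates instead of the
/// first one encountered. Returns `None` when nothing is left to decide.
/// `min_by_key` returns the first of several equal minima, so ties are
/// broken in favor of the earliest requirement in the list.
fn pick_next(open: &[OpenRequirement]) -> Option<&OpenRequirement> {
    open.iter().min_by_key(|r| r.remaining_candidates)
}
```

For example, with `numpy` at 40 remaining candidates and both `python` and `libzlib` at 3, `pick_next` returns `python`. A common rationale for this choice is that a package with few options is more likely to expose a conflict early, while backtracking is still cheap.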

Some results:

| | libsolv | libsolv-rs 0.7.0 | libsolv-rs (this PR) |
|------------------------------------|---------|------------|--------------|
| python=3.9 | 7.03ms | **3.47ms** | 3.57ms |
| xtensor, xsimd | 5.28ms | **2.00ms** | 2.22ms |
| tensorflow | 773.27ms | 407.98ms | **152.94ms** |
| quetz | 1380.8ms | 1684.7ms | **301.53ms** |
| tensorboard=2.1.1, grpc-cpp=1.39.1 | 515.25ms | 122.48ms | **89.41ms** |
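To put the table in relative terms, the following computes the speedup of this PR over libsolv-rs 0.7.0 (old time divided by new time) for each row. `speedup` and `TIMINGS` are illustrative names; the numbers are copied from the table.

```rust
/// Speedup of `new_ms` relative to `old_ms`: above 1.0 means the new solver
/// is faster, below 1.0 means it is slower.
fn speedup(old_ms: f64, new_ms: f64) -> f64 {
    old_ms / new_ms
}

/// Timings from the table: (case, libsolv-rs 0.7.0 in ms, this PR in ms).
const TIMINGS: [(&str, f64, f64); 5] = [
    ("python=3.9", 3.47, 3.57),
    ("xtensor, xsimd", 2.00, 2.22),
    ("tensorflow", 407.98, 152.94),
    ("quetz", 1684.7, 301.53),
    ("tensorboard=2.1.1, grpc-cpp=1.39.1", 122.48, 89.41),
];
```

This gives roughly 2.7x for tensorflow, 5.6x for quetz and 1.4x for tensorboard/grpc-cpp, while the two simple cases come out slightly below 1.0 (about 0.97x and 0.90x).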

Note that for "simple" cases this PR is slightly slower, whereas for
"complex" cases it is _much_ faster.

---------

Co-authored-by: Tim de Jager <tdejager89@gmail.com>
baszalmstra and tdejager authored Sep 25, 2023
1 parent a7ff82d commit 61ed341
Showing 31 changed files with 923 additions and 625 deletions.
1 change: 1 addition & 0 deletions crates/rattler_libsolv_rs/Cargo.toml
@@ -15,6 +15,7 @@ itertools = "0.11.0"
petgraph = "0.6.4"
tracing = "0.1.37"
elsa = "1.9.0"
bitvec = "1.0.1"
serde = { version = "1.0", features = ["derive"], optional = true }

[dev-dependencies]
2 changes: 2 additions & 0 deletions crates/rattler_libsolv_rs/src/internal/arena.rs
@@ -1,3 +1,5 @@
#![allow(unused)]

use std::cell::{Cell, UnsafeCell};
use std::cmp;
use std::marker::PhantomData;
24 changes: 18 additions & 6 deletions crates/rattler_libsolv_rs/src/lib.rs
@@ -17,16 +17,19 @@ mod solvable;
mod solver;

use itertools::Itertools;

pub use internal::{
id::{NameId, SolvableId, VersionSetId},
mapping::Mapping,
};
pub use pool::Pool;
pub use solvable::Solvable;
pub use solver::{Solver, SolverCache};
use std::{
fmt::{Debug, Display},
hash::Hash,
};

pub use internal::id::{NameId, SolvableId, VersionSetId};
pub use pool::Pool;
pub use solvable::Solvable;
pub use solver::Solver;

/// The solver is based around the fact that for every package name we are trying to find a
/// single variant. Variants are grouped by their respective package name. A package name is
/// anything that we can compare and hash for uniqueness checks.
@@ -60,7 +63,7 @@ pub trait DependencyProvider<VS: VersionSet, N: PackageName = String>: Sized {
/// Sort the specified solvables based on which solvable to try first. The solver will
/// iteratively try to select the highest version. If a conflict is found with the highest
/// version the next version is tried. This continues until a solution is found.
-fn sort_candidates(&self, solver: &Solver<VS, N, Self>, solvables: &mut [SolvableId]);
+fn sort_candidates(&self, solver: &SolverCache<VS, N, Self>, solvables: &mut [SolvableId]);

/// Returns a list of solvables that should be considered when a package with the given name is
/// requested.
@@ -92,6 +95,15 @@ pub struct Candidates {
/// also be possible to simply return a single candidate using this field provides better error
/// messages to the user.
pub locked: Option<SolvableId>,

/// A hint to the solver that the dependencies of some of the solvables are also directly
/// available. This allows the solver to request the dependencies of these solvables
/// immediately. Having the dependency information available might make the solver much faster
/// because it has more information available up-front which provides the solver with a more
/// complete picture of the entire problem space. However, it might also be the case that the
/// solver doesn't actually need this information to form a solution. In general though, if the
/// dependencies can easily be provided one should provide them up-front.
pub hint_dependencies_available: Vec<SolvableId>,
}

/// Holds information about the dependencies of a package.
2 changes: 1 addition & 1 deletion crates/rattler_libsolv_rs/src/problem.rs
@@ -58,7 +58,7 @@ impl Problem {
&Clause::Requires(package_id, version_set_id) => {
let package_node = Self::add_node(&mut graph, &mut nodes, package_id);

-let candidates = solver.get_or_cache_sorted_candidates(version_set_id);
+let candidates = solver.cache.get_or_cache_sorted_candidates(version_set_id);
if candidates.is_empty() {
tracing::info!(
"{package_id:?} requires {version_set_id:?}, which has no candidates"
227 changes: 227 additions & 0 deletions crates/rattler_libsolv_rs/src/solver/cache.rs
@@ -0,0 +1,227 @@
use crate::internal::arena::ArenaId;
use crate::{
internal::{
arena::Arena,
frozen_copy_map::FrozenCopyMap,
id::{CandidatesId, DependenciesId},
},
Candidates, Dependencies, DependencyProvider, NameId, PackageName, Pool, SolvableId,
VersionSet, VersionSetId,
};
use bitvec::vec::BitVec;
use elsa::FrozenMap;
use std::cell::RefCell;
use std::marker::PhantomData;

/// Keeps a cache of previously computed and/or requested information about solvables and version
/// sets.
pub struct SolverCache<VS: VersionSet, N: PackageName, D: DependencyProvider<VS, N>> {
provider: D,

/// A mapping from package name to a list of candidates.
candidates: Arena<CandidatesId, Candidates>,
package_name_to_candidates: FrozenCopyMap<NameId, CandidatesId>,

/// A mapping of `VersionSetId` to the candidates that match that set.
version_set_candidates: FrozenMap<VersionSetId, Vec<SolvableId>>,

/// A mapping of `VersionSetId` to the candidates that do not match that set (only candidates
/// of the package indicated by the version set are included).
version_set_inverse_candidates: FrozenMap<VersionSetId, Vec<SolvableId>>,

/// A mapping of `VersionSetId` to a sorted list of candidates that match that set.
pub(crate) version_set_to_sorted_candidates: FrozenMap<VersionSetId, Vec<SolvableId>>,

/// A mapping from a solvable to a list of dependencies
solvable_dependencies: Arena<DependenciesId, Dependencies>,
solvable_to_dependencies: FrozenCopyMap<SolvableId, DependenciesId>,

/// A mapping that indicates that the dependencies for a particular solvable can cheaply be
/// retrieved from the dependency provider. This information is provided by the
/// DependencyProvider when the candidates for a package are requested.
hint_dependencies_available: RefCell<BitVec>,

_data: PhantomData<(VS, N)>,
}

impl<VS: VersionSet, N: PackageName, D: DependencyProvider<VS, N>> SolverCache<VS, N, D> {
/// Constructs a new instance from a provider.
pub fn new(provider: D) -> Self {
Self {
provider,

candidates: Default::default(),
package_name_to_candidates: Default::default(),
version_set_candidates: Default::default(),
version_set_inverse_candidates: Default::default(),
version_set_to_sorted_candidates: Default::default(),
solvable_dependencies: Default::default(),
solvable_to_dependencies: Default::default(),
hint_dependencies_available: Default::default(),

_data: Default::default(),
}
}

/// Returns a reference to the pool used by the solver
pub fn pool(&self) -> &Pool<VS, N> {
self.provider.pool()
}

/// Returns the candidates for the package with the given name. This will either ask the
/// [`DependencyProvider`] for the entries or a cached value.
pub fn get_or_cache_candidates(&self, package_name: NameId) -> &Candidates {
// If we already have the candidates for this package cached we can simply return them
let candidates_id = match self.package_name_to_candidates.get_copy(&package_name) {
Some(id) => id,
None => {
// Otherwise we have to get them from the DependencyProvider
let candidates = self
.provider
.get_candidates(package_name)
.unwrap_or_default();

// Record which solvables have dependency information that is easy to
// retrieve.
{
let mut hint_dependencies_available =
self.hint_dependencies_available.borrow_mut();
for hint_candidate in candidates.hint_dependencies_available.iter() {
let idx = hint_candidate.to_usize();
if hint_dependencies_available.len() <= idx {
hint_dependencies_available.resize(idx + 1, false);
}
hint_dependencies_available.set(idx, true)
}
}

// Allocate an ID so we can refer to the candidates from everywhere
let candidates_id = self.candidates.alloc(candidates);
self.package_name_to_candidates
.insert_copy(package_name, candidates_id);

candidates_id
}
};

// Returns a reference from the arena
&self.candidates[candidates_id]
}

/// Returns the candidates of a package that match the specified version set.
pub fn get_or_cache_matching_candidates(&self, version_set_id: VersionSetId) -> &[SolvableId] {
match self.version_set_candidates.get(&version_set_id) {
Some(candidates) => candidates,
None => {
let package_name = self.pool().resolve_version_set_package_name(version_set_id);
let version_set = self.pool().resolve_version_set(version_set_id);
let candidates = self.get_or_cache_candidates(package_name);

let matching_candidates = candidates
.candidates
.iter()
.copied()
.filter(|&p| {
let version = self.pool().resolve_internal_solvable(p).solvable().inner();
version_set.contains(version)
})
.collect();

self.version_set_candidates
.insert(version_set_id, matching_candidates)
}
}
}

/// Returns the candidates that do *not* match the specified requirement.
pub fn get_or_cache_non_matching_candidates(
&self,
version_set_id: VersionSetId,
) -> &[SolvableId] {
match self.version_set_inverse_candidates.get(&version_set_id) {
Some(candidates) => candidates,
None => {
let package_name = self.pool().resolve_version_set_package_name(version_set_id);
let version_set = self.pool().resolve_version_set(version_set_id);
let candidates = self.get_or_cache_candidates(package_name);

let matching_candidates = candidates
.candidates
.iter()
.copied()
.filter(|&p| {
let version = self.pool().resolve_internal_solvable(p).solvable().inner();
!version_set.contains(version)
})
.collect();

self.version_set_inverse_candidates
.insert(version_set_id, matching_candidates)
}
}
}

/// Returns the candidates that match the given version set (see
/// [`Self::get_or_cache_matching_candidates`]), sorted in the order in which they should be tried.
pub fn get_or_cache_sorted_candidates(&self, version_set_id: VersionSetId) -> &[SolvableId] {
match self.version_set_to_sorted_candidates.get(&version_set_id) {
Some(candidates) => candidates,
None => {
let package_name = self.pool().resolve_version_set_package_name(version_set_id);
let matching_candidates = self.get_or_cache_matching_candidates(version_set_id);
let candidates = self.get_or_cache_candidates(package_name);

// Sort all the candidates in the order in which they should be tried by the solver.
let mut sorted_candidates = Vec::new();
sorted_candidates.extend_from_slice(matching_candidates);
self.provider.sort_candidates(self, &mut sorted_candidates);

// If we have a solvable that we favor, we sort that to the front. This ensures
// that the version that is favored is picked first.
if let Some(favored_id) = candidates.favored {
if let Some(pos) = sorted_candidates.iter().position(|&s| s == favored_id) {
// Move the element at `pos` to the front of the array
sorted_candidates[0..=pos].rotate_right(1);
}
}

self.version_set_to_sorted_candidates
.insert(version_set_id, sorted_candidates)
}
}
}

/// Returns the dependencies of a solvable. Requests the dependencies from the
/// [`DependencyProvider`] if they are not known yet.
pub fn get_or_cache_dependencies(&self, solvable_id: SolvableId) -> &Dependencies {
let dependencies_id = match self.solvable_to_dependencies.get_copy(&solvable_id) {
Some(id) => id,
None => {
let dependencies = self.provider.get_dependencies(solvable_id);
let dependencies_id = self.solvable_dependencies.alloc(dependencies);
self.solvable_to_dependencies
.insert_copy(solvable_id, dependencies_id);
dependencies_id
}
};

&self.solvable_dependencies[dependencies_id]
}

/// Returns true if the dependencies for the given solvable are "cheaply" available. This means
/// either the dependency provider indicated that the dependencies for a solvable are available
/// or the dependencies have already been requested.
pub fn are_dependencies_available_for(&self, solvable: SolvableId) -> bool {
if self.solvable_to_dependencies.get_copy(&solvable).is_some() {
true
} else {
let solvable_idx = solvable.to_usize();
let hint_dependencies_available = self.hint_dependencies_available.borrow();
let value = hint_dependencies_available
.get(solvable_idx)
.as_deref()
.copied();
value.unwrap_or(false)
}
}
}
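The two list operations in `cache.rs` can be illustrated in isolation. This is a generic sketch, not the crate's code: `move_favored_to_front` reproduces the `rotate_right(1)` step of `get_or_cache_sorted_candidates`, and `split_by` shows the matching / non-matching split that `get_or_cache_matching_candidates` and `get_or_cache_non_matching_candidates` compute as two separate filters. Both helper names are hypothetical.

```rust
/// Hypothetical helper: moves `favored` to the front of `sorted` (if present)
/// while keeping the relative order of every other element.
fn move_favored_to_front<T: PartialEq>(sorted: &mut [T], favored: &T) {
    if let Some(pos) = sorted.iter().position(|s| s == favored) {
        // Rotating the prefix [0..=pos] right by one puts the element at
        // `pos` first and shifts everything before it back by one slot.
        sorted[..=pos].rotate_right(1);
    }
}

/// Hypothetical helper: splits candidates into (matching, non-matching)
/// according to `is_match`, preserving the original order in both halves.
fn split_by<T: Copy>(candidates: &[T], is_match: impl Fn(T) -> bool) -> (Vec<T>, Vec<T>) {
    candidates.iter().copied().partition(|&c| is_match(c))
}
```

For `[30, 20, 15, 10]` with `15` favored, the result is `[15, 30, 20, 10]`: only the favored entry moves and the rest keep the order chosen by `sort_candidates`.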
5 changes: 4 additions & 1 deletion crates/rattler_libsolv_rs/src/solver/clause.rs
@@ -491,8 +491,11 @@ impl<VS: VersionSet, N: PackageName + Display> Debug for ClauseDebug<'_, VS, N>
let match_spec = self.pool.resolve_version_set(match_spec_id).to_string();
write!(
f,
-"{} requires {match_spec}",
+"{} requires {} {match_spec}",
solvable_id.display(self.pool),
+self.pool
+.resolve_version_set_package_name(match_spec_id)
+.display(self.pool)
)
}
Clause::Constrains(s1, s2, vset_id) => {
35 changes: 35 additions & 0 deletions crates/rattler_libsolv_rs/src/solver/decision_map.rs
@@ -1,5 +1,7 @@
use crate::internal::{arena::ArenaId, id::SolvableId};
use crate::{PackageName, Pool, VersionSet};
use std::cmp::Ordering;
use std::fmt::{Display, Formatter};

/// Represents a decision (i.e. an assignment to a solvable) and the level at which it was made
///
@@ -75,4 +77,37 @@ impl DecisionMap {
pub fn value(&self, solvable_id: SolvableId) -> Option<bool> {
self.map.get(solvable_id.to_usize()).and_then(|d| d.value())
}

/// Returns an object that can be used to display the contents of the decision map in a human readable fashion.
#[allow(unused)]
pub fn display<'a, VS: VersionSet, N: PackageName + Display>(
&'a self,
pool: &'a Pool<VS, N>,
) -> DecisionMapDisplay<'a, VS, N> {
DecisionMapDisplay { map: self, pool }
}
}

pub struct DecisionMapDisplay<'a, VS: VersionSet, N: PackageName + Display> {
map: &'a DecisionMap,
pool: &'a Pool<VS, N>,
}

impl<'a, VS: VersionSet, N: PackageName + Display> Display for DecisionMapDisplay<'a, VS, N> {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
for (id, solvable) in self.pool.solvables.iter() {
write!(f, "{} := ", solvable.display(self.pool))?;
if let Some(value) = self.map.value(id) {
writeln!(
f,
"{} (level: {})",
if value { "true " } else { "false" },
self.map.level(id)
)?;
} else {
writeln!(f, "<undecided>")?;
}
}
Ok(())
}
}
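`DecisionMap::display` follows a common Rust "display adapter" pattern: a small struct that borrows the data plus whatever context is needed to format it, and implements `Display`. The sketch below shows that pattern in a self-contained form; `Decisions` and `DecisionsDisplay` are hypothetical stand-ins, and names are passed as plain strings instead of being resolved through a `Pool`.

```rust
use std::fmt::{self, Display, Formatter};

/// Hypothetical, simplified decision map: for each solvable index either
/// undecided (`None`) or an assigned value with its decision level.
struct Decisions {
    values: Vec<Option<(bool, u32)>>,
}

/// Hypothetical borrowing adapter pairing the decisions with display names.
struct DecisionsDisplay<'a> {
    decisions: &'a Decisions,
    names: &'a [&'a str],
}

impl Decisions {
    /// Returns an adapter; nothing is formatted until it is displayed.
    fn display<'a>(&'a self, names: &'a [&'a str]) -> DecisionsDisplay<'a> {
        DecisionsDisplay { decisions: self, names }
    }
}

impl Display for DecisionsDisplay<'_> {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        for (name, value) in self.names.iter().zip(&self.decisions.values) {
            write!(f, "{name} := ")?;
            match value {
                Some((value, level)) => {
                    // Pad "true" so it lines up with "false".
                    let label = if *value { "true " } else { "false" };
                    writeln!(f, "{label} (level: {level})")?;
                }
                None => writeln!(f, "<undecided>")?,
            }
        }
        Ok(())
    }
}
```

Because the adapter only holds references, creating it is cheap; the formatting work happens only when it is actually displayed, for example via `to_string()` or `{}` in a log message.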