Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-deterministic results on different platforms #60

Open
lexaknyazev opened this issue Aug 31, 2019 · 10 comments
Open

Non-deterministic results on different platforms #60

lexaknyazev opened this issue Aug 31, 2019 · 10 comments
Labels
enhancement New feature or request

Comments

@lexaknyazev
Copy link
Contributor

In the generate_hierarchical_codebook_threaded function, an std::unordered_map is used for counting unique training vectors.

basis_universal/basisu_enc.h

Lines 1598 to 1608 in 6002320

bool generate_hierarchical_codebook_threaded(Quantizer& q,
uint32_t max_codebook_size, uint32_t max_parent_codebook_size,
std::vector<uint_vec>& codebook,
std::vector<uint_vec>& parent_codebook,
uint32_t max_threads, job_pool *pJob_pool)
{
typedef bit_hasher<typename Quantizer::training_vec_type> training_vec_bit_hasher;
typedef std::unordered_map < typename Quantizer::training_vec_type, weighted_block_group,
training_vec_bit_hasher> group_hash;
group_hash unique_vecs;

Iteration over the unordered map is implementation-dependent, so this code builds group_quant differently on MSVC and GCC/Clang.

basis_universal/basisu_enc.h

Lines 1634 to 1638 in 6002320

for (auto iter = unique_vecs.begin(); iter != unique_vecs.end(); ++iter)
{
group_quant.add_training_vec(iter->first, iter->second.m_total_weight);
unique_vec_iters.push_back(iter);
}

As a result, the encoder produces different selectors and codebooks on different platforms.

After switching group_hash to be an ordered map, files produced on different platforms are the same (likely with some performance cost):

diff --git a/basisu_enc.h b/basisu_enc.h
index 7d42121..7946c76 100644
--- a/basisu_enc.h
+++ b/basisu_enc.h
@@ -21,7 +21,7 @@
 #include <condition_variable>
 #include <functional>
 #include <thread>
-#include <unordered_map>
+#include <map>
 
 #ifndef _WIN32
 #include <libgen.h>
@@ -1602,8 +1602,7 @@ namespace basisu
 		uint32_t max_threads, job_pool *pJob_pool)
 	{
 		typedef bit_hasher<typename Quantizer::training_vec_type> training_vec_bit_hasher;
-		typedef std::unordered_map < typename Quantizer::training_vec_type, weighted_block_group, 
-			training_vec_bit_hasher> group_hash;
+		typedef std::map < typename Quantizer::training_vec_type, weighted_block_group> group_hash;
 		
 		group_hash unique_vecs;

Maybe, there is a better way to achieve deterministic results.

@turol
Copy link

turol commented Aug 31, 2019

Based on a quick look you could try this:

Split the loop on line 1634 into two. The first one adds to unique_vec_iters, the second to read from it instead of unique_vecs and adds to group_quant. Then add std::sort of unique_vec_iters between them to make it deterministic. If all Quantizer don't care about order you could do with just one loop and sort after. I don't want to go looking through all the code to establish that.

@MarkCallow
Copy link
Contributor

Is unordered map + sort definitely faster than an ordered map?

@turol
Copy link

turol commented Aug 31, 2019

Usually, in insert or lookup-heavy loads. In this case? I don't know. Try them and find out.

@richgel999
Copy link
Contributor

Okay, this is good work, thank you! I am finishing up the next milestone (ASTC, PVRTC alpha, etc.), and once it is done in 2-3 weeks I will work on this problem. I may just switch to my own fully deterministic containers.

@lexaknyazev
Copy link
Contributor Author

Here, each key is a vector with 6 float components.

In the worst case, comparison

inline bool operator<(const vec &rhs) const { for (uint32_t i = 0; i < N; i++) { if (m_v[i] < rhs.m_v[i]) return true; else if (m_v[i] != rhs.m_v[i]) return false; } return false; }

and equality
inline bool operator==(const vec &rhs) const { for (uint32_t i = 0; i < N; i++) if (m_v[i] != rhs.m_v[i]) return false; return true; }

operators involve FP operations for all 6 components.

For the unordered map, a simple byte-based hasher is used:

uint32_t hash_hsieh(const uint8_t* pBuf, size_t len);
template <typename Key>
struct bit_hasher
{
std::size_t operator()(const Key& k) const
{
return hash_hsieh(reinterpret_cast<const uint8_t *>(&k), sizeof(k));
}
};

Maybe, one more option is to use these hash values as comparison values for the ordered map to avoid FP operations.

@richgel999
Copy link
Contributor

Also note that I have not touched the encoder at all for this milestone, just the transcoders, so any work you do will be easily merged in.

@MarkCallow
Copy link
Contributor

I'm still getting non-deterministic results across systems with version 1.13 even though some custom containers have been added. All but one of my tests of ETC1S/BasisLZ encoding that pass on macOS fail on Windows and Linux. (I don't know if the Windows and Linux results are different from each other.) In one example the length of the compressed data is 1 byte less on Windows than macOS (47 vs. 48 bytes). The global dictionaries (selectors, endpoints, & tables) are the same size.

Curiously in the one case that succeeds on all platforms, the input file is a .pgm file. All the other files are .png. Also it only has one component.

@lexaknyazev
Copy link
Contributor Author

even though some custom containers have been added

The root cause seems to be still in place:

typedef std::unordered_map < typename Quantizer::training_vec_type, weighted_block_group,
training_vec_bit_hasher> group_hash;
group_hash unique_vecs;

@richgel999
Copy link
Contributor

We now have custom contains for vector and hash tables/sets. However, I believe there's still a std hash table in there in one place. Still leaving this up for more investigation.

@MarkCallow
Copy link
Contributor

However, I believe there's still a std hash table in there in one place. Still leaving this up for more investigation.

Yes there must be. I just ran the KTX-Software tests after integrating 1.16.3. ETC1S-encoded images on Windows and Linux are different from those on macOS. There were 2 fewer images different from macOS on Linux than on Windows so likely the Windows and Linux images are different from each other.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants