Non-deterministic results on different platforms #60

lexaknyazev · 2019-08-31T11:18:28Z

In the generate_hierarchical_codebook_threaded function, an std::unordered_map is used for counting unique training vectors.

basis_universal/basisu_enc.h

Lines 1598 to 1608 in 6002320

    
           bool generate_hierarchical_codebook_threaded(Quantizer& q, 
        
           	uint32_t max_codebook_size, uint32_t max_parent_codebook_size, 
        
           	std::vector<uint_vec>& codebook, 
        
           	std::vector<uint_vec>& parent_codebook, 
        
           	uint32_t max_threads, job_pool *pJob_pool) 
        
           { 
        
           	typedef bit_hasher<typename Quantizer::training_vec_type> training_vec_bit_hasher; 
        
           	typedef std::unordered_map < typename Quantizer::training_vec_type, weighted_block_group,  
        
           		training_vec_bit_hasher> group_hash; 
        
           	group_hash unique_vecs;

Iteration over the unordered map is implementation-dependent, so this code builds group_quant differently on MSVC and GCC/Clang.

basis_universal/basisu_enc.h

Lines 1634 to 1638 in 6002320

    
           for (auto iter = unique_vecs.begin(); iter != unique_vecs.end(); ++iter) 
        
           { 
        
           	group_quant.add_training_vec(iter->first, iter->second.m_total_weight); 
        
           	unique_vec_iters.push_back(iter); 
        
           }

As a result, the encoder produces different selectors and codebooks on different platforms.

After switching group_hash to be an ordered map, files produced on different platforms are the same (likely with some performance cost):

diff --git a/basisu_enc.h b/basisu_enc.h
index 7d42121..7946c76 100644
--- a/basisu_enc.h
+++ b/basisu_enc.h
@@ -21,7 +21,7 @@
 #include <condition_variable>
 #include <functional>
 #include <thread>
-#include <unordered_map>
+#include <map>
 
 #ifndef _WIN32
 #include <libgen.h>
@@ -1602,8 +1602,7 @@ namespace basisu
 		uint32_t max_threads, job_pool *pJob_pool)
 	{
 		typedef bit_hasher<typename Quantizer::training_vec_type> training_vec_bit_hasher;
-		typedef std::unordered_map < typename Quantizer::training_vec_type, weighted_block_group, 
-			training_vec_bit_hasher> group_hash;
+		typedef std::map < typename Quantizer::training_vec_type, weighted_block_group> group_hash;
 		
 		group_hash unique_vecs;

Maybe, there is a better way to achieve deterministic results.

The text was updated successfully, but these errors were encountered:

turol · 2019-08-31T13:04:38Z

Based on a quick look you could try this:

Split the loop on line 1634 into two. The first one adds to unique_vec_iters, the second to read from it instead of unique_vecs and adds to group_quant. Then add std::sort of unique_vec_iters between them to make it deterministic. If all Quantizer don't care about order you could do with just one loop and sort after. I don't want to go looking through all the code to establish that.

MarkCallow · 2019-08-31T18:51:23Z

Is unordered map + sort definitely faster than an ordered map?

turol · 2019-08-31T19:29:39Z

Usually, in insert or lookup-heavy loads. In this case? I don't know. Try them and find out.

richgel999 · 2019-08-31T19:41:59Z

Okay, this is good work, thank you! I am finishing up the next milestone (ASTC, PVRTC alpha, etc.), and once it is done in 2-3 weeks I will work on this problem. I may just switch to my own fully deterministic containers.

lexaknyazev · 2019-08-31T19:44:30Z

Here, each key is a vector with 6 float components.

In the worst case, comparison

basis_universal/basisu_enc.h

Line 112 in 6002320

    
           inline bool operator<(const vec &rhs) const { for (uint32_t i = 0; i < N; i++) { if (m_v[i] < rhs.m_v[i]) return true; else if (m_v[i] != rhs.m_v[i]) return false; } return false; }

and equality

basis_universal/basisu_enc.h

Line 111 in 6002320

    
           inline bool operator==(const vec &rhs) const { for (uint32_t i = 0; i < N; i++) if (m_v[i] != rhs.m_v[i]) return false;	return true; }

operators involve FP operations for all 6 components.

For the unordered map, a simple byte-based hasher is used:

basis_universal/basisu_enc.h

Lines 71 to 80 in 6002320

    
           uint32_t hash_hsieh(const uint8_t* pBuf, size_t len); 
        
           template <typename Key> 
        
           struct bit_hasher 
        
           { 
        
           	std::size_t operator()(const Key& k) const 
        
           	{ 
        
           		return hash_hsieh(reinterpret_cast<const uint8_t *>(&k), sizeof(k)); 
        
           	} 
        
           };

Maybe, one more option is to use these hash values as comparison values for the ordered map to avoid FP operations.

richgel999 · 2019-08-31T20:12:24Z

Also note that I have not touched the encoder at all for this milestone, just the transcoders, so any work you do will be easily merged in.

MarkCallow · 2021-04-02T09:29:23Z

I'm still getting non-deterministic results across systems with version 1.13 even though some custom containers have been added. All but one of my tests of ETC1S/BasisLZ encoding that pass on macOS fail on Windows and Linux. (I don't know if the Windows and Linux results are different from each other.) In one example the length of the compressed data is 1 byte less on Windows than macOS (47 vs. 48 bytes). The global dictionaries (selectors, endpoints, & tables) are the same size.

Curiously in the one case that succeeds on all platforms, the input file is a .pgm file. All the other files are .png. Also it only has one component.

lexaknyazev · 2021-04-02T13:56:56Z

even though some custom containers have been added

The root cause seems to be still in place:

basis_universal/encoder/basisu_enc.h

Lines 1843 to 1846 in 041ad47

    
           typedef std::unordered_map < typename Quantizer::training_vec_type, weighted_block_group,  
        
           	training_vec_bit_hasher> group_hash; 
        
           group_hash unique_vecs;

richgel999 · 2021-12-27T07:35:42Z

We now have custom contains for vector and hash tables/sets. However, I believe there's still a std hash table in there in one place. Still leaving this up for more investigation.

MarkCallow · 2022-03-03T12:46:22Z

However, I believe there's still a std hash table in there in one place. Still leaving this up for more investigation.

Yes there must be. I just ran the KTX-Software tests after integrating 1.16.3. ETC1S-encoded images on Windows and Linux are different from those on macOS. There were 2 fewer images different from macOS on Linux than on Windows so likely the Windows and Linux images are different from each other.

See: BinomialLLC/basis_universal#60

richgel999 added the enhancement New feature or request label Dec 27, 2021

MarkCallow mentioned this issue Feb 16, 2022

Astc normal maps KhronosGroup/KTX-Software#493

Merged

MarkCallow mentioned this issue Nov 28, 2022

execute the basisu command, the content of the file generated by the same image is different on windows and macos #332

Open

aqnuep added a commit to KhronosGroup/KTX-Software-CTS that referenced this issue Mar 29, 2023

Allow output mismatches on MSVC builds for BasisLZ tests

b8436d3

See: BinomialLLC/basis_universal#60

MarkCallow mentioned this issue Mar 26, 2024

Properly convert Java char to C char in inputSwizzle KhronosGroup/KTX-Software#876

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non-deterministic results on different platforms #60

Non-deterministic results on different platforms #60

lexaknyazev commented Aug 31, 2019

turol commented Aug 31, 2019

MarkCallow commented Aug 31, 2019

turol commented Aug 31, 2019

richgel999 commented Aug 31, 2019

lexaknyazev commented Aug 31, 2019

richgel999 commented Aug 31, 2019

MarkCallow commented Apr 2, 2021

lexaknyazev commented Apr 2, 2021

richgel999 commented Dec 27, 2021

MarkCallow commented Mar 3, 2022

Non-deterministic results on different platforms #60

Non-deterministic results on different platforms #60

Comments

lexaknyazev commented Aug 31, 2019

turol commented Aug 31, 2019

MarkCallow commented Aug 31, 2019

turol commented Aug 31, 2019

richgel999 commented Aug 31, 2019

lexaknyazev commented Aug 31, 2019

richgel999 commented Aug 31, 2019

MarkCallow commented Apr 2, 2021

lexaknyazev commented Apr 2, 2021

richgel999 commented Dec 27, 2021

MarkCallow commented Mar 3, 2022