
Some integer scalar and vectored methods/overloads are not exposed in HW intrinsics API #10385

Closed
voinokin opened this issue May 27, 2018 · 37 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI design-discussion Ongoing discussion about design without consensus
Comments

@voinokin

  1. Currently, the intrinsics are exposed for signed int -> signed int and for unsigned int -> signed int widening conversion when both src and dest are 128-bit vectors. But no such API method is directly exposed for unsigned int -> unsigned int, so one has to either use UnpackLow() (which is a different operation) or wrap the existing signed conversion in StaticCast<signed, unsigned>(), which may introduce overhead (see Sub-optimal codegen when using Sse.StaticCast<TFrom,TTo>() with non-VEX encoded HW intrinsics #10357).
    The suggestion is to add the following convenience methods:
  • Sse41.ConvertToVector128UInt16(Vector128(byte)) for [V]PMOVZXBW xmm, xmm (same insn as for the existing ConvertToVector128Int16())
  • Sse41.ConvertToVector128UInt32(Vector128(byte/ushort)) for [V]PMOVZXBD/WD xmm, xmm (same insn as for the existing ConvertToVector128Int32())
  • Sse41.ConvertToVector128UInt64(Vector128(byte/ushort/uint)) for [V]PMOVZXBQ/WQ/DQ xmm, xmm (same insn as for the existing ConvertToVector128Int64())
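
For illustration, a minimal sketch of the workaround this forces today, using the preview-era names quoted above (the helper name is hypothetical):

unsafe static Vector128<ushort> WidenLowBytes(byte* p)  // hypothetical helper
{
    Vector128<byte> src = Sse2.LoadVector128(p);
    // Today: widen via the signed overload ([V]PMOVZXBW), then reinterpret
    // the element type; the StaticCast should ideally be a no-op.
    return Sse.StaticCast<short, ushort>(Sse41.ConvertToVector128Int16(src));
    // Proposed: return Sse41.ConvertToVector128UInt16(src); // same instruction
}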

  2. Similarly, no direct means are exposed in the API for unsigned int -> signed int widening conversion when the src is a 128-bit vector and the dest is a 256-bit vector.
    The methods being suggested are:
  • Avx2.ConvertToVector256Int16(Vector128(byte)) for VPMOVZXBW ymm, xmm (same insn as for the existing ConvertToVector256UInt16())
  • Avx2.ConvertToVector256Int32(Vector128(byte/ushort)) for VPMOVZXBD/WD ymm, xmm (same insn as for the existing ConvertToVector256UInt32())
  • Avx2.ConvertToVector256Int64(Vector128(byte/ushort/uint)) for VPMOVZXBQ/WQ/DQ ymm, xmm (same insn as for the existing ConvertToVector256UInt64())
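
The same pattern at 256 bits, again purely illustrative and assuming the preview-era Avx.StaticCast:

unsafe static Vector256<short> WidenBytesToInt16(byte* p)  // hypothetical helper
{
    Vector128<byte> src = Sse2.LoadVector128(p);
    // Today: the unsigned-destination overload (VPMOVZXBW ymm, xmm) plus a reinterpreting cast.
    return Avx.StaticCast<ushort, short>(Avx2.ConvertToVector256UInt16(src));
    // Proposed: return Avx2.ConvertToVector256Int16(src); // same instruction
}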

  3. I was not able to find a reasonably concise way to convert a 32/64-bit scalar value into a 256-bit vector in a YMM reg. It is possible to set a 128-bit vector with Sse2.ConvertScalarToVector128UInt32/64(), which produces (MOV r32/r64, imm + MOVD/MOVQ XMM, r32/r64), but then a MOVDQA XMMd, XMMs is automatically issued when one attempts to use the helper method Avx.ExtendToVector256() to get the 256-bit vector. To my understanding, the helper method was intended to act as a pure type conversion and produce a no-op in such cases, since MOVD/MOVQ X/YMM, r32/r64 zeroes the upper portion of the dest reg. Below is an example of the issue I'm trying to explain:
var v = Avx.ExtendToVector256(Sse2.ConvertScalarToVector128UInt64(0x12345678UL));
00007FF989272625  mov         ecx,12345678h  
00007FF98927262A  vmovq       xmm0,rcx  
00007FF98927262F  vmovdqa     xmm6,xmm0  <======= this is not required

OTOH, the following conversion in the reverse direction produces code that looks fine/optimal:

var v1 = Sse2.ConvertToUInt64(Avx.GetLowerHalf(Avx.SetZeroVector256<ulong>()));
00007FF989282618  vpxor       ymm0,ymm0,ymm0  
00007FF98928261D  vmovq       rsi,xmm0

  4. There exists just one overload of Ssse3.AlignRight(), which works on sbyte. I believe it makes sense to add overloads for the other integer types, the same way it was done for Sse2.ShiftRightLogical128BitLane(), which is quite similar in operation. Otherwise developers will have to resort to type casting, as in the sketch below.
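
A minimal sketch of that casting workaround (the wrapper itself is hypothetical; it uses the preview-era StaticCast):

static Vector128<ushort> AlignRightUInt16(Vector128<ushort> left, Vector128<ushort> right, byte mask)  // hypothetical wrapper
    => Sse.StaticCast<sbyte, ushort>(
           Ssse3.AlignRight(Sse.StaticCast<ushort, sbyte>(left),
                            Sse.StaticCast<ushort, sbyte>(right),
                            mask)); // still a single PALIGNR; the shift count stays byte-granular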

  5. There exists just one overload of Ssse3.Shuffle(), which works on sbyte. I believe it makes sense to add an overload that also works on byte (see the sketch after this list).

Adding something mentioned in https://github.com/dotnet/coreclr/issues/18300#issuecomment-394772776

I believe these versions deserve their own overloads since conceptually they can be used on SSExx-only hardware, which provides nothing closer for implementing "gather" and "scatter" operations (actually "scatter" only appears in AVX-512, and the granularity there is 32 or 64 bits anyway, IIRC):

  6. PMOVZX/SX... xmm, [m] - these load from [m] and extend at once, a nice fusion. Esp. note the 2x 8-bit version.
  7. PEXTRB/D/W + EXTRACTPS [m], xmm, i - spill a single element from xmm to [m] (see the sketch after this list)
  8. PINSRB/D/W + INSERTPS xmm, [m], i - merge a single element from [m] into xmm. There is a separate issue open on the API for INSERTPS (HW intrinsics API declaration is incorrect for Sse41.Insert() that operates on vector of 32-bit floats #10383).
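
For item 5 above, a hypothetical byte wrapper over the existing sbyte-only Shuffle (PSHUFB), again via the preview-era StaticCast:

static Vector128<byte> ShuffleBytes(Vector128<byte> value, Vector128<byte> mask)  // hypothetical wrapper
    => Sse.StaticCast<sbyte, byte>(
           Ssse3.Shuffle(Sse.StaticCast<byte, sbyte>(value),
                         Sse.StaticCast<byte, sbyte>(mask)));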
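
And for item 7, a sketch of a per-element spill on SSE4.1-only hardware, assuming the preview-era Sse41.Extract overload for int; whether the store folds into a single PEXTRD [m32], xmm, imm8 is exactly the memory-encoding question raised here:

unsafe static void SpillThirdElement(int* dst, Vector128<int> v)  // hypothetical helper
{
    // Ideally one PEXTRD [m32], xmm, 2 rather than PEXTRD r32, xmm, 2 + MOV [m32], r32.
    dst[0] = Sse41.Extract(v, 2);
}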

category:testing
theme:intrinsics
skill-level:intermediate
cost:medium

@RussKeldorph
Contributor

@CarolEidt @fiigii @tannergooding @4creators @eerhardt Should this be discussed in corefx first?

@tannergooding
Member

@RussKeldorph, some of this (such as the inefficient codegen) should be resolved by a PR I am currently working on.

As for the new APIs, it is probably worth discussion during the next HWIntrinsic design review (I sent an e-mail on this a couple days ago).

@tannergooding
Member

Some of the inefficient codegen was cleaned up with dotnet/coreclr#18262

More of the remaining inefficient codegen will be cleaned up with dotnet/coreclr#18297

After dotnet/coreclr#18297, there is a bit more work that involves special handling for various intrinsics, but we are getting closer.

@voinokin
Author

voinokin commented Jun 5, 2018

@tannergooding, thanks for the good news!
I wonder whether there will be any way for me to test these changes independently before the first 2.2.0 preview becomes available?

@saucecontrol
Member

saucecontrol commented Jun 5, 2018

@voinokin you can use the SDK daily builds with the daily Intrinsics packages from myget.
https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/dogfooding.md

Or you can build the runtime from source if you want to be able to do JitDumps and the like
https://github.com/dotnet/coreclr/blob/master/Documentation/building/windows-instructions.md
https://github.com/dotnet/coreclr/blob/master/Documentation/workflow/UsingYourBuild.md

@voinokin
Author

voinokin commented Jun 5, 2018

@saucecontrol Thanks! I'll give it a try in some time.

@voinokin voinokin changed the title Some integer scalar and vectored conversion methods are not exposed in HW intrinsics API Some integer scalar and vectored methods are not exposed in HW intrinsics API Jun 7, 2018
@voinokin voinokin changed the title Some integer scalar and vectored methods are not exposed in HW intrinsics API Some integer scalar and vectored methods/overloads are not exposed in HW intrinsics API Jun 7, 2018
@voinokin
Author

voinokin commented Jun 7, 2018

Adding these:

  4. There exists just one overload of Ssse3.AlignRight(), which works on sbyte. I believe it makes sense to add overloads for the other integer types, the same way it was done for Sse2.ShiftRightLogical128BitLane(), which is quite similar in operation. Otherwise developers will have to resort to type casting.

  5. There exists just one overload of Ssse3.Shuffle(), which works on sbyte. I believe it makes sense to add an overload that also works on byte.

@tannergooding
Member

There exists just one overload of Ssse3.AlignRight(), which works on sbyte. I believe it makes sense to add overloads for the other integer types, the same way it was done for Sse2.ShiftRightLogical128BitLane(), which is quite similar in operation. Otherwise developers will have to resort to type casting.

We will want to be careful with this one, since the instruction explicitly operates on byte values.

There exists just one overload of Ssse3.Shuffle(), which works on sbyte. I believe it makes sense to add an overload that also works on byte.

It definitely makes sense to ensure that both the signed and unsigned versions are exposed here.

@voinokin
Author

voinokin commented Jun 7, 2018

@tannergooding

We will want to be careful with this one, since the instruction explicitly operates on byte values.

It is the same for ShiftRightLogical128BitLane() - that's my point. BTW, the method has proven quite useful without type casting :-).
I suggest renaming the argument of AlignRight(), which is now called mask (?), to something like numBytes, again the same way it's done for ShiftRightLogical128BitLane().

@saucecontrol
Member

We will want to be careful with this one, since the instruction explicitly operates on byte values.

Really, all the masked byte-shuffle instructions work on the minimum element size that can be represented by the mask, but they're not necessarily most often used on that size.

I have an example here that uses AlignRight on ulong values. Something went funny in the codegen with all the casts I had to do, though. Even though I have a managed helper for that here, I couldn't use it in that context without speed taking a dive. It would be nice if we could have managed helper overloads for cases like that, but we'd have to be assured that they boil down to the same instruction during JIT. I don't know whether such a managed helper should be part of the Intrinsics API or whether we should have to roll our own, but it will be a common use case, no doubt.

@voinokin
Author

voinokin commented Jun 7, 2018

Something went funny in the codegen with all the casts I had to do, though. Even though I have a managed helper for that here, I couldn't use it in that context without speed taking a dive.

Try using the ref modifier on your method's parameters - sometimes that helps when the current JIT version inlines. My advice is to check the disasm after that; I've seen a lot of it by now... and have logged some strange points here too.

@saucecontrol
Member

Ah yes, thanks. I do that in a lot of cases but didn't try it on those tiny cast helpers.

@tannergooding
Member

A good bit of the "bad codegen" was because we didn't support the ins reg, [mem] encodings for a good number of the intrinsics.

The latest builds out of master should have much better codegen and the last of the non load/store intrinsics should support containment with: dotnet/coreclr#18349.

There is, of course, still some more work to be done, but hopefully you will see much better results.

@tannergooding
Member

(StaticCast itself does still need a change, but I will be working on that next).

@voinokin
Author

voinokin commented Jun 8, 2018

Not sure whether this is the proper place to discuss API that is already defined (?)... Anyway, I stumbled over the names of the API methods that load and store things.

Some background:

  • the overloads to move (parts of) vectors FROM memory are mostly called LoadScalarVectorNNN(), LoadVectorNNN() / LoadAlignedVectorNNN() / LoadDquVectorNNN() / LoadAlignedVectorNNNNonTemporal().
  • the overloads to move vectors TO memory are mostly called: Store() / StoreAligned() / StoreAlignedNonTemporal().

Here are my points:

  1. My feeling is that there could be better wording for ...Dqu... overloads.
  2. There is no word "Vector" in the methods that store entire vectors, so their names look like those of the partial-store operations. That seems OK until you find out that there exist overloads of StoreNonTemporal() which do not operate on vectors at all, taking a GP reg as input (MOVNTI [m], reg); maybe these specific ones need clearer names?
  3. There exist LoadScalarVectorNNN(int/uint/long/ulong/float/double) overloads that load just one element according to the data type and then zero the remaining part of the vector. OTOH, the methods intended for the opposite operation have their names split: StoreScalar(float/double) (MOVSS/MOVSD [m], xmm), StoreLow(long/ulong) (MOVQ [m], xmm, which in fact stores one 64-bit element, call it a scalar), and I found no method exposed to store 32-bit ints as scalars (MOVD [m], xmm). The snippet below spells out the asymmetry.
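
A sketch of that asymmetry in code, using the names listed above:

unsafe static void Demo(int* pi, float* pf, long* pl,
                        Vector128<int> vi, Vector128<float> vf, Vector128<long> vl)
{
    Vector128<int> a = Sse2.LoadScalarVector128(pi); // MOVD xmm, [m32]
    Sse.StoreScalar(pf, vf);                         // MOVSS [m32], xmm
    Sse2.StoreLow(pl, vl);                           // MOVQ  [m64], xmm
    // No single-element store counterpart for vi (MOVD [m32], xmm) is exposed.
}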

@voinokin
Author

Adding something mentioned in https://github.com/dotnet/coreclr/issues/18300#issuecomment-394772776

I believe these versions deserve their own overloads since conceptually they can be used on SSExx-only hardware, which provides nothing closer for implementing "gather" and "scatter" operations (actually "scatter" only appears in AVX-512, and the granularity there is 32 or 64 bits anyway, IIRC):


  6. PMOVZX/SX... xmm, [m] - these load from [m] and extend at once, a nice fusion. Esp. note the 2x 8-bit version.
  7. PEXTRB/D/W + EXTRACTPS [m], xmm, i - spill a single element from xmm to [m]
  8. PINSRB/D/W + INSERTPS xmm, [m], i - merge a single element from [m] into xmm. There is a separate issue open on the API for INSERTPS (HW intrinsics API declaration is incorrect for Sse41.Insert() that operates on vector of 32-bit floats #10383).

@fiigii
Contributor

fiigii commented Jun 12, 2018

My feeling is that there could be better wording for ...Dqu... overloads.

Ah, actually I tried to find a better name for these guys, but I don't think any single word can express the semantics well, so I just followed the C++ names... Do you have suggestions?

There is no word "Vector" in the methods that store entire vectors, so their names look like those of the partial-store operations.

Store* intrinsics take the source as a parameter, so the different vector lengths can be resolved by overload resolution, which keeps the API simpler.

OTOH, the methods intended for the opposite operation have their names split: StoreScalar(float/double) (MOVSS/MOVSD [m], xmm), StoreLow(long/ulong) (MOVQ [m], xmm, which in fact stores one 64-bit element, call it a scalar),

The Scalar suffix only makes sense with floating-point types (float and double) because x86/x64 architectures execute floating-point computation via the SIMD (SSE2) units.

@fiigii
Contributor

fiigii commented Jun 12, 2018

and I found no method exposed to store 32-bit ints as scalars (MOVD [m], xmm).

I believe these versions deserve their own overloads since conceptually they can be used on SSExx-only

In the current design, we avoid exposing "memory-access encodings" as much as possible, and we plan to generate these encodings via containment optimization (i.e., merging ins(load(address)) or store(address, ins(...)) into a single instruction). I think it also works for scalar-type containment (e.g., folding a[i] = Sse2.ConvertToInt32(v) into MOVD [m], xmm).
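
Spelled out as code, the folding described above would look like this (a sketch; whether the JIT actually performs it is the point under discussion):

unsafe static void StoreElement(int* a, int i, Vector128<int> v)
{
    // With store(address, ins(...)) containment this becomes a single
    // MOVD [m32], xmm instead of MOVD r32, xmm + MOV [m32], r32.
    a[i] = Sse2.ConvertToInt32(v);
}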

@voinokin
Author

voinokin commented Jun 12, 2018

My feeling is that there could be better wording for ...Dqu... overloads.

Ah, actually I tried to find a better name for these guys, but I don't think any single word can express the semantics well, so I just followed the C++ names... Do you have suggestions?

For [V]LDDQU I suggest using LoadUnalignedVectorNNN, explicitly stating that the operation is intended as a special case of unaligned loads. (I believe it behaves just like MOVDQU on current CPUs, though.)

In the current design, we avoid exposing "memory-access encodings" as much as possible, and we plan to generate these encodings via containment optimization (i.e., merging ins(load(address)) or store(address, ins(...)) into a single instruction). I think it also works for scalar-type containment (e.g., folding a[i] = Sse2.ConvertToInt32(v) into MOVD [m], xmm).

Given this will work, some unclarities still remain in the API:

  1. The 128-bit form of [V]PMOVZX/SXBQ xmm, [m16] - to my knowledge there is no other single operation that load-extends 2x 8-bit values into a vector. All other forms of [V]PMOVZX/SX... take at least 32 bits of input, which will be achievable through containment support with some type casting + LoadVectorNNN() or ConvertScalarToVectorXXX(), but this specific form takes only 16 bits of input.
    [UPD] Avx.Broadcast(short*/ushort*) (VPBROADCASTW xmm/ymm, [m16]) also exists, but that is a different operation, and AVX2.
  2. For MOVD/Q xmm, [m] - LoadScalarVector128(int/uint/long/ulong) already exist, so why not remove them in favor of Sse2.ConvertScalarToVector128[U]Int32/64(indir)? For now it looks asymmetrical for 32-bit ints: a direct load operation is exposed, but not the corresponding direct store operation.
  3. For INSERTPS xmm, [m32], i - related: HW intrinsics API declaration is incorrect for Sse41.Insert() that operates on vector of 32-bit floats #10383, supposedly fixed by Improve Intel hardware intrinsic APIs coreclr#17637. I can't see the final version, but if the existing overload taking just one scalar value is replaced with the overload taking a vector, then this instruction encoding will become unavailable (I had no understanding of the containment being introduced when I logged HW intrinsics API declaration is incorrect for Sse41.Insert() that operates on vector of 32-bit floats #10383).

@fiigii
Contributor

fiigii commented Jun 12, 2018

For [V]LDDQU I suggest using LoadUnalignedVectorNNN

LoadUnaligned* is not enough to express the lddqu semantics. In x86 SIMD programming, "unaligned" is usually associated with instructions like movups, movdqu, etc. So that may be confusing...

@voinokin
Author

voinokin commented Jun 12, 2018

For [V]LDDQU I suggest using LoadUnalignedVectorNNN

LoadUnaligned* is not enough to express the lddqu semantics. In x86 SIMD programming, "unaligned" is usually associated with instructions like movups, movdqu, etc. So that may be confusing...

Then the last remaining idea from me would be to extend the existing LoadVector128/256(type* ptr) overloads with an optional parameter, so they would become LoadVector128/256(type* ptr, bool forceUnaligned = false). Looks a bit ugly, though...

@mikedn
Contributor

mikedn commented Jun 12, 2018

Perhaps LoadUnalignedSplit*. Though I tend to think that coming up with such fancy names for already-established instructions/intrinsics does more harm than good. And LDDQU is kind of useless these days...

@tannergooding
Member

I think we've addressed some of this already. Could the original post be updated with anything still relevant or the issue be otherwise closed?

@fiigii
Contributor

fiigii commented Dec 10, 2018

I think we've addressed some of this already

Right, I think we can close this issue and open a new issue for "folding store".

@voinokin
Author

I think we've addressed some of this already. Could the original post be updated with anything still relevant or the issue be otherwise closed?

Tell me which issues remain and I will update the original post. Thanks.

@AndyAyersMS
Member

@tannergooding can you help get this sorted out? Hoping there is no work left here.
cc @CarolEidt

@tannergooding
Member

No.1 and No.2 haven't been resolved and need a separate proposal addressing them, logged against CoreFX and in the recommended format (https://github.com/dotnet/corefx/issues/35768 tracks some of the issues raised).
The primary issue here is that PMOV* has both sign-extending and zero-extending versions. We need to ensure these take the appropriate types and are exposed in a manner familiar to existing .NET users.

No.3 is meant to be covered by the 128-bit conversion followed by a widening conversion to 256-bit via the ToVector256 or ToVector256Unsafe method, as sketched below.
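
A sketch of that pattern, assuming the 3.0-era helper surface (ToVector256 zeroes the upper 128 bits, while ToVector256Unsafe leaves them undefined):

// Assuming the 3.0-era helpers Vector128.CreateScalar and ToVector256:
Vector256<ulong> v = Vector128.CreateScalar(0x12345678UL).ToVector256();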

No.4 and No.5 have been resolved.

For No.6, No.7, and No.8, we aren't currently looking at providing helper methods like these.

@voinokin
Author

voinokin commented Mar 12, 2019

  6. PMOVZX/SX... xmm, [m] - these load from [m] and extend at once, a nice fusion. Esp. note the 2x 8-bit version.

Regarding No.6 - my point is that it's not a helper method, but rather a separate operation which loads values and extends them to 16/32/64 bits. It can currently be replaced with several ops using type casting (see the sketch after this list):

  • 16-bit operands: load [m16x2] via the pointer cast to u32*/i32* as a scalar into a vector + a reverse type cast + Z/S-extend
  • 32-bit operands: load [m32x2] via the pointer cast to u64*/i64* as a scalar into a vector + a reverse type cast + Z/S-extend
  • But in the case of 8-bit operands I can't see anything shorter than this: load [m8x2] via the pointer cast to u16*/i16* into a 16-bit GPR + extend to a 32- or 64-bit GPR + move the value into a vector + a reverse type cast + Z/S-extend. To my memory, 5-6 months ago when I last checked, the JIT was not emitting MOVZX/SX GPR, [m8/16/32], which would compress the first two instructions into just one. Even if this feature is implemented now, it leaves 3 instructions (MOVZX + MOVQ + PMOVZX/SX) instead of just one; if it's not, we have 4 instructions.
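
The 8-bit case spelled out (a sketch; the helper name is hypothetical, and the reinterpreting cast uses the preview-era StaticCast):

unsafe static Vector128<long> LoadExtendTwoSBytes(sbyte* p)  // hypothetical helper
{
    int two = *(short*)p;                                        // 2 bytes -> GPR (ideally MOVSX r32, [m16])
    Vector128<int> v = Sse2.ConvertScalarToVector128Int32(two);  // MOVD xmm, r32
    // PMOVSXBQ reads only the low 2 bytes, so the sign-extended upper bytes are ignored.
    return Sse41.ConvertToVector128Int64(Sse.StaticCast<int, sbyte>(v)); // PMOVSXBQ xmm, xmm
}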

My use case for the 8-bit version is decoding a stream of compressed bytes.

@tannergooding
Member

my point is that it's not a helper method, but rather a separate operation which loads values and extends them to 16/32/64 bits.

Might be misunderstanding, but this isn't a singular hardware instruction; so it would be classified as a helper (it is implemented in terms of the actual intrinsics) rather than being an actual hardware intrinsic itself.

Given that it isn't a singular hardware instruction, and it isn't considered one of the "core" operations (which is basically just creating a vector and accessing individual elements), it likely wouldn't be considered at this point (users should be able to provide their own implementation in the interim).

@saucecontrol
Member

a separate operation which loads values and extends them to 16/32/64 bits

That should be just the xmm/mem encoded versions of PMOV[ZS]X[BWD][WDQ]. These are already handled by containment, but the correct mem overloads are addressed in https://github.com/dotnet/corefx/issues/35768

@voinokin
Author

voinokin commented Mar 12, 2019

Might be misunderstanding, but this isn't a singular hardware instruction; so it would be classified as a helper (it is implemented in terms of the actual intrinsics) rather than being an actual hardware intrinsic itself. Given that it isn't a singular hardware instruction, and it isn't considered one of the "core" operations (which is basically just creating a vector and accessing individual elements), it likely wouldn't be considered at this point (users should be able to provide their own implementation in the interim).

Here you have it (sorry, I found no better way for now):
https://gcc.godbolt.org/z/TglHbD

Also, check the 3rd form from the top at https://www.felixcloutier.com/x86/pmovzx:
66 0f 38 32 /r --- PMOVZXBQ xmm1, xmm2/m16 --- Zero extend 2 packed 8-bit integers in the low 2 bytes of xmm2/m16 to 2 packed 64-bit integers in xmm1.

I mean, it IS a singular hardware instruction.

@tannergooding
Member

PMOVZXBQ is covered by ConvertToVector128Int64; the encoding of the memory operand is being tracked by https://github.com/dotnet/corefx/issues/35768.
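
If the overloads tracked there land as proposed, usage would collapse to a single call taking the address directly (a sketch under that assumption):

unsafe static Vector128<long> LoadExtend(byte* p)
    => Sse41.ConvertToVector128Int64(p);  // PMOVZXBQ xmm, [m16]; assumes the pointer-taking overload from dotnet/corefx#35768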

@voinokin
Author

I confirm - dotnet/corefx#35768 covers my understanding expressed in item No.6.
It's nice we will have these APIs implemented :-)

@RussKeldorph
Contributor

@tannergooding Can this be closed in favor of other issues? If there is remaining work here, could you open separate issues to make it very clear what work remains for 3.0?

@tannergooding
Member

Yes, I think this could be closed as I believe all issues are either resolved or tracked by other existing issues.

@voinokin, feel free to clarify if you don't believe that is the case.

@damageboy
Contributor

I know this issue is officially closed, and I'm late to the show, but I am a bit confused by the current state of preview5...

All issues seem to be resolved and all PRs merged, yet PMOVZXB{D,Q} and others do not seem to be generated, and the current master branch shows this unwelcoming comment:

https://github.com/dotnet/coreclr/blob/1a495118c005b9a5409c81fea1813bd2b3044cbd/src/System.Private.CoreLib/shared/System/Runtime/Intrinsics/X86/Avx2.cs#L750-L755

Which seems to imply it isn't really supported at this stage...

@CarolEidt
Contributor

Is it this part that's confusing: "The native signature does not exist."?

If so, that just means that there's no corresponding native (C++) intrinsic. Notice that for other intrinsics the equivalent C++ intrinsic is shown in addition to the target instruction; for example, a little further down we have:

        /// <summary>
        /// __m128i _mm256_extracti128_si256 (__m256i a, const int imm8)
        ///   VEXTRACTI128 xmm, ymm, imm8
        /// </summary>
        public new static Vector128<sbyte> ExtractVector128(Vector256<sbyte> value, byte index) => ExtractVector128(value, index);

The second line is the native (C++) intrinsic.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 16, 2020