Commit b6123cc

UPBGE: Use SoA for vertices in RAS_DisplayArray. (#715)
The principle of AoS (array of structures) is to store one structure per element in an array, e.g.:

    struct Vertex {
        mt::vec3_packed pos;
        mt::vec3_packed nor;
    };

    std::vector<Vertex> vertices;

This layout is very easy to use and perfectly acceptable for big structures or a small number of elements; a reference or a pointer to a Vertex can also be passed to any function. Unfortunately, it usually makes poor use of the memory cache. Consider a function that modifies only the position of the vertices: at each iteration the whole Vertex struct is loaded into the cache but only the member pos is used, so 12 bytes out of 24. The remaining 12 bytes pollute the cache and cause more cache loads. If the function modifies all the members of Vertex, the cache isn't an issue.

Conversely, SoA (structure of arrays) stores one array per member inside a structure, e.g.:

    struct Vertices {
        std::vector<mt::vec3_packed> pos;
        std::vector<mt::vec3_packed> nor;
    };

With this method, passing a vertex to another function is more complicated, as it means passing the Vertices instance and the vertex index used to sample pos and nor. On the cache side, however, the function modifying only the positions now loads cache lines containing positions only and wastes no memory loading unused data. When both positions and normals are modified, the two arrays are simply stored in different cache lines, without a performance decrease compared to AoS.

On the CPU, SoA is generally an improvement, and on the GPU too, except for old GPUs which might prefer interleaved data, although nothing really confirms it.

Previously, in UPBGE and BGE, the vertices were stored using the AoS idiom. UPBGE made the vertices more complex by adding a different vertex struct for each combination of UV and color layers. Everything was accessed through an interface, RAS_Vertex, which holds a pointer to a RAS_IVertexData, the base class of the vertex data of any format. At the same time, RAS_IDisplayArray was an interface to RAS_DisplayArray<VertexData>.

With SoA, RAS_DisplayArray owns a VertexData struct with one list per vertex member (position, normal, tangent, 8 UVs and 8 colors); depending on the format, some UV and color arrays are left empty. Functions for getting and setting every member are added as well; they take a vertex index, a UV/color index for UVs and colors, and, for setters, a value. This way BL_SkinDeformer updates the position just by calling RAS_DisplayArray::SetPosition(i, pos) instead of getting the RAS_Vertex via RAS_DisplayArray::GetVertex(i) and calling RAS_Vertex::SetXYZ.

With these modifications RAS_DisplayArray<> and RAS_VertexData<> are removed, and RAS_BatchDisplayArray no longer needs virtual inheritance.

On the conversion side, without RAS_VertexData, the structure BL_SharedVertexPredicate used to find similar vertices now holds its own copy of the normal, tangent, UV and color data. Once a vertex is known to be unique, it is added to the display array through RAS_DisplayArray::AddVertex(pos, nor, tan, uvs, colors, origIndex, flag), which appends the vertex data to m_vertexData and constructs the vertex info.

The VBO doesn't try to re-interleave the data, as the time cost is too high; instead each member array is sent to the VBO one by one, which is done in RAS_StorageVbo::CopyVertexData. Another advantage of SoA is that a single kind of data can be updated: if the positions are modified, only these data are copied to the beginning of the VBO, without touching the other data. RAS_StorageVbo::CopyVertexData uses this technique by checking a modification flag.

The OpenGL attributes (VAO) change too because of the new VBO layout. RAS_VertexDataMemoryFormat is replaced by RAS_DisplayArrayLayout, which is not constant after the creation of the display array, since modifying the size of the array changes the offset of each data type in the VBO. For this reason RAS_DisplayArray::GetLayout returns a new RAS_DisplayArrayLayout with the proper offsets. To recreate the attributes, RAS_AttributeArray::Clear is called when a size update is detected in RAS_DisplayArrayBucket::UpdateActiveMeshSlots.

To summarize, the advantages of using SoA are the cache-friendly loads, the possibility to update only the modified data in the VBO, and the simpler storage of multiple vertex formats. The drawbacks could be limitations of some old GPUs and the recreation of the VAO at each display array size update in modifier deformers.

This patch was tested with 3 files. The first file is 1600 cubes of 384 faces deformed by an armature.

If the cubes have only the default UV and color layer:

                  Previous    Current
    Animation          9.6        6.3
    Rasterizer        17.5       10.8

With 8 UV and color layers:

                  Previous    Current
    Animation         17.5        6.7
    Rasterizer        42.7       11.9

The second and third files are about modifying vertex positions from Python and rendering a mesh with a huge number of vertices; neither file showed a time difference.
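The cache argument of the message can be illustrated with a small, self-contained sketch. It is not code from the patch: Vec3 is a hypothetical stand-in for mt::vec3_packed, and TranslateAoS, TranslateSoA and the BytesSpanned helpers are names invented for the example. It assumes a 4-byte float, so that a position takes 12 bytes and an interleaved vertex 24.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for mt::vec3_packed: three tightly packed floats.
struct Vec3 {
    float x, y, z;
};

// AoS: one struct per vertex, position and normal interleaved in memory.
struct VertexAoS {
    Vec3 pos;
    Vec3 nor;
};

// SoA: one contiguous array per attribute.
struct VerticesSoA {
    std::vector<Vec3> pos;
    std::vector<Vec3> nor;
};

// Moves every vertex along x. Each iteration strides over a whole
// VertexAoS although only the bytes of pos are used.
void TranslateAoS(std::vector<VertexAoS>& vertices, float dx)
{
    for (VertexAoS& v : vertices) {
        v.pos.x += dx;
    }
}

// Same operation on the SoA layout: only the pos array is traversed,
// the nor array is never touched.
void TranslateSoA(VerticesSoA& vertices, float dx)
{
    assert(vertices.pos.size() == vertices.nor.size());
    for (Vec3& p : vertices.pos) {
        p.x += dx;
    }
}

// Bytes of memory spanned by a position-only pass over count vertices.
constexpr std::size_t BytesSpannedAoS(std::size_t count) { return count * sizeof(VertexAoS); }
constexpr std::size_t BytesSpannedSoA(std::size_t count) { return count * sizeof(Vec3); }
```

Under these assumptions a position-only pass spans half as much memory with SoA as with AoS, which is the "12 bytes out of 24" figure of the message; the actual gain depends on cache line size and hardware prefetching.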
1 parent 6cc0ea0 commit b6123cc

63 files changed, +1087 -1474 lines changed

intern/mathfu/mathfu/internal/vector_2.h

Lines changed: 10 additions & 0 deletions
@@ -157,6 +157,8 @@ struct VectorPacked<T, 2> {
   /// @param vector Vector to create the VectorPacked from.
   explicit VectorPacked(const Vector<T, 2>& vector) { vector.Pack(this); }
 
+  explicit VectorPacked(const T * const s) :x(s[0]), y(s[1]) {}
+
   /// Copy a Vector to a VectorPacked.
   ///
   /// Both VectorPacked and Vector must have the same number of dimensions.
@@ -167,6 +169,14 @@ struct VectorPacked<T, 2> {
     return *this;
   }
 
+  inline const T& operator[](int i) const {
+    return data[i];
+  }
+
+  inline T& operator[](int i) {
+    return data[i];
+  }
+
 #include "mathfu/internal/disable_warnings_begin.h"
   /// Elements of the packed vector one per dimension.
   union {

intern/mathfu/mathfu/internal/vector_3.h

Lines changed: 10 additions & 0 deletions
@@ -174,6 +174,8 @@ struct VectorPacked<T, 3> {
   /// @param vector Vector to create the VectorPacked from.
   explicit VectorPacked(const Vector<T, 3>& vector) { vector.Pack(this); }
 
+  explicit VectorPacked(const T * const s) :x(s[0]), y(s[1]), z(s[2]) {}
+
   /// Copy a Vector to a VectorPacked.
   ///
   /// Both VectorPacked and Vector must have the same number of dimensions.
@@ -184,6 +186,14 @@ struct VectorPacked<T, 3> {
     return *this;
   }
 
+  inline const T& operator[](int i) const {
+    return data[i];
+  }
+
+  inline T& operator[](int i) {
+    return data[i];
+  }
+
 #include "mathfu/internal/disable_warnings_begin.h"
   /// Elements of the packed vector one per dimension.
   union {

intern/mathfu/mathfu/internal/vector_4.h

Lines changed: 10 additions & 0 deletions
@@ -184,6 +184,8 @@ struct VectorPacked<T, 4> {
   /// @param vector Vector to create the VectorPacked from.
   explicit VectorPacked(const Vector<T, 4>& vector) { vector.Pack(this); }
 
+  explicit VectorPacked(const T * const s) :x(s[0]), y(s[1]), z(s[2]), w(s[3]) {}
+
   /// Copy a Vector to a VectorPacked.
   ///
   /// Both VectorPacked and Vector must have the same number of dimensions.
@@ -194,6 +196,14 @@ struct VectorPacked<T, 4> {
     return *this;
   }
 
+  inline const T& operator[](int i) const {
+    return data[i];
+  }
+
+  inline T& operator[](int i) {
+    return data[i];
+  }
+
 #include "mathfu/internal/disable_warnings_begin.h"
   /// Elements of the packed vector one per dimension.
   union {

intern/mathfu/mathfu/vector.h

Lines changed: 10 additions & 0 deletions
@@ -128,6 +128,8 @@ struct VectorPacked {
   /// @param vector Vector to create the VectorPacked from.
   explicit VectorPacked(const Vector<T, d>& vector) { vector.Pack(this); }
 
+  explicit VectorPacked(const T * const s) { MATHFU_VECTOR_OPERATION(data[i] = s[i]); }
+
   /// Copy a Vector to a VectorPacked.
   ///
   /// Both VectorPacked and Vector must have the same number of dimensions.
@@ -138,6 +140,14 @@ struct VectorPacked {
     return *this;
   }
 
+  inline const T& operator[](int i) const {
+    return data[i];
+  }
+
+  inline T& operator[](int i) {
+    return data[i];
+  }
+
   /// Elements of the packed vector one per dimension.
   T data[d];
 };
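
The four mathfu hunks add the same two things to every VectorPacked variant: construction from a raw pointer and element access through operator[]. A minimal sketch of how such a type can be used with C arrays like MVert::co follows. Packed is a hypothetical miniature written for this note, not the mathfu class, and its bounds assertions are an addition that the patch does not have.

```cpp
#include <cassert>

// Hypothetical miniature of a packed vector of d elements. It only mirrors
// the generic "T data[d]" variant; it is not the mathfu implementation.
template <class T, int d>
struct Packed {
    Packed() : data{} {}

    // Builds the vector from the first d elements of a raw array.
    explicit Packed(const T *const s)
    {
        for (int i = 0; i < d; ++i) {
            data[i] = s[i];
        }
    }

    // Element access; the assertions are an extra safety net for the sketch.
    const T& operator[](int i) const
    {
        assert(i >= 0 && i < d);
        return data[i];
    }

    T& operator[](int i)
    {
        assert(i >= 0 && i < d);
        return data[i];
    }

    T data[d];
};

// Sums the elements, using only operator[].
template <class T, int d>
T Sum(const Packed<T, d>& v)
{
    T total = T();
    for (int i = 0; i < d; ++i) {
        total += v[i];
    }
    return total;
}
```

The explicit constructor is what lets call sites in the converter write mt::vec3_packed(mvert.co) instead of copying component by component.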

source/gameengine/Converter/BL_BlenderDataConversion.cpp

Lines changed: 61 additions & 28 deletions
@@ -65,7 +65,6 @@
 #include "RAS_ILightObject.h"
 
 #include "RAS_ICanvas.h"
-#include "RAS_Vertex.h"
 #include "RAS_BucketManager.h"
 #include "RAS_BoundingBoxManager.h"
 #include "RAS_IPolygonMaterial.h"
@@ -179,7 +178,7 @@ extern Material defmaterial;
 
 // For construction to find shared vertices.
 struct BL_SharedVertex {
-    RAS_IDisplayArray *array;
+    RAS_DisplayArray *array;
     unsigned int offset;
 };
 
@@ -189,20 +188,59 @@ using BL_SharedVertexMap = std::vector<BL_SharedVertexList>;
 class BL_SharedVertexPredicate
 {
 private:
-    RAS_Vertex m_vertex;
-    RAS_IDisplayArray *m_array;
+    RAS_DisplayArray *m_array;
+    mt::vec3_packed m_normal;
+    mt::vec4_packed m_tangent;
+    mt::vec2_packed m_uvs[RAS_Texture::MaxUnits];
+    unsigned int m_colors[RAS_Texture::MaxUnits];
 
 public:
-    BL_SharedVertexPredicate(RAS_Vertex vertex, RAS_IDisplayArray *array)
-        :m_vertex(vertex),
-        m_array(array)
+    BL_SharedVertexPredicate(RAS_DisplayArray *array, const mt::vec3_packed& normal, const mt::vec4_packed& tangent, mt::vec2_packed uvs[], unsigned int colors[])
+        :m_array(array),
+        m_normal(normal),
+        m_tangent(tangent)
     {
+        const RAS_DisplayArray::Format& format = m_array->GetFormat();
+
+        for (unsigned short i = 0, size = format.uvSize; i < size; ++i) {
+            m_uvs[i] = uvs[i];
+        }
+
+        for (unsigned short i = 0, size = format.colorSize; i < size; ++i) {
+            m_colors[i] = colors[i];
+        }
     }
 
     bool operator()(const BL_SharedVertex& sharedVert) const
     {
-        RAS_IDisplayArray *otherArray = sharedVert.array;
-        return (m_array == otherArray) && (otherArray->GetVertexNoCache(sharedVert.offset).CloseTo(m_vertex));
+        RAS_DisplayArray *otherArray = sharedVert.array;
+        if (m_array != otherArray) {
+            return false;
+        }
+
+        const unsigned int offset = sharedVert.offset;
+
+        static const float eps = FLT_EPSILON;
+        if (!compare_v3v3(m_array->GetNormal(offset).data, m_normal.data, eps) ||
+            !compare_v3v3(m_array->GetTangent(offset).data, m_tangent.data, eps))
+        {
+            return false;
+        }
+
+        const RAS_DisplayArray::Format& format = m_array->GetFormat();
+        for (unsigned short i = 0, size = format.uvSize; i < size; ++i) {
+            if (!compare_v2v2(m_array->GetUv(offset, i).data, m_uvs[i].data, eps)) {
+                return false;
+            }
+        }
+
+        for (unsigned short i = 0, size = format.colorSize; i < size; ++i) {
+            if (m_array->GetRawColor(offset, i) != m_colors[i]) {
+                return false;
+            }
+        }
+
+        return true;
     }
 };
 
@@ -344,16 +382,17 @@ SCA_IInputDevice::SCA_EnumInputs BL_ConvertKeyCode(int key_code)
 }
 
 static void BL_GetUvRgba(const RAS_Mesh::LayersInfo& layersInfo, std::vector<MLoopUV *>& uvLayers,
-        std::vector<MLoopCol *>& colorLayers, unsigned int loop, float uvs[RAS_Texture::MaxUnits][2],
-        unsigned int rgba[RAS_Vertex::MAX_UNIT])
+        std::vector<MLoopCol *>& colorLayers, unsigned int loop, mt::vec2_packed uvs[RAS_Texture::MaxUnits],
+        unsigned int rgba[RAS_Texture::MaxUnits])
 {
     // No need to initialize layers to zero as all the converted layer are all the layers needed.
 
     for (const RAS_Mesh::Layer& layer : layersInfo.colorLayers) {
         const unsigned short index = layer.index;
         const MLoopCol& col = colorLayers[index][loop];
 
-        union Convert{
+        union Convert
+        {
             // Color isn't swapped in MLoopCol.
             MLoopCol col;
             unsigned int val;
@@ -367,15 +406,15 @@ static void BL_GetUvRgba(const RAS_Mesh::LayersInfo& layersInfo, std::vector<MLo
     for (const RAS_Mesh::Layer& layer : layersInfo.uvLayers) {
         const unsigned short index = layer.index;
         const MLoopUV& uv = uvLayers[index][loop];
-        copy_v2_v2(uvs[index], uv.uv);
+        uvs[index] = mt::vec2_packed(uv.uv);
     }
 
     /* All vertices have at least one uv and color layer accessible to the user
      * even if it they are not used in any shaders. Initialize this layer to zero
      * when no uv or color layer exist.
      */
     if (layersInfo.uvLayers.empty()) {
-        zero_v2((uvs[0]));
+        uvs[0] = mt::zero2;
     }
     if (layersInfo.colorLayers.empty()) {
         rgba[0] = 0xFFFFFFFF;
@@ -449,7 +488,7 @@ KX_Mesh *BL_ConvertMesh(Mesh *me, Object *blenderobj, KX_Scene *scene, BL_SceneC
     }
 
     // Initialize vertex format with used uv and color layers.
-    RAS_VertexFormat vertformat;
+    RAS_DisplayArray::Format vertformat;
     vertformat.uvSize = max_ii(1, uvCount);
     vertformat.colorSize = max_ii(1, colorCount);
 
@@ -535,7 +574,7 @@ void BL_ConvertDerivedMeshToArray(DerivedMesh *dm, Mesh *me, const std::vector<B
         const MPoly& mpoly = mpolys[i];
 
         const BL_MeshMaterial& mat = mats[mpoly.mat_nr];
-        RAS_IDisplayArray *array = mat.array;
+        RAS_DisplayArray *array = mat.array;
 
         // Mark face as flat, so vertices are split.
         const bool flat = (mpoly.flag & ME_SMOOTH) == 0;
@@ -548,33 +587,27 @@ void BL_ConvertDerivedMeshToArray(DerivedMesh *dm, Mesh *me, const std::vector<B
             const MVert& mvert = mverts[vertid];
 
             static const float dummyTangent[4] = {0.0f, 0.0f, 0.0f, 0.0f};
-            const float *tan = tangent ? tangent[j] : dummyTangent;
-
-            float uvs[RAS_Texture::MaxUnits][2];
+            const mt::vec4_packed tan(tangent ? tangent[j] : dummyTangent);
+            const mt::vec3_packed nor(normals[j]);
+            const mt::vec3_packed pos(mvert.co);
+            mt::vec2_packed uvs[RAS_Texture::MaxUnits];
             unsigned int rgba[RAS_Texture::MaxUnits];
 
             BL_GetUvRgba(layersInfo, uvLayers, colorLayers, j, uvs, rgba);
 
-            RAS_Vertex vertex = array->CreateVertex(mvert.co, uvs, tan, rgba, normals[j]);
-
             BL_SharedVertexList& sharedList = sharedMap[vertid];
             BL_SharedVertexList::iterator it = std::find_if(sharedList.begin(), sharedList.end(),
-                    BL_SharedVertexPredicate(vertex, array));
+                    BL_SharedVertexPredicate(array, nor, tan, uvs, rgba));
 
             unsigned int offset;
             if (it != sharedList.end()) {
                 offset = it->offset;
             }
             else {
-                offset = array->AddVertex(vertex);
-                const RAS_VertexInfo info(vertid, flat);
-                array->AddVertexInfo(info);
+                offset = array->AddVertex(pos, nor, tan, uvs, rgba, vertid, flat);
                 sharedList.push_back({array, offset});
             }
 
-            // Destruct the vertex data as it is copied or unused.
-            array->DeleteVertexData(vertex);
-
             // Add tracked vertices by the mpoly.
             vertices[vertid] = offset;
         }
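
The find-or-add pattern of this file (look for an already emitted vertex with the same attributes, append a new one otherwise) can be sketched independently of the engine. Everything below is hypothetical: SoAArray, SharedList and FindOrAddVertex are names made up for the example, only one UV layer is kept, and tangents and colors are left out. As in the patch, the position is not compared, on the assumption that each shared list belongs to a single original vertex index.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

// Hypothetical SoA vertex storage with one array per attribute.
struct SoAArray {
    std::vector<Vec3> positions;
    std::vector<Vec3> normals;
    std::vector<Vec2> uvs;

    // Appends one vertex to every attribute array and returns its offset.
    unsigned int AddVertex(const Vec3& pos, const Vec3& nor, const Vec2& uv)
    {
        assert(positions.size() == normals.size() && normals.size() == uvs.size());
        positions.push_back(pos);
        normals.push_back(nor);
        uvs.push_back(uv);
        return static_cast<unsigned int>(positions.size() - 1);
    }
};

static bool Close(float a, float b, float eps)
{
    return std::fabs(a - b) <= eps;
}

// Offsets already emitted for one original vertex index.
using SharedList = std::vector<unsigned int>;

// Returns the offset of an existing vertex with the same normal and UV
// (within eps), or appends a new vertex and records its offset.
unsigned int FindOrAddVertex(SoAArray& array, SharedList& shared,
                             const Vec3& pos, const Vec3& nor, const Vec2& uv,
                             float eps = 1e-6f)
{
    const auto it = std::find_if(shared.begin(), shared.end(), [&](unsigned int offset) {
        const Vec3& n = array.normals[offset];
        const Vec2& t = array.uvs[offset];
        return Close(n.x, nor.x, eps) && Close(n.y, nor.y, eps) && Close(n.z, nor.z, eps) &&
               Close(t.x, uv.x, eps) && Close(t.y, uv.y, eps);
    });

    if (it != shared.end()) {
        return *it;
    }

    const unsigned int offset = array.AddVertex(pos, nor, uv);
    shared.push_back(offset);
    return offset;
}
```

Because the attributes are read straight from the per-attribute arrays, no temporary vertex object has to be created and destroyed for each lookup, which matches the removal of CreateVertex and DeleteVertexData in the hunk above.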

source/gameengine/Converter/BL_BlenderDataConversion.h

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ struct Object;
 struct Main;
 
 struct BL_MeshMaterial {
-    RAS_IDisplayArray *array;
+    RAS_DisplayArray *array;
     RAS_MaterialBucket *bucket;
     bool visible;
     bool twoside;

source/gameengine/Converter/BL_MeshDeformer.cpp

Lines changed: 16 additions & 21 deletions
@@ -49,22 +49,21 @@
 #include <string>
 #include "BLI_math.h"
 
-void BL_MeshDeformer::Apply(RAS_IDisplayArray *UNUSED(array))
+void BL_MeshDeformer::Apply(RAS_DisplayArray *UNUSED(array))
 {
     // only apply once per frame if the mesh is actually modified
     if (m_lastDeformUpdate != m_gameobj->GetLastFrame()) {
         // For each display array
         for (const DisplayArraySlot& slot : m_slots) {
-            RAS_IDisplayArray *array = slot.m_displayArray;
+            RAS_DisplayArray *array = slot.m_displayArray;
 
             // For each vertex
             for (unsigned int i = 0, size = array->GetVertexCount(); i < size; ++i) {
-                RAS_Vertex v = array->GetVertex(i);
                 const RAS_VertexInfo& vinfo = array->GetVertexInfo(i);
-                v.SetXYZ(m_bmesh->mvert[vinfo.GetOrigIndex()].co);
+                array->SetPosition(i, mt::vec3_packed(m_bmesh->mvert[vinfo.GetOrigIndex()].co));
             }
 
-            array->NotifyUpdate(RAS_IDisplayArray::POSITION_MODIFIED);
+            array->NotifyUpdate(RAS_DisplayArray::POSITION_MODIFIED);
         }
 
         m_lastDeformUpdate = m_gameobj->GetLastFrame();
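
Apply() only sets positions and then raises POSITION_MODIFIED, which is what allows a partial upload later on. The sketch below shows one way that idea can work; it is not RAS_StorageVbo. Layout, SoAArray and CopyModified are hypothetical names, only two attributes are modelled, and a plain byte vector stands in for the VBO so that the example stays free of OpenGL calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical modification flags, in the spirit of POSITION_MODIFIED.
enum ModifiedFlag : unsigned int {
    POSITION_MODIFIED = 1 << 0,
    NORMAL_MODIFIED = 1 << 1
};

// Byte offsets of each attribute block inside one non-interleaved buffer.
// They depend on the vertex count, so they are recomputed on demand.
struct Layout {
    std::size_t positionOffset;
    std::size_t normalOffset;
    std::size_t totalSize;
};

struct SoAArray {
    std::vector<Vec3> positions;
    std::vector<Vec3> normals;
    unsigned int modified = 0;

    Layout GetLayout() const
    {
        const std::size_t posBytes = positions.size() * sizeof(Vec3);
        const std::size_t norBytes = normals.size() * sizeof(Vec3);
        return {0, posBytes, posBytes + norBytes};
    }
};

// Copies only the attribute blocks whose flag is set into the buffer,
// then clears the flags. Returns the number of bytes written.
std::size_t CopyModified(SoAArray& array, std::vector<unsigned char>& buffer)
{
    const Layout layout = array.GetLayout();
    assert(buffer.size() >= layout.totalSize);

    std::size_t written = 0;
    if ((array.modified & POSITION_MODIFIED) && !array.positions.empty()) {
        const std::size_t bytes = array.positions.size() * sizeof(Vec3);
        std::memcpy(buffer.data() + layout.positionOffset, array.positions.data(), bytes);
        written += bytes;
    }
    if ((array.modified & NORMAL_MODIFIED) && !array.normals.empty()) {
        const std::size_t bytes = array.normals.size() * sizeof(Vec3);
        std::memcpy(buffer.data() + layout.normalOffset, array.normals.data(), bytes);
        written += bytes;
    }

    array.modified = 0;
    return written;
}
```

Since the normal block starts right after the position block, its offset moves whenever the vertex count changes. That is the reason given in the commit message for recomputing the layout and recreating the attributes on a size update.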
@@ -114,53 +113,49 @@ void BL_MeshDeformer::RecalcNormals()
      * since the GPU can do it faster */
 
     /* set vertex normals to zero */
-    for (std::array<float, 3>& normal : m_transnors) {
-        normal = {{0.0f, 0.0f, 0.0f}};
-    }
+    std::fill(m_transnors.begin(), m_transnors.end(), mt::zero3);
 
     for (const DisplayArraySlot& slot : m_slots) {
-        RAS_IDisplayArray *array = slot.m_displayArray;
+        RAS_DisplayArray *array = slot.m_displayArray;
         for (unsigned int i = 0, size = array->GetTriangleIndexCount(); i < size; i += 3) {
-            const float *co[3];
+            mt::vec3_packed co[3];
             bool flat = false;
 
             for (unsigned short j = 0; j < 3; ++j) {
                 const unsigned int index = array->GetTriangleIndex(i + j);
                 const RAS_VertexInfo& vinfo = array->GetVertexInfo(index);
                 const unsigned int origindex = vinfo.GetOrigIndex();
 
-                co[j] = m_transverts[origindex].data();
+                co[j] = m_transverts[origindex];
                 flat |= (vinfo.GetFlag() & RAS_VertexInfo::FLAT);
             }
 
-            float pnorm[3];
-            normal_tri_v3(pnorm, co[0], co[1], co[2]);
+            mt::vec3_packed pnorm;
+            normal_tri_v3(pnorm.data, co[0].data, co[1].data, co[2].data);
 
             for (unsigned short j = 0; j < 3; ++j) {
                 const unsigned int index = array->GetTriangleIndex(i + j);
 
                 if (flat) {
-                    RAS_Vertex vert = array->GetVertex(index);
-                    vert.SetNormal(pnorm);
+                    array->SetNormal(index, pnorm);
                 }
                 else {
                     const RAS_VertexInfo& vinfo = array->GetVertexInfo(index);
                     const unsigned int origindex = vinfo.GetOrigIndex();
-                    add_v3_v3(m_transnors[origindex].data(), pnorm);
+                    add_v3_v3(m_transnors[origindex].data, pnorm.data);
                 }
             }
         }
     }
 
     // Assign smooth vertex normals.
     for (const DisplayArraySlot& slot : m_slots) {
-        RAS_IDisplayArray *array = slot.m_displayArray;
+        RAS_DisplayArray *array = slot.m_displayArray;
         for (unsigned int i = 0, size = array->GetVertexCount(); i < size; ++i) {
-            RAS_Vertex v = array->GetVertex(i);
             const RAS_VertexInfo& vinfo = array->GetVertexInfo(i);
 
             if (!(vinfo.GetFlag() & RAS_VertexInfo::FLAT)) {
-                v.SetNormal(m_transnors[vinfo.GetOrigIndex()].data());
+                array->SetNormal(i, m_transnors[vinfo.GetOrigIndex()]);
             }
         }
     }
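
The smooth-normal part of RecalcNormals boils down to accumulating each face normal on the vertices of that face. A simplified, self-contained sketch of this accumulation on separate position and normal arrays is given below. It is an illustration under stated assumptions, not the engine code: SmoothNormals and the small vector helpers are invented names, there is no FLAT flag and no original-index indirection, counter-clockwise winding is assumed, and the result is normalized on the CPU purely so that it is easy to check.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 Sub(const Vec3& a, const Vec3& b)
{
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}

static Vec3 Cross(const Vec3& a, const Vec3& b)
{
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

static Vec3 Normalized(const Vec3& v)
{
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    if (len == 0.0f) {
        return v;
    }
    return {v.x / len, v.y / len, v.z / len};
}

// Computes one smooth normal per vertex of an indexed triangle list.
// Only the position array is read and only the normal array is written.
std::vector<Vec3> SmoothNormals(const std::vector<Vec3>& positions,
                                const std::vector<unsigned int>& indices)
{
    assert(indices.size() % 3 == 0);
    std::vector<Vec3> normals(positions.size(), Vec3{0.0f, 0.0f, 0.0f});

    for (std::size_t i = 0; i < indices.size(); i += 3) {
        const Vec3& a = positions[indices[i]];
        const Vec3& b = positions[indices[i + 1]];
        const Vec3& c = positions[indices[i + 2]];
        // Unit face normal for counter-clockwise winding.
        const Vec3 n = Normalized(Cross(Sub(b, a), Sub(c, a)));

        // Accumulate the face normal on each of its three vertices.
        for (std::size_t j = 0; j < 3; ++j) {
            Vec3& sum = normals[indices[i + j]];
            sum.x += n.x;
            sum.y += n.y;
            sum.z += n.z;
        }
    }

    for (Vec3& n : normals) {
        n = Normalized(n);
    }
    return normals;
}
```

For a quad split into two coplanar triangles, every vertex ends up with the shared face normal, whether it belongs to one triangle or to both.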
@@ -176,8 +171,8 @@ void BL_MeshDeformer::VerifyStorage()
     }
 
     for (unsigned int v = 0; v < totvert; ++v) {
-        copy_v3_v3(m_transverts[v].data(), m_bmesh->mvert[v].co);
-        normal_short_to_float_v3(m_transnors[v].data(), m_bmesh->mvert[v].no);
+        copy_v3_v3(m_transverts[v].data, m_bmesh->mvert[v].co);
+        normal_short_to_float_v3(m_transnors[v].data, m_bmesh->mvert[v].no);
     }
 }
 