UPBGE: Use SoA for vertices in RAS_DisplayArray. (#715)
The principle of AoS (array of structures) is to store one structure per element
in an array, e.g.:

struct Vertex
{
	mt::vec3_packed pos;
	mt::vec3_packed nor;
};

std::vector<Vertex> vertices;

This kind of structure is very easy to use and perfectly acceptable for big
structures or a small number of elements; a reference or a pointer to a Vertex
can also be passed to any function.
Unfortunately the memory cache usage is most of the time not efficient. Imagine
a function modifying only the position of vertices: at each iteration the whole
Vertex struct is loaded into the cache but only the member pos is used, so 12
bytes out of 24. The remaining 12 bytes pollute the cache and cause more cache
loads. If the function modifies all the members of Vertex, the cache isn't an
issue.

The opposite approach, SoA (structure of arrays), stores in a structure one
array per member, e.g.:

struct Vertices
{
	std::vector<mt::vec3_packed> pos;
	std::vector<mt::vec3_packed> nor;
};

With this method, passing a vertex to another function is more complicated, as
it means passing the Vertices instance and the vertex index used to sample pos
and nor. But on the cache side, if we go back to our function modifying the
positions, this function loads cache lines containing only positions and does
not waste memory loading other unused data. In the case of modifying both
positions and normals, the two arrays are stored in different cache lines
without a performance decrease compared to AoS.
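
As a rough illustration of the two layouts, the sketch below translates every
position in both idioms. Vec3 is a hypothetical stand-in for mt::vec3_packed,
and TranslateAoS/TranslateSoA are illustrative names that are not part of this
patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for mt::vec3_packed: three tightly packed floats.
struct Vec3 {
	float x, y, z;
};

// AoS: the position and normal of one vertex sit next to each other.
struct Vertex {
	Vec3 pos;
	Vec3 nor;
};

// SoA: one contiguous array per member.
struct Vertices {
	std::vector<Vec3> pos;
	std::vector<Vec3> nor;
};

static_assert(sizeof(Vertex) == 2 * sizeof(Vec3), "pos and nor are stored back to back");

// Strides over sizeof(Vertex) bytes per vertex although only pos is used.
inline void TranslateAoS(std::vector<Vertex>& vertices, const Vec3& offset)
{
	for (Vertex& v : vertices) {
		v.pos.x += offset.x;
		v.pos.y += offset.y;
		v.pos.z += offset.z;
	}
}

// Walks only the position array; the normal array is never read.
inline void TranslateSoA(Vertices& vertices, const Vec3& offset)
{
	// All member arrays of a SoA container must keep the same length.
	assert(vertices.pos.size() == vertices.nor.size());

	for (Vec3& p : vertices.pos) {
		p.x += offset.x;
		p.y += offset.y;
		p.z += offset.z;
	}
}
```

Both functions produce the same positions; the difference is only in how much
unrelated memory is pulled through the cache while iterating.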

For CPUs, using SoA is generally an improvement, and for GPUs too, except for old
GPUs which might prefer interleaved data, although nothing really confirms it.

Previously in UPBGE and BGE, the vertices were stored using the AoS idiom. UPBGE
made the vertices more complex by adding a different vertex struct for each
combination of UV and color layers. Everything was accessed through an interface
RAS_Vertex which held a pointer to a RAS_IVertexData, the base class of any
vertex data of any format. At the same time RAS_IDisplayArray was an interface
to RAS_DisplayArray<VertexData>.
With SoA, RAS_DisplayArray owns a VertexData struct with a list for each vertex
member (position, normal, tangent, 8 UVs and 8 colors); depending on the format
some UV and color arrays are left empty. Functions for getting and setting all
the member data are added too; these functions take a vertex index, a UV/color
index for UVs and colors, and for setters a value as well.
This way BL_SkinDeformer updates the position just by calling
RAS_DisplayArray::SetPosition(i, pos) instead of getting the RAS_Vertex via
RAS_DisplayArray::GetVertex(i) and calling RAS_Vertex::SetXYZ.
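
The index-based accessors can be pictured with the simplified sketch below.
DisplayArraySketch, its members and Vec2/Vec3 are hypothetical stand-ins; only
the SetPosition(i, pos) call shape and the limit of 8 UV layers come from the
description above, and the real RAS_DisplayArray is more complete.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for mt::vec2_packed and mt::vec3_packed.
struct Vec2 {
	float x, y;
};

struct Vec3 {
	float x, y, z;
};

// Simplified sketch of a SoA display array; not the real RAS_DisplayArray.
class DisplayArraySketch
{
public:
	// Up to 8 UV layers, as in the description above.
	static constexpr std::size_t MaxUnits = 8;

	struct Format {
		std::size_t uvSize;
	};

	DisplayArraySketch(const Format& format, std::size_t vertexCount)
		:m_format(format),
		m_positions(vertexCount)
	{
		assert(format.uvSize <= MaxUnits);
		// Only the layers named by the format get storage, the others stay empty.
		for (std::size_t layer = 0; layer < format.uvSize; ++layer) {
			m_uvs[layer].resize(vertexCount);
		}
	}

	const Vec3& GetPosition(std::size_t index) const
	{
		return m_positions[index];
	}

	void SetPosition(std::size_t index, const Vec3& value)
	{
		m_positions[index] = value;
	}

	const Vec2& GetUv(std::size_t index, std::size_t layer) const
	{
		assert(layer < m_format.uvSize);
		return m_uvs[layer][index];
	}

	void SetUv(std::size_t index, std::size_t layer, const Vec2& value)
	{
		assert(layer < m_format.uvSize);
		m_uvs[layer][index] = value;
	}

	// Number of elements stored for a layer, zero for unused layers.
	std::size_t GetUvLayerSize(std::size_t layer) const
	{
		return m_uvs[layer].size();
	}

private:
	Format m_format;
	std::vector<Vec3> m_positions;
	std::array<std::vector<Vec2>, MaxUnits> m_uvs;
};
```

A deformer written against such an interface only needs the vertex index and the
new value, so no per-vertex proxy object has to be created.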

With these modifications RAS_DisplayArray<> and RAS_VertexData<> are removed and
RAS_BatchDisplayArray no longer needs virtual inheritance.

On the conversion side, without RAS_VertexData, the structure
BL_SharedVertexPredicate used to find similar vertices now copies the normal,
tangent, UV and color data into itself. Once a vertex is known to be unique, it
is added to the display array through RAS_DisplayArray::AddVertex(pos, nor, tan,
uvs, colors, origIndex, flag), which appends the vertex data to m_vertexData and
constructs the vertex info.
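
The find-or-add step can be sketched as follows. This is a reduced version that
compares only the normal; ArraySketch, SharedList, CloseTo and FindOrAddVertex
are hypothetical names, and the actual predicate in the patch also compares
tangents, UVs and colors.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <cmath>
#include <vector>

// Hypothetical stand-in for mt::vec3_packed.
struct Vec3 {
	float x, y, z;
};

// Reduced SoA array: one list per member plus the original vertex index.
struct ArraySketch {
	std::vector<Vec3> pos;
	std::vector<Vec3> nor;
	std::vector<unsigned int> origIndex;

	// Appends one vertex to every list and returns its offset.
	unsigned int AddVertex(const Vec3& p, const Vec3& n, unsigned int orig)
	{
		pos.push_back(p);
		nor.push_back(n);
		origIndex.push_back(orig);
		return static_cast<unsigned int>(pos.size() - 1);
	}
};

inline bool CloseTo(const Vec3& a, const Vec3& b, float eps)
{
	return std::fabs(a.x - b.x) <= eps &&
	       std::fabs(a.y - b.y) <= eps &&
	       std::fabs(a.z - b.z) <= eps;
}

// Offsets of the display array vertices created from one original vertex.
using SharedList = std::vector<unsigned int>;

inline unsigned int FindOrAddVertex(ArraySketch& array, std::vector<SharedList>& sharedMap,
                                    unsigned int origIndex, const Vec3& pos, const Vec3& nor)
{
	assert(origIndex < sharedMap.size());
	SharedList& shared = sharedMap[origIndex];

	// Reuse an existing vertex when its stored data matches the candidate.
	SharedList::iterator it = std::find_if(shared.begin(), shared.end(),
		[&array, &nor](unsigned int offset) {
			return CloseTo(array.nor[offset], nor, FLT_EPSILON);
		});
	if (it != shared.end()) {
		return *it;
	}

	// Otherwise the vertex is unique: append it and remember its offset.
	const unsigned int offset = array.AddVertex(pos, nor, origIndex);
	shared.push_back(offset);
	return offset;
}
```

Because the candidate data lives outside the array until it is known to be
unique, nothing has to be created and destroyed for vertices that end up shared.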

The VBO doesn't try to re-interleave the data as the time cost is too high;
instead each member is sent to the VBO one by one, which is done in
RAS_StorageVbo::CopyVertexData. Another advantage of SoA is that it allows
updating only one kind of data: if the positions are modified, this data is just
copied to the beginning of the VBO without touching the other data. This
technique is used in RAS_StorageVbo::CopyVertexData by checking a modification
flag.
OpenGL attributes (VAO) are changed too because of the new VBO layout.
RAS_VertexDataMemoryFormat is replaced by RAS_DisplayArrayLayout, which is not
constant after the display array creation, as modifying the size of the array
changes the offset of each data type in the VBO. Because of this,
RAS_DisplayArray::GetLayout returns a new RAS_DisplayArrayLayout with the proper
offsets. To recreate the attributes, RAS_AttributeArray::Clear is called when a
size update is detected in RAS_DisplayArrayBucket::UpdateActiveMeshSlots.
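
To make the non-interleaved layout concrete, here is a minimal CPU-only sketch
in which a byte vector stands in for the VBO. LayoutSketch, VerticesSketch and
the block order after the positions are assumptions made for illustration; a
real renderer would presumably upload each block with a call such as
glBufferSubData instead of memcpy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for mt::vec3_packed.
struct Vec3 {
	float x, y, z;
};

struct VerticesSketch {
	std::vector<Vec3> pos;
	std::vector<Vec3> nor;
};

// Byte offset of each attribute block inside the buffer.
struct LayoutSketch {
	std::size_t position;
	std::size_t normal;
	std::size_t end;
};

enum ModifiedFlag {
	POSITION_MODIFIED = 1 << 0,
	NORMAL_MODIFIED = 1 << 1
};

// The offsets depend on the vertex count, so they are recomputed on demand.
inline LayoutSketch GetLayout(const VerticesSketch& vertices)
{
	LayoutSketch layout;
	layout.position = 0;
	layout.normal = layout.position + vertices.pos.size() * sizeof(Vec3);
	layout.end = layout.normal + vertices.nor.size() * sizeof(Vec3);
	return layout;
}

inline void CopyBlock(std::vector<unsigned char>& buffer, std::size_t offset,
                      const std::vector<Vec3>& data)
{
	if (data.empty()) {
		return;
	}
	std::memcpy(buffer.data() + offset, data.data(), data.size() * sizeof(Vec3));
}

// Copies only the blocks whose modification flag is set.
inline void CopyVertexData(const VerticesSketch& vertices, unsigned int modifiedFlag,
                           std::vector<unsigned char>& buffer)
{
	const LayoutSketch layout = GetLayout(vertices);
	assert(buffer.size() >= layout.end);

	if (modifiedFlag & POSITION_MODIFIED) {
		CopyBlock(buffer, layout.position, vertices.pos);
	}
	if (modifiedFlag & NORMAL_MODIFIED) {
		CopyBlock(buffer, layout.normal, vertices.nor);
	}
}
```

Adding a vertex moves the start of every block after the positions, which is why
the attribute bindings have to be rebuilt when the array size changes.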

To summarize, the advantages of using SoA are the cache-friendly loads, the
possibility to update only modified data in the VBO and the simplification of
storing multiple vertex formats. The drawbacks could be limitations of some old
GPUs and the recreation of the VAO at each display array size update in modifier
deformers.

This patch was tested with 3 files:

The first file is 1600 cubes of 384 faces deformed by an armature.
If the cubes have only the default UV and color layer:
            Previous    Current
Animation   9.6         6.3
Rasterizer  17.5        10.8

With 8 UV and color layers:
            Previous    Current
Animation   17.5        6.7
Rasterizer  42.7        11.9

The second and third files cover modifying vertex positions from Python and
rendering a mesh with a huge number of vertices; neither file showed a time
difference.
panzergame authored Jun 22, 2018
1 parent 6cc0ea0 commit b6123cc
Showing 63 changed files with 1,087 additions and 1,474 deletions.
10 changes: 10 additions & 0 deletions intern/mathfu/mathfu/internal/vector_2.h
@@ -157,6 +157,8 @@ struct VectorPacked<T, 2> {
/// @param vector Vector to create the VectorPacked from.
explicit VectorPacked(const Vector<T, 2>& vector) { vector.Pack(this); }

explicit VectorPacked(const T * const s) :x(s[0]), y(s[1]) {}

/// Copy a Vector to a VectorPacked.
///
/// Both VectorPacked and Vector must have the same number of dimensions.
@@ -167,6 +169,14 @@ struct VectorPacked<T, 2> {
return *this;
}

inline const T& operator[](int i) const {
return data[i];
}

inline T& operator[](int i) {
return data[i];
}

#include "mathfu/internal/disable_warnings_begin.h"
/// Elements of the packed vector one per dimension.
union {
10 changes: 10 additions & 0 deletions intern/mathfu/mathfu/internal/vector_3.h
@@ -174,6 +174,8 @@ struct VectorPacked<T, 3> {
/// @param vector Vector to create the VectorPacked from.
explicit VectorPacked(const Vector<T, 3>& vector) { vector.Pack(this); }

explicit VectorPacked(const T * const s) :x(s[0]), y(s[1]), z(s[2]) {}

/// Copy a Vector to a VectorPacked.
///
/// Both VectorPacked and Vector must have the same number of dimensions.
@@ -184,6 +186,14 @@ struct VectorPacked<T, 3> {
return *this;
}

inline const T& operator[](int i) const {
return data[i];
}

inline T& operator[](int i) {
return data[i];
}

#include "mathfu/internal/disable_warnings_begin.h"
/// Elements of the packed vector one per dimension.
union {
10 changes: 10 additions & 0 deletions intern/mathfu/mathfu/internal/vector_4.h
@@ -184,6 +184,8 @@ struct VectorPacked<T, 4> {
/// @param vector Vector to create the VectorPacked from.
explicit VectorPacked(const Vector<T, 4>& vector) { vector.Pack(this); }

explicit VectorPacked(const T * const s) :x(s[0]), y(s[1]), z(s[2]), w(s[3]) {}

/// Copy a Vector to a VectorPacked.
///
/// Both VectorPacked and Vector must have the same number of dimensions.
@@ -194,6 +196,14 @@ struct VectorPacked<T, 4> {
return *this;
}

inline const T& operator[](int i) const {
return data[i];
}

inline T& operator[](int i) {
return data[i];
}

#include "mathfu/internal/disable_warnings_begin.h"
/// Elements of the packed vector one per dimension.
union {
10 changes: 10 additions & 0 deletions intern/mathfu/mathfu/vector.h
@@ -128,6 +128,8 @@ struct VectorPacked {
/// @param vector Vector to create the VectorPacked from.
explicit VectorPacked(const Vector<T, d>& vector) { vector.Pack(this); }

explicit VectorPacked(const T * const s) { MATHFU_VECTOR_OPERATION(data[i] = s[i]); }

/// Copy a Vector to a VectorPacked.
///
/// Both VectorPacked and Vector must have the same number of dimensions.
@@ -138,6 +140,14 @@ struct VectorPacked {
return *this;
}

inline const T& operator[](int i) const {
return data[i];
}

inline T& operator[](int i) {
return data[i];
}

/// Elements of the packed vector one per dimension.
T data[d];
};
89 changes: 61 additions & 28 deletions source/gameengine/Converter/BL_BlenderDataConversion.cpp
@@ -65,7 +65,6 @@
#include "RAS_ILightObject.h"

#include "RAS_ICanvas.h"
#include "RAS_Vertex.h"
#include "RAS_BucketManager.h"
#include "RAS_BoundingBoxManager.h"
#include "RAS_IPolygonMaterial.h"
@@ -179,7 +178,7 @@ extern Material defmaterial;

// For construction to find shared vertices.
struct BL_SharedVertex {
RAS_IDisplayArray *array;
RAS_DisplayArray *array;
unsigned int offset;
};

@@ -189,20 +188,59 @@ using BL_SharedVertexMap = std::vector<BL_SharedVertexList>;
class BL_SharedVertexPredicate
{
private:
RAS_Vertex m_vertex;
RAS_IDisplayArray *m_array;
RAS_DisplayArray *m_array;
mt::vec3_packed m_normal;
mt::vec4_packed m_tangent;
mt::vec2_packed m_uvs[RAS_Texture::MaxUnits];
unsigned int m_colors[RAS_Texture::MaxUnits];

public:
BL_SharedVertexPredicate(RAS_Vertex vertex, RAS_IDisplayArray *array)
:m_vertex(vertex),
m_array(array)
BL_SharedVertexPredicate(RAS_DisplayArray *array, const mt::vec3_packed& normal, const mt::vec4_packed& tangent, mt::vec2_packed uvs[], unsigned int colors[])
:m_array(array),
m_normal(normal),
m_tangent(tangent)
{
const RAS_DisplayArray::Format& format = m_array->GetFormat();

for (unsigned short i = 0, size = format.uvSize; i < size; ++i) {
m_uvs[i] = uvs[i];
}

for (unsigned short i = 0, size = format.colorSize; i < size; ++i) {
m_colors[i] = colors[i];
}
}

bool operator()(const BL_SharedVertex& sharedVert) const
{
RAS_IDisplayArray *otherArray = sharedVert.array;
return (m_array == otherArray) && (otherArray->GetVertexNoCache(sharedVert.offset).CloseTo(m_vertex));
RAS_DisplayArray *otherArray = sharedVert.array;
if (m_array != otherArray) {
return false;
}

const unsigned int offset = sharedVert.offset;

static const float eps = FLT_EPSILON;
if (!compare_v3v3(m_array->GetNormal(offset).data, m_normal.data, eps) ||
!compare_v3v3(m_array->GetTangent(offset).data, m_tangent.data, eps))
{
return false;
}

const RAS_DisplayArray::Format& format = m_array->GetFormat();
for (unsigned short i = 0, size = format.uvSize; i < size; ++i) {
if (!compare_v2v2(m_array->GetUv(offset, i).data, m_uvs[i].data, eps)) {
return false;
}
}

for (unsigned short i = 0, size = format.colorSize; i < size; ++i) {
if (m_array->GetRawColor(offset, i) != m_colors[i]) {
return false;
}
}

return true;
}
};

@@ -344,16 +382,17 @@ SCA_IInputDevice::SCA_EnumInputs BL_ConvertKeyCode(int key_code)
}

static void BL_GetUvRgba(const RAS_Mesh::LayersInfo& layersInfo, std::vector<MLoopUV *>& uvLayers,
std::vector<MLoopCol *>& colorLayers, unsigned int loop, float uvs[RAS_Texture::MaxUnits][2],
unsigned int rgba[RAS_Vertex::MAX_UNIT])
std::vector<MLoopCol *>& colorLayers, unsigned int loop, mt::vec2_packed uvs[RAS_Texture::MaxUnits],
unsigned int rgba[RAS_Texture::MaxUnits])
{
// No need to initialize layers to zero as all the converted layer are all the layers needed.

for (const RAS_Mesh::Layer& layer : layersInfo.colorLayers) {
const unsigned short index = layer.index;
const MLoopCol& col = colorLayers[index][loop];

union Convert{
union Convert
{
// Color isn't swapped in MLoopCol.
MLoopCol col;
unsigned int val;
@@ -367,15 +406,15 @@ static void BL_GetUvRgba(const RAS_Mesh::LayersInfo& layersInfo, std::vector<MLo
for (const RAS_Mesh::Layer& layer : layersInfo.uvLayers) {
const unsigned short index = layer.index;
const MLoopUV& uv = uvLayers[index][loop];
copy_v2_v2(uvs[index], uv.uv);
uvs[index] = mt::vec2_packed(uv.uv);
}

/* All vertices have at least one uv and color layer accessible to the user
* even if it they are not used in any shaders. Initialize this layer to zero
* when no uv or color layer exist.
*/
if (layersInfo.uvLayers.empty()) {
zero_v2((uvs[0]));
uvs[0] = mt::zero2;
}
if (layersInfo.colorLayers.empty()) {
rgba[0] = 0xFFFFFFFF;
@@ -449,7 +488,7 @@ KX_Mesh *BL_ConvertMesh(Mesh *me, Object *blenderobj, KX_Scene *scene, BL_SceneC
}

// Initialize vertex format with used uv and color layers.
RAS_VertexFormat vertformat;
RAS_DisplayArray::Format vertformat;
vertformat.uvSize = max_ii(1, uvCount);
vertformat.colorSize = max_ii(1, colorCount);

@@ -535,7 +574,7 @@ void BL_ConvertDerivedMeshToArray(DerivedMesh *dm, Mesh *me, const std::vector<B
const MPoly& mpoly = mpolys[i];

const BL_MeshMaterial& mat = mats[mpoly.mat_nr];
RAS_IDisplayArray *array = mat.array;
RAS_DisplayArray *array = mat.array;

// Mark face as flat, so vertices are split.
const bool flat = (mpoly.flag & ME_SMOOTH) == 0;
@@ -548,33 +587,27 @@ void BL_ConvertDerivedMeshToArray(DerivedMesh *dm, Mesh *me, const std::vector<B
const MVert& mvert = mverts[vertid];

static const float dummyTangent[4] = {0.0f, 0.0f, 0.0f, 0.0f};
const float *tan = tangent ? tangent[j] : dummyTangent;

float uvs[RAS_Texture::MaxUnits][2];
const mt::vec4_packed tan(tangent ? tangent[j] : dummyTangent);
const mt::vec3_packed nor(normals[j]);
const mt::vec3_packed pos(mvert.co);
mt::vec2_packed uvs[RAS_Texture::MaxUnits];
unsigned int rgba[RAS_Texture::MaxUnits];

BL_GetUvRgba(layersInfo, uvLayers, colorLayers, j, uvs, rgba);

RAS_Vertex vertex = array->CreateVertex(mvert.co, uvs, tan, rgba, normals[j]);

BL_SharedVertexList& sharedList = sharedMap[vertid];
BL_SharedVertexList::iterator it = std::find_if(sharedList.begin(), sharedList.end(),
BL_SharedVertexPredicate(vertex, array));
BL_SharedVertexPredicate(array, nor, tan, uvs, rgba));

unsigned int offset;
if (it != sharedList.end()) {
offset = it->offset;
}
else {
offset = array->AddVertex(vertex);
const RAS_VertexInfo info(vertid, flat);
array->AddVertexInfo(info);
offset = array->AddVertex(pos, nor, tan, uvs, rgba, vertid, flat);
sharedList.push_back({array, offset});
}

// Destruct the vertex data as it is copied or unused.
array->DeleteVertexData(vertex);

// Add tracked vertices by the mpoly.
vertices[vertid] = offset;
}
2 changes: 1 addition & 1 deletion source/gameengine/Converter/BL_BlenderDataConversion.h
@@ -47,7 +47,7 @@ struct Object;
struct Main;

struct BL_MeshMaterial {
RAS_IDisplayArray *array;
RAS_DisplayArray *array;
RAS_MaterialBucket *bucket;
bool visible;
bool twoside;
37 changes: 16 additions & 21 deletions source/gameengine/Converter/BL_MeshDeformer.cpp
@@ -49,22 +49,21 @@
#include <string>
#include "BLI_math.h"

void BL_MeshDeformer::Apply(RAS_IDisplayArray *UNUSED(array))
void BL_MeshDeformer::Apply(RAS_DisplayArray *UNUSED(array))
{
// only apply once per frame if the mesh is actually modified
if (m_lastDeformUpdate != m_gameobj->GetLastFrame()) {
// For each display array
for (const DisplayArraySlot& slot : m_slots) {
RAS_IDisplayArray *array = slot.m_displayArray;
RAS_DisplayArray *array = slot.m_displayArray;

// For each vertex
for (unsigned int i = 0, size = array->GetVertexCount(); i < size; ++i) {
RAS_Vertex v = array->GetVertex(i);
const RAS_VertexInfo& vinfo = array->GetVertexInfo(i);
v.SetXYZ(m_bmesh->mvert[vinfo.GetOrigIndex()].co);
array->SetPosition(i, mt::vec3_packed(m_bmesh->mvert[vinfo.GetOrigIndex()].co));
}

array->NotifyUpdate(RAS_IDisplayArray::POSITION_MODIFIED);
array->NotifyUpdate(RAS_DisplayArray::POSITION_MODIFIED);
}

m_lastDeformUpdate = m_gameobj->GetLastFrame();
@@ -114,53 +113,49 @@ void BL_MeshDeformer::RecalcNormals()
* since the GPU can do it faster */

/* set vertex normals to zero */
for (std::array<float, 3>& normal : m_transnors) {
normal = {{0.0f, 0.0f, 0.0f}};
}
std::fill(m_transnors.begin(), m_transnors.end(), mt::zero3);

for (const DisplayArraySlot& slot : m_slots) {
RAS_IDisplayArray *array = slot.m_displayArray;
RAS_DisplayArray *array = slot.m_displayArray;
for (unsigned int i = 0, size = array->GetTriangleIndexCount(); i < size; i += 3) {
const float *co[3];
mt::vec3_packed co[3];
bool flat = false;

for (unsigned short j = 0; j < 3; ++j) {
const unsigned int index = array->GetTriangleIndex(i + j);
const RAS_VertexInfo& vinfo = array->GetVertexInfo(index);
const unsigned int origindex = vinfo.GetOrigIndex();

co[j] = m_transverts[origindex].data();
co[j] = m_transverts[origindex];
flat |= (vinfo.GetFlag() & RAS_VertexInfo::FLAT);
}

float pnorm[3];
normal_tri_v3(pnorm, co[0], co[1], co[2]);
mt::vec3_packed pnorm;
normal_tri_v3(pnorm.data, co[0].data, co[1].data, co[2].data);

for (unsigned short j = 0; j < 3; ++j) {
const unsigned int index = array->GetTriangleIndex(i + j);

if (flat) {
RAS_Vertex vert = array->GetVertex(index);
vert.SetNormal(pnorm);
array->SetNormal(index, pnorm);
}
else {
const RAS_VertexInfo& vinfo = array->GetVertexInfo(index);
const unsigned int origindex = vinfo.GetOrigIndex();
add_v3_v3(m_transnors[origindex].data(), pnorm);
add_v3_v3(m_transnors[origindex].data, pnorm.data);
}
}
}
}

// Assign smooth vertex normals.
for (const DisplayArraySlot& slot : m_slots) {
RAS_IDisplayArray *array = slot.m_displayArray;
RAS_DisplayArray *array = slot.m_displayArray;
for (unsigned int i = 0, size = array->GetVertexCount(); i < size; ++i) {
RAS_Vertex v = array->GetVertex(i);
const RAS_VertexInfo& vinfo = array->GetVertexInfo(i);

if (!(vinfo.GetFlag() & RAS_VertexInfo::FLAT)) {
v.SetNormal(m_transnors[vinfo.GetOrigIndex()].data());
array->SetNormal(i, m_transnors[vinfo.GetOrigIndex()]);
}
}
}
@@ -176,8 +171,8 @@ void BL_MeshDeformer::VerifyStorage()
}

for (unsigned int v = 0; v < totvert; ++v) {
copy_v3_v3(m_transverts[v].data(), m_bmesh->mvert[v].co);
normal_short_to_float_v3(m_transnors[v].data(), m_bmesh->mvert[v].no);
copy_v3_v3(m_transverts[v].data, m_bmesh->mvert[v].co);
normal_short_to_float_v3(m_transnors[v].data, m_bmesh->mvert[v].no);
}
}
