Midnight Coder: July 2018

It has been a long time since my last post, so today I'll give an update on development progress.

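After finishing the port of the low-level Switch graphics layer in June, I finally have time to get back to my own project. I had already experimented with a voxel system in Unity, but never had time to clean up and optimize the code. Since production can't start yet, I'm using this period for pre-production work, getting the necessary technology in place. The game will still be built from voxel models, so the art pipeline differs somewhat from that of a conventional game, which means the technical side has to be settled before the art pipeline can be established.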

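I actually implemented the voxel system in Unity in two ways. The first uses the built-in particle system, with the voxel system controlling the movement and displacement of each particle. Because the underlying layer is optimized by Unity itself, performance is better, but the drawback is less flexibility. The second uses material instancing, which is similar to what my previous engine did. It is more flexible, but in my earlier implementation its performance still lagged behind the particle system. So the task this time is to optimize the material instancing based voxel system, and also to design a new file format that supports the new voxel system's features.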

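Profiling the material instancing based voxel system showed that the bottleneck is the 4x4 matrix multiplication performed in every update. That part is very simple; it just calls the following function: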

static void InstanceTransformUpdate(ref Matrix4x4 o, ref Matrix4x4 p, ref Matrix4x4 t){o = p * t;}

So I tried removing the overloaded matrix multiplication operator and changed the function to the following:

static void InstanceTransformUpdate(ref Matrix4x4 o, ref Matrix4x4 p, ref Matrix4x4 t)
{
    o.m00 = p.m00 * t.m00 + p.m01 * t.m10 + p.m02 * t.m20 + p.m03 * t.m30;
    o.m01 = p.m00 * t.m01 + p.m01 * t.m11 + p.m02 * t.m21 + p.m03 * t.m31;
    o.m02 = p.m00 * t.m02 + p.m01 * t.m12 + p.m02 * t.m22 + p.m03 * t.m32;
    o.m03 = p.m00 * t.m03 + p.m01 * t.m13 + p.m02 * t.m23 + p.m03 * t.m33;

    o.m10 = p.m10 * t.m00 + p.m11 * t.m10 + p.m12 * t.m20 + p.m13 * t.m30;
    o.m11 = p.m10 * t.m01 + p.m11 * t.m11 + p.m12 * t.m21 + p.m13 * t.m31;
    o.m12 = p.m10 * t.m02 + p.m11 * t.m12 + p.m12 * t.m22 + p.m13 * t.m32;
    o.m13 = p.m10 * t.m03 + p.m11 * t.m13 + p.m12 * t.m23 + p.m13 * t.m33;

    o.m20 = p.m20 * t.m00 + p.m21 * t.m10 + p.m22 * t.m20 + p.m23 * t.m30;
    o.m21 = p.m20 * t.m01 + p.m21 * t.m11 + p.m22 * t.m21 + p.m23 * t.m31;
    o.m22 = p.m20 * t.m02 + p.m21 * t.m12 + p.m22 * t.m22 + p.m23 * t.m32;
    o.m23 = p.m20 * t.m03 + p.m21 * t.m13 + p.m22 * t.m23 + p.m23 * t.m33;

    o.m30 = p.m30 * t.m00 + p.m31 * t.m10 + p.m32 * t.m20 + p.m33 * t.m30;
    o.m31 = p.m30 * t.m01 + p.m31 * t.m11 + p.m32 * t.m21 + p.m33 * t.m31;
    o.m32 = p.m30 * t.m02 + p.m31 * t.m12 + p.m32 * t.m22 + p.m33 * t.m32;
    o.m33 = p.m30 * t.m03 + p.m31 * t.m13 + p.m32 * t.m23 + p.m33 * t.m33;
}
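One caveat with a by-reference function of this shape is aliasing: it writes to `o` while still reading `p` and `t`, so passing the same variable as both output and input would corrupt the result. The sketch below illustrates this with a hypothetical 2x2 struct `Mat2` (a reduced stand-in for `Matrix4x4`, not a Unity type); `MulRef` and `MulRefSafe` are illustrative names, and the hazard should be the same for the 4x4 case.

```csharp
using System;

// Hypothetical 2x2 stand-in for Matrix4x4, used only to keep the demo short.
public struct Mat2
{
    public float m00, m01, m10, m11;
}

public static class AliasDemo
{
    // Same pattern as the expanded function: writes o while still reading p and t.
    public static void MulRef(ref Mat2 o, ref Mat2 p, ref Mat2 t)
    {
        o.m00 = p.m00 * t.m00 + p.m01 * t.m10;
        o.m01 = p.m00 * t.m01 + p.m01 * t.m11;
        o.m10 = p.m10 * t.m00 + p.m11 * t.m10;
        o.m11 = p.m10 * t.m01 + p.m11 * t.m11;
    }

    // Alias-safe variant: accumulate into a local, then copy to o once at the end.
    public static void MulRefSafe(ref Mat2 o, ref Mat2 p, ref Mat2 t)
    {
        Mat2 r;
        r.m00 = p.m00 * t.m00 + p.m01 * t.m10;
        r.m01 = p.m00 * t.m01 + p.m01 * t.m11;
        r.m10 = p.m10 * t.m00 + p.m11 * t.m10;
        r.m11 = p.m10 * t.m01 + p.m11 * t.m11;
        o = r;
    }

    public static void Main()
    {
        // t swaps the two columns of whatever it multiplies from the right.
        Mat2 t = new Mat2 { m00 = 0, m01 = 1, m10 = 1, m11 = 0 };
        Mat2 a = new Mat2 { m00 = 1, m01 = 2, m10 = 3, m11 = 4 };
        Mat2 b = a;

        MulRef(ref a, ref a, ref t);     // a is both output and left operand
        MulRefSafe(ref b, ref b, ref t); // same call, but alias-safe

        // The correct product is [[2, 1], [4, 3]]; the aliased call yields [[2, 2], [4, 4]].
        Console.WriteLine("aliased: {0} {1} / {2} {3}", a.m00, a.m01, a.m10, a.m11);
        Console.WriteLine("safe:    {0} {1} / {2} {3}", b.m00, b.m01, b.m10, b.m11);
    }
}
```

The safe variant pays for one extra struct copy, so a reasonable choice is to keep the direct version on the hot path and simply guarantee that the output never aliases an input.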

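With the hand-expanded version, performance improved dramatically, jumping from 75 fps straight to 120 fps! At first I thought it was a problem with C# arithmetic performance, but then a friend sent me Unity's C# source code and everything became clear. Here is Unity's matrix multiplication source: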

// Multiplies two matrices.
public static Matrix4x4 operator*(Matrix4x4 lhs, Matrix4x4 rhs)
{
    Matrix4x4 res;
    res.m00 = lhs.m00 * rhs.m00 + lhs.m01 * rhs.m10 + lhs.m02 * rhs.m20 + lhs.m03 * rhs.m30;
    res.m01 = lhs.m00 * rhs.m01 + lhs.m01 * rhs.m11 + lhs.m02 * rhs.m21 + lhs.m03 * rhs.m31;
    res.m02 = lhs.m00 * rhs.m02 + lhs.m01 * rhs.m12 + lhs.m02 * rhs.m22 + lhs.m03 * rhs.m32;
    res.m03 = lhs.m00 * rhs.m03 + lhs.m01 * rhs.m13 + lhs.m02 * rhs.m23 + lhs.m03 * rhs.m33;
    res.m10 = lhs.m10 * rhs.m00 + lhs.m11 * rhs.m10 + lhs.m12 * rhs.m20 + lhs.m13 * rhs.m30;
    res.m11 = lhs.m10 * rhs.m01 + lhs.m11 * rhs.m11 + lhs.m12 * rhs.m21 + lhs.m13 * rhs.m31;
    res.m12 = lhs.m10 * rhs.m02 + lhs.m11 * rhs.m12 + lhs.m12 * rhs.m22 + lhs.m13 * rhs.m32;
    res.m13 = lhs.m10 * rhs.m03 + lhs.m11 * rhs.m13 + lhs.m12 * rhs.m23 + lhs.m13 * rhs.m33;
    res.m20 = lhs.m20 * rhs.m00 + lhs.m21 * rhs.m10 + lhs.m22 * rhs.m20 + lhs.m23 * rhs.m30;
    res.m21 = lhs.m20 * rhs.m01 + lhs.m21 * rhs.m11 + lhs.m22 * rhs.m21 + lhs.m23 * rhs.m31;
    res.m22 = lhs.m20 * rhs.m02 + lhs.m21 * rhs.m12 + lhs.m22 * rhs.m22 + lhs.m23 * rhs.m32;
    res.m23 = lhs.m20 * rhs.m03 + lhs.m21 * rhs.m13 + lhs.m22 * rhs.m23 + lhs.m23 * rhs.m33;
    res.m30 = lhs.m30 * rhs.m00 + lhs.m31 * rhs.m10 + lhs.m32 * rhs.m20 + lhs.m33 * rhs.m30;
    res.m31 = lhs.m30 * rhs.m01 + lhs.m31 * rhs.m11 + lhs.m32 * rhs.m21 + lhs.m33 * rhs.m31;
    res.m32 = lhs.m30 * rhs.m02 + lhs.m31 * rhs.m12 + lhs.m32 * rhs.m22 + lhs.m33 * rhs.m32;
    res.m33 = lhs.m30 * rhs.m03 + lhs.m31 * rhs.m13 + lhs.m32 * rhs.m23 + lhs.m33 * rhs.m33;
    return res;
}

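It turns out the problem comes from pass by value. Using Matrix4x4's multiplication operator directly creates three temporary Matrix4x4 copies per multiplication (the two arguments and the return value), and that is the cause of the slowdown. Since Matrix4x4 is a struct, these temporaries are value copies of 64 bytes each rather than heap allocations, so the cost is the copying itself and not garbage collection. What I don't quite understand is why Unity doesn't also provide a static math library that takes its arguments by reference, so that performance-critical code has a choice. My previous engine was written in C++, but to avoid the extra copy I still designed a separate set of static math functions for the places that needed performance.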
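To make the by-value versus by-reference difference concrete, here is a minimal standalone sketch. It assumes a hypothetical struct `Mat4` standing in for `Matrix4x4` so it can run without Unity, and the names `MathRef.Multiply` and `Translation` are illustrative only. The timing loop is a rough harness, not a rigorous benchmark: results will vary with the runtime (Mono, IL2CPP, .NET) and with how aggressively the JIT inlines.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for UnityEngine.Matrix4x4 (16 floats, 64 bytes).
public struct Mat4
{
    public float m00, m01, m02, m03;
    public float m10, m11, m12, m13;
    public float m20, m21, m22, m23;
    public float m30, m31, m32, m33;

    // By-value operator: both operands are copied in and the result is copied out.
    public static Mat4 operator *(Mat4 a, Mat4 b)
    {
        Mat4 r = default(Mat4);
        MathRef.Multiply(ref r, ref a, ref b);
        return r;
    }
}

public static class MathRef
{
    // By-reference multiply: o = p * t. The caller must ensure o does not alias p or t.
    public static void Multiply(ref Mat4 o, ref Mat4 p, ref Mat4 t)
    {
        o.m00 = p.m00 * t.m00 + p.m01 * t.m10 + p.m02 * t.m20 + p.m03 * t.m30;
        o.m01 = p.m00 * t.m01 + p.m01 * t.m11 + p.m02 * t.m21 + p.m03 * t.m31;
        o.m02 = p.m00 * t.m02 + p.m01 * t.m12 + p.m02 * t.m22 + p.m03 * t.m32;
        o.m03 = p.m00 * t.m03 + p.m01 * t.m13 + p.m02 * t.m23 + p.m03 * t.m33;
        o.m10 = p.m10 * t.m00 + p.m11 * t.m10 + p.m12 * t.m20 + p.m13 * t.m30;
        o.m11 = p.m10 * t.m01 + p.m11 * t.m11 + p.m12 * t.m21 + p.m13 * t.m31;
        o.m12 = p.m10 * t.m02 + p.m11 * t.m12 + p.m12 * t.m22 + p.m13 * t.m32;
        o.m13 = p.m10 * t.m03 + p.m11 * t.m13 + p.m12 * t.m23 + p.m13 * t.m33;
        o.m20 = p.m20 * t.m00 + p.m21 * t.m10 + p.m22 * t.m20 + p.m23 * t.m30;
        o.m21 = p.m20 * t.m01 + p.m21 * t.m11 + p.m22 * t.m21 + p.m23 * t.m31;
        o.m22 = p.m20 * t.m02 + p.m21 * t.m12 + p.m22 * t.m22 + p.m23 * t.m32;
        o.m23 = p.m20 * t.m03 + p.m21 * t.m13 + p.m22 * t.m23 + p.m23 * t.m33;
        o.m30 = p.m30 * t.m00 + p.m31 * t.m10 + p.m32 * t.m20 + p.m33 * t.m30;
        o.m31 = p.m30 * t.m01 + p.m31 * t.m11 + p.m32 * t.m21 + p.m33 * t.m31;
        o.m32 = p.m30 * t.m02 + p.m31 * t.m12 + p.m32 * t.m22 + p.m33 * t.m32;
        o.m33 = p.m30 * t.m03 + p.m31 * t.m13 + p.m32 * t.m23 + p.m33 * t.m33;
    }

    // Helper for test data: identity matrix with a translation in the last column.
    public static Mat4 Translation(float x, float y, float z)
    {
        Mat4 m = default(Mat4);
        m.m00 = 1f; m.m11 = 1f; m.m22 = 1f; m.m33 = 1f;
        m.m03 = x; m.m13 = y; m.m23 = z;
        return m;
    }
}

public static class Program
{
    public static void Main()
    {
        const int N = 1000000;
        Mat4 p = MathRef.Translation(1f, 2f, 3f);
        Mat4 t = MathRef.Translation(4f, 5f, 6f);
        Mat4 o = default(Mat4);
        float sink = 0f; // consumed below so the loops are not optimized away

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < N; i++) { o = p * t; sink += o.m03; }
        long byValueMs = sw.ElapsedMilliseconds;

        sw.Restart();
        for (int i = 0; i < N; i++) { MathRef.Multiply(ref o, ref p, ref t); sink += o.m03; }
        long byRefMs = sw.ElapsedMilliseconds;

        Console.WriteLine("by value: {0} ms, by ref: {1} ms (sink {2})", byValueMs, byRefMs, sink);
    }
}
```

Both paths compute the same product; only the number of struct copies per call differs, which is the effect described above.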

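Afterwards I also tried writing the matrix multiplication as a C++ plugin, and even rewrote it in assembly with SSE acceleration, but the gains were very limited, so in the end I decided to stick with the C# static math function. Even so, the improved performance was still some way behind the particle system. It occurred to me that moving the matrix operations to the GPU should help more, so I modified the rendering code and the shader, and finally got performance roughly on par with the Particle System, and in some cases better. If I have time in the future, I'll probably try to move the remaining CPU work into a compute shader, but the current performance should be enough for this project, so I'm stopping here for now.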

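Another new design I had long wanted to try is separating the voxel data from the rendering code. Decoupling these two classes makes the whole system more flexible and also yields better performance.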
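The post does not show how the split is implemented, so the following is only one possible shape for it, written as plain C# with no Unity dependency. All names here (`Voxel`, `VoxelData`, `IVoxelRenderer`, `CountingRenderer`) are hypothetical. The data class knows nothing about rendering, and the renderer uses a version counter to rebuild its buffers only when the data has actually changed, which is one way such a split could also help performance.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical voxel record: grid position plus a packed RGBA color.
public struct Voxel
{
    public int x, y, z;
    public uint color;
}

// Pure data container with no reference to any rendering code.
public sealed class VoxelData
{
    private readonly List<Voxel> voxels = new List<Voxel>();

    // Incremented on every change so consumers can detect stale caches.
    public int Version { get; private set; }
    public int Count { get { return voxels.Count; } }
    public Voxel this[int i] { get { return voxels[i]; } }

    public void Add(Voxel v)
    {
        voxels.Add(v);
        Version++;
    }
}

// Rendering side: depends on VoxelData, never the other way around.
public interface IVoxelRenderer
{
    void Render();
}

// Stand-in renderer that only counts how often it would rebuild instance buffers.
public sealed class CountingRenderer : IVoxelRenderer
{
    private readonly VoxelData data;
    private int lastVersion = -1;

    public int Rebuilds { get; private set; }

    public CountingRenderer(VoxelData data)
    {
        this.data = data;
    }

    public void Render()
    {
        if (data.Version != lastVersion)
        {
            // A real renderer would refill its per-instance buffers here.
            Rebuilds++;
            lastVersion = data.Version;
        }
        // A real renderer would issue its draw calls here.
    }
}

public static class Program
{
    public static void Main()
    {
        VoxelData data = new VoxelData();
        data.Add(new Voxel { x = 0, y = 0, z = 0, color = 0xFF0000FFu });
        data.Add(new Voxel { x = 1, y = 0, z = 0, color = 0x00FF00FFu });

        CountingRenderer renderer = new CountingRenderer(data);
        renderer.Render();
        renderer.Render(); // data unchanged, so no rebuild
        data.Add(new Voxel { x = 2, y = 0, z = 0, color = 0x0000FFFFu });
        renderer.Render(); // data changed, so one more rebuild

        Console.WriteLine("voxels: {0}, rebuilds: {1}", data.Count, renderer.Rebuilds);
    }
}
```

With this arrangement a different renderer (for example a particle-based one) could be swapped in behind `IVoxelRenderer` without touching the data class.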

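Back to the game project: progress is actually somewhat stuck, mainly because I haven't found a suitable artist yet. That leaves me working on the technical side first, and without a settled art style I can't do much rendering research either. This project is a voxel-based action game. If you are a skilled artist interested in making voxel games, let's talk :)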

Midnight Coder

Wednesday, July 11, 2018

Development Updates
