Matrix processors & how to use them: Mac, Intel, AMD & NV all want to make heavy use of them, but how do we utilize them? : RS https://www.phoronix.com/vr.php?view=31014

Wed Mar 30 17:24:36 UTC 2022


Multi Operation Maths - CPU, GPU Computation (c)RS

This is similar to an F16 operation, and works with Int16, or Int8 if you need it, given careful management and special libraries.
It is capable of speeding up PCs, Macs & consoles (HPC).
It requires specially compiled libraries so that compiled code can be managed and the cost of roll operations assessed.

Performing multiple 4-, 8-, 16- or 32-bit operations packed into one 64-bit integer core (as in the example below)

Rules:

All packed operations must use the same multiplier.

Rolls (digit or bit shifts) can be used to convert values, for example for multiplication and division.

For example:

451 722 551 834 x 6

When the multiplier is not a power of the roll base, we have to account for the difference between the multiplier and our base roll number (10 versus 6, for example), so the maths is convoluted and may not be worth it.

We could do 6 + rolls and then - rolls.
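For comparison, here is a minimal Python sketch of the direct route for the x 6 example, assuming four decimal digits per lane (0451 0722 0551 0834) and a multiplier small enough that no lane result exceeds 9999. Under that assumption a single integer multiply scales every lane at once, because no carry crosses into the neighbouring lane. The helper names `pack10`, `unpack10` and `mul_all` are illustrative, not from any library.

```python
LANE_DIGITS = 4                     # decimal digits per lane (assumption)
LANE_BASE = 10 ** LANE_DIGITS

def pack10(values):
    """Pack small non-negative ints into one integer, 4 decimal digits per lane."""
    packed = 0
    for v in values:
        if not 0 <= v < LANE_BASE:
            raise ValueError("value does not fit in a lane")
        packed = packed * LANE_BASE + v      # roll left by one lane, then add
    return packed

def unpack10(packed, lanes):
    """Recover the lanes, most significant first."""
    out = []
    for _ in range(lanes):
        out.append(packed % LANE_BASE)
        packed //= LANE_BASE
    return out[::-1]

def mul_all(values, factor):
    """Multiply every lane by the same factor with ONE integer multiply."""
    limit = (LANE_BASE - 1) // factor
    if any(v > limit for v in values):
        raise OverflowError("a lane would carry into its neighbour")
    return unpack10(pack10(values) * factor, len(values))

values = [451, 722, 551, 834]
print(pack10(values))           # 451072205510834 (the leading zero of 0451 is implicit)
print(mul_all(values, 6))       # [2706, 4332, 3306, 5004]
```

The headroom check is the price of this trick: with 4-digit lanes, any lane above 9999 // 6 = 1666 would corrupt its neighbour.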

On a base-10 processor the first factor would be 10x, because we can compensate by placement (a roll).

But we still need space in each lane to expand the result to the right or left:

0451072205510834 x 10 =

4510722055108340

or 4510 roll -12
7220 roll -8
5510 roll -4
8340 no roll
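The x 10 case above can be sketched in Python by treating a roll as multiplication or floor division by a power of ten. That reading of "roll", and the helper name `roll`, are assumptions for illustration; lanes are non-negative and four decimal digits wide.

```python
BASE = 10

def roll(value, places):
    """Hypothetical decimal roll: +places shifts left (x 10**places), -places shifts right."""
    if places >= 0:
        return value * BASE ** places
    return value // BASE ** (-places)

packed = 451072205510834          # lanes 0451 0722 0551 0834
result = roll(packed, 1)          # x 10 is a single +1 roll
print(result)                     # 4510722055108340

# Read each 4-digit lane back out with a negative roll, then keep 4 digits.
lanes = [roll(result, -r) % 10_000 for r in (12, 8, 4, 0)]
print(lanes)                      # [4510, 7220, 5510, 8340]
```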

Converting base 10 to and from hex may make sense.

Depending on the cost of a roll, this operation may be worth it!

This operation is in base 10; 8-bit lanes make more sense for most common operations in hex.

But 8 bits is not a very big number for larger maths, and 16-bit makes more sense because it holds a larger number.
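The same idea with binary lanes, as a Python sketch assuming four unsigned 16-bit lanes in an emulated 64-bit register. Here a roll becomes a bit shift plus a mask, which is typically far cheaper than multiplying or dividing by powers of ten. `pack16`, `unpack16` and `mul_lanes` are hypothetical helpers.

```python
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1
WORD_MASK = (1 << 64) - 1           # emulate a 64-bit register
MAX_LANES = 64 // LANE_BITS

def pack16(values):
    """Pack up to four unsigned 16-bit values into one 64-bit word, first value highest."""
    if len(values) > MAX_LANES:
        raise ValueError("too many lanes for 64 bits")
    word = 0
    for v in values:
        if not 0 <= v <= LANE_MASK:
            raise ValueError("value does not fit in 16 bits")
        word = (word << LANE_BITS) | v       # roll left by one lane
    return word & WORD_MASK

def unpack16(word, lanes=MAX_LANES):
    """Extract the lanes with right shifts (negative rolls) and a mask."""
    return [(word >> (LANE_BITS * i)) & LANE_MASK
            for i in reversed(range(lanes))]

def mul_lanes(values, factor):
    """One 64-bit multiply scales all lanes, provided no lane overflows 16 bits."""
    if any(v * factor > LANE_MASK for v in values):
        raise OverflowError("a lane would carry into its neighbour")
    return unpack16((pack16(values) * factor) & WORD_MASK, len(values))

print(mul_lanes([451, 722, 551, 834], 6))   # [2706, 4332, 3306, 5004]
```

Each 16-bit lane holds up to 65535, so the same four values have far more headroom than the 4-digit decimal lanes (9999).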

Performing numeric expansion (on consoles in particular, and on FPUs where expansion is required for emergence mathematics) for circumstances where we require larger numbers, for example:

To fill the x87 FPU buffer.

To do that we roll to the left and expand the number, although we may need multiple operations.

As I say: Roll + or Roll -

1447000
-Roll 3 = 1447
or
+Roll 3 = 1447000000

That way we can potentially exceed the bit depth of the register (32-bit, for example).
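One way to read the +Roll / -Roll idea is as a small decimal floating-point format: keep a mantissa that fits a 32-bit register plus a separate roll count, and expand to the full value only when needed. This Python sketch assumes that reading; `RolledInt` is a hypothetical name, and Python's unbounded integers stand in for the wider expansion register.

```python
from dataclasses import dataclass

INT32_MAX = 2**31 - 1   # the register the mantissa must fit in (assumption)

@dataclass(frozen=True)
class RolledInt:
    """Hypothetical mantissa-plus-roll number: value = mantissa * 10**roll."""
    mantissa: int
    roll: int

    def __post_init__(self):
        if abs(self.mantissa) > INT32_MAX:
            raise OverflowError("mantissa exceeds a signed 32-bit register")
        if self.roll < 0:
            raise ValueError("this sketch only models non-negative rolls")

    def expand(self) -> int:
        """+Roll: widen to the full integer (Python ints are unbounded)."""
        return self.mantissa * 10 ** self.roll

    @staticmethod
    def compact(value: int) -> "RolledInt":
        """-Roll: move trailing decimal zeros into the roll count."""
        roll = 0
        while value != 0 and value % 10 == 0:
            value //= 10
            roll += 1
        return RolledInt(value, roll)

small = RolledInt.compact(1447000)
print(small)                       # RolledInt(mantissa=1447, roll=3)
big = RolledInt(1447, 9).expand()
print(big, big > 2**32)            # 1447000000000 True
```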

Rupert S https://science.n-helix.com

https://science.n-helix.com/2021/02/multi-operation-maths.html

https://science.n-helix.com/2018/01/integer-floats-with-remainder-theory.html

*****

Packed F16C & F16 Values in use on CPU & GPU - RS.txt

F16C & F16: lower-precision values that can be used to optimise GPU and CPU operations involving less detailed values, such as hashes, game data, metadata or machine learning : RS

Firstly, F16C is an instruction set extension supported by the FX-8320E, so the CPU can convert packed F16 values to and from F32 directly (VCVTPH2PS / VCVTPS2PH), although the arithmetic itself is then done in F32.
As quoted, carefully managed F16 produces a pipeline that is 100% F16.

Packed F16 instructions hold two values per 32 bits of register storage.
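A Python sketch of the storage layout only: two F16 values sharing one 32-bit word. It uses the standard `struct` module's `e` (half-precision) format to get the bit patterns; on real hardware F16C's VCVTPH2PS / VCVTPS2PH convert whole registers instead. `pack_f16x2` and `unpack_f16x2` are hypothetical names.

```python
import struct

def pack_f16x2(lo: float, hi: float) -> int:
    """Store two half-precision floats in one 32-bit word (lo in bits 0-15)."""
    lo_bits, hi_bits = struct.unpack("<2H", struct.pack("<2e", lo, hi))
    return (hi_bits << 16) | lo_bits

def unpack_f16x2(word: int) -> tuple[float, float]:
    """Split a 32-bit word back into its two F16 values, widened to Python floats."""
    return struct.unpack("<2e", struct.pack("<2H", word & 0xFFFF, word >> 16))

w = pack_f16x2(1.0, -2.5)
print(hex(w))              # 0xc1003c00
print(unpack_f16x2(w))     # (1.0, -2.5)
```

`struct.pack` raises `OverflowError` for values beyond the F16 range (about ±65504).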

Data has to be converted if the array of instructions includes F32, so commonly all F16 work should be present first, before group conversion; or alternatively...

Allocating an additional 16 bits of data, for example 0000x4, or subtle variance data that allows unique renders, such as a chaos key or entropy / RNG random data.

Potentially allocating a static key in the form of AES output derived from the base pair of F16 values.
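A sketch of one F16 value plus 16 bits of key or entropy data in the same 32-bit word. Python's standard library has no AES or DES, so SHA-256 from `hashlib` stands in for the key derivation here; that substitution, the helper names and the profile string are all assumptions. The key only makes renders distinguishable; it is not a security measure.

```python
import hashlib
import struct

def player_key16(profile: str) -> int:
    """16-bit static key from a player profile (SHA-256 stands in for AES/DES)."""
    return int.from_bytes(hashlib.sha256(profile.encode()).digest()[:2], "little")

def attach_key(value: float, key: int) -> int:
    """F16 value in the high 16 bits, key/entropy bits in the low 16 bits."""
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    return (bits << 16) | (key & 0xFFFF)

def split_key(word: int) -> tuple[float, int]:
    """Recover the F16 value and its key from a 32-bit word."""
    (value,) = struct.unpack("<e", struct.pack("<H", word >> 16))
    return value, word & 0xFFFF

key = player_key16("player-42")         # hypothetical profile string
word = attach_key(0.75, key)
print(split_key(word) == (0.75, key))   # True
```

A key of 0x0000 corresponds to the empty-buffer case.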

The additional data could potentially make each game player's render unique!

Fast Conversion Proposals include:

Unique per-player additional data (AES key conversion, for example, or DES; DES produces smaller, faster values)

Static key, sorted by data type (based on the player profile or game map)

Dynamic Key

0000 or empty buffer hash

Side by Side: wide-format texture = 2x F16 values in the same 32-bit value
Top & Bottom: F16 double-layered texture = 2x F16 values in the same 32-bit value

Yes, transparency for alien skin can use a Top & Bottom F16 layered texture, and so can machines; or even 4 layers for a truly special effect.

Combine both methods and the crypto hash with one or more layers of BumpMap, RayTracing and SiMD.

SiMD is also 16-bit compatible, so no conversion is required.

Weather and clouds are perfect examples of light, fast loads over massive GPU arrays.

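F16 values are also theoretically suitable for 16-bit audio if using SiMD, although an F16 has only 11 significant bits, so integers above 2048 in magnitude are not all exact.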

In the case of AVX it is probably worth using dynamic key conversion: a dynamic remainder key that allows the lower bits to interpolate sound data.
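One possible reading of the dynamic remainder key, sketched in Python: store each 16-bit PCM sample as its nearest F16 value plus a small integer remainder that restores the low bits the F16 cannot hold. This interpretation and the helper names are assumptions. Because an F16 has 11 significant bits, the remainder is zero up to ±2048 and never exceeds ±8 for 16-bit samples, so a few bits suffice.

```python
import struct

def to_f16(x: float) -> float:
    """Round to the nearest half-precision value (struct 'e' format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def split_sample(sample: int) -> tuple[float, int]:
    """Split a 16-bit PCM sample into an F16 approximation and an integer remainder."""
    approx = to_f16(float(sample))       # exact only up to 2048 in magnitude
    return approx, sample - int(approx)  # remainder restores the lost low bits

def join_sample(approx: float, remainder: int) -> int:
    """Rebuild the original sample from the F16 value and its remainder."""
    return int(approx) + remainder

for s in (1000, 12345, -32768, 32767):
    a, r = split_sample(s)
    assert join_sample(a, r) == s        # lossless round trip
    print(s, a, r)
```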

Other sources, such as the renderer, can potentially use the F16 system to interpolate or tessellate bits on the shift from F16 to F32 during the final plane write to the frame buffer.
The memory required would then be the buffer and not the source process.

An example is to replace the bits missing from F16 in F32/F64 with tessellation shaping and sharpening code, dynamically relative to the performance of the GPU/CPU.
F16 values also transfer quickly from GPU to CPU and from CPU to GPU, since they are half the size of F32.
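A minimal sketch of the "replace the missing bits" idea: an F16 has a 10-bit mantissa and an F32 a 23-bit one, so after widening, the 13 low mantissa bits are zero and can carry extra detail without moving the value by a full F16 step. `fill_missing_bits` and the detail pattern are hypothetical; a real renderer would do this in a shader or with SIMD conversions rather than in Python.

```python
import struct

F16_MANTISSA_BITS = 10
F32_MANTISSA_BITS = 23
MISSING_BITS = F32_MANTISSA_BITS - F16_MANTISSA_BITS   # 13 bits F16 cannot carry
MISSING_MASK = (1 << MISSING_BITS) - 1

def f16_to_f32_bits(value: float) -> int:
    """Round to F16, widen to F32 and return the raw 32-bit pattern."""
    half = struct.unpack("<e", struct.pack("<e", value))[0]
    return struct.unpack("<I", struct.pack("<f", half))[0]

def fill_missing_bits(value: float, detail: int) -> float:
    """Put up to 13 bits of detail (noise, sharpening, ...) under the F16 data.

    For finite F16 inputs these low mantissa bits are always zero after
    widening, so OR-ing detail in moves the value by less than one F16 step.
    """
    bits = f16_to_f32_bits(value) | (detail & MISSING_MASK)
    return struct.unpack("<f", struct.pack("<I", bits))[0]

base = 1.5
enhanced = fill_missing_bits(base, 0x1ABC)   # hypothetical detail pattern
print(base, enhanced, enhanced - base < 2**-10)
```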

Image enhancement is also possible with a bit-shift stack buffer that passes additional data into the missing bits, for example from a pre-processed micro BumpMapping or compute shading process that pulls the bits in under the F16 data:
453000.172000 > 453545.172711 (bit swap); this could be complex!
Done with a cache? Entirely possible with a unified L3 cache.

DLSS, dynamic sharpening and smooth/filter-enhanced virtual resolution can significantly enhance this process of dynamic buffer pipelining to the render path (where required and beneficial).

(c)Rupert S https://science.n-helix.com/2019/06/vulkan-stack.html

https://gpuopen.com/learn/first-steps-implementing-fp16/

*****

Submissions for review

RS

https://drive.google.com/drive/folders/1X5fUvsXkvBU6td78uq3EdEUJ_S6iUplA?usp=sharing

https://lore.kernel.org/lkml/20220329164117.1449-1-mario.limonciello@amd.com/

https://www.phoronix.com/scan.php?page=news_item&px=AMD-PSP-Sysfs-Expose

https://lkml.org/lkml/2022/3/30/1005