Float Computation

Generated from include/msa.h. This page contains 34 intrinsics.

v2f64 __msa_fadd_d (v2f64 a, v2f64 b)

Synopsis

v2f64 __msa_fadd_d (v2f64 a, v2f64 b)
#include <msa.h>
Instruction: fadd.d
Builtin: __builtin_msa_fadd_d
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:510

Description

Add lane-wise for 2 x fp64 lanes.

Operation

dst.fp64[0] = a.fp64[0] + b.fp64[0];
dst.fp64[1] = a.fp64[1] + b.fp64[1];

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    5         2

Header Mapping

#define __msa_fadd_d __builtin_msa_fadd_d

v4f32 __msa_fadd_w (v4f32 a, v4f32 b)

Synopsis

v4f32 __msa_fadd_w (v4f32 a, v4f32 b)
#include <msa.h>
Instruction: fadd.w
Builtin: __builtin_msa_fadd_w
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:509

Description

Add lane-wise for 4 x fp32 lanes.

Operation

dst.fp32[0] = a.fp32[0] + b.fp32[0];
dst.fp32[1] = a.fp32[1] + b.fp32[1];
dst.fp32[2] = a.fp32[2] + b.fp32[2];
dst.fp32[3] = a.fp32[3] + b.fp32[3];

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    5         2

Header Mapping

#define __msa_fadd_w __builtin_msa_fadd_w
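
A common way to use __msa_fadd_w is a loop that adds four floats per iteration and finishes any leftover elements with scalar code. The sketch below is illustrative rather than canonical. add_f32 is a hypothetical name. It assumes a GCC- or Clang-style MIPS compiler that defines __mips_msa when MSA is enabled (for example with -mmsa). Loads and stores go through memcpy so the arrays need no particular alignment. Without MSA, only the scalar loop is compiled.

```c
#include <stddef.h>
#include <string.h>
#if defined(__mips_msa)
#include <msa.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for 0 <= i < n. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__mips_msa)
    /* Four fp32 lanes per __msa_fadd_w; memcpy avoids alignment
       requirements on the caller's arrays. */
    for (; i + 4 <= n; i += 4) {
        v4f32 va, vb, vr;
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        vr = __msa_fadd_w(va, vb);
        memcpy(dst + i, &vr, sizeof vr);
    }
#endif
    /* Scalar tail; without MSA this loop handles every element. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

To get subtraction or multiplication with the same structure, swap in __msa_fsub_w or __msa_fmul_w and use the matching scalar operator in the tail loop.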

v2f64 __msa_fdiv_d (v2f64 a, v2f64 b)

Synopsis

v2f64 __msa_fdiv_d (v2f64 a, v2f64 b)
#include <msa.h>
Instruction: fdiv.d
Builtin: __builtin_msa_fdiv_d
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:516

Description

Divide lane-wise for 2 x fp64 lanes.

Operation

dst.fp64[0] = a.fp64[0] / b.fp64[0];
dst.fp64[1] = a.fp64[1] / b.fp64[1];

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    8/23      0.21 (1/4.67)

Header Mapping

#define __msa_fdiv_d __builtin_msa_fdiv_d

v4f32 __msa_fdiv_w (v4f32 a, v4f32 b)

Synopsis

v4f32 __msa_fdiv_w (v4f32 a, v4f32 b)
#include <msa.h>
Instruction: fdiv.w
Builtin: __builtin_msa_fdiv_w
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:515

Description

Divide lane-wise for 4 x fp32 lanes.

Operation

dst.fp32[0] = a.fp32[0] / b.fp32[0];
dst.fp32[1] = a.fp32[1] / b.fp32[1];
dst.fp32[2] = a.fp32[2] / b.fp32[2];
dst.fp32[3] = a.fp32[3] / b.fp32[3];

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    11/27     0.14 (1/7)

Header Mapping

#define __msa_fdiv_w __builtin_msa_fdiv_w

v2f64 __msa_ffql_d (v4i32 a)

Synopsis

v2f64 __msa_ffql_d (v4i32 a)
#include <msa.h>
Instruction: ffql.d
Builtin: __builtin_msa_ffql_d
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:564

Description

Convert the two Q31 fixed-point values in the upper (left) half of a, a.i32[2] and a.i32[3], to floating point, producing 2 x fp64 lanes. Each value q becomes q / 2^31; the conversion is exact.

Operation

dst.fp64[0] = q31_to_float(a.i32[2]);
dst.fp64[1] = q31_to_float(a.i32[3]);

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    5         1

Header Mapping

#define __msa_ffql_d __builtin_msa_ffql_d

v4f32 __msa_ffql_w (v8i16 a)

Synopsis

v4f32 __msa_ffql_w (v8i16 a)
#include <msa.h>
Instruction: ffql.w
Builtin: __builtin_msa_ffql_w
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:563

Description

Convert the four Q15 fixed-point values in the upper (left) half of a, a.i16[4] through a.i16[7], to floating point, producing 4 x fp32 lanes. Each value q becomes q / 2^15; the conversion is exact.

Operation

dst.fp32[0] = q15_to_float(a.i16[4]);
dst.fp32[1] = q15_to_float(a.i16[5]);
dst.fp32[2] = q15_to_float(a.i16[6]);
dst.fp32[3] = q15_to_float(a.i16[7]);

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    5         1

Header Mapping

#define __msa_ffql_w __builtin_msa_ffql_w

v2f64 __msa_ffqr_d (v4i32 a)

Synopsis

v2f64 __msa_ffqr_d (v4i32 a)
#include <msa.h>
Instruction: ffqr.d
Builtin: __builtin_msa_ffqr_d
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:566

Description

Convert the two Q31 fixed-point values in the lower (right) half of a, a.i32[0] and a.i32[1], to floating point, producing 2 x fp64 lanes. Each value q becomes q / 2^31; the conversion is exact.

Operation

dst.fp64[0] = q31_to_float(a.i32[0]);
dst.fp64[1] = q31_to_float(a.i32[1]);

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    5         1

Header Mapping

#define __msa_ffqr_d __builtin_msa_ffqr_d

v4f32 __msa_ffqr_w (v8i16 a)

Synopsis

v4f32 __msa_ffqr_w (v8i16 a)
#include <msa.h>
Instruction: ffqr.w
Builtin: __builtin_msa_ffqr_w
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:565

Description

Convert the four Q15 fixed-point values in the lower (right) half of a, a.i16[0] through a.i16[3], to floating point, producing 4 x fp32 lanes. Each value q becomes q / 2^15; the conversion is exact.

Operation

dst.fp32[0] = q15_to_float(a.i16[0]);
dst.fp32[1] = q15_to_float(a.i16[1]);
dst.fp32[2] = q15_to_float(a.i16[2]);
dst.fp32[3] = q15_to_float(a.i16[3]);

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    5         1

Header Mapping

#define __msa_ffqr_w __builtin_msa_ffqr_w
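
ffqr and ffql each read only half of the source vector, so converting all eight Q15 lanes of a v8i16 takes one call of each. In MSA terminology "right" refers to the lower-numbered lanes and "left" to the higher-numbered ones. The sketch below therefore assumes that __msa_ffqr_w reads lanes 0..3 and __msa_ffql_w reads lanes 4..7. The names unpack_q15 and q15_to_f32 are hypothetical. The scalar branch also serves as a reference model for testing on a non-MIPS host.

```c
#include <stdint.h>
#if defined(__mips_msa)
#include <msa.h>
#include <string.h>
#endif

/* Q15: value = q / 2^15, range [-1, 1). Exact in fp32 (16 bits < 24). */
float q15_to_f32(int16_t q)
{
    return (float)q / 32768.0f;
}

/* Hypothetical helper: convert eight Q15 values to eight floats. */
void unpack_q15(const int16_t q[8], float out[8])
{
#if defined(__mips_msa)
    v8i16 vq;
    v4f32 lo, hi;
    memcpy(&vq, q, sizeof vq);
    lo = __msa_ffqr_w(vq);            /* right half: q[0..3] */
    hi = __msa_ffql_w(vq);            /* left half:  q[4..7] */
    memcpy(out, &lo, sizeof lo);
    memcpy(out + 4, &hi, sizeof hi);
#else
    for (int i = 0; i < 4; i++) {
        out[i] = q15_to_f32(q[i]);          /* models ffqr.w */
        out[i + 4] = q15_to_f32(q[i + 4]);  /* models ffql.w */
    }
#endif
}
```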

v2f64 __msa_flog2_d (v2f64 a)

Synopsis

v2f64 __msa_flog2_d (v2f64 a)
#include <msa.h>
Instruction: flog2.d
Builtin: __builtin_msa_flog2_d
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:558

Description

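Compute the base-2 logarithm of each lane rounded down to an integer, for 2 x fp64 lanes; the integer is returned as a floating-point value. For positive normal inputs this is the unbiased binary exponent, so 10.0 yields 3.0.

Operation

dst.fp64[0] = floor(log2(a.fp64[0]));
dst.fp64[1] = floor(log2(a.fp64[1]));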

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    4         2

Header Mapping

#define __msa_flog2_d __builtin_msa_flog2_d

v4f32 __msa_flog2_w (v4f32 a)

Synopsis

v4f32 __msa_flog2_w (v4f32 a)
#include <msa.h>
Instruction: flog2.w
Builtin: __builtin_msa_flog2_w
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:557

Description

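Compute the base-2 logarithm of each lane rounded down to an integer, for 4 x fp32 lanes; the integer is returned as a floating-point value. For positive normal inputs this is the unbiased binary exponent, so 10.0f yields 3.0f.

Operation

dst.fp32[0] = floor(log2(a.fp32[0]));
dst.fp32[1] = floor(log2(a.fp32[1]));
dst.fp32[2] = floor(log2(a.fp32[2]));
dst.fp32[3] = floor(log2(a.fp32[3]));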

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    4         2

Header Mapping

#define __msa_flog2_w __builtin_msa_flog2_w
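
For a positive normal input, log2(x) rounded down is exactly the unbiased IEEE 754 exponent. One lane of flog2.d can therefore be modeled by reading the exponent field directly. The sketch below assumes this rounded-down reading of flog2. flog2_ref is a hypothetical name. Zero, subnormal, negative, infinite, and NaN inputs are intentionally not modeled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of one flog2.d lane, valid only for
   positive normal x: floor(log2(x)) equals the unbiased exponent. */
double flog2_ref(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);            /* reinterpret the IEEE 754 encoding */
    int biased = (int)((bits >> 52) & 0x7FF);  /* 11-bit exponent field */
    return (double)(biased - 1023);            /* remove the exponent bias */
}
```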

v2f64 __msa_fmadd_d (v2f64 a, v2f64 b, v2f64 c)

Synopsis

v2f64 __msa_fmadd_d (v2f64 a, v2f64 b, v2f64 c)
#include <msa.h>
Instruction: fmadd.d
Builtin: __builtin_msa_fmadd_d
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:518

Description

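Fused multiply-add lane-wise for 2 x fp64 lanes. The first argument a is the accumulator: the product b * c is added to a with a single rounding, giving a + b * c.

Operation

dst.fp64[0] = fused_round(a.fp64[0] + (b.fp64[0] * c.fp64[0]));
dst.fp64[1] = fused_round(a.fp64[1] + (b.fp64[1] * c.fp64[1]));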

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    5         2

Header Mapping

#define __msa_fmadd_d __builtin_msa_fmadd_d

v4f32 __msa_fmadd_w (v4f32 a, v4f32 b, v4f32 c)

Synopsis

v4f32 __msa_fmadd_w (v4f32 a, v4f32 b, v4f32 c)
#include <msa.h>
Instruction: fmadd.w
Builtin: __builtin_msa_fmadd_w
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:517

Description

Fused multiply-add lane-wise for 4 x fp32 lanes. The first argument a is the accumulator: the product b * c is added to a with a single rounding, giving a + b * c.

Operation

dst.fp32[0] = fused_round(a.fp32[0] + (b.fp32[0] * c.fp32[0]));
dst.fp32[1] = fused_round(a.fp32[1] + (b.fp32[1] * c.fp32[1]));
dst.fp32[2] = fused_round(a.fp32[2] + (b.fp32[2] * c.fp32[2]));
dst.fp32[3] = fused_round(a.fp32[3] + (b.fp32[3] * c.fp32[3]));

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    5         2

Header Mapping

#define __msa_fmadd_w __builtin_msa_fmadd_w
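
In the fmadd instructions the destination register also supplies the addend, and the intrinsic takes that accumulator as its first argument. __msa_fmadd_w(a, b, c) therefore computes a + b * c. The sketch below uses that order to update y[i] += alpha * x[i] four lanes at a time. axpy_f32 is a hypothetical name, and the same __mips_msa / -mmsa build assumptions as any MSA code apply. The scalar tail is not guaranteed to be fused, so it can differ from fmadd.w in the last bit (fmaf from <math.h> would give a fused scalar reference).

```c
#include <stddef.h>
#include <string.h>
#if defined(__mips_msa)
#include <msa.h>
#endif

/* Hypothetical helper: y[i] += alpha * x[i] for 0 <= i < n. */
void axpy_f32(float *y, float alpha, const float *x, size_t n)
{
    size_t i = 0;
#if defined(__mips_msa)
    v4f32 va = {alpha, alpha, alpha, alpha};
    for (; i + 4 <= n; i += 4) {
        v4f32 vx, vy;
        memcpy(&vx, x + i, sizeof vx);
        memcpy(&vy, y + i, sizeof vy);
        /* Accumulator first: vy + va * vx with one rounding. */
        vy = __msa_fmadd_w(vy, va, vx);
        memcpy(y + i, &vy, sizeof vy);
    }
#endif
    /* Scalar tail / fallback; may or may not be contracted to an FMA. */
    for (; i < n; i++)
        y[i] = y[i] + alpha * x[i];
}
```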

v2f64 __msa_fmax_a_d (v2f64 a, v2f64 b)

Synopsis

v2f64 __msa_fmax_a_d (v2f64 a, v2f64 b)
#include <msa.h>
Instruction: fmax.a.d
Builtin: __builtin_msa_fmax_a_d
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:534

Description

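Select, lane-wise for 2 x fp64 lanes, the operand with the larger absolute value. The selected value keeps its sign, so -3.0 and 2.0 yield -3.0.

Operation

dst.fp64[0] = fp_max_abs(a.fp64[0], b.fp64[0]);
dst.fp64[1] = fp_max_abs(a.fp64[1], b.fp64[1]);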

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    2         2

Header Mapping

#define __msa_fmax_a_d __builtin_msa_fmax_a_d

v4f32 __msa_fmax_a_w (v4f32 a, v4f32 b)

Synopsis

v4f32 __msa_fmax_a_w (v4f32 a, v4f32 b)
#include <msa.h>
Instruction: fmax.a.w
Builtin: __builtin_msa_fmax_a_w
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:533

Description

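Select, lane-wise for 4 x fp32 lanes, the operand with the larger absolute value. The selected value keeps its sign, so -3.0f and 2.0f yield -3.0f.

Operation

dst.fp32[0] = fp_max_abs(a.fp32[0], b.fp32[0]);
dst.fp32[1] = fp_max_abs(a.fp32[1], b.fp32[1]);
dst.fp32[2] = fp_max_abs(a.fp32[2], b.fp32[2]);
dst.fp32[3] = fp_max_abs(a.fp32[3], b.fp32[3]);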

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    2         2

Header Mapping

#define __msa_fmax_a_w __builtin_msa_fmax_a_w
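
A scalar model shows the difference between fmax.a and fmax most clearly: fmax.a compares magnitudes but returns the original, signed operand. fmax_a_ref and mag are hypothetical names. NaN inputs are not modeled. Returning the ordinary maximum when both magnitudes are equal is an assumption of this sketch and is not stated on this page.

```c
/* Absolute value without <math.h>; adequate for the comparisons below. */
static double mag(double x)
{
    return x < 0.0 ? -x : x;
}

/* Hypothetical scalar model of one fmax.a.d lane (non-NaN inputs):
   return the operand with the larger magnitude, sign preserved. */
double fmax_a_ref(double x, double y)
{
    double mx = mag(x), my = mag(y);
    if (mx > my)
        return x;
    if (my > mx)
        return y;
    /* Equal magnitudes: ordinary maximum (assumption of this sketch). */
    return x > y ? x : y;
}
```

To model fmin.a the same way, select the operand with the smaller magnitude instead.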

v2f64 __msa_fmax_d (v2f64 a, v2f64 b)

Synopsis

v2f64 __msa_fmax_d (v2f64 a, v2f64 b)
#include <msa.h>
Instruction: fmax.d
Builtin: __builtin_msa_fmax_d
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:532

Description

Compute maximum lane-wise for 2 x fp64 lanes.

Operation

dst.fp64[0] = fp_max(a.fp64[0], b.fp64[0]);
dst.fp64[1] = fp_max(a.fp64[1], b.fp64[1]);

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    2         2

Header Mapping

#define __msa_fmax_d __builtin_msa_fmax_d

v4f32 __msa_fmax_w (v4f32 a, v4f32 b)

Synopsis

v4f32 __msa_fmax_w (v4f32 a, v4f32 b)
#include <msa.h>
Instruction: fmax.w
Builtin: __builtin_msa_fmax_w
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:531

Description

Compute maximum lane-wise for 4 x fp32 lanes.

Operation

dst.fp32[0] = fp_max(a.fp32[0], b.fp32[0]);
dst.fp32[1] = fp_max(a.fp32[1], b.fp32[1]);
dst.fp32[2] = fp_max(a.fp32[2], b.fp32[2]);
dst.fp32[3] = fp_max(a.fp32[3], b.fp32[3]);

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    2         2

Header Mapping

#define __msa_fmax_w __builtin_msa_fmax_w

v2f64 __msa_fmin_a_d (v2f64 a, v2f64 b)

Synopsis

v2f64 __msa_fmin_a_d (v2f64 a, v2f64 b)
#include <msa.h>
Instruction: fmin.a.d
Builtin: __builtin_msa_fmin_a_d
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:530

Description

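Select, lane-wise for 2 x fp64 lanes, the operand with the smaller absolute value. The selected value keeps its sign, so -1.0 and 2.0 yield -1.0.

Operation

dst.fp64[0] = fp_min_abs(a.fp64[0], b.fp64[0]);
dst.fp64[1] = fp_min_abs(a.fp64[1], b.fp64[1]);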

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    2         2

Header Mapping

#define __msa_fmin_a_d __builtin_msa_fmin_a_d

v4f32 __msa_fmin_a_w (v4f32 a, v4f32 b)

Synopsis

v4f32 __msa_fmin_a_w (v4f32 a, v4f32 b)
#include <msa.h>
Instruction: fmin.a.w
Builtin: __builtin_msa_fmin_a_w
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:529

Description

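Select, lane-wise for 4 x fp32 lanes, the operand with the smaller absolute value. The selected value keeps its sign, so -1.0f and 2.0f yield -1.0f.

Operation

dst.fp32[0] = fp_min_abs(a.fp32[0], b.fp32[0]);
dst.fp32[1] = fp_min_abs(a.fp32[1], b.fp32[1]);
dst.fp32[2] = fp_min_abs(a.fp32[2], b.fp32[2]);
dst.fp32[3] = fp_min_abs(a.fp32[3], b.fp32[3]);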

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    2         2

Header Mapping

#define __msa_fmin_a_w __builtin_msa_fmin_a_w

v2f64 __msa_fmin_d (v2f64 a, v2f64 b)

Synopsis

v2f64 __msa_fmin_d (v2f64 a, v2f64 b)
#include <msa.h>
Instruction: fmin.d
Builtin: __builtin_msa_fmin_d
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:528

Description

Compute minimum lane-wise for 2 x fp64 lanes.

Operation

dst.fp64[0] = fp_min(a.fp64[0], b.fp64[0]);
dst.fp64[1] = fp_min(a.fp64[1], b.fp64[1]);

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    2         2

Header Mapping

#define __msa_fmin_d __builtin_msa_fmin_d

v4f32 __msa_fmin_w (v4f32 a, v4f32 b)

Synopsis

v4f32 __msa_fmin_w (v4f32 a, v4f32 b)
#include <msa.h>
Instruction: fmin.w
Builtin: __builtin_msa_fmin_w
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:527

Description

Compute minimum lane-wise for 4 x fp32 lanes.

Operation

dst.fp32[0] = fp_min(a.fp32[0], b.fp32[0]);
dst.fp32[1] = fp_min(a.fp32[1], b.fp32[1]);
dst.fp32[2] = fp_min(a.fp32[2], b.fp32[2]);
dst.fp32[3] = fp_min(a.fp32[3], b.fp32[3]);

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    2         2

Header Mapping

#define __msa_fmin_w __builtin_msa_fmin_w
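
Chaining fmax and fmin clamps every lane to a range, a common step before converting to fixed point or to integers. clamp4_f32 is a hypothetical name and assumes lo <= hi. The scalar branch mirrors the vector one lane by lane.

```c
#include <string.h>
#if defined(__mips_msa)
#include <msa.h>
#endif

/* Hypothetical helper: clamp four floats in place to [lo, hi]. */
void clamp4_f32(float v[4], float lo, float hi)
{
#if defined(__mips_msa)
    v4f32 vv, vlo = {lo, lo, lo, lo}, vhi = {hi, hi, hi, hi};
    memcpy(&vv, v, sizeof vv);
    vv = __msa_fmin_w(__msa_fmax_w(vv, vlo), vhi);  /* min(max(v, lo), hi) */
    memcpy(v, &vv, sizeof vv);
#else
    for (int i = 0; i < 4; i++) {
        float t = v[i] > lo ? v[i] : lo;  /* max(v, lo) */
        v[i] = t < hi ? t : hi;           /* min(t, hi) */
    }
#endif
}
```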

v2f64 __msa_fmsub_d (v2f64 a, v2f64 b, v2f64 c)

Synopsis

v2f64 __msa_fmsub_d (v2f64 a, v2f64 b, v2f64 c)
#include <msa.h>
Instruction: fmsub.d
Builtin: __builtin_msa_fmsub_d
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:520

Description

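Fused multiply-subtract lane-wise for 2 x fp64 lanes. The first argument a is the accumulator: the product b * c is subtracted from a with a single rounding, giving a - b * c.

Operation

dst.fp64[0] = fused_round(a.fp64[0] - (b.fp64[0] * c.fp64[0]));
dst.fp64[1] = fused_round(a.fp64[1] - (b.fp64[1] * c.fp64[1]));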

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    5         2

Header Mapping

#define __msa_fmsub_d __builtin_msa_fmsub_d

v4f32 __msa_fmsub_w (v4f32 a, v4f32 b, v4f32 c)

Synopsis

v4f32 __msa_fmsub_w (v4f32 a, v4f32 b, v4f32 c)
#include <msa.h>
Instruction: fmsub.w
Builtin: __builtin_msa_fmsub_w
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:519

Description

Fused multiply-subtract lane-wise for 4 x fp32 lanes. The first argument a is the accumulator: the product b * c is subtracted from a with a single rounding, giving a - b * c.

Operation

dst.fp32[0] = fused_round(a.fp32[0] - (b.fp32[0] * c.fp32[0]));
dst.fp32[1] = fused_round(a.fp32[1] - (b.fp32[1] * c.fp32[1]));
dst.fp32[2] = fused_round(a.fp32[2] - (b.fp32[2] * c.fp32[2]));
dst.fp32[3] = fused_round(a.fp32[3] - (b.fp32[3] * c.fp32[3]));

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    5         2

Header Mapping

#define __msa_fmsub_w __builtin_msa_fmsub_w

v2f64 __msa_fmul_d (v2f64 a, v2f64 b)

Synopsis

v2f64 __msa_fmul_d (v2f64 a, v2f64 b)
#include <msa.h>
Instruction: fmul.d
Builtin: __builtin_msa_fmul_d
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:514

Description

Multiply lane-wise for 2 x fp64 lanes.

Operation

dst.fp64[0] = a.fp64[0] * b.fp64[0];
dst.fp64[1] = a.fp64[1] * b.fp64[1];

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    5         2

Header Mapping

#define __msa_fmul_d __builtin_msa_fmul_d

v4f32 __msa_fmul_w (v4f32 a, v4f32 b)

Synopsis

v4f32 __msa_fmul_w (v4f32 a, v4f32 b)
#include <msa.h>
Instruction: fmul.w
Builtin: __builtin_msa_fmul_w
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:513

Description

Multiply lane-wise for 4 x fp32 lanes.

Operation

dst.fp32[0] = a.fp32[0] * b.fp32[0];
dst.fp32[1] = a.fp32[1] * b.fp32[1];
dst.fp32[2] = a.fp32[2] * b.fp32[2];
dst.fp32[3] = a.fp32[3] * b.fp32[3];

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    5         2

Header Mapping

#define __msa_fmul_w __builtin_msa_fmul_w

v2f64 __msa_frcp_d (v2f64 a)

Synopsis

v2f64 __msa_frcp_d (v2f64 a)
#include <msa.h>
Instruction: frcp.d
Builtin: __builtin_msa_frcp_d
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:552

Description

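Compute an approximate reciprocal lane-wise for 2 x fp64 lanes. The result approximates 1.0 / a and is not guaranteed to equal the correctly rounded quotient; use __msa_fdiv_d when the IEEE-rounded result is required.

Operation

dst.fp64[0] = approx(1.0 / a.fp64[0]);
dst.fp64[1] = approx(1.0 / a.fp64[1]);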

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    8         0.15 (1/6.76)

Header Mapping

#define __msa_frcp_d __builtin_msa_frcp_d

v4f32 __msa_frcp_w (v4f32 a)

Synopsis

v4f32 __msa_frcp_w (v4f32 a)
#include <msa.h>
Instruction: frcp.w
Builtin: __builtin_msa_frcp_w
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:551

Description

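Compute an approximate reciprocal lane-wise for 4 x fp32 lanes. The result approximates 1.0 / a and is not guaranteed to equal the correctly rounded quotient; use __msa_fdiv_w when the IEEE-rounded result is required.

Operation

dst.fp32[0] = approx(1.0 / a.fp32[0]);
dst.fp32[1] = approx(1.0 / a.fp32[1]);
dst.fp32[2] = approx(1.0 / a.fp32[2]);
dst.fp32[3] = approx(1.0 / a.fp32[3]);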

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    11        0.12 (1/8.36)

Header Mapping

#define __msa_frcp_w __builtin_msa_frcp_w
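
Because frcp returns only an estimate, it is often followed by one Newton-Raphson step, y1 = y0 + y0 * (1 - x * y0). If y0 = (1 - d) / x, this step gives (1 - d*d) / x, so the relative error is squared. The sketch assumes that __msa_fmsub_w(a, b, c) computes a - b * c and __msa_fmadd_w(a, b, c) computes a + b * c (accumulator first). recip4_refined and recip_newton_step are hypothetical names. Whether one step is enough depends on the accuracy of the hardware estimate, which this page does not state.

```c
#if defined(__mips_msa)
#include <msa.h>
#endif

/* One Newton-Raphson step for 1/x from the estimate est, using the
   same operand order as __msa_fmsub_w and __msa_fmadd_w:
     e  = 1 - x * est      (fmsub: a - b * c)
     y1 = est + est * e    (fmadd: a + b * c)                      */
float recip_newton_step(float x, float est)
{
    float e = 1.0f - x * est;
    return est + est * e;
}

#if defined(__mips_msa)
/* Hypothetical vector version: estimate with frcp.w, refine once. */
v4f32 recip4_refined(v4f32 x)
{
    v4f32 one = {1.0f, 1.0f, 1.0f, 1.0f};
    v4f32 y = __msa_frcp_w(x);
    v4f32 e = __msa_fmsub_w(one, x, y);  /* 1 - x * y */
    return __msa_fmadd_w(y, y, e);       /* y + y * e */
}
#endif
```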

v2f64 __msa_frsqrt_d (v2f64 a)

Synopsis

v2f64 __msa_frsqrt_d (v2f64 a)
#include <msa.h>
Instruction: frsqrt.d
Builtin: __builtin_msa_frsqrt_d
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:556

Description

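Compute an approximate reciprocal square root lane-wise for 2 x fp64 lanes. The result approximates 1.0 / sqrt(a) and is not guaranteed to be correctly rounded.

Operation

dst.fp64[0] = approx(1.0 / sqrt(a.fp64[0]));
dst.fp64[1] = approx(1.0 / sqrt(a.fp64[1]));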

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    15        0.06 (1/16.59)

Header Mapping

#define __msa_frsqrt_d __builtin_msa_frsqrt_d

v4f32 __msa_frsqrt_w (v4f32 a)

Synopsis

v4f32 __msa_frsqrt_w (v4f32 a)
#include <msa.h>
Instruction: frsqrt.w
Builtin: __builtin_msa_frsqrt_w
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:555

Description

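Compute an approximate reciprocal square root lane-wise for 4 x fp32 lanes. The result approximates 1.0 / sqrt(a) and is not guaranteed to be correctly rounded.

Operation

dst.fp32[0] = approx(1.0 / sqrt(a.fp32[0]));
dst.fp32[1] = approx(1.0 / sqrt(a.fp32[1]));
dst.fp32[2] = approx(1.0 / sqrt(a.fp32[2]));
dst.fp32[3] = approx(1.0 / sqrt(a.fp32[3]));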

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    17        0.05 (1/20)

Header Mapping

#define __msa_frsqrt_w __builtin_msa_frsqrt_w
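
The same refinement idea applies to frsqrt. One Newton-Raphson step y1 = y0 * (1.5 - 0.5 * x * y0 * y0) reduces a relative error d in the estimate to about 1.5 * d * d. rsqrt4_refined and rsqrt_newton_step are hypothetical names. The vector version assumes the a - b * c operand order for __msa_fmsub_w.

```c
#if defined(__mips_msa)
#include <msa.h>
#endif

/* One Newton-Raphson step for 1/sqrt(x) from the estimate est. */
float rsqrt_newton_step(float x, float est)
{
    float r = 1.5f - (0.5f * x) * (est * est);
    return est * r;
}

#if defined(__mips_msa)
/* Hypothetical vector version: estimate with frsqrt.w, refine once. */
v4f32 rsqrt4_refined(v4f32 x)
{
    v4f32 half = {0.5f, 0.5f, 0.5f, 0.5f};
    v4f32 three_halves = {1.5f, 1.5f, 1.5f, 1.5f};
    v4f32 y  = __msa_frsqrt_w(x);
    v4f32 hx = __msa_fmul_w(half, x);                 /* 0.5 * x */
    v4f32 yy = __msa_fmul_w(y, y);                    /* y * y */
    v4f32 r  = __msa_fmsub_w(three_halves, hx, yy);   /* 1.5 - hx * yy */
    return __msa_fmul_w(y, r);
}
#endif
```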

v2f64 __msa_fsqrt_d (v2f64 a)

Synopsis

v2f64 __msa_fsqrt_d (v2f64 a)
#include <msa.h>
Instruction: fsqrt.d
Builtin: __builtin_msa_fsqrt_d
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:550

Description

Compute square root lane-wise for 2 x fp64 lanes.

Operation

dst.fp64[0] = sqrt(a.fp64[0]);
dst.fp64[1] = sqrt(a.fp64[1]);

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    10        0.09 (1/11.43)

Header Mapping

#define __msa_fsqrt_d __builtin_msa_fsqrt_d

v4f32 __msa_fsqrt_w (v4f32 a)

Synopsis

v4f32 __msa_fsqrt_w (v4f32 a)
#include <msa.h>
Instruction: fsqrt.w
Builtin: __builtin_msa_fsqrt_w
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:549

Description

Compute square root lane-wise for 4 x fp32 lanes.

Operation

dst.fp32[0] = sqrt(a.fp32[0]);
dst.fp32[1] = sqrt(a.fp32[1]);
dst.fp32[2] = sqrt(a.fp32[2]);
dst.fp32[3] = sqrt(a.fp32[3]);

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    11        0.08 (1/13)

Header Mapping

#define __msa_fsqrt_w __builtin_msa_fsqrt_w

v2f64 __msa_fsub_d (v2f64 a, v2f64 b)

Synopsis

v2f64 __msa_fsub_d (v2f64 a, v2f64 b)
#include <msa.h>
Instruction: fsub.d
Builtin: __builtin_msa_fsub_d
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:512

Description

Subtract lane-wise for 2 x fp64 lanes.

Operation

dst.fp64[0] = a.fp64[0] - b.fp64[0];
dst.fp64[1] = a.fp64[1] - b.fp64[1];

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    5         2

Header Mapping

#define __msa_fsub_d __builtin_msa_fsub_d

v4f32 __msa_fsub_w (v4f32 a, v4f32 b)

Synopsis

v4f32 __msa_fsub_w (v4f32 a, v4f32 b)
#include <msa.h>
Instruction: fsub.w
Builtin: __builtin_msa_fsub_w
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:511

Description

Subtract lane-wise for 4 x fp32 lanes.

Operation

dst.fp32[0] = a.fp32[0] - b.fp32[0];
dst.fp32[1] = a.fp32[1] - b.fp32[1];
dst.fp32[2] = a.fp32[2] - b.fp32[2];
dst.fp32[3] = a.fp32[3] - b.fp32[3];

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    5         2

Header Mapping

#define __msa_fsub_w __builtin_msa_fsub_w

v8i16 __msa_ftq_h (v4f32 a, v4f32 b)

Synopsis

v8i16 __msa_ftq_h (v4f32 a, v4f32 b)
#include <msa.h>
Instruction: ftq.h
Builtin: __builtin_msa_ftq_h
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:525

Description

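Convert the four fp32 lanes of a and the four fp32 lanes of b to 8 x Q15 fixed-point lanes (int16). Each value x is scaled by 2^15, rounded according to the current rounding mode, and saturated to the int16 range. The lanes of b fill the lower (right) half of the result; the lanes of a fill the upper (left) half.

Operation

dst.i16[0] = float_to_q15_sat(b.fp32[0]);
dst.i16[1] = float_to_q15_sat(b.fp32[1]);
dst.i16[2] = float_to_q15_sat(b.fp32[2]);
dst.i16[3] = float_to_q15_sat(b.fp32[3]);
dst.i16[4] = float_to_q15_sat(a.fp32[0]);
dst.i16[5] = float_to_q15_sat(a.fp32[1]);
dst.i16[6] = float_to_q15_sat(a.fp32[2]);
dst.i16[7] = float_to_q15_sat(a.fp32[3]);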

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    5         1

Header Mapping

#define __msa_ftq_h __builtin_msa_ftq_h
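
A scalar model of ftq.h makes both the rounding and the lane order explicit. It assumes the MSA convention that the first operand (ws) fills the upper (left) half of the result and the second operand (wt) fills the lower (right) half. It also assumes the default round-to-nearest-even mode. f32_to_q15_sat and ftq_h_ref are hypothetical names, and NaN inputs are not modeled.

```c
#include <stdint.h>

/* Hypothetical model of one ftq.h conversion: scale by 2^15, saturate,
   and round to nearest even (default rounding mode assumed). */
int16_t f32_to_q15_sat(float x)
{
    float v = x * 32768.0f;
    if (v >= 32767.0f)
        return INT16_MAX;   /* anything that would round above 32767 */
    if (v <= -32768.0f)
        return INT16_MIN;
    /* Adding and removing 1.5 * 2^23 rounds |v| < 2^22 to an integer
       under round-to-nearest-even; volatile blocks excess precision. */
    volatile float t = v + 12582912.0f;
    return (int16_t)(t - 12582912.0f);
}

/* Model of __msa_ftq_h(a, b): b -> lanes 0..3, a -> lanes 4..7. */
void ftq_h_ref(const float a[4], const float b[4], int16_t dst[8])
{
    for (int i = 0; i < 4; i++) {
        dst[i] = f32_to_q15_sat(b[i]);
        dst[i + 4] = f32_to_q15_sat(a[i]);
    }
}
```

A Q31 model of ftq.w follows the same pattern with double inputs, a scale of 2^31, int32 saturation, and 1.5 * 2^52 as the rounding constant.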

v4i32 __msa_ftq_w (v2f64 a, v2f64 b)

Synopsis

v4i32 __msa_ftq_w (v2f64 a, v2f64 b)
#include <msa.h>
Instruction: ftq.w
Builtin: __builtin_msa_ftq_w
CPU Flags: __mips_msa
Kind: alias
Source: include/msa.h:526

Description

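Convert the two fp64 lanes of a and the two fp64 lanes of b to 4 x Q31 fixed-point lanes (int32). Each value x is scaled by 2^31, rounded according to the current rounding mode, and saturated to the int32 range. The lanes of b fill the lower (right) half of the result; the lanes of a fill the upper (left) half.

Operation

dst.i32[0] = float_to_q31_sat(b.fp64[0]);
dst.i32[1] = float_to_q31_sat(b.fp64[1]);
dst.i32[2] = float_to_q31_sat(a.fp64[0]);
dst.i32[3] = float_to_q31_sat(a.fp64[1]);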

Latency and Throughput

CPU       µarch     Latency   Throughput (IPC)
3A4000    GS464V    5         1

Header Mapping

#define __msa_ftq_w __builtin_msa_ftq_w