We want to ignore the incoming pending NaN state (since the pending will propagate to the output if there was one on the input), and we want to add a new pending NaN state if we can (that is to say, if it isn't cancelled out by both inputs having arithmetic state). Do this by discarding the pending states on the inputs, intersecting them (to keep only the arithmetic state), then union in a pending nan state (which might do nothing, if it's arithmetic).
If the above sounds confusing, keep in mind that when a value is arithmetic, the act of performing the "NaN canonicalization" is a no-op. Thus, being arithmetic cancels out pending NaN states.
Unfortunately, this is quite buggy. For something as simple as F32Sub, to combine two ExtraInfos, we want to add a new pending_f32_nan(), unless both of the inputs are arithmetic_f32(). In this commit, we incorrectly calculate that we don't need a pending_f32_nan if either one of the inputs was arithmetic_f32().
It seemed like a good idea at the time, but in practice we discard the extra info all or almost all of the time.
This also introduces a new bug. In an operation like multiply, it's valid to multiply two values, one with a pending NaN and one without. As written, in the SIMD case (because of the two kinds of pending in play), we assert.
1002: Update the LLVM pass list. r=nlewycky a=nlewycky
# Description
Adds optimizations of loops, and inlinling and some simple interprocedural optimization.
Measured on the libsodium benchmarks, the new pass pipeline is a 2.35% geomean improvement. No major performance regressions known.
Co-authored-by: Nick Lewycky <nick@wasmer.io>
By exposing the target information through `CompilerConfig`,
compiler(only LLVM at the moment) could create a machine with
different CPU feature flags other than current host, which makes it
capable to "cross compile" to some degree.
Update #959
911: Don't emit bounds checks when the offset is known at compile time to be less than the minimum memory size. r=nlewycky a=nlewycky
Co-authored-by: Nick Lewycky <nick@wasmer.io>
This patch updates all the backends to use the definition of
`Trampoline` as defined in the `wasmer_runtime_core::typed_func`
module. That way, there is no copy of that type, and as such, it is
easier to avoid regression (a simple `cargo check` does the job).
This patch also formats the `use` statements in the updated files.
901: Set target triple and datalayout when creating the LLVM module. r=nlewycky a=nlewycky
We were giving LLVM a triple and datalayout only when producing native code from the LLVM IR. With this change, we tell LLVM as early as possible so that the entire optimization stack knows that it's safe to use target-specific constructs (including target intrinsics `@llvm.x86.sse2.ucomieq.sd`) as well as cost models (for autovectorization) and knowing the bitwidth of the registers so that we can know it's profitable to eliminate redundant extend/truncate casts.
Co-authored-by: Nick Lewycky <nick@wasmer.io>