We want to ignore the incoming pending NaN state (since the pending will propagate to the output if there was one on the input), and we want to add a new pending NaN state if we can (that is to say, if it isn't cancelled out by both inputs having arithmetic state). Do this by discarding the pending states on the inputs, intersecting them (to keep only the arithmetic state), then union in a pending nan state (which might do nothing, if it's arithmetic).
If the above sounds confusing, keep in mind that when a value is arithmetic, the act of performing the "NaN canonicalization" is a no-op. Thus, being arithmetic cancels out pending NaN states.
Unfortunately, this is quite buggy. For something as simple as F32Sub, to combine two ExtraInfos, we want to add a new pending_f32_nan(), unless both of the inputs are arithmetic_f32(). In this commit, we incorrectly calculate that we don't need a pending_f32_nan if either one of the inputs was arithmetic_f32().
It seemed like a good idea at the time, but in practice we discard the extra info all or almost all of the time.
This also introduces a new bug. In an operation like multiply, it's valid to multiply two values, one with a pending NaN and one without. As written, in the SIMD case (because of the two kinds of pending in play), we assert.
1002: Update the LLVM pass list. r=nlewycky a=nlewycky
# Description
Adds optimizations of loops, and inlinling and some simple interprocedural optimization.
Measured on the libsodium benchmarks, the new pass pipeline is a 2.35% geomean improvement. No major performance regressions known.
Co-authored-by: Nick Lewycky <nick@wasmer.io>
By exposing the target information through `CompilerConfig`,
compiler(only LLVM at the moment) could create a machine with
different CPU feature flags other than current host, which makes it
capable to "cross compile" to some degree.
Update #959
911: Don't emit bounds checks when the offset is known at compile time to be less than the minimum memory size. r=nlewycky a=nlewycky
Co-authored-by: Nick Lewycky <nick@wasmer.io>
This patch updates all the backends to use the definition of
`Trampoline` as defined in the `wasmer_runtime_core::typed_func`
module. That way, there is no copy of that type, and as such, it is
easier to avoid regression (a simple `cargo check` does the job).
This patch also formats the `use` statements in the updated files.
901: Set target triple and datalayout when creating the LLVM module. r=nlewycky a=nlewycky
We were giving LLVM a triple and datalayout only when producing native code from the LLVM IR. With this change, we tell LLVM as early as possible so that the entire optimization stack knows that it's safe to use target-specific constructs (including target intrinsics `@llvm.x86.sse2.ucomieq.sd`) as well as cost models (for autovectorization) and knowing the bitwidth of the registers so that we can know it's profitable to eliminate redundant extend/truncate casts.
Co-authored-by: Nick Lewycky <nick@wasmer.io>
904: Use getelementptr instruction instead of int_to_ptr and ptr_to_int. r=nlewycky a=nlewycky
The main part of this change is that we no longer turn pointers into integers to do arithmetic on them, then turn them back into pointers. Doing so is a signal to LLVM that it should not attempt to analyze the provenance of the pointers, disabling some optimizations. Using getelementptr allows us to perform arithmetic on pointers while keeping them in pointer types, which LLVM can then analyze.
Most of the textual change is a refactoring to reorder the operations. Previously the bounds checking and determining of the base and bounds were combined because you could put both into the same match, since both actions are performed differently depending on whether the memory is static or dynamic. In this case, we simply check the type twice and do two different things, with comments labelling the steps.
Co-authored-by: Nick Lewycky <nick@wasmer.io>