594: Improvement cache in Windows r=syrusakbary a=syrusakbary
Caching was disabled on Windows, but can be re-enabled easily by improving the folder cache naming.
Reason why caching was disabled on Windows: We use a very long string (64 chars) for the wasmer version (hash). But we can use the version directly (no need to hashing)
Co-authored-by: Syrus Akbary <me@syrusakbary.com>
Before this change, 'wasmer run --backend=llvm some-simd.wasm' would run without complaint.
Also, note that the flag is not part of the cache key, so after any successful run, we can run it again without passing the flag.
Not handled here is @llvm.minnum and @llvm.maxnum which should be replaced with
@llvm.minimum and @llvm.maximum, but using those currently leads to LLVM backend
fatal errors.
These is one test failure remaining with V128 global variables.
* Fix trunc_sat. We need both the largest float that can be converted to an int
and the largest int, they are not the same number.
* Implement calling of functions that take V128 by passing in two i64's.
* Improve support for V128 in spectests. Parse binary modules with the same
features as the outer spectest. Fix compilation error involving Result in
emitted .rs file. Handle V128 in more cases when producing .rs file. Parse
the wast script with SIMD enabled.
* Adjust the WAVM spectest so that it parses with WABT and mostly passes with
wasmer. Wabt is particular about ints not having decimal places and floats
having decimal places. Wasmer does not support mutable globals or shared
memory. Tests of shuffles are disabled. Some assert_invalid tests that wabt
won't even parse are disabled.
"Then thou must count to three. Three shall be the number of the counting and the number of the counting shall be three. Four shalt thou not count, neither shalt thou count two, excepting that thou then proceedeth to three."
Fix typo in function name. Use two fcmp instructions instead of unpacking the bits of the IEEE float and using integer arithmetic to determine details about its value.
A few notes:
a) the inliner doesn't help because all the calls are indirect and not even opt -O2 can figure out which functions they're actually calling.
b) aggressive instruction combining is not a super-set of the instruction combiner. Instcombine is made up of a large number (probably 10,000s) of patterns, and some particularly slow ones were taken out and moved to the aggressive instruction combiner. Aggressive instcombine *only* runs that handful of optimizations, which fired zero times on our example wasm files.
c) NewGVN is not ready for production, it has asserts that fire when building sqlite or cowsay. This is why sqlite didn't build with the llvm backend.
d) Scalar-replacement-of-aggregates (sroa) is a strict superset of promote-memory-to-registers (mem2reg), and you probably want sroa because it's usually faster. It also fires 10,000s more times than mem2reg on lua.wasm.
e) Aggressive-dead-code-elimination was only deleting as much regular dead-code-elimination, but is slower because it depends on a postdominator tree (PDT) analysis that. Other passes don't need PDT so we'll have to build it for just this one pass (as opposed to regular dominator-tree which is reused by many passes). I've replaced this with bit-tracking dead-code-elimination which deletes more code than dce/adce.
543: update version numbers to 0.5.5 r=MarkMcCaskey a=MarkMcCaskey
544: Use bitcast instead of alloca+load+ptrcast+store sequence. r=MarkMcCaskey a=nlewycky
Co-authored-by: Mark McCaskey <mark@wasmer.io>
Co-authored-by: Mark McCaskey <markmccaskey@users.noreply.github.com>
Co-authored-by: Nick Lewycky <nick@wasmer.io>
Co-authored-by: nlewycky <nick@wasmer.io>