Options
Expressions with =
write into an existing array, while those with :=
do not. This is the same notation as TensorOperations.jl and Einsum.jl. But unlike those packages, sometimes the result of @cast
is a view of the original, for instance @cast A[i,j] := B[j,i]
gives A = transpose(B)
. You can forbid this, and insist on a copy, by writing |=
instead (or passing the option collect
).
Various other options can be given after the main expression.
Ways of slicing
The default way of slicing creates an array of views, but if you use |=
, or the option lazy=false
, then instead then you get copies:
M = rand(1:99, 3,4)
@cast S[k][i] := M[i,k] # collect(eachcol(M)) ≈ [view(M,:,k) for k in 1:4]
@cast S[k][i] |= M[i,k] # [M[:,k] for k in 1:4]
@cast S[k][i] := M[i,k] lazy=false # the same
The default way of un-slicing uses LazyStack.jl to create a view. The keyword lazy=false
after the expression will turn this off, to make a solid array using Base's code:
@cast A[i,k] := S[k][i] # A = LazyStack.stack(B)
@cast A[i,k] := S[k][i] lazy=false # A = reduce(hcat, B)
size(A) == (3, 4) # true
Another kind of slices are provided by StaticArrays.jl, in which a Vector of SVectors is just a different interpretation of the same memory as a Matrix. By another slight abuse of notation, such slices are written here as curly brackets:
@cast S[k]{i} := M[i,k] i in 1:3 # S = reinterpret(SVector{3,Int}, vec(M))
@cast S[k] := M{:3, k} # equivalent
@cast R[k,i] := S[k]{i} # such slices can be reinterpreted back again
Both S
and R
here are views of the original matrix M
. When creating such slices, their size ought to be provided, either as a literal integer or through the types. Note that you may also write S[k]{i:3} = ...
.
The second notation (with M{:,k}
) is useful for mapslices. Continuing the example:
M10 = rand(10,1000);
mapslices(cumsum, M10, dims=1) # 630 μs using @btime
@cast _[i,j] := cumsum(M10[:,j])[i] # 64 μs
@cast _[i,j] := cumsum(M10{:10,j})[i] # 38 μs
Better broadcasting
The package Strided.jl can apply multi-threading to broadcasting, and some other magic. You can enable it like this:
using Strided # and export JULIA_NUM_THREADS = 4 before starting
A = randn(4000,4000); B = similar(A);
@time @cast B[i,j] = (A[i,j] + A[j,i])/2; # 0.12 seconds
@time @cast @strided B[i,j] = (A[i,j] + A[j,i])/2; # 0.025 seconds
The package LoopVectorization.jl provides a macro @turbo
which modifies broadcasting to use vectorised instructions. This can be used as follows:
using LoopVectorization, BenchmarkTools
C = randn(40,40);
D = @btime @cast _[i,j] := exp($C[i,j]); # 13 μs
D′ = @btime @cast @turbo _[i,j] := exp($C[i,j]); # 3 μs
D ≈ D′
When broadcasting and then summing over some directions, it can be faster to avoid creating the entire array, then throwing it away. This can be done with the package LazyArrays.jl which has a lazy BroadcastArray
. In the following example, the product V .* V' .* V3
contains about 1GB of data, the writing of which is avoided by giving the option lazy
:
using LazyArrays
V = rand(500); V3 = reshape(V,1,1,:);
@time @reduce W[i] := sum(j,k) V[i]*V[j]*V[k]; # 0.6 seconds, 950 MB
@time @reduce @lazy W[i] := sum(j,k) V[i]*V[j]*V[k]; # 0.025 s, 5 KB
However, right now this gives 2.1 seconds (16 allocations: 4.656 KiB)
, so it seems to be saving memory but not time. Something is broken!
Less lazy
To disable the default use of PermutedDimsArray
-like wrappers, etc, give the option lazy=false
:
@pretty @cast Z[y,x] := M[x,-y] lazy=false
# Z = transmutedims(reverse(M, dims = 2), (2, 1))
@pretty @cast Z[y,x] := M[x,-y]
# Z = transmute(Reverse{2}(M), (2, 1)) # transpose(@view M[:, end:-1:begin])
This also controls how the extraction of diagonal elements and creation of diagonal matrices are done:
@pretty @cast M[i,i] := A[i,i] lazy=false
# M = diagm(0 => diag(A))
@pretty @cast D[i,i] := A[i,i]
# D = Diagonal(diagview(A)) # Diagonal(view(A, diagind(A)))
@pretty @cast M[i,i] = A[i,i] lazy=false # into diagonal of existing matrix M
# diagview(M) .= diag(A); M
Gradients
I moved some code here from SliceMap.jl, which I should describe!