CUTLASS CuTe

https://www.youtube.com/watch?v=vzUhbDO_0qk
https://zhuanlan.zhihu.com/p/662089556
https://zhuanlan.zhihu.com/p/661182311

coalesce

auto layout = Layout<Shape <_2,Shape <_1,_6>>,
                     Stride<_1,Stride<_6,_2>>>{};
auto result = coalesce(layout);    // _12:_1

CuTe's explanation:
Generalizing, consider a layout with just two integral modes, s0:d0 and s1:d1. Denote the result of coalescing this layout as s0:d0 ++ s1:d1. Then, there are four cases:

  • s0:d0 ++ _1:d1 => s0:d0. Ignore modes with size static-1.
  • _1:d0 ++ s1:d1 => s1:d1. Ignore modes with size static-1.
  • s0:d0 ++ s1:s0*d0 => s0*s1:d0. If the second mode’s stride is the product of the first mode’s size and stride, then they can be combined.
  • s0:d0 ++ s1:d1 => (s0,s1):(d0,d1). Else, nothing can be done and they must be treated separately.

That’s it! We can flatten any layout and apply the above binary operation to each pair of adjacent modes in order to “coalesce” the modes of the layout.
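So we first flatten (2,(1,6)):(1,(6,2)) into (2,1,6):(1,6,2) and then apply the four rules above: 2:1 ++ 1:6 => 2:1 (the size-1 mode is dropped), then 2:1 ++ 6:2 => 12:1 (the stride 2 equals 2*1, so the modes merge).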
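
The four rules can be sketched in plain C++ on an already-flattened layout. This is only an illustration of the folding step, not CuTe's implementation: `Mode` and `coalesce_flat` are hypothetical names, sizes and strides are runtime integers rather than static `_N` constants, and returning 1:0 when every mode has size 1 is an assumption of this sketch.

```cpp
#include <cassert>
#include <vector>

// One mode of an already-flattened layout: size s and stride d.
struct Mode {
    long size;
    long stride;
};

// Hypothetical helper (not a CuTe API): fold the four binary rules over
// adjacent modes, left to right.
std::vector<Mode> coalesce_flat(const std::vector<Mode>& modes) {
    std::vector<Mode> out;
    for (const Mode& m : modes) {
        assert(m.size >= 1);
        if (m.size == 1) {
            continue;  // rules 1 and 2: ignore size-1 modes
        }
        if (!out.empty() && m.stride == out.back().size * out.back().stride) {
            out.back().size *= m.size;  // rule 3: s0:d0 ++ s1:s0*d0 => s0*s1:d0
        } else {
            out.push_back(m);           // rule 4: keep the modes separate
        }
    }
    if (out.empty()) {
        out.push_back({1, 0});  // assumption: an all-size-1 layout becomes 1:0
    }
    return out;
}
```

Calling `coalesce_flat` on the modes (2,1,6):(1,6,2) leaves a single mode 12:1, which matches the `_12:_1` result in the snippet above.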

1D coord and natural coord

  {
/*
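shape = ((2,2),(4,2),(2,3))
size  =    4  ,  8   ,  6       (size of each top-level mode)
size  =       32     ,  6       (first two modes together, then the last mode)

1D index to natural coordinate, worked for index 191:
   191 -> ((1,1),(3,1),(1,2))
Work from right to left: the rightmost mode gets the index divided by the
product of the sizes of all modes to its left; the remainder (index % that
product) is what is left for the modes on the left.
   (191%32, 191/32)
 ->(31, 5)
   (31,(5%2,5/2))
   (31,(1,2))
 ->(31%4, 31/4,(1,2))
 ->(3,    7,   (1,2))
 ->((3%2,3/2),(7%4,7/4),(1,2))
 ->((1  ,1  ),(3,  1  ),(1,2))

Natural coordinate to 1D index:
Again from right to left, multiply each mode's coordinate by the product of
the sizes of all modes to its left, then add everything up.
   ((i,j),(k,l),(m,n))
 ->(i,j)+(k,l)*4+(m,n)*32
 ->(i+j*2)+(k+l*4)*4+(m+n*2)*32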
*/
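  // Assumes `using namespace cute;` is in effect so that _2, _3, _4 resolve.
  printf("test====================test idx2crd_v2\n");
  auto shape_v2 = cute::Shape<cute::Shape<_2,_2>, cute::Shape<_4,_2>, cute::Shape<_2,_3>>{};
  // Total number of elements: 4 * 8 * 6 = 192 (the static size converts to int).
  int sz_shape = cute::size(shape_v2);
  printf("sz_shape:%d\n", sz_shape); print(shape_v2); printf("\n");
  // Print the natural coordinate of every 1D index.
  for (int i = 0; i < sz_shape; i++) {
    printf("%03d |", i); print(idx2crd(i, shape_v2)); printf("\n");
  }
  printf("\n");
  printf("test====================TEST IDX2CRD_V2\n");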
  }

output:
https://static.189505.xyz//blogTexts/cutlass/idx2crd_v2.txt
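
The arithmetic in the comment above can also be checked without CuTe. The following is a minimal sketch on the flattened shape (2,2,4,2,2,3); `idx2crd_flat` and `crd2idx_flat` are hypothetical helper names, not CuTe functions. It peels digits from left to right with `%` and `/`, which gives the same digits as the right-to-left derivation in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the CuTe API): 1D index -> natural coordinate
// of a flattened shape, with the leftmost mode varying fastest.
std::vector<int> idx2crd_flat(int idx, const std::vector<int>& shape) {
    std::vector<int> crd(shape.size());
    for (std::size_t m = 0; m < shape.size(); ++m) {
        crd[m] = idx % shape[m];  // digit for this mode
        idx /= shape[m];          // what remains for the modes to the right
    }
    return crd;
}

// Hypothetical helper: natural coordinate -> 1D index. Each coordinate is
// scaled by the product of the sizes of all modes to its left.
int crd2idx_flat(const std::vector<int>& crd, const std::vector<int>& shape) {
    assert(crd.size() == shape.size());
    int idx = 0;
    int scale = 1;  // product of the sizes of all modes to the left
    for (std::size_t m = 0; m < shape.size(); ++m) {
        idx += crd[m] * scale;
        scale *= shape[m];
    }
    return idx;
}
```

For index 191 and shape (2,2,4,2,2,3), `idx2crd_flat` gives (1,1,3,1,1,2), the flattened form of ((1,1),(3,1),(1,2)), and `crd2idx_flat` maps it back to 191.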

another example

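Dividing a layout by an integer: (3,6,2,8):(w,x,y,z) / 72

Shape: go from left to right and keep track of the remaining divisor (it starts at 72).
  • If the mode's size is not larger than the remaining divisor, write 1 and divide the remaining divisor by that size.
  • At the first mode whose size is larger than the remaining divisor, write the quotient (size / remaining divisor) and stop dividing; the modes to its right are left untouched.

  (3,6,2,8) / 72 => (1,1,1,4)

Stride coefficient: each stride is multiplied by the divisor that remains when its mode is reached.

  size                 3     6          2         8
  remaining divisor    72    72/3=24    24/6=4    4/2=2

  (3,6,2,8):(w,x,y,z) / 72 => (1,1,1,4):(72*w,24*x,4*y,2*z)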
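
The division rule can be sketched on a flat shape as follows. This is a plain C++ illustration with runtime integers, not CuTe's implementation; `DivResult` and `shape_div_flat` are hypothetical names, and the asserts encode the assumption that at every mode either the size divides the remaining divisor or the remaining divisor divides the size.

```cpp
#include <cassert>
#include <vector>

// Result of dividing a flat shape by an integer: the new shape and the
// factor by which each mode's stride is multiplied.
struct DivResult {
    std::vector<long> shape;
    std::vector<long> stride_scale;
};

// Hypothetical helper (not a CuTe API).
DivResult shape_div_flat(const std::vector<long>& shape, long divisor) {
    DivResult r;
    long rest = divisor;  // remaining divisor
    for (long s : shape) {
        r.stride_scale.push_back(rest);  // coefficient = divisor left at this mode
        if (rest >= s) {
            assert(rest % s == 0);       // assumption: the size divides the divisor
            r.shape.push_back(1);        // mode fully consumed
            rest /= s;
        } else {
            assert(s % rest == 0);       // assumption: the divisor divides the size
            r.shape.push_back(s / rest); // write the quotient
            rest = 1;                    // modes to the right stay untouched
        }
    }
    return r;
}
```

For (3,6,2,8) and 72 this gives the shape (1,1,1,4) and the stride coefficients (72,24,4,2). The coefficients that land on size-1 modes do not affect the layout function, since a size-1 mode only ever has coordinate 0.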

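  %X means "keep X elements", taken from left to right:
  • While the product of the modes kept so far is still below X, keep the whole mode.
  • The mode at which the product would exceed X is clipped so that the product is exactly X.
  • Every mode to the right of that becomes 1.

  (6,2) % 2 => (2,1)            6 > 2, so keep 2 of the 6; the rest is 1
  (6,2) % 12 => (6,2)           6 < 12, keep all 6; then 6*2 = 12
  (3,6,2,8) % 6 => (3,2,1,1)    3 < 6, keep all 3; then keep 2 of the 6, 3*2 = 6; the rest are 1
  (3,6,2,8) % 9 => (3,3,1,1)    3 < 9, keep all 3; then keep 3 of the 6, 3*3 = 9; the rest are 1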
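
The modulo rule admits a similar sketch. As before, this is plain C++ on runtime integers for illustration only; `shape_mod_flat` is a hypothetical name, and the asserts state the divisibility assumption under which the rule is applied.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not a CuTe API): keep the first `keep` elements of a
// flat shape, taking modes from left to right.
std::vector<long> shape_mod_flat(const std::vector<long>& shape, long keep) {
    std::vector<long> out;
    long rest = keep;  // how many elements still have to be covered
    for (long s : shape) {
        if (rest >= s) {
            assert(rest % s == 0);  // assumption: the size divides what is left
            out.push_back(s);       // keep the whole mode
            rest /= s;
        } else {
            assert(s % rest == 0);  // assumption: what is left divides the size
            out.push_back(rest);    // clip the mode
            rest = 1;               // every later mode becomes 1
        }
    }
    return out;
}
```

With this sketch, `shape_mod_flat({3,6,2,8}, 9)` yields (3,3,1,1) and `shape_mod_flat({6,2}, 12)` yields (6,2), in line with the examples above.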
