{"id":641,"date":"2025-10-15T17:09:01","date_gmt":"2025-10-15T09:09:01","guid":{"rendered":"https:\/\/189505.xyz\/?p=641"},"modified":"2025-11-08T22:23:54","modified_gmt":"2025-11-08T14:23:54","slug":"cutlass-cute","status":"publish","type":"post","link":"https:\/\/189505.xyz\/?p=641","title":{"rendered":"cutlass cute"},"content":{"rendered":"<p><a href=\"https:\/\/www.youtube.com\/watch?v=vzUhbDO_0qk\">https:\/\/www.youtube.com\/watch?v=vzUhbDO_0qk<\/a><br \/>\n<a href=\"https:\/\/zhuanlan.zhihu.com\/p\/662089556\">https:\/\/zhuanlan.zhihu.com\/p\/662089556<\/a><br \/>\n<a href=\"https:\/\/zhuanlan.zhihu.com\/p\/661182311\">https:\/\/zhuanlan.zhihu.com\/p\/661182311<\/a><\/p>\n<h1><span class=\"ez-toc-section\" id=\"coalesce\"><\/span>coalesce<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<pre><code>auto layout = Layout&lt;Shape &lt;_2,Shape &lt;_1,_6&gt;&gt;,\n                     Stride&lt;_1,Stride&lt;_6,_2&gt;&gt;&gt;{};\nauto result = coalesce(layout);    \/\/ _12:_1<\/code><\/pre>\n<p>The CuTe documentation puts it this way:<br \/>\nGeneralizing, consider a layout with just two integral modes, s0:d0 and s1:d1. Denote the result of coalescing this layout as s0:d0 ++ s1:d1. Then, there are four cases:<\/p>\n<ul>\n<li>s0:d0  ++  _1:d1  =&gt;  s0:d0. Ignore modes with size static-1.<\/li>\n<li>_1:d0  ++  s1:d1  =&gt;  s1:d1. Ignore modes with size static-1.<\/li>\n<li>s0:d0  ++  s1:s0*d0  =&gt;  s0*s1:d0. If the second mode\u2019s stride is the product of the first mode\u2019s size and stride, then they can be combined.<\/li>\n<li>s0:d0  ++  s1:d1  =&gt;  (s0,s1):(d0,d1). Else, nothing can be done and they must be treated separately.<\/li>\n<\/ul>\n<p>That\u2019s it! 
We can flatten any layout and apply the above binary operation to each pair of adjacent modes in order to \u201ccoalesce\u201d the modes of the layout.<br \/>\nSo we first flatten (2,(1,6)):(1,(6,2)) to (2,1,6):(1,6,2) and then apply the four rules above: the size-1 mode is dropped, and 2:1 ++ 6:2 merges into 12:1 because the stride 2 equals 2*1.<\/p>\n<h1><span class=\"ez-toc-section\" id=\"1D_coord_and_natural_coord\"><\/span>1D coord and natural coord<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<pre><code>  {\n\/*\nsize=  4    , 8   , 6\nsize=        32   , 6\nshape=((2,2),(4,2),(2,3))\n\nWork from right to left: the rightmost mode gets I \/ (product of the sizes of all modes to its left); the modes to its left get I % (that same product)\n191-&gt;(1,1),(3,1),(1,2)\n1D to natural coord\n   I%32,I\/32\n -&gt;(31, 5)\n   (31,(5%2,5\/2))\n   (31,(1,2))\n -&gt;(31%4, 31\/4,(1,2))\n -&gt;(3,    7,   (1,2))\n-&gt;((3%2,3\/2),(7%4,7\/4),(1,2))\n-&gt;((1  ,1  ),(3,  1  ),(1,2))\n\nMultiply from right to left: each mode is scaled by the product of the sizes of all modes to its left\n   natural coord to 1D\n  ((i,j),(k,l),(m,n))\n-&gt;(i,j)+(k,l)*4+(m,n)*32\n-&gt;(i+j*2)+(k+l*4)*4+(m+n*2)*32\n*\/\n  printf(&quot;test====================test idx2crd_v2\\n&quot;);\n  auto shape_v2 = cute::Shape&lt;cute::Shape&lt;_2,_2&gt;, cute::Shape&lt;_4,_2&gt;, cute::Shape&lt;_2,_3&gt;&gt;{};\n  \/\/ Total number of elements in the shape: 4 * 8 * 6 = 192.\n  int sz_shape = cute::size(shape_v2);\n  printf(&quot;sz_shape:%d\\n&quot;, sz_shape); print(shape_v2); printf(&quot;\\n&quot;);\n  \/\/ Print the natural coordinate of every 1D index in [0, sz_shape).\n  for (int i = 0; i &lt; sz_shape; i++) {\n    printf(&quot;%03d |&quot;, i); print(idx2crd(i, shape_v2)); printf(&quot;\\n&quot;);\n  }\n  printf(&quot;\\n&quot;);\n  printf(&quot;test====================TEST IDX2CRD_V2\\n&quot;);\n  }\n<\/code><\/pre>\n<p>output:<br \/>\n<a 
href=\"https:\/\/static.189505.xyz\/\/blogTexts\/cutlass\/idx2crd_v2.txt\">https:\/\/static.189505.xyz\/\/blogTexts\/cutlass\/idx2crd_v2.txt<\/a><\/p>\n<h2><span class=\"ez-toc-section\" id=\"another_example\"><\/span>another example<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<pre><code>Shape division: (3,6,2,8):(w,x,y,z) \/ 72\n\nshape (3,6,2,8) \/ 72 =&gt; (1,1,1,4)\n  Go from left to right with a running divisor that starts at 72.\n  If the size of a mode does not exceed the running divisor, write 1 and divide\n  the running divisor by that size: 72\/3 = 24, 24\/6 = 4, 4\/2 = 2.\n  If the size of a mode exceeds the running divisor, write the quotient (8\/2 = 4)\n  and stop dividing; the modes to its right stay untouched.\n\nstride coefficient: each stride is multiplied by the running divisor at its mode\n  size              3    6    2    8\n  running divisor  72   24    4    2\n  (3,6,2,8):(w,x,y,z) \/ 72 =&gt; (1,1,1,4):(72*w,24*x,4*y,2*z)\n\nS % X keeps the first X elements, taking modes from left to right.\n  While the size of a mode does not exceed the remaining count, keep the whole mode\n  and divide the remaining count by its size.\n  Once the size of a mode exceeds the remaining count, clip it to that count;\n  every mode to its right becomes 1.\n  (6,2) % 2 =&gt; (2,1)          6 &gt; 2: clip to 2, the rest is 1\n  (6,2) % 12 =&gt; (6,2)         6 &lt; 12: keep 6, then 12\/6 = 2: keep 2\n  (3,6,2,8) % 6 =&gt; (3,2,1,1)  3 &lt; 6: keep 3, then 6\/3 = 2: clip 6 to 2, the rest are 1\n  (3,6,2,8) % 9 =&gt; (3,3,1,1)  3 &lt; 9: keep 3, then 9\/3 = 3: clip 6 to 3, the rest are 1
<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/www.youtube.com\/watch?v=vzUhbDO_0qk https:\/\/zhu &#8230; <a title=\"cutlass cute\" class=\"read-more\" href=\"https:\/\/189505.xyz\/?p=641\" aria-label=\"More on cutlass cute\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/189505.xyz\/index.php?rest_route=\/wp\/v2\/posts\/641"}],"collection":[{"href":"https:\/\/189505.xyz\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/189505.xyz\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/189505.xyz\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/189505.xyz\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=641"}],"version-history":[{"count":8,"href":"https:\/\/189505.xyz\/index.php?rest_route=\/wp\/v2\/posts\/641\/revisions"}],"predecessor-version":[{"id":650,"href":"https:\/\/189505.xyz\/index.php?rest_route=\/wp\/v2\/posts\/641\/revisions\/650"}],"wp:attachment":[{"href":"https:\/\/189505.xyz\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=641"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/189505.xyz\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=641"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/189505.xyz\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=641"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}