• dragontamer@lemmy.world
    link
    fedilink
    English
    arrow-up
    5
    ·
    edit-2
    1 year ago

    Professor Lemire btw, is a high-performance professor who has been doing a lot of AVX512 techniques / articles for the past few years. His blogposts are very popular on Hacker News (news.ycombinator.com). Pretty cool guy, I think its well worth it to follow his blog if you’re into low-level assembly, low-level memory optimizations and the like.


    pext (and the reverse, pdep) are basically a 64-bit bitwise gather and 64-bit bitwise scatter instruction. On Intel, they execute in 1-tick, but on AMD they execute on 19-ticks (at least, a few years ago). Rumor is that the newest AMD chips are faster at it.

    pdep and pext are some of my favorite functions, because gather/scatter is an important supercomputer / parallelism concept, and Intel invented an extremely elegant way to describe bit-movement in 64-bit registers. Given the huge importance of gather/scatter is to supercomputer algorithms of the past 40 years, I expect many, many more applications of pdep/pext.

    My own experiments with pdep and pext was to create a small-sized bit-scale relational database for solving 4-coloring theorem (like) problems. I was able to implement “select” with a pext, and “joins” as a pdep. (4-bits is a single-column table. 16-bits for a dual-column table. 64-bits for a triple-column table).

    • liam@lemmy.ca
      link
      fedilink
      English
      arrow-up
      1
      ·
      1 year ago

      I agree he’s doing very cool work! I’m just waiting for the day I get to use some of his tidbits! :)