Implementation of FlashAttention-2 for Nvidia Tesla V100

Updated Mar 22, 2026 · CUDA
Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA, using CUDA cores for the decoding stage of LLM inference.
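The MHA/MQA/GQA variants mentioned above differ only in how many key/value heads the query heads share. As a rough illustration (not this repository's kernel, which runs on CUDA cores), here is a minimal NumPy sketch of one decode step, where a single new query token attends over the cached keys and values; `gqa_decode` and its parameter names are hypothetical:

```python
import numpy as np

def gqa_decode(q, K_cache, V_cache, n_kv_heads):
    """One decode step of grouped-query attention.
    q: (n_q_heads, d) -- query for the single new token, per head.
    K_cache, V_cache: (n_kv_heads, t, d) -- cached keys/values.
    n_kv_heads == n_q_heads -> MHA; n_kv_heads == 1 -> MQA; otherwise GQA.
    """
    n_q_heads, d = q.shape
    group = n_q_heads // n_kv_heads            # query heads per K/V head
    out = np.empty_like(q)
    for h in range(n_q_heads):
        K = K_cache[h // group]                # shared K/V head for this group
        V = V_cache[h // group]
        s = K @ q[h] / np.sqrt(d)              # (t,) scores vs. all cached keys
        s -= s.max()                           # stabilize softmax
        w = np.exp(s)
        w /= w.sum()
        out[h] = w @ V                         # (d,) attended value
    return out
```

In the decoding stage the query length is 1, so the work is dominated by streaming the K/V cache, which is why decode-specialized kernels are worthwhile.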
⚡ Optimize attention mechanisms with FlashMLA, a library of advanced sparse and dense kernels for DeepSeek models, improving performance and efficiency.