llm-quantization

Here are 20 public repositories matching this topic...

snu-mllab / GuidedQuant

Official PyTorch implementation of "GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance" (ICML 2025)

quantization efficient-inference large-language-models llm-inference llm-quantization

Updated Apr 13, 2026
Python

zlaabsi / opentq

Open quantization tooling for TurboQuant-style low-bit LLM releases, stock GGUF deployment, and Apple Silicon runtime experiments.

apple toolkit tooling tensor quantization apple-silicon llm llm-inference gguf llm-quantization turboquant

Updated May 5, 2026
Python

GongCheng1919 / bias-compensation

[CAAI AIR'24] Minimize Quantization Output Error with Bias Compensation

post-training-quantization llm-compression output-error-optimization bias-compensation llm-quantization

Updated Mar 12, 2025
Python
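
The core idea of compensating quantization output error through the bias can be illustrated with the classic bias-correction trick. After a linear layer's weights are quantized, its bias is shifted by the mean output error measured on calibration inputs. This is a minimal sketch that assumes a plain linear layer, round-to-nearest uniform weight quantization, and a few calibration vectors. The helper names (`quantize_rtn`, `compensate_bias`, `output_mse`) are hypothetical, and this is not the paper's exact procedure.

```python
# Hypothetical helpers illustrating generic bias correction, not the paper's method.

def quantize_rtn(w, bits=4):
    """Symmetric round-to-nearest quantization of a weight matrix (list of rows), one per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1
    absmax = max(abs(v) for row in w for v in row) or 1.0
    scale = absmax / qmax
    return [[max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in row] for row in w]


def matvec(w, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def compensate_bias(w, w_q, b, calib):
    """Return b + mean over calib of (W - W_q) x. Among all additive per-output
    corrections, this one minimizes the mean squared output error on calib."""
    err_sum = [0.0] * len(w)
    for x in calib:
        err_sum = [s + a - c for s, a, c in zip(err_sum, matvec(w, x), matvec(w_q, x))]
    return [bi + s / len(calib) for bi, s in zip(b, err_sum)]


def output_mse(w, b, w_q, b_q, calib):
    """Mean squared difference between full-precision and quantized layer outputs."""
    total = 0.0
    for x in calib:
        y = [v + bi for v, bi in zip(matvec(w, x), b)]
        y_q = [v + bi for v, bi in zip(matvec(w_q, x), b_q)]
        total += sum((a - c) ** 2 for a, c in zip(y, y_q))
    return total / len(calib)


W = [[0.31, -0.72, 0.05], [1.10, 0.44, -0.38]]
b = [0.0, 0.1]
calib = [[1.0, 0.5, -0.2], [0.3, -1.0, 0.8], [0.9, 0.2, 0.4]]
W_q = quantize_rtn(W, bits=3)
b_c = compensate_bias(W, W_q, b, calib)
print(output_mse(W, b, W_q, b, calib), output_mse(W, b, W_q, b_c, calib))
```

Because leaving the bias unchanged is one of the candidate corrections, the compensated error can never be worse than the uncompensated one on the calibration set.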

Dookoo2 / SVSK

Q4 quantization method

llm llms llm-inference llm-quantization

Updated Apr 28, 2026
Python
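
The SVSK description gives no details of its Q4 method. For reference, here is a sketch of common block-wise 4-bit absmax quantization: one float scale per block and signed integer codes in [-8, 7]. It is similar in spirit to llama.cpp's Q4 formats but does not reproduce their bit layout. The block size of 32 and the function names are assumptions.

```python
import math


def quantize_q4_blocks(values, block_size=32):
    """Return a list of (scale, codes) pairs; each code is a signed 4-bit int in [-8, 7]."""
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block)
        scale = absmax / 7 if absmax > 0 else 1.0  # map the largest magnitude to +/-7
        codes = [max(-8, min(7, round(v / scale))) for v in block]
        blocks.append((scale, codes))
    return blocks


def dequantize_q4_blocks(blocks):
    """Rebuild approximate floats as scale * code, block by block."""
    return [scale * q for scale, codes in blocks for q in codes]


values = [math.sin(0.37 * i) for i in range(64)]
blocks = quantize_q4_blocks(values)
restored = dequantize_q4_blocks(blocks)
print(max(abs(a - b) for a, b in zip(values, restored)))
```

A per-block scale keeps one outlier from inflating the step size for the whole tensor. The cost is one extra float stored per block.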

Iro96 / TurboQuant-H

Deeper research into TurboQuant algorithms

machine-learning algorithms llm llm-quantization turboquant

Updated Apr 6, 2026
Python

nithya333 / Medi-LLM

A high-performance, memory-efficient healthcare framework that deploys fine-tuned large language models (LLMs) on edge devices, using a multi-agent system to provide personalized diagnostic reasoning, health education, and dietary planning.

lora multi-agent-systems qlora peft-fine-tuning-llm llm-quantization

Updated Sep 7, 2025
Jupyter Notebook

t81dev / ternary

Ternary quantization for LLMs: implements balanced ternary (T3_K) weights for 2.63-bit quantization, described as the first working solution for modern large language models.

balanced-ternary llama-cpp gguf llm-quantization ai-efficiency ternary-logic

Updated Nov 29, 2025
C++
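
The T3_K format itself is not described here, so the sketch below shows the general idea of balanced ternary weights instead. Weights are mapped to {-1, 0, +1} times a scale, and the trits are packed five per byte as a base-3 number, since 3^5 = 243 fits in 256. The absmean scaling rule (as popularized by BitNet b1.58) and all function names are assumptions, not the repository's code.

```python
def ternarize(weights, eps=1e-8):
    """Absmean ternarization: returns (scale, trits) with every trit in {-1, 0, 1}."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    trits = [max(-1, min(1, round(w / scale))) for w in weights]
    return scale, trits


def pack_trits(trits):
    """Pack 5 balanced trits per byte as a base-3 number (first trit = least significant digit)."""
    out = bytearray()
    for start in range(0, len(trits), 5):
        value = 0
        for t in reversed(trits[start:start + 5]):
            value = value * 3 + (t + 1)  # map {-1, 0, 1} to digits {0, 1, 2}
        out.append(value)
    return bytes(out)


def unpack_trits(data, count):
    """Inverse of pack_trits; drops padding digits beyond `count`."""
    trits = []
    for byte in data:
        for _ in range(5):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:count]


weights = [0.9, -0.05, -1.3, 0.4, 0.0, 0.7, -0.6, 0.02, 1.1, -0.9, 0.3, -0.2]
scale, trits = ternarize(weights)
packed = pack_trits(trits)
print(scale, trits, len(packed))
```

Five trits per byte costs 1.6 bits per weight, close to the log2(3) ≈ 1.585-bit information content of a ternary value.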

Kyworn / ShiftQuant

Shift-based post-training quantization analysis for LLMs (ShiftQuant paper)

python machine-learning research neural-networks llm-quantization

Updated Mar 28, 2026
Python
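
Shift-based quantization generally restricts weights to signed powers of two, so multiplying by a weight becomes a bit shift. The sketch below assumes a per-weight exponent clamped to [-8, 0] and integer activations, and it uses a fixed-point offset so that only left shifts are needed. It illustrates the general technique rather than the ShiftQuant paper's exact scheme, and the function names are hypothetical.

```python
import math


def quantize_pow2(w, min_exp=-8, max_exp=0):
    """Map w to sign * 2**k with integer k in [min_exp, max_exp]; zero maps to (0, None)."""
    if w == 0:
        return 0, None
    k = round(math.log2(abs(w)))  # nearest power of two in the log domain
    k = max(min_exp, min(max_exp, k))
    return (1 if w > 0 else -1), k


def shift_dot(x_int, qweights, min_exp=-8):
    """Dot product of integer activations with power-of-two weights using shifts and adds."""
    acc = 0
    for x, (sign, k) in zip(x_int, qweights):
        if sign == 0:
            continue
        shifted = x << (k - min_exp)  # x * 2**(k - min_exp), always a left shift
        acc = acc + shifted if sign > 0 else acc - shifted
    return acc / (1 << -min_exp)  # undo the fixed-point offset once at the end


weights = [0.5, -0.25, 0.3, 0.0, -1.2]
q = [quantize_pow2(w) for w in weights]
x = [3, -2, 5, 7, 1]
print(shift_dot(x, q))  # -> 2.25
```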

0DevDutt0 / EdgeMind

Production-grade LLM quantization, benchmarking, and edge deployment toolkit. Supports bitsandbytes INT8/INT4, GPTQ (Hessian calibration), AWQ (activation-aware), and GGUF (Q2_K–Q8_0). Four-dimensional benchmarking: perplexity, TPS/TTFT, VRAM profiling, and LLM-as-Judge quality scoring. RTX 5090 Blackwell sm_120 ready.

Updated Jun 14, 2026
Python
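
Perplexity, the first of the benchmark dimensions listed above, is the exponential of the mean negative log-likelihood that the model assigns to the reference tokens. This is a minimal sketch of that standard definition, not EdgeMind's code. The per-token log-probabilities would come from running the model, and the example values and helper names here are hypothetical.

```python
import math


def perplexity(token_logprobs):
    """exp(-mean natural-log probability) over the evaluated tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)


def relative_change_pct(baseline, candidate):
    """Percent change of a quantized model's perplexity relative to the baseline."""
    return 100.0 * (candidate - baseline) / baseline


baseline = perplexity([math.log(p) for p in (0.5, 0.25, 0.125)])
quantized = perplexity([math.log(p) for p in (0.45, 0.25, 0.1)])
print(baseline, quantized, relative_change_pct(baseline, quantized))
```

Lower is better: a quantized model that assigns lower probability to the reference tokens shows a positive relative change.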

hemantjuyal / LLM-Quantization-Lab

LLM quantization project built around `llama.cpp` + `Ollama` + `GGUF`

large-language-models llama-cpp ollama llm-quantization llama-models

Updated Mar 22, 2026
Python

MagicTeaMC / AutoGGUF

A tool for making GGUF files quickly

llm llamacpp llama-cpp gguf llm-quantization

Updated Jun 4, 2025
Python
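
Several projects on this page produce or consume GGUF files. Every GGUF file begins with a small fixed header: the magic bytes `GGUF`, a `uint32` format version, then `uint64` counts of tensors and metadata key/value pairs. This layout applies to GGUF versions 2 and 3; version 1 used 32-bit counts. The minimal reader below assumes a little-endian file and does not reflect AutoGGUF's own code. The function name and file path are hypothetical.

```python
import struct


def read_gguf_header(data: bytes):
    """Parse the fixed 24-byte GGUF header (assumes version 2 or 3, little-endian)."""
    if data[:4] != b"GGUF":
        raise ValueError("not a GGUF file (bad magic)")
    # "<IQQ": uint32 version, uint64 tensor count, uint64 metadata key/value count
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {"version": version, "tensor_count": tensor_count, "metadata_kv_count": kv_count}


# Usage with a real file (hypothetical path):
# with open("model.gguf", "rb") as f:
#     print(read_gguf_header(f.read(24)))
```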

Danny1218 / quantization-autopsy

Paired capability-level GGUF quantization fragility benchmark across Qwen2.5-3B and SmolLM2 1.7B.

benchmark model-evaluation llama-cpp qwen gguf llm-quantization smollm2

Updated Jun 25, 2026
Python

nagababumo / Quantization-in-Depth

pytorch quantization dequantization 2-bit hugging-face hugging-face-hub llm-quantization torch-quantization

Updated Jun 26, 2024
Jupyter Notebook
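
A common step when storing 2-bit quantized weights is packing four 2-bit codes into each byte, because most frameworks store tensors in whole bytes. This sketch assumes unsigned codes in 0..3 with the first value in the lowest bits. The bit order and function names are illustrative assumptions, not taken from the repository.

```python
def pack_2bit(values):
    """Pack unsigned 2-bit codes (0..3), four per byte, first value in the lowest bits."""
    if any(not 0 <= v <= 3 for v in values):
        raise ValueError("2-bit codes must be in 0..3")
    out = bytearray((len(values) + 3) // 4)
    for i, v in enumerate(values):
        out[i // 4] |= v << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(data, count):
    """Extract `count` 2-bit codes by shifting and masking with 0b11."""
    return [(data[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]


codes = [1, 0, 3, 1, 2, 2]
packed = pack_2bit(codes)
print(list(packed))  # -> [113, 10]
```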

violinmelody / CelestiaLLM

Local & lightweight LLM inference runtime in C++ with support for GGUF & quantization

open-source library opensource cpp17 mit-license cpp-library cpp-lib llm cpp-module llm-inference llm-local llm-tools llm-framework gguf llm-library llm-quantization llm-integration lightweight-llm

Updated Feb 27, 2026

JuiceB0xC0de / GWIQ-atlas

GWIQ-Atlas is a brain-atlasing and model-interpretability suite that combines per-layer census, compliance-behaviour tracing, SAE features, and quantization analyses for LLMs.

python transformers quantization sae sparse-autoencoder brain-atlas huggingface activation-analysis mechanistic-interpretability llm-analysis llm-quantization llm-interpretability model-atlas feature-census

Updated Jun 30, 2026
Python

Kyworn / PentaNet-v1.0

PentaNet extends BitNet's ternary quantization to pentanary {-2,-1,0,+1,+2}, improving perplexity by 6.4% at 124M params while preserving zero-multiplier arithmetic.

python machine-learning neural-networks model-optimization llm-quantization

Updated Apr 17, 2026
Python
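
Zero-multiplier arithmetic with pentanary weights works because multiplying by ±1 is a copy or negation and multiplying by ±2 is a one-bit left shift. The sketch below assumes a per-tensor absmean scale (borrowed from BitNet-style ternarization) and integer activations. It is an illustration of the arithmetic, not PentaNet's published method, and the function names are hypothetical.

```python
def pentanarize(weights, eps=1e-8):
    """Quantize to codes in {-2, -1, 0, 1, 2} times an assumed per-tensor absmean scale."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-2, min(2, round(w / scale))) for w in weights]
    return scale, codes


def multiplier_free_dot(x_int, codes):
    """Integer dot product with pentanary codes using only shifts, adds and subtractions."""
    acc = 0
    for x, c in zip(x_int, codes):
        if c == 0:
            continue
        term = x << 1 if abs(c) == 2 else x  # x * 2 as a left shift, else x itself
        acc = acc + term if c > 0 else acc - term
    return acc


scale, codes = pentanarize([0.8, -0.3, 0.05, 0.4, -0.9])
x = [3, -1, 4, 1, -5]
print(codes, multiplier_free_dot(x, codes) * scale)
```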

Ealow1971 / low-latency-inference-engine

A high-performance inference engine optimized for deploying quantized LLMs on edge devices. Focuses on SIMD optimizations and memory management.

machine-learning performance ai deep-learning cpp mlops edge-ai senior-engineer llm-quantization

Updated Apr 9, 2026
C++

aioffgrid / OVForge

OpenVINO Model Manager: a desktop GUI for Intel Arc

linux-gui openvino intel-gpu nncf pyqt6 intel-arc llm-tools optimum-intel llm-quantization llm-conversion

Updated Jun 22, 2026
Python

paraglondhe098 / sentiment-classification-llm

Implemented and fine-tuned BERT for a custom sequence classification task, leveraging LoRA adapters for efficient parameter updates and 4-bit quantization to optimize performance and resource utilization.

nlp lora quantization data-augmentation nlp-augmentation llm qlora llm-fine-tuning peft-fine-tuning-llm llm-quantization

Updated Dec 30, 2024
Jupyter Notebook

samarthamp / advanced-nlp-course-projects

Implementation of advanced Natural Language Processing architectures and optimization techniques, built from scratch. The projects focus on understanding the internal mechanics of Transformers, LLM efficiency through quantization, and scaling via Mixture-of-Experts (MoE).

load-balancing mixture-of-experts transformer-architecture positional-encoding llm-fine-tuning llm-quantization

Updated Jan 8, 2026
Python
