news 2026/6/6 5:19:03

CANN/amct GPTQ Quantization Example


张小明


AMCT Large Model GPTQ Quantization

[Free download link] amct: AMCT is the model compression tool repository provided by CANN and tailored to Ascend AI processors. Project address: https://gitcode.com/cann/amct

1 Quantization Prerequisites

1.1 Install Dependencies

The dependencies for this sample are listed in requirements.txt.

Note that the torch_npu version must match the installed Python and torch versions, and that the CANN package must be installed.
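Since the version match between Python, torch, and torch_npu matters, it can help to print what is actually installed before running the sample. The following is a minimal sketch using only the Python standard library. The distribution names `torch` and `torch-npu` and the helper name `installed_versions` are assumptions for illustration; check requirements.txt for the names and versions the sample actually expects.

```python
import sys
from importlib import metadata


def installed_versions(names=("torch", "torch-npu")):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python:", sys.version.split()[0])
    for name, version in installed_versions().items():
        print(f"{name}:", version if version is not None else "not installed")
```

This only reports versions. Whether a given combination is compatible still has to be checked against the torch_npu and CANN release notes.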

1.2 Model and Dataset Preparation

This sample uses the Llama2-7b and Qwen2-7b models with the pileval data and the wikitext2 dataset as examples. The data is loaded online. Users must download the models themselves and specify the model path when running the script.

1.3 Simple Quantization Configuration

The quantization configuration used in this sample is built into the tool and can be obtained and used in the following ways:

INT4 weight-only quantization configuration:

from amct_pytorch import INT4_GPTQ_WEIGHT_QUANT_CFG

MXFP4_E2M1 weight-only quantization configuration:

cfg = {
    'batch_num': 1,
    'quant_cfg': {
        'weights': {
            'type': 'mxfp4_e2m1',
            'symmetric': True,
            'strategy': 'group',
            'group_size': 32
        },
    },
    'algorithm': {'gptq'},
    'skip_layers': {'lm_head'}
}

If you need to modify the detailed configuration, please refer to the documentation to construct the required quantization configuration dict.
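To make the 'strategy': 'group' and 'group_size': 32 settings in the MXFP4_E2M1 configuration above more concrete, here is a rough sketch of what group-wise MXFP4 rounding looks like on a single weight row. It assumes the usual microscaling layout: each group shares one power-of-two scale, and each element is stored as an FP4 E2M1 value. This is not AMCT's implementation, and it uses plain round-to-nearest rather than GPTQ's error-compensating weight updates. The function names are hypothetical, and ties are resolved toward the smaller magnitude for simplicity.

```python
import math

# Non-negative magnitudes representable by FP4 E2M1
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2  # exponent of the largest value: 6.0 = 1.5 * 2**2


def quantize_group_mxfp4(group):
    """Fake-quantize one group; return (dequantized values, shared scale)."""
    max_abs = max(abs(x) for x in group)
    if max_abs == 0.0:
        return [0.0] * len(group), 1.0
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1, so floor(log2) = e - 1.
    _, exponent = math.frexp(max_abs)
    scale = 2.0 ** ((exponent - 1) - E2M1_EMAX)
    dequantized = []
    for x in group:
        # Scale into the E2M1 range and clamp to the largest magnitude (6.0).
        scaled = min(abs(x) / scale, E2M1_MAGNITUDES[-1])
        # Round to the nearest representable magnitude.
        nearest = min(E2M1_MAGNITUDES, key=lambda m: abs(m - scaled))
        dequantized.append(math.copysign(nearest * scale, x))
    return dequantized, scale


def quantize_row_by_group(row, group_size=32):
    """Apply the group quantizer to consecutive slices of one weight row."""
    out = []
    for start in range(0, len(row), group_size):
        values, _ = quantize_group_mxfp4(row[start:start + group_size])
        out.extend(values)
    return out


if __name__ == "__main__":
    values, scale = quantize_group_mxfp4([0.1, -0.35, 0.8, 1.2])
    print(scale, values)
```

Because every group gets its own scale, one large weight only coarsens the resolution of the 32 values that share its group, not the whole row or tensor.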

The GPTQ algorithm only supports weight quantization. The supported quantization types and quantization configurations are:

| Field | Type | Description | Value Range | Notes |
| --- | --- | --- | --- | --- |
| batch_num | uint32 | Number of batches used for quantization | 1 | / |
| skip_layers | str | Layers to skip during quantization | / | Fuzzy matching is supported: when the configured string matches a layer name or is a substring of it, that layer is skipped and no quantization configuration is generated for it. The string must contain letters or digits |
| weights.type | str | Quantized weight type | 'int4'/'int8'/'float4_e2m1'/'mxfp4_e2m1' | / |
| weights.symmetric | bool | Whether quantization is symmetric | True/False | float4_e2m1 and mxfp4_e2m1 support only symmetric quantization |
| weights.strategy | str | Quantization granularity | 'tensor'/'channel'/'group' | float4_e2m1 and mxfp4_e2m1 support only the group strategy |
| algorithm | dict | Quantization algorithm configuration | {'gptq'} | / |
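The constraints in the table above can be checked before handing a configuration to the tool. The sketch below is a standalone helper, not part of amct_pytorch. The function names `check_gptq_cfg` and `should_skip` are hypothetical, and the skip rule is simplified to a plain substring test, which may be narrower than the tool's actual fuzzy matching.

```python
FP4_TYPES = {'float4_e2m1', 'mxfp4_e2m1'}
WEIGHT_TYPES = {'int4', 'int8'} | FP4_TYPES
STRATEGIES = {'tensor', 'channel', 'group'}


def check_gptq_cfg(cfg):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    weights = cfg.get('quant_cfg', {}).get('weights', {})
    wtype = weights.get('type')
    strategy = weights.get('strategy')
    if wtype not in WEIGHT_TYPES:
        problems.append(f"unsupported weights.type: {wtype!r}")
    if strategy not in STRATEGIES:
        problems.append(f"unsupported weights.strategy: {strategy!r}")
    if wtype in FP4_TYPES:
        # The FP4 types only allow symmetric, group-wise quantization.
        if weights.get('symmetric') is not True:
            problems.append(f"{wtype} requires symmetric=True")
        if strategy != 'group':
            problems.append(f"{wtype} requires strategy='group'")
    if 'gptq' not in cfg.get('algorithm', ()):
        problems.append("algorithm must contain 'gptq'")
    for pattern in cfg.get('skip_layers', ()):
        if not any(ch.isalnum() for ch in pattern):
            problems.append(f"skip_layers entry {pattern!r} has no letter or digit")
    return problems


def should_skip(layer_name, skip_layers):
    """Simplified skip rule: a pattern equals or is a substring of the layer name."""
    return any(pattern in layer_name for pattern in skip_layers)


if __name__ == "__main__":
    cfg = {
        'batch_num': 1,
        'quant_cfg': {'weights': {'type': 'mxfp4_e2m1', 'symmetric': True,
                                  'strategy': 'group', 'group_size': 32}},
        'algorithm': {'gptq'},
        'skip_layers': {'lm_head'},
    }
    print(check_gptq_cfg(cfg))
    # 'model.layers.0.self_attn.q_proj' is a made-up layer name for illustration.
    print(should_skip('lm_head', cfg['skip_layers']),
          should_skip('model.layers.0.self_attn.q_proj', cfg['skip_layers']))
```

Returning a list of problems instead of raising on the first one makes it easier to fix a hand-written configuration in a single pass.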

2 Quantization Example

2.1 Calling Through the Interface

step 1. Run the following commands in the current directory to execute the sample programs. Modify the model and dataset paths in the sample programs to match your environment:

python3 src/run_llama2_samples.py --model_path=/data/Llama2_7b_hf/
python3 src/run_qwen_samples.py --model_path=/data/Qwen2-7b/

If the following information appears, it indicates that quantization is successful:

Test time taken: 1.0 min 59.24865388870239 s
Score: 5.477707

step 2. The following configuration is recommended.

Score is the perplexity (PPL) of the quantized model. For reference values, see the following table:

| Model | Calibration Set | Dataset | Pre-quantization PPL | Post-INT4 quantization PPL | Post-MXFP4 quantization PPL |
| --- | --- | --- | --- | --- | --- |
| LLAMA2-7B | pileval | wikitext2 | 5.472 | 5.601 | 5.799 |
| QWEN2-7B | pileval | wikitext2 | 7.137 | 7.253 | 7.305 |
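To read the PPL values above, it helps to recall that perplexity is the exponential of the mean per-token negative log-likelihood, so lower is better. The sketch below shows that definition and computes the relative PPL increase for each row of the table. It is an illustration only: how the sample scripts window and batch the evaluation text is not shown here, and the helper names are hypothetical.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_increase(baseline, quantized):
    """Relative PPL increase of the quantized model, in percent."""
    return (quantized - baseline) / baseline * 100.0


# Reference values copied from the table above.
REFERENCE_PPL = {
    'LLAMA2-7B': {'baseline': 5.472, 'int4': 5.601, 'mxfp4': 5.799},
    'QWEN2-7B': {'baseline': 7.137, 'int4': 7.253, 'mxfp4': 7.305},
}

if __name__ == "__main__":
    for model, ppl in REFERENCE_PPL.items():
        int4 = relative_increase(ppl['baseline'], ppl['int4'])
        mxfp4 = relative_increase(ppl['baseline'], ppl['mxfp4'])
        print(f"{model}: INT4 +{int4:.2f}%, MXFP4 +{mxfp4:.2f}%")
```

On the table's numbers this works out to roughly 2.4% (INT4) and 6.0% (MXFP4) for LLAMA2-7B, and roughly 1.6% (INT4) and 2.4% (MXFP4) for QWEN2-7B.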

After inference succeeds, the quantization log file ./amct_log/amct_pytorch.log is generated under the current directory.
