Optimizing Mid-Sized Open-Weight Large Language Models: Fine-Tuning Strategies for Cost-Efficient AI Solutions

Authors

  • Nur-Adib Maspo
  • Ben DAIA Mounir
  • Henrik Eriksson

Keywords:

Large Language Models, Natural Language Processing, Parameter-Efficient Fine-Tuning (PEFT), Low-Rank Adaptation (LoRA), Quantized Low-Rank Adaptation (QLoRA)

Abstract

Large Language Models (LLMs) hold strong potential for enterprise customer service automation, particularly for schema-driven and multi-turn dialogue tasks. Yet real-world adoption requires high accuracy while operating under strict computational and memory constraints. This study evaluates Low-Rank Adaptation (LoRA) and Quantized LoRA (QLoRA) as Parameter-Efficient Fine-Tuning (PEFT) methods for enhancing the LLaMA-8B model in structured automation involving function-call classification and JSON interactions. Experiments compared LoRA, QLoRA, and a zero-shot baseline across function-call accuracy, JSON validity, slot extraction, inference speed, and memory usage. LoRA achieved the best balance, with an F1-score of 84.62% in function-call accuracy and 95.38% compliance with JSON/schema guidelines. QLoRA reached perfect JSON validity (100%) but performed poorly in recognizing functions, with a recall of only 6.15%, making it too conservative for real-world deployment. Both methods struggled with escalation intent detection. The findings highlight the trade-off between accuracy and quantization efficiency in PEFT approaches. While QLoRA offers clear benefits in memory and latency, LoRA provides more reliable accuracy with modest resource overhead. This work offers practical guidance for selecting PEFT methods, identifying LoRA as a suitable option for deploying LLMs in structured enterprise customer service automation under resource constraints.
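
The abstract reports JSON validity alongside function-call recall and F1-score, and the QLoRA result shows that these metrics can diverge sharply. The following is a minimal sketch of how such metrics might be computed; it is not the authors' evaluation code. The helper names, the "function" field, and the toy outputs are all hypothetical and chosen only for illustration.

```python
import json


def is_valid_json(text):
    """Return True if text parses as a JSON object (assumed validity criterion)."""
    try:
        return isinstance(json.loads(text), dict)
    except (json.JSONDecodeError, TypeError):
        return False


def precision_recall_f1(expected, predicted):
    """Binary function-call detection metrics.

    expected[i]  is True when turn i should trigger a function call.
    predicted[i] is True when the model emitted a function call for turn i.
    """
    pairs = list(zip(expected, predicted))
    tp = sum(1 for e, p in pairs if e and p)
    fp = sum(1 for e, p in pairs if not e and p)
    fn = sum(1 for e, p in pairs if e and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up outputs from an overly conservative model: every response is
    # well-formed JSON, but most omit the function call that was expected.
    outputs = [
        '{"function": "check_order", "args": {"id": 1}}',
        '{"function": null}',
        '{"function": null}',
        '{"function": null}',
    ]
    expected = [True, True, True, False]
    predicted = [json.loads(o)["function"] is not None for o in outputs]

    validity = sum(is_valid_json(o) for o in outputs) / len(outputs)
    precision, recall, f1 = precision_recall_f1(expected, predicted)
    print(f"JSON validity={validity:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this toy case every output is valid JSON, yet recall is low because the model rarely emits a call. This is the same pattern the abstract describes for QLoRA, where format compliance alone does not imply that the right function was recognized.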

Published

2026-06-15

How to Cite

Maspo, N.-A., Mounir, B. D., & Eriksson, H. (2026). Optimizing Mid-Sized Open-Weight Large Language Models: Fine-Tuning Strategies for Cost-Efficient AI Solutions. Open International Journal of Informatics, 14(1), 177–187. Retrieved from https://oiji.utm.my/index.php/oiji/article/view/389