A Dashboard for Optimizing Prompts in Large Language Models
Project Overview
This research develops an interactive dashboard for optimizing prompts in large language models, designed to help prompt developers improve text classification performance. The dashboard combines dataset visualizations, prompt experimentation tools, and insights into LLM attention mechanisms, bridging the gap between complex model behavior and user-friendly tooling.
Key Features
- Dataset visualization with label distribution histograms and word clouds for understanding data characteristics
- Interactive prompt engineering with template-based variable substitution and synonym suggestions (see the sketch after this list)
- Real-time prompt testing with color-coded performance feedback on individual samples
- Attention score visualizations to understand LLM decision-making processes
- Prompt evaluation with confusion matrices and performance metrics across multiple samples
- Support for multiple datasets (AG-news, Amazon Polarity, GLUE) and TinyLlama model integration
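As a concrete illustration of the templating and synonym features, the sketch below fills $-prefixed variables into a prompt template and looks up alternatives via WordNet. The template syntax, function names, and example labels are assumptions for illustration, not the dashboard's actual API.

```python
# Minimal sketch of template-based variable substitution and WordNet synonym
# lookup, assuming NLTK is installed; names and template syntax are illustrative.
from string import Template

import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)  # one-time corpus download


def fill_template(template: str, **variables: str) -> str:
    """Substitute $-prefixed variables into a prompt template."""
    return Template(template).substitute(**variables)


def synonym_suggestions(word: str, limit: int = 5) -> list[str]:
    """Suggest alternative wordings for a prompt term via WordNet synsets."""
    synonyms = {
        lemma.name().replace("_", " ")
        for synset in wordnet.synsets(word)
        for lemma in synset.lemmas()
        if lemma.name().lower() != word.lower()
    }
    return sorted(synonyms)[:limit]


prompt = fill_template(
    "Classify the following $domain article into one of: $labels.\n$text",
    domain="news",
    labels="World, Sports, Business, Sci/Tech",
    text="Stocks rallied after the earnings report.",
)
print(synonym_suggestions("classify"))  # e.g. ['assort', 'class', 'relegate']
```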
Technical Implementation
The dashboard is built with modern web technologies around a three-component architecture: data selection, prompt engineering, and prompt evaluation. It runs TinyLlama-1.1B-Chat-v1.0 locally via Hugging Face pipelines for efficient inference, and features template-based prompt generation with variable substitution, WordNet-based synonym suggestions, and interactive attention visualization. For text classification tasks, predictions are extracted by exact label matching, with frequency-based resolution when a response mentions multiple labels.
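A minimal sketch of serving TinyLlama through a Hugging Face text-generation pipeline, following the model's standard chat-template usage; the prompt wording and generation settings here are illustrative, not necessarily the dashboard's.

```python
# Local inference with TinyLlama via a Hugging Face pipeline.
from transformers import pipeline

pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

messages = [
    {"role": "user", "content": (
        "Classify the following news article as World, Sports, Business, "
        "or Sci/Tech. Article: Stocks rallied after the earnings report."
    )},
]
# Render the chat messages into TinyLlama's expected prompt format.
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
outputs = pipe(prompt, max_new_tokens=16, do_sample=False)
# generated_text contains the prompt followed by the model's answer.
print(outputs[0]["generated_text"][len(prompt):])
```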
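The label-resolution step described above could look like the following sketch: candidate labels are matched in the generated text, and when several labels appear, the most frequent one wins. The function name and the AG-news label set are illustrative assumptions.

```python
# Exact label matching with frequency-based tie resolution.
from collections import Counter

LABELS = ["World", "Sports", "Business", "Sci/Tech"]  # e.g. AG-news classes


def resolve_prediction(response: str, labels: list[str] = LABELS) -> str | None:
    """Return the label mentioned most often in the response, or None."""
    counts = Counter()
    lowered = response.lower()
    for label in labels:
        occurrences = lowered.count(label.lower())
        if occurrences:
            counts[label] = occurrences
    if not counts:
        return None  # the response contains no known label
    return counts.most_common(1)[0][0]


print(resolve_prediction("Mostly Business, though Sports is mentioned; Business wins."))
# -> 'Business'
```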
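For the attention visualization, per-token scores can be pulled directly from the model rather than the pipeline, as in this hedged sketch; averaging over heads in the last layer is one common aggregation choice and an assumption here, not necessarily what the dashboard does.

```python
# Extracting per-token attention scores from TinyLlama for visualization.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer(
    "Classify: Stocks rallied after the earnings report.", return_tensors="pt"
)
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions: one tensor per layer, shaped (batch, heads, seq, seq).
last_layer = outputs.attentions[-1][0]   # (heads, seq, seq)
per_token = last_layer.mean(dim=0)[-1]   # final token's attention, head-averaged
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, score in zip(tokens, per_token.tolist()):
    print(f"{token:>12} {score:.3f}")
```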
Key Findings & Impact
The experiments demonstrated that prompt phrasing significantly impacts classification accuracy, with certain prompt structures reaching up to 80% accuracy on test samples. Attention visualizations revealed which tokens the LLM focuses on when making predictions, offering concrete guidance for prompt optimization. The dashboard bridges the gap between expert prompt-engineering knowledge and accessible tooling for non-experts, enabling more effective LLM use across text classification domains.
Publication
Co-authored with Angelo Broere, Luuk Versteeg, and Tobie Werner. This work addresses the growing need for user-friendly tools in prompt engineering; it is inspired by PromptIDE, which it extends with comprehensive attention visualization and evaluation capabilities. Code available at: https://github.com/Luuk-Versteeg/MMA-group3