A Dashboard for Optimizing Prompts in Large Language Models
Project Overview
This research develops an interactive dashboard for optimizing prompts in large language models, designed to help prompt developers improve text classification performance. The dashboard combines dataset visualizations, prompt experimentation tools, and insights into LLM attention mechanisms, bridging the gap between complex model behavior and user-friendly tooling.
Key Features
- Dataset visualization with label distribution histograms and word clouds for understanding data characteristics
- Interactive prompt engineering with template-based variable substitution and synonym suggestions (see the sketch after this list)
- Real-time prompt testing with color-coded performance feedback on individual samples
- Attention score visualizations to understand LLM decision-making processes
- Prompt evaluation with confusion matrices and performance metrics across multiple samples
- Support for multiple datasets (AG-news, Amazon Polarity, GLUE) and TinyLlama model integration
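As a concrete illustration of the templating and synonym features, the sketch below fills $-prefixed variables into a prompt template and looks up alternatives via WordNet. The template syntax, function names, and example labels are assumptions for illustration, not the dashboard's actual API.

```python
# Minimal sketch of template-based variable substitution and WordNet synonym
# lookup, assuming NLTK is installed; names and template syntax are illustrative.
from string import Template

import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)  # one-time corpus download


def fill_template(template: str, **variables: str) -> str:
    """Substitute $-prefixed variables into a prompt template."""
    return Template(template).substitute(**variables)


def synonym_suggestions(word: str, limit: int = 5) -> list[str]:
    """Suggest alternative wordings for a prompt term via WordNet synsets."""
    synonyms = {
        lemma.name().replace("_", " ")
        for synset in wordnet.synsets(word)
        for lemma in synset.lemmas()
        if lemma.name().lower() != word.lower()
    }
    return sorted(synonyms)[:limit]


prompt = fill_template(
    "Classify the following $domain article into one of: $labels.\n$text",
    domain="news",
    labels="World, Sports, Business, Sci/Tech",
    text="Stocks rallied after the earnings report.",
)
print(synonym_suggestions("classify"))  # e.g. ['assort', 'class', 'relegate']
```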
Technical Implementation
The dashboard is built with modern web technologies around a three-component architecture: data selection, prompt engineering, and prompt evaluation. It runs TinyLlama-1.1B-Chat-v1.0 locally via Hugging Face pipelines for efficient inference, and features template-based prompt generation with variable substitution, WordNet-based synonym suggestions, and interactive attention visualization. For text classification tasks, predictions are extracted by exact label matching, with frequency-based resolution when a response mentions multiple labels.
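A minimal sketch of serving TinyLlama through a Hugging Face text-generation pipeline, following the model's standard chat-template usage; the prompt wording and generation settings here are illustrative, not necessarily the dashboard's.

```python
# Local inference with TinyLlama via a Hugging Face pipeline.
from transformers import pipeline

pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

messages = [
    {"role": "user", "content": (
        "Classify the following news article as World, Sports, Business, "
        "or Sci/Tech. Article: Stocks rallied after the earnings report."
    )},
]
# Render the chat messages into TinyLlama's expected prompt format.
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
outputs = pipe(prompt, max_new_tokens=16, do_sample=False)
# generated_text contains the prompt followed by the model's answer.
print(outputs[0]["generated_text"][len(prompt):])
```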
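The label-resolution step described above could look like the following sketch: candidate labels are matched in the generated text, and when several labels appear, the most frequent one wins. The function name and the AG-news label set are illustrative assumptions.

```python
# Exact label matching with frequency-based tie resolution.
from collections import Counter

LABELS = ["World", "Sports", "Business", "Sci/Tech"]  # e.g. AG-news classes


def resolve_prediction(response: str, labels: list[str] = LABELS) -> str | None:
    """Return the label mentioned most often in the response, or None."""
    counts = Counter()
    lowered = response.lower()
    for label in labels:
        occurrences = lowered.count(label.lower())
        if occurrences:
            counts[label] = occurrences
    if not counts:
        return None  # the response contains no known label
    return counts.most_common(1)[0][0]


print(resolve_prediction("Mostly Business, though Sports is mentioned; Business wins."))
# -> 'Business'
```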
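For the attention visualization, per-token scores can be pulled directly from the model rather than the pipeline, as in this hedged sketch; averaging over heads in the last layer is one common aggregation choice and an assumption here, not necessarily what the dashboard does.

```python
# Extracting per-token attention scores from TinyLlama for visualization.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer(
    "Classify: Stocks rallied after the earnings report.", return_tensors="pt"
)
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions: one tensor per layer, shaped (batch, heads, seq, seq).
last_layer = outputs.attentions[-1][0]   # (heads, seq, seq)
per_token = last_layer.mean(dim=0)[-1]   # final token's attention, head-averaged
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, score in zip(tokens, per_token.tolist()):
    print(f"{token:>12} {score:.3f}")
```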
Key Findings & Impact
The experiments demonstrated that prompt phrasing significantly impacts classification accuracy, with certain prompt structures reaching up to 80% accuracy on test samples. Attention visualizations revealed which tokens the LLM focuses on when making predictions, offering concrete guidance for prompt optimization. The dashboard bridges the gap between expert prompt-engineering knowledge and accessible tooling for non-experts, enabling more effective LLM use across text classification domains.
Publication
Co-authored with Angelo Broere, Luuk Versteeg, and Tobie Werner. This work addresses the growing need for user-friendly tools in prompt engineering; it is inspired by PromptIDE, which it extends with comprehensive attention visualization and evaluation capabilities. Code available at: https://github.com/Luuk-Versteeg/MMA-group3