
Front AI Drafter

AI-powered customer support system that automatically categorizes messages and generates contextual email draft replies

Role Operations & AI Integration
Accuracy 98%+ (categorizer, rejection response, and initial agents)
Scale 17,000+ messages/year

The Challenge

allUP's ticket volume grew 4x in six months, from ~750 messages in January '25 to 3,000+ at peak.

[Chart: Support Volume Growth. Messages received per month, Jan–Jul 2025; monthly volume with growth trend line, rising from 737 in January to a 3,000-message peak in July.]

Volume wasn't the only problem: our support required understanding of the platform, user workflows, and the internal knowledge base. We needed AI that could handle nuance, not just FAQs.

Why We Built Custom

We tried the obvious route first: I built out a Front Knowledge Base and enabled their AI features. Two problems emerged.

1. No access to live data. Front AI could only pull from the KB. Anything requiring platform context or company/candidate-specific information meant wrong answers or escalation.

2. Generic responses. It couldn't recognize patterns in how I'd handled similar emails, so every reply read like a chatbot instead of matching our established tone and style.

$0.70/resolution

For a first draft, not a solution.
I still had to review and edit every output.

Our Approach

We decided to build in-house. I analyzed months of support tickets to identify patterns: more than half could be solved with standard knowledge, a significant portion needed live platform data, and only a small fraction truly required a manual response.

I built the system in two phases. Phase one: an AI agent that categorized messages and drafted personalized responses using our knowledge base and the team's voice patterns. Phase two: deeper personalization by pulling real-time candidate data, role deadlines, and interview progress via MCP integration.
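A minimal sketch of that two-phase shape, using the OpenAI Agents SDK for TypeScript, is below. The agent names, instructions, and the get_candidate_status tool are illustrative stand-ins rather than the production definitions, and the tool is a simplified placeholder for the MCP integration that supplies live platform data.

import { Agent, run, tool } from '@openai/agents';
import { z } from 'zod';

// Hypothetical phase-two tool: a stand-in for the MCP integration that
// supplies live candidate data, role deadlines, and interview progress.
const getCandidateStatus = tool({
  name: 'get_candidate_status',
  description: 'Fetch current role deadlines and interview progress for a candidate.',
  parameters: z.object({ candidateEmail: z.string() }),
  execute: async ({ candidateEmail }) => {
    // Placeholder for a platform API lookup by candidateEmail.
    return JSON.stringify({ activeRoles: 2, nextDeadline: '2025-08-01', interviewStage: 'scheduled' });
  },
});

// Phase one: categorize the message, then draft a reply in the team's voice.
const categorizer = new Agent({
  name: 'Categorizer',
  instructions: 'Classify the support message as positive, rejection, or other.',
});

const rejectionDrafter = new Agent({
  name: 'Rejection response drafter',
  instructions: 'Draft an empathetic reply that matches our established support tone.',
  tools: [getCandidateStatus],
});

// Simplified routing: in production each category maps to its own specialized agent.
async function draftReply(message: string) {
  const category = (await run(categorizer, message)).finalOutput;
  const draft = await run(rejectionDrafter, message);
  return { category, draft: draft.finalOutput };
}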

Development Workflow

The project follows an iterative development cycle that connects several tools in a continuous improvement loop: changes are authored in Cursor, versioned and deployed through Git and GitHub Actions, synced with Agent Builder, and validated against real traces in Braintrust before the next iteration begins.

Identify Issue → Edit in Cursor → Git Workflow → Agent Builder Sync → Test in Braintrust → Review Results → (repeat)

1. Development in Cursor IDE

All agent development happens in Cursor IDE. I edit agent prompts, instructions, and configurations in src/agents/agentDefinitions.ts. This file contains every agent definition: the main categorizer, the sub-categorizer, and the specialized agents (positive response, rejection response, and nine sub-agents for the 'other' category).
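For illustration, a single entry in that file might look roughly like the sketch below; the field names, model choice, and example content are hypothetical rather than the actual definitions.

// Hypothetical shape of one entry in src/agents/agentDefinitions.ts; the real file
// defines the categorizer, sub-categorizer, and each specialized agent alongside it.
export const rejectionResponseAgent = {
  name: 'Rejection response',
  model: 'gpt-4.1', // model choice shown here is illustrative
  modelSettings: { temperature: 0.3 },
  instructions: `Draft a reply to a candidate whose application was unsuccessful.
Match the team's established tone: warm, direct, and specific to the role.`,
  // Few-shot examples nudge the agent toward how similar emails were handled before.
  fewShotExamples: [
    {
      input: 'Could you tell me why my application was not successful?',
      output: 'Thanks for reaching out. Here is what we can share about the decision...',
    },
  ],
  // Guardrails configuration referenced by the pipeline (PII detection, moderation).
  guardrails: { piiDetection: true, moderation: true },
};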

Common edits include updating agent instructions/prompts, adding few-shot examples to improve agent behavior, adjusting model settings (reasoning effort, temperature, etc.), and updating guardrails configuration (PII detection, moderation, etc.). Before committing, I test changes using local test scripts like replay-test-messages.ts or test-categorizer-failures.ts to verify the agent behaves as expected with sample inputs.
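A pre-commit check in the spirit of those scripts might look like this sketch (the sample messages and expected labels are invented; the real scripts replay a larger set of historical messages):

// Simplified stand-in for a local test script: replay known messages through the
// categorizer and report how many land in the expected category.
import { Agent, run } from '@openai/agents';

const categorizer = new Agent({
  name: 'Categorizer',
  instructions: 'Reply with exactly one label: positive, rejection, or other.',
});

// Hypothetical sample data for illustration.
const samples = [
  { message: 'Thanks so much, the interview went great!', expected: 'positive' },
  { message: 'Can I get feedback on why I was not selected?', expected: 'rejection' },
  { message: 'How do I update my availability on the platform?', expected: 'other' },
];

async function main() {
  let correct = 0;
  for (const { message, expected } of samples) {
    const result = await run(categorizer, message);
    const predicted = (result.finalOutput ?? '').trim().toLowerCase();
    if (predicted === expected) correct++;
    console.log(`${expected.padEnd(10)} -> ${predicted}`);
  }
  console.log(`Categorizer accuracy: ${correct}/${samples.length}`);
}

main();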

2. Version Control with Git

After testing locally, changes move through a structured Git workflow. I pull the latest from the main branch, install dependencies, build and verify locally, then commit changes with descriptive messages. Pull requests enable code review and automated testing through GitHub Actions, ensuring code quality before merging.

When opening a pull request, Cursor can generate the web link, title, and description, which streamlines PR creation. After review and approval, merging the PR triggers automated deployment via GitHub Actions.

3. Configuration in OpenAI Agent Builder

Agent configurations are managed in OpenAI Agent Builder, which makes it easy to adjust model settings, instructions, and guardrails. When changes are made in Agent Builder, I export the updated agents; when they are made in code, I verify they are reflected correctly in Agent Builder. Keeping the two in sync ensures Braintrust tests run against the same configuration the deployed agents use.

4. Testing & Evaluation in Braintrust

Braintrust serves as our testing and evaluation platform for validating agent performance before deployment. I use the playground to test agent behavior interactively, configure settings to match Agent Builder, and test different prompts to see how outputs change.

I review detailed traces from agent runs showing how the agent reasoned through responses, which tool calls it made, its step-by-step decisions, and its guardrail checks. Manually reviewing 100–300 traces per iteration confirms that quality and accuracy actually improve. I test categorization accuracy, verify routing to the correct categories, and review draft quality for tone, context understanding, and factual accuracy.
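Alongside the playground, categorization checks can be scripted as Braintrust evals. A minimal sketch is below; the project name, dataset rows, and scorer are illustrative, not the production eval suite.

import { Eval } from 'braintrust';
import { Agent, run } from '@openai/agents';

const categorizer = new Agent({
  name: 'Categorizer',
  instructions: 'Reply with exactly one label: positive, rejection, or other.',
});

// Hypothetical project name and data; real runs use batches of historical messages.
Eval('front-ai-drafter', {
  data: () => [
    { input: 'The interview went great, thank you!', expected: 'positive' },
    { input: 'Why was I not selected for this role?', expected: 'rejection' },
  ],
  task: async (input: string) => {
    const result = await run(categorizer, input);
    return (result.finalOutput ?? '').trim().toLowerCase();
  },
  // Simple exact-match scorer; each row also produces a trace to inspect in the UI.
  scores: [
    ({ output, expected }) => ({
      name: 'category_match',
      score: output === expected ? 1 : 0,
    }),
  ],
});

Run through the Braintrust eval runner (for example, npx braintrust eval), a file like this produces an experiment whose traces can then be reviewed in the UI just like playground runs.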

Based on traces, logs, and playground testing, I identify what needs improvement, determine which few-shot examples to add, refine agent instructions, and adjust categorization logic—documenting insights for the next iteration in Cursor.

Putting It All Together

Each iteration follows the same pattern, creating a continuous improvement loop:

1. Identify Need for Improvement

Trigger points include Braintrust traces showing issues, manual review revealing patterns, playground testing showing unexpected behavior, or monitoring logs indicating trends.

2. Edit in Cursor

Make code changes, update agent definitions, modify instructions/prompts, add few-shot examples, and test locally using test scripts.

3. Commit and Push (Git Workflow)

Pull latest from main, install dependencies, build and verify locally, commit changes, push to remote, and create pull request. GitHub Actions automatically runs tests and deploys.

4. Sync with Agent Builder

Export updated agents from Agent Builder if changes were made there, or verify in Agent Builder that code changes are reflected correctly. Review agent settings so that the configuration used for Braintrust testing matches what is deployed.

5. Test in Braintrust

Configure playground with settings from Agent Builder, test agents interactively with sample messages, run test batches to generate traces, and review 100-300 traces manually for accuracy.
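Batch runs that generate traces can also be scripted against the Braintrust logging SDK. A rough sketch, with a hypothetical project name and a trivial batch, might look like this:

import { initLogger, traced } from 'braintrust';
import { Agent, run } from '@openai/agents';

// Hypothetical project name; each logged span shows up as a trace for manual review.
initLogger({ projectName: 'front-ai-drafter' });

const categorizer = new Agent({
  name: 'Categorizer',
  instructions: 'Reply with exactly one label: positive, rejection, or other.',
});

async function replayBatch(messages: string[]) {
  for (const message of messages) {
    await traced(
      async (span) => {
        const result = await run(categorizer, message);
        span.log({ input: message, output: result.finalOutput });
      },
      { name: 'categorize' },
    );
  }
}

replayBatch(['Thanks, the interview went great!', 'Why was I not selected?']);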

6. Review Results and Iterate

Analyze Braintrust results, compare before/after performance, and decide: if results are good, mark the iteration complete and move on to the next improvement; if they need work, return to Step 2 and iterate further.

Key Results

98%+
Agent Accuracy

Achieved across categorizer, rejection response agent, and initial agents through iterative prompt engineering and testing

17,000+
Messages/Year

System built to handle the full volume of support tickets at scale

Continuous
Iterative Improvement

Established workflow for ongoing refinement and optimization

Technologies & Tools

OpenAI Agents SDK · TypeScript · Braintrust · Git · Cursor IDE · OpenAI Agent Builder · GitHub Actions · Prompt Engineering

Lessons Learned

The biggest takeaway: AI systems need a real development workflow, not just prompt tweaking. The iterative cycle of development → testing → evaluation → refinement proved essential for achieving high accuracy. Key learnings included:

  • The value of comprehensive testing infrastructure for AI systems
  • How iterative prompt engineering can dramatically improve accuracy
  • The importance of manual trace review in identifying edge cases
  • How tool integration enables efficient AI development workflows
