Maneesh Ujji | Applied AI / AI Systems Engineer

// applied ai · agent systems

Support Copilot · AI-assisted support system

RAG + human-in-the-loop approval + confidence scoring

How a ticket moves through the system

// why i built it this way

Three confidence tiers, not two. Most systems do "confident" vs "not confident." I added a middle tier: the system still drafts, but explicitly flags its uncertainty so the reviewer knows to look closer.

Citations aren't optional. Every draft includes the source docs it pulled from. If the retrieval was weak or irrelevant, that's visible, so the reviewer can see the system's reasoning and not just its output.

Below 0.4, no draft at all. The system doesn't guess. It escalates directly to a human agent with the raw ticket. Some problems shouldn't have an AI answer.
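A minimal sketch of the three-tier routing described above, assuming the pipeline already produces a confidence score in [0, 1] plus a draft and its source docs (how that score is computed isn't shown here). The 0.4 floor comes from the write-up; `CONFIDENT_AT = 0.75`, `Route`, and `route_ticket` are hypothetical names and values for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

ESCALATE_BELOW = 0.4   # from the write-up: below this, no draft at all
CONFIDENT_AT = 0.75    # hypothetical: the write-up does not give the upper cutoff


@dataclass
class Route:
    tier: str                       # "confident", "flagged", or "escalate"
    ticket: str                     # the raw ticket is always passed through
    draft: Optional[str] = None     # None means a human writes the reply
    citations: list[str] = field(default_factory=list)


def route_ticket(ticket: str, confidence: float, draft: str, sources: list[str]) -> Route:
    """Map a confidence score onto one of three review tiers."""
    if confidence < ESCALATE_BELOW:
        # The system doesn't guess: escalate the raw ticket with no AI draft.
        return Route(tier="escalate", ticket=ticket)
    if confidence >= CONFIDENT_AT:
        return Route(tier="confident", ticket=ticket, draft=draft, citations=list(sources))
    # Middle tier: still draft, but flag the uncertainty so the reviewer looks closer.
    flagged = f"[LOW CONFIDENCE {confidence:.2f}: review closely]\n{draft}"
    return Route(tier="flagged", ticket=ticket, draft=flagged, citations=list(sources))
```

Every non-escalated route carries its citations, so a reviewer sees weak retrieval alongside the draft rather than only the final text.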

RAG · ChromaDB · sentence-transformers · Python · Human-in-the-Loop

Creator Inbox Intelligence Agent · Multi-agent email system

Classify, retrieve, draft, log: four agents, one inbox

How four agents handle an email

// why i built it this way

Four specialized agents, not one general one. Each agent has a single job. The classifier doesn't draft. The drafter doesn't classify. This makes each component testable and replaceable independently.

Spam gets killed early. The classifier's first job is to stop spam from consuming any downstream compute. If it's spam, archive it and move on. No retrieval, no drafting, no logging.
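The control flow above can be sketched as a small orchestrator in which each agent is a single-purpose callable. This is an illustration under assumptions: in the project the agents are Gemini-backed, while here they are injected as plain functions so the spam short-circuit and the one-job-per-agent split can be tested on their own. `handle_email` and the type aliases are hypothetical names.

```python
from typing import Callable, Optional

# One job per agent; real implementations would wrap LLM or search calls.
Classifier = Callable[[str], str]                      # email -> category
Retriever = Callable[[str, str], list[str]]            # (email, category) -> context
Drafter = Callable[[str, list[str]], str]              # (email, context) -> reply
Logger = Callable[[str, str, Optional[str]], None]     # (email, category, reply)


def handle_email(email: str, classify: Classifier, retrieve: Retriever,
                 draft: Drafter, log: Logger) -> Optional[str]:
    category = classify(email)
    if category == "spam":
        # Stop here: no retrieval, no drafting, no logging.
        return None
    context = retrieve(email, category)
    reply = draft(email, context)
    log(email, category, reply)
    return reply
```

Because each agent is passed in, any one of them can be swapped for a stub in tests or replaced with a different model without touching the others.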

Evaluation isn't an afterthought. I built a full evaluation pipeline: accuracy per category, F1 scores, and confusion matrices, all measured against a baseline. If the classifier degrades, you know exactly where and by how much.
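A dependency-free sketch of the per-category metrics mentioned above. It treats "accuracy per category" as per-class recall (an assumption about the definition) and compares F1 against a baseline report; the function names are hypothetical, and a library such as scikit-learn offers equivalent functions.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_category_report(y_true, y_pred, labels):
    """Per-category accuracy (recall), precision, and F1."""
    cm = confusion_matrix(y_true, y_pred, labels)
    report = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        actual = sum(cm[i])                      # row total: true items of this label
        predicted = sum(row[i] for row in cm)    # column total: items predicted as it
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
        report[label] = {"recall": recall, "precision": precision, "f1": f1}
    return report


def f1_deltas(current, baseline):
    """Per-category F1 change versus a baseline report (negative means degraded)."""
    return {label: current[label]["f1"] - baseline[label]["f1"] for label in current}
```

Reading the deltas per category, rather than a single aggregate, is what shows *where* the classifier slipped.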

Multi-Agent · Gemini · Semantic Search · Python · Evaluation Pipeline

Flight Price Tracker Agent · LLM agent system

Amadeus API + Gemini + memory → buy/wait recommendations

How the agent tracks and decides

// why i built it this way

JSON memory, not a database. For a personal tool tracking a handful of routes, a lightweight JSON file beats setting up a database. The memory stores price snapshots over time: simple, portable, and zero infrastructure.
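A minimal sketch of JSON-file memory for price snapshots, assuming one list of `{date, price}` entries per route key. The file layout, the `"CLE-JFK"` style route key, and `record_snapshot` are hypothetical; the project's actual schema isn't shown.

```python
import json
from datetime import date
from pathlib import Path


def record_snapshot(path: Path, route: str, price: float, day: date) -> dict:
    """Append one price snapshot for a route to a JSON file and return the full memory."""
    memory = json.loads(path.read_text()) if path.exists() else {}
    memory.setdefault(route, []).append({"date": day.isoformat(), "price": price})
    path.write_text(json.dumps(memory, indent=2))
    return memory
```

For a handful of routes and one snapshot per day, reading and rewriting the whole file is cheap, and the file stays human-readable and easy to copy.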

LLM for analysis, not just comparison. A simple "is today cheaper than yesterday?" check is trivial. I use Gemini to analyze trends across multiple snapshots: seasonality patterns, rate of change, and whether a drop is noise or a real deal.
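The write-up doesn't show how snapshots are handed to Gemini. One plausible shape, sketched here as an assumption, is to compute a few trend numbers locally and include them in the prompt next to the raw history. The Gemini call itself is omitted; `trend_summary`, `build_prompt`, and the z-score heuristic for "noise vs. real drop" are hypothetical, not the project's actual rule. It needs at least two snapshots.

```python
from statistics import mean, pstdev


def trend_summary(prices: list[float]) -> dict:
    """Numbers worth handing to the LLM alongside the raw snapshots."""
    latest, previous = prices[-1], prices[-2]
    avg, spread = mean(prices), pstdev(prices)
    return {
        "latest": latest,
        "day_over_day_pct": round(100 * (latest - previous) / previous, 1),
        "vs_mean_pct": round(100 * (latest - avg) / avg, 1),
        # How unusual today's price is against the route's own history:
        # near 0 suggests noise, a large negative value a genuine drop.
        "z_score": round((latest - avg) / spread, 2) if spread else 0.0,
    }


def build_prompt(route: str, prices: list[float]) -> str:
    summary = trend_summary(prices)
    return (
        f"Route {route}. Price history (oldest first): {prices}.\n"
        f"Summary: {summary}.\n"
        "Is the latest price a genuine deal or normal fluctuation? "
        "Answer BUY or WAIT with one sentence of reasoning."
    )
```

Precomputing the arithmetic keeps the LLM focused on interpretation (seasonality, whether a drop is meaningful) instead of doing sums it may get wrong.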

Fully automated. Windows Task Scheduler triggers the agent daily. It fetches, analyzes, and emails, with no manual intervention. I wake up and either get a recommendation or nothing (no news = no action needed).
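The "recommendation or nothing" behavior can be sketched with the standard library: build an email only when there is something to act on, and let the daily scheduled run exit silently otherwise. Subject format, addresses, and the STARTTLS host and port are placeholders; `build_alert` and `send_alert` are hypothetical names.

```python
import smtplib
from email.message import EmailMessage
from typing import Optional


def build_alert(recommendation: Optional[str], route: str,
                sender: str, to: str) -> Optional[EmailMessage]:
    """Return an email only when there is something to act on; None means stay silent."""
    if not recommendation:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"Flight alert: {route}"
    msg["From"] = sender
    msg["To"] = to
    msg.set_content(recommendation)
    return msg


def send_alert(msg: EmailMessage, host: str, port: int, user: str, password: str) -> None:
    # STARTTLS submission; host, port, and credentials are placeholders.
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)
```

Keeping message construction separate from sending makes the "no news, no email" rule testable without an SMTP server.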

LLM Agent · Gemini · Amadeus API · SMTP · Python · Task Automation

JobPulse · AI Chrome Extension

Scrape job listings → LLM analyzes resume fit → score in browser

How JobPulse works in your browser

// why i built it this way

In the browser, not a separate app. Job seekers have 20 tabs open. Making them copy-paste into a separate tool breaks the flow. JobPulse runs right on the job listing page, with zero context switching.

Structured extraction, not raw text comparison. The LLM doesn't just compare blobs of text. It extracts structured fields (required skills, years of experience, tech stack) from the listing, then matches each against your resume's parsed skills.

Gaps matter more than matches. The most useful output isn't "you match 78%." It's "you're missing Kubernetes and they asked for it twice." That tells you whether to apply or what to study.
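A sketch of the match-then-gap step, written in Python for consistency with the rest of this page even though the extension itself is Vue/JavaScript. It assumes the LLM has already extracted `required_skills` from the listing and that the resume's skills are already parsed; `fit_report` and `FitReport` are hypothetical names. Missing skills are ranked by how often the listing mentions them, which is what turns "78%" into "you're missing Kubernetes and they asked for it twice."

```python
import re
from dataclasses import dataclass


@dataclass
class FitReport:
    matched: list[str]
    missing: list[tuple[str, int]]   # (skill, times the listing mentions it)
    score: float                     # share of required skills covered


def fit_report(listing_text: str, required_skills: list[str],
               resume_skills: set[str]) -> FitReport:
    have = {s.lower() for s in resume_skills}
    matched, missing = [], []
    for skill in required_skills:
        if skill.lower() in have:
            matched.append(skill)
        else:
            # Whole-token, case-insensitive count; lookarounds also work for "C++".
            pattern = rf"(?<!\w){re.escape(skill)}(?!\w)"
            mentions = len(re.findall(pattern, listing_text, flags=re.IGNORECASE))
            missing.append((skill, mentions))
    missing.sort(key=lambda item: item[1], reverse=True)   # most-requested gaps first
    score = len(matched) / len(required_skills) if required_skills else 0.0
    return FitReport(matched, missing, round(score, 2))
```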

Vue · Vite · Chrome Extension · LLM · JavaScript

// data science · ml pipelines

SpaceX Launch Prediction · IBM Data Science Capstone

End-to-end ML pipeline from raw launch data to stakeholder dashboard

From raw launch data to stakeholder dashboard

Model comparison: same data, four approaches

Logistic regression: 83.3% (linear baseline)
SVM: 83.3% (kernel-based)
Decision tree: 88.9% (captures feature splits; best performer)
KNN: 83.3% (instance-based)

What the data revealed

KSC LC-39A: the launch site with the highest success rate
2013 onward: the success rate climbs sharply after 2013
FT + B5: the booster versions with the best odds

// why i built it this way

Four models, not one. The point wasn't to pick the best model; it was to show stakeholders that different approaches converge on similar accuracy, which builds confidence in the prediction itself.
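To make "similar accuracy" concrete, assume the held-out test set contained 18 launches (an assumption; the set size isn't stated here). The reported scores then correspond to whole-number counts, and the gap between the three 83.3% models and the decision tree is a single launch:

```latex
\frac{15}{18} \approx 83.3\%, \qquad
\frac{16}{18} \approx 88.9\%, \qquad
\frac{16}{18} - \frac{15}{18} = \frac{1}{18} \approx 5.6 \text{ percentage points}
```

On a test set that small, one launch flipping would reorder the models, which is why the result reads as convergence rather than a decisive winner.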

Dashboard over notebook. The audience was non-technical. A Jupyter notebook is useless to them. The Plotly Dash dashboard lets them filter by launch site, orbit type, and booster version and see predictions update live.

SQL before modeling. Running SQL queries on the raw data first surfaced the patterns that mattered: launch site and booster version were the strongest predictors. The models confirmed what the data already showed.
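A sketch of the kind of exploratory query described above, run through Python's built-in `sqlite3`. The `launches` table and its `launch_site` and `outcome` columns are hypothetical; the capstone's actual schema and SQL engine aren't shown here.

```python
import sqlite3


def success_rate_by_site(conn: sqlite3.Connection) -> list[tuple[str, int, float]]:
    """Launch count and success rate per site, highest success rate first."""
    query = """
        SELECT launch_site,
               COUNT(*) AS launches,
               ROUND(AVG(CASE WHEN outcome = 'success' THEN 1.0 ELSE 0.0 END), 3) AS success_rate
        FROM launches
        GROUP BY launch_site
        ORDER BY success_rate DESC, launches DESC
    """
    return conn.execute(query).fetchall()
```

The same `GROUP BY` pattern over a booster-version column is how a query like this would surface booster version as a strong predictor before any model is trained.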

Python · Scikit-learn · SQL · Plotly Dash · Pandas · Web Scraping

Text Emotion Recognition · NLP classification

Classify emotions in social media text across multiple categories

From raw tweets to emotion labels

Prediction strength across categories

joy: strong
anger: strong
sadness: moderate
fear: moderate
surprise: weaker (fewer samples)
love: weaker (overlaps with joy)

// why i built it this way

Multi-category, not binary sentiment. "Positive vs negative" is well-trodden ground. Distinguishing joy from love, or anger from fear, in short informal text is where the real challenge is, and where NLP gets interesting.

Pipeline quality matters more than model choice. The preprocessing step (handling slang, emojis, hashtags, abbreviations) had more impact on accuracy than switching between models. Garbage in, garbage out.
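A minimal sketch of tweet normalization covering the cases named above (slang, emojis, hashtags, abbreviations), plus URLs, mentions, and stretched words. The `SLANG` and `EMOJI` tables are tiny hypothetical examples; the project's actual rules and dictionaries aren't shown here.

```python
import re

# Hypothetical lookup tables; a real pipeline would use much larger ones.
SLANG = {"u": "you", "r": "are", "gr8": "great", "idk": "i do not know", "omg": "oh my god"}
EMOJI = {"😂": " emoji_laugh ", "😭": " emoji_cry ", "😡": " emoji_angry "}


def preprocess(tweet: str) -> str:
    text = tweet
    for emoji, token in EMOJI.items():
        text = text.replace(emoji, token)        # keep emojis as signal, not noise
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"@\w+", " ", text)           # drop user mentions
    text = re.sub(r"#(\w+)", r"\1", text)       # keep hashtag words, drop the '#'
    text = re.sub(r"(\w)\1{2,}", r"\1\1", text) # "sooooo" -> "soo"
    tokens = re.findall(r"[a-z0-9_']+", text)
    return " ".join(SLANG.get(tok, tok) for tok in tokens)
```

Mapping emojis to tokens instead of stripping them matters for emotion labels: a crying or angry face often carries more signal than the words around it.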

Per-category evaluation, not just aggregate accuracy. An 85% overall accuracy can hide a model that is terrible at detecting surprise (a rare class) while great at joy (a common class). I evaluated each emotion independently.
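A toy illustration of the masking effect described above (the labels and counts are made up for the example, not project results): a model that predicts "joy" for everything scores 85% overall on a set with 17 joy and 3 surprise examples, while its recall on surprise is zero. `accuracy_and_recall` is a hypothetical helper.

```python
from collections import Counter


def accuracy_and_recall(y_true: list[str], y_pred: list[str]) -> tuple[float, dict[str, float]]:
    """Overall accuracy plus recall for each true label."""
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    support = Counter(y_true)
    overall = sum(correct.values()) / len(y_true)
    recall = {label: correct[label] / n for label, n in support.items()}
    return overall, recall


y_true = ["joy"] * 17 + ["surprise"] * 3
y_pred = ["joy"] * 20                       # a model that never predicts surprise
overall, recall = accuracy_and_recall(y_true, y_pred)
# overall is 0.85, yet recall["surprise"] is 0.0
```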

NLP · Python · Scikit-learn · Text Classification

Depression Detection (PHQ-9) · Predictive health analytics

Predict depression severity levels from PHQ-9 survey data

From survey responses to severity prediction

// why i built it this way

Clinical scoring, not arbitrary labels. PHQ-9 has established clinical cutoffs. I used the validated severity bands (minimal/mild/moderate/moderately severe/severe) rather than inventing my own thresholds.
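The standard PHQ-9 banding can be sketched directly: nine items scored 0-3 are summed to a 0-27 total, which maps to minimal (0-4), mild (5-9), moderate (10-14), moderately severe (15-19), or severe (20-27). `phq9_severity` is a hypothetical helper name; how the project applied the bands to its data isn't shown here.

```python
# Standard PHQ-9 severity cutoffs on the 0-27 total score: (upper bound, label).
BANDS = [(4, "minimal"), (9, "mild"), (14, "moderate"), (19, "moderately severe"), (27, "severe")]


def phq9_severity(item_scores: list[int]) -> tuple[int, str]:
    """Sum nine 0-3 item scores and return (total, severity band)."""
    if len(item_scores) != 9 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("PHQ-9 expects nine item scores, each 0-3")
    total = sum(item_scores)
    return total, next(label for upper, label in BANDS if total <= upper)
```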

Age as a lens, not a feature. Instead of just throwing age into the model, I used it to segment the analysis, revealing that severity patterns differ meaningfully across age groups, which has clinical implications.
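One way to use age as a lens rather than a model input is to compare the distribution of severity bands within each age group. This is a sketch under assumptions: the `AGE_BINS` boundaries and the `(age, severity)` row format are hypothetical, not the project's actual segmentation.

```python
from collections import Counter, defaultdict

# Hypothetical age bins: (low, high, label), inclusive on both ends.
AGE_BINS = [(18, 29, "18-29"), (30, 44, "30-44"), (45, 59, "45-59"), (60, 120, "60+")]


def age_group(age: int) -> str:
    for low, high, label in AGE_BINS:
        if low <= age <= high:
            return label
    return "other"


def severity_by_age(rows: list[tuple[int, str]]) -> dict[str, dict[str, float]]:
    """Share of each severity band within each age group."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for age, severity in rows:
        counts[age_group(age)][severity] += 1
    return {
        group: {sev: n / sum(c.values()) for sev, n in c.items()}
        for group, c in counts.items()
    }
```

Comparing within-group shares, rather than raw counts, keeps a large age group from dominating the picture.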

Visualization for awareness, not diagnosis. This is a research tool, not a clinical one. The outputs are designed to communicate patterns to researchers and public health audiences, not to diagnose individuals.

Python · ML · Data Visualization · Healthcare Analytics

Diabetes Prevalence Analysis · Public health data science

Do states with higher diabetes rates have worse COVID-19 outcomes?

The question

U.S. state-level diabetes prevalence → COVID-19 mortality and hospitalization rates

The analysis pipeline

What the data showed

Positive correlation
Higher diabetes prevalence → higher COVID mortality, even controlling for baseline factors.

Regional disparities
Southern states cluster with the highest prevalence and the highest outcome severity.

Confounders matter
Income, healthcare access, and obesity overlap with diabetes rates. Flagged, not ignored.

// why i built it this way

Research question first, not technique first. I didn't start with "let me try regression." I started with "does diabetes prevalence predict COVID outcomes?" The method follows the question.

Confounders acknowledged, not ignored. The correlation between diabetes and COVID mortality is real, but income, obesity, and healthcare access travel with it. The analysis flags this instead of overstating the finding.

Audience-aware visualization. The outputs were designed for people who don't read scatter plots. Clear labels, annotated axes, and takeaway-first summaries rather than raw statistical output.

Python · Pandas · Regression · Data Visualization · Public Health

// experience

Jan 2026 to Present

Marketing Data Analyst at Aramark

Cleveland, OH

Production data systems for university dining. Backend debugging, student account fixes, operational website maintenance. Exploring AI-assisted email triage.

Oct 2024 to Dec 2025

IT Analyst at Aramark

Cleveland, OH

Backend troubleshooting for student meal plan systems. Built and deployed the VikingFoodCo website used campus-wide.

May 2024 to Jan 2025

Graduate Research Assistant at Cleveland State University

Cleveland, OH

Thermal imagery object detection. Simulation-based transfer learning. Visual performance reporting.

// principles

Systems should surface uncertainty, not hide it.

Human review before automated action.

Evaluate against failure modes, not just accuracy.

Build for the case where the model is wrong.

// education

M.S. Computer Science from Cleveland State University

Dec 2025

B.Tech Computer Science Engineering from Avanthi Institute of Engineering and Technology

Nov 2022

OpenAI Prompt Engineering · Google/Kaggle AI Agents · DeepLearning.AI Supervised ML