Hello, I'm Evan.
LinkedIn Instagram GitHub

I like to learn. I like talking to people. And I like trying new things.

Let's build something cool together!


What I'm Doing

Applied Mathematics and Computer Science at Harvard University

Growth Intern at Perplexity

Fellow at Xfund

Contract Work at Arize AI


What I've Done

Summer Associate at Atomic

Software Engineering Intern at Offline Ventures

Product Intern at Alpha Studio

Product and Strategy Analyst at Upped Events

Software Engineering Intern at Spreetail

Principal at HUCP


What I've Researched and Written

Creating and Validating Synthetic Datasets for LLM Evaluation & Experimentation

Synthetic datasets are an increasingly important part of developing with large language models.

September 5, 2024

Read It

Advanced Guards: Moving To Dynamic Guardrails for LLM Applications

While static guards are great at filtering out predefined content like NSFW language, they struggle when faced with sophisticated attacks like jailbreak attempts, prompt injection, and more.

August 8, 2024

Read It

LLM Tracing: From Automatically Collecting Traces To Troubleshooting Your LLM App

Tracing is a powerful observability technique that offers developers an effective way to better see what goes on inside their LLM applications.

August 8, 2024

Read It

Text To SQL: Evaluating SQL Generation with LLM as a Judge

LLM-as-a-Judge provides a solid proxy for AI SQL generation performance, especially as a quick check on results.

August 1, 2024

Read It

Different Ways to Instrument Your LLM Application

This blog explores the different ways you can instrument your LLM application, comparing manual and automatic instrumentation techniques, and looking into the unique benefits that OpenTelemetry (OTEL) brings to the table.

July 25, 2024

Read It

LLM Performance At Time Series Analysis

People are trusting LLMs to parse some pretty complex data, but can they even do simple time series analysis?

May 5, 2024

Read It

Getting the Generation Part Right In RAG

Most RAG research focuses on retrieval, but what can you do to get the generative part right?

April 4, 2024

Read It

Do Not Use LLMs to Generate Numeric Evals

Not only are LLMs bad at math, they often fail at basic numerical reasoning.

March 3, 2024

Read It

The Needle In a Haystack Test

Despite developers using context length as a model qualifier, LLMs often lose information buried in long prompts.

February 2, 2024

Read It

Evaluating Prompts: A Developer's Guide

Prompt engineering from the ground up. No, we are not LinkedIn lunatics selling you a course.

December 12, 2023

Read It

Benchmarking RAG Development

There are a lot of independent variables in RAG systems, and thoroughly testing across configurations is the only way to figure out what setup might work best for you.

November 11, 2023

Read It

Introduction To Retrieval Augmented Generation

Confused by serial Tweeter RAG takes? This guide breaks down the basics.

August 8, 2023

Read It

Shoot me a message on LinkedIn, Instagram, or email me at evanjolley@gmail.com to connect.