March 21, 2026

Data Analysis Prompts for SQL, Python, and Beyond

There's a saying in data work that 80% of the job is wrangling — cleaning messy CSVs, writing the same aggregation query for the fifth time this week, reformatting columns because someone decided to store dates as strings. The actual analysis, the insight-finding part, is the other 20%.

AI doesn't replace the thinking, but it can eat through the boring 80% so you have more time for the part that matters. The trick is knowing which prompts to reach for. Here's what's useful, by category.

SQL

SQL is where AI shines the most for data work. Writing joins, window functions, and CTEs from scratch is tedious, and AI usually produces a working first draft quickly, especially when you give it your schema upfront.

"I have two tables: orders(order_id, customer_id, amount, created_at) and customers(customer_id, name, country). Write a query that shows the top 10 customers by total spend in the last 90 days, including their country."

"Rewrite this query using a window function instead of a subquery so it runs faster on a large table: [paste query]"

The schema detail is everything here. Vague prompts get vague SQL. Tell it your table names, column names, and which dialect you're using (PostgreSQL, BigQuery, SQLite, etc.), and the output usually needs little more than a test run against real data before it's ready for production.
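To make that concrete, here is a rough sketch of the kind of answer the first prompt above might produce, wrapped in Python's built-in sqlite3 module so it can be run as-is. The table and column names come from the prompt; the sample rows, the helper names (top_customers, build_demo_db), and the SQLite date syntax are assumptions for illustration. PostgreSQL or BigQuery would express the 90-day window differently.

```python
import sqlite3
from datetime import date, timedelta

# SQLite dialect: date('now', '-90 days') is the 90-day cutoff.
# Other dialects use different date arithmetic.
TOP_CUSTOMERS_SQL = """
SELECT c.customer_id,
       c.name,
       c.country,
       SUM(o.amount) AS total_spend
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id
WHERE o.created_at >= date('now', '-90 days')
GROUP BY c.customer_id, c.name, c.country
ORDER BY total_spend DESC
LIMIT 10;
"""


def top_customers(conn):
    """Return (customer_id, name, country, total_spend) rows, highest spend first."""
    return conn.execute(TOP_CUSTOMERS_SQL).fetchall()


def build_demo_db():
    """Create an in-memory database with a few made-up rows (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                             amount REAL, created_at TEXT);
    """)
    today = date.today()
    recent = (today - timedelta(days=10)).isoformat()   # inside the 90-day window
    old = (today - timedelta(days=200)).isoformat()     # outside the window
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                     [(1, "Ada", "UK"), (2, "Bo", "SE"), (3, "Cy", "US")])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                     [(1, 1, 50.0, recent), (2, 1, 25.0, recent),
                      (3, 2, 100.0, recent), (4, 3, 999.0, old)])
    conn.commit()
    return conn


if __name__ == "__main__":
    for row in top_customers(build_demo_db()):
        print(row)
```

The demo rows include one large order that falls outside the window, which is a quick way to confirm the date filter is doing its job before you trust the ranking.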

Analysis & Exploration

You've got a dataset in front of you and you're not sure where to start. AI is great at suggesting an analytical angle you hadn't thought of.

"I have a CSV with columns: user_id, session_start, session_end, pages_viewed, converted. What are the most interesting things I should look at first to understand what drives conversion?"

Use it as a thinking partner here, not an answer machine. Ask it to suggest hypotheses. Ask it what questions the data probably can't answer. That back-and-forth is where you get value.
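One hypothesis that tends to come out of that back-and-forth is "sessions with more page views convert more often." A minimal pandas sketch of how you might check it is below. The column names come from the prompt above; the first_look name, the bucket edges, and the assumption that converted is stored as 0/1 are mine, so adjust them to your data.

```python
import pandas as pd


def first_look(sessions: pd.DataFrame) -> pd.DataFrame:
    """Session count, conversion rate, and median duration per pages_viewed bucket."""
    df = sessions.copy()
    df["session_start"] = pd.to_datetime(df["session_start"])
    df["session_end"] = pd.to_datetime(df["session_end"])
    df["duration_min"] = (
        (df["session_end"] - df["session_start"]).dt.total_seconds() / 60
    )
    # Bucket edges are arbitrary assumptions; sessions with 0 pages fall outside them.
    df["pages_bucket"] = pd.cut(
        df["pages_viewed"],
        bins=[0, 2, 5, float("inf")],
        labels=["1-2", "3-5", "6+"],
    )
    return df.groupby("pages_bucket", observed=True).agg(
        sessions=("user_id", "size"),
        conversion_rate=("converted", "mean"),   # assumes converted is 0/1
        median_duration_min=("duration_min", "median"),
    )
```

A table like this won't tell you whether page views cause conversion, which is exactly the kind of limit worth asking the AI to spell out.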

Visualization

Choosing the right chart type is underrated. AI is surprisingly good at this — tell it what you're trying to communicate and let it recommend the format.

"I want to show how monthly revenue has trended over 18 months, broken down by product category. What chart type works best, and can you write the Python code for it using matplotlib?"

You can also ask it to critique an existing visualization: paste a description or a snippet of your plotting code and ask "what would make this clearer for a non-technical audience?" The answers are usually practical.
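For the monthly revenue prompt above, a common recommendation is a multi-line chart with one line per category. The sketch below shows roughly what that matplotlib code could look like. The column names (month, category, revenue) and the plot_revenue_by_category helper are assumptions, and a stacked area chart would be a reasonable alternative if the total matters more than the comparison.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_revenue_by_category(revenue: pd.DataFrame):
    """Line chart with months on the x-axis and one line per product category.

    Expects columns named month, category, and revenue (assumed names).
    """
    # Reshape to one column per category so each becomes its own line.
    wide = revenue.pivot_table(
        index="month", columns="category", values="revenue", aggfunc="sum"
    ).sort_index()

    fig, ax = plt.subplots(figsize=(10, 5))
    for category in wide.columns:
        ax.plot(wide.index, wide[category], marker="o", label=category)

    ax.set_title("Monthly revenue by product category")
    ax.set_xlabel("Month")
    ax.set_ylabel("Revenue")
    ax.legend(title="Category")
    fig.autofmt_xdate()   # tilt the date labels so 18 months stay readable
    fig.tight_layout()
    return fig, ax
```

Call fig.savefig("revenue.png") on the returned figure to write it to disk, or drop the Agg line when working in a notebook.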

Data Cleaning

This is the real timesaver. Cleaning tasks are repetitive and annoying to write from scratch, but they follow predictable patterns that AI handles well.

"I have a pandas DataFrame with a phone_number column. The values are inconsistently formatted — some have dashes, some have parentheses, some have country codes, some don't. Write a function that normalizes all of them to E.164 format."

The same approach covers other routine cleaning tasks:

Deduplication: find and merge near-duplicate rows with fuzzy matching
Type coercion: fix columns that should be dates or numbers but aren't
Missing values: decide on a strategy and implement it in one shot
Outlier detection: flag anomalies worth investigating before modeling
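As a rough illustration of three of those items (type coercion, missing values, and outlier detection), here is a minimal pandas sketch. The order_date and amount column names, the clean_orders helper, the median fill, and the 1.5 x IQR rule are all assumptions chosen for the example; the right strategy depends on your data.

```python
import pandas as pd


def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Coerce types, fill missing amounts, and flag outliers without dropping rows."""
    out = df.copy()

    # Type coercion: unparseable values become NaT / NaN instead of raising.
    out["order_date"] = pd.to_datetime(out["order_date"], errors="coerce")
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")

    # Missing values: record which rows were missing, then fill with the median.
    out["amount_was_missing"] = out["amount"].isna()
    out["amount"] = out["amount"].fillna(out["amount"].median())

    # Outlier detection: flag values beyond 1.5 * IQR from the quartiles.
    q1, q3 = out["amount"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["amount_outlier"] = (out["amount"] < q1 - 1.5 * iqr) | (
        out["amount"] > q3 + 1.5 * iqr
    )
    return out
```

Flagging rather than deleting keeps the decision about each suspicious row in your hands.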

For cleaning tasks, always describe what "messy" looks like. Paste a few example bad values so the AI can see the actual problem, not a hypothetical one.
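Given the phone number prompt from earlier in this section plus a handful of example bad values, a response might look roughly like the sketch below. It is a best-effort version, not a full implementation of E.164: it assumes numbers without a leading + are 10-digit national numbers belonging to a default country code of 1, and the to_e164 name and its parameters are hypothetical. For real-world data, a dedicated library such as phonenumbers handles country-specific rules that this ignores.

```python
import re
from typing import Optional


def to_e164(raw, default_country_code: str = "1",
            national_length: int = 10) -> Optional[str]:
    """Best-effort E.164 normalization; returns None when the value is ambiguous."""
    if not isinstance(raw, str):
        return None  # covers None and NaN values from pandas

    text = raw.strip()
    digits = re.sub(r"\D", "", text)  # drop dashes, spaces, parentheses, dots

    if text.startswith("+"):
        # Already carries a country code; keep the digits as given.
        candidate = digits
    elif len(digits) == national_length:
        # Assumption: a bare national number belongs to the default country.
        candidate = default_country_code + digits
    elif (len(digits) == national_length + len(default_country_code)
          and digits.startswith(default_country_code)):
        # Country code present but written without the leading +.
        candidate = digits
    else:
        return None

    # E.164 allows at most 15 digits, and country codes never start with 0.
    if not candidate or len(candidate) > 15 or candidate[0] == "0":
        return None
    return "+" + candidate
```

On a DataFrame this can be applied with df["phone_number"].map(to_e164). Values that come back as None are the ones to paste into a follow-up prompt.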

Python & R

Whether you're a Python person or an R person, you've spent time Googling the exact syntax for something you've done before. Just ask.

"In pandas, I have a DataFrame grouped by category. I want to apply a different aggregation function to each column — sum for revenue, mean for conversion_rate, and count for orders. Show me how to do this in one groupby call."

It's also great for translating between the two. If you have Python code and need to hand it off to an R user, prompt it: "Convert this pandas code to the tidyverse equivalent." It works well enough to save a lot of time.
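For the groupby prompt above, the one-call answer is typically pandas named aggregation. A minimal sketch follows, assuming the column names from the prompt (category, revenue, conversion_rate, orders); the summarize_by_category helper name is made up for the example.

```python
import pandas as pd


def summarize_by_category(df: pd.DataFrame) -> pd.DataFrame:
    """Apply a different aggregation to each column in a single groupby call."""
    return df.groupby("category").agg(
        revenue=("revenue", "sum"),
        conversion_rate=("conversion_rate", "mean"),
        orders=("orders", "count"),   # counts non-null values in the orders column
    )
```

Each keyword names an output column, and each tuple pairs a source column with the function to apply, so the result keeps flat, readable column names.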

Reporting

The analysis is done. Now you have to explain it to people who weren't in the room. This is where AI can do the heavy lifting on the writing.

"Here are the key findings from my analysis: [paste bullet points]. Write an executive summary for a non-technical audience. Keep it under 200 words, lead with the most important finding, and avoid jargon."

Specify the audience every time. "Non-technical stakeholders" gets a very different result than "the data team" or "the board." The more context you give about who's reading and what decision they need to make, the more useful the output.

Two tips that apply to every category

First: always describe your schema or data structure at the top of the prompt. It sounds obvious, but most people skip it and then wonder why the output doesn't fit their actual data. Even two lines listing column names and types change the quality dramatically.

Second: specify the output format you want. "Give me a Python function" gets different output than "give me a Jupyter notebook cell" or "give me a SQL query I can paste into dbt." AI will default to whatever feels natural to it; you should tell it what fits your workflow.

The prompts that work best treat AI like a smart colleague who needs context, not a magic box that reads your mind. Give it the schema, give it an example, tell it what format you need — and the boring 80% starts feeling a lot shorter.

