May 29, 2025 · 9 min read

How briefstock.ai Generates 8-Section Stock Research Reports in Under 60 Seconds

The architecture behind briefstock.ai — how real P/E, DCF, and FCF calculations run in Python, how AI narration is prompted to sound like an analyst rather than a chatbot, and what breaks under load.

briefstock.ai generates an 8-section stock research report — valuation, earnings quality, growth analysis, risk factors, technical setup, peer comparison, DCF model, and analyst verdict — in under 60 seconds. The reports read like they were written by a human analyst, not a language model reciting facts.

Getting there required making a set of architectural decisions that aren't obvious from the outside. This is the internal writeup.

The core constraint: AI shouldn't do math

The most important design decision in briefstock is what the AI does and doesn't do.

What the AI does: Narrate. Interpret. Write. Given a structured data payload, the AI explains what the numbers mean, identifies patterns, and synthesizes a conclusion. It's a writer, not a calculator.

What the AI doesn't do: Calculate. Fetch data. Make up numbers.

Every number in a briefstock report — the P/E ratio, the DCF intrinsic value, the FCF yield, the 52-week relative performance — is calculated by a Python function from verified source data. The AI receives those numbers as structured inputs and writes about them. It never generates a financial figure.

This distinction matters because language models hallucinate financial data at a rate that makes them useless for quantitative analysis. Ask a model "what is Apple's current P/E ratio?" and you'll get a number that sounds plausible and may be months or years stale. We don't ask. We calculate it ourselves and feed it to the model.

The principle: calculation is deterministic, narration is creative

Anything that requires precision (numbers, dates, percentages) gets computed in code. Anything that requires interpretation (what does this number mean for this company?) goes to the AI. This boundary produces reports that are both accurate and well-written.
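A minimal sketch of that boundary. The field names (`price`, `eps_ttm`, `fcf_ttm`, `market_cap`) and both helpers are hypothetical, not briefstock's actual schema. The ratios are computed in code, and only the finished numbers go into the payload the model sees:

```python
from typing import Optional


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the ratio is not meaningful."""
    if denominator is None or denominator <= 0:
        return None
    return numerator / denominator


def build_valuation_payload(price: float, eps_ttm: float,
                            fcf_ttm: float, market_cap: float) -> dict:
    """Compute valuation figures in code so the model never has to."""
    pe = safe_ratio(price, eps_ttm)              # None for loss-making companies
    fcf_yield = safe_ratio(fcf_ttm, market_cap)  # can be negative if FCF is negative
    return {
        "pe_ratio": None if pe is None else round(pe, 2),
        "fcf_yield_pct": None if fcf_yield is None else round(fcf_yield * 100, 2),
    }


# Illustrative inputs only, not real market data.
payload = build_valuation_payload(price=200.0, eps_ttm=8.0,
                                  fcf_ttm=4.2e9, market_cap=100e9)
```

Returning `None` for a negative-earnings P/E, rather than a misleading number, gives the narration step an explicit "not meaningful" signal to write around.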

Data pipeline

Reports start with a ticker symbol and end with a structured data object that contains ~200 fields across the 8 report sections. Here's how that data gets populated:

Market data (near real-time): price, volume, 52-week range, and market cap, fetched from a market data API and cached for 15 minutes.

Financial statements (quarterly/annual): income statement, balance sheet, cash flow statement — pulled from SEC EDGAR filings, parsed and normalized.

Derived calculations (Python): Everything that requires computation runs in Python functions before the AI ever sees the data. Examples:

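def calculate_dcf(fcf_per_share, growth_rate, terminal_rate, discount_rate, years=10):
    """Two-stage DCF: intrinsic value per share from projected free cash flow."""
    if discount_rate <= terminal_rate:
        raise ValueError("discount_rate must exceed terminal_rate")

    discounted_flows = []
    projected_fcf = fcf_per_share
    for year in range(1, years + 1):
        projected_fcf *= 1 + growth_rate  # undiscounted FCF for this year
        discounted_flows.append(projected_fcf / (1 + discount_rate) ** year)

    # Terminal value grows the final undiscounted FCF, then is discounted exactly once.
    # (Using the already-discounted final cash flow here would discount it twice.)
    terminal_value = projected_fcf * (1 + terminal_rate) / (discount_rate - terminal_rate)
    terminal_pv = terminal_value / (1 + discount_rate) ** years

    return sum(discounted_flows) + terminal_pv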
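For reference, this is the textbook two-stage DCF that a function with these parameters is meant to compute, with $F_0$ = `fcf_per_share`, $g$ = `growth_rate`, $g_T$ = `terminal_rate`, $r$ = `discount_rate`, and $N$ = `years`. It is the standard form, not a statement of briefstock's exact model:

```latex
V_0 \;=\; \sum_{t=1}^{N} \frac{F_0\,(1+g)^{t}}{(1+r)^{t}}
\;+\; \frac{1}{(1+r)^{N}} \cdot \frac{F_0\,(1+g)^{N}\,(1+g_T)}{r - g_T},
\qquad r > g_T
```

The terminal value is built from the undiscounted year-$N$ cash flow and discounted once. The expression is only defined when the discount rate exceeds the terminal growth rate.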

Peer data: The 3 closest peers by sector and market cap, with their key ratios, are pulled and normalized so the AI can write a relative comparison.
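One way to pick "closest peers" can be sketched as follows. The article does not say how distance is measured, so the log-scale market cap distance, the dictionary fields, and the placeholder tickers here are all assumptions:

```python
import math


def closest_peers(target: dict, universe: list[dict], k: int = 3) -> list[dict]:
    """Pick the k same-sector companies whose market cap is closest to the target's.

    Distance is measured on a log scale, so a company twice as large and one
    half as large count as equally far away.
    """
    candidates = [
        c for c in universe
        if c["sector"] == target["sector"] and c["ticker"] != target["ticker"]
    ]
    candidates.sort(
        key=lambda c: abs(math.log(c["market_cap"] / target["market_cap"]))
    )
    return candidates[:k]


# Placeholder companies, not real data.
target = {"ticker": "AAA", "sector": "Tech", "market_cap": 100e9}
universe = [
    target,
    {"ticker": "BBB", "sector": "Tech", "market_cap": 90e9},
    {"ticker": "CCC", "sector": "Tech", "market_cap": 400e9},
    {"ticker": "DDD", "sector": "Energy", "market_cap": 100e9},
    {"ticker": "EEE", "sector": "Tech", "market_cap": 55e9},
    {"ticker": "FFF", "sector": "Tech", "market_cap": 150e9},
]
peers = closest_peers(target, universe)
```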

The full data collection and calculation step takes 8–12 seconds. The AI narration step takes 35–45 seconds. Total: under 60 seconds.

The prompting architecture

The 8-section report is generated in a single API call with a structured prompt. The sections are not generated sequentially — that would take 8× as long. Instead, the model is prompted to produce all 8 sections in one pass as a JSON object.
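Because everything arrives in one JSON object, it is worth checking the shape before rendering. A minimal sketch, assuming snake_case key names derived from the eight section titles (the real keys are not given in this post):

```python
import json

# Hypothetical key names derived from the eight section titles.
SECTION_KEYS = [
    "valuation", "earnings_quality", "growth_analysis", "risk_factors",
    "technical_setup", "peer_comparison", "dcf_model", "analyst_verdict",
]


def parse_report(raw: str) -> dict:
    """Parse the model's JSON output and verify all eight sections are present."""
    report = json.loads(raw)
    if not isinstance(report, dict):
        raise ValueError("expected a JSON object at the top level")
    # A section counts as missing if the key is absent or its value is empty.
    missing = [k for k in SECTION_KEYS if not report.get(k)]
    if missing:
        raise ValueError(f"missing or empty sections: {missing}")
    # Return sections in a fixed order and drop any unexpected extra keys.
    return {k: report[k] for k in SECTION_KEYS}


complete_raw = json.dumps({k: f"{k} text" for k in SECTION_KEYS})
parsed = parse_report(complete_raw)
```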

The system prompt is the critical piece. The difference between a report that reads like ChatGPT and one that reads like an analyst is almost entirely in the prompt design.

Key prompt principles:

Ground every claim in the data payload. The prompt explicitly instructs the model to reference specific numbers from the input rather than generating commentary in the abstract. "Apple's FCF yield of 4.2% is above the sector median of 3.1%" is grounded. "Apple generates strong free cash flow" is not.

Prohibit hedging language. Language models default to "it's important to note that," "investors should be aware," and "it is worth mentioning." The system prompt explicitly bans these phrases and their variants. Analyst reports are declarative, not disclaimer-heavy.

Require specific formats for conclusions. The analyst verdict section must contain a one-sentence thesis, a target price range with explicit methodology, and a bear/base/bull scenario table. Open-ended instructions produce open-ended outputs. Constrained formats produce consistent, useful outputs.

Calibrate tone by section. Valuation and DCF sections are technical and precise. Risk factors are measured and specific. The analyst verdict is direct and opinionated. The system prompt assigns a different tone register to each section rather than using a single global instruction.
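The principles above can be sketched as a prompt builder plus a post-generation check. The prompt wording, the section keys, and both function names are illustrative assumptions. Only the three banned phrases and the tone descriptions come from the text above:

```python
# Phrases taken from the examples above; the real ban list is presumably longer.
BANNED_PHRASES = [
    "it's important to note that",
    "investors should be aware",
    "it is worth mentioning",
]

# Hypothetical section keys, with tone registers paraphrased from the text.
SECTION_TONES = {
    "valuation": "technical and precise",
    "dcf_model": "technical and precise",
    "risk_factors": "measured and specific",
    "analyst_verdict": "direct and opinionated",
}


def build_system_prompt() -> str:
    """Assemble a system prompt with grounding, banned phrases, and per-section tone."""
    lines = [
        "You are an equity analyst. Use only the numbers in the data payload.",
        "Cite a specific figure from the payload in every claim.",
        "Never use these phrases or close variants: "
        + "; ".join(f'"{p}"' for p in BANNED_PHRASES),
        "Tone by section:",
    ]
    lines += [f"- {section}: {tone}" for section, tone in SECTION_TONES.items()]
    return "\n".join(lines)


def find_banned_phrases(text: str) -> list[str]:
    """Return banned phrases that still appear in the output (case-insensitive)."""
    lowered = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return [p for p in BANNED_PHRASES if p in lowered]
```

A prompt-level ban is not a guarantee, so a check like `find_banned_phrases` on the output gives a cheap signal for when a section should be regenerated.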

The first version read like a Wikipedia article

The original prompt produced technically accurate reports that were boring and non-committal: "The company has shown revenue growth over the past three years. However, there are risks to consider." That is useless to a real investor. It took three rounds of prompt revision to get reports that make actual claims.

Streaming

Users see the report appear section by section rather than waiting for the full response. This is a UX decision, not an architectural one, because the underlying API call is the same. It makes a generation step of up to 45 seconds feel fast, since content is visible within 2–3 seconds.

The implementation uses Next.js route handlers with ReadableStream and Server-Sent Events. Each section appears as its JSON key completes:

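import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const encoder = new TextEncoder();

export async function POST(req: Request) {
  const { ticker } = await req.json();
  // buildReportData and buildPrompt are defined elsewhere in the app.
  const reportData = await buildReportData(ticker); // Python calculations

  const stream = new ReadableStream({
    async start(controller) {
      try {
        const response = await openai.chat.completions.create({
          model: 'gpt-4o',
          messages: buildPrompt(reportData),
          stream: true,
        });

        for await (const chunk of response) {
          const text = chunk.choices[0]?.delta?.content || '';
          if (!text) continue; // skip empty deltas
          // JSON-encode each chunk so newlines in the text cannot break SSE framing;
          // the client calls JSON.parse on every event's data field.
          controller.enqueue(encoder.encode(`data: ${JSON.stringify(text)}\n\n`));
        }
        controller.close();
      } catch (err) {
        controller.error(err); // surface upstream failures instead of hanging the stream
      }
    },
  });

  return new Response(stream, {
    headers: { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' },
  });
}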
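The route handler above forwards raw text chunks. Something still has to decide when a section's JSON key has "completed". The post does not show that parser, so the following is only a sketch of one approach, written in Python to match the calculation layer rather than the TypeScript frontend. It assumes the model output is a single top-level JSON object and scans the accumulated buffer for key/value pairs that are fully present:

```python
import json


def completed_sections(buffer: str) -> dict:
    """Return the top-level key/value pairs fully present in a partial JSON object."""
    sections: dict = {}
    depth = 0          # nesting depth of {} and []
    in_string = False
    escaped = False
    item_start = None  # index where the current top-level "key": value began
    for i, ch in enumerate(buffer):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
            if depth == 1 and item_start is None:
                item_start = i  # opening quote of a top-level key
            continue
        if ch in "{[":
            depth += 1
            continue
        if ch in "}]":
            depth -= 1
        # An item ends at a top-level comma or at the object's closing brace.
        item_ended = (ch == "," and depth == 1) or (ch == "}" and depth == 0)
        if item_ended and item_start is not None:
            sections.update(json.loads("{" + buffer[item_start:i] + "}"))
            item_start = None
    return sections


full = json.dumps({
    "valuation": "P/E of 25, above peers.",
    "risk_factors": 'Debt "wall" {2026}',
    "dcf_model": "Intrinsic value",
})
# Cut the stream off in the middle of the third section's text.
partial = full[: full.index('"dcf_model"') + 20]
ready = completed_sections(partial)
```

With this approach a section is only reported once the comma or closing brace after it arrives, so each section shows up one token later than strictly necessary. Re-scanning the whole buffer on every chunk is quadratic, which is fine for a report-sized payload but worth revisiting for longer outputs.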

What breaks under load

I learned three things when briefstock hit its first traffic spike (about 400 concurrent report generations):

The Python calculation layer saturates first. The financial calculations are CPU-bound. On a single-process setup, concurrent requests queue behind each other at the calculation step. The fix was moving the Python service to a queue-based architecture where calculation workers scale independently from the API layer.

LLM rate limits are real at low thresholds. At 400 concurrent requests, I hit rate limits on the LLM API within minutes. The solution was implementing a retry-with-backoff queue for the narration step, with a user-visible "queued" state rather than a silent failure.
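A minimal sketch of retry-with-backoff for the narration call. `RateLimitError` is a stand-in for whatever exception the LLM client raises on a rate-limit response, and the delay parameters are illustrative, not the values briefstock uses:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for the exception an LLM client raises when rate limited."""


def call_with_backoff(call: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 1.0, max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on rate limits, doubling the delay cap each attempt (full jitter)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the failure instead of hiding it
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # jitter spreads out simultaneous retries
    raise RuntimeError("unreachable")


# Demo: a call that is rate limited twice, then succeeds. Delays are recorded, not slept.
attempts = {"n": 0}
delays: list[float] = []


def flaky_narration() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError()
    return "ok"


result = call_with_backoff(flaky_narration, sleep=delays.append)
```

Injecting `sleep` keeps the function testable, and the re-raise on the final attempt is the hook for showing the user a "queued" or failed state.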

Caching is worth the complexity. Identical requests for the same ticker within a 15-minute window should return the same report. Without caching, each request triggers a full data fetch and LLM call. With caching, 80%+ of requests on popular tickers are served from cache. The cache key is {ticker}:{15-minute-bucket}.
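The key format can be sketched like this, assuming the bucket is derived by integer-dividing a Unix timestamp by 900 seconds and that tickers are normalized to upper case (both are assumptions about details the post leaves out):

```python
import time
from typing import Optional

BUCKET_SECONDS = 15 * 60


def report_cache_key(ticker: str, now: Optional[float] = None) -> str:
    """Build a `{ticker}:{15-minute-bucket}` key from a Unix timestamp."""
    ts = time.time() if now is None else now
    bucket = int(ts // BUCKET_SECONDS)
    return f"{ticker.strip().upper()}:{bucket}"
```

Fixed buckets keep the key computable without a lookup, at the cost that two requests a few seconds apart can land on either side of a bucket boundary and miss each other's cache entry.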

The cache hit rate surprised me

I expected a cache hit rate of maybe 30%. It's 81% for the top 20 tickers. Every time AAPL, MSFT, NVDA, or TSLA trends on financial Twitter, briefstock sees a wave of identical requests. The cache turns a capacity problem into a non-event.

Tech stack

| Layer | Technology |
|---|---|
| Frontend | Next.js 14 App Router |
| API | Next.js route handlers + FastAPI (Python) |
| Calculations | Python (pandas, numpy) |
| LLM | Claude claude-sonnet-4-6 (narration), GPT-4o (fallback) |
| Market data | Financial Modeling Prep API |
| Fundamentals | SEC EDGAR (direct) |
| Cache | Redis (Upstash) |
| Deployment | Vercel (Next.js) + Railway (Python service) |
| Database | Supabase (user accounts, report history) |

The Python service on Railway is the only non-serverless component. Financial calculations require pandas and numpy — these don't run well in serverless environments due to cold start times and package size limits.

What I'd build differently

I'd design the data schema before writing the prompt. I iterated the prompt and the data schema simultaneously, which caused a lot of churn. The prompt references field names from the data object — every time I renamed a field, I had to update the prompt. Define the schema first, then write the prompt.

I'd add a human review queue for edge cases. Some companies have unusual financial structures (SPACs, recent IPOs with limited history, companies transitioning between reporting standards) that break the calculation assumptions. Right now these fail silently and produce degraded reports. A human review queue for flagged edge cases would improve quality on the long tail.

I'd charge more. The per-report cost including API calls, data fetching, and compute is roughly $0.18. We charge $19/month for unlimited reports. That math works at current scale but doesn't leave much headroom for the Python infrastructure as scale increases. Pricing should have been $29/month from the start.
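The headroom can be made concrete with the figures above. This is back-of-the-envelope arithmetic only: it treats every report as costing the full $0.18 and ignores cache hits and fixed infrastructure costs:

```python
COST_PER_REPORT = 0.18  # USD per report, from the estimate above


def breakeven_reports(monthly_price: float,
                      cost_per_report: float = COST_PER_REPORT) -> float:
    """Reports per month at which one subscriber's usage cost equals their fee."""
    return monthly_price / cost_per_report


at_19 = breakeven_reports(19.0)  # about 105.6 reports per month
at_29 = breakeven_reports(29.0)  # about 161.1 reports per month
```

At $19/month, a subscriber who generates more than about 105 uncached reports a month costs more than they pay. At $29/month that threshold moves to about 161.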

Araho Digital
