I Built an AI That Reads 400 Repos and 22 RSS Feeds So I Don’t Have To


How a knowledge graph, two launchd jobs, and an LLM replaced my morning routine

Surya Jayanti · April 2026

I manage a platform with 400+ repositories across multiple GitHub orgs.

Every morning used to mean:

  • 30+ merged PRs
  • releases I didn’t track
  • wiki updates I missed
  • 22 blogs’ worth of industry noise

So I built three AI systems that read everything for me, analyze it against my codebase, and send a briefing before my first coffee.

This is how it works.


1. The Knowledge Graph: Teaching an LLM Your Codebase

LLMs are great generalists, but they don’t understand your platform. Ask them about your services and they’ll confidently hallucinate.

I call this the hallucination gap.

The fix: build a structured representation of your codebase.

Why not just RAG?

Chunk-and-embed gives you fragments.
But real insights live between files:

  • imports → dependencies
  • Kafka producers → consumers
  • service calls → system wiring

So instead of dumping code, I extract relationships.

4-Layer Model

  • Structural: imports, APIs
  • Behavioral: config + defaults
  • Constraint: validations + rules
  • Cross-repo: service interactions

All extracted with deterministic scripts. No ML. Same input → same output.
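The structural layer, for instance, needs nothing fancier than the standard library. A minimal sketch of deterministic import extraction (Python’s ast module; the function name is mine, not part of my actual scripts):

```python
import ast

def extract_imports(source: str) -> list[str]:
    """Deterministically list the modules a Python file imports.

    Same input always yields the same sorted output -- no ML involved.
    """
    tree = ast.parse(source)
    modules = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.append(node.module)
    return sorted(set(modules))
```

Run that over every file and the edges of your dependency graph fall out for free.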

Output: Not Code, but Intelligence

Instead of raw code chunks, I generate summaries like:

“This service has 3 direct consumers and sits on a critical path. Changes impact user-facing systems.”

Now the LLM answers operational questions correctly.
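Generating those summaries is mostly string templating over graph nodes. A sketch, assuming a hypothetical node shape with name, consumers, and critical_path fields (the real schema is richer):

```python
def summarize_node(node: dict) -> str:
    """Render one knowledge-graph node as a prose summary for the LLM.

    Field names here are illustrative, not the actual graph schema.
    """
    plural = "s" if node["consumers"] != 1 else ""
    summary = f"{node['name']} has {node['consumers']} direct consumer{plural}"
    if node.get("critical_path"):
        summary += " and sits on a critical path. Changes impact user-facing systems"
    return summary + "."
```

Embedding these sentences, rather than raw code, is what closes the hallucination gap.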

Stack

  • ChromaDB (local)
  • FastAPI
  • bge-small embeddings
  • ~83K documents
  • <100ms query latency
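To keep this post self-contained, here’s a toy stand-in for the ChromaDB + bge-small retrieval path: a bag-of-words “embedding” and cosine ranking instead of real vectors. Same shape of API, none of the quality:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words vector, standing in for bge-small embeddings."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class TinyStore:
    """Minimal in-memory stand-in for a ChromaDB collection."""

    def __init__(self):
        self.docs: list[str] = []

    def add(self, docs: list[str]) -> None:
        self.docs.extend(docs)

    def query(self, text: str, n: int = 1) -> list[str]:
        # Rank stored summaries by similarity to the query.
        ranked = sorted(self.docs,
                        key=lambda d: cosine(embed(d), embed(text)),
                        reverse=True)
        return ranked[:n]
```

Swap in real embeddings and a persistent collection and you have the production version.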

2. The Daily Digest: Your Morning Engineering Brief

Next problem: I still had to look for things.

So I flipped it. The system brings everything to me.

Every morning at 8 AM:

  1. Collect
    • PRs, releases, wiki updates
  2. Filter noise
    • bots, dependency bumps, test repos
    • cap at 40 items
  3. Summarize (LLM)
    • 2–3 sentence factual summaries
    • strict rules: no speculation
  4. Deliver
    • clean HTML email

Output

  • grouped by repo
  • executive summary at the top
  • links to everything

Takes ~90 seconds to read. I know exactly what changed overnight.


3. Weekly Industry Insights: News That Actually Matters

22 RSS feeds. ~200 articles/week.

Most irrelevant. Some critical.

This system filters and maps them to my architecture.

The Trick: Component Mapping

I map keywords → platform components:

"canary", "flagger" → Progressive delivery system

So instead of generic summaries:

❌ “New Kubernetes security feature”
✅ “Impacts our workload manager → RBAC updates needed”
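The mapping itself is just a keyword table. A sketch using the examples above (the rbac entry is hypothetical):

```python
COMPONENT_MAP = {  # keyword -> platform component
    "canary": "Progressive delivery system",
    "flagger": "Progressive delivery system",
    "rbac": "Workload manager",  # hypothetical mapping
}

def map_components(text: str) -> list[str]:
    """Return the platform components an article mentions, deduplicated."""
    text = text.lower()
    return sorted({comp for kw, comp in COMPONENT_MAP.items() if kw in text})
```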

Relevance Scoring

  • source priority
  • model releases
  • security signals
  • keyword matches

Capped + categorized for diversity.
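Scoring is additive and boring on purpose. A sketch with made-up source tiers and signal terms:

```python
SOURCE_PRIORITY = {"kubernetes-blog": 3, "vendor-blog": 2}  # hypothetical tiers
SECURITY_TERMS = ("cve", "vulnerability", "security patch")

def relevance_score(article: dict) -> int:
    """Additive score: source tier + security signal + component matches."""
    score = SOURCE_PRIORITY.get(article["source"], 1)  # default tier
    title = article["title"].lower()
    if any(term in title for term in SECURITY_TERMS):
        score += 3
    score += len(article.get("matched_components", []))
    return score
```

Sort by score, cap per category, and the ~200 articles collapse to ~20.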

Result

~20 high-signal articles/week with:

  • summaries
  • impact statements
  • platform-specific insights

Scheduling: Why launchd > cron

Both systems run locally using launchd:

  • triggers on wake + scheduled time
  • avoids missed runs
  • uses sentinel files for dedup

No infra. Just a laptop.
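For reference, a LaunchAgent plist for the 8 AM digest might look like this (label and paths are placeholders). Unlike cron, launchd coalesces StartCalendarInterval runs missed during sleep and fires the job once on wake:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.example.daily-digest</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/bin/python3</string>
    <string>/Users/me/digest/run.py</string>
  </array>
  <key>StartCalendarInterval</key>
  <dict>
    <key>Hour</key><integer>8</integer>
    <key>Minute</key><integer>0</integer>
  </dict>
</dict>
</plist>
```

The script itself checks a sentinel file so a wake-triggered run and a scheduled run on the same day don’t send two emails.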


What Makes This Work

  1. Grounded in your architecture
    Not generic AI output—real system awareness
  2. Pre-filtered signal
    100+ events → ~30 meaningful items
  3. Synthesized insights
    Patterns > raw updates
  4. Delivered, not dashboarded
    Inbox > dashboards you never open
  5. Zero maintenance
    ~750 lines of code total

Build Your Own (Quick Recipe)

Knowledge Graph

  • start with 20 repos
  • extract imports, configs, relationships
  • build dependency graph
  • generate human-readable summaries
  • store in ChromaDB

Daily Digest

  • collect PRs via GitHub CLI
  • filter noise
  • summarize with strict prompts
  • send email

Weekly Insights

  • RSS feeds + tiers
  • keyword → component mapping
  • score relevance
  • generate impact summaries

The Takeaway

The future of enterprise AI isn’t bigger models.

It’s domain-specific intelligence grounded in your system.

Your codebase already contains everything:

  • dependencies
  • configs
  • constraints
  • service wiring

Extract it. Structure it. Feed it to an LLM.

Then automate the reading.


Three systems.
~750 lines of code.
Zero daily effort.

My platform reads everything before I do. 🚀
