I Built an AI That Reads 400 Repos and 22 RSS Feeds So I Don’t Have To
How a knowledge graph, two scheduled jobs, and an LLM replaced my morning routine
Surya Jayanti · April 2026
I manage a platform with 400+ repositories across multiple GitHub orgs.
Every morning used to mean:
- 30+ merged PRs
- releases I didn’t track
- wiki updates I missed
- 22 blogs worth of industry noise
So I built three AI systems that read everything for me, analyze it against my codebase, and send a briefing before my first coffee.
This is how it works.
1. The Knowledge Graph: Teaching an LLM Your Codebase
LLMs are great generalists, but they don’t understand your platform. Ask them about your services and they’ll confidently hallucinate.
I call this the hallucination gap.
The fix: build a structured representation of your codebase.
Why not just RAG?
Chunk-and-embed gives you fragments.
But real insights live between files:
- imports → dependencies
- Kafka producers → consumers
- service calls → system wiring
So instead of dumping code, I extract relationships.
4-Layer Model
- Structural: imports, APIs
- Behavioral: config + defaults
- Constraint: validations + rules
- Cross-repo: service interactions
All extracted with deterministic scripts. No ML. Same input → same output.
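For the structural layer, the standard library is enough. Here's a minimal sketch of deterministic import extraction for Python files (the function name is mine; the post doesn't show its actual scripts):

```python
import ast

def extract_imports(source: str) -> list[str]:
    """Return the top-level modules a Python file imports (structural layer).

    Pure AST walk: no ML, so the same input always yields the same output.
    """
    tree = ast.parse(source)
    modules = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.extend(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.append(node.module.split(".")[0])
    return sorted(set(modules))

sample = "import os\nfrom kafka import KafkaProducer\nimport os.path\n"
print(extract_imports(sample))  # ['kafka', 'os']
```

Each `(file, imports)` pair becomes an edge in the dependency graph; the same idea applies to Kafka topics and service-call targets with per-language extractors.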
Output: Not Code, but Intelligence
Instead of raw code chunks, I generate summaries like:
“This service has 3 direct consumers and sits on a critical path. Changes impact user-facing systems.”
Now the LLM answers operational questions correctly.
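Turning graph facts into those summaries is plain string templating. A sketch, with a hypothetical service name and signature:

```python
def summarize_service(name: str, consumers: list[str], critical_path: bool) -> str:
    """Render raw graph facts as a sentence an LLM can ground answers in."""
    summary = f"{name} has {len(consumers)} direct consumers."
    if critical_path:
        summary += " It sits on a critical path; changes impact user-facing systems."
    return summary

print(summarize_service("payments-api", ["checkout", "billing", "fraud"], True))
```

These sentences, not raw code chunks, are what get embedded and stored.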
Stack
- ChromaDB (local)
- FastAPI
- bge-small embeddings
- ~83K documents
- <100ms query latency
2. The Daily Digest: Your Morning Engineering Brief
Next problem: I still had to look for things.
So I flipped it. The system brings everything to me.
Every morning at 8 AM:
- Collect: PRs, releases, wiki updates
- Filter noise: bots, dependency bumps, test repos; cap at 40 items
- Summarize (LLM): 2–3 sentence factual summaries; strict rules, no speculation
- Deliver: clean HTML email
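The filter-and-cap step can be sketched in a few lines (the bot names, title prefixes, and event shape here are illustrative assumptions, not the post's actual config):

```python
# Assumed deny-lists; tune these to your own orgs.
BOT_AUTHORS = {"dependabot[bot]", "renovate[bot]", "github-actions[bot]"}
SKIP_TITLE_PREFIXES = ("chore(deps)", "bump ")

def filter_noise(events: list[dict], cap: int = 40) -> list[dict]:
    """Drop bot PRs, dependency bumps, and test repos, then cap digest size."""
    kept = [
        e for e in events
        if e["author"] not in BOT_AUTHORS
        and not e["title"].lower().startswith(SKIP_TITLE_PREFIXES)
        and not e["repo"].endswith("-test")
    ]
    return kept[:cap]
```

Everything that survives goes to the LLM for summarization.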
Output
- grouped by repo
- executive summary at the top
- links to everything
Takes ~90 seconds to read. I know exactly what changed overnight.
3. Weekly Industry Insights: News That Actually Matters
22 RSS feeds. ~200 articles/week.
Most irrelevant. Some critical.
This system filters and maps them to my architecture.
The Trick: Component Mapping
I map keywords → platform components:
"canary", "flagger" → Progressive delivery system
So instead of generic summaries:
❌ “New Kubernetes security feature”
✅ “Impacts our workload manager → RBAC updates needed”
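The mapping itself is just a lookup table scanned against article text. A minimal sketch, with hypothetical keywords and component names beyond the one example above:

```python
# Hypothetical keyword → platform-component map; adjust to your architecture.
COMPONENT_MAP = {
    ("canary", "flagger"): "Progressive delivery system",
    ("rbac", "pod security"): "Workload manager",
    ("kafka", "consumer lag"): "Event streaming backbone",
}

def map_components(article_text: str) -> set[str]:
    """Return the platform components an article touches."""
    text = article_text.lower()
    return {
        component
        for keywords, component in COMPONENT_MAP.items()
        if any(kw in text for kw in keywords)
    }

print(map_components("Flagger 1.35 adds gated canary rollouts"))
# {'Progressive delivery system'}
```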
Relevance Scoring
- source priority
- model releases
- security signals
- keyword matches
Capped + categorized for diversity.
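Those four signals combine into a single score. The weights below are illustrative assumptions, not the post's real numbers:

```python
def relevance_score(article: dict, keywords: set[str]) -> int:
    """Score an article by source tier, security signals, releases, keywords."""
    score = 0
    # Source priority: higher-tier feeds get a head start.
    score += {"tier1": 3, "tier2": 1}.get(article.get("source_tier", ""), 0)
    text = article["title"].lower()
    if "cve" in text or "vulnerability" in text:
        score += 4  # security signals outrank everything else
    if "release" in text:
        score += 2  # model/tool releases
    score += sum(1 for kw in keywords if kw in text)  # keyword matches
    return score
```

Sort by score, cap per category, and the ~200 weekly articles collapse to ~20.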
Result
~20 high-signal articles/week with:
- summaries
- impact statements
- platform-specific insights
Scheduling: Why launchd > cron
Both systems run locally using launchd:
- triggers on wake + scheduled time
- avoids missed runs
- uses sentinel files for dedup
No infra. Just a laptop.
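A launchd job is a small plist in `~/Library/LaunchAgents`. This is a minimal sketch (label and paths are placeholders): `StartCalendarInterval` jobs that were missed while the machine slept run once on wake, which is exactly what cron doesn't give you.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.daily-digest</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/python3</string>
        <string>/Users/me/digest/run.py</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key><integer>8</integer>
        <key>Minute</key><integer>0</integer>
    </dict>
</dict>
</plist>
```

Load it once with `launchctl load ~/Library/LaunchAgents/com.example.daily-digest.plist`; the script itself checks a sentinel file so a catch-up run after wake doesn't send a duplicate email.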
What Makes This Work
- Grounded in your architecture: not generic AI output, but real system awareness
- Pre-filtered signal: 100+ events → ~30 meaningful items
- Synthesized insights: patterns > raw updates
- Delivered, not dashboarded: inbox > dashboards you never open
- Zero maintenance: ~750 lines of code total
Build Your Own (Quick Recipe)
Knowledge Graph
- start with 20 repos
- extract imports, configs, relationships
- build dependency graph
- generate human-readable summaries
- store in ChromaDB
Daily Digest
- collect PRs via GitHub CLI
- filter noise
- summarize with strict prompts
- send email
Weekly Insights
- RSS feeds + tiers
- keyword → component mapping
- score relevance
- generate impact summaries
The Takeaway
The future of enterprise AI isn’t bigger models.
It’s domain-specific intelligence grounded in your system.
Your codebase already contains everything:
- dependencies
- configs
- constraints
- service wiring
Extract it. Structure it. Feed it to an LLM.
Then automate the reading.
Three systems.
~750 lines of code.
Zero daily effort.
My platform reads everything before I do. 🚀
