I Built an AI That Reads 400 Repos and 22 RSS Feeds So I Don’t Have To


How a knowledge graph, two launchd jobs, and an LLM replaced my morning routine

Surya Jayanti · April 2026

I manage a platform with 400+ repositories across multiple GitHub orgs.

Every morning used to mean:

  • 30+ merged PRs
  • releases I didn’t track
  • wiki updates I missed
  • 22 blogs’ worth of industry noise

So I built three AI systems that read everything for me, analyze it against my codebase, and send a briefing before my first coffee.

This is how it works.


1. The Knowledge Graph: Teaching an LLM Your Codebase

LLMs are great generalists, but they don’t understand your platform. Ask them about your services and they’ll confidently hallucinate.

I call this the hallucination gap.

The fix: build a structured representation of your codebase.

Why not just RAG?

Chunk-and-embed gives you fragments.
But real insights live between files:

  • imports → dependencies
  • Kafka producers → consumers
  • service calls → system wiring

So instead of dumping code, I extract relationships.

4-Layer Model

  • Structural: imports, APIs
  • Behavioral: config + defaults
  • Constraint: validations + rules
  • Cross-repo: service interactions

All extracted with deterministic scripts. No ML. Same input → same output.
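The structural layer, for instance, needs nothing fancier than the standard library. A minimal sketch of deterministic import extraction (Python’s ast module; the function name is mine, not part of my actual scripts):

```python
import ast

def extract_imports(source: str) -> list[str]:
    """Deterministically list the modules a Python file imports.

    Same input always yields the same sorted output -- no ML involved.
    """
    tree = ast.parse(source)
    modules = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.append(node.module)
    return sorted(set(modules))
```

Run that over every file and the edges of your dependency graph fall out for free.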

Output: Not Code, but Intelligence

Instead of raw code chunks, I generate summaries like:

“This service has 3 direct consumers and sits on a critical path. Changes impact user-facing systems.”

Now the LLM answers operational questions correctly.
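Generating those summaries is mostly string templating over graph nodes. A sketch, assuming a hypothetical node shape with name, consumers, and critical_path fields (the real schema is richer):

```python
def summarize_node(node: dict) -> str:
    """Render one knowledge-graph node as a prose summary for the LLM.

    Field names here are illustrative, not the actual graph schema.
    """
    plural = "s" if node["consumers"] != 1 else ""
    summary = f"{node['name']} has {node['consumers']} direct consumer{plural}"
    if node.get("critical_path"):
        summary += " and sits on a critical path. Changes impact user-facing systems"
    return summary + "."
```

Embedding these sentences, rather than raw code, is what closes the hallucination gap.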

Stack

  • ChromaDB (local)
  • FastAPI
  • bge-small embeddings
  • ~83K documents
  • <100ms query latency
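To keep this post self-contained, here’s a toy stand-in for the ChromaDB + bge-small retrieval path: a bag-of-words “embedding” and cosine ranking instead of real vectors. Same shape of API, none of the quality:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words vector, standing in for bge-small embeddings."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class TinyStore:
    """Minimal in-memory stand-in for a ChromaDB collection."""

    def __init__(self):
        self.docs: list[str] = []

    def add(self, docs: list[str]) -> None:
        self.docs.extend(docs)

    def query(self, text: str, n: int = 1) -> list[str]:
        # Rank stored summaries by similarity to the query.
        ranked = sorted(self.docs,
                        key=lambda d: cosine(embed(d), embed(text)),
                        reverse=True)
        return ranked[:n]
```

Swap in real embeddings and a persistent collection and you have the production version.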

2. The Daily Digest: Your Morning Engineering Brief

Next problem: I still had to look for things.

So I flipped it. The system brings everything to me.

Every morning at 8 AM:

  1. Collect
    • PRs, releases, wiki updates
  2. Filter noise
    • bots, dependency bumps, test repos
    • cap at 40 items
  3. Summarize (LLM)
    • 2–3 sentence factual summaries
    • strict rules: no speculation
  4. Deliver
    • clean HTML email

Output

  • grouped by repo
  • executive summary at the top
  • links to everything

Takes ~90 seconds to read. I know exactly what changed overnight.


3. Weekly Industry Insights: News That Actually Matters

22 RSS feeds. ~200 articles/week.

Most irrelevant. Some critical.

This system filters and maps them to my architecture.

The Trick: Component Mapping

I map keywords → platform components:

"canary", "flagger" → Progressive delivery system

So instead of generic summaries:

❌ “New Kubernetes security feature”
✅ “Impacts our workload manager → RBAC updates needed”
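The mapping itself is just a keyword table. A sketch using the examples above (the rbac entry is hypothetical):

```python
COMPONENT_MAP = {  # keyword -> platform component
    "canary": "Progressive delivery system",
    "flagger": "Progressive delivery system",
    "rbac": "Workload manager",  # hypothetical mapping
}

def map_components(text: str) -> list[str]:
    """Return the platform components an article mentions, deduplicated."""
    text = text.lower()
    return sorted({comp for kw, comp in COMPONENT_MAP.items() if kw in text})
```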

Relevance Scoring

  • source priority
  • model releases
  • security signals
  • keyword matches

Capped + categorized for diversity.
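Scoring is additive and boring on purpose. A sketch with made-up source tiers and signal terms:

```python
SOURCE_PRIORITY = {"kubernetes-blog": 3, "vendor-blog": 2}  # hypothetical tiers
SECURITY_TERMS = ("cve", "vulnerability", "security patch")

def relevance_score(article: dict) -> int:
    """Additive score: source tier + security signal + component matches."""
    score = SOURCE_PRIORITY.get(article["source"], 1)  # default tier
    title = article["title"].lower()
    if any(term in title for term in SECURITY_TERMS):
        score += 3
    score += len(article.get("matched_components", []))
    return score
```

Sort by score, cap per category, and the ~200 articles collapse to ~20.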

Result

~20 high-signal articles/week with:

  • summaries
  • impact statements
  • platform-specific insights

Scheduling: Why launchd > cron

Both systems run locally using launchd:

  • triggers on wake + scheduled time
  • avoids missed runs
  • uses sentinel files for dedup

No infra. Just a laptop.
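For reference, a LaunchAgent plist for the 8 AM digest might look like this (label and paths are placeholders). Unlike cron, launchd coalesces StartCalendarInterval runs missed during sleep and fires the job once on wake:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.example.daily-digest</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/bin/python3</string>
    <string>/Users/me/digest/run.py</string>
  </array>
  <key>StartCalendarInterval</key>
  <dict>
    <key>Hour</key><integer>8</integer>
    <key>Minute</key><integer>0</integer>
  </dict>
</dict>
</plist>
```

The script itself checks a sentinel file so a wake-triggered run and a scheduled run on the same day don’t send two emails.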


What Makes This Work

  1. Grounded in your architecture
    Not generic AI output—real system awareness
  2. Pre-filtered signal
    100+ events → ~30 meaningful items
  3. Synthesized insights
    Patterns > raw updates
  4. Delivered, not dashboarded
    Inbox > dashboards you never open
  5. Zero maintenance
    ~750 lines of code total

Build Your Own (Quick Recipe)

Knowledge Graph

  • start with 20 repos
  • extract imports, configs, relationships
  • build dependency graph
  • generate human-readable summaries
  • store in ChromaDB

Daily Digest

  • collect PRs via GitHub CLI
  • filter noise
  • summarize with strict prompts
  • send email

Weekly Insights

  • RSS feeds + tiers
  • keyword → component mapping
  • score relevance
  • generate impact summaries

The Takeaway

The future of enterprise AI isn’t bigger models.

It’s domain-specific intelligence grounded in your system.

Your codebase already contains everything:

  • dependencies
  • configs
  • constraints
  • service wiring

Extract it. Structure it. Feed it to an LLM.

Then automate the reading.


Three systems.
~750 lines of code.
Zero daily effort.

My platform reads everything before I do. 🚀
