Lumen

Lumen started in late 2020 as a Django side-project — a learning platform for myself. Five years and one model revolution later, the original prototype is gone and what remains is the question: can an agent actually teach? Not just summarize and quote. So I rebuilt it. Custom orchestrator, no LangChain. Groq Llama 3.3 for the latency-per-dollar that makes "watch it think" real. Public evals so you can audit the agent's competence yourself.

— Ahmed Hobeishy

An open-source, AI-first LMS built as a portfolio anchor for agentic-AI engineering work.

Public deploy on AWS t4g.small (Graviton2 ARM, 2 vCPU + 2 GB RAM) — Caddy 2 fronts a single docker-compose.prod.yml running FastAPI + Celery + Postgres 17 (pgvector) + Redis 7 + MinIO. Real LLM calls via Groq Llama 3.3 70B; retrieval embeddings via Cloudflare Workers AI (@cf/baai/bge-small-en-v1.5). Runbook: docs/deployment/aws-vps.md.

Silent 1:50 captioned walkthrough — landing → multi-agent tutor → agent reasoning panel → observable trace → self-critique authoring replay → admin observability. A voiced Loom is queued for re-record once the live demo lands; script at docs/release/loom-recording-script.md.

Lumen home — Workbench UI redesign, lime accent on a dark canvas

What this is

Lumen is the live demo of how multi-agent systems, retrieval-augmented generation, the Model Context Protocol, and evaluation rigor come together inside a real product. It is the centrepiece of an agentic-AI engineering portfolio — a self-hostable LMS that doubles as a working argument for "I build production-grade AI systems, not toy demos."

AI tutor with citations — course-scoped retrieval-augmented generation; every claim points at a specific lesson chunk.
Multi-modal ingest — paste a YouTube, Notion, or Google Docs URL and get a draft course back; instructor reviews before commit.
AI-assisted authoring — brief → outline → lesson bodies → quizzes; nothing auto-persists.
Spaced-repetition reviews — FSRS-6 scheduler; every completed quiz joins the learner's queue.
Open Badges 3.0 — Ed25519-signed verifiable credentials; PDF certificate as the human-readable fallback.
Observable agent traces — every LLM call recorded with tokens, cost, latency, and (soon) the planner's tool-call log.
Eval suite — 30-item tutor + 10-item authoring + 10-item ingest golden datasets; LLM-as-judge; CI smoke gate.

Going live — operator handoff

If you've cloned this and want to take the streaming demo live, the 4-step operator handoff (flag flip → sealed eval run → screencap → repo rename) is at docs/release/operator-handoff.md. Until those steps run, code lands flag-OFF and the public /eval surface stays honest-empty — by design.

What to look at first

If you're reviewing the code, these are the highest-signal files:

The tutor orchestrator — apps/backend/app/services/tutor_orchestrator_stream.py (async-iterator event sequence) + apps/backend/app/workers/tasks/tutor_streaming.py (Celery task with atomic phase fence + after_commit enqueue + contextlib.suppress-wrapped cleanup).
SSE wire — apps/backend/app/api/v1/tutor_streaming.py (4 endpoints, all flag-gated) + apps/backend/app/services/redis_streams.py (XADD/XREAD + trim-detect).
Lua cost scripts — apps/backend/app/core/lua/ (reserve/reconcile/check/release in microcents, integer math, zero FP drift).
Adversarial scorer — apps/backend/app/evals/adversarial.py + apps/backend/evals/security/probes.jsonl (15-probe corpus; per-probe outputs deliberately NOT published).
Frontend SSE parser + reducer — apps/frontend/src/lib/tutor/sse-parser.ts + use-tutor-stream.ts (useSyncExternalStore reducer + iOS Safari 15.0-15.3 UA sniff).
ADRs — docs/adr/0017–0019 (Celery prefork; Redis Streams not pub/sub; atomic phase fence + after_commit).
Public surfaces — /eval, /eval/methodology, /case-study.

Architecture

flowchart LR user([Learner / Instructor]) subgraph Edge[Edge] web[Next.js 15 App Router · RSC Docker on EC2] end subgraph App[Application] api[FastAPI · Python 3.13 Docker on EC2] worker[Celery worker + beat Docker on EC2] end subgraph Agents[Agent layer] planner[Planner-orchestrator I2 · shipped] subs[Sub-agents: retriever · web-searcher code-runner · quiz-gen concept-explainer] authoring[Self-critique authoring I3 · shipped] pathagent[Learning-path agent I5 · shipped] end subgraph MCP[MCP surface] mcp[Lumen MCP server I1 · shipped claude mcp add lumen] end subgraph Data[Data plane] pg[(Postgres 17 + pgvector Docker on EC2)] redis[(Redis 7 Docker on EC2)] s3[(MinIO S3-compatible Docker on EC2)] end subgraph LLM[Swappable LLM layer] provider{LLM_PROVIDER openai-compatible} groq[Groq · Llama 3.3 70B demo · free] anthropic[Anthropic · Claude Sonnet prod · paid] openai[OpenAI · GPT-4 class prod · paid] end subgraph Eval[Eval loop] golden[(Golden datasets tutor · authoring · ingest)] judge[LLM-as-judge 0–5 per axis] meter[LLMCostMeter llm_calls table] end user --> web web --> api api --> pg api --> redis api --> s3 api --> worker worker --> pg api --> planner planner --> subs subs --> pg api --> authoring api --> pathagent mcp --> api planner --> provider authoring --> provider pathagent --> provider provider -.demo.-> groq provider -.prod.-> anthropic provider -.prod.-> openai api --> meter golden --> judge judge --> provider

Architecture B+: AI-first OSS LMS. Provider-agnostic LLM layer; the live demo runs Groq Llama 3.3 70B for $0, prod-ready for Anthropic or OpenAI via the same LLMProvider abstraction. Every agent call goes through the cost-meter so observability and the per-user 24h budget guard work identically across providers. See docs/architecture.md for the full topology.

The agentic patterns I built

The resume bullets, with links to the code. Every item below is on the release branch today (1.1.0-agentic).

Planner-orchestrator multi-agent tutor (shipped — Phase I, item I2) — apps/backend/app/services/tutor_orchestrator.py reads the learner's question and picks among five sub-agents under apps/backend/app/services/tutor_subagents/ — retriever, web_searcher, code_runner, quiz_generator, concept_explainer — with a hard cap of 5 tool-call rounds per turn. Every step lands in agent_tracer.py so the frontend can render the plan and which tools fired. The moat is showing how the agent thinks, not just what it said.
Self-critique authoring agent (shipped — Phase I, item I3) — apps/backend/app/services/authoring_orchestrator.py drives researcher → outliner → critic → reviser → lesson-drafter → final-critic via the modules under authoring_subagents/; max three revision loops; the full chain persists as CourseDraftTrace so an instructor replays the reasoning before accepting a draft.
Lumen MCP server (shipped — Phase I, item I1) — apps/backend/app/mcp/server.py exposes nine tools (list_courses, get_course, ask_tutor, list_my_due_reviews, grade_review_card, create_course_draft, ingest_url_to_draft, list_my_progress, search_lesson_content) over stdio + HTTP; OAuth client-credentials for service-to-service; installable in Claude Desktop with the JSON snippet below. Registry metadata at apps/backend/app/mcp/registry_metadata.json ready for mcp-publisher publish against registry.modelcontextprotocol.io.
Eval harness with LLM-as-judge (shipped — Phase H, item H2) — 30-item tutor suite + 10 authoring + 10 ingest under apps/backend/evals/. Run with make eval or python -m app.evals run --suite tutor. Judge scores each item 0–5 on suite-specific axes; reports land as JSONL with mean + regression vs. previous run. CI smoke gate runs a 3-item subset on every PR. Admin dashboard at /admin/evals.
Production-grade observability (shipped — Phase H, items H1 + H7) — every LLM call's prompt/completion tokens, USD cost, latency, and outcome land in the llm_calls table (Alembic 0022) via apps/backend/app/services/llm_call_log.py. The per-user 24h budget guard returns HTTP 429 llm.budget_exceeded once the threshold trips. /admin/observability adds Celery queue depth, retrieval-quality drill-down, and a per-trace expander; learners get a per-turn trace drill-down at /dashboard/tutor/{conversation_id}/turn/{message_id} powered by learner_traces.py + agent_tracer.py (I4).
Personalized learning-path agent (shipped — Phase I, item I5) — apps/backend/app/services/learning_path.py takes a learner goal ("become a backend engineer in 6 months"), assembles an 8-course plan respecting prerequisites and FSRS load, schedules it weekly, and re-plans monthly as new courses and progress data arrive.

What's running today

Feature	Status
Course-scoped RAG tutor with citations (Phase E1)	✅ shipped (1.0.0-rebuild)
AI-assisted authoring (Phase E2)	✅ shipped (1.0.0-rebuild)
Multi-modal ingest — YouTube / Notion / Google Docs (E3)	✅ shipped (1.0.0-rebuild)
FSRS-6 spaced-repetition reviews (Phase E4)	✅ shipped (1.0.0-rebuild)
Open Badges 3.0 / W3C VC credentials (Phase E5)	✅ shipped (1.0.0-rebuild)
Tiptap block editor (Phase E6)	✅ shipped (1.0.0-rebuild)
Mastery dashboard (Phase E7)	✅ shipped (1.0.0-rebuild)
pgvector + provider-agnostic embeddings (Phase E0)	✅ shipped (1.0.0-rebuild)
WCAG 2.2 AA axe-core CI gate (Phase D5)	✅ shipped (1.0.0-rebuild)
LLM cost meter + per-user 24h budget guard (H1)	✅ shipped (wave 1)
Eval harness + golden datasets + judge dashboard (H2)	✅ shipped (wave 1)
Playwright e2e against the live stack (H3)	✅ shipped (wave 1)
Production-exposure security pass (H6)	✅ shipped (wave 1)
AWS t4g.small single-VM deploy runbook (H4)	✅ shipped (1.1.0-agentic)
README rewrite for agentic-AI positioning (H5)	✅ shipped (1.1.0-agentic)
Agent-trace + retrieval observability surface (H7)	✅ shipped (1.1.0-agentic)
Lumen MCP server (I1)	✅ shipped (1.1.0-agentic)
Multi-agent planner-orchestrator tutor (I2)	✅ shipped (1.1.0-agentic)
Self-critique authoring agent (I3)	✅ shipped (1.1.0-agentic)
Agent-trace observability surface for learners (I4)	✅ shipped (1.1.0-agentic)
Personalized learning-path agent (I5)	✅ shipped (1.1.0-agentic)

Eval scores

Headline number

Authoring suite, n=10, judge = Llama 3.3 70B (Groq): mean overall 3.85/5. Per-axis breakdown — coverage 4.0, scope 4.0, learning_arc 3.9, brief_fidelity 3.5. All 10/10 items judged, zero judge errors. Full JSONL: docs/eval/authoring-n10-groq-20260525.jsonl (10 individual items + summary record). Reproduce locally with the snippet below.

# Real eval run, n=10 — needs LLM_PROVIDER=openai + OPENAI_API_BASE=https://api.groq.com/openai/v1
# + OPENAI_API_KEY=<your-groq-key> + LLM_MODEL=llama-3.3-70b-versatile in .env:
docker compose exec api python -m app.evals run --suite authoring

Suite coverage

Suite	n	Score (latest)	Notes
`authoring`	10	3.85/5	Real Groq signal — no retrieval needed, judge directly compares generated outline vs. ideal.
`tutor`	30	2.33/5	Real retrieval + real LLM. Embeddings via Cloudflare Workers AI (`@cf/baai/bge-small-en-v1.5`, 384-dim, free tier), LLM + judge via Groq Llama 3.3 70B. 10/30 judged (20 skipped — courses not seeded), faithfulness 3.3, helpfulness 2.8, citation_correctness 0.9. The low citation score reflects a mismatch between the eval's expected `must_cite_ids` and what the retriever actually pulls — relevant chunks land but not the specific ones the dataset hardcodes. Report: `docs/eval/tutor-n30-groq-cloudflare-20260525.jsonl`. Prior run with noop embeddings (2.0/5) kept at `docs/eval/tutor-n30-groq-noopembed-20260525.jsonl` for comparison.
`ingest`	10	0.83/5**	**Of 10 YouTube items, 4 were fully ingested + judged; 6 hit upstream transcript fetch errors (rate-limited cloud IPs, age-restricted videos, etc). The judged 4 scored low on `chapter_count_accuracy` + `structure_quality` because the v1 chunker emits one module-per-video instead of detecting chapter boundaries — known follow-up. Report: `docs/eval/ingest-n10-groq-20260525.jsonl`.

Each item is scored 0–5 by an LLM-as-judge on suite-specific axes (faithfulness, citation_correctness, helpfulness for tutor; coverage, learning_arc, scope, brief_fidelity for authoring; chunking_quality, metadata_completeness for ingest). Reports carry per-axis means, an overall mean, and a regression diff vs. the previous run. CI gates a 3-item smoke on every PR via .github/workflows/pnpm-eval-smoke.yml.

Run it locally

Prereqs. Docker Desktop 4.30+ (or Docker Engine 27 + Compose v2). Optional: an LLM API key — a Groq key is recommended for the free tier; without one, the AI features fall back to the deterministic noop provider so the rest of the app still works.

git clone https://github.com/ahmedEid1/E-Learning-Platform.git
cd E-Learning-Platform
cp .env.example .env
docker compose up
make migrate
make seed

Then open http://localhost:3000 and log in with one of the seeded accounts:

Role	Email	Password
Admin	admin@lumen.test	Admin!2026
Instructor	teacher@lumen.test	Teach!2026
Student	student@lumen.test	Learn!2026

For real LLM features (tutor, authoring, ingest, evals), set the following in .env and restart:

LLM_PROVIDER=openai
OPENAI_API_BASE=https://api.groq.com/openai/v1
OPENAI_API_KEY=<your-groq-key>
LLM_MODEL=llama-3.3-70b-versatile

The same LLMProvider abstraction also accepts native Anthropic (LLM_PROVIDER=anthropic) and OpenAI (LLM_PROVIDER=openai with the default base URL) configurations — no code changes, switch by env var.

Deploy it (AWS EC2 t4g.small, free through Dec 31 2026)

The live demo runs on one AWS EC2 t4g.small Graviton2 VM (2 vCPU + 2 GB RAM + 30 GB gp3, ARM64 Ubuntu 24.04) — covered by AWS's t4g.small free-trial promo through Dec 31 2026 and absorbed by the new-account Free Plan credits before that. The unmodified docker-compose.prod.yml brings up FastAPI + Celery worker + beat + Postgres-pgvector + Redis + MinIO + a containerised Caddy 2 that auto-fetches a Let's Encrypt cert. The 2 GB RAM cap is handled by a 4 GB swapfile + tuned Postgres config in the bootstrap script. Cloudflare's DNS proxy in front is an optional next step, not a prerequisite.

tl;dr after the EC2 instance is running and you've SSHed in:

ssh -i ~/.ssh/lumen-prod.pem ubuntu@<elastic-ip>
curl -fsSL https://raw.githubusercontent.com/ahmedEid1/E-Learning-Platform/main/scripts/aws-bootstrap.sh | sudo bash
# log out, log back in as the new admin user, then:
git clone https://github.com/ahmedEid1/E-Learning-Platform.git lumen && cd lumen
cp .env.example .env.production    # fill APP_DOMAIN + secrets (see runbook step 5)
docker compose -f docker-compose.prod.yml --env-file .env.production up -d

Full runbook (EC2 creation through TLS + smokes): docs/deployment/aws-vps.md. Cost callout: ** $0 wall-clock** on a new AWS Free-Plan account (6 months, up to$ 200 credits absorb the Elastic IP); ~ $6/mo on an existing account until Dec 31 2026 when t4g free hours expire. The per-user 24h LLM budget guard caps Groq spend at$ 1/user/day by default and the operator can dial it lower in .env. Migration path off AWS at end-of-trial: rerun the same runbook against Oracle Always Free A1 (if capacity ever appears) or Hetzner CAX11 — the compose stack is identical because all three targets are ARM64 Ubuntu 24.04.

Use Lumen from Claude Desktop

Lumen ships an MCP server (Phase I, item I1) that exposes its catalog, RAG tutor, FSRS review queue, AI authoring pipeline, and multi-modal ingest as nine tools. Add it as an MCP source in Claude Desktop:

// ~/Library/Application Support/Claude/claude_desktop_config.json (macOS)
// %APPDATA%\Claude\claude_desktop_config.json (Windows)
{
  "mcpServers": {
    "lumen": {
      "command": "uvx",
      "args": ["--from", "lumen-backend", "python", "-m", "app.mcp", "--transport", "stdio"],
      "env": {
        "LUMEN_MCP_AUTH_TOKEN": "<your-token>",
        "DATABASE_URL": "<postgres-url>"
      }
    }
  }
}

Generate the LUMEN_MCP_AUTH_TOKEN value with make mcp-token against your running Lumen instance — that prints a fresh OAuth client_id + client_secret pair; paste the secret as the env value. For Claude Code, the equivalent one-liner is claude mcp add lumen -- python -m app.mcp --transport stdio (set LUMEN_MCP_AUTH_TOKEN in your shell first). Full operator guide: docs/mcp.md.

Once installed, ask Claude 'list my Lumen courses' and watch the MCP tool calls fire in the desktop sidebar — the planner picks among list_courses, get_course, ask_tutor, list_my_due_reviews, grade_review_card, create_course_draft, ingest_url_to_draft, list_my_progress, and search_lesson_content.

Built by

Ahmed Hobeishy — full-stack engineer (Python + TypeScript + DevOps), based in Essen, Germany. Building Lumen as the centrepiece of an agentic-AI engineering portfolio. Currently open to senior agentic-AI engineering roles.

LinkedIn: https://www.linkedin.com/in/ahmedhobeishy/
GitHub: https://github.com/ahmedEid1
Reach out via LinkedIn, or open an issue on this repo.

License + status

MIT — see LICENSE.

Status: actively built. 1.1.0-agentic shipped 2026-05-22 (Phase H + all five Phase I items — MCP server, multi-agent tutor, self-critique authoring, learner-trace surface, learning-path agent). Wave 2 portfolio-activation prep completed 2026-05-25 (eval harness wiring + agentic-demo seed + screenshot pack + single-VM deploy runbook + MCP registry metadata + README truthing). The deploy target pivoted from Oracle Always Free to AWS t4g.small after Frankfurt Always-Free capacity stayed saturated for 24h and Oracle's PAYG region-subscription cap blocked the Stockholm fallback; the new AWS runbook (docs/deployment/aws-vps.md) ships ~$0/mo wall-clock on a new-account Free Plan through end-of-2026. Remaining work is operator-side: provision the EC2 instance and run the deploy runbook, mint the live tutor-eval score against Groq, record the 90-second Loom, and start applying. The MCP server is already published to registry.modelcontextprotocol.io as io.github.ahmedEid1/lumen v1.1.0.

Lumen

Lumen Overview

What is Lumen about?

How to use Lumen?

Key Features of Lumen

Use Cases of Lumen

FAQ from Lumen

Lumen's README

Lumen

What this is

Going live — operator handoff

What to look at first

Architecture

The agentic patterns I built

What's running today

Eval scores

Headline number

Suite coverage

Run it locally

Deploy it (AWS EC2 t4g.small, free through Dec 31 2026)

Use Lumen from Claude Desktop

Built by

License + status

Lumen Reviews

Login Required

Similar MCP Servers like Lumen

Sequential Thinking

LibreChat

Goose

Roo Code

Pydantic AI

Mcp Use

Mcp Agent

Klavis AI

5ire

Actions

Lumen's Information