Blog on Handy Intelligence

Components as Knowledge Objects: RAG Instead of Prompt Monoliths

Mon, 09 Feb 2026 00:00:00 +0000

If you’ve ever tried to squeeze a complex UI or system setup into an agent workflow, you know the problem: either you pack massive component text collections into the context — or the agent guesses. Neither ends well.

A practical way out: instead of carrying components (frontend components, services, modules, even entire feature slices) as “walls of text” in prompts, store them as knowledge objects in a RAG database — and retrieve them on demand via tools.

0.5 ExaFLOPS: What Deutsche Telekom's Setup Really Means for AI Training

Thu, 05 Feb 2026 00:00:00 +0000

0.5 ExaFLOPS is a strong number – but the precision behind it determines whether it translates to “AI autonomy in months” or “in years.”

The Telekom Industrial AI Cloud

Deutsche Telekom has launched its Industrial AI Cloud in Munich (Tucherpark): ~10,000 NVIDIA Blackwell GPUs (DGX B200 + RTX PRO Servers) delivering “up to 0.5 ExaFLOPS,” operated under German requirements for data protection, security, and availability.

The Key Interpretation

These 0.5 ExaFLOPS are very plausibly FP64 (HPC metric). A single B200 is rated at roughly ~37–40 TFLOPS FP64; multiply by 10,000 and you land exactly in the 0.4–0.5 EFLOPS range.

AWS vs. Hetzner in 60 Seconds: Compute Is Rarely the Problem

Thu, 05 Feb 2026 00:00:00 +0000

AWS vs. Hetzner in 60 seconds: Compute is rarely the problem – egress is.

Setup

For comparability: 24/7 (~730 h/month), Small/Medium/Large (2/4/8 vCPU with 4/16/32 GB RAM).

AWS examples: t3.medium, m6i.xlarge, m6i.2xlarge
Hetzner: CX22, CCX23, CCX33

🔹 Compute Only (1 VM/Node, 24/7)

Size	AWS EC2/ECS	AWS EKS (incl. Cluster Fee)	Hetzner
Small	~$30/month	~$103	€3.79
Medium	~$140	~$213	€24.49
Large	~$280	~$353	€39.90

🔹 “Mini K8s” (1 Cluster + 2 Workers, Medium)

AWS EKS: 2×$140 + $73 ≈ $353/month (compute-only)
Hetzner: 2×€24.49 ≈ €48.98/month

🔹 The Real Punch: Traffic/Egress

AWS: roughly $0.09/GB → 1 TB ≈ $92, 10 TB ≈ $922, 25 TB ≈ $2,304
Hetzner EU: 20 TB included, then ~€1/TB → at 25 TB: ~€5 extra

Kubernetes Is Not Just “EKS or Nothing”

And yes: On affordable VMs, it’s often surprisingly quick to set up – e.g., with k3s or RKE2. And if you don’t need “full K8s” at all, you can use the cloud API to build many simple solutions even more directly (provisioning, scaling, automation) – without taking on the full Kubernetes overhead.

n8n License Risk: Why 'Open Source' Doesn't Mean 'Free for Everything'

Thu, 05 Feb 2026 00:00:00 +0000

Many teams treat n8n as “open source = free for everything.” That’s exactly where the license risk begins.

The Sustainable Use License

n8n (the free self-hosted Community Edition) is licensed under the Sustainable Use License. It’s deliberately designed to let you use, customize, and operate n8n internally – but not simply resell it as your own product or service. White-labeling, hosting it and charging customers, or building an offering whose core value consists of n8n is explicitly excluded.

The Browser as a Secure Compute Sandbox for IoT Devices

Thu, 05 Feb 2026 00:00:00 +0000

You can also think of the browser as a “secure compute sandbox” for an IoT device that can’t handle AI on its own.

The Idea

The device communicates only via HTTP with a local web app. The AI runs via WebGPU on the user’s PC in the browser – handling everything that would be too heavy for the device itself.

The Browser as an “Embedded Extension”

This effectively turns the browser into an “embedded extension” of the device:

Tokens vs. Embeddings: Two Completely Different Things

Wed, 04 Feb 2026 00:00:00 +0000

Tokens vs. Embeddings: Two Completely Different Things

Many people talk about tokens and embeddings – often meaning “something to do with AI.” But they are two completely different things.

🔢 Tokens: The Text Building Blocks

Tokens are the text building blocks that a model works with. A sentence is broken down into small units (subwords, words, characters). The more text, the more tokens. Tokens are essentially a counting unit for input/output.

🔐 AI Security: When 5% Errors Suddenly Become 40% Garbage

Tue, 03 Feb 2026 00:00:00 +0000

🔐 AI Security: When 5% Errors Suddenly Become 40% Garbage…

Artificial intelligence is impressive. It writes text, generates images, helps with coding. But what happens when AI trains itself, evaluates itself – and ultimately builds its own workflow?

💡 Spoiler: Soon you won’t have a workflow anymore. You’ll have a house of cards.

👉 The Problem: AI Builds on Its Own Mistakes

Many tools like Clawdbot promise to fully automate your content or data pipeline with AI. Sounds efficient – but who’s actually checking whether the AI is working with its own errors?

Digital Sovereignty: What Happens When Someone Pulls the Plug?

Tue, 03 Feb 2026 00:00:00 +0000

⚡️ What Happens When Someone Pulls the Plug Tomorrow?

Not metaphorically. For real: identity, collaboration, cloud services, security stack – gone or restricted.

The graphic makes it brutally clear: Digital dependency is now an operational risk.

Not an “IT issue.” It’s about: delivery capability, cash flow, reputation. 💥

And the tricky part: Many organizations sense the risk – but it stays vague. “We’re in the cloud… it’ll be fine…” 😬

K3s vs. K8s: The Uncomfortable Truth (Without the Hype)

Tue, 03 Feb 2026 00:00:00 +0000

K3s vs. K8s: The Uncomfortable Truth (Without the Hype)

There’s a discussion that stubbornly persists in many teams:

“K3s is just Kubernetes light.”

The uncomfortable answer is much simpler:

K3s IS Kubernetes. Period.

Not “for beginners.” Not “for edge.” Not “light.” It’s a Kubernetes distribution setup that takes away the pain – not the capabilities.

What K3s Really Is

To put it bluntly:

K8s (DIY): “Here are the parts. Have fun assembling.”
K3s: “Here’s a ready-made cluster. Do something with it.”

And the crucial point:

Transformers: Impressive, but Really the Future?

Wed, 10 Dec 2025 00:00:00 +0000

🧠 Transformers Are Impressive – but Are They Really the Future?

The diagram below (from mechanistic interpretability research) is one of the best examples of why the Transformer architecture is hitting its limits.

👉 The Task: 36 + 59

What is trivial for us becomes a labyrinthine process inside a Transformer, with two parallel paths – one roughly estimates, the other tries to get the last digit correct.

Best of: Claude Code – Agents, Hooks & Git Magic

Fri, 26 Sep 2025 00:00:00 +0000

🚀 Best of: Claude Code – Agents, Hooks & Git Magic

The most essential learnings from community projects, blogs & docs – for a robust, reproducible AI dev pipeline. 👇

🧭 Agentic Workflows (Plan → Build → Verify)

Multi-agent pipelines: o3 plans in detail, Sonnet builds, strict model verifies.
Each task = its own commit; parallel with Git worktrees.

🧩 Sub-Agents & Meta-Agents

Sub-agents as Markdown with YAML (Reviewer, Test Engineer, Docs, Security, Perf, Architect).
Meta-agent generates new sub-agents incl. tooling & prompts – consistent format by design.

🛠️ Best Practices

Phase 1: Read & Plan (structured steps).
Phase 2: Implement & Validate (tests first, then commit). ✅

🪝 Hook System (8 Events)

UserPromptSubmit, Pre/PostToolUse, Notification, Stop, SubagentStop, PreCompact, SessionStart.

Common Crawl: Gold for the Data World

Fri, 05 Sep 2025 00:00:00 +0000

🌐 What Is Common Crawl and Why Is It Gold for the Data World? 💡

Common Crawl is an open web archive that has been storing large portions of the public internet on a monthly basis since 2008. 💾💻

And the best part? It is freely available! For researchers, developers, startups – for anyone who wants to work with large text datasets. 🙌

📦 What Is Inside Common Crawl?

👉 Website content (HTML, text)
👉 Metadata (timestamps, URLs, language, etc.)
👉 Link structures (Who links to whom?)
👉 Text data for language modeling
👉 Crawl volume? Several billion web pages per month! 😮

A typical crawl contains data from tens of millions of domains – e.g. news sites, blogs, Wikipedia, Stack Overflow, product descriptions, forums… the colorful mix of the internet. 🌍

Excel Is Not AI Food – It Is the Packaging

Fri, 29 Aug 2025 00:00:00 +0000

Excel Is Not AI Food – It Is the Packaging. 🧮📦

The most effective approach: Build an MCP server around your Excel file and let the AI request exactly the data slices it actually needs via function calls.

Why This Works

🎯 Precise slices: list_sheets → describe → select(columns, where, limit, cursor) – only relevant data lands in the context.
💸 Costs under control: Projection/filter/aggregation run server-side (pushdown).
🧪 Reproducible: Types, validation, constraints & idempotency in the tool, not in the prompt.
🔒 Governance: PII masking, audit logs, rate limits, row-level security.
🔁 Write-back: write_back(mapping, validate=true) with checks & clean reporting.

How It Works

Register Excel with MCP (under the hood: Power Query, pandas, or SQL).
AI uses describe() for structure & data types.
AI pulls targeted slices via select() and works where language & judgment matter: classifying, normalizing, merging duplicates, summarizing.
Validate results and write back with write_back() into new columns/sheets/DB.

Mini Case Study

Product catalog with 20,000 rows. MCP delivers only name, description, brand where category is missing or uncertain. The AI classifies these 6–10%. Then write_back() with validation → new category column. Fast, affordable, auditable – and scalable.

MoE ≠ Less RAM -- But More Speed ⚡️

Thu, 28 Aug 2025 00:00:00 +0000

MoE ≠ Less RAM – But More Speed ⚡️

There’s a persistent misconception that Mixture-of-Experts (MoE) reduces memory usage on end devices. In reality, during inference serving, all expert weights are loaded. The MoE trick: only a few experts (e.g., Top-2) are computed per token. This saves FLOPs and increases throughput – especially for large providers with many GPUs – but it doesn’t save on weights. 💾

📊 Numbers for Intuition

Model	FP16	4-bit
Dense 7B	≈ 14 GB	≈ 4–5 GB (+ KV cache)
Dense 70B	≈ 140 GB	≈ 35–45 GB
MoE 8x7B (Top-2)	≈ 112 GB (total ≈ 56 B params)	≈ 28–35 GB
MoE 16x8B (Top-2)	≈ ~256 GB (total ≈ 128 B)	≈ 64–80 GB

With MoE 8x7B, only ≈ 14 B parameters are active per token – but ~56 B remain loaded.

🔍 Transformer Explainer: Understand LLMs -- Without Mystifying Them

Wed, 27 Aug 2025 00:00:00 +0000

🔍 Transformer Explainer: Understand LLMs – Without Mystifying Them

If you want to understand how large language models (LLMs) work, the Transformer Explainer by Polo Club is a brilliant starting point:

👉 https://poloclub.github.io/transformer-explainer/

It interactively shows how tokens flow through layers, what attention heads “look at,” and how the next word is ultimately predicted. 🎛️✨

Why This Matters

➡️ LLMs ≠ thinking. They are highly scaled next-token predictors.
➡️ Less anthropomorphism. No consciousness, no intention – just statistics.
➡️ Better practice. Those who understand what happens inside the model write better prompts, evaluate more realistically, and set boundaries more effectively.

🛠️ Quick Technical Overview (Without Math Overkill)

Text is broken into tokens and transformed into vectors (embeddings).
Self-attention weighs which earlier tokens are important (more “attention” = more influence).
MLP/feedforward & residual connections mix signals, layer norm stabilizes.
At the end, logits → softmax → the most probable next token. Then it starts over. 🔁

💡 Key Takeaways

“Hallucinations” aren’t lies – they’re confident but wrong predictions from insufficient context.
Good context + clear instructions → better token sequences.
Evaluation > gut feeling: Measure quality, robustness, and risks instead of attributing intelligence.

👉 Tip

Open the Transformer Explainer and watch how attention patterns change when you vary the input text. You’ll immediately see why words at different positions “count” with different weights. It demystifies – and makes you better at working with LLMs. 💡

🔥 Claude Code in Practice: Hooks, Subagents & Multi-Agent Power

Tue, 26 Aug 2025 00:00:00 +0000

🔥 Claude Code in Practice: Hooks, Subagents & Multi-Agent Power

Claude Code delivers real workflow features for dev teams – not just “chat + code,” but structured automation. Here are the highlights:

🪝 Hooks (Automation Triggers)

Define events like on-plan, on-edit, on-test, or on-commit. At each step, linting, type checks, tests, or your CLI script run automatically. Results flow back directly – Claude iterates until it’s green.

🧩 Subagents (Specialists)

Create focused helpers with clear roles: Implementer, Test Writer, Docs Scribe, Security Reviewer. Each subagent gets its own briefing, access scopes, and quality criteria.

🚀 Lightweight, Powerful and Versatile: The New Gemma 3 270M Model

Tue, 26 Aug 2025 00:00:00 +0000

🚀 Lightweight, Powerful and Versatile: The New Gemma 3 270M Model Is Here! 🧠💡

If you think you need massive server farms to work with AI – think again! 😎 The new Gemma 3 270M model from Google shows that small can also be smart:

🧩 Compact & Efficient

With just 270 million parameters, it’s extremely resource-friendly and even runs smoothly on a CPU! 💻 No expensive setup needed – ideal for local applications, edge devices, or rapid prototyping.

🚀 Sora Is Amazing—But Not for Precision Work (Yet)!

Thu, 13 Mar 2025 00:00:00 +0000

AI-generated videos have taken a huge leap forward with Sora, OpenAI’s latest video generation model. But here’s the catch: while it produces stunning, cinematic visuals, it’s still not suited for tasks that demand precision and control.

📸 Where Sora Struggles

Despite its creative power, Sora has difficulty handling small details and consistency—two key requirements in professional visual work.

Compare that with task-specific AI models:

✅ Clothing replacement AI in advertising swaps outfits pixel-perfectly.
✅ Face-swapping tools deliver hyper-realistic expressions without visible glitches.
✅ Product rendering AI creates flawless, photorealistic 3D assets with ease.

🧠 Why Specialized Models Win

These specialized AIs are trained for one purpose. That means:

Building AI: Roads, Networks, and Creativity 🚧🤖

Fri, 31 Jan 2025 00:00:00 +0000

Building AI: Roads, Networks, and Creativity 🚧🤖

Creating AI solutions—whether simple or complex—is a lot like designing a road network. Sometimes, all you need is a single, straightforward road. Other times, you might require:

🏗 Switches to change directions,
🏘 Passages that navigate through neighborhoods, or
🌐 Webs of interconnected pathways to handle complex routes.

Each scenario is unique, with its own set of challenges and goals.

Now, here’s the key insight: Agentic AI—the buzzword of the moment—isn’t about reinventing roads. It’s about connecting them. 🛤 It’s like adding highways, intersections, or even entirely new towns to your map. But the core ingredients? They’re still the same: traditional AI models, tools, and infrastructure.

Why Local AI Could Be the Smartest Tech Move of 2025

Wed, 15 Jan 2025 00:00:00 +0000

🔒✨ Why Local AI Could Be the Smartest Tech Move of 2025

Cloud AI is convenient – until the API bill arrives. 💸

But what if AI models could run locally – on your own devices, without sending data to the cloud, without paying per request, and with full data control?

🌐 Welcome to the World of Local AI

🚀 Process massive amounts of data right on-site
🧠 No constant internet connection required
💼 Sensitive business data stays where it belongs – within your own organization
💰 No more unpredictable API costs

🔧 What’s Already Possible Today

Whether it’s large language models at your own workstation or fine-tuned vision models analyzing terabytes of images – local AI is pushing past the previous limits of performance and cost.

⚠️ Critical Issue: OpenAI Structured Output Fields Can Be Overridden

Thu, 17 Oct 2024 00:00:00 +0000

⚠️ Critical Issue: OpenAI Structured Output Fields Can Be Overridden 🚨

In OpenAI’s models, structured output fields are meant to guide the format and content of responses. However, the descriptions of these fields are treated as part of the context of the query. This means that if you know the field name, you can redefine its purpose by specifying a new meaning in the prompt.

🔴 This behavior is not intended and can lead to issues, such as:

AI & Docker: Maximum Efficiency Through Containerization

Thu, 26 Sep 2024 00:00:00 +0000

🚀 AI & Docker: Maximum Efficiency Through Containerization! 🧠

In the rapidly growing field of Artificial Intelligence (AI), Docker plays a crucial role. It enables developers and organizations to run AI applications in isolated, platform-independent environments. But why is Docker so important for AI? 🤔

🔑 Key Benefits of Docker for AI

Simplicity: With Docker, AI environments can be created and configured in just minutes.
Scalability: Docker containers scale easily across different systems—locally, in the cloud, or in hybrid setups.
Reproducibility: Docker ensures that AI projects run the same everywhere, regardless of the underlying infrastructure.
Open-Source Power: Many of the best open-source AI tools are ready to use with Docker out of the box! 🌍

🧰 Popular Open-Source AI Tools That Work Seamlessly with Docker

TensorFlow 🧠 – A comprehensive machine learning platform developed by Google.
PyTorch 🔥 – A flexible deep learning framework from Facebook AI.
Hugging Face Transformers 🤗 – A leading NLP library for cutting-edge language models.
OpenCV 👁 – Open-source computer vision library for image processing.
Ray ⚡ – A framework for distributed machine learning and parallel computing.
MLflow 📊 – An open-source platform to manage the machine learning lifecycle.
KubeFlow 🛠 – A machine learning platform built for Kubernetes.

Docker not only makes it easy to deploy these tools quickly, but also to integrate them into various environments. Whether you’re running small experiments or training large-scale AI models, Docker provides the efficiency and flexibility you need.