Vibe Coding Forem

Engineering the "App-Like" Experience: A Deep Dive into PWA Architecture

Raziq Din — Sun, 24 May 2026 19:04:55 +0000

In modern software engineering, Progressive Web Apps (PWAs) bridge the gap between web platforms and native mobile applications. For a high-performance web application, a PWA transformation isn't just about aesthetics; it's about technical resilience, cross-platform compatibility, and optimized resource management.

Below is the architectural breakdown of how the Web App Manifest and Service Worker work together to "upgrade" a standard FastAPI application.

The Architecture: A Full-Stack Perspective

To support PWA features, your project structure must treat static assets as "first-class citizens" that the browser can discover and cache. An example project structure is shown below:

myproject/
├── app/
│   ├── main.py             # FastAPI entry point & Uvicorn config
│   ├── templates/          # Jinja2 templates (HTML)
│   └── static/             # The PWA Asset Hub
│       ├── manifest.json   # Identity & Display metadata
│       └── js/
│           └── sw.js       # The Service Worker "Proxy" logic
├── Dockerfile              # Containerizing the environment
└── docker-compose.yml      # Port mapping (e.g., 8080:8000)

Phase 1: Defining Identity with manifest.json

The Web App Manifest is a JSON metadata file that allows your website to be "installed" on a device. Its display setting enables the "standalone" mode, which removes the browser's address bar to provide a native look and feel.

Below is a starting configuration you can adapt for your project:

{
  "name": "Titan Gym Booking System",
  "short_name": "TitanGym",
  "description": "High-performance gym reservation and QR access system.",
  "start_url": "/",
  "display": "standalone",
  "background_color": "#1A1A1A",
  "theme_color": "#CCFF00",
  "icons": [
    {
      "src": "/static/icons/icon-192.png",
      "sizes": "192x192",
      "type": "image/png"
    },
    {
      "src": "/static/icons/icon-512.png",
      "sizes": "512x512",
      "type": "image/png"
    }
  ]
}
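
A quick way to catch typos before deploying is to load the manifest and check that the fields used above are present. The sketch below is a minimal check, not an official installability test: the field list in REQUIRED_FIELDS and the helper name missing_manifest_fields are my own assumptions, chosen to mirror the example manifest.

```python
import json

# Assumption: the fields this article's manifest relies on. Browsers' exact
# installability criteria vary and are not reproduced here.
REQUIRED_FIELDS = ("name", "short_name", "start_url", "display", "icons")


def missing_manifest_fields(manifest_text: str) -> list[str]:
    """Return the names of expected manifest entries that are absent or empty."""
    manifest = json.loads(manifest_text)
    missing = [field for field in REQUIRED_FIELDS if not manifest.get(field)]
    # The example manifest ships a 192x192 and a 512x512 icon; check for both.
    sizes = {icon.get("sizes") for icon in manifest.get("icons", [])}
    for size in ("192x192", "512x512"):
        if size not in sizes:
            missing.append(f"icon {size}")
    return missing


example = json.dumps({
    "name": "Titan Gym Booking System",
    "short_name": "TitanGym",
    "start_url": "/",
    "display": "standalone",
    "icons": [
        {"src": "/static/icons/icon-192.png", "sizes": "192x192", "type": "image/png"}
    ],
})
print(missing_manifest_fields(example))
```

Running a check like this in CI keeps a broken manifest from silently disabling the install prompt.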

Phase 2: The Logic Layer with sw.js

The Service Worker is a programmable network proxy. It runs in a background thread, separate from the main browser window, allowing it to intercept network requests and manage Offline Caching.


// /static/js/sw.js
const CACHE_NAME = 'titan-gym-v1';
const STATIC_ASSETS = [
  '/',
  '/static/css/style.css',
  '/static/js/main.js',
  '/static/manifest.json'
];

// INSTALL: Pre-cache core assets for offline use
self.addEventListener('install', (event) => {
  event.waitUntil(
    caches.open(CACHE_NAME).then((cache) => {
      return cache.addAll(STATIC_ASSETS);
    })
  );
});

// FETCH: Serve from the cache first; fall back to the network on a cache miss
self.addEventListener('fetch', (event) => {
  event.respondWith(
    caches.match(event.request).then((cachedResponse) => {
      return cachedResponse || fetch(event.request);
    })
  );
});
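
The fetch handler above follows a cache-first strategy: a cached response wins, and the network is used only on a miss. To make that control flow explicit, here is a small Python model of the same decision. It is an illustration of the logic rather than browser code: a plain dict stands in for Cache Storage, and the hypothetical fetch_from_network callable stands in for fetch().

```python
from typing import Callable, Dict


def cache_first(url: str, cache: Dict[str, str],
                fetch_from_network: Callable[[str], str]) -> str:
    """Return the cached body if present, otherwise ask the network."""
    cached = cache.get(url)
    if cached is not None:
        return cached
    return fetch_from_network(url)


def offline_network(url: str) -> str:
    """Simulate a device with no connectivity."""
    raise ConnectionError(f"offline: cannot fetch {url}")


cache = {"/": "<html>cached home</html>"}

# The pre-cached start page is served without touching the network.
print(cache_first("/", cache, offline_network))

# A page that was never cached still fails while offline.
try:
    cache_first("/reservations", cache, offline_network)
except ConnectionError as err:
    print("uncached page fails offline:", err)
```

As in sw.js, a miss is not written back to the cache, so only the pre-cached URLs are available offline.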

Phase 3: Integration and Registration

For the browser to activate these features, we must link the manifest and register the Service Worker in the main layout. This tells the browser: "This site is a PWA; start the background engine."


<!-- In your base.html or home.html -->
<head>
    <link rel="manifest" href="/static/manifest.json">
</head>

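<script>
  if ('serviceWorker' in navigator) {
    window.addEventListener('load', () => {
      // A worker served from /static/js/ only controls /static/js/ by default.
      // Requesting scope '/' works only if the server sends the response header
      // "Service-Worker-Allowed: /" for sw.js (or serves sw.js from the root).
      navigator.serviceWorker.register('/static/js/sw.js', { scope: '/' })
        .then(reg => console.log('SW Registered!', reg))
        .catch(err => console.error('SW Registration Failed:', err));
    });
  }
</script>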
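
One detail worth checking here is scope. By default, a Service Worker can only control pages at or below the directory its script is served from, so a script at /static/js/sw.js does not control the start page at / unless the server widens the scope or serves the script from the root. The sketch below reproduces that default rule in Python so you can test a layout before deploying; the helper names and the example.com origin are placeholders of mine.

```python
from urllib.parse import urljoin, urlsplit


def default_scope(script_url: str) -> str:
    """Default scope: the script URL with its last path segment removed."""
    return urlsplit(urljoin(script_url, "./")).path


def controls(page_url: str, script_url: str) -> bool:
    """True if the page path falls under the worker's default scope."""
    return urlsplit(page_url).path.startswith(default_scope(script_url))


origin = "https://example.com"  # placeholder origin

print(default_scope(origin + "/static/js/sw.js"))
print(controls(origin + "/", origin + "/static/js/sw.js"))
print(controls(origin + "/", origin + "/sw.js"))
```

With the project layout used in this article, the second check comes out false, which is why the registration needs either a root-level sw.js route or the Service-Worker-Allowed response header.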

Conclusion: Why Engineers Choose PWAs

By leveraging this architecture within a Dockerized FastAPI environment, you achieve several engineering goals:

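Network Independence: The Service Worker can serve pre-cached pages and assets even during campus Wi-Fi outages. With the configuration above, that covers the start page and the core static files; a page such as "My Reservations" must be added to the cache before it is available offline.
Zero-Friction Updates: Unlike native apps, updating the "app" is as simple as deploying a new Docker image. Because assets are served cache-first, version the cache with each release (for example, change CACHE_NAME and delete the old cache when the new worker activates) so clients pick up the new files.
Low Latency: Pre-caching static assets can reduce the Time to Interactive (TTI) on repeat visits, as the browser pulls files from its local Cache Storage instead of making remote calls.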

This setup ensures that your web project isn't just a website, but a reliable tool that lives directly on the user's home screen.

🔢 JS Array Playground

Naimesh Rao — Sun, 24 May 2026 19:03:39 +0000

👉 https://codepen.io/naimeshrao/full/qEqmgKv

Learning arrays becomes way easier when you can actually play with them.

Experiment with map(), filter(), reduce(), find(), and more in an interactive JavaScript learning lab.

I built a local-first AI CCTV assistant using Gemma 4 + Frigate

Dhanush Reddy — Sun, 24 May 2026 18:48:26 +0000

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

I built a local-first AI CCTV assistant using Gemma 4, Frigate, and Home Assistant.

Most CCTV notifications are noisy and not actually useful.

Typical camera alerts are mostly:

Motion detected at Front Door

That tells me something happened, but not what actually happened.

Was it:

a delivery person?
someone waiting outside?
a suspicious stranger?
or just random movement?

I wanted a system that could intelligently summarize what was happening on my cameras while keeping everything completely local and privacy-friendly.

So I built a local AI-powered surveillance assistant using:

Frigate for movement detection
Home Assistant for automations
Gemma 4 running locally through Ollama
Mobile notifications for real-time summaries

Now instead of generic motion alerts, I receive notifications like:

“A person wearing a dark shirt approached the gate, waited briefly, and left.”

All processed locally without sending any footage to the cloud.

The system continuously monitors my WiFi-connected CCTV cameras using Frigate. Frigate is an open-source NVR that performs real-time object detection on video feeds. You can learn more about Frigate here.

Whenever Frigate detects a person:

Relevant frames are sent to a locally running Gemma 4 model.
Frigate creates an event with the video clip and the AI summary.
A Home Assistant automation is triggered.
A rich notification is sent directly to my phone.
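
Frigate makes the model call itself, so none of this has to be written by hand. Still, to make the first step concrete, here is a rough sketch of what a single-frame request to a local Ollama server can look like. The prompt text and the helper name build_summary_request are my own placeholders, not what Frigate actually sends; the endpoint is Ollama's default local generate endpoint.

```python
import base64
import json

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_summary_request(frame_bytes: bytes, camera: str) -> dict:
    """Build the JSON body for a single-frame summary request."""
    return {
        "model": "gemma4:e2b",  # model tag named in this post
        # Hypothetical prompt; tune it to your own cameras and alert rules.
        "prompt": f"Describe briefly what the person at the {camera} camera is doing.",
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(frame_bytes).decode("ascii")],
        "stream": False,
    }


body = build_summary_request(b"\xff\xd8fake-jpeg-bytes", "Front Door")
# Print the body with the image elided to keep the output readable.
print(json.dumps({**body, "images": ["<base64 omitted>"]}, indent=2))
```

Sending this body as a POST to OLLAMA_URL returns the generated description, and nothing leaves the local network.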

The result is a much more useful home surveillance experience.

Instead of manually checking every alert, I can instantly understand what happened from the notification itself.

Demo

Video demo will be added here soon.

Code

You can find the YAML configuration for Frigate and Home Assistant automations in this GitHub Gist.

Update the prompt and the other alert and review settings in the Frigate config to suit your requirements.

If you run into any issues while setting it up, feel free to comment below. Your WiFi cameras should provide an RTSP stream that can be integrated into the Frigate configuration. Just refer to your camera’s documentation for instructions on retrieving the stream URL.

How I Used Gemma 4

I used the gemma4:e2b model running locally through Ollama.

The model runs inside a dedicated Proxmox container with GPU passthrough enabled for faster inference.

I specifically chose the E2B variant because:

it is lightweight enough for local deployment
inference latency is low
it works well for concise visual summarization tasks

Without Gemma 4, the system would only know:

“A person was detected.”

With Gemma 4, the system understands and communicates:

what the person is doing
where they moved
how long they stayed
and whether the event is important

That dramatically improves the usefulness of camera notifications.

Your PDF Parser Is Failing You — Here's How to Fix It With One API Call

Savitar AI — Sun, 24 May 2026 18:44:31 +0000

PDF documents are used everywhere — invoices, contracts, reports, receipts, scanned files, and forms. But manually extracting text from PDFs can be slow, repetitive, and difficult to automate.
This is where AI-powered PDF extraction APIs help developers automate document workflows using simple REST APIs.
In this beginner-friendly tutorial, we’ll learn how to extract text from PDFs using Python and the Enterprise PII Detection & Redaction API available on RapidAPI.

You can also explore the live developer hub and workflow demo here:
https://savitar-dev-hub--savitar-dev-hub.us-east4.hosted.app

What is PDF Text Extraction?

PDF text extraction is the process of automatically reading and extracting text content from PDF documents.

Instead of manually copying data from files, developers can use APIs to:

process PDFs automatically
extract structured text
automate document workflows
build OCR pipelines
analyze documents using AI

This is especially useful for:

SaaS applications
finance automation
legal document systems
OCR workflows
enterprise document processing

Why Traditional PDF Parsing Fails

Many PDFs are:

scanned images
blurry documents
photographed papers
handwritten notes
image-based files

Traditional parsers struggle with these files.

AI-powered OCR APIs solve this problem by combining:

OCR (Optical Character Recognition)
document AI
structured extraction
intelligent text recognition

Before PDF Extraction

The API accepts uploaded PDF files and processes them automatically. The screenshot below shows a PDF before extraction.

Before PDF extraction using AI-powered document extraction API.

After PDF Extraction

Once processed, the API extracts structured text from the PDF automatically.

After PDF extraction using AI-powered document extraction API.

Live demo available on the Savitar Developer Hub.

This extracted text can then be used for:

automation workflows
AI pipelines
analytics
search indexing
compliance systems

Features of the PDF Extraction API

The Enterprise PII Detection & Redaction API supports:
✅ PDF text extraction
✅ OCR for scanned documents
✅ Structured JSON output
✅ REST API integration
✅ Batch document processing
✅ AI-powered OCR workflows
✅ Fast processing pipelines

Supported formats:

PDF
DOCX
PPTX
XLSX
PNG
JPG
TIFF
WEBP

Step 1 — Install Python Requests

First, install the requests library.

pip install requests

Step 2 — Python API Example

The following Python script uploads a PDF file and extracts text automatically.

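import requests

url = "https://enterprise-pii-detection-redaction-api.p.rapidapi.com/extract"

headers = {
    "x-rapidapi-key": "YOUR_API_KEY",
    "x-rapidapi-host": "enterprise-pii-detection-redaction-api.p.rapidapi.com"
}

# Open the PDF in binary mode; the with-block closes the file after the upload
with open("sample.pdf", "rb") as pdf_file:
    response = requests.post(url, headers=headers, files={"file": pdf_file}, timeout=60)

# Fail loudly on HTTP errors instead of treating an error body as a result
response.raise_for_status()

print(response.json())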

Replace YOUR_API_KEY with your key from RapidAPI, and point sample.pdf at your document. That's the entire integration.

Example API Response

After processing the PDF, the API returns structured JSON output.

{
  "text": "Contractor Quotation Comparison & Inflation Analysis Report...",
  "filename": "sample.pdf",
  "file_type": "pdf",
  "page_count": 3,
  "model": "mistral-ocr-latest"
}

response.json()["text"] gives you the full extracted content — ready to pipe into a database, a search index, an LLM, or any downstream system you're building.
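
As a small illustration of consuming that payload, the sketch below pulls out the fields shown in the example response and builds a one-line summary. The helper name summarize_extraction is hypothetical, and the sample dictionary is a shortened copy of the response above rather than real API output.

```python
from typing import Any, Dict


def summarize_extraction(payload: Dict[str, Any]) -> str:
    """Summarize an extraction response, tolerating missing fields."""
    text = payload.get("text", "")
    pages = payload.get("page_count", 0)
    name = payload.get("filename", "<unknown>")
    return f"{name}: {pages} page(s), {len(text.split())} words extracted"


# Shortened copy of the example response shown above.
sample = {
    "text": "Contractor Quotation Comparison & Inflation Analysis Report",
    "filename": "sample.pdf",
    "file_type": "pdf",
    "page_count": 3,
    "model": "mistral-ocr-latest",
}
print(summarize_extraction(sample))
```

Using .get() with defaults keeps the pipeline from crashing if a field is absent from a particular response.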

This makes it easy to integrate PDF extraction into:

web apps
SaaS platforms
automation workflows
AI systems

OCR Support for Scanned PDFs

One of the biggest challenges in document processing is scanned PDFs.

This API includes OCR support that can extract text from:

scanned invoices
handwritten notes
photographed documents
receipts
screenshots

OCR Input Example

The API can process scanned or handwritten documents automatically.

OCR Output Example

After OCR processing, the extracted text is returned in structured format.

OCR output generated from scanned handwritten documents.

This helps developers build:

intelligent document systems
searchable archives
AI document workflows
automated business pipelines

Benefits of API-Based PDF Extraction

Using an AI-powered PDF extraction API helps developers:

avoid building OCR systems from scratch

scale document processing easily

automate repetitive workflows

improve accuracy

save development time

Real-World Use Cases

PDF extraction APIs are widely used in:

Finance: invoice automation, receipt extraction, accounting workflows
HR: resume parsing, employee document processing
LegalTech: contract analysis, legal document indexing
Healthcare: patient record digitization, medical document OCR
SaaS Platforms: automation workflows, AI document pipelines

Final Thoughts

AI-powered PDF extraction APIs are making document automation significantly easier for developers and businesses.
Instead of manually copying text from PDFs or building complex OCR systems internally, developers can integrate document extraction directly into their applications using simple REST APIs.

Whether you're building OCR workflows, automation systems, AI applications, or enterprise document pipelines, PDF extraction APIs can dramatically improve efficiency and scalability.

Try the API

Looking for an AI-powered OCR and PDF extraction workflow?

The Enterprise PII Detection & Redaction API helps developers:

extract text from PDFs
process scanned documents
automate OCR workflows
build AI-powered document pipelines

Explore the API on RapidAPI:

https://rapidapi.com/savitarai/api/enterprise-pii-detection-redaction-api

Live Developer Hub:

https://savitar-dev-hub--savitar-dev-hub.us-east4.hosted.app

🔖 Tags: PDF extraction API · OCR API · Python · AI OCR · scanned PDF OCR · document extraction · REST API · image to text · PDF parser · document automation

CrowdShield AI — Smart Stadium Operating System & Crowd Intelligence Platform

Abhisek Padhy — Sun, 24 May 2026 18:43:53 +0000

This is a submission for the GitHub Finish-Up-A-Thon Challenge

What I Built

CrowdShield AI is a live platform built on the creative north star of delivering an automated "Autopilot for Stadium Operations".

Borrowing heavily from high-density telemetry dashboards and digital twins, the interface is optimized for low-light command center environments. It processes dense real-time streams to track occupancy metrics, map threat matrices, and trigger automated emergency action sequences instantly when critical thresholds are breached.

Live Website: 🔗 https://crowdshield-361013050235.us-central1.run.app/
GitHub Repository: 🔗 https://github.com/Abb2907/crowdshield

The platform uses live telemetry, dynamic AI-driven spectator routing, automated emergency orchestration, and comprehensive analytics to mitigate bottlenecks, counter ticket fraud, and streamline stadium egress and ingress.

Demo

Live Website: 🔗 https://crowdshield-361013050235.us-central1.run.app/
GitHub Repository: 🔗 https://github.com/Abb2907/crowdshield

The Comeback Story

The initial deployment pipeline struck a critical bottleneck. While the structural architecture for CrowdShield AI—the multi-workspace ecosystem containing the real-time React analytics client, the Node.js telemetry backend, and the Supabase database migrations—was completely mapped out, the production rollout stalled.

The application was trapped in a container startup loop during the gcloud run deploy phase. The root package.json had its production start script bound to a local development hot-reloader (npm run dev:backend), and the Procfile was erroneously attempting to compile TypeScript source code (npm run build --workspace=backend) at boot time within Cloud Run's resource-constrained runtime environment. This overhead delayed container startup, so the service missed the port-binding health check on port 8080 and the deployment failed immediately.

What I changed, fixed, and added to finish it up:

To transition CrowdShield AI into a stable, operational "Autopilot for Stadium Operations," the compilation and runtime execution layers were completely decoupled:

Fixed the Production Runtime Pipeline: Rewrote the root package.json scripts to isolate the compilation phase. The production start command was stripped of development overhead and updated to directly execute the pre-compiled JavaScript bundle (node backend/dist/index.js).
Deterministic Port Binding: Audited and validated the backend entry point configuration (backend/src/index.ts) to ensure the application dynamically reads the environment’s target port (process.env.PORT) and binds correctly to 0.0.0.0.
AI-Accelerated Core Features: Leveraging GitHub Copilot as a velocity multiplier, the rest of the stadium orchestration loops were brought online. Repetitive infrastructure routing, relational telemetry table indexing, and testing boilerplate were rapidly scaffolded to ensure stable state changes—moving successfully from Green (Safe), to Amber (Warning), to Red (Critical/Incident State) as crowd density metrics fluctuate.
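
The port-binding fix is the easiest of these to get wrong, so here is the rule in isolation. The actual backend is TypeScript (backend/src/index.ts); this is a Python stand-in of my own, with a hypothetical helper name, that shows only the logic: read PORT from the environment, fall back to 8080, and bind on all interfaces.

```python
import os
from typing import Mapping, Tuple


def resolve_bind_address(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Return the (host, port) pair a Cloud Run style container should bind to."""
    # The platform injects PORT; 8080 is the fallback used for local runs.
    port = int(env.get("PORT", "8080"))
    # 0.0.0.0 listens on all interfaces, so the health check can reach the app.
    return "0.0.0.0", port


print(resolve_bind_address({"PORT": "9000"}))
print(resolve_bind_address({}))
```

Binding to localhost instead of 0.0.0.0 is a common reason a container starts cleanly and still fails its health check.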

My Experience with GitHub Copilot

Using GitHub Copilot provided a critical operational velocity window, accelerating concurrent development across the frontend client, telemetry backend, and relational database layers of the CrowdShield AI architecture.

Primary Optimization Vectors:

Infrastructure Scaffolding: Accelerated the deployment of structural boilerplate, database migrations, and schema definitions across workspaces.
Test Boilerplate Generation: Automated the generation of comprehensive unit and end-to-end telemetry validation suites.
Parser & Runtime Iteration: Drastically reduced execution latencies when testing core event-parsing and automated operational loops.
Alternative Implementations: Enabled rapid-fire evaluation of competing algorithmic patterns and performance profiles in real time.
Documentation Pipelines: Streamlined the synthesis of technical specifications, strategy documents, and architectural rulesets.
Refactoring Friction Reduction: Maintained system telemetry integrity by smoothing over data structure transformations during critical code cleanups.

The Core Metric: The primary return on investment was not the replacement of core architectural thinking, but the severe reduction of mechanical overhead surrounding low-level system experimentation.

Because CrowdShield AI is built explicitly for deterministic stadium operations and specification-driven development, every automated code generation sequence was strictly validated against the project's rigid architectural invariants and safety parameters.

In many ways, the CrowdShield ecosystem acts as a direct exploration of a foundational, meta-level thesis:

What would an execution runtime look like if it were engineered from day zero to be natively interpreted, expanded, and sustained by AI agents?

That exact question is what continues to drive the roadmap and engineering velocity behind the entire CrowdShield AI ecosystem.

I built a free AI observability tool: prove your AI is useful, not just running

emmanuela Opurum — Sun, 24 May 2026 18:42:57 +0000

Most AI monitoring tools tell you if your API is up.
Mine tells you if it's worth running.

Try it right now, no signup, 30 seconds

Visit: https://ai-pou-tracker.vercel.app

Type any prompt in the Quick Test box on the left
Hit ▶ Send Request
Watch your request hit the live AI engine and appear on the dashboard instantly

Or call the API directly:

curl -X POST https://ai-pou-tracker.vercel.app/api/request \
-H "Content-Type: application/json" \
-d '{"prompt": "does this AI actually work?"}'
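
If you would rather call it from a script, the same request can be built with Python's standard library. This is a sketch that mirrors the curl command above; the helper name build_request is mine, and the response format is not shown because the post does not document it.

```python
import json
import urllib.request

API_URL = "https://ai-pou-tracker.vercel.app/api/request"


def build_request(prompt: str) -> urllib.request.Request:
    """Build the POST request equivalent to the curl command."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("does this AI actually work?")
print(req.get_method(), req.full_url)

# To actually send it (requires network access):
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.load(resp))
```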

What it tracks

Real AI success rate (not just 200 OK)
Actual cost savings from semantic caching
Latency patterns and fallback triggers
A live Proof of Usefulness score that updates in real time

Why I built it

I was building AI features and had no way to prove they were actually useful, only that they were running. Every judge, investor, and team lead asks "is it working?" but nobody could define what "working" really means for AI.

So I built a dashboard that answers that question with real numbers.

Tech stack

Next.js 14 App Router
Upstash Redis (persistent serverless storage)
HuggingFace Inference API (fallback model)
Recharts (live visualizations)
Vercel edge deployment

Live stats so far

500+ real API requests tracked
100% AI success rate
Sub-500ms average latency across all routes
6 production API routes all returning 200

Try it and tell me what you think

Live demo: https://ai-pou-tracker.vercel.app
⭐ GitHub: https://github.com/Cloud-Architect-Emma/ai-pou-tracker

What AI production metrics matter most to you?
Drop them in the comments, I'm actively adding features.

Beyond Autocomplete: Why Google Antigravity 2.0 Changes the Rules for Indie Builders

Rohit — Sun, 24 May 2026 18:41:09 +0000

This is a submission for the Google I/O Writing Challenge

The developer track at Google I/O 2026 made one thing undeniably clear: the era of the simple AI chat assistant is over. We have officially entered the Agentic Era.

For independent developers, solo founders, and micro-SaaS builders who rely on high-velocity building—a development philosophy often called "vibe coding"—the headline launch of Google Antigravity 2.0 as a standalone desktop application represents a massive paradigm shift. It takes generative AI out of the isolated browser sidebar and morphs it into a fully contextualized, autonomous background engineering team.

Instead of treating AI as a glorified autocomplete tool, Antigravity 2.0 treats AI as an infrastructure orchestrator. Here is a deep technical breakdown of how this platform works under the hood, why its structural architecture changes how we write software, and how solo builders can leverage it to scale their output exponentially.

1. The Engine Layer: Why Gemini 3.5 Flash Changes the Economics of Agents

Building autonomous coding loops has historically faced two major bottlenecks: latency and cost. When an AI agent needs to read a repository, analyze a bug, write a fix, run a compiler, read the terminal error, and attempt a second fix, it consumes an enormous amount of tokens across multiple sequential calls. If the model is slow or expensive, the entire workflow becomes impractical for daily development.

Google bypassed this infrastructure bottleneck by co-optimizing Antigravity 2.0 around the newly released Gemini 3.5 Flash model.

Throughput Metrics: Clocking in at an incredible 289 output tokens per second, Gemini 3.5 Flash provides the rapid-fire inference required to sustain real-world agent loops without stalling your workflow.
Context Preservation via Event Compaction: Running long-horizon tasks usually risks exhausting context windows or spiking API costs. Antigravity 2.0 utilizes an engineering feature called Event Compaction. Instead of blindly truncating your conversation history, the system dynamically compresses older context blocks, saving up to 38% on token overhead during long debugging sessions.
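
To put those two figures in perspective, here is a quick back-of-the-envelope calculation. The workload sizes (10,000 output tokens, 100,000 context tokens) are hypothetical numbers I picked for illustration, and 38% is treated as the best case because the claim is "up to".

```python
TOKENS_PER_SECOND = 289     # throughput figure quoted above
COMPACTION_SAVINGS = 0.38   # "up to 38%" figure quoted above (best case)


def generation_seconds(output_tokens: int) -> float:
    """Time to emit the given number of tokens at the quoted throughput."""
    return output_tokens / TOKENS_PER_SECOND


def tokens_after_compaction(context_tokens: int) -> int:
    """Context size left after the best-case compaction saving."""
    return round(context_tokens * (1 - COMPACTION_SAVINGS))


# Hypothetical agent loop that emits 10,000 tokens in total.
print(f"{generation_seconds(10_000):.1f} s")

# Hypothetical 100,000-token debugging session after best-case compaction.
print(tokens_after_compaction(100_000))
```

At the quoted rate the 10,000-token loop takes roughly 35 seconds of pure generation time, and the best-case compaction leaves 62,000 of the 100,000 context tokens.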

2. Multi-Agent Orchestration & Parallel Engineering Pipelines

Traditional IDE extensions operate linearly: you prompt, you wait, you review a diff, and you click accept. If you need a backend database schema, an API route, and a matching frontend UI component, you generally have to hold the AI's hand through each step sequentially.

Antigravity 2.0 completely rewrites this lifecycle by introducing Multi-Agent Workflows and Dynamic Subagents.

               [ Main Antigravity Agent ]
                           │
       ┌───────────────────┼───────────────────┐
       ▼                   ▼                   ▼
[Subagent A: UI]   [Subagent B: Test]   [Subagent C: DB]
(React/Tailwind)   (Vitest/Regression)  (Prisma/Migration)

When you assign a macro-level objective to Antigravity, the primary agent evaluates the workspace and autonomously spawns specialized, sandboxed subagents to tackle distinct tasks in parallel:

Isolated Execution Environments: Subagents operate within persistent, secure remote Linux sandboxes. They can install dependencies, compile binaries, and execute code safely without clogging your local machine’s environment.
The Solo Founder Advantage: This architecture effectively transforms a single software engineer into a cross-functional development team. While your primary focus remains on high-level user experience, design feel, and core business logic, one background subagent can be actively writing edge-case regression tests, while another maps out a database migration pipeline.
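
The fan-out shape in the diagram is a standard concurrency pattern, independent of any particular product. The sketch below shows it with Python's standard library. To be clear, this is not Antigravity's API: run_subagent is a hypothetical stand-in that returns a string where a real subagent would do actual work.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task: str) -> str:
    """Hypothetical stand-in for a subagent; a real one would do actual work."""
    return f"{task}: done"


def orchestrate(tasks: list[str]) -> list[str]:
    """Run every task in parallel and collect results in submission order."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        return list(pool.map(run_subagent, tasks))


print(orchestrate(["UI", "Test", "DB"]))
```

The main agent's job in this picture is the orchestrate step: split the goal, dispatch the pieces, and merge what comes back.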

3. Native Intent Control: Slash Commands for Real-World Workflows

One of the greatest friction points in AI development is maintaining alignment, that is, ensuring the model doesn't confidently refactor a critical piece of the codebase into oblivion. Antigravity 2.0 handles this through explicit, engineering-focused intent controls built directly into the command interface:

/goal [task]: This initiates an asynchronous, long-horizon loop. It instructs the agent to run an entire multi-step task to absolute completion in the background, signaling you only when the objective is achieved or if it encounters a fatal blocker.
/grill-me: To combat hallucinations and misaligned logic, this command forces the agent to pause. It requires the AI to actively interview you, asking sharp architectural questions to clarify edge cases before it touches a single line of production code.
/browser: This grants the agent autonomous web-browsing permissions. If a subagent encounters an undocumented breaking change in a third-party framework library, it can independently scour updated web documentation, extract the correct syntax, and patch the codebase.

Furthermore, context is no longer isolated to a single file or a lone directory. Antigravity 2.0 handles multi-repository "Projects," allowing background agents to retain state, track global variables, and safely manage workspace directory permissions across complex, full-stack micro-SaaS setups.

The Strategic Takeaway for Micro-SaaS Founders

For independent builders looking to launch lean, low-overhead digital products, the structural shifts unveiled at Google I/O 2026 alter the competitive landscape. With the introduction of the accessible $100 Antigravity tier and native integrations with the Firebase Agent Skills bundle, managing underlying backend infrastructure is becoming fully automated.

The competitive advantage in software development is rapidly shifting. It is no longer about who can write boilerplate code or configure server routing the fastest; it is about who can best orchestrate autonomous AI pipelines to solve hyper-niche, real-world problems.

Antigravity 2.0 proves that the future of engineering isn't about writing code line-by-line—it's about directing a highly specialized, agentic system to build your vision at scale.

What are your thoughts on the Antigravity 2.0 standalone application? Are you planning to migrate your development stack to an agent-first environment, or do you prefer traditional IDE plugins? Let's discuss in the comments below!

Building a Terminal AI Agent (v12)

matias yoon — Sun, 24 May 2026 18:40:28 +0000

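Build an AI agent that runs directly in the terminal to optimize your development workflow. This guide walks through a practical terminal AI agent that developers can build and customize themselves.

1. The CLI AI Agent Ecosystem

The CLI AI agent ecosystem currently consists of the following main tools: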

Aider

# Install (the PyPI package is named aider-chat)
pip install aider-chat

# Basic usage
aider --help

Continue.dev

# Install as a VS Code extension
# https://marketplace.visualstudio.com/items?itemName=Continue.continue

OpenCode

# Install directly from the GitHub repository
git clone https://github.com/open-code/open-code.git
cd open-code
pip install -e .

Custom Scripts

#!/bin/bash
# aider-wrapper.sh: a simple wrapper script example
aider "$@" --model gpt-4-turbo

2. Setting Up a Local LLM API Endpoint

Use a local LLM to reduce costs and, with suitable hardware, improve responsiveness:

Installing Ollama

# Ubuntu/Debian
curl -fsSL https://ollama.com/install.sh | sh

# macOS
brew install ollama

# Start the server
ollama serve

# Download models
ollama pull llama3
ollama pull codellama:7b

Creating an API Endpoint

# llm_api.py
from flask import Flask, request, jsonify
import ollama

app = Flask(__name__)

@app.route('/chat', methods=['POST'])
def chat():
    # Tolerate a missing or invalid JSON body instead of failing on None
    data = request.get_json(silent=True) or {}
    prompt = data.get('prompt', '')
    model = data.get('model', 'llama3')

    response = ollama.chat(
        model=model,
        messages=[{'role': 'user', 'content': prompt}]
    )

    return jsonify({
        'response': response['message']['content']
    })

if __name__ == '__main__':
    # Port 11434 is Ollama's own default, so this wrapper listens on a different port
    app.run(host='0.0.0.0', port=5000)

3. A Python CLI Agent with Function Calling

The following is a simple Python CLI agent example:

#!/usr/bin/env python3
# smart_agent.py
import subprocess
import json
import sys
from typing import Dict, List
import openai

class TerminalAgent:
    def __init__(self, api_key: str, model: str = "gpt-4-turbo"):
        self.client = openai.OpenAI(api_key=api_key)
        self.model = model

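    def run_command(self, command: str) -> str:
        """Run a shell command and return its output."""
        try:
            result = subprocess.run(
                command,
                shell=True,  # convenient, but runs whatever string the model supplies
                capture_output=True,
                text=True,
                timeout=30
            )
            # Fall back to stderr so a failed command is not reported as empty output
            return result.stdout or result.stderr
        except subprocess.TimeoutExpired:
            return "Command timed out"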

    def call_function(self, function_name: str, args: Dict) -> str:
        """Dispatch a function call by name."""
        if function_name == "run_command":
            return self.run_command(args.get("command", ""))
        elif function_name == "list_files":
            return self.run_command("ls -la")
        elif function_name == "git_status":
            return self.run_command("git status")
        else:
            return f"Unknown function: {function_name}"

    def process_request(self, user_prompt: str) -> str:
        """Handle a user request."""
        system_prompt = """
        You are a terminal assistant. You can execute shell commands and 
        return their output. You have access to these functions:
        - run_command(command): execute shell command
        - list_files(): list all files in current directory
        - git_status(): show git status
        """

        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_prompt}
                ],
                functions=[
                    {
                        "name": "run_command",
                        "description": "Execute shell command",
                        "parameters": {
                            "type": "object",
                            "properties": {
                                "command": {"type": "string"}
                            },
                            "required": ["command"]
                        }
                    },
                    {
                        "name": "list_files",
                        "description": "List all files in directory",
                        "parameters": {
                            "type": "object",
                            "properties": {}
                        }
                    },
                    {
                        "name": "git_status",
                        "description": "Show git status",
                        "parameters": {
                            "type": "object",
                            "properties": {}
                        }
                    }
                ],
                function_call="auto"
            )

            # Handle a function call requested by the model
            if response.choices[0].finish_reason == "function_call":
                function_call = response.choices[0].message.function_call
                result = self.call_function(
                    function_call.name,
                    json.loads(function_call.arguments)
                )

                return f"Command result: {result}"
            else:
                return response.choices[0].message.content

        except Exception as e:
            return f"Error: {str(e)}"

def main():
    if len(sys.argv) < 2:
        print("Usage: python smart_agent.py \"your prompt here\"")
        sys.exit(1)

    agent = TerminalAgent(api_key="your-api-key-here")
    prompt = sys.argv[1]
    result = agent.process_request(prompt)
    print(result)

if __name__ == "__main__":
    main()

4. Integrating with tmux

Integrating with a terminal multiplexer makes for a more efficient workflow:

Creating a tmux script

#!/bin/bash
# tmux_agent.sh
# Create a new detached session
tmux new-session -d -s agent-session

# Split the window into panes
tmux split-window -h -t agent-session
tmux split-window -v -t agent-session

# Run the agent in the first pane
tmux send-keys -t agent-session:0.0 "python smart_agent.py" Enter

# Open the code editor in the second pane (0.1, not 0.0)
tmux send-keys -t agent-session:0.1 "vim" Enter

# Attach to the session
tmux attach -t agent-session

Communicating with the agent through tmux

# tmux_integration.py
import subprocess

class TmuxAgent:
    def __init__(self, session_name: str = "agent-session"):
        self.session_name = session_name

    def send_to_window(self, window_index: int, command: str):
        """Send a command to a specific window."""
        subprocess.run([
            "tmux", "send-keys", "-t", f"{self.session_name}:{window_index}",
            command, "Enter"
        ])

    def create_window(self):
        """Create a new window."""
        subprocess.run([
            "tmux", "new-window", "-t", self.session_name
        ])

    def get_window_output(self, window_index: int):
        """Capture the visible output of a window."""
        result = subprocess.run([
            "tmux", "capture-pane", "-p", "-t", f"{self.session_name}:{window_index}"
        ], capture_output=True, text=True)
        return result.stdout

# Usage example
agent = TmuxAgent("my-session")
agent.send_to_window(0, "ls -la")

5. Building custom tools

Code search tool


# code_search.py
import os
import re
from typing import Dict, List, Optional

class CodeSearcher:
    def __init__(self, root_dir: str = "."):
        self.root_dir = root_dir

    def find_files(self, pattern: str, file_types: Optional[List[str]] = None) -> List[str]:
        """Find files whose names match a regex pattern."""
        matches = []
        for root, dirs, files in os.walk(self.root_dir):
            for file in files:
                if file_types and not any(file.endswith(ft) for ft in file_types):
                    continue
                if re.search(pattern, file):
                    matches.append(os.path.join(root, file))
        return matches

    def search_in_files(self, pattern: str, file_extensions: Optional[List[str]] = None) -> Dict[str, List[int]]:
        """Search file contents for a regex pattern and return line numbers per file."""
        results = {}
        search_files = self.find_files(r".*", file_extensions)

        for file_path in search_files:
            try:
                with open(file_path, 'r', encoding='utf-8') as f:
                    content = f.read()
                # Convert each match offset into a 1-based line number
                line_numbers = [
                    content.count("\n", 0, m.start()) + 1
                    for m in re.finditer(pattern, content)
                ]

                if line_numbers:
                    results[file_path] = line_numbers
            except (OSError, UnicodeDecodeError):
                # Skip files that cannot be read as UTF-8 text
                continue
        return results

# Usage example
searcher =

---

📥 **Get the full guide on Gumroad**: https://gumroad.com/l/auto ($5)

Building Instagram-Powered Apps with HikerAPI (Without Fighting Scrapers)

baroque Ai — Sun, 24 May 2026 18:40:26 +0000

I recently worked on a project where I needed reliable Instagram data inside a backend workflow.

At first, I tried the usual approaches:

Direct scraping
Browser automation
instagrapi
Rotating proxies

It worked… until it didn’t.

Instagram constantly changes things. Sessions expire, accounts get flagged, rate limits appear randomly, and maintenance becomes a bigger problem than the actual product you're building.

That’s when I started using HikerAPI — a REST Instagram API with simple authentication using an x-access-key header.

The pricing starts from $0.001/request and includes 100 free requests, which made it easy to test before integrating it into production.

The Problem with Traditional Instagram Scraping

If you've built anything around Instagram data before, you probably know the pain points:

Login challenges
Session invalidation
Proxy management
CAPTCHA issues
Random breaking changes
Rate limiting
Infrastructure overhead

Libraries like instagrapi are useful, especially for prototypes and personal automation, but they still depend on reverse-engineered private APIs.

That means:

Your app can break unexpectedly
Reliability becomes your responsibility
Scaling gets harder over time

For hobby projects, that might be acceptable.

For client work or production systems, it becomes risky.

Why I Tried HikerAPI

I wanted something simpler:

REST endpoints
No browser automation
No managing Instagram accounts
No proxy rotation
Easy backend integration

The biggest advantage for me was speed of integration.

Instead of spending time debugging scraping logic, I could focus on building features.

Quick Start

The API authentication is extremely straightforward.

Here’s the exact Python example I used to test it:

import requests

headers = {"x-access-key": "YOUR_KEY"}

r = requests.get(
    "https://api.hikerapi.com/v2/user/by/username?username=instagram",
    headers=headers
)

print(r.json())

That’s it.

No cookies.
No login sessions.
No Selenium.

Real Use Case: Instagram Lead Discovery

One real use case I explored was finding niche creators and business accounts for .

The workflow looked something like this:

Search accounts by username or keyword
Pull profile metadata
Store results in a database
Run filtering logic
Display curated profiles in a dashboard

A simplified version:

import requests

API_KEY = "YOUR_KEY"

headers = {
    "x-access-key": API_KEY
}

usernames = [
    "instagram",
    "nike",
    "natgeo"
]

for username in usernames:
    response = requests.get(
        f"https://api.hikerapi.com/v2/user/by/username?username={username}",
        headers=headers
    )

    data = response.json()

    print({
        "username": data.get("username"),
        "followers": data.get("follower_count"),
        "verified": data.get("is_verified")
    })

This made it easy to plug Instagram data into an existing backend pipeline.
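
The "store results" and "run filtering logic" steps of that workflow could be sketched as follows. This is only an illustration: the table layout, the `store_profiles` and `find_leads` helpers, the follower thresholds, and the sample profiles are all assumptions, and the dictionaries stand in for whatever the API actually returns.

```python
import sqlite3

def store_profiles(conn, profiles):
    """Upsert profile dicts into a local SQLite table (hypothetical schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS profiles (
               username TEXT PRIMARY KEY,
               follower_count INTEGER,
               is_verified INTEGER
           )"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO profiles VALUES (?, ?, ?)",
        [
            (
                p.get("username"),
                p.get("follower_count") or 0,
                int(bool(p.get("is_verified"))),
            )
            for p in profiles
        ],
    )
    conn.commit()

def find_leads(conn, min_followers, max_followers):
    """Return usernames of unverified accounts inside a follower range."""
    rows = conn.execute(
        "SELECT username FROM profiles "
        "WHERE is_verified = 0 AND follower_count BETWEEN ? AND ? "
        "ORDER BY follower_count DESC",
        (min_followers, max_followers),
    )
    return [row[0] for row in rows]

# Made-up sample data standing in for API responses
sample = [
    {"username": "big_brand", "follower_count": 5_000_000, "is_verified": True},
    {"username": "local_bakery", "follower_count": 12_000, "is_verified": False},
    {"username": "tiny_account", "follower_count": 40, "is_verified": False},
]

conn = sqlite3.connect(":memory:")
store_profiles(conn, sample)
print(find_leads(conn, 1_000, 100_000))
```

Keeping the filter in SQL means the dashboard can reuse the same query without re-fetching anything from the API.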

Comparing HikerAPI vs instagrapi

Here’s the honest tradeoff.

HikerAPI Advantages

Much faster setup
Cleaner REST architecture
No session management
No proxy maintenance
Easier scaling
Better for backend products and SaaS apps

instagrapi Advantages

More control
Potentially cheaper at scale
Good for experiments
Works well for personal tooling

The Tradeoff

With HikerAPI, you’re paying for convenience and reliability.

With scraping libraries, you save money but spend more engineering time maintaining infrastructure.

For me, the decision depended on the project.

If I’m building:

an MVP,
a client project,
a production workflow,
or something time-sensitive,

…I’d rather use a managed API than babysit scrapers.

Performance Notes

The response times were fast enough for typical backend usage in my project.

I also liked that I could integrate it directly into:

Flask APIs
FastAPI services
Node.js backends
scheduled jobs
data pipelines

without needing a separate scraping server.

Things to Keep in Mind

A few honest considerations:

You still need to handle rate limits properly
External APIs introduce dependency risk
Costs can add up at very large scale
You’re relying on a third-party service
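
For the rate-limit point, one common approach is to retry with exponential backoff. The sketch below is generic rather than HikerAPI-specific: it assumes the service signals rate limiting with HTTP 429, and `call_with_backoff` and its `fetch` callable (which returns a status code and a body) are hypothetical names used for illustration.

```python
import time

def call_with_backoff(fetch, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it stops returning 429, doubling the wait each time.

    fetch must return a (status_code, body) tuple. Assumes 429 means
    "rate limited"; check the provider's docs for the real signal.
    """
    for attempt in range(max_retries + 1):
        status, body = fetch()
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(base_delay * (2 ** attempt))
    # Still rate limited after all retries: hand the last response back
    return status, body

# Demo with a fake fetch that is rate limited twice before succeeding
responses = iter([(429, None), (429, None), (200, {"username": "instagram"})])
waits = []
result = call_with_backoff(lambda: next(responses), sleep=waits.append)
print(result, waits)
```

In a real integration, `fetch` would wrap the `requests.get` call and return `(r.status_code, r.json())`; injecting `sleep` keeps the function easy to test without actually waiting.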

That said, the engineering time saved was worth it for my use case.

Final Thoughts

If you're experimenting with Instagram automation, analytics, creator tools, or lead generation, there are basically two paths:

Option 1: Build and maintain scrapers yourself

More control, more maintenance.

Option 2: Use a managed API

Less infrastructure pain, faster development.

For my project, HikerAPI helped me move faster and spend more time building actual product features instead of debugging Instagram internals.

If you've been fighting with proxies, session cookies, or broken scraping scripts lately, it's worth trying the free requests just to compare the experience yourself.

Checkpoints, Not Transcripts: Rethinking AI Coding Agent Memory

lweiss01 — Sun, 24 May 2026 18:40:21 +0000

TL;DR: AI coding agent memory should live in the repository, not the chat window. Bigger context windows and vector databases are solving the wrong problem. Here is the case for treating the repo itself as the durable cognitive surface.

Everyone is trying to solve AI agent memory right now.

Longer context windows.
Vector databases.
Conversation replay.
Semantic retrieval.
Infinite transcripts.

But after spending months building workflows across Claude, Codex, Gemini, Cursor, and other coding agents, I've started to think we may be treating the wrong thing as the source of truth.

The problem is not:

"How do we make the model remember everything forever?"

The problem is:

"How does a software project remain cognitively coherent across sessions, compaction, agent switches, and time?"

Those are very different problems.

The Context Window Is Not Durable Infrastructure

Modern AI coding workflows are surprisingly fragile.

An agent works for hours. The context window fills up. Compaction happens. Then suddenly:

architectural reasoning disappears
unresolved work gets forgotten
regressions come back
agents undo each other
humans re-explain the same context repeatedly

The industry response so far has mostly been: store more. Bigger context windows, vector databases, hosted memory services, semantic retrieval over giant transcripts.

But transcripts are not understanding.

And replaying giant chat histories is not the same thing as preserving operational continuity.

In practice, most coding workflows do not fail because information disappeared entirely. They fail because the important state was never extracted from the conversation in the first place.

Checkpoints, Not Transcripts

The idea I have been exploring is pretty simple:

Instead of preserving entire conversations forever, preserve structured checkpoints at meaningful moments.

Not:

every token
every thought
every conversational detour

But the things that actually matter:

current state
architectural decisions
unresolved threads
regression risks
next recommended actions
implementation reasoning
handoff context

The checkpoint becomes the durable source of truth.

The live context window becomes disposable working memory.

That distinction changes a lot.
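
To make the idea concrete, here is a minimal sketch of what a structured checkpoint could look like as a file in the repo. It is an illustration only, not the format Holistic uses: the `Checkpoint` fields simply mirror the list above, and the `.continuity/checkpoints/` directory and helper names are assumptions.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path

@dataclass
class Checkpoint:
    """Hypothetical checkpoint shape mirroring the items listed above."""
    current_state: str
    architectural_decisions: list = field(default_factory=list)
    unresolved_threads: list = field(default_factory=list)
    regression_risks: list = field(default_factory=list)
    next_actions: list = field(default_factory=list)
    implementation_reasoning: str = ""
    handoff_context: str = ""

def save_checkpoint(repo_root, name, checkpoint):
    """Write the checkpoint as JSON under an assumed .continuity/ directory."""
    directory = Path(repo_root) / ".continuity" / "checkpoints"
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{name}.json"
    path.write_text(json.dumps(asdict(checkpoint), indent=2), encoding="utf-8")
    return path

def load_checkpoint(path):
    """Read a checkpoint back; any agent or human can do this in a new session."""
    return Checkpoint(**json.loads(Path(path).read_text(encoding="utf-8")))

# Demo against a throwaway directory standing in for a repository
with tempfile.TemporaryDirectory() as repo:
    cp = Checkpoint(
        current_state="Auth refactor half done; login flow migrated",
        architectural_decisions=["Tokens are validated in middleware"],
        unresolved_threads=["Password reset still uses the old client"],
        regression_risks=["Session expiry test was flaky before the refactor"],
        next_actions=["Migrate password reset", "Re-run the session tests"],
    )
    path = save_checkpoint(repo, "auth-refactor", cp)
    restored = load_checkpoint(path)
    print(restored == cp)
```

Because the file is plain JSON committed alongside the code, it survives compaction and agent switches in exactly the way a chat transcript does not.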

The Repo Should Remember

One realization that kept hitting me while working across multiple coding agents:

The repository itself is the only thing that actually persists.

Agents change.
Models change.
Sessions end.
Windows compact.

But the repo stays.

So instead of treating continuity as something trapped inside a chat session, I started treating continuity as a repo-native concern.

That means:

continuity artifacts live in the repo
handoffs live in the repo
operational state lives in the repo
regression memory lives in the repo
checkpoints live in the repo

The repo remembers, not the window.

Multi-Agent Development Is Already Here

A lot of tooling still assumes:

one human, one agent, one session.

That is not how many people are actually working anymore.

Real workflows increasingly look like:

Claude for architecture
Codex for implementation
Cursor for iteration
Gemini for exploration
a human reviewing all of it
another session tomorrow continuing the work

Continuity is no longer just memory. It is coordination across interchangeable execution surfaces. And once you frame it that way, the chat window stops looking like the right place to store anything important.

AI Agents Are Temporary. Repositories Persist.

I think we are entering a phase where software repositories themselves become cognitive systems:

accumulating decisions
preserving continuity
coordinating work
surviving agent turnover
carrying operational memory forward over time

Not because the models became infinitely smart.

But because the continuity stopped depending entirely on the model session.

That is the direction I have been exploring with Holistic, an open-source CLI for repo-native continuity across agents: https://github.com/lweiss01/holistic

Still early. Still evolving quickly. If you are working across multiple coding agents and running into the continuity problem, I would genuinely love feedback, critiques, or just a conversation about how you are solving it.

The repo remembers, not the window.

From Side Project to Student Savior: My AI PPT & Resume Tool Crossed 1.5K+ Users

Chandrakant Kelgire — Sun, 24 May 2026 18:34:41 +0000

Hey Dev Community!

A few weeks ago I quietly dropped a small project: an AI-powered tool for students. Today, I’m back with an update that genuinely surprised me.

pptmaker.co.in has now helped 10,000+ students create presentations and resumes in record time.

What Changed Since the First Post?

Added PDF to PPT conversion (this feature exploded in usage)
Improved Gemini prompts → much better content quality and structure
Launched Company Vault: pre-loaded ATS keywords for TCS, Infosys, Wipro, Accenture, etc.
Added multiple modern themes (including Glassmorphism)
Better mobile experience

The Most Heartwarming Feedback

Students are using it at 2 AM before seminars.
One guy told me he submitted his final-year project presentation 10 minutes before the deadline, fully AI-generated and edited.
Another fresher landed his first interview because the resume looked professional and passed ATS easily.

Technical Lessons I Learned

Gemini is incredibly fast but you have to be very specific with system prompts.
Vercel + React is perfect for quick MVPs, but I’m now thinking about caching and rate limiting.
Indian students don’t want “beautiful” tools; they want fast + practical + free.

What’s Next? (Roadmap)

Multi-language support (Hindi + English)
Presentation coach (AI feedback on content & delivery)
Batch PDF to PPT (upload multiple files)
Export to Google Slides directly

If you’re a student, developer, or someone who hates making slides, I’d love you to try it and roast it honestly. https://www.pptmaker.co.in

What should I build next? Drop your suggestions below.

#webdev #AI #students #indiehacker #buildinpublic #GeminiAI

Why Story Points Don’t Work in the AI Era, And What Should Take Their Place Instead.

Sriramprabhu Rajendran — Sun, 24 May 2026 18:34:17 +0000

Imagine this scenario at your next sprint review meeting: You're looking good on your velocity graph. But half your team is struggling in their own little hell. Estimations have devolved into Russian roulette. You get a "2-point" done in 45 minutes. You have another "2-point" that takes 4 days because AI-written code introduced a nasty bug in staging that was missed.

This isn't an effort issue; it’s a problem of predictability. Story points, meant for predicting the performance of stable humans, don’t even know what to do with this situation.

I believe it’s time to ditch story points. Let me explain why, and what I would use instead.

Why I Think the Old System Is Defective

Story points are a commitment to the idea that "that task is about twice as hard as this task." That idea rests on two assumptions:

A developer's capacity stays approximately constant
Complexity correlates well with duration

AI breaks both assumptions.

Throughput is no longer fixed. Same developer, same work, same day: make one tweak to the model and suddenly they are getting things done in 40% less time. Or something goes wrong with the model and it takes them two days to fix a hallucinated abstraction.

Complexity is no longer linked to task duration. A very difficult task can become very easy if the AI gets it right, and a very simple task can become very difficult if the AI gets it wrong. The variance is much larger than the mean, and story points measure only the mean.

Here’s what I believe a normal team would be seeing after a couple of sprints using AI:

Task Type         | Pre-AI Avg  | Post-AI Range
------------------|-------------|----------------
API endpoint      | 2 days      | 3 hours – 2.5 days
DB migration      | 3 days      | 1 day – 4 days  
UI component      | 1.5 days    | 30 min – 3 days
Legacy refactor   | 5 days      | 2 days – 8 days

If the range is wider than the estimate itself, the estimate is noise.

The Hidden Cost That No One Is Estimating

This is my hypothesis on what most teams are failing to estimate: verifying the AI output is the highest-cost activity for almost any task.

A realistic breakdown of where the time goes would be something like this:

20% of the time: figuring out what to build
15% of the time: the AI building it
65% of the time: going through it, looking for problems

And that third one, the curation tax, is invisible in most estimates. Teams estimate the time spent on construction and not on curation. That is like planning a home renovation based only on how fast someone swings a hammer.

If teams took review and validation seriously as part of their estimation process, I'm convinced their accuracy would improve drastically.

What I Suggest to Replace Story Points

1. Confidence-Tagged Estimates

What I suggest: every ticket carries two things, a time estimate and a confidence tag.

ticket: INDE-002 — Migrate auth service to new SDK
estimate: 1.5 days
confidence: low
reason: "New SDK, AI hasn't seen our auth patterns before"
action: spike first (0.5 day), then re-estimate

The confidence tag is the important part. It tells the PM whether to trust the number or treat it as a hypothesis.

Three bands:

Tag      | What it means                               | How to plan
---------|---------------------------------------------|--------------------------
`high`   | Done this before with AI, know the variance | Plan on the estimate
`medium` | Familiar territory, some unknowns           | Buffer 2x
`low`    | Novel task or AI-unfriendly domain          | Spike first, don't commit

My recommended rule: always spike a ticket marked low before adding it to a sprint. My guess is that this one rule alone could prevent most sprint meltdowns in AI-driven teams.
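
The three bands and the spike rule can be expressed as a small planning helper. This is a sketch under assumptions: the `Ticket` class, the `plan_sprint` function, and tickets INDE-003 and INDE-004 are made up for illustration; only INDE-002 and the band rules come from the text above.

```python
from dataclasses import dataclass

# Planning multipliers per band; "low" is absent because it is never committed
BUFFER = {"high": 1.0, "medium": 2.0}

@dataclass
class Ticket:
    key: str
    estimate_days: float
    confidence: str  # "high", "medium", or "low"

def plan_sprint(tickets):
    """Split tickets into committed work and spikes, and total the buffered days."""
    committed, needs_spike = [], []
    planned_days = 0.0
    for ticket in tickets:
        if ticket.confidence == "low":
            # Rule from the text: spike first, do not commit
            needs_spike.append(ticket.key)
        elif ticket.confidence in BUFFER:
            committed.append(ticket.key)
            planned_days += ticket.estimate_days * BUFFER[ticket.confidence]
        else:
            raise ValueError(f"unknown confidence tag: {ticket.confidence!r}")
    return committed, needs_spike, planned_days

tickets = [
    Ticket("INDE-002", 1.5, "low"),
    Ticket("INDE-003", 1.0, "high"),    # hypothetical ticket
    Ticket("INDE-004", 0.5, "medium"),  # hypothetical ticket
]
print(plan_sprint(tickets))
```

The low-confidence ticket never contributes to the planned total, which is the point: its number is a hypothesis until the spike is done.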

2. "Free Second Time" Paradigm

I keep noticing a pattern: the first time you do a given type of task with AI, it is very costly. The second time, the cost drops by about 60%. By the third time, it is pretty much negligible.

Why? The first attempt forces you to develop a specific workflow: a prompt structure that works, the right context to load, and a list of everything that might go wrong during execution.

Given that, I believe estimates should be scaled by how many times the team has done that type of task:

First instance of task type:  estimate × 2 (you're building the workflow)
Second instance:              estimate × 0.8 (refining the workflow)
Third+ instance:              estimate × 0.4 (executing the workflow)

Example: migrating 12 microservices to a new observability SDK.

Service 1: 6 hours (thinking about how to do it, writing prompts)
Service 2: 2.5 hours (fine-tuning, dealing with edge cases)
Services 3-12: ~45 minutes each (batched using the known procedure)

Old estimate: 12 x 4 hours = 48 hours. This approach: ~16 hours. But only if you put in the effort on service 1 rather than rushing through.
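
The arithmetic can be checked with a few lines of Python. The function names are hypothetical. Note that applying the three multipliers literally to a 4-hour base estimate gives about 27.2 hours, which is more conservative than the roughly 16 hours in the worked example; both are well under the flat 48-hour estimate.

```python
def novelty_multiplier(instance):
    """Multiplier for the n-th time a task type is done (1-based), per the rule above."""
    if instance == 1:
        return 2.0  # building the workflow
    if instance == 2:
        return 0.8  # refining the workflow
    return 0.4      # executing the workflow

def batch_estimate(base_hours, count):
    """Total hours for `count` repetitions of a task with the same base estimate."""
    return sum(base_hours * novelty_multiplier(i) for i in range(1, count + 1))

flat_total = 12 * 4                   # old estimate: every service costs the same
rule_total = batch_estimate(4, 12)    # multipliers applied to a 4-hour base
observed_total = 6 + 2.5 + 10 * 0.75  # hours from the worked example

print(flat_total, round(rule_total, 1), observed_total)
```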

3. Review-Weighted Sizing

I don’t believe tickets should be sized by "how difficult will it be to develop?" They should be sized by "how difficult will it be to verify?"

The pieces that are easiest to generate are often very difficult to review (large refactors, verbose migrations), while pieces that are difficult to generate can be simple to review (small algorithmic fixes with explicit test cases).

This sizing rubric must be inverted:

| Old thinking | New thinking |
|-------------|-------------|
| "Lots of code = big ticket" | "Lots of code to review = big ticket" |
| "Complex logic = big ticket" | "Ambiguous correctness criteria = big ticket" |
| "New framework = big ticket" | "AI-unfamiliar patterns = big ticket" |

A 500-line boilerplate migration should be sized as large, not because it is difficult to generate (an AI can do that within minutes), but because checking 500 lines of code for subtle differences is genuinely costly.

How This Changes The PM Conversation

The hardest part of any estimation paradigm shift isn’t technical. It’s explaining the change.

Old conversation:

"This epic is 34 story points. At our velocity of 21/sprint, it’ll take ~1.6 sprints."

Where I think this discussion needs to go:

"This epic has 8 tickets. 5 are high confidence (we expect to meet those estimates). 2 are medium confidence (double our estimates). 1 is low confidence (we need a spike day before we can be sure). Optimistic estimate: 1 sprint. Pessimistic: 1.5 sprints. If the low-confidence ticket turns out to be a problem: 2.5 sprints."

Longer? Yes. More useful? Absolutely. The PM now has real choices: "Pull out the low-confidence ticket and ship everything else on time" is a discussion you can actually have.

Metrics Worth Tracking Instead

I would scrap velocity as a metric in favor of more useful measures. The ones I propose tracking:

Curation ratio: review time relative to creation time. Goal: below 3:1.
Confidence success rate: proportion of 'high' tickets that finish within their estimate.
Process reuse rate: how often an existing process is reused for a similar task instead of being created anew.
Spike conversion rate: how often a 'low' ticket becomes 'medium' or 'high' after its spike.

These measures show how well the team is learning to work with the AI, rather than just how 'fast' it is going.
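
As a rough sketch of how these four numbers could be computed from a sprint log, assuming each ticket is recorded as a dictionary. The field names and the sample data are invented for illustration; a real tracker export would need mapping into this shape.

```python
def share(hits, total):
    """Fraction hits/total, or 0.0 when there is nothing to measure."""
    return hits / total if total else 0.0

def team_metrics(tickets):
    """Compute the four proposed metrics from per-ticket records."""
    review = sum(t["review_h"] for t in tickets)
    create = sum(t["create_h"] for t in tickets)
    high = [t for t in tickets if t["confidence"] == "high"]
    spiked = [t for t in tickets if t["after_spike"] is not None]
    return {
        "curation_ratio": share(review, create),
        "confidence_success_rate": share(
            sum(t["actual_h"] <= t["estimate_h"] for t in high), len(high)
        ),
        "process_reuse_rate": share(
            sum(t["reused_process"] for t in tickets), len(tickets)
        ),
        "spike_conversion_rate": share(
            sum(t["after_spike"] in ("medium", "high") for t in spiked), len(spiked)
        ),
    }

# Invented sprint log: confidence is the tag at planning time,
# after_spike is the tag a 'low' ticket received once its spike finished
log = [
    {"confidence": "high", "estimate_h": 4, "actual_h": 3.5, "review_h": 2.0,
     "create_h": 1.0, "reused_process": True, "after_spike": None},
    {"confidence": "high", "estimate_h": 2, "actual_h": 5.0, "review_h": 3.0,
     "create_h": 1.0, "reused_process": False, "after_spike": None},
    {"confidence": "medium", "estimate_h": 8, "actual_h": 7.0, "review_h": 4.0,
     "create_h": 2.0, "reused_process": True, "after_spike": None},
    {"confidence": "low", "estimate_h": 12, "actual_h": 10.0, "review_h": 3.0,
     "create_h": 2.0, "reused_process": False, "after_spike": "medium"},
    {"confidence": "low", "estimate_h": 6, "actual_h": 9.0, "review_h": 3.0,
     "create_h": 1.5, "reused_process": False, "after_spike": "low"},
]

metrics = team_metrics(log)
print(metrics)
```

A curation ratio of 2.0 here would sit under the 3:1 goal, while a confidence success rate of 0.5 would suggest the team is tagging tickets 'high' too readily.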

TL;DR – The Replacement Kit

If you are still Fibonacci-estimating stories in 2026 and wondering why sprints feel like Russian roulette, here’s my suggestion:

Confidence tagging for estimates: the confidence level matters more than the estimate itself
Curation effort vs construction effort: estimate both, because curation is the hard work
Novelty tracking for each task: new tasks are 2-3 times costlier than recurring tasks
Task size based on difficulty of review, not generation: reverse the complexity paradigm
Spike before undertaking high-uncertainty tasks: one simple rule to massively reduce blowups

To my mind, the fundamental change is this: estimation used to focus on how much time it takes to build something. Today, it needs to focus on how much time it takes to validate it. Change the paradigm, and sprint planning starts reflecting reality.

This is obviously just one way of seeing things. There are many brilliant minds out there who have figured out how to make story points work using AI-driven adjustments. Where do you stand: adding to the old system or building a new one from scratch?