Show HN: Detect any object in satellite imagery using a text prompt
8 by eyasu6464 | 1 comment on Hacker News.
I built a browser-based tool that uses Vision-Language Models (VLMs) to detect objects in satellite imagery via natural language prompts. Draw a polygon on the map, type what you want to find (e.g., "swimming pools," "oil tanks," "solar panels"), and the system scans tile-by-tile, projecting bounding boxes back onto the globe as GeoJSON. The pipeline: pick zoom level + prompt → slice map into mercantile tiles → feed each tile + prompt to VLM → create bounding boxes → project to WGS84 coordinates → render on map. No login required for the demo. Works well for distinct structures zero-shot; struggles with dense/occluded objects where narrow YOLO models still win.
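The slice-and-project steps of the pipeline above can be sketched with standard Web Mercator tile math. This is an illustrative sketch, not the tool's actual code: the function names are mine, and latitude is interpolated linearly within a tile, which is a reasonable approximation at the high zoom levels used for detection.

```python
import math

def lnglat_to_tile(lng, lat, zoom):
    # Standard Web Mercator ("slippy map") tile indexing.
    n = 2 ** zoom
    x = int((lng + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

def tile_bounds(x, y, zoom):
    # WGS84 bounds (west, south, east, north) of a tile.
    n = 2 ** zoom
    def lng(tx):
        return tx / n * 360.0 - 180.0
    def lat(ty):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * ty / n))))
    return lng(x), lat(y + 1), lng(x + 1), lat(y)

def box_to_wgs84(px_box, tile_xy, zoom, tile_size=256):
    # Project a pixel-space detection box (x0, y0, x1, y1) from the VLM
    # back to lon/lat; pixel y grows downward, latitude grows upward.
    x0, y0, x1, y1 = px_box
    west, south, east, north = tile_bounds(*tile_xy, zoom)
    def lng(px):
        return west + (east - west) * px / tile_size
    def lat(py):
        # Linear within the tile: an approximation of the Mercator curve.
        return north + (south - north) * py / tile_size
    return lng(x0), lat(y1), lng(x1), lat(y0)  # (west, south, east, north)
```

The per-tile boxes returned by `box_to_wgs84` can be wrapped directly into GeoJSON polygon features for rendering on the map.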
Thursday, 12 March 2026
New top story on Hacker News: Atlassian CEO: AI doesn't replace people here, but we're firing them anyway
7 by layer8 | 1 comment on Hacker News.
Wednesday, 11 March 2026
New top story on Hacker News: Tested: How Many Times Can a DVD±RW Be Rewritten? Methodology and Results
16 by giuliomagnifico | 0 comments on Hacker News.
Tuesday, 10 March 2026
New top story on Hacker News: Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon
111 by sanchitmonga22 | 37 comments on Hacker News.
Hi HN, we're Sanchit and Shubham (YC W26). We built a fast inference engine for Apple Silicon. LLMs, speech-to-text, text-to-speech – MetalRT beats llama.cpp, Apple's MLX, Ollama, and sherpa-onnx on every modality we tested. Custom Metal shaders, no framework overhead.

We've also open-sourced RCLI, the fastest end-to-end voice AI pipeline on Apple Silicon: mic to spoken response, entirely on-device. No cloud, no API keys.

To get started:

brew tap RunanywhereAI/rcli https://ift.tt/PXD8REu
brew install rcli
rcli setup # downloads ~1 GB of models
rcli # interactive mode with push-to-talk

Or:

curl -fsSL https://ift.tt/OT3hJvX | bash

The numbers (M4 Max, 64 GB, reproducible via `rcli bench`):

LLM decode – 1.67x faster than llama.cpp, 1.19x faster than Apple MLX (same model files):
- Qwen3-0.6B: 658 tok/s (vs mlx-lm 552, llama.cpp 295)
- Qwen3-4B: 186 tok/s (vs mlx-lm 170, llama.cpp 87)
- LFM2.5-1.2B: 570 tok/s (vs mlx-lm 509, llama.cpp 372)
- Time-to-first-token: 6.6 ms

STT – 70 seconds of audio transcribed in *101 ms*. That's 714x real-time, 4.6x faster than mlx-whisper.

TTS – 178 ms synthesis, 2.8x faster than mlx-audio and sherpa-onnx.

We built this because demoing on-device AI is easy but shipping it is brutal. Voice is the hardest test: you're chaining STT, LLM, and TTS sequentially, and if any stage is slow, the user feels it. Most teams fall back to cloud APIs not because local models are bad, but because local inference infrastructure is.

The hard problem is latency compounding. In a voice pipeline you're stacking three models in sequence; if each adds 200 ms, you're at 600 ms before the user hears a word, and that feels broken. You can't optimize one stage and call it done. Every stage needs to be fast, on one device, with no network round-trip to hide behind.

We went straight to Metal: custom GPU compute shaders, all memory pre-allocated at init (zero allocations during inference), and one unified engine for all three modalities instead of stitching separate runtimes together. MetalRT is the first engine to handle all three modalities natively on Apple Silicon.

Full methodology:
LLM benchmarks: https://ift.tt/A1v4mSL...
Speech benchmarks: https://ift.tt/jrnxzbm...

How: most inference engines add layers between you and the GPU – graph schedulers, runtime dispatchers, memory managers. MetalRT skips all of it. Custom Metal compute shaders for quantized matmul, attention, and activation, compiled ahead of time and dispatched directly.

Voice pipeline optimization details: https://ift.tt/IPb5czW...
RAG optimizations: https://ift.tt/SKavQol...

RCLI is the open-source voice pipeline (MIT) built on MetalRT: three concurrent threads with lock-free ring buffers, double-buffered TTS, 38 macOS actions by voice, local RAG (~4 ms over 5K+ chunks), 20 hot-swappable models, and a full-screen TUI with per-op latency readouts. Falls back to llama.cpp when MetalRT isn't installed.

Source: https://ift.tt/c8A35LP (MIT)
Demo: https://www.youtube.com/watch?v=eTYwkgNoaKg

What would you build if on-device AI were genuinely as fast as cloud?
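The three-stage concurrency described above (STT, LLM, and TTS on separate threads feeding each other through buffers) can be illustrated with a toy Python pipeline. This is purely a sketch of the pattern, not RCLI's code: RCLI uses lock-free ring buffers in native code, `queue.Queue` here is just a stand-in, and all function names are mine.

```python
import queue
import threading

def stage(name, fn, inq, outq):
    # Each pipeline stage runs on its own thread, pulling items from an
    # input queue and pushing results downstream (ring-buffer stand-in).
    def run():
        while True:
            item = inq.get()
            if item is None:          # sentinel: propagate shutdown downstream
                if outq is not None:
                    outq.put(None)
                return
            result = fn(item)
            if outq is not None:
                outq.put(result)
    t = threading.Thread(target=run, name=name, daemon=True)
    t.start()
    return t

def run_pipeline(audio_chunks, stt, llm, tts):
    # Wire three stages together; each chunk flows STT -> LLM -> TTS while
    # later chunks are already being processed by the earlier stages.
    q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
    stage("stt", stt, q1, q2)
    stage("llm", llm, q2, q3)
    spoken = []
    tts_thread = stage("tts", lambda text: spoken.append(tts(text)), q3, None)
    for chunk in audio_chunks:
        q1.put(chunk)
    q1.put(None)                      # end of input
    tts_thread.join()
    return spoken
```

The point of the pattern: while TTS is speaking chunk i, the LLM is already generating chunk i+1, so per-stage latencies stop adding up for everything after the first chunk. Only the first response pays the full STT + LLM + TTS sum.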