Quesma Blog
Exploring reinforcement learning, RLVR, AI evals, and what it takes to make AI agents production-ready
https://quesma.com/en-us
© 2026 Quesma Inc.
Wed, 06 May 2026 11:46:24 GMT
Astro v5

Compare harnesses not models: Blitzy vs GPT-5.4 on SWE-Bench Pro
https://quesma.com/blog/verifying-blitzy-swe-bench-pro/
An independent audit of agentic scaffolding and harnesses. We analyze how agent workflows, codebase documentation, and test verification impact performance compared to raw base models like GPT-5.4, Gemini 3.1 Pro, and Claude Code.
Tue, 07 Apr 2026 00:00:00 GMT
Authors: Piotr Migdał, Piotr Grabowski

Reviving a 20-year-old puzzle game Chromatron with Ghidra and AI
https://quesma.com/blog/chromatron-recompiled/
Decompiling Chromatron, the classic laser puzzle game, from WinXP and PowerPC executables into Rust. A pixel-perfect port using Claude Code, Opus 4.6, Cursor, GPT-5.2-Codex, and Ghidra.
Wed, 04 Mar 2026 00:00:00 GMT
Tags: reverse-engineering, rust, wasm, ghidra, ai-coding
Author: Piotr Migdał

We hid backdoors in ~40MB binaries and asked AI + Ghidra to find them
https://quesma.com/blog/introducing-binaryaudit/
BinaryAudit benchmarks AI agents using Ghidra to find backdoors in compiled binaries of real open-source servers, proxies, and network infrastructure.
Tue, 10 Feb 2026 12:00:00 GMT
Tags: ai, benchmarking, security, reverse-engineering, llm, backdoors, ghidra
Authors: Piotr Grabowski, Rafał Strzaliński, Michał Kowalczyk, Piotr Migdał, Jacek Migdal

Reverse engineering River Raid with Claude, Ghidra, and MCP
https://quesma.com/blog/ghidra-mcp-unlimited-lives/
Connecting Claude to Ghidra via MCP to reverse engineer River Raid. A test of AI agents against 6502 assembly, memory mapping, and 80s game logic.
Fri, 23 Jan 2026 00:00:00 GMT
Author: Rafal Strzalinski

Benchmarking OpenTelemetry: Can AI trace your failed login?
https://quesma.com/blog/introducing-otel-bench/
A lot of vendors pitch AI SRE. We tested 14 models across 11 programming languages; even the best ones struggle with instrumenting code with the leading open-source standard, OpenTelemetry.
Sun, 18 Jan 2026 12:00:00 GMT
Tags: ai, benchmarking, opentelemetry, observability, llm, instrumentation, tracing
Authors: Przemek Delewski, Rafał Strzaliński, Piotr Migdał, Jacek Migdał

Vibe coding needs git blame
https://quesma.com/blog/vibe-code-git-blame/
Prompts are specs, not code. This influences git workflows for vibe coding: tracking LLM prompts in GitHub repositories, managing commit messages, and debugging non-deterministic AI outputs.
Fri, 09 Jan 2026 09:37:52 GMT
Author: Piotr Migdał

How 2025 took AI from party tricks to production tools
https://quesma.com/blog/year-of-ai-2025/
AI reasoning models like DeepSeek-R1, agentic coding tools like Claude Code, and image generation with Nano Banana Pro set daily software engineering standards.
Sat, 03 Jan 2026 00:00:00 GMT
Author: Piotr Migdał

Migrating CompileBench to Harbor: standardizing AI agent evals
https://quesma.com/blog/compilebench-in-harbor/
Standardizing AI agent evaluation with Harbor: an open-source framework for reproducible benchmarks, reinforcement learning, and collaborative evals.
Sun, 21 Dec 2025 00:00:00 GMT
Tags: CompileBench, Harbor, AI Agents, Benchmarks, Open Source, LLM
Authors: Przemysław Hejman, Piotr Migdał

Antigravity feels heavy and Claude Skills are light
https://quesma.com/blog/claude-skills-not-antigravity/
Comparing Google Antigravity and Claude Code for AI-assisted workflows, and why custom Claude Skills might be the better approach.
Tue, 16 Dec 2025 14:43:52 GMT
Tags: ai, claude, gemini, coding, tools
Author: Piotr Migdał

Outside of the bubble, AI is Black Mirror
https://quesma.com/blog/ai-is-black-mirror/
A reality check on AI enthusiasm: how a simple chart generated vitriolic reactions outside the tech bubble.
Fri, 05 Dec 2025 10:55:36 GMT
Author: Piotr Migdał

Nano Banana Pro: raw intelligence with tool use
https://quesma.com/blog/nano-banana-pro-intelligence-with-tools/
Finally, an AI that can draw a map or create an infographic. The capability of leveraging tools pushed the frontier of image generation.
Mon, 24 Nov 2025 16:00:00 GMT
Tags: AI, image-generation, tool-use
Authors: Jacek Migdal, Piotr Migdał

A postmortem on our $2.5M database gateway: lessons from pilot purgatory
https://quesma.com/blog/database-gateway-postmortem/
We had a great team, $2.5M, and a validated problem. A year later, we sold our IP for parts. Here’s what we learned about urgency and co-foundership.
Tue, 04 Nov 2025 11:04:56 GMT
Tags: startup, product-market-fit, database-gateway, lessons-learned
Author: Jacek Migdal

The security paradox of local LLMs
https://quesma.com/blog/local-llms-security-paradox/
Local LLMs prioritize privacy over security. Our research reveals a 95% backdoor injection success rate.
Tue, 21 Oct 2025 00:00:00 GMT
Tags: security, AI, code-generation, vulnerability, red-team
Author: Jacek Migdal

AI for coding is still playing Go, not StarCraft
https://quesma.com/blog/coding-is-starcraft-not-go/
AI excels at clean algorithms but fails at messy, real-world codebases. The solution lies not in Go-like intelligence, but StarCraft-like complexity.
Tue, 14 Oct 2025 00:00:00 GMT
Tags: ai
Author: Piotr Migdał

CompileBench: Can AI Compile 22-year-old Code?
https://quesma.com/blog/introducing-compilebench/
We tested 19 LLMs on their ability to handle real-world software engineering tasks like compiling old code and cross-compiling. See how Anthropic, OpenAI, and Google models stack up in our new benchmark – CompileBench.
Wed, 17 Sep 2025 14:06:56 GMT
Tags: ai, benchmarking, testing, llm, performance, compilation, cli, tooling
Author: Piotr Grabowski

Tau² Benchmark: How a Prompt Rewrite Boosted GPT-5-mini by 22%
https://quesma.com/blog/tau2-benchmark-improving-results-smaller-models/
We expected small models to be fast, but our benchmarks revealed a common reliability trap. Here’s our deep dive on finding and fixing it.
Fri, 12 Sep 2025 00:00:00 GMT
Author: Przemysław Hejman

60-Second Linux Analysis - with Nix and LLMs
https://quesma.com/blog/60s-linux-analysis-nix-llms/
A one-line command to diagnose server health. Uses Nix to fetch tools without sudo and an LLM to summarize the output. No installation required.
Fri, 05 Sep 2025 00:00:00 GMT
Tags: linux, performance, devops, nix, automation, cli, AI
Author: Piotr Grabowski

Tau²: From LLM Benchmark to Blueprint for Testing AI Agents
https://quesma.com/blog/tau2-from-llm-benchmark-to-blueprint-for-testing-ai-agents/
Deep dive into the Tau² benchmark that goes beyond LLM evaluation to reveal innovative methodologies for testing AI agentic systems in realistic scenarios. Learn how this framework can transform how we test AI-powered software.
Thu, 28 Aug 2025 00:00:00 GMT
Tags: ai, benchmarking, testing
Author: Przemysław Hejman

Observability in Go: What Real Engineers Are Saying in 2025
https://quesma.com/blog/observability-in-go-what-real-engineers-are-saying-in-2025/
Learn how Go practitioners ship telemetry in 2025 – what works, what hurts, and the tools, workflows, and guardrails they rely on for metrics, traces, and logs.
Thu, 14 Aug 2025 00:00:00 GMT
Author: Przemek Delewski

Sandboxing AI-Generated Code: Why We Moved from WebR to AWS Lambda
https://quesma.com/blog/sandboxing-ai-generated-code-why-we-moved-from-webr-to-aws-lambda/
Why we moved our AI chart generator from in-browser WebR (WASM) to AWS Lambda. A case study on the real-world trade-offs of running AI-generated R and ggplot2 code.
Thu, 07 Aug 2025 00:00:00 GMT
Authors: Piotr Migdał, Przemysław Hejman

Claude Code + OpenTelemetry + Grafana: A guide to tracking usage and limits
https://quesma.com/blog/track-claude-code-usage-and-limits-with-grafana-cloud/
Learn to monitor Claude Code costs, tokens, and latency in 5 minutes using its native OpenTelemetry support with Grafana Cloud.
Thu, 31 Jul 2025 00:00:00 GMT
Tags: claude-code, opentelemetry, grafana, monitoring, observability
Author: Rafał Strzaliński

Which chart would you swipe right?
https://quesma.com/blog/which-chart-would-you-swipe-right/
Same dating data, 5 different charts: Do you prefer the academic journal, The Economist, a Redditor's take, or creating your own with AI?
Thu, 24 Jul 2025 00:00:00 GMT
Tags: data-visualization, charts, design
Authors: Piotr Migdał, Cezary Piwowarczyk

5 Grafana in Docker examples to get started with metrics, logs, and traces
https://quesma.com/blog/5-grafana-docker-examples-to-get-started-with-metrics-logs-and-traces/
Learn how to spin up Grafana in Docker in 5 real-world examples: Prometheus metrics, Loki logs, Tempo traces, and Pyroscope profiling.
Wed, 23 Jul 2025 00:00:00 GMT
Tags: grafana, docker, prometheus, loki, observability, metrics, logs, traces
Author: Przemysław Hejman

Building Grafana dashboards with AI, CLI and a bit of pragmatism
https://quesma.com/blog/building-grafana-dashboards-ai-cli/
Discover a pragmatic approach to building Grafana dashboards more efficiently by combining the power of AI with the automation of a CLI workflow.
Mon, 14 Jul 2025 00:00:00 GMT
Tags: grafana, ai, cli, dashboards, observability
Authors: Piotr Migdał, Piotr Grabowski

How does the Data Council conference keep finding future billion-dollar companies?
https://quesma.com/blog/how-does-the-data-council-conference-keep-finding-future-billion-dollar-companies/
Data Council 2025 blended a unique Masonic temple venue with deep-tech talks. Discover insights from future billion-dollar companies before they make headlines.
Tue, 01 Jul 2025 00:00:00 GMT
Tags: data-council, conference, startups, analytics, data-infrastructure
Author: Jacek Migdal

Don't Let Apache Iceberg Sink Your Analytics: Practical Limitations in 2025
https://quesma.com/blog/apache-iceberg-practical-limitations-2025/
Apache Iceberg is a powerful open table format, but has practical limitations. Learn about its challenges with small data, metadata overhead, and real-time use cases.
Thu, 22 May 2025 00:00:00 GMT
Tags: apache-iceberg, data-lake, analytics, data-architecture, limitations
Author: Jacek Migdal

Highlights from the Iceberg Summit 2025
https://quesma.com/blog/highlights-iceberg-summit-2025/
Key insights from Iceberg Summit 2025 on Apache Iceberg's evolution, including Table Spec V3, PyIceberg, Go, and its central role in modern data lakehouses.
Tue, 06 May 2025 00:00:00 GMT
Tags: Apache Iceberg, Data Lake, Summit, Data Architecture
Author: Przemek Delewski

February 2025 Newsletter
https://quesma.com/blog/february-2025-newsletter/
February's tech roundup: ElasticON takeaways, Databricks' $10B funding, and IBM's DataStax buyout. We also dive into ClickHouse's price hike and AIOps failures.
Wed, 26 Feb 2025 00:00:00 GMT
Author: Quesma Team

Lessons from the pre-LLM AI in Observability: Anomaly Detection and AIOps vs P99
https://quesma.com/blog/aiops-observability/
See how a decade of artificial intelligence hype missed the mark in tooling and learn statistical tricks insiders use to spot trouble faster than alert storms.
Mon, 24 Feb 2025 00:00:00 GMT
Tags: AI, Observability, AIOps, Machine Learning, Monitoring
Author: Jacek Migdal

ClickHouse Cloud Pricing Change in January 2025: A Price Hike with Many Tweaks
https://quesma.com/blog/clickhouse-pricing/
We estimate a 30% price increase for a typical production workload. This article explains the complex pricing change announced on January 27, 2025, and provides a practical guide to what's changing.
Mon, 27 Jan 2025 00:00:00 GMT
Tags: clickhouse, pricing, cloud, analytics
Author: Jacek Migdal

January 2025 newsletter
https://quesma.com/blog/january-2025-newsletter/
Get the latest on Quesma's developments, including key product updates, our 2025 roadmap insights, and how AI is shaping the future of data analytics.
Tue, 21 Jan 2025 00:00:00 GMT
Author: Quesma Team

Team update
https://quesma.com/blog/team-update/
Quesma announces a leadership change as co-founder transitions to an advisory role, reaffirming the company’s mission and ongoing partnerships.
Fri, 17 Jan 2025 00:00:00 GMT
Author: Jacek Migdal

December 2024 newsletter
https://quesma.com/blog/december-2024-newsletter/
Explore Quesma's 2024 highlights, including team insights, event recaps, and industry updates on databases and analytics.
Tue, 17 Dec 2024 00:00:00 GMT
Author: Quesma Team

Quesma database gateway
https://quesma.com/blog/quesma-database-gateway/
Learn how Quesma is applying microservice principles to databases, creating a gateway that simplifies migrations, enhances security, and improves observability.
Sun, 24 Nov 2024 00:00:00 GMT
Author: Pawel Brzoska

November 2024 newsletter
https://quesma.com/blog/november-2024-newsletter/
Discover the latest from Quesma, including new product features, insights on current data management trends, and information on upcoming community events.
Fri, 15 Nov 2024 00:00:00 GMT
Author: Quesma Team

We are announcing our first public release at KubeCon!
https://quesma.com/blog/our-first-public-release/
Quesma unveils its first public release at KubeCon! Learn about its key features and how it will transform your data journey.
Wed, 13 Nov 2024 00:00:00 GMT
Author: Pawel Brzoska

The most successful open-source fork, worth $6B
https://quesma.com/blog/elastic-vs-grafana/
Explore the history of Grafana, a successful Kibana fork. Learn how two multi-billion dollar companies emerged from one project to compete in observability.
Thu, 31 Oct 2024 00:00:00 GMT
Author: Jacek Migdal

October 2024 newsletter
https://quesma.com/blog/october-2024-newsletter/
Our first newsletter! Learn about Quesma's mission to revolutionize database architecture, our new partnership with Hydrolix, and our latest SQL invention.
Wed, 16 Oct 2024 00:00:00 GMT
Author: Quesma Team

Hydrolix and Quesma technical partnership announcement
https://quesma.com/blog/kibana-on-hydrolix/
Learn how Quesma and Hydrolix are partnering to cut log data costs. Use your existing Kibana dashboards with Hydrolix's cost-effective streaming data lake.
Thu, 10 Oct 2024 00:00:00 GMT
Author: Jacek Migdal

SQL from a Programming Language Perspective — Part II
https://quesma.com/blog/sql-from-a-programming-language-perspective-part-ii/
Explore SQL as a declarative language. This article breaks down SQL operations into programming primitives and discusses syntax challenges and execution order.
Thu, 12 Sep 2024 00:00:00 GMT
Author: Przemysław Delewski

SQL from a Programming Language Perspective — Part I
https://quesma.com/blog/sql-from-a-programming-language-perspective/
Explore SQL as a programming language with its syntax, semantics, and type system. Part 1 of a series on SQL fundamentals.
Thu, 05 Sep 2024 00:00:00 GMT
Author: Przemysław Delewski

Understanding Elasticsearch Pricing
https://quesma.com/blog/elastic-pricing/
Navigate Elasticsearch's complex pricing with our guide. We cover subscription models, cloud vs. self-hosted options, and hidden costs to help you budget.
Thu, 29 Aug 2024 00:00:00 GMT
Authors: Antoni Olendzki, Przemysław Hejman

Getting started with Quesma in 3 simple steps
https://quesma.com/blog/getting-started-with-quesma-in-3-simple-steps/
Launch Quesma in minutes to run Kibana dashboards on ClickHouse. This guide walks you through installation, configuration, and your first project in just three easy steps.
Thu, 22 Aug 2024 00:00:00 GMT
Author: Quesma Team

The pancake SQL pattern: combine your SQLs into one for 50x better performance.
https://quesma.com/blog/pancake-sql-pattern/
Discover how Quesma merges multiple SQL statements to dramatically speed up your database queries for complex dashboards.
Mon, 19 Aug 2024 00:00:00 GMT
Author: Jacek Migdal

Schema migrations: pitfalls and risks
https://quesma.com/blog/schema-migrations/
Learn how to overcome challenges in schema migrations and discover strategies for seamless, risk-free database updates.
Fri, 26 Jul 2024 00:00:00 GMT
Author: Jacek Migdal

The importance of design partners for product development
https://quesma.com/blog/the-importance-of-design-partners-for-product-development/
Explore how Quesma's design partner program accelerates building better tools through collaborative feedback, early access, and tailored support.
Thu, 11 Jul 2024 00:00:00 GMT
Author: Antoni Olendzki

Maximizing Elasticsearch's performance and cost efficiency
https://quesma.com/blog/optimizing-elasticsearch-performance-cost-efficiency/
Maximize your Elasticsearch performance and cost efficiency. Our guide covers indexing, querying, resource management, and cost optimization for your deployments.
Wed, 03 Jul 2024 00:00:00 GMT
Author: Antoni Olendzki

What SQL could learn from Elasticsearch Query DSL
https://quesma.com/blog/what-sql-could-learn-from-elasticsearch-query-dsl/
Discover what SQL can learn from Elasticsearch Query DSL. This article explores the strengths of Query DSL, such as flexible search, sane defaults, and composability.
Tue, 18 Jun 2024 00:00:00 GMT
Author: Jacek Migdal

How Quesma Can Help Kibana Users Reduce Elasticsearch Costs
https://quesma.com/blog/optimize-kibana-elasticsearch-costs-quesma/
Reduce your Elasticsearch costs without losing Kibana. Learn how Quesma's database gateway lets you switch to a cost-effective SQL database like ClickHouse.
Wed, 22 May 2024 00:00:00 GMT
Author: Antoni Olendzki

Elasticsearch and OpenSearch schema rough edges
https://quesma.com/blog/elasticsearch-opensearch-schema-rough-edges/
Mapping issues, indexing challenges, schema management, and search engine pitfalls in Elasticsearch and OpenSearch.
Wed, 15 May 2024 00:00:00 GMT
Tags: elasticsearch, opensearch, schema, database
Author: Jacek Migdal

OpenSearch vs. Elasticsearch: A Fork in the Road
https://quesma.com/blog/opensearch-vs-elasticsearch-a-fork-in-the-road/
Explore the differences between OpenSearch and Elasticsearch. This article compares their features, licensing, and community support to help you choose the right one.
Fri, 10 May 2024 00:00:00 GMT
Author: Pawel Brzoska

Kibana vs. Grafana: Choosing the Right Data Visualization Tool
https://quesma.com/blog/kibana-vs-grafana-choosing-the-right-data-visualization-tool/
Choosing between Kibana and Grafana? Our guide compares their core functionality, data source support, and ideal use cases to help you pick the right tool.
Wed, 27 Mar 2024 00:00:00 GMT
Author: Pawel Brzoska

Kibana on Clickhouse data with Quesma - dashboarding
https://quesma.com/blog/kibana-on-clickhouse-data-with-quesma-dashboarding/
Unlock the power of SQL on your ClickHouse data directly within Kibana. See our demo on how to build charts and integrate them into your dashboards seamlessly.
Mon, 26 Feb 2024 00:00:00 GMT
Author: Pawel Brzoska

Kibana on Clickhouse data with Quesma - first working prototype
https://quesma.com/blog/kibana-on-clickhouse-data-with-quesma-first-working-prototype/
See our first working prototype of the Quesma translation gateway. Connect Kibana to ClickHouse and explore your logs with the tools you already love to use.
Mon, 26 Feb 2024 00:00:00 GMT
Author: Pawel Brzoska

Full-text search X times faster: Inverted index vs. SQL OLAP
https://quesma.com/blog/full-text-search-x-times-faster-inverted-indexes-vs-sql-olap/
Discover why SQL OLAP databases can be 10x faster and more cost-effective for full-text search in observability than traditional inverted indexes.
Tue, 30 Jan 2024 00:00:00 GMT
Author: Jacek Migdal

Best tool for the job
https://quesma.com/blog/best-tool-for-the-job/
We look at how 'best tool for the job' thinking can often lead to over-engineering. Learn when a simple solution might be better than the most elegant one.
Thu, 11 Jan 2024 00:00:00 GMT
Author: Pawel Brzoska

Database monolith
https://quesma.com/blog/database-monolith/
Discover strategies to move beyond a database monolith, improving scalability and flexibility for your applications with modern data architectures.
Fri, 10 Nov 2023 00:00:00 GMT
Author: Pawel Brzoska

Challenges of change
https://quesma.com/blog/challenges-of-change/
Explore why database innovation remains challenging, from developer resistance to legacy system inertia, and how AI demands are reshaping data strategies.
Tue, 31 Oct 2023 00:00:00 GMT
Author: Pawel Brzoska