How to Run Local LLMs with Claude Code
Guide to using open models with Claude Code on your local device. This step-by-step guide shows you how to connect open LLMs and APIs to Claude Code entirely locally, complete with screenshots. Run any open model such as Qwen3.5, DeepSeek, or Gemma. For this tutorial, we'll use Qwen3.5 and GLM-4.7-Flash. Both are among the strongest 35B MoE agentic & coding models available.
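As a minimal sketch of the wiring involved: assuming your local inference server exposes an Anthropic-compatible endpoint (for example via a local proxy in front of llama.cpp or vLLM), and that Claude Code honors the `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, and `ANTHROPIC_MODEL` environment variables as documented for gateway setups, pointing the CLI at a local model looks roughly like this. The URL and model name are placeholders, not values from the guide.

```python
import os
import subprocess

# Assumed local endpoint: an Anthropic-compatible server or proxy running
# in front of a local inference engine serving an open model.
LOCAL_BASE_URL = "http://localhost:8080"  # placeholder, match your server
LOCAL_MODEL = "qwen-local"                # placeholder model identifier

env = os.environ.copy()
env["ANTHROPIC_BASE_URL"] = LOCAL_BASE_URL    # route Claude Code to the local server
env["ANTHROPIC_AUTH_TOKEN"] = "local-dummy"   # local servers typically ignore the token
env["ANTHROPIC_MODEL"] = LOCAL_MODEL          # override the default model name

# Launch Claude Code in the current project with the overridden environment.
subprocess.run(["claude"], env=env, check=True)
```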
LLM Inference Handbook is your technical glossary, guidebook, and reference - all in one. It covers everything you need to know about LLM inference, from core concepts and performance metrics (e.g., Time to First Token and Tokens per Second) to optimization techniques (e.g., continuous batching and prefix caching) and deployment patterns like BYOC and on-prem. Practical guidance for deploying and scaling LLM inference.
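To make the two headline metrics concrete, here is a small, self-contained sketch of how Time to First Token and Tokens per Second are commonly computed from timestamps collected around a streamed response. The `StreamTiming` container and the decode-only throughput definition are illustrative assumptions, not the handbook's own code.

```python
from dataclasses import dataclass

@dataclass
class StreamTiming:
    """Timestamps (seconds) and token count collected from one streamed response."""
    request_sent: float   # when the request was issued
    first_token: float    # when the first output token arrived
    last_token: float     # when the final output token arrived
    output_tokens: int    # number of tokens generated

def time_to_first_token(t: StreamTiming) -> float:
    # TTFT: latency before the user sees anything, dominated by prefill.
    return t.first_token - t.request_sent

def tokens_per_second(t: StreamTiming) -> float:
    # Decode throughput: generated tokens divided by generation (decode) time.
    decode_time = t.last_token - t.first_token
    return t.output_tokens / decode_time if decode_time > 0 else float("inf")

# Example: 0.42 s TTFT, 256 tokens over ~5.1 s of decoding ≈ 50 tok/s.
timing = StreamTiming(request_sent=0.0, first_token=0.42, last_token=5.52, output_tokens=256)
print(f"TTFT: {time_to_first_token(timing):.2f} s")
print(f"Throughput: {tokens_per_second(timing):.1f} tok/s")
```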