OpenAI Jalapeño chip was announced June 24 as a custom AI processor built for large language model inference in data centers, the companies said.
OpenAI Jalapeño chip, designed for LLM inference
OpenAI said the Jalapeño chip is an intelligence processor purpose built to reduce data movement between compute engines, memory, and network fabric, addressing what the company called the main bottlenecks for LLM inference.
Partners and roles
OpenAI led the architecture and system requirements, Broadcom is responsible for silicon implementation and Tomahawk network technology, and Celestica will provide circuit boards, racks, and systems integration, the joint announcement said.
Architecture focus, not general compute
The design concentrates on inference for large language models, not on general purpose compute. OpenAI said the chip reduces the movement of data during inference, a strategy the company says improves per watt efficiency for LLM workloads.
Broadcom provided details on the network elements, noting the use of its Tomahawk technology to stitch chips into data center fabrics at scale, the company said.
Lab samples, limited public metrics so far
Engineering samples have completed lab runs at target frequency and power, and those tests included workloads based on GPT 5.3 Codex Spark, OpenAI said. The company reported that early per watt performance in internal tests was markedly higher than existing solutions, but it did not publish full technical reports.
OpenAI and Broadcom acknowledged there are no public, directly comparable benchmarks against NVIDIA Blackwell or Google TPU under identical conditions, and they said independent validation will be important to confirm any claimed advantages.
Rapid design cycle, missing public details
OpenAI said the design to production finalization took only nine months, aided in part by internal AI models that accelerated certain design tasks. The company did not disclose process node, HBM memory configuration, die size, actual inference latency figures, or per token cost.
Those omissions have prompted technical community questions about reproducibility and real world operating costs, industry analysts said.
Deployment timeline and what users might see
OpenAI targets first deployments by late 2026, with the platform evolving through multiple generations afterward. If the efficiency gains hold up in production, users could see faster ChatGPT response times, shorter wait for multi step Codex tasks, and potential improvements in API capacity during busy periods, the company said.
Whether OpenAI can use the Jalapeño chip to reduce reliance on NVIDIA will depend on repeatable benchmark results and actual service performance after the systems go live, analysts at technology research firms said.
OpenAI, Broadcom, and Celestica did not respond to requests for additional technical data beyond the joint announcement. Independent testing and published benchmarks remain the decisive evidence customers and cloud operators will expect.



