Zhonghao Xinying Unveils Next-Generation TPU Chip "Xuyu": Single Chip Achieves 896 TFLOPS Computing Power
2026-07-02
Zhonghao Xinying has officially launched "Xuyu", its new-generation, self-developed high-performance TPU chip dedicated to AI computing, and has simultaneously released Taize 2.0, an integrated software and hardware AI high-performance intelligent computing platform built on this chip.
This upgrade is a comprehensive iteration on the first-generation "Instant" chip and the first-generation Taize server, delivering major advances in underlying architecture, peak computing power, on-chip storage, cluster interconnection, and energy efficiency.
As one of the earliest companies in China to focus on the research and development of TPU-architecture AI chips, Zhonghao Xinying taped out and mass-produced "Instant", China's first high-performance TPU chip, in 2023, and has since accumulated three years of experience in large-scale deployment.
Building on this, the new-generation "Xuyu" chip introduces architectural changes that target pain points such as memory access latency, high energy consumption, and insufficient parallel efficiency in scenarios involving ultra-large models, long contexts, and massive token interactions.
The key performance indicators are as follows:
Single-chip mixed-precision floating-point computing power reaches 896 TFLOPS, three times that of the previous-generation "Instant"; 8-bit inference computing power reaches 1792 TOPS, suited to high-concurrency inference workloads.
Memory capacity and inter-chip interconnect speed have been significantly improved, supporting ultra-long contexts and reducing data transfer overhead in multi-turn conversations.
The rated power consumption of a single chip is 600 W, which is 50% lower than that of traditional chips in the same computing-power class and is better suited to building green, low-carbon intelligent computing centers.
By optimizing the multidimensional tensor computing units and data reuse, the chip alleviates the memory-wall bottleneck, and its overall computing efficiency can reach several times that of traditional GPUs on the same AI tasks. It has significant advantages in large-model training and batch token generation scenarios.
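The per-chip figures above can be related to each other with simple arithmetic. The following is a minimal sketch that uses only the numbers quoted in this announcement; the variable and function names are illustrative, and the results are peak (theoretical) ratios derived from rated values, not measured benchmarks.

```python
# Figures quoted in the announcement (peak / rated values, not measurements).
MIXED_PRECISION_TFLOPS = 896   # single-chip mixed-precision compute
INT8_TOPS = 1792               # single-chip 8-bit inference compute
RATED_POWER_W = 600            # single-chip rated power


def peak_tflops_per_watt(tflops: float, watts: float) -> float:
    """Peak compute per watt of rated power (illustrative helper)."""
    return tflops / watts


# Peak mixed-precision efficiency implied by the rated figures.
efficiency = peak_tflops_per_watt(MIXED_PRECISION_TFLOPS, RATED_POWER_W)

# Ratio of 8-bit throughput to mixed-precision throughput.
int8_ratio = INT8_TOPS / MIXED_PRECISION_TFLOPS

print(f"Peak efficiency: {efficiency:.3f} TFLOPS/W")
print(f"INT8 / mixed-precision ratio: {int8_ratio:.1f}x")
```

Under these figures the chip's peak works out to roughly 1.49 TFLOPS per watt, and the 8-bit figure is exactly twice the mixed-precision figure.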
"Xuyu" continues the company's fully self-developed TPU technology route, covering IP cores, the instruction set, operator libraries, and the overall system software, without relying on overseas core technologies. The company has full-chain capabilities spanning chip design, circuit development, compilation tools, and model adaptation, which allow it to adapt and deploy new models quickly and to meet the information security compliance requirements of industries such as government affairs, finance, and power grids.
Platform "Taize 2.0": 7.168 PFLOPS per machine, supports thousand-card clusters
The Taize 2.0 platform, launched at the same time, uses a minimum computing unit (single node) composed of two high-performance CPUs and eight "Xuyu" TPUs, with a mixed-precision computing power of 7.168 PFLOPS. On the same task, the energy consumption of the whole machine is only 80% of that of traditional GPU servers.
At the cluster level, Taize 2.0 uses a self-developed low-latency, highly parallel inter-chip communication protocol. A single supernode can directly connect up to 2048 "Xuyu" chips and can carry heavy workloads such as distributed training of trillion-parameter large models, multi-agent collaborative computing, and concurrent inference over massive numbers of tokens. The platform also provides a complete visual operation and maintenance management system that integrates BMC hardware monitoring, fault warning, computing power billing, user permissions, and a model marketplace, so that it can be used out of the box.
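The node and supernode figures above follow from multiplying the single-chip figure by the chip count. The sketch below reproduces that arithmetic; the names are illustrative, the results are theoretical peaks that ignore interconnect and scheduling overhead, and the node count per supernode assumes a supernode is assembled from the eight-chip nodes described above, which the announcement does not state explicitly.

```python
# Figures quoted in the announcement.
CHIP_TFLOPS = 896                 # single-chip mixed-precision peak
CHIPS_PER_NODE = 8                # "Xuyu" TPUs in one Taize 2.0 node
MAX_CHIPS_PER_SUPERNODE = 2048    # maximum directly connected chips


def aggregate_pflops(chips: int, tflops_per_chip: float = CHIP_TFLOPS) -> float:
    """Theoretical peak in PFLOPS for a given chip count (1 PFLOPS = 1000 TFLOPS)."""
    return chips * tflops_per_chip / 1000


node_pflops = aggregate_pflops(CHIPS_PER_NODE)
supernode_pflops = aggregate_pflops(MAX_CHIPS_PER_SUPERNODE)

# Assumption: a supernode is built entirely from eight-chip nodes.
nodes_per_supernode = MAX_CHIPS_PER_SUPERNODE // CHIPS_PER_NODE

print(f"Node peak: {node_pflops:.3f} PFLOPS")
print(f"Supernode peak: {supernode_pflops:.3f} PFLOPS")
print(f"Nodes per supernode (assumed layout): {nodes_per_supernode}")
```

Eight chips at 896 TFLOPS give the quoted 7.168 PFLOPS per node, and a full 2048-chip supernode would have a theoretical peak of about 1835 PFLOPS (roughly 1.8 EFLOPS) at the same precision.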
In terms of software ecosystem, Taize 2.0 is compatible with mainstream AI frameworks such as PyTorch, vLLM, and SGLang, as well as distributed training suites such as DeepSpeed and Megatron-LM. The company has completed deep adaptation of dozens of large models, including Qwen, DeepSeek, GLM, and MiniMax. Developers can migrate models quickly without large-scale code modifications, which greatly lowers the barrier to switching to domestic computing power.
Closely aligned with the needs of the token economy and AI agents, enabling adoption across industries
In 2026, the AI industry is entering the stage in which the token economy takes hold, shifting from the traditional model of renting computing power by the hour to a MaaS service model priced per token. "Xuyu" and Taize 2.0 are specifically optimized at the hardware level for token generation, context caching, and batch inference, reducing the cost of token inference and helping AI service providers build independent and controllable pay-as-you-go systems.
The product is deeply adapted to the open-source AI agent framework OpenClaw and supports local private deployment, so that interaction tokens and business data do not leave the enterprise's own domain, balancing automated execution by digital employees with enterprise privacy and security. It can be applied to scenarios such as report automation, IT operations, data analysis, and personal intelligent assistance.
At the commercialization level, the cost of building computing power with Taize 2.0 is only 60% of that of high-end overseas products. Its low power consumption helps reduce electricity bills and carbon emissions, in line with the policy direction of low-carbon computing power parks in various regions.
Large-scale deployment accelerates, continuous iteration drives the future
At present, the first-generation "Instant" chips have been delivered at scale across multiple industries and have been deployed in ultra-large-scale intelligent computing centers built by telecom operators, government agencies, and technology enterprises, such as Shenzhen Unicom, Tianjin Mobile, Taiji Corporation, and Jiangxi Shangrao. They are also widely used in university research platforms and teaching environments, covering fields such as finance, media, education, and healthcare.
In the future, Zhonghao Xinying will build on its existing customer base to keep optimizing the computing power, energy efficiency, and storage architecture of its TPU chips and to adapt to larger models and multi-agent clusters. At the same time, the company will collaborate with mainstream model vendors, cloud service providers, and system integrators to deepen software and hardware co-design and further expand the footprint of the independent and controllable computing power industry chain.