Qwen3-235B-A22B-Instruct-2507


Key Features

Significant improvements in general capabilities
Substantial gains in long-tail knowledge coverage across multiple languages
Markedly better alignment with user preferences in subjective and open-ended tasks
Enhanced capabilities in 256K long-context understanding
Model type: causal language model
Training stages: pretraining and post-training
Parameters: 235 billion in total, with 22 billion activated
Context length: 262,144 tokens natively

Architecturally, Qwen3-235B-A22B-Instruct-2507 is a causal language model trained with pretraining and post-training stages. It has 94 layers, grouped-query attention with 64 query heads and 4 key/value heads, and 128 experts, of which 8 are activated per token, for 235 billion parameters in total with 22 billion activated. The model natively supports a context length of 262,144 tokens. It operates in non-thinking mode only: it does not generate <think></think> blocks in its output, and specifying enable_thinking=False is no longer required.
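
To make these figures concrete, the snippet below is a minimal sketch of inspecting the model configuration and loading the checkpoint with the Hugging Face transformers library. The repository id and configuration attribute names follow the usual Hub and transformers conventions and are assumptions here; actually loading the 235B-parameter MoE requires a multi-GPU setup with sufficient memory.

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-235B-A22B-Instruct-2507"  # assumed Hub repository id

# Inspect the architecture without downloading the weights.
config = AutoConfig.from_pretrained(model_name)
print(config.num_hidden_layers)        # 94 layers
print(config.num_attention_heads)      # 64 query heads
print(config.num_key_value_heads)      # 4 key/value heads (GQA)
print(config.max_position_embeddings)  # 262,144-token native context

# Load tokenizer and weights; device_map="auto" shards across available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
```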


To achieve optimal performance with Qwen3-235B-A22B-Instruct-2507, the recommended sampling settings are temperature=0.7, top_p=0.8, top_k=20, and min_p=0. The presence_penalty parameter can be set between 0 and 2 to reduce endless repetitions, though higher values may slightly degrade output quality. An output length of 16,384 tokens is sufficient for most queries. When benchmarking, it is suggested to use prompts that standardize model outputs, for example fixed prompt structures for math problems and multiple-choice questions.
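
As an illustration, the following is a minimal sketch of a chat-style generation call using those settings, assuming the model and tokenizer loaded in the previous snippet. The prompt text is a placeholder; presence_penalty is omitted because transformers' generate() does not expose that parameter directly (it is available in serving engines such as vLLM).

```python
# Build a chat prompt using the model's chat template.
messages = [
    {"role": "user", "content": "Give me a short introduction to large language models."}
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# Generate with the recommended sampling settings and output budget.
outputs = model.generate(
    **inputs,
    max_new_tokens=16384,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    min_p=0.0,
)
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(response)
```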
