# Tutorial for Evaluating Intern-S1
OpenCompass now provides the configs needed to evaluate Intern-S1. Follow the steps below to start the evaluation.
## Model Download and Deployment
Intern-S1 has now been open-sourced and can be downloaded from Hugging Face. After downloading the model, we recommend deploying it as an API service for evaluation calls. You can deploy it with LMDeploy, vLLM, or SGLang according to this page.
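As a sketch, serving an OpenAI-compatible API with LMDeploy or vLLM might look like the following; the port and tensor-parallelism values are illustrative and should be adjusted to your hardware:

```shell
# Option 1: LMDeploy API server (--tp sets tensor parallelism)
lmdeploy serve api_server internlm/Intern-S1 --server-port 23333 --tp 8

# Option 2: vLLM OpenAI-compatible server
vllm serve internlm/Intern-S1 --port 23333 --tensor-parallel-size 8 --trust-remote-code
```

The resulting endpoint (e.g. `http://127.0.0.1:23333/v1`) is what you later fill into `openai_api_base` in the model config.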
## Evaluation Configs
### Model Configs
We provide an example model config in `opencompass/configs/models/interns1/intern_s1.py`. Please adjust it according to your needs.
```python
models = [
    dict(
        abbr="intern-s1",
        key="YOUR_API_KEY",  # Fill in your API key here
        openai_api_base="YOUR_API_BASE",  # Fill in your API base URL here
        type=OpenAISDK,
        path="internlm/Intern-S1",
        temperature=0.7,
        meta_template=api_meta_template,
        query_per_second=1,
        batch_size=8,
        max_out_len=64000,
        max_seq_len=65536,
        openai_extra_kwargs={
            'top_p': 0.95,
        },
        retry=10,
        extra_body={
            # Controls the thinking mode when the model is deployed with vLLM or SGLang
            "chat_template_kwargs": {"enable_thinking": True}
        },
        # Extract the non-reasoning content when thinking mode is enabled
        pred_postprocessor=dict(type=extract_non_reasoning_content),
    ),
]
```
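When thinking mode is enabled, the model emits its chain of thought before the final answer, and the `pred_postprocessor` keeps only the final answer for scoring. A minimal illustration of the idea (not OpenCompass's actual implementation, and assuming `<think>...</think>` delimiters) is:

```python
import re

def strip_reasoning(prediction: str) -> str:
    """Drop a <think>...</think> reasoning block, keeping the final answer."""
    # Remove the reasoning span (non-greedy, spanning newlines), then trim whitespace.
    return re.sub(r'<think>.*?</think>', '', prediction, flags=re.DOTALL).strip()

print(strip_reasoning('<think>2 + 2 is 4.</think>The answer is 4.'))
# -> The answer is 4.
```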
### Dataset Configs
We provide a config for the datasets used to evaluate Intern-S1 in `examples/eval_bench_intern_s1.py`. You can also add other datasets as needed.
In addition, you need to add the configuration of the LLM judge to this config file, as shown in the following example:
```python
judge_cfg = dict(
    abbr='YOUR_JUDGE_MODEL',
    type=OpenAISDK,
    path='YOUR_JUDGE_MODEL_PATH',
    key='YOUR_API_KEY',
    openai_api_base='YOUR_API_BASE',
    meta_template=dict(
        round=[
            dict(role='HUMAN', api_role='HUMAN'),
            dict(role='BOT', api_role='BOT', generate=True),
        ]),
    query_per_second=1,
    batch_size=1,
    temperature=0.001,
    max_out_len=8192,
    max_seq_len=32768,
    mode='mid',
)
```
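The judge config is then wired into the evaluator of each LLM-judged dataset. As a rough sketch (the traversal below assumes OpenCompass-style dict configs; the variable names are illustrative, so follow the structure actually used in `examples/eval_bench_intern_s1.py`):

```python
# Sketch: attach judge_cfg to every dataset whose evaluator expects an LLM judge.
for ds in datasets:
    evaluator = ds.get('eval_cfg', {}).get('evaluator', {})
    if 'judge_cfg' in evaluator:
        evaluator['judge_cfg'] = judge_cfg
```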
## Start Evaluation
After completing the above configuration, enter the following command to start the evaluation:
```bash
opencompass examples/eval_bench_intern_s1.py
```
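A couple of useful variations of the command, using standard OpenCompass CLI flags:

```shell
# Run in debug mode: tasks execute sequentially with full logs, handy for
# diagnosing API connectivity or config errors on a first run
opencompass examples/eval_bench_intern_s1.py --debug

# Reuse the latest existing results instead of re-running finished tasks
opencompass examples/eval_bench_intern_s1.py -r
```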