Evaluation
You can run various evaluations simply by adding a few arguments.
Common Arguments
These arguments are common across all benchmarks. Note that you must select at least one benchmark from the list provided in this section.
| args | default | description |
| --- | --- | --- |
| ckpt_path | upstage/SOLAR-10.7B-Instruct-v1.0 | Model name or checkpoint path of the model to be evaluated |
| output_path | ./results | Path where evaluation results are saved |
| model_name | SOLAR-10.7B-Instruct-v1.0 | Model name used when saving evaluation results |
| use_fast_tokenizer | False | Flag to use the fast tokenizer |
| use_flash_attention_2 | False | Flag to use FlashAttention 2 (highly recommended) |
Example
By running the code below, the h6_en, mt_bench, ifeval, and eq_bench benchmarks will be executed.
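A minimal sketch of such an invocation, assuming the evaluation entry point is a script named evaluator.py (the script name is an assumption; the common arguments and benchmark flags are the ones documented on this page):

```bash
# Sketch only: evaluator.py is an assumed entry-point name.
# The common arguments and benchmark flags below come from the tables on this page;
# benchmark flags are assumed to be boolean switches that enable each benchmark.
python evaluator.py \
  --ckpt_path upstage/SOLAR-10.7B-Instruct-v1.0 \
  --output_path ./results \
  --model_name SOLAR-10.7B-Instruct-v1.0 \
  --use_flash_attention_2 \
  --h6_en \
  --mt_bench \
  --ifeval \
  --eq_bench
```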
Arguments for Each Benchmark