Quickstart
Quickstart for running evaluations and generating a report with the Library or the CLI.
Evaluation
Evaluation with Library
The following code is a simple example of evaluating the SOLAR-10.7B-Instruct-v1.0 model from Upstage on h6_en (Open LLM Leaderboard).
import evalverse as ev
evaluator = ev.Evaluator()
model = "upstage/SOLAR-10.7B-Instruct-v1.0"
benchmark = "h6_en"
evaluator.run(model=model, benchmark=benchmark)
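If you want to cover several benchmarks in one script, a simple option is to loop over the single-benchmark call shown above. This is only a sketch: it reuses the documented call per benchmark and does not assume that run() accepts a list. The benchmark names are the same ones used in the report example further down.
import evalverse as ev

evaluator = ev.Evaluator()
model = "upstage/SOLAR-10.7B-Instruct-v1.0"

# run each benchmark in turn with the documented single-benchmark call
for benchmark in ["h6_en", "mt_bench", "ifeval", "eq_bench"]:
    evaluator.run(model=model, benchmark=benchmark)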
Evaluation with CLI
Here is a script that produces the same result as the above code:
cd evalverse
python3 evaluator.py \
--h6_en \
--ckpt_path upstage/SOLAR-10.7B-Instruct-v1.0
Report
Currently, generating a report is available only through the Library. We will provide a Command Line Interface (CLI) version as soon as possible.
import evalverse as ev
db_path = "./db"
output_path = "./results"
reporter = ev.Reporter(db_path=db_path, output_path=output_path)
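# update the stored results database before generating the report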
reporter.update_db(save=True)
model_list = ["SOLAR-10.7B-Instruct-v1.0"]
benchmark_list = ["h6_en", "mt_bench", "ifeval", "eq_bench"]
reporter.run(model_list=model_list, benchmark_list=benchmark_list)
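Once the report has been generated, you can inspect whatever the Reporter wrote under output_path. The exact file layout is not specified here, so the snippet below simply scans ./results for CSV files and previews the first one it finds; treat it as an exploratory sketch rather than a documented interface.
from pathlib import Path

import pandas as pd

# look for any CSV files the Reporter produced under output_path
csv_files = sorted(Path("./results").rglob("*.csv"))
if csv_files:
    print(f"Found {len(csv_files)} CSV file(s); previewing {csv_files[0]}")
    print(pd.read_csv(csv_files[0]).head())
else:
    print("No CSV files found under ./results yet.")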