Quickstart

Quickstart for running evaluations and generating a report with the library or CLI

Evaluation

Evaluation with Library

The following code is a simple example that evaluates the SOLAR-10.7B-Instruct-v1.0 model from upstage on h6_en (Open LLM Leaderboard).

import evalverse as ev

evaluator = ev.Evaluator()

model = "upstage/SOLAR-10.7B-Instruct-v1.0"
benchmark = "h6_en"

evaluator.run(model=model, benchmark=benchmark)
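To evaluate on a different benchmark, only the benchmark argument changes. The sketch below reuses the same run call with mt_bench, one of the benchmark names used in the report example further down; treating it as a valid value for the evaluator (and any extra setup that benchmark may require) is an assumption here.

import evalverse as ev

evaluator = ev.Evaluator()

# Same call as above; only the benchmark name differs (mt_bench assumed valid for the evaluator)
evaluator.run(model="upstage/SOLAR-10.7B-Instruct-v1.0", benchmark="mt_bench")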

Evaluation with CLI

Here is a script that produces the same result as the above code:

cd evalverse

python3 evaluator.py \
  --h6_en \
  --ckpt_path upstage/SOLAR-10.7B-Instruct-v1.0
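If the checkpoint lives on local disk rather than on the Hugging Face Hub, the same flag can point at a path instead of a model id; the path below is a placeholder, and local-path support for --ckpt_path is an assumption.

cd evalverse

python3 evaluator.py \
  --h6_en \
  --ckpt_path /path/to/local/checkpoint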



Report

Currently, generating a report is only available through the library. We will work on a Command Line Interface (CLI) version as soon as possible.

import evalverse as ev

db_path = "./db"
output_path = "./results"
reporter = ev.Reporter(db_path=db_path, output_path=output_path)

# Update the results database (save=True persists it)
reporter.update_db(save=True)

model_list = ["SOLAR-10.7B-Instruct-v1.0"]
benchmark_list = ["h6_en", "mt_bench", "ifeval", "eq_bench"]
# Generate a report for the selected models and benchmarks
reporter.run(model_list=model_list, benchmark_list=benchmark_list)
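Once reporter.run finishes, the generated report files are written under output_path. Their exact names and formats come from the library, so the snippet below simply lists whatever was produced (plain Python, not an evalverse API).

import os

output_path = "./results"

# Walk the output directory and print every file the reporter wrote
for root, _, files in os.walk(output_path):
    for name in files:
        print(os.path.join(root, name))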
