Basic Performance Metrics
After each task is processed, the evaluator in BeGin computes basic metrics (specifically accuracy, AUROC, and HITS@K) from the ground-truth and predicted answers for the queries in Q provided by the loader.
Users can extend the basic evaluator to support additional metrics.
BaseEvaluator
- class BaseEvaluator(num_tasks, task_ids)
Base class for evaluating performance. Users can create their own evaluator by extending this class.
- Parameters:
num_tasks (int) – The number of tasks in the target scenario.
task_ids (torch.Tensor) – The task ID of each instance.
- simple_eval(prediction, answer)
Computes performance on the given batch, ignoring the task configuration. During training, this function is called by get_simple_eval_result, which is implemented in ScenarioLoaders.
- Parameters:
prediction (torch.Tensor) – predicted output of the current model
answer (torch.Tensor) – ground-truth answer
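As a rough illustration of extending the base class with an additional metric, the sketch below adds a balanced-accuracy evaluator. It is not BeGin code: BaseEvaluatorStandIn is a hypothetical stand-in that only mirrors the documented constructor and simple_eval signature, BalancedAccuracyEvaluator is an invented name, and plain Python lists are used in place of torch.Tensor so the example runs without dependencies.

```python
class BaseEvaluatorStandIn:
    """Hypothetical stand-in mirroring the documented BaseEvaluator interface."""

    def __init__(self, num_tasks, task_ids):
        self.num_tasks = num_tasks  # number of tasks in the scenario
        self.task_ids = task_ids    # task ID of each instance

    def simple_eval(self, prediction, answer):
        raise NotImplementedError


class BalancedAccuracyEvaluator(BaseEvaluatorStandIn):
    """Illustrative custom metric: the mean of per-class recall."""

    def simple_eval(self, prediction, answer):
        correct, total = {}, {}
        for p, a in zip(prediction, answer):
            total[a] = total.get(a, 0) + 1
            correct[a] = correct.get(a, 0) + (1 if p == a else 0)
        # Average the recall of every class that appears in the answers.
        recalls = [correct[c] / total[c] for c in total]
        return sum(recalls) / len(recalls)


# Assumption: predictions are already class labels, not logits.
evaluator = BalancedAccuracyEvaluator(num_tasks=2, task_ids=[0, 0, 1, 1])
score = evaluator.simple_eval([0, 1, 1, 1], [0, 0, 1, 1])
print(score)
```

With the real framework, the subclass would presumably derive from BaseEvaluator itself and operate on tensors; only the overriding of simple_eval is the point here.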
Accuracy
- class AccuracyEvaluator(num_tasks, task_ids)
The evaluator for computing accuracy.
Bases: BaseEvaluator
- simple_eval(prediction, answer)
Computes performance on the given batch, ignoring the task configuration. During training, this function is called by get_simple_eval_result, which is implemented in ScenarioLoaders.
- Parameters:
prediction (torch.Tensor) – predicted output of the current model
answer (torch.Tensor) – ground-truth answer
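For reference, accuracy over a batch is the fraction of predictions that match the ground truth. The following is a minimal, framework-free sketch of that computation, not the AccuracyEvaluator implementation; the function name accuracy is hypothetical, and it assumes the predictions are already class labels rather than logits.

```python
def accuracy(prediction, answer):
    """Fraction of positions where the predicted label equals the true label."""
    if len(prediction) != len(answer):
        raise ValueError("prediction and answer must have the same length")
    if not answer:
        raise ValueError("cannot compute accuracy on an empty batch")
    matches = sum(1 for p, a in zip(prediction, answer) if p == a)
    return matches / len(answer)


# Three of the four predicted labels match the ground truth.
acc = accuracy([2, 0, 1, 1], [2, 0, 0, 1])
print(acc)
```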
ROCAUC
- class ROCAUCEvaluator(num_tasks, task_ids)
The evaluator for computing the ROCAUC score.
Bases: BaseEvaluator
- simple_eval(prediction, answer)
Computes performance on the given batch, ignoring the task configuration. During training, this function is called by get_simple_eval_result, which is implemented in ScenarioLoaders.
- Parameters:
prediction (torch.Tensor) – predicted output of the current model
answer (torch.Tensor) – ground-truth answer
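To make the metric concrete, the ROCAUC score for binary labels equals the probability that a randomly chosen positive instance is scored higher than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It is an illustration under the assumption of binary 0/1 labels and real-valued scores, not the ROCAUCEvaluator implementation; roc_auc is a hypothetical name, and the quadratic loop is chosen for clarity rather than speed.

```python
def roc_auc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC is undefined unless both classes are present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Positives score 0.9 and 0.3; negatives score 0.8 and 0.1.
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(auc)
```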
HITS@K
- class HitsEvaluator(num_tasks, k)
The evaluator for computing Hits@K. This class takes K, instead of task_ids, as its second parameter.
Bases: BaseEvaluator
- simple_eval(prediction, answer)
Computes performance on the given batch, ignoring the task configuration. During training, this function is called by get_simple_eval_result, which is implemented in ScenarioLoaders.
- Parameters:
prediction (torch.Tensor) – predicted output of the current model
answer (torch.Tensor) – ground-truth answer
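One common definition of Hits@K in link prediction is the fraction of positive instances whose score exceeds the K-th highest score among the negative instances. The sketch below follows that definition. It is an assumption about the metric, not a description of how HitsEvaluator is implemented; hits_at_k is a hypothetical name, and it assumes answer marks positives with 1 and negatives with 0.

```python
def hits_at_k(prediction, answer, k):
    """Fraction of positives scored above the k-th best negative."""
    if k < 1:
        raise ValueError("k must be at least 1")
    pos = [s for s, y in zip(prediction, answer) if y == 1]
    neg = [s for s, y in zip(prediction, answer) if y == 0]
    if not pos:
        raise ValueError("Hits@K needs at least one positive instance")
    if len(neg) < k:
        # Convention assumed here: with fewer than k negatives, every positive is a hit.
        return 1.0
    threshold = sorted(neg, reverse=True)[k - 1]
    return sum(1 for s in pos if s > threshold) / len(pos)


scores = [0.9, 0.8, 0.6, 0.5, 0.4, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 1, 0]
# The 2nd-highest negative score is 0.5, so positives 0.9 and 0.6 count as hits.
hits = hits_at_k(scores, labels, k=2)
print(hits)
```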