Basic Performance Metrics

BeGin provides an evaluator that computes basic metrics (specifically, accuracy, AUROC, and HITS@K). After each task is processed, the evaluator compares the ground-truth and predicted answers for the queries in Q provided by the loader. Users can extend the base evaluator to support additional basic metrics.

BaseEvaluator

class BaseEvaluator(num_tasks, task_ids)

Base class for performance evaluation. Users can create their own evaluators by extending this class.

Parameters:

num_tasks (int) – The number of tasks in the target scenario.
task_ids (torch.Tensor) – The task ID of each instance.

simple_eval(prediction, answer)

Compute the performance on the given batch while ignoring the task configuration. During training, this method is called by get_simple_eval_result, which is implemented in ScenarioLoaders.

Parameters:

prediction (torch.Tensor) – The predicted output of the current model.
answer (torch.Tensor) – The ground-truth answer.
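
As a rough illustration of extending the base class with an additional metric, the sketch below defines a custom evaluator that overrides simple_eval. It does not import BeGin: BaseEvaluatorSketch is a hypothetical stand-in that only mirrors the documented constructor and method signature, MeanAbsoluteErrorEvaluator is an invented example metric, and plain Python lists stand in for torch.Tensor so that the code runs without dependencies.

```python
class BaseEvaluatorSketch:
    """Hypothetical stand-in for BaseEvaluator (not the BeGin implementation)."""

    def __init__(self, num_tasks, task_ids):
        self.num_tasks = num_tasks  # number of tasks in the target scenario
        self.task_ids = task_ids    # task ID of each instance

    def simple_eval(self, prediction, answer):
        # Subclasses are expected to override this method.
        raise NotImplementedError


class MeanAbsoluteErrorEvaluator(BaseEvaluatorSketch):
    """Hypothetical custom evaluator that returns the mean absolute error."""

    def simple_eval(self, prediction, answer):
        if len(prediction) != len(answer):
            raise ValueError("prediction and answer must have the same length")
        if not prediction:
            return 0.0
        total = sum(abs(p - a) for p, a in zip(prediction, answer))
        return total / len(prediction)


# Lists stand in for torch.Tensor in this sketch.
evaluator = MeanAbsoluteErrorEvaluator(num_tasks=2, task_ids=[0, 0, 1, 1])
mae = evaluator.simple_eval([0.5, 1.0, 2.0, 4.0], [1.0, 1.0, 1.0, 2.0])
```

In an actual BeGin setup, the subclass would presumably inherit from BaseEvaluator itself and operate on tensors; the override pattern is the part this sketch is meant to show.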

Accuracy

class AccuracyEvaluator(num_tasks, task_ids)

The evaluator for computing accuracy.

Bases: BaseEvaluator

simple_eval(prediction, answer)

Compute the performance on the given batch while ignoring the task configuration. During training, this method is called by get_simple_eval_result, which is implemented in ScenarioLoaders.

Parameters:

prediction (torch.Tensor) – The predicted output of the current model.
answer (torch.Tensor) – The ground-truth answer.
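
To make the metric concrete, here is a minimal sketch of batch accuracy, ignoring task boundaries. It assumes that prediction already holds predicted class labels (not logits), which the text above does not specify; the function name accuracy is hypothetical, and lists stand in for torch.Tensor.

```python
def accuracy(prediction, answer):
    """Fraction of instances whose predicted label equals the ground truth.

    Assumption: `prediction` contains class labels, not raw scores.
    """
    if len(prediction) != len(answer):
        raise ValueError("prediction and answer must have the same length")
    if not answer:
        return 0.0
    correct = sum(1 for p, a in zip(prediction, answer) if p == a)
    return correct / len(answer)


# Three of the four predicted labels match the ground truth.
acc = accuracy([0, 2, 1, 1], [0, 1, 1, 1])
```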

ROCAUC

class ROCAUCEvaluator(num_tasks, task_ids)

The evaluator for computing the ROC AUC score.

Bases: BaseEvaluator

simple_eval(prediction, answer)

Compute the performance on the given batch while ignoring the task configuration. During training, this method is called by get_simple_eval_result, which is implemented in ScenarioLoaders.

Parameters:

prediction (torch.Tensor) – The predicted output of the current model.
answer (torch.Tensor) – The ground-truth answer.
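
For readers unfamiliar with the metric, the following sketch computes a binary ROC AUC through its pairwise interpretation: the fraction of (positive, negative) pairs in which the positive instance receives the higher score, with ties counted as half. This is one standard way to compute the score and is not necessarily how ROCAUCEvaluator implements it. The function name roc_auc and the binary 0/1 label encoding are assumptions, and lists stand in for torch.Tensor.

```python
def roc_auc(scores, labels):
    """Binary ROC AUC via pairwise comparison of positive and negative scores.

    Assumption: `labels` uses 1 for positive and 0 for negative instances.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Positives score 0.9 and 0.3; negatives score 0.8 and 0.1.
# Three of the four positive-negative pairs are ordered correctly.
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The quadratic loop is kept for clarity; a rank-based formulation gives the same value more efficiently on large batches.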

HITS@K

class HitsEvaluator(num_tasks, k)

The evaluator for computing Hits@K. This class takes k, rather than task_ids, as its second parameter.

Bases: BaseEvaluator

simple_eval(prediction, answer)

Compute the performance on the given batch while ignoring the task configuration. During training, this method is called by get_simple_eval_result, which is implemented in ScenarioLoaders.

Parameters:

prediction (torch.Tensor) – The predicted output of the current model.
answer (torch.Tensor) – The ground-truth answer.
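
The text above does not spell out how Hits@K is computed, so the sketch below follows one common definition from link-prediction benchmarks: the fraction of positive instances scored strictly higher than the K-th highest-scoring negative instance. Whether HitsEvaluator uses exactly this definition is an assumption. The function name hits_at_k and the 0/1 label encoding are hypothetical, and lists stand in for torch.Tensor.

```python
def hits_at_k(scores, labels, k):
    """Fraction of positives scored above the k-th highest negative score.

    Assumptions: `labels` uses 1 for positive and 0 for negative instances,
    and too few negatives (fewer than k) counts as a perfect score.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    if not pos:
        raise ValueError("Hits@K needs at least one positive instance")
    if len(neg) < k:
        return 1.0
    threshold = neg[k - 1]  # k-th highest negative score
    return sum(1 for s in pos if s > threshold) / len(pos)


# Negatives sorted by score: 0.8, 0.5, 0.1, so the threshold for k=2 is 0.5.
# Positives 0.9 and 0.6 exceed it; 0.2 does not.
hits = hits_at_k([0.9, 0.6, 0.2, 0.8, 0.5, 0.1], [1, 1, 1, 0, 0, 0], k=2)
```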