=================================== Load Custom Scenarios =================================== BeGin supports loading custom scenarios with user-defined dataset for users who want to create custom benchmark scenarios. In this material, we briefly describe how to load custom benchmark scenarios with examples. ------------------------------------------------- Create custom dataset and its loader ------------------------------------------------- Since v0.3, ScenarioLoader inputs additional argument `dataset_load_func` for loading custom scenarios. Below, we provide an example for loading custom dataset with NCScenarioLoader. .. code-block:: python >>> from begin.scenarios.nodes import NCScenarioLoader >>> scenario = NCScenarioLoader(dataset_name='custom_dataset_name', dataset_load_func=name_of_custom_function, ...) Currently, BeGin requires different outputs of the loader function, depending on the target problem: - Node Classification (NC): The loader function should output a dictionary with keys `graph`, `num_feats`, and `num_classes`. + `graph` (`dgl.DGLGraph`) : It should contain node features in `graph.ndata['feat']` and ground-truth labels in `graph.ndata['label']. For Time-IL, time information for constructing tasks in `graph.ndata['time']` is additionally needed. For Domain-IL, domain information for constructing tasks in `graph.ndata['domain']`. The nodes with values in `graph.ndata['time']` or `graph.ndata['domain']` greater than or equal to `num_tasks` will be ignored during the training and evaluation process. + `num_feats` (`int`) : Number of node features. `graph.ndata['feat']` should be matched with this value. + `num_tasks` (`int`) : Number of tasks for constructing benchmark scenario. - Link Classification (LC): The loader function should output a dictionary with keys `graph`, `num_feats`, and `num_classes`. + `graph` (`dgl.DGLGraph`) : It should contain node features in `graph.ndata['feat']` and ground-truth labels in `graph.edata['label']. For Time-IL, time information for constructing tasks in `graph.edata['time']` is additionally needed. For Domain-IL, domain information for constructing tasks in `graph.edata['domain']`. The nodes with values in `graph.edata['time']` or `graph.edata['domain']` greater than or equal to `num_tasks` will be ignored during the training and evaluation process. + `num_feats` (`int`) : Number of node features. `graph.ndata['feat']` should be matched with this value. + `num_tasks` (`int`) : Number of tasks for constructing benchmark scenario. - Link Prediction (LP): The loader function should output a dictionary with keys `graph`, `num_feats`, `tvt_splits`, and `neg_edges`. + `graph` (`dgl.DGLGraph`) : It should contain node features in `graph.ndata['feat']`. For Time-IL, time information for constructing tasks in `graph.edata['time']` is additionally needed. For Domain-IL, domain information for constructing tasks in `graph.edata['domain']`. The nodes with values in `graph.edata['time']` or `graph.edata['domain']` greater than or equal to `num_tasks` will be ignored during the training and evaluation process. Currently, BeGin only supports undirected graphs for this problem. Therefore, edges in `graph` should satisfy ``graph.edges()[0][0::2] == graph.edges()[1][1::2]`` and ``graph.edges()[0][1::2] == graph.edges()[1][0::2]``. + `num_feats` (`int`) : Number of node features. `graph.ndata['feat']` should be matched with this value. + `tvt_splits` (`torch.LongTensor`) : The information for train/val/test splits. In this tensor, the value `0`, `1`, and `2` indicates the corresponding edge should be used for train, validation, and test, respectively. Its shape should be `(graph.num_edges,)`. + `neg_edges` (`dict`) : The dictionary contains negative edges. It should contain keys `val` and `test`. the corresponding value of the key `val` is used as negative edges for validation, and that of `test` is used as negative edges for test. The types of the values and shapes of the values should be `torch.LongTensor`, and `(*, 2)`, respectively. - Graph Classification (GC): The loader function should output a dictionary with keys `graphs`, `num_feats`, and `num_classes`, `domain_info` (optional, for Domain-IL) and `time_info` (optional, for Time-IL). + `graphs` (`Iterable[dgl.DGLGraph, int]`) : It should be the iterable object, which outputs graph object with type `dgl.DGLGraph` and its corresponding label for each iteration. Each graph should contain node features in `graph.ndata['feat']`. + `num_feats` (`int`) : Number of node features. `graph.ndata['feat']` should be matched with this value. + `num_tasks` (`int`) : Number of tasks for constructing benchmark scenario. + `domain_info` (`torch.LongTensor`) : For domain-IL, it should contain domain information for constructing the tasks. Its shape should be `(len(graphs),)`. The graphs with values in `domain_info` greater than or equal to `num_tasks` will be ignored during the training and evaluation process. + `time_info` (`torch.LongTensor`) : For time-IL, it should contain time information for constructing the tasks. Its shape should be `(len(graphs),)`. The graphs with values in `time_info` greater than or equal to `num_tasks` will be ignored during the training and evaluation process. ------------------------------------------------- Example with Cora dataset ------------------------------------------------- For example, consider we need to load Cora dataset. One of the easiest way to write the function is to extract graphs and from `dgl.data.CoraGraphDataset`. .. code-block:: python def dataset_load_func(save_path): dataset = dgl.data.CoraGraphDataset(raw_dir=save_path, verbose=False) Then, we can extract `graph`, `num_classes`, and `num_tasks` in the `dataset` object. .. code-block:: python def dataset_load_func(save_path): dataset = dgl.data.CoraGraphDataset(raw_dir=save_path, verbose=False) graph = dataset._g num_feats, num_classes = graph.ndata['feat'].shape[-1], dataset.num_classes Now, All you need is just adding one line to return the dictionary contains `graph`, `num_feats`, and `num_classes`! .. code-block:: python def dataset_load_func(save_path): dataset = dgl.data.CoraGraphDataset(raw_dir=save_path, verbose=False) graph = dataset._g num_feats, num_classes = graph.ndata['feat'].shape[-1], dataset.num_classes return {'graph': graph, 'num_classes': num_classes, 'num_feats': num_feats}