logitorch.datasets.mlm.wiki20k_dataset
======================================

.. py:module:: logitorch.datasets.mlm.wiki20k_dataset


Attributes
----------

.. autoapisummary::

   logitorch.datasets.mlm.wiki20k_dataset.WIKI20K_DATASET
   logitorch.datasets.mlm.wiki20k_dataset.WIKI20K_DATASET_FOLDER
   logitorch.datasets.mlm.wiki20k_dataset.WIKI20K_DATASET_ZIP_URL
   logitorch.datasets.mlm.wiki20k_dataset.WIKI20K_SUB_DATASETS


Classes
-------

.. autoapisummary::

   logitorch.datasets.mlm.wiki20k_dataset.Wiki20KDataset


Module Contents
---------------

.. py:class:: Wiki20KDataset(dataset_name: str, size: int = None)

   A class representing the Wiki20K dataset for RuleTaker.

   Attributes:
       dataset_name (str): The name of the dataset.
       dataset_path (str): The path to the dataset file.
       sentences (List[str]): The list of sentences in the dataset.
       labels (List[str]): The list of labels in the dataset.

   Methods:
       __init__(self, dataset_name: str, size: int = None) -> None:
           Initializes a Wiki20KDataset object.
       __read_dataset(self, sentences_key: str, labels_key: str, size: int = None) -> Tuple[List[str], List[str], List[int]]:
           Reads the dataset file and returns the sentences and labels.
       __getitem__(self, index: int) -> Tuple[str, str, int]:
           Returns the sentence, label, and index at the given index.
       __str__(self) -> str:
           Returns a string representation of the dataset.
       __len__(self) -> int:
           Returns the number of instances in the dataset.

   Initializes a Wiki20KDataset object.

   Args:
       dataset_name (str): The name of the dataset.
       size (int, optional): The number of instances to load from the dataset. Defaults to None.


.. py:data:: WIKI20K_DATASET
   :value: 'wiki20k_dataset'


.. py:data:: WIKI20K_DATASET_FOLDER
   :value: '/logitorch_datasets/wiki20k_dataset'


.. py:data:: WIKI20K_DATASET_ZIP_URL
   :value: 'https://www.dropbox.com/s/yeh70n6etbg0a95/wiki20k_dataset.zip?dl=1'


.. py:data:: WIKI20K_SUB_DATASETS
   :value: ['lm_wiki20k', 'positive_lm_wiki20k', 'negated_lm_wiki20k']
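The interface documented above (a ``dataset_name``, an optional ``size``, and ``__getitem__`` returning a sentence, label, and index) can be illustrated with a minimal in-memory sketch. This is not the library's implementation: the class ``SketchDataset``, the validation of the name against the sub-dataset list, the reading of ``size`` as "keep the first ``size`` instances", and the toy sentence are all assumptions made for illustration. Only the sub-dataset names are taken from ``WIKI20K_SUB_DATASETS``.

```python
from typing import List, Optional, Tuple

# Names copied from the documented WIKI20K_SUB_DATASETS value.
SUB_DATASETS = ["lm_wiki20k", "positive_lm_wiki20k", "negated_lm_wiki20k"]


class SketchDataset:
    """Hypothetical stand-in that mirrors the documented Wiki20KDataset interface."""

    def __init__(
        self,
        dataset_name: str,
        sentences: List[str],
        labels: List[str],
        size: Optional[int] = None,
    ) -> None:
        # Assumption: unknown sub-dataset names are rejected.
        if dataset_name not in SUB_DATASETS:
            raise ValueError(f"Unknown sub-dataset: {dataset_name}")
        self.dataset_name = dataset_name
        # Assumption: size=None keeps every instance; otherwise keep the first `size`.
        self.sentences = sentences[:size]
        self.labels = labels[:size]

    def __getitem__(self, index: int) -> Tuple[str, str, int]:
        # Matches the documented return shape: (sentence, label, index).
        return self.sentences[index], self.labels[index], index

    def __len__(self) -> int:
        return len(self.sentences)

    def __str__(self) -> str:
        return f"{self.dataset_name} with {len(self)} instances"


# Toy data, made up for this sketch; the real files are not read here.
data = SketchDataset(
    "negated_lm_wiki20k",
    sentences=["Birds cannot [MASK].", "Fish cannot [MASK]."],
    labels=["fly", "swim"],
    size=1,
)
sentence, label, index = data[0]
print(data)
```

With the real class, the sentences and labels come from the downloaded dataset file rather than from constructor arguments, so only ``dataset_name`` and ``size`` are passed.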