# ChonjukChapterParser Class Documentation ## Overview The `ChonjukChapterParser` class is responsible for extracting chapter annotations from Tibetan text data that contains specific chapter markers. ## Input Data format ``` chX-"Chapter Title" Chapter Text ``` - X represents the chapter number. - Chapter Title is the title of the chapter in double quotes. - Chapter Text is the body of the chapter. ## Class Methods ### `__init__(self)` - Initializes the `ChonjukChapterParser` instance. - Sets up the configuration needed for parsing chapter annotations. ### `get_updated_text(self, text: str) -> str` - Cleans the input text by removing chapter markers. - Returns the cleaned text. ### `get_annotations(self, text: str) -> List[Dict]` - Extracts chapter annotations from the input text. - Get the updated annotation span after removing chapter markers. - Returns a list of chapter annotations. ### `parse(self, input: str, output_path: Path = PECHAS_PATH, metadata: Union[Dict, Path] = None)` - Extract chapter annotations from the text. - Instantiate `Pecha` class and save the chapter annotations to the output path. ## Example Usage Here is an example of how to use the `ChonjukChapterParser` to parse text and extract chapter annotations. ```python from pathlib import Path # Initialize the parser parser = ChonjukChapterParser() # Example input text input_text = ''' རྒྱ་གར་སྐད་དུ། བོ་དྷི་སཏྭ་ཙརྱ་ཨ་བ་ཏཱ་ར། བོད་སྐད་དུ། བྱང་ཆུབ་སེམས་དཔའི་སྤྱོད་པ་ལ་འཇུག་པ། ch1-"བྱང་ཆུབ་སེམས་ཀྱི་ཕན་ཡོན་བཤད་པ།" བདེ་གཤེགས་ཆོས་ཀྱི་སྐུ་མངའ་སྲས་བཅས་དང་། ། ཕྱག་འོས་ཀུན་ལའང་གུས་པར་ཕྱག་འཚལ་ཏེ། ། ch2-"སྡིག་པ་བཤགས་པ།" དགེ་བ་བསྒོམ་ཕྱིར་བདག་གི་དད་པའི་ཤུགས། ། ''' # Parse the input text and save to an output path parser.parse(input_text, output_path=Path("/path/to/output")) ``` After running the above code, the chapter annotations will be extracted from the input text and saved to the specified output path.The `annotations` attribute of parser would look like this. ```python assert parser.annotations == [ { "chapter_number": "1", "chapter_title": "བྱང་ཆུབ་སེམས་ཀྱི་ཕན་ཡོན་བཤད་པ།", "Chapter": {"start": 145, "end": 446}, }, { "chapter_number": "2", "chapter_title": "སྡིག་པ་བཤགས་པ།", "Chapter": {"start": 449, "end": 896}, }, ] ``` The file structure on the output path would look like this: ``` - output_path(dir) - I00B6F749(dir) - base(dir) - da0c.txt - layers(dir) - da0c - Chapter-123.json ```