# Usage Guide ### I. Create Pecha To create a new Pecha (an annotated text corpus), you can use the `Pecha.create` method directly, or use a parser (e.g., for DOCX files): ```python from pathlib import Path from openpecha.pecha import Pecha # Create an empty Pecha in a given output directory output_path = Path("./output") pecha = Pecha.create(output_path) ``` Or, to create a Pecha after parsing: ```python from openpecha.pecha.parsers.docx.root import DocxRootParser from openpecha.pecha.layer import AnnotationType parser = DocxRootParser() pecha, annotation_path = parser.parse( input="/path/to/file.docx", annotation_type=AnnotationType.SEGMENTATION, metadata={"title": {"en": "Sample Title"}, "language": "bo"}, output_path=Path("/output_path/") ) ``` ### II. Load Pecha You can load an existing Pecha either from a local path after downloading from the openpecha backend: ```python from openpecha.pecha import Pecha from pathlib import Path # Load from local path pecha = Pecha.from_path(Path("/path/to/pecha")) ``` ### III. Pecha Attributes A `Pecha` object exposes several useful attributes: - `pecha.id`: The Pecha's unique ID, generated from 8 digits UUID - `pecha.pecha_path`: Filesystem path to the Pecha - `pecha.metadata`: Metadata object (see below) - `pecha.bases`: Dictionary of base file names to text - `pecha.layers`: Dictionary of annotation layers ### IV. Metadata Each Pecha has a `metadata` attribute, which is a `PechaMetaData` object. Example fields include: - `id`: Pecha ID - `title`: Title (can be a dict with language keys) - `author`: Author(s) - `language`: Language code (e.g., 'bo', 'en') - `parser`: Name of the parser used - `initial_creation_type`: How the Pecha was created (e.g., 'google_docx', 'ocr') - `source_metadata`: Additional source info - `copyright`, `licence`, etc. You can update metadata by passing a dictionary: ```python pecha.set_metadata({ "title": {"en": "New Title"}, "author": "Author Name", # ... other fields ... }) ``` ### V. Base File The base file is the plain text of the work. You can access and set base files: ```python # Get base text by name base_text = pecha.get_base("base1") # Set a new base text pecha.set_base("This is the text.", base_name="base1") ``` ### VI. Annotations Annotations are stored in layers, each corresponding to a type (segmentation, alignment, etc.). - To access all layers for a base: ```python for layer_name, layer_store in pecha.get_layers("base1"): print(layer_name, layer_store) ``` - To add a new annotation layer: ```python from openpecha.pecha.layer import AnnotationType layer, layer_path = pecha.add_layer("base1", AnnotationType.SEGMENTATION) ``` - To add an annotation to a layer: ```python from openpecha.pecha.annotations import Span, SegmentationAnnotation ann = SegmentationAnnotation(span=Span(start=0, end=10), index=1) pecha.add_annotation(layer, ann, AnnotationType.SEGMENTATION) layer.save() ``` - To get annotation data: ```python from openpecha.pecha import get_anns anns = get_anns(layer) for ann in anns: print(ann) ``` ### VII. Alignment Transfer Alignment transfer allows you to map and serialize aligned segments between a root text and a commentary or translation Pecha. This is useful for exporting how commentary or translation segments correspond to the root text. #### Commentary Alignment Transfer To transfer alignment from a root Pecha to a commentary Pecha: ```python from openpecha.pecha import Pecha from openpecha.alignment.commentary_transfer import CommentaryAlignmentTransfer # Load the root and commentary Pechas root_pecha = Pecha.from_path("/path/to/root_pecha") commentary_pecha = Pecha.from_path("/path/to/commentary_pecha") # Specify the alignment layer IDs (relative to the layer directory) root_alignment_id = "B5FE/alignment-6707.json" commentary_alignment_id = "B014/alignment-2127.json" # Get the transferred commentary segments as a list of strings transfer = CommentaryAlignmentTransfer() aligned_commentary = transfer.get_serialized_commentary( root_pecha, root_alignment_id, commentary_pecha, commentary_alignment_id, ) for segment in aligned_commentary: print(segment) ``` If your commentary Pecha also has a segmentation layer, you can use: ```python commentary_segmentation_id = "B014/segmentation-33FC.json" aligned_commentary = transfer.get_serialized_commentary_segmentation( root_pecha, root_alignment_id, commentary_pecha, commentary_alignment_id, commentary_segmentation_id, ) ``` #### Translation Alignment Transfer For translation alignment transfer, use the `TranslationAlignmentTransfer` class: ```python from openpecha.pecha import Pecha from openpecha.alignment.translation_transfer import TranslationAlignmentTransfer root_pecha = Pecha.from_path("/path/to/root_pecha") translation_pecha = Pecha.from_path("/path/to/translation_pecha") root_alignment_id = "B5FE/alignment-6707.json" translation_alignment_id = "B014/alignment-2127.json" transfer = TranslationAlignmentTransfer() aligned_translation = transfer.get_serialized_translation_alignment( root_pecha, root_alignment_id, translation_pecha, translation_alignment_id, ) for segment in aligned_translation: print(segment) ``` If your translation Pecha also has a segmentation layer, use: ```python translation_segmentation_id = "B014/segmentation-33FC.json" aligned_translation = transfer.get_serialized_translation_segmentation( root_pecha, root_alignment_id, translation_pecha, translation_alignment_id, translation_segmentation_id, ) ``` #### Notes - The alignment and segmentation layer IDs are typically found in the `layers` directory of each Pecha. - The output is a list of strings, each representing a segment in the commentary or translation, aligned to the root text.