API REFERENCES

Pecha

DocxRootParser

DocxSimpleCommentaryParser

DocxAnnotationParser

DocxAnnotationUpdate

TranslationAlignmentTransfer

CommentaryAlignmentTransfer

Pecha.from_path() -> Pecha

Loads a Pecha instance from a local path.

  • Parameters:

    • pecha_path (Path): Path to the Pecha directory

  • Returns: Pecha instance

  • Example:

    from pathlib import Path
    from openpecha.pecha import Pecha
    
    pecha = Pecha.from_path(Path("/path/to/pecha"))
    

Pecha.create() -> Pecha

Creates a new Pecha instance in the specified output directory.

  • Parameters:

    • output_path (Path): Directory where the Pecha should be created

    • pecha_id (str, optional): Custom Pecha ID. If not provided, a new ID will be generated

  • Returns: Pecha instance

  • Example:

    from pathlib import Path
    from openpecha.pecha import Pecha
    
    pecha = Pecha.create(Path("./output"))
    

Pecha.base_path -> Path

Property that returns the path to the base directory, which contains all the base files. If the directory does not exist, it is created.

  • Returns: Path object pointing to the base directory

  • Example:

    base_dir = pecha.base_path
    print(base_dir)  # /path/to/pecha/base
    

Pecha.layer_path -> Path

Property that returns the path to the layers directory, which contains all the annotation files. If the directory does not exist, it is created.

  • Returns: Path object pointing to the layers directory

  • Example:

    layer_dir = pecha.layer_path
    print(layer_dir)  # /path/to/pecha/layers
    

Pecha.metadata_path -> Path

Property that returns the path to the metadata file.

  • Returns: Path object pointing to the metadata file

  • Example:

    metadata_file = pecha.metadata_path
    print(metadata_file)  # /path/to/pecha/metadata.json
    

Pecha.get_base() -> str

Gets the content of a base file by its name.

  • Parameters:

    • base_name (str): Name of the base file

  • Returns: str containing the base text content

  • Example:

    base_text = pecha.get_base("base1")
    

Pecha.set_base() -> str

Sets the content of a base file.

  • Parameters:

    • content (str): Text content to write to the base file

    • base_name (str, optional): Name for the base file. If not provided, a new ID will be generated

  • Returns: str containing the base name

  • Example:

    base_name = pecha.set_base("This is the text content", "base1")
    

Pecha.add_layer() -> Tuple[AnnotationStore, Path]

Adds a new annotation layer for a given base.

  • Parameters:

    • base_name (str): Name of the base file to associate with this layer

    • layer_type (AnnotationType): Type of annotation layer (must be included in AnnotationType enum)

  • Returns: Tuple of (AnnotationStore, Path) containing:

    • AnnotationStore: The created annotation store

    • Path: Path to the layer file

  • Example:

    from openpecha.pecha.layer import AnnotationType
    
    # Add a segmentation layer
    layer, layer_path = pecha.add_layer("base1", AnnotationType.SEGMENTATION)
    
    # Add a chapter layer
    layer, layer_path = pecha.add_layer("base1", AnnotationType.CHAPTER)
    
  • Note: The layer file will be created with a name format of {layer_type}-{random_id}.json in the layers directory under the base name folder.

Pecha.add_annotation() -> AnnotationStore

Adds an annotation to an existing annotation layer (Annotation Store).

  • Parameters:

    • ann_store (AnnotationStore): The annotation store/layer to add the annotation to

    • annotation (BaseAnnotation): The annotation object to add (e.g., SegmentationAnnotation, CitationAnnotation)

    • layer_type (AnnotationType): The type of annotation (must match the layer type)

  • Returns: AnnotationStore with the added annotation

  • Example:

    from openpecha.pecha.annotations import Span, SegmentationAnnotation
    from openpecha.pecha.layer import AnnotationType
    
    # Create a segmentation annotation
    ann = SegmentationAnnotation(span=Span(start=0, end=10), index=1)
    
    # Add the annotation to the layer
    layer = pecha.add_annotation(layer, ann, AnnotationType.SEGMENTATION)
    
    # Save the layer after adding annotations
    layer.save()
    
  • Note:

    • The annotation’s span must be valid for the base text

    • The layer_type must match the type of annotation being added

    • The layer must be saved after adding annotations to persist the changes

Pecha.set_metadata() -> PechaMetaData

Updates the Pecha’s metadata with new values while preserving existing metadata fields if not overridden.

  • Parameters:

    • pecha_metadata (Dict): Dictionary containing metadata fields to update. Can include:

      • title (Dict[str, str] | str): Title in different languages or single language

      • author (List[str] | Dict[str, str] | str): Author(s) information

      • language (str): Language code (e.g., ‘bo’, ‘en’)

      • parser (str): Name of the parser used

      • initial_creation_type (str): How the Pecha was created

      • source_metadata (Dict): Additional source information

      • copyright (Dict): Copyright information

      • licence (str): License type

  • Returns: Updated PechaMetaData object

  • Example:

    # Update metadata with new values
    pecha.set_metadata({
        "title": {"en": "New Title", "bo": "གསར་བཅོས་ཁ་བྱང་།"},
        "author": ["Author 1", "Author 2"],
        "language": "bo",
        "source_metadata": {
            "id": "source123",
            "publisher": "Publisher Name"
        }
    })
    
    # Update specific fields while preserving others
    pecha.set_metadata({
        "title": {"en": "Updated Title"},
        "copyright": {
            "year": "2024",
            "holder": "Copyright Holder"
        }
    })
    
  • Note:

    • Existing metadata fields not included in the update dictionary will be preserved

    • The parser and initial_creation_type fields will be preserved from existing metadata if not specified

    • The metadata is automatically saved to the metadata.json file

    • Invalid metadata will raise a ValueError
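
The preserve-unless-overridden rule above behaves like a dictionary merge. The following is a minimal, hypothetical sketch on plain dicts (merge_metadata is not part of openpecha): it assumes a shallow merge, where a top-level key in the update replaces the whole existing value, and it skips the validation that raises ValueError.

```python
from typing import Any, Dict


def merge_metadata(existing: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: shallow-merge `update` over `existing`.

    Keys missing from `update` keep their existing values; keys present in
    `update` replace the old value entirely (assumed shallow semantics).
    """
    merged = dict(existing)  # copy so the caller's dict is not mutated
    merged.update(update)
    return merged


existing = {"title": {"en": "Old Title"}, "parser": "DocxRootParser", "language": "bo"}
merged = merge_metadata(existing, {"title": {"en": "Updated Title"}})
print(merged)  # {'title': {'en': 'Updated Title'}, 'parser': 'DocxRootParser', 'language': 'bo'}
```

Whether nested dicts such as title are merged key by key or replaced wholesale is not stated above, so check the behaviour before relying on it.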

Pecha.get_layers() -> Generator[Tuple[str, AnnotationStore]]

Returns all layers from the Pecha associated with the given base.

  • Parameters:

    • base_name (str): Name of the base file

    • from_cache (bool, optional): Whether to load from cache. Defaults to False

  • Returns: Generator yielding tuples of (layer_name, AnnotationStore)

  • Example:

    for layer_name, layer_store in pecha.get_layers("base1"):
        print(layer_name, layer_store)
    

Pecha.get_segmentation_layer_path() -> str

Gets the path to the first segmentation layer file.

  • Returns: str containing the relative path to the segmentation layer file

  • Example:

    layer_path = pecha.get_segmentation_layer_path()
    

Pecha.get_first_layer_path() -> str

Gets the path to the first layer file.

  • Returns: str containing the relative path to the first layer file

  • Example:

    layer_path = pecha.get_first_layer_path()
    

Pecha.get_layer_by_ann_type() -> Union[Tuple[AnnotationStore, Path], Tuple[List[AnnotationStore], List[Path]]]

Gets layers by annotation type.

  • Parameters:

    • base_name (str): Name of the base file

    • layer_type (AnnotationType): Type of annotation to retrieve

  • Returns: Tuple of (AnnotationStore or list of AnnotationStore, Path or list of Path)

  • Example:

    layer, layer_path = pecha.get_layer_by_ann_type("base1", AnnotationType.SEGMENTATION)
    

Pecha.get_layer_by_filename() -> Optional[AnnotationStore]

Gets a layer by its filename.

  • Parameters:

    • base_name (str): Name of the base file

    • filename (str): Name of the layer file

  • Returns: AnnotationStore or None if not found

  • Example:

    layer = pecha.get_layer_by_filename("base1", "segmentation-1234.json")
    

Pecha.publish() -> None

Publishes the Pecha to GitHub and optionally creates a release with assets.

  • Parameters:

    • asset_path (Path, optional): Path to the asset directory

    • asset_name (str, optional): Name for the asset. Defaults to “source_data”

    • branch (str, optional): Branch to publish to. Defaults to “main”

    • is_private (bool, optional): Whether the repository should be private. Defaults to False

  • Example:

    pecha.publish(
        asset_path=Path("./assets"),
        asset_name="source_data",
        branch="main",
        is_private=False
    )
    

Pecha.merge_pecha() -> None

Merges the layers of a source Pecha into the current Pecha.

  • Parameters:

    • source_pecha (Pecha): The source Pecha instance

    • source_base_name (str): The base name of the source Pecha

    • target_base_name (str): The base name of the target (current) Pecha

  • Example:

    pecha.merge_pecha(source_pecha, "source_base", "target_base")
    

DocxRootParser.parse() -> Tuple[Pecha, annotation_path]

Parses a DOCX file and creates a Pecha object with annotations.

  • Parameters:

    • input (str | Path): Path to the DOCX file to be parsed

    • annotation_type (AnnotationType): Type of annotation to extract (SEGMENTATION or ALIGNMENT)

    • metadata (Dict): Dictionary containing metadata for the Pecha

    • output_path (Path, optional): Directory where the Pecha should be created. Defaults to PECHAS_PATH

    • pecha_id (str | None, optional): Custom Pecha ID. If not provided, a new ID will be generated

  • Returns: Tuple containing:

    • Pecha: The created Pecha instance

    • annotation_path: Path to the created annotation layer file

  • Example:

    from pathlib import Path
    from openpecha.pecha.layer import AnnotationType
    from openpecha.pecha.parsers.docx.root import DocxRootParser
    
    parser = DocxRootParser()
    pecha, layer_path = parser.parse(
        input="path/to/file.docx",
        annotation_type=AnnotationType.SEGMENTATION,
        metadata={"title": "Sample Title"},
        output_path=Path("./output")
    )
    

DocxRootParser.extract_anns() -> Tuple[List[BaseAnnotation], str]

Extracts text and annotations from a DOCX file.

  • Parameters:

    • docx_file (Path): Path to the DOCX file

    • annotation_type (AnnotationType): Type of annotation to extract (SEGMENTATION or ALIGNMENT)

  • Returns: Tuple containing:

    • List[BaseAnnotation]: List of extracted annotations

    • str: The extracted base text

  • Example:

    from pathlib import Path
    from openpecha.pecha.layer import AnnotationType
    from openpecha.pecha.parsers.docx.root import DocxRootParser
    
    parser = DocxRootParser()
    anns, base = parser.extract_anns(
        Path("path/to/file.docx"),
        AnnotationType.SEGMENTATION
    )
    

DocxRootParser.extract_segmentation_anns() -> Tuple[List[SegmentationAnnotation], str]

Extracts segmentation annotations from numbered text.

  • Parameters:

    • numbered_text (Dict[str, str]): Dictionary mapping segment numbers to text content

  • Returns: Tuple containing:

    • List[SegmentationAnnotation]: List of segmentation annotations

    • str: The concatenated base text

  • Example:

    from openpecha.pecha.parsers.docx.root import DocxRootParser
    
    parser = DocxRootParser()
    numbered_text = {
        "1": "First segment",
        "2": "Second segment"
    }
    anns, base = parser.extract_segmentation_anns(numbered_text)
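
A minimal sketch of the idea behind extract_segmentation_anns: walk the numbered segments, append each to the base text, and record its character span. It uses plain (index, start, end) tuples instead of SegmentationAnnotation and Span, and it assumes a newline separator and dictionary order; segment_spans is a hypothetical name, and the real parser may join or order segments differently.

```python
from typing import Dict, List, Tuple


def segment_spans(numbered_text: Dict[str, str]) -> Tuple[List[Tuple[int, int, int]], str]:
    """Return ([(index, start, end), ...], base_text) for numbered segments.

    Assumptions: segments are joined with "\n" and spans are half-open
    [start, end) character offsets into the joined base text.
    """
    spans: List[Tuple[int, int, int]] = []
    parts: List[str] = []
    offset = 0
    for number, text in numbered_text.items():
        start = offset
        end = start + len(text)
        spans.append((int(number), start, end))
        parts.append(text)
        offset = end + 1  # skip the "\n" separator
    return spans, "\n".join(parts)


spans, base = segment_spans({"1": "First segment", "2": "Second segment"})
print(spans)  # [(1, 0, 13), (2, 14, 28)]
```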
    

DocxRootParser.extract_alignment_anns() -> Tuple[List[AlignmentAnnotation], str]

Extracts alignment annotations from numbered text.

  • Parameters:

    • numbered_text (Dict[str, str]): Dictionary mapping segment numbers to text content

  • Returns: Tuple containing:

    • List[AlignmentAnnotation]: List of alignment annotations

    • str: The concatenated base text

  • Example:

    from openpecha.pecha.parsers.docx.root import DocxRootParser
    
    parser = DocxRootParser()
    numbered_text = {
        "1": "First segment",
        "2": "Second segment"
    }
    anns, base = parser.extract_alignment_anns(numbered_text)
    

DocxSimpleCommentaryParser.parse() -> Tuple[Pecha, annotation_path]

Parses a DOCX file and creates a commentary Pecha object with annotations.

  • Parameters:

    • input (str | Path): Path to the DOCX file to be parsed

    • annotation_type (AnnotationType): Type of annotation to extract (SEGMENTATION or ALIGNMENT)

    • metadata (Dict[str, Any]): Dictionary containing metadata for the Pecha

    • output_path (Path, optional): Directory where the Pecha should be created. Defaults to PECHAS_PATH

    • pecha_id (str | None, optional): Custom Pecha ID. If not provided, a new ID will be generated

  • Returns: Tuple containing:

    • Pecha: The created Pecha instance

    • annotation_path: Path to the created annotation layer file

  • Example:

    from pathlib import Path
    from openpecha.pecha.layer import AnnotationType
    from openpecha.pecha.parsers.docx.commentary.simple import DocxSimpleCommentaryParser
    
    parser = DocxSimpleCommentaryParser()
    pecha, layer_path = parser.parse(
        input="path/to/commentary.docx",
        annotation_type=AnnotationType.ALIGNMENT,
        metadata={"title": "Commentary Title", "commentary_of": "P0001"},
        output_path=Path("./output")
    )
    

DocxSimpleCommentaryParser.extract_anns() -> Tuple[List[BaseAnnotation], str]

Extracts text and annotations from a commentary DOCX file.

  • Parameters:

    • docx_file (Path): Path to the DOCX file

    • annotation_type (AnnotationType): Type of annotation to extract (SEGMENTATION or ALIGNMENT)

  • Returns: Tuple containing:

    • List[BaseAnnotation]: List of extracted annotations

    • str: The extracted base text

  • Example:

    from pathlib import Path
    from openpecha.pecha.layer import AnnotationType
    from openpecha.pecha.parsers.docx.commentary.simple import DocxSimpleCommentaryParser
    
    parser = DocxSimpleCommentaryParser()
    anns, base = parser.extract_anns(
        Path("path/to/commentary.docx"),
        AnnotationType.ALIGNMENT
    )
    

DocxSimpleCommentaryParser.extract_segmentation_anns() -> Tuple[List[SegmentationAnnotation], str]

Extracts segmentation annotations from numbered commentary text.

  • Parameters:

    • numbered_text (Dict[str, str]): Dictionary mapping segment numbers to text content

  • Returns: Tuple containing:

    • List[SegmentationAnnotation]: List of segmentation annotations

    • str: The concatenated base text

  • Example:

    from openpecha.pecha.parsers.docx.commentary.simple import DocxSimpleCommentaryParser
    
    parser = DocxSimpleCommentaryParser()
    numbered_text = {
        "1": "First commentary segment",
        "2": "Second commentary segment"
    }
    anns, base = parser.extract_segmentation_anns(numbered_text)
    

DocxSimpleCommentaryParser.extract_alignment_anns() -> Tuple[List[AlignmentAnnotation], str]

Extracts alignment annotations from numbered commentary text, handling root text references.

  • Parameters:

    • numbered_text (Dict[str, str]): Dictionary mapping segment numbers to text content

  • Returns: Tuple containing:

    • List[AlignmentAnnotation]: List of alignment annotations with root text references

    • str: The concatenated base text

  • Example:

    from openpecha.pecha.parsers.docx.commentary.simple import DocxSimpleCommentaryParser
    
    parser = DocxSimpleCommentaryParser()
    numbered_text = {
        "1": "1-2 First commentary segment",
        "2": "3-4 Second commentary segment"
    }
    anns, base = parser.extract_alignment_anns(numbered_text)
    
  • Note: The commentary text can include root text references in the format “1-2 Commentary text” where “1-2” refers to the root text segments being commented on.
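
A hypothetical way to split off such a leading reference, assuming it is a run of numbers, commas, and hyphens followed by whitespace (the regular expression and split_root_reference are illustrative, not taken from the parser):

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a leading reference such as "1-2" or "1,3-4",
# then whitespace, then the commentary text (DOTALL keeps multi-line text).
ROOT_REF = re.compile(r"^(\d+(?:-\d+)?(?:,\d+(?:-\d+)?)*)\s+(.*)$", re.DOTALL)


def split_root_reference(text: str) -> Tuple[Optional[str], str]:
    """Return (root_reference, commentary_text); the reference is None if absent."""
    match = ROOT_REF.match(text)
    if match is None:
        return None, text
    return match.group(1), match.group(2)


print(split_root_reference("1-2 First commentary segment"))  # ('1-2', 'First commentary segment')
```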

DocxAnnotationParser.add_annotation() -> Tuple[Pecha, annotation_path]

Adds annotations to an existing Pecha from a DOCX file.

  • Parameters:

    • pecha (Pecha): The Pecha instance to add annotations to

    • type (AnnotationType | str): Type of annotation to extract (ALIGNMENT, SEGMENTATION, or FOOTNOTE)

    • docx_file (Path): Path to the DOCX file containing annotations

    • metadatas (List[Any]): List of metadata objects used to determine whether the Pecha is root-related

  • Returns: Tuple containing:

    • Pecha: The updated Pecha instance

    • annotation_path: Path to the created annotation layer file

  • Example:

    from pathlib import Path
    from openpecha.pecha.layer import AnnotationType
    from openpecha.pecha.parsers.docx.annotation import DocxAnnotationParser
    
    parser = DocxAnnotationParser()
    pecha, layer_path = parser.add_annotation(
        pecha=existing_pecha,
        type=AnnotationType.FOOTNOTE,
        docx_file=Path("path/to/annotations.docx"),
        metadatas=[metadata]
    )
    
  • Note:

    • The parser supports three types of annotations: ALIGNMENT, SEGMENTATION, and FOOTNOTE

    • For FOOTNOTE annotations, it uses DocxFootnoteParser

    • For root-related Pechas, it uses DocxRootParser

    • For other cases, it uses DocxSimpleCommentaryParser

    • The coordinates of annotations are automatically updated to match the base text

DocxAnnotationUpdate.extract_layer_name() -> str

Extracts the layer name from a layer path.

  • Parameters:

    • layer_path (str): Path to the layer file

  • Returns: str containing the layer name (filename without extension)

  • Example:

    from openpecha.pecha.parsers.docx.update import DocxAnnotationUpdate
    
    updater = DocxAnnotationUpdate()
    layer_name = updater.extract_layer_name("path/to/segmentation-1234.json")
    print(layer_name)  # "segmentation-1234"
    

DocxAnnotationUpdate.extract_layer_id() -> str

Extracts the layer ID from a layer path.

  • Parameters:

    • layer_path (str): Path to the layer file

  • Returns: str containing the layer ID (last part of the filename after the hyphen)

  • Example:

    from openpecha.pecha.parsers.docx.update import DocxAnnotationUpdate
    
    updater = DocxAnnotationUpdate()
    layer_id = updater.extract_layer_id("path/to/segmentation-1234.json")
    print(layer_id)  # "1234"
    

DocxAnnotationUpdate.extract_layer_enum() -> AnnotationType

Extracts the annotation type from a layer path.

  • Parameters:

    • layer_path (str): Path to the layer file

  • Returns: AnnotationType enum value corresponding to the layer type

  • Example:

    from openpecha.pecha.parsers.docx.update import DocxAnnotationUpdate
    
    updater = DocxAnnotationUpdate()
    layer_type = updater.extract_layer_enum("path/to/segmentation-1234.json")
    print(layer_type)  # AnnotationType.SEGMENTATION
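
The three helpers above follow from the {layer_type}-{random_id}.json naming scheme. A minimal sketch using pathlib, assuming the layer type is everything before the last hyphen and that AnnotationType values are lowercase strings (the enum below is a hypothetical stand-in, not the real openpecha class):

```python
from enum import Enum
from pathlib import Path


class AnnotationType(Enum):
    """Hypothetical stand-in for openpecha's AnnotationType (values assumed)."""
    SEGMENTATION = "segmentation"
    ALIGNMENT = "alignment"


def layer_name(layer_path: str) -> str:
    # "path/to/segmentation-1234.json" -> "segmentation-1234"
    return Path(layer_path).stem


def layer_id(layer_path: str) -> str:
    # Part of the stem after the last hyphen: "segmentation-1234" -> "1234"
    return layer_name(layer_path).rsplit("-", 1)[-1]


def layer_enum(layer_path: str) -> AnnotationType:
    # Part of the stem before the last hyphen, looked up by enum value
    return AnnotationType(layer_name(layer_path).rsplit("-", 1)[0])


path = "path/to/segmentation-1234.json"
print(layer_name(path), layer_id(path), layer_enum(path))  # segmentation-1234 1234 AnnotationType.SEGMENTATION
```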
    

DocxAnnotationUpdate.update_annotation() -> Pecha

Updates annotations in an existing Pecha from a DOCX file while preserving the layer ID.

  • Parameters:

    • pecha (Pecha): The Pecha instance to update annotations in

    • annotation_path (str): Path to the existing annotation layer file

    • docx_file (Path): Path to the DOCX file containing new annotations

    • metadatas (List[Any]): List of metadata objects used to determine whether the Pecha is root-related

  • Returns: Updated Pecha instance

  • Example:

    from pathlib import Path
    from openpecha.pecha.parsers.docx.update import DocxAnnotationUpdate
    
    updater = DocxAnnotationUpdate()
    updated_pecha = updater.update_annotation(
        pecha=existing_pecha,
        annotation_path="path/to/segmentation-1234.json",
        docx_file=Path("path/to/updated_annotations.docx"),
        metadatas=[metadata]
    )
    
  • Note:

    • The method preserves the original layer ID when updating annotations

    • It automatically determines the annotation type from the existing layer path

    • Uses DocxAnnotationParser internally to handle the actual annotation update

TranslationAlignmentTransfer.is_empty() -> bool

Checks whether a text string is empty or contains only whitespace (such as spaces and newlines).

  • Parameters:

    • text (str): The text to check

  • Returns: bool indicating if the text is empty

  • Example:

    transfer = TranslationAlignmentTransfer()
    is_empty = transfer.is_empty("  \n  ")  # True
    is_empty = transfer.is_empty("Some text")  # False
    

TranslationAlignmentTransfer.get_segmentation_ann_path() -> Path

Gets the path to the first segmentation layer JSON file in a Pecha.

  • Parameters:

    • pecha (Pecha): The Pecha instance to search in

  • Returns: Path object pointing to the segmentation layer file

  • Example:

    transfer = TranslationAlignmentTransfer()
    seg_path = transfer.get_segmentation_ann_path(pecha)
    

TranslationAlignmentTransfer.map_layer_to_layer() -> Dict[int, List[int]]

Maps annotations from source layer to target layer based on span overlap or containment.

  • Parameters:

    • src_layer (AnnotationStore): Source annotation layer

    • tgt_layer (AnnotationStore): Target annotation layer

  • Returns: Dictionary mapping source indices to lists of target indices

  • Example:

    transfer = TranslationAlignmentTransfer()
    mapping = transfer.map_layer_to_layer(source_layer, target_layer)
    
  • Note:

    • Maps based on span overlap or containment

    • Excludes edge overlaps

    • Returns a sorted dictionary
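
One reading of these rules, sketched over plain (index, start, end) tuples rather than AnnotationStore objects: spans are treated as half-open, any positive overlap or containment counts, and spans that merely touch at a boundary (an edge overlap) do not. map_spans is a hypothetical helper, not the library's implementation.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, int]  # (index, start, end), end exclusive


def map_spans(src: List[Span], tgt: List[Span]) -> Dict[int, List[int]]:
    """Map each source index to the target indices whose spans overlap it."""
    mapping: Dict[int, List[int]] = {}
    for s_idx, s_start, s_end in src:
        hits = []
        for t_idx, t_start, t_end in tgt:
            # Strict inequalities: spans that only touch (t_end == s_start
            # or t_start == s_end) are edge overlaps and are skipped.
            if t_start < s_end and s_start < t_end:
                hits.append(t_idx)
        mapping[s_idx] = sorted(hits)
    return dict(sorted(mapping.items()))  # sorted by source index


src = [(1, 0, 10), (2, 10, 20)]
tgt = [(1, 0, 5), (2, 5, 12), (3, 12, 20)]
print(map_spans(src, tgt))  # {1: [1, 2], 2: [2, 3]}
```

Target 2 spans the boundary between the two source segments, so it appears in both lists, while a target such as (10, 15) would not be mapped to source 1 because it only touches its end.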

TranslationAlignmentTransfer.get_root_pechas_mapping() -> Dict[int, List[int]]

Gets mapping from a Pecha’s alignment layer to its segmentation layer.

  • Parameters:

    • pecha (Pecha): The Pecha instance

    • alignment_id (str): ID of the alignment layer

  • Returns: Dictionary mapping alignment indices to segmentation indices

  • Example:

    transfer = TranslationAlignmentTransfer()
    mapping = transfer.get_root_pechas_mapping(pecha, "alignment-1234.json")
    

TranslationAlignmentTransfer.get_translation_pechas_mapping() -> Dict[int, List]

Gets mapping from segmentation to alignment layer in a translation Pecha.

  • Parameters:

    • pecha (Pecha): The translation Pecha instance

    • alignment_id (str): ID of the alignment layer

    • segmentation_id (str): ID of the segmentation layer

  • Returns: Dictionary mapping segmentation indices to alignment indices

  • Example:

    transfer = TranslationAlignmentTransfer()
    mapping = transfer.get_translation_pechas_mapping(
        pecha,
        "alignment-1234.json",
        "segmentation-5678.json"
    )
    

TranslationAlignmentTransfer.mapping_to_text_list() -> List[str]

Flattens a mapping from translation to root text into a list of texts.

  • Parameters:

    • mapping (Dict[int, List[str]]): Mapping of indices to text lists

  • Returns: List of texts, with empty strings for missing indices

  • Example:

    transfer = TranslationAlignmentTransfer()
    texts = transfer.mapping_to_text_list({1: ["text1"], 3: ["text2"]})
    # ["text1", "", "text2"]
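
A minimal sketch of is_empty and mapping_to_text_list consistent with the examples above, assuming 1-based indices and that multiple texts for one index are joined with a newline (the joining rule is an assumption):

```python
from typing import Dict, List


def is_empty(text: str) -> bool:
    # Empty, or only whitespace such as spaces and newlines
    return not text.strip()


def mapping_to_text_list(mapping: Dict[int, List[str]]) -> List[str]:
    """Flatten {index: [texts]} into a list, with "" for missing indices."""
    last = max(mapping, default=0)  # highest index present (0 if empty)
    result: List[str] = []
    for idx in range(1, last + 1):
        text = "\n".join(mapping.get(idx, []))
        result.append("" if is_empty(text) else text)
    return result


print(mapping_to_text_list({1: ["text1"], 3: ["text2"]}))  # ['text1', '', 'text2']
```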
    

TranslationAlignmentTransfer.get_serialized_translation_alignment() -> List[str]

Serializes the text of a translation Pecha's alignment layer, mapped onto the root Pecha's segmentation layer.

  • Parameters:

    • root_pecha (Pecha): The root Pecha instance

    • root_alignment_id (str): ID of the root alignment layer

    • root_translation_pecha (Pecha): The translation Pecha instance

    • translation_alignment_id (str): ID of the translation alignment layer

  • Returns: List of texts aligned with root segmentation

  • Example:

    transfer = TranslationAlignmentTransfer()
    texts = transfer.get_serialized_translation_alignment(
        root_pecha,
        "alignment-1234.json",
        translation_pecha,
        "alignment-5678.json"
    )
    

TranslationAlignmentTransfer.get_serialized_translation_segmentation() -> List[str]

Serializes the text of a translation Pecha's segmentation layer, mapped onto the root Pecha's segmentation layer.

  • Parameters:

    • root_pecha (Pecha): The root Pecha instance

    • root_alignment_id (str): ID of the root alignment layer

    • translation_pecha (Pecha): The translation Pecha instance

    • translation_alignment_id (str): ID of the translation alignment layer

    • translation_segmentation_id (str): ID of the translation segmentation layer

  • Returns: List of texts aligned with root segmentation

  • Example:

    transfer = TranslationAlignmentTransfer()
    texts = transfer.get_serialized_translation_segmentation(
        root_pecha,
        "alignment-1234.json",
        translation_pecha,
        "alignment-5678.json",
        "segmentation-9012.json"
    )
    

CommentaryAlignmentTransfer.get_first_valid_root_idx() -> int | None

Gets the first valid root index from an annotation’s alignment index.

  • Parameters:

    • ann (dict): The annotation dictionary containing alignment_index

  • Returns: First valid root index or None if no valid indices found

  • Example:

    transfer = CommentaryAlignmentTransfer()
    idx = transfer.get_first_valid_root_idx({"alignment_index": "1,2-4"})  # 1
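
Judging from the example value "1,2-4", an alignment index is a comma-separated list of indices and inclusive ranges. A hypothetical sketch of the parsing (parse_alignment_index and first_valid_root_idx are illustrative names; the library's notion of "valid" may involve further checks):

```python
from typing import Any, Dict, List, Optional


def parse_alignment_index(value: str) -> List[int]:
    """Expand an alignment index such as "1,2-4" into [1, 2, 3, 4]."""
    indices: List[int] = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            indices.extend(range(start, end + 1))  # inclusive range
        else:
            indices.append(int(part))
    return indices


def first_valid_root_idx(ann: Dict[str, Any]) -> Optional[int]:
    # None when the annotation has no (or an empty) alignment_index
    indices = parse_alignment_index(ann.get("alignment_index") or "")
    return indices[0] if indices else None


print(first_valid_root_idx({"alignment_index": "1,2-4"}))  # 1
```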
    

CommentaryAlignmentTransfer.is_valid_ann() -> bool

Checks if an annotation is valid (exists and has non-empty text).

  • Parameters:

    • anns (Dict[int, Dict[str, Any]]): Dictionary of annotations

    • idx (int): Index to check

  • Returns: bool indicating if the annotation is valid

  • Example:

    transfer = CommentaryAlignmentTransfer()
    is_valid = transfer.is_valid_ann(annotations, 1)
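
A minimal sketch of the validity check, assuming each annotation dict stores its content under a hypothetical "text" key and that whitespace-only text counts as empty:

```python
from typing import Any, Dict


def is_valid_ann(anns: Dict[int, Dict[str, Any]], idx: int) -> bool:
    """True if `idx` exists and its annotation has non-blank text."""
    ann = anns.get(idx)
    if ann is None:
        return False
    # "text" is an assumed field name used only for this illustration
    return bool(str(ann.get("text", "")).strip())


annotations = {1: {"text": "Root verse"}, 2: {"text": "   "}}
print(is_valid_ann(annotations, 1), is_valid_ann(annotations, 2), is_valid_ann(annotations, 3))  # True False False
```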
    

CommentaryAlignmentTransfer.get_segmentation_ann_path() -> Path

Gets the path to the first segmentation layer JSON file in a Pecha.

  • Parameters:

    • pecha (Pecha): The Pecha instance to search in

  • Returns: Path object pointing to the segmentation layer file

  • Example:

    transfer = CommentaryAlignmentTransfer()
    seg_path = transfer.get_segmentation_ann_path(pecha)
    

CommentaryAlignmentTransfer.index_annotations_by_root() -> Dict[int, Dict[str, Any]]

Indexes annotations by their root index.

  • Parameters:

    • anns (List[Dict[str, Any]]): List of annotation dictionaries

  • Returns: Dictionary mapping root indices to annotation dictionaries

  • Example:

    transfer = CommentaryAlignmentTransfer()
    indexed_anns = transfer.index_annotations_by_root(annotations)
    

CommentaryAlignmentTransfer.map_layer_to_layer() -> Dict[int, List[int]]

Maps annotations from source layer to target layer based on span overlap or containment.

  • Parameters:

    • src_layer (AnnotationStore): Source annotation layer

    • tgt_layer (AnnotationStore): Target annotation layer

  • Returns: Dictionary mapping source indices to lists of target indices

  • Example:

    transfer = CommentaryAlignmentTransfer()
    mapping = transfer.map_layer_to_layer(source_layer, target_layer)
    
  • Note:

    • Maps based on span overlap or containment

    • Excludes edge overlaps

    • Returns a sorted dictionary

    • Handles complex alignment indices (e.g., “1,2-4”)

CommentaryAlignmentTransfer.get_root_pechas_mapping() -> Dict[int, List[int]]

Gets mapping from a Pecha’s alignment layer to its segmentation layer.

  • Parameters:

    • pecha (Pecha): The Pecha instance

    • alignment_id (str): ID of the alignment layer

  • Returns: Dictionary mapping alignment indices to segmentation indices

  • Example:

    transfer = CommentaryAlignmentTransfer()
    mapping = transfer.get_root_pechas_mapping(pecha, "alignment-1234.json")
    

CommentaryAlignmentTransfer.get_commentary_pechas_mapping() -> Dict[int, List[int]]

Gets mapping from commentary Pecha’s segmentation layer to alignment layer.

  • Parameters:

    • pecha (Pecha): The commentary Pecha instance

    • alignment_id (str): ID of the alignment layer

    • segmentation_id (str): ID of the segmentation layer

  • Returns: Dictionary mapping segmentation indices to alignment indices

  • Example:

    transfer = CommentaryAlignmentTransfer()
    mapping = transfer.get_commentary_pechas_mapping(
        pecha,
        "alignment-1234.json",
        "segmentation-5678.json"
    )
    

CommentaryAlignmentTransfer.get_serialized_commentary() -> List[str]

Serializes commentary annotations with root/segmentation mapping and formatting.

  • Parameters:

    • root_pecha (Pecha): The root Pecha instance

    • root_alignment_id (str): ID of the root alignment layer

    • commentary_pecha (Pecha): The commentary Pecha instance

    • commentary_alignment_id (str): ID of the commentary alignment layer

  • Returns: List of formatted commentary texts

  • Example:

    transfer = CommentaryAlignmentTransfer()
    texts = transfer.get_serialized_commentary(
        root_pecha,
        "alignment-1234.json",
        commentary_pecha,
        "alignment-5678.json"
    )
    

CommentaryAlignmentTransfer.get_serialized_commentary_segmentation() -> List[str]

Serializes commentary segmentation annotations with root/segmentation mapping and formatting.

  • Parameters:

    • root_pecha (Pecha): The root Pecha instance

    • root_alignment_id (str): ID of the root alignment layer

    • commentary_pecha (Pecha): The commentary Pecha instance

    • commentary_alignment_id (str): ID of the commentary alignment layer

    • commentary_segmentation_id (str): ID of the commentary segmentation layer

  • Returns: List of formatted commentary texts

  • Example:

    transfer = CommentaryAlignmentTransfer()
    texts = transfer.get_serialized_commentary_segmentation(
        root_pecha,
        "alignment-1234.json",
        commentary_pecha,
        "alignment-5678.json",
        "segmentation-9012.json"
    )
    

CommentaryAlignmentTransfer.format_serialized_commentary() -> str

Formats a commentary text with chapter and segment information.

  • Parameters:

    • chapter_num (int): Chapter number

    • seg_idx (int): Segment index

    • text (str): Commentary text

  • Returns: Formatted string in the format “<chapter_num><seg_idx>text”

  • Example:

    transfer = CommentaryAlignmentTransfer()
    formatted = transfer.format_serialized_commentary(1, 2, "Commentary text")
    # "<1><2>Commentary text"
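
Based on the documented output, the formatting amounts to prefixing the text with chapter and segment markers. A minimal sketch that reproduces the example (a standalone function, not the class method itself):

```python
def format_serialized_commentary(chapter_num: int, seg_idx: int, text: str) -> str:
    # Prefix the text with <chapter><segment> markers, e.g. "<1><2>..."
    return f"<{chapter_num}><{seg_idx}>{text}"


print(format_serialized_commentary(1, 2, "Commentary text"))  # <1><2>Commentary text
```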
    

CommentaryAlignmentTransfer.process_commentary_ann() -> str | None

Processes a single commentary annotation and returns the serialized string.

  • Parameters:

    • ann (dict): The commentary annotation to process

    • root_anns (dict): Dictionary of root annotations

    • root_map (dict): Mapping from root alignment to segmentation

    • root_segmentation_anns (dict): Dictionary of root segmentation annotations

  • Returns: Formatted commentary string or None if not valid

  • Example:

    transfer = CommentaryAlignmentTransfer()
    result = transfer.process_commentary_ann(
        commentary_ann,
        root_anns,
        root_map,
        root_segmentation_anns
    )