API REFERENCES

Pecha

Pecha.from_path()
Pecha.create()
Pecha.base_path()
Pecha.layer_path()
Pecha.metadata_path()
Pecha.get_base()
Pecha.set_base()
Pecha.add_layer()
Pecha.add_annotation()
Pecha.set_metadata()
Pecha.get_layers()
Pecha.get_segmentation_layer_path()
Pecha.get_first_layer_path()
Pecha.get_layer_by_ann_type()
Pecha.get_layer_by_filename()
Pecha.publish()
Pecha.merge_pecha()

DocxRootParser

DocxRootParser.parse()
DocxRootParser.extract_anns()
DocxRootParser.extract_segmentation_anns()
DocxRootParser.extract_alignment_anns()

DocxSimpleCommentaryParser

DocxSimpleCommentaryParser.parse()
DocxSimpleCommentaryParser.extract_anns()
DocxSimpleCommentaryParser.extract_segmentation_anns()
DocxSimpleCommentaryParser.extract_alignment_anns()

DocxAnnotationParser

DocxAnnotationParser.add_annotation()

DocxAnnotationUpdate

DocxAnnotationUpdate.extract_layer_name()
DocxAnnotationUpdate.extract_layer_id()
DocxAnnotationUpdate.extract_layer_enum()
DocxAnnotationUpdate.update_annotation()

TranslationAlignmentTransfer

TranslationAlignmentTransfer.is_empty()
TranslationAlignmentTransfer.get_segmentation_ann_path()
TranslationAlignmentTransfer.map_layer_to_layer()
TranslationAlignmentTransfer.get_root_pechas_mapping()
TranslationAlignmentTransfer.get_translation_pechas_mapping()
TranslationAlignmentTransfer.mapping_to_text_list()
TranslationAlignmentTransfer.get_serialized_translation_alignment()
TranslationAlignmentTransfer.get_serialized_translation_segmentation()

CommentaryAlignmentTransfer

CommentaryAlignmentTransfer.get_first_valid_root_idx()
CommentaryAlignmentTransfer.is_valid_ann()
CommentaryAlignmentTransfer.get_segmentation_ann_path()
CommentaryAlignmentTransfer.index_annotations_by_root()
CommentaryAlignmentTransfer.map_layer_to_layer()
CommentaryAlignmentTransfer.get_root_pechas_mapping()
CommentaryAlignmentTransfer.get_commentary_pechas_mapping()
CommentaryAlignmentTransfer.get_serialized_commentary()
CommentaryAlignmentTransfer.get_serialized_commentary_segmentation()
CommentaryAlignmentTransfer.format_serialized_commentary()
CommentaryAlignmentTransfer.process_commentary_ann()

`Pecha.from_path() -> Pecha`

Loads a Pecha instance from a local path.

Parameters:
- pecha_path (Path): Path to the Pecha directory
Returns: Pecha instance

Example:

from pathlib import Path
from openpecha.pecha import Pecha

pecha = Pecha.from_path(Path("/path/to/pecha"))

`Pecha.create() -> Pecha`

Creates a new Pecha instance in the specified output directory.

Parameters:
- output_path (Path): Directory where the Pecha should be created
- pecha_id (str, optional): Custom Pecha ID. If not provided, a new ID will be generated
Returns: Pecha instance

Example:

from pathlib import Path
from openpecha.pecha import Pecha

pecha = Pecha.create(Path("./output"))

`Pecha.base_path() -> Path`

Returns the path to the base directory which contains all the base files. If the directory does not exist, it is created.

Returns: Path object pointing to the base directory

Example:

base_dir = pecha.base_path
print(base_dir)  # /path/to/pecha/base

`Pecha.layer_path() -> Path`

Returns the path to the layers directory which contains all the annotation files. If the directory does not exist, it is created.

Returns: Path object pointing to the layers directory

Example:

layer_dir = pecha.layer_path
print(layer_dir)  # /path/to/pecha/layers

`Pecha.metadata_path() -> Path`

Returns the path to the metadata file.

Returns: Path object pointing to the metadata file

Example:

metadata_file = pecha.metadata_path
print(metadata_file)  # /path/to/pecha/metadata.json

`Pecha.get_base() -> str`

Gets the content of a base file by its name.

Parameters:
- base_name (str): Name of the base file
Returns: str containing the base text content
Example:
```
base_text = pecha.get_base("base1")
```

`Pecha.set_base() -> str`

Sets the content of a base file.

Parameters:
- content (str): Text content to write to the base file
- base_name (str, optional): Name for the base file. If not provided, a new ID will be generated
Returns: str containing the base name

Example:

base_name = pecha.set_base("This is the text content", "base1")

`Pecha.add_layer() -> Tuple[AnnotationStore, Path]`

Adds a new annotation layer for a given base.

Parameters:
- base_name (str): Name of the base file to associate with this layer
- layer_type (AnnotationType): Type of annotation layer (must be included in AnnotationType enum)
Returns: Tuple of (AnnotationStore, Path) containing:
- AnnotationStore: The created annotation store
- Path: Path to the layer file

Example:

from openpecha.pecha.layer import AnnotationType

# Add a segmentation layer
layer, layer_path = pecha.add_layer("base1", AnnotationType.SEGMENTATION)

# Add a chapter layer
layer, layer_path = pecha.add_layer("base1", AnnotationType.CHAPTER)

Note: The layer file will be created with a name format of {layer_type}-{random_id}.json in the layers directory under the base name folder.

`Pecha.add_annotation() -> AnnotationStore`

Adds an annotation to an existing annotation layer (Annotation Store).

Parameters:
- ann_store (AnnotationStore): The annotation store/layer to add the annotation to
- annotation (BaseAnnotation): The annotation object to add (e.g., SegmentationAnnotation, CitationAnnotation)
- layer_type (AnnotationType): The type of annotation (must match the layer type)
Returns: AnnotationStore with the added annotation

Example:

from openpecha.pecha.annotations import Span, SegmentationAnnotation
from openpecha.pecha.layer import AnnotationType

# Create a segmentation annotation
ann = SegmentationAnnotation(span=Span(start=0, end=10), index=1)

# Add the annotation to the layer
layer = pecha.add_annotation(layer, ann, AnnotationType.SEGMENTATION)

# Save the layer after adding annotations
layer.save()

Note:
- The annotation’s span must be valid for the base text
- The layer_type must match the type of annotation being added
- The layer must be saved after adding annotations to persist the changes

`Pecha.set_metadata() -> PechaMetaData`

Updates the Pecha’s metadata with new values while preserving existing metadata fields if not overridden.

Parameters:
- pecha_metadata (Dict): Dictionary containing metadata fields to update. Can include:
  - title (Dict[str, str] | str): Title in different languages or single language
  - author (List[str] | Dict[str, str] | str): Author(s) information
  - language (str): Language code (e.g., ‘bo’, ‘en’)
  - parser (str): Name of the parser used
  - initial_creation_type (str): How the Pecha was created
  - source_metadata (Dict): Additional source information
  - copyright (Dict): Copyright information
  - licence (str): License type
Returns: Updated PechaMetaData object

Example:

# Update metadata with new values
pecha.set_metadata({
    "title": {"en": "New Title", "bo": "གསར་བཅོས་ཁ་བྱང་།"},
    "author": ["Author 1", "Author 2"],
    "language": "bo",
    "source_metadata": {
        "id": "source123",
        "publisher": "Publisher Name"
    }
})

# Update specific fields while preserving others
pecha.set_metadata({
    "title": {"en": "Updated Title"},
    "copyright": {
        "year": "2024",
        "holder": "Copyright Holder"
    }
})

Note:
- Existing metadata fields not included in the update dictionary will be preserved
- The parser and initial_creation_type fields will be preserved from existing metadata if not specified
- The metadata is automatically saved to the metadata.json file
- Invalid metadata will raise a ValueError

`Pecha.get_layers() -> Generator[Tuple[str, AnnotationStore]`

Returns all layers from the Pecha associated with the given base.

Parameters:
- base_name (str): Name of the base file
- from_cache (bool, optional): Whether to load from cache. Defaults to False
Returns: Generator yielding tuples of (layer_name, AnnotationStore)

Example:

for layer_name, layer_store in pecha.get_layers("base1"):
    print(layer_name, layer_store)

`Pecha.get_segmentation_layer_path() -> str`

Gets the path to the first segmentation layer file.

Returns: str containing the relative path to the segmentation layer file

Example:

layer_path = pecha.get_segmentation_layer_path()

`Pecha.get_first_layer_path() -> str`

Gets the path to the first layer file.

Returns: str containing the relative path to the first layer file

Example:

layer_path = pecha.get_first_layer_path()

`Pecha.get_layer_by_ann_type() -> Union[Tuple[AnnotationStore, Path], Tuple[List[AnnotationStore], List[Path]]]`

Gets layers by annotation type.

Parameters:
- base_name (str): Name of the base file
- layer_type (AnnotationType): Type of annotation to retrieve
Returns: Tuple of (AnnotationStore or list of AnnotationStore, Path or list of Path)

Example:

layer, layer_path = pecha.get_layer_by_ann_type("base1", AnnotationType.SEGMENTATION)

`Pecha.get_layer_by_filename() -> Optional[AnnotationStore]`

Gets a layer by its filename.

Parameters:
- base_name (str): Name of the base file
- filename (str): Name of the layer file
Returns: AnnotationStore or None if not found

Example:

layer = pecha.get_layer_by_filename("base1", "segmentation-1234.json")

`Pecha.publish() -> None`

Publishes the Pecha to GitHub and optionally creates a release with assets.

Parameters:
- asset_path (Path, optional): Path to the asset directory
- asset_name (str, optional): Name for the asset. Defaults to “source_data”
- branch (str, optional): Branch to publish to. Defaults to “main”
- is_private (bool, optional): Whether the repository should be private. Defaults to False

Example:

pecha.publish(
    asset_path=Path("./assets"),
    asset_name="source_data",
    branch="main",
    is_private=False
)

`Pecha.merge_pecha() -> None`

Merges the layers of a source pecha into the current pecha.

Parameters:
- source_pecha (Pecha): The source Pecha instance
- source_base_name (str): The base name of the source pecha
- target_base_name (str): The base name of the target (current) pecha

Example:

pecha.merge_pecha(source_pecha, "source_base", "target_base")

`DocxRootParser.parse() -> Tuple[Pecha, annotation_path]`

Parses a DOCX file and creates a Pecha object with annotations.

Parameters:
- input (str | Path): Path to the DOCX file to be parsed
- annotation_type (AnnotationType): Type of annotation to extract (SEGMENTATION or ALIGNMENT)
- metadata (Dict): Dictionary containing metadata for the Pecha
- output_path (Path, optional): Directory where the Pecha should be created. Defaults to PECHAS_PATH
- pecha_id (str | None, optional): Custom Pecha ID. If not provided, a new ID will be generated
Returns: Tuple containing:
- Pecha: The created Pecha instance
- annotation_path: Path to the created annotation layer file

Example:

from pathlib import Path
from openpecha.pecha.layer import AnnotationType
from openpecha.pecha.parsers.docx.root import DocxRootParser

parser = DocxRootParser()
pecha, layer_path = parser.parse(
    input="path/to/file.docx",
    annotation_type=AnnotationType.SEGMENTATION,
    metadata={"title": "Sample Title"},
    output_path=Path("./output")
)

`DocxRootParser.extract_anns() -> Tuple[List[BaseAnnotation], str]`

Extracts text and annotations from a DOCX file.

Parameters:
- docx_file (Path): Path to the DOCX file
- annotation_type (AnnotationType): Type of annotation to extract (SEGMENTATION or ALIGNMENT)
Returns: Tuple containing:
- List[BaseAnnotation]: List of extracted annotations
- str: The extracted base text

Example:

from pathlib import Path
from openpecha.pecha.layer import AnnotationType
from openpecha.pecha.parsers.docx.root import DocxRootParser

parser = DocxRootParser()
anns, base = parser.extract_anns(
    Path("path/to/file.docx"),
    AnnotationType.SEGMENTATION
)

`DocxRootParser.extract_segmentation_anns() -> Tuple[List[SegmentationAnnotation], str]`

Extracts segmentation annotations from numbered text.

Parameters:
- numbered_text (Dict[str, str]): Dictionary mapping segment numbers to text content
Returns: Tuple containing:
- List[SegmentationAnnotation]: List of segmentation annotations
- str: The concatenated base text

Example:

from openpecha.pecha.parsers.docx.root import DocxRootParser

parser = DocxRootParser()
numbered_text = {
    "1": "First segment",
    "2": "Second segment"
}
anns, base = parser.extract_segmentation_anns(numbered_text)

`DocxRootParser.extract_alignment_anns() -> Tuple[List[AlignmentAnnotation], str]`

Extracts alignment annotations from numbered text.

Parameters:
- numbered_text (Dict[str, str]): Dictionary mapping segment numbers to text content
Returns: Tuple containing:
- List[AlignmentAnnotation]: List of alignment annotations
- str: The concatenated base text

Example:

from openpecha.pecha.parsers.docx.root import DocxRootParser

parser = DocxRootParser()
numbered_text = {
    "1": "First segment",
    "2": "Second segment"
}
anns, base = parser.extract_alignment_anns(numbered_text)

`DocxSimpleCommentaryParser.parse() -> Tuple[Pecha, annotation_path]`

Parses a DOCX file and creates a commentary Pecha object with annotations.

Parameters:
- input (str | Path): Path to the DOCX file to be parsed
- annotation_type (AnnotationType): Type of annotation to extract (SEGMENTATION or ALIGNMENT)
- metadata (Dict[str, Any]): Dictionary containing metadata for the Pecha
- output_path (Path, optional): Directory where the Pecha should be created. Defaults to PECHAS_PATH
- pecha_id (str | None, optional): Custom Pecha ID. If not provided, a new ID will be generated
Returns: Tuple containing:
- Pecha: The created Pecha instance
- annotation_path: Path to the created annotation layer file

Example:

from pathlib import Path
from openpecha.pecha.layer import AnnotationType
from openpecha.pecha.parsers.docx.commentary.simple import DocxSimpleCommentaryParser

parser = DocxSimpleCommentaryParser()
pecha, layer_path = parser.parse(
    input="path/to/commentary.docx",
    annotation_type=AnnotationType.ALIGNMENT,
    metadata={"title": "Commentary Title", "commentary_of": "P0001"},
    output_path=Path("./output")
)

`DocxSimpleCommentaryParser.extract_anns() -> Tuple[List[BaseAnnotation], str]`

Extracts text and annotations from a commentary DOCX file.

Parameters:
- docx_file (Path): Path to the DOCX file
- annotation_type (AnnotationType): Type of annotation to extract (SEGMENTATION or ALIGNMENT)
Returns: Tuple containing:
- List[BaseAnnotation]: List of extracted annotations
- str: The extracted base text

Example:

from pathlib import Path
from openpecha.pecha.layer import AnnotationType
from openpecha.pecha.parsers.docx.commentary.simple import DocxSimpleCommentaryParser

parser = DocxSimpleCommentaryParser()
anns, base = parser.extract_anns(
    Path("path/to/commentary.docx"),
    AnnotationType.ALIGNMENT
)

`DocxSimpleCommentaryParser.extract_segmentation_anns() -> Tuple[List[SegmentationAnnotation], str]`

Extracts segmentation annotations from numbered commentary text.

Parameters:
- numbered_text (Dict[str, str]): Dictionary mapping segment numbers to text content
Returns: Tuple containing:
- List[SegmentationAnnotation]: List of segmentation annotations
- str: The concatenated base text

Example:

from openpecha.pecha.parsers.docx.commentary.simple import DocxSimpleCommentaryParser

parser = DocxSimpleCommentaryParser()
numbered_text = {
    "1": "First commentary segment",
    "2": "Second commentary segment"
}
anns, base = parser.extract_segmentation_anns(numbered_text)

`DocxSimpleCommentaryParser.extract_alignment_anns() -> Tuple[List[AlignmentAnnotation], str]`

Extracts alignment annotations from numbered commentary text, handling root text references.

Parameters:
- numbered_text (Dict[str, str]): Dictionary mapping segment numbers to text content
Returns: Tuple containing:
- List[AlignmentAnnotation]: List of alignment annotations with root text references
- str: The concatenated base text

Example:

from openpecha.pecha.parsers.docx.commentary.simple import DocxSimpleCommentaryParser

parser = DocxSimpleCommentaryParser()
numbered_text = {
    "1": "1-2 First commentary segment",
    "2": "3-4 Second commentary segment"
}
anns, base = parser.extract_alignment_anns(numbered_text)

Note: The commentary text can include root text references in the format “1-2 Commentary text” where “1-2” refers to the root text segments being commented on.

`DocxAnnotationParser.add_annotation() -> Tuple[Pecha, annotation_path]`

Adds annotations to an existing Pecha from a DOCX file.

Parameters:
- pecha (Pecha): The Pecha instance to add annotations to
- type (AnnotationType | str): Type of annotation to extract (ALIGNMENT, SEGMENTATION, or FOOTNOTE)
- docx_file (Path): Path to the DOCX file containing annotations
- metadatas (List[Any]): List of metadata objects to determine if the Pecha is root-related
Returns: Tuple containing:
- Pecha: The updated Pecha instance
- annotation_path: Path to the created annotation layer file

Example:

from pathlib import Path
from openpecha.pecha.layer import AnnotationType
from openpecha.pecha.parsers.docx.annotation import DocxAnnotationParser

parser = DocxAnnotationParser()
pecha, layer_path = parser.add_annotation(
    pecha=existing_pecha,
    type=AnnotationType.FOOTNOTE,
    docx_file=Path("path/to/annotations.docx"),
    metadatas=[metadata]
)

Note:
- The parser supports three types of annotations: ALIGNMENT, SEGMENTATION, and FOOTNOTE
- For FOOTNOTE annotations, it uses DocxFootnoteParser
- For root-related Pechas, it uses DocxRootParser
- For other cases, it uses DocxSimpleCommentaryParser
- The coordinates of annotations are automatically updated to match the base text

`DocxAnnotationUpdate.extract_layer_name() -> str`

Extracts the layer name from a layer path.

Parameters:
- layer_path (str): Path to the layer file
Returns: str containing the layer name (filename without extension)

Example:

updater = DocxAnnotationUpdate()
layer_name = updater.extract_layer_name("path/to/segmentation-1234.json")
print(layer_name)  # "segmentation-1234"

`DocxAnnotationUpdate.extract_layer_id() -> str`

Extracts the layer ID from a layer path.

Parameters:
- layer_path (str): Path to the layer file
Returns: str containing the layer ID (last part of the filename after the hyphen)

Example:

updater = DocxAnnotationUpdate()
layer_id = updater.extract_layer_id("path/to/segmentation-1234.json")
print(layer_id)  # "1234"

`DocxAnnotationUpdate.extract_layer_enum() -> AnnotationType`

Extracts the annotation type from a layer path.

Parameters:
- layer_path (str): Path to the layer file
Returns: AnnotationType enum value corresponding to the layer type

Example:

updater = DocxAnnotationUpdate()
layer_type = updater.extract_layer_enum("path/to/segmentation-1234.json")
print(layer_type)  # AnnotationType.SEGMENTATION

`DocxAnnotationUpdate.update_annotation() -> Pecha`

Updates annotations in an existing Pecha from a DOCX file while preserving the layer ID.

Parameters:
- pecha (Pecha): The Pecha instance to update annotations in
- annotation_path (str): Path to the existing annotation layer file
- docx_file (Path): Path to the DOCX file containing new annotations
- metadatas (List[Any]): List of metadata objects to determine if the Pecha is root-related
Returns: Updated Pecha instance

Example:

from pathlib import Path
from openpecha.pecha.parsers.docx.update import DocxAnnotationUpdate

updater = DocxAnnotationUpdate()
updated_pecha = updater.update_annotation(
    pecha=existing_pecha,
    annotation_path="path/to/segmentation-1234.json",
    docx_file=Path("path/to/updated_annotations.docx"),
    metadatas=[metadata]
)

Note:
- The method preserves the original layer ID when updating annotations
- It automatically determines the annotation type from the existing layer path
- Uses DocxAnnotationParser internally to handle the actual annotation update

`TranslationAlignmentTransfer.is_empty() -> bool`

Checks if a text string is empty (contains only whitespace and newlines).

Parameters:
- text (str): The text to check
Returns: bool indicating if the text is empty

Example:

transfer = TranslationAlignmentTransfer()
is_empty = transfer.is_empty("  \n  ")  # True
is_empty = transfer.is_empty("Some text")  # False

`TranslationAlignmentTransfer.get_segmentation_ann_path() -> Path`

Gets the path to the first segmentation layer JSON file in a Pecha.

Parameters:
- pecha (Pecha): The Pecha instance to search in
Returns: Path object pointing to the segmentation layer file

Example:

transfer = TranslationAlignmentTransfer()
seg_path = transfer.get_segmentation_ann_path(pecha)

`TranslationAlignmentTransfer.map_layer_to_layer() -> Dict[int, List[int]]`

Maps annotations from source layer to target layer based on span overlap or containment.

Parameters:
- src_layer (AnnotationStore): Source annotation layer
- tgt_layer (AnnotationStore): Target annotation layer
Returns: Dictionary mapping source indices to lists of target indices

Example:

transfer = TranslationAlignmentTransfer()
mapping = transfer.map_layer_to_layer(source_layer, target_layer)

Note:
- Maps based on span overlap or containment
- Excludes edge overlaps
- Returns a sorted dictionary

`TranslationAlignmentTransfer.get_root_pechas_mapping() -> Dict[int, List[int]]`

Gets mapping from a Pecha’s alignment layer to its segmentation layer.

Parameters:
- pecha (Pecha): The Pecha instance
- alignment_id (str): ID of the alignment layer
Returns: Dictionary mapping alignment indices to segmentation indices

Example:

transfer = TranslationAlignmentTransfer()
mapping = transfer.get_root_pechas_mapping(pecha, "alignment-1234.json")

`TranslationAlignmentTransfer.get_translation_pechas_mapping() -> Dict[int, List]`

Gets mapping from segmentation to alignment layer in a translation Pecha.

Parameters:
- pecha (Pecha): The translation Pecha instance
- alignment_id (str): ID of the alignment layer
- segmentation_id (str): ID of the segmentation layer
Returns: Dictionary mapping segmentation indices to alignment indices

Example:

transfer = TranslationAlignmentTransfer()
mapping = transfer.get_translation_pechas_mapping(
    pecha,
    "alignment-1234.json",
    "segmentation-5678.json"
)

`TranslationAlignmentTransfer.mapping_to_text_list() -> List[str]`

Flattens a mapping from translation to root text into a list of texts.

Parameters:
- mapping (Dict[int, List[str]]): Mapping of indices to text lists
Returns: List of texts, with empty strings for missing indices

Example:

transfer = TranslationAlignmentTransfer()
texts = transfer.mapping_to_text_list({1: ["text1"], 3: ["text2"]})
# ["text1", "", "text2"]

`TranslationAlignmentTransfer.get_serialized_translation_alignment() -> List[str]`

Serializes root translation alignment text mapped to root segmentation text.

Parameters:
- root_pecha (Pecha): The root Pecha instance
- root_alignment_id (str): ID of the root alignment layer
- root_translation_pecha (Pecha): The translation Pecha instance
- translation_alignment_id (str): ID of the translation alignment layer
Returns: List of texts aligned with root segmentation

Example:

transfer = TranslationAlignmentTransfer()
texts = transfer.get_serialized_translation_alignment(
    root_pecha,
    "alignment-1234.json",
    translation_pecha,
    "alignment-5678.json"
)

`TranslationAlignmentTransfer.get_serialized_translation_segmentation() -> List[str]`

Serializes root translation segmentation text mapped to root segmentation text.

Parameters:
- root_pecha (Pecha): The root Pecha instance
- root_alignment_id (str): ID of the root alignment layer
- translation_pecha (Pecha): The translation Pecha instance
- translation_alignment_id (str): ID of the translation alignment layer
- translation_segmentation_id (str): ID of the translation segmentation layer
Returns: List of texts aligned with root segmentation

Example:

transfer = TranslationAlignmentTransfer()
texts = transfer.get_serialized_translation_segmentation(
    root_pecha,
    "alignment-1234.json",
    translation_pecha,
    "alignment-5678.json",
    "segmentation-9012.json"
)

`CommentaryAlignmentTransfer.get_first_valid_root_idx() -> int | None`

Gets the first valid root index from an annotation’s alignment index.

Parameters:
- ann (dict): The annotation dictionary containing alignment_index
Returns: First valid root index or None if no valid indices found

Example:

transfer = CommentaryAlignmentTransfer()
idx = transfer.get_first_valid_root_idx({"alignment_index": "1,2-4"})  # 1

`CommentaryAlignmentTransfer.is_valid_ann() -> bool`

Checks if an annotation is valid (exists and has non-empty text).

Parameters:
- anns (Dict[int, Dict[str, Any]]): Dictionary of annotations
- idx (int): Index to check
Returns: bool indicating if the annotation is valid

Example:

transfer = CommentaryAlignmentTransfer()
is_valid = transfer.is_valid_ann(annotations, 1)

`CommentaryAlignmentTransfer.get_segmentation_ann_path() -> Path`

Gets the path to the first segmentation layer JSON file in a Pecha.

Parameters:
- pecha (Pecha): The Pecha instance to search in
Returns: Path object pointing to the segmentation layer file

Example:

transfer = CommentaryAlignmentTransfer()
seg_path = transfer.get_segmentation_ann_path(pecha)

`CommentaryAlignmentTransfer.index_annotations_by_root() -> Dict[int, Dict[str, Any]]`

Indexes annotations by their root index.

Parameters:
- anns (List[Dict[str, Any]]): List of annotation dictionaries
Returns: Dictionary mapping root indices to annotation dictionaries

Example:

transfer = CommentaryAlignmentTransfer()
indexed_anns = transfer.index_annotations_by_root(annotations)

`CommentaryAlignmentTransfer.map_layer_to_layer() -> Dict[int, List[int]]`

Maps annotations from source layer to target layer based on span overlap or containment.

Parameters:
- src_layer (AnnotationStore): Source annotation layer
- tgt_layer (AnnotationStore): Target annotation layer
Returns: Dictionary mapping source indices to lists of target indices

Example:

transfer = CommentaryAlignmentTransfer()
mapping = transfer.map_layer_to_layer(source_layer, target_layer)

Note:
- Maps based on span overlap or containment
- Excludes edge overlaps
- Returns a sorted dictionary
- Handles complex alignment indices (e.g., “1,2-4”)

`CommentaryAlignmentTransfer.get_root_pechas_mapping() -> Dict[int, List[int]]`

Gets mapping from a Pecha’s alignment layer to its segmentation layer.

Parameters:
- pecha (Pecha): The Pecha instance
- alignment_id (str): ID of the alignment layer
Returns: Dictionary mapping alignment indices to segmentation indices

Example:

transfer = CommentaryAlignmentTransfer()
mapping = transfer.get_root_pechas_mapping(pecha, "alignment-1234.json")

`CommentaryAlignmentTransfer.get_commentary_pechas_mapping() -> Dict[int, List[int]]`

Gets mapping from commentary Pecha’s segmentation layer to alignment layer.

Parameters:
- pecha (Pecha): The commentary Pecha instance
- alignment_id (str): ID of the alignment layer
- segmentation_id (str): ID of the segmentation layer
Returns: Dictionary mapping segmentation indices to alignment indices

Example:

transfer = CommentaryAlignmentTransfer()
mapping = transfer.get_commentary_pechas_mapping(
    pecha,
    "alignment-1234.json",
    "segmentation-5678.json"
)

`CommentaryAlignmentTransfer.get_serialized_commentary() -> List[str]`

Serializes commentary annotations with root/segmentation mapping and formatting.

Parameters:
- root_pecha (Pecha): The root Pecha instance
- root_alignment_id (str): ID of the root alignment layer
- commentary_pecha (Pecha): The commentary Pecha instance
- commentary_alignment_id (str): ID of the commentary alignment layer
Returns: List of formatted commentary texts

Example:

transfer = CommentaryAlignmentTransfer()
texts = transfer.get_serialized_commentary(
    root_pecha,
    "alignment-1234.json",
    commentary_pecha,
    "alignment-5678.json"
)

`CommentaryAlignmentTransfer.get_serialized_commentary_segmentation() -> List[str]`

Serializes commentary segmentation annotations with root/segmentation mapping and formatting.

Parameters:
- root_pecha (Pecha): The root Pecha instance
- root_alignment_id (str): ID of the root alignment layer
- commentary_pecha (Pecha): The commentary Pecha instance
- commentary_alignment_id (str): ID of the commentary alignment layer
- commentary_segmentation_id (str): ID of the commentary segmentation layer
Returns: List of formatted commentary texts

Example:

transfer = CommentaryAlignmentTransfer()
texts = transfer.get_serialized_commentary_segmentation(
    root_pecha,
    "alignment-1234.json",
    commentary_pecha,
    "alignment-5678.json",
    "segmentation-9012.json"
)

`CommentaryAlignmentTransfer.format_serialized_commentary() -> str`

Formats a commentary text with chapter and segment information.

Parameters:
- chapter_num (int): Chapter number
- seg_idx (int): Segment index
- text (str): Commentary text
Returns: Formatted string in the format “text”

Example:

transfer = CommentaryAlignmentTransfer()
formatted = transfer.format_serialized_commentary(1, 2, "Commentary text")
# "<1><2>Commentary text"

`CommentaryAlignmentTransfer.process_commentary_ann() -> str | None`

Processes a single commentary annotation and returns the serialized string.

Parameters:
- ann (dict): The commentary annotation to process
- root_anns (dict): Dictionary of root annotations
- root_map (dict): Mapping from root alignment to segmentation
- root_segmentation_anns (dict): Dictionary of root segmentation annotations
Returns: Formatted commentary string or None if not valid

Example:

transfer = CommentaryAlignmentTransfer()
result = transfer.process_commentary_ann(
    commentary_ann,
    root_anns,
    root_map,
    root_segmentation_anns
)

API REFERENCES

Pecha

DocxRootParser

DocxSimpleCommentaryParser

DocxAnnotationParser

DocxAnnotationUpdate

TranslationAlignmentTransfer

CommentaryAlignmentTransfer

Pecha.from_path() -> Pecha

Pecha.create() -> Pecha

Pecha.base_path() -> Path

Pecha.layer_path() -> Path

Pecha.metadata_path() -> Path

Pecha.get_base() -> str

Pecha.set_base() -> str

Pecha.add_layer() -> Tuple[AnnotationStore, Path]

Pecha.add_annotation() -> AnnotationStore

Pecha.set_metadata() -> PechaMetaData

Pecha.get_layers() -> Generator[Tuple[str, AnnotationStore]

Pecha.get_segmentation_layer_path() -> str

Pecha.get_first_layer_path() -> str

Pecha.get_layer_by_ann_type() -> Union[Tuple[AnnotationStore, Path], Tuple[List[AnnotationStore], List[Path]]]

Pecha.get_layer_by_filename() -> Optional[AnnotationStore]

Pecha.publish() -> None

Pecha.merge_pecha() -> None

DocxRootParser.parse() -> Tuple[Pecha, annotation_path]

DocxRootParser.extract_anns() -> Tuple[List[BaseAnnotation], str]

DocxRootParser.extract_segmentation_anns() -> Tuple[List[SegmentationAnnotation], str]

DocxRootParser.extract_alignment_anns() -> Tuple[List[AlignmentAnnotation], str]

DocxSimpleCommentaryParser.parse() -> Tuple[Pecha, annotation_path]

DocxSimpleCommentaryParser.extract_anns() -> Tuple[List[BaseAnnotation], str]

DocxSimpleCommentaryParser.extract_segmentation_anns() -> Tuple[List[SegmentationAnnotation], str]

DocxSimpleCommentaryParser.extract_alignment_anns() -> Tuple[List[AlignmentAnnotation], str]

DocxAnnotationParser.add_annotation() -> Tuple[Pecha, annotation_path]

DocxAnnotationUpdate.extract_layer_name() -> str

DocxAnnotationUpdate.extract_layer_id() -> str

DocxAnnotationUpdate.extract_layer_enum() -> AnnotationType

DocxAnnotationUpdate.update_annotation() -> Pecha

TranslationAlignmentTransfer.is_empty() -> bool

TranslationAlignmentTransfer.get_segmentation_ann_path() -> Path

TranslationAlignmentTransfer.map_layer_to_layer() -> Dict[int, List[int]]

TranslationAlignmentTransfer.get_root_pechas_mapping() -> Dict[int, List[int]]

TranslationAlignmentTransfer.get_translation_pechas_mapping() -> Dict[int, List]

TranslationAlignmentTransfer.mapping_to_text_list() -> List[str]

TranslationAlignmentTransfer.get_serialized_translation_alignment() -> List[str]

TranslationAlignmentTransfer.get_serialized_translation_segmentation() -> List[str]

CommentaryAlignmentTransfer.get_first_valid_root_idx() -> int | None

CommentaryAlignmentTransfer.is_valid_ann() -> bool

CommentaryAlignmentTransfer.get_segmentation_ann_path() -> Path

CommentaryAlignmentTransfer.index_annotations_by_root() -> Dict[int, Dict[str, Any]]

CommentaryAlignmentTransfer.map_layer_to_layer() -> Dict[int, List[int]]

CommentaryAlignmentTransfer.get_root_pechas_mapping() -> Dict[int, List[int]]

CommentaryAlignmentTransfer.get_commentary_pechas_mapping() -> Dict[int, List[int]]

CommentaryAlignmentTransfer.get_serialized_commentary() -> List[str]

CommentaryAlignmentTransfer.get_serialized_commentary_segmentation() -> List[str]

CommentaryAlignmentTransfer.format_serialized_commentary() -> str

CommentaryAlignmentTransfer.process_commentary_ann() -> str | None

`Pecha.from_path() -> Pecha`

`Pecha.create() -> Pecha`

`Pecha.base_path() -> Path`

`Pecha.layer_path() -> Path`

`Pecha.metadata_path() -> Path`

`Pecha.get_base() -> str`

`Pecha.set_base() -> str`

`Pecha.add_layer() -> Tuple[AnnotationStore, Path]`

`Pecha.add_annotation() -> AnnotationStore`

`Pecha.set_metadata() -> PechaMetaData`

`Pecha.get_layers() -> Generator[Tuple[str, AnnotationStore]`

`Pecha.get_segmentation_layer_path() -> str`

`Pecha.get_first_layer_path() -> str`

`Pecha.get_layer_by_ann_type() -> Union[Tuple[AnnotationStore, Path], Tuple[List[AnnotationStore], List[Path]]]`

`Pecha.get_layer_by_filename() -> Optional[AnnotationStore]`

`Pecha.publish() -> None`

`Pecha.merge_pecha() -> None`

`DocxRootParser.parse() -> Tuple[Pecha, annotation_path]`

`DocxRootParser.extract_anns() -> Tuple[List[BaseAnnotation], str]`

`DocxRootParser.extract_segmentation_anns() -> Tuple[List[SegmentationAnnotation], str]`

`DocxRootParser.extract_alignment_anns() -> Tuple[List[AlignmentAnnotation], str]`

`DocxSimpleCommentaryParser.parse() -> Tuple[Pecha, annotation_path]`

`DocxSimpleCommentaryParser.extract_anns() -> Tuple[List[BaseAnnotation], str]`

`DocxSimpleCommentaryParser.extract_segmentation_anns() -> Tuple[List[SegmentationAnnotation], str]`

`DocxSimpleCommentaryParser.extract_alignment_anns() -> Tuple[List[AlignmentAnnotation], str]`

`DocxAnnotationParser.add_annotation() -> Tuple[Pecha, annotation_path]`

`DocxAnnotationUpdate.extract_layer_name() -> str`

`DocxAnnotationUpdate.extract_layer_id() -> str`

`DocxAnnotationUpdate.extract_layer_enum() -> AnnotationType`

`DocxAnnotationUpdate.update_annotation() -> Pecha`

`TranslationAlignmentTransfer.is_empty() -> bool`

`TranslationAlignmentTransfer.get_segmentation_ann_path() -> Path`

`TranslationAlignmentTransfer.map_layer_to_layer() -> Dict[int, List[int]]`

`TranslationAlignmentTransfer.get_root_pechas_mapping() -> Dict[int, List[int]]`

`TranslationAlignmentTransfer.get_translation_pechas_mapping() -> Dict[int, List]`

`TranslationAlignmentTransfer.mapping_to_text_list() -> List[str]`

`TranslationAlignmentTransfer.get_serialized_translation_alignment() -> List[str]`

`TranslationAlignmentTransfer.get_serialized_translation_segmentation() -> List[str]`

`CommentaryAlignmentTransfer.get_first_valid_root_idx() -> int | None`

`CommentaryAlignmentTransfer.is_valid_ann() -> bool`

`CommentaryAlignmentTransfer.get_segmentation_ann_path() -> Path`

`CommentaryAlignmentTransfer.index_annotations_by_root() -> Dict[int, Dict[str, Any]]`

`CommentaryAlignmentTransfer.map_layer_to_layer() -> Dict[int, List[int]]`

`CommentaryAlignmentTransfer.get_root_pechas_mapping() -> Dict[int, List[int]]`

`CommentaryAlignmentTransfer.get_commentary_pechas_mapping() -> Dict[int, List[int]]`

`CommentaryAlignmentTransfer.get_serialized_commentary() -> List[str]`

`CommentaryAlignmentTransfer.get_serialized_commentary_segmentation() -> List[str]`

`CommentaryAlignmentTransfer.format_serialized_commentary() -> str`

`CommentaryAlignmentTransfer.process_commentary_ann() -> str | None`