Struct AnnotatedText
Defined in File annotation.h
Struct Documentation
-
struct AnnotatedText
AnnotatedText is effectively a std::string text plus an Annotation, and additionally provides the following:
Access to processed string_views for convenience, rather than ByteRanges (which only provide index information).
Transparently convert string_views into ByteRanges for the Annotation referring to the text bound by this structure.
Bind the text and annotations together, to move around as a meaningful unit.
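The relationship between string_views and ByteRanges described above can be illustrated with a minimal sketch. It is not the library's implementation: `ByteRangeSketch`, `toByteRange`, and `toView` are hypothetical names introduced here, and the sketch assumes a ByteRange is a half-open `[begin, end)` pair of byte offsets into the text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for the library's ByteRange: assumed to be a
// half-open [begin, end) interval of byte offsets into the text.
struct ByteRangeSketch {
  std::size_t begin;
  std::size_t end;
};

// Convert a view that points into `text` into offsets relative to text.data().
inline ByteRangeSketch toByteRange(const std::string &text, std::string_view view) {
  const char *base = text.data();
  // The view must lie entirely inside the text it is being rebased against.
  assert(view.data() >= base && view.data() + view.size() <= base + text.size());
  std::size_t begin = static_cast<std::size_t>(view.data() - base);
  return {begin, begin + view.size()};
}

// Convert offsets back into a view. The view stays valid only while `text`
// is alive and is not modified.
inline std::string_view toView(const std::string &text, ByteRangeSketch range) {
  return std::string_view(text).substr(range.begin, range.end - range.begin);
}
```

Offsets remain meaningful when the string is moved, whereas a string_view may dangle after a move or reallocation, which is one plausible reason to bind the text and its annotation into a single unit.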
Public Functions
-
AnnotatedText()
Construct an empty AnnotatedText.
This is useful when the target string or ByteRanges are not known yet; the public members can be used to populate it later. One use case is translated text that is created by decoding from histories, where the ByteRanges are only known after the string has been constructed.
-
AnnotatedText(std::string &&text)
Construct by moving in a string (for efficiency, the copying string constructor is disallowed).
-
void appendSentence(string_view prefix, std::vector<string_view>::iterator tokens_begin, std::vector<string_view>::iterator tokens_end)
Appends a sentence to the existing text and transparently rebases string_views.
Since this tracks only the prefix, remember to call appendEndingWhitespace for the whitespace at the end of the input. The string_views must not already be in text.
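As a rough illustration of what "rebasing" means here, the sketch below copies each token into the owned text and records where it landed. It is a simplified model under stated assumptions, not the library's code: `AnnotatedTextSketch` and `ByteRangeSketch` are hypothetical, the method takes a vector instead of an iterator pair, and tokens are appended one by one.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for the library's ByteRange (half-open byte interval).
struct ByteRangeSketch {
  std::size_t begin;
  std::size_t end;
};

// Simplified, hypothetical model: owned text plus per-sentence token ranges.
struct AnnotatedTextSketch {
  std::string text;
  std::vector<std::vector<ByteRangeSketch>> sentences;

  // Append `prefix`, then every token, recording each token's new offsets.
  void appendSentence(std::string_view prefix,
                      const std::vector<std::string_view> &tokens) {
    text.append(prefix);
    std::vector<ByteRangeSketch> ranges;
    for (std::string_view token : tokens) {
      std::size_t begin = text.size();
      text.append(token);  // bytes are copied, so the source view is no longer needed
      ranges.push_back({begin, text.size()});
    }
    sentences.push_back(std::move(ranges));
  }

  // Only prefixes are tracked per sentence, so trailing text is added separately.
  void appendEndingWhitespace(std::string_view whitespace) {
    text.append(whitespace);
  }

  // Rebuild a view on demand from the stored offsets.
  std::string_view word(std::size_t sentenceIdx, std::size_t wordIdx) const {
    const ByteRangeSketch &r = sentences[sentenceIdx][wordIdx];
    return std::string_view(text).substr(r.begin, r.end - r.begin);
  }
};
```

Storing offsets rather than views matters in this sketch because appending can reallocate the string, which would invalidate any view into it. That is also a likely reason the input views must not already point into text.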
-
void appendEndingWhitespace(string_view whitespace)
Append the whitespace at the end of input.
The string_view must not already be in text.
-
void recordExistingSentence(std::vector<string_view>::iterator tokens_begin, std::vector<string_view>::iterator tokens_end, const char *sentence_begin)
Record the existence of a sentence that is already in text.
The iterators are over string_views for each token, all of which must already be in text. This function must be called to record sentences in order. Normally the beginning of the sentence can be inferred from tokens_begin->data(), but the token range could be empty, so sentence_begin is required to know where the sentence is.
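The need for sentence_begin can be seen in a small sketch. This is a hypothetical model, not the library's implementation: `SentenceRecordSketch` and `recordExisting` are names invented for illustration, and the sketch assumes tokens appear in order within the text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical record of one sentence as byte offsets into the text.
struct SentenceRecordSketch {
  std::size_t begin;                     // offset of the sentence start
  std::size_t end;                       // offset one past the sentence end
  std::vector<std::size_t> tokenBegins;  // start offset of each token
};

// Record a sentence whose tokens are views into `text`.
inline SentenceRecordSketch recordExisting(
    const std::string &text,
    std::vector<std::string_view>::const_iterator tokensBegin,
    std::vector<std::string_view>::const_iterator tokensEnd,
    const char *sentenceBegin) {
  const char *base = text.data();
  assert(sentenceBegin >= base && sentenceBegin <= base + text.size());
  SentenceRecordSketch record{};
  // With an empty token range there is no tokensBegin->data() to inspect,
  // so the explicit pointer is the only source for the sentence position.
  record.begin = static_cast<std::size_t>(sentenceBegin - base);
  record.end = record.begin;
  for (auto it = tokensBegin; it != tokensEnd; ++it) {
    std::size_t tokenBegin = static_cast<std::size_t>(it->data() - base);
    // Tokens are assumed to be in order and inside the text.
    assert(tokenBegin >= record.end && tokenBegin + it->size() <= text.size());
    record.tokenBegins.push_back(tokenBegin);
    record.end = tokenBegin + it->size();
  }
  return record;
}
```

In this sketch an empty sentence is stored as a zero-length range at the given position, so later sentences and gaps can still be located relative to it.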
-
const size_t numSentences() const
Returns the number of sentences in the annotation structure.
-
const size_t numWords(size_t sentenceIdx) const
Returns the number of words in the sentence identified by sentenceIdx.
-
string_view word(size_t sentenceIdx, size_t wordIdx) const
Returns a string_view representing wordIdx in sentenceIdx.
-
string_view sentence(size_t sentenceIdx) const
Returns a string_view representing the sentence corresponding to sentenceIdx.
-
string_view gap(size_t sentenceIdx) const
Returns the string_view of the gap between two sentences in the container.
More precisely, where i = sentenceIdx and N = numSentences() for brevity:
For i = 0: the gap between the start of text and the 0th sentence.
For i = 1...N-1: the gap between the (i-1)-th and i-th sentences.
For i = N: the gap between the last ((N-1)-th) sentence and the end of text.
Parameters
sentenceIdx: Can be in the range [0, numSentences()].
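The indexing of gaps can be sketched as follows. This is an illustration under assumptions, not the library's code: `ByteRangeSketch` and `gapSketch` are hypothetical, sentences are taken to be zero-indexed half-open byte ranges stored in order, and N sentences are taken to yield N + 1 gaps, so that the gap at index i with 0 < i < N sits between sentences i - 1 and i.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the library's ByteRange (half-open byte interval).
struct ByteRangeSketch {
  std::size_t begin;
  std::size_t end;
};

// Return the text between sentences; valid indices are [0, sentences.size()].
inline std::string_view gapSketch(const std::string &text,
                                  const std::vector<ByteRangeSketch> &sentences,
                                  std::size_t i) {
  assert(i <= sentences.size());
  // Gap 0 starts at the beginning of the text; otherwise after sentence i - 1.
  std::size_t begin = (i == 0) ? 0 : sentences[i - 1].end;
  // Gap N ends at the end of the text; otherwise where sentence i starts.
  std::size_t end = (i == sentences.size()) ? text.size() : sentences[i].begin;
  return std::string_view(text).substr(begin, end - begin);
}
```

Under these assumptions, concatenating gap 0, sentence 0, gap 1, and so on up to gap N reproduces the whole text.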
-
ByteRange wordAsByteRange(size_t sentenceIdx, size_t wordIdx) const
Returns a ByteRange representing wordIdx in sentenceIdx.
-
ByteRange sentenceAsByteRange(size_t sentenceIdx) const
Returns a ByteRange representing the sentence corresponding to sentenceIdx.
-
template<typename Fun>
AnnotatedText apply(Fun fun) const
Utility function to call fun on each word (effectively, each subword token) in an AnnotatedText. fun is called with the ByteRange, the string_view with the word, and a bool to indicate whether it is the last word in the AnnotatedText, which is also the ending whitespace slot of AnnotatedText.
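The callback shape described above (a range, a view, and a last-word flag) can be sketched with a simplified visitor. This is only an approximation: `forEachWordSketch` and `ByteRangeSketch` are hypothetical, the word ranges are passed as a flat list, and the sketch returns nothing, so it does not model how apply builds the AnnotatedText it returns.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the library's ByteRange (half-open byte interval).
struct ByteRangeSketch {
  std::size_t begin;
  std::size_t end;
};

// Call fun(range, view, isLast) for every word range, in order.
template <typename Fun>
void forEachWordSketch(const std::string &text,
                       const std::vector<ByteRangeSketch> &words, Fun fun) {
  for (std::size_t i = 0; i < words.size(); ++i) {
    const ByteRangeSketch &range = words[i];
    std::string_view view =
        std::string_view(text).substr(range.begin, range.end - range.begin);
    // The flag is true only for the final entry.
    fun(range, view, i + 1 == words.size());
  }
}
```

A caller could, for example, pass a lambda that collects the views or rewrites each word, using the flag to treat the final slot differently.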
Public Members
-
std::string text
The blob of string that the elements in annotation refer to.
-
Annotation annotation
Sentence and (sub-)word annotations.