Generative AI Foundations in Python
Now that we have leveraged LangChain to load multiple models and prepared testing data, we are ready to apply evaluation metrics. These metrics capture accuracy and alignment with product images and will help us assess how well the models generate product descriptions compared to humans. As discussed, we focus on two categories of metrics, lexical and semantic similarity, which measure how many of the same words were used and how much semantic information is shared between the human-written and AI-generated product descriptions.
In the following code block, we apply BLEU, ROUGE, and METEOR to evaluate the lexical similarity between the generated text and the reference text. Each of these metrics is reference-based: it assumes we are scoring the model's output against a human-written reference. We have already set aside our reference descriptions (or gold standard) for a diverse set of products to compare side-by-side with the...
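To make the intuition behind these reference-based scores concrete, here is a minimal, dependency-free sketch of what BLEU-style n-gram precision and ROUGE-style n-gram recall measure. It is a simplification for illustration only (no brevity penalty, no stemming or synonym matching as in METEOR); the function names `bleu_precision` and `rouge_n_recall` and the sample strings are illustrative, not from the book's code, which would typically use library implementations such as NLTK or Hugging Face's `evaluate`.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu_precision(candidate, reference, max_n=2):
    """Simplified BLEU: geometric mean of clipped n-gram precisions.

    Measures how many of the candidate's n-grams also appear in the
    reference (precision-oriented). Omits BLEU's brevity penalty.
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each n-gram's count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / max_n)

def rouge_n_recall(candidate, reference, n=1):
    """Simplified ROUGE-N: fraction of reference n-grams recovered
    by the candidate (recall-oriented)."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    overlap = sum(min(ref_counts[g], cand_counts[g]) for g in ref_counts)
    return overlap / max(sum(ref_counts.values()), 1)

# Hypothetical human reference vs. model-generated description
reference = "soft organic cotton t-shirt in navy blue".split()
generated = "organic cotton t-shirt in a soft navy tone".split()

print(round(bleu_precision(generated, reference), 3))  # → 0.567
print(round(rouge_n_recall(generated, reference), 3))  # → 0.857
```

Note the asymmetry the sketch exposes: BLEU penalizes extra words the model invents (precision), while ROUGE penalizes reference words the model misses (recall), which is why the two are commonly reported together.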