GenAI Use Cases #2: Enhancing Invoice Product Identification with OpenAI Fine-Tuned Model

Welcome to the second article of our GenAI Use Cases series, where we continue to explore practical applications of Generative AI. In this article, we describe how we used a fine-tuned OpenAI model to improve product identification on invoices.

The Challenge of Invoice Product Identification

Identifying the products listed on invoices is difficult because suppliers label their products in many different ways. Each supplier follows its own naming conventions, so an invoice line rarely matches a master list entry directly. Our existing pipeline addressed this with a combination of brand identification and filtering, price-based filtering, and fuzzy matching. To further improve the solution and its efficiency, we integrated an OpenAI fine-tuned model into that pipeline.
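To make the three filtering stages concrete, here is a minimal sketch of how such a pipeline could be arranged. It is not our production code: the `MasterProduct` fields, the `price_tolerance` and `min_similarity` thresholds, and the use of Python's standard `difflib` for fuzzy matching are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class MasterProduct:
    # Hypothetical shape of a master list entry.
    name: str
    brand: str
    price: float


def match_invoice_line(
    invoice_name: str,
    invoice_price: float,
    master_list: List[MasterProduct],
    price_tolerance: float = 0.2,  # assumed: accept prices within +/-20%
    min_similarity: float = 0.6,   # assumed fuzzy-match cutoff
) -> Optional[MasterProduct]:
    text = invoice_name.lower()

    # 1) Brand filtering: keep products whose brand appears in the invoice text.
    candidates = [p for p in master_list if p.brand.lower() in text]
    if not candidates:
        # No known brand found in the text: fall back to the full list.
        candidates = list(master_list)

    # 2) Price-based filtering: drop products priced too far from the invoice.
    candidates = [
        p for p in candidates
        if abs(p.price - invoice_price) <= price_tolerance * p.price
    ]

    # 3) Fuzzy matching: pick the most similar remaining name.
    best, best_score = None, 0.0
    for product in candidates:
        score = SequenceMatcher(None, text, product.name.lower()).ratio()
        if score > best_score:
            best, best_score = product, score

    # Return None when nothing is similar enough.
    return best if best_score >= min_similarity else None
```

Returning `None` for weak matches leaves room for a later stage, such as the fine-tuned model, to handle the lines that rule-based matching cannot resolve.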

The OpenAI Fine-Tuning Model Process

The process of fine-tuning the OpenAI model involved several key stages:

1) Data Collection and Annotation:

We collated a dataset of pairs, each consisting of an invoice product name and its corresponding entry in the master product list. We also included negative examples, labelled "Others", to account for unspecified products.
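As an illustration of what the annotated pairs might look like, the sketch below uses invented product names (they are not taken from our data) and a hypothetical helper, `label_counts`, to check how many examples each label has, including the "Others" class.

```python
from collections import Counter

# Hypothetical annotated pairs: (invoice product name, master list product).
annotated_pairs = [
    ("ACME COLA 500 ML BTL", "Acme Cola 500ml"),
    ("Acme-Cola 0.5L", "Acme Cola 500ml"),
    ("DELIVERY CHARGES", "Others"),  # negative example with no master list entry
]


def label_counts(pairs):
    """Count how many examples map to each master list label."""
    return Counter(label for _, label in pairs)
```

A quick count like this can reveal labels with very few examples before any fine-tuning time is spent.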

2) Data Preprocessing:

To prepare the data for fine-tuning, we standardized the product names and removed special characters. A script then converted the data into the format required for fine-tuning, and the prepared dataset was uploaded to the OpenAI platform.

import json
import pandas as pd

def prepare_data_for_fine_tuning():
    data = []
    # Load the excel file
    df = pd.read_excel("Sample_Data.xlsx")
    # Prepare the data
    examples = []
    for _, record in df.iterrows():
        examples.append(
            {
                "invoice_product": record["invoice_product"],
                "master_list_product": record["master_list_product"],
            }
        )

    for example in examples:
        data.append(
            {
                "messages": [
                    {
                        "role": "user",
                        "content": f"Map the invoice product '{example['invoice_product']}' to its corresponding master list product:",
                    },
                    {"role": "assistant", "content": example["master_list_product"]},
                ]
            }
        )

    # Save the prepared data to a file
    with open("fine_tuning_data.jsonl", "w") as f:
        for example in data:
            f.write(json.dumps(example) + "\n")


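if __name__ == "__main__":
    prepare_data_for_fine_tuning()

# Upload the prepared file to the OpenAI platform (legacy openai<1.0 SDK interface).
import openai

openai.api_key = "OPENAI_PLATFORM_API_KEY"  # placeholder for your API key
with open("fine_tuning_data.jsonl", "rb") as f:
    uploaded_file = openai.File.create(file=f, purpose="fine-tune")
print(uploaded_file["id"])  # file ID to pass as training_file when creating the job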
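The formatting script does not show the standardization step itself. A minimal sketch of what it could look like follows; the function name `normalize_product_name` and the exact rules (fold accents, keep letters, digits, dots, and spaces, then uppercase) are assumptions, not our actual cleaning rules.

```python
import re
import unicodedata


def normalize_product_name(name: str) -> str:
    """Standardize a product name (illustrative rules only)."""
    # Fold accented characters to their ASCII base letters where possible.
    text = unicodedata.normalize("NFKD", name)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Replace runs of special characters with a space; keep dots for sizes like 1.5L.
    text = re.sub(r"[^A-Za-z0-9. ]+", " ", text)
    # Collapse repeated whitespace and use a single casing convention.
    return re.sub(r"\s+", " ", text).strip().upper()
```

Whatever rules are chosen, the same function should be applied to the training data and to invoice names at inference time, so the model sees inputs in a consistent form.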

3) Initiating Fine-Tuning:

Once the dataset was uploaded, a fine-tuning job was initiated to create a specialized model based on the supplied data. The duration for generating the fine-tuned model varied based on the input dataset, ranging from 30 minutes to a few hours.

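import openai

openai.api_key = "OPENAI_PLATFORM_API_KEY"  # placeholder for your API key
print(
    openai.FineTuningJob.create(
        # "file-name" is a placeholder: pass the file ID returned by the upload step
        training_file="file-name",
        model="gpt-3.5-turbo",
        suffix="variantmap_test_v1",
    )
)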
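Since a job can run from 30 minutes to a few hours, it helps to poll for completion rather than check manually. The sketch below is one way to do that; `wait_for_job`, its parameters, and the polling interval are hypothetical. The job fetcher is passed in as a callable so the loop can be exercised without network access.

```python
import time
from typing import Callable, Mapping

# Statuses after which a fine-tuning job no longer changes.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(
    fetch_job: Callable[[], Mapping],
    poll_seconds: float = 60.0,  # assumed polling interval
    max_polls: int = 240,        # assumed budget: about 4 hours at 60 s
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping:
    """Poll fetch_job() until the job reaches a terminal status."""
    for _ in range(max_polls):
        job = fetch_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)
    raise TimeoutError("fine-tuning job did not finish within the polling budget")
```

With the legacy SDK used in this article, `fetch_job` could be `lambda: openai.FineTuningJob.retrieve(job_id)`, where `job_id` is the ID printed when the job was created. On success, the returned job's `fine_tuned_model` field holds the name of the new model.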

4) Evaluation:

After the fine-tuned model was ready, it underwent an evaluation process. Testing was conducted within the OpenAI Platform using Playground mode or through API integration to validate the model's accuracy and efficiency in identifying products on invoices.
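For the API route, a simple exact-match check over held-out pairs is one way to measure accuracy. The sketch below is illustrative rather than our actual evaluation harness: `evaluate` and `make_openai_predictor` are hypothetical names, and the prompt text mirrors the one used in the training data script.

```python
from typing import Callable, Iterable, List, Tuple

# Same wording as the prompt in the training data.
PROMPT = "Map the invoice product '{name}' to its corresponding master list product:"


def evaluate(
    pairs: Iterable[Tuple[str, str]],
    predict: Callable[[str], str],
) -> Tuple[float, List[Tuple[str, str, str]]]:
    """Return exact-match accuracy and the list of (input, expected, predicted) misses."""
    pairs = list(pairs)
    misses = []
    for invoice_name, expected in pairs:
        predicted = predict(invoice_name).strip()
        if predicted != expected:
            misses.append((invoice_name, expected, predicted))
    accuracy = (len(pairs) - len(misses)) / len(pairs) if pairs else 0.0
    return accuracy, misses


def make_openai_predictor(model_name: str) -> Callable[[str], str]:
    """Build a predictor backed by a fine-tuned model (legacy openai<1.0 interface)."""
    import openai  # imported lazily so evaluate() can be tested without the SDK

    def predict(invoice_name: str) -> str:
        response = openai.ChatCompletion.create(
            model=model_name,
            messages=[{"role": "user", "content": PROMPT.format(name=invoice_name)}],
            temperature=0,  # assumed: keep outputs as repeatable as possible
        )
        return response["choices"][0]["message"]["content"]

    return predict
```

Reviewing the returned misses is often more informative than the accuracy figure alone, because it shows which supplier naming patterns the model still confuses.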

OpenAI Fine-Tuning vs. NLP-Based Models: Efficiency and Need Comparison

The decision to employ OpenAI Fine-Tuned Models or develop a custom NLP-based model hinges on multiple factors such as the specific use case, available resources, desired customization level, and performance requirements.

In our use case, the need for a quick go-to-market solution and a preference for iterative development with frequent updates made fine-tuning OpenAI models the best fit. This approach allows rapid iterations and requires less time and fewer resources than building and training a custom NLP-based model.

In conclusion, fine-tuned generative AI models have considerable potential to streamline complex processes like invoice product identification. Our integration of an OpenAI fine-tuned model illustrates the approach and its main benefits: quicker iterations and improved efficiency in addressing a real-world challenge.

Exploring Better Approaches

While the process of fine-tuning OpenAI models has proven effective for GoApptiv, continuous improvement and exploration of better approaches are crucial. We are always open to discussions and collaborative efforts to enhance our implementation and achieve even greater efficiency and accuracy in product identification on invoices.

If you have any innovative ideas or suggestions for a more streamlined and efficient approach, we invite you to join the conversation. Together, we can innovate and optimize the use of Generative AI in solving real-world challenges.

Stay tuned for more exciting use cases in our GenAI Use Cases Series, as we explore the transformative potential of Generative AI across diverse industries and applications.