[Feature] Using DSPy with Google Gemini Models unable to read uploaded file's content #8974

@synaptiz

Description

What feature would you like to see?

Hi team

Thanks for this amazing library.

I am running into an issue and I am not sure if it is a bug or potentially a new feature.

When I use a Google Gemini model in DSPy to summarize or query a file uploaded via the Google genai library, the model doesn't seem to have access to the file's content, even though the Gemini model is correctly configured in DSPy. DSPy can reach the Gemini service in the cloud, but the uploaded file's content is not available to the model through this route.

Here is a simple file Q&A example natively built using Google genai library:

import os
from google import genai

# --- 1. Set up the Google Generative AI client ---

# Initialize the genai client from the environment variable
client = genai.Client()

# --- 2. Run the program ---

# Create a sample text file for testing
sample_file_path = "sample_document.txt"
with open(sample_file_path, "w") as f:
    f.write("The annual company picnic will be held on December 15th at the Central Park Pavilion. "
            "All employees and their families are invited to attend.")

# Upload the file using the genai client
uploaded_file = client.files.upload(file=sample_file_path)
print(f"uploaded_file details: {uploaded_file.mime_type}, {uploaded_file.size_bytes}, {uploaded_file.state}")

# Define the question
my_question = "When is the annual company picnic scheduled?"

response = client.models.generate_content(
    model="gemini-flash-latest", contents=[uploaded_file, my_question]
)

# Print the final answer
print(f"Question: {my_question}")
print(f"Answer: {response.text}")

# Remove the local sample file
os.remove(sample_file_path)

Response:

Question: When is the annual company picnic scheduled?
Answer: The annual company picnic is scheduled for **December 15th**.

Sample example developed using DSPy:

import dspy
from google import genai
import os

# --- 1. Set up the Google Generative AI client and DSPy LM ---

# Initialize the genai client from the environment variable
client = genai.Client()

# Configure DSPy to use the Gemini model
GOOGLE_API_KEY = os.getenv('GOOGLE_API_KEY')
gemini_lm = dspy.LM("gemini/gemini-flash-latest", api_key=GOOGLE_API_KEY)
dspy.configure(lm=gemini_lm)


# --- 2. Define the DSPy Signature ---

class AnswerWithFile(dspy.Signature):
    """Answer a question based on the contents of an uploaded file."""
    question = dspy.InputField()
    file = dspy.InputField()
    answer = dspy.OutputField(desc="A concise answer based on the file content.")


# --- 3. Create the DSPy Module ---

class FileQA(dspy.Module):
    def __init__(self):
        super().__init__()
        # Use a Predict module with the AnswerWithFile signature
        self.predict = dspy.Predict(AnswerWithFile)

    def forward(self, question, file_path):
        # Upload the file using the genai client
        uploaded_file = client.files.upload(file=file_path)
        print(f"uploaded_file details: {uploaded_file.mime_type}, {uploaded_file.size_bytes}, {uploaded_file.state}")

        # Call the DSPy Predict module, passing the question and the uploaded file object
        result = self.predict(question=question, file=uploaded_file)

        return result

# --- 4. Run the program ---

# Create a sample text file for testing
sample_file_path = "sample_document.txt"
with open(sample_file_path, "w") as f:
    f.write("The annual company picnic will be held on December 15th at the Central Park Pavilion. "
            "All employees and their families are invited to attend.")

# Create an instance of our module
file_qa_program = FileQA()

# Define the question
my_question = "When is the annual company picnic scheduled?"

# Execute the program
response = file_qa_program(question=my_question, file_path=sample_file_path)

# Print the final answer
print(f"Question: {my_question}")
print(f"Answer: {response.answer}")

# Remove the local sample file
os.remove(sample_file_path)

Response:

Question: When is the annual company picnic scheduled?
Answer: The provided file is metadata and does not contain the text content necessary to determine when the annual company picnic is scheduled.

As the DSPy output shows, the model never sees the uploaded file's content. The native Google genai example works without any additional steps: it simply passes the object returned by client.files.upload() in the contents list of generate_content().
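One possible explanation, which is an assumption and not something confirmed here, is that the untyped `file` input field is rendered into the prompt as plain text, so the model receives a description of the upload handle and not the document itself. The sketch below illustrates that idea only: `FileHandle` and `render_text_prompt` are hypothetical stand-ins, not part of DSPy or google-genai, and the field values are made up.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FileHandle:
    """Hypothetical stand-in for the object returned by client.files.upload()."""
    name: str
    mime_type: str
    size_bytes: int
    state: str


def render_text_prompt(question: str, file) -> str:
    """Hypothetical text-only prompt builder: every input is turned into a string."""
    file_text = file if isinstance(file, str) else json.dumps(asdict(file))
    return f"question: {question}\nfile: {file_text}"


# Illustrative values only; they are not taken from a real upload.
handle = FileHandle(name="files/example", mime_type="text/plain",
                    size_bytes=141, state="ACTIVE")
prompt = render_text_prompt("When is the annual company picnic scheduled?", handle)
print(prompt)
```

Under this assumption the prompt carries only the handle's metadata, which would be consistent with the "The provided file is metadata" answer above.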

Library versions:

dspy                         3.0.3
google-genai                 1.43.0

Steps to reproduce

  • export GOOGLE_API_KEY='***'
  • Install dspy and google-genai packages
  • Run the examples

Question

Are there any known limitations or additional configurations needed to make uploaded files accessible via DSPy? Any tips on working with uploaded files would be greatly appreciated.
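For plain-text files, one workaround I can think of is to read the file locally and pass its text to the `file` input field, skipping the upload entirely. This is a minimal sketch under that assumption, not a confirmed DSPy recommendation; `read_file_text` and its `max_chars` parameter are hypothetical names introduced for illustration.

```python
import tempfile
from pathlib import Path


def read_file_text(file_path: str, max_chars: int = 20_000) -> str:
    """Return the file's text, truncated to at most max_chars characters."""
    text = Path(file_path).read_text(encoding="utf-8")
    return text[:max_chars]


# Recreate the sample document in a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample_document.txt"
    sample_path.write_text(
        "The annual company picnic will be held on December 15th at the Central Park Pavilion. "
        "All employees and their families are invited to attend.",
        encoding="utf-8",
    )
    file_text = read_file_text(str(sample_path))
    short_text = read_file_text(str(sample_path), max_chars=10)

print(file_text)

# In FileQA.forward, this string would replace the uploaded file object:
#     result = self.predict(question=question, file=read_file_text(file_path))
```

This only covers text that fits in the prompt; binary formats such as PDFs or images would still need something else.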

Thanks a lot!

Would you like to contribute?

  • Yes, I'd like to help implement this.
  • No, I just want to request it.

Additional Context

No response

Labels: enhancement (New feature or request)
