SUTRA Multilingual Question Generator - Streamlit App with LangChain
This is a Streamlit-based SUTRA Multilingual Question Generator application, sourced from the Sutra Cookbook repository. The app leverages the SUTRA LLM model via LangChain to generate questions from uploaded PDF files or manually entered text in over 50 languages, with streaming responses for a dynamic user experience.
Overview
The SUTRA Multilingual Question Generator enables users to upload a PDF or input text, select a language, and generate a list of five relevant questions based on the provided content. The app uses Streamlit for the user interface, LangChain for integrating the SUTRA LLM (sutra-v2
model), and PyPDF2
for PDF text extraction. Questions are streamed in real-time and stored in session state for persistence.
Prerequisites
To run this Streamlit app, ensure you have the following installed:
- Python (v3.8 or higher)
- pip (Python package manager)
- Streamlit (
pip install streamlit
) - LangChain OpenAI (
pip install langchain-openai
) - PyPDF2 (
pip install PyPDF2
) - python-dotenv (
pip install python-dotenv
) - A valid Sutra API key (set in a
.env
file asSUTRA_API_KEY
) - A modern web browser (e.g., Chrome, Firefox)
Setup Instructions
-
Install Dependencies:
pip install streamlit langchain-openai PyPDF2 python-dotenv
-
Set Up Environment Variables: Create a
.env
file in the project directory with the following content:SUTRA_API_KEY=your_sutra_api_key_here
-
Save the Code: Save the provided code in a file named
app.py
. -
Run the Application:
streamlit run app.py
This will start the Streamlit server, and the app will be accessible at
http://localhost:8501
.
Streamlit App Code
Below is the complete Python code for the Streamlit app, implementing the Multilingual Question Generator functionality.
import streamlit as st
from langchain_openai import ChatOpenAI
from langchain.schema import HumanMessage
from langchain.callbacks.base import BaseCallbackHandler
import PyPDF2
import os
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
api_key = os.getenv("SUTRA_API_KEY")
# Page configuration
st.set_page_config(
page_title="Sutra Multilingual Question Generator",
page_icon="❓",
layout="wide"
)
# Define supported languages
languages = [
"English", "Hindi", "Gujarati", "Bengali", "Tamil",
"Telugu", "Kannada", "Malayalam", "Punjabi", "Marathi",
"Urdu", "Assamese", "Odia", "Sanskrit", "Korean",
"Japanese", "Arabic", "French", "German", "Spanish",
"Portuguese", "Russian", "Chinese", "Vietnamese", "Thai",
"Indonesian", "Turkish", "Polish", "Ukrainian", "Dutch",
"Italian", "Greek", "Hebrew", "Persian", "Swedish",
"Norwegian", "Danish", "Finnish", "Czech", "Hungarian",
"Romanian", "Bulgarian", "Croatian", "Serbian", "Slovak",
"Slovenian", "Estonian", "Latvian", "Lithuanian", "Malay",
"Tagalog", "Swahili"
]
# Streaming callback handler
class StreamHandler(BaseCallbackHandler):
def __init__(self, container, initial_text=""):
self.container = container
self.text = initial_text
def on_llm_new_token(self, token: str, **kwargs):
self.text += token
self.container.markdown(self.text)
# Initialize the ChatOpenAI model - base instance for caching
@st.cache_resource
def get_base_chat_model():
return ChatOpenAI(
api_key=os.getenv("SUTRA_API_KEY"),
base_url="https://api.two.ai/v2",
model="sutra-v2",
temperature=0.7,
)
# Create a streaming version of the model with callback handler
def get_streaming_chat_model(callback_handler=None):
return ChatOpenAI(
api_key=os.getenv("SUTRA_API_KEY"),
base_url="https://api.two.ai/v2",
model="sutra-v2",
temperature=0.7,
streaming=True,
callbacks=[callback_handler] if callback_handler else None
)
# Sidebar for input options
st.sidebar.image("https://framerusercontent.com/images/3Ca34Pogzn9I3a7uTsNSlfs9Bdk.png", use_container_width=True)
with st.sidebar:
st.title("❓ Sutra Question Generator")
# Language selector
selected_language = st.selectbox("Select language for questions:", languages)
# File uploader for PDF
uploaded_file = st.file_uploader("Upload a PDF document (optional)", type="pdf")
# Text input for manual text
input_text = st.text_area("Or enter text manually:", height=200)
st.divider()
st.markdown(f"Generating questions in: **{selected_language}**")
# About section
st.markdown("### About Sutra Question Generator")
st.markdown("This app generates questions from uploaded PDFs or text inputs using the Sutra LLM, supporting 50+ languages.")
# Main app title
st.markdown(
f'<h1><img src="https://framerusercontent.com/images/9vH8BcjXKRcC5OrSfkohhSyDgX0.png" width="60"/> Sutra Multilingual Question Generator 🤖</h1>',
unsafe_allow_html=True
)
# Initialize session state for generated questions
if "questions" not in st.session_state:
st.session_state.questions = []
# Process input and generate questions
if st.button("Generate Questions"):
content = ""
# Handle PDF upload
if uploaded_file is not None:
try:
pdf_reader = PyPDF2.PdfReader(uploaded_file)
for page in pdf_reader.pages:
content += page.extract_text() or ""
st.success("PDF processed successfully!")
except Exception as e:
st.error(f"Error processing PDF: {str(e)}")
# Handle manual text input
if input_text:
content = input_text
# Generate questions if content is available
if content:
try:
# Create message placeholder
with st.container():
st.markdown("### Generated Questions")
response_placeholder = st.empty()
# Create a stream handler
stream_handler = StreamHandler(response_placeholder)
# Get streaming model with handler
chat = get_streaming_chat_model(stream_handler)
# Create system message for question generation
system_message = f"You are a helpful assistant. Generate 5 relevant questions based on the provided content in {selected_language}. Format the questions as a numbered list."
# Generate questions
messages = [
HumanMessage(content=f"{system_message}\n\nContent: {content}")
]
response = chat.invoke(messages)
questions = response.content
# Add questions to session state
st.session_state.questions.append({"content": questions})
except Exception as e:
st.error(f"Error: {str(e)}")
if "API key" in str(e):
st.error("Please check your Sutra API key in the environment variables.")
else:
st.error("Please upload a PDF or enter text to generate questions.")
# Display generated questions
for item in st.session_state.questions:
with st.container():
st.markdown(item["content"])
Functionality
- Input Options: Users can upload a PDF file via a sidebar file uploader or enter text manually in a text area.
- Language Selection: A dropdown supports over 50 languages (e.g., English, Hindi, Spanish, Arabic) for question generation.
- Question Generation: The app generates five relevant questions based on the input content, formatted as a numbered list in the selected language, using the SUTRA LLM (
sutra-v2
model). - Streaming Responses: A custom
StreamHandler
class streams the generated questions for a dynamic, real-time display. - Session State: Persists the history of generated questions using
st.session_state
. - Error Handling: Displays errors for invalid API keys, PDF processing issues, or missing input content (e.g., no PDF or text provided).
- Styling: Uses custom HTML for the title with an icon and a logo in the sidebar for branding, consistent with the Sutra Cookbook’s visual identity.
Running the App
- Ensure all dependencies are installed and the
.env
file contains a validSUTRA_API_KEY
. - Save the code as
app.py
. - Run:
streamlit run app.py
- Open
http://localhost:8501
in your browser. - Select a language from the sidebar dropdown (e.g., French).
- Upload a PDF or enter text in the provided text area.
- Click the “Generate Questions” button to receive a list of five questions in the chosen language, displayed in a dedicated section.
Usage Examples
Example 1: PDF Input
- Select “Spanish” from the sidebar.
- Upload a PDF about renewable energy.
- Click “Generate Questions.”
- Receive questions like:
1. ¿Cuáles son las principales fuentes de energía renovable mencionadas en el documento? 2. ¿Cómo contribuye la energía solar a la sostenibilidad? 3. ¿Qué desafíos enfrenta la adopción de energías renovables? 4. ¿Cómo se compara la eficiencia de la energía eólica con otras fuentes? 5. ¿Qué políticas se mencionan para promover la energía renovable?
- Questions are stored in
st.session_state.questions
and displayed below the “Generated Questions” header.
Example 2: Manual Text Input
- Select “Hindi” from the sidebar.
- Enter text in the text area: “जलवायु परिवर्तन वैश्विक पारिस्थितिकी तंत्र को प्रभावित करता है।”
- Click “Generate Questions.”
- Receive questions like:
1. जलवायु परिवर्तन वैश्विक पारिस्थितिकी तंत्र को कैसे प्रभावित करता है? 2. जलवायु परिवर्तन के प्रमुख कारण क्या हैं? 3. जलवायु परिवर्तन को कम करने के लिए क्या उपाय किए जा सकते हैं? 4. जलवायु परिवर्तन का जैव विविधता पर क्या प्रभाव पड़ता है? 5. जलवायु परिवर्तन से निपटने के लिए वैश्विक सहयोग क्यों महत्वपूर्ण है?
- Questions are appended to the session state and displayed in the app.
Notes
- API Key: The app requires a valid Sutra API key, obtainable from Two AI's API service. Ensure it’s correctly set in the
.env
file. - Multilingual Support: The app relies on the SUTRA LLM’s ability to generate questions in the selected language. Verify that the API supports the chosen language for optimal results.
- PDF Processing: Uses
PyPDF2
for text extraction. Complex or scanned PDFs may require OCR tools for better text extraction, which are not included in this app. - Question Generation: The app generates five questions by default, as specified in the system message. Modify the
system_message
to change the number or format of questions. - Streaming: The
StreamHandler
class ensures a smooth, real-time display of questions, enhancing user engagement. - Styling: Minimal custom CSS and HTML are used for branding (e.g., logo and title). Additional styling can be implemented using Streamlit’s
st.markdown
with CSS. - Limitations:
- Requires a stable internet connection for API calls.
- Large PDFs may cause performance issues; consider adjusting text extraction or chunking strategies for optimization.
- The quality of questions depends on the input content’s clarity and the SUTRA LLM’s language proficiency.
- Extensibility:
- Add support for other file formats (e.g.,
.txt
,.docx
) by extending the input processing logic. - Implement question type selection (e.g., factual, analytical) by modifying the system message.
- Enhance the UI with additional styling or interactive elements like question filtering.
- Add support for other file formats (e.g.,