Flask Chat UI with a Non-Chat HF Model

1. What We're Building

⚡ Beginner ⏱ ~3 min

In Tutorial 1 we built the conversation wrapper in pure Python — useful for scripts, but not usable by real users. Now we add the two missing layers: a Flask REST API that exposes the wrapper over HTTP, and a browser UI that talks to that API.

Preview of the chat UI you will have built by the end of this tutorial, with a sample exchange:

A: Hello! I'm your AI assistant. What would you like to explore today?
User: What is supply chain disruption?
Assistant: Supply chain disruption refers to unexpected events that interrupt the normal flow of goods from suppliers to consumers, such as natural disasters, port congestion, or geopolitical events.
User: Give me a real-world example.
Assistant: The 2021 Suez Canal blockage is a classic example: a single grounded ship halted ~$9.6 billion in trade per day for six days, causing ripple effects across global supply chains.

The complete architecture looks like this:

🏗 Full System Architecture

Browser UI — HTML/CSS/JS chat window, sends fetch() POST requests
Flask API — receives messages, manages session memory, calls HF
Prompt Builder — from Tutorial 1, serialises history into one string
HuggingFace Inference API — the actual LLM endpoint
Flask Session — stores per-user conversation in a signed cookie

2. Project Structure

⚡ Beginner ⏱ ~2 min

Keep everything in one folder. Flask's built-in template and static file serving makes this self-contained without any bundler or build step:

flask-llm-chat/
  ├── app.py                # Flask server + LLM wrapper
  ├── .env                  # secrets — never commit this
  ├── requirements.txt      # pip dependencies
  └── templates/
        └── index.html       # the chat UI (Flask serves this)

Why templates/? Flask automatically looks for HTML files in a folder called templates/. When you call render_template("index.html"), Flask finds and serves it — no manual file path needed.

3. Installing Dependencies

⚡ Beginner ⏱ ~2 min

Create requirements.txt with exactly these packages — nothing more is needed:

text

flask==3.0.3          # web framework
requests==2.32.3      # HTTP calls to HuggingFace API
python-dotenv==1.0.1  # loads .env file into os.environ

Install Command

pip install -r requirements.txt

Now create your .env file. This keeps your token out of source code:

bash

HF_TOKEN=hf_your_actual_token_here         # HuggingFace API token
HF_MODEL_URL=https://api-inference.huggingface.co/models/YOUR_MODEL
FLASK_SECRET_KEY=some-long-random-string   # used to sign session cookies

Important

Add .env to your .gitignore — never push API tokens to GitHub.
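Because app.py reads every setting with os.getenv, a typo in .env only shows up later as a confusing failure (Flask, for example, refuses to use the session when no secret key is set). A minimal sketch of a fail-fast check, assuming the three variable names above; require_env is a hypothetical helper, and secrets.token_hex(32) is just one convenient way to generate a FLASK_SECRET_KEY value:

```python
import os
import secrets

REQUIRED_VARS = ("HF_TOKEN", "HF_MODEL_URL", "FLASK_SECRET_KEY")


def require_env(names=REQUIRED_VARS, environ=None):
    """Hypothetical helper: return the named settings, or raise listing every missing one."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing settings in .env: " + ", ".join(missing))
    return {name: environ[name] for name in names}


if __name__ == "__main__":
    # Run once and paste the printed value into .env as FLASK_SECRET_KEY.
    print(secrets.token_hex(32))  # 32 random bytes -> 64 hex characters
```

Calling require_env() right after load_dotenv() in app.py would stop the server at startup with a single message naming every missing setting. Running the snippet as a script prints a fresh random key.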

4. The Flask Backend — app.py

⚡ Intermediate ⏱ ~6 min

This is the core of the system. The file has four responsibilities: load config, define the conversation logic, expose three routes, and handle errors gracefully.

python

import os
import requests
from flask import Flask, request, jsonify, session, render_template
from dotenv import load_dotenv

load_dotenv()  # reads .env into os.environ

app = Flask(__name__)
app.secret_key = os.getenv("FLASK_SECRET_KEY")  # required to use Flask sessions

HF_URL     = os.getenv("HF_MODEL_URL")
HF_HEADERS = {"Authorization": f"Bearer {os.getenv('HF_TOKEN')}"}

MAX_TURNS  = 10  # sliding window — keep only recent history
SYSTEM     = (
    "You are a helpful, clear, and concise educational assistant. "
    "Answer accurately. Never provide harmful information."
)


# ─── PROMPT BUILDER ──────────────────────────────────────────────
def build_prompt(history):
    """Serialise conversation history → single prompt string."""
    prompt = SYSTEM + "\n\n"
    for turn in history[-MAX_TURNS:]:                # sliding window applied here
        role = turn["role"].capitalize()
        prompt += f"### {role}:\n{turn['content']}\n\n"
    prompt += "### Assistant:\n"                     # open for model to complete
    return prompt


# ─── ROUTES ──────────────────────────────────────────────────────
@app.route("/")
def index():
    session.setdefault("history", [])               # init history if new session
    return render_template("index.html")


@app.route("/chat", methods=["POST"])
def chat():
    data       = request.get_json(silent=True) or {}   # missing/invalid JSON body -> {}
    user_input = data.get("message", "").strip()

    if not user_input:
        return jsonify({"error": "Empty message"}), 400

    history = session.get("history", [])
    history.append({"role": "user", "content": user_input})  # add user turn

    prompt = build_prompt(history)

    try:
        resp = requests.post(
            HF_URL, headers=HF_HEADERS,
            json={"inputs": prompt, "parameters": {"max_new_tokens": 200}},
            timeout=30                               # don't hang forever on slow models
        )
        resp.raise_for_status()
        raw    = resp.json()[0]["generated_text"]
        reply  = raw.split("### Assistant:")[-1].strip()  # extract only assistant reply
        reply  = reply.split("### User:")[0].strip()      # stop if model hallucinates next turn

    except Exception as e:
        return jsonify({"error": str(e)}), 500

    history.append({"role": "assistant", "content": reply})  # add assistant turn
    session["history"] = history                             # persist back (assignment marks it modified)
    session.modified = True                                  # explicit safeguard; required only for in-place mutation

    return jsonify({"reply": reply})


@app.route("/reset", methods=["POST"])
def reset():
    session["history"] = []                         # wipe conversation for this user
    return jsonify({"status": "reset"})


if __name__ == "__main__":
    app.run(debug=True, port=5000)

What This File Does

Loads config → builds prompts → exposes /chat POST endpoint → stores history in session cookie → returns JSON reply

Method	Route	What It Does	Returns
GET	`/`	Serves the chat UI HTML page. Initialises an empty history in the session if not present.	HTML page
POST	`/chat`	Receives `{"message": "..."}`, builds prompt, calls HF, parses reply, updates session, returns reply.	`{"reply": "..."}`
POST	`/reset`	Clears this user's conversation history from the session. The user starts fresh.	`{"status": "reset"}`
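The /chat route turns every failure into a 500. A minimal sketch of a more specific alternative, assuming you want to tell "the model was too slow" apart from "the model returned an error": query_hf and its injectable post parameter (handy for testing without network access) are hypothetical, and mapping failures to 504/502 is a design choice rather than something Flask or HF requires. requests does accept a (connect, read) tuple for timeout.

```python
import requests

TIMEOUT = (5, 30)  # (connect, read) seconds; illustrative values, tune for your model


def query_hf(url, headers, prompt, post=requests.post):
    """Hypothetical helper: return (generated_text, None) on success,
    or (None, (error_message, http_status)) on failure."""
    try:
        resp = post(
            url,
            headers=headers,
            json={"inputs": prompt, "parameters": {"max_new_tokens": 200}},
            timeout=TIMEOUT,
        )
        resp.raise_for_status()  # 4xx/5xx from HF -> requests.HTTPError
        return resp.json()[0]["generated_text"], None
    except requests.Timeout:
        # Connect or read timeout: the model was too slow, report a gateway timeout.
        return None, ("The model took too long to respond.", 504)
    except requests.RequestException as exc:
        # HTTP errors, connection failures and (in requests 2.27+) invalid JSON bodies.
        return None, (f"Upstream error: {exc}", 502)
    except (KeyError, IndexError, TypeError):
        # JSON parsed but was not the expected [{"generated_text": ...}] shape.
        return None, ("Unexpected response format from the model.", 502)
```

In /chat, the try/except could then shrink to raw, err = query_hf(HF_URL, HF_HEADERS, prompt), returning jsonify({"error": err[0]}), err[1] whenever err is set.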

The prompt builder in this Flask version uses the ### delimiter format from Tutorial 1's production template — reducing role confusion in the model's output:

python

def build_prompt(history):
    prompt = SYSTEM + "\n\n"                          # always prepend system instruction
    for turn in history[-MAX_TURNS:]:                 # sliding window: last 10 turns only
        role = turn["role"].capitalize()
        prompt += f"### {role}:\n{turn['content']}\n\n"  # strong delimiters per role
    prompt += "### Assistant:\n"                      # cue for model to continue
    return prompt

Serialised Output Example

You are a helpful, clear, and concise educational assistant...

### User:
What is supply chain disruption?

### Assistant:
Supply chain disruption refers to...

### User:
Give me a real-world example.

### Assistant:
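To see the sliding window at work, the sketch below runs a self-contained copy of build_prompt (with SYSTEM shortened) on a long conversation. As in app.py, each user or assistant message counts as one turn, so MAX_TURNS = 10 covers about five exchanges; the questions and answers are placeholder text.

```python
SYSTEM = "You are a helpful, clear, and concise educational assistant."
MAX_TURNS = 10


def build_prompt(history):
    """Same shape as the tutorial's builder: system text, last MAX_TURNS turns, open cue."""
    prompt = SYSTEM + "\n\n"
    for turn in history[-MAX_TURNS:]:
        role = turn["role"].capitalize()
        prompt += f"### {role}:\n{turn['content']}\n\n"
    prompt += "### Assistant:\n"
    return prompt


# Simulate 14 completed exchanges followed by a 15th user message.
history = []
for i in range(1, 15):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})
history.append({"role": "user", "content": "question 15"})

window = history[-MAX_TURNS:]
print(len(history))          # 29
print(window[0]["content"])  # answer 10
```

Because the cut is made by count, the window can open on an assistant reply whose question has already been dropped, as "answer 10" is here. One option, assuming every saved exchange is a complete user/assistant pair, is an odd MAX_TURNS such as 11: the history is always odd-length when build_prompt runs, so the window then always starts on a user turn.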

5. Session-Based Memory

⚡ Intermediate ⏱ ~4 min

In Tutorial 1 we stored the conversation list in a plain Python variable. That works for a single-user script, but a Flask server handles many users at once, and a shared Python variable would mix up everyone's conversations.

Flask's session solves this. It stores data in a signed cookie sent to the browser, so each user carries their own history. The cookie is signed, not encrypted: users can read its contents, but cannot change them without invalidating the signature.

1

User sends a message (POST /chat)

Browser sends {"message": "What is inflation?"} plus its session cookie.

2

Flask reads the session

history = session.get("history", []) retrieves this user's conversation list — not anyone else's.

3

History is updated and saved

After getting the reply, the new turns are appended and session["history"] = history writes them back into the cookie.

4

Flask sends back the signed cookie

The updated session cookie is included in the HTTP response. The browser stores it and sends it with every future request.

⚠️ Cookie Size Limit: Browsers cap a single cookie at about 4 KB, and a long conversation stored in the session will exceed this. Browsers may silently ignore an oversized cookie, so the conversation appears to reset. For production use, replace session with a server-side store like Redis or a database, keyed by a session ID.
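Even before reaching for Redis, you can stop the cookie from growing without bound: app.py stores the entire conversation although build_prompt only ever sends the last MAX_TURNS turns. A minimal sketch, assuming you are happy to forget turns the model would never see anyway; trim_history and approx_payload_bytes are hypothetical helpers, and the byte count is only an estimate of the real cookie size.

```python
import json

MAX_TURNS = 10  # same value as in app.py


def trim_history(history, max_stored=MAX_TURNS):
    """Keep only the newest turns; older ones would never reach the prompt anyway."""
    return history[-max_stored:]


def approx_payload_bytes(history):
    """Rough size of the session data before Flask may compress, base64-encode and sign it."""
    return len(json.dumps({"history": history}).encode("utf-8"))


if __name__ == "__main__":
    # Illustrative data: 30 turns of 300 characters each.
    long_history = [{"role": "user", "content": "x" * 300} for _ in range(30)]
    print(approx_payload_bytes(long_history))                # over 4096
    print(approx_payload_bytes(trim_history(long_history)))  # roughly a third of that
```

In /chat you would write session["history"] = trim_history(history). Trimming to exactly MAX_TURNS leaves every prompt unchanged, because build_prompt discards older turns in any case; it only bounds the cookie. Very long individual messages can still push it past the limit, which is where a server-side store comes in.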

python

history = session.get("history", [])     # read this user's history (or empty list)
history.append({"role": "user", "content": user_input})

# ... build prompt, call API, parse reply ...

history.append({"role": "assistant", "content": reply})
session["history"] = history             # write back — Flask signs and sends as cookie
session.modified = True                  # safeguard; strictly required only for in-place mutation

Why session.modified = True?

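Flask notices changes made through the session object itself, such as assigning a key (session["history"] = history) or deleting one. It cannot see a list being mutated in place, so if you only call session["history"].append(...), Flask won't re-send the updated cookie unless you also set session.modified = True.
The code above reassigns the key, which already marks the session as modified; the explicit flag is a harmless safeguard.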
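You can watch this happen outside a request. flask.sessions.SecureCookieSession is the dict-like class behind Flask's default cookie session; the sketch below creates one directly (something app.py never needs to do) purely to show when its modified flag flips.

```python
from flask.sessions import SecureCookieSession

# Start from data that was "loaded" from a cookie.
s = SecureCookieSession({"history": []})
print(s.modified)  # False: loading existing data is not a change

s["history"].append({"role": "user", "content": "hi"})  # in-place mutation of the list
print(s.modified)  # False: the session never saw the list change

s["history"] = s["history"]  # reassigning the key goes through __setitem__
print(s.modified)  # True: Flask will re-send the cookie
```

This is also why the reassignment in /chat works on its own: session["history"] = history counts as a change even though it stores the same list object.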

6. The Chat UI — templates/index.html

⚡ Intermediate ⏱ ~5 min

The frontend is a single HTML file with embedded CSS and JavaScript. No React, no Vue, no build step. It sends a fetch() POST to /chat on every message and renders the reply.

html

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>aplly Chat</title>
  <style>
    * { box-sizing: border-box; margin: 0; padding: 0; }
    body { font-family: sans-serif; background: #F3F4F6; display: flex;
           flex-direction: column; height: 100vh; }

    #topbar { background: #5B21B6; color: #fff; padding: .8rem 1.2rem;
              font-weight: 700; font-size: 1rem; }

    #messages { flex: 1; overflow-y: auto; padding: 1rem; display: flex;
                flex-direction: column; gap: .7rem; }

    .msg { max-width: 72%; padding: .6rem 1rem; border-radius: 12px;
           font-size: .92rem; line-height: 1.55; }
    .user { background: #5B21B6; color: #fff; align-self: flex-end;
            border-bottom-right-radius: 3px; }
    .bot  { background: #fff; color: #1F1F2E; align-self: flex-start;
            border: 1px solid #E5E7EB; border-bottom-left-radius: 3px; }
    .thinking { color: #6B7280; font-style: italic; }  /* loading state */

    #input-bar { display: flex; padding: .8rem; gap: .6rem;
                 background: #fff; border-top: 1px solid #E5E7EB; }
    #msg-input { flex: 1; padding: .55rem .9rem; border: 1.5px solid #E5E7EB;
                 border-radius: 8px; font-size: .92rem; outline: none; }
    #msg-input:focus { border-color: #7C3AED; }
    #send-btn { background: #5B21B6; color: #fff; border: none; padding: .55rem 1.2rem;
                border-radius: 8px; font-weight: 700; cursor: pointer; }
    #send-btn:hover { background: #7C3AED; }
    #reset-btn { background: #F3F4F6; border: 1px solid #E5E7EB; padding: .55rem .9rem;
                 border-radius: 8px; cursor: pointer; font-size: .82rem; color: #6B7280; }
    #reset-btn:hover { background: #EDE9FE; color: #5B21B6; }
  </style>
</head>
<body>

<div id="topbar">🤖 aplly Assistant</div>

<div id="messages">
  <div class="msg bot">Hello! Ask me anything.</div>   <!-- initial greeting -->
</div>

<div id="input-bar">
  <input id="msg-input" type="text" placeholder="Type a message..." />
  <button id="reset-btn" onclick="resetChat()">New Chat</button>
  <button id="send-btn"  onclick="sendMessage()">Send</button>
</div>

<script>
  const msgBox   = document.getElementById("messages");
  const msgInput = document.getElementById("msg-input");

  // Send on Enter key
  msgInput.addEventListener("keydown", e => {
    if (e.key === "Enter") sendMessage();
  });

  function appendMsg(text, cls) {
    const div = document.createElement("div");
    div.className = "msg " + cls;
    div.textContent = text;
    msgBox.appendChild(div);
    msgBox.scrollTop = msgBox.scrollHeight;   // auto-scroll to latest
    return div;
  }

  async function sendMessage() {
    const text = msgInput.value.trim();
    if (!text) return;

    appendMsg(text, "user");                  // show user message immediately
    msgInput.value = "";

    const thinking = appendMsg("Thinking...", "bot thinking");  // loading indicator

    try {
      const res  = await fetch("/chat", {
        method:  "POST",
        headers: { "Content-Type": "application/json" },
        body:    JSON.stringify({ message: text })
      });
      const data = await res.json();
      thinking.remove();                      // remove loading bubble
      appendMsg(data.reply || data.error, "bot");
    } catch (err) {
      thinking.remove();
      appendMsg("Connection error. Is the server running?", "bot");
    }
  }

  async function resetChat() {
    await fetch("/reset", { method: "POST" });
    msgBox.innerHTML = '<div class="msg bot">Chat reset. Start fresh!</div>';
  }
</script>
</body>
</html>

Key Patterns

appendMsg() — reusable bubble creator for both user and bot messages
"Thinking..." bubble — shows while awaiting API, removed on response
scrollTop = scrollHeight — always auto-scrolls to the latest message
Enter key listener — sends without clicking the button

7. Wiring Frontend to Backend

⚡ Intermediate ⏱ ~3 min

Let's trace one complete message through the entire system so every piece is clear:

1

User types & hits Send

sendMessage() is called. The text is appended to the UI as a user bubble and the input is cleared.

2

fetch() POSTs to /chat

The browser sends POST /chat with body {"message": "your text"} and the session cookie automatically attached.

3

Flask reads session, builds prompt

The server retrieves this user's history, appends the new message, and calls build_prompt() to serialise everything.

4

HuggingFace API returns a completion

The full serialised prompt is sent to the HF model. The model returns the entire prompt text plus its new reply appended at the end.

5

Flask parses & stores the reply

The reply is extracted by splitting on ### Assistant:. Both new turns are appended to history and saved back to the session.

6

JSON response reaches the browser

{"reply": "..."} arrives. The "Thinking..." bubble is removed and the actual reply is rendered as a bot message bubble.

8. Running & Testing

⚡ Beginner ⏱ ~3 min

Start the server and open the browser:

bash

cd flask-llm-chat
python app.py

Expected Terminal Output

 * Running on http://127.0.0.1:5000
 * Debug mode: on

Open http://127.0.0.1:5000 in your browser. You can also test the API directly with curl:

bash

curl -X POST http://127.0.0.1:5000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is inflation?"}'  # test without the browser UI

Expected Response

{"reply": "Inflation is the rate at which the general level of prices for goods and services rises..."}

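Testing the reset: In the browser, send two messages and check that the second reply uses the context of the first. Then click New Chat (which POSTs to /reset), send another message, and confirm it no longer references the earlier exchange. If you test with curl instead, add -c cookies.txt -b cookies.txt to every command so the session cookie is saved and sent back; without a cookie jar, each curl request starts a new, empty session.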

9. Common Bugs & Fixes

⚡ Intermediate ⏱ ~4 min

These are the three bugs you'll almost certainly hit the first time you run this setup:

Bug 1 — session.modified not set

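# BUG: history updates vanish after one turn
session["history"].append({"role": "assistant", "content": reply})
# The list was changed in place and no session key was assigned,
# so session.modified stays False and Flask doesn't re-send the cookie.

# FIX: reassign the key, set the flag explicitly, or both
history = session.get("history", [])
history.append({"role": "assistant", "content": reply})
session["history"] = history        # assignment marks the session as modified
session.modified = True             # explicit flag; required if you only mutate in place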

Symptom

Model has no memory of previous turns — every reply seems to come from a fresh session.

Bug 2 — Model echoes entire prompt

# BUG: reply contains the full history, not just the new answer
raw   = resp.json()[0]["generated_text"]
reply = raw                                        # raw includes the entire prompt + completion

# FIX: split on the last assistant delimiter
reply = raw.split("### Assistant:")[-1].strip()   # take only what comes after the last marker
reply = reply.split("### User:")[0].strip()        # also stop if model hallucinates next user turn

Symptom

The reply bubble contains the full serialised conversation history, not just the new answer.
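The split-based fix has one blind spot: if the model invents a whole extra exchange (### User: ... ### Assistant: ...), splitting on the last ### Assistant: keeps the invented answer instead of the real one. A minimal sketch of a stricter alternative, assuming the endpoint either echoes your exact prompt or returns only the new text; extract_reply is a hypothetical helper that removes the echoed prompt by length and then cuts at the first role marker.

```python
STOP_MARKERS = ("### User:", "### Assistant:")  # role cues the model must not continue past


def extract_reply(prompt, generated_text):
    """Hypothetical helper: return only the model's new text for this turn."""
    if generated_text.startswith(prompt):
        # The endpoint echoed the exact prompt: drop it by length, not by searching.
        new_text = generated_text[len(prompt):]
    else:
        # No exact echo (e.g. only new text was returned): fall back to the tutorial's split.
        new_text = generated_text.split("### Assistant:")[-1]
    for marker in STOP_MARKERS:
        # Cut at the first role cue the model invents, dropping any fake extra exchange.
        new_text = new_text.split(marker)[0]
    return new_text.strip()


prompt = "SYSTEM\n\n### User:\nHi\n\n### Assistant:\n"
raw = prompt + "Hello!\n\n### User:\nMore?\n\n### Assistant:\nSure."
print(extract_reply(prompt, raw))                                     # Hello!
print(raw.split("### Assistant:")[-1].split("### User:")[0].strip())  # Sure.  (the invented turn)
```

In /chat this would replace the two split lines with reply = extract_reply(prompt, raw). Many HF text-generation endpoints also accept a return_full_text parameter that, when false, leaves the prompt out of the output; check your endpoint's documentation before relying on it.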

Bug 3 — CORS error (future: separate frontend)

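# BUG: browser blocks fetch() if frontend is on a different origin
# (e.g., frontend on port 3000, Flask on port 5000)

# FIX: install flask-cors (in your shell: pip install flask-cors), then enable it
from flask import Flask
from flask_cors import CORS

app = Flask(__name__)
CORS(app, supports_credentials=True)  # supports_credentials needed for session cookies

# The frontend must also opt in to sending cookies cross-origin:
#   fetch("http://127.0.0.1:5000/chat", { method: "POST", credentials: "include", ... })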

Symptom

Browser console: "Access to fetch at 'http://127.0.0.1:5000/chat' from origin 'http://localhost:3000' has been blocked by CORS policy"

⚡ Key Takeaways

Set session.modified = True (or reassign the key) whenever you mutate a mutable session value in place
Always split on your role delimiter to extract only the new reply from the model's output
For separate frontend/backend deployments, add flask-cors with supports_credentials=True and send fetch() requests with credentials: "include"
Set a timeout on all HuggingFace requests — slow models will block your server forever without one

10. Concept Flashcards

⚡ Beginner ⏱ ~3 min


Flask session

A signed cookie stored in the browser that holds per-user data (like conversation history). Each user gets their own isolated session — preventing conversations from mixing across users.


11. Knowledge Check Quiz

⚡ Intermediate ⏱ ~5 min

Questions focus on practical code behaviour — what happens, why it breaks, and what the fix is.

Q1. You add a new turn with session["history"].append(...) and never reassign session["history"], but on the next request the new turn is missing. What's most likely missing?

Q2. The model's reply in the JSON response contains the full serialised prompt plus the new answer. Which line of code extracts only the new assistant reply?

Q3. Two users are chatting simultaneously. User A's messages start appearing in User B's conversation. What caused this?

Q4. The user types a message and hits Send. The "Thinking..." bubble appears but never goes away. What is most likely wrong?

Q5. You deploy the Flask API on port 5000 and the frontend separately on port 3000. The browser throws a CORS error when fetch() is called. What is the minimal fix in app.py?

Q6. Predict what happens: the user sends 15 messages in a row, but MAX_TURNS = 10. What does the model receive on the 15th call?
