Automating Gmail with Local AI Agents and Python

You can automate your Gmail inbox entirely on your own machine by using the Gmail API to feed messages through a private Python pipeline into a locally running LLM for summarization, classification, and draft reply generation. This setup delivers the kind of intelligent email management you’d expect from tools like Google’s Gemini sidebar or Microsoft Copilot for Outlook — but without ever sending a single word of your email content to an external server.
This guide walks through the complete implementation: setting up the Gmail API with minimal OAuth scopes, fetching and parsing raw email data, applying PII masking with Microsoft Presidio before any model sees the content, building a daily summarizer that outputs urgency-ranked digests, constructing a smart draft generator that learns from your sent mail history, and scheduling the whole pipeline with cron. By the end you’ll have a working local email agent you can run on any mid-range Linux or macOS machine with Ollama installed.
Prerequisites: Python 3.11 or later, a Google account, a Google Cloud project (free tier is sufficient), and Ollama running a local model such as mistral-nemo or llama3.1:8b. The full package list is: google-api-python-client, google-auth-oauthlib, presidio-analyzer, presidio-anonymizer, spacy, beautifulsoup4, ollama, sqlite-utils, and chromadb for the vector search component.
Why Local? The Privacy Risk of Cloud Email AI
When you hand a cloud email AI tool access to your Gmail, you are not simply enabling a feature — you are agreeing to a data relationship with meaningful legal and technical implications. Google’s own AI features within Workspace are governed by terms that distinguish between “core services” and “additional services,” and the training data exemptions in those agreements have changed more than once in the past three years. Third-party tools that plug into Gmail — productivity apps, CRM integrations, AI writing assistants — typically request the broadest possible OAuth scope: https://mail.google.com/, which grants full read, compose, send, and delete permissions across your entire mailbox history. That scope is not a technical requirement for summarization. It is a business decision.
The risk is not purely theoretical. Several productivity tools that integrated with Gmail via broad OAuth scopes were acquired by larger companies after 2022, and their data retention policies changed post-acquisition without explicit user notification. In the email automation space, at least two tools with millions of users were found to have retained email content for model fine-tuning purposes under terms that were technically compliant but practically opaque to users. Email contains some of the most sensitive data in a person’s digital life: employment negotiations, medical correspondence, financial statements, legal communications, and personal relationships. The asymmetry between the value of that data to a training pipeline and the benefit the user receives from a slightly smarter subject-line suggestion is worth pausing on.
The OAuth scope problem is the most concrete and correctable part of this risk. When you build your own local agent, you select only the scopes you actually need. For a read-only summarizer, that is https://www.googleapis.com/auth/gmail.readonly. For draft creation, you add https://www.googleapis.com/auth/gmail.compose, which permits creating, updating, and sending drafts, but grants no ability to read your mailbox, modify labels, or delete existing messages. No third-party service ever touches your credential. The refresh token stays in your local system keychain. The email content never leaves your machine. This is the local alternative: a Python script that runs on your own hardware, uses the least-privileged OAuth scopes, and processes everything through an LLM that is also running locally.
Setting Up the Gmail API with Minimal Permissions
Start at the Google Cloud Console. Create a new project — call it something like “local-email-agent” — and navigate to “APIs & Services” > “Library.” Search for “Gmail API” and enable it. This takes about thirty seconds and costs nothing.
Next, go to “APIs & Services” > “Credentials” and click “Create Credentials” > “OAuth client ID.” When prompted for the application type, select Desktop app. This is the correct choice for a locally-running Python script. The “Web application” type requires a redirect URI pointing to an HTTP server, which adds unnecessary complexity and a network listener to a local script. The Desktop app type uses a loopback redirect (http://localhost) that the google-auth-oauthlib library handles automatically. Download the resulting credentials.json file and store it somewhere safe — this file does not contain your actual tokens, only the client ID and secret used to initiate the OAuth flow.
A note on ordering: the Cloud Console requires a configured OAuth consent screen before it will issue an OAuth client ID. Configure the consent screen in “External” mode (unless you are a Google Workspace admin who wants to restrict it to your org). Add yourself as a test user. Under “Scopes,” add only the two scopes you need: https://www.googleapis.com/auth/gmail.readonly and https://www.googleapis.com/auth/gmail.compose. Resist the temptation to add broader scopes “just in case” — the principle of least privilege applies even when you are granting access to yourself.
The first time you run the authentication flow, a browser window opens, you log in and approve the scopes, and the library writes a token.json file containing your refresh token. Store this token in your operating system’s keychain rather than leaving it as a plain file. On Linux, use keyring (backed by GNOME Keyring or KWallet); on macOS, the system Keychain is available via the same keyring package. Here is the complete authentication helper:
```python
import json

import keyring
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

SCOPES = [
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/gmail.compose",
]
KEYRING_SERVICE = "local-email-agent"
KEYRING_USERNAME = "gmail-token"


def get_credentials(credentials_path: str = "credentials.json") -> Credentials:
    token_json = keyring.get_password(KEYRING_SERVICE, KEYRING_USERNAME)
    creds = None
    if token_json:
        creds = Credentials.from_authorized_user_info(json.loads(token_json), SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(credentials_path, SCOPES)
            creds = flow.run_local_server(port=0)
        keyring.set_password(KEYRING_SERVICE, KEYRING_USERNAME, creds.to_json())
    return creds
```

Run this once interactively to complete the browser auth flow. On all subsequent runs, the refresh token is retrieved from the keychain and exchanged for a fresh access token automatically, with no browser required.
Local Processing of Sensitive Data
With credentials in hand, you can start fetching mail. The Gmail API’s messages.list endpoint returns message IDs and thread IDs — it does not return message content. You then fetch full message content with messages.get using the format=full parameter. For a daily digest use case, query the last 24 hours of unread messages:
```python
from datetime import datetime, timedelta, timezone

from googleapiclient.discovery import build


def fetch_recent_messages(service, hours: int = 24, max_results: int = 100):
    # Use an aware UTC datetime: calling .timestamp() on a naive
    # datetime.utcnow() would be interpreted as local time and skew the query.
    after_timestamp = int((datetime.now(timezone.utc) - timedelta(hours=hours)).timestamp())
    query = f"is:unread after:{after_timestamp}"
    result = service.users().messages().list(
        userId="me",
        q=query,
        maxResults=max_results,
    ).execute()
    messages = result.get("messages", [])
    full_messages = []
    for msg_stub in messages:
        msg = service.users().messages().get(
            userId="me",
            id=msg_stub["id"],
            format="full",
        ).execute()
        full_messages.append(msg)
    return full_messages
```

For inboxes with more than 100 unread messages in 24 hours, implement pagination using the nextPageToken field in the response. The Gmail API imposes a per-user quota of 250 “units” per second — a messages.get call costs 5 units, so stay well within the limit by processing messages in batches with a small time.sleep(0.1) between requests.
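If you want that pagination spelled out, here is one sketch. The helper name fetch_all_message_stubs is my own, but the pageToken request parameter and nextPageToken response field are part of the real messages.list API:

```python
import time


def fetch_all_message_stubs(service, query: str, page_size: int = 100):
    """Yield every message stub matching `query`, following nextPageToken."""
    page_token = None
    while True:
        kwargs = {"userId": "me", "q": query, "maxResults": page_size}
        if page_token:
            kwargs["pageToken"] = page_token
        result = service.users().messages().list(**kwargs).execute()
        yield from result.get("messages", [])
        page_token = result.get("nextPageToken")
        if not page_token:
            break
        time.sleep(0.1)  # stay well under the 250 quota-units/sec per-user limit
```

Feed the yielded stubs into the same messages.get loop as before; because it is a generator, memory use stays flat even on a large backlog.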
Parsing raw MIME messages from the Gmail API requires a bit of work because the API returns the body as base64url-encoded data nested within a parts tree. For multipart messages, you need to walk the part tree to find the text/plain part. The HTML fallback should be stripped using BeautifulSoup rather than a regex, because email HTML is too unpredictable for pattern matching:
```python
import base64

from bs4 import BeautifulSoup


def extract_body(payload: dict) -> str:
    mime_type = payload.get("mimeType", "")
    if mime_type == "text/plain":
        data = payload.get("body", {}).get("data", "")
        return base64.urlsafe_b64decode(data + "==").decode("utf-8", errors="replace")
    if mime_type == "text/html":
        data = payload.get("body", {}).get("data", "")
        html = base64.urlsafe_b64decode(data + "==").decode("utf-8", errors="replace")
        return BeautifulSoup(html, "html.parser").get_text(separator="\n")
    if "parts" in payload:
        # Prefer plain text; fall back to HTML
        plain = next(
            (p for p in payload["parts"] if p.get("mimeType") == "text/plain"), None
        )
        if plain:
            return extract_body(plain)
        html_part = next(
            (p for p in payload["parts"] if p.get("mimeType") == "text/html"), None
        )
        if html_part:
            return extract_body(html_part)
        # Recurse into nested multipart (e.g. multipart/alternative inside mixed)
        for part in payload["parts"]:
            result = extract_body(part)
            if result.strip():
                return result
    return ""
```

Attachments are skipped entirely — we never load binary attachment content into memory for this pipeline.
Before any email body reaches the LLM, it passes through a PII masking layer. Even with a fully local model, this is a good defensive practice: it prevents the LLM from repeating sensitive tokens verbatim in its output (which might then be logged), and it trains you to think about email data as structured information rather than raw text. Microsoft Presidio is the right tool for this. Install it with pip install presidio-analyzer presidio-anonymizer and download the spaCy model with python -m spacy download en_core_web_lg:
```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

PII_ENTITIES = [
    "PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER",
    "CREDIT_CARD", "US_SSN", "IBAN_CODE", "IP_ADDRESS",
]


def mask_pii(text: str) -> str:
    results = analyzer.analyze(text=text, entities=PII_ENTITIES, language="en")
    anonymized = anonymizer.anonymize(text=text, analyzer_results=results)
    return anonymized.text
```

After masking, John Smith <john.smith@acmecorp.com> becomes <PERSON> <<EMAIL_ADDRESS>>. The LLM summarizes the redacted text, and the summary never contains the raw PII — only the structural meaning of the email.
For long email threads, chunk the content to fit within the model’s context window. A practical strategy is to include the three most recent messages in a thread in full, and summarize older messages in the thread with a single prior-context sentence. This preserves the conversational context the LLM needs without exceeding a typical 8k–32k token window.
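The chunking strategy above can be sketched as a small helper. The name build_thread_context is hypothetical, and for brevity this version compresses older messages with a simple truncated preview rather than an LLM-generated summary:

```python
def build_thread_context(bodies: list[str], keep_recent: int = 3, max_chars: int = 2000) -> str:
    """Keep the newest messages verbatim; collapse older ones into one note.

    `bodies` is ordered oldest-first. A production version might summarize
    the older messages with the local LLM instead of truncating them.
    """
    older = bodies[:-keep_recent] if len(bodies) > keep_recent else []
    recent = bodies[-keep_recent:]
    parts = []
    if older:
        parts.append(
            f"[Earlier in this thread: {len(older)} older message(s), "
            f"beginning: {older[0][:200]!r}]"
        )
    # Cap each kept message so the whole context fits a typical window
    parts.extend(b[:max_chars] for b in recent)
    return "\n\n---\n\n".join(parts)
```

The returned string drops into the {body} slot of the summarization prompt unchanged.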
Building a Smart Email Summarizer
The core of the summarizer is a structured prompt that requests JSON output. Using a local model like mistral-nemo or llama3.1:8b through Ollama, you can reliably extract structured data by specifying the output format explicitly and using Ollama’s format="json" parameter:
```python
import json

import ollama

SUMMARIZE_PROMPT = """You are an email assistant. Analyze the following email and respond with a JSON object containing exactly these fields:
- "summary": A 2-3 sentence plain-English summary of the email's content and purpose.
- "action_items": A list of strings, each describing a specific action the recipient needs to take. Empty list if none.
- "urgency": An integer from 1 (low, informational) to 5 (high, requires response today).
- "category": One of: "Task", "FYI", "Social", "Spam", "Receipt", "Newsletter".
- "reply_needed": A boolean indicating whether a reply is expected.

Email (PII redacted):
---
{body}
---

Respond with only the JSON object, no other text."""


def summarize_email(body: str, model: str = "mistral-nemo") -> dict:
    masked_body = mask_pii(body)
    # Truncate to ~3000 words to stay within context
    words = masked_body.split()
    if len(words) > 3000:
        masked_body = " ".join(words[:3000]) + "\n\n[Thread truncated]"
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": SUMMARIZE_PROMPT.format(body=masked_body)}],
        format="json",
    )
    try:
        return json.loads(response["message"]["content"])
    except json.JSONDecodeError:
        # Fall back to a neutral record so one bad response doesn't kill the run
        return {
            "summary": "Parse error — raw response logged.",
            "action_items": [],
            "urgency": 1,
            "category": "FYI",
            "reply_needed": False,
        }
```

A complete daily digest run fetches all unread messages from the past 24 hours, summarizes each one, and persists the results to a local SQLite database using sqlite-utils:
```python
import json
from datetime import datetime, timezone

import sqlite_utils


def run_daily_digest(credentials_path: str = "credentials.json", db_path: str = "email_agent.db"):
    creds = get_credentials(credentials_path)
    service = build("gmail", "v1", credentials=creds)
    messages = fetch_recent_messages(service, hours=24)
    db = sqlite_utils.Database(db_path)
    if "summaries" not in db.table_names():
        db["summaries"].create({
            "message_id": str,
            "thread_id": str,
            "subject": str,
            "sender": str,
            "received_at": str,
            "summary": str,
            "action_items": str,
            "urgency": int,
            "category": str,
            "reply_needed": int,
            "processed_at": str,
        }, pk="message_id")
    results = []
    for msg in messages:
        message_id = msg["id"]
        # Skip already-processed messages
        if db["summaries"].count_where("message_id = ?", [message_id]) > 0:
            continue
        headers = {h["name"]: h["value"] for h in msg["payload"].get("headers", [])}
        body = extract_body(msg["payload"])
        if not body.strip():
            continue
        analysis = summarize_email(body)
        db["summaries"].insert({
            "message_id": message_id,
            "thread_id": msg["threadId"],
            "subject": headers.get("Subject", "(no subject)"),
            "sender": headers.get("From", "unknown"),
            "received_at": headers.get("Date", ""),
            "summary": analysis.get("summary", ""),
            "action_items": json.dumps(analysis.get("action_items", [])),
            "urgency": analysis.get("urgency", 1),
            "category": analysis.get("category", "FYI"),
            "reply_needed": int(analysis.get("reply_needed", False)),
            "processed_at": datetime.now(timezone.utc).isoformat(),
        }, replace=True)
        results.append(analysis)
    return results
```

The output can be rendered as a Markdown digest file sorted by urgency — useful as a morning briefing you open in any text editor — or piped to Home Assistant via its REST API as a push notification for high-urgency items. Filter for messages with urgency >= 4 and reply_needed = True to build a “must act today” list that stays under five items on most days.
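As one illustration of that rendering step, here is a hedged sketch. The name render_digest is my own, and it operates on plain dicts shaped like rows from the summaries table so it stays independent of the database layer:

```python
import json


def render_digest(rows: list[dict]) -> str:
    """Render rows from the summaries table as Markdown, highest urgency first."""
    lines = ["# Email Digest", ""]
    for row in sorted(rows, key=lambda r: r["urgency"], reverse=True):
        flag = " (reply needed)" if row.get("reply_needed") else ""
        lines.append(f"## [{row['urgency']}/5] {row['subject']}{flag}")
        lines.append(f"From: {row['sender']} | Category: {row['category']}")
        lines.append("")
        lines.append(row["summary"])
        # action_items is stored as a JSON-encoded list of strings
        for item in json.loads(row.get("action_items") or "[]"):
            lines.append(f"- [ ] {item}")
        lines.append("")
    return "\n".join(lines)
```

Write the returned string to a digest.md file and open it in any editor, or pass only the top entries to the Home Assistant notifier shown later.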
Building a Smart Draft System
The smart draft system adds a second capability: given an incoming email that needs a reply, generate a contextually appropriate draft in your own writing voice and deposit it in Gmail’s Drafts folder for your review. The system never sends anything automatically — that would be a dangerous default. It only creates drafts.
To match your voice, collect a corpus of your sent emails from the past six months. This becomes the reference set for a local vector database that retrieves stylistically similar past replies:
```python
import chromadb
from chromadb.utils import embedding_functions


def build_style_corpus(service, chroma_path: str = "./chroma_db"):
    """Fetch sent mail and index it for vector retrieval."""
    client = chromadb.PersistentClient(path=chroma_path)
    ef = embedding_functions.OllamaEmbeddingFunction(
        model_name="nomic-embed-text",
        url="http://localhost:11434/api/embeddings",
    )
    collection = client.get_or_create_collection("sent_mail", embedding_function=ef)
    # Fetch up to 500 recent sent messages
    result = service.users().messages().list(
        userId="me",
        labelIds=["SENT"],
        maxResults=500,
    ).execute()
    for msg_stub in result.get("messages", []):
        msg = service.users().messages().get(
            userId="me", id=msg_stub["id"], format="full"
        ).execute()
        body = extract_body(msg["payload"])
        if len(body.strip()) < 50:
            continue  # Skip one-liners and auto-replies
        headers = {h["name"]: h["value"] for h in msg["payload"].get("headers", [])}
        collection.add(
            documents=[body[:2000]],
            ids=[msg["id"]],
            metadatas=[{"subject": headers.get("Subject", "")}],
        )
    return collection
```

Draft generation retrieves the five most similar past replies from the corpus and uses them as few-shot examples in the prompt. This is more reliable than fine-tuning a model on your sent mail for most people — fine-tuning requires careful data preparation and GPU time, while few-shot prompting with retrieval gives good results with no training overhead:
```python
def generate_draft_reply(
    incoming_email: str,
    subject: str,
    style_collection,
    model: str = "mistral-nemo",
) -> str:
    masked_incoming = mask_pii(incoming_email)
    # Retrieve similar past replies for style reference
    similar = style_collection.query(
        query_texts=[masked_incoming[:500]],
        n_results=5,
    )
    style_examples = "\n\n---\n\n".join(
        doc[:500] for doc in similar["documents"][0]
    )
    prompt = f"""You are drafting an email reply. Study the writing style of the examples below (these are real replies written by the user), then write a reply to the incoming email in the same voice and tone.

STYLE EXAMPLES (user's past replies):
{style_examples}

INCOMING EMAIL (PII redacted):
Subject: {subject}
---
{masked_incoming[:2000]}
---

Write a natural, professional reply in the user's voice. Do not include a subject line. Start directly with the greeting."""
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]
```

Once you have the draft text, create it in Gmail using the users.drafts.create method. The draft must be formatted as a MIME message encoded in base64:
```python
import base64
from email.mime.text import MIMEText


def create_gmail_draft(service, thread_id: str, to: str, subject: str, body: str):
    message = MIMEText(body)
    message["to"] = to
    message["subject"] = subject if subject.startswith("Re:") else f"Re: {subject}"
    raw = base64.urlsafe_b64encode(message.as_bytes()).decode("utf-8")
    draft = service.users().drafts().create(
        userId="me",
        body={"message": {"raw": raw, "threadId": thread_id}},
    ).execute()
    return draft["id"]
```

The draft appears in your Gmail Drafts folder, attributed to your account, ready for you to review, edit, and send — or discard. The LLM output is a starting point, not a finished product.
Scheduling and Automation
A script you have to invoke manually provides convenience but not automation. The simplest production-ready scheduling approach on Linux and macOS is cron. Edit your crontab with crontab -e and add:
```
# Run email digest every 30 minutes, log output
*/30 * * * * /usr/bin/python3 /home/youruser/email-agent/main.py digest >> /home/youruser/email-agent/agent.log 2>&1

# Rebuild sent-mail style corpus once daily at 3am
0 3 * * * /usr/bin/python3 /home/youruser/email-agent/main.py rebuild-corpus >> /home/youruser/email-agent/agent.log 2>&1
```

Use the full path to your Python interpreter (find it with which python3) and the full path to your script, and log to a file your user can write (a user-level cron job typically cannot append to /var/log). On macOS, prefer launchd plist files over cron for more reliable scheduling after sleep/wake cycles — but cron works fine for always-on machines. On Windows, use Task Scheduler with the “Run whether user is logged on or not” option and store credentials using the Windows Credential Manager.
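For the launchd route, a minimal property list is sketched below, assuming the same hypothetical script and log paths as the cron example. Save it as ~/Library/LaunchAgents/com.local.email-agent.plist and activate it with launchctl load; StartInterval of 1800 seconds mirrors the 30-minute cron cadence.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.local.email-agent</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/python3</string>
        <string>/Users/youruser/email-agent/main.py</string>
        <string>digest</string>
    </array>
    <key>StartInterval</key>
    <integer>1800</integer>
    <key>StandardOutPath</key>
    <string>/Users/youruser/email-agent/agent.log</string>
    <key>StandardErrorPath</key>
    <string>/Users/youruser/email-agent/agent.log</string>
</dict>
</plist>
```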
For Home Assistant integration, the digest script can POST high-urgency summaries to HA’s REST notification API. After enabling the API in HA with a long-lived access token, a single requests.post call delivers the summary as a push notification to your phone:
```python
import requests


def notify_home_assistant(summary_text: str, ha_url: str, ha_token: str):
    # "mobile_app_your_phone" is the notify service name HA assigns your device
    requests.post(
        f"{ha_url}/api/services/notify/mobile_app_your_phone",
        headers={"Authorization": f"Bearer {ha_token}"},
        json={"message": summary_text, "title": "Email Digest - Urgent"},
        timeout=10,
    )
```

Error handling deserves explicit attention in a scheduled script because failures are silent. Wrap each external call — Gmail API, Ollama, ChromaDB — in a try/except block and log failures with enough context to diagnose them later. The most common failure modes are: the Gmail API returning a 429 (rate limit exceeded, back off and retry after the Retry-After header value), Ollama returning a malformed JSON string for the structured summarization prompt (catch json.JSONDecodeError and log the raw response), and the OAuth token expiring or being revoked (the google.auth.exceptions.RefreshError exception, which requires re-running the interactive auth flow).
```python
import logging
import time

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    handlers=[logging.StreamHandler()],
)
logger = logging.getLogger(__name__)


def safe_summarize(body: str, retries: int = 2) -> dict:
    for attempt in range(retries + 1):
        try:
            return summarize_email(body)
        except Exception as e:
            logger.warning(f"Summarization attempt {attempt + 1} failed: {e}")
            if attempt < retries:
                time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, ...
    return {"summary": "Failed after retries", "urgency": 1, "category": "FYI",
            "action_items": [], "reply_needed": False}
```

On the resource side: for a typical inbox receiving 50 emails per day, a single digest run processes roughly 50 summarization requests. Using mistral-nemo (12B parameters, Q4 quantization) on an RTX 4070, each summarization takes approximately 3–5 seconds, putting the total digest processing time at 3–4 minutes. The model loads once and stays resident in VRAM between requests when using Ollama’s keep-alive feature (OLLAMA_KEEP_ALIVE=30m). On CPU-only hardware (a modern laptop with 16GB RAM), expect 20–30 seconds per email using a smaller model like phi3:mini — still acceptable for a background batch process that runs every 30 minutes.
System Architecture in Full
The complete system has five layers. At the bottom is the credential layer: credentials.json and the system keychain holding the OAuth refresh token. Above that is the data ingestion layer: the Gmail API client that fetches messages using messages.list and messages.get, handles pagination, and respects rate limits. Next is the preprocessing layer: the MIME parser that extracts plain-text bodies and the Presidio pipeline that masks PII before any data touches the model. Above that is the intelligence layer: Ollama running a local quantized LLM that receives sanitized text and returns structured JSON (for summarization) or free-form draft text (for reply generation), along with the ChromaDB vector store holding your sent-mail style corpus. Finally, at the top is the output layer: a SQLite database storing all processed summaries, a Markdown digest renderer, optional Home Assistant notifications for high-urgency items, and the Gmail Drafts API endpoint where generated replies are deposited.
Data flows in one direction through these layers. Email content enters at the Gmail API, is processed locally at each layer, and produces outputs (summaries, drafts, notifications) that stay on your machine or go back to your Gmail account as draft-only content. Nothing is sent to any external AI API. The only network calls the system makes are to Google’s servers for email retrieval and to your local Ollama instance (loopback, 127.0.0.1:11434) for inference.
This architecture is extensible. Adding an unsubscribe action is straightforward: after classifying an email as “Newsletter” with high confidence, call users.messages.modify to remove the INBOX label (Gmail has no explicit archive label; a message is archived when it no longer carries INBOX). Note that messages.modify requires the broader gmail.modify scope, so add that scope deliberately if you enable this feature. Adding Outlook/Exchange support requires replacing the Gmail API client with an IMAP connection using Python’s imaplib — the preprocessing, intelligence, and output layers remain unchanged because they operate on plain text regardless of the source. The local-first design means each new capability you add inherits the privacy properties of the rest of the system automatically.
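A sketch of that archive action, under the same assumptions as the rest of the pipeline (archive_message is a name I am introducing; archiving in Gmail means removing the INBOX label, and users.messages.modify needs the gmail.modify scope, which is broader than the readonly/compose pair used elsewhere in this guide):

```python
def archive_message(service, message_id: str):
    """Archive a Gmail message by removing its INBOX label.

    Requires the gmail.modify OAuth scope; add it deliberately if you
    enable auto-archiving for high-confidence "Newsletter" classifications.
    """
    return service.users().messages().modify(
        userId="me",
        id=message_id,
        body={"removeLabelIds": ["INBOX"]},
    ).execute()
```

Gate calls to it on both the category and a high urgency-model confidence, and log every archived message ID so mistakes are recoverable from the All Mail view.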
The investment to set this up — roughly two to three hours for the initial configuration — pays for itself quickly if you receive high email volume and have previously relied on cloud AI tools to manage it. More importantly, it gives you a clear-eyed view of what those tools were doing with your data, and a working alternative that requires no ongoing trust in a third party’s privacy policy.