Automating Gmail with Local AI Agents and Python

You can automate your Gmail inbox on your own machine. The Gmail API feeds messages into a private Python script. A local LLM then handles summaries, sorting, and draft replies. You get the smart inbox features that tools like Google’s Gemini sidebar or Microsoft Copilot for Outlook offer. None of your email content ever leaves your computer.
This guide walks through the full build. You’ll set up the Gmail API with minimal OAuth scopes. You’ll fetch and parse raw email data, then mask any PII with Microsoft Presidio before the model sees it. You’ll build a daily summarizer that ranks mail by urgency. You’ll also build a smart draft writer that learns from your sent mail, and you’ll wire the whole pipeline up with cron. By the end, you’ll have a working local email agent that runs on any mid-range Linux or macOS box with Ollama installed.
You’ll need Python 3.11 or later, a Google account, a Google Cloud project (the free tier is fine), and Ollama running a local model like mistral-nemo or llama3.1:8b. The full package list: google-api-python-client, google-auth-oauthlib, presidio-analyzer, presidio-anonymizer, spacy, beautifulsoup4, ollama, sqlite-utils, and chromadb for the vector search piece.
Why Local? The Privacy Risk of Cloud Email AI
When you give a cloud email AI tool access to your Gmail, you aren’t just turning on a feature. You’re entering a data deal with real legal and technical fallout. Google’s own AI features inside Workspace are governed by terms that split “core services” from “additional services.” The training data carve-outs in those deals have shifted more than once in the past three years. Third-party tools that plug into Gmail (productivity apps, CRM add-ons, AI writing helpers) tend to ask for the broadest OAuth scope: https://mail.google.com/. That scope grants full read, compose, send, and delete rights across your entire mailbox history. It’s not a technical need for summaries. It’s a business choice.
The risk isn’t just theory. Several productivity tools that plugged into Gmail via wide OAuth scopes were bought by bigger firms after 2022. Their data retention rules then shifted without telling users. In the email automation space, at least two tools with millions of users were found to have kept email content for model fine-tuning. The terms were legal but opaque in practice. Email holds some of the most sensitive data in a person’s digital life: pay talks, medical notes, bank statements, legal threads, and personal ties. Think about the trade. Your data has huge value to a training pipeline. You get a slightly smarter subject-line guess in return.
The OAuth scope problem is the most concrete piece of this risk, and the easiest to fix. When you build your own local agent, you pick only the scopes you actually need. For a read-only summarizer, that’s https://www.googleapis.com/auth/gmail.readonly. For draft creation, you add https://www.googleapis.com/auth/gmail.compose. That second scope lets you create and update drafts. It can’t send email, touch labels, or delete messages. No third-party service ever sees your credential. The refresh token stays in your local system keychain. Email content never leaves your machine. The local path is a Python script running on your own hardware, using least-privilege OAuth scopes, and feeding everything through an LLM that also runs locally.
Setting Up the Gmail API with Minimal Permissions
Start at the Google Cloud Console. Create a new project (call it something like “local-email-agent”), then go to “APIs & Services” > “Library.” Search for “Gmail API” and turn it on. This takes about thirty seconds and costs nothing.
Next, head to “APIs & Services” > “Credentials” and click “Create Credentials” > “OAuth client ID.” When asked for the app type, pick Desktop app. That’s the right choice for a local Python script. The “Web application” type needs a redirect URI pointing to an HTTP server. That adds extra moving parts and a network listener to a local script. The Desktop app type uses a loopback redirect (http://localhost) that the google-auth-oauthlib library handles for you. Download the credentials.json file and store it somewhere safe. That file doesn’t hold your tokens, only the client ID and secret used to kick off the OAuth flow.
When you configure the OAuth consent screen, set the user type to “External” unless you’re a Google Workspace admin who wants to lock access to your org. Add yourself as a test user. Under “Scopes,” add only the two scopes you need: https://www.googleapis.com/auth/gmail.readonly and https://www.googleapis.com/auth/gmail.compose. Don’t add broader scopes “just in case.” Least privilege applies even when you’re granting access to yourself.
The first time you run the auth flow, a browser window opens. You log in and approve the scopes. The library then writes a token.json file with your refresh token. Store this token in your OS keychain. Don’t leave it as a plain file. On Linux, use keyring (backed by GNOME Keyring or KWallet). On macOS, the system Keychain works via the same keyring package. Here’s the full auth helper:
import json
import keyring
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

SCOPES = [
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/gmail.compose",
]

KEYRING_SERVICE = "local-email-agent"
KEYRING_USERNAME = "gmail-token"

def get_credentials(credentials_path: str = "credentials.json") -> Credentials:
    # Pull a previously stored token from the OS keychain, if one exists
    token_json = keyring.get_password(KEYRING_SERVICE, KEYRING_USERNAME)
    creds = None
    if token_json:
        creds = Credentials.from_authorized_user_info(json.loads(token_json), SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            # Silent refresh: swap the stored refresh token for a new access token
            creds.refresh(Request())
        else:
            # First run (or revoked token): interactive browser flow
            flow = InstalledAppFlow.from_client_secrets_file(credentials_path, SCOPES)
            creds = flow.run_local_server(port=0)
        keyring.set_password(
            KEYRING_SERVICE,
            KEYRING_USERNAME,
            creds.to_json(),
        )
    return creds

Run this once to finish the browser auth flow. On every later run, the script pulls the refresh token from the keychain and swaps it for a fresh access token. No browser needed.
Local Processing of Sensitive Data

With credentials in hand, you can start fetching mail. The Gmail API’s messages.list endpoint returns message IDs and thread IDs. It doesn’t return message bodies. You then fetch full content with messages.get using the format=full parameter. For a daily digest, query the last 24 hours of unread messages:
from datetime import datetime, timedelta, timezone
from googleapiclient.discovery import build

def fetch_recent_messages(service, hours: int = 24, max_results: int = 100):
    # Use an aware UTC datetime: naive utcnow() would be misread as local time
    # by .timestamp() and skew the search window on non-UTC machines
    after_timestamp = int((datetime.now(timezone.utc) - timedelta(hours=hours)).timestamp())
    query = f"is:unread after:{after_timestamp}"
    result = service.users().messages().list(
        userId="me",
        q=query,
        maxResults=max_results,
    ).execute()
    messages = result.get("messages", [])
    full_messages = []
    for msg_stub in messages:
        msg = service.users().messages().get(
            userId="me",
            id=msg_stub["id"],
            format="full",
        ).execute()
        full_messages.append(msg)
    return full_messages

For inboxes with more than 100 unread messages in 24 hours, set up pagination using the nextPageToken field, as in the sketch below. The Gmail API limits you to 250 quota units per second per user, and a messages.get call costs 5 units. Stay well inside that limit by spacing requests with a small time.sleep(0.1).
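Here’s a minimal sketch of that pagination loop, running until the API stops returning a nextPageToken. The fetch_all_message_ids name is our own, not a Gmail client method:

import time

def fetch_all_message_ids(service, query: str, page_size: int = 100) -> list:
    """Sketch: page through messages.list results until nextPageToken runs out."""
    ids = []
    page_token = None
    while True:
        result = service.users().messages().list(
            userId="me",
            q=query,
            maxResults=page_size,
            pageToken=page_token,
        ).execute()
        ids.extend(m["id"] for m in result.get("messages", []))
        page_token = result.get("nextPageToken")
        if not page_token:
            return ids
        time.sleep(0.1)  # Stay well under the per-user quota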
Parsing raw MIME messages from the Gmail API takes a bit of work. The API returns the body as base64url-encoded data nested in a parts tree. For multipart messages, you have to walk the part tree to find the text/plain part. When only an HTML part exists, strip the tags with BeautifulSoup, not a regex. Email HTML is too messy for pattern matching:
import base64
from bs4 import BeautifulSoup

def extract_body(payload: dict) -> str:
    mime_type = payload.get("mimeType", "")
    if mime_type == "text/plain":
        data = payload.get("body", {}).get("data", "")
        return base64.urlsafe_b64decode(data + "==").decode("utf-8", errors="replace")
    if mime_type == "text/html":
        data = payload.get("body", {}).get("data", "")
        html = base64.urlsafe_b64decode(data + "==").decode("utf-8", errors="replace")
        return BeautifulSoup(html, "html.parser").get_text(separator="\n")
    if "parts" in payload:
        # Prefer plain text; fall back to HTML
        plain = next(
            (p for p in payload["parts"] if p.get("mimeType") == "text/plain"), None
        )
        if plain:
            return extract_body(plain)
        html_part = next(
            (p for p in payload["parts"] if p.get("mimeType") == "text/html"), None
        )
        if html_part:
            return extract_body(html_part)
        # Recurse into nested multipart
        for part in payload["parts"]:
            result = extract_body(part)
            if result.strip():
                return result
    return ""

Skip attachments entirely. The pipeline never loads binary attachment content into memory.
Before any email body reaches the LLM, it runs through a PII masking layer. Even with a fully local model, this is a smart defense. It stops the LLM from echoing sensitive tokens in its output (which might then get logged). It also trains you to treat email as structured data, not raw text. Microsoft Presidio is the right tool here. Install it with pip install presidio-analyzer presidio-anonymizer and grab the spaCy model with python -m spacy download en_core_web_lg:
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

PII_ENTITIES = [
    "PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER",
    "CREDIT_CARD", "US_SSN", "IBAN_CODE", "IP_ADDRESS",
]

def mask_pii(text: str) -> str:
    results = analyzer.analyze(text=text, entities=PII_ENTITIES, language="en")
    anonymized = anonymizer.anonymize(text=text, analyzer_results=results)
    return anonymized.text

After masking, John Smith <john.smith@acmecorp.com> becomes <PERSON> <<EMAIL_ADDRESS>>. The LLM summarizes the redacted text. The summary never holds the raw PII, just the email’s structural meaning.
For long email threads, chunk the content to fit inside the model’s context window. A simple plan, sketched below: include the three most recent messages in a thread in full, and compress everything older into a single prior-context sentence. That keeps the conversational context the LLM needs without busting a typical 8k to 32k token window.
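A minimal sketch of that chunking plan, assuming the thread’s plain-text bodies arrive oldest-first and using the same local Ollama model the next section introduces. The chunk_thread helper and its prompt wording are our own, not part of any library:

import ollama

def chunk_thread(bodies: list[str], keep_recent: int = 3, model: str = "mistral-nemo") -> str:
    """Sketch: keep recent messages verbatim; compress older ones into one sentence."""
    recent = bodies[-keep_recent:]
    older = bodies[:-keep_recent]
    context = ""
    if older:
        response = ollama.chat(
            model=model,
            messages=[{
                "role": "user",
                "content": "Summarize this earlier email thread context in one sentence:\n\n"
                + "\n---\n".join(b[:1000] for b in older),
            }],
        )
        context = "[Earlier in thread: " + response["message"]["content"].strip() + "]\n\n"
    return context + "\n\n---\n\n".join(recent)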
Building a Smart Email Summarizer
The core of the summarizer is a structured prompt that asks for JSON output. With a local model like mistral-nemo or llama3.1:8b running through Ollama, you can pull structured data reliably by stating the output format and using Ollama’s format="json" parameter:
import ollama
import json

SUMMARIZE_PROMPT = """You are an email assistant. Analyze the following email and respond with a JSON object containing exactly these fields:
- "summary": A 2-3 sentence plain-English summary of the email's content and purpose.
- "action_items": A list of strings, each describing a specific action the recipient needs to take. Empty list if none.
- "urgency": An integer from 1 (low, informational) to 5 (high, requires response today).
- "category": One of: "Task", "FYI", "Social", "Spam", "Receipt", "Newsletter".
- "reply_needed": A boolean indicating whether a reply is expected.

Email (PII redacted):
---
{body}
---

Respond with only the JSON object, no other text."""

def summarize_email(body: str, model: str = "mistral-nemo") -> dict:
    masked_body = mask_pii(body)
    # Truncate to ~3000 words to stay within context
    words = masked_body.split()
    if len(words) > 3000:
        masked_body = " ".join(words[:3000]) + "\n\n[Thread truncated]"
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": SUMMARIZE_PROMPT.format(body=masked_body)}],
        format="json",
    )
    try:
        return json.loads(response["message"]["content"])
    except json.JSONDecodeError:
        return {
            "summary": "Parse error - raw response logged.",
            "action_items": [],
            "urgency": 1,
            "category": "FYI",
            "reply_needed": False,
        }

A full daily digest run grabs every unread message from the past 24 hours, summarizes each one, and saves the results to a local SQLite database using sqlite-utils:
import sqlite_utils
from datetime import datetime, timezone

def run_daily_digest(credentials_path: str = "credentials.json", db_path: str = "email_agent.db"):
    creds = get_credentials(credentials_path)
    service = build("gmail", "v1", credentials=creds)
    messages = fetch_recent_messages(service, hours=24)
    db = sqlite_utils.Database(db_path)
    if "summaries" not in db.table_names():
        db["summaries"].create({
            "message_id": str,
            "thread_id": str,
            "subject": str,
            "sender": str,
            "received_at": str,
            "summary": str,
            "action_items": str,
            "urgency": int,
            "category": str,
            "reply_needed": int,
            "processed_at": str,
        }, pk="message_id")
    results = []
    for msg in messages:
        message_id = msg["id"]
        # Skip already-processed messages
        if db["summaries"].count_where("message_id = ?", [message_id]) > 0:
            continue
        headers = {h["name"]: h["value"] for h in msg["payload"].get("headers", [])}
        body = extract_body(msg["payload"])
        if not body.strip():
            continue
        analysis = summarize_email(body)
        db["summaries"].insert({
            "message_id": message_id,
            "thread_id": msg["threadId"],
            "subject": headers.get("Subject", "(no subject)"),
            "sender": headers.get("From", "unknown"),
            "received_at": headers.get("Date", ""),
            "summary": analysis.get("summary", ""),
            "action_items": json.dumps(analysis.get("action_items", [])),
            "urgency": analysis.get("urgency", 1),
            "category": analysis.get("category", "FYI"),
            "reply_needed": int(analysis.get("reply_needed", False)),
            "processed_at": datetime.now(timezone.utc).isoformat(),
        }, replace=True)
        results.append(analysis)
    return results

You can render the output as a Markdown digest file sorted by urgency. That makes a handy morning briefing you open in any text editor. You can also pipe it to Home Assistant via its REST API as a push alert for urgent items. Filter for messages with urgency >= 4 and reply_needed = True to build a “must act today” list. On most days, that list stays under five items.
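A minimal renderer along those lines, reading the summaries table created above. The render_digest_markdown name and layout are our own sketch, not a fixed part of the pipeline:

import json
import sqlite_utils

def render_digest_markdown(db_path: str = "email_agent.db") -> str:
    """Sketch: render stored summaries as a Markdown briefing, most urgent first."""
    db = sqlite_utils.Database(db_path)
    lines = ["# Daily Email Digest", ""]
    for row in db["summaries"].rows_where(order_by="urgency desc"):
        flag = " [MUST ACT TODAY]" if row["urgency"] >= 4 and row["reply_needed"] else ""
        lines.append(f"## ({row['urgency']}/5) {row['subject']}{flag}")
        lines.append(f"From: {row['sender']}")
        lines.append(row["summary"])
        for item in json.loads(row["action_items"]):
            lines.append(f"- [ ] {item}")
        lines.append("")
    return "\n".join(lines)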
Building a Smart Draft System
The smart draft system adds a second skill. Given an incoming email that needs a reply, it writes a fitting draft in your own voice and drops it in Gmail’s Drafts folder for your review. The system never sends anything on its own. That would be a risky default. It only creates drafts.
To match your voice, gather a corpus of your sent emails from the past six months. This becomes the reference set for a local vector database. The DB pulls past replies that match the style of the new email:
import chromadb
from chromadb.utils import embedding_functions

def build_style_corpus(service, chroma_path: str = "./chroma_db"):
    """Fetch sent mail and index it for vector retrieval."""
    client = chromadb.PersistentClient(path=chroma_path)
    ef = embedding_functions.OllamaEmbeddingFunction(
        model_name="nomic-embed-text",
        url="http://localhost:11434/api/embeddings",
    )
    collection = client.get_or_create_collection("sent_mail", embedding_function=ef)
    # Fetch up to 500 recent sent messages
    result = service.users().messages().list(
        userId="me",
        labelIds=["SENT"],
        maxResults=500,
    ).execute()
    for msg_stub in result.get("messages", []):
        msg = service.users().messages().get(
            userId="me", id=msg_stub["id"], format="full"
        ).execute()
        body = extract_body(msg["payload"])
        if len(body.strip()) < 50:
            continue  # Skip one-liners and auto-replies
        headers = {h["name"]: h["value"] for h in msg["payload"].get("headers", [])}
        collection.add(
            documents=[body[:2000]],
            ids=[msg["id"]],
            metadatas=[{"subject": headers.get("Subject", "")}],
        )
    return collection

Draft generation pulls the five most similar past replies from the corpus. It uses them as few-shot examples in the prompt. For most people, that’s more reliable than fine-tuning a model on your sent mail. Fine-tuning needs careful data prep and GPU time. Few-shot prompting with retrieval gives solid results with no training overhead:
def generate_draft_reply(
    incoming_email: str,
    subject: str,
    style_collection,
    model: str = "mistral-nemo",
) -> str:
    masked_incoming = mask_pii(incoming_email)
    # Retrieve similar past replies for style reference
    similar = style_collection.query(
        query_texts=[masked_incoming[:500]],
        n_results=5,
    )
    style_examples = "\n\n---\n\n".join(
        doc[:500] for doc in similar["documents"][0]
    )
    prompt = f"""You are drafting an email reply. Study the writing style of the examples below (these are real replies written by the user), then write a reply to the incoming email in the same voice and tone.

STYLE EXAMPLES (user's past replies):
{style_examples}

INCOMING EMAIL (PII redacted):
Subject: {subject}
---
{masked_incoming[:2000]}
---

Write a natural, professional reply in the user's voice. Do not include a subject line. Start directly with the greeting."""
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]

Once you have the draft text, create it in Gmail using the users.drafts.create method. The draft has to be a MIME message encoded in base64:
import base64
from email.mime.text import MIMEText

def create_gmail_draft(service, thread_id: str, to: str, subject: str, body: str):
    message = MIMEText(body)
    message["to"] = to
    message["subject"] = subject if subject.startswith("Re:") else f"Re: {subject}"
    raw = base64.urlsafe_b64encode(message.as_bytes()).decode("utf-8")
    draft = service.users().drafts().create(
        userId="me",
        body={"message": {"raw": raw, "threadId": thread_id}},
    ).execute()
    return draft["id"]

The draft shows up in your Gmail Drafts folder under your account. You can review, edit, and send it. Or trash it. The LLM output is a starting point, not a finished product.
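To wire the two halves together, here’s a hedged sketch that drafts replies for everything the digest flagged as urgent and reply-worthy. It assumes the summaries table plus the extract_body, generate_draft_reply, and create_gmail_draft helpers defined above; the draft_replies_for_urgent name is our own:

def draft_replies_for_urgent(service, db, style_collection):
    """Sketch: create Gmail drafts for summarized messages flagged as needing a reply."""
    for row in db["summaries"].rows_where("reply_needed = 1 AND urgency >= 4"):
        msg = service.users().messages().get(
            userId="me", id=row["message_id"], format="full"
        ).execute()
        body = extract_body(msg["payload"])
        draft_text = generate_draft_reply(body, row["subject"], style_collection)
        # The stored From header (display name plus address) is valid in a To field
        create_gmail_draft(
            service,
            thread_id=row["thread_id"],
            to=row["sender"],
            subject=row["subject"],
            body=draft_text,
        )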
Scheduling and Automation
A script you have to run by hand gives you convenience, not automation. The simplest production-ready scheduler on Linux and macOS is cron. Edit your crontab with crontab -e and add:
# Run email digest every 30 minutes, log output
*/30 * * * * /usr/bin/python3 /home/youruser/email-agent/main.py digest >> /home/youruser/email-agent/agent.log 2>&1

# Rebuild sent-mail style corpus once daily at 3am
0 3 * * * /usr/bin/python3 /home/youruser/email-agent/main.py rebuild-corpus >> /home/youruser/email-agent/agent.log 2>&1

Use the full path to your Python interpreter (find it with which python3) and the full path to your script. Log somewhere your user can write; /var/log usually isn’t writable from a user crontab. On macOS, prefer launchd plist files over cron for better scheduling after sleep/wake cycles (a minimal example follows). Cron works fine for always-on machines. On Windows, use Task Scheduler with the “Run whether user is logged on or not” option and store credentials with the Windows Credential Manager.
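A minimal launchd sketch under the same assumptions (script and log at the paths above, adjusted for macOS home directories). Save it as ~/Library/LaunchAgents/com.local.email-agent.plist and load it with launchctl load ~/Library/LaunchAgents/com.local.email-agent.plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.local.email-agent</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/python3</string>
        <string>/Users/youruser/email-agent/main.py</string>
        <string>digest</string>
    </array>
    <!-- Every 30 minutes, matching the crontab above -->
    <key>StartInterval</key>
    <integer>1800</integer>
    <key>StandardOutPath</key>
    <string>/Users/youruser/email-agent/agent.log</string>
    <key>StandardErrorPath</key>
    <string>/Users/youruser/email-agent/agent.log</string>
</dict>
</plist>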
For Home Assistant, the digest script can POST urgent summaries to HA’s REST notification API. After you turn on the API in HA with a long-lived access token, a single requests.post call sends the summary as a push alert to your phone:
import requests

def notify_home_assistant(summary_text: str, ha_url: str, ha_token: str):
    requests.post(
        f"{ha_url}/api/services/notify/mobile_app_your_phone",
        headers={"Authorization": f"Bearer {ha_token}"},
        json={"message": summary_text, "title": "Email Digest - Urgent"},
        timeout=10,
    )

Error handling needs real care in a scheduled script, because failures are silent. Wrap each external call (Gmail API, Ollama, ChromaDB) in a try/except block and log failures with enough context to diagnose them later. The common failure modes: the Gmail API returns a 429 (rate limit hit, so back off and retry after the Retry-After header value); Ollama returns malformed JSON for the structured summary prompt (catch json.JSONDecodeError and log the raw response); the OAuth token expires or gets revoked (google.auth.exceptions.RefreshError, which forces you to re-run the interactive auth flow).
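For the first failure mode, a hedged sketch of a backoff wrapper around any Gmail API request. It leans on googleapiclient’s HttpError, whose resp attribute exposes the status code and response headers; the execute_with_backoff name is our own:

import time
from googleapiclient.errors import HttpError

def execute_with_backoff(request, max_retries: int = 3):
    """Sketch: execute a Gmail API request, backing off on 429 and transient 5xx."""
    for attempt in range(max_retries + 1):
        try:
            return request.execute()
        except HttpError as e:
            if e.resp.status not in (429, 500, 503) or attempt == max_retries:
                raise
            # Honor Retry-After when Google sends it; otherwise back off exponentially
            delay = int(e.resp.get("retry-after", 2 ** attempt))
            time.sleep(delay)

The same retry-and-backoff shape works for the local model call: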
import logging
import time

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    handlers=[logging.StreamHandler()],
)
logger = logging.getLogger(__name__)

def safe_summarize(body: str, retries: int = 2) -> dict:
    for attempt in range(retries + 1):
        try:
            return summarize_email(body)
        except Exception as e:
            logger.warning(f"Summarization attempt {attempt + 1} failed: {e}")
            if attempt < retries:
                time.sleep(2 ** attempt)
    return {"summary": "Failed after retries", "urgency": 1, "category": "FYI",
            "action_items": [], "reply_needed": False}

On the resource side: a typical inbox gets 50 emails per day. A single digest run handles roughly 50 summary requests. With mistral-nemo (12B parameters, Q4 quantization) on an RTX 4070, each summary takes about 3 to 5 seconds. That puts the full digest time at 3 to 4 minutes. The model loads once and stays in VRAM between requests when you use Ollama’s keep-alive feature (OLLAMA_KEEP_ALIVE=30m). On CPU-only hardware (a modern laptop with 16GB RAM), expect 20 to 30 seconds per email with a smaller model like phi3:mini. That’s still fine for a background batch job that runs every 30 minutes.
System Architecture in Full
The full system has five layers. At the bottom is the credential layer: credentials.json and the system keychain that holds the OAuth refresh token. Above that is the data ingestion layer: the Gmail API client that fetches messages using messages.list and messages.get, handles pagination, and respects rate limits. Next is the preprocessing layer: the MIME parser that pulls plain-text bodies and the Presidio pipeline that masks PII before any data touches the model. Above that is the intelligence layer: Ollama running a local quantized LLM that takes clean text and returns structured JSON (for summaries) or free-form draft text (for replies). The ChromaDB vector store holding your sent-mail style corpus sits in this layer too. At the top is the output layer: a SQLite database that stores all processed summaries, a Markdown digest renderer, optional Home Assistant alerts for urgent items, and the Gmail Drafts API endpoint where new replies land.
Data flows one way through these layers. Email enters at the Gmail API. Each layer handles it locally. Outputs (summaries, drafts, alerts) stay on your machine or go back to your Gmail account as draft-only content. Nothing goes to an outside AI API. The only network calls the system makes are to Google’s servers for email retrieval and to your local Ollama instance (loopback, 127.0.0.1:11434) for inference.
This setup is easy to extend. Adding an unsubscribe-style cleanup action is simple: after tagging an email as “Newsletter” with high confidence, call users.messages.modify to remove the INBOX label, which archives the message (Gmail has no “archive” label; archiving just means removing INBOX), as the sketch after this paragraph shows. Note that this needs the https://www.googleapis.com/auth/gmail.modify scope, which the minimal setup above deliberately leaves out. Adding Outlook/Exchange support means swapping the Gmail API client for an IMAP connection via Python’s imaplib. The preprocessing, intelligence, and output layers don’t change, because they work on plain text no matter the source. The local-first design means each new feature you add inherits the privacy traits of the rest of the system for free.
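A hedged sketch of that archive action, reusing the summaries table; archive_newsletters is our own name, and the call requires the broader gmail.modify scope noted above:

def archive_newsletters(service, db):
    """Sketch: archive inbox messages the digest classified as newsletters."""
    for row in db["summaries"].rows_where("category = 'Newsletter'"):
        service.users().messages().modify(
            userId="me",
            id=row["message_id"],
            # Removing INBOX is what "archive" means in Gmail
            body={"removeLabelIds": ["INBOX"]},
        ).execute()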
The cost to set this up runs about two to three hours of initial config. It pays back fast if you get a lot of email and have been leaning on cloud AI tools to handle it. More to the point: you get a clear view of what those tools were doing with your data. And you get a working alternative that doesn’t rely on a third party’s privacy policy.