20 December 2025

Running Whisper AI for Real-Time Speech-to-Text on Linux

https://www.tecmint.com/whisper-ai-audio-transcription-on-linux/

 

1. Convert an Audio File into Text (not live)

Whisper AI is an advanced automatic speech recognition (ASR) model developed by OpenAI that can transcribe audio into text with impressive accuracy and supports multiple languages. While Whisper AI is primarily designed for batch processing, it can be configured for real-time speech-to-text transcription on Linux.

In this guide, we will go through the step-by-step process of installing, configuring, and running Whisper AI for live transcription on a Linux system.

What is Whisper AI?

Whisper AI is an open-source speech recognition model trained on a vast dataset of audio recordings. It is based on a deep-learning architecture that enables it to:

  • Transcribe speech in multiple languages.
  • Handle accents and background noise efficiently.
  • Perform translation of spoken language into English.

Since it is designed for high-accuracy transcription, it is widely used in:

  • Live transcription services (e.g., for accessibility).
  • Voice assistants and automation.
  • Transcribing recorded audio files.

By default, Whisper AI is not optimized for real-time processing. However, with some additional tools, it can process live audio streams for immediate transcription.

Whisper AI System Requirements

Before running Whisper AI on Linux, ensure your system meets the following requirements: 

Hardware Requirements:

  • CPU: A multi-core processor (Intel/AMD).
  • RAM: At least 8GB (16GB or more is recommended).
  • GPU: NVIDIA GPU with CUDA (optional but speeds up processing significantly).
  • Storage: Minimum 10GB of free disk space for models and dependencies.

Software Requirements:

  • A Linux distribution such as Ubuntu, Debian, Arch, Fedora, etc.
  • Python version 3.8 or later.
  • Pip package manager for installing Python packages.
  • FFmpeg for handling audio files and streams.
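
The requirements above can be checked with a short script. This is a minimal sketch, assuming the thresholds listed in this section (Python 3.8, 10GB free disk space); `check_requirements` is a hypothetical helper name, not part of Whisper. Pip and FFmpeg will show as missing until Step 1 below has been completed.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)   # minimum Python version from the list above
MIN_FREE_GB = 10      # minimum free disk space from the list above


def check_requirements(path="."):
    """Return a dict mapping each requirement to True (met) or False (not met)."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "python": sys.version_info[:2] >= MIN_PYTHON,
        "pip": shutil.which("pip") is not None or shutil.which("pip3") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "disk": free_gb >= MIN_FREE_GB,
    }


for name, ok in check_requirements().items():
    print(f"{name:8s} {'OK' if ok else 'MISSING'}")
```

The check only looks for the `pip` and `ffmpeg` executables on the PATH; it does not verify their versions.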

Step 1: Installing Required Dependencies

Before installing Whisper AI, update your package list and upgrade existing packages.

sudo apt update && sudo apt upgrade -y     [On Ubuntu]
sudo dnf update -y  [On Fedora]
sudo pacman -Syu    [On Arch]

Next, you need to install Python 3.8 or higher and Pip package manager as shown.

sudo apt install python3 python3-pip python3-venv -y             [On Ubuntu]
sudo dnf install python3 python3-pip python3-virtualenv -y       [On Fedora]
sudo pacman -S python python-pip python-virtualenv               [On Arch] 

Lastly, you need to install FFmpeg, which is a multimedia framework used to process audio and video files.

sudo apt install ffmpeg     [On Ubuntu]
sudo dnf install ffmpeg     [On Fedora]
sudo pacman -S ffmpeg       [On Arch]

Step 2: Install Whisper AI in Linux

Once the required dependencies are installed, you can proceed to install Whisper AI in a virtual environment that allows you to install Python packages without affecting system packages.

python3 -m venv whisper_env
source whisper_env/bin/activate
pip install openai-whisper

Once the installation is complete, check if Whisper AI was installed correctly by running.

whisper --help

This should display a help menu with available commands and options, which means Whisper AI is installed and ready to use.


Step 3: Running Whisper AI in Linux

Once Whisper AI is installed, you can start transcribing audio files using different commands.

Transcribing an Audio File

To transcribe an audio file (audio.mp3), run:

whisper audio.mp3

Whisper will process the file, print the transcript in the terminal, and by default also write it to the current directory in several formats (including .txt and .srt).



2. Convert Audio Files into Text Files in a Batch (not live)

This is **batch (file-based)** transcription, not **live dictation**.

Auto-save text to Markdown / notes, please. Thank you!
------------------------------------------------------------------------------

You got it 👍

Below is a **NOT LIVE** but **clean, practical Whisper setup that automatically saves transcripts as well-formatted Markdown notes**. This works great for meetings, study notes, or voice journals.

I’ll give you:

1.  📁 Folder-based auto-transcription
2.  📝 Markdown formatting (title, timestamps, sections)
3.  ⚙️ A reusable script you can customize

1️⃣ **Folder Structure (recommended)**
    whisper_notes/
    ├── audio/        # put .wav / .mp3 here
    ├── notes/        # Markdown files auto-saved here
    └── transcribe.py


Create it:

    mkdir -p ~/whisper_notes/audio ~/whisper_notes/notes
    cd ~/whisper_notes


2️⃣ **Install requirements (once)**

    pip install openai-whisper torch

(You already installed Whisper, this just ensures dependencies.)

3️⃣ **Create the auto-Markdown script**

    nano transcribe.py

Paste **this full script:**

    import whisper
    from pathlib import Path
    from datetime import datetime
    
    AUDIO_DIR = Path("audio")
    NOTES_DIR = Path("notes")
    MODEL_NAME = "medium"   # change to small if CPU is slow
    
    NOTES_DIR.mkdir(exist_ok=True)
    
    model = whisper.load_model(MODEL_NAME)
    
    def seconds_to_mmss(seconds):
        m = int(seconds // 60)
        s = int(seconds % 60)
        return f"{m:02d}:{s:02d}"
    
    for audio_file in AUDIO_DIR.iterdir():
        if audio_file.suffix.lower() not in {".wav", ".mp3", ".m4a", ".flac"}:
            continue
    
        print(f"Transcribing: {audio_file.name}")
        result = model.transcribe(audio_file, fp16=False)
    
        title = audio_file.stem.replace("_", " ").title()
        now = datetime.now().strftime("%Y-%m-%d %H:%M")
    
        md_lines = [
            f"# {title}",
            "",
            f"- **Date:** {now}",
            f"- **Source:** `{audio_file.name}`",
            "",
            "## Transcript",
            ""
        ]
    
        for seg in result["segments"]:
            ts = seconds_to_mmss(seg["start"])
            text = seg["text"].strip()
            md_lines.append(f"- **[{ts}]** {text}")
    
        md_file = NOTES_DIR / f"{audio_file.stem}.md"
        md_file.write_text("\n".join(md_lines), encoding="utf-8")
    
        print(f"Saved → {md_file}")
    
    print("✅ All done.")


Save and exit.

4️⃣ **Use it (very simple)**

1️⃣ Put audio files into:

    ~/whisper_notes/audio/

2️⃣ Run:

    python3 transcribe.py

3️⃣ Get Markdown notes in:

    ~/whisper_notes/notes/

5️⃣ Example Markdown Output

    # Team Meeting Notes
    
    - **Date:** 2025-01-21 14:30
    - **Source:** `team_meeting.wav`
    
    ## Transcript
    
    - **[00:00]** Good afternoon everyone, let's get started.
    - **[00:32]** The first topic is system migration.
    - **[02:15]** We expect completion by next Friday.


Perfect for:
-   Obsidian
-   VS Code
-   GitHub
-   Joplin
-   Plain Markdown

6️⃣ **Optional Improvements (I can add these)**

✅ Auto-move processed audio
✅ One-file-per-day notes
✅ Headings instead of bullet timestamps
✅ Speaker labels (best-effort)
✅ Hotkey recording → auto-note
✅ Git auto-commit notes
✅ Summaries at top (AI-assisted)

Just tell me what you want next — this setup is a solid base 👌

I got the following error message -

'
Traceback (most recent call last):
  File "/home/shermanchen/dev/whisperai/whisper_notes/transcribe.py", line 23, in <module>
    result = model.transcribe(audio_file, fp16=False)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/shermanchen/dev/whisperai/whisper_env/lib/python3.12/site-packages/whisper/transcribe.py", line 139, in transcribe
    mel = log_mel_spectrogram(audio, model.dims.n_mels, padding=N_SAMPLES)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/shermanchen/dev/whisperai/whisper_env/lib/python3.12/site-packages/whisper/audio.py", line 141, in log_mel_spectrogram
    audio = torch.from_numpy(audio)
            ^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected np.ndarray (got PosixPath)

'

Great, thanks for pasting the full traceback — this is a **clear and easy fix** 👍
You’re actually **very close**.

✅ **What went wrong (exactly)**

This line caused it:

    result = model.transcribe(audio_file, fp16=False)

`audio_file` is a Path object (`PosixPath`), but **Whisper expects either**:
-   a string path (`str`)
-   or a **NumPy array**

So Whisper tried to treat the `Path` like audio data → 💥 error.

✅ **The fix (one-line change)**

Convert the `Path` to a string.

**Edit your script:**

    nano transcribe.py

**Change this line:**

    result = model.transcribe(audio_file, fp16=False)

**To this:**

    result = model.transcribe(str(audio_file), fp16=False)

Save and exit.

✅ **Run it again**

    python3 transcribe.py

You should now see:

    Transcribing: something.wav
    Saved → notes/something.md
    ✅ All done.

🧠 **Why this happens (for future reference)**

-   `pathlib.Path` is great for filesystem logic
-   Whisper only loads a file from disk when the argument is a `str`
-   Anything else, including a `Path`, is handed to NumPy / Torch as if it were audio samples
-   Converting to `str()` is the correct and standard solution

This is a **very common Python + ML gotcha**, not your fault.
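
To make the `Path` vs `str` distinction concrete, here is a minimal sketch using only the standard library. `as_whisper_input` is a hypothetical helper name (it is not part of Whisper); it simply converts `Path` objects to strings and passes everything else through.

```python
from pathlib import Path


def as_whisper_input(audio):
    """Return `audio` in a form the transcribe call accepts.

    Path objects become plain strings; anything else (a str path or an
    array of samples) is passed through unchanged.
    """
    if isinstance(audio, Path):
        return str(audio)
    return audio


p = Path("audio") / "meeting.wav"
print(isinstance(p, str))      # False: a Path is not a str
print(as_whisper_input(p))     # now a plain string path
```

With such a helper, the call in the script would read `model.transcribe(as_whisper_input(audio_file), fp16=False)`.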

🔒 **Optional hardening (recommended)**

If you want to make the script more robust, you can add:

    audio_path = audio_file.resolve()
    result = model.transcribe(str(audio_path), fp16=False)


This avoids issues with symlinks or relative paths.

Next enhancements (I can implement any of these for you)

✔ Auto-move processed audio to audio/processed/
✔ Skip already-transcribed files
✔ Daily Markdown notes (append instead of overwrite)
✔ Add a summary section at the top
✔ Faster mode (--beam_size, --temperature)
✔ Hotkey recording → instant Markdown note
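
As an illustration of one item from this list, skipping already-transcribed files could look roughly like the sketch below. It assumes the same `audio/` and `notes/` layout and the `<stem>.md` naming used by `transcribe.py`; `pending_audio` is a hypothetical helper name.

```python
from pathlib import Path

AUDIO_EXTS = {".wav", ".mp3", ".m4a", ".flac"}   # same set as in transcribe.py


def pending_audio(audio_dir, notes_dir):
    """Return audio files that have no matching .md note yet, sorted by name."""
    audio_dir, notes_dir = Path(audio_dir), Path(notes_dir)
    pending = []
    for audio_file in sorted(audio_dir.iterdir()):
        if audio_file.suffix.lower() not in AUDIO_EXTS:
            continue                      # not an audio file
        if (notes_dir / f"{audio_file.stem}.md").exists():
            continue                      # note already written, skip it
        pending.append(audio_file)
    return pending
```

In the script, the loop `for audio_file in AUDIO_DIR.iterdir():` would then become `for audio_file in pending_audio(AUDIO_DIR, NOTES_DIR):`.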

Just tell me what you want next — your setup is now solid ✅


3. Live Transcribe Voice from Microphone into a Text File

Below is a **working, practical solution** for **LIVE microphone → Whisper → auto-append to a Markdown note** on Ubuntu.

This is the **best compromise** Whisper can do (because Whisper is not truly streaming).

How “Live” Whisper Actually Works (important)
------------------------------------------------------------------------------

Whisper **cannot transcribe word-by-word** like Vosk.

The correct way is:

🎤 Record short chunks (e.g. 5–10 seconds) → transcribe → append to Markdown → repeat

This gives:

-   ✅ High accuracy
-   ✅ Punctuation
-   ✅ Continuous notes
-   ❌ A few seconds delay (unavoidable)
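
The record → transcribe → append loop can be illustrated without a microphone or a model. This is a minimal sketch, assuming the audio is a plain list of samples at 16 kHz; `split_into_chunks` and `fake_transcribe` are hypothetical names, and `fake_transcribe` only stands in for the real `model.transcribe()` call.

```python
SAMPLE_RATE = 16000      # samples per second
CHUNK_SECONDS = 5        # length of each transcription chunk


def split_into_chunks(samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Yield consecutive fixed-length chunks; the last one may be shorter."""
    chunk_len = sample_rate * chunk_seconds
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]


def fake_transcribe(chunk):
    """Stand-in for the model call; only reports how much audio it received."""
    return f"{len(chunk) / SAMPLE_RATE:.1f}s of audio"


recording = [0.0] * (SAMPLE_RATE * 12)   # 12 seconds of silence
notes = [fake_transcribe(chunk) for chunk in split_into_chunks(recording)]
print(notes)   # ['5.0s of audio', '5.0s of audio', '2.0s of audio']
```

Each chunk is transcribed independently, so the delay before text appears is roughly the chunk length plus the time the model needs for one chunk.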

✅ **Final Result**

-   You speak continuously
-   Every 5 seconds, Whisper transcribes
-   Text is auto-appended to ONE Markdown note
-   Timestamps included
-   Stop anytime with Ctrl+C

**STEP 1 — Install dependencies (once)**

    pip install openai-whisper sounddevice numpy
    sudo apt install ffmpeg -y

**STEP 2 — Create live Markdown dictation script**

    nano whisper_live_md.py

Paste **this full script**:

    import whisper
    import sounddevice as sd
    import numpy as np
    from datetime import datetime
    from pathlib import Path
    import queue
    import time
    
    MODEL_NAME = "small"     # change to "medium" for higher accuracy
    CHUNK_SECONDS = 5        # length per transcription chunk
    SAMPLE_RATE = 16000
    
    NOTES_DIR = Path("notes")
    NOTES_DIR.mkdir(exist_ok=True)
    
    note_file = NOTES_DIR / f"live_note_{datetime.now().strftime('%Y-%m-%d_%H-%M')}.md"
    
    model = whisper.load_model(MODEL_NAME)
    audio_q = queue.Queue()
    
    def callback(indata, frames, time_info, status):
        audio_q.put(indata.copy())
    
    def seconds_to_mmss(seconds):
        m = int(seconds // 60)
        s = int(seconds % 60)
        return f"{m:02d}:{s:02d}"
    
    with note_file.open("w", encoding="utf-8") as f:
        f.write(f"# Live Dictation Notes\n\n")
        f.write(f"- **Started:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n\n")
    
    print("🎤 Live dictation started (Ctrl+C to stop)")
    
    try:
        with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, callback=callback):
            while True:
                audio_chunks = []
                start_time = time.time()
    
                while time.time() - start_time < CHUNK_SECONDS:
                    audio_chunks.append(audio_q.get())
    
                audio = np.concatenate(audio_chunks, axis=0).flatten()
    
                result = model.transcribe(audio, fp16=False)
                text = result["text"].strip()
    
                if text:
                    timestamp = datetime.now().strftime("%H:%M:%S")
                    line = f"- **[{timestamp}]** {text}\n"
    
                    with note_file.open("a", encoding="utf-8") as f:
                        f.write(line)
    
                    print(text)
    
    except KeyboardInterrupt:
        print("\n🛑 Dictation stopped.")
        print(f"📝 Saved to: {note_file}")

Save and exit.

**STEP 3 — Run it**

    python3 whisper_live_md.py

Start speaking 🎤
Every ~5 seconds:
-   Text appears in terminal
-   Text is appended to Markdown

**Example Markdown Output (LIVE)**

    # Live Dictation Notes
    
    - **Started:** 2025-01-21 22:10:03
    
    - **[22:10:08]** Today I am testing live dictation using Whisper.
    - **[22:10:14]** The accuracy is much better than Vosk.
    - **[22:10:20]** This will be saved automatically as markdown.

Perfect for:

-   Obsidian
-   VS Code
-   Joplin
-   GitHub notes

**Recommended Settings (Important)**

**CPU only (most laptops)**

    MODEL_NAME = "small"
    CHUNK_SECONDS = 5

**Higher accuracy**

    MODEL_NAME = "medium"
    CHUNK_SECONDS = 8

**Faster response**

    MODEL_NAME = "base"
    CHUNK_SECONDS = 3

(The End)

15 December 2025

Can i use rsync to back up file on windows server?

Yes, you can use rsync to back up files on a Windows Server, but it is not native to Windows. The modern approach is to run it inside Windows Subsystem for Linux (WSL); older setups use tools like cwRsync. WSL is recommended because it integrates more easily and matches Linux environments, giving you efficient, scriptable backups with SSH for secure transfer.
Recommended Method: Windows Subsystem for Linux (WSL)
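  1. Enable WSL: Install WSL with a Linux distribution (like Ubuntu) on your Windows Server, for example with wsl --install -d Ubuntu on Windows Server 2022; older releases need the feature enabled and the distribution installed manually.
  2. Install rsync: Open your WSL terminal and run sudo apt install rsync openssh-client.
  3. Access Windows Files: WSL mounts your Windows drives automatically under /mnt (e.g., C: appears as /mnt/c/), so no manual mounting is needed.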
  4. Run rsync: Use standard rsync commands like rsync -avz /mnt/c/path/to/source/ user@remote:/path/to/backup/ for secure, efficient transfers, often over SSH. 
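
Steps 3 and 4 can be sketched in Python for use in a backup script run inside WSL. This is a minimal sketch, not a complete backup tool: `to_wsl_path` and `build_rsync_command` are hypothetical helper names, the source path is made up for illustration, and the destination is the placeholder from step 4. The code only builds the command; running it (for example with `subprocess.run(cmd, check=True)`) is left to the caller.

```python
from pathlib import PureWindowsPath


def to_wsl_path(windows_path):
    """Map a drive path such as C:\\Data\\docs to /mnt/c/Data/docs."""
    p = PureWindowsPath(windows_path)
    drive = p.drive.rstrip(":").lower()
    if len(drive) != 1:
        raise ValueError(f"expected a drive-letter path, got {windows_path!r}")
    return "/mnt/" + "/".join([drive, *p.parts[1:]])


def build_rsync_command(windows_source, destination):
    """Return the argument list for an rsync -avz run; nothing is executed here."""
    # Trailing slash on the source: copy the directory's contents, not the directory itself.
    source = to_wsl_path(windows_source).rstrip("/") + "/"
    return ["rsync", "-avz", source, destination]


cmd = build_rsync_command(r"C:\Data\docs", "user@remote:/path/to/backup/")
print(" ".join(cmd))   # rsync -avz /mnt/c/Data/docs/ user@remote:/path/to/backup/
```

Building the command as a list avoids shell quoting problems when Windows paths contain spaces.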
Other Options
  • cwRsync: A Windows-specific distribution of rsync, though setup for unattended backups can be challenging.
  • Linux VM: Run a full Linux VM on the server and use rsync within that environment. 
Key Considerations
  • SSH: For network backups, ensure OpenSSH server is running on the destination, and set up SSH keys for passwordless authentication for scripts.
  • Snapshotting: rsync overwrites files at the destination, so consider snapshotting techniques or tools with versioning (like Restic/Borg) for better recovery.
  • Alternatives: For simpler needs, PowerShell/Batch scripts with robocopy, or dedicated tools like FreeFileSync or BackupAssist (with its rsync module), are alternatives.

how to find newly created/changed files/directories on ubuntu?

The following command finds the newest 10 files -

$ find . -type f -printf '%T@ %P\n' | sort -nr | head -n 10 | cut -d' ' -f2-
 
This Linux command finds and lists the **10 most recently modified files** (not directories) in the current directory (`.`) and all its subdirectories, showing their relative paths in order from newest to oldest.

Let's break it down step by step:

1. `find . -type f -printf '%T@ %P\n'`
   - `find .`: Starts searching from the current directory (`.`) and recurses into all subdirectories.
   - `-type f`: Restricts the search to regular **files only** (excludes directories, symlinks, etc.).
   - `-printf '%T@ %P\n'`:
     - `%T@`: Prints the file's **last modification time** as a Unix timestamp (seconds since epoch, with decimal fraction for sub-second precision).
     - Space: Adds a literal space separator.
     - `%P`: Prints the file's **path relative** to the starting directory (`.`), without the leading `./`.
     - `\n`: Newline, so each file is on its own line.
   - Output example:
     ```
     1734621234.5678901234 path/to/recent/file.txt
     1734600000.0000000000 path/to/older/file.jpg
     ...
     ```

2. `| sort -nr`
   - Pipes the output to `sort`.
   - `-n`: **Numeric** sort (treats the timestamp as a number, not as text).
   - `-r`: **Reverse** order (descending, so highest/newest timestamps first).
   - Result: Lines are now sorted from most recently modified to least recently modified.

3. `| head -n 10`
   - Takes only the **first 10 lines** of the sorted output.
   - This gives you the 10 newest files.

4. `| cut -d' ' -f2-`
   - Pipes to `cut`.
   - `-d' '`: Uses a **space** as the field delimiter.
   - `-f2-`: Outputs fields **2 and everything after** (i.e., skips the timestamp and keeps only the file path).
   - Result: Clean list of the 10 most recently modified file paths, one per line.

### Final output example:
```
documents/report_final.docx
pictures/vacation/photo123.jpg
src/main.py
logs/app.log
...
```

### Summary
This is a common and efficient way to get the **top 10 recently modified files** in a directory tree. It's more precise than using `-ls` or `-exec stat` because `%T@` provides sub-second timestamp resolution for accurate sorting.

### Alternatives
- If you want access time instead of modification time: Replace `%T@` with `%A@`.
- If you want human-readable dates instead of just paths: Replace `%T@` with `%T+`, change `sort -nr` to `sort -r`, and remove the final `cut`.
- On macOS (BSD find), `-printf` isn't available by default — you'd need to use `stat` or install GNU find via Homebrew. 
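
Where `-printf` is unavailable, the same pipeline can be approximated in Python. This is a minimal sketch using only the standard library; `newest_files` is a hypothetical function name, and the comments map each step back to the shell command.

```python
import os
from pathlib import Path


def newest_files(root=".", limit=10):
    """Return up to `limit` regular files under `root`, newest mtime first."""
    root = Path(root)
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink() or not path.is_file():
                continue                      # mirror `-type f`: regular files only
            entries.append((path.stat().st_mtime, path.relative_to(root)))
    entries.sort(key=lambda entry: entry[0], reverse=True)   # like `sort -nr`
    return [str(rel) for _mtime, rel in entries[:limit]]     # like `head` + `cut`
```

For example, `newest_files(".", 10)` returns the relative paths of the 10 most recently modified files below the current directory.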



On Ubuntu (and Linux in general), there are several reliable ways to find newly created or recently changed files/directories, depending on how precise and how far back you want to look.

Below are the most useful and practical methods, from simple to advanced.


1️⃣ Using find (most common & flexible)

🔹 Files changed in the last N days

find /path/to/search -mtime -N

Example: files changed in the last 2 days

find ~ -mtime -2

🔹 Files changed in the last N minutes

find ~ -mmin -30

🔹 Newly created files (best approximation)

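Linux doesn’t always expose a true “creation time”, but creating a file also sets its change time (ctime), so you can check that:

find ~ -ctime -1

Option    Meaning
-mtime    File content modified (measured in days)
-ctime    Inode status changed (permissions, ownership, rename; also set when the file is created)
-mmin     File content modified (measured in minutes)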

2️⃣ Only directories (not files)

find ~ -type d -mtime -1

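Example: directories changed in the last 24 hours (a directory’s modification time changes when entries are added, removed, or renamed inside it).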


3️⃣ Sort files by newest first

🔹 Files

ls -lt

🔹 Directories grouped first

ls -lt --group-directories-first

🔹 Recursive, newest last

ls -ltrR

4️⃣ Using stat (detailed timestamps)

stat filename

Shows:

  • Access time

  • Modify time

  • Change time

  • Birth time (if filesystem supports it)

Example:

stat myfile.txt
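
The same timestamps are available from Python through `os.stat`. This is a minimal sketch; `file_times` is a hypothetical helper name, and the birth time is only reported when the platform's stat result exposes an `st_birthtime` attribute (on Linux it usually does not, even when the `stat` command can show it).

```python
import os
from datetime import datetime


def file_times(path):
    """Return the access, modify and change times of `path` as datetime objects."""
    st = os.stat(path)
    times = {
        "access": datetime.fromtimestamp(st.st_atime),
        "modify": datetime.fromtimestamp(st.st_mtime),
        "change": datetime.fromtimestamp(st.st_ctime),
    }
    birth = getattr(st, "st_birthtime", None)   # not present on every platform
    if birth is not None:
        times["birth"] = datetime.fromtimestamp(birth)
    return times


for label, when in file_times(".").items():
    print(f"{label:7s} {when:%Y-%m-%d %H:%M:%S}")
```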

5️⃣ Find files changed since a specific date/time

🔹 Create a reference file

touch /tmp/ref_time

🔹 Later, find newer files

find ~ -newer /tmp/ref_time

Very useful for tracking changes during installs or scripts.
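
The reference-file idea can also be sketched in Python, using a saved timestamp instead of a file on disk. This is a minimal sketch; `changed_since` is a hypothetical function name, and unlike `find -newer` it reports files only, not directories.

```python
import os
from pathlib import Path


def changed_since(root, reference_time):
    """Return files under `root` modified after `reference_time` (epoch seconds)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.stat().st_mtime > reference_time:
                    changed.append(path)
            except FileNotFoundError:
                continue      # file vanished, or a dangling symlink
    return sorted(changed)
```

A typical use is to store `reference = time.time()` before running an installer or script, then call `changed_since(Path.home(), reference)` afterwards.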


6️⃣ Real-time monitoring (advanced)

🔹 Monitor directory changes live

inotifywait -m -r ~/somefolder

(Requires package)

sudo apt install inotify-tools

7️⃣ GUI (Files app / Nautilus)

  1. Open Files

  2. Go to the folder

  3. Click ⋮ → Sort by → Modified

  4. Choose Descending


🔑 Which method should you use?

Scenario                              Best choice
“What changed today?”                 find -mtime -1
“What changed in last 10 minutes?”    find -mmin -10
“What did a script change?”           find -newer
“Watch changes live”                  inotifywait
“Quick visual check”                  GUI sort

14 December 2025

My laptop is Windows 11 and Ubuntu 24.04 dual boot. How can I assign more SSD space to Ubuntu (i.e. from Windows)?

You can do this safely, but do it in the right order. Below is the recommended, low-risk method for a Windows 11 + Ubuntu 24.04 dual-boot system.


⚠️ Before you start (important)

  1. Back up important files (both Windows and Ubuntu).

  2. Turn off Windows Fast Startup

    • Windows Settings → System → Power & sleep → Additional power settings

    • Choose what the power buttons do → Change settings currently unavailable

    • Uncheck “Turn on fast startup”

This prevents filesystem corruption.


✅ Overview of the process

You will:

  1. Shrink the Windows partition (from Windows)

  2. Boot into Ubuntu live USB

  3. Expand the Ubuntu partition using GParted


🧩 Step 1: Shrink Windows partition (from Windows)

Never shrink Windows from Linux.

  1. Boot into Windows 11

  2. Press Win + X → Disk Management

  3. Right-click the Windows (C:) partition

  4. Choose Shrink Volume

  5. Enter how much space to shrink (in MB)

    • Example:

      • 50 GB → 51200

      • 100 GB → 102400

  6. Click Shrink

👉 You will now see Unallocated space on the disk.
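
The MB values in the examples come from multiplying by 1024. A small worked sketch of that arithmetic (`gb_to_mb` is a hypothetical helper name):

```python
def gb_to_mb(gigabytes):
    """Convert GB to the MB value entered in the Shrink Volume dialog (1 GB = 1024 MB)."""
    return int(gigabytes * 1024)


for size_gb in (50, 100, 150):
    print(f"{size_gb} GB -> {gb_to_mb(size_gb)} MB")
# 50 GB -> 51200 MB
# 100 GB -> 102400 MB
# 150 GB -> 153600 MB
```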


🧩 Step 2: Boot Ubuntu Live USB

You cannot resize Ubuntu while it’s running.

  1. Insert Ubuntu 24.04 USB

  2. Reboot → press F12 / F10 / Esc / Del (depends on your laptop)

  3. Choose the USB

  4. Select “Try Ubuntu”


🧩 Step 3: Resize Ubuntu partition (GParted)

Ubuntu live includes GParted.

  1. Open GParted

  2. Select your SSD (top-right dropdown, e.g. /dev/nvme0n1)

  3. Identify partitions:

    • EFI (small, ~100–500MB)

    • Windows NTFS

    • Ubuntu ext4

    • Swap (if present)

Case A: Unallocated space is next to Ubuntu

  1. Right-click Ubuntu partition → Resize/Move

  2. Drag to take the unallocated space

  3. Click Apply


Case B: Unallocated space is not adjacent

You must move partitions (slow but safe).

Typical order:

  1. Move swap partition (if exists)

  2. Then resize Ubuntu

⚠️ Do not move the EFI partition


🧩 Step 4: Reboot

  1. Close GParted

  2. Reboot

  3. Remove USB

Ubuntu should now have more space 🎉


🧪 Verify in Ubuntu

df -h

or

lsblk

❗ Common mistakes to avoid

❌ Shrinking Windows from Linux
❌ Forgetting to disable Fast Startup
❌ Interrupting GParted while it’s working
❌ Touching EFI partition


🧠 Extra tips

  • If Ubuntu uses LVM, the steps are slightly different

  • If you use BitLocker, suspend it before shrinking Windows

  • NVMe SSDs are fine; just make sure you select the correct disk


If you want, you can:

  • Tell me whether your Ubuntu uses LVM or normal ext4

  • Paste the output of:

lsblk

(from Ubuntu live)

and I’ll give you exact, partition-by-partition instructions tailored to your laptop.

On Ubuntu, how to run one of the commands I ran before again?

To rerun a command from your history, use ! followed by its number (e.g., !123), or use !! for the very last command, !-2 for the second-to-last, or press Ctrl+r and type part of the command to search for it, then hit Enter. 
By History Number
  1. View History: Type history and press Enter to see a numbered list.
  2. Re-run: Type ! then the number of the command (e.g., !567 to run command #567). 
# Example:
history
# ... (list of commands with numbers) ...
!123 # Runs the command numbered 123 in the history list
Shortcuts
  • !!: Runs the immediately preceding command.
  • !-2: Runs the second-to-last command.
  • !string: Runs the most recent command starting with string (e.g., !ls runs the last ls command). 
Reverse Search (Ctrl+r) 
  • Press Ctrl+r to enter interactive search mode.
  • Type part of the command you're looking for (e.g., ssh).
  • Press Ctrl+r again to cycle through older matches.
  • Press Enter to run the command, or Esc to exit search without running. 
  • If you have cycled past the match you wanted, you can search forward again with Ctrl+s. By default the terminal intercepts Ctrl+s for flow control (it pauses output), so first add the following line to your ~/.bashrc file:
stty -ixon
After you open a new terminal (or run source ~/.bashrc), Ctrl+s cycles through newer matches.
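
The way reverse search walks through the history can be illustrated with a few lines of Python. This is only a model of the behaviour, not how Bash implements it; `reverse_search` and the sample history are made up for illustration.

```python
def reverse_search(history, fragment):
    """Yield commands containing `fragment`, most recent first (like repeated Ctrl+r)."""
    for command in reversed(history):
        if fragment in command:
            yield command


history = ["ls -la", "ssh web01", "cd /tmp", "ssh db02", "git status"]
matches = reverse_search(history, "ssh")
print(next(matches))   # ssh db02   (first Ctrl+r hit: the most recent match)
print(next(matches))   # ssh web01  (pressing Ctrl+r again: the next older match)
```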