Vidxt turns spoken audio into readable, searchable text directly in your browser. Upload an interview, lecture, podcast or voice memo, and you get a clean transcript you can copy, edit or export within minutes.
Speech recognition runs against your file through a high-accuracy engine that handles English, Chinese, Japanese, Korean and several other languages. Punctuation and casing are restored automatically, so the output reads close to natural writing instead of a flat string of words.
Vidxt accepts the common audio formats you actually have on disk: MP3, WAV, M4A, AAC, FLAC and OGG. If your file is a video instead, you can run it through the video transcription tool, which extracts the audio track automatically before recognising speech.
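As a rough illustration of the format list above, a client-side check could compare the file extension against the accepted formats before uploading. This is only a sketch: the helper name `isSupportedAudioFile` is hypothetical, and a real uploader would likely inspect the file contents as well as the name.

```typescript
// Extensions copied from the format list above.
const SUPPORTED_AUDIO_EXTENSIONS: readonly string[] = [
  "mp3", "wav", "m4a", "aac", "flac", "ogg",
];

// Hypothetical helper: returns true when the file name ends in one of the
// listed audio extensions (case-insensitive).
function isSupportedAudioFile(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  // No dot, or a trailing dot, means there is no usable extension.
  if (dot < 0 || dot === fileName.length - 1) return false;
  const ext = fileName.slice(dot + 1).toLowerCase();
  return SUPPORTED_AUDIO_EXTENSIONS.indexOf(ext) !== -1;
}
```

A file such as `clip.mp4` would fail this check and belongs in the video transcription tool instead.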
Decoding and pre-processing happen locally through FFmpeg compiled to WebAssembly. The audio is sent to the speech engine only for the seconds it takes to generate text, and nothing is stored on our servers afterwards, so sensitive recordings do not sit in a cloud bucket.
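To make the local decoding step more concrete, the sketch below builds an FFmpeg argument list that converts any input into mono 16 kHz PCM WAV. The flags are standard FFmpeg options, but the target format (16 kHz, mono, 16-bit) and the function name `buildDecodeArgs` are assumptions for illustration, not documented Vidxt settings.

```typescript
// Hypothetical helper: builds an FFmpeg argument list for audio pre-processing.
// The 16 kHz mono 16-bit target is an assumed, commonly used speech format.
function buildDecodeArgs(inputName: string, outputName: string): string[] {
  return [
    "-i", inputName,      // input file
    "-vn",                // ignore any video stream
    "-ac", "1",           // downmix to a single channel
    "-ar", "16000",       // resample to 16 kHz
    "-c:a", "pcm_s16le",  // encode as 16-bit little-endian PCM
    outputName,
  ];
}
```

With an FFmpeg build compiled to WebAssembly, a list like this would be handed to whatever command-execution call that build exposes, so decoding stays in the browser.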
On clear single-speaker audio you can expect well above 90% word accuracy. Background noise, heavy accents, overlapping speakers or low-bitrate recordings will reduce that, so a quick proofread is still worth doing.
Supported languages include English, Simplified and Traditional Chinese, Japanese, Korean, Spanish, French, German, Portuguese and several more. Mixed-language recordings work but tend to be most reliable when one language clearly dominates.
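For readers who want to check the accuracy figure on their own recordings, word accuracy is commonly computed as one minus the word error rate: the word-level edit distance between a hand-corrected reference and the transcript, divided by the number of reference words. The function below is a minimal sketch of that standard calculation; how Vidxt measures its own figure is not stated, so treat this as an assumption.

```typescript
// Splits text into lowercase words, ignoring extra whitespace.
function toWords(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
}

// Word accuracy = 1 - (substitutions + insertions + deletions) / reference words,
// computed with a word-level Levenshtein distance. Clamped at 0.
function wordAccuracy(reference: string, hypothesis: string): number {
  const ref = toWords(reference);
  const hyp = toWords(hypothesis);
  if (ref.length === 0) return hyp.length === 0 ? 1 : 0;

  // prev[j] holds the distance between the first i-1 reference words
  // and the first j hypothesis words.
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) prev.push(j);

  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return Math.max(0, 1 - prev[hyp.length] / ref.length);
}
```

One wrong word in a ten-word reference gives an accuracy of 0.9 under this definition.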
File size is capped at 2 GB on desktop and 500 MB on mobile, which covers multi-hour recordings at normal bitrates. Very long files take longer to process; splitting them into chapters often gives faster, more manageable output.
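A quick way to see whether a recording fits under these caps is to estimate its size from duration and bitrate. The sketch below assumes binary units (1 MB = 1,048,576 bytes, 2 GB = 2,048 MB) and a constant bitrate; both are assumptions, and the function names are hypothetical.

```typescript
type Platform = "desktop" | "mobile";

const MB = 1024 * 1024; // assumption: binary megabytes

// Caps taken from the text above: 2 GB on desktop, 500 MB on mobile.
const MAX_BYTES: Record<Platform, number> = {
  desktop: 2048 * MB,
  mobile: 500 * MB,
};

// Estimated file size for a constant-bitrate recording.
function estimateSizeBytes(durationSeconds: number, bitrateKbps: number): number {
  return (durationSeconds * bitrateKbps * 1000) / 8;
}

// Returns true when a file of the given size is within the platform cap.
function fitsSizeLimit(sizeBytes: number, platform: Platform): boolean {
  return sizeBytes <= MAX_BYTES[platform];
}
```

For example, a three-hour recording at 128 kbps comes to about 172.8 million bytes, which fits under the mobile cap with room to spare.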
Day-to-day transcription is free, with no signup required for short clips. Heavy users who run long files frequently can move to a paid plan for higher monthly limits and priority processing.