The AI Synthesizer
Saving 3 hours with ChatGPT
Getting the Transcripts of the Lex Fridman Podcast
Yesterday we went through retrieving the transcript of a single video.
Today, we'll do that for all of the episodes of the Lex Fridman Podcast.
We'll use ChatGPT to write the code.
What do we want to achieve?
Creating a directory of text files
Each file should contain the transcript of one podcast segment
This is what the final result will look like:
Folder with transcripts
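The file names follow the pattern requested in the prompt below. Video titles and timestamps can contain characters such as `:` and `|` that some file systems reject, so a name builder needs to clean them up. Here is a minimal sketch of one way to do that; it is not ChatGPT's generated code, and the function name `build_segment_filename`, the `.txt` extension, and the choice to replace invalid characters with `_` are all assumptions.

```python
import re

# Characters that Windows rejects in file names; '/' is also invalid on macOS/Linux.
# (Assumption: replacing them with '_' is good enough for this project.)
INVALID_FILENAME_CHARS = re.compile(r'[<>:"/\\|?*]')


def build_segment_filename(video_name: str, segment_name: str,
                           start: str, end: str) -> str:
    """Build 'Episode - [video_name] Segment - [segment_name] ([start]-[end]).txt'.

    start and end are the timestamps as they appear in the description,
    e.g. "1:31" and "7:42".
    """
    raw = f"Episode - {video_name} Segment - {segment_name} ({start}-{end})"
    # Replace every character the file system may reject with an underscore.
    safe = INVALID_FILENAME_CHARS.sub("_", raw)
    return safe + ".txt"
```

For example, with a made-up title, `build_segment_filename("Guest: Topic | Podcast #1", "Exercise routine", "1:31", "7:42")` returns `Episode - Guest_ Topic _ Podcast #1 Segment - Exercise routine (1_31-7_42).txt`.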
The prompt
This is the ChatGPT prompt I used to generate the transcript-retrieval code. It follows the INPUT, OUTPUT, TIPS framework and includes a one-shot example for extracting segment information from the video description:
You are my personal Python coding assistant specializing in YouTube API.
Your task is to write a Python module that will do the following:
- INPUT: name of the channel, and playlist name (from that channel)
- OUTPUT: text files in the transcripts folder. Each text file should contain the transcript of one segment of a video (it should be named "Episode - [video_name] Segment - [segment_name] ([segment_start_timestamp]-[segment_end_timestamp])"). Make sure to use the video name in the file name, not the video ID.
I'd also like to retrieve the metadata of segments in the future. That means I'd like to save it to a CSV file after the process is finished. Each record should have the following fields: video_name, segment_name, video_id, segment_url (e.g. https://www.youtube.com/watch?v=eTBAxD6lt2g&t=1907s (v=video_id, t=start_timestamp in seconds)).
TIPS:
- The transcripts should be retrieved from the YouTube transcription API. It should be a part of the process.
- The playlists we will be retrieving can contain more than 50 elements. Make sure to retrieve all of the videos from the playlist, not only the first items.
- For getting the information about segment timestamps, look at the video description. Here's an example of a description:
"""
OUTLINE:
0:00 - Introduction
1:31 - Exercise routine
7:42 - Advice to younger self
14:56 - Jungian shadow
19:42 - Betrayal and loyalty
39:52 - Drama
57:31 - Chimp Empire
1:02:24 - Overt vs covert contracts
1:08:31 - Age and health
1:14:39 - Sexual selection
1:25:15 - Relationships
1:37:49 - Fertility
1:48:15 - Productivity
2:05:02 - Family
"""
Note that the episodes can be longer than one hour. Make sure you handle segments that appear after the first hour.
- Don't assume that I have any method already written. If you have to use third-party libraries to do something, please do so. Just make sure to list the libraries that you used at the end.
- Use a Google API key for authentication.
- Output files should be in plain text; don't break lines for new transcript chunks. Remember that you need to use the video name for the filename, so make sure to retrieve that information.
Analyze the task step by step (you can write code snippets for that) and write a final solution at the end.
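The OUTLINE tip is where a naive parser breaks after the first hour. Below is a minimal sketch of one way to handle it with the standard `re` module; it is an independent illustration, not the code ChatGPT produced. It assumes each outline line looks like `M:SS - Name` or `H:MM:SS - Name`, that each segment ends where the next one starts, and that the last segment ends at the video duration if you pass it in. The helper names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Matches outline lines such as "7:42 - Advice to younger self" or
# "1:02:24 - Overt vs covert contracts" (the hours part is optional).
OUTLINE_LINE = re.compile(r"^\s*((?:\d+:)?\d{1,2}:\d{2})\s*-\s*(.+?)\s*$")


def timestamp_to_seconds(ts: str) -> int:
    """Convert 'M:SS' or 'H:MM:SS' to a number of seconds."""
    seconds = 0
    for part in ts.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_outline(description: str,
                  video_duration: Optional[int] = None
                  ) -> List[Tuple[str, int, Optional[int]]]:
    """Return (segment_name, start_seconds, end_seconds) for each outline line.

    Each segment ends where the next one starts; the last segment ends at
    video_duration, or None if the duration is unknown.
    """
    starts: List[Tuple[str, int]] = []
    for line in description.splitlines():
        match = OUTLINE_LINE.match(line)
        if match:  # lines like "OUTLINE:" or sponsor links are skipped
            starts.append((match.group(2), timestamp_to_seconds(match.group(1))))

    segments: List[Tuple[str, int, Optional[int]]] = []
    for i, (name, start) in enumerate(starts):
        end = starts[i + 1][1] if i + 1 < len(starts) else video_duration
        segments.append((name, start, end))
    return segments
```

On the example description above, `1:02:24 - Overt vs covert contracts` becomes a segment starting at 3744 seconds, and `Family` starts at 7502 seconds and runs to whatever duration you pass (or `None`).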
Results
Just look at these outputs from ChatGPT!
First, ChatGPT analyzed the steps it needed to take to achieve the task:
Step-by-step analysis (first part of the output)
Then we got the code:
Code (second part of the output)
You can check out the whole code here. You'll have to run the code yourself to get the transcripts :)
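The generated code isn't reproduced in this post, but the pagination tip from the prompt is worth illustrating. The YouTube Data API returns at most 50 playlist items per request, plus a `nextPageToken` when more pages exist, so the loop has to keep requesting until that token disappears. This minimal sketch takes the request as a function so it runs without network access; the names `fetch_all_items` and `PageFetcher` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

# A page fetcher takes a page token (None for the first page) and returns a
# response shaped like a playlistItems.list result:
# {"items": [...], "nextPageToken": "..."}; the last page has no nextPageToken.
PageFetcher = Callable[[Optional[str]], Dict[str, Any]]


def fetch_all_items(fetch_page: PageFetcher) -> List[Dict[str, Any]]:
    """Follow nextPageToken until the last page and collect every item."""
    items: List[Dict[str, Any]] = []
    page_token: Optional[str] = None
    while True:
        response = fetch_page(page_token)
        items.extend(response.get("items", []))
        page_token = response.get("nextPageToken")
        if not page_token:  # no token means this was the last page
            return items
```

With google-api-python-client, `fetch_page` could be `lambda token: youtube.playlistItems().list(part="snippet", playlistId=playlist_id, maxResults=50, pageToken=token).execute()`, where `youtube = build("youtube", "v3", developerKey=API_KEY)`; `playlist_id` and `API_KEY` are placeholders.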
I iterated on the prompt a few times, but with all of the information in the prompt above, I had to make only one modification to the code: handling videos that don't have a transcript available.
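The fix itself isn't shown here, but the idea is to treat a missing transcript as something to log and skip rather than a fatal error. A minimal sketch, assuming the transcript fetcher is passed in as a function that raises when a video has no transcript. If you use the `youtube_transcript_api` package, the exceptions to expect include `TranscriptsDisabled` and `NoTranscriptFound`; the sketch catches `Exception` only so it runs without that package. `fetch_transcripts` is a hypothetical name.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)

# Assumed shape: the fetcher takes a video_id and returns a list of chunks such
# as {"text": "...", "start": 12.3, "duration": 4.5}, or raises if there is
# no transcript for that video.
TranscriptFetcher = Callable[[str], List[Dict[str, Any]]]


def fetch_transcripts(video_ids: List[str],
                      fetch: TranscriptFetcher) -> Dict[str, List[Dict[str, Any]]]:
    """Fetch transcripts for every video, skipping the ones that fail."""
    transcripts: Dict[str, List[Dict[str, Any]]] = {}
    for video_id in video_ids:
        try:
            transcripts[video_id] = fetch(video_id)
        except Exception as exc:  # in practice: TranscriptsDisabled, NoTranscriptFound
            logger.warning("Skipping %s: no transcript available (%s)", video_id, exc)
    return transcripts
```

Videos that fail simply don't appear in the returned dict, so the rest of the pipeline never sees them.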
I think the result is incredible! Coding this by hand would take a few hours; here I got the result in about 10 minutes of prompt iteration.
This is an example of the transcript:
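Each file is a single line of text, as the prompt asked. A minimal sketch of how transcript chunks can be cut into one segment's text, assuming the chunks are dicts with `text` and `start` (seconds) keys, the list-of-dicts shape that `YouTubeTranscriptApi.get_transcript` returns in pre-1.0 releases of `youtube_transcript_api`. The function name is hypothetical, and assigning each chunk to the segment in which it starts is a design choice, not necessarily what the generated code does.

```python
from typing import Any, Dict, List, Optional


def segment_text(chunks: List[Dict[str, Any]], start: float,
                 end: Optional[float]) -> str:
    """Join the chunks whose start time falls in [start, end) into one line.

    end=None means "until the end of the video". Newlines inside a chunk are
    replaced with spaces so the output file has no line breaks.
    """
    selected = [
        chunk["text"].replace("\n", " ")
        for chunk in chunks
        if chunk["start"] >= start and (end is None or chunk["start"] < end)
    ]
    return " ".join(selected)
```

A chunk that straddles a boundary goes entirely to the segment in which it starts, so no words are duplicated across files.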
And the CSV with metadata that we will use in the future:
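The CSV has one record per segment with the four columns from the prompt, and `segment_url` is the video URL with `t` set to the segment start in seconds. A minimal sketch with the standard `csv` module; the function names are hypothetical.

```python
import csv
from typing import Dict, List, TextIO

# Column order taken from the prompt.
FIELDS = ["video_name", "segment_name", "video_id", "segment_url"]


def segment_url(video_id: str, start_seconds: int) -> str:
    """URL that opens the video at the segment start, e.g. ...?v=ID&t=1907s."""
    return f"https://www.youtube.com/watch?v={video_id}&t={start_seconds}s"


def write_metadata(rows: List[Dict[str, str]], out: TextIO) -> None:
    """Write a header plus one CSV record per segment."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

`segment_url("eTBAxD6lt2g", 1907)` reproduces the example URL from the prompt. Open the output file with `newline=""`, as the `csv` docs recommend, e.g. `with open("segments.csv", "w", newline="", encoding="utf-8") as f: write_metadata(rows, f)`; the file name `segments.csv` is an assumption.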
This is the seventh day of the 30-day AI challenge.
Over the next month, I will be building the Lex Fridman AI engine with you!
If you're reading this, I assume you'd like to build things. If you stick with this newsletter, you will have a running project after a month and know the technology needed to build AI apps.
I've recently built PodcastGPT and want to share the process with the community. If you haven't seen the app yet, you can get access here: PodcastGPT
This is all for now! See you tomorrow.
Stay focused!
Luke