This article covers YouTube search techniques for language learners, and also has links to lists of linguistically tagged YouTube videos.
YouTube is the largest linguistic corpus available, one that no linguist could ever hope to gather alone. Its videos cover the whole range of linguistic registers. More often than not, however, learners ignore them because they don't know how to sort the wheat from the chaff and get their hands on useful content.
To be really fluent in a language, you must be able to speak it in every register. YouTube is a great tool for reaching this goal, because it has lots of videos with spontaneous, informal everyday conversation that you won't easily find elsewhere without paying a cost. If you travel to the country for immersion or hire a native tutor, that will cost you money; if you get a language partner, that will cost you time (that is, the time you'll spend speaking your native language).
Of course, in most cases you still have to pay a cost if you want to actually speak with natives; thanks to YouTube, however, listening to natives talk to each other in spontaneous, informal settings is now free.
Search Techniques
The first thing you'll notice upon entering the website is that exposure is not evenly distributed among videos. The lion's share goes to promoted content, and videos by teenagers talking about beauty and makeup come second. So how do you filter this stuff out? There are two ways: standard and hackish.
Standard techniques use search resources that are already provided by either Google or the YouTube search engine.
Sort videos by upload date
Applying this filter is the easiest way to get rid of promoted content or videos with a high view count.
- Click "Filters", and then under "Sort" click on "Upload date".
- Or simply append &search_sort=video_date_uploaded to the end of the search URL.
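If you'd rather build such URLs from a script than type them by hand, the idea can be sketched in Python as below. This is only a sketch: the base URL and the search_query parameter name are assumptions about how YouTube search pages are addressed, and the search_sort parameter is the one quoted in the bullet above, which YouTube may or may not still honour.

```python
from urllib.parse import urlencode

# Assumed base URL for YouTube search result pages.
SEARCH_URL = "https://www.youtube.com/results"

def search_url(query, newest_first=True):
    """Return a YouTube search URL for `query`, optionally sorted by upload date."""
    params = {"search_query": query}
    if newest_first:
        # Parameter quoted in the bullet above; it may stop working at any time.
        params["search_sort"] = "video_date_uploaded"
    return SEARCH_URL + "?" + urlencode(params)

print(search_url("denver vlog"))
# https://www.youtube.com/results?search_query=denver+vlog&search_sort=video_date_uploaded
```

If the parameter turns out to be ignored, the "Filters" menu from the first bullet is the fallback.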
Use "intitle:(query)"
intitle:(query) searches on video titles and nowhere else.
- Search for vlogs about a specific subject, like gardening: vlog intitle:pruning
- Vlogs with city or neighborhood names in the title tend to contain spontaneous conversations when you sort them by upload date. Pick city and neighborhood names from Google Maps and place names from Google Street View, then search for them on YouTube. Examples: praha intitle:vlog ; denver intitle:vlog ; intitle:"milwaukee public market"
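When you have a long list of place names, writing the queries by hand gets tedious. The following Python sketch generates them; the function names title_query and place_vlog_queries are made up for illustration, and quoting multi-word phrases is an assumption about how the operator behaves best.

```python
def title_query(phrase):
    """Restrict `phrase` to video titles; quote it when it contains spaces."""
    if " " in phrase:
        return f'intitle:"{phrase}"'
    return f"intitle:{phrase}"

def place_vlog_queries(places):
    """Pair each place name with intitle:vlog, like the examples above."""
    vlog = title_query("vlog")
    return [f"{place} {vlog}" for place in places]

print(place_vlog_queries(["praha", "denver"]))
# ['praha intitle:vlog', 'denver intitle:vlog']
print(title_query("milwaukee public market"))
# intitle:"milwaukee public market"
```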
Search for given names and surnames
Instead of searching for the vlogs, search for the vloggers.
If you want to filter out kids and teenagers, a common given name combined with an uncommon, long surname yields the best results. Compare:
- kate intitle:vlog - common name;
- "kate thompson" - common name, common surname;
- "kate musselwhite" - common name, uncommon surname.
- Wikipedia has lists of the most common given names for almost every country.
- For random surnames, use LinkedIn. For example, google "乃" japan site:www.linkedin.com if you want the surnames of Japanese people on LinkedIn whose given name is 乃. Another example would be "ashley" united states site:www.linkedin.com
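The LinkedIn lookups above all follow one pattern, so they are easy to generate for a whole list of given names. Here is a minimal Python sketch; linkedin_name_query and plus_joined are hypothetical helper names, and the plus-joined form is just a convenience for pasting the query into a search URL.

```python
def linkedin_name_query(given_name, country):
    """Google query listing LinkedIn profiles with this given name in a country."""
    return f'"{given_name}" {country} site:www.linkedin.com'

def plus_joined(query):
    """Replace spaces with '+' so the query can be pasted into a search URL."""
    return query.replace(" ", "+")

query = linkedin_name_query("ashley", "united states")
print(query)               # "ashley" united states site:www.linkedin.com
print(plus_joined(query))  # "ashley"+united+states+site:www.linkedin.com
```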
Search script
The script shown in this video, which can be downloaded here, automatically displays YouTube search results as an array of thumbnails, downloads YouTube links from a list, and then cuts and re-uploads the parts for which you want to ask other people for transcriptions.
If you want to search for videos that were recorded by adults, who are more likely to upload videos that will help you enrich your vocabulary, put only long surnames in your "surname" file. As a rule of thumb, for Western languages, surnames should have at least eight characters.
Try to eliminate surnames that are easy to read or to pronounce, like "Johnson", and keep surnames like "McDowell" or "Bradshaw". Also eliminate those surnames that were borrowed from other languages, as they'll probably lead you to videos in languages other than the one you're looking for.
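The length rule of thumb is the one part of this clean-up that is easy to automate. The Python sketch below applies it to a list of surnames; it is not part of the script described above, the name keep_long_surnames is made up, and the checks for pronounceability and borrowed surnames still have to be done by hand.

```python
# Rule of thumb from the text, for Western languages.
MIN_SURNAME_LENGTH = 8

def keep_long_surnames(surnames, min_length=MIN_SURNAME_LENGTH):
    """Return the surnames that have at least `min_length` characters."""
    cleaned = (s.strip() for s in surnames)
    return [s for s in cleaned if len(s) >= min_length]

print(keep_long_surnames(["Johnson", "McDowell", "Bradshaw", "Musselwhite"]))
# ['McDowell', 'Bradshaw', 'Musselwhite']
```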
If you want to search for "Kate+Musselwhite", you must write "Kate (with the opening quote) in the given file and +Musselwhite" (with the closing quote) in the surname file; otherwise the script will search for KateMusselwhite.
If you're searching for Japanese names, write the surnames in the given file and the given names in the surname file, since the surname comes first in Japanese.
Remember that this is still YouTube search, so writing something like intitle:vlog+"Anthony in the given file or +Richert"+intitle:vlog in the surname file works just as planned.
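The quoting trick is easier to see once the joining step is spelled out. The Python sketch below is not the script itself: it assumes that every line of the given file is glued to every line of the surname file with no separator, which is consistent with the KateMusselwhite behaviour described above but may differ from how the real script pairs its lines.

```python
from itertools import product

def build_queries(given_lines, surname_lines):
    """Join every given-file line with every surname-file line, no separator."""
    return [g + s for g, s in product(given_lines, surname_lines)]

print(build_queries(["Kate"], ["Musselwhite"]))
# ['KateMusselwhite']
print(build_queries(['"Kate'], ['+Musselwhite"']))
# ['"Kate+Musselwhite"']
print(build_queries(['intitle:vlog+"Anthony'], ['+Richert"']))
# ['intitle:vlog+"Anthony+Richert"']
```

Because the two halves are simply concatenated, any operator you put at the start of a given line or at the end of a surname line ends up in the final query.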
Getting names and surnames
This script makes it easier to get many names and surnames from LinkedIn at once.
These images show:
1) The "given" and "surname" inputs. Both files were generated by the LinkedIn script. The surname file was cleaned up in vim by typing "dd" to delete the undesired surnames.
2) The output. Here are some of the videos that I found in the output, in this file alone:
Tutorials - 
Skits - 
Some of these videos have titles such as "Video 10" or "2013 11 27 16 05 30" so they'd be hard if not impossible to find on YouTube through standard search methods. Videos taken from movies and TV or featuring kids or teenagers were deliberately avoided during the compilation of this link list.
Lists of YouTube videos tagged for language learners
This is a list of lists of YouTube videos tagged for language learners. Please specify your username and provide a direct link to the .txt, .html or .zip file, so that it remains possible to download everything with a single click (through browser extensions). You may also need to post a file explaining your tags. If you wish, you can simply copy the tagging guidelines of the creator of this article (Iamrcr).
TO-DO list
- Some people have pointed out that most of these videos are not appropriate for beginners or intermediate learners, which led to the idea of making subtitles for them. Make sure you follow the syntax for srt files, so that people can use your subtitles any way they want to (for example, with subs2srs).
- Some real hacking. Use OpenCV to count the number of people in every thumbnail, tell their age and gender, and put this data right below each one.
- Download a few seconds in the middle of every video (with ffmpeg and youtube-dl). Then run pyAudioAnalysis to remove all music videos from the thumbnails.
- Rewrite these shell scripts in a real programming language and maybe even create a website that runs them.
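For the subtitle item in the list above, the srt syntax is simple enough to generate from a script: each cue is a running number, a line with start and end times written as HH:MM:SS,mmm and separated by " --> ", the subtitle text, and a blank line. A minimal Python sketch follows; the function names and the (start, end, text) tuple layout are assumptions made for illustration.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (srt puts a comma before the milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(cues):
    """Build srt text from an iterable of (start_seconds, end_seconds, text)."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)

print(to_srt([(0, 2.5, "Hello"), (2.5, 4, "Bye")]))
```

Writing the result to a file with UTF-8 encoding should keep non-Latin scripts intact.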
/r/langcondir - A subreddit where you can post any conversation between native speakers that you may find through the methods outlined in this article. All languages allowed.