"Machine Learning for Metadata Explained" in Humanizing Tech
And why it’s going to change how you watch videos forever
I. Definition of Metadata
At some point in the future you’re going to come across the word “metadata”. It helps to understand what it means and why it’s important, because it will shape the apps, software, virtual reality, TV, and movies you watch.
Zuckerberg has said Facebook is moving from a mobile-first strategy to a video-first strategy.
That’s because you can create anything you want inside a video. Instead of hand-coding some funky website experience, all you really have to do is create a video instead and make it full bleed on a web page. And yes, you can add interactivity, but I digress.
So what is Metadata?
First of all, it’s data. That’s why that word is in there. And data just means stuff. Information. For example, say you’re 25 years old. If we wanted to store that fact, the number 25 would be the data, and it would represent your age.
The next part is trickier. The prefix “meta” refers to other information about that data. Let’s use an example. Imagine you send a tweet saying, “The sun is shining today.” That sentence, that tweet, is your data. Now, you can attach metadata to it: the time it was tweeted, where you tweeted it from, and what phone you used to do the tweeting. In this example:
- Data: “The sun is shining today.”
- Metadata timestamp: 8:05am Friday August 5, 2016
- Metadata location: Laguna Beach, CA
- Metadata device: iPhone 6S
That’s pretty easy to understand, right? OK, let’s amp it up a notch and consider what metadata exists for video. Because it’s not text. It’s not an image. It’s a bunch of images coming at you really fast, with sound. Another example helps describe video metadata.
- Data: Star Wars
- Metadata year: 1977
- Metadata length: 2 hours 1 minute
- Metadata director: George Lucas
- Metadata actors: Mark Hamill, Harrison Ford, Carrie Fisher
- Metadata awards: Oscar for Best Effects, Oscar for Best Sound, etc
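To make the idea concrete, here’s how a record like the Star Wars example above might look as a structured object. This is just an illustrative sketch in Python; the field names are my own, not any standard schema:

```python
# Hypothetical sketch: the data plus its metadata as one record.
movie = {
    "data": "Star Wars",                       # the content itself
    "metadata": {                              # information about the content
        "year": 1977,
        "length_minutes": 121,                 # 2 hours 1 minute
        "director": "George Lucas",
        "actors": ["Mark Hamill", "Harrison Ford", "Carrie Fisher"],
        "awards": ["Oscar for Best Effects", "Oscar for Best Sound"],
    },
}

# Metadata makes content findable without watching the video itself.
def find_by_actor(catalog, name):
    return [m["data"] for m in catalog if name in m["metadata"]["actors"]]

print(find_by_actor([movie], "Carrie Fisher"))  # ['Star Wars']
```

That last line is the whole point: you can answer “what has Carrie Fisher been in?” from the metadata alone, without ever opening the video file.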
That’s metadata for movies. You could do the same thing for TV shows and user generated content in social media apps. But you might be noticing a problem. In every case it describes the data (e.g., a tweet, the movie Star Wars), but it doesn’t describe what’s going on inside the data.
Sure, we’ve got a metadata synopsis, but that’s only a paragraph. What about the characters’ feelings, or a description of the different scenes and what’s happening in them?
Up the complexity a bit and think about the live Olympic Games. Consider all the metadata being generated by numerous athletes in real time. Unless you have someone sitting in front of a computer manually entering this stuff into an Excel file, this very valuable data is lost forever, or never exists in the first place.
Could you imagine what could be done if there was some way to automatically extract this data from these movies, TV shows, your social media posts, and live events?
- Notifying you when something you care about is about to happen, or has just happened.
- Making it easier to search for and find something interesting to watch instead of scrolling for years through the never-ending Netflix library.
It doesn’t exist today because the technology wasn’t ready. But now it is.
II. Machine Learning to the Rescue
Using basic machine learning techniques we can extract information from any video. After all, videos are nothing more than a series of images and an audio track.
We can use speech recognition to transcribe the words people are saying, then apply Natural Language Processing to the transcript. There’s plenty you can find on the internet about this topic, or just use Google’s machine learning APIs. We then pass the transcript to a segment analyzer to understand not only when topics of conversation change, but also when entire scenes change. For the latter, it works in combination with the image analysis engine.
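The scene-change half of that pipeline can be illustrated with a toy sketch: compare consecutive frames and flag a boundary when the pixels change a lot. This is not Piksel’s actual engine, just the general idea, with frames reduced to flat lists of grayscale values and a threshold I picked arbitrarily:

```python
# Toy scene-change detector: flag a cut when the average per-pixel
# difference between consecutive frames exceeds a threshold.
# (Illustrative only; production engines use far more robust features.)

def mean_abs_diff(frame_a, frame_b):
    """Average per-pixel difference between two same-sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def scene_changes(frames, threshold=50):
    """Return indices where a new scene appears to begin."""
    cuts = []
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            cuts.append(i)
    return cuts

# Four tiny 4-pixel "frames": two dark ones, then two bright ones.
frames = [
    [10, 12, 11, 10],
    [11, 12, 10, 11],
    [200, 198, 201, 199],
    [201, 199, 200, 200],
]
print(scene_changes(frames))  # [2] — the cut from dark to bright
```

A real system would work on decoded video frames and smarter features (color histograms, motion vectors), but the shape of the logic is the same.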
Taken together, we can develop metatags on soft topics and create automatic hashtags or keywords to describe the content. We can go a step further to create segment boundaries as described above using some proprietary methods, while automatically assigning the video a title, thumbnail, and main topics per segment. This can then be used for navigating catch-up TV or even viewing personalized news segments.
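The automatic-hashtag idea can be sketched with a minimal keyword extractor over a transcript. Here it’s a crude word-frequency count with a tiny stopword list, nothing like a production NLP stack:

```python
import re
from collections import Counter

# Crude keyword extraction: count non-stopword terms in a transcript
# and take the most frequent ones as candidate hashtags.
# (Stopword list abbreviated for illustration.)
STOPWORDS = {"the", "a", "an", "is", "are", "to", "and", "of", "in", "on", "it", "by", "but"}

def hashtags(transcript, top_n=3):
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return ["#" + w for w, _ in counts.most_common(top_n)]

transcript = ("The rebels attack the Death Star. The Death Star is "
              "defended by fighters, but the rebels destroy the Death Star.")
print(hashtags(transcript))  # ['#death', '#star', '#rebels']
```

Real systems would use stemming, phrase detection, and topic models, but even this toy version surfaces what the segment is about.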
We can do this in real-time to create captions as a live streaming event is happening. Imagine CNN Go in a box. Then, once we get further data on viewing habits we can begin to create trending segments and recommend those to other viewers who might like similar shows, programs, or movies.
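The recommendation step could start as simply as counting co-views: segments often watched by the same viewers get suggested to each other’s audiences. A toy sketch, with viewing data invented for illustration:

```python
from collections import Counter

# Toy "viewers also watched" recommender: for a given segment, count
# which other segments its viewers watched, ranked by overlap.
# (Viewer names and segment IDs are invented for illustration.)
views = {
    "alice": ["goal-highlights", "post-match-interview"],
    "bob":   ["goal-highlights", "post-match-interview", "fan-reactions"],
    "carol": ["goal-highlights", "fan-reactions"],
    "dave":  ["goal-highlights", "post-match-interview"],
}

def also_watched(segment, viewing_history):
    counts = Counter()
    for watched in viewing_history.values():
        if segment in watched:
            counts.update(s for s in watched if s != segment)
    return [s for s, _ in counts.most_common()]

print(also_watched("goal-highlights", views))
# ['post-match-interview', 'fan-reactions'] — ranked by co-view count
```

This is the simplest form of collaborative filtering; trending detection is the same counting idea with a time window applied.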
A few compelling approaches include automatic micro-genre creation, much like Netflix did by hand (whoopsies), and facial recognition for automatically matching metadata to actors, actresses, and even your friends. This can come complete with timestamps, so you can quickly jump to that time Brad Pitt and George Clooney stole a bunch of loot in Vegas.
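The facial-recognition-with-timestamps idea boils down to comparing a face embedding from each frame against embeddings of known people. A minimal sketch, with made-up three-number embeddings standing in for what a real face model would produce:

```python
import math

# Toy face matcher: each face is a short embedding vector; a detection
# matches a known person when the Euclidean distance is small enough.
# (Real systems use 128+-dimensional embeddings from a trained model;
# these vectors and the threshold are invented for illustration.)
KNOWN = {
    "Brad Pitt":      [0.9, 0.1, 0.3],
    "George Clooney": [0.2, 0.8, 0.5],
}

def identify(embedding, known=KNOWN, max_dist=0.3):
    best, best_d = None, max_dist
    for name, ref in known.items():
        d = math.dist(embedding, ref)
        if d < best_d:
            best, best_d = name, d
    return best

# Per-timestamp face detections (seconds, embedding).
detections = [
    (12.0, [0.88, 0.12, 0.31]),
    (47.5, [0.21, 0.79, 0.52]),
    (90.0, [0.50, 0.50, 0.50]),   # unknown face — no match
]
appearances = [(t, identify(e)) for t, e in detections if identify(e)]
print(appearances)  # [(12.0, 'Brad Pitt'), (47.5, 'George Clooney')]
```

The `(timestamp, name)` pairs are exactly what a “jump to Brad Pitt’s scene” feature needs.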
Finally, there’s always the ecommerce aspect, where we can automatically identify products in a show so viewers who want to buy them can. Most people think of the buying process as happening while the show is airing, but you can wait until the show’s over. Instead of a title card or a “watch this next” prompt, show a 15-second ecommerce store with all the products featured in that TV episode.
It will come to Apple TV sooner than you think.
And because I’m a student of Steve Jobs, let me give you one more thing. With machine learning and some of the software already developed we can very accurately identify personality traits of the characters in these shows. Things like openness, agreeableness, extraversion, conscientiousness, and emotional range. Imagine landing your next Hollywood gig based on how well a machine described your emotional range.
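One crude way to approximate trait scoring from a show’s dialogue: count words associated with each trait. The lexicons below are invented for illustration; real systems infer the Big Five from far richer linguistic signals than word counts:

```python
# Toy Big Five scorer: count dialogue words loosely associated with
# each trait. Lexicons are invented for illustration, not from any
# real psycholinguistic model.
TRAIT_WORDS = {
    "openness":      {"imagine", "wonder", "curious", "dream"},
    "agreeableness": {"thanks", "please", "sorry", "together"},
    "extraversion":  {"party", "everyone", "exciting", "talk"},
}

def trait_scores(dialogue):
    words = dialogue.lower().split()
    return {trait: sum(w.strip(".,!?") in vocab for w in words)
            for trait, vocab in TRAIT_WORDS.items()}

line = "I wonder what's out there. Imagine it! I dream about it, honestly."
print(trait_scores(line))  # openness scores highest for this line
```

A character’s profile would come from aggregating scores like these over every line they speak across a season.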
Now that’s science fiction.
III. White Paper on Video Metadata
Below is a white paper that my lovely colleagues at Piksel put together showcasing the depth of what machine learning for metadata can do with the video content you create. Heck, even if you pay for content and just monetize it for a different market, it can still benefit you.
I’ve uploaded it below so you don’t have to fill out the contact card on Piksel.com but if you have questions feel free to message me directly.
IV. What This All Means
As we move to a more social world, the amount of videos captured and shared is going to increase exponentially. From traditional 2D video to 3D, virtual reality, or augmented overlay videos, being able to find something to watch is going to be just as hard as creating something worth watching.
Below is a demo video of how our artificial intelligence PhDs use it to help people find, watch, and share some of their favorite videos on Apple TV. The future of video is already here.
from Sean Everett on Medium http://ift.tt/2aWrntx