Deep neural networks can now transfer the style of one photo onto another: The challenges and opportunities
- gumatulwatchba
- Aug 18, 2023
- 6 min read
One of the most interesting discussions today within machine learning is how it might impact and shape our cultural and artistic production in the coming decades. Central to this discussion are the recent advances in image style transfer using deep learning. The main idea behind style transfer is to take two images, say, a photo of a person and a painting, and use them to create a third image that combines the content of the former with the style of the latter. The figure below shows an example using a photo of the author and the famous painting "The Scream" by Edvard Munch.
As we can imagine, this technique has tremendous potential: it gives anyone the power to produce beautiful artwork inspired by their favorite paintings and textures. The technology that generates these images is also remarkable in its own right and worth understanding. In this article we will do a deep dive into how style transfer works and then use deep learning to generate our own images. More specifically, we will cover how style transfer is defined, how deep learning transformed the field, and how to produce stylized images of our own.
Deep neural networks can now transfer the style of one photo onto another
According to the academic literature, image style transfer is defined as follows: given two input images, synthesize a third image that has the semantic content of the first and the texture/style of the second. For this to work, we need a way to (1) determine the content and the style of any image (a content/style extractor) and then (2) merge some arbitrary content with another arbitrary style (a merger).
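To make that two-part decomposition concrete, here is a toy sketch in Python. The function names and the choices of luminance as "content" and global color statistics as "style" are stand-ins invented for this article; as we will see, real methods use CNN features instead.

```python
import numpy as np

# Toy stand-ins for the two components described above: treat the
# per-pixel luminance map as "content" and per-channel color statistics
# as "style". Real style transfer replaces both with CNN features.

def extract_content(img):
    return img.mean(axis=-1, keepdims=True)  # (H, W, 1) luminance map

def extract_style(img):
    return img.mean(axis=(0, 1)), img.std(axis=(0, 1))  # color stats

def merge(content, style):
    # Re-color the luminance map using the style image's color statistics.
    mean, std = style
    out = (content - content.mean()) / (content.std() + 1e-8) * std + mean
    return np.clip(out, 0.0, 1.0)

content_img = np.random.rand(64, 64, 3)  # placeholders; load real photos here
style_img = np.random.rand(64, 64, 3)
result = merge(extract_content(content_img), extract_style(style_img))
```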
This definition of style transfer might seem a bit imprecise. Put another way, the central problem of style transfer is coming up with a clear way of computing the "content" of an image as distinct from computing its "style". Before deep learning arrived on the scene, researchers handcrafted methods to extract the content and texture of images, merged them, and checked whether the results were interesting or garbage. Even in today's deep learning style transfer research, high-impact papers still propose new ways of using a neural network to extract content, extract style, or combine them.
The intuition developed above and the workflow explained at the beginning of the section are everything you need to know about style transfer to get started. Congratulations! You are one step closer to starting your own avant-garde movement using computers as your brush. Now that we know what style transfer is all about, and that it was researched before the dawn of deep learning, it is worth asking: why did this subject attract so much attention once deep learning appeared on the scene? The main reason is that, before deep learning, texture transfer methods relied on handcrafted features, which limited both the quality of their results and the generality of the methods.
So what can deep learning offer us to improve style transfer? It turns out that deep learning techniques are really good at automating the extraction of high-level "features". Let's draw a parallel between what happened in the ML community and what happened in the texture transfer community. Imagine this scenario: people all around the world are training their ML algorithms without deep neural nets, and handcrafted feature engineering plays a big role in the final performance of each algorithm. Then deep learning arrives, and suddenly you don't need to extract features by hand anymore. The hidden layers of the neural net do all of the work for you, and the results are better than ever!
Now, let's look at the texture transfer community. People were handcrafting ways of extracting the content and texture of images to get better results. Then a group of researchers led by Gatys asked the question: why don't we let a deep neural network do the job for us, as the ML community did? They tested this hypothesis and wrote an article called A Neural Algorithm of Artistic Style. The rest, as they say, is history.
And that's it! You are now equipped to understand the amazing ideas that were responsible for the birth of deep learning style transfer. There are just a couple of details that you should also know. The method gives you the freedom to choose which hidden layer(s) to use as extractors. I recommend using the layers suggested by the researchers, since they likely tested many options before writing the article. Furthermore, in the optimization problem you can set the relative weight between content and style. If you give more importance to the content, the final result will preserve more of the semantics of content_image at the cost of less of the texture of style_image; if you give more importance to the style, the opposite happens.
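To see how that trade-off is usually expressed, here is a sketch of the weighted objective in Python/TensorFlow. The feature tensors are assumed to come from chosen layers of a pretrained CNN (a later sketch shows how to get them); alpha and beta are the content/style weights discussed above, and their default values here are illustrative, not the paper's.

```python
import tensorflow as tf

def gram_matrix(feats):
    # Style is summarized by correlations between feature channels.
    _, h, w, c = feats.shape
    flat = tf.reshape(feats, (-1, c))
    return tf.matmul(flat, flat, transpose_a=True) / tf.cast(h * w, tf.float32)

def total_loss(gen_content, target_content, gen_styles, target_styles,
               alpha=1.0, beta=1e3):
    # alpha/beta set the content-vs-style trade-off described above.
    content_loss = tf.reduce_mean(tf.square(gen_content - target_content))
    style_loss = tf.add_n([
        tf.reduce_mean(tf.square(gram_matrix(g) - gram_matrix(t)))
        for g, t in zip(gen_styles, target_styles)])
    return alpha * content_loss + beta * style_loss
```

Raising alpha relative to beta pulls the result toward the content image; raising beta pulls it toward the style image's textures.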
The best way to learn, of course, is to run these programs yourself, so I recommend playing with the INetwork program and testing some of its parameters. There are also other great open source style transfer projects available on GitHub, in both Python and Torch (Lua). To mention just a few: neural-doodle by alexjc (Python) and fast-neural-style by jcjohnson (Lua).
To ensure UI responsiveness and fluidity while deep neural networks run in the background, we split the GPU work items for each layer of the network until each individual item executes in less than a millisecond. This allows the driver to switch contexts to higher-priority tasks in a timely manner, such as UI animations, thus reducing and sometimes eliminating frame drops.
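As a rough illustration of that splitting policy, here is a conceptual sketch in plain Python (not a real GPU driver API; the actual mechanism operates on GPU work items, and the helper names are invented for this article):

```python
import time

def calibrate_chunk_size(items, process, budget_ms=1.0):
    """Halve the work-item size until one chunk runs under budget_ms."""
    chunk = len(items)
    while chunk > 1:
        start = time.perf_counter()
        process(items[:chunk])
        if (time.perf_counter() - start) * 1000.0 < budget_ms:
            break
        chunk //= 2  # too slow: split the work item further
    return chunk

def run_preemptibly(items, process, budget_ms=1.0):
    # Submit sub-millisecond chunks so a higher-priority task (e.g. a
    # UI animation) can be scheduled between any two of them.
    chunk = calibrate_chunk_size(items, process, budget_ms)
    for i in range(0, len(items), chunk):
        process(items[i:i + chunk])
```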
Combined, all these strategies ensure that our users can enjoy local, low-latency, private deep learning inference without being aware that their phone is running neural networks at several hundred gigaflops.
This tutorial uses deep learning to compose one image in the style of another image (ever wish you could paint like Picasso or Van Gogh?). This is known as neural style transfer and the technique is outlined in A Neural Algorithm of Artistic Style (Gatys et al.).
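Besides the from-scratch optimization, the tutorial also demonstrates a shortcut: a pretrained fast-stylization model from TensorFlow Hub. A minimal sketch follows; the module URL is the one the tutorial pointed to at the time of writing and may change.

```python
import tensorflow as tf
import tensorflow_hub as hub

# content_image and style_image are float32 tensors shaped (1, H, W, 3)
# with values in [0, 1]; random placeholders here, load real photos instead.
content_image = tf.random.uniform((1, 256, 256, 3))
style_image = tf.random.uniform((1, 256, 256, 3))

hub_model = hub.load(
    "https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2")
stylized = hub_model(tf.constant(content_image), tf.constant(style_image))[0]
```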
In this section, we will leverage layerwise representations of a CNN to automatically apply the style of one image to another image, i.e., style transfer (Gatys et al., 2016). This task needs two input images: one is the content image and the other is the style image. We will use neural networks to modify the content image to make it close to the style image in style. For example, the content image in Fig. 14.12.1 is a landscape photo taken by us in Mount Rainier National Park in the suburbs of Seattle, while the style image is an oil painting with the theme of autumn oak trees. In the output synthesized image, the oil brush strokes of the style image are applied, leading to more vivid colors, while preserving the main shape of the objects in the content image.
Fig. 14.12.2 illustrates the CNN-based style transfer method with a simplified example. First, we initialize the synthesized image, for example, as the content image. This synthesized image is the only variable that needs to be updated during the style transfer process, i.e., the model parameters to be updated during training. Then we choose a pretrained CNN to extract image features and freeze its model parameters during training. This deep CNN uses multiple layers to extract hierarchical features for images. We can choose the output of some of these layers as content features or style features. Take Fig. 14.12.2 as an example. The pretrained neural network here has 3 convolutional layers, where the second layer outputs the content features, and the first and third layers output the style features.
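Here is a sketch of that loop in Python/TensorFlow, reusing the total_loss function from the earlier sketch. The pretrained network here is VGG19 rather than the three-layer toy of Fig. 14.12.2, and the layer names are illustrative choices, not the figure's:

```python
import tensorflow as tf

# Frozen pretrained feature extractor; only the image will be trained.
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg.trainable = False
layers = ["block4_conv2",                    # content feature
          "block1_conv1", "block5_conv1"]    # style features
extractor = tf.keras.Model(
    vgg.input, [vgg.get_layer(n).output for n in layers])

# Placeholders in [0, 1]; in practice, load real images and apply
# tf.keras.applications.vgg19.preprocess_input to match VGG's training.
content_image = tf.random.uniform((1, 224, 224, 3))
style_image = tf.random.uniform((1, 224, 224, 3))

content_targets = extractor(content_image)
style_targets = extractor(style_image)

# Initialize the synthesized image with the content image; it is the
# only trainable variable in the whole process.
synthesized = tf.Variable(content_image)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.02)

for step in range(500):
    with tf.GradientTape() as tape:
        feats = extractor(synthesized)
        loss = total_loss(feats[0], content_targets[0],
                          feats[1:], style_targets[1:])
    grad = tape.gradient(loss, synthesized)
    optimizer.apply_gradients([(grad, synthesized)])
    synthesized.assign(tf.clip_by_value(synthesized, 0.0, 1.0))
```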
Neural networks involve a trial-and-error process, so they need massive amounts of data on which to train. It's no coincidence neural networks became popular only after most enterprises embraced big data analytics and accumulated large stores of data. Because the model's first few iterations involve somewhat educated guesses about the contents of an image or parts of speech, the data used during the training stage must be labeled so the model can check whether its guesses were accurate. This means that although many enterprises that use big data have large amounts of data, unlabeled data is less helpful: a deep learning model can analyze unlabeled data once it has been trained to an acceptable level of accuracy, but supervised deep learning models can't train on it.
Deep learning is a subset of machine learning in which artificial neural networks, algorithms inspired by the structure and functioning of the human brain, learn from large amounts of data to create patterns for decision-making. Neural networks with many (hence "deep") layers enable learning by performing tasks repeatedly, tweaking the model a little each time to improve the outcome.
1. Build and train deep neural networks, implement vectorized neural networks, identify architecture parameters, and apply DL to your applications.
2. Use best practices to train and develop test sets and analyze bias/variance for building DL applications, use standard NN techniques, apply optimization algorithms, and implement a neural network in TensorFlow.
3. Use strategies for reducing errors in ML systems, understand complex ML settings, and apply end-to-end, transfer, and multi-task learning.
4. Build a Convolutional Neural Network, apply it to visual detection and recognition tasks, use neural style transfer to generate art, and apply these algorithms to image, video, and other 2D/3D data.
5. Build and train Recurrent Neural Networks and their variants (GRUs, LSTMs), apply RNNs to character-level language modeling, work with NLP and word embeddings, and use HuggingFace tokenizers and transformers to perform Named Entity Recognition and Question Answering.
Neural style transfer is an optimization technique that takes as input a content image and a style image, and optimizes a target image to resemble the content of the former and the style of the latter. The key finding of the research paper by Gatys et al. was that the content and style features of an image can be separated using deep neural networks.
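A quick numerical illustration of that separability, reusing the gram_matrix helper defined earlier: the Gram matrix (style) is invariant to where features occur in the image, while the raw feature map (content) is not.

```python
import tensorflow as tf

feats = tf.random.normal((1, 8, 8, 16))          # a mock CNN feature map
flat = tf.reshape(feats, (64, 16))
shuffled = tf.reshape(tf.random.shuffle(flat), (1, 8, 8, 16))

# Spatial arrangement changed (different "content")...
print(tf.reduce_max(tf.abs(feats - shuffled)).numpy())        # large
# ...but channel correlations are untouched (same "style").
print(tf.reduce_max(tf.abs(gram_matrix(feats)
                           - gram_matrix(shuffled))).numpy())  # ~0
```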