We’re happy to present a new editor today, Monaco. The Monaco Editor is the code editor that powers VS Code and it is a browser-based version of the VS Code editor. Now let’s dive into the details!
In addition to the quality and typing improvements, the main focus for the editor has been on:
History of changes with checkpoints and side by side live comparison
Error highlighting and linking
Variable Viewer
New markdown editor
It’s high time to try the new editor now!
We’ve updated our Terms of Service. You can read the full Terms of Service by signing in to your account.
As always, we appreciate your feedback on our forum or directly in Datalore via the feedback form.
For the latest news about Datalore, follow us on Twitter.
We're excited to announce a new feature for publishing your notebooks. We’ve also added support for the IPython kernel.
In Datalore you can share your notebook and get a link in seconds. Your work should be as easy to share as a web page.
To share, simply click Publish in the Datalore toolbar, or visit view.datalore.io and upload your workbook manually.
Alternatively, you can work with your Jupyter notebooks in PyCharm and give your teammates real-time access to your notebooks through the bundled Datalore plugin. Learn more in the PyCharm documentation.
Check out all of these features online at datalore.io and let us know what you think!
We’re looking for better ways to do data science and would like to know your thoughts about Datalore so we can make the product better for you.
Please provide us with your valuable feedback by taking this survey, which should take around 5-7 minutes of your time.
You will have a chance of winning one of five $50 Amazon certificates. Winners will be picked at random from among the surveys that are filled out completely with meaningful answers.
We are happy to introduce our new features and enhancements to give you an even better Datalore experience: batch reverse geocoding, geodesic lines, a new layout for intentions, and import/export of *.ipynb notebooks.
Easily transfer .ipynb projects to Datalore! Import workbooks from a URL and export them as well. Please note that your .ipynb notebook’s version needs to be in the 3.x or 4.x nbformat.
Preprocessing intentions now apply to split variables of a dataset.
Some preprocessing intentions are more easily available (on the main panel).
In addition, you can now evaluate multiple models for the same dataset.
Before, it was necessary to set plot aesthetics aes for all coordinates to draw geocoded geometries. Now, you can have it done automatically and make your code cleaner.
You can now reverse geocode multiple point locations (latitude and longitude) to readable geographical names. For example, if your data contains the coordinates of some objects, you can get the names of the cities where they are located.
A geodesic line is the shortest path between two points on a curved surface, such as the Earth. Now you can choose how to connect points on the map: either as geodesic lines or as straight lines.
Geocoding is used to get the coordinates of map objects. Now you can process complex strings in batch mode in Datalore. For example, you can display your data on the map if it contains a column with city names in a format like “city, state”, “state, country”, or something similar.
Check out all of these features online at datalore.io and let us know what you think!
Your feedback is always welcome on Twitter, the forum, or directly at contact@datalore.io.
Datalore Team
We’re thrilled to introduce you to Datalore, an intelligent web application for data analysis and visualization in Python, as it officially reaches version 1.0! Ever since the public beta release last February, we’ve been working hard to implement a lot of new ideas based on your suggestions.
As strong believers in enjoyable coding, we have introduced and optimized these options to help analysts stay in their flow while exploring data. The beta version provided core application features: a smart code editor with context-dependent suggestions, incremental recalculations, and built-in tools and libraries for machine learning. But we challenged ourselves to keep improving the tool.
Today, we’re officially releasing Datalore 1.0 and introducing three major updates. First, you have a choice between on-the-go and user-controlled code execution, with the latter enabling you to complete major code edits and run only the updated blocks. Second, the editor layout may be rearranged both vertically and horizontally, depending on whether you want to see the results below the code or if you want to put the code and results in blocks side by side. And finally, we’re introducing the upgraded Professional plan for an enhanced exploration experience.
Originally, all code was executed as is, which led to inconsistencies and delays in the editor and sometimes overcharging. From now on, there are two possible ways to run the code in the application. You can put the workbook computation on hold to complete major code edits, and run only the computations you want to check right away. Alternatively, you can keep using online code execution that automatically runs calculations and applies changes in the code.
In both cases, the incremental recalculation takes care of workbook consistency. The output reflects your changes, with blocks dependent on new edits recalculated either after you choose to run the updated code or instantly during online code execution.
We’re providing two options for the input and output layout. In the sequential view, each output appears right below its input. In the classical split view, input and output blocks are positioned side by side. Switch to either depending on how you prefer to monitor calculation results. Moreover, in the sequential view, you can collapse all inputs and outputs. This provides you with a code-only editor – or shows a clean output page with graphs and Markdown comments.
Now you can get access to extended data storage and high-performance computational resources with the Professional plan subscription. For complex projects, consider using 50 GB of space for data storage and up to 10 computations running simultaneously. The Professional plan also provides you with flexible access to medium, large, and extra large computational instances (including XL GPU instances).
Of course, you are always welcome to keep using Datalore with the free Community plan, which offers 10 GB of upload space and up to 3 computations running in parallel.
Go to datalore.io and try the application! And tell us what you think about our 1.0 version. We are excited to hear your feedback via the Datalore forum – it's the quickest way to share your opinion with the team. Whether you love our updated features, have encountered bugs, or want to share some suggestions – contact us. You can also get in touch with us via the in-app “Send Feedback” button and our Twitter.
Datalore Team
Our May digest covers recent news, such as the implementation of GDPR and what it means for machine learning, and various articles and tutorials published in the last month. Learn more about data preprocessing, establishing fairness in ML models, and topic modeling, and enjoy research highlights: from GANs to reinforcement learning.
Much has been made about the potential impact of the EU’s General Data Protection Regulation (GDPR) on data science programs. But there’s perhaps no more important—or uncertain—question than how the regulation will impact machine learning (ML) and enterprise data science. This article aims to demystify this intersection between ML and the GDPR, focusing on the three big questions: Does the GDPR prohibit machine learning? Is there a “right to explainability” from ML? Do data subjects have the ability to demand that models be retrained without their data?
The highlight of Google’s I/O keynote earlier this month was the reveal of Duplex, a system that can make calls to set up a salon appointment or a restaurant reservation for you by calling those places, chatting with a human and getting the job done. That demo drew lots of laughs at the keynote, but after the dust settled, plenty of ethical questions popped up because of how Duplex tries to fake being human. Here's a brief overview of Google Duplex, and here's a deeper inquiry into Duplex's ethics.
Dirty datasets for data preprocessing practice. Looking for datasets to practice data cleaning or preprocessing on? Look no further! Each of these datasets needs a little clean-up before it’s ready for different analysis techniques. For each dataset, there's a link to where you can access it, a brief description of what’s in it, and an “issues” section describing what needs to be done or fixed in order for it to fit easily into a data analysis pipeline.
Improve your training data. There is a difference between deep learning research and production, and the difference is often in how many resources are spent on improving models versus preprocessing datasets. There are lots of good reasons why researchers are so fixated on model architectures, but it does mean that there are very few resources available to guide people who are focused on deploying machine learning in production. To address that, Pete Warden's talk at the Train AI conference was on “the unreasonable effectiveness of training data”, and in this blog post he expanded on the topic, explaining why data is so important along with some practical tips on improving it.
Why machine learning is hard. While various online courses and manuals have made machine learning widely accessible, it remains quite a hard area, and not only because of the math involved. Read this essay to see what makes ML problems tough.
Topic modeling. The process of learning, recognizing, and extracting topics across a collection of documents is called topic modeling - one of the most useful ways to understand text in documents. In a comprehensive overview, the authors explore topic modeling through 4 of the most popular techniques today: LSA, pLSA, LDA, and the deep learning-based lda2vec.
Fairness in ML with PyTorch. Generative adversarial networks come to save the day when you need to ensure fairness in the predictions your model makes. Just add an adversary module that tries to predict whether the classifier unit is unfair to some sensitive data (like gender or race), and let the adversary and classification units play a zero-sum game where the classifier has to make good predictions but is penalized if the adversary detects unfair decisions. The end result of this game is, hopefully, a fair classifier that is also good at predicting. See the overview of this approach and check the PyTorch guide.
Authors introduce Primal-Dual Wasserstein Generative Adversarial Network, a new learning algorithm for building latent variable models of the data distribution based on the primal and the dual formulations of the optimal transport problem. In order to learn the generative model, the model uses the dual formulation and the decoder trains adversarially through a critic network regularized by the approximate coupling obtained from the primal. To avoid violation of various properties of the optimal critic, authors regularize norm and direction of the gradients of the critic function. As a result, Primal-Dual Wasserstein GAN utilizes benefits of auto-encoding models in terms of mode coverage and latent structure while avoiding their undesirable averaging properties like the inability to capture sharp visual features when modeling real images.
A new model, Self-Attention Generative Adversarial Network (SAGAN), allows attention-driven, long-range dependency modeling for image generation tasks. Traditional convolutional GANs generate high-resolution details as a function of only spatially local points in lower-resolution feature maps. In SAGAN, details can be generated using cues from all feature locations. Moreover, the discriminator can check that highly detailed features in distant portions of the image are consistent with each other.
In reinforcement learning, there’s no teacher available to estimate the value function as in supervised learning. The only option available is a proxy for the value function - usually a sampled and bootstrapped approximation to the true value function, known as a return. In a recent article, the authors propose a gradient-based meta-learning algorithm that is able to adapt the nature of the return online, whilst interacting with and learning from the environment. This online approach achieved new state-of-the-art performance when applied to 57 games on the Atari 2600 environment over 200 million frames.
A rigorous study of the sample complexity required to properly train convolutional neural networks (CNNs) follows a widespread assumption that CNN is a more compact representation than the fully connected neural network (FNN) and thus requires fewer samples for learning. Concentrating on sizes of the input and convolutional layers, authors calculate the sample complexity of achieving population prediction error for both CNN and FNN - and proceed with calculating the sample complexity of training a one-hidden-layer CNN with linear activation with unknown weights of convolutional and output layers with preset sizes. They figure the sample complexity as a function of sizes of convolutional and output layers and prediction error - and believe these tools may inspire further developments in understanding CNN.
One of the main difficulties in analyzing neural networks is the non-convexity of the loss function, which may have many bad local minima. In a recent paper, the authors study the landscape of neural networks for binary classification tasks. Under mild assumptions, they prove that after adding one special neuron with a skip connection to the output, or one special neuron per layer, every local minimum is a global minimum.
Datalore's built-in tools are great for interactive plots and charts - but our visualization libraries are capable of so much more. With just a handful of Python lines, you can create captivating images of fractals.
Visualizations of the set of complex numbers known as the Mandelbrot set result in intricate fractal-like images, which drew attention to the Mandelbrot set outside of mathematics. Inspired by Syntopia's series of posts, today we're plotting the Mandelbrot set in 2D and 3D using the datalore.plot library (which implements the well-known R library ggplot).
You can play with zooming and constructing fractal sets in this Datalore workbook. All Mandelbrot visualizations in Datalore start with a single function - plot_mandelbrot_set.
Let's start with the definition. Consider a complex number $c\in\mathbb{C}$ and the following recursive formula: $\displaystyle z_{n+1}=z_n^2 + c$, where $z_0=0$. If $|z_n|$ remains bounded as $n\rightarrow\infty$, then $c$ belongs to the Mandelbrot set $\mathcal{M}$.
Note that you can consider $z_n$ as a function $f(n, c)$ such that $f(n+1, c) = f^2(n, c) + c$. In practice, we treat a point $c$ as escaped (that is, not in $\mathcal{M}$) as soon as $|z_n|>H$, where $H$ is some fixed value (at least 2) called the horizon value, and we assign $c$ to $\mathcal{M}$ if this never happens within a fixed number of iterations.
One can get a monochrome picture of the Mandelbrot set by just checking whether or not the points belong to $\mathcal{M}$, but you might remember colorful zoomings of this famous fractal set.
One way to assign a color to an exterior point while plotting the Mandelbrot set is to use escape time values - the number of steps needed for the length $r_n=|z_n|$ to exceed the horizon level $H$.
We can perform the assignment process with the help of the following function:
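import numpy as np

def mandelbrot(c, threshold=2, num_iter=32):
    # Iterate z -> z*z + c and return the step at which |z| first exceeds the threshold.
    z = c
    for i in range(num_iter):
        if np.abs(z) > threshold:
            return i
        z = z*z + c
    return 0  # no escape within num_iter steps: treat c as a point of the set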
Here we assume that $c$ is a complex number:
c = complex(x, y)
Let's plot a monochrome Mandelbrot set and the result of applying a Laplace operator to an image $u(x,y)$ of the Mandelbrot set.
Laplace operator:
$$\displaystyle\Delta u=\frac{\partial^2 u}{\partial x^2}+\frac{\partial^2 u}{\partial y^2}$$
Finite-difference version of Laplace operator:
$$\displaystyle\Delta u\simeq \frac{u(x+h,y)+u(x-h,y)+u(x,y-h)+u(x,y+h)-4u(x,y)}{h^2}$$
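The finite-difference formula above can be sketched in a few lines of NumPy. This is only an illustration and not the code used for the picture: the function name discrete_laplacian and the test image $u(x,y)=x^2+y^2$ are our own assumptions, and only interior grid points are computed.

```python
import numpy as np

def discrete_laplacian(u, h=1.0):
    """Five-point finite-difference Laplacian of a 2D array (interior points only)."""
    u = np.asarray(u, dtype=float)
    # Sum of the four neighbours minus four times the centre, divided by h^2.
    return (u[1:-1, 2:] + u[1:-1, :-2] + u[2:, 1:-1] + u[:-2, 1:-1]
            - 4.0 * u[1:-1, 1:-1]) / h**2

# Hypothetical test image: u(x, y) = x^2 + y^2, whose exact Laplacian is 4 everywhere.
h = 0.1
grid = np.linspace(0.0, 1.0, 11)
x, y = np.meshgrid(grid, grid)
lap = discrete_laplacian(x**2 + y**2, h)
```

Applied to a 0/1 image of the Mandelbrot set, the same stencil is non-zero only where neighbouring pixels differ, which is why it highlights the boundary of the set.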
We can see that the color changes stepwise on our picture. To smooth the transition between color segments, we will apply anti-aliasing - a renormalization technique for image rendering.
Note that the integer-valued escape time in the above function depends on the horizon value (the threshold argument of our function). Our goal is to get a real-valued escape time that does not depend on the actual horizon value.
Define the function $\displaystyle f(x)=\frac{\log\log(x)}{\log 2}$ and let $t$ be the escape time and $H$ the horizon value. Then the renormalization we are looking for takes the form:
$$t - f(|z|) + f(H)$$
With this renormalization, we obtain the following Mandelbrot image.
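As a minimal sketch, the renormalized escape time can be computed by changing only the return value of the escape-time loop. The function name smooth_escape_time and its default arguments are assumptions made for this illustration; it is not necessarily the exact code behind the image.

```python
import numpy as np

def smooth_escape_time(c, horizon=2.0**40, num_iter=32):
    """Real-valued escape time t - f(|z|) + f(H), with f(x) = log(log(x)) / log(2)."""
    def f(x):
        return np.log(np.log(x)) / np.log(2)

    z = c
    for t in range(num_iter):
        if abs(z) > horizon:
            # Subtract the overshoot past the horizon measured on the log-log scale.
            return t - f(abs(z)) + f(horizon)
        z = z * z + c
    return 0.0  # the point did not escape within num_iter steps
```

A large horizon is used on purpose: the correction term relies on $|z|$ being well past the point where the $+c$ term still matters.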
You might have noticed that we have a few step-wise color changes close to the Mandelbrot set itself. One way to mitigate this problem is to perform color equalization with the help of the power function:
$$\left(t - f(|z|) + f(H)\right)^w$$
for some $w < 1$. The following function was used to produce the image of the Mandelbrot set:
def mandelbrot_set(x_min, x_max, y_min, y_max, width=270, height=270, threshold=2**40, num_iter=30, power=0.2):
    xs = np.linspace(x_min, x_max, width)
    ys = np.linspace(y_min, y_max, height)
    # Flat array of width*height equalized escape times, ordered row by row (y outer, x inner).
    img = np.array([mandelbrot(complex(x, y), threshold, num_iter)**power for y in ys for x in xs])
    return img
To add colors, assign color intensities to the RGB channels of your image:
def plot_colored_image(image, rgb, shape, x_size, y_size, title):
    width, height = shape
    colored_image = np.zeros((width, height, 3))
    red, green, blue = rgb
    # Scale the same intensity image by a per-channel weight.
    colored_image[:, :, 0] = image * red
    colored_image[:, :, 1] = image * green
    colored_image[:, :, 2] = image * blue
    plot = ggplot() + geom_image(colored_image) + ggsize(x_size, y_size) + ggtitle(title)
    return plot
A simple helper function will recalculate the image frame for the selected center and zoom of the picture:
def get_boundaries(center, span, zoom):
    # Each zoom level halves the visible span around the center.
    return center - span/2.**zoom, center + span/2.**zoom
There is no strict analogy for complex numbers in 3D space, so there are a number of ways to determine 3D fractals. Some of them use quaternions.
Consider 3D spherical coordinates:
$$\begin{cases}
r = \sqrt{x^2+y^2+z^2}\\
\theta = \arctan\left(\frac{\sqrt{x^2+y^2}}{z}\right)\\
\phi=\arctan\left(\frac{y}{x}\right)
\end{cases}$$
which map back to Cartesian coordinates as follows:
$$\begin{cases}
x = r\sin\theta\cos\phi\\
y = r\sin\theta\sin\phi\\
z = r\cos\theta
\end{cases}$$
Given $v = (x,y,z)$, we can define the $n$-th power of this triple:
$$(x,y,z)^n=r^n\left(\sin n\theta\cos n\phi, \sin n\theta \sin n \phi, \cos n\theta\right)$$
Now we can define the recurrent formula for the 3D Mandelbrot set (called a Mandelbulb) as $\displaystyle v_{n+1}=v_n^p + c$, where $c=(x,y,z)$. A typical choice for the power $p$ is 8.
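To make the definition concrete, here is a small sketch of the $n$-th power of a triple and of one step of the recurrence for a single point. The names triplex_power and mandelbulb_step are hypothetical helpers introduced for this illustration; np.arctan2 is used in place of arctan so that all quadrants (and $z=0$) are handled.

```python
import numpy as np

def triplex_power(v, n):
    """n-th power of the triple v = (x, y, z) via spherical coordinates."""
    x, y, z = v
    r = np.sqrt(x*x + y*y + z*z)
    theta = np.arctan2(np.sqrt(x*x + y*y), z)
    phi = np.arctan2(y, x)
    rn = r**n
    # Raise the radius to the n-th power and multiply both angles by n.
    return np.array([rn * np.sin(n*theta) * np.cos(n*phi),
                     rn * np.sin(n*theta) * np.sin(n*phi),
                     rn * np.cos(n*theta)])

def mandelbulb_step(v, c, p=8):
    """One step of the recurrence v -> v^p + c."""
    return triplex_power(v, p) + np.asarray(c, dtype=float)
```

For $n=1$ the function returns the original triple, and for any $n$ the length of the result is $r^n$, which is a quick way to sanity-check the formula.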
Our goal is to plot a Mandelbulb. Suppose the observer position is defined by a point $Q=(a,b,c)\in\mathbb{R}^3$.
Consider a plane $\Omega$, passing through the origin with a normal vector passing through $Q$: $\displaystyle a\cdot x + b\cdot y + c \cdot z=0$
Suppose $c\neq 0$, then fixing $x$ and $y$ one can derive $z$: $\displaystyle z =-\frac{ax+by}{c}$
The following function will return points belonging to the plane $\Omega$:
def get_plane_points(Q, center, span, zoom, width, height, eps=1e-4):
    x_min, x_max = get_boundaries(center[0], span[0], zoom)
    y_min, y_max = get_boundaries(center[1], span[1], zoom)
    a, b, c = Q
    x = np.linspace(x_min, x_max, width)
    y = np.linspace(y_min, y_max, height)
    x, y = np.meshgrid(x, y)
    x, y = x.reshape(-1), y.reshape(-1)
    # Solve the plane equation for whichever coordinate has a non-negligible coefficient.
    if np.abs(c) > eps:
        z = -(a*x + b*y)/c
        P = np.vstack((x, y, z)).T
    elif np.abs(a) > eps:
        z = -(c*x + b*y)/a
        P = np.vstack((z, y, x)).T
    elif np.abs(b) > eps:
        z = -(a*x + c*y)/b
        P = np.vstack((x, z, y)).T
    else:
        raise ValueError("Q must not be (numerically) the zero vector")
    return P
Starting from the point $Q$ we are going to move along rays in the direction of points $P\in\Omega$. In each new position, we need to know the distance to the closest point belonging to the Mandelbulb $\mathcal{M}$.
Suppose we know how to estimate this distance $r$. This means that we can move one more step in this direction with a step of size $r$. We stop moving in a particular direction if the current distance estimate is below the minimal distance defined by a user.
Here's the function for such an algorithm:
def trace(start, directions, max_steps, min_distance, iterations, degree, bailout, power):
    total_distance = np.zeros(directions.shape[0])
    keep_iterations = np.ones_like(total_distance)
    steps = np.zeros_like(total_distance)
    for _ in range(max_steps):
        # Current position on every ray, then the estimated distance to the Mandelbulb.
        positions = start[np.newaxis, :] + total_distance[:, np.newaxis] * directions
        distance = DistanceEstimator(positions, iterations, degree, bailout)
        # Rays that are already close enough stop moving.
        keep_iterations[distance < min_distance] = 0
        total_distance += distance * keep_iterations
        steps += keep_iterations
    return 1 - (steps/max_steps)**power
Consider a point $P\in\Omega$ and a start point $Q$. Define direction vector as a unit vector pointing from $Q$ to $P$.
Note that we can vectorize this function:
def get_directions(P, Q):
    v = np.array(P - Q)
    # Normalize every row to unit length.
    v = v/np.linalg.norm(v, axis=1)[:, np.newaxis]
    return v
It was shown that the distance to the closest point can be estimated via $$r = \lim_{n\rightarrow\infty}\frac{|v_n|\log |v_n|}{|v_n^{\prime}|}$$
This means that we have to keep track of distances and their derivatives in our algorithm. Denote $r_n=|v_n|$ and $dr_n=|v_n^{\prime}|$.
The most wonderful thing is that we can just use the scalar version of the formula for the derivative update:
$$dr_n=p\cdot r_{n-1}^{p-1}dr_{n-1} + 1$$
The following function summarizes this algorithm:
from numba import jit  # assuming @jit is Numba's decorator

@jit
def DistanceEstimator(positions, iterations, degree=8, bailout=1000):
    m = positions.shape[0]
    x, y, z = np.zeros(m), np.zeros(m), np.zeros(m)
    x0, y0, z0 = positions[:, 0], positions[:, 1], positions[:, 2]
    dr = np.zeros(m) + 1
    r = np.zeros(m)
    theta = np.zeros(m)
    phi = np.zeros(m)
    zr = np.zeros(m)
    for _ in range(iterations):
        r = np.sqrt(x*x + y*y + z*z)
        # Only points that have not yet passed the bailout radius keep iterating.
        idx1 = r < bailout
        # Scalar derivative update: dr_n = p * r_{n-1}^(p-1) * dr_{n-1} + 1.
        dr[idx1] = np.power(r[idx1], degree - 1) * degree * dr[idx1] + 1.0
        theta[idx1] = np.arctan2(np.sqrt(x[idx1]*x[idx1] + y[idx1]*y[idx1]), z[idx1])
        phi[idx1] = np.arctan2(y[idx1], x[idx1])
        zr[idx1] = r[idx1] ** degree
        theta[idx1] = theta[idx1] * degree
        phi[idx1] = phi[idx1] * degree
        # v_{n+1} = v_n^p + c, written out in Cartesian coordinates.
        x[idx1] = zr[idx1] * np.sin(theta[idx1]) * np.cos(phi[idx1]) + x0[idx1]
        y[idx1] = zr[idx1] * np.sin(theta[idx1]) * np.sin(phi[idx1]) + y0[idx1]
        z[idx1] = zr[idx1] * np.cos(theta[idx1]) + z0[idx1]
    return 0.5 * np.log(r) * r / dr
Let's plot a classic Mandelbulb with $p=8$ and zoom in on the resulting image.
We can take other values for $p$ and see how the resulting pictures differ.
You can play with zooming and constructing fractal sets in this Datalore workbook.
Header image source: A ray-traced image of the 3D Mandelbulb for the iteration v ↦ v^8 + c, CC BY-SA 3.0
We're getting back to you with our machine learning recommendations and news. This month we've brought you a lot of reading, with both scholarly and media articles: some research news, a couple of stories about natural language processing, and a selection of reviews on machine learning applications in industry.
European scientists created a model that detects forged videos. The model was trained on a dataset of half a million images from fake videos produced with the face2face algorithm
A review of hyperparameters influence on Random Forest predictions with the outline of the model-based optimization strategy (MBO) for the most established fine-tuning
A Human Guided Data Exploration framework utilizes computationally efficient constrained randomization scheme for more human-like hypothesis search during exploratory data analysis
Google Brain Team introduces a method to search for neural networks optimizers using machine learning algorithms
A tutorial on Monte Carlo method for reinforcement learning in Python
XiaoIce, Microsoft’s social chatbot in China, implements "full duplex" (a term that refers to the ability to communicate in both directions simultaneously), which leads to a more natural conversation with the bot
Tacotron, the text-to-speech system from Google research team, incorporates prosody embeddings to make the resulting speech more human-like
A navigation simulator from DeepMind team that learns its surroundings with pixelated Google Street View images database - just like a person who relies on visual information to get familiar with the new environment
A minute-by-minute visualization of a typical American day that models the day as a time-varying Markov chain
An article on how mapping apps' algorithms make traffic jams worse when planning an optimal individual route
Airbnb's implementation of listing embeddings for similar listing recommendations and real-time personalization in search
While true.learn() - a simulator game of a machine learning specialist
Excellent April Fool's Day video from Google that introduced Google Wind in the Netherlands
Here is our choice of academic articles on deep learning published in February. This selection covers diverse topics like half-precision training (two different approaches to achieve 2x faster deep learning training), style transfer (a closed-form solution for photorealistic style transfer with smoothing), and reinforcement learning (10x more effective than previous algorithms and above the human level).
It's a commonplace that deep learning benefits from increasing the size of the model and the amount of training data. For example, a ResNet model with 20 layers has 0.27M parameters, and the parameter count grows linearly with the number of layers. To train ResNet on the ImageNet dataset, you will need 110 layers and 1.7M parameters, respectively. An inevitable consequence is that the memory and computation requirements for model training increase. The most straightforward way to reduce the memory requirements is to use lower-precision arithmetic.
Two new articles dedicated to this topic were published on arXiv in February: Mixed Precision Training and Mixed Precision Training of Convolutional Neural Networks using Integer Operations.
A new study proposes three techniques for training with half-precision FP16 while still matching the model accuracy of single precision:
Single-precision master copy of weights
In vanilla mixed precision training, all weights, activations, and gradients are stored as FP16. In half-precision arithmetic, values smaller than $2^{-24}$ become zero. The authors stress that approximately 5% of weight gradient values are zeroed in FP16 for this reason. To overcome this problem, the authors propose keeping an FP32 master copy of all weights: forward and backward propagation are performed in FP16, and the resulting updates are applied to the weights stored in the master copy.
Storing an additional copy of the weights increases the memory requirements compared to vanilla mixed precision training, but the impact on total memory usage is not so significant: the overall memory consumption is still approximately halved relative to single-precision training.
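The effect is easy to reproduce with NumPy's float16 type. The snippet below is only an illustration with made-up numbers (a weight of 1.0 and an update of about 1e-4), not an experiment from the paper.

```python
import numpy as np

# Magnitudes well below the smallest FP16 subnormal (2**-24) round to zero.
tiny_grad = np.float16(2.0**-26)
smallest = np.float16(2.0**-24)

# A small update is lost when the weight itself is stored in FP16 ...
w16 = np.float16(1.0)
update = np.float16(1e-4)
w16_new = w16 + update            # rounds back to 1.0 (FP16 spacing near 1 is about 1e-3)

# ... but it survives when it is applied to an FP32 master copy of the weight.
w32_master = np.float32(1.0)
w32_new = w32_master + np.float32(update)
```

Here w16_new is still exactly 1.0, while w32_new has moved, which is the reason for keeping the weights that the optimizer updates in FP32.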
Loss scaling
In FP16 arithmetic, small weight updates become zero. To mitigate this problem, the authors introduce a constant loss scaling factor ranging from 8 to 32K. In case of overflow (which can be detected by inspecting the computed weight gradients), the authors suggest skipping the weight update and just moving to the next iteration.
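A rough NumPy sketch of the idea, under our own assumptions: a scaling factor of 1024 (inside the 8 to 32K range mentioned above), a single gradient value of $2^{-27}$, and a hypothetical helper has_overflow that checks the gradients for inf or NaN.

```python
import numpy as np

scale = np.float32(1024.0)          # assumed scaling factor
true_grad = np.float32(2.0**-27)    # too small to be represented in FP16

unscaled_fp16 = np.float16(true_grad)            # underflows to 0
scaled_fp16 = np.float16(true_grad * scale)      # 2**-17 is representable in FP16
recovered = np.float32(scaled_fp16) / scale      # unscale in FP32 before the update

def has_overflow(grads_fp16):
    """Return True if any gradient is inf or NaN, in which case the update is skipped."""
    return not np.all(np.isfinite(grads_fp16))

# A large gradient multiplied by the same factor exceeds the FP16 maximum (65504).
with np.errstate(over="ignore"):
    overflowed = np.array([100.0], dtype=np.float32).astype(np.float16) * np.float16(1024.0)
```

In a training loop, has_overflow(overflowed) being True would mean skipping the weight update for that iteration.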
Arithmetic precision
The neural network arithmetic operations can be divided into three groups:
The authors stress that to maintain model accuracy, some networks require that the FP16 vector dot-product accumulates the partial products into an FP32 value, which is then converted to FP16 before storing.
Large reductions, which come up in batch-normalization and softmax layers should be performed in FP32 and then stored back to FP16.
Since point-wise operations are memory-bandwidth limited, the arithmetic precision does not affect the speed of these operations, and either FP16 or FP32 arithmetic can be used.
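Why the accumulator precision matters for dot products can be seen in a toy example. The vectors of ones and the explicit Python loop are our own illustrative choices; real kernels do this in hardware.

```python
import numpy as np

a = np.ones(4096, dtype=np.float16)
b = np.ones(4096, dtype=np.float16)

acc16 = np.float16(0.0)
acc32 = np.float32(0.0)
for x, y in zip(a, b):
    acc16 = np.float16(acc16 + x * y)                # partial products summed in FP16
    acc32 = acc32 + np.float32(x) * np.float32(y)    # partial products summed in FP32

result = np.float16(acc32)   # converted back to FP16 only before storing
```

The FP16 accumulator stalls at 2048, because 2049 is not representable in FP16 and rounds back down, whereas the FP32 accumulator reaches the exact value 4096, which then fits into FP16 without loss.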
Experiments and conclusions
The authors considered different image classification, speech recognition, machine translation, and language modelling tasks and showed that single-precision arithmetic and half-precision arithmetic with the proposed techniques achieve comparable accuracies on a wide range of models. On some models trained with a Volta GPU they report 2-6x speedups, but in the general case the decrease in training time depends on library and framework optimizations.
It's worth mentioning that this result was achieved by the research group from Baidu in collaboration with researchers from Nvidia Corp.
We put here some links on how to perform mixed precision training in PyTorch:
To summarize, to perform computations in PyTorch with half-precision arithmetic on a GPU:
first, cast the model (and inputs) to FP16: model.cuda().half()
store a master copy of the model weights in FP32 and define the optimizer, which will update the master copy weights during training:
param_copy = [param.clone().type(torch.cuda.FloatTensor).detach() for param in model.parameters()]
for param in param_copy:
    param.requires_grad = True
optimizer = torch.optim.SGD(param_copy)
keep batch normalization layers in FP32:
for layer in model.modules():
    if isinstance(layer, nn.BatchNorm2d):
        layer.float()
scale the loss before the backward pass:
loss = loss * scale_factor
then call model.zero_grad() and loss.backward()
Although state-of-the-art results in mixed precision training mostly come from approaches that use FP16 arithmetic, the authors of this study proposed a new mixed precision training setup which uses Dynamic Fixed Point (DFP) tensors, each represented by an INT16 tensor combined with a shared tensor-wide exponent.
The authors defined DFP tensor primitives for arithmetic operations (summation and multiplication), which take two DFP-16 tensors and produce one DFP-32 tensor with a new shared exponent, and a down-conversion operation, which scales the DFP-32 output to a DFP-16 tensor.
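To make the representation concrete, here is a minimal sketch of a DFP-16 tensor as a list of integer mantissas plus one shared exponent. It includes an element-wise multiplication that yields wider mantissas and a down-conversion back to 16 bits. The function names, the rounding mode and the rule for choosing the exponent are assumptions made for illustration; the paper's kernels may handle these details differently.

```python
import math

INT16_MAX = 2**15 - 1
INT16_MIN = -2**15

def to_dfp16(values):
    """Quantize floats to (int16 mantissas, shared exponent): value ~ m * 2**e."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0] * len(values), 0
    # Smallest exponent for which the largest magnitude still fits in int16.
    exponent = math.ceil(math.log2(max_abs / INT16_MAX))
    scale = 2.0 ** exponent
    mantissas = [max(INT16_MIN, min(INT16_MAX, round(v / scale))) for v in values]
    return mantissas, exponent

def dfp16_mul(a, a_exp, b, b_exp):
    """Element-wise product: mantissas need up to 32 bits, exponents add."""
    return [x * y for x, y in zip(a, b)], a_exp + b_exp

def dfp32_to_dfp16(mantissas, exponent):
    """Down-convert by shifting out low bits and raising the shared exponent."""
    max_abs = max(abs(m) for m in mantissas)
    shift = max(0, max_abs.bit_length() - 15)
    return [m >> shift for m in mantissas], exponent + shift

def from_dfp(mantissas, exponent):
    """Decode a DFP tensor back to floats."""
    return [m * 2.0 ** exponent for m in mantissas]

a, a_exp = to_dfp16([0.5, -1.25, 3.0])
b, b_exp = to_dfp16([2.0, 0.1, -0.75])
wide, wide_exp = dfp16_mul(a, a_exp, b, b_exp)
narrow, narrow_exp = dfp32_to_dfp16(wide, wide_exp)
product = from_dfp(narrow, narrow_exp)
print(product)
```

The decoded product is close to the exact element-wise product `[1.0, -0.125, -2.25]`. The only error comes from quantizing the inputs and from the bits dropped during down-conversion.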
An efficient implementation of the DFP-16 tensor primitives is supported through the "prototype 16-bit integer kernels in Intel's MKL-DNN library along with explicit exponent management." The experiments were run on the recently introduced Intel Xeon Phi Knights Mill hardware.
The authors stress that this approach doesn't require any hyperparameter tuning (which is necessary for FP16 mixed precision training) and they achieved results comparable to the state-of-the-art results reported for FP32 training with potential 2x savings in computation.
Deep convolutional neural networks are very effective for image recognition and classification tasks. A CNN trained for image recognition learns an internal representation of objects, which can be interpreted as content and style features. Content features are used to recognize objects during the classification task. In the article Image Style Transfer Using Convolutional Neural Networks it was shown that correlations between features in the deep layers of a CNN encode the visual style of the image.
Suppose we have a content image $\mathbf{c}$, from which we want to take the content, and a style image $\mathbf{s}$, whose style we are going to apply to the content of $\mathbf{c}$ to produce a new image $\mathbf{x}$.
You can't explicitly tag features as "content" or "style", but you can define loss functions in a way that will encourage transferring content from $\mathbf{c}$ and style from $\mathbf{s}$.
Define the feature map of the CNN layer $\ell$ as $F_{\ell}[\cdot]\in\mathbb{R}^{N_{\ell}\times D_{\ell}}$, where $N_{\ell}$ is the number of filters and $D_{\ell}$ is the size of the vectorized feature map in layer $\ell$.
The common approach to deal with the content features is to use squared error loss:
$$\mathcal{L_c}^{\ell}(\mathbf{c}, \mathbf{x})=\frac{1}{2N_{\ell}D_{\ell}}\sum_{i,j}\left(F_{\ell}[\mathbf{x}]-F_{\ell}[\mathbf{c}]\right)_{i,j}^2$$
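As a small worked example of the content loss, the sketch below evaluates the formula on two hand-written feature maps with $N_{\ell}=2$ filters and $D_{\ell}=3$ positions. The feature values are made up for illustration; in practice they would come from a CNN layer.

```python
def content_loss(features_x, features_c):
    """Squared-error content loss between two N x D feature maps (lists of rows)."""
    n = len(features_x)        # number of filters
    d = len(features_x[0])     # size of each vectorized feature map
    total = sum(
        (fx - fc) ** 2
        for row_x, row_c in zip(features_x, features_c)
        for fx, fc in zip(row_x, row_c)
    )
    return total / (2 * n * d)

F_x = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
F_c = [[1.0, 0.0, 3.0], [0.0, 1.0, 2.0]]
loss = content_loss(F_x, F_c)
print(loss)  # (4 + 4) / (2 * 2 * 3) = 0.666...
```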
Define the Gram matrix $G_{\ell}[\cdot]=F_{\ell}[\cdot]F_{\ell}^T[\cdot]\in\mathbb{R}^{N_{\ell}\times N_{\ell}}$, each element $G_{i,j}^{\ell}$ of which is the inner product of the vectorized feature maps $i$ and $j$ in layer $\ell$. The Gram matrix represents the feature correlations which are used to address the style transfer problem. The loss function for the style transfer is defined as
$$\mathcal{L_s}^{\ell}(\mathbf{s}, \mathbf{x})=\frac{1}{2N_{\ell}^2}\sum_{i,j}\left(G_{\ell}[\mathbf{x}]-G_{\ell}[\mathbf{s}]\right)_{i,j}^2$$
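The Gram matrix and the style loss can be sketched in the same way. The example compares the Gram matrix of the generated image $\mathbf{x}$ with that of the style image $\mathbf{s}$, using made-up feature maps. It also checks that the Gram matrix does not change when spatial positions are permuted, which is why it captures style rather than layout.

```python
def gram_matrix(features):
    """G = F F^T for an N x D feature map given as a list of rows."""
    n = len(features)
    return [
        [sum(a * b for a, b in zip(features[i], features[j])) for j in range(n)]
        for i in range(n)
    ]

def style_loss(features_x, features_s):
    """Squared difference between the Gram matrices, normalized by 2 * N^2."""
    n = len(features_x)
    g_x = gram_matrix(features_x)
    g_s = gram_matrix(features_s)
    total = sum((g_x[i][j] - g_s[i][j]) ** 2 for i in range(n) for j in range(n))
    return total / (2 * n ** 2)

F_x = [[1.0, 0.0], [0.0, 1.0]]
F_s = [[1.0, 1.0], [0.0, 1.0]]
loss = style_loss(F_x, F_s)
print(gram_matrix(F_s))  # [[2.0, 1.0], [1.0, 1.0]]
print(loss)              # 3 / 8 = 0.375

# Reversing the order of spatial positions leaves the Gram matrix unchanged.
F_s_reversed = [row[::-1] for row in F_s]
same_gram = gram_matrix(F_s_reversed) == gram_matrix(F_s)
```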
The total loss for the style transfer problem is defined as the weighted sum of content and style losses:
$$\mathcal{L}(\mathbf{c}, \mathbf{s}, \mathbf{x})=\alpha \sum_{\ell}\mathcal{L_c}^{\ell}(\mathbf{c}, \mathbf{x})+\beta\sum_{\ell} w_{\ell}\mathcal{L_s}^{\ell}(\mathbf{s}, \mathbf{x}),$$
where $w_{\ell}$ are weighting factors regulating the contribution of each layer to the total loss.
To address the problem of artifacts in generated photorealistic images, another approach to style transfer was proposed in the article Universal Style Transfer via Feature Transforms: the authors formulate the transfer task as an image reconstruction process and apply the classic signal whitening and coloring transforms (WCT) to the features extracted in each intermediate layer.
The whitening transform is a decorrelation operation: consider a column vector $\mathbf{x}$ with zero mean and non-singular covariance matrix $C$; then $\mathbf{y}=W\mathbf{x}$, where $W^TW=C^{-1}$, is a whitened vector with identity covariance matrix.
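The condition $W^TW=C^{-1}$ does not determine $W$ uniquely. As a minimal sketch, the example below picks one valid choice, the inverse of the Cholesky factor of $C$, for a hand-written $2\times 2$ covariance matrix and checks that the covariance of $\mathbf{y}=W\mathbf{x}$, which equals $WCW^T$, is the identity. This choice is an assumption for illustration and is not necessarily the one used in the article.

```python
import math

def cholesky_2x2(c):
    """Lower-triangular L with L L^T = C for a symmetric positive-definite 2x2 matrix."""
    l11 = math.sqrt(c[0][0])
    l21 = c[1][0] / l11
    l22 = math.sqrt(c[1][1] - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]

def invert_lower_2x2(l):
    """Inverse of a lower-triangular 2x2 matrix."""
    return [
        [1.0 / l[0][0], 0.0],
        [-l[1][0] / (l[0][0] * l[1][1]), 1.0 / l[1][1]],
    ]

def matmul(a, b):
    """Product of two small matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

C = [[4.0, 2.0], [2.0, 3.0]]               # covariance of x
W = invert_lower_2x2(cholesky_2x2(C))      # whitening matrix
cov_y = matmul(matmul(W, C), transpose(W)) # covariance of y = W x
WtW = matmul(transpose(W), W)              # should equal the inverse of C
print(cov_y)
```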
For style transfer, the WCT was implemented as an autoencoder on each of the intermediate layers of the CNN. This approach was shown to be less prone to artifacts when applied to photorealistic images. Nevertheless, it still generates artifacts that cause inconsistent stylization. Another drawback is that training this model is computationally challenging.
One more step towards photorealistic style transfer has been offered in the recent article, published in February.
In this paper, the authors propose a novel fast photorealistic image style transfer algorithm consisting of two steps: stylization and smoothing.
Closed-form solutions are provided for both of these steps.
The stylization step is based on the improved version of the autoencoder performing whitening and coloring transform (WCT) algorithm and is referred to as the PhotoWCT step.
The problem with the WCT stylized images was that repeating semantically similar patterns can be stylized differently. To address this problem the smoothing step was introduced, pursuing the goals:
Motivated by ranking algorithms for objects represented as data points in Euclidean space, which rank the points with respect to the intrinsic manifold structure of the data, the authors represent all pixels as nodes in a graph and define an affinity matrix $W\in\mathbb{R}^{N\times N}$, where $N$ is the number of pixels.
The ranking problem is stated as follows: for a given set of points $Y=(y_1,\ldots,y_q,y_{q+1},\ldots, y_N)$, where the first $q$ points are marked as "queries", rank the rest of the points according to their relevance to the query points.
The smoothing step can be formulated as the following optimization problem:
$$r^*=\arg\min_r\frac{1}{2}\left(\sum_{i,j=1}^Nw_{ij}\left|\frac{r_i}{\sqrt{d_{ii}}}-\frac{r_j}{\sqrt{d_{jj}}}\right|^2+\lambda\sum_{i=1}^N|r_i-y_i|^2\right),$$
where $\displaystyle d_{ii}=\sum_jw_{ij}$, $y_i$ is the pixel color in the PhotoWCT-stylized result $Y$, $r_i$ is the pixel color in the desired smoothed output $R$, and $\lambda$ controls the balance between the two terms.
The most wonderful thing is that the above problem has a closed-form solution:
$$R^*=(1-\alpha)(I-\alpha S)^{-1}Y,$$
where $I$ is the identity matrix, $\alpha=\frac{1}{1+\lambda}$, and $S=D^{-1/2}WD^{-1/2}\in\mathbb{R}^{N\times N}$, with $D$ the diagonal matrix whose $i$-th diagonal entry is $\sum_jw_{ij}$.
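The closed-form expression can be tried on a toy problem. The sketch below applies it to a chain of three "pixels" with a single color channel. It solves the linear system $(I-\alpha S)R=(1-\alpha)Y$ instead of forming the inverse explicitly. The affinity matrix, the input values and the helper names are assumptions made for illustration; a real image would use a sparse affinity matrix over all pixels and one solve per color channel.

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def smooth(w, y, lam):
    """Evaluate R = (1 - alpha) (I - alpha S)^{-1} Y for one color channel."""
    n = len(w)
    alpha = 1.0 / (1.0 + lam)
    deg = [sum(row) for row in w]  # diagonal entries of D
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    a = [[(1.0 if i == j else 0.0) - alpha * s[i][j] for j in range(n)] for i in range(n)]
    return solve(a, [(1.0 - alpha) * v for v in y])

W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]   # pixel 1 is a neighbour of pixels 0 and 2
Y = [0.0, 1.0, 0.0]     # an isolated spike in the stylized result

R = smooth(W, Y, lam=1.0)
R_large_lambda = smooth(W, Y, lam=1e9)
print(R)  # the spike is spread over its neighbours
```

With $\lambda=1$ the spike at the middle pixel is pulled towards its neighbours, while a very large $\lambda$ leaves the input almost unchanged, which matches the role of $\lambda$ as the balance between the two terms.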
You can see the impressive photorealistic style transfer in the original article.
Reinforcement learning is the branch of machine learning where model training is based on responses (rewards and penalties) obtained by an agent from the environment. A commonly used environment is the collection of Atari games. The problem is that an agent trained for one task can't apply previously acquired skills to another task.
This problem was partially solved in the article Asynchronous Methods for Deep Reinforcement Learning, where the A3C (Asynchronous advantage actor-critic) algorithm was proposed. In the A3C algorithm, individual agents (actors) explore the environment for some time, then the process is suspended and they exchange what they have learned (in terms of gradients of the loss function) with the central component (parameter server or learner), which updates the actors' parameters.
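The gradient exchange described above can be sketched with a toy problem. This is not an implementation of A3C. The actor-critic loss is replaced by a hypothetical noisy quadratic loss, and the asynchronous actors are simulated sequentially, so that only the communication pattern is shown: each actor computes a gradient on its local (possibly stale) copy of the parameters, sends it to the central learner, and pulls the updated parameters.

```python
import random

TARGET = 3.0  # hypothetical optimum of the toy loss (theta - TARGET)^2

def local_gradient(theta, rng):
    """Stand-in for an actor's loss gradient: noisy gradient of the toy loss."""
    return 2.0 * (theta - TARGET) + rng.gauss(0.0, 0.1)

class ParameterServer:
    """Central learner that owns the parameters and applies incoming gradients."""

    def __init__(self, theta, lr):
        self.theta = theta
        self.lr = lr

    def apply_gradient(self, grad):
        self.theta -= self.lr * grad

    def pull(self):
        return self.theta

server = ParameterServer(theta=0.0, lr=0.05)
actor_rngs = [random.Random(seed) for seed in range(4)]  # four simulated actors
local_thetas = [server.pull() for _ in actor_rngs]

for _ in range(200):
    for i, rng in enumerate(actor_rngs):
        grad = local_gradient(local_thetas[i], rng)  # computed on a stale copy
        server.apply_gradient(grad)                  # push gradient to the learner
        local_thetas[i] = server.pull()              # pull fresh parameters

final_theta = server.pull()
print(final_theta)
```

Even though every gradient is computed on parameters that other actors have already changed, the shared parameter still settles close to the optimum of the toy loss.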
A new asynchronous actor-critic architecture for deep reinforcement learning is called the Importance Weighted Actor-Learner Architecture (IMPALA). The two main differences between A3C and IMPALA are that:
The algorithm was trained on a recently published suite of 3D navigation puzzle-solving tasks by DeepMind Lab and was shown to be 10 times more effective than the A3C driven algorithm.
This Monday, February the 12th, we launched a public beta of Datalore - an intelligent web application for data analysis and visualization in Python, brought to you by JetBrains. This tool turns the data science workflow into a delightful experience with the help of smart coding assistance, incremental computations, and built-in tools for machine learning.
Data science is the art of drawing insights from raw data, and to make good predictions, you need to write code. To make machine learning-specific coding an enjoyable and easy experience, Datalore provides smart code completion, inspections, quick-fixes, and easy navigation.
To make your coding routine easier, we introduce Intentions - context-aware suggestions that appear depending on what you’ve just written. Click on the appropriate Intention, and Datalore will generate new code for dataset upload, train/test split, graph design, and much more.
Fine-tuning machine learning models comes with multiple edits. Suppose you adjust a few model parameters and want to see how it affects its predictions - and you want these results right now. Datalore follows dependencies between various computations in the workbook and minimizes recalculations caused by new changes. This way, the output at the right side of the screen always reflects your latest ideas.
Data analysis starts with Python necessities: the built-in numpy, pandas, and sklearn libraries. On top of that, we developed two advanced visualization libraries: datalore.plot, inspired by the "grammar of graphics" ideas and their R implementation ggplot, and datalore.geo_maps, which enables the addition of interactive maps to your analysis. There are built-in datasets (Iris, MNIST, Titanic, and more) for beginners to explore, and a handy File Manager to upload your own datasets as .csv files.
We also enable real-time remote access to the workbook and code editor. To show your model to colleagues and get their insights, just share the workbook link. Team members can edit and add code on the go while discussing ideas.
Datalore allows access to various computational resources depending on what task you are working with. Simple algorithms run on small computational agents, while deep learning algorithms require more powerful agents. Contact us via our forum if you want to work with larger instances.
Go to datalore.io and try it!
We are excited and anxious to get your feedback via the Datalore forum - it's the quickest way to share your opinion. Leave a post about issues that you have encountered and features that you would like to see, and get in contact with our team and other users. You can also find us on Twitter. We are still under development and look forward to your insights to make Datalore even more awesome.
Welcome to the first post of many. In our monthly digests, the Datalore team will bring you the best and latest news from the field of machine learning, deep learning, and artificial intelligence. Read about the latest frameworks and algorithms, get relevant book and discussion recommendations, and find out about research breakthroughs.