Differential calculus of several variables

Vector functions of a single real variable

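Definition - Vector function of a single real variable. A vector function of a single real variable, or vector field of a scalar variable, is a function that maps every scalar value $t\in D\subseteq\mathbb{R}$ to a vector $(x_1(t),\ldots,x_n(t))$ in $\mathbb{R}^n$:

$$
\begin{array}{rccl}
\mathbf{f}: & \mathbb{R} & \longrightarrow & \mathbb{R}^n\\
 & t & \longmapsto & (x_1(t),\ldots,x_n(t))
\end{array}
$$

where $x_i(t)$, $i=1,\ldots,n$, are real functions of a single real variable known as coordinate functions.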

The most common vector fields of a scalar variable are in the real plane $\mathbb{R}^2$, where they are usually represented as

$$\mathbf{f}(t)=x(t)\mathbf{i}+y(t)\mathbf{j},$$

and in the real space $\mathbb{R}^3$, where they are usually represented as

$$\mathbf{f}(t)=x(t)\mathbf{i}+y(t)\mathbf{j}+z(t)\mathbf{k}.$$

Graphic representation of vector fields

The graphic representation of a vector field in $\mathbb{R}^2$ is a trajectory in the real plane.

Trajectory of a vector function in the plane.

The graphic representation of a vector field in $\mathbb{R}^3$ is a trajectory in the real space.

Trajectory of a vector function in the space.

Derivative of a vector field

The concept of derivative as the limit of the average rate of change of a function can be extended easily to vector fields.

Definition - Derivative of a vector field. A vector field $\mathbf{f}(t)=(x_1(t),\ldots,x_n(t))$ is differentiable at a point $t=a$ if the limit

$$\lim_{\Delta t\to 0}\frac{\mathbf{f}(a+\Delta t)-\mathbf{f}(a)}{\Delta t}$$

exists. In such a case, the value of the limit is known as the derivative of the vector field at $a$, and it is written $\mathbf{f}'(a)$.

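Many properties of real functions of a single real variable can be extended to vector fields through their component functions. Thus, for instance, the derivative of a vector field can be computed from the derivatives of its component functions.

Theorem. Given a vector field $\mathbf{f}(t)=(x_1(t),\ldots,x_n(t))$, if $x_i(t)$ is differentiable at $t=a$ for all $i=1,\ldots,n$, then $\mathbf{f}$ is differentiable at $a$ and its derivative is

$$\mathbf{f}'(a)=(x_1'(a),\ldots,x_n'(a)).$$

The proof for a vector field in $\mathbb{R}^2$ is easy.

$$
\begin{aligned}
\mathbf{f}'(a) &= \lim_{\Delta t\to 0}\frac{\mathbf{f}(a+\Delta t)-\mathbf{f}(a)}{\Delta t}
= \lim_{\Delta t\to 0}\frac{(x(a+\Delta t),y(a+\Delta t))-(x(a),y(a))}{\Delta t}\\
&= \lim_{\Delta t\to 0}\left(\frac{x(a+\Delta t)-x(a)}{\Delta t},\frac{y(a+\Delta t)-y(a)}{\Delta t}\right)\\
&= \left(\lim_{\Delta t\to 0}\frac{x(a+\Delta t)-x(a)}{\Delta t},\lim_{\Delta t\to 0}\frac{y(a+\Delta t)-y(a)}{\Delta t}\right)
= (x'(a),y'(a)).
\end{aligned}
$$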

Kinematics: Curvilinear motion

The notion of derivative as a velocity along a trajectory in the real line can be generalized to a trajectory in any Euclidean space $\mathbb{R}^n$.

In the case of the two-dimensional space $\mathbb{R}^2$, if $\mathbf{f}(t)$ describes the position of a moving object in the real plane at any time $t$, taking as reference the coordinate origin $O$ and the unit vectors $\mathbf{i}=(1,0)$, $\mathbf{j}=(0,1)$, we can represent the position $P$ of the moving object at every moment $t$ with a vector $\vec{OP}=x(t)\mathbf{i}+y(t)\mathbf{j}$, where the coordinates

$$\begin{cases}x=x(t)\\ y=y(t)\end{cases}\quad t\in\operatorname{Dom}(\mathbf{f})$$

are the coordinate functions of $\mathbf{f}$.

Trajectory of a curvilinear motion in the plane.

In this context the derivative of a trajectory $\mathbf{f}'(a)=(x_1'(a),\ldots,x_n'(a))$ is the velocity vector of the trajectory $\mathbf{f}$ at the moment $t=a$.

Example. Given the trajectory $\mathbf{f}(t)=(\cos t,\sin t)$, $t\in\mathbb{R}$, whose image is the unit circle centred at the coordinate origin, its coordinate functions are $x(t)=\cos t$, $y(t)=\sin t$, $t\in\mathbb{R}$, and its velocity is

$$\mathbf{v}=\mathbf{f}'(t)=(x'(t),y'(t))=(-\sin t,\cos t).$$

At the moment $t=\pi/4$, the object is at position $\mathbf{f}(\pi/4)=(\cos(\pi/4),\sin(\pi/4))=(\sqrt{2}/2,\sqrt{2}/2)$ and it is moving with velocity $\mathbf{v}=\mathbf{f}'(\pi/4)=(-\sin(\pi/4),\cos(\pi/4))=(-\sqrt{2}/2,\sqrt{2}/2)$.

Trajectory of a vector function in the space.

Observe that the magnitude of the velocity vector is always 1, since $|\mathbf{v}|=\sqrt{(-\sin t)^2+(\cos t)^2}=1$.
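The velocity of the circular trajectory can also be checked numerically. The following is a minimal sketch, assuming a central-difference approximation of the derivative; the helper name `velocity` and the step size `h` are illustrative choices, not part of the original notes.

```python
import math

def f(t):
    """Trajectory around the unit circle: f(t) = (cos t, sin t)."""
    return (math.cos(t), math.sin(t))

def velocity(traj, a, h=1e-6):
    """Approximate traj'(a) component by component with a central difference."""
    ahead, behind = traj(a + h), traj(a - h)
    return tuple((p - q) / (2 * h) for p, q in zip(ahead, behind))

a = math.pi / 4
v = velocity(f, a)
speed = math.hypot(*v)
print(v)      # close to (-sqrt(2)/2, sqrt(2)/2)
print(speed)  # close to 1
```

The approximation works on each coordinate function separately, which mirrors the theorem that the derivative of a vector field is the vector of the derivatives of its components.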

Tangent line to a trajectory

Tangent line to a trajectory in the plane

Vectorial equation

Given a trajectory $\mathbf{f}(t)$ in the real plane, the vectors that are parallel to the velocity $\mathbf{v}$ at a moment $a$ are called tangent vectors to the trajectory $\mathbf{f}$ at the moment $a$, and the line passing through $P=\mathbf{f}(a)$ directed by $\mathbf{v}$ is the tangent line to the graph of $\mathbf{f}$ at the moment $a$.

Definition - Tangent line to a trajectory. Given a trajectory $\mathbf{f}(t)$ in the real plane $\mathbb{R}^2$, the tangent line to the graph of $\mathbf{f}$ at $a$ is the line with equation

$$l: (x,y)=\mathbf{f}(a)+t\mathbf{f}'(a)=(x(a),y(a))+t(x'(a),y'(a))=(x(a)+tx'(a),y(a)+ty'(a)).$$

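Example. We have seen that for the trajectory $\mathbf{f}(t)=(\cos t,\sin t)$, $t\in\mathbb{R}$, whose image is the unit circle centred at the coordinate origin, the object position at the moment $t=\pi/4$ is $\mathbf{f}(\pi/4)=(\sqrt{2}/2,\sqrt{2}/2)$ and its velocity is $\mathbf{v}=(-\sqrt{2}/2,\sqrt{2}/2)$. Thus the equation of the tangent line to $\mathbf{f}$ at that moment is

$$l: (x,y)=\mathbf{f}(\pi/4)+t\mathbf{v}=\left(\frac{\sqrt{2}}{2},\frac{\sqrt{2}}{2}\right)+t\left(\frac{-\sqrt{2}}{2},\frac{\sqrt{2}}{2}\right)=\left(\frac{\sqrt{2}}{2}-t\frac{\sqrt{2}}{2},\frac{\sqrt{2}}{2}+t\frac{\sqrt{2}}{2}\right).$$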

Cartesian and point-slope equations

From the vectorial equation of the tangent to a trajectory $\mathbf{f}(t)$ at the moment $t=a$ we can get the coordinate functions

$$\begin{cases}x=x(a)+tx'(a)\\ y=y(a)+ty'(a)\end{cases}\quad t\in\mathbb{R},$$

and solving for $t$ and equating both equations we get the Cartesian equation of the tangent

$$\frac{x-x(a)}{x'(a)}=\frac{y-y(a)}{y'(a)},$$

if $x'(a)\neq 0$ and $y'(a)\neq 0$.

From this equation it is easy to get the point-slope equation of the tangent

$$y-y(a)=\frac{y'(a)}{x'(a)}(x-x(a)).$$

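Example. Using the vectorial equation of the tangent of the previous example

$$l: (x,y)=\left(\frac{\sqrt{2}}{2}-t\frac{\sqrt{2}}{2},\frac{\sqrt{2}}{2}+t\frac{\sqrt{2}}{2}\right),$$

its Cartesian equation is $\dfrac{x-\sqrt{2}/2}{-\sqrt{2}/2}=\dfrac{y-\sqrt{2}/2}{\sqrt{2}/2}$ and the point-slope equation is

$$y-\sqrt{2}/2=\frac{\sqrt{2}/2}{-\sqrt{2}/2}(x-\sqrt{2}/2)\Rightarrow y=-x+\sqrt{2}.$$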

Normal line to a trajectory in the plane

We have seen that the tangent line to a trajectory $\mathbf{f}(t)$ at $a$ is the line passing through the point $P=\mathbf{f}(a)$ directed by the velocity vector $\mathbf{v}=\mathbf{f}'(a)=(x'(a),y'(a))$. If we take as direction vector a vector orthogonal to $\mathbf{v}$, we get another line that is known as the normal line to the trajectory.

Definition - Normal line to a trajectory. Given a trajectory $\mathbf{f}(t)$ in the real plane $\mathbb{R}^2$, the normal line to the graph of $\mathbf{f}$ at the moment $t=a$ is the line with equation

$$l: (x,y)=(x(a),y(a))+t(y'(a),-x'(a))=(x(a)+ty'(a),y(a)-tx'(a)).$$

The Cartesian equation is

$$\frac{x-x(a)}{y'(a)}=\frac{y-y(a)}{-x'(a)},$$

and the point-slope equation is

$$y-y(a)=\frac{-x'(a)}{y'(a)}(x-x(a)).$$

The normal line is always perpendicular to the tangent line, as their direction vectors are orthogonal.

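Example. Considering again the trajectory of the unit circle $\mathbf{f}(t)=(\cos t,\sin t)$, $t\in\mathbb{R}$, the normal line to the graph of $\mathbf{f}$ at the moment $t=\pi/4$ is

$$
\begin{aligned}
l: (x,y) &= (\cos(\pi/4),\sin(\pi/4))+t(\cos(\pi/4),\sin(\pi/4))\\
&= \left(\frac{\sqrt{2}}{2},\frac{\sqrt{2}}{2}\right)+t\left(\frac{\sqrt{2}}{2},\frac{\sqrt{2}}{2}\right)=\left(\frac{\sqrt{2}}{2}+t\frac{\sqrt{2}}{2},\frac{\sqrt{2}}{2}+t\frac{\sqrt{2}}{2}\right),
\end{aligned}
$$

the Cartesian equation is

$$\frac{x-\sqrt{2}/2}{\sqrt{2}/2}=\frac{y-\sqrt{2}/2}{\sqrt{2}/2},$$

and the point-slope equation is

$$y-\sqrt{2}/2=\frac{\sqrt{2}/2}{\sqrt{2}/2}(x-\sqrt{2}/2)\Rightarrow y=x.$$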

Tangent and normal lines to a function

A particular case of tangent and normal lines to a trajectory are the tangent and normal lines to a function of one real variable. For every function $y=f(x)$, the trajectory that traces its graph is

$$\mathbf{g}(x)=(x,f(x)),\quad x\in\mathbb{R},$$

and its velocity is

$$\mathbf{g}'(x)=(1,f'(x)),$$

so that the tangent line to $\mathbf{g}$ at the moment $a$ is

$$\frac{x-a}{1}=\frac{y-f(a)}{f'(a)}\Rightarrow y-f(a)=f'(a)(x-a),$$

and the normal line is

$$\frac{x-a}{f'(a)}=\frac{y-f(a)}{-1}\Rightarrow y-f(a)=\frac{-1}{f'(a)}(x-a).$$

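Example. Given the function $y=x^2$, the trajectory that traces its graph is $\mathbf{g}(x)=(x,x^2)$ and its velocity is $\mathbf{g}'(x)=(1,2x)$. At the moment $x=1$ the trajectory passes through the point $(1,1)$ with velocity $(1,2)$. Thus, the tangent line at that moment is

$$\frac{x-1}{1}=\frac{y-1}{2}\Rightarrow y-1=2(x-1)\Rightarrow y=2x-1,$$

and the normal line is

$$\frac{x-1}{2}=\frac{y-1}{-1}\Rightarrow y-1=\frac{-1}{2}(x-1)\Rightarrow y=\frac{-x}{2}+\frac{3}{2}.$$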
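The point-slope formulas for the tangent and normal lines to a function translate directly into a few lines of code. This is only a sketch: the function name `tangent_and_normal` and the choice to return each line as a (slope, intercept) pair are assumptions made for illustration.

```python
def tangent_and_normal(f, df, a):
    """Return (slope, intercept) pairs for the tangent and normal lines to y = f(x) at x = a."""
    y0, m = f(a), df(a)
    if m == 0:
        # With a horizontal tangent the normal line is vertical and has no slope.
        raise ValueError("horizontal tangent: the normal line is vertical")
    tangent = (m, y0 - m * a)        # y - f(a) = f'(a)(x - a)
    normal = (-1 / m, y0 + a / m)    # y - f(a) = -1/f'(a) (x - a)
    return tangent, normal

tangent, normal = tangent_and_normal(lambda x: x**2, lambda x: 2 * x, 1)
print(tangent)  # (2, -1), that is y = 2x - 1
print(normal)   # (-0.5, 1.5), that is y = -x/2 + 3/2
```

The product of the two slopes is $-1$, which is the usual perpendicularity condition for non-vertical lines.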

Tangent line to a trajectory in the space

The concept of tangent line to a trajectory can be easily extended from the real plane to the three-dimensional space $\mathbb{R}^3$.

If $\mathbf{f}(t)=(x(t),y(t),z(t))$, $t\in\mathbb{R}$, is a trajectory in the real space $\mathbb{R}^3$, then at the moment $a$ the moving object that follows this trajectory will be at the position $P=(x(a),y(a),z(a))$ with velocity $\mathbf{v}=\mathbf{f}'(a)=(x'(a),y'(a),z'(a))$. Thus, the tangent line to $\mathbf{f}$ at this moment has the following vectorial equation

$$l: (x,y,z)=(x(a),y(a),z(a))+t(x'(a),y'(a),z'(a))=(x(a)+tx'(a),y(a)+ty'(a),z(a)+tz'(a)),$$

and the Cartesian equations are

$$\frac{x-x(a)}{x'(a)}=\frac{y-y(a)}{y'(a)}=\frac{z-z(a)}{z'(a)},$$

provided that $x'(a)\neq 0$, $y'(a)\neq 0$ and $z'(a)\neq 0$.

Example. Given the trajectory $\mathbf{f}(t)=(\cos t,\sin t,t)$, $t\in\mathbb{R}$, in the real space, at the moment $t=\pi/2$ the trajectory passes through the point

$$\mathbf{f}(\pi/2)=(\cos(\pi/2),\sin(\pi/2),\pi/2)=(0,1,\pi/2),$$

with velocity

$$\mathbf{v}=\mathbf{f}'(\pi/2)=(-\sin(\pi/2),\cos(\pi/2),1)=(-1,0,1),$$

and the tangent line to the graph of $\mathbf{f}$ at that moment is

$$l: (x,y,z)=(0,1,\pi/2)+t(-1,0,1)=(-t,1,t+\pi/2).$$

Tangent line to a trajectory in the space.


Normal plane to a trajectory in the space

In the three-dimensional space $\mathbb{R}^3$, the normal line to a trajectory is not unique. There are infinitely many normal lines, and all of them lie in the normal plane.

If $\mathbf{f}(t)=(x(t),y(t),z(t))$, $t\in\mathbb{R}$, is a trajectory in the real space $\mathbb{R}^3$, then at the moment $a$ the moving object that follows this trajectory will be at the position $P=(x(a),y(a),z(a))$ with velocity $\mathbf{v}=\mathbf{f}'(a)=(x'(a),y'(a),z'(a))$. Thus, using the velocity vector as normal vector, the normal plane to $\mathbf{f}$ at this moment has the following vectorial equation

$$
\begin{aligned}
\Pi: &\ (x-x(a),y-y(a),z-z(a))\cdot(x'(a),y'(a),z'(a))=0\\
&\Leftrightarrow x'(a)(x-x(a))+y'(a)(y-y(a))+z'(a)(z-z(a))=0.
\end{aligned}
$$

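Example. For the trajectory of the previous example $\mathbf{f}(t)=(\cos t,\sin t,t)$, $t\in\mathbb{R}$, at the moment $t=\pi/2$ the trajectory passes through the point

$$\mathbf{f}(\pi/2)=(\cos(\pi/2),\sin(\pi/2),\pi/2)=(0,1,\pi/2),$$

with velocity

$$\mathbf{v}=\mathbf{f}'(\pi/2)=(-\sin(\pi/2),\cos(\pi/2),1)=(-1,0,1),$$

and the normal plane to the graph of $\mathbf{f}$ at that moment is

$$\Pi: \left(x-0,y-1,z-\frac{\pi}{2}\right)\cdot(-1,0,1)=0\Leftrightarrow -x+z-\frac{\pi}{2}=0.$$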

Normal plane to a trajectory in the space.


Functions of several variables

Many problems in Geometry, Physics, Chemistry, Biology, etc. involve a variable that depends on two or more variables:

  • The area of a triangle depends on two variables: the lengths of its base and its height.
  • The volume of a perfect gas depends on two variables: the pressure and the temperature.
  • The distance travelled by an object in free fall depends on many variables: the time, the area of the cross section of the object, the latitude and longitude of the object, the height above sea level, the air pressure, the air temperature, the wind speed, etc.

These dependencies are expressed with functions of several variables.

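Definition - Functions of several real variables. A function of $n$ real variables, or a scalar field, from a set $A_1\times\cdots\times A_n\subseteq\mathbb{R}^n$ into a set $B\subseteq\mathbb{R}$ is a relation that maps every tuple $(a_1,\ldots,a_n)\in A_1\times\cdots\times A_n$ to a unique element of $B$, denoted by $f(a_1,\ldots,a_n)$, which is known as the image of $(a_1,\ldots,a_n)$ under $f$.

$$
\begin{array}{rccl}
f: & A_1\times\cdots\times A_n & \longrightarrow & B\\
 & (a_1,\ldots,a_n) & \longmapsto & f(a_1,\ldots,a_n)
\end{array}
$$

  • The area of a triangle is a real function of two real variables

$$f(x,y)=\frac{xy}{2}.$$

  • The volume of a perfect gas is a real function of two real variables

$$v=f(t,p)=\frac{nRt}{p},\quad\text{with } n \text{ and } R \text{ constants.}$$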

Graph of a function of two variables

The graph of a function of two variables $f(x,y)$ is a surface in the real space $\mathbb{R}^3$ where every point of the surface has coordinates $(x,y,z)$, with $z=f(x,y)$.

Graph of a two-variables function

Example. The function $f(x,y)=\dfrac{xy}{2}$ that measures the area of a triangle of base $x$ and height $y$ has the graph below.

Graph of the function that measures the area of a triangle.

The function $f(x,y)=\dfrac{\sin(x^2+y^2)}{\sqrt{x^2+y^2}}$ has the peculiar graph below.

Surface of a drop of water.

Level set of a scalar field

Definition - Level set. Given a scalar field $f:\mathbb{R}^n\to\mathbb{R}$, the level set $c$ of $f$ is the set

$$C_{f,c}=\{(x_1,\ldots,x_n): f(x_1,\ldots,x_n)=c\},$$

that is, a set where the function takes on the constant value $c$.

Example. Given the scalar field $f(x,y)=x^2+y^2$ and the point $P=(1,1)$, the level set of $f$ that includes $P$ is

$$C_{f,2}=\{(x,y): f(x,y)=f(1,1)=2\}=\{(x,y): x^2+y^2=2\},$$

that is, the circle of radius $\sqrt{2}$ centred at the origin.

Level sets are common in applications like topographic maps, where the level curves correspond to points with the same height above the sea level,

Level curves of a topographic map.

and weather maps (isobars), where level curves correspond to points with the same atmospheric pressure.

Isobars of a weather map.

Partial functions

Definition - Partial function. Given a scalar field $f:\mathbb{R}^n\to\mathbb{R}$, an $i$-th partial function of $f$ is any function $f_i:\mathbb{R}\to\mathbb{R}$ that results from substituting all the variables of $f$ by constants, except the $i$-th variable, that is:

$$f_i(x)=f(c_1,\ldots,c_{i-1},x,c_{i+1},\ldots,c_n),$$

with $c_j$ ($j=1,\ldots,n$, $j\neq i$) constants.

Example. If we take the function that measures the area of a triangle

$$f(x,y)=\frac{xy}{2},$$

and set the value of the base to $x=c$, then the area of the triangle depends only on the height, and $f$ becomes a function of one variable, namely the second partial function

$$f_2(y)=f(c,y)=\frac{cy}{2},\quad\text{with } c \text{ constant.}$$

Partial derivative notion

Variation of a function with respect to a variable

We can measure the variation of a scalar field with respect to each of its variables in the same way that we measured the variation of a one-variable function.

Let $z=f(x,y)$ be a scalar field of $\mathbb{R}^2$. If we are at the point $(x_0,y_0)$ and we increase the value of $x$ by a quantity $\Delta x$, then we move in the direction of the $x$-axis from the point $(x_0,y_0)$ to the point $(x_0+\Delta x,y_0)$, and the variation of the function is $\Delta z=f(x_0+\Delta x,y_0)-f(x_0,y_0)$.

Thus, the rate of change of the function with respect to $x$ along the interval $[x_0,x_0+\Delta x]$ is given by the quotient

$$\frac{\Delta z}{\Delta x}=\frac{f(x_0+\Delta x,y_0)-f(x_0,y_0)}{\Delta x}.$$

Instantaneous rate of change of a scalar field with respect to a variable

If instead of measuring the rate of change over an interval we measure the rate of change at a point, that is, when $\Delta x$ approaches 0, then we get the instantaneous rate of change, which is the partial derivative with respect to $x$.

$$\lim_{\Delta x\to 0}\frac{\Delta z}{\Delta x}=\lim_{\Delta x\to 0}\frac{f(x_0+\Delta x,y_0)-f(x_0,y_0)}{\Delta x}.$$

The value of this limit, if it exists, is known as the partial derivative of $f$ with respect to the variable $x$ at the point $(x_0,y_0)$; it is written as $\dfrac{\partial f}{\partial x}(x_0,y_0)$.

This partial derivative measures the instantaneous rate of change of $f$ at the point $P=(x_0,y_0)$ when $P$ moves in the direction of the $x$-axis.

Geometric interpretation of partial derivatives

Geometrically, a two-variable function $z=f(x,y)$ defines a surface. If we cut this surface with the plane of equation $y=y_0$ (that is, the plane where $y$ is the constant $y_0$), the intersection is a curve, and the partial derivative of $f$ with respect to $x$ at $(x_0,y_0)$ is the slope of the tangent line to that curve at $x=x_0$.

Geometric interpretation of the partial derivative.


Partial derivative

The concept of partial derivative can be extended easily from two-variable functions to $n$-variable functions.

Definition - Partial derivative. Given an $n$-variable function $f(x_1,\ldots,x_n)$, $f$ is partially differentiable with respect to the variable $x_i$ at the point $a=(a_1,\ldots,a_n)$ if the following limit exists

$$\lim_{\Delta x_i\to 0}\frac{f(a_1,\ldots,a_{i-1},a_i+\Delta x_i,a_{i+1},\ldots,a_n)-f(a_1,\ldots,a_{i-1},a_i,a_{i+1},\ldots,a_n)}{\Delta x_i}.$$

In such a case, the value of the limit is known as the partial derivative of $f$ with respect to $x_i$ at $a$; it is denoted

$$f'_{x_i}(a)=\frac{\partial f}{\partial x_i}(a).$$

Remark. The definition of derivative for one-variable functions is a particular case of this definition for $n=1$.

Partial derivatives computation

When we measure the variation of $f$ with respect to a variable $x_i$ at the point $a=(a_1,\ldots,a_n)$, the other variables remain constant. Thus, we can consider the $i$-th partial function

$$f_i(x_i)=f(a_1,\ldots,a_{i-1},x_i,a_{i+1},\ldots,a_n),$$

and the partial derivative of $f$ with respect to $x_i$ can be computed by differentiating this function:

$$\frac{\partial f}{\partial x_i}(a)=f_i'(a_i).$$

To differentiate $f(x_1,\ldots,x_n)$ partially with respect to the variable $x_i$, you have to differentiate $f$ as a function of the variable $x_i$, treating the other variables as constants.

Example of a perfect gas. Consider the function that measures the volume of a perfect gas $v(t,p)=\dfrac{nRt}{p}$, where $t$ is the temperature, $p$ the pressure and $n$ and $R$ are constants.

The instantaneous rate of change of the volume with respect to the pressure is the partial derivative of $v$ with respect to $p$. To compute this derivative we have to think of $t$ as a constant and differentiate $v$ as if the only variable were $p$:

$$\frac{\partial v}{\partial p}(t,p)=\frac{d}{dp}\left(\frac{nRt}{p}\right)_{t=\text{cst}}=\frac{-nRt}{p^2}.$$

In the same way, the instantaneous rate of change of the volume with respect to the temperature is the partial derivative of $v$ with respect to $t$:

$$\frac{\partial v}{\partial t}(t,p)=\frac{d}{dt}\left(\frac{nRt}{p}\right)_{p=\text{cst}}=\frac{nR}{p}.$$
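The rule "vary one variable, hold the others constant" can be sketched numerically. In the code below the helper `partial`, the step `h`, and the sample values of `n`, `R`, `t0` and `p0` are all illustrative assumptions; only the formula for the volume comes from the example above.

```python
def v(t, p, n=1.0, R=8.314):
    """Volume of a perfect gas; the values of n and R are illustrative."""
    return n * R * t / p

def partial(f, point, i, h=1e-6):
    """Central-difference estimate of the partial derivative of f
    with respect to its i-th argument at the given point."""
    up, down = list(point), list(point)
    up[i] += h      # only the i-th variable changes,
    down[i] -= h    # the others stay constant
    return (f(*up) - f(*down)) / (2 * h)

t0, p0 = 300.0, 2.0
dv_dt = partial(v, (t0, p0), 0)
dv_dp = partial(v, (t0, p0), 1)
print(dv_dt)  # close to nR/p = 4.157
print(dv_dp)  # close to -nRt/p^2 = -623.55
```

Both estimates agree with the closed-form partial derivatives up to the accuracy of the finite difference.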

Gradient

Definition - Gradient. Given a scalar field $f(x_1,\ldots,x_n)$, the gradient of $f$, denoted by $\nabla f$, is a function that maps every point $a=(a_1,\ldots,a_n)$ to the vector whose coordinates are the partial derivatives of $f$ at $a$,

$$\nabla f(a)=\left(\frac{\partial f}{\partial x_1}(a),\ldots,\frac{\partial f}{\partial x_n}(a)\right).$$

Later we will show that the gradient at a point is a vector with the magnitude and direction of the maximum rate of change of the function at that point. Thus, $\nabla f(a)$ points in the direction of maximum increase of $f$ at $a$, while $-\nabla f(a)$ points in the direction of maximum decrease of $f$ at $a$.

Example. After heating a surface, the temperature $t$ (in °C) at each point $(x,y,z)$ (in m) of the surface is given by the function

$$t(x,y,z)=\frac{x}{y}+z^2.$$

In what direction will the temperature increase fastest at the point $(2,1,1)$ of the surface? What magnitude will the maximum increase of temperature have?

The direction of maximum increase of the temperature is given by the gradient

$$\nabla t(x,y,z)=\left(\frac{\partial t}{\partial x}(x,y,z),\frac{\partial t}{\partial y}(x,y,z),\frac{\partial t}{\partial z}(x,y,z)\right)=\left(\frac{1}{y},\frac{-x}{y^2},2z\right).$$

At the point $(2,1,1)$ the direction is given by the vector

$$\nabla t(2,1,1)=\left(\frac{1}{1},\frac{-2}{1^2},2\cdot 1\right)=(1,-2,2),$$

and its magnitude is

$$|\nabla t(2,1,1)|=\sqrt{1^2+(-2)^2+2^2}=\sqrt{9}=3\ \text{°C/m}.$$
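As a rough numerical cross-check of this example, the gradient can be approximated by applying a central difference to each variable in turn. The helper `gradient` and the step `h` are hypothetical names chosen for this sketch.

```python
import math

def temperature(x, y, z):
    """Temperature field of the example: t(x, y, z) = x/y + z^2."""
    return x / y + z**2

def gradient(f, point, h=1e-6):
    """Approximate the gradient of f at `point` with central differences."""
    grad = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += h
        down[i] -= h
        grad.append((f(*up) - f(*down)) / (2 * h))
    return grad

grad = gradient(temperature, (2.0, 1.0, 1.0))
magnitude = math.sqrt(sum(g * g for g in grad))
print(grad)       # close to [1, -2, 2]
print(magnitude)  # close to 3
```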

Composition of a vectorial field with a scalar field

Multivariate chain rule

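If $f:\mathbb{R}^n\to\mathbb{R}$ is a scalar field and $\mathbf{g}:\mathbb{R}\to\mathbb{R}^n$ is a vector function, then it is possible to compose $\mathbf{g}$ with $f$, so that $f\circ\mathbf{g}:\mathbb{R}\to\mathbb{R}$ is a one-variable function.

Theorem - Chain rule. If $\mathbf{g}(t)=(x_1(t),\ldots,x_n(t))$ is a vector function differentiable at $t$ and $f(x_1,\ldots,x_n)$ is a scalar field differentiable at the point $\mathbf{g}(t)$, then $f\circ\mathbf{g}(t)$ is differentiable at $t$ and

$$(f\circ\mathbf{g})'(t)=\nabla f(\mathbf{g}(t))\cdot\mathbf{g}'(t)=\frac{\partial f}{\partial x_1}\frac{dx_1}{dt}+\cdots+\frac{\partial f}{\partial x_n}\frac{dx_n}{dt}.$$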

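Example. Let us consider the scalar field $f(x,y)=x^2y$ and the vector function $\mathbf{g}(t)=(\cos t,\sin t)$, $t\in[0,2\pi]$, in the real plane. Then

$$\nabla f(x,y)=(2xy,x^2)\quad\text{and}\quad\mathbf{g}'(t)=(-\sin t,\cos t),$$

and

$$(f\circ\mathbf{g})'(t)=\nabla f(\mathbf{g}(t))\cdot\mathbf{g}'(t)=(2\cos t\sin t,\cos^2 t)\cdot(-\sin t,\cos t)=-2\cos t\sin^2 t+\cos^3 t.$$

We can get the same result differentiating the composed function directly

$$(f\circ\mathbf{g})(t)=f(\mathbf{g}(t))=f(\cos t,\sin t)=\cos^2 t\sin t,$$

and its derivative is

$$(f\circ\mathbf{g})'(t)=2\cos t(-\sin t)\sin t+\cos^2 t\cos t=-2\cos t\sin^2 t+\cos^3 t.$$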
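The agreement between the chain rule and direct differentiation can also be verified numerically. The sketch below evaluates both at one arbitrary sample time, `t0 = 1.0`; the function names and the step `h` are illustrative assumptions.

```python
import math

def f(x, y):
    return x**2 * y

def grad_f(x, y):
    return (2 * x * y, x**2)

def g(t):
    return (math.cos(t), math.sin(t))

def dg(t):
    return (-math.sin(t), math.cos(t))

def chain_rule(t):
    """(f o g)'(t) computed as the dot product grad f(g(t)) . g'(t)."""
    gx, gy = grad_f(*g(t))
    vx, vy = dg(t)
    return gx * vx + gy * vy

def direct(t, h=1e-6):
    """(f o g)'(t) approximated by a central difference on the composition."""
    return (f(*g(t + h)) - f(*g(t - h))) / (2 * h)

t0 = 1.0
closed_form = -2 * math.cos(t0) * math.sin(t0)**2 + math.cos(t0)**3
print(chain_rule(t0), direct(t0), closed_form)
```

The three printed values should coincide up to the error of the finite difference.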

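The chain rule for the composition of a vector function with a scalar field allows us to derive the algebra of derivatives for one-variable functions easily:

$$
\begin{aligned}
(u+v)' &= u'+v'\\
(uv)' &= u'v+uv'\\
\left(\frac{u}{v}\right)' &= \frac{u'v-uv'}{v^2}\\
(u\circ v)' &= u'(v)v'
\end{aligned}
$$

To infer the derivative of the sum of two functions $u$ and $v$, we can take the scalar field $f(x,y)=x+y$ and the vector function $\mathbf{g}(t)=(u(t),v(t))$. Applying the chain rule we get

$$(u+v)'(t)=(f\circ\mathbf{g})'(t)=\nabla f(\mathbf{g}(t))\cdot\mathbf{g}'(t)=(1,1)\cdot(u',v')=u'+v'.$$

To infer the derivative of the quotient of two functions $u$ and $v$, we can take the scalar field $f(x,y)=x/y$ and the vector function $\mathbf{g}(t)=(u(t),v(t))$.

$$\left(\frac{u}{v}\right)'(t)=(f\circ\mathbf{g})'(t)=\nabla f(\mathbf{g}(t))\cdot\mathbf{g}'(t)=\left(\frac{1}{v},\frac{-u}{v^2}\right)\cdot(u',v')=\frac{u'v-uv'}{v^2}.$$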

Tangent plane and normal line to a surface

Let $C$ be the level set of a scalar field $f$ that includes a point $P$. If $\mathbf{v}$ is the velocity at $P$ of a trajectory following $C$, then

$$\nabla f(P)\cdot\mathbf{v}=0.$$

Indeed, if we take a trajectory $\mathbf{g}(t)$ that follows the level set $C$ and passes through $P$ at time $t=t_0$, that is $P=\mathbf{g}(t_0)$, so $\mathbf{v}=\mathbf{g}'(t_0)$, then

$$(f\circ\mathbf{g})(t)=f(\mathbf{g}(t))=f(P),$$

which is constant for every $t$. Thus, applying the chain rule we have

$$(f\circ\mathbf{g})'(t)=\nabla f(\mathbf{g}(t))\cdot\mathbf{g}'(t)=0,$$

and, in particular, at $t=t_0$ we have

$$\nabla f(P)\cdot\mathbf{v}=0.$$

That means that the gradient of $f$ at $P$ is normal to $C$ at $P$, provided that the gradient is not zero.

Normal and tangent line to curve in the plane

Normal line to a curve in the plane. According to the previous result, the normal line to a curve with equation $f(x,y)=0$ at the point $P=(x_0,y_0)$ has equation

$$P+t\nabla f(P)=(x_0,y_0)+t\nabla f(x_0,y_0).$$

Example. Given the scalar field $f(x,y)=x^2+y^2-25$ and the point $P=(3,4)$, the level set of $f$ that passes through $P$, which satisfies $f(x,y)=f(P)=0$, is the circle of radius 5 centred at the origin of coordinates. Thus, taking as normal vector the gradient of $f$,

$$\nabla f(x,y)=(2x,2y),$$

which at the point $P=(3,4)$ is $\nabla f(3,4)=(6,8)$, the normal line to the circle at $P$ is

$$P+t\nabla f(P)=(3,4)+t(6,8)=(3+6t,4+8t).$$

On the other hand, the tangent line to the circle at $P$ is

$$((x,y)-P)\cdot\nabla f(P)=((x,y)-(3,4))\cdot(6,8)=(x-3,y-4)\cdot(6,8)=0\Leftrightarrow 6x+8y=50.$$

Normal line and tangent plane to a surface in the space

Normal line to a surface in the space. If we have a surface with equation $f(x,y,z)=0$, at the point $P=(x_0,y_0,z_0)$ the normal line has equation

$$P+t\nabla f(P)=(x_0,y_0,z_0)+t\nabla f(x_0,y_0,z_0).$$

Example. Given the scalar field $f(x,y,z)=x^2+y^2-z$ and the point $P=(1,1,2)$, the level set of $f$ that passes through $P$, which satisfies $f(x,y,z)=f(P)=0$, is the paraboloid $z=x^2+y^2$. Thus, taking as normal vector the gradient of $f$,

$$\nabla f(x,y,z)=(2x,2y,-1),$$

which at the point $P=(1,1,2)$ is $\nabla f(1,1,2)=(2,2,-1)$, the normal line to the paraboloid at $P$ is

$$P+t\nabla f(P)=(1,1,2)+t\nabla f(1,1,2)=(1,1,2)+t(2,2,-1)=(1+2t,1+2t,2-t).$$

On the other hand, the tangent plane to the paraboloid at $P$ is

$$
\begin{aligned}
((x,y,z)-P)\cdot\nabla f(P) &= ((x,y,z)-(1,1,2))\cdot(2,2,-1)=(x-1,y-1,z-2)\cdot(2,2,-1)\\
&= 2(x-1)+2(y-1)-(z-2)=2x+2y-z-2=0.
\end{aligned}
$$

The graph of the paraboloid $f(x,y,z)=x^2+y^2-z=0$ and the normal line and the tangent plane to the graph of $f$ at the point $P=(1,1,2)$ are shown below.

Geometric interpretation of the partial derivative.


Directional derivative

For a scalar field $f(x,y)$, we have seen that the partial derivative $\dfrac{\partial f}{\partial x}(x_0,y_0)$ is the instantaneous rate of change of $f$ with respect to $x$ at the point $P=(x_0,y_0)$, that is, when we move along the $x$-axis.

In the same way, $\dfrac{\partial f}{\partial y}(x_0,y_0)$ is the instantaneous rate of change of $f$ with respect to $y$ at the point $P=(x_0,y_0)$, that is, when we move along the $y$-axis.

But what happens if we move along any other direction?

The instantaneous rate of change of $f$ at the point $P=(x_0,y_0)$ along the direction of a unit vector $\mathbf{u}$ is known as the directional derivative.

Definition - Directional derivative. Given a scalar field $f$ of $\mathbb{R}^n$, a point $P$ and a unit vector $\mathbf{u}$ in that space, we say that $f$ is differentiable at $P$ along the direction of $\mathbf{u}$ if the following limit exists

$$f'_{\mathbf{u}}(P)=\lim_{h\to 0}\frac{f(P+h\mathbf{u})-f(P)}{h}.$$

In such a case, the value of the limit is known as the directional derivative of $f$ at the point $P$ along the direction of $\mathbf{u}$.

Theorem - Directional derivative. Given a scalar field $f$ of $\mathbb{R}^n$, a point $P$ and a unit vector $\mathbf{u}$ in that space, the directional derivative of $f$ at the point $P$ along the direction of $\mathbf{u}$ can be computed as the dot product of the gradient of $f$ at $P$ and the unit vector $\mathbf{u}$:

$$f'_{\mathbf{u}}(P)=\nabla f(P)\cdot\mathbf{u}.$$

If we consider a unit vector $\mathbf{u}$, the trajectory that passes through $P$, following the direction of $\mathbf{u}$, has equation

$$\mathbf{g}(t)=P+t\mathbf{u},\quad t\in\mathbb{R}.$$

For $t=0$, this trajectory passes through the point $P=\mathbf{g}(0)$ with velocity $\mathbf{u}=\mathbf{g}'(0)$.

Thus, the directional derivative of $f$ at the point $P$ along the direction of $\mathbf{u}$ is

$$(f\circ\mathbf{g})'(0)=\nabla f(\mathbf{g}(0))\cdot\mathbf{g}'(0)=\nabla f(P)\cdot\mathbf{u}.$$

The partial derivatives are the directional derivatives along the vectors of the canonical basis.

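Example. Given the function $f(x,y)=x^2+y^2$, its gradient is

$$\nabla f(x,y)=(2x,2y).$$

The directional derivative of $f$ at the point $P=(1,1)$, along the unit vector $\mathbf{u}=(1/\sqrt{2},1/\sqrt{2})$, is

$$f'_{\mathbf{u}}(P)=\nabla f(P)\cdot\mathbf{u}=(2,2)\cdot(1/\sqrt{2},1/\sqrt{2})=\frac{2}{\sqrt{2}}+\frac{2}{\sqrt{2}}=\frac{4}{\sqrt{2}}.$$

To compute the directional derivative along a non-unit vector $\mathbf{v}$, we have to use the unit vector that results from normalizing $\mathbf{v}$ with the transformation

$$\mathbf{u}=\frac{\mathbf{v}}{|\mathbf{v}|}.$$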
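The normalization step is easy to forget in practice, so a small sketch may help. It assumes the gradient is available in closed form; the function name `directional_derivative` and the sample direction `(3, 3)` are illustrative choices.

```python
import math

def grad_f(x, y):
    """Gradient of f(x, y) = x^2 + y^2."""
    return (2 * x, 2 * y)

def directional_derivative(grad, point, v):
    """Dot product of the gradient at `point` with the normalized direction v."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0:
        raise ValueError("the direction vector must be non-zero")
    u = [c / norm for c in v]  # unit vector with the direction of v
    return sum(g * c for g, c in zip(grad(*point), u))

# (3, 3) is not a unit vector, but it has the same direction as (1/sqrt(2), 1/sqrt(2)).
d = directional_derivative(grad_f, (1, 1), (3, 3))
print(d)  # close to 4/sqrt(2), about 2.828
```

Any positive multiple of the same direction gives the same result, because the vector is normalized before taking the dot product.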

Geometric interpretation of the directional derivative

Geometrically, a two-variable function $z=f(x,y)$ defines a surface. If we cut this surface with the plane of equation $a(y-y_0)=b(x-x_0)$ (that is, the vertical plane that passes through the point $P=(x_0,y_0)$ with the direction of the vector $\mathbf{u}=(a,b)$), the intersection is a curve, and the directional derivative of $f$ at $P$ along the direction of $\mathbf{u}$ is the slope of the tangent line to that curve at the point $P$.


Growth of scalar field along the gradient

We have seen that for any unit vector $\mathbf{u}$

$$f'_{\mathbf{u}}(P)=\nabla f(P)\cdot\mathbf{u}=|\nabla f(P)|\cos\theta,$$

where $\theta$ is the angle between $\mathbf{u}$ and the gradient $\nabla f(P)$.

Taking into account that $-1\leq\cos\theta\leq 1$, for any unit vector $\mathbf{u}$ it holds that

$$-|\nabla f(P)|\leq f'_{\mathbf{u}}(P)\leq|\nabla f(P)|.$$

Furthermore, if $\mathbf{u}$ has the same direction and sense as the gradient, we have $f'_{\mathbf{u}}(P)=|\nabla f(P)|\cos 0=|\nabla f(P)|$. Therefore, the maximum increase of a scalar field at a point $P$ is along the direction of the gradient at that point.

In the same manner, if $\mathbf{u}$ has the same direction as the gradient but the opposite sense, we have $f'_{\mathbf{u}}(P)=|\nabla f(P)|\cos\pi=-|\nabla f(P)|$. Therefore, the maximum decrease of a scalar field at a point $P$ is along the direction opposite to the gradient at that point.

Implicit derivation

When we have a relation $f(x,y)=0$, sometimes we can consider $y$ as an implicit function of $x$, at least in a neighbourhood of a point $(x_0,y_0)$.

The equation $x^2+y^2=25$, whose graph is the circle of radius 5 centred at the origin of coordinates, does not define $y$ as a function of $x$, because if we solve the equation for $y$ we get two images for some values of $x$,

$$y=\pm\sqrt{25-x^2}.$$

However, near the point $(3,4)$ we can represent the relation as the function $y=\sqrt{25-x^2}$, and near the point $(3,-4)$ we can represent the relation as the function $y=-\sqrt{25-x^2}$.

If an equation $f(x,y)=0$ defines $y$ as an implicit function of $x$, $y=h(x)$, in a neighbourhood of $(x_0,y_0)$, then we can compute the derivative of $y$, $h'(x)$, even if we do not know the explicit formula for $h$.

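Theorem - Implicit derivation. Let $f(x,y):\mathbb{R}^2\to\mathbb{R}$ be a two-variable function and let $(x_0,y_0)$ be a point in $\mathbb{R}^2$ such that $f(x_0,y_0)=0$. If $f$ has continuous partial derivatives at $(x_0,y_0)$ and $\dfrac{\partial f}{\partial y}(x_0,y_0)\neq 0$, then there is an open interval $I\subset\mathbb{R}$ with $x_0\in I$ and a function $h(x):I\to\mathbb{R}$ such that

  1. $y_0=h(x_0)$.
  2. $f(x,h(x))=0$ for all $x\in I$.
  3. $h$ is differentiable on $I$, and $y'=h'(x)=-\dfrac{\partial f/\partial x}{\partial f/\partial y}$.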

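Proof. To prove the last result, take the trajectory $\mathbf{g}(x)=(x,h(x))$ on the interval $I$. Then

$$(f\circ\mathbf{g})(x)=f(\mathbf{g}(x))=f(x,h(x))=0.$$

Thus, using the chain rule we have

$$
\begin{aligned}
(f\circ\mathbf{g})'(x) &= \nabla f(\mathbf{g}(x))\cdot\mathbf{g}'(x)=\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right)\cdot(1,h'(x))\\
&= \frac{\partial f}{\partial x}+\frac{\partial f}{\partial y}h'(x)=0,
\end{aligned}
$$

from which we can deduce

$$y'=h'(x)=-\frac{\partial f/\partial x}{\partial f/\partial y}.$$

This technique, which allows us to compute $y'$ in a neighbourhood of $x_0$ without the explicit formula of $y=h(x)$, is known as implicit derivation.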

Example. Consider the equation of the circle of radius 5 centred at the origin, $x^2+y^2=25$. It can also be written as

$$f(x,y)=x^2+y^2-25=0.$$

Take the point $(3,4)$ that satisfies the equation, $f(3,4)=0$.

As $f$ has partial derivatives $\dfrac{\partial f}{\partial x}=2x$ and $\dfrac{\partial f}{\partial y}=2y$, which are continuous at $(3,4)$, and $\dfrac{\partial f}{\partial y}(3,4)=8\neq 0$, $y$ can be expressed as a function of $x$ in a neighbourhood of $(3,4)$ and its derivative is

$$y'=-\frac{\partial f/\partial x}{\partial f/\partial y}=\frac{-2x}{2y}=\frac{-x}{y}\quad\text{and}\quad y'(3)=\frac{-3}{4}.$$

In this particular case, where we know the explicit formula $y=\sqrt{25-x^2}$, we can get the same result computing the derivative as usual

$$y'=\frac{1}{2\sqrt{25-x^2}}(-2x)=\frac{-x}{\sqrt{25-x^2}}.$$
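The two routes to the slope at $(3,4)$ can be compared numerically. In this sketch the names `implicit_slope` and `h`, and the step `eps`, are illustrative assumptions; the explicit branch used is the upper half of the circle of radius 5.

```python
import math

def fx(x, y):
    return 2 * x   # partial derivative of x^2 + y^2 - 25 with respect to x

def fy(x, y):
    return 2 * y   # partial derivative of x^2 + y^2 - 25 with respect to y

def implicit_slope(x, y):
    """y' = -(df/dx)/(df/dy), valid only where df/dy is not zero."""
    d = fy(x, y)
    if d == 0:
        raise ValueError("df/dy vanishes: the theorem does not apply here")
    return -fx(x, y) / d

def h(x):
    """Explicit branch of the circle through (3, 4)."""
    return math.sqrt(25 - x**2)

eps = 1e-6
explicit_slope = (h(3 + eps) - h(3 - eps)) / (2 * eps)
print(implicit_slope(3, 4))  # -0.75
print(explicit_slope)        # close to -0.75
```

The guard on `fy` reflects the hypothesis of the theorem: at points such as $(5,0)$ the tangent is vertical and $y$ cannot be written locally as a function of $x$.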

The implicit function theorem can be generalized to functions with several variables.

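Theorem - Implicit derivation. Let $f(x_1,\ldots,x_n,y):\mathbb{R}^{n+1}\to\mathbb{R}$ be a function of $n+1$ variables and let $(a_1,\ldots,a_n,b)$ be a point in $\mathbb{R}^{n+1}$ such that $f(a_1,\ldots,a_n,b)=0$. If $f$ has continuous partial derivatives at $(a_1,\ldots,a_n,b)$ and $\dfrac{\partial f}{\partial y}(a_1,\ldots,a_n,b)\neq 0$, then there is a region $I\subset\mathbb{R}^n$ with $(a_1,\ldots,a_n)\in I$ and a function $h(x_1,\ldots,x_n):I\to\mathbb{R}$ such that

  1. $b=h(a_1,\ldots,a_n)$.
  2. $f(x_1,\ldots,x_n,h(x_1,\ldots,x_n))=0$ for all $(x_1,\ldots,x_n)\in I$.
  3. $h$ is differentiable on $I$, and $\dfrac{\partial y}{\partial x_i}=-\dfrac{\partial f/\partial x_i}{\partial f/\partial y}$.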

Second order partial derivatives

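As the partial derivatives of a function are also functions of several variables, we can differentiate each of them partially.

If a function $f(x_1,\ldots,x_n)$ has a partial derivative $f'_{x_i}(x_1,\ldots,x_n)$ with respect to the variable $x_i$ in a set $A$, then we can differentiate $f'_{x_i}$ partially again with respect to the variable $x_j$. This second derivative, when it exists, is known as the second order partial derivative of $f$ with respect to the variables $x_i$ and $x_j$; it is written as

$$\frac{\partial^2 f}{\partial x_j\partial x_i}=\frac{\partial}{\partial x_j}\left(\frac{\partial f}{\partial x_i}\right).$$

In the same way we can define higher order partial derivatives.

Example. The two-variable function $f(x,y)=x^y$ has 4 second order partial derivatives:

$$
\begin{aligned}
\frac{\partial^2 f}{\partial x^2}(x,y) &= \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}(x,y)\right)=\frac{\partial}{\partial x}\left(yx^{y-1}\right)=y(y-1)x^{y-2},\\
\frac{\partial^2 f}{\partial y\partial x}(x,y) &= \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}(x,y)\right)=\frac{\partial}{\partial y}\left(yx^{y-1}\right)=x^{y-1}+yx^{y-1}\log x,\\
\frac{\partial^2 f}{\partial x\partial y}(x,y) &= \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}(x,y)\right)=\frac{\partial}{\partial x}\left(x^y\log x\right)=yx^{y-1}\log x+x^y\frac{1}{x},\\
\frac{\partial^2 f}{\partial y^2}(x,y) &= \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}(x,y)\right)=\frac{\partial}{\partial y}\left(x^y\log x\right)=x^y(\log x)^2.
\end{aligned}
$$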

Hessian matrix and Hessian

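Definition - Hessian matrix. Given a scalar field $f(x_1,\ldots,x_n)$ with second order partial derivatives at the point $a=(a_1,\ldots,a_n)$, the Hessian matrix of $f$ at $a$, denoted by $\nabla^2 f(a)$, is the matrix

$$
\nabla^2 f(a)=
\begin{pmatrix}
\dfrac{\partial^2 f}{\partial x_1^2}(a) & \dfrac{\partial^2 f}{\partial x_1\partial x_2}(a) & \cdots & \dfrac{\partial^2 f}{\partial x_1\partial x_n}(a)\\
\dfrac{\partial^2 f}{\partial x_2\partial x_1}(a) & \dfrac{\partial^2 f}{\partial x_2^2}(a) & \cdots & \dfrac{\partial^2 f}{\partial x_2\partial x_n}(a)\\
\vdots & \vdots & \ddots & \vdots\\
\dfrac{\partial^2 f}{\partial x_n\partial x_1}(a) & \dfrac{\partial^2 f}{\partial x_n\partial x_2}(a) & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}(a)
\end{pmatrix}
$$

The determinant of this matrix is known as the Hessian of $f$ at $a$; it is denoted $Hf(a)=|\nabla^2 f(a)|$.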

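Example. Consider again the two-variable function

$$f(x,y)=x^y.$$

Its Hessian matrix is

$$
\nabla^2 f(x,y)=
\begin{pmatrix}
\dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\partial y}\\
\dfrac{\partial^2 f}{\partial y\partial x} & \dfrac{\partial^2 f}{\partial y^2}
\end{pmatrix}
=
\begin{pmatrix}
y(y-1)x^{y-2} & x^{y-1}(y\log x+1)\\
x^{y-1}(y\log x+1) & x^y(\log x)^2
\end{pmatrix}.
$$

At the point $(1,2)$ it is

$$
\nabla^2 f(1,2)=
\begin{pmatrix}
2(2-1)1^{2-2} & 1^{2-1}(2\log 1+1)\\
1^{2-1}(2\log 1+1) & 1^2(\log 1)^2
\end{pmatrix}
=
\begin{pmatrix}
2 & 1\\
1 & 0
\end{pmatrix}.
$$

And its Hessian is

$$Hf(1,2)=\begin{vmatrix}2 & 1\\ 1 & 0\end{vmatrix}=2\cdot 0-1\cdot 1=-1.$$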
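The Hessian matrix of this example can be approximated with second-order finite differences as a sanity check. This is a sketch under stated assumptions: the helper `hessian`, the step `h`, and the decision to compute a single mixed derivative (relying on the two mixed derivatives being equal here) are choices made for illustration.

```python
def f(x, y):
    return x**y

def hessian(f, x, y, h=1e-4):
    """Finite-difference approximation of the 2x2 Hessian matrix of f at (x, y).
    A single mixed derivative is computed and used for both off-diagonal entries."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return [[fxx, fxy], [fxy, fyy]]

H = hessian(f, 1.0, 2.0)
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
print(H)    # close to [[2, 1], [1, 0]]
print(det)  # close to -1
```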

Symmetry of second partial derivatives

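In the previous example we can observe that the mixed derivatives of second order $\dfrac{\partial^2 f}{\partial y\partial x}$ and $\dfrac{\partial^2 f}{\partial x\partial y}$ are the same. This fact is due to the following result.

Theorem - Symmetry of second partial derivatives. If $f(x_1,\ldots,x_n)$ is a scalar field with second order partial derivatives $\dfrac{\partial^2 f}{\partial x_i\partial x_j}$ and $\dfrac{\partial^2 f}{\partial x_j\partial x_i}$ continuous at a point $(a_1,\ldots,a_n)$, then

$$\frac{\partial^2 f}{\partial x_i\partial x_j}(a_1,\ldots,a_n)=\frac{\partial^2 f}{\partial x_j\partial x_i}(a_1,\ldots,a_n).$$

This means that when computing a second partial derivative the order of differentiation does not matter.

As a consequence, if the function satisfies the requirements of the theorem for all the second order partial derivatives, the Hessian matrix is symmetric.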

Taylor polynomials

Linear approximation of a scalar field

In a previous chapter we saw how to approximate a one-variable function with a Taylor polynomial. This can be generalized to functions of several variables.

If $P$ is a point in the domain of a scalar field $f$ and $\mathbf{v}$ is a vector, the first degree Taylor formula of $f$ around $P$ is

$$f(P+\mathbf{v})=f(P)+\nabla f(P)\cdot\mathbf{v}+R^1_{f,P}(\mathbf{v}),$$

where

$$P^1_{f,P}(\mathbf{v})=f(P)+\nabla f(P)\cdot\mathbf{v}$$

is the first degree Taylor polynomial of $f$ at $P$, and $R^1_{f,P}(\mathbf{v})$ is the Taylor remainder for the vector $\mathbf{v}$, that is, the error in the approximation.

The remainder satisfies

$$\lim_{|\mathbf{v}|\to 0}\frac{R^1_{f,P}(\mathbf{v})}{|\mathbf{v}|}=0.$$

The first degree Taylor polynomial for a function of two variables is the tangent plane to the graph of $f$ at $P$.

Linear approximation of a two-variable function

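If $f$ is a scalar field of two variables $f(x,y)$ and $P=(x_0,y_0)$, as for any point $Q=(x,y)$ we can take the vector $\mathbf{v}=\vec{PQ}=(x-x_0,y-y_0)$, the first degree Taylor polynomial of $f$ at $P$ can be written as

$$
\begin{aligned}
P^1_{f,P}(x,y) &= f(x_0,y_0)+\nabla f(x_0,y_0)\cdot(x-x_0,y-y_0)\\
&= f(x_0,y_0)+\frac{\partial f}{\partial x}(x_0,y_0)(x-x_0)+\frac{\partial f}{\partial y}(x_0,y_0)(y-y_0).
\end{aligned}
$$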

Example. Given the scalar field $f(x,y)=\log(xy)$, its gradient is

$$\nabla f(x,y)=\left(\frac{1}{x},\frac{1}{y}\right),$$

and the first degree Taylor polynomial at the point $P=(1,1)$ is

$$
\begin{aligned}
P^1_{f,P}(x,y) &= f(1,1)+\nabla f(1,1)\cdot(x-1,y-1)\\
&= \log 1+(1,1)\cdot(x-1,y-1)=x-1+y-1=x+y-2.
\end{aligned}
$$

This polynomial approximates $f$ near the point $P$. For instance,

$$f(1.01,1.01)\approx P^1_{f,P}(1.01,1.01)=1.01+1.01-2=0.02.$$

The graph of the scalar field $f(x,y)=\log(xy)$ and the first degree Taylor polynomial of $f$ at the point $P=(1,1)$ are shown below.

First degree Taylor polynomial

Quadratic approximation of a scalar field

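If $P$ is a point in the domain of a scalar field $f$ and $\mathbf{v}$ is a vector, the second degree Taylor formula of $f$ around $P$ is

$$f(P+\mathbf{v})=f(P)+\nabla f(P)\cdot\mathbf{v}+\frac{1}{2}\left(\mathbf{v}\nabla^2 f(P)\mathbf{v}\right)+R^2_{f,P}(\mathbf{v}),$$

where

$$P^2_{f,P}(\mathbf{v})=f(P)+\nabla f(P)\cdot\mathbf{v}+\frac{1}{2}\left(\mathbf{v}\nabla^2 f(P)\mathbf{v}\right)$$

is the second degree Taylor polynomial of $f$ at the point $P$, and $R^2_{f,P}(\mathbf{v})$ is the Taylor remainder for the vector $\mathbf{v}$, that is, the error in the approximation.

The remainder satisfies

$$\lim_{|\mathbf{v}|\to 0}\frac{R^2_{f,P}(\mathbf{v})}{|\mathbf{v}|^2}=0.$$

This means that the remainder tends to zero faster than the square of the magnitude of $\mathbf{v}$.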

Quadratic approximation of a two-variable function

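If $f$ is a scalar field of two variables $f(x,y)$ and $P=(x_0,y_0)$, then the second degree Taylor polynomial of $f$ at $P$ can be written as

$$
\begin{aligned}
P^2_{f,P}(x,y) &= f(x_0,y_0)+\nabla f(x_0,y_0)\cdot(x-x_0,y-y_0)\\
&\quad+\frac{1}{2}(x-x_0,y-y_0)\nabla^2 f(x_0,y_0)(x-x_0,y-y_0)\\
&= f(x_0,y_0)+\frac{\partial f}{\partial x}(x_0,y_0)(x-x_0)+\frac{\partial f}{\partial y}(x_0,y_0)(y-y_0)\\
&\quad+\frac{1}{2}\left(\frac{\partial^2 f}{\partial x^2}(x_0,y_0)(x-x_0)^2+2\frac{\partial^2 f}{\partial y\partial x}(x_0,y_0)(x-x_0)(y-y_0)+\frac{\partial^2 f}{\partial y^2}(x_0,y_0)(y-y_0)^2\right).
\end{aligned}
$$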

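Example. Given the scalar field $f(x,y)=\log(xy)$, its gradient is

$$\nabla f(x,y)=\left(\frac{1}{x},\frac{1}{y}\right),$$

its Hessian matrix is

$$\nabla^2 f(x,y)=\begin{pmatrix}\dfrac{-1}{x^2} & 0\\ 0 & \dfrac{-1}{y^2}\end{pmatrix}$$

and the second degree Taylor polynomial of $f$ at the point $P=(1,1)$ is

$$
\begin{aligned}
P^2_{f,P}(x,y) &= f(1,1)+\nabla f(1,1)\cdot(x-1,y-1)+\frac{1}{2}(x-1,y-1)\nabla^2 f(1,1)(x-1,y-1)\\
&= \log 1+(1,1)\cdot(x-1,y-1)+\frac{1}{2}(x-1,y-1)\begin{pmatrix}-1 & 0\\ 0 & -1\end{pmatrix}\begin{pmatrix}x-1\\ y-1\end{pmatrix}\\
&= x-1+y-1+\frac{-x^2-y^2+2x+2y-2}{2}\\
&= \frac{-x^2-y^2+4x+4y-6}{2}.
\end{aligned}
$$

Thus,

$$f(1.01,1.01)\approx P^2_{f,P}(1.01,1.01)=\frac{-1.01^2-1.01^2+4\cdot 1.01+4\cdot 1.01-6}{2}=0.0199.$$

The graph of the scalar field $f(x,y)=\log(xy)$ and the second degree Taylor polynomial of $f$ at the point $P=(1,1)$ are shown below.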

Second degree Taylor polynomial
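To see how much the quadratic term improves the approximation of $\log(xy)$ near $(1,1)$, both Taylor polynomials can be compared with the exact value. The function names `taylor1` and `taylor2` are illustrative; they encode the two polynomials $x+y-2$ and $(-x^2-y^2+4x+4y-6)/2$ derived in the examples.

```python
import math

def f(x, y):
    return math.log(x * y)

def taylor1(x, y):
    """First degree Taylor polynomial of log(xy) at (1, 1)."""
    return x + y - 2

def taylor2(x, y):
    """Second degree Taylor polynomial of log(xy) at (1, 1)."""
    return (-x**2 - y**2 + 4 * x + 4 * y - 6) / 2

x, y = 1.01, 1.01
exact = f(x, y)
err1 = abs(exact - taylor1(x, y))
err2 = abs(exact - taylor2(x, y))
print(taylor1(x, y), taylor2(x, y), exact)
print(err1, err2)  # the second error is much smaller than the first
```

At this point the linear approximation is off by roughly $10^{-4}$, while the quadratic one is off by less than $10^{-6}$.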


Relative extrema

Definition - Relative extrema. A scalar field $f$ in $\mathbb{R}^n$ has a relative maximum at a point $P$ if there is a value $\epsilon>0$ such that

$$f(P)\geq f(X)\quad\forall X,\ |\vec{PX}|<\epsilon.$$

$f$ has a relative minimum at $P$ if there is a value $\epsilon>0$ such that

$$f(P)\leq f(X)\quad\forall X,\ |\vec{PX}|<\epsilon.$$

Both relative maxima and minima are known as relative extrema of $f$.

Critical points

Theorem - Critical points. If a scalar field $f$ in $\mathbb{R}^n$ has a relative maximum or minimum at a point $P$, then $P$ is a critical or stationary point of $f$, that is, a point where the gradient vanishes

$$\nabla f(P)=0.$$

Taking the trajectory that passes through $P$ with the direction of the gradient at that point, $\mathbf{g}(t)=P+t\nabla f(P)$, the function $h=(f\circ\mathbf{g})(t)$ does not decrease at $t=0$ since

$$h'(0)=(f\circ\mathbf{g})'(0)=\nabla f(\mathbf{g}(0))\cdot\mathbf{g}'(0)=\nabla f(P)\cdot\nabla f(P)=|\nabla f(P)|^2\geq 0,$$

and it only vanishes if $\nabla f(P)=0$.

Thus, if $\nabla f(P)\neq 0$, $f$ cannot have a relative maximum at $P$, since following the trajectory of $\mathbf{g}$ from $P$ there are points where $f$ has an image greater than the image at $P$. In the same way, following the trajectory of $\mathbf{g}$ in the opposite direction there are points where $f$ has an image less than the image at $P$, so $f$ cannot have a relative minimum at $P$.

Example. Given the scalar field $f(x,y)=x^2+y^2$, it is clear that $f$ has its only relative minimum at $(0,0)$ since

$$f(0,0)=0\leq f(x,y)=x^2+y^2,\quad\forall x,y\in\mathbb{R}.$$

It is easy to check that $f$ has a critical point at $(0,0)$, that is, $\nabla f(0,0)=0$.

Relative minimum of a two-variable function

Saddle points

Not all the critical points of a scalar field are points where the scalar field has relative extrema. If we take, for instance, the scalar field $f(x,y)=x^2-y^2$, its gradient is

$$\nabla f(x,y)=(2x,-2y),$$

that only vanishes at $(0,0)$. However, this point is not a relative maximum, since the points $(x,0)$ on the $x$-axis have images $f(x,0)=x^2\geq 0=f(0,0)$, nor a relative minimum, since the points $(0,y)$ on the $y$-axis have images $f(0,y)=-y^2\leq 0=f(0,0)$. Critical points of this type, which are not relative extrema, are known as saddle points.

Saddle point of a two-variable function

Analysis of the relative extrema

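From the second degree Taylor formula of a scalar field $f$ at a point $P$ we have

$$f(P+\mathbf{v})\approx f(P)+\nabla f(P)\mathbf{v}+\frac{1}{2}\nabla^2 f(P)\mathbf{v}\cdot\mathbf{v}.$$

Thus, if $P$ is a critical point of $f$, as $\nabla f(P)=0$, we have

$$f(P+\mathbf{v})-f(P)\approx\frac{1}{2}\nabla^2 f(P)\mathbf{v}\cdot\mathbf{v}.$$

Therefore, the sign of $f(P+\mathbf{v})-f(P)$ is the sign of the second degree term $\nabla^2 f(P)\mathbf{v}\cdot\mathbf{v}$.

There are four possibilities:

  • Positive definite: $\nabla^2 f(P)\mathbf{v}\cdot\mathbf{v}>0$ $\forall\mathbf{v}\neq 0$.

  • Negative definite: $\nabla^2 f(P)\mathbf{v}\cdot\mathbf{v}<0$ $\forall\mathbf{v}\neq 0$.

  • Indefinite: $\nabla^2 f(P)\mathbf{v}\cdot\mathbf{v}>0$ for some $\mathbf{v}\neq 0$ and $\nabla^2 f(P)\mathbf{u}\cdot\mathbf{u}<0$ for some $\mathbf{u}\neq 0$.

  • Semidefinite: In any other case.

Thus, depending on the sign of $\nabla^2 f(P)\mathbf{v}\cdot\mathbf{v}$, we have

Theorem. Given a critical point $P$ of a scalar field $f$, it holds that

  • If $\nabla^2 f(P)$ is positive definite then $f$ has a relative minimum at $P$.
  • If $\nabla^2 f(P)$ is negative definite then $f$ has a relative maximum at $P$.
  • If $\nabla^2 f(P)$ is indefinite then $f$ has a saddle point at $P$.

When $\nabla^2 f(P)$ is semidefinite we cannot draw any conclusion and we need higher order partial derivatives to classify the critical point.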

Analysis of the relative extrema of a scalar field in R2

In the particular case of a scalar field of two variables, we have

Theorem. Given a critical point $P=(x_0,y_0)$ of a scalar field $f(x,y)$, it holds that

  • If $Hf(P)>0$ and $\dfrac{\partial^2 f}{\partial x^2}(x_0,y_0)>0$ then $f$ has a relative minimum at $P$.
  • If $Hf(P)>0$ and $\dfrac{\partial^2 f}{\partial x^2}(x_0,y_0)<0$ then $f$ has a relative maximum at $P$.
  • If $Hf(P)<0$ then $f$ has a saddle point at $P$.

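Example. Given the scalar field $f(x,y)=\dfrac{x^3}{3}-\dfrac{y^3}{3}-x+y$, its gradient is

$$\nabla f(x,y)=(x^2-1,-y^2+1),$$

and it has critical points at $(1,1)$, $(1,-1)$, $(-1,1)$ and $(-1,-1)$.

The Hessian matrix is

$$\nabla^2 f(x,y)=\begin{pmatrix}2x & 0\\ 0 & -2y\end{pmatrix}$$

and the Hessian is

$$Hf(x,y)=-4xy.$$

Thus, we have

  • Point $(1,1)$: $Hf(1,1)=-4<0$ $\Rightarrow$ Saddle point.

  • Point $(1,-1)$: $Hf(1,-1)=4>0$ and $\dfrac{\partial^2 f}{\partial x^2}(1,-1)=2>0$ $\Rightarrow$ Relative min.

  • Point $(-1,1)$: $Hf(-1,1)=4>0$ and $\dfrac{\partial^2 f}{\partial x^2}(-1,1)=-2<0$ $\Rightarrow$ Relative max.

  • Point $(-1,-1)$: $Hf(-1,-1)=-4<0$ $\Rightarrow$ Saddle point.

The graph of the function $f(x,y)=\dfrac{x^3}{3}-\dfrac{y^3}{3}-x+y$ and its relative extrema and saddle points are shown below.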

Relative extrema and saddle points of a two-variable function
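The second derivative test for two-variable functions can be sketched as a small classifier. The function names `classify` and `hessian_entries` are hypothetical; the Hessian entries are the ones computed in the example for $f(x,y)=x^3/3-y^3/3-x+y$.

```python
def classify(fxx, fxy, fyy):
    """Classify a critical point from the entries of its 2x2 Hessian matrix."""
    det = fxx * fyy - fxy**2  # the Hessian (determinant of the Hessian matrix)
    if det < 0:
        return "saddle point"
    if det > 0:
        return "relative minimum" if fxx > 0 else "relative maximum"
    return "inconclusive"  # det == 0: higher order derivatives are needed

def hessian_entries(x, y):
    """Second order partial derivatives of f(x, y) = x^3/3 - y^3/3 - x + y."""
    return 2 * x, 0, -2 * y

critical_points = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
results = {p: classify(*hessian_entries(*p)) for p in critical_points}
for point, kind in results.items():
    print(point, kind)
```

The output reproduces the classification of the example: two saddle points, one relative minimum at $(1,-1)$ and one relative maximum at $(-1,1)$.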