# Gradient descent

## Notes

### Wikidata

- ID : Q1199743

### Corpus

- If it is convex, we use Gradient Descent, and if it is concave, we use Gradient Ascent. [1]
- When we use the convex one we use gradient descent and when we use the concave one we use gradient ascent. [1]
- Here, we are going to look into one such popular optimization technique called Gradient Descent. [2]
- In machine learning, gradient descent is used to update parameters in a model. [2]
- Let us relate gradient descent with a real-life analogy for better understanding. [2]
- Batch Gradient Descent: In this type of gradient descent, all the training examples are processed for each iteration of gradient descent. [2]
- Gradient descent is one of the most popular algorithms to perform optimization and by far the most common way to optimize neural networks. [3]
- At the same time, every state-of-the-art Deep Learning library contains implementations of various algorithms to optimize gradient descent. [3]
- By the end of this blog post, you’ll have a comprehensive understanding of how gradient descent works at its core. [3]
- By means of gradient descent, we will intuitively accomplish the task of balancing a rod on our finger. [3]
- At each step, the weight vector (w) is altered in the direction that produces the steepest descent along with the error. [4]
- Summing over multiple examples in standard gradient descent requires more computation per weight update step. [4]
- Similar to batch gradient descent, stochastic gradient descent performs a series of steps to minimize a cost function. [5]
- Gradient Descent is an optimization algorithm used for minimizing the cost function in various machine learning algorithms. [6]
- Batch Gradient Descent: This is a type of gradient descent which processes all the training examples for each iteration of gradient descent. [6]
- But if the number of training examples is large, then batch gradient descent is computationally very expensive. [6]
- Hence if the number of training examples is large, then batch gradient descent is not preferred. [6]
- Since we need to calculate the gradients for the whole dataset to perform one parameter update, batch gradient descent can be very slow. [7]
- In mini-batch gradient descent, we calculate the gradient for each small mini-batch of training data. [7]
- Gradient descent is one of those “greatest hits” algorithms that can offer a new perspective for solving problems. [8]
- At a theoretical level, gradient descent is an algorithm that minimizes functions. [8]
- To run gradient descent on this error function, we first need to compute its gradient. [8]
- Below are some snapshots of gradient descent running for 2000 iterations for our example problem. [8]
- However, it still serves as a decent pedagogical tool to get some of the most important ideas about gradient descent across the board. [9]
- However, this gives you a very inaccurate picture of what gradient descent really is. [9]
- As depicted in the above animation, gradient descent doesn't involve moving in z direction at all. [9]
- A widely used technique in gradient descent is to have a variable learning rate, rather than a fixed one. [9]
- The gradient descent varies in terms of the number of training patterns used to calculate errors. [10]
- Each iteration of the gradient descent uses a single sample and requires a prediction for each iteration. [10]
- If the gradient descent is running well, you will see a decrease in cost in each iteration. [10]
- Gradient Descent is an optimization algorithm used to find a local minimum of a given function. [11]
- Gradient Descent finds a local minimum, which can be different from the global minimum. [11]
- Gradient Descent needs a function and a starting point as input. [11]
- As we can see, Gradient Descent found a local minimum here, but it is not the global minimum. [11]
- Gradient Descent is an iterative process that finds the minima of a function. [12]
- To get an idea of how Gradient Descent works, let us take an example. [12]
- Now let us see in detail how gradient descent is used to optimise a linear regression problem. [12]
- For simplicity, we take a constant slope of 0.64, so that we can understand how gradient descent would optimise the intercept. [12]
- Gradient descent is an optimization technique that can find the minimum of an objective function. [13]
- Now it's time to run gradient descent to minimize our objective function. [13]
- To keep things simple, let's do a test run of gradient descent on a two-class problem (digit 0 vs. digit 1). [13]
- When running gradient descent, we'll keep learning rate and momentum very small as the inputs are not normalized or standardized. [13]
- This process is called Stochastic Gradient Descent (SGD) (or also sometimes on-line gradient descent). [14]
- Gradient descent is by far the most popular optimization strategy used in machine learning and deep learning at the moment. [15]
- Gradient descent is an optimization algorithm that's used when training a machine learning model. [15]
- Gradient Descent is an optimization algorithm for finding a local minimum of a differentiable function. [15]
- The equation below describes what gradient descent does: b is the next position of our climber, while a represents his current position. [15]
- Gradient descent is an optimization technique commonly used in training machine learning algorithms. [16]
- With gradient descent, you'll simply look around in all possible directions and take a step in the steepest downhill direction. [16]
- Mini batch gradient descent allows us to split our training data into mini batches which can be processed individually. [16]
- On the other extreme, a batch size equal to the number of training examples would represent batch gradient descent. [16]
- This way is called Gradient Descent and it also follows our downhill strategy. [17]
- Gradient Descent is one of the most used machine learning algorithms in the industry. [18]
- And with a goal to reduce the cost function, we modify the parameters by using the Gradient descent algorithm over the given data. [18]
- Gradient descent was originally proposed by Cauchy in 1847. [18]
- Gradient descent using Contour Plot. [18]
- Gradient descent is an optimization algorithm which is commonly used to train machine learning models and neural networks. [19]
- Before we dive into gradient descent, it may help to review some concepts from linear regression. [19]
- While gradient descent is the most common approach for optimization problems, it does come with its own set of challenges. [19]
- gradient descent, SGD approximates the true gradient of \(E(w,b)\) by considering a single training example at a time. [20]
- This is where Gradient Descent comes into the picture. [21]
- We are first going to look at the different variants of gradient descent. [22]
- We will also take a short look at algorithms and architectures to optimize gradient descent in a parallel and distributed setting. [22]
- In machine learning, we use gradient descent to update the parameters of our model. [23]
- In this post you discovered gradient descent for machine learning. [24]
- Gradient Descent is an optimizing algorithm used in Machine/Deep Learning algorithms. [25]
- The first stage in gradient descent is to pick a starting value (a starting point) for \(w_1\). [26]
- In machine learning, gradients are used in gradient descent. [26]
- Note: When performing gradient descent, we generalize the above process to tune all the model parameters simultaneously. [26]
- Backtracking line search is another variant of gradient descent. [27]
- Gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. [28]
- The basic intuition behind gradient descent can be illustrated by a hypothetical scenario. [28]
- Gradient descent can also be used to solve a system of nonlinear equations. [28]
- Below is an example that shows how to use the gradient descent to solve for three unknown variables, \(x_1\), \(x_2\), and \(x_3\). [28]
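
Several of the quoted sentences distinguish batch, mini-batch, and stochastic gradient descent by how many training examples feed each parameter update. The following is a minimal sketch of that distinction, not an implementation from any cited source. It assumes a one-variable linear model, a mean-squared-error cost, and synthetic noise-free data. The function name `fit_line`, the learning rate, the epoch count, and the intercept 1.5 are illustrative assumptions; only the slope 0.64 echoes the value mentioned in the corpus.

```python
import random


def fit_line(xs, ys, batch_size, lr=0.1, epochs=500, seed=0):
    """Fit y ~ w * x + b by gradient descent on the mean squared error.

    batch_size == len(xs) gives batch gradient descent,
    batch_size == 1 gives stochastic gradient descent,
    anything in between gives mini-batch gradient descent.
    (Hypothetical helper written for this illustration.)
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0  # starting point for the parameters
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Prediction error on each example of the current batch.
            errors = [w * xs[i] + b - ys[i] for i in batch]
            # Gradient of the batch's mean squared error w.r.t. w and b.
            grad_w = 2.0 * sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = 2.0 * sum(errors) / len(batch)
            # Step against the gradient, scaled by the learning rate.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Synthetic, noise-free data on the line y = 0.64 * x + 1.5 (assumed values).
data_rng = random.Random(42)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(64)]
ys = [0.64 * x + 1.5 for x in xs]

results = {}
for name, size in [("batch", len(xs)), ("mini-batch", 8), ("stochastic", 1)]:
    results[name] = fit_line(xs, ys, batch_size=size)
    w, b = results[name]
    print(f"{name:>10}: slope={w:.3f}, intercept={b:.3f}")
```

All three settings share the same update rule and differ only in `batch_size`. Batch gradient descent makes one update per pass over the data, while the stochastic setting makes one update per example, which is why the quoted sources describe the batch variant as expensive when the training set is large.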

### Sources

- [1] Gradient Ascent vs Gradient Descent in Logistic Regression
- [2] Machine Learning: What is Gradient Descent?
- [3] Introduction to Optimization and Gradient Descent Algorithm [Part-1.]
- [4] The Ascent of Gradient Descent
- [5] Stochastic gradient descent
- [6] Gradient Descent algorithm and its variants
- [7] How to understand Gradient Descent, the most popular ML algorithm
- [8] An Introduction to Gradient Descent and Linear Regression
- [9] Intro to optimization in deep learning: Gradient Descent
- [10] What Is Gradient Descent in Deep Learning?
- [11] Gradient Descent in Java
- [12] An Easy Guide to Gradient Descent in Machine Learning
- [13] Gradient Descent in Python: Implementation and Theory
- [14] CS231n Convolutional Neural Networks for Visual Recognition
- [15] Gradient Descent: An Introduction to 1 of Machine Learning’s Most Popular Algorithms
- [16] Gradient descent.
- [17] Gradient Descent in deep learning: a mountain perspective
- [18] How Does the Gradient Descent Algorithm Work in Machine Learning?
- [19] What is Gradient Descent?
- [20] 1.5. Stochastic Gradient Descent — scikit-learn 0.23.2 documentation
- [21] Keep it simple! How to understand Gradient Descent algorithm
- [22] An overview of gradient descent optimization algorithms
- [23] Gradient Descent — ML Glossary documentation
- [24] Gradient Descent For Machine Learning
- [25] Gradient Descent Explained
- [26] Reducing Loss: Gradient Descent
- [27] Stochastic gradient descent
- [28] Gradient descent

## Metadata

### Wikidata

- ID : Q1199743

### spaCy pattern list

- [{'LOWER': 'gradient'}, {'LEMMA': 'descent'}]
- [{'LOWER': 'steepest'}, {'LEMMA': 'descent'}]
- [{'LOWER': 'method'}, {'LOWER': 'of'}, {'LOWER': 'steepest'}, {'LEMMA': 'descent'}]