<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="ko">
	<id>https://wiki.mathnt.net/index.php?action=history&amp;feed=atom&amp;title=Learning_rate</id>
	<title>Learning rate - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.mathnt.net/index.php?action=history&amp;feed=atom&amp;title=Learning_rate"/>
	<link rel="alternate" type="text/html" href="https://wiki.mathnt.net/index.php?title=Learning_rate&amp;action=history"/>
	<updated>2026-04-04T21:27:28Z</updated>
	<subtitle>Revision history for this page</subtitle>
	<generator>MediaWiki 1.35.0</generator>
	<entry>
		<id>https://wiki.mathnt.net/index.php?title=Learning_rate&amp;diff=51354&amp;oldid=prev</id>
		<title>Edit by Pythagoras0 at 08:21, 17 February 2021</title>
		<link rel="alternate" type="text/html" href="https://wiki.mathnt.net/index.php?title=Learning_rate&amp;diff=51354&amp;oldid=prev"/>
		<updated>2021-02-17T08:21:18Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 08:21, 17 February 2021&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l109&quot; &gt;Line 109:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 109:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;  &amp;lt;references /&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;  &amp;lt;references /&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Metadata ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Metadata==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt; &lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Wikidata===&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Wikidata===&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* ID :  [https://www.wikidata.org/wiki/Q65121812 Q65121812]&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* ID :  [https://www.wikidata.org/wiki/Q65121812 Q65121812]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;===Spacy pattern list===&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* [{&amp;#039;LOWER&amp;#039;: &amp;#039;learning&amp;#039;}, {&amp;#039;LEMMA&amp;#039;: &amp;#039;rate&amp;#039;}]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* [{&amp;#039;LOWER&amp;#039;: &amp;#039;step&amp;#039;}, {&amp;#039;LEMMA&amp;#039;: &amp;#039;size&amp;#039;}]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* [{&amp;#039;LOWER&amp;#039;: &amp;#039;step&amp;#039;}, {&amp;#039;LEMMA&amp;#039;: &amp;#039;length&amp;#039;}]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* [{&amp;#039;LEMMA&amp;#039;: &amp;#039;gain&amp;#039;}]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
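&lt;tr&gt;&lt;td colspan=&quot;4&quot; style=&quot;font-size: 88%; white-space: pre-wrap;&quot;&gt;The lines added above are spaCy Matcher token patterns. A minimal sketch of registering and running them, assuming spaCy is installed; the LEMMA-based patterns additionally need a pipeline with a lemmatizer such as en_core_web_sm, which is an assumption not stated on the page:&lt;br /&gt;import spacy&lt;br /&gt;from spacy.matcher import Matcher&lt;br /&gt;&lt;br /&gt;nlp = spacy.load(&amp;#039;en_core_web_sm&amp;#039;)  # provides the lemmatizer that LEMMA needs&lt;br /&gt;matcher = Matcher(nlp.vocab)&lt;br /&gt;matcher.add(&amp;#039;LEARNING_RATE&amp;#039;, [&lt;br /&gt;    [{&amp;#039;LOWER&amp;#039;: &amp;#039;learning&amp;#039;}, {&amp;#039;LEMMA&amp;#039;: &amp;#039;rate&amp;#039;}],&lt;br /&gt;    [{&amp;#039;LOWER&amp;#039;: &amp;#039;step&amp;#039;}, {&amp;#039;LEMMA&amp;#039;: &amp;#039;size&amp;#039;}],&lt;br /&gt;    [{&amp;#039;LOWER&amp;#039;: &amp;#039;step&amp;#039;}, {&amp;#039;LEMMA&amp;#039;: &amp;#039;length&amp;#039;}],&lt;br /&gt;    [{&amp;#039;LEMMA&amp;#039;: &amp;#039;gain&amp;#039;}],&lt;br /&gt;])&lt;br /&gt;doc = nlp(&amp;#039;The learning rates and step sizes differ.&amp;#039;)&lt;br /&gt;print([doc[s:e].text for _, s, e in matcher(doc)])&lt;/td&gt;&lt;/tr&gt;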
&lt;/table&gt;</summary>
		<author><name>Pythagoras0</name></author>
	</entry>
	<entry>
		<id>https://wiki.mathnt.net/index.php?title=Learning_rate&amp;diff=47151&amp;oldid=prev</id>
		<title>Pythagoras0: /* Metadata */ new section</title>
		<link rel="alternate" type="text/html" href="https://wiki.mathnt.net/index.php?title=Learning_rate&amp;diff=47151&amp;oldid=prev"/>
		<updated>2020-12-26T12:24:12Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;메타데이터: &lt;/span&gt; 새 문단&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 12:24, 26 December 2020&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l108&quot; &gt;Line 108:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 108:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Sources===&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Sources===&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;  &amp;lt;references /&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;  &amp;lt;references /&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;== Metadata ==&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;===Wikidata===&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* ID :  [https://www.wikidata.org/wiki/Q65121812 Q65121812]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Pythagoras0</name></author>
	</entry>
	<entry>
		<id>https://wiki.mathnt.net/index.php?title=Learning_rate&amp;diff=46165&amp;oldid=prev</id>
		<title>Pythagoras0: /* Notes */ new section</title>
		<link rel="alternate" type="text/html" href="https://wiki.mathnt.net/index.php?title=Learning_rate&amp;diff=46165&amp;oldid=prev"/>
		<updated>2020-12-21T08:54:03Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;노트: &lt;/span&gt; 새 문단&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;새 문서&lt;/b&gt;&lt;/p&gt;&lt;div&gt;== 노트 ==&lt;br /&gt;
&lt;br /&gt;
===Wikidata===&lt;br /&gt;
* ID :  [https://www.wikidata.org/wiki/Q65121812 Q65121812]&lt;br /&gt;
===Corpus===&lt;br /&gt;
# The learning rate is a hyperparameter that controls how much we are adjusting the weights of our network with respect to the loss gradient.&amp;lt;ref name=&amp;quot;ref_f07907a5&amp;quot;&amp;gt;[https://blog.dataiku.com/the-learning-rate-finder-technique-how-reliable-is-it The Learning Rate Finder Technique: How Reliable Is It?]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# Indeed, one of the many challenges in training deep neural networks has historically been the selection of a good learning rate — that’s where the Learning Rate Range Test (LRRT) comes in.&amp;lt;ref name=&amp;quot;ref_f07907a5&amp;quot; /&amp;gt;&lt;br /&gt;
# The LRRT was proposed in 2015 and made popular by the fast.ai’s deep learning library as the Learning Rate Finder, also known as the LRFinder.&amp;lt;ref name=&amp;quot;ref_f07907a5&amp;quot; /&amp;gt;&lt;br /&gt;
# The LRRT consists of, at most, one epoch of training iterations, where the learning rate is increased at every mini-batch of data.&amp;lt;ref name=&amp;quot;ref_f07907a5&amp;quot; /&amp;gt;&lt;br /&gt;
# Under a weight-update algorithm, a low learning rate would make the network learn slowly, and a high learning rate would make the weights and error function diverge.&amp;lt;ref name=&amp;quot;ref_77548098&amp;quot;&amp;gt;[https://link.springer.com/chapter/10.1007/978-3-540-92137-0_76 Theoretical and Empirical Analysis of the Learning Rate and Momentum Factor in Neural Network Modeling for Stock Prediction]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# A backpropagation learning algorithm for feedforward neural networks with an adaptive learning rate is derived.&amp;lt;ref name=&amp;quot;ref_749364d0&amp;quot;&amp;gt;[https://link.springer.com/article/10.1023/A:1009686825582 Towards the Optimal Learning Rate for Backpropagation]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# Indeed, the derived optimal adaptive learning rate of a neural network trained by backpropagation degenerates to the learning rate of the NLMS for a linear activation function of a neuron.&amp;lt;ref name=&amp;quot;ref_749364d0&amp;quot; /&amp;gt;&lt;br /&gt;
# By continuity, the optimal adaptive learning rate for neural networks imposes additional stabilisation effects to the traditional backpropagation learning algorithm.&amp;lt;ref name=&amp;quot;ref_749364d0&amp;quot; /&amp;gt;&lt;br /&gt;
# The mathematical considerations that go into the derivation of backpropagation require that the learning rate be as small as possible.&amp;lt;ref name=&amp;quot;ref_99830a96&amp;quot;&amp;gt;[https://www.spiedigitallibrary.org/conference-proceedings-of-spie/1966/0000/Adaptive-learning-rate-for-increasing-learning-speed-in-backpropagation-networks/10.1117/12.152622.full Adaptive learning rate for increasing learning speed in backpropagation networks]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# Our simulation results suggest that the adaptive learning rate modification helps substantially speed up the convergence of backpropagation algorithm.&amp;lt;ref name=&amp;quot;ref_99830a96&amp;quot; /&amp;gt;&lt;br /&gt;
# Furthermore, it makes the initial choice of the learning rate fairly unimportant as our method allows the learning rate to change and settle at a reasonable value for the specific problem.&amp;lt;ref name=&amp;quot;ref_99830a96&amp;quot; /&amp;gt;&lt;br /&gt;
# Simulation results indicate that our heuristic modification matches the performance of backpropagation with the quasi-optimal learning rate.&amp;lt;ref name=&amp;quot;ref_99830a96&amp;quot; /&amp;gt;&lt;br /&gt;
# This can be counteracted by slowly decreasing the learning rate.&amp;lt;ref name=&amp;quot;ref_99fb1f7b&amp;quot;&amp;gt;[https://wiki.tum.de/display/lfdv/Adaptive+Learning+Rate+Method Adaptive Learning Rate Method]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# Learning rate is one such hyperparameter that defines the adjustment in the weights of our network with respect to the loss gradient.&amp;lt;ref name=&amp;quot;ref_76e0046a&amp;quot;&amp;gt;[https://heartbeat.fritz.ai/introduction-to-learning-rates-in-machine-learning-6ed685c16506 Introduction to Learning Rates in Machine Learning]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# With a desirable learning rate, he would quickly understand that black color is not an important feature of birds and would look for another feature.&amp;lt;ref name=&amp;quot;ref_76e0046a&amp;quot; /&amp;gt;&lt;br /&gt;
# But with a low learning rate, he would consider the yellow bird an outlier and would continue to believe that all birds are black.&amp;lt;ref name=&amp;quot;ref_76e0046a&amp;quot; /&amp;gt;&lt;br /&gt;
# It’s really important to achieve a desirable learning rate because both low and high learning rates result in wasted time and resources.&amp;lt;ref name=&amp;quot;ref_76e0046a&amp;quot; /&amp;gt;&lt;br /&gt;
# The learning rate warmup for Adam is a must-have trick for stable training in certain situations (or eps tuning).&amp;lt;ref name=&amp;quot;ref_77a92449&amp;quot;&amp;gt;[https://github.com/LiyuanLucasLiu/RAdam LiyuanLucasLiu/RAdam: On the Variance of the Adaptive Learning Rate and Beyond]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# The variance of the adaptive learning rate is simulated and plotted in Figure 1 (blue curve).&amp;lt;ref name=&amp;quot;ref_77a92449&amp;quot; /&amp;gt;&lt;br /&gt;
# A better idea is to select the dynamic learning rate which decreases over time because it allows the algorithm to swiftly identify the point.&amp;lt;ref name=&amp;quot;ref_d788f200&amp;quot;&amp;gt;[https://medium.com/analytics-vidhya/cost-function-learning-rate-and-gradient-descent-in-machine-learning-3dfd033e2d59 Cost Function, Learning rate, and Gradient Descent in Machine learning]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# It is common practice to decay the learning rate.&amp;lt;ref name=&amp;quot;ref_90c13e42&amp;quot;&amp;gt;[https://research.google/pubs/pub46644/ Don&amp;#039;t decay the learning rate, increase the batch size – Google Research]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# Operationally, stochastic inference iteratively subsamples from the data, analyzes the subsample, and updates parameters with a decreasing learning rate.&amp;lt;ref name=&amp;quot;ref_e8f2b32e&amp;quot;&amp;gt;[http://proceedings.mlr.press/v28/ranganath13.html An Adaptive Learning Rate for Stochastic Variational Inference]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# We solve this problem by developing an adaptive learning rate for stochastic inference.&amp;lt;ref name=&amp;quot;ref_e8f2b32e&amp;quot; /&amp;gt;&lt;br /&gt;
# Inference with the adaptive learning rate converges faster and to a better approximation than the best settings of hand-tuned rates.&amp;lt;ref name=&amp;quot;ref_e8f2b32e&amp;quot; /&amp;gt;&lt;br /&gt;
# Abstract: Intriguing empirical evidence exists that deep learning can work well with exotic schedules for varying the learning rate.&amp;lt;ref name=&amp;quot;ref_5a29a5ab&amp;quot;&amp;gt;[https://iclr.cc/virtual_2020/poster_rJg8TeSFDH.html ICLR: An Exponential Learning Rate Schedule for Deep Learning]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# Training can be done using SGD with momentum and an exponentially increasing learning rate schedule, i.e., learning rate increases by some (1 + α) factor in every epoch for some α &amp;gt; 0.&amp;lt;ref name=&amp;quot;ref_5a29a5ab&amp;quot; /&amp;gt;&lt;br /&gt;
# The learning rate controls how quickly the model is adapted to the problem.&amp;lt;ref name=&amp;quot;ref_fd35f135&amp;quot;&amp;gt;[https://stackoverflow.com/questions/58266988/learning-rate-gradient-descent-difference Learning rate &amp;amp; gradient descent difference?]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# The challenge of training deep learning neural networks involves carefully selecting the learning rate.&amp;lt;ref name=&amp;quot;ref_fd35f135&amp;quot; /&amp;gt;&lt;br /&gt;
# It is therefore often necessary to reduce the global learning rate µ when using a lot of momentum (m close to 1).&amp;lt;ref name=&amp;quot;ref_02a92368&amp;quot;&amp;gt;[https://cnl.salk.edu/~schraudo/teach/NNcourse/momrate.html Momentum and Learning Rate Adaptation]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# A useful batch method for adapting the global learning rate µ is the bold driver algorithm.&amp;lt;ref name=&amp;quot;ref_02a92368&amp;quot; /&amp;gt;&lt;br /&gt;
# In order to actually reach the minimum, and stay there, we must anneal (gradually lower) the global learning rate.&amp;lt;ref name=&amp;quot;ref_02a92368&amp;quot; /&amp;gt;&lt;br /&gt;
# At time t, we would like to change the learning rate (before changing the weight) such that the loss E(t+1) at the next time step is reduced.&amp;lt;ref name=&amp;quot;ref_02a92368&amp;quot; /&amp;gt;&lt;br /&gt;
# Learning rate is an important hyperparameter that controls how much we adjust the weights in the network according to the gradient.&amp;lt;ref name=&amp;quot;ref_7e86914f&amp;quot;&amp;gt;[https://analyticsindiamag.com/comprehensive-guide-to-learning-rate-algorithms-with-python-codes/ Comprehensive Guide To Learning Rate Algorithms (With Python Codes)]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# Here, η represents the learning rate.&amp;lt;ref name=&amp;quot;ref_7e86914f&amp;quot; /&amp;gt;&lt;br /&gt;
# Adaptive learning rate algorithm – Here, the optimizers help in changing the learning rate throughout the process of training.&amp;lt;ref name=&amp;quot;ref_7e86914f&amp;quot; /&amp;gt;&lt;br /&gt;
# One solution to this is fixing our learning rate large enough to escape the saddle.&amp;lt;ref name=&amp;quot;ref_7e86914f&amp;quot; /&amp;gt;&lt;br /&gt;
# # Decay Learning Rate, pass validation accuracy for tracking at every epoch print(&amp;#039;Epoch {} completed&amp;#039;.&amp;lt;ref name=&amp;quot;ref_1e72376a&amp;quot;&amp;gt;[https://www.deeplearningwizard.com/deep_learning/boosting_models_pytorch/lr_scheduling/ Learning Rate Scheduling]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# Learning rate decay (lrDecay) is a \emph{de facto} technique for training modern neural networks.&amp;lt;ref name=&amp;quot;ref_6d161af3&amp;quot;&amp;gt;[https://openreview.net/forum?id=r1eOnh4YPB How Does Learning Rate Decay Help Modern Neural Networks?]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# It starts with a large learning rate and then decays it multiple times.&amp;lt;ref name=&amp;quot;ref_6d161af3&amp;quot; /&amp;gt;&lt;br /&gt;
# We provide another novel explanation: an initially large learning rate suppresses the network from memorizing noisy data while decaying the learning rate improves the learning of complex patterns.&amp;lt;ref name=&amp;quot;ref_6d161af3&amp;quot; /&amp;gt;&lt;br /&gt;
# In this post, I’m describing a simple and powerful way to find a reasonable learning rate that I learned from fast.ai Deep Learning course.&amp;lt;ref name=&amp;quot;ref_fe63c7de&amp;quot;&amp;gt;[https://www.kdnuggets.com/2017/11/estimating-optimal-learning-rate-deep-neural-network.html Estimating an Optimal Learning Rate For a Deep Neural Network]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# All of them let you set the learning rate.&amp;lt;ref name=&amp;quot;ref_fe63c7de&amp;quot; /&amp;gt;&lt;br /&gt;
# If the learning rate is high, then training may not converge or even diverge.&amp;lt;ref name=&amp;quot;ref_fe63c7de&amp;quot; /&amp;gt;&lt;br /&gt;
# There are multiple ways to select a good starting point for the learning rate.&amp;lt;ref name=&amp;quot;ref_fe63c7de&amp;quot; /&amp;gt;&lt;br /&gt;
# Learning rate is a hyper-parameter that controls the weights of our neural network with respect to the loss gradient.&amp;lt;ref name=&amp;quot;ref_b7f61b67&amp;quot;&amp;gt;[https://www.educative.io/edpresso/learning-rate-in-machine-learning Learning Rate in Machine learning]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# A desirable learning rate is low enough that the network converges to something useful, but high enough that it can be trained in a reasonable amount of time.&amp;lt;ref name=&amp;quot;ref_b7f61b67&amp;quot; /&amp;gt;&lt;br /&gt;
# Selecting a learning rate is an example of a &amp;quot;meta-problem&amp;quot; known as hyperparameter optimization.&amp;lt;ref name=&amp;quot;ref_781c625d&amp;quot;&amp;gt;[https://datascience.stackexchange.com/questions/410/choosing-a-learning-rate Choosing a learning rate]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# The best learning rate depends on the problem at hand, as well as on the architecture of the model being optimized, and even on the state of the model in the current optimization process!&amp;lt;ref name=&amp;quot;ref_781c625d&amp;quot; /&amp;gt;&lt;br /&gt;
# Simulated annealing is a technique for optimizing a model whereby one starts with a large learning rate and gradually reduces the learning rate as optimization progresses.&amp;lt;ref name=&amp;quot;ref_781c625d&amp;quot; /&amp;gt;&lt;br /&gt;
# This can be combined with early stopping to optimize the model with one learning rate as long as progress is being made, then switch to a smaller learning rate once progress appears to slow.&amp;lt;ref name=&amp;quot;ref_781c625d&amp;quot; /&amp;gt;&lt;br /&gt;
# The learning rate is the most important hyperparameter for tuning neural networks.&amp;lt;ref name=&amp;quot;ref_9c16b1d5&amp;quot;&amp;gt;[https://mlexplained.com/2018/01/29/learning-rate-tuning-in-deep-learning-a-practical-guide/ Learning Rate Tuning in Deep Learning: A Practical Guide]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# The learning rate represents how large a step you take in that direction.&amp;lt;ref name=&amp;quot;ref_9c16b1d5&amp;quot; /&amp;gt;&lt;br /&gt;
# When we look at the left diagram, we see that both steps are in the direction of the gradient; the only difference is the step size.&amp;lt;ref name=&amp;quot;ref_9c16b1d5&amp;quot; /&amp;gt;&lt;br /&gt;
# If the step size is too large, your parameters go back and forth between points with a large loss and easily overshoot the minima (the point where the loss function is at its lowest).&amp;lt;ref name=&amp;quot;ref_9c16b1d5&amp;quot; /&amp;gt;&lt;br /&gt;
# Step #1: We start by defining an upper and lower bound on our learning rate.&amp;lt;ref name=&amp;quot;ref_1bfd15cf&amp;quot;&amp;gt;[https://www.pyimagesearch.com/2019/08/05/keras-learning-rate-finder/ Keras Learning Rate Finder]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# At 1e-10 the learning rate will be too small for our network to learn, while at 1e+1 the learning rate will be too large and our model will overfit.&amp;lt;ref name=&amp;quot;ref_1bfd15cf&amp;quot; /&amp;gt;&lt;br /&gt;
# We start by defining an upper and lower bound on our learning rate.&amp;lt;ref name=&amp;quot;ref_1bfd15cf&amp;quot; /&amp;gt;&lt;br /&gt;
# After each batch update, we exponentially increase our learning rate.&amp;lt;ref name=&amp;quot;ref_1bfd15cf&amp;quot; /&amp;gt;&lt;br /&gt;
# With standard steepest descent, the learning rate is held constant throughout training.&amp;lt;ref name=&amp;quot;ref_495bbd62&amp;quot;&amp;gt;[https://www.mathworks.com/help/deeplearning/ref/traingda.html Gradient descent with adaptive learning rate backpropagation]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# If the learning rate is set too high, the algorithm can oscillate and become unstable.&amp;lt;ref name=&amp;quot;ref_495bbd62&amp;quot; /&amp;gt;&lt;br /&gt;
# If the learning rate is too small, the algorithm takes too long to converge.&amp;lt;ref name=&amp;quot;ref_495bbd62&amp;quot; /&amp;gt;&lt;br /&gt;
# You can improve the performance of the steepest descent algorithm if you allow the learning rate to change during the training process.&amp;lt;ref name=&amp;quot;ref_495bbd62&amp;quot; /&amp;gt;&lt;br /&gt;
# One of the simplest learning rate strategies is to have a fixed learning rate throughout the training process.&amp;lt;ref name=&amp;quot;ref_5b3f35e0&amp;quot;&amp;gt;[https://mxnet.apache.org/versions/1.7/api/python/docs/tutorials/packages/gluon/training/learning_rates/learning_rate_schedules.html Learning Rate Schedules — Apache MXNet documentation]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# Choosing a small learning rate allows the optimizer to find good solutions, but this comes at the expense of limiting the initial speed of convergence.&amp;lt;ref name=&amp;quot;ref_5b3f35e0&amp;quot; /&amp;gt;&lt;br /&gt;
# Schedules define how the learning rate changes over time and are typically specified for each epoch or iteration (i.e. batch) of training.&amp;lt;ref name=&amp;quot;ref_5b3f35e0&amp;quot; /&amp;gt;&lt;br /&gt;
# All of these schedules define the learning rate for a given iteration, and it is expected that iterations start at 1 rather than 0.&amp;lt;ref name=&amp;quot;ref_5b3f35e0&amp;quot; /&amp;gt;&lt;br /&gt;
# When training a model, it is often recommended to lower the learning rate as the training progresses.&amp;lt;ref name=&amp;quot;ref_89df335a&amp;quot;&amp;gt;[https://www.tensorflow.org/api_docs/python/tf/compat/v1/train/exponential_decay tf.compat.v1.train.exponential_decay]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# This function applies an exponential decay function to a provided initial learning rate.&amp;lt;ref name=&amp;quot;ref_89df335a&amp;quot; /&amp;gt;&lt;br /&gt;
# It requires a global_step value to compute the decayed learning rate.&amp;lt;ref name=&amp;quot;ref_89df335a&amp;quot; /&amp;gt;&lt;br /&gt;
# The function returns the decayed learning rate.&amp;lt;ref name=&amp;quot;ref_89df335a&amp;quot; /&amp;gt;&lt;br /&gt;
# Computing a more accurate learning rate.&amp;lt;ref name=&amp;quot;ref_6d3e8006&amp;quot;&amp;gt;[https://www.researchgate.net/publication/331208233_A_novel_adaptive_learning_rate_scheduler_for_deep_neural_networks (PDF) A novel adaptive learning rate scheduler for deep neural networks]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# Learning rate is used to scale the magnitude of parameter updates during gradient descent.&amp;lt;ref name=&amp;quot;ref_b5a56aaf&amp;quot;&amp;gt;[https://www.mygreatlearning.com/blog/understanding-learning-rate-in-machine-learning/ Understanding Learning Rate in Machine Learning]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# The choice of the value for learning rate can impact two things: 1) how fast the algorithm learns and 2) whether the cost function is minimized or not.&amp;lt;ref name=&amp;quot;ref_b5a56aaf&amp;quot; /&amp;gt;&lt;br /&gt;
# It can be seen that for an optimal value of the learning rate, the cost function value is minimized in a few iterations (smaller time).&amp;lt;ref name=&amp;quot;ref_b5a56aaf&amp;quot; /&amp;gt;&lt;br /&gt;
# If the learning rate used is lower than the optimal value, the number of iterations/epochs required to minimize the cost function is high (takes longer time).&amp;lt;ref name=&amp;quot;ref_b5a56aaf&amp;quot; /&amp;gt;&lt;br /&gt;
# Gradient descent algorithms multiply the gradient by a scalar known as the learning rate (also sometimes called step size) to determine the next point.&amp;lt;ref name=&amp;quot;ref_af5c77ad&amp;quot;&amp;gt;[https://developers.google.com/machine-learning/crash-course/reducing-loss/learning-rate Reducing Loss: Learning Rate]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# Most machine learning programmers spend a fair amount of time tuning the learning rate.&amp;lt;ref name=&amp;quot;ref_af5c77ad&amp;quot; /&amp;gt;&lt;br /&gt;
# The learning rate hyperparameter controls the rate or speed at which the model learns.&amp;lt;ref name=&amp;quot;ref_d9f725a8&amp;quot;&amp;gt;[https://machinelearningmastery.com/learning-rate-for-deep-learning-neural-networks/ How to Configure the Learning Rate When Training Deep Learning Neural Networks]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# Generally, a large learning rate allows the model to learn faster, at the cost of arriving on a sub-optimal final set of weights.&amp;lt;ref name=&amp;quot;ref_d9f725a8&amp;quot; /&amp;gt;&lt;br /&gt;
# When the learning rate is too large, gradient descent can inadvertently increase rather than decrease the training error.&amp;lt;ref name=&amp;quot;ref_d9f725a8&amp;quot; /&amp;gt;&lt;br /&gt;
# Therefore, we should not use a learning rate that is too large or too small.&amp;lt;ref name=&amp;quot;ref_d9f725a8&amp;quot; /&amp;gt;&lt;br /&gt;
# One of the key hyperparameters to set in order to train a neural network is the learning rate for gradient descent.&amp;lt;ref name=&amp;quot;ref_db5af4c3&amp;quot;&amp;gt;[https://www.jeremyjordan.me/nn-learning-rate/ Setting the learning rate of your neural network.]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# If your learning rate is set too low, training will progress very slowly as you are making very tiny updates to the weights in your network.&amp;lt;ref name=&amp;quot;ref_db5af4c3&amp;quot; /&amp;gt;&lt;br /&gt;
# However, if your learning rate is set too high, it can cause undesirable divergent behavior in your loss function.&amp;lt;ref name=&amp;quot;ref_db5af4c3&amp;quot; /&amp;gt;&lt;br /&gt;
# 3e-4 is the best learning rate for Adam, hands down.&amp;lt;ref name=&amp;quot;ref_db5af4c3&amp;quot; /&amp;gt;&lt;br /&gt;
# In setting a learning rate, there is a trade-off between the rate of convergence and overshooting.&amp;lt;ref name=&amp;quot;ref_5e947bec&amp;quot;&amp;gt;[https://en.wikipedia.org/wiki/Learning_rate Learning rate]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# While the descent direction is usually determined from the gradient of the loss function, the learning rate determines how big a step is taken in that direction.&amp;lt;ref name=&amp;quot;ref_5e947bec&amp;quot; /&amp;gt;&lt;br /&gt;
# A learning rate schedule changes the learning rate during learning and is most often changed between epochs/iterations.&amp;lt;ref name=&amp;quot;ref_5e947bec&amp;quot; /&amp;gt;&lt;br /&gt;
# Time-based learning schedules alter the learning rate depending on the learning rate of the previous time iteration.&amp;lt;ref name=&amp;quot;ref_5e947bec&amp;quot; /&amp;gt;&lt;br /&gt;
# This algorithm needs a differentiable transfer function, and the adaptive step size is recommended for Elman&amp;#039;s RNN because of the delays involved in the training scheme.&amp;lt;ref name=&amp;quot;ref_33cd6b8d&amp;quot;&amp;gt;[https://www.sciencedirect.com/topics/computer-science/adaptive-learning-rate Adaptive Learning Rate - an overview]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# Learning rate is a hyper-parameter that controls how much we are adjusting the weights of our network with respect to the loss gradient.&amp;lt;ref name=&amp;quot;ref_e999e86c&amp;quot;&amp;gt;[https://towardsdatascience.com/understanding-learning-rates-and-how-it-improves-performance-in-deep-learning-d0d4059c1c10 Understanding Learning Rates and How It Improves Performance in Deep Learning]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# Furthermore, the learning rate affects how quickly our model can converge to a local minima (aka arrive at the best accuracy).&amp;lt;ref name=&amp;quot;ref_e999e86c&amp;quot; /&amp;gt;&lt;br /&gt;
# In practice, our learning rate should ideally be somewhere to the left of the lowest point of the graph (as demonstrated in the graph below).&amp;lt;ref name=&amp;quot;ref_e999e86c&amp;quot; /&amp;gt;&lt;br /&gt;
# One only needs to type in the following command to start finding the most optimal learning rate to use before training a neural network.&amp;lt;ref name=&amp;quot;ref_e999e86c&amp;quot; /&amp;gt;&lt;br /&gt;
# (this would be the gradient multiplied by the learning rate).&amp;lt;ref name=&amp;quot;ref_1f6a2613&amp;quot;&amp;gt;[https://cs231n.github.io/neural-networks-3/ CS231n Convolutional Neural Networks for Visual Recognition]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# If it is lower than this then the learning rate might be too low.&amp;lt;ref name=&amp;quot;ref_1f6a2613&amp;quot; /&amp;gt;&lt;br /&gt;
# Unconverged network, improperly set learning rate, very low weight regularization penalty.&amp;lt;ref name=&amp;quot;ref_1f6a2613&amp;quot; /&amp;gt;&lt;br /&gt;
# In training deep networks, it is usually helpful to anneal the learning rate over time.&amp;lt;ref name=&amp;quot;ref_1f6a2613&amp;quot; /&amp;gt;&lt;br /&gt;
# The learning rate is a hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated.&amp;lt;ref name=&amp;quot;ref_0320309d&amp;quot;&amp;gt;[https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks/ Understand the Impact of Learning Rate on Neural Network Performance]&amp;lt;/ref&amp;gt;&lt;br /&gt;
# The learning rate may be the most important hyperparameter when configuring your neural network.&amp;lt;ref name=&amp;quot;ref_0320309d&amp;quot; /&amp;gt;&lt;br /&gt;
# The callbacks operate separately from the optimization algorithm, although they adjust the learning rate used by the optimization algorithm.&amp;lt;ref name=&amp;quot;ref_0320309d&amp;quot; /&amp;gt;&lt;br /&gt;
# Keras provides the ReduceLROnPlateau that will adjust the learning rate when a plateau in model performance is detected, e.g. no change for a given number of training epochs.&amp;lt;ref name=&amp;quot;ref_0320309d&amp;quot; /&amp;gt;&lt;br /&gt;
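A hedged aside: the update rule and the exponential-decay schedule quoted above can be made concrete in a few lines of Python. The toy quadratic loss and the decayed_lr helper below are assumptions invented for this sketch; the decay formula mirrors the one documented for tf.compat.v1.train.exponential_decay.&lt;br /&gt;
 def decayed_lr(lr0, decay_rate, step, decay_steps):&lt;br /&gt;
     # Exponential decay: lr = lr0 * decay_rate ** (step / decay_steps)&lt;br /&gt;
     return lr0 * decay_rate ** (step / decay_steps)&lt;br /&gt;
 &lt;br /&gt;
 # Toy quadratic loss L(w) = (w - 3)^2, so dL/dw = 2 * (w - 3).&lt;br /&gt;
 w, lr0 = 0.0, 0.1&lt;br /&gt;
 for step in range(100):&lt;br /&gt;
     grad = 2.0 * (w - 3.0)&lt;br /&gt;
     lr = decayed_lr(lr0, decay_rate=0.96, step=step, decay_steps=10)&lt;br /&gt;
     w -= lr * grad  # the step is the gradient scaled by the learning rate&lt;br /&gt;
 &lt;br /&gt;
 print(round(w, 4))  # approaches 3.0 as the iterates converge&lt;br /&gt;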
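Similarly, the Learning Rate Range Test described above (start at a small lower bound and raise the learning rate exponentially after every mini-batch while recording the loss) can be sketched as below; the bounds and the lr_range_test name are illustrative assumptions, not the fast.ai implementation.&lt;br /&gt;
 def lr_range_test(num_batches, lr_min=1e-10, lr_max=10.0):&lt;br /&gt;
     lrs = []&lt;br /&gt;
     for i in range(num_batches):&lt;br /&gt;
         # Exponential ramp from lr_min up to lr_max across one epoch.&lt;br /&gt;
         lr = lr_min * (lr_max / lr_min) ** (i / max(1, num_batches - 1))&lt;br /&gt;
         lrs.append(lr)&lt;br /&gt;
         # ...train one mini-batch with this lr and record the loss...&lt;br /&gt;
     return lrs&lt;br /&gt;
 &lt;br /&gt;
 print(lr_range_test(5))  # ramps from 1e-10 up to 10.0&lt;br /&gt;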
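Finally, the ReduceLROnPlateau callback mentioned in the last item attaches to Keras training as follows. This is a minimal sketch assuming a TensorFlow/Keras setup; the factor, patience, and floor are chosen purely for illustration, and the model and data are placeholders.&lt;br /&gt;
 from tensorflow.keras.callbacks import ReduceLROnPlateau&lt;br /&gt;
 &lt;br /&gt;
 # Halve the learning rate once val_loss has stopped improving for&lt;br /&gt;
 # 5 consecutive epochs, never dropping below 1e-6.&lt;br /&gt;
 reduce_lr = ReduceLROnPlateau(monitor=&amp;#039;val_loss&amp;#039;, factor=0.5,&lt;br /&gt;
                               patience=5, min_lr=1e-6)&lt;br /&gt;
 # model.fit(x_train, y_train, validation_data=(x_val, y_val),&lt;br /&gt;
 #           epochs=50, callbacks=[reduce_lr])&lt;br /&gt;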
===Sources===&lt;br /&gt;
 &amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Pythagoras0</name></author>
	</entry>
</feed>