System information:
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on a mobile device: No
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): unknown 2.0.0 (see below)
- Python version: Python 3.7.6 (Anaconda)
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- GPU model and memory: Nvidia GeForce RTX 2070 8GB

Describe the current behavior:
Training a CNN using tf.() and TensorFlow's data pipeline with TFRecord files randomly seems to freeze/stop/hang. Whenever this happens, the console process stays open but CPU/GPU utilization returns to 0%. Using verbose=1 as well as TensorBoard also shows that no progress is being made anymore. If I restart the training without rebooting my PC, the hang is much more likely to occur again.

Looking into Process Explorer, I can see that there is only one really active thread; a call stack for that thread is attached below. The upper few frames (nvcuda.dll!cuProfilerStop) change when refreshing, but the rest of the stack stays constant.

The data pipeline uses dataset.map() with num_parallel_calls = tf. for multiprocessing and contains two tf.numpy_function()s (after which I have to use tf.ensure_shape to recover the correct shape of the data). Since I was not able to reproduce the problem reliably, I have no idea which part of my code might actually be important. Unfortunately, determining the source of a hung application usually requires quite a bit of detective work.

Call stack of the active thread:
_pywrap_tensorflow_internal.pyd!tensorflow::WindowsFileSystem::TranslateName+0x255
_pywrap_tensorflow_internal.pyd!Eigen::ThreadPoolTempl::WorkerLoop+0x701
_pywrap_tensorflow_internal.pyd!Eigen::ThreadPoolTempl::WorkerLoop+0x3f6
_pywrap_tensorflow_internal.pyd!tensorflow::data::DatasetBaseIterator::RecordElement+0x6f
_pywrap_tensorflow_internal.pyd!tensorflow::NodeDef::mutable_experimental_debug_info+0x11f78
_pywrap_tensorflow_internal.pyd!tensorflow::NodeDef::mutable_experimental_debug_info+0xef56
_pywrap_tensorflow_internal.pyd!std::default_delete::operator()+0x420
_pywrap_tensorflow_internal.pyd!std::default_delete::operator()+0x97d
Python37.dll!PyType_GetDocFromInternalDoc+0x22d
Python37.dll!PyEval_EvalFrameDefault+0x1174
Python37.dll!PyFunction_FastCallDict+0x1ba
Python37.dll!PyEval_EvalCodeWithName+0x1a6
Python37.dll!PyEval_EvalFrameDefault+0x7e4
Python37.dll!PyObject_FastCall_Prepend+0x6c
Python37.dll!PyFunction_FastCallDict+0xdd
Python37.dll!PyEval_EvalFrameDefault+0x403
Python37.dll!PyMethodDef_RawFastCallKeywords+0xa5c
Python37.dll!PyMethodDef_RawFastCallKeywords+0x387
_pywrap_tensorflow_internal.pyd!std::vector >::reserve+0x85a
_pywrap_tensorflow_internal.pyd!tensorflow::DataTypeSet::Contains+0x2350
_pywrap_tensorflow_internal.pyd!TFE_TensorHandleResolve+0x227
_pywrap_tensorflow_internal.pyd!std::vector >::operator=+0x623
_pywrap_tensorflow_internal.pyd!google::protobuf::RepeatedPtrField::Add+0x8f0c
_pywrap_tensorflow_internal.pyd!tensorflow::AllocationRecord::Clear+0x550c
_pywrap_tensorflow_internal.pyd!tensorflow::Env::NowSeconds+0xd16
_pywrap_tensorflow_internal.pyd!std::unique_ptr >::~unique_ptr >+0x10b4f
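The pipeline shape described above (dataset.map() with parallel calls wrapping tf.numpy_function, followed by tf.ensure_shape because tf.numpy_function erases static shape information) can be sketched roughly as follows. The loader function, the 32x32x3 shape, and the AUTOTUNE setting are illustrative assumptions, not the reporter's actual code:

```python
import numpy as np
import tensorflow as tf

def load_example(idx):
    # Hypothetical NumPy-side loader standing in for the issue's two
    # tf.numpy_function steps; returns a fixed-size float32 "image".
    return np.full((32, 32, 3), float(idx), dtype=np.float32)

def parse(idx):
    # tf.numpy_function runs arbitrary Python and returns a tensor
    # with an unknown static shape ...
    img = tf.numpy_function(load_example, [idx], tf.float32)
    # ... so tf.ensure_shape is needed afterwards to restore the
    # shape, exactly as the issue describes.
    return tf.ensure_shape(img, (32, 32, 3))

dataset = tf.data.Dataset.range(8).map(
    parse, num_parallel_calls=tf.data.experimental.AUTOTUNE)
```

Iterating the dataset then yields fully shaped elements, e.g. `next(iter(dataset)).shape == (32, 32, 3)`.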
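Since the "detective work" here involved attaching Process Explorer to find the one active thread, it may be worth noting that a Python-side snapshot of every thread's stack can be taken from within the process itself, without a native debugger. A minimal standard-library sketch (the worker thread and its name are made up for illustration):

```python
import sys
import threading
import time
import traceback

def worker():
    # Stand-in for a stuck data-pipeline thread.
    time.sleep(5)

t = threading.Thread(target=worker, name="data-loader", daemon=True)
t.start()
time.sleep(0.2)  # give the thread time to enter worker()

# sys._current_frames() maps thread id -> current stack frame;
# formatting each frame yields a per-thread traceback.
lines = []
for tid, frame in sys._current_frames().items():
    lines.append("Thread %d:\n" % tid)
    lines.extend(traceback.format_stack(frame))
report = "".join(lines)
print(report)
```

The resulting report shows which function each thread is blocked in, which is often enough to tell a stuck input pipeline apart from a stuck training loop.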