How to debug Linux hang?
We are using a custom board based on the BeagleBone Black, running kernel version 3.12.
The system hangs during one of the init scripts (the one that brings up WiFi).
The hang happens after a random number of power cycles.
Nothing works during the hang; the system appears to be completely frozen and does not even respond to SysRq keys.
I assume it is stuck in ISR code, which would explain why nothing else runs.
Unfortunately, when we enable 'Detect hung tasks' (DETECT_HUNG_TASK), we don't see the issue. :(
The only thing that works is the watchdog: if it is enabled, the system reboots once the watchdog timer expires, and then recovers.
However, we want to find out where the issue is.
Any suggestions?
I thought of using softdog with a repair script to print some messages, but I assume an external interrupt has higher priority, so if its handler runs and hangs, the softdog timer will not get a chance to run either, right?
The randomness of the bug makes it much more difficult to debug. :(
Any help appreciated.
linux linux-kernel embedded
asked Jun 6 '16 at 6:16, edited Jun 6 '16 at 7:00
– AnkurTank
Did you look carefully at the logs when you rebooted after it hung?
– Julie Pelletier
Jun 6 '16 at 6:23
Yes, we don't see anything there :( We suspect the WiFi-over-SDIO driver, but we are not sure, because when we don't load the WiFi driver module we don't see the hang. However, sometimes we don't see it even when the driver is loaded.
– AnkurTank
Jun 6 '16 at 6:32
@JuliePelletier Logs usually cannot be written after a hard hang, similar to kernel panic
– user140866
Jun 6 '16 at 6:46
Did you try to redirect the console to serial and boot the kernel with loglevel=7 and without quiet (if present)? Were any obscure messages coming from the kernel?
– user140866
Jun 6 '16 at 6:47
DETECT_HUNG_TASK is usually for userspace tasks that hang inside a system call. If the hang comes from kernel code (for example, a driver), it is useless.
– user140866
Jun 6 '16 at 6:52
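For illustration only, the serial-console suggestion from the comments could translate into kernel command-line arguments like the ones below. The device name and baud rate are assumptions: ttyO0 at 115200 is the usual debug UART name with the omap-serial driver on 3.x kernels for this SoC family, but a custom board may route a different UART. Any existing quiet argument would need to be removed as well.

```
console=ttyO0,115200n8 loglevel=7
```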
1 Answer
We read through the code, as suggested in the comments, and found the section of the patch where the system may enter an infinite loop (in IRQ context) and never come out of it.
However, when we put a printk in that IRQ function, the issue could no longer be reproduced (a timing issue, you know!).
So finally my colleague tried the old-school method of toggling a GPIO, and it helped. Even that was difficult, because more than two GPIO toggle points would prevent the issue from reproducing.
Inside the function, he used the GPIO toggle as follows:
func()
    // set GPIO high
    ... suspect code ...
    ....
    // set GPIO low
That is how he tracked down the problematic code. The solution is available in linux-4.1; he has applied the fix and is testing it.
@ShankarSM: If you are reading this, all credit goes to you for tracking it down :-)
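The bracketing idea above can be sketched in C roughly as follows. This is not the code from the answer, only a minimal host-side illustration: DEBUG_GPIO, struct fake_dev, suspect_section() and instrumented_handler() are hypothetical names, the poll limit exists only so the simulation can terminate, and gpio_set_value() is stubbed here so the sketch compiles off-target (in a driver one would use the kernel's gpio_set_value() from <linux/gpio.h> on a GPIO that has already been requested and set as an output).

```c
#include <stdbool.h>

/* Hypothetical debug pin; on real hardware pick a free GPIO wired to a test point. */
#define DEBUG_GPIO 60u

/*
 * Host-side stand-in for the kernel's gpio_set_value(), so this sketch can be
 * compiled and run off-target. In a driver, drop this stub and include
 * <linux/gpio.h> instead.
 */
static int debug_pin_level;

static void gpio_set_value(unsigned int gpio, int value)
{
    (void)gpio;
    debug_pin_level = value;
}

/* Hypothetical device state: events waiting to be acknowledged. */
struct fake_dev {
    unsigned int pending;
    bool stuck;             /* true: the pending count never decreases */
};

/*
 * Hypothetical suspect code: polls until no events are pending, as an IRQ
 * handler might. max_polls is an assumption added for the simulation only;
 * the real loop would spin forever.
 */
static bool suspect_section(struct fake_dev *dev, unsigned int max_polls)
{
    while (dev->pending != 0) {
        if (max_polls-- == 0)
            return false;   /* simulated hang */
        if (!dev->stuck)
            dev->pending--; /* normal case: one event acknowledged per poll */
    }
    return true;
}

/* Bracket the suspect code with a single high/low pair on the debug pin. */
static bool instrumented_handler(struct fake_dev *dev)
{
    gpio_set_value(DEBUG_GPIO, 1);      /* pin high: entering suspect code */
    if (!suspect_section(dev, 1000))
        return false;                   /* pin is left high, as it would be on a hung board */
    gpio_set_value(DEBUG_GPIO, 0);      /* pin low: suspect code completed */
    return true;
}
```

On the real board, an oscilloscope or logic analyzer on the pin would show it stuck high if the CPU never leaves the bracketed section. A single GPIO write perturbs timing far less than a printk, which is presumably why this approach still reproduced the hang.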
answered Jun 8 '16 at 10:53
– AnkurTank