How to debug Linux hang?



We are using a custom board based on the BeagleBone Black, with kernel version 3.12.

The system hangs during one of the init scripts (the one that brings up WiFi). The hang happens only after a random number of power cycles.

During the hang nothing works: the system looks completely frozen and does not even respond to SysRq keys.

I assume the hang is in ISR (interrupt handler) code, which would explain why nothing else works.

Unluckily, when we enable 'Detect hung task' (DETECT_HUNG_TASK), we don't see the issue. :(

The only thing that works is the watchdog: if it is enabled, it reboots the system when the watchdog timer expires, and the system recovers.

However, we want to find out where the issue is. Any suggestions?

I thought of using softdog with a repair script to print some messages, but I assume the external interrupt has higher priority: if the CPU hangs inside its handler, the softdog timer will not get a chance to run either, right?

The randomness of the bug makes it much more difficult to debug. :(

Any help appreciated.

  • Did you look carefully at the logs when you rebooted after it hung?
    – Julie Pelletier
    Jun 6 '16 at 6:23

  • Yes, we don't see anything there. :( We suspect the WiFi-over-SDIO driver, but we are not sure: when we don't load the WiFi driver module, we never see the hang, yet even with the module loaded the hang does not always occur.
    – AnkurTank
    Jun 6 '16 at 6:32

  • @JuliePelletier Logs usually cannot be written after a hard hang, similar to a kernel panic.
    – user140866
    Jun 6 '16 at 6:46

  • Did you try redirecting the console to serial and booting the kernel with loglevel=7 and without quiet (if it is set)? Were any obscure messages coming from the kernel?
    – user140866
    Jun 6 '16 at 6:47

  • DETECT_HUNG_TASK only reports tasks stuck in uninterruptible sleep, typically a task hanging inside a system call. The check itself runs in a kernel thread (khungtaskd), so if the hang comes from kernel code that never gives up the CPU (for example, a driver spinning in an interrupt handler), it is useless.
    – user140866
    Jun 6 '16 at 6:52

Tags: linux, linux-kernel, embedded

asked Jun 6 '16 at 6:16 by AnkurTank (edited Jun 6 '16 at 7:00)

1 Answer
Well, we did some code reading, as suggested in the comments, and found a section of the patch where the system may enter an infinite loop (in IRQ context) and never come out of it.



However, when we put a printk in that IRQ function, the issue no longer reproduced (a timing issue, you know!).



So finally my colleague tried the old-school method of toggling a GPIO, and it helped. Even that was tricky: adding more than two GPIO toggles would prevent the issue from reproducing.



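Inside the function, he toggled a GPIO around the suspect code, as follows (DEBUG_GPIO is an illustrative name for a spare pin):

    #include <linux/gpio.h>

    /* DEBUG_GPIO: hypothetical spare pin, requested earlier with gpio_request(). */
    static void func(void)
    {
            gpio_set_value(DEBUG_GPIO, 1);  /* entering suspect code: pin high */

            /* ... suspect code ... */

            gpio_set_value(DEBUG_GPIO, 0);  /* left suspect code: pin low */
    }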



That's how he tracked down the problematic code. A fix for it is available in linux-4.1; he has applied it and is testing it.



@ShankarSM: If you are reading this, all credit goes to you for tracking it down. :-)

answered Jun 8 '16 at 10:53 by AnkurTank