| 2014-04-26 cgf-000026 |
| |
| Forgot to clear to the end of screen when restoring a screen buffer. |
| That worked, for some reason, with Take Command but not with normal |
| consoles. I don't remember why I didn't resize the screen like a Linux |
| X terminal emulator but that might have made things work a little |
| better. Right now, there is a scroll bar for apps like less or vi and |
| that doesn't feel right. |
| |
| 2014-03-29 cgf-000025 |
| |
| Reorganized _cygtls::signal_debugger to avoid sending anything to the |
| debugger if we've seen an exception. I think it used to work that way |
| and I changed it without noting why. It sure seems like, if we don't do |
| this, gdb will see two signals and, it really does, when there has been |
| a Windows-recognized exception. |
| |
| 2014-02-15 cgf-000024 |
| |
| Wow. It's hard getting the screen handling stuff working correctly when |
| there is a screen buffer larger than screen size and vice versa. These |
| changes attempt to use SetConsoleWindowInfo whenever possible so that |
| the contents of the screen buffer are never wiped out. They also fix |
| some previously misbehaving "scroll the screen" commands. |
| |
| 2013-06-07 cgf-000023 |
| |
| Given the fact that the signal thread never exits there is no need |
| for exit_thread to ever block. So, nuke this code. |
| |
| 2013-01-31 cgf-000022 |
| |
| While researching the lftp behavior reported here: |
| |
| http://cygwin.com/ml/cygwin/2013-01/msg00390.html |
| |
| after a frenzy of rewriting sigflush handling to avoid blocking in the |
| signal thread (which is now and should ever have been illegal), it |
| dawned on me that we're not supposed to be flushing the tty input buffer |
| every time a signal is received. We're supposed to do this only when |
| the user hits a character (e.g., CTRL-C) which initiates a signal |
| action. So, I removed sigflush from sigpacket::process and moved it to |
| tc ()->kill_pgrp (). This function should only be called to send |
| signals related to the tty so this should have the desired effect. |
| |
| 2013-01-11 cgf-000021 |
| |
| Apparently I got the signal handling semantics of select() wrong again |
| even though I would have sworn that I tested this on Linux and Windows. |
| |
| select() is apparently *always* interrupted by a signal and *never* |
| restarts. Hopefully, between the comment added to the code and this |
| note, I'll not make this mistake again. |
| |
| 2013-01-02 cgf-000020 |
| |
| (This entry should have been checked in with the changes but... I forgot) |
| |
| This is a fairly big revamp of the way that windows signals are handled. |
| The intent is that all signal decisions should be made by the signal |
| thread; not by the exception handler. |
| |
| This required the ability to pass information from the exception handler |
| to the signal thread so, a si_cyg field was added to siginfo_t. This |
| contains information needed to generate a "core dump". Hmm. Haven't |
| checked to see if this breaks Cygwin's hardly-ever-used real core dump |
| facility. |
| |
| Anyway, I moved signal_exit back into exceptions.cc and removed it from |
| the sigpacket class. This function is now treated like a signal handler |
| function - Cygwin will cause it to be dispatched in the context of |
| whatever thread caught the signal. signal_exit also makes the |
| determination about when to write a stackdump. |
| |
| The signal-handler thread will no longer ever attempt to exit. It will |
| just keep processing signals (it will not process real signals after |
| Cygwin stops shutting down, however). This should make it impossible |
| for the signal thread to ever block waiting for the process lock since |
| it now never grabs the process lock. The signal-handler thread will |
| notify gdb when it gets a signal now but, in theory, gdb should see the |
| context of the thread which received the signal, not the signal-handler |
| thread. |
| |
| 2012-12-28 cgf-000019 |
| |
| (I forgot to mention that cgf-000018 was reverted. Although I never saw |
| a hang from this, I couldn't convince myself that one wasn't possible.) |
| |
| This fix attempts to correct a deadlock where, when a true Windows |
| signal arrives, Windows creates a thread which "does stuff" and attempts |
| to exit. In the process of exiting Cygwin grabs the process lock. If |
| the signal thread has seen the signal and wants to exit, it can't |
| because the newly-created thread now holds it. But, since the new |
| thread is relying on the signal thread to release its process lock, |
| it exits and the process lock is never released. |
| |
| To fix this, I removed calls to _cygtls::signal_exit in favor of |
| flagging that we were exiting by setting signal_exit_code (almost forgot |
| to mark that NO_COPY: that would have been fun). The new function |
| setup_signal_exit() now handles setting things up so that ReadFile loop |
| in wait_sig will do the right thing when it terminates. This function |
| may just Sleep indefinitely if a signal is being sent from a thread |
| other than the signal thread. wait_sig() was changed so that it will |
| essentially drop into asychronous-read-mode when a signal which exits |
| has been detected. The ReadFile loop is exited when we know that the |
| process is supposed to be exiting and there is nothing else in the |
| signal queue. |
| |
| Although I never actually saw this happen, exit_thread() was also |
| changed to release the process lock and just sleep indefintely if it is |
| detected that we are exiting. |
| |
| 2012-12-21 cgf-000018 |
| |
| Re: cgf-000017 |
| |
| It occurred to me that just getting the process lock during |
| DLL_THREAD_DETACH in dll_entry() might be adequate to fix this |
| problem. It's certainly much less intrusive. |
| |
| There are potential deadlock problems with grabbing a lock in |
| this code, though, so this check-in will be experimental. |
| |
| 2012-12-21 cgf-000017 |
| |
| The changes in this set are to work around the issue noted here: |
| |
| http://cygwin.com/ml/cygwin/2012-12/threads.html#00140 |
| |
| The problem is, apparently, that the return value of an ExitThread() |
| will take precedence over the return value of TerminateProcess/ExitProcess |
| if the thread is the last one exiting. That's rather amazing... |
| |
| For the fix, I replaced all calls to ExitThread with exit_thread(). The |
| exit_thread function, creates a handle to the current thread and sends |
| it to a packet via sig_send(__SIGTHREADEXIT). Then it acquires the |
| process lock and calls ExitThread. |
| |
| wait_sig will then wait for the handle, indicating that the thread has |
| exited, and, when that has happened, removes the process lock on behalf |
| of the now-defunct thread. wait_sig will now also avoid actually |
| exiting since it could trigger the same problem. |
| |
| Holding process_lock should prevent threads from exiting while a Cygwin |
| process is shutting down. They will just block forever in that case - |
| just like wait_sig. |
| |
| 2012-08-17 cgf-000016 |
| |
| While debugging another problem I finally noticed that |
| sigpacket::process was unconditionally calling tls->set_siginfo prior to |
| calling setup_handler even though setup_handler could fail. In the |
| event of two successive signals, that would cause the second signal's |
| info to overwrite the first even though the signal handler for the first |
| would eventually be called. Doh. |
| |
| Fixing this required passing the sigpacket si field into setup_handler. |
| Making setup_handler part of the sigpacket class seemed to make a lot of |
| sense so that's what I did. Then I passed the si element into |
| interrupt_setup so that the infodata structure could be filled out prior |
| to arming the signal. |
| |
| The other changes checked in here eliminate the ResetEvent for |
| signal_arrived since previous changes to cygwait should handle the |
| case of spurious signal_arrived detection. Since signal_arrived is |
| not a manual-reset event, we really should just let the appropriate |
| WFMO handle it. Otherwise, there is a race where a signal comes in |
| a "split second" after WFMO responds to some other event. Resetting |
| the signal_arrived would cause any subsequent WFMO to never be |
| triggered. My current theory is that this is what is causing: |
| |
| http://cygwin.com/ml/cygwin/2012-08/msg00310.html |
| |
| 2012-08-15 cgf-000015 |
| |
| RIP cancelable_wait. Yay. |
| |
| 2012-08-09 cgf-000014 |
| |
| So, apparently I got it somewhat right before wrt signal handling. |
| Checking on linux, it appears that signals will be sent to a thread |
| which can accept the signal. So resurrecting and extending the |
| "find_tls" function is in order. This function will return the tls |
| of any thread which 1) is waiting for a signal with sigwait*() or |
| 2) has the signal unmasked. |
| |
| In redoing this it became obvious that I had the class designation wrong |
| for the threadlist handling so I moved the manipulation of the global |
| threadlist into the cygheap where it logically belongs. |
| |
| 2012-07-21 cgf-000013 |
| |
| These changes reflect a revamp of the "wait for signal" functionality |
| which has existed in Cygwin through several signal massages. |
| |
| We now create a signal event only when a thread is waiting for a signal |
| and arm it only for that thread. The "set_signal_arrived" function is |
| used to establish the event and set it in a location referencable by |
| the caller. |
| |
| I still do not handle all of the race conditions. What happens when |
| a signal comes in just after a WF?O succeeds for some other purpose? I |
| suspect that it will arm the next WF?O call and the subsequent call to |
| call_signal_handler could cause a function to get an EINTR when possibly |
| it shouldn't have. |
| |
| I haven't yet checked all of the test cases for the URL listed in the |
| previous entry. |
| |
| Baby steps. |
| |
| 2012-06-12 cgf-000012 |
| |
| These changes are the preliminary for redoing the way threads wait for |
| signals. The problems are shown by the test case mentioned here: |
| |
| http://cygwin.com/ml/cygwin/2012-05/msg00434.html |
| |
| I've known that the signal handling in threads wasn't quite right for |
| some time. I lost all of my thread signal tests in the great "rm -r" |
| debacle of a few years ago and have been less than enthusiastic about |
| redoing everything (I had PCTS tests and everything). But it really is |
| time to redo this signal handling to make it more like it is supposed to |
| be. |
| |
| This change should not introduce any new behavior. Things should |
| continue to behave as before. The major differences are a change in the |
| arguments to cancelable_wait and cygwait now uses cancelable_wait and, |
| so, the returns from cygwait now mirror cancelable_wait. |
| |
| The next change will consolidate cygwait and cancelable_wait into one |
| cygwait function. |
| |
| 2012-06-02 cgf-000011 |
| |
| The refcnt handling was tricky to get right but I had convinced myself |
| that the refcnt's were always incremented/decremented under a lock. |
| Corinna's 2012-05-23 change to refcnt exposed a potential problem with |
| dup handling where the fdtab could be updated while not locked. |
| |
| That should be fixed by this change but, on closer examination, it seems |
| like there are many places where it is possible for the refcnt to be |
| updated while the fdtab is not locked since the default for |
| cygheap_fdget is to not lock the fdtab (and that should be the default - |
| you can't have read holding a lock). |
| |
| Since refcnt was only ever called with 1 or -1, I broke it up into two |
| functions but kept the Interlocked* operation. Incrementing a variable |
| should not be as racy as adding an arbitrary number to it but we have |
| InterlockedIncrement/InterlockedDecrement for a reason so I kept the |
| Interlocked operation here. |
| |
| In the meantime, I'll be mulling over whether the refcnt operations are |
| actually safe as they are. Maybe just ensuring that they are atomically |
| updated is enough since they control the destruction of an fh. If I got |
| the ordering right with incrementing and decrementing then that should |
| be adequate. |
| |
| 2012-06-02 cgf-000010 |
| |
| <1.7.16> |
| - Fix emacs problem which exposed an issue with Cygwin's select() function. |
| If a signal arrives while select is blocking and the program longjmps |
| out of the signal handler then threads and memory may be left hanging. |
| Fixes: http://cygwin.com/ml/cygwin/2012-05/threads.html#00275 |
| </1.7.16> |
| |
| This was try #4 or #5 to get select() signal handling working right. |
| It's still not there but it should now at least not leak memory or |
| threads. |
| |
| I mucked with the interface between cygwin_select and select_stuff::wait |
| so that the "new" loop in select_stuff::wait() was essentially moved |
| into the caller. cygwin_select now uses various enum states to decide |
| what to do. It builds the select linked list at the beginning of the |
| loop, allowing wait() to tear everything down and restart. This is |
| necessary before calling a signal handler because the signal handler may |
| longjmp away. |
| |
| I initially had this all coded up to use a special signal_cleanup |
| callback which could be called when a longjmp is called in a signal |
| handler. And cygwin_select() set up and tore down this callback. Once |
| I got everything compiling it, of course, dawned on me that just because |
| you call a longjmp in a signal handler it doesn't mean that you are |
| jumping *out* of the signal handler. So, if the signal handler invokes |
| the callback and returns it will be very bad for select(). Hence, this |
| slower, but hopefully more correct implementation. |
| |
| (I still wonder if some sort of signal cleanup callback might still |
| be useful in the future) |
| |
| TODO: I need to do an audit of other places where this problem could be |
| occurring. |
| |
| As alluded to above, select's signal handling is still not right. It |
| still acts as if it could call a signal handler from something other |
| than the main thread but, AFAICT, from my STC, this doesn't seem to be |
| the case. It might be worthwhile to extend cygwait to just magically |
| figure this out and not even bother using w4[0] for scenarios like this. |
| |
| 2012-05-16 cgf-000009 |
| |
| <1.7.16> |
| - Fix broken console mouse handling. Reported here: |
| http://cygwin.com/ml/cygwin/2012-05/msg00360.html |
| </1.7.16> |
| |
| I did a cvs annotate on smallprint.cc and see that the code to translate |
| %characters > 127 to 0x notation was in the 1.1 revision. Then I |
| checked the smallprint.c predecessor. It was in the 1.1 version of that |
| program too, which means that this odd change has probably been around |
| since <= 2000. |
| |
| Since __small_sprintf is supposed to emulate sprintf, I got rid of the |
| special case handling. This may affect fhandler_socket::bind. If so, we |
| should work around this problem there rather than keeping this strange |
| hack in __small_printf. |
| |
| 2012-05-14 cgf-000008 |
| |
| <1.7.16> |
| - Fix hang when zero bytes are written to a pty using |
| Windows WriteFile or equivalent. Fixes: |
| http://cygwin.com/ml/cygwin/2012-05/msg00323.html |
| </1.7.16> |
| |
| cgf-000002, as usual, fixed one thing while breaking another. See |
| Larry's predicament in: http://goo.gl/oGEr2 . |
| |
| The problem is that zero byte writes to the pty pipe caused the dread |
| end-of-the-world-as-we-know-it problem reported on the mailing list |
| where ReadFile reads zero bytes even though there is still more to read |
| on the pipe. This is because that change caused a 'record' to be read |
| and a record can be zero bytes. |
| |
| I was never really keen about using a throwaway buffer just to get a |
| count of the number of characters available to be read in the pty pipe. |
| On closer reading of the documentation for PeekNamedPipe it seemed like |
| the sixth argument to PeekNamedPipe should return what I needed without |
| using a buffer. And, amazingly, it did, except that the problem still |
| remained - a zero byte message still screwed things up. |
| |
| So, we now detect the case where there is zero bytes available as a |
| message but there are bytes available in the pipe. In that scenario, |
| return the bytes available in the pipe rather than the message length of |
| zero. This could conceivably cause problems with pty pipe handling in |
| this scenario but since the only way this scenario could possibly happen |
| is when someone is writing zero bytes using WriteFile to a pty pipe, I'm |
| ok with that. |
| |
| 2012-05-14 cgf-000007 |
| |
| <1.7.16> |
| - Fix invocation of strace from a cygwin process. Fixes: |
| http://cygwin.com/ml/cygwin/2012-05/msg00292.html |
| </1.7.16> |
| |
| The change in cgf-000004 introduced a problem for processes which load |
| cygwin1.dll dynamically. strace.exe is the most prominent example of |
| this. |
| |
| Since the parent handle is now closed for "non-Cygwin" processes, when |
| strace.exe tried to dynamically load cygwin1.dll, the handle was invalid |
| and child_info_spawn::handle_spawn couldn't use retrieve information |
| from the parent. This eventually led to a strace_printf error due to an |
| attempt to dereference an unavailable cygheap. Probably have to fix |
| this someday. You shouldn't use the cygheap while attempting to print |
| an error about the inavailability of said cygheap. |
| |
| This was fixed by saving the parent pid in child_info_spawn and calling |
| OpenProcess for the parent pid and using that handle iff a process is |
| dynamically loaded. |
| |
| 2012-05-12 cgf-000006 |
| |
| <1.7.16> |
| - Fix hang when calling pthread_testcancel in a canceled thread. |
| Fixes some of: http://cygwin.com/ml/cygwin/2012-05/msg00186.html |
| </1.7.16> |
| |
| This should fix the first part of the reported problem in the above |
| message. The cancel seemed to actually be working but, the fprintf |
| eventually ended up calling pthread_testcancel. Since we'd gotten here |
| via a cancel, it tried to recursively call the cancel handler causing a |
| recursive loop. |
| |
| 2012-05-12 cgf-000005 |
| |
| <1.7.16> |
| - Fix pipe creation problem which manifested as a problem creating a |
| fifo. Fixes: http://cygwin.com/ml/cygwin/2012-05/msg00253.html |
| </1.7.16> |
| |
| My change on 2012-04-28 introduced a problem with fifos. The passed |
| in name was overwritten. This was because I wasn't properly keeping |
| track of the length of the generated pipe name when there was a |
| name passed in to fhandler_pipe::create. |
| |
| There was also another problem in fhandler_pipe::create. Since fifos |
| use PIPE_ACCESS_DUPLEX and PIPE_ACCESS_DUPLEX is an or'ing of |
| PIPE_ACCESS_INBOUND and PIPE_ACCESS_OUTBOUND, using PIPE_ACCESS_OUTBOUND |
| as a "never-used" option for PIPE_ADD_PID in fhandler.h was wrong. So, |
| fifo creation attempted to add the pid of a pipe to the name which is |
| wrong for fifos. |
| |
| 2012-05-08 cgf-000004 |
| |
| The change for cgf-000003 introduced a new problem: |
| http://cygwin.com/ml/cygwin/2012-05/msg00154.html |
| http://cygwin.com/ml/cygwin/2012-05/msg00157.html |
| |
| Since a handle associated with the parent is no longer being duplicated |
| into a non-cygwin "execed child", Windows is free to reuse the pid of |
| the parent when the parent exits. However, since we *did* duplicate a |
| handle pointing to the pid's shared memory area into the "execed child", |
| the shared memory for the pid was still active. |
| |
| Since the shared memory was still available, if a new process reuses the |
| previous pid, Cygwin would detect that the shared memory was not created |
| and had a "PID_REAPED" flag. That was considered an error, and, so, it |
| would set procinfo to NULL and pinfo::thisproc would die since this |
| situation is not supposed to occur. |
| |
| I fixed this in two ways: |
| |
| 1) If a shared memory region has a PID_REAPED flag then zero it and |
| reuse it. This should be safe since you are not really supposed to be |
| querying the shared memory region for anything after PID_REAPED has been |
| set. |
| |
| 2) Forego duping a copy of myself_pinfo if we're starting a non-cygwin |
| child for exec. |
| |
| It seems like 2) is a common theme and an audit of all of the handles |
| that are being passed to non-cygwin children is in order for 1.7.16. |
| |
| The other minor modification that was made in this change was to add the |
| pid of the failing process to fork error output. This helps slightly |
| when looking at strace output, even though in this case it was easy to |
| find what was failing by looking for '^---' when running the "stv" |
| strace dumper. That found the offending exception quickly. |
| |
| 2012-05-07 cgf-000003 |
| |
| <1.7.15> |
| Don't make Cygwin wait for all children of a non-cygwin child program. |
| Fixes: http://cygwin.com/ml/cygwin/2012-05/msg00063.html, |
| http://cygwin.com/ml/cygwin/2012-05/msg00075.html |
| </1.7.15> |
| |
| This problem is due to a recent change which added some robustness and |
| speed to Cygwin's exec/spawn handling by not trying to force inheritance |
| every time a process is started. See ChangeLog entries starting on |
| 2012-03-20, and multiple on 2012-03-21. |
| |
| Making the handle inheritable meant that, as usual, there were problems |
| with non-Cygwin processes. When Cygwin "execs" a non-Cygwin process N, |
| all of its N + 1, N + 2, ... children will also inherit the handle. |
| That means that Cygwin will wait until all subprocesses have exited |
| before it returns. |
| |
| I was willing to make this a restriction of starting non-Cygwin |
| processes but the problem with allowing that is that it can cause the |
| creation of a "limbo" pid when N exits and N + 1 and friends are still |
| around. In this scenario, Cygwin dutifully notices that process N has |
| died and sets the exit code to indicate that but N's parent will wait on |
| rd_proc_pipe and will only return when every N + ... windows process |
| has exited. |
| |
| The removal of cygheap::pid_handle was not related to the initial |
| problem that I set out to fix. The change came from the realization |
| that we were duping the current process handle into the child twice and |
| only needed to do it once. The current process handle is used by exec |
| to keep the Windows pid "alive" so that it will not be reused. So, now |
| we just close parent in child_info_spawn::handle_spawn iff we're not |
| execing. |
| |
| In debugging this it bothered me that 'ps' identified a nonactive pid as |
| active. Part of the reason for this was the 'parent' handle in |
| child_info was opened in non-Cygwin processes, keeping the pid alive. |
| That has been kluged around (more changes after 1.7.15) but that didn't |
| fix the problem. On further investigation, this seems to be caused by |
| the fact that the shared memory region pid handles were still being |
| passed to non-cygwin children, keeping the pid alive in a limbo-like |
| fashion. This was easily fixed by having pinfo::init() consider a |
| memory region with PID_REAPED as not available. A more robust fix |
| should be considered for 1.7.15+ where these handles are not passed |
| to non-cygwin processes. |
| |
| This fixed the problem where a pid showed up in the list after a user |
| does something like: "bash$ cmd /c start notepad" but, for some reason, |
| it does not fix the problem where "bash$ setsid cmd /c start notepad". |
| That bears investigation after 1.7.15 is released but it is not a |
| regression and so is not a blocker for the release. |
| |
| 2012-05-03 cgf-000002 |
| |
| <1.7.15> |
| Fix problem where too much input was attempted to be read from a |
| pty slave. Fixes: http://cygwin.com/ml/cygwin/2012-05/msg00049.html |
| </1.7.15> |
| |
| My change on 2012/04/05 reintroduced the problem first described by: |
| http://cygwin.com/ml/cygwin/2011-10/threads.html#00445 |
| |
| The problem then was, IIRC, due to the fact that bytes sent to the pty |
| pipe were not written as records. Changing pipe to PIPE_TYPE_MESSAGE in |
| pipe.cc fixed the problem since writing lines to one side of the pipe |
| caused exactly that the number of characters to be read on the other |
| even if there were more characters in the pipe. |
| |
| To debug this, I first replaced fhandler_tty.cc with the 1.258, |
| 2012/04/05 version. The test case started working when I did that. |
| |
| So, then, I replaced individual functions, one at a time, in |
| fhandler_tty.cc with their previous versions. I'd expected this to be a |
| problem with fhandler_pty_master::process_slave_output since that had |
| seen the most changes but was surprised to see that the culprit was |
| fhandler_pty_slave::read(). |
| |
| The reason was that I really needed the bytes_available() function to |
| return the number of bytes which would be read in the next operation |
| rather than the number of bytes available in the pipe. That's because |
| there may be a number of lines available to be read but the number of |
| bytes which will be read by ReadFile should reflect the mode of the pty |
| and, if there is a line to read, only the number of bytes in the line |
| should be seen as available for the next read. |
| |
| Having bytes_available() return the number of bytes which would be read |
| seemed to fix the problem but it could subtly change the behavior of |
| other callers of this function. However, I actually think this is |
| probably a good thing since they probably should have been seeing the |
| line behavior. |
| |
| 2012-05-02 cgf-000001 |
| |
| <1.7.15> |
| Fix problem setting parent pid to 1 when process with children execs |
| itself. Fixes: http://cygwin.com/ml/cygwin/2012-05/msg00009.html |
| </1.7.15> |
| |
| Investigating this problem with strace showed that ssh-agent was |
| checking the parent pid and getting a 1 when it shouldn't have. Other |
| stuff looked ok so I chose to consider this a smoking gun. |
| |
| Going back to the version that the OP said did not have the problem, I |
| worked forward until I found where the problem first occurred - |
| somewhere around 2012-03-19. And, indeed, the getppid call returned the |
| correct value in the working version. That means that this stopped |
| working when I redid the way the process pipe was inherited around |
| this time period. |
| |
| It isn't clear why (and I suspect I may have to debug this further at |
| some point) this hasn't always been a problem but I made the obvious fix. |
| We shouldn't have been setting ppid = 1 when we're about to pass off to |
| an execed process. |
| |
| As I was writing this, I realized that it was necessary to add some |
| additional checks. Just checking for "have_execed" isn't enough. If |
| we've execed a non-cygwin process then it won't know how to deal with |
| any inherited children. So, always set ppid = 1 if we've execed a |
| non-cygwin process. |