latency-test, latency-histogram: add realtime status bar#4107
latency-test, latency-histogram: add realtime status bar#4107grandixximo wants to merge 5 commits into
Conversation
77c1e3e to
8274d2d
Compare
|
These tests are moot when you run on a non-RT kernel (like I do in dev). I'm not sure the noise is really necessary in that case. |
|
silenced on non-rt |
|
Great. Every little bit helps. Reduces user frustration and more importantly less developer time wasted on spurious issues. |
|
Hmm, in #4044 in the images in the background you clearly see: So the new warning will not help that much. If desired, you can add it in the place where |
|
Connected to this: A general way for GUI's to show "You don't have realtime" warnings that you can not overlook would help the most. When milling, I start linuxcnc with the link. So no console. If I accidentally start the wrong kernel, bad luck. |
|
On a production system, you may want to warn all the time when this is amiss. Maybe something with a background color turning from gray into gray-red tinted and do something similar consistently in all GUIs. For dev builds you don't really want this because you want to see what the operator sees while you are working on stuff. I'd go for an opt-in choice by setting a value in the INI file. Maybe something like a boolean |
|
I was just checking the code. @grandixximo You already added a better warning in the C code in your nonroot patch, so this was probably not even applied in the screenshot. Now there is a double warning in the console. The note from C++ and the Warning from this PR: With latency-histogram there is also a pop-up. With latency-test, this doesn't work. And the most important app, linuxcnc, also just shows noting if you don't start it in a console. Anyone has a good idea to check easily and globally for real time capability? There is already a function rtapi_is_realtime(). This is not 100% reliable, if harden_rt() fails, it will return true, even if it runs in SCHED_OTHER. But this can be fixed. rtapi_is_realtime() is also linked to userspace apps but there it will not work, it checks if the userspace app has realtime... ;-) I could add a halcmd that checks for realtime. Or a pin that is true when all is ok, false otherwise. This could then be used in all gui's for an opt-in or opt-out warning. But I am not that deep into all these various gui's and how they communicate with the RT part. |
Might be an option that is default on when you deploy an default off in rip-mode? But this is annoying to test. Otherwise, I would tend for default on, dev's will manage it better to switch it off than users will fight not knowing that they don't have real time enabled. Instead of in ini, might be an environment variable LINUXCNC_NO_RT_WARN. Dev's can set it on their dev machine in .profile if they are annoyed and it works for all test configs. BTW, just brainstorming options. |
|
Just a POC, if you think a halcmd getrt (or better name) would help I can create a PR. Was easy. make setuid You find it on my fork: |
|
Thanks both, this is more useful than my original per-tool heuristic. I have pivoted the PR to use @hdiethelm's A few things I would like your input on, since they touch the broader direction in #4118:
This now depends on the |
|
Instantiating a HAL memory segment on invocation may be problematic. It will surely confuse because you have to remember to call halrun -U afterwards. That is not a good design. Opt-out policies are generally designed to force you to do a thing, even if you do not want to. That is why they should be avoided. |
The way
You see that in the following test where I added a message when rtapi_app exits: vs I don't see any big downside in doing it like that. But i also do not 100% like starting up rtapi_app just to exit right away. Better would be running it with or after halrun in the script. Or might be an approach using a signal / parameter. Of course, also RTAI / doc and so on needs to be checked / updated before I will call that ready. Better ideas are welcome. But I prefer using a check executing the same code path for realtime checks always instead the variant before where you then have most likely diverging real time checks spread in all possible apps. |
228e6ef to
a3df81c
Compare
|
@hdiethelm I have restructured to avoid the standalone HAL instantiation @BsAtHome flagged:
That keeps the scripts honest, but @BsAtHome's deeper point lands on |
|
@grandixximo Nice. I have to test it. I created a PR, feel free to rebase: @BsAtHome Any hint's what should be done in this case? Before my pr rtapi_app rework pr, even |
|
Crossed posts, @hdiethelm. Good, your "run it with/after halrun in the script" is exactly what I did for latency-histogram (getrt after |
|
Hmm, just an idea: @BsAtHome |
|
A (forced) popup is the equivalent of slapping someone in the face. The error message added to the GUI is actually not an error. Not running RT on a production system may be considered an error. If you look closely in AXIS' status bar you see "Kein Werkzeug". That is also the place where you want to warn the user. Add a status bar field that is obvious (light red background) yet not invasive. |
|
You have a point. I also get annoyed of all this popup's when using the good old microslop... :-D Now on the status bar: Good idea. How to do that? I am already somewhat deep in the C code, so I can add any needed support there but for GUI's, someone else has to take over. @grandixximo Can you do this based on whatever from the hal? halcmd / parameter / pin is easy to add for me. |
|
@hdiethelm thanks, I will rebase this onto #4132 once it settles. @BsAtHome agreed the popup is too much; I will drop the On the cross-GUI status bar, that is the right shape and I am happy to do the AXIS side (a status-bar field with a light-red background, like the existing tool slot) once @hdiethelm's "realtime ok" signal from #4132 exists. The question is where it lives. My preference is to keep #4107 narrow: rebased on getrt, popup gone, scoped to the latency tools. It can ship as soon as #4132 lands. The GUI-wide warning cannot be written until the HAL pin exists and touches AXIS, gmoccapy and qtvcp, so I would do it as a separate PR tracking #4118 rather than make this small change wait on the slowest part. That said, if you would rather have one PR own the whole intent, I am fine rescoping #4107 to the GUI-wide warning and retitling it; it just becomes larger and slower. Either way works for me. Which do you prefer? |
|
@hdiethelm to answer your "halcmd / parameter / pin" question directly: a |
|
I think we first have to agree on the proper conceptual design of how to detect in the different scenarios and what to do with it. |
|
@BsAtHome agreed, let me put a concrete proposal on the table. Detection: one source of truth. @hdiethelm's What to do with it: per UI, not one mechanism. The right surface differs by app, so each owns its own rendering rather than forcing a single widget everywhere:
One thing to rule out: coloring the window title bar / decoration is not reliable. Plenty of setups have no title bar at all (fullscreen/kiosk panels, some Wayland/WM configs), so the signal has to live inside the app window, not in the chrome. Still open (defer): production-vs-dev suppression policy. That can ride with #4118 once the pin and the per-UI rendering exist. Does that match how you see the scenarios? |
|
Sounds like a plan. I will create new PR with a signal. Then we can test how this feels and continue from there. I can do that tomorrow, right now I have other things to do. About the title bar: The idea was to only use this for the two test apps. If this is cumbersome, might be just modify the text that is already displayed in them. @grandixximo Can you mark this PR as a draft until we are done? Sorry about the for- and back. If i dont have a good solution yet, this is often my way of brainstoming. Try things and discard until it is good. Hope this is ok for you. |
That is only partly satisfactory because for this realtime needs to be running. You want to know in advance whether your system will be capable of running RT without starting any of it. Then, when you are running, you want to know from various applications what the actual status is by using generic API call or/and HAL pin. |
No problem, we brain storm it and come up with something that sticks. |
|
Seeing you are working on this, what would be a useful feature to add would be a command line switch that ran for a specific --time --quiet(ly) and print the latency results to the console so scripts could report latency. I've wanted to do this recently and had a lot of trouble finding a way to do this. cyclictest allows you to do this but it's still not easy and not within the LinuxCNC ecosystem. |
|
Looks nice in the footer! Visible but not a slap in the face. I did not know how to do this, mainly the reason why I used the title bar to add some ideas. Is this also possible in axis / gmoccapy? |
21060e3 to
940cc00
Compare
Good idea, belongs as a --quiet/--seconds mode of latency-test, please open an issue, it's a different PR.
Yes, but On the main GUIs I'd show just the realtime field ( Placement, reusing each GUI's existing status area:
Just the one field via Keep it as its own PR, separate from this one, it's the cross-GUI convention #4118 should settle. This PR lands as the latency-tools footer. |
|
I tried out the variant passing the text from rtapi_app trough the socket up to It works but just quickly coded together, needs some doc and tidy up. Do you think this will do the job? I changed only latency-test to use this. TCL is unreadable for me, would take me hours to do it... ;-) |
I don't think this is worthwhile, as that line is also useful to tell you which realtime system you are using.
I think I like the env var. the vast majority of users need to be warned. Those who habitually test-run the code on a non-RT system are unlikely to mind clicking-away a warning. |
|
@hdiethelm @BsAtHome a heads-up on the To give the status bar an authoritative realtime type, I folded @hdiethelm's What the change does: the I added one tidy commit on top (
@hdiethelm could you confirm you are happy with the tidy and drop the POC marker from your commit if it is good to land? It is your RT domain, so I did not touch the protocol design itself. @BsAtHome a review of the wire-format and C++ would be appreciated since that is your area. Happy to adjust to whatever framing convention you prefer. |
|
@andypugh both your points are covered by where this landed, though the shape changed since that comment. On showing which RT: the latency-test / latency-histogram footer now shows the realtime type directly ( On the env-var opt-out: I dropped |
I will look into it again let's say until tomorrow. Thanks for pulling my commit in and correcting some parts. I have to check how I can allow you to push to a repo under my github account. We are working at the same thing much lately and having two branches working at the same issue just generates unnecessary friction. |
|
I could have revised the original commit and pushed under your name from my end, as the author could have stayed, but I really did not want to do that, I'd rather keep it as it happened for now, and I can squash later, or possibly separate the concerns in two PR, waiting for Bertho on the final judgement on that. The friction is more about waiting your PRs to land. Pulling in your commit and commit on top is zero friction for me. I can push PRs to your fork, I just decided to go this route instead, why does it create friction on your end? You can just pull merge and commit on top, no? I can merge your changes here when you are done, I don't see much friction. |
Ok, fine for me. Waiting for other PR's is for sure friction. You can also squash your and my commit together and just mark me in the comment. So we just go this route, I push a commit to my version of this branch and then you pull it in and squash it at the end. Most is fine, except some not updated comments. I will quickly update them and the you can pull in my commit. You can remove the POC at the squash phase. |
|
Cleanup from my side: hdiethelm@b7f4618 |
40300ee to
aeccb51
Compare
|
@hdiethelm done on all your points, branch force-pushed.
The tidy fixups folded into your commit are the short-read check in @BsAtHome the rtapi_app wire-format change is now a single self-contained commit ( |
Without 'sudo make setuid' (or 'sudo make setcap') rtapi_app runs unprivileged: no SCHED_FIFO, no locked memory, so latency readings are wildly inflated and easy to mistake for a code regression. Warn, for a non-root user, when rtapi_app is neither setuid root nor carries the cap_sys_nice capability. Closes LinuxCNC#4044
Only warn under PREEMPT_RT or RTAI; on a non-RT kernel the privileges do not matter, so the check would be noise.
…istic Query realtime status with 'realtime verify' (from LinuxCNC#4132) rather than probing the setuid bit. latency-histogram asks the realtime layer directly; latency-test relies on the existing "POSIX non-realtime" note.
Extend the check_rt result wire format to carry a string alongside the int return code, and map the rtapi realtime type enum to a human readable name (for example "Preempt RT", "RTAI", "Xenomai" or "No realtime"). rtapi_app prints the name on stdout so callers such as 'realtime verify' and the lcnc_realtime Python module report the authoritative realtime type instead of sniffing kernel signals. Includes review fixups: a short-read check in recv_result(), a frame-size underflow guard in recv_result()/recv_args(), an exhaustive type-name switch, and typo fixes. Co-authored-by: Luca Toniolo <toniolo.luca@gmail.com>
Add a status bar along the bottom of both tools with three fields: the running realtime type (or a red "No realtime"), the CPU frequency governor, and isolcpus. A field turns red only when it flags a condition that makes the measured latency unrepresentative, so normal states stay in the theme colour. The realtime type comes from 'realtime verify', so both tools report the authoritative type rtapi_app runs rather than each re-sniffing kernel signals. latency-test renders the bar as a pyvcp footer; latency-histogram as a Tk status bar, replacing its earlier modal "no realtime" popup. The duplicate non-realtime console warning is dropped since rtapi_app already prints the actionable note. docs/install/latency-test.adoc gains a status-bar section explaining each field plus a CPU frequency governor tuning note.
aeccb51 to
5d42f24
Compare




What
latency-testandlatency-histogramnow show a status bar along the bottom with three fields: the running realtime type (orNo realtime), the CPU frequency governor, and isolcpus. A field is coloured red only when it flags a condition that makes the measured latency unrepresentative, so normal states stay in the theme colour. On a real RT machine it reads e.g.Preempt RT/performance/isolcpus=2,3.Why
In #4044 a run-in-place build was used without
sudo make setuid.rtapi_appran unprivileged, the latency numbers blew up, and it was mistaken for a code regression. A visible realtime indicator surfaces that immediately. The governor and isolcpus fields cover the other two common reasons a latency measurement is misleading.How
The realtime type now comes authoritatively from rtapi itself:
realtime verifyreturns the type thatrtapi_appreports for the running backend (Preempt RT,RTAI,Xenomai, ..., orNo realtime), instead of each tool re-sniffing kernel signals. The governor is read from/sysand isolcpus from/proc/cmdline.latency-testrenders the bar as a pyvcp footer;latency-histogramas a Tk status bar at the bottom. The histogram's earlier modalno realtimepopup is replaced by this persistent bar (the console warning is kept for non-X runs).Docs:
install/latency-test.adocgains a status-bar section explaining each field and linking the fixes (realtime kernel, setuid/setcap, isolcpus), plus a CPU frequency governor tuning note.rtapi_app change (needs RT/C++ review)
To make the type available without sniffing, this branch carries @hdiethelm's change to
rtapi_app(src/rtapi/uspace_rtapi_main.cc): thecheck_rtsocket result is extended to return a string alongside the int status, and the realtime-type enum is mapped to a human-readable name. I added a follow-up commit tidying it: a short-read check inrecv_result(), a frame-size underflow guard inrecv_result()/recv_args(), an exhaustive type-name switch, and typo fixes. The two commits are kept separate so the tidy delta reads on its own.Follow-ups (separate PRs)
--quiet/--secondsmode forlatency-testso scripts can capture latency (suggested by @rodw-au).Depends on #4132 (merged). Closes #4044.