CAN Bus error
When asked if I wanted to head to the seafront to check out a Case wheel loader with suspected CAN problems, I was looking forward to spending some time looking out across the North Sea enjoying the sea breeze. Sadly though, the only way I knew I was even remotely close to the beach was by the sheer number of seagulls flying around at the recycling plant. I get to go to all the best places.
The machine I was here to look at had been suffering from an intermittent problem for many weeks. It had seemingly just started playing up, and there was no reliable way to make the fault occur. The operator informed me that on some days it would be fine for a few hours, and on others it would fault as soon as it was put to work.
As always, get as much information as you can from the customer, as it helps you start to paint the picture of the fault. More importantly, you should try to replicate the fault to confirm the issue. Can’t fix what isn’t broke! The problem with intermittent faults is precisely that: they are intermittent. With no way to replicate the fault, the best thing we could do was start with a DTC scan of the machine.
Fortunately, we had some stored fault codes, all pointing towards a CAN issue. Interestingly, the instrument panel for the machine also flashed up the warning for CAN message engine timeout as well as a DEF/AdBlue warning for injection failed and torque is limited. With the DTCs recorded and some further direction, we cleared the codes hoping that they would return. We had the scope connected so that we could try to catch anything when the fault occurred.
Setting up the scope to monitor the CAN network during the operation of the machine seemed like the best course of action considering the stored fault codes. The network of this wheel loader isn’t that complicated as there are relatively few ECUs present on the bus. However, this is a 2015 machine with over 13,000 hours and limited access in the engine bay.
As you can see, Channel B was extremely distorted compared to Channel A, which is why the A-B math channel has a distorted physical layer. It led me on a bit of a journey, and I even thought I had a problem before I noticed the obvious. In PicoScope 7, the channels have the lozenge that provides you with some detail about the setup for that channel, such as probe, range, coupling and filtering. Yep, I had forgotten to take off the 20 kHz hardware filter. It was previously set up to demonstrate the feature that’s available with the 4*25 and 4*25A scopes. The bandwidth limit is there to remove frequencies above 20 kHz at the scope, so unlike the DSP filter, which works in the software, there is no way to get back this data. This filter is very useful with current clamps and when looking at signals that can be influenced by high-frequency interference, such as MAF sensors and O2 sensors, but not so useful with CAN. You can find more information on bandwidth limits here.
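To put a rough number on why a 20 kHz limit and CAN do not mix, here is a minimal sketch. It assumes the bus runs at 250 kbit/s (a common J1939 rate, not something measured on this machine) and models the bandwidth limit as a simple first-order low-pass, which is an assumption about its shape rather than a description of the scope's actual circuit.

```python
import math

BIT_RATE = 250_000      # bit/s, assumed J1939 bus speed
CUTOFF_HZ = 20_000      # bandwidth limit that was left switched on

bit_time = 1 / BIT_RATE                  # duration of one bit in seconds
tau = 1 / (2 * math.pi * CUTOFF_HZ)      # time constant of a first-order low-pass

# Fraction of the full voltage swing reached by the end of a single bit
settled = 1 - math.exp(-bit_time / tau)

print(f"bit time      : {bit_time * 1e6:.1f} us")
print(f"time constant : {tau * 1e6:.1f} us")
print(f"swing reached : {settled:.0%}")
```

Under those assumptions the filter's time constant is about twice the length of a bit, so a single bit only gets to roughly 40% of its full swing before the next one arrives. That is consistent with the badly rounded trace seen on the filtered channel.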
After spending some time chasing what I thought was a wiring issue, it became clear that the issue was only with that one channel and finally, the penny dropped. We’re only human and these little mistakes can and do happen, especially in the heat of battle. If I had been using PicoScope 6 though, I think it would have taken me longer to realise, as you have to physically open the channel options to see if the bandwidth limit is active.
With Channel B “restored” we set about trying to capture the fault and took advantage of the new J1939 decoder to help better understand what ECUs were on or offline if the fault occurred. For more information on the J1939 decoder please see the A-Z entry on the website.
I’ve already included a Link file that contains the source address values converted into a readable format. You can find more information about link files on the forum.
One thing we’ve established time and time again is that getting a decent wiring diagram is always difficult with off-highway equipment. I had tried to be a little proactive and get a drawing before I arrived on site, only to realise that the one I found was nothing like the machine. Luckily, with Texa, you do get some drawings included and fortunately, we appeared to have one that matched what we had on the machine. I have redrawn the CAN connections below:
I just want to point out that not all the ECUs in the cab were listed. There was an instrument cluster, telematics unit and others, but they were not relevant to this fault. I’m still not convinced this drawing is correct as I don't think the NOx sensor would be out on its own. However, this was all we had to go on. The most interesting part was the K-Line from the diagnostic port to the ECM, which explained the options we had to communicate using the serial tool (CAN (J1939), J1587 or K-Line).
After driving and working the machine for some time the fault finally appeared on the dash. Interestingly, we were unable to access the ECM, with the fault on the dash, when we tried using the J1939 protocol to read the codes. Instead, we found the only way to communicate with the ECM was using K-Line. Then we tried the J1939 again and suddenly we were able to communicate. We connected the scope to see if we could spot what was going on in the CAN network. We connected to both CAN lines but used the floating input of the scope, which meant we only needed two channels to make this capture. Floating inputs are only available on the 4*25/4*25A range of scopes. You can find more information here. We were lucky that the fault was becoming more apparent meaning catching it was made that little bit easier.
Make no mistake, the above image didn’t happen immediately. It was a good 30 minutes in the making. Hopefully, you can see that halfway across the buffer we had a reduced number of data packets on Channel A. Channel B remained consistent both before and after this event.
By using the J1939 decoder and the link file to convert the decoded source address IDs into readable data, we could see what was going on at that point. We knew that the Engine ECM was one that we couldn’t communicate with using the serial tool when the fault occurred, so looking for this was our first step.
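The decoder and link file do this work for you, but it can help to see what is going on underneath. In J1939, the source address is the lowest 8 bits of the 29-bit CAN identifier, and address 0 is the preferred address for Engine #1. The sketch below is an illustration only: `decode_j1939_id`, `source_name` and the `SOURCE_NAMES` table are hypothetical names standing in for the decoder and link file, and the table lists just two well-known addresses.

```python
from typing import NamedTuple

# Illustrative stand-in for the link file: source address -> readable name.
# Only two well-known J1939 preferred addresses are listed here.
SOURCE_NAMES = {0x00: "Engine #1", 0x03: "Transmission #1"}

class J1939Id(NamedTuple):
    priority: int
    pgn: int
    source: int

def decode_j1939_id(can_id: int) -> J1939Id:
    """Split a 29-bit extended CAN identifier into its J1939 fields."""
    priority = (can_id >> 26) & 0x7
    data_page = (can_id >> 24) & 0x3   # extended data page + data page bits
    pdu_format = (can_id >> 16) & 0xFF
    pdu_specific = (can_id >> 8) & 0xFF
    source = can_id & 0xFF
    # PDU1 (format < 240): the PDU-specific byte is a destination address,
    # so it is not part of the PGN.
    if pdu_format < 240:
        pdu_specific = 0
    pgn = (data_page << 16) | (pdu_format << 8) | pdu_specific
    return J1939Id(priority, pgn, source)

def source_name(source: int) -> str:
    """Look up a readable name, falling back to the raw address."""
    return SOURCE_NAMES.get(source, f"Unknown (0x{source:02X})")

frame = decode_j1939_id(0x0CF00400)
print(frame.priority, frame.pgn, source_name(frame.source))
```

The example identifier 0x0CF00400 decodes to priority 3, PGN 61444 and source address 0, so it would be listed against Engine #1.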
I’ve added in an extra decode table to display the process better. I’ve added a filter to the decode table in the bottom viewport so that we see only the decoded data that has Engine #1 present in the source address. We were trying to see if Engine #1 had gone offline during the loss of communication or if something else had happened. As we measured directly at the engine ECU, this drop could not be related to a wiring issue. By hovering over a packet on the graph, we got a rough idea that the change in the waveform happened around packet 500.
You can see the decoded data in the top viewport and the filtered packets for Engine #1 in the bottom viewport in the tables above. I’ve marked the Engine #1 packets in the two views which are around the 500-mark. As you can see, Engine #1 does not provide any further significant traffic to the network after packet 485, yet all the other ECUs are very much online and sending data onto the bus.
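The check described here boils down to finding the last packet number for each source address and seeing which ones stop well before the end of the buffer. If you export the decode table, a minimal sketch of that idea could look like the following. The function names `last_seen` and `silent_sources` are hypothetical, and the sample data is synthetic, not taken from this capture.

```python
def last_seen(packets):
    """Map each source address to the highest packet number it appears in.

    `packets` is an iterable of (packet_number, source_address) pairs.
    """
    latest = {}
    for number, source in packets:
        latest[source] = max(number, latest.get(source, number))
    return latest

def silent_sources(packets, margin):
    """Sources whose last packet is more than `margin` packets before the end."""
    packets = list(packets)
    if not packets:
        return []
    end = max(number for number, _ in packets)
    return sorted(src for src, n in last_seen(packets).items() if end - n > margin)

# Synthetic example: source 0 stops at packet 4, source 3 keeps talking.
demo = [(1, 0), (2, 3), (3, 0), (4, 0), (5, 3),
        (6, 3), (7, 3), (8, 3), (9, 3), (10, 3)]
print(last_seen(demo))
print(silent_sources(demo, margin=5))
```

In the synthetic data, source 0 is flagged because its last packet is six packets before the end of the list, while source 3 is still transmitting at the end. The `margin` value is a judgment call and would need to suit how often each ECU normally transmits.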
The loss of the ECM stacks up against the fault codes we are seeing and the fact that we were not able to communicate with the ECM via the CAN bus, only the K-Line. To make sure we covered every possibility, our next step was to check the power and ground supplies when the ECM was on and offline. We wanted to ensure that we weren’t losing power when the ECM was offline. With the sheer number of power supplies to the ECU, it seemed best to bring out the 8-channel scope to capture everything in one go. You can, of course, still do this check with the 4-channel scope and split it over multiple captures.
With the ECM online, we could see that in one buffer over 2s of time we had 1110 data packets from Engine #1. We also had voltages recorded at 27.9 V, which when you consider that this capture was taken with the engine running was as expected. We could also see that the ground connections were extremely low at 33.5 mV.
By contrast, when Engine #1 was offline over the same period of time, we had just two packets, while the power supply and ground voltage levels remained stable. With this information, we decided to inform the customer that the engine ECU needed to be checked at the very least. As with a lot of site machinery, it costs more to have it not working than just replacing parts. With that, an ECU was ordered through CASE and they were required to come and fit and program the ECU. Interestingly, the technician tasked with this job said he doubted this would fix the issue as he had never seen one fail, the words you never want to hear! However, the evidence was clear and we failed to see what else could be causing the issue. With everything documented with PicoScope, we could review each step to ensure that we were still happy to go ahead with the replacement. Six months later, the machine is still working as expected.
I hope this helps and it goes to show that the new J1939 decoder can be used in conjunction with serial tools to back up your diagnosis. Thanks here to Darren Savage for the opportunity to join him on this adventure.
November 21 2021
It’s scary calling a box, even when you know you right!!
Dlarge commercial repairs
September 12 2021
Very good case study . This shows the value of been able to decode the communications quickly and accurately. Especially when on site. Great work