Category: Tales



* Hint:Please note that some of the screenshots and outputs in this post have been recreated in a lab

* Hint:Please double-click the snapshots to zoom into them. 

This is another interesting case that  I  recently worked on with a colleague of mine.  The topology below represents a high level overview of what the client’s network looks like. The complaint was that users could call local numbers but all international calls were failing to connect.  That got us thinking . . .  our first question was, if local calls are working fine, and international call are not, then what is the difference between both calls

The snap-shot below shows how the sip messages were flowing for the working local calls .
Working call

If you compare the sip messages of the working call above with the non-working international call below, you will see that the difference is that the messages below contains a ‘ 183 session progress message‘.

Failed call

So based on the above, the question that now presents itself is:  why did the Cisco Unified Boarder element send a bye message to the Cisco Call Manager when the remote phone that is out in the cloud was actually ringing? Does this make sense;  Strange isn’t it?

Ok so we know from our experience as Engineers  that when a call fails,  there is normally some sort of error message, failure or cause code of some sort. In the trace output below, you will see that I have actually highlighted the error -code that was in the  ‘ bye‘ message that we saw in the call-flow diagram above.

As we can see,  the sip protocol informs us that the call failed with  a cause of: ‘Reason: Q.850;cause=65‘ . If you do a Google search for cause code 65 will probably find something related to a codec/capability  negotiation issue; which might lead one to think about transcoding. Please have a try for yourself; do a Google search and see what you come up with.

::::::::::::::::::::::::::::::::::::::::::::::::

Sent:
BYE sip:447578332431@192.168.0.10:5060 SIP/2.0
Via: SIP/2.0/UDP 192.168.0.20.8:5060;branch=z9hG4bK8052F20C7
From: <sip:00412256731@192.168.0.20>;tag=FAA7C024-667
To: “maxwell o” <sip:447578332431@192.168.0.10>;tag=342693~12b9673c-9ebc-4b8c-b0db-46b8d985386a-34835506
Date: Mon, 30 Sep 2013 12:44:02 gmt
Call-ID: fa88da80-24917212-28046-2212b0a@192.168.0.10
User-Agent: Cisco-SIPGateway/IOS-15.2.4.M1
Max-Forwards: 70
Timestamp: 1380545045
CSeq: 101 BYE
Reason: Q.850;cause=65
P-RTP-Stat: PS=19,OS=4480,PR=25,OR=4000,PL=0,JI=0,LA=0,DU=34270069
Content-Length: 0

::::::::::::::::::::::::::::::::::::::::::::::::

Did you carry out the Google search, did it show something related to codec?  if so, then I’m pleased to inform you that answer is  incorrect 🙂 lol . . . just joking :-). I have been mislead by so many error codes in the past that I no longer trust them completely.

On a more serious note, lets take a look at what RFC: 3262(click on this link) has to say about this; and I quote:

 If the UAS had placed a session description in any reliable
   provisional response that is unacknowledged when the INVITE is
   accepted, the UAS MUST delay sending the 2xx until the provisional
   response is acknowledged.  Otherwise, the reliability of the 1xx
   cannot be guaranteed, and reliability is needed for proper operation
   of the offer/answer exchange.

::::::::::::::::::::::::::::::::::::::::::::::::

So the question is, what does the above mean? Lets start of by explaining what a ‘provisional response’ is:

A very simplified definition of provisional responses is that ‘ they are 1xx responses within a Sip Dialogue/ conversation’. The snap-shot below provides  a quick glance at examples of Provisional Responses.

(source: http://en.wikipedia.org/wiki/List_of_SIP_response_codes#1xx.E2.80.94Provisional_Responses)

1xxreal

However, the definition above still does not explain what a ‘ reliable   provisional response’ is.

As simplified definition of a ‘ reliable provisional response’ is that :    a provisional response becomes reliable when the Sending User Agent Server (UAS) expects an acknowledgement back from the receiving User agent (UA)

With these definitions under out belt, it is now easy to understand what the RFC copied above was trying to say. The RFC is basically pointing out the importance of receiving an acknowledgement from the remote end when the reliable response mechanism is being used. It can actually stop a call from completing if not handled properly.

Lets take a time-out  to quickly take a look  at the SIP-message-flow the copied below. Did you notice the use  of the PRACK?     :

(Source: ‘ borrowed’ form Cisco )

good call flow

I guess at this point, the Sip-call-flow diagram of the failed international  call makes so much more sense now? I guess we can see the cause of the failure now? Basically, the Cisco Unified boarder element sent a bye message to the call-manager because the call manager never sent an ACK message to the ‘183 session progress message‘ it sent to call-manager.

To be honest, this should not have happened though: The Cisco CUBE algorithm was clearly not following the relevant SIP RFC stipulations. But then again who does? I will not explain why the Cisco CUBE was not following RFC stipulations because this post would become too long and boring.

Anyway, here is what we did to solve the problem:

We modified the ‘ sip profile’ attached to the sip trunk so that it would support the sending of RACK ( Provisional Response Acknowledgement)messages to all  Provisional Response messages coming from cube(i.e  18x sip messages).

Here are the steps:

1) go to cucm admin page

2) find the sip profile that your sip trunk is using.

3) then go to device  > device Setting > SIP Profile

4) enable ‘ send PRACK for all 1xx messages

5) reset the sip trunk

6) make sure your sip dial-peer on the cube that is pointing at cucm has something like this in config line:

dial-peer voice 20 voip
  voice-class sip rel1xx require 100rel

Send PRACK

Well as soon as we did this, all the calls started connecting fine.

However, we were having ringing / ring-back issues on international calls such that when an international call was made, the caller could not hear the remote phone ringing.

If you encounter this in your network, here is what you could do resolve it.

On the cucm:

:::::::::::::::::::::::::::::::::::

1) go to cucm admin page

2) find the sip profile that your sip trunk is using.

3) then go to device  > device Setting > SIP Profile

4) tick the check box for ‘ disable early medial for 180’

5) reset the sip trunk

6 on your Cisco Cube you should have a config on your inbound and outbound dial-peer similar to this(if you found that the service provider was not sending the audio or media in 183 /180 message):

dial-peer voice 20 voip
 voice-class sip block 183 sdp absent
voice-class sip block 180 sdp absent (use this single line with care)

see the diagram below for further details .   .    .

local ringback

Hope you have found this entry entertaining and contributory to your quest for knowledge and your day-to-day battles between Man and Machine.

A special thank you goes out  Mike Jones for his contributions to this case which has now led to this publication.

Mike can be found at uk.linkedin.com/pub/michael-cunliffe-jones/46/6b6/2a6/

All the best folks,

Regards

Maxwell

Advertisements

A while back it occurred to me that the best  Unified Communications Engineers  are the ones that  best understand the language of ‘ machines’ .  The truth is, when a device develops a problem, it will try to tell you in its own way using its own language. Just as in any other form of interaction; success is rare in the absence of  being able to give expression to one’s wishes and also being able to listen. An Engineer speaks the language when he issues commands to a device using a ‘ graphical user interface’ or a  ‘ command line interface’ . An Engineer interprets/understands  the  language of machines when he is able to decipher  trace and  error logs.

So, I made up my mind that where possible, I would try to  listen first:  I decided to  look for the root causes of a problem by first taking a look at the traces and logs  before double checking the  configurations. The aim of this is simply geared towards improving my ability to understand what these devices are trying to tell me in logs and traces when something goes wrong. practice makes perfect!

I suspect someone out there shares my view that understanding logs and traces is what completes an Engineer so I am going to share this little episode  that occurred in my home lab recently.

So  a few days ago I took my personal router and switch with me to a data-center where I was carrying out a server rebuild. I had modified the configurations on my devices to replicate the client’s network but for some reason, when I got home  to do some practice labs, I was almost surprised that my MGCP gateway would not   register to my Call Manager.

It was time to explore machine language so I logged unto the voice gateway and issued the following commands:

1) Debug mgcp packets

2) Terminal monitor

3) On the call-manager, I enabled detailed traces for voice gateways and MGCP.

4)  Lastly, from the voice gateway’s global configuration  interface, I issued the commands ‘ no mgcp’ following by ‘ mgcp’ . By so doing, I was essentially restarting the MGCP process.

Here is what I saw happening on the gateway:

*Jul 6 11:23:29.324: MGCP Packet sent to 192.168.0.55:2427—>
RSIP 205478228 *@Router.homelab.com MGCP 0.1
RM: restart
<—

*Jul 6 11:23:29.332: MGCP Packet received from 192.168.0.55:2427—>
200 205478228
<—

*Jul 6 11:23:29.332: MGCP Packet received from 192.168.0.55:2427—>
AUEP 31 S0/SU3/DS1-0/1@Router.homelab.com MGCP 0.1
F: X, A, I
<—

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

In the above, we see that the process started of by the gateway (10.10.10.3) sending a RSIP ‘ Restart in progress’ message to the call-manager(192.168.0.55).

Next, we see a ‘200‘ acknowledgment  message from the Cisco call manager basically saying: ‘ I’m ok with that; thanks for letting me know’ .

Next we see an ‘Audit end point’  message (AUEP) from the call-manager asking the gateway to  provide it with a report of the current status of the ISDN trunk. The call manager is very specific because it was asking for an Audit of interface ‘S0/SU3/DS1-0/1@Router.homelab.com‘.  In human language, that means:  give me a report of the first channel (DS1-0/1) of  interface 0/3/0 at a router which has a qualified domain name of Router.homelab.com.

So far, this is normal behavior. You can double check with the  Cisco Link below which shows the registration process in graphical format.

http://www.cisco.com/en/US/tech/tk1077/technologies_tech_note09186a00801da84e.shtml#t6

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

*Jul 6 11:23:29.336: MGCP Packet sent to 192.168.0.55:2427—>
500 31 Endpt Unknown
<—

*Jul 6 11:23:29.336: MGCP Packet received from 192.168.0.55:2427—>
AUEP 32 S0/SU3/DS1-0/2@Router.homelab.com MGCP 0.1
F: X, A, I
<—

*Jul 6 11:23:29.336: MGCP Packet sent to 192.168.0.55:2427—>
500 32 Endpt Unknown
<—

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

In the first line above, we see that the voice gateway responded to the call-manager’s ‘Audit endpoint’ request with a ‘500’ endpoint unknown response.

In the next two lines, we see that the call-manager tries to ask the gateway the same question about the second  channel (DS1-0/2)on of the ISDN trunk but   once again, the gateway tells the call-manager that it does not know what it’s talking about.

In case you have not understand how to spot  the channel that an audit is being requeted for, take a look at the output below. I think it will make everything sink in:

Router# sh mgcp endpoint

Interface E1 0/3/0

ENDPOINT-NAME V-PORT SIG-TYPE ADMIN
S0/SU3/ds1-0/1@Router 0/3/0:15 none up
S0/SU3/ds1-0/2@Router 0/3/0:15 none up
S0/SU3/ds1-0/3@Router 0/3/0:15 none up

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

So at this point I already knew why my gateway was not registering  but I was curious about seeing how the call-manager would try to express the problem to me so I  collected call-manager traces and this is what I saw:

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

|MGCPHandler received msg from: 10.10.10.3
RSIP 205478228 *@Router.homelab.com MGCP 0.1
RM: restart

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

From the above we see that the call-manager received a  restart in progress(RSIP) message from the voice-gateway ‘Router.homelab.com’ which is at IP Adress 10.10.10.3.

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

In the logs below,  we see the ‘MGCP manager’; which is running on the call-manager, scrambling to figure out who device ‘Router.homelab.com ‘ is by querying the call-manager database.  In the second line below, we see that call-manager believes that the gateway is using a protocol of ‘ PRI Euro’ which runs on E1. 

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

|MGCPManager::loadDbMGCPDeviceDigitalAccessPriMatchFunc – found GwName(S0/SU3/DS1-0@Router.homelab.com

MGCPpn9d – Started. Device = S0/SU3/DS1-0@Router.homelab.com, Protocol=PRI EURO, Side = USER.

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

In the first and second line below, we see the traces confirming that the ISDN layer three  Backhaul link from the gateway to the call-manager is opened.

In the fourth line, we see that the call-manager determines that this is an MGCP trunk because the model type is 121 (MODEL TYPE = 121)

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

|MGCPBhHandler 10.10.10.3 – TCP opened and BH registered for the device

MGCPpn9d – Backhaul link OPEN for Devicename S0/SU3/DS1-0@Router.homelab.com

MGCPHandler send msg SUCCESSFULLY to: 10.10.10.3

MGCPpn9d: dn_discovery_DbDnRes – DEVICE NAME = S0/SU3/DS1-0@Router.homelab.com MODEL TYPE = 121

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Since the call-manager believes that this is an MGCP trunk (MODEL TYPE = 121)running on protocol type PRI EURO,  the call-manager will not request for an audit of a  24 channel  T1 trunk would it? it will request for an audit of a 31 channel E1 trunk that is probably running in Europe or somewhere else that uses the E1 standard.

So naturally, in the second and fourth lines below, we see  call-manager was requesting for an audit (AUED) of the  1st channel (DS1-0/1) and 31st (DS1-0/31) channel respectively. I did not added the audit request for channel 2 to 30 because it was not needed, but as can be seen in the last line of trace, the gateway’s response to every audit request was: ‘ sorry, I don’t know what you are talking about’ (500 31 Endpt Unknown)

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

|MGCPHandler send msg SUCCESSFULLY to: 10.10.10.3

AUEP 31 S0/SU3/DS1-0/1@Router.homelab.com MGCP 0.1
F: X, A, I

|MGCPHandler send msg SUCCESSFULLY to: 10.10.10.3
AUEP 60 S0/SU3/DS1-0/31@Router.homelab.com MGCP 0.1
F: X, A, I

|MGCPHandler received msg from: 10.10.10.3
500 31 Endpt Unknown

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

If you recall from the beginning of this post I said that I already knew what was wrong when the voice-gateway responded with the ‘endpoint unknown message’ to the audit  endpoint request  from the call-manager. we are about to see the reason why.

The next thing I did was to turn off all debugging on the voice-gateway by using the command: ‘ u all’ at configuration mode. However, I still had terminal monitor on.

I then logged unto the Cisco Communications Manager webpage and reset my gateway from there. I went back to the voice-gateway and this is the shortened version of what I saw:

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

*Jul  6 14:23:32.884: %SYS-5-CONFIG_I: Configured from console by console

controller E1 0/3/0
^
% Invalid input detected at^‘ marker.

pri-group timeslots 1-31 service mgcp
^
% Invalid input detected at ‘^‘ marker.

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Do you know what’s happening above?  Basically, because I had reset the voice-gateway on call-manager, a signal was sent to the voice-gateway that it should reboot the trunk and download a new set of configurations.

The gateway clearly download the new configuration because the gateway was actually trying to configure the E1 port as can be seen  in the second line of the trace output above. However, we see that the router return an error of ‘ invalid input detected.  This means that this command is not allowed.  But the question is why?   . . . lets find out.

So on the voice gateway I issued the commands ‘do sh run | in controller ‘ and ‘do sh run | in card ‘:

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Router(config)#do sh run | in controller
controller T1 0/3/0

Router(config)#do sh run | in card type
card type t1 0 3
voice-card 0

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

So from the above we can clearly see what the problem was. The Call-manager was trying to configure an E1 interface but there was no E1 interfaces on the gateway just T1.   So all along, the call manager was trying to take audit for an E1 interface that did not exist and the gateway could not register it’s MGCP ports because they were not configured as T1 on the call-manager.  Here is what I did to resolve it:

Router(config)#no card type t1 0 3

Router(config)# card type E1 0 3

Router(config)# reload

Then I reset the gateway on call-manager and everything worked perfectly.   I really hope someone has found this informative because my original thought was that this case is not blog-worthy. However, since the main focus was on understanding traces and becoming more familiar with them, I hope a greater understanding of traces  has been delivered to the readers of this post.

Cheers

Maxwell


 * Hint:Please note that some of the outputs in this post have been recreated in a lab

So on this particular issue, my task was to investigate why, even though there were multiple call-managers in the cluster,  all   external incoming  calls failed to connect to the call-manager cluster whenever the primary call-routing  call-manager server was unreachable. We noticed that   even though there were other call-managers in the cluster that could handle  calls  coming from  the PSTN or service provider, the calls would always fail just because one server was unreachable.  Let me show you a quick diagram of the topology below.

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Internal phone <———- Call Manager <—(SipTrunk)–<—  2811 Voice Gateway<—-(ISDN Tunk)——-<—Telco or Service provider

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

As the voice gateway is the first point of entry into the network, I started my investigation from there. Here are the related configurations that I found on the voice-gateway.

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

 

dial-peer voice 1 pots

description  incoming dial-peer for pots leg
incoming called-number .
direct-inward-dial

:::::::::::::::::::::::::::::::::::::::::

dial-peer voice 5 voip
description primary dial-peer to cluster
destination-pattern ^1…$
session protocol sipv2
session target ipv4:192.168.0.99
incoming called-number .
dtmf-relay sip-kpml

:::::::::::::::::::::::::::::::::::::::::

dial-peer voice 3 voip
preference 1
description secondary dial-peer to cluster
destination-pattern ^1…$
session protocol sipv2
session target ipv4:192.168.0.55
incoming called-number .
dtmf-relay sip-kpml

:::::::::::::::::::::::::::::::::::::::::

sip-ua
timers trying 1000

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

  So based on the above configuration, I could tell that the intention was that  if a call came into  the gateway from the PSTN,  it would be routed by  dial-peer 5 to the call-manager server at  ip-address 192.168.0.99 – and if no response was gotten from that server, the call  would  be re-routed by dial-peer 3 which is pointing to another server in the cluster.

However, this was not working as expected  so I decided to enable  sip debugging (debug ccsip messages) and  also ISDN debugging (debug isdn q931 ) . After this, I made sure  that the primary call-manager was unreachable and then I  placed a test call into the cluster. This is what I saw in the debugs .

 ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

An  ISDN setup message is received from the service provider. The calling number is 5555 and the called number is 1007.

:::::::::::::::::::::::::::::::::::::::::::

*Jun 13 20:55:56.528: ISDN Se0/3/0:23 Q931: RX <- SETUP pd = 8 callref = 0x0081
Bearer Capability i = 0x8090A2
Standard = CCITT
Transfer Capability = Speech
Transfer Mode = Circuit
Transfer Rate = 64 kbit/s
Channel ID i = 0xA98381
Exclusive, Channel 1
Progress Ind i = 0x8183 – Origination address is non-ISDN
Calling Party Number i = 0x2180, ‘5555
Plan:ISDN, Type:National
Called Party Number i = 0xA1, ‘1007
Plan:ISDN, Type:National

 ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

The voice gateway then tries to connect the call to the call-manager at 192.168.0.99 and because I had made sure that the primary server was unreachable, obviously there would be no response from the call-manager.

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

INVITE sip:1007@192.168.0.99:5060 SIP/2.0
Via: SIP/2.0/UDP 10.10.10.3:5060;branch=
Remote-Party-ID: <sip:5555@10.10.10.3>;party=calling;screen=no;privacy=off

From: <sip:5555@10.10.10.3>;tag=1EBBA8-1ED
To: <sip:1007@192.168.0.99>
Date: Thu, 13 Jun 2013 20:55:57 GMT
Call-ID: 7A5AF3E6-D3A211E2-800D8BCB-56CC18E8@10.10.10.3
Supported: 100rel,timer,resource-priority,replaces,sdp-anat
Min-SE: 1800
Cisco-Guid: 2052580998-3550613986-2147680284-1483238696
User-Agent: Cisco-SIPGateway/IOS-12.x
Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, PRACK, UPDATE, REFER, SUBSCRIBE, NOTIF Y, INFO, REGISTER
CSeq: 101 INVITE
Max-Forwards: 70
Timestamp: 1371156957
Contact: <sip:5555@10.10.10.3:5060>
Expires: 180
Allow-Events: kpml, telephone-event
Content-Type: application/sdp
Content-Disposition: session;handling=required
Content-Length: 232

v=0
o=CiscoSystemsSIP-GW-UserAgent 9433 8425 IN IP4 10.10.10.3
s=SIP Call
c=IN IP4 10.10.10.3
t=0 0
m=audio 19182 RTP/AVP 18 19
c=IN IP4 10.10.10.3
a=rtpmap:18 G729/8000
a=fmtp:18 annexb=no
a=rtpmap:19 CN/8000
a=ptime:20

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

 

During the trace or debug output collection, I noticed that even though the gateway was not getting a response back from  the primary call-manager, it never used the secondary dial-peer to send a sip invite message to the backup call-manager. It just continued to send the same invite over and over again  to the primary call-manager until the call failed.  And whenever the call failed, I would see the trace output below:

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

*Jun 13 20:56:06.564: ISDN Se0/3/0:23 Q931: RX <- DISCONNECT pd = 8 callref = 0 x0081
Cause i = 0x82E6 – Recovery on timer expiry

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

 

 

So from the above, it is clear that the  call was dropped from the service provider side because the Recovery timer had expired on the ISDN circuit. This basically means that : while the voice gateway was busy continuously sending the same invite messages to the non-responsive call-manager, the ISDN timers expired because the call was not progressing forward because the voice gateway was not able to connect the ISDN call leg to the SIP call leg.

So basically, I needed to stop the gateway from continuously sending invite messages to a server that is never going to respond. I started looking at all the default sip related configurations on the gateway and I found this:

:::::::::::::::::::::::::::::::::::::::::::

Router#show sip-ua retry
SIP UA Retry Values
invite retry count = 6 response retry count = 6
bye retry count = 10 cancel retry count = 10
prack retry count = 10 update retry count = 6
reliable 1xx count = 6 notify retry count = 10
refer retry count = 10 register retry count = 6
info retry count = 6 subscribe retry count = 6
options retry count = 6

:::::::::::::::::::::::::::::::::::::::::::

As soon as I saw the output above, it was clear what the problem was.  As you can see, the sip user agent was configured to send 6 sip invites before  giving up.  And before it was done sending the 6 sip invites,  the ISDN timers from the service provider  had expired so the Cisco voice gateway never got to the point of sending the 6 sip invites before trying to reached the cluster using the secondary dial-peer.

In order to  resolve this problem, I  reduced the ‘Invite retry’  value to 2 so that the gateway would send two sip invites to the primary server and if it was not responding, the call would be forwarded to the secondary server using the secondary dial-peer.

:::::::::::::::::::::::::::::::::::::::::::

Router(config)#sip-ua

Router(config-sip-ua)#retry invite 2

:::::::::::::::::::::::::::::::::::::::::::

After the above configuration was added to the  gateway, I tested everything again and the fail-over worked perfectly.

Hope you’ve enjoyed this entry.

Cheers


* Hint: Please click on the picture(s) in order to maximize it

 * Hint:Please note that some of the screenshots and outputs in this post have been recreated in a lab

So this was an interesting case that landed on my desk from one of our very technical clients who basically use us solely for escalation.   The client has a third party PBX integrated with Cisco Call-manager via multiple SIP  trunks. One of the things that the client uses the PBX for is that when calls come into Cisco Call-manager, it is sent to the PBX where there are operators who answer calls and forward the calls  on if need be.  However, the client noticed that whenever the agents forwarded calls to his phone that is registered on call-manager,  there was always an extra ‘0’ zero in the beginning of the Caller-ID of the calling-number.   The client wanted to know where this extra zero was being added to the number.  The client went on to say that there was always an extra zero on every call regardless of whether the call originated internally or externally.

My shift was coming to an end and there were so many gateways and trunks everywhere that I didn’t even know how the offending calls entered the network. As a result, I felt we would spend more time trying to rummage through Docs or configurations on call-manager as opposed to just  grabbing a few traces and looking at it. With this thought in mind, I placed a test call from my mobile phone into the company and asked the operator to forward my call on to an internal user.

I collected traces from all the servers in the cluster and since I knew the time-of-call, called and calling number, it was easy to find my call in the traces.  Here is what I found:

From  the traces below  we see a back-hauled signal   to   the Call-manager from Ip address 10.x0.xx.9.. This is the originating MGCP  gateway where my call entered the network .

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

05/17/2013 15:27:34.337 CCM|MGCPBhHandler 10.x0.xx.9  – TCP msg available from Device|<CLID::StandAloneCluster><ct::4,100,135,1.808082><IP::10.X0.XX.9 ><LVL::Detailed>

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

After that, the gateway at  10.x0.xx.9  sends  or backhauls the calling number and called number to call-manager

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

05/17/2013 15:27:34.337 CCM|In  Message — PriEuroSetupMsg — Protocol= PriEuroProtocol|

05/17/2013 15:27:34.337 CCM|Ie – Ni2BearerCapabilityIe — IEData= 04 03 90 90 A3 |

05/17/2013 15:27:34.337 CCM|Ie – Q931ChannelIdIe — IEData= 18 03 A9 83 9B |

05/17/2013 15:27:34.338 CCM|Ie – Q931CallingPartyIe — IEData= 6C 0C 21 83 37 38 32 31 31 37 33 31 34 33 |<CLID::StandAloneCluster><

05/17/2013 15:27:34.338 CCM|Ie – Q931CalledPartyIe — IEData= 70 07 81 37 37 33 33 31 31 |<CLID::StandAloneCluster><

 

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

From the above we can see that the telephone number are written in ASIC  format so 37 37 33 33 31 31  is basically equal to  773311  :

From the above  I was able to confirm that this was my call because:

The calling   number  was  7821173143

The called   number ended with    773311

The apparent incompleteness of the called-number is perfectly normal because the client subscribes to  ‘direct Inward Dial ’ service.

From the below, we see that after the call was received from the gateway,  the call-manager looked at the configured digit-manipulation on the gateway and applied it to the calling and called number. Hence we see that   the incoming-calling-number  was changed from 7821173143 to 07821173143.

In the second line we can see the call-manager announcing the newly ‘globalised’ (globalizeIncomingCgpn)  in-coming-calling number as 07821173143

 

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

CCM|SPROCPri::globalizeIncomingCgpn – Adding prefix: 0, Digits to strip: 0, Cgpn Transformation CSS: |

05/17/2013 15:27:34.338 CCM|SPROC :: stripAndPrependDigits– The number 7821173143 is prepended with prefix 0, updated number=07821173143|

15:27:34.338 CCM|SPROCPri::globalizeIncomingCgpn – Globalized Cgpn = 07821173143

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

The snapshot below shows the digit manipulation that is configured on the gateway.  Notice that nothing is applied to it? Then why did the trace show that the calling-number had been changed from 7821173143 to 07821173143. well  did you notice that  digit manipulation was set to default?  Default means:  ‘if there is no digit manipulation configurations on the gateway, apply  the digit manipulation that is configurations on the device-pool and if there is nothing  on the device-pool, check the Call-manger Service parameters ’.

GW-mani

On this occasion, there was also no digit manipulation configured on the gateway so call-manager manipulated the digits based on service parameters. Based on the  service parameter configuration below,  we see that  whenever a  call that is classified as national comes into call-manager, a extra leading Zero (0) is added to the number.

service parameter

The next thing that the call-manger  has to do is to find a route to the destination number of 773311 so it invokes digit analysis by triggering a digit analysis request (daReq).

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

05/17/2013 15:27:34.343 CCM|Digit Analysis: wait_DaReq: daReq.

CCM|Digit analysis: match(pi=”2″, fqcn=””, cn=”07821173143“, , dd=”773311

 

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

From the trace below we find that the call-manager has found the a match for 773311

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

 

 

05/17/2013 15:27:34.343 CCM|Digit analysis: match(pi=”2″, fqcn=””, cn=”07821173143″,dd=”773311

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Now  call-manager needs to  process the call according to the match.  In the trace below, we  notice the following:

1)      That the match is located at partition called  ‘GW-INCOMING’

2)      That  called number was changed from 773311 to 3311.

 :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

 

 

05/17/2013 15:27:34.343 CCM|Digit analysis: analysis results|<CLID::StandAloneCluster><

05/17/2013 15:27:34.344 CCM||PretransformCallFingPartyNumber=07821173143

|CallingPartyNumber=07821173143

|DialingPartition=GW-INCOMING

|DialingPattern=3311

|FullyQualifiedCalledPartyNumber=3311

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

After noticing  that the called-number had been changed,  I jumped on call-manager to confirm what happened  in the traces as it was now clear that a digit manipulation had just taken place because 773311  had just changed to  3311 . So I searched  call-manager for the pattern that was just matched and I found a translation-pattern.

From the screenshot below, we can confirm that the dialed number matched the translation patter ‘77.3311’. We also see that the reason why the digit changed to 3311 was because the digit manipulated that was configured on the translation-pattern was configured to drop all the digits  before 3311

 

 Translationpatt

 

 

Once again call manager tries to get the call to the intended destination by running the new called-number of 3311 through the digit analysis process and in the progress, call manager discovers that number  has a Call-forward-all  (cfa) value of   7165871 configured

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

callKey= 0x37D0D,

ssKey = 0, recordStatus 0,

 dnPattern =  3311, dnPartition =

cfa     = 7165871, cfaToVM     = 0, cfaCss     = PA

cfb     = , cfbToVM     = 0, cfbCss     = PA

cfna    = , cfnaToVM    =

cfur    = 7165871, cfurToVM

cfurInt = 7165871, cfurI

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

So based on the above, our Called or dialed number has just changed from 3311 to 7165871 because it had a (cfa) call-forward-all configured with a value of 7165871.

Therefore, call manager tried to forward the call by first of all carrying out a digit analysis in order to find a match or route.

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

5/17/2013 15:27:34.348 CCM|Forwarding – initiateCFA – CFA, callKey= 0x37D0D|

05/17/2013 15:27:34.350 CCM|Digit analysis: match(pi=”1″, fqcn=””, cn=”07821173143“,dd=”7165871

As can be seen below,   call-manager finds a match for the number it was looking for.

05/17/2013 15:27:34.350 CCM|Digit analysis: match(pi=”1″, fqcn=””, cn=”07821173143, dd=”7165871

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

 

 

Now  let’s see what matches  the call-manager found below.

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

|CallingPartyNumber=007821173143

|DialingPartition=SIP-Internal

|DialingPattern=716.XXXX

|FullyQualifiedCalledPartyNumber=7165871

|PotentialMatches=NoPotentialMatchesExist

|PretransformDigitString=7165871

|PretransformPositionalMatchList=716:5871

|CollectedDigits=5871

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

From the above, we can see that while call-manager was trying to find out how to get to 7165871 by finding something that matched it, it found pattern 716.XXXX in partition ‘SIP-Internal ’.

Hello!? Did you notice that the calling number has changed from   07821173143’ to ‘007821173143’ ? if you didn’t  take another look above

 

If you did then that is good. AS soon as I noticed this,  I wanted to confirm what I had just seen in the logs above  so I logged unto call manager web page  and searched for 716.XXXX.  The number was configured as a route-pattern but there was more to it: check out the  screenshot of the configuration below

 

routpatterntoproutpatternbuttom 

 At this point my job was pretty much done.  I confirmed that the route pattern was responsible for sending calls to the PBX by checking the route list. I also confirmed that this route pattern  is the route-pattern  that was responsible for sending the calls to the PBX with an extra zero. It was the root cause of the problem. So the cause of the problem was Cisco call manager and not the PBX.   All I did was to remove the prefix of Zero and the problem was resolved. The client was very happy and straight away dropped another problem that needed solving on my lap. . .  lol     🙂     from frying pan to fire 😉

Anyway, I  hope you enjoyed this case as much as I did

Cheers


If you are visiting for the first time then I would like to say welcome to Voice Initiate. It is a place where experiences are shared and discussed.  As we all know, working in technology is all about being introduced to new things on an almost daily basis. For me, it’s like being in a perpetual state of initiation.

If you are reading this blog then I’m sure you can relate when I say that I came into the IP telephony world not knowing that I would have to learn programming too, but working on Contact Centers dictates  that  you must- so retreat was never an option. . . it’s an endless battle but we love it anyway.

This blog is an account of the joys and challenges faced by a Unified Communications NOC Engineer. This is a  place where both failure and success is discussed openly.

Advice and contributions are always welcomed.  My hope is that this blogs brings a sense of thrill to fellow bloggers and initiates alike and a sense of nostalgia to those that have seen the joys and challenges that we now speak of .

Cheers!

maxwell.osagie@yahoo.com

%d bloggers like this: