In this scenario I had an MSDP RPF failure. Here is the topology I was using.
OK. R1 is the source and loopback 1 on R6 is the member. MSDP was not working: no members received the stream when I pinged the multicast address 239.1.1.1 from my source at R1. I also enabled a debug on R6 to find the issue.
R1#ping 239.1.1.1 so lo0 repeat 4
Type escape sequence to abort.
Sending 4, 100-byte ICMP Echos to 239.1.1.1, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
....
R6#debug ip msdp peer
MSDP peer debugging is on
*Mar 1 09:59:29.805: MSDP(0): 45.45.45.1: Received 20-byte msg 1154 from peer
*Mar 1 09:59:29.805: MSDP(0): 45.45.45.1: SA TLV, len: 20, ec: 1, RP: 1.1.1.1
*Mar 1 09:59:29.809: MSDP(0): 45.45.45.1: RPF check failed for 1.1.1.1
For an MSDP RPF check in this scenario, I know I must have a route to the RP (1.1.1.1) in the mRIB or uRIB, and I must also have a route to the MSDP peer (45.45.45.1) in the mRIB or uRIB. Let's check that first. If you are not familiar with the MSDP RPF check procedure in this scenario, I've documented it in three steps at the top of this post on MSDP RPF Rule 3.
R6#show ip bgp 1.1.1.1
BGP routing table entry for 1.1.1.1/32, version 92
Paths: (1 available, best #1, table Default-IP-Routing-Table)
Not advertised to any peer
3 2 1
5.5.5.5 (metric 2) from 5.5.5.5 (5.5.5.5)
Origin IGP, metric 0, localpref 300, valid, internal, best
R6#sh ip bgp 45.45.45.1
BGP routing table entry for 45.45.45.0/30, version 105
Paths: (1 available, best #1, table Default-IP-Routing-Table, RIB-failure(17))
Not advertised to any peer
4
5.5.5.5 (metric 2) from 5.5.5.5 (5.5.5.5)
Origin IGP, metric 0, localpref 100, valid, internal, best
OK, so we have a uRIB route to the RP, but the BGP route to the MSDP peer is not installed because of a RIB-failure. Let's find out more about this:
R6#sh ip bgp rib-failure
Network            Next Hop                      RIB-failure   RIB-NH Matches
5.5.5.5/32         5.5.5.5             Higher admin distance              n/a
35.35.35.0/30      5.5.5.5             Higher admin distance              n/a
45.45.45.0/30      5.5.5.5             Higher admin distance              n/a
OK, so another protocol, a static route, or something else with a lower administrative distance has kept the BGP route out of the routing table. So how are we learning 45.45.45.1?
R6#sh ip route 45.45.45.1
Routing entry for 45.45.45.0/30
Known via "ospf 1", distance 110, metric 2, type intra area
Last update from 56.56.56.1 on FastEthernet0/0, 00:12:31 ago
Routing Descriptor Blocks:
* 56.56.56.1, from 5.5.5.5, 00:12:31 ago, via FastEthernet0/0
Route metric is 2, traffic share count is 1
So R6 has installed this route from R5 via OSPF instead of BGP. Because it is listed as a BGP RIB-failure, we know we are learning it via both BGP and OSPF, and OSPF (distance 110) beats iBGP (distance 200). So we just need to ensure we only learn it via BGP. Let's confirm and fix this by seeing why R5 is advertising the route via OSPF.
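The RIB-failure above comes down to administrative distance, which can be sketched in a few lines of Python. This is only a rough model and not how IOS implements it: the function names are my own, and the distances are the Cisco defaults (connected 0, OSPF 110, iBGP 200).

```python
# Default Cisco IOS administrative distances for the route sources in this post.
ADMIN_DISTANCE = {"connected": 0, "ospf": 110, "ibgp": 200}


def rib_winner(candidates):
    """Return the source whose route is installed: the lowest distance wins."""
    return min(candidates, key=lambda source: ADMIN_DISTANCE[source])


def rib_failures(candidates):
    """Return the sources that lose and are kept out of the routing table."""
    winner = rib_winner(candidates)
    return [source for source in candidates if source != winner]


# R6 hears 45.45.45.0/30 from R5 through both iBGP and OSPF.
print(rib_winner(["ibgp", "ospf"]))    # ospf
print(rib_failures(["ibgp", "ospf"]))  # ['ibgp']
```

Once OSPF stops advertising the prefix, iBGP is the only candidate left and its route is installed.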
R5#sh ip route 45.45.45.1
Routing entry for 45.45.45.0/30
Known via "connected", distance 0, metric 0 (connected, via interface)
Routing Descriptor Blocks:
* directly connected, via FastEthernet0/1
Route metric is 0, traffic share count is 1
R5#sh run | s ospf
router ospf 1
log-adjacency-changes
passive-interface FastEthernet0/1
network 0.0.0.0 255.255.255.255 area 0
R5(config)#router ospf 1
R5(config-router)#no network 0.0.0.0 255.255.255.255 area 0
R5(config-router)#network 56.56.56.0 0.0.0.3 area 0
R5(config-router)#network 5.5.5.5 0.0.0.0 area 0
So what I saw here was that I'd configured R5 to advertise everything via OSPF. I don't want that. I only want routes internal to my domain to be learnt via OSPF, so that's what I've adjusted here (5.5.5.5 is just R5's loopback). Then I went back to R6 to check the BGP RIB again for the MSDP peer.
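To see why the old network statement pulled the 45.45.45.0/30 link into OSPF and the new ones do not, here is a small Python sketch of wildcard matching. The helper names are made up for illustration. The addresses 45.45.45.2 and 35.35.35.2 are my assumption for R5's side of those /30 links, since the outputs only show the neighbor side.

```python
import ipaddress


def network_statement_matches(interface_ip, network, wildcard):
    """True if an OSPF 'network <network> <wildcard>' line covers interface_ip.

    A 0 bit in the wildcard means "must match", a 1 bit means "don't care".
    """
    ip = int(ipaddress.IPv4Address(interface_ip))
    net = int(ipaddress.IPv4Address(network))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (ip & care) == (net & care)


def ospf_enabled(addresses, statements):
    """Return the interface addresses covered by at least one network statement."""
    return [
        addr
        for addr in addresses
        if any(network_statement_matches(addr, net, wc) for net, wc in statements)
    ]


# R5's interface addresses; the last two are assumed, not taken from the outputs.
r5_addresses = ["5.5.5.5", "56.56.56.1", "45.45.45.2", "35.35.35.2"]

old_statements = [("0.0.0.0", "255.255.255.255")]
new_statements = [("56.56.56.0", "0.0.0.3"), ("5.5.5.5", "0.0.0.0")]

print(ospf_enabled(r5_addresses, old_statements))  # all four addresses
print(ospf_enabled(r5_addresses, new_statements))  # ['5.5.5.5', '56.56.56.1']
```

With the old statement every interface matches, so the external-facing links are advertised into OSPF even though FastEthernet0/1 is passive. With the new statements only the loopback and the R5-R6 link are covered.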
R6#sh ip bgp 45.45.45.1
BGP routing table entry for 45.45.45.0/30, version 106
Paths: (1 available, best #1, table Default-IP-Routing-Table)
Not advertised to any peer
4
5.5.5.5 (metric 2) from 5.5.5.5 (5.5.5.5)
Origin IGP, metric 0, localpref 100, valid, internal, best
Problem gone. Next, I know the final stage of the MSDP RPF check for this situation is to ensure the last AS in the path towards the MSDP peer is the first AS in the path towards the RP. The output from R6 above shows that the last AS (and only AS) in the path to my MSDP peer is AS 4. So if I check the path towards the originating RP, its first AS should be AS 4.
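The whole check as I understand it (route to the RP, route to the peer, then the AS comparison) can be sketched in Python like this. It is a simplified model, not IOS's implementation: the function name and the dictionary of best paths are hypothetical, a missing or empty AS path is simply treated as a failure, and the "4 2 1" path in the second example is an illustrative value for what a passing case would look like.

```python
def msdp_rpf_rule3(best_paths, rp, peer):
    """Sketch of the RPF check for an MSDP peer that is not a BGP peer.

    best_paths maps a destination to the AS path of its best BGP route,
    read left to right as IOS prints it (nearest AS first).
    Returns (passed, reason).
    """
    rp_path = best_paths.get(rp)
    if not rp_path:
        return False, "no route to RP"

    peer_path = best_paths.get(peer)
    if not peer_path:
        return False, "no route to MSDP peer"

    peer_as = peer_path[-1]  # last AS in the path towards the peer
    first_as_to_rp = rp_path[0]  # first AS in the path towards the RP
    if first_as_to_rp != peer_as:
        return False, f"first AS to RP is {first_as_to_rp}, peer AS is {peer_as}"
    return True, f"first AS to RP matches peer AS {peer_as}"


# The AS paths R6 shows so far: "3 2 1" to the RP and "4" to the peer.
failing = {"1.1.1.1": [3, 2, 1], "45.45.45.1": [4]}
print(msdp_rpf_rule3(failing, "1.1.1.1", "45.45.45.1"))

# A passing case needs the path to the RP to start with AS 4 (assumed values).
passing = {"1.1.1.1": [4, 2, 1], "45.45.45.1": [4]}
print(msdp_rpf_rule3(passing, "1.1.1.1", "45.45.45.1"))
```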
R6#show ip bgp 1.1.1.1
BGP routing table entry for 1.1.1.1/32, version 92
Paths: (1 available, best #1, table Default-IP-Routing-Table)
Not advertised to any peer
3 2 1
5.5.5.5 (metric 2) from 5.5.5.5 (5.5.5.5)
Origin IGP, metric 0, localpref 300, valid, internal, best
OK, so there is still an issue here. R6 is preferring the path via R3 (AS 3) to reach the RP (1.1.1.1), not a path through AS 4 where the MSDP peer (45.45.45.1) lives, so the RPF check continues to fail. There's no obvious reason why R6 is doing this, except that the local pref is 300 rather than the default 100. So I moved on to R5 to see why it is picking the route via R3.
R5#sh ip bgp 1.1.1.1
BGP routing table entry for 1.1.1.1/32, version 79
Paths: (2 available, best #2, table Default-IP-Routing-Table)
Advertised to update-groups:
1 2
4 2 1
45.45.45.1 from 45.45.45.1 (4.4.4.4)
Origin IGP, localpref 100, valid, external
3 2 1
35.35.35.1 from 35.35.35.1 (3.3.3.3)
Origin IGP, localpref 300, valid, external, best
So for some reason, I've set the local pref to 300 for the path via R3. I checked the BGP config and fixed it by moving the route-map from the neighbor in AS 3 to the neighbor in AS 4.
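Why local pref decides this can be shown with a stripped-down best-path comparison in Python. This is a deliberate simplification: the real BGP decision process has more steps (weight comes before local pref, and several tie-breakers follow AS-path length), and the class and function names are my own.

```python
from dataclasses import dataclass


@dataclass
class Path:
    as_path: tuple
    next_hop: str
    local_pref: int = 100  # default when no policy sets it


def best_path(paths):
    """Highest local preference wins; a shorter AS path breaks ties."""
    return max(paths, key=lambda p: (p.local_pref, -len(p.as_path)))


# R5's two paths to 1.1.1.1 with the route-map applied to the AS 3 neighbor.
before = [
    Path((4, 2, 1), "45.45.45.1"),
    Path((3, 2, 1), "35.35.35.1", local_pref=300),
]
print(best_path(before).as_path)  # (3, 2, 1)

# The same two paths with the route-map moved to the AS 4 neighbor.
after = [
    Path((4, 2, 1), "45.45.45.1", local_pref=300),
    Path((3, 2, 1), "35.35.35.1"),
]
print(best_path(after).as_path)  # (4, 2, 1)
```

Both paths have the same AS-path length, so local pref alone picks the winner, and the winner is what R5 passes on to R6.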
R5#sh run | s bgp
router bgp 5
no synchronization
bgp router-id 5.5.5.5
bgp log-neighbor-changes
network 5.5.5.5 mask 255.255.255.255
network 35.35.35.0 mask 255.255.255.252
network 56.56.56.0 mask 255.255.255.252
neighbor 6.6.6.6 remote-as 5
neighbor 6.6.6.6 update-source Loopback0
neighbor 6.6.6.6 next-hop-self
neighbor 35.35.35.1 remote-as 3
neighbor 35.35.35.1 route-map test in
neighbor 45.45.45.1 remote-as 4
no auto-summary
R5#
R5#sh route-map test
route-map test, permit, sequence 10
Match clauses:
Set clauses:
local-preference 300
Policy routing matches: 0 packets, 0 bytes
R5#
R5#conf t
Enter configuration commands, one per line. End with CNTL/Z.
R5(config)#router bgp 5
R5(config-router)#no neighbor 35.35.35.1 route-map test in
R5(config-router)#neighbor 45.45.45.1 route-map test in
R5(config-router)#end
R5#
R5#clear ip bgp * in
R5#
R5#sh ip bgp 45.45.45.1
BGP routing table entry for 45.45.45.0/30, version 113
Paths: (1 available, best #1, table Default-IP-Routing-Table, RIB-failure(17))
Flag: 0x800
Advertised to update-groups:
1 2
4
45.45.45.1 from 45.45.45.1 (4.4.4.4)
Origin IGP, metric 0, localpref 300, valid, external, best
To verify this, I sent another multicast ping from R1 and then saw the RPF check pass on R6 in the debug.
R1#ping 239.1.1.1 so lo1 repeat 2
Type escape sequence to abort.
Sending 2, 100-byte ICMP Echos to 239.1.1.1, timeout is 2 seconds:
Packet sent with a source address of 100.100.100.1
Reply to request 0 from 56.56.56.2, 192 ms
Reply to request 1 from 56.56.56.2, 112 ms
Reply to request 1 from 56.56.56.2, 112 ms
R6#
*Mar 1 10:45:48.501: MSDP(0): 45.45.45.1: Received 20-byte msg 1253 from peer
*Mar 1 10:45:48.505: MSDP(0): 45.45.45.1: SA TLV, len: 20, ec: 1, RP: 1.1.1.1
*Mar 1 10:45:48.505: MSDP(0): 45.45.45.1: RPF check passed for 1.1.1.1, Peer is best in the closest AS
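If you capture a lot of this debug output, a short Python script can pull out just the RPF results. This is only a sketch: the regular expression is written against the two debug lines shown in this post, so other IOS versions may format them differently, and the function name is my own.

```python
import re

# Matches lines such as:
#   MSDP(0): 45.45.45.1: RPF check failed for 1.1.1.1
RPF_LINE = re.compile(
    r"MSDP\(\d+\): (?P<peer>\d+\.\d+\.\d+\.\d+): "
    r"RPF check (?P<result>passed|failed) for (?P<rp>\d+\.\d+\.\d+\.\d+)"
)


def rpf_results(log_text):
    """Return (peer, rp, result) for every RPF check line in the debug output."""
    return [(m["peer"], m["rp"], m["result"]) for m in RPF_LINE.finditer(log_text)]


debug_output = """\
*Mar 1 09:59:29.805: MSDP(0): 45.45.45.1: SA TLV, len: 20, ec: 1, RP: 1.1.1.1
*Mar 1 09:59:29.809: MSDP(0): 45.45.45.1: RPF check failed for 1.1.1.1
*Mar 1 10:45:48.505: MSDP(0): 45.45.45.1: RPF check passed for 1.1.1.1, Peer is best in the closest AS
"""

for peer, rp, result in rpf_results(debug_output):
    print(f"peer {peer} rp {rp}: {result}")
```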
Easy, right? lol. This was the first time I had configured MSDP in my life. My god, that was more difficult than I expected!
For an excellent demonstration of troubleshooting multicast RPF failures within a domain, Brian McGahan has provided a free vSeminar recording, viewable at: http://www.ine.com/all-access-pass/training/playlist/free-ccie-video-training-seminars/ccie-routing11000022.html