© 2011 www.schellworth.de
IP Failover is still not present in OS X 10.8 Server. Apple has also changed its strategy from Managed Preferences (MCX) to Mobile Device Management (MDM). This is a significant change for many enterprise organizations with large Mac deployments. With the move to the MDM strategy, the option of using Active Directory or other third-party applications is gone as well. The main point is that there is no longer any need for a Mac server failover solution: to create MDM profiles for iOS and OS X devices you don't need a high availability solution.
I'll write up my latest attempts at the OS X Pacemaker port in a new post. Maybe someone will need it to build an overkill failover solution for a render/video server.
http://www.cultofmac.com/182143/apples-profile-manager-and-the-future-of-mac-management-feature/
http://www.cultofmac.com/170752/apple-serves-up-mac-businessenterprise-resources-ahead-of-mountain-lion/
Apple has removed IP Failover from OS X Lion Server (10.7.2).
It doesn't work as expected. heartbeatd works fine, but failoverd crashes the kernel once a takeover is initiated. After that the system can't boot, and only a reinstall fixes the problem. Weird; maybe someone knows how to fix this issue?
If you need the Failover daemon, e.g. for AFP, you can port it from an OS X 10.6.x Server installation. A short how-to follows.
Copy the following files from an OS X Snow Leopard Server (10.6) to a Lion (10.7) installation. Lion Server isn't necessary.
scp /usr/sbin/heartbeatd root@lion.server:/usr/sbin/.
scp /usr/sbin/failoverd root@lion.server:/usr/sbin/.
scp -r /System/Library/PrivateFrameworks/CoreServer.framework root@lion.server:/System/Library/PrivateFrameworks/.
scp /usr/libexec/NotifyFailover root@lion.server:/usr/libexec/.
scp /usr/libexec/ProcessFailover root@lion.server:/usr/libexec/.
scp -r /System/Library/StartupItems/IPFailover root@lion.server:/System/Library/StartupItems/.
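Before rebooting the Lion machine it may help to confirm that every copied file actually arrived. The following is only a sketch: the function name and the optional root prefix are hypothetical, and it assumes the framework belongs under PrivateFrameworks on the target, as it does on the 10.6 source machine.

```shell
#!/bin/bash
# Sketch: confirm that the IP Failover files copied from 10.6 are in place.
# check_failover_files is a hypothetical helper, not part of OS X.
# The optional argument is a root prefix (default "/"), so the check can be
# tried against a staging directory first.
check_failover_files() {
    local root="${1:-/}" missing=0 path
    for path in \
        usr/sbin/heartbeatd \
        usr/sbin/failoverd \
        usr/libexec/NotifyFailover \
        usr/libexec/ProcessFailover \
        System/Library/PrivateFrameworks/CoreServer.framework \
        System/Library/StartupItems/IPFailover
    do
        # Report every missing path instead of stopping at the first one.
        if [ ! -e "${root%/}/$path" ]; then
            echo "missing: ${root%/}/$path"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if nothing was missing.
    [ "$missing" -eq 0 ]
}
```

Calling `check_failover_files` with no argument checks the live system; it only tests that the paths exist, not that the binaries run on Lion.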
Hopefully Apple will come up with a new failover solution some day. OD already has one built in, and I haven't yet found out how the new AFP daemon works (the AppleFileServer manpage is the worst I've ever seen!). The SAN features integrated in Lion would be another way to build an AFP high availability solution.
This might work on OS X Mountain Lion (10.8).
libnet
curl -L http://sourceforge.net/projects/libnet-dev/files/libnet-1.1.5.tar.gz/download -o libnet.tar.gz
tar xzvf libnet.tar.gz
cd libnet-1.1.5
./configure --prefix=/opt/local
make
sudo make install
Create a local user and group. You can use the Directory Utility GUI or the command line:
dseditgroup -o create haclient
dscl . -create /Users/hacluster
dscl . -create /Users/hacluster UserShell /usr/bin/false
dscl . -create /Users/hacluster UniqueID 502
dscl . -create /Users/hacluster PrimaryGroupID 502
dscl . -create /Users/hacluster NFSHomeDirectory /var/lib/heartbeat/cores/hacluster
dscl . -create /Users/hacluster RealName "Cluster User"
export CLUSTER_USER=hacluster
export CLUSTER_GROUP=haclient
export CLUSTER_USER=admin
export CLUSTER_GROUP=admin
export PREFIX=/opt/local
Cluster Glue
hg clone http://hg.linux-ha.org/glue
cd glue
./autogen.sh
./configure --prefix=$PREFIX --with-initdir=/private/etc/mach_init.d --with-daemon-user=${CLUSTER_USER} --with-daemon-group=${CLUSTER_GROUP}
make
sudo make install
Resource Agents
git clone git://github.com/ClusterLabs/resource-agents.git
cd resource-agents/
./autogen.sh
/opt/local/bin/autoreconf -i
./configure --prefix=$PREFIX --with-initdir=/private/etc/mach_init.d
make
sudo make install
Heartbeat
hg clone http://hg.linux-ha.org/dev heartbeat-dev
cd heartbeat-dev/
./bootstrap
./configure --prefix=$PREFIX --with-initdir=/private/etc/mach_init.d
make
sudo make install
Pacemaker
hg clone http://hg.clusterlabs.org/pacemaker/1.1/
cd 1.1
./autogen.sh
/usr/local/bin/autoreconf -i
./configure --prefix=$PREFIX --with-initdir=/private/etc/mach_init.d --with-heartbeat --disable-fatal-warnings
make
sudo make install
The IP Failover feature is a daemon process integrated into OS X Server. It's mostly used for high availability server setups, e.g. when OS X Server acts as an Open Directory (OD) server with client home folders stored on the server, or serves NetBoot clients.
OD itself doesn't need the IP Failover daemon. If you have at least two OD servers running and one goes offline, the clients automatically find the others; this is a built-in OD feature. AFP and other services, e.g. a type server or SWUPD (the software update daemon), are different: the clients won't find the new server if the IP or URL of the master is gone.
Over the years I have done a lot of IP Failover setups for large companies. The main setups and how they work are well documented by Apple and by some other resources on the web.
Here you can find a nearly perfect "IP Failover Test Script" that checks a little more before the takeover starts. The main problems are restarting the master server (without shutting down the replica) and releasing it after an acquire without leaving the home directory data inconsistent. The so-called STOMITH (Shoot The Other Machine In The Head) procedure is no longer necessary (who invented this rough solution?).
If you would like to use these scripts commercially, please contact me.
This IP Failover setup additionally uses an rsync script to update the home directory files. You can find that script in the Linux section.
#!/bin/bash
# (c) 2011 www.schellworth.de v.0.2.1
#
# This script checks if the FileSync is clean and additionally if the IP is really down after <WAIT> seconds.
#
# WARNING: If the FileSync is not clean (maybe after a past takeover where the files haven't been updated), a takeover won't proceed!
#
STATE=$1                # acquire or release mode
IP=$2                   # on hook IP
WAIT=240                # sleep X seconds (240 = 4 minutes) to check if the server is really down (or just reboots)
HOST_IP=192.168.0.1     # 2nd interface to check system health
SYNC_STATE="/Library/Scripts/rsync/rsyncbackup.com"

logger "IPFailover (Test): Testscript starts to $STATE IP: $IP"

check_link=$( ls -l "$SYNC_STATE" )     # get the linked target
target=${check_link#* -> }
target_file=${target##*/}

if [ "$target_file" == "activesync.sh" ]        # check if the FileSync is active
then
    /sbin/ping -q -c1 $HOST_IP &> /dev/null     # check the 2nd IP to see if the master is really down
    if [ "$?" -gt 0 ] && [ "$STATE" == "acquire" ]
    then
        # ACQUIRE
        logger "IPFailover (Test): $HOST_IP isn't reachable! Test again in $WAIT seconds. ..."
        sleep $WAIT                             # wait before the second check
        logger "IPFailover (Test): ... ping $HOST_IP"
        /sbin/ping -q -c1 $HOST_IP &> /dev/null # check the 2nd IP again
        if [ "$?" -gt 0 ]
        then
            logger "IPFailover (Test): Master is DOWN! Acquiring $IP will proceed. (0)"
            exit 0
        else
            logger "IPFailover (Test): $HOST_IP is UP. Canceling to acquire $IP (5)."
            exit 5
        fi
    else
        if [ "$STATE" == "release" ]
        then
            # RELEASE
            logger "IPFailover (Test): Master is UP again. Releasing $IP will proceed"
            exit 0
        else
            logger "IPFailover (Test): $HOST_IP is UP! Canceling to acquire $IP (10)."
            echo "started from command line?"
            echo "usage: Test ['acquire' | 'release'] ['IP']"
            exit 10
        fi
    fi
else
    logger "IPFailover (Test): FileSync is disabled. Please reverse the FileSync process!"
    logger "IPFailover (Test): $STATE $IP canceled! (50)"
    SUBJECT="WARNING!!! The IP Failover $IP failed!"
    TO="[email protected]"
    BODY="Please check the server logs."
    echo "$BODY" | mail -s "$SUBJECT" "$TO"
    exit 50
fi
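The core of the test script is its "check, wait, check again" decision: the master only counts as down if it fails to answer twice, with a pause in between to rule out a reboot. A minimal sketch of just that decision follows. The function name `master_is_down` and the `PROBE` hook are hypothetical, introduced here so the logic can be tried without a real outage; the default probe is the same single ping the script uses.

```shell
#!/bin/bash
# Sketch of the "check, wait, check again" decision from the test script.
# PROBE is a hypothetical hook: any command that exits 0 when the master
# answers. It defaults to a single quiet ping.
PROBE="${PROBE:-/sbin/ping -q -c1}"

# master_is_down HOST [WAIT]
# Returns 0 only if HOST fails the probe twice, WAIT seconds apart.
master_is_down() {
    local host="$1" wait="${2:-240}"
    # PROBE is left unquoted on purpose so "cmd arg arg" splits into words.
    if $PROBE "$host" >/dev/null 2>&1; then
        return 1            # master answered: nothing to take over
    fi
    sleep "$wait"           # it may just be rebooting
    if $PROBE "$host" >/dev/null 2>&1; then
        return 1            # master came back during the wait
    fi
    return 0                # still silent: treat it as down
}
```

Used as `master_is_down 192.168.0.1 240 && exit 0`, this mirrors the acquire branch; the logging and the distinct exit codes of the full script are left out.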
#!/bin/bash
# (c) www.schellworth.de
logger "IPFailover: Starting PreAcq script"
# Disable rsync process
ln -sf /Library/Scripts/rsyncbackup/inactivsync.sh /Library/Scripts/rsyncbackup/rsyncbackup.com
logger "IPFailover: RSYNC disabled"
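The sync state is carried entirely by where the rsyncbackup.com symlink points, and the test script reads it by parsing `ls -l` output. As an alternative sketch (not the author's method), `readlink` returns the link target directly, which avoids depending on the `ls` output format. The function name is hypothetical; the file name activesync.sh is taken from the test script.

```shell
#!/bin/bash
# Sketch: read the sync state from the symlink without parsing `ls -l`.
# sync_is_active is a hypothetical helper name.

# sync_is_active LINK
# Returns 0 if LINK is a symlink whose target file name is activesync.sh,
# 1 if it points anywhere else, and 2 if LINK is not a symlink at all.
sync_is_active() {
    local link="$1" target
    [ -L "$link" ] || return 2              # not a symlink: state unknown
    target=$(readlink "$link") || return 2  # raw link target, may be relative
    [ "${target##*/}" = "activesync.sh" ]   # compare only the file name
}
```

The separate return code 2 lets a caller tell "sync disabled on purpose" apart from "the state link is missing", which the `ls -l` approach cannot distinguish.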
#!/bin/bash
# (c) www.schellworth.de
logger "IPFailover: Starting PostAcq script"
# Start the AFP daemon
serveradmin start afp
logger "IPFailover: AFP started"

#!/bin/bash
# (c) www.schellworth.de
logger "IPFailover: Starting PreRel script"
# Stop the AFP daemon
serveradmin stop afp
logger "IPFailover: AFP stopped"

#!/bin/bash
# (c) www.schellworth.de
logger "IPFailover: Starting PostRel script"
http://docs.info.apple.com/article.html?path=ServerAdmin/10.5/en/c3fs29.html
http://www.mactech.com/articles/mactech/Vol.23/23.03/OSXFailover-Part1/index.html
http://www.afp548.com/article.php?story=20050218175501583
http://www.afp548.com/article.php?story=20051018203349525
http://www.mac-o-net.de/article/8
http://osxnetzwerk.de/2010/08/12/ip-failover-fuer-afp-unter-mac-os-x-server-snow-leopard/
Discussion
I wish they would do something. Know of any third-party alternatives? Much like you, I have created my own scripts, but using pings instead of the heartbeatd protocol on both sides. I then wrote a custom failover script.
There is a Linux alternative (http://www.keepalived.org/), but it's very kernel-based and wouldn't work on Darwin. I'm thinking of coding an alternative in bash. One important feature is taking over a virtual IP on one interface (that won't be a big issue). The other is starting the script from launchd and restarting it when it crashes. The best way would be to code it in Objective-C or another language, but that would be a bigger issue (for me).
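Both pieces mentioned in this comment can be sketched in bash. The block below is only an illustration under stated assumptions: the function names, the `RUN` dry-run hook, and the /32 netmask for the alias are choices made here, not anything from the thread. It uses the BSD `ifconfig` `alias` / `-alias` keywords and the launchd `KeepAlive` key, which asks launchd to restart a job whenever it exits.

```shell
#!/bin/bash
# Sketch: add or remove a virtual IP as an alias on one interface.
# RUN is a hypothetical hook; it defaults to "echo" (dry run), so nothing
# changes until you set RUN= (empty) and run the script as root.
RUN="${RUN-echo}"

# vip up|down IFACE IP
vip() {
    local action="$1" iface="$2" ip="$3"
    case "$action" in
        # Assumption: a /32 netmask, as is usual for an alias on the same subnet.
        up)   $RUN /sbin/ifconfig "$iface" alias "$ip" netmask 255.255.255.255 ;;
        down) $RUN /sbin/ifconfig "$iface" -alias "$ip" ;;
        *)    echo "usage: vip up|down IFACE IP" >&2; return 64 ;;
    esac
}

# write_launchd_plist LABEL SCRIPT
# Prints a launchd job definition on stdout. LABEL and SCRIPT are
# placeholders supplied by the caller.
write_launchd_plist() {
    cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>$1</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>$2</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
EOF
}
```

In dry-run mode `vip up en0 192.168.0.10` only prints the command it would run. The plist output would typically be saved under /Library/LaunchDaemons and loaded with `launchctl load`.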
Here are some more Linux alternatives: http://www.toniwestbrook.com/archives/184 http://www.linux-ha.org/wiki/Main_Page
http://ostatic.com/mpathd/home/1 http://www.ultramonkey.org/ I'll have a deeper look into this over the next few days.
MPATHD is very outdated. Ultra Monkey is also outdated. I'm currently looking a little deeper into the linux-ha heartbeat solution.
The conclusion that I have come to is that it has been completely taken out. The general consensus is that Apple is moving away from the enterprise market (isn't it obvious?)
My best guess is that heartbeatd and failoverd simply didn't port over from their previous source and maybe had compatibility issues with the removal of classic support in Lion. Similar to the Final Cut fiasco. I also think that is why they re-engineered the presentation and GUI so radically, and also chose to leave other services out completely.
The only other reason would be that they know of another product that does it differently and better, similar to their stance on Xserve RAIDs, which were discontinued and replaced by Promise.
The problem is I do not know of any alternative 3rd-party solution that does what it did. I ended up writing my own scripts that monitor my two servers and then fail over in the case of failure.
If anyone with AppleScript and servers wants to collaborate on what I have, I would be totally open. I have been running it for about 3 months in production without issues, and tested all functionality beforehand, which worked great.
That is sadly the truth. Hence I think there is room for a 3rd-party alternative. Maybe I could port one of the existing Linux projects to Darwin; it would be worth a try.
Hello, any luck with a 3rd-party alternative? I ended up writing a new script in AppleScript for a client. I am going to start porting it to Xcode AppleScript Cocoa with a GUI and build it out a little bit. I would love to get your opinion on some of the procedures.
After some struggle I've successfully compiled heartbeat/pacemaker on OS X. That should be a good base to go ahead and configure it. I'll post my results soon.
Hi guys, is there any solid evidence that IP failover works in Lion Server? I did some work on this, in that rsync is patched to 3.09, and in theory, if rsync is patched, I believe failover would work in Lion Server.
Sorry, I'm very busy at the moment and can't continue with the Pacemaker port. If anyone is interested in helping me, PM me. rsync is used for Time Machine, so it's patched well. I don't think Apple will come up with a new IP failover solution.
There is no solid evidence it works in Lion; in fact there is overwhelming evidence it doesn't.
The best thing to do is create a couple of AppleScripts/bash scripts that ping your servers. You could set up a monitoring server to do this, or have each server (primary and backup) ping each other. If the ping fails, begin the scripted actions for failover. Philipp's algorithm is solid and I came up with my own that is very similar.
You can:
- switch IPs with simple commands that change your network location
- enable or disable protocols
- send out warning emails
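The ping-and-act approach described in this comment can be sketched as a small loop that fires an action after several consecutive failed probes. Everything named here (`monitor`, the probe and action functions, the location name "Failover") is hypothetical and only illustrates the idea; it is not the commenter's script.

```shell
#!/bin/bash
# Sketch of a ping-based monitor: run ACTION once PROBE has failed
# THRESHOLD times in a row. All names here are hypothetical.

# monitor PROBE ACTION THRESHOLD INTERVAL [MAX_ROUNDS]
# PROBE and ACTION are command or function names called without arguments.
# MAX_ROUNDS=0 (the default) means loop forever.
monitor() {
    local probe="$1" action="$2" threshold="$3" interval="$4" max="${5:-0}"
    local fails=0 rounds=0
    while [ "$max" -eq 0 ] || [ "$rounds" -lt "$max" ]; do
        if "$probe"; then
            fails=0                     # any success resets the count
        else
            fails=$((fails + 1))
            if [ "$fails" -ge "$threshold" ]; then
                "$action"               # e.g. switch location, send mail
                return 0
            fi
        fi
        rounds=$((rounds + 1))
        sleep "$interval"
    done
    return 1                            # rounds used up without failing over
}

# Example wiring (assumed host and location name, shown but not executed):
#   probe_master() { /sbin/ping -q -c1 192.168.0.1 >/dev/null 2>&1; }
#   take_over()    { networksetup -switchtolocation "Failover"; }
#   monitor probe_master take_over 3 10
```

Requiring several failures in a row plays the same role as the long wait in the test script above: a single dropped ping or a quick reboot does not trigger a takeover.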
I was originally going to release a GUI replacement for failover but lost my motivation after getting a new job.
I'm quite pleased with the information in this one. TY!