Today our AIX box killed all the SSH connections and no one could log back into the machine. We could, however, ping the machine, and an nmap scan showed all necessary services were up; they were just not responding.
After analysis of the logs it was determined that:
1) There was a core dump produced
2) The firmware needed to be updated from EL320_76 to EL340_75
3) The paging size on hd6 of rootvg was not large enough 512M < 10G
I will address the last point (3) in this post. The paging size on the system was set to 512M on a system that has 32GB real memory.
# lsps -a
Page Space      Physical Volume   Volume Group    Size %Used Active  Auto  Type Chksum
hd6             hdisk0            rootvg         512MB     6   yes   yes    lv      0
Apparently all services stopped responding when paging started. When the machine came back online it was recommended that the paging size be increased to 10G. Since we already had a paging size of 512M, we just needed to increase it by 9.5G. But in order to increase the paging size you need to check the PP size and the amount of free PPs in the volume group. In my case the PP SIZE is 128 megabytes. Just convert 9.5G to MB (9728 MB) and divide by the PP SIZE to determine how many PPs will give you 9.5GB: 9728 / 128 = 76. I needed 76 PPs.
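To double check the arithmetic, here is a small sketch of the calculation as a shell function. The function name pps_needed and its argument order are my own for illustration; the PP size and the number of free PPs come from the `lsvg rootvg` output ("PP SIZE:" and "FREE PPs:"), and the current size from `lsps -a`. All values are in megabytes.

```shell
#!/bin/sh
# Sketch: how many physical partitions (PPs) to add to a paging space.
# pps_needed is an illustrative name, not an AIX command.
pps_needed() {
    target_mb=$1    # desired paging space size in MB
    current_mb=$2   # current paging space size in MB (from lsps -a)
    pp_size_mb=$3   # PP SIZE of the volume group in MB (from lsvg rootvg)
    delta_mb=$((target_mb - current_mb))
    # Round up so that a partial PP still counts as a whole one.
    echo $(( (delta_mb + pp_size_mb - 1) / pp_size_mb ))
}

# 10G target, 512M current, 128M PP size
pps_needed 10240 512 128
```

With the numbers from this box it prints 76. Make sure the volume group has at least that many free PPs before going any further.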
Use smitty to increase the paging space size:
# smitty chps
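If you would rather skip the menus, chps can also be called directly: the -s flag takes the number of logical partitions to add, followed by the paging space name. The sketch below only prints the command so it can be checked before it is run as root on the AIX box; the helper name chps_command is made up for this example.

```shell
#!/bin/sh
# Dry run: build and print the chps command instead of executing it.
# chps_command is an illustrative helper, not part of AIX.
chps_command() {
    add_pps=$1        # number of partitions to add (76 in my case)
    paging_space=$2   # paging space to grow (hd6 here)
    echo "chps -s $add_pps $paging_space"
}

chps_command 76 hd6
```

Afterwards, `lsps -a` should show the new size for hd6.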