Wednesday 26 February 2014

Data transfer speed is slow from server1 to server2


This is purely a file copy over the network (host application using TCP/IP over Ethernet) from server1 to server2; there is no SAN or storage device involved in this data transfer, so the storage team cannot help much here.

One thing that can make a difference is using the backup LAN, if both servers are in the same VLAN. That can definitely speed up the data transfer.

Use server1b and server2b as the host names for the transfer.
b -> backup LAN

One more option: mount the file system of server1 read-only on server2 over the backup LAN (depending on accessibility) and run the copy locally on server2.
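
A rough sketch of that read-only mount approach, assuming server1 exports a directory /data and both hosts resolve the backup-LAN names server1b/server2b (the paths here are illustrative):

# On server1: export the source directory read-only to server2's backup interface
echo "/data -ro,access=server2b" >> /etc/exports
exportfs -a

# On server2: mount it read-only over the backup LAN and copy locally
mkdir -p /mnt/server1data
mount -o ro server1b:/data /mnt/server1data
cp -Rp /mnt/server1data/. /target/dir/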

Script - xargs to change filesystems to auto-mount



[server12@AIX]>lsvg | grep oraclet
oracletarchvg
oracletvg
oraclet_data1



[server12@AIX]>lsvg | grep oraclet | lsvg -li
oracletarchvg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
oracle1archlv01     jfs2       158     158     1    closed/syncd  /var/opt/oracle/archv01/oracle
oracletvg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
oraclelv01      jfs2       12      12      6    closed/syncd  /var/opt/oracle/01/oracle
oraclelv02      jfs2       12      12      6    closed/syncd  /var/opt/oracle/02/oracle
oraclelv03      jfs2       12      12      6    closed/syncd  /var/opt/oracle/03/oracle
oraclelv04      jfs2       12      12      6    closed/syncd  /var/opt/oracle/04/oracle
oraclelv05      jfs2       12      12      6    closed/syncd  /var/opt/oracle/05/oracle
oraclelv06      jfs2       12      12      6    closed/syncd  /var/opt/oracle/06/oracle
oraclet_data1:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
ora1datalv01      jfs2       314     314     1    closed/syncd  /var/opt/oracle/data001/oracle
ora1datalv02      jfs2       314     314     1    closed/syncd  /var/opt/oracle/data002/oracle
ora1datalv03      jfs2       314     314     1    closed/syncd  /var/opt/oracle/data003/oracle
ora1datalv04      jfs2       314     314     1    closed/syncd  /var/opt/oracle/data004/oracle





[server12@AIX]>lsvg | grep oraclet | lsvg -li | grep oracle
oracle1archlv01     jfs2       158     158     1    closed/syncd  /var/opt/oracle/archv01/oracle
oraclelv01      jfs2       12      12      6    closed/syncd  /var/opt/oracle/01/oracle
oraclelv02      jfs2       12      12      6    closed/syncd  /var/opt/oracle/02/oracle
oraclelv03      jfs2       12      12      6    closed/syncd  /var/opt/oracle/03/oracle
oraclelv04      jfs2       12      12      6    closed/syncd  /var/opt/oracle/04/oracle
oraclelv05      jfs2       12      12      6    closed/syncd  /var/opt/oracle/05/oracle
oraclelv06      jfs2       12      12      6    closed/syncd  /var/opt/oracle/06/oracle
ora1datalv01      jfs2       314     314     1    closed/syncd  /var/opt/oracle/data001/oracle
ora1datalv02      jfs2       314     314     1    closed/syncd  /var/opt/oracle/data002/oracle
ora1datalv03      jfs2       314     314     1    closed/syncd  /var/opt/oracle/data003/oracle
ora1datalv04      jfs2       314     314     1    closed/syncd  /var/opt/oracle/data004/oracle






[server12@AIX]>lsvg | grep oraclet | lsvg -li | grep oracle | awk '{print $7}'
/var/opt/oracle/archv01/oracle
/var/opt/oracle/01/oracle
/var/opt/oracle/02/oracle
/var/opt/oracle/03/oracle
/var/opt/oracle/04/oracle
/var/opt/oracle/05/oracle
/var/opt/oracle/06/oracle
/var/opt/oracle/data001/oracle
/var/opt/oracle/data002/oracle
/var/opt/oracle/data003/oracle
/var/opt/oracle/data004/oracle




[server12@AIX]>lsvg | grep oraclet | lsvg -li | grep oracle | awk '{print $7}'| xargs -n1 chfs -A yes


[server12@AIX]>lsvg | grep oraclet | lsvg -li | grep oracle | awk '{print $7}'| xargs -n1 mount


[server12@AIX]>lsvg | grep oraclet | lsvg -li | grep oracle | awk '{print $7}'| xargs -n1 lsfs | grep -v Name
/dev/oracle1archlv01 --         /var/opt/oracle/archv01/oracle jfs2  41418752 rw         yes  no
/dev/oraclelv01 --         /var/opt/oracle/01/oracle jfs2  3145728 rw         yes  no
/dev/oraclelv02 --         /var/opt/oracle/02/oracle jfs2  3145728 rw         yes  no
/dev/oraclelv03 --         /var/opt/oracle/03/oracle jfs2  3145728 rw         yes  no
/dev/oraclelv04 --         /var/opt/oracle/04/oracle jfs2  3145728 rw         yes  no
/dev/oraclelv05 --         /var/opt/oracle/05/oracle jfs2  3145728 rw         yes  no
/dev/oraclelv06 --         /var/opt/oracle/06/oracle jfs2  3145728 rw         yes  no
/dev/ora1datalv01 --         /var/opt/oracle/data001/oracle jfs2  82313216 rw         yes  no
/dev/ora1datalv02 --         /var/opt/oracle/data002/oracle jfs2  82313216 rw         yes  no
/dev/ora1datalv03 --         /var/opt/oracle/data003/oracle jfs2  82313216 rw         yes  no
/dev/ora1datalv04 --         /var/opt/oracle/data004/oracle jfs2  82313216 rw         yes  no
 

Script to see the LV distribution in a mirrored VG




for i in `lsvg -l sapvg|tail +3|awk '{print $1}'`
> do
> echo $i
> echo "_______________"
> lslv -l $i
> echo "_______________"
> done

Script to get the VG size of all VGs on a server




#!/bin/ksh
# Print each online VG and its total size in MB.
# The lsvg line looks like: "... TOTAL PPs:   542 (69376 megabytes)";
# field 7 is "(69376" and cut -c2- strips the leading "(".

for vgname in $(lsvg -o)
do

        echo "$vgname \c"
        lsvg $vgname|grep 'TOTAL PPs'| awk '{print $7}'|cut -c2-
done

Script - for loop to find the disk size

List the hdisk names in a file -> disks (bootinfo -s reports the size in MB)


 for i in `cat disks`
> do
> echo $i "     "       $(bootinfo -s $i)
> done
hdisk0   70784
hdisk1   34560
hdisk2   34560
hdisk3   70784
hdisk4   70784
hdisk5   34560
hdisk6   34560
hdisk7   70784
hdisk8   34560

Script - for loop to collect all disk attributes



 for i in `lspv | awk '{print $1}'` 
> do 
> { echo $i ; lsattr -El $i ; echo =========================== ; } >> /tmp/dista 
> done

Script - for loop to set the PVID on all hdisks

List all the hdisk names in a file ---> disks


for i in `cat disks`
do
chdev -l $i -a pv=yes
done
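
To confirm the PVIDs took effect, a quick check against the same disks file:

lspv | grep -wf disks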

Script - for loop to grep the WWPNs





# for i in `lsdev -C |grep -i fcs| awk '{print $1}'`
> do
> echo $i ; lscfg -vpl $i |grep -i network
> echo =======================================================
> done



fcs0
        Network Address.............10000000C9C0F3C8
======================================
fcs1
        Network Address.............10000000C9C0F3C9
======================================
fcs2
        Network Address.............10000000C993690E
======================================
fcs3
        Network Address.............10000000C993690F
======================================
fcs4
        Network Address.............10000000C995E090
======================================
fcs5
        Network Address.............10000000C995E091
======================================
fcs6
        Network Address.............10000000C995DC9A
======================================
fcs7
        Network Address.............10000000C995DC9B
======================================

Script - for loop to change MPIO settings



Note: all the disk names are in a file

# cat disks
hdisk90 
hdisk91 
hdisk92 
hdisk93 
hdisk94 
hdisk95 
hdisk96 
hdisk97 
hdisk89 


for i in `cat disks`
> do
> chdev -l $i -a algorithm=round_robin -a reserve_policy=no_reserve -a queue_depth=32 -a rw_timeout=180
> done
hdisk90 changed
hdisk91 changed
hdisk92 changed
hdisk93 changed
hdisk94 changed
hdisk95 changed
hdisk96 changed
hdisk97 changed
hdisk89 changed
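
To spot-check that the attributes really changed, something like this on one of the disks (lsattr takes multiple -a flags):

lsattr -El hdisk90 -a algorithm -a reserve_policy -a queue_depth -a rw_timeout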

rsync

Just an example: from server1 to server2


rsync -avz -e ssh root@server2:/opt/softwaredepot/* /opt/softwaredepot_new/

Run this from server1; we are pulling from server2.

Before that, set up passwordless (key-based) ssh from server1 to server2.
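
A minimal sketch of setting that up, assuming OpenSSH and that root login is permitted on server2. Run on server1 as root:

ssh-keygen -t rsa                                          # accept defaults, empty passphrase
cat ~/.ssh/id_rsa.pub | ssh root@server2 'cat >> ~/.ssh/authorized_keys'
ssh root@server2 hostname                                  # should print server2 with no password prompt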


===========================


If key-based ssh for root is already in place from server3 to server4, you can push instead.

From server3 (push method):

rsync -avz /xxxxxxxx/yyyyyyy/* root@server4:/xxxxxxxx/yyyyyyy/

Procedure to clean up the HMC filesystems



As hscroot, check the filesystem usage

monhmc -r disk -n 0
  
 - Free up space in /var
  
 chhmcfs -o f -d 0
  
  
 - As hscpe, log in to the HMC and use pesh to exit the restricted shell, then su to root
  
 - As root run following commands
  
 cd /opt/ccfw/data/vr
 
 chsvcevent -o closeall
  
 rm -rf [123456789]*
  
 reboot

Prework before Physical processor book - Memory DIMM addition



1. Physical CPU processor book add in the Frame 

2. Memory DIMM add in the frame

3. I/O cards add 



For the above tasks, make sure to take screen shots of the FRAME properties:


the Memory, Processor and I/O tabs. That way you will have the details of the "Installed / Available / Configured / De-configured" parameters.
After adding the physical CPU or memory to the frame, you will only be able to see the NEW parameters; you will not get the old "Installed / Available / Configured / De-configured" parameters.
So, make sure to take the screen shots.

OR

take the output of # lscfg -vp in any one of the LPARs; that will tell us the physical devices connected in the entire frame.

Also, as usual, run prework.sh in all LPARs before shutting down the frame.

Portmir

portmir

# tty
/dev/pts/8


To start mirroring the session
# portmir -t /dev/pts/8


To disconnect the session
# portmir -o /dev/pts/8

Save your file password-protected.

server1:/tmp # touch testenc


server1:/tmp # echo "hi I am a line" > testenc


server1:/tmp # cat testenc
hi I am a line


server1:/tmp # openssl bf -salt -in ./testenc > ./testenc1
enter bf-cbc encryption password:
Verifying - enter bf-cbc encryption password:


server1:/tmp # cat testenc1
Salted__°îøÐýÉÉQÝly¶ïÑ^±oþjõserver1:/tmp #



server1:/tmp # openssl bf -d -in ./testenc1 >./testenc
enter bf-cbc decryption password:


server1:/tmp # cat testenc
hi I am a line
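
If you just want to view the contents without writing a cleartext file back to disk, the same decrypt can go to stdout:

server1:/tmp # openssl bf -d -in ./testenc1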

Paging utilization - Tricks


If paging utilization suddenly goes high and stays there constantly for more than 3 days, use this method to reduce it.


server1:/root # lsps -s
Total Paging Space   Percent Used
      21312MB               28%



server1:/root # chps -d 1 paging00
shrinkps: Temporary paging space paging01 created.
shrinkps: Paging space paging00 removed.
shrinkps: Paging space paging00 recreated with new size.

(THIS COMMAND TEMPORARILY NEEDS ROUGHLY DOUBLE THE PAGING SPACE IN USE, I.E. 28% x 2.) IF YOU DO NOT HAVE ENOUGH FREE SPACE IN PAGING, DO NOT RUN THIS COMMAND.




Decreasing by 1 PP makes shrinkps create a temporary paging space, move or reduce the usage, and then recreate paging00 with the new size; at the end the utilization drops to 1-2%. We decrease by 1 PP and add it back once the percentage is down.

Now add back the 1 PP:

server1:/root # chps -s 1 paging00

server1:/root # lsps -s
Total Paging Space   Percent Used
      21376MB               1%

NOTE: Never try these steps when your paging utilization is above 50%.
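
A quick pre-check sketch before running chps -d, to make sure the volume group holding the paging space (rootvg in this example) has free PPs for the temporary paging space:

lsps -a                          # per-paging-space size and usage
lsvg rootvg | grep 'FREE PPs'    # room for the temporary paging space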

Paging and Sysdump - Redundancy checks

server1:/root # for i in `lsdev -Cs vscsi | awk '{print $1}'`
> do
> lscfg -vl $i
> done
  hdisk0           U9119.FHA.021CBF4-V11-C4-T1-L8100000000000000  Virtual SCSI Disk Drive
  hdisk1           U9119.FHA.021CBF4-V11-C4-T1-L8200000000000000  Virtual SCSI Disk Drive
  hdisk2           U9119.FHA.021CBF4-V11-C5-T1-L8100000000000000  Virtual SCSI Disk Drive
  hdisk3           U9119.FHA.021CBF4-V11-C5-T1-L8200000000000000  Virtual SCSI Disk Drive
  hdisk35          U9119.FHA.021CBF4-V11-C5-T1-L8300000000000000  Virtual SCSI Disk Drive


hdisk0 and hdisk1 come from one VIO server > V11-C4
hdisk2, hdisk3 and hdisk35 come from the other VIO server > V11-C5



server1:/root # lsvg -p rootvg
rootvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk0            active            527         71          00..00..00..00..71
hdisk1            active            63          0           00..00..00..00..00
hdisk2            active            527         71          09..03..00..00..59
hdisk3            active            63          0           00..00..00..00..00


server1:/root # lsps -a
Page Space      Physical Volume   Volume Group    Size %Used Active Auto  Type Chksum
paging00        hdisk1            rootvg        4032MB     1   yes   yes    lv     0
hd6             hdisk0            rootvg        4032MB     1   yes   yes    lv     0


We need to make sure the paging00 disk is mirrored and that the mirror copy comes from the other VIO server.

server1:/root # lslv -m paging00
paging00:N/A
LP    PP1  PV1               PP2  PV2               PP3  PV3
0001  0001 hdisk1            0001 hdisk3
0002  0002 hdisk1            0002 hdisk3
0003  0003 hdisk1            0003 hdisk3
0004  0004 hdisk1            0004 hdisk3
0005  0005 hdisk1            0005 hdisk3
0006  0006 hdisk1            0006 hdisk3
0007  0007 hdisk1            0007 hdisk3
0008  0008 hdisk1            0008 hdisk3
0009  0009 hdisk1            0009 hdisk3
0010  0010 hdisk1            0010 hdisk3
0011  0011 hdisk1            0011 hdisk3
0012  0012 hdisk1            0012 hdisk3
0013  0013 hdisk1            0013 hdisk3

So, paging has redundancy.


Next is sysdump...
==========

server1:/root # sysdumpdev -e
Estimated dump size in bytes: 983564288  > that is 0.9GB



server1:/root # sysdumpdev -l
primary              /dev/aixdump
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    FALSE
dump compression     ON
type of dump         traditional


server1:/root # lsvg -l rootvg
rootvg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
hd5                 boot       1       2       2    closed/syncd  N/A
hd6                 paging     63      126     2    open/syncd    N/A
hd8                 jfs2log    1       2       2    open/syncd    N/A
hd4                 jfs2       8       16      2    open/syncd    /
hd2                 jfs2       48      96      2    open/syncd    /usr
hd9var              jfs2       32      64      2    open/syncd    /var
hd3                 jfs2       48      96      2    open/syncd    /tmp
hd1                 jfs2       16      32      2    open/syncd    /home
hd10opt             jfs2       48      96      2    open/syncd    /opt
hd11admin           jfs2       2       4       2    open/syncd    /admin
aixdump             sysdump    32      64      2    open/syncd    N/A
livedump            jfs2       4       8       2    open/syncd    /var/adm/ras/livedump
paging00            paging     63      126     2    open/syncd    N/A
fslv00              jfs2       1       2       2    open/syncd    /usr/local
fslv01              jfs2       16      32      2    open/syncd    /usr/sys/inst.images
perflv              jfs2       32      64      2    open/syncd    /var/opt/perf
sarlv               jfs2       48      96      2    open/syncd    /var/adm/sa
openvlv             jfs2       24      48      2    open/syncd    /usr/openv
topaslv             jfs2       16      32      2    open/syncd    /etc/perf





server1:/root # lslv -m aixdump
aixdump:N/A
LP    PP1  PV1               PP2  PV2               PP3  PV3
0001  0108 hdisk0            0027 hdisk2
0002  0109 hdisk0            0028 hdisk2
0003  0110 hdisk0            0029 hdisk2
0004  0111 hdisk0            0030 hdisk2
0005  0112 hdisk0            0031 hdisk2
0006  0113 hdisk0            0032 hdisk2
0007  0114 hdisk0            0033 hdisk2
0008  0115 hdisk0            0034 hdisk2
0009  0116 hdisk0            0035 hdisk2
0010  0117 hdisk0            0036 hdisk2

aixdump is mirrored here, but a sysdump device should not be mirrored. So remove the copy, create another LV on hdisk2, and make that LV the secondary dump device.


server1:/root # rmlvcopy aixdump 1 hdisk2



server1:/root # lslv -m aixdump
aixdump:N/A
LP    PP1  PV1               PP2  PV2               PP3  PV3
0001  0108 hdisk0
0002  0109 hdisk0
0003  0110 hdisk0
0004  0111 hdisk0
0005  0112 hdisk0


server1:/root # smitty lv  > to create another LV (aixdump2) of type sysdump on hdisk2

server1:/root # lsvg -l rootvg
rootvg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
hd5                 boot       1       2       2    closed/syncd  N/A
hd6                 paging     63      126     2    open/syncd    N/A
hd8                 jfs2log    1       2       2    open/syncd    N/A
hd4                 jfs2       8       16      2    open/syncd    /
hd2                 jfs2       48      96      2    open/syncd    /usr
hd9var              jfs2       32      64      2    open/syncd    /var
hd3                 jfs2       48      96      2    open/syncd    /tmp
hd1                 jfs2       16      32      2    open/syncd    /home
hd10opt             jfs2       48      96      2    open/syncd    /opt
hd11admin           jfs2       2       4       2    open/syncd    /admin
aixdump             sysdump    32      32      1    open/syncd    N/A
livedump            jfs2       4       8       2    open/syncd    /var/adm/ras/livedump
paging00            paging     63      126     2    open/syncd    N/A
fslv00              jfs2       1       2       2    open/syncd    /usr/local
fslv01              jfs2       16      32      2    open/syncd    /usr/sys/inst.images
perflv              jfs2       32      64      2    open/syncd    /var/opt/perf
sarlv               jfs2       48      96      2    open/syncd    /var/adm/sa
openvlv             jfs2       24      48      2    open/syncd    /usr/openv
topaslv             jfs2       16      32      2    open/syncd    /etc/perf
aixdump2            sysdump    32      32      1    closed/syncd  N/A



server1:/root # sysdumpdev -P -s /dev/aixdump2
primary              /dev/aixdump
secondary            /dev/aixdump2
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    FALSE
dump compression     ON
type of dump         traditional


Node Evacuation Procedure

Node evacuation is a process that is required during a hot node repair or a memory upgrade. During the node evacuation process, the Power Hypervisor is used to complete the following tasks.

    Move the contents of the memory in the target node to the memory in the other nodes of the system.
    Move the programs running on dedicated processors assigned to the partitions, and the programs running on processors assigned to the shared processor pool, from the target node to other nodes on the system.
    Lock all the I/O slots that are attached to the target node to prevent the slots from being used during the repair or upgrade.


http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/ared3/ared3nodeevac.htm

NAS mount... tricks




172.25.92.193:/DataVolume/Public   2748.03   2438.04   12%       49     1% /nas1

server2:/root # ping 172.25.92.193

server2:/root # netstat -rn
Routing tables
Destination        Gateway           Flags   Refs     Use  If   Exp  Groups

Route Tree for Protocol Family 2 (Internet):
default            130.29.133.65     UG       38   2009595 en0      -      -
127/8              127.0.0.1         U        14       462 lo0      -      -
130.29.133.64      130.29.133.116    UHSb      0         0 en0      -      -   =>
130.29.133.64/26   130.29.133.116    U         4     66744 en0      -      -
130.29.133.116     127.0.0.1         UGHS      1        20 lo0      -      -
130.29.133.127     130.29.133.116    UHSb      0         4 en0      -      -
172.25.95.0        172.25.95.77      UHSb      0         0 en1      -      -   =>
172.25.95/24       172.25.95.77      U         0   8281785 en1      -      -
172.25.95.77       127.0.0.1         UGHS      0        16 lo0      -      -
172.25.95.255      172.25.95.77      UHSb      0         4 en1      -      -

Route Tree for Protocol Family 24 (Internet v6):
::1%1              ::1%1             UH        0       133 lo0      -      -



The 172.25.92 subnet is not in the routing table...

so we checked the routing table on the other server, where /nas1 is mounted successfully


server1:/root # netstat -rn
Routing tables
Destination        Gateway           Flags   Refs     Use  If   Exp  Groups

Route Tree for Protocol Family 2 (Internet):
default            130.29.133.65     UG        9 227980978 en0      -      -
127/8              127.0.0.1         U        20     67317 lo0      -      -
130.29.133.64      130.29.133.113    UHSb      0         0 en0      -      -   =>
130.29.133.64/26   130.29.133.113    U         4     92236 en0      -      -
130.29.133.113     127.0.0.1         UGHS      2        26 lo0      -      -
130.29.133.127     130.29.133.113    UHSb      0         1 en0      -      -
172.25.92.0        172.25.92.172     UHSb      0         0 en1      -      -   =>
172.25.92/24       172.25.92.172     U         1   5959282 en1      -      -
172.25.92.172      127.0.0.1         UGHS      0         1 lo0      -      -
172.25.92.255      172.25.92.172     UHSb      0         1 en1      -      -

Route Tree for Protocol Family 24 (Internet v6):
::1%1              ::1%1             UH        0       326 lo0      -      -




Hence we tried to add a route on server2, using 172.25.92.172 as the gateway:



server2:/root # route add 172.25.92/24 172.25.92.172  
and then tried to ping the NAS IP "172.25.92.193" from server2.

NO LUCK

Tricks
========

Find an IP address close to the NAS IP "172.25.92.193" that is not assigned to any server so far; verify with nslookup:

server1:/root # nslookup 172.25.92.199
Server:         130.29.152.21
Address:        130.29.152.21#53

** server can't find 199.92.25.172.in-addr.arpa: NXDOMAIN


So, use the address 172.25.92.199 and create an alias on en1 using smitty inetalias.


Now, try to mount the NAS file system. If you get any error while mounting, make sure to change the nfso parameters below and mount again.

server1:/root # nfso -a | grep port
      nfs_use_reserved_ports = 1
                   portcheck = 0
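
Putting the workaround together, a sketch of the equivalent commands (the ifconfig alias is runtime-only; smitty inetalias, as used above, makes it persistent; 172.25.92.199 is the unused address found above):

ifconfig en1 inet 172.25.92.199 netmask 255.255.255.0 alias
nfso -o nfs_use_reserved_ports=1        # only if the mount fails with a port error
mount 172.25.92.193:/DataVolume/Public /nas1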



172.25.92.193:/DataVolume/Public 5763040016 5112939624   12%       49     1% /nas1





 

MPIO settings

For EMC storage:

if you rmdev -dl a disk and rescan using cfgmgr, the disk will lose all its MPIO settings and has to be reconfigured again.

For NetApp storage, the MPIO settings are not lost through rmdev -dl.

mksysb with the flag ( ipemX )


mksysb -ipemX  /tmp/mksysb.image


-i
            Calls the mkszfile command, which generates the /image.data file.


-e
            Excludes the files and directories listed in /etc/exclude.rootvg from the backup.


-m
            Calls the mkszfile command, with the -m flag to generate map files. Note: The use of the -m flag causes the functions of the -i flag to
            be executed also.


-p
            Disables software packing of the files as they are backed up. Some tape drives use their own packing or compression algorithms.


-X
            Automatically expands the /tmp file system if more space is needed during the backup.




# smitty lsmksysb

While running lsmksysb, if the mksysb is not good, it will prompt:

Mount volume 2 on eprdmt4.mksysb.01.15.12.
        Press the Enter key to continue.

In that case, use the -p flag and take a new mksysb. The backup will then run fully and will not skip any large binary files.
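
To verify a finished image non-interactively, lsmksysb can be pointed straight at the file (a sketch; -l lists the backup's volume group information, -f names the image):

# lsmksysb -lf /tmp/mksysb.image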

Performance - Memory Calculator

#!/usr/bin/ksh
# memory calculator
# svmon -G reports in 4 KB pages; dividing by 256 converts pages to MB.
um=`svmon -G | head -2 | tail -1 | awk '{print $3}'`    # in-use memory, pages
um=`expr $um / 256`
tm=`lsattr -El sys0 -a realmem | awk '{print $2}'`      # real memory, KB
tm=`expr $tm / 1024`
fm=`svmon -G | head -2 | tail -1 | awk '{print $4}'`    # free memory, pages
fm=`expr $fm / 256`
umorg=`vmstat | tail -1 | awk '{print $3}'`             # active virtual memory, pages
umorg=`expr $umorg / 256`
bc=`expr $um - $umorg`                                  # in-use minus active ~= buffer cache

echo "\n\n-----------------------";
echo "System : (`hostname`)";
echo " "
echo "Paging Space details"
lsps -s
echo "-----------------------\n\n";
echo "Memory Information\n\n";
echo "total memory = $tm MB"
echo "free memory = $fm MB"
echo "used memory = $umorg MB"
echo "Buffer cache = $bc MB"
ump=`expr  $umorg \* 100 / $tm `
echo "Used Memory in % = $ump %"

echo "\n\n-----------------------\n";

Performance - Script to find the top memory consumption

#!/usr/bin/ksh
# Syntax: top_memory.sh number
#
# Description: Shows the top <number> processes using real memory
#
###############################################################
#
# bos:
# Create text for variable "USAGE"
USAGE="Usage: top_memory.sh number"
# Check for correct syntax and exit if not
[[ "$#" -ne 1 ]] && { echo $USAGE ; exit 3 ; }
# bos
echo "\n\n Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd LPage\n"
/usr/bin/svmon -uP -t $1 | grep -p Pid | grep '^.*[0-9] '


# eos


Tuesday 25 February 2014

Maintenance mode through NIM




smitty nim > Perform NIM Administration Tasks > Manage Machines > Perform Operations on Machines > select the server > now select "maint_boot      = enable a machine to boot in maintenance mode"

For this task, only a SPOT is needed.



Now boot the client from the network; make sure the first boot device is set to network (LAN).
Boot option > network boot > pull the boot image from the NIM server as usual. After the tftpboot file comes across, select 1 to continue the installation, select hdisk0, go back to the previous menu, and select
Start Maintenance Mode for System Recovery, then Access a Root Volume Group.

Performance - lparstat

server1:/root # lparstat 2 20

System configuration: type=Shared mode=Uncapped smt=On lcpu=28 mem=118272MB psize=34 ent=1.50

%user  %sys  %wait  %idle physc %entc  lbusy   app  vcsw phint
----- ----- ------ ------ ----- ----- ------   --- ----- -----
 79.0  19.5    1.1    0.3  6.43 428.5   27.9 16.28 12391  1061
 79.1  19.3    1.2    0.4  6.62 441.6   27.7 16.16 13412  1196
 78.1  20.7    0.9    0.3  6.45 430.2   26.0 15.32 10154  1256
 77.4  21.1    1.1    0.3  6.20 413.0   25.1 15.63 11476  1189
 76.0  22.5    1.2    0.3  5.98 398.5   26.4 15.78 11381  1150
 78.3  20.1    1.2    0.3  6.04 402.6   26.2 14.99 12337  1317
 77.4  21.1    1.2    0.3  5.88 392.1   25.3 15.84 11531   920
 77.6  20.8    1.4    0.3  5.83 388.5   25.3 16.15 12247  1056
 76.8  21.9    1.0    0.3  6.42 427.9   28.0 14.00 10781  1171
 77.1  21.3    1.3    0.3  6.10 406.8   25.7 14.59 12815   988
 76.7  21.6    1.3    0.3  6.00 399.8   26.3 14.70 12917  1101
 77.3  21.1    1.3    0.3  5.95 396.9   25.2 15.55 12763  1098
 75.8  22.4    1.4    0.3  5.99 399.5   25.6 14.50 13635  1287
 75.9  22.6    1.2    0.3  6.32 421.5   27.6 10.26 12287  1288
 76.7  21.7    1.3    0.3  6.34 422.6   27.1 11.76 13282  1037
 78.1  20.4    1.2    0.3  6.50 433.2   29.2 12.67 12997  1063
 77.7  20.8    1.2    0.3  6.47 431.6   28.1 13.13 12783   850
 76.5  22.2    1.0    0.3  7.64 509.2   30.6 13.18 13225  1112
 72.5  26.0    1.2    0.3  7.37 491.2   29.5 10.35 13876  1210
 75.7  22.6    1.3    0.4  6.98 465.4   27.0  9.90 14753   985


The physc value should not exceed the entitled capacity of the LPAR. Here the entitled capacity is 1.5, but physc is above 6, so the LPAR is drawing CPU from the shared pool.
We also need to keep an eye on the "app" value (available physical processors): there should be enough free physical processors left in the shared pool.
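
A small sketch to flag the samples that exceed the entitlement from the command line (physc is column 5 in this output; the numeric test on $5 skips the header lines; 1.50 is this LPAR's entitlement):

lparstat 2 20 | awk '$5 ~ /^[0-9]/ && $5+0 > 1.50 {print "over entitlement:", $0}'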

Alias for Commands


 
$ alias zz='/usr/local/bin/pbrun su -'
$
$
$ zz
[YOU HAVE NEW MAIL]
server1:/root #

How to remove the failed paths in MPIO (lspath)


===============================================

# lspath | grep -v Ena

Take one disk with failed paths and run this command:

# lspath -l hdisk23  -H -F"name parent path_id connection status"
name    parent path_id connection                       status

hdisk23 fscsi0 0       50060482d53192c8,d89000000000000 Failed
hdisk23 fscsi1 1       5006048ad53192c8,17000000000000  Failed
hdisk23 fscsi2 2       50060482d53192c7,d89000000000000 Failed
hdisk23 fscsi3 3       5006048ad53192c7,17000000000000  Failed
hdisk23 fscsi0 4       50060482d53192cc,270000000000000 Enabled
hdisk23 fscsi0 5       5006048ad53192cc,270000000000000 Enabled
hdisk23 fscsi1 6       50060482d53192dc,270000000000000 Enabled
hdisk23 fscsi1 7       5006048ad53192dc,270000000000000 Enabled
hdisk23 fscsi2 8       50060482d53192c3,270000000000000 Enabled
hdisk23 fscsi2 9       5006048ad53192c3,270000000000000 Enabled
hdisk23 fscsi3 10      50060482d53192d3,270000000000000 Enabled
hdisk23 fscsi3 11      5006048ad53192d3,270000000000000 Enabled


Four paths are failed, and we remove them with the loop below:



for disk in `lspv |awk '{ print $1 }'`
do
for path in `lspath -l $disk -F "status connection" |grep Failed |awk '{ print $2 }'`
do
echo $disk
rmpath -l $disk -w $path -d
done
done
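
Afterwards, confirm that no failed paths remain (the same check we started with):

# lspath | grep -v Enabled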

How to identify which LPAR the CD/DVD-ROM is assigned to

How to identify which LPAR the CD/DVD-ROM is assigned to
========================================

lshwres -r io --rsubtype slot -F unit_phys_loc:phys_loc:lpar_name:description --header -m P595PRD1_SN021CBB4 |grep RAID

U5791.001.9920V21-P1:C1:server1:RAID Controller



The DVD drive is assigned to the server1 LPAR.


server1:/root # mkdev -l cd0
cd0 Available

checksum

How to do checksum

The expected checksum value is published on Fix Central: before downloading 01EM350_108_038.rpm, click the description link to find the checksum value.

Filename :    01EM350_108_038.rpm    
Size     :    24018865
Checksum:    29340

   

Note: The Checksum can be found by running the AIX sum command against the rpm file (only the first 5 digits are listed).

ie: sum 01EM350_108_038.rpm



eg

 # sum 01EM350_108_038.rpm
29340 23456 01EM350_108_038.rpm

How to limit the speed of scp transfer


Before:
 # scp HMC_Update_V7R350_SP3.iso hscroot@drhmc2:/home/hscroot/HMC_Update_V7R350_SP3.iso
Password:
HMC_Update_V7R350_SP3.iso                                                                                28%  434MB   1.3MB/s   13:42 ETAReceived disconnect from 148.5.1.181: 2: Corrupted MAC on input.
lost connection


After (with -l 1024, which limits the bandwidth to 1024 Kbit/s):
 # scp -l 1024 HMC_Update_V7R350_SP3.iso hscroot@drhmc2:/home/hscroot/HMC_Update_V7R350_SP3.iso
Password:
HMC_Update_V7R350_SP3.iso                                                                                 0% 1040KB 135.7KB/s 3:11:58 ETAK

PowerHA - HMC shutdown (GUI) by selecting the "Immediate" option

An HMC shutdown (GUI) with the "Immediate" option selected is equivalent to the "halt" command. So even if cl_stop is set to "Bring Resource Group Offline", that will not be taken into account; the RG will move to the secondary node.

 


server1:/usr/es/sbin/cluster/utilities # ./clRGinfo
-----------------------------------------------------------------------------
Group Name     State                        Node
-----------------------------------------------------------------------------
rg1            ONLINE                       cbitdb501
               OFFLINE                      server1



server1:/usr/es/sbin/cluster/utilities # ./clRGinfo
-----------------------------------------------------------------------------
Group Name     State                        Node
-----------------------------------------------------------------------------
rg1            OFFLINE                      cbitdb501
               ONLINE                       server1

How to check the FC cable connectivity to the server






server1:/ # fcstat -D fcs1 | grep -i port
FIBRE CHANNEL STATISTICS REPORT: fcs1
World Wide Port Name: 0x10000000C9978385
  Supported: 0x0000012000000000000000000000000000000000000000000000000000000000
  Supported ULPs:
Port Speed (supported): 8 GBIT
Port Speed (running):   4 GBIT
Port FC ID: 0x680064
Port Type: Fabric
LIP Type:   L_Port Initializing
Link Down N_Port State: Offline OL2: OLS receive state
Link Down N_Port Transmitter State: Working
Link Down N_Port Receiver State: Reset
Current N_Port State: Active AC
Current N_Port Transmitter State: Working
Current N_Port Receiver State: Synchronization Acquired

 server1:/ # fcstat -D fcs1 | grep -i link
Attention Type:   Link Up


server1:/ # lsattr -El fscsi1
attach       switch    How this adapter is CONNECTED         False
dyntrk       yes       Dynamic Tracking of FC Devices        True
fc_err_recov fast_fail FC Fabric Event Error RECOVERY Policy True
scsi_id      0x1       Adapter SCSI ID                       False
sw_fc_class  3         FC Class for Fabric                   True
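
If dyntrk / fast_fail are not already set, they can be changed with chdev; -P defers the change to the next boot, since fscsi devices are usually busy (a sketch):

chdev -l fscsi1 -a dyntrk=yes -a fc_err_recov=fast_fail -P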

Sunday 23 February 2014

PowerHA - Export / Import VG on the secondary node (offline node)



server1:/dev # lsvg -p oraclevg
oraclevg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk12           active            269         0           00..00..00..00..00
hdisk13           active            269         0           00..00..00..00..00
hdisk14           active            269         0           00..00..00..00..00
hdisk15           active            269         0           00..00..00..00..00
hdisk16           active            269         0           00..00..00..00..00
hdisk17           active            269         0           00..00..00..00..00
hdisk18           active            269         0           00..00..00..00..00
hdisk19           active            269         0           00..00..00..00..00
hdisk20           active            269         0           00..00..00..00..00
hdisk37           active            269         0           00..00..00..00..00
hdisk1            active            269         0           00..00..00..00..00
hdisk3            active            269         0           00..00..00..00..00



server1:/dev # varyoffvg oraclevg
server1:/dev # exportvg oraclevg
server1:/dev # importvg -V 38 -n -y oraclevg hdisk12
oraclevg
server1:/dev #
server1:/dev #
server1:/dev # varyonvg -n -c -P oraclevg

(-V 38 on the importvg keeps the VG major number consistent with the other node, and -n skips the automatic varyon; varyonvg -n -c -P then brings the enhanced concurrent VG online in passive concurrent mode without syncing stale partitions, which is what PowerHA expects on the offline node.)
server1:/dev # lsvg -l oraclevg
oraclevg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
oraclevg1lv1      jfs2       269     269     1    closed/syncd  /var/opt/oracle/FS01
oraclevg1lv2      jfs2       269     269     1    closed/syncd  /var/opt/oracle/FS02
oraclevg1lv3      jfs2       269     269     1    closed/syncd  /var/opt/oracle/FS03
oraclevg1lv4      jfs2       269     269     1    closed/syncd  /var/opt/oracle/FS04
oraclevg1lv5      jfs2       269     269     1    closed/syncd  /var/opt/oracle/FS05
oraclevg1lv6      jfs2       269     269     1    closed/syncd  /var/opt/oracle/FS06
oraclevg1lv7      jfs2       269     269     1    closed/syncd  /var/opt/oracle/FS07
oraclevg1lv8      jfs2       269     269     1    closed/syncd  /var/opt/oracle/FS08
oraclevg1lv9      jfs2       269     269     1    closed/syncd  /var/opt/oracle/FS09
oraclevg1lv10     jfs2       269     269     1    closed/syncd  /var/opt/oracle/FS10
oraclevg1lv11     jfs2       269     269     1    closed/syncd  /var/opt/oracle/FS11
oraclevg1lv12     jfs2       269     269     1    closed/syncd  /var/opt/oracle/FS12

Expansion of existing LUN - chvg

Expansion of existing LUN



# sanlun lun show | grep /vol/server1_lun09.lun
   csnafa07:  /vol/server1_lun09.lun        hdisk210         fcs0     FCP        840.1g (902031212544)   GOOD  


server1:/root # chdev -l hdisk210 -a algorithm=round_robin -a reserve_policy=no_reserve -a queue_depth=32 -a rw_timeout=180
hdisk210 changed



server1:/root # extendvg oraclevg hdisk210
0516-1254 extendvg: Changing the PVID in the ODM.


server1:/root # lsvg -p oraclevg
oraclevg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk189          active            18479       0           00..00..00..00..00
hdisk190          active            18479       64          00..00..00..00..64
hdisk210          active            6720        6720        1344..1344..1344..1344..1344





Expansion of an existing LUN: grow the same LUN by 10 GB more (850 GB total).
The change needs to be done on the filer end first, then rescan on the host:

csnafa07> lun resize /vol/server1_lun09.lun +10g
lun resize: resized to:  850.1g (912827351040)



server1:/root # cfgmgr

server1:/root # lsvg -p oraclevg
oraclevg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk189          active            18479       0           00..00..00..00..00
hdisk190          active            18479       64          00..00..00..00..64
hdisk210          active            6720        6720        1344..1344..1344..1344..1344



server1:/root # chvg -g oraclevg  > examines all disks in the VG and picks up the new size


server1:/root # lsvg -p oraclevg
oraclevg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk189          active            18479       0           00..00..00..00..00
hdisk190          active            18479       64          00..00..00..00..64
hdisk210          active            6800        6800        1360..1360..1360..1360..1360



Note:
Never reduce a LUN online while it contains data; that would cause data corruption.
Expansion is allowed; reduction is not.


 

Error - while changing Ethernet Heartbeat detection ratio

Error while changing Ethernet Heartbeat detection ratio
========================================================


IBM's recommendation: they asked us to set the values as below:

ethernet network, change FDR to:
Failure cycle = 4
Heartbeat rate = 15
Formula calculation
4*15*2 = 120 seconds   

diskHeartbeat network, change FDR to:
Failure cycle = 6
Heartbeat rate = 10
Formula calculation
6*10*2 = 120 seconds

We were able to change to the stated disk HB settings,
but for the Ethernet HB, smitty did not allow us to go beyond a value of 10 for the HB rate.
We got the error below, so we set the Ethernet HB to the same values as the disk HB, i.e. 6*10.

  Failure Cycle                                      [4]                                                                                             #
  Interval between Heartbeats (seconds)              [15]
Heartbeat rate is the rate at which cluster service

ERROR

Before command completion, additional instructions may appear below.

topsvcs: 2523-890 ERROR:  Network type: ether
  The value 15000000 for the heartbeat rate is out of range.
  The value should be at least 200000 and no more than 10000000.

In the IBM documentation the max value shown for the HB rate is 5. (The values in the error are microseconds: 15 seconds = 15,000,000 us, above the 10,000,000 us = 10 s cap.)

Failure Cycle
The current setting is the default for the network module selected. (Default for Ethernet is 10). This is the number of successive heartbeats that can be missed before the interface is considered to have failed. You can enter a number from 1 to 75.

Interval between Heartbeats (seconds)
The current setting is the default for the network module selected and is a heartbeat rate. This parameter tunes the interval (in seconds) between heartbeats for the selected network module. You can enter a number from less than 1 to 5.

Dump device is too small


==================


E87EF1BE   0905150011 P O dumpcheck      The largest dump device is too small


[server2@AIX]>/usr/lib/ras/dumpcheck -p
The largest dump device is too small.

Largest dump device
         aixdump
Largest dump device size in kb
         2818048
Current estimated dump size in kb
         3816857



======

Solution:

Increase the LV "aixdump":

# extendlv aixdump 10  (adds 10 LPs)
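
Then verify the dump device is now large enough:

/usr/lib/ras/dumpcheck -p
sysdumpdev -e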

DLPAR REMOVE I/O resources failed

DLPAR errors: how to remove the child PCI devices


Two FC cards were newly added to the frame through the HMC:



U5791.001.9920WPN-P1-C10
U5791.001.9920WWC-P1-C10  


Test the FC cards by DLPAR-ing them to any LPAR in that frame, and run cfgmgr to configure the new FC cards.


If you run rmdev -Rdl on the new fcs devices and then try a DLPAR remove of the cards from the server, you may get the error below...

ERROR:
======
DLPAR REMOVE I/O resources failed: The I/O slot dynamic logical partitioning operation failed. Here are the I/O Slot IDs that failed and the reasons for failure:
0931-029 The specified slot contains a device or devices that are currently configured. Unconfigure the following device or devices with the rmdev command and try again. pci4 U5791.001.9920WPN-P1-C10 (21050237) 


Solution:
======

After rmdev -Rdl on the fcs, you need to remove the child pci devices (NOT THE PARENT pci)

To identify the child pci:
================


bidbs1:/root # prtconf | grep -i pci
0516-010 : Volume group must be varied on; use varyonvg command.
  Model Implementation: Multiple Processor, PCI bus
* pci3             U5791.001.9920WWC-P1                                           PCI Bus
* pci5             U5791.001.9920WWC-P1                                           PCI Bus
* pci2             U5791.001.9920WPN-P1                                           PCI Bus
* pci4             U5791.001.9920WPN-P1                                           PCI Bus
* pci1             U5791.001.9920WWC-P2                                           PCI Bus
* pci0             U5791.001.9920WPN-P2                                           PCI Bus



server1:/root # lsdev -p pci3
pci5 Available 03-0e PCI Bus
bidbs1:/root # lsdev -p pci5
fcs6 Available 05-08 FC Adapter
fcs7 Available 05-09 FC Adapter
bidbs1:/root # lsdev -p pci2
pci4 Available 02-0e PCI Bus
bidbs1:/root # lsdev -p pci4
fcs4 Available 04-08 FC Adapter
fcs5 Available 04-09 FC Adapter


So we need to remove pci4 and pci5:


server1:/root # rmdev -l pci4
pci4 Defined
server1:/root # rmdev -dl pci4
pci4 deleted
server1:/root # rmdev -l pci5
pci5 Defined
server1:/root # rmdev -dl pci5


Now do the DLPAR to remove the FC cards from the server and leave them free in the frame.

Disk utilization 100 %



Disk utilization 100%

http://www.tek-tips.com/viewthread.cfm?qid=1492650
http://www.ibm.com/developerworks/aix/library/au-aixoptimization-disktun1/index.html


# filemon -uo filemon.out -O all ; sleep 60; trcstop
# awk '/Most Active Logical Volumes/,/^$/' filemon.out
# awk '/Most Active Files/,/^$/' filemon.out
# awk '/Most Active Physical Volumes/,/^$/' filemon.out

Run lspv -l <pvname> on the high-utilization disk to see which file systems are on it,
and run fuser -cux on each FS to find the processes using it.

Disk in defined state

Disk in defined state
=============


hdisk277 Defined   01-08-02 EMC Symmetrix FCP MPIO Raid5 TimeFinder
hdisk278 Defined   01-08-02 EMC Symmetrix FCP MPIO Raid5 TimeFinder
hdisk279 Defined   01-08-02 EMC Symmetrix FCP MPIO Raid5 TimeFinder
hdisk280 Defined   01-08-02 EMC Symmetrix FCP MPIO Raid5 TimeFinder
hdisk281 Defined   01-08-02 EMC Symmetrix FCP MPIO Raid5 TimeFinder
hdisk282 Defined   01-08-02 EMC Symmetrix FCP MPIO Raid5 TimeFinder


If cfgmgr does not bring the device to Available state, then run the command below:


# /etc/methods/cfgscsidisk -l hdisk272
# lsdev -Cc disk | grep hdisk272
hdisk272 Available 01-08-02 EMC Symmetrix FCP MPIO Raid5 TimeFinder
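
To recover every disk still stuck in Defined state in one pass, the same method script can be looped (a sketch):

for d in $(lsdev -Cc disk | awk '$2 == "Defined" {print $1}')
do
/etc/methods/cfgscsidisk -l $d
done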

COD script

 How to use this script?

server1:/tmp # ./codnew.pl
usage: ./codnew.pl MODEL XX XXXXX e.g ./codnew.pl 9119 83 9f6bf

(Eg)
server1:/tmp # ./codnew.pl 9119 02 1CBB4
04 processors activated on 12/06/2011
064 GB memory activated on 12/06/2011
080 GB memory activated on 10/24/2011
020 GB memory activated on 09/09/2011
999 GB memory activated on 08/16/2011
06 processors activated on 03/07/2011
128 GB memory activated on 11/30/2010
02 processors activated on 01/05/2010
016 GB memory activated on 11/18/2009
28 processors activated on 02/05/2009
400 GB memory activated on 02/05/2009

 =============================================

#!/usr/bin/perl
#
# PoC Script to go out on the net to the IBM POD site and tally up activations for a given i or p Series machine


#
# todo - add logic for processor deactivation - who uses that? IBM dont even publish the code on the pod site ;)
#
use LWP::UserAgent;
$ua = LWP::UserAgent->new;
$ua->agent("mozilla 8.0");
# $ua->proxy(['http'], 'http://proxy:8080/');
$ua->timeout(10);
use HTTP::Request::Common qw(POST);
if ($#ARGV != 2) {
print "usage: $0 MODEL XX XXXXX e.g $0 9119 83 9f6bf\n";
exit;
}
($model, $serial1, $serial2) = @ARGV;

##### main #####
get('http://www-912.ibm.com/pod/pod',"$serial2.htm");
html2txt("$serial2.htm","$serial2.txt");
total("$serial2.txt");
exit;

# fakes a mozilla browser, fills in the CGI form and snags the returned page to a local html file
sub get {
   my $req = (POST $_[0],
      ["system_type" => $model,
      "system_serial_number1" => $serial1,
      "system_serial_number2" => $serial2 ]);

   $request = $ua->request($req);
   $activations = $request->content;
   open(POD,">$_[1]");
   print POD $activations;
   close(POD);
}

# strips out the crap and converts the table to a local txt file to parse
sub html2txt {
   open(HTML,"<$_[0]");
   open(TXT,">$_[1]");
   while (<HTML>) {
      if (/<\/table>/) {$f = 0;};
      if (/<th>Posted Date \(MM/) {$f = 1;};
      if ($f == 1) {
      # poor mans HTML::TableExtract - excuse my sed like perl....
         s/<tr align="center">/:/g;
         s/<[^>][^>]*>//g;
         s/ //g;
         s/\n//g;
         s/:/\n/g;
         s/\&nbsp\;/ /g; #Added DW
         print TXT $_;
      };
   };
   close(TXT);
   close(HTML);
}
# totals up the de/activations to get totals
sub total {
   open(TXT,"<$_[0]");
   $[ = 1; $\ = "\n";# set array base & output record separator
   while (<TXT>) {
      ($code,$hex,$date) = split(' ', $_, -1);
      if (/POD/) {
         $p = substr($hex, 27, 2);
         print $p . ' processors activated on ' . $date;
         $pt = $pt + $p;
      };

      if (/MOD/) {
         $r = substr($hex, 26, 3);
         print $r . ' GB memory activated on ' . $date;
         $rt = $rt + $r;
      };

      if (/RMEM/) {
         $r = substr($hex, 27, 2);
         print $r . ' GB memory activated on ' . $date;
         $rt = $rt - $r;
      };
   };

   print '================';
   print 'TOTAL CPU=' . $pt . ' RAM=' . $rt*1024 . 'MB (' . $rt . 'GB)';
   close(TXT);
}





======================================================


PowerHA - VG,LV,FS in cluster command line - WITHOUT STRIPE






To create VG in cluster
=======================

/usr/es/sbin/cluster/sbin/cl_mkvg -f -n -S -cspoc -n'server1,server2' -r'oraclerg' -y 'oraclevg' -s'128' -V'78' -l'false' '-E' 00c1cbb4343f06a5 00c1cbb4343f0805 00c1cbb4343f095a 00c1cbb4343f0a87



LV creation in Cluster, without stripe
========================================
/usr/es/sbin/cluster/sbin/cl_mklv -cspoc -n'server1,server2' -R'server1' -y'data1lv01' -t'jfs2' -L'/var/opt/oracle/dataFS1' oraclevg 269 hdisk14

269 -> Total PP's in hdisk14


FS creation in Cluster, without stripe (option 1: by giving the LV name)
=====================================================================


/usr/es/sbin/cluster/sbin/cl_crfs -cspoc -n'server1,server2' -v jfs2 -d'data1lv01' -m'/var/opt/oracle/data001/FS2' -p'rw' -a agblksize='4096' '-a logname=INLINE'



PowerHA - creating scalable VG via CSPOC



smitty hacmp >  System Management (C-SPOC) > Storage > Volume Groups >  Create a Volume Group > select both nodes > select PV > Scalable >


 Node Names                                          server1,server2
  Resource Group Name                                [oraclerg]                                                            +
  PVID                                                00c1cbb4343f0cfd
  VOLUME GROUP name                                  [oraclevg]
  Physical partition SIZE in megabytes                128                                                                   +
  Volume group MAJOR NUMBER                          [78]                                                                    #
  Enable Cross-Site LVM Mirroring Verification        false                                                                 +
  Enable Fast Disk Takeover or Concurrent Access      Fast Disk Takeover or Disk Heart Beat                                 +
  Volume Group Type                                   Scalable

  Maximum Physical Partitions in units of 1024        32                                                                    +
  Maximum Number of Logical Volumes                   256                                                                   +

  Enable Strict Mirror Pools                          no                                                                    +
  Mirror Pool name                                   []