Resizing a multipathed SAN LUN under RHEL5

12 11 2009

So today I found myself staring at the Red Hat LVM documentation, wherein I began to silently cuss.  You see, I needed to double the LUN size for MySQL’s usage on one of my servers.  These servers are our first Linux systems that are directly attached to the SAN, so we’re working a bit off the map here.  To make matters more difficult, we’re using multipathd to manage the connections between the system and the SAN, so it wasn’t very clear what exactly I should do in this case.  The LVM docs are somewhat … lacking.

Thanks to the wonder of Google and sheer dumb luck, I ran across this post on one of the RHEL5 mailing lists.  In that case, the author wasn’t sure if it was the right method.  But, I had a secondary system and was willing to run with scissors for a moment since we’re not fully in production yet.

In short, the basic flow of operations is:

  1. Figure out the multipath I/O device name.
  2. Figure out the underlying device IDs (or device names).
  3. Issue the resize of the LUN in your SAN.
  4. Tell the kernel to rescan the underlying device IDs so it sees the new LUN size.
  5. Tell multipathd that a resize has occurred.
  6. Issue a pvresize so LVM knows it has more extents to work with now.
  7. Issue an lvresize to increase the logical volume size.
  8. Run resize2fs and do an online resize of the filesystem.
  9. Make popcorn.
  10. Watch a movie.

The first time I attempted the process (in a similar, but not quite identical, fashion), I caused all LVM commands on the system to hang.  I had turned off multipathd, thinking that it would need to be off while I did the resize.  This appears not to be a healthy way to do it, because I ended up having to warm cycle the system.  This is the point where I stopped reading the LVM documentation and found the mailing list post.  I tried it out and it worked.

So, without further ado …

#  multipath -ll mpath0
mpath0 (360060160dac711004c6fa9d07c7cde11) dm-2 DGC,RAID 10
[size=50G][features=1 queue_if_no_path][hwhandler=1 emc][rw]
\_ round-robin 0 [prio=2][active]
 \_ 1:0:1:0 sdc 8:32  [active][ready]
 \_ 2:0:1:0 sde 8:64  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:0 sdb 8:16  [active][ready]
 \_ 2:0:0:0 sdd 8:48  [active][ready]

#  for i in `multipath -ll mpath0 | grep sd | awk '{print $2}'`; do
> echo $i ; done
1:0:1:0
2:0:1:0
1:0:0:0
2:0:0:0
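
The `grep sd` in these loops works for the output shown, but it would also pick up any other line that happens to contain "sd". A slightly stricter sketch is below; the function name mpath_paths is made up for illustration, and it assumes path lines always carry an H:C:T:L address immediately followed by an sdX name, as they do in the output above.

```shell
# List "H:C:T:L device" pairs from `multipath -ll` output read on stdin.
# Only lines where an H:C:T:L address is followed by an sdX name are printed.
mpath_paths() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/ && $(i + 1) ~ /^sd[a-z]+$/)
                print $i, $(i + 1)
    }'
}

# Sample input copied from the multipath -ll output above.
mpath_paths <<'EOF'
mpath0 (360060160dac711004c6fa9d07c7cde11) dm-2 DGC,RAID 10
[size=50G][features=1 queue_if_no_path][hwhandler=1 emc][rw]
\_ round-robin 0 [prio=2][active]
 \_ 1:0:1:0 sdc 8:32  [active][ready]
 \_ 2:0:1:0 sde 8:64  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:0 sdb 8:16  [active][ready]
 \_ 2:0:0:0 sdd 8:48  [active][ready]
EOF
```

On a live system the equivalent would be `multipath -ll mpath0 | mpath_paths`.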

#  for i in `multipath -ll mpath0 | grep sd | awk '{print $3}'`; do
> blockdev --rereadpt /dev/$i ; done
BLKRRPART: Input/output error
BLKRRPART: Input/output error

#  multipathd -k"resize multipath mpath0"
ok

#  multipath -ll mpath0
mpath0 (360060160dac711004c6fa9d07c7cde11) dm-2 DGC,RAID 10
[size=100G][features=1 queue_if_no_path][hwhandler=1 emc][rw]
\_ round-robin 0 [prio=2][enabled]
 \_ 1:0:1:0 sdc 8:32  [active][ready]
 \_ 2:0:1:0 sde 8:64  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:0 sdb 8:16  [active][ready]
 \_ 2:0:0:0 sdd 8:48  [active][ready]

#  pvresize /dev/mapper/mpath0
  Physical volume "/dev/mpath/mpath0" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized

#  lvresize -L 50G /dev/VolGroupMySQL/mysql-san 
  Extending logical volume mysql-san to 50.00 GB
  Logical volume mysql-san successfully resized

#  resize2fs /dev/VolGroupMySQL/mysql-san 
resize2fs 1.39 (29-May-2006)
Filesystem at /dev/VolGroupMySQL/mysql-san is mounted on /var/lib/mysql; on-line resizing required
Performing an on-line resize of /dev/VolGroupMySQL/mysql-san to 13107200 (4k) blocks.
The filesystem on /dev/VolGroupMySQL/mysql-san is now 13107200 blocks long.

#  df
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      9.7G  3.3G  6.0G  36% /
/dev/sda1             122M   13M  103M  11% /boot
tmpfs                 3.9G     0  3.9G   0% /dev/shm
/dev/mapper/VolGroupMySQL-mysql--san
                       50G  795M   46G   2% /var/lib/mysql
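
As a quick sanity check on the numbers in this post, the block counts can be converted to GiB with plain shell arithmetic. This is only a sketch; the helper name blocks_to_gib is made up for illustration.

```shell
# Convert a block count ($1) and a block size in bytes ($2) to whole GiB.
blocks_to_gib() {
    echo $(( $1 * $2 / 1024 / 1024 / 1024 ))
}

blocks_to_gib 13107200 4096     # filesystem size reported by resize2fs: prints 50
blocks_to_gib 209715200 512     # LUN size in 512-byte sectors from the kernel log: prints 100
```

So the filesystem now uses 50G of the 100G LUN, which matches the lvresize and df output.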

One thing to note is the BLKRRPART errors from blockdev.  These appear to be “normal” as far as I can tell.  The kernel threw some log messages (included below), but they appear harmless as far as I can discern.  The SCSI notices occurred when I issued the blockdev command.  The device-mapper multipath warning came from the multipathd resize command.

SCSI device sdc: 209715200 512-byte hdwr sectors (107374 MB)
sdc: Write Protect is off
sdc: Mode Sense: 87 00 00 08
SCSI device sdc: drive cache: write through
sdc: detected capacity change from 53687091200 to 107374182400
 sdc: unknown partition table
SCSI device sde: 209715200 512-byte hdwr sectors (107374 MB)
sde: Write Protect is off
sde: Mode Sense: 87 00 00 08
SCSI device sde: drive cache: write through
sde: detected capacity change from 53687091200 to 107374182400
 sde: unknown partition table
SCSI device sdb: 209715200 512-byte hdwr sectors (107374 MB)
sdb: test WP failed, assume Write Enabled
sdb: asking for cache data failed
sdb: assuming drive cache: write through
sdb: detected capacity change from 53687091200 to 107374182400
 sdb:<6>sd 1:0:0:0: Device not ready: <6>: Current: sense key: Not Ready
    Add. Sense: Logical unit not ready, manual intervention required

end_request: I/O error, dev sdb, sector 0
printk: 68 messages suppressed.

Buffer I/O error on device sdb, logical block 0
 unable to read partition table
SCSI device sdd: 209715200 512-byte hdwr sectors (107374 MB)
sdd: test WP failed, assume Write Enabled
sdd: asking for cache data failed
sdd: assuming drive cache: write through
sdd: detected capacity change from 53687091200 to 107374182400
 sdd:<6>sd 2:0:0:0: Device not ready: <6>: Current: sense key: Not Ready
    Add. Sense: Logical unit not ready, manual intervention required

end_request: I/O error, dev sdd, sector 0
 unable to read partition table
device-mapper: multipath emc: long trespass command will be send
device-mapper: multipath emc: honor reservation bit will not be set (default)
device-mapper: multipath: Using dm hw handler module emc for failover/failback and device management.
device-mapper: multipath emc: emc_pg_init: sending switch-over command
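
The two BLKRRPART errors plausibly line up with sdb and sdd, the two paths the log above shows as not ready, since re-reading a partition table means actually reading from the device. A commonly documented alternative is to write 1 to each path's rescan node in sysfs, which asks the SCSI layer to re-read the device without going through the partition table. The sketch below is an assumption about how that would look here and was not verified against this array; the function name rescan_paths and its optional second argument are made up for illustration.

```shell
# Ask the SCSI layer to rescan each named path device via sysfs.
# $1: space-separated device names, e.g. "sdb sdc sdd sde"
# $2: sysfs root, /sys by default (overridable to try it on a scratch directory)
rescan_paths() {
    sysroot=${2:-/sys}
    for dev in $1; do
        node="$sysroot/block/$dev/device/rescan"
        if [ -w "$node" ]; then
            echo 1 > "$node"
            echo "rescanned $dev"
        else
            echo "skipped $dev (no writable $node)" >&2
        fi
    done
}
```

Run as root it would be called as `rescan_paths "sdb sdc sdd sde"`, followed by the same multipathd resize command as before.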



Ubuntu Linux adds private cloud backing | Open Source – InfoWorld

14 10 2009

Ubuntu Linux adds private cloud backing

Canonical’s upcoming server upgrade supports the Eucalyptus project’s open source system for cloud implementation using hardware and software already in place

Canonical is touting private cloud capabilities in an upgrade to its Ubuntu Linux OS being announced on Tuesday.

Available for free download on October 29, Ubuntu 9.10 Server Edition introduces UEC (Ubuntu Enterprise Cloud), an open source cloud computing environment based on the same APIs as Amazon EC2 (Elastic Compute Cloud). Businesses can take advantage of private clouds, Canonical said.

Ubuntu Linux adds private cloud backing | Open Source – InfoWorld.

This should prove interesting.  If we were able to leverage something like this, we could build out a private cloud for researchers.  The Eucalyptus system certainly looks useful.  Especially if they’re touting it as API compatible with other external cloud vendors.  We’d certainly need to do some heavy investigation to figure out what running our own cloud would actually mean.  I can certainly see it as being completely different than running a classic high performance computing grid.

You, too, can have a cloud in the privacy of your own home!  Time to keep up with the Joneses again!




Tips: See the contents of an RPM post-install script

14 09 2009

Today I had the need to look at MySQL.com’s post-install scripts for MySQL Advanced. Being lazy and not having quick and easy access to the specfile, I used my black belt in Google-fu and found that it was simpler than I thought. In my case, I had already installed the RPM. All I needed to do was:

rpm -q --scripts MySQL-server-advanced-gpl | less

and out popped the post-install script. Why did I need to do this? Part of the work I’m doing with bcfg2 is to automate the installation and configuration of my MySQL servers. My requirements state that I need to be able to run more than one instance on the same server. Unfortunately, the default installation of MySQL-server-advanced-gpl sets up a basic db instance for you, right smack in the directory I’m working in. The problem is, this walks all over the directory layout I need for my servers.

At least I know what’s causing the default install to occur. Now, to go make bcfg2 do something about it.
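
Since `rpm -q --scripts` prints every scriptlet in the package, a small filter helps when only the post-install part matters. The sketch below assumes section headers of the form "postinstall scriptlet (using /bin/sh):", which may be worded differently in other rpm versions; the function name postinstall_only and the sample input are made up for illustration.

```shell
# Keep only the postinstall section of `rpm -q --scripts` output read on stdin.
postinstall_only() {
    awk '
        /^(preinstall|postinstall|preuninstall|postuninstall|pretrans|posttrans|verify) (scriptlet|program)/ {
            show = ($1 == "postinstall")   # start printing after the postinstall header
            next
        }
        show { print }
    '
}

# Illustrative input only; the real scriptlets are much longer.
postinstall_only <<'EOF'
preinstall scriptlet (using /bin/sh):
echo "pre"
postinstall scriptlet (using /bin/sh):
echo "post"
preuninstall scriptlet (using /bin/sh):
echo "preun"
EOF
```

Real usage would be `rpm -q --scripts MySQL-server-advanced-gpl | postinstall_only`. For a package that is not installed yet, `rpm -qp --scripts` takes the path to the .rpm file instead, and `rpm -q --queryformat '%{POSTIN}\n'` should print the same scriptlet without any filtering.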




Woot! Unix group enumeration from AD groups.

3 06 2009

Well, that was easy enough. Just needed to understand a bit more of the AD OU structure here. (Sanitized a bit for now).

-bash-3.2$ touch foo bar baz quux
-bash-3.2$ ls -l
total 0
-rw-r--r-- 1 hcoyote UNIXTEST-test 0 Jun  3 16:59 bar
-rw-r--r-- 1 hcoyote UNIXTEST-test 0 Jun  3 16:59 baz
-rw-r--r-- 1 hcoyote UNIXTEST-test 0 Jun  3 16:59 foo
-rw-r--r-- 1 hcoyote UNIXTEST-test 0 Jun  3 16:59 quux
-bash-3.2$ id
uid=66000(hcoyote) gid=66000(UNIXTEST-test) groups=66000(UNIXTEST-test)
-bash-3.2$ getent group UNIXTEST-test
UNIXTEST-test:*:66000:hcoyote,member2,member3
-bash-3.2$ getent group
root:x:0:root
bin:x:1:root,bin,daemon
daemon:x:2:root,bin,daemon
sys:x:3:root,bin,adm
adm:x:4:root,adm,daemon
tty:x:5:
disk:x:6:root
lp:x:7:daemon,lp
.
.
.
stapdev:x:101:
stapusr:x:102:
avahi-autoipd:x:103:
UNIXTEST-test:*:66000:hcoyote,member2,member3

UNIXTEST-test is the group name for gid 66000 in Active Directory. Everything listed before this group comes straight from the local group file because we’re using the appropriate configuration in nsswitch.conf.

This was solved by adding the following to the ldap.conf:

nss_base_group		ou=Departments,?sub?&(objectCategory=group)(gidNumber=*)

Also, you need to modify nsswitch.conf to be:

group: files ldap
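
With group enumeration working, membership can be checked from the shell by splitting the fourth field of the group entry. This is a minimal sketch; the function name in_group_line is made up for illustration, and it only looks at the explicit member list, so a user whose primary gid points at the group but who is not listed as a member would not be found.

```shell
# Succeed if user $1 appears in the member list of a group-format line on stdin
# (name:passwd:gid:member1,member2,...), as printed by `getent group NAME`.
in_group_line() {
    awk -F: -v user="$1" '
        { n = split($4, m, ","); for (i = 1; i <= n; i++) if (m[i] == user) found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Sample line taken from the sanitized getent output above.
echo 'UNIXTEST-test:*:66000:hcoyote,member2,member3' | in_group_line hcoyote && echo "hcoyote is a member"
```

On a live system that becomes `getent group UNIXTEST-test | in_group_line hcoyote`.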

One step closer. Next: account authorization via group membership. In other words, only let someone use a resource if they exist in a specific group. Need to figure out if this should be done via netgroup or unix group membership. Off to research!




Ha ha! SSL success for AD/LDAP.

3 06 2009

Ha ha! Further success on the Linux -> Active Directory integration front. I got SSL working for the underlying ldap bind user. What’s this mean? Protection of the directory information over the wire as it travels from the domain controller to the client host where it will be used.

So what are the necessary setup bits?  Three options need to be added to the ldap.conf that I originally came up with, and the existing uri line needs to change.  They are:

ssl yes
tls_cacertfile /etc/ssl/certs/ca_bundle.crt
tls_checkpeer no
# uri is modified from the previous config (ldap:// becomes ldaps://)
uri ldaps://austin.utexas.edu/

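The tls_cacertfile option defines the location of the file that contains the certificate authority information used to create the SSL certificate on the Active Directory domain controllers. You need this to verify the authenticity of the DCs. Note that with tls_checkpeer set to no, as shown, the server certificate is not actually required to validate against this file; switch it to yes once the CA file is in place if you want that check enforced. The file should be PEM-formatted and must be converted from the file you retrieve from the internal certificate authority at UT.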

The file you download is in DER format, so it needs to be converted to PEM using something like the following.

openssl x509 -in downloadedcert.cer -inform DER -out rootca.pem -outform PEM

Next, you copy the contents of rootca.pem into the file named by tls_cacertfile (/etc/ssl/certs/ca_bundle.crt in the options above).
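
In case the certificate authority hands back a Base64 (PEM) file rather than DER, it is worth checking before converting. The sketch below is one way to do that; the function name is_pem_cert is made up for illustration, and the file names simply follow the openssl example above.

```shell
# Succeed if file $1 looks like a PEM certificate (contains the BEGIN marker).
is_pem_cert() {
    grep -q -e '-----BEGIN CERTIFICATE-----' "$1"
}

# Convert only when needed; does nothing if the download is not present.
cert=downloadedcert.cer
if [ -f "$cert" ]; then
    if is_pem_cert "$cert"; then
        cp "$cert" rootca.pem
    else
        openssl x509 -in "$cert" -inform DER -out rootca.pem -outform PEM
    fi
fi
```

Once the CA file is in place, `openssl s_client -connect austin.utexas.edu:636 -CAfile /etc/ssl/certs/ca_bundle.crt` is a quick way to see whether the server certificate verifies against it, assuming the domain controllers answer LDAPS on the standard port 636.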

Once you’ve configured the ldap.conf with the updated options, you should now be accessing LDAP over SSL. You can verify this by running something like Wireshark and watching the TCP traffic going across the wire. It’ll look something like the stream in the image above.

If you see errors in /var/log/messages that look like the following, then you’ve got something wrong in your configuration still.

Jun  3 14:09:49 fedex getent: nss_ldap: failed to bind to LDAP server ldaps://austin.utexas.edu/: Can't contact LDAP server
Jun  3 14:09:49 fedex getent: nss_ldap: reconnecting to LDAP server (sleeping 4 seconds)...
Jun  3 14:09:53 fedex getent: nss_ldap: failed to bind to LDAP server ldaps://austin.utexas.edu/: Can't contact LDAP server
Jun  3 14:09:53 fedex getent: nss_ldap: reconnecting to LDAP server (sleeping 8 seconds)...
Jun  3 14:10:01 fedex getent: nss_ldap: failed to bind to LDAP server ldaps://austin.utexas.edu/: Can't contact LDAP server
Jun  3 14:10:01 fedex getent: nss_ldap: reconnecting to LDAP server (sleeping 16 seconds)...

Now that we have SSL tackled, time to get group lookups working. I’ll leave that for another posting.




Authenticating to Austin AD from Linux

2 06 2009

Woot!  With the help of barthag, I got one of our Linux boxes configured to use AD/LDAP as the backend for the passwd map and AD/Kerberos for authentication.  Most of the problems stem from permissions issues on the AD side and making sure things are open “enough” to let us through to query for information.

On the Linux side (specifically, Red Hat Enterprise Linux 5 Update 2), there are four files you have to touch to make this work: /etc/krb5.conf, /etc/ldap.conf, /etc/nsswitch.conf, and /etc/pam.d/system-auth.

In our configuration, I’m using LDAP to provide the transport for the actual directory information lookups and Kerberos to manage the authentication of users to the system.  It’s an easy enough configuration for this.  For testing, we’re not encrypting the LDAP connection because we haven’t yet figured out the correct paths and formats of the various certificate files necessary for nss_ldap to work correctly (yay underdocumented features!).  Plus, since my test box is close to the AD servers network-wise, I know the connection is secure enough.  Production usage of this config will very likely enforce the use of SSL since we’d be providing these configs for people outside of our local ITS network (but still on-campus).

Also, for this initial test, we’re only working with the passwd map.

So, without further ado.

/etc/nsswitch.conf

passwd:  files ldap
shadow:  files ldap

/etc/krb5.conf

[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = AUSTIN.UTEXAS.EDU
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 24h
 forwardable = yes

[realms]
 AUSTIN.UTEXAS.EDU = {
  kdc = austin.utexas.edu:88
  admin_server = austin.utexas.edu:749
  default_domain = austin.utexas.edu
 }

[domain_realm]
 .austin.utexas.edu = AUSTIN.UTEXAS.EDU
 austin.utexas.edu = AUSTIN.UTEXAS.EDU

[appdefaults]
 pam = {
   debug = false
   ticket_lifetime = 36000
   renew_lifetime = 36000
   forwardable = true
   krb4_convert = false
 }

/etc/ldap.conf

###
### ldap config for binding to AD with a read-only bind account over
### cleartext.
###
base dc=austin,dc=utexas,dc=edu
uri ldap://austin.utexas.edu/

binddn thebinduser
bindpw thebindpasswd
scope sub
timelimit 120
bind_timelimit 120
idle_timelimit 3600

nss_initgroups_ignoreusers root,ldap,named,avahi,haldaemon,dbus,radvd,tomcat,radiusd,news,mailman,nscd,gdm
# limit to the Austinites->People OU that have a uid set.
nss_base_passwd		ou=Austinites,ou=People,?sub?&(objectCategory=user)(uid=*)
#nss_base_group		ou=Group,dc=example,dc=com?one
#nss_base_hosts		ou=Hosts,dc=example,dc=com?one
#nss_base_services	ou=Services,dc=example,dc=com?one
#nss_base_networks	ou=Networks,dc=example,dc=com?one
#nss_base_protocols	ou=Protocols,dc=example,dc=com?one
#nss_base_rpc		ou=Rpc,dc=example,dc=com?one
#nss_base_ethers	ou=Ethers,dc=example,dc=com?one
#nss_base_netmasks	ou=Networks,dc=example,dc=com?one
#nss_base_bootparams	ou=Ethers,dc=example,dc=com?one
#nss_base_aliases	ou=Aliases,dc=example,dc=com?one
#nss_base_netgroup	ou=Netgroup,dc=example,dc=com?one

nss_map_objectclass posixAccount user
nss_map_objectclass shadowAccount user
nss_map_attribute uid sAMAccountName
nss_map_attribute homeDirectory unixHomeDirectory
nss_map_attribute shadowLastChange pwdLastSet
nss_map_objectclass posixGroup group
nss_map_attribute uniqueMember member
pam_login_attribute sAMAccountName
pam_filter objectclass=User
pam_password ad

pam_password_prohibit_message "Sorry, you must change your password using the UTDirect EID interface."

# we don't use referrals at UT.
referrals no

# return more than 10 thousand entries when iterating over the entire
# map.
nss_paged_results yes
page_size 1000

/etc/pam.d/system-auth

#%PAM-1.0
# This file is auto-generated.
# User changes will be destroyed the next time authconfig is run.
auth        required      pam_env.so
auth        sufficient    pam_unix.so nullok try_first_pass
auth        sufficient    pam_krb5.so
auth        requisite     pam_succeed_if.so uid >= 500 quiet
auth        required      pam_deny.so

account     sufficient    pam_unix.so
account     sufficient    pam_krb5.so
account     sufficient    pam_succeed_if.so uid < 500 quiet
account     required      pam_permit.so

password    requisite     pam_cracklib.so try_first_pass retry=3
password    sufficient    pam_unix.so md5 shadow nullok try_first_pass use_authtok
password    required      pam_deny.so

session     optional      pam_keyinit.so revoke
session     required      pam_limits.so
session     [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
session     required      pam_unix.so

Great.  So with this configuration, I can verify that I can log in to the Linux box via ssh using my Windows username and do things as myself.  There are some caveats to this:  you must be in the Austinites OU at this point AND you must have the appropriate posixAccount attributes populated.  At a minimum, that’s uidNumber, gidNumber, loginShell, unixHomeDirectory, and uid.  You may have noticed that we’re actually using sAMAccountName for the login name above.  We’re still playing with that.  We might use uid instead because that’s something we can more easily filter on to limit the different pieces of the account lookups.
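
One rough way to spot accounts with incomplete attributes is to look at what the passwd map hands back. With the mappings above, uidNumber, gidNumber, unixHomeDirectory and loginShell land in fields 3, 4, 6 and 7 of the passwd entry. The sketch below flags empty fields; the function name check_passwd_line and the sample entry are made up for illustration, and an account that the nss_base_passwd filter excludes would simply return nothing from getent rather than a partial line.

```shell
# Report which fields are empty in passwd-format lines read from stdin
# (name:passwd:uid:gid:gecos:home:shell), e.g. from `getent passwd USER`.
# Exits non-zero if any line has a missing field.
check_passwd_line() {
    awk -F: '
        {
            missing = ""
            if ($1 == "") missing = missing " name"
            if ($3 == "") missing = missing " uid"
            if ($4 == "") missing = missing " gid"
            if ($6 == "") missing = missing " home"
            if ($7 == "") missing = missing " shell"
            name = ($1 == "" ? "?" : $1)
            if (missing == "") print name ": ok"
            else { print name ": missing" missing; bad = 1 }
        }
        END { exit bad ? 1 : 0 }
    '
}

# Hypothetical entry; the home directory and shell are placeholders.
echo 'hcoyote:*:66000:66000:Example User:/home/hcoyote:/bin/bash' | check_passwd_line
```

On a live system that becomes `getent passwd hcoyote | check_passwd_line`.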

So, what’s next?

  • need to beat on the system and make sure no one WITHOUT the attributes set can log in.
  • need to work on the group map and get that working since we’re currently only doing passwd file lookups.
    • this includes understanding how to deal with secondary groups.
  • need to work on the netgroup map and come up with a standardized way of handling that so we can (under Linux) configure people in /etc/security/access.conf for access restrictions.
  • need to figure out if there are other access restriction mechanisms we need to pay attention to.
  • need to look at all the pam configs on a default box and see if there’s something that isn’t covered by the system-auth template.
  • need to figure out just how far into Kerberos we want to go.  What we’re doing now is good for authenticating a single session.  Need to determine if we want to go to single sign-on or not and do all the extra bits associated with using kerberos.
  • need to figure out if we should do authentication using LDAP only (honestly, I’d prefer not to, but that depends on what others in the group need).
  • need to get this set up on a Solaris 10 system for testing.
  • need to address the issue of a person’s loginShell and come up with some standards for those, in order to deal with departments that have differing shell policies but overlapping accounts (e.g., a researcher who has accounts in two Unix areas that have differing policies on where or which shells should be used).
  • need to address the issue of a person’s unixHomeDirectory possibly differing between two Unix areas.  Same issue as above, but stickier because it’s generally a lot harder to consolidate this down to a consistent path name than it is to consolidate shells down to a consistent path name.

There are probably others, but these are my known unknowns at this point.

Things that were referenced to make all this work:




Useful LDAP/Kerberos integration resources

21 05 2009

I was recently in a meeting discussing the integration of Unix account management (passwd and group) with Active Directory via LDAP and Kerberos.  Having done some of this at a previous job, I’d already done some research into getting it working.  I found some useful resources back then on getting this all working right.  I figured I’d share here.