Troubleshooting Slow Linux Systems

If your system is running slowly, and this goes for RHEL, Debian, and other variants, take a look at this article, which is a simple walkthrough of the tools you can use to track down the problem.  These specific examples are from a system running OpenStack, but that’s not important to most of you:

  • top – The place to start is generally the ‘top’ command, which shows a resource summary and task list.
  • iostat – Shows the reads and writes on your disks.
  • iotop – Real-time, per-process iostat.
  • iozone – Generates some test traffic so you can see how the system reacts.
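To make the triage concrete, here’s a small sketch that flags any disk running hot on utilization.  The iostat -x output is hard-coded as sample data so the parsing is reproducible; on a live system you’d pipe in something like iostat -x 1 2 instead:

```shell
# Sample 'iostat -x' device lines, hard-coded so the parsing logic
# can be shown without a live system.
sample='Device  r/s   w/s   rkB/s   wkB/s  %util
sda     1.20  3.40  48.00  120.00  2.50
sdb     0.10 85.00   4.00 9800.00 97.30'

# Print any device whose %util (last column) exceeds 90%.
echo "$sample" | awk 'NR > 1 && $NF > 90 { print $1 }'
```

Here that prints sdb, which is where you’d point iotop next to find the offending process.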


Ubuntu boots into initramfs

Don’t panic, your boot sector is still good and readable.  Type ‘exit’ and you’ll complete the boot process.  But why does this happen and how can we fix it?  From my reading, this can be caused by faults on the disk, an improperly formatted GRUB configuration, or hardware controllers that respond too slowly with the information GRUB needs to complete a clean boot.

DISK FAULTS
If you have a bad sector or two, you can address this with the commands below.  Or you may need to boot into the live CD and run fsck -y against the root partition.

sudo touch /forcefsck
sudo shutdown -r now

SLOW CONTROLLER HARDWARE RESPONSE
If the controller hardware takes too long to respond with the correct devices, the system will advance without properly identifying them.  In Ubuntu you can address this in the /etc/default/grub file by changing the GRUB_CMDLINE_LINUX line as follows:

GRUB_CMDLINE_LINUX="rootdelay=60"
update-grub
reboot
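If you’d rather script the edit than open the file by hand, a sed pass like this works.  It’s shown here against a scratch copy named grub.sample rather than the real /etc/default/grub:

```shell
# Scratch copy standing in for /etc/default/grub.
cat > grub.sample <<'EOF'
GRUB_DEFAULT=0
GRUB_CMDLINE_LINUX="quiet"
EOF

# Append rootdelay=60 inside the existing GRUB_CMDLINE_LINUX quotes.
sed -i 's/^GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 rootdelay=60"/' grub.sample
grep GRUB_CMDLINE_LINUX grub.sample
```

On a real system you’d run the same sed, as root, against /etc/default/grub, then update-grub and reboot.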

IMPROPERLY FORMATTED GRUB
Not sure of the best way to go about this, but what I read was related to a mismatch between the boot partitions selected in /etc/fstab and those in the GRUB configuration.  I have definitely seen SuperMicro systems where the drives flipped around on boot.  This is caused by a fake-RAID configuration that works well with Windows installs but confounds Linux.  It’s controlled at the BIOS/CMOS level and can be removed with dmraid.  Yuck!  Drop me a note if you need help with this.  I likely have the recipe for fixing it.
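One way to spot that mismatch is to compare the UUIDs /etc/fstab expects against what blkid actually reports.  A sketch with canned sample data; the UUIDs here are made up, and on a live system you’d read them from /etc/fstab and blkid -s UUID -o value:

```shell
# UUIDs fstab expects (normally parsed out of /etc/fstab).
fstab_uuids='d3adb33f-0000-4000-8000-000000000001
d3adb33f-0000-4000-8000-000000000002'

# UUIDs the kernel actually sees (normally: blkid -s UUID -o value).
blkid_uuids='d3adb33f-0000-4000-8000-000000000001'

# Any UUID in fstab but absent from blkid output is a likely boot failure.
for u in $fstab_uuids; do
  echo "$blkid_uuids" | grep -qx "$u" || echo "missing: $u"
done
```

Anything printed as missing is a device the boot process will stall looking for.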

KERNEL PANIC – Missing Init

Init is process ID 1, the one from which all other processes are started: the first one run after boot and the one that cleans up when you are shutting down.  This isn’t really a kernel panic at all, but a failure of the init process.  FSCK it.

Boot the Ubuntu live CD, open the terminal, and type:

sudo debugfs -w /dev/sda1
debugfs 1.41.11 (14-Mar-2010)
debugfs:

At the debugfs prompt, type: clri <8>
and after hitting Enter,
type this: quit
then reboot.  It should force an fsck and come up happy.

Installing Oracle with an RPM

Recently I took on the task of finding a way to install Oracle 11gR2 on CentOS x64 from an RPM. This process seems not to be talked about on the Internet, or maybe I just don’t know how to search.

So, what makes it so difficult? The installer is larger than the maximum allowed RPM size of 2GB, but that shouldn’t stop you from breaking it up into two RPMs where the second one calls the installer in silent mode.
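For what it’s worth, here’s a minimal sketch of what the second RPM’s spec might look like, assuming the first RPM stages the unpacked installer under /opt/oracle/stage.  The package names, paths, and response-file name are my placeholders, not anything Oracle ships:

```spec
# oracle-installer.spec -- sketch only; names and paths are placeholders.
Name:     oracle-11gR2-installer
Version:  1.0
Release:  1
Summary:  Runs the Oracle 11gR2 silent install staged by a companion RPM
License:  Proprietary
# The first RPM, carrying the large staged payload.
Requires: oracle-11gR2-files

%description
Second of two RPMs: calls the staged Oracle installer in silent mode.

%post
# Kick off the silent install using a response file shipped in the stage.
/opt/oracle/stage/runInstaller -silent \
    -responseFile /opt/oracle/stage/db.rsp -waitforcompletion

%files
```

Doing the real work in %post keeps the second RPM tiny, since the payload already landed with the first package.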

I have it working, but I just wondered why the Internet is so silent about this subject.

ANSWER:

A couple of weeks later and it looks like a matter of too much Oracle documentation.  Oracle owns the SEO on these terms, and penetrating them isn’t going to be a simple matter of setting up a teaser article with a few terms filled out.  I’m going to try a few keywords and see how that goes for me.

keywords: rpmbuild

 

Adding a Host to an HA VMWare Cluster

When adding a new piece of iron to a high-availability (HA) cluster, your CPU compatibility matters.  HA means vMotion is involved, and vMotion is able to bridge the various pieces of hardware using “Enhanced vMotion Compatibility”, or EVC for short.  EVC is a mode you set that allows the servers to function at their highest common denominator, or rather, the greatest level of shared compatibility.  So, if you have a new system with all kinds of additional CPU capabilities but your old system is still L4 “Sandy Bridge”, then your cluster must be set up to use L4.

  • L0 – Intel “Merom” Xeon Core 2
  • L1 – Intel “Penryn” Xeon 45nm Core 2
  • L2 – Intel “Nehalem” Xeon Core i7
  • L3 – Intel “Westmere” Xeon 32 nm Core i7
  • L4 – Intel “Sandy Bridge”
  • L5 – Intel “Ivy Bridge”

So… just use VMware’s Compatibility Guide? No, you need to apply some common sense as well.  I just went shopping for an R720 to work with our “Sandy Bridge” hex-core R710s and found that the compatibility guide doesn’t really cover new systems very well.  Take a look at Wikipedia and look up your processor.  In my case, I have a choice of quite a few Xeon E5-26xx processors, but there is a note that the Xeon E5-2603 and Xeon E5-2609 do not have Hyper-Threading capabilities.  Well, my other systems certainly do.  Also, these are quad-core rather than hex-core.  So, problem avoided.  I need to spend a couple hundred more on a proper processor so I don’t have to degrade my EVC mode.

About Jay Farschman - Jay currently works as a Senior Systems Administrator for an asset management company in Colorado, where he works with companies that produce hardware, telecommunications software, and financial services.  Jay previously owned a consulting company and provided training and consulting services for three Fortune 500 companies and numerous small businesses, where he leveraged Linux to provide exceptional value.

Hard Drive Related Cheatsheet for Linux

 

TERMINOLOGY:

  • Partition - a portion of physical hard disk space.  A hard disk may contain one or more partitions, described by the partition table stored on the drive.
  • Volume - a logical concept that hides the physical organization of storage space.  A compatibility volume directly corresponds to a partition, while an LVM volume may span more than one partition on one or more physical disks.  A volume is seen by users as a single mount point (or drive letter).
  • Physical Volume (PV) - a disk or partition initialized for use by LVM; often a single physical hard drive.
  • Volume Group (VG) - a set of one or more PVs which form a single storage pool.  You can define multiple VGs on each system.
  • Logical Volume (LV) - a usable unit of disk space within a VG.  LVs are used analogously to partitions on PCs or slices under Solaris: they usually contain filesystems or paging spaces (“swap”).  Unlike a physical partition, an LV can span multiple physical volumes within the VG.
  • Root partition - the physical or logical partition that holds the root filesystem and the mount points for all other partitions.  It can be a physical partition or a logical volume.

LVM Commands

  • pvcreate /dev/hda3 – creates a physical volume
  • vgcreate vg01 /dev/hda3 – creates a volume group (in this case, vg01) using the physical volume
  • lvcreate -l25000 -nlv01 vg01 – creates a logical volume using the volume group, allocating 25000 extents
  • lvcreate -L4G -nlvroot vgraid1 – creates a 4GB logical volume by size
  • vgextend – adds a physical volume to the volume group (if you add a new disk)
  • lvdisplay -v /dev/vg01/lv01 – shows logical volume details
  • vgdisplay -v vg01 – shows volume group details
  • lvremove – removes a logical volume
  • vgreduce – removes a physical volume from a volume group
  • mkfs -t ext3 /dev/vg01/lv02 – makes a file system
  • mount /dev/vg01/lv02 /home/new – mounts the file system
  • mount -a – mounts everything in /etc/fstab
  • vgscan --mknodes – scans for volume groups and recreates missing device nodes
  • vgchange -a y /dev/vgraid1 – brings /dev/vgraid1 online if it didn’t come up automatically
  • lvscan – lists logical volumes
  • mkfs -t ext4 -T small /dev/vgraid5extra – makes an ext4 filesystem tuned for small files
  • umount /home/new – unmounts a filesystem
  • pvresize – updates the size of a PV
  • lvextend -L+ /dev/vgraid0/lvsharedfiles0 – extends a logical volume by a given amount
  • e2fsck -f /dev/vgraid1/lvsharedfilestemp – checks the filesystem
  • resize2fs /dev/vgraid1/lvsharedfilestemp – resizes the filesystem (now works online)
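Note that lvcreate -l counts physical extents (4 MiB each by default), so a quick bit of shell arithmetic sanity-checks a request like -l25000:

```shell
# lvcreate -l takes physical extents; the default PE size is 4 MiB.
extents=25000
pe_mib=4

# Total size in MiB and (integer) GiB.
echo "$(( extents * pe_mib )) MiB"
echo "$(( extents * pe_mib / 1024 )) GiB"
```

So -l25000 on a default VG is roughly a 97 GiB volume; check your actual PE size with vgdisplay before trusting the default.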

MDADM stuff

RAID tools and references:

  • raidtools2
  • mdadm (probably a better choice)
  • /etc/raid/raidtab
  • http://unthought.net/Software-RAID.HOWTO/
  • http://xtronics.com/reference/SATA-RAID-debian-for-2.6.html
  • http://juerd.nl/site.plp/debianraid

mdadm -Cv /dev/md0 -l1 -n2 missing /dev/sda1 --auto=yes – creates a degraded RAID 1 array
mdadm -Cv -c 256 /dev/md20 -l5 -n2 missing /dev/sdf1 – creates an array with a 256 KiB chunk size

mdadm -A -a /dev/md0 /dev/sda1
mdadm -A -a /dev/md1 /dev/sda2 – assembles existing arrays

Fixing a degraded array:
mdadm --add /dev/md7 /dev/sdd2

mkfs -t ext3 /dev/md20 – make a file system
mkfs -t ext3 -T largefile4 -E stride=64,stripe-width=64 /dev/md20 – reduce inodes, runs faster on big files

largefile4 is defined in /etc/mke2fs.conf.  The block size defaults to 4096 bytes, so with a chunk size of 256 KiB we need a stride of 256/4 = 64 (stride = chunk size / block size; stripe-width = stride × number of data disks).
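The arithmetic is simple enough to script: stride is the chunk size divided by the filesystem block size, and stripe-width is the stride times the number of data disks.  A quick check with a 256 KiB chunk, 4 KiB blocks, and (for illustration) two data disks:

```shell
chunk_kib=256      # mdadm -c value, in KiB
block_kib=4        # ext filesystem block size, in KiB
data_disks=2       # e.g. a 3-disk RAID 5 has 2 data disks

stride=$(( chunk_kib / block_kib ))
stripe_width=$(( stride * data_disks ))
echo "stride=$stride stripe-width=$stripe_width"
```

Swap in your own chunk size and disk count before feeding the result to mkfs -E.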

mdadm --detail --scan >> /etc/mdadm/mdadm.conf
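After re-adding a member it’s worth watching /proc/mdstat: a degraded array shows an underscore in its [UU] status field.  A sketch against canned output; on a live system you’d read /proc/mdstat directly:

```shell
# Canned /proc/mdstat content (normally: cat /proc/mdstat).
mdstat='md7 : active raid1 sdd2[1] sdc2[0]
      976630336 blocks [2/2] [UU]
md0 : active raid1 sda1[0]
      524224 blocks [2/1] [U_]'

# A status like [U_] means a missing member on the array named above it.
echo "$mdstat" | awk '/^md/ { dev=$1 } /\[U*_+U*\]/ { print dev }'
```

Here only md0 is printed, so md7 is healthy and md0 still needs a member added or a resync to finish.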

Updating your ESX or ESXi Server

Patching a server is important not just for security but for the features you would otherwise be missing.  This is particularly true of ESX, where the VMware folks have to keep updating the supported operating systems for the guest systems.  Updating is pretty easy, too.

A COUPLE OF NOTES

  • Patching typically requires maintenance mode and often a reboot.  The bottom line here is that it’s an outage for your systems.
  • Some patches will require you to load a new vSphere client before you can get access.

GET READY

  1. Locate the appropriate patches at http://www.vmware.com/patchmgr/download.portal – if you don’t know what version you are running, take a look in your vSphere client under “About”.
  2. See what’s needed with the CLI command “esxupdate query”.  This shows you what is already installed.  For instance, it may say “VMware ESXi 4.0 Update 3”.  Let’s consider installing update 4.
  3. Place your ESX host in maintenance mode using one of these two commands:
    ESXi: # vim-cmd hostsvc/maintenance_mode_enter
    ESX: # vimsh -n -e /hostsvc/maintenance_mode_enter
  4. Copy the link for update 4 from step #1 and start the download and install process:
    esxupdate --bundle=https://hostupdate.vmware.com/software/VUM/OFFLINE/release-322-20111116-059770/update-from-esxi4.0-4.0_update04.zip update
  5. Wait for it to complete.  If you get a message about “it is installed or obsoleted”, those are two possible problems, but consider that your link from #1 could be for the wrong ESX version as well.
  6. Once installed, get out of maintenance mode:
    ESXi: # vim-cmd hostsvc/maintenance_mode_exit
    ESX: # vimsh -n -e /hostsvc/maintenance_mode_exit

    You can verify with:
    ESXi: # vim-cmd hostsvc/hostsummary | grep inMaintenanceMode
    ESX: # vimsh -n -e /hostsvc/hostsummary | grep inMaintenanceMode

  7. Reboot as necessary.
  8. Reload your vSphere client.

QED -
