It has been quite some time since I last updated this blog, but my work has been overwhelming. 2011 is pretty much gone, and I think it would be a great time to share some information about the journey I have taken to achieve better quality in the IT services delivered at my work.
I am pretty sure that anyone who works in IT has some interesting issues to deal with, so read this post keeping in mind the “Your mileage may vary” slogan.
So, first things first: let me start by giving some stats about my current IT environment, to make clear what kind of issues I am talking about and what decisions have been made to achieve better quality in the IT services delivered.
HQ IT environment – As of January 2011
- A user base of around 500, of which 50% are WinXP based (laptops and PCs), 35% thin client based (Windows and Linux terminal servers) and 15% based on either HP-UX workstations or high-end PC workstations (XP based as well)
- Data shared across Windows, Linux and HP-UX environments, using CIFS, NFS and a large number of “creative solutions”
- Network topology based on 3Com/HP switches, with 1GbE as the dominant link speed. No network redundancy/resiliency deployed, whether by link aggregation, STP, or any other technique. Network strategy based on ad-hoc tasks/growth.
- Windows 2003 as the main file server, with around 7.5 TB of data
- Oracle environment with 12+ instances, using 3 TB worth of data with its own storage solution (Open-E, NFS based)
- OpenSolaris as the NAS/SAN environment, for NFS and iSCSI
- Usage of end-of-support x86 servers from Sun
- Usage of Citrix XenServer as our only virtualization hypervisor
- Server farm of around 80 physical x86 servers and 80 virtual machines
- Usage of a myriad of different applications by the business, which had lower and lower availability due to systems crashing, lack of monitoring and “weird issues”
- Team of 6 people, with 5 of them allocated at 100% to sysadmin tasks, and 1 (myself) with time split between Sysadmin, Technical IT Leader, Infrastructure Consultant and many, many other activities
- Workload of IT team members around 11.5 hours of work/day, 6 days/week
- IT environment managed from 6am until 1am CEST
- A big mismatch between IT budget and business needs, leading to wrong expectations and poor IT services, with the direct consequence that end users lacked trust in IT
HQ IT environment problems – As of January 2011
- IT could not meet the increasing demand from the business to support more and more applications with a higher SLA (higher meaning something better than best effort) and lower RTO/RPO
- Ever-increasing pain to back up data, mainly data hosted using virtualization
- Users demanded access to the same data anytime, anywhere, from multiple devices, while IT couldn’t deal with this demand
- Network errors and bottlenecks encountered without a proper way to troubleshoot them
- Storage devices suffering from multiple crashes, leading to lengthy repair times and increasing user frustration
- Lack of support from a vendor that could go beyond the sales phase, or in other words lack of partnership with vendors in order to sell a solution rather than just selling equipment
- IT department in charge of designing a new datacenter/network/IT environment for the new HQ, to be ready in 2012, while keeping/improving the current environment
By now you probably have a good picture of the environment, and I hope you can map this specific environment to your own and take some good tips/points from this post.
Fortunately for me (I work for what I have, by the way), I have the type of managers who actually listen to what I say and take me seriously. Based on that, by the end of 2009 major decisions had been made by me and supported by my direct manager(s) and by other key users within the organization. Those decisions were made in terms of the strategy for the servers/storage to use, network design, virtualization, backup, and what can or cannot be done with the time/budget/resources available, a.k.a. expectations management.
The most critical decisions made in late 2009 were:
- Selection of a new IT (servers, laptops, PCs, storage) vendor. The choice was Dell, due to their good products but, more importantly, due to the fact that Dell was committed to working with us to find a solution and not only to selling servers/desktops. Dell has proved to be a partner to us, one that helps us find a solution rather than going directly to the sales phase.
- Selection of a new storage solution, based on iSCSI technology – Dell EqualLogic
- Selection of a new network vendor, to be used only for the new datacenter/HQ – Juniper
- Decision to adopt a layered approach for the virtualization hypervisor in use. This basically means that, depending on the SLA and RTO/RPO of an application to be virtualized, either Citrix XenServer or VMware can be used. For lower SLAs it has been decided to use Citrix XenServer, with VMware as the foundation block of our new environment to support critical business applications, like databases, CRM, PLM, etc.
- Decision to move (not consolidate per se) any x86 workload from a physical environment into a virtualized one, except in specific cases (for instance, if a vendor doesn’t support virtualization for their product, like product license servers)
- Decision to stop using the current backup tool (EMC Legato) to back up virtualized environments. PHD Virtual Backup was selected for Citrix XenServer and Veeam for VMware environments. EMC Legato is still our corporate tool to back up any physical server, and I don’t see us moving away from it anytime soon
- Provide IT team members with training in core technologies, like storage, networking and virtualization
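To make the layered hypervisor and backup decisions a bit more concrete, here is a minimal Python sketch of how such a placement rule could be written down. It is only an illustration: the `Workload` class, the `business_critical` flag (standing in for the SLA and RTO/RPO tier) and the `place` function are hypothetical names, not something we actually run.

```python
from dataclasses import dataclass

# Backup tool per platform, following the decisions listed above.
BACKUP_TOOL = {
    "physical": "EMC Legato",
    "xenserver": "PHD Virtual Backup",
    "vmware": "Veeam",
}


@dataclass
class Workload:
    name: str
    # Hypothetical flag standing in for the SLA and RTO/RPO tier.
    business_critical: bool
    # False for products whose vendor does not support virtualization.
    vendor_supports_virtualization: bool = True


def place(workload):
    """Return (platform, backup tool) for a workload under the layered policy."""
    if not workload.vendor_supports_virtualization:
        platform = "physical"
    elif workload.business_critical:
        platform = "vmware"
    else:
        platform = "xenserver"
    return platform, BACKUP_TOOL[platform]


if __name__ == "__main__":
    examples = [
        Workload("crm-db", business_critical=True),
        Workload("intranet-wiki", business_critical=False),
        Workload("license-server", business_critical=True,
                 vendor_supports_virtualization=False),
    ]
    for w in examples:
        print(w.name, place(w))
```

The point of writing the rule down, even this crudely, is that everybody on the team gives the same answer when a business unit asks where a new application will live and how it will be backed up.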
Fast forward a year, and here we are with outstanding results.
There are many advantages seen in our IT landscape, but above all the business recognizes that the time to deliver a service from IT has decreased, and IT can now actually suggest and support an SLA towards the several business units.
Key factors for success
- Selection of the right IT vendor/partner rather than just focusing on a purely technical point of view
- Don’t get impressed by buzzwords used by vendors. For instance, if vendor A says “Our storage uses a 64 bit OS”, ask yourself and the vendor what advantage such a feature will bring to your environment. In other words, ask smart questions, and do some research prior to any talk with any vendor
- Plan, execute, and invest in automating tasks, with proper error handling
- Invest proper time/effort in the art of end-user expectation management
- Don’t get stuck with procedures just because they have existed for a long time. The fact that they have existed for a long time doesn’t mean they are still suitable for present/future needs
- Attend industry-recognized seminars, like those given by Greg Schulz – @storageio (http://www.storageio.com/)
- Attend industry-recognized events, like VMworld (http://www.vmworld.com)
- Invest time in reading blogs and whitepapers, attending webcasts, and following social media (mainly Twitter)
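On the point about automating tasks with proper error handling, the following is a minimal Python sketch of what that can mean in practice: every run is logged, failures are retried a limited number of times, and the caller gets a clear success/failure answer. The names `run_with_retries` and `make_flaky_task` are made up for this example, and the flaky task is only a stand-in for a real job such as a backup or cleanup script.

```python
import logging
import time

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("automation")


def run_with_retries(task, attempts=3, delay_seconds=0.0):
    """Run task(); retry on failure, log every outcome, return True on success."""
    for attempt in range(1, attempts + 1):
        try:
            task()
            log.info("%s succeeded on attempt %d", task.__name__, attempt)
            return True
        except Exception:
            # log.exception records the full traceback for later troubleshooting
            log.exception("%s failed on attempt %d of %d",
                          task.__name__, attempt, attempts)
            if attempt < attempts:
                time.sleep(delay_seconds)
    return False


def make_flaky_task(failures):
    """Build a hypothetical task that fails `failures` times, then succeeds."""
    remaining = {"count": failures}

    def flaky_task():
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise RuntimeError("temporary failure")

    return flaky_task


if __name__ == "__main__":
    ok = run_with_retries(make_flaky_task(1))
    print("task completed" if ok else "task needs human attention")
```

The return value is what matters: a scheduled job that fails silently is worse than no automation at all, so the caller should always be able to alert somebody when `False` comes back.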
To wrap up, I need to say thank you to some key people I met over Twitter who have been helping me in many ways for the past 18 months.
Here’s the list (in no particular order), based on categories:
@RickVanover (good technical discussion at VMworld 2011 Copenhagen! Looking forward to VMworld 2012 in Barcelona 🙂 )
@gostev (If you’re into Veeam… this guy r0cks!)
@gminks (You’re probably the best acquisition made by Dell, regarding how social media should be done within a business!)
Virtualization – VMware