LCG-0 Installation instructions



Document identifier: LCG-04-TYP-0000
Date: 11 May 2004
Author: Louis Poncet Louis.Poncet@cern.ch
Abstract: This guide describes the installation of LCG-0.


Common Prerequisites

It is expected that all machines where the LCG Grid software will be installed already have the Linux operating system running. We tested the installation on vanilla RedHat 7.3 ('server' configuration) and on CERN RH 7.3.1. If you are installing the LCG Grid software on a machine which already has some parts of the EDG and/or Globus software installed, it is recommended to remove them completely before starting the LCG-0 installation.

Super user ('root') privileges are required to install the software.

All LCG-0 software has been compiled with gcc-2.95-2, so we also distribute this compiler and its libraries. As a general rule it is mandatory to use all packages we supply in the LCG-0 distribution (some of them may differ from similar packages in a vanilla RH distribution).

A generic user (example: lcg) should exist for the execution of Grid jobs. This user identity and the corresponding home directory must be shared between the machines which will serve as CE, SE and WNs (uid and gid must be the same on all the machines).
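
For example (uid and gid 1000 are taken from the examples later in this guide; choose values appropriate for your site), run identically on the CE, SE and every WN:

      groupadd -g 1000 lcg
      useradd -u 1000 -g lcg lcg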

A batch system must be installed, so that jobs may be submitted from the CE machine and run on any WN machine. Currently OpenPBS is the supported batch system. Details of how to set up an OpenPBS system are included as an example in the Appendix.

The batch system must be set up with a queue to which the generic grid user is allowed to submit jobs. This is the queue to which all grid jobs will be submitted.
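
With OpenPBS, for instance, the restriction can be applied to an existing queue via qmgr; the sketch below is illustrative only and assumes the queue name 'workq' and grid user 'lcg' used elsewhere in this guide:

      # restrict queue 'workq' so that only the generic grid user may submit
      /usr/pbs/bin/qmgr -c "set queue workq acl_user_enable = true"
      /usr/pbs/bin/qmgr -c "set queue workq acl_users = lcg"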

If you are going to have a shared NFS area for your SE (example: /flatfiles/lcg) you need to mount this area on all your machines (CE, SE, WNs). This will allow you to use the 'file' data access protocol to access files replicated on your SE.
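
As an illustration, a minimal NFS sketch (the SE host name, export options and host pattern below are assumptions; adjust them to your site policy):

      # on the SE: export the storage area (hypothetical /etc/exports entry)
      echo "/flatfiles/lcg *.yourDomain(rw,sync)" >> /etc/exports
      exportfs -ra
      # on the CE and each WN: mount the exported area
      mkdir -p /flatfiles/lcg
      mount -t nfs -o bg,intr,hard yourSE.yourDomain:/flatfiles/lcg /flatfiles/lcg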

The LCG-0 software is installed in all or some of the following directories (depending on what is being installed):

	/opt/edg
	/opt/edt
	/opt/vdt
	/opt/globus
	/opt/globus-24
	/usr/local/gcc-alt-2.95.2
	/etc
	/etc/grid-security
	/etc/rc.d/init.d/
Sufficient free space (approximately 100 Mbytes) should be available under /opt. The installation also modifies the file /etc/services.

It is not yet possible to install different kinds of services (CE, SE, UI) on the same machine.

Download installation scripts

You can download the installation scripts from the Grid Deployment Group web page, http://cern.ch/grid-deployment/, by following the LCG release link.

Install UI (User Interface)

A UI has to be installed at every site, as it provides access to the grid infrastructure; at least one UI per site is required for LCG-0 testing purposes. User accounts will be created on this machine.

1) Download the installation script install_UI.sh from the Grid Deployment web site:

wget http://grid-deployment.web.cern.ch/grid-deployment/bindir/LCG-0/install_UI.sh
2a) Check the allowed arguments:

sh install_UI.sh --help

2b) Run from bash (supply any necessary arguments):

sh install_UI.sh arguments > /root/UI_installation.log 2>&1

(the default download directory will be /root/rpm_lcg and will not be removed automatically)

Install CE (Computing Element)

The installation of the CE is required if the site wants to share local computing resources across the grid. One CE is enough for LCG-0 testing purposes.

The CE requires a machine certificate before you start the installation. If you do not have one already, contact your national certification authority to find out how to obtain one. During the installation the certificate public and private keys will be automatically copied to

      /etc/grid-security/hostcert.pem
      /etc/grid-security/hostkey.pem

the latter with permissions set to 0400.
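
After the installation you can verify the result, for example (assumes the openssl command is available):

      # inspect the installed host certificate and check the key permissions
      openssl x509 -in /etc/grid-security/hostcert.pem -noout -subject -dates
      ls -l /etc/grid-security/hostkey.pem    # should show -r-------- (0400)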

We assume the CE machine has already been configured as the master node of a batch system.

The following extra requirement applies to the CE.

A user (example: edginfo) must be set up on the CE machine. The home directory for this user need not be shared with other machines. This user must be different from the one under which Grid jobs will run.
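
For example (a sketch only; any local account creation method will do):

      # create the information-system user locally on the CE
      useradd -m edginfo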

Installation is performed by a perl script; follow the LCG-0 installation links from http://cern.ch/grid-deployment/ to get the install script for the CE.

1) Download it from the page and install it in some convenient place, e.g. /root/:

wget http://grid-deployment.web.cern.ch/grid-deployment/bindir/LCG-0/install_CE.pl

The installation process modifies various directories and therefore must be run as root. (See below for details of the installation process.)

2) Ensure the CE install script is executable:

chmod a+x install_CE.pl

You can invoke ./install_CE.pl without arguments to see the usage:

Usage: install_CE.pl
 <hostname.domain>      The fully qualified name of this machine
 <lcguname>             The username of the grid user
 <hostcert.pem>         Location of a copy of the host certificate
 <hostkey.pem>          Location of a copy of the host certificate key
 <closese.domain>       The fully qualified name of the close storage element
 <infouname>            The username of the info user
 <Batch system>         Type of batch system: currently only type PBS
                        is supported
 <Batch system path>    Path under which to find the batch system commands
 <Batch queue>          The batch queue name that grid jobs should be
                        submitted to

The install_CE.pl script needs to be invoked with 9 parameters. Here we describe each of them in a little more detail than is reported in the command usage.

hostname.domain : The fully qualified host name of the machine. (deployment onto multihomed machines is not currently supported)

lcguname : The user name under which jobs received over the Grid will be run. All jobs will use this single id.

hostcert.pem : Location of a file containing the host certificate which will be used to identify the CE. The host certificate should be in 'pem' format.

hostkey.pem : Similar to hostcert.pem but this file should contain the private key corresponding to the host certificate.

closese.domain : The fully qualified host name of the machine which will be treated as the 'close' SE for this CE.

infouname : A username required by the Grid information system. Must be different from <lcguname>.

Batch system : Defines the type of batch system to be used. Currently PBS is supported.

Batch system path : The path under which the batch system commands can be found.

Batch queue : This should be the name of the batch queue to which all jobs received over the Grid will be submitted.

Before starting the install you should note that install_CE.pl will create a temporary directory 'rpm_lcg' inside the directory from which it is started. The temporary directory will hold all the LCG-0 components needed for the CE, which amount to about 50 Mbytes. You should therefore choose a location with sufficient space.

3) Run the installation script. The process usually takes less than 2 minutes, but this depends on the time taken to retrieve the LCG components from the distribution site.

Included below is an example of the installation of a CE. The machine being installed as the CE was 'lxshare0240.cern.ch' and the close SE was 'lxshare0241.cern.ch'.

cd /root
./install_CE.pl lxshare0240.cern.ch lcg hostcert.pem \
  hostkey.pem lxshare0241.cern.ch edginfo PBS /usr/pbs/bin workq

++ Starting the installation of LCG Compute Element ++

Fetching list of LCG components from distribution site... Done
Fetching globus_VDT_CE.tgz... Now installing
Fetching GNU.LANG_gcc-alt-2.95.2-6.i386.rpm... Now installing
Fetching BrokerInfo-gcc32-3.2-0.i386.rpm... Now installing
Fetching ReplicaCatalogue-gcc32-3.2-3.i386.rpm... Now installing
Fetching edg-replica-manager-gcc32-2.0-6.i386.rpm... Now installing
Fetching workload-profile-1.2.19-1.i386.rpm... Now installing
Fetching locallogger-profile-1.2.21-1.i386.rpm... Now installing
Fetching locallogger-1.2.21-1.i386.rpm... Now installing
Fetching globus_gatekeeper-edgconfig-0.17-nodep.1.noarch.rpm. Now installing
Fetching globus_gsi_wuftpd-edgconfig-0.17-nodep2.noarch.rpm. Now installing
Fetching globus_profile-edgconfig-0.17-nodep.noarch.rpm... Now installing
Fetching edg-user-env-0.3-1.noarch.rpm... Now installing
Fetching edg-profile-0.3-1.noarch.rpm... Now installing
Fetching perl-Convert-ASN1-0.16-7.i386.rpm... Now installing
Fetching perl-Net_SSLeay-1.21-7.i386.rpm... Now installing
Fetching perl-IO-Socket-SSL-0.92-7.i386.rpm... Now installing
Fetching perl-perl-ldap-0.26-7.i386.rpm... Now installing
Fetching edg-mkgridmap-1.0.9-2.i386.rpm... Now installing
Fetching edg-utils-system-1.3.2-1.noarch.rpm... Now installing
Executing globus postinstall scripts... Done
Starting globus-gatekeeper:                                [  OK  ]
Starting globus-gsi_wuftpd:                                [  OK  ]
Starting up Openldap 2.0 SLAPD server for the GRIS
Starting LocalLogger: interlogger and dglogd
LCG CE install completed
Leaving temporary directory /root/rpm_lcg for reference
--

(the temporary download directory, /root/rpm_lcg in this case, will not be removed automatically but may be removed by hand after installation, if desired)

After installation the required services should already be running; there is no need to restart the machine.

Ask the LCG-0 administrators on http://cern.ch/grid-deployment/ to include your CE in the information index.

Appendix - Installation of a PBS master node (e.g. on the CE machine)

Install and configure a PBS master node (extracted from the EDG guide; please refer to the PBS documentation for detailed information).

Download openpbs-2.3pl2-1.i386.rpm from our repository and install it:

      rpm -ivh --nodeps openpbs-2.3pl2-1.i386.rpm

a) Set the PBS server name:

     echo "yourCE.yourDomain" > /usr/spool/PBS/server_name
     where ?yourCE.yourDomain? is your CE.
b) Add the PBS ports to /etc/services as follows:
      # PBS
      pbs 15001/tcp
      pbs_mom 15002/tcp
      pbs_remom 15003/tcp
      pbs_remom 15003/udp
      pbs_sched 15004/tcp
c) /usr/pbs/sbin/pbs_server -t create

d) Create the WN list /usr/spool/PBS/server_priv/nodes. The format is:

someWN.yourDomain np=2 lcgqueue

where 'someWN.yourDomain' is one of your WNs, 'np' sets the number of concurrent jobs which can be run on that WN, and 'lcgqueue' is an arbitrary name which has been used to configure the server.

e) /sbin/chkconfig pbs on
/etc/rc.d/init.d/pbs stop
/etc/rc.d/init.d/pbs start

f) /usr/pbs/bin/qmgr < /usr/spool/PBS/pbs_server.conf
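
The contents of pbs_server.conf are site-specific; a minimal hypothetical example, matching the 'workq' queue used elsewhere in this guide, might look like:

      # hypothetical minimal /usr/spool/PBS/pbs_server.conf (qmgr input)
      create queue workq
      set queue workq queue_type = Execution
      set queue workq enabled = True
      set queue workq started = True
      set server default_queue = workq
      set server scheduling = True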

Please refer to the PBS documentation for accurate configuration of the batch system.

Install SE (Storage Element)

The installation of the SE is required if the site wants to share local storage resources across the grid. One SE is enough for LCG-0 testing purposes.

The SE requires a machine certificate before you start the installation. Contact your national certification authority to understand how to obtain a machine certificate if you do not have one already.

1) Download the installation script install_SE.sh from the Grid Deployment web site:

wget http://grid-deployment.web.cern.ch/grid-deployment/bindir/LCG-0/install_SE.sh

2) Consult your Certification Authority to obtain the certificate for the machine. Put the machine certificate (public and private key) in

      /etc/grid-security/hostcert.pem
      /etc/grid-security/hostkey.pem

the latter with permissions set to 0400.
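
For example, assuming your CA delivered the certificate and key as hostcert.pem and hostkey.pem in the current directory:

      cp hostcert.pem /etc/grid-security/hostcert.pem
      cp hostkey.pem /etc/grid-security/hostkey.pem
      chmod 644 /etc/grid-security/hostcert.pem
      chmod 400 /etc/grid-security/hostkey.pem    # key readable only by root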

3a) Check the allowed arguments:

sh install_SE.sh --help

3b) Run from bash (supply any necessary arguments):

       sh install_SE.sh arguments > /root/SE_installation.log 2>&1

(the default download directory will be /root/rpm_lcg and will not be removed automatically)
4) Check /opt/edt/mds/infoprovider/se/se.config:

If needed, modify the protocols that you use on the SE (examples: file,gridftp,...) and the data directory (example: /flatfiles/lcg).

Remember that the storage area (/flatfiles/lcg) needs to be exported to the CE and WN for the "file" access protocol.

5) Create an account needed for gridftp (example: lcg). N.B. This account MUST be the same on the CE, WN and SE, with the same uid and gid ("groupadd -g 1000 lcg", "useradd -u 1000 -g lcg lcg").

6) In the storage area you should have a dedicated area for each VO; for this release only the LCG VO is supported. The owner of this area (/flatfiles/lcg) must be the same user as created in item (5) ("lcg").
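
For example, a sketch using the names from the items above:

      # create the LCG VO area and give it to the grid user
      mkdir -p /flatfiles/lcg
      chown lcg:lcg /flatfiles/lcg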

7) Ask the LCG-0 administrators on http://cern.ch/grid-deployment/ to include your SE in the information index.

Install WN (Worker Nodes)

Currently the WNs must reside on a public network. As many WNs can be installed as necessary. For the WN we assume a batch system 'slave node' has already been installed and configured. Example instructions for an OpenPBS configuration are given in the Appendix.

1) Download the installation script install_WN.sh from the Grid Deployment web site:

wget http://grid-deployment.web.cern.ch/grid-deployment/bindir/LCG-0/install_WN.sh

2a) Check the allowed arguments:

sh install_WN.sh --help

2b) Run from bash (supply any necessary arguments):

      sh install_WN.sh  arguments  > /root/WN_installation.log 2>&1

(the default download directory will be /root/rpm_lcg and will not be removed automatically)
3) Remember to create on the WN the same accounts used by the batch system on your CE (example: lcg). N.B. This account MUST be the same on the CE, WN and SE, with the same uid and gid ("groupadd -g 1000 lcg", "useradd -u 1000 -g lcg lcg").

4) Home directories must be shared between CE and WNs. You could use a disk server or you could export /home from the CE. (example: mount -t nfs -o bg,intr,hard your_CE_machine:/home /home)
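
To make such a mount persistent across reboots you could, for instance, add a hypothetical /etc/fstab entry of the form:

      your_CE_machine:/home  /home  nfs  bg,intr,hard  0 0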

5) If you have a SE and you would like to use the "file" data access protocol, the SE storage area (example: /flatfiles/lcg) must be mounted.

6) If you have multiple WN machines to install, you may prefer using a 'tar ball' for the other ones, instead of letting the installation script download the necessary files from the LCG web server at CERN every time. In this case, after the successful installation of the first WN machine, create a tar ball from the files contained in the temporary installation directory and save it in a convenient location. For example:

cd /root/rpm_lcg
tar cvf /home/admin/WN.tar *

Then on the other WN machines you can run the installation as follows (assuming /home/admin/WN.tar is accessible there):

sh install_WN.sh --tarball /home/admin/WN.tar > /root/WN_installation.log 2>&1

Appendix - Installation of a PBS worker node

Install and configure PBS (please refer to the PBS documentation for detailed information).

Download openpbs-exechost-2.3pl2-1.i386.rpm from our repository and install it:

      rpm -ivh --nodeps openpbs-exechost-2.3pl2-1.i386.rpm
a. Edit /usr/spool/PBS/server_name to contain your CE machine name (yourCE.yourDomain).
   Edit /usr/spool/PBS/mom_priv/config to insert:

        $clienthost localhost
        $clienthost your_CE_machine
        $restricted your_CE_machine
        $logevent 255
        $ideal_load 1.6
        $max_load 2.1
        $usecp your_CE_machine:/home /home

The $usecp line applies where you are sharing home directories between the CE and WNs. The load values in this example assume a machine with 2 CPUs.

b. Remember to add the hostname of your WN to /usr/spool/PBS/server_priv/nodes on the CE and restart the pbs daemon there.

c. On the WNs, add the PBS ports to /etc/services as follows:

      # PBS
      pbs 15001/tcp
      pbs_mom 15002/tcp
      pbs_remom 15003/tcp
      pbs_remom 15003/udp
      pbs_sched 15004/tcp
d. /sbin/chkconfig pbs on
   /etc/rc.d/init.d/pbs start

Network requirements

It is assumed that all machines (CE, SE, UI and WNs) will have unrestricted outbound TCP connectivity. Some incoming TCP connect requests also need to be allowed. As a guide the CE, SE and UI should be able to accept incoming TCP connections to the following ports:

2119 Globus Gatekeeper
2135 MDS info port
2169 FTree info port
2170 Information Index
2171 FTree info port
2811 GSI ftp server
6375 SE services
6376 SE services
7846 Logging & Bookkeeping
8881 Job Sub. Service (client)
9991 Job Sub. Service (server)
x -> y Globus Job Manager*
15830 Locallogger
* example 14000 -> 15000 : in addition to these fixed port numbers, an open range (x -> y) for inbound connections is also needed. These ports are required for some of the Globus services (at least for GSI-FTP). If you are using a firewall at your site, the range x -> y should match your firewall's open range. You can set this range on the CE and/or SE machines by editing the file /proc/sys/net/ipv4/ip_local_port_range.
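
For example, to match the illustrative 14000 -> 15000 range above:

      # set the local port range (takes effect immediately; not persistent
      # across reboots)
      echo "14000 15000" > /proc/sys/net/ipv4/ip_local_port_range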

Contact:

In case of problems, contact the LCG-0 administrators via http://cern.ch/grid-deployment/.
