Manual Installation of Tank & Spark
Manual Configuration of Tank & Spark
Testing the installation
Shutting down the Tank service
More documentation about Experiment Software Installation on LCG-2
The mechanism is distributed as a series of RPMs that have to be installed on the CE, on each WN and on the SE.
The "official" version has the following well-known limitations:
3. lcg-spark-gcc32dbg-2.0-3.i386.rpm
to be installed on each WN (it installs the Spark client).
4. lcg-tankspark-conf-2.0-4.i386.rpm
to be installed on the CE, the SE and each WN (it installs the configuration scripts used at the next step, Manual Configuration of Tank & Spark).
3. lcg-spark-gcc32dbg-2.1-1_sl3.i386.rpm
to be installed on each WN (it installs the Spark client).
4. lcg-tankspark-conf-2.1-1_sl3.i386.rpm
to be installed on the CE, the SE and each WN (it installs the configuration scripts used at the next step, Manual Configuration of Tank & Spark).
Tank needs:
CGSI_gSOAP_2.3 >= 1.1.2
MySQL-client >= 4.0.13
MySQL-server >= 4.0.13
MySQL-shared >= 4.0.13
mysql++_1.7.9_mysql.4.0.13__LCG_rh73_gcc32
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(CompressedFileNames) <= 3.0.4-1
Spark needs:
CGSI_gSOAP_2.3 >= 1.1.2
rsync >= 2.5.7
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(CompressedFileNames) <= 3.0.4-1
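Before installing, you may want to verify these version minima by hand. The helper below is not part of the Tank & Spark RPMs; it is a minimal sketch for comparing an installed version string against a required minimum, assuming GNU `sort -V` is available:

```shell
# Hypothetical helper (not shipped with Tank & Spark): succeeds when the
# installed version is at least the required one. It sorts the two version
# strings with GNU "sort -V" and checks that the required one comes first.
version_at_least() {
    installed="$1"; required="$2"
    [ "$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n1)" = "$required" ]
}

# Example: is an installed MySQL-client 4.0.18 enough for ">= 4.0.13"?
if version_at_least "4.0.18" "4.0.13"; then
    echo "MySQL-client version OK"
fi
```

On a real node you would feed it the output of `rpm -q --qf '%{VERSION}' <package>` for each dependency listed above.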
The three different components to be configured (Tank, Spark and Rsync) need just one configuration file:
se=pcitgdeis569.crn.ch
edgVarLoc=/opt/edg/var
rsyncport=873
rsyncuser=tango
ce=lxb0706.cern.ch
vo=cms alice atlas lhcb dteam
rsyncrep=/opt/repository
dbuser=tank
dbpasswd=lcg_test
tankconf=/opt/lcg/etc/tank.conf
rsyncconf=/etc/rsyncd.conf
sparkconf=/opt/lcg/etc/spark.conf
flagdir=/opt/flags
expsoftdir=.
explocdir=/opt/ext_soft
sitename=lxb0706
ldapport=2170
afsprincipal=
lifetime=25
"se" is the Storage Element where rsync is running. This is the machine on which the experiment software is centrally stored for each VO.
"rsyncport" is the port number used by the rsync daemon (default: 873).
"rsyncuser" is the user the client uses to authenticate against the rsync server.
"ce" is the Computing Element on which Tank runs.
"vo" is the list of VOs supported by the site, separated by blank spaces.
"rsyncrep" is the root directory (for all the experiments on the SE) of the central software repository.
"dbuser" is the user used by Tank to connect to the MySQL DB (wn_list).
"dbpasswd" is the password used by Tank to connect to the MySQL DB (wn_list), and by every WN to connect to the rsync server as the "rsyncuser" user.
"tankconf" is the path of the configuration file (created automatically) used by Tank.
"rsyncconf" is the path of the configuration file used by the rsync daemon.
"sparkconf" is the path of the configuration file used by Spark (please leave these last three fields at their defaults).
"flagdir" is the path where Spark saves the tag files used to track which versions of the software are installed locally. This is the path for all the VOs; each VO will have its own subdirectory with the right ownership. If the node shares the experiment software area, you have to make this flag area visible to all nodes. This means that if, for instance, the experiment software is under /opt/exp_soft/some_vo, you have to create under /opt/exp_soft a subdirectory named as you like (we suggest "flags"), and this is the value for the "flagdir" attribute. Each VO will have its own subdirectory under "flagdir".
Otherwise, "expsoftdir" must be set to the root experiment software directory (common to all the experiments), for instance /opt/ext_software.
"explocdir" is the local experiment software root directory common to all the VOs. VO-specific subdirectories will be created automatically. This directory *MUST* be the same as "expsoftdir" in case of a shared filesystem on the WNs.
"sitename" is the name of the CE.
"ldapport" is the port to be used to query the LDAP server on "sitename".
"edgVarLoc" is the value of the EDG_VAR_LOC variable on the CE.
"afsprincipal" is the name of the server running the gssklogd daemon used to convert GSI credentials into AFS Kerberos tokens; leave it blank if no AFS shared file system is in use (almost always the case).
"lifetime" is the lifetime of the generated AFS tokens.
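After filling in the file, a quick sanity check that no key was forgotten can save a debugging round later. The function below is a hypothetical helper, not shipped with Tank & Spark; it only checks that each key described above is present in the key=value file:

```shell
# Hypothetical sanity check (not part of Tank & Spark): verify that a
# lcgtankspark.conf-style key=value file defines every expected key.
# Prints each missing key and returns non-zero if any is absent.
check_tankspark_conf() {
    conf="$1"
    missing=0
    for key in se edgVarLoc rsyncport rsyncuser ce vo rsyncrep dbuser \
               dbpasswd tankconf rsyncconf sparkconf flagdir expsoftdir \
               explocdir sitename ldapport afsprincipal lifetime; do
        grep -q "^${key}=" "$conf" || { echo "missing: $key"; missing=1; }
    done
    return $missing
}
```

For example: `check_tankspark_conf $LCG_LOCATION/etc/tankspark/lcgtankspark.conf` (note that `afsprincipal=` with an empty value still counts as present).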
After filling in this file on each machine, you are ready to start.
At this initial moment, however, you have to take into account whether the farm on which you are going to start Tank & Spark already has the Experiment Software installed or not.
Run the command
bash> $LCG_LOCATION/etc/tankspark/lcfg-tank.sh $LCG_LOCATION/etc/tankspark/lcgtankspark.conf
The command will perform the following actions:
Tank is now almost completely installed!
Run the command
bash> mysql -u root -p < $LCG_LOCATION/etc/tankspark/command.sql
Run the command
bash> $LCG_LOCATION/etc/tankspark/lcfg-spark.sh $LCG_LOCATION/etc/tankspark/lcgtankspark.conf
The command will perform the following actions:
Run the command
bash> $LCG_LOCATION/etc/tankspark/lcfg-rsync.sh $LCG_LOCATION/etc/tankspark/lcgtankspark.conf
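For reference, the module that lcfg-rsync.sh writes to /etc/rsyncd.conf should look roughly like the sketch below (one module per supported VO). This is an assumption for orientation, not the actual generated file; the option names follow standard rsyncd.conf syntax, and the secrets-file path in particular is a guess:

```
# global section
port = 873

[dteam]
    # per-VO module; path is <rsyncrep>/<vo>
    path = /opt/repository/dteam
    auth users = tango
    # assumed secrets-file location, not taken from this page
    secrets file = /etc/rsyncd.secrets
    read only = yes
```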
There are basically a few tests to see whether everything has been correctly installed. We propose here some basic functional tests and possible solutions in case of problems.
After the installation of both the CE and the WNs, within at most about 30 minutes you should see all the nodes of the site registered in the MySQL DB. This happens automatically if the installation is OK.
Enter password: <dbpasswd>
| lxshare0203 | 128.142.65.180 | ON | 20050414102004 | 20050314151123 | 00000000000000 | NORMAL |
| lxb0708.cern.ch | 128.142.65.23 | OFF | 20050401121006 | 20050314151503 | 20050401124124 | NORMAL |
2 rows in set (0.00 sec)
If you cannot see anything in this table, make sure that the daemons lcg-tank and
lcg-utank are running on the CE (/opt/lcg/sbin/tank status).
If the daemon is running, check whether the table "monitors" in the wn_list DB (MySQL)
is correctly filled with all the VOs your site supports.
Check whether the cronjobs are correctly installed on each WN for each ESM user.
Check whether these cronjobs point to the right CE machine.
Check, on the CE and on the WNs respectively, the existence of /opt/lcg/etc/tank.conf and /opt/lcg/etc/spark.conf.
Check whether the fields in these files are correctly set for your site.
Check whether there are old flags (from a previous installation) named <hostname> on
each WN under the corresponding "flagdir" directory.
In this case the tool will not write to the DB even if the table "hosts" is
empty.
Check whether you can run manually (from a WN, as dteamsgm) the command line that the cronjobs invoke every 10 minutes, for instance:
/opt/lcg/sbin/lcg-asis-client.sh dteam lxb0706.cern.ch
If you succeed in running the command manually and you see output like this:
/opt/flags/dteam/
/opt/exp_soft/dteam/
/opt/flags/dteam/cmsfarmbl01.lnl.infn.it no present
host not registered: upgrading functionality called
Using the configuration file: /opt/lcg/etc/spark.conf***************
host is http://t2-ce-02.lnl.infn.it:18084
action is :upgradehost
the vo used is :dteam
###############################################
##### Welcome to the spark client program #####
###############################################
#### action is : upgradehost###########
We are going to contact the server : http://t2-ce-02.lnl.infn.it:18084
No updates found for this node.
then the problem is in the syntax of the cronjob itself. This can be the case for tcsh/csh accounts.
If the output shows that the server cannot be contacted (gSOAP error), then there is a problem in the communication and it may need further investigation (are port numbers 18084 and 18085 open?).
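For tcsh/csh accounts, one possible workaround (an assumption, not an official fix) is to have cron invoke bash explicitly, so the command line is always parsed by a Bourne-style shell; adapt the VO and CE names to your site:

```
0-59/10 * * * * /bin/bash -c '/opt/lcg/sbin/lcg-asis-client.sh dteam lxb0706.cern.ch'
```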
This test relies on the full machinery of Experiment Software Installation. Make sure that lcg-ManageSoftware-2.0.1 is installed at your site.
In case of:
Possible workarounds in the light of the results of TEST A
(auto-registration of hosts):
edguser ALL = NOPASSWD: /bin/chown
to the file /etc/sudoers if it exists; otherwise install sudo and then add this line.
rsync error: some files could not be transferred (code 23) at main.c(620)
Connection refused
rsync error: error in socket IO (code 10) at clientserver.c(83)
@ERROR: auth failed on module ...
@ERROR: Unknown module
There is no script at all that switches the service off on the Computing Element. Nevertheless, site administrators who do not want to keep the service running on their Computing Element can follow the following recipe:
- 19 2,8,14,20 * * * /opt/lcg/sbin/tank proxy > /dev/null 2>&1 >>tmp1.$$
- 0-59/5 * * * * /opt/lcg/sbin/tank watch_dog > /dev/null 2>&1 >>tmp1.$$
0-59/10 * * * * /opt/lcg/sbin/lcg-asis-client.sh <vo> <your_ce>
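The recipe above amounts to removing the Tank & Spark cron entries by hand. A possible sketch is shown below; these commands are not shipped with Tank & Spark, and the filter works on plain text so it can be tried offline:

```shell
# Hypothetical cleanup helper: filter Tank & Spark entries out of a crontab
# listing. It drops any line invoking /opt/lcg/sbin/tank or
# /opt/lcg/sbin/lcg-asis-client.sh and passes everything else through.
remove_tankspark_entries() {
    grep -v -e '/opt/lcg/sbin/tank ' -e '/opt/lcg/sbin/lcg-asis-client.sh'
}

# On the CE (as the user owning the cron entries), one would run:
#   crontab -l | remove_tankspark_entries | crontab -
```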
Original Requirements
FAQ
(in Italian)
Software Installation General Procedure
Tank & Spark in a Nutshell
More documentation about Tank & Spark
Recent results from tests performed within the INFN activity ECGI
Talk given in Melbourne (Dec. 2005)
Roberto Santinelli