The Publisher system in its current form allows you to make data on Freja/Accumulus available to users anywhere in the world.
The published data is a read-only copy of the original data. Published data cannot be changed, only deleted (automatically after a certain time, or manually by the person who published it). Published data is not updated when the original data changes.
You can publish data that is stored on any of the shared filesystems on Freja (e.g. /home, /nobackup/*, but not /scratch/local).
You always publish a directory tree with all its contents. If you need to publish a single file, create an empty directory and put the file in it, then publish the directory.
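For example, a single file could be published like this (results.nc, results_dir and the tmp_rossby area are illustrative names, not requirements):
mkdir results_dir
cp results.nc results_dir/
pcmd results_dir tmp_rossby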
The current system has a capacity of approximately 20 TiB of published data (shared between all users; there is no per-user quota).
The Publisher system is connected to the Internet and to NSC systems via a 1 Gbps network, so the maximum combined in/out transfer speed is approximately 100 MB/s (e.g. transferring a 1 TiB dataset takes roughly three hours under ideal conditions).
Publishing data: run pcmd <publPath> <publArea>, e.g. pcmd mytestdata tmp_rossby. Sample output:
[sm_mkola@analys1 ~]$ pcmd mytestdata tmp_rossby
Checking dataset......
Generating sha1sum......
data
f9910632ba63c554ee7ba95c4eb8f0618e4bd986
Checking dataset file sizes --> OK
Publication created with ID: tmp_rossby.74
Export url: http://exporter.nsc.liu.se/b7b00058ad424381909938b0492ffb28, rsync://exporter.nsc.liu.se/b7b00058ad424381909938b0492ffb28
[sm_mkola@analys1 ~]$
Deleting published data: run pcmd -r DATASET_ID, where DATASET_ID is the identifier listed by e.g. “pcmd -qv” or the “Publication ID” given when the data was published (e.g. “tmp_rossby.74”). Some areas are configured to automatically delete datasets after a certain time.
You can always see the current list of publication areas using the command pcmd -l. The list below is not guaranteed to be up to date.
Name | Unix groups allowed to publish | Protocols | URL type | Datasets automatically deleted after (days) | Limits
---|---|---|---|---|---
tmp_foua | sm_foua | http,rsync | secret url | 30 | max file size 1 TB
tmp_foul | sm_foul | http,rsync | secret url | 30 | max file size 1 TB
tmp_fouo | sm_fouo | http,rsync | secret url | 30 | max file size 1 TB
tmp_foup | sm_foup | http,rsync | secret url | 30 | max file size 1 TB
tmp_bpom | sm_bpom | http,rsync | secret url | 30 | max file size 1 TB
tmp_ml | sm_ml | http,rsync | secret url | 30 | max file size 1 TB
tmp_mo | sm_mo | http,rsync | secret url | 30 | max file size 1 TB
tmp_misu | misu | http,rsync | secret url | 30 | max file size 1 TB
tmp_rossby | rossby | http,rsync | secret url | 30 | max file size 1 TB
tmp_kthmech | kthmech | http,rsync | secret url | 30 | max file size 1 TB
tmp_miuu | miuu | http,rsync | secret url | 30 | max file size 1 TB
rossby_sc | roadmin | http,rsync | secret url | never | max file size 1 TB
rossby_pr | roadmin | http,rsync | user-selectable name | 7 | max file size 1 TB
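To display the full definition of a single area instead of the summary table above, you can use the -a option listed in the pcmd help text below. The exact invocation and output are not shown here, but it presumably looks something like:
pcmd -a tmp_rossby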
Getting help:
[x_makro@analys1 ~]$ pcmd -h
usage: pcmd publPath publArea or
pcmd [options]
With no options, pcmd will publish publication at 'publPath' to 'publArea'.
With options, publPath and publArea should be omitted. Please see the
Publisher User Guide for more information:
http://www.nsc.liu.se/systems/publisher/
options:
-h, --help show this help message and exit
-n, --version Shows version
-v, --verbose Shows verbose info
-s, --status Shows the status of the supplied publicationId
-j, --listjobs Lists the ongoing jobs
-p, --poll Shows the status of published files
-l, --list Lists all publication areas
-a, --area Displays the area definition of the supplied areaname
-q, --query Query datasets
-u USER, --user=USER Specifies the user to query (used together with -q)
-d DATE, --date=DATE Specifies the date to query (YYMMDD or YYMMDD-YYMMDD)
(used together with -q)
-r, --remove Removes a publication
[x_makro@analys1 ~]$
Exporting a directory:
[x_makro@analys1 ~]$ pcmd -l
Available publication areas
------------------------------------------------------------------------------
Name Prot Auth Days Url
tmp_misu http, rsync secret url 30 http://exporter.nsc.liu.se
tmp_nsc http, rsync secret url 30 http://exporter.nsc.liu.se
tmp_mo http, rsync secret url 30 http://exporter.nsc.liu.se
publtest-1... http, rsync secret url 1 http://exporter.nsc.liu.se
tmp_rossby http, rsync secret url 30 http://exporter.nsc.liu.se
publtest http, rsync None 0 http://exporter.nsc.liu.se
tmp_foup http, rsync secret url 30 http://exporter.nsc.liu.se
system http, rsync None 1 http://exporter.nsc.liu.se
tmp_foua http, rsync secret url 30 http://exporter.nsc.liu.se
tmp_foul http, rsync secret url 30 http://exporter.nsc.liu.se
tmp_fouo http, rsync secret url 30 http://exporter.nsc.liu.se
tmp_ml http, rsync secret url 30 http://exporter.nsc.liu.se
bigtest http, rsync None 0 http://exporter.nsc.liu.se
tmp_miuu http, rsync secret url 30 http://exporter.nsc.liu.se
tmp_bpom http, rsync secret url 30 http://exporter.nsc.liu.se
rossby_sc http, rsync secret url 0 http://exporter.nsc.liu.se
rossby_pr http, rsync None 7 http://exporter.nsc.liu.se
tmp_kthmech http, rsync secret url 30 http://exporter.nsc.liu.se
nsctest-1d... http, rsync secret url 1 http://exporter.nsc.liu.se
[x_makro@analys1 ~]$ pcmd mydata tmp_misu
Checking dataset......
Generating sha1sum......
file1
82251aabadb525ee709a4a04a30c0e07448ea314
bigfile1
b1260761b0c32c95edd0f0f8d95322ceae96d0e7
dir1/file2
9927b517a5710aa8bf6a9fbfba76e6722114f2f5
dir1/file3
06b82608e19bfb693afe56730db4103dc987b076
Checking dataset file sizes --> OK
Publication created with ID: tmp_misu.675
Export url: http://exporter.nsc.liu.se/f7b70d36e0504a2b93a9cf55ed10581e, rsync://exporter.nsc.liu.se/f7b70d36e0504a2b93a9cf55ed10581e
[x_makro@analys1 ~]$ pcmd -j
Active jobs
----------------------------------------------------------------------
tmp_misu.675 In transfer vagn:/home/x_makro/mydata
[x_makro@analys1 ~]$ pcmd -j
Active jobs
----------------------------------------------------------------------
tmp_misu.675 In transfer vagn:/home/x_makro/mydata
[x_makro@analys1 ~]$ pcmd -j
Active jobs
----------------------------------------------------------------------
tmp_misu.675 Performing checksum test vagn:/home/x_makro/mydata
[x_makro@analys1 ~]$ pcmd -j
Active jobs
----------------------------------------------------------------------
tmp_misu.675 Exporting vagn:/home/x_makro/mydata
[x_makro@analys1 ~]$ pcmd -j
Active jobs
----------------------------------------------------------------------
[x_makro@analys1 ~]$
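Instead of re-running pcmd -j by hand as above, you can let the standard watch utility poll it for you (the 30-second interval is arbitrary):
watch -n 30 pcmd -j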
Checking all your published data:
[x_makro@analys1 ~]$ pcmd -q
Available Publications
----------------------------
tmp_misu.675 ---------------------------------------------------------
http://exporter.nsc.liu.se/f7b70d36e0504a2b93a9cf55... Exported
[x_makro@analys1 ~]$
[x_makro@analys1 ~]$ pcmd -q -v
Available Publications
----------------------------
tmp_misu.675 ---------------------------------------------------------
status Exported
area tmp_misu
url http://exporter.nsc.liu.se/f7b70d36e0504a2b93a9cf55ed10581e, rsync://exporter.nsc.liu.se/f7b70d36e0504a2b93a9cf55ed10581e
prot http, rsync
auth secret url
source vagn:/home/x_makro/mydata
user x_makro
time Wed Mar 13 16:11:26 2013
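According to the help text, a query can also be narrowed to a specific user or date range using the -u and -d options (the user name and date range below are illustrative):
pcmd -q -v -u x_makro -d 130301-130313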
Download and verify exported data (from anywhere in the world) using rsync:
kronberg@ming ~/tmp $ rsync -av rsync://exporter.nsc.liu.se/f7b70d36e0504a2b93a9cf55ed10581e ./mydata_downloaded
receiving incremental file list
created directory ./mydata_downloaded
./
CHKSUM.SHA1
bigfile1
file1
dir1/
dir1/file2
dir1/file3
sent 156 bytes received 209741472 bytes 27965550.40 bytes/sec
total size is 209715492 speedup is 1.00
kronberg@ming ~/tmp $ (cd mydata_downloaded && sha1sum --check CHKSUM.SHA1)
dir1/file3: OK
bigfile1: OK
file1: OK
dir1/file2: OK
kronberg@ming ~/tmp $
Download a single file using HTTP:
kronberg@ming ~/tmp $ wget -q http://exporter.nsc.liu.se/f7b70d36e0504a2b93a9cf55ed10581e/bigfile1
kronberg@ming ~/tmp $ ls -l bigfile1
-rw-rw---- 1 kronberg kronberg 209715200 Mar 13 16:01 bigfile1
Recursively download and verify a dataset (from anywhere in the world) using HTTP:
kronberg@ming ~/tmp $ wget -q -e robots=off -r --no-host-directories --no-parent http://exporter.nsc.liu.se/f7b70d36e0504a2b93a9cf55ed10581e/
kronberg@ming ~/tmp $ (cd f7b70d36e0504a2b93a9cf55ed10581e && sha1sum --check CHKSUM.SHA1)
dir1/file3: OK
bigfile1: OK
file1: OK
dir1/file2: OK
kronberg@ming ~/tmp $
Deleting data:
[x_makro@analys1 ~]$ pcmd -q
Available Publications
----------------------------
tmp_misu.675 ---------------------------------------------------------
http://exporter.nsc.liu.se/f7b70d36e0504a2b93a9cf55... Exported
[x_makro@analys1 ~]$ pcmd -r tmp_misu.675
Dataset tmp_misu.675 put on queue for deletion
[x_makro@analys1 ~]$ pcmd -q
Available Publications
----------------------------
tmp_misu.675 ---------------------------------------------------------
http://exporter.nsc.liu.se/f7b70d36e0504a2b93a9cf55... Deleted
[x_makro@analys1 ~]$
A dataset may only contain files and directories. If the directory tree (dataset) that you try to publish contains any other types of data, such as symbolic links or sockets, pcmd will display an error message and exit.
This is a design choice, not a bug or technical limitation.
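You can check a directory tree for such entries before publishing with a standard find command (mydata is a placeholder for your dataset directory); anything it prints must be removed or replaced before pcmd will accept the dataset:
find mydata ! -type f ! -type d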
Published data can be deleted using “pcmd -r”. When a dataset is deleted, it is no longer accessible to users.
However, information about the published dataset is retained in the database and can be displayed using e.g. “pcmd -qv” (the dataset is then shown as “Deleted”).
The URL used by an active dataset (one that can currently be downloaded) cannot be reused for another (i.e. you cannot publish some data as http://server/area/my-latest-data and then replace the files with updated data next week without first deleting the first dataset).
Note: this behaviour is a bug or undocumented behaviour in Publisher; it will change in a future version.
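In other words, to replace a published dataset with an updated version, delete the old publication first and then publish the new data (the dataset ID and directory below are illustrative; the new publication will get a new ID and, in secret-url areas, a new URL):
pcmd -r tmp_misu.675
pcmd mydata tmp_misu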
When you publish a directory tree, the permissions of the files are copied along with the files. If you export files or directories that are only accessible by “user” or “group”, they will not be accessible after they have been exported.
Workaround: make sure that all files and directories are accessible to
“other” before publishing them, e.g. by running chmod -R o+rX <DIRECTORY>
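To check in advance, you can list everything under the dataset that “other” cannot read, and any directories that “other” cannot enter (mydata is a placeholder for your dataset directory):
find mydata ! -perm -o=r
find mydata -type d ! -perm -o=x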
Publisher is not designed to be a high-availability system. It can be considered approximately as reliable as an NSC cluster login node (e.g. Gimle).
In practice, this means:
Published, non-deleted datasets are backed up daily to tape for disaster recovery purposes.
The Publisher internal database that keeps track of all metadata is backed up to disk hourly and to tape daily.
If this level of availability is not enough for your needs, store your data elsewhere, or contact NSC to discuss how we can improve Publisher.
If you need help using Publisher, if something does not work as expected, or if you have any other questions, please send an email to the normal support address smhi-support@nsc.liu.se.