CSIG 2005 NOTES

MONDAY 18, 2005

Data Modeling
Infrastructure
- SDSC, NCSA, PSC, CTC
- NPACI
- TCS, DTF, ETF

Cyberinfrastructure
BIRN Portal
- Study of mice and human brains
- Problem: users rely on the information to do work while the portal is still being developed

CI Examples
GEON, LTER, NEON, CUAHSI...

GEONGRID (www.geongrid.org)
- On the fly data integration
- create integrated views
- views cannot be created for everything, must be able to allow users to make their own views.
ABSTRACT the data until the data becomes uniform, then build from there.
GEONGRID connects to the CHRONOS web service to get numeric ages.

- hosted and nonhosted data registration
- service registration

LTER: Sound as an Ecology Indicator and a Stressor
Stuart Gage, MSU
Environmental Acoustic Monitoring

Use sound to do research like species count, save a "Sound Scape" to the server

http://www.neoninc.org

Production and development competition
must catalogue web services

Introduction to data modeling
- Controlled vocabulary
- Database schema
relational,xml
- Conceptual schema
ER, UML
- Thesauri
synonyms, broader term, narrower terms
- Terms with structure (relationships) among them
- Taxonomies

- informal/semi-formal representations
- concept spaces, concept maps.
- RDF
- Formal ontologies, OWL
- formalization of specification
- What is an ontology? An ontology usually...
- specifies a theory (a set of models) by...
- defining and relating
- concepts representing features of a domain of interest

IGNEOUS ROCKS, TERRANES AND GEON
http://pitlab.geol.vt.edu/server/docs/Index.html
http://geon.geol.vt.edu/geon/index.html
http://geon.geol.vt.edu/geon/datamining/tools.html
Show this link to georoc.
ORACLE SQL CLIENT INSTALL NOTES
ROSS, you might be interested in TERASHAKE, HDF5 file structure
Oracle Instant Clients
NetCDF, OPeNDAP, CDM = Common Data Model
http://dbdesigner.sourceforge.net/
http://www.fabforce.net/dbdesigner4/


TUESDAY 19, 2005
List of presentations on geongrid
Web service information for Coldfusion
GEONGRID services
- ASCIIToMap
- XMLToMap
- ShapeToMap

UDDI, yellow pages of web services,
getting started but has no way to know which web service is really the one you want.
PHP web service quick start guide
Power point slides for today's workshop (ZIP file)
PHP web service install
install log for php
[root@otis tbowers]# pear install -f SOAP
Warning: SOAP is state 'beta' which is less stable than state 'stable'
downloading SOAP-0.9.1.tgz ...
Starting to download SOAP-0.9.1.tgz (69,454 bytes)
.................done: 69,454 bytes
install ok: SOAP 0.9.1
[root@otis tbowers]#
[root@otis tbowers]# pear install -f Net_URL
downloading Net_URL-1.0.14.tgz ...
Starting to download Net_URL-1.0.14.tgz (5,173 bytes)
.....done: 5,173 bytes
install ok: Net_URL 1.0.14
[root@otis tbowers]#
[root@otis pear]# pear install -f HTTP
downloading HTTP-1.3.6.tgz ...
Starting to download HTTP-1.3.6.tgz (4,805 bytes)
.....done: 4,805 bytes
install ok: HTTP 1.3.6
location of SOAP
/usr/local/share/pear/doc/SOAP
/usr/local/share/pear/SOAP
geongrid web service to connect to
geon02.sdsc.edu:8080/axis/index.html
Link to information on cfinvoke which is used by coldfusion to contact a web service
Running web service tools.
- Use the database to queue the processes that need to be run.
- implement the schedule program to run the process.
- schedule will know who wants to run the program and how long to keep temp files or the resulting data around.

Use web service to run programs from a cell phone
- register samples
- list samples in the area
- list average sample age and rock type

You can run Tomcat as a standalone program on port 8080
Ask about GEONGRID mapper XMLToMap web service

WEDNESDAY 20, 2005
Review/Introduction of GIS with Ilya Zaslavsky
integration of information from different sources
The common ground of spatial data is location, but not always

Geographic Database
- Framework Data - Thematic Data
Used to explore data in space
Use ontologies to link features
UMLS, unified medical language system.
Projections
focus on two projections,
UTM: Universal Transverse Mercator
- UTM zone numbers
- six degree zones where all the points are measured in meters from the equator
- All measurements are in positive numbers
Mercator vs. Robinson
as part of a request to a remote server, send the needed projections
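A minimal sketch of how the UTM zone number follows from longitude, assuming the plain 6-degree grid (the special zones around Norway and Svalbard are ignored). The function names are made up for this note.

```python
def utm_zone(lon_deg: float) -> int:
    """Return the UTM zone number (1-60) for a longitude in degrees."""
    if not -180.0 <= lon_deg <= 180.0:
        raise ValueError("longitude must be between -180 and 180")
    # Zones are 6 degrees wide and zone 1 starts at 180 W.
    # min() folds longitude 180 E back into zone 60.
    return min(int((lon_deg + 180.0) // 6) + 1, 60)


def central_meridian(zone: int) -> float:
    """Longitude of the central meridian of a zone, in degrees."""
    return zone * 6.0 - 183.0


# San Diego is near longitude -117.2
print(utm_zone(-117.2), central_meridian(utm_zone(-117.2)))  # 11 -117.0
```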
No single projection can preserve shape, area, direction, and distance all at once
www.ncgia.ucsb.edu, lecture notes on GIS, somewhat outdated
Modifiable Areal Unit problem
SVG and VML (Vector Markup Language, the vector graphics format in Internet Explorer)

Robinson's projection
- tries to make the map look "natural"

State plane system/zones
ArcMap: Layout view and Data view
Shapefile projection is saved in the .prj file
- contains units and other information on the projection of the shapefile data

Use ArcCatalog to edit metadata for a shapefile
The medical community is using a brain atlas; the same GIS setup could be used for mapping thin sections
Compacting rasters: quadtrees, Morton numbers
Row differences encoding, TIFF
Run-length encoding
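A rough sketch of two of the raster compaction ideas above: a Morton number built by interleaving the bits of a cell's row and column, and one raster row stored as runs of repeated values. Function names and the bit layout (column in the even bits) are choices made for this note, not something from the lecture.

```python
def morton_number(row: int, col: int, bits: int = 16) -> int:
    """Interleave the bits of row and col into one Morton (Z-order) key."""
    key = 0
    for i in range(bits):
        key |= ((col >> i) & 1) << (2 * i)        # column bits go to even positions
        key |= ((row >> i) & 1) << (2 * i + 1)    # row bits go to odd positions
    return key


def run_length_encode(cells):
    """Compress one raster row into [value, count] pairs."""
    runs = []
    for value in cells:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return runs


print([morton_number(r, c) for r in range(2) for c in range(2)])  # [0, 1, 2, 3]
print(run_length_encode([0, 0, 0, 5, 5, 0]))  # [[0, 3], [5, 2], [0, 1]]
```

Cells that are close in the grid tend to get close Morton numbers, which is why the key is useful for ordering quadtree blocks.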

Vector Data Structure Alternatives
- Development trends
- Spaghetti files 1974
- Polygon Loops (location lists)
    - polygon boundaries do not always line up, which is bad
- Point dictionary
- topology
    - study of basic spatial relationships...

Afternoon
Spatial types - OGC Simple Features
Map Algebra
Four classes of operations:
- local
- focal
- zonal
- incremental
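A small sketch of the first three operation classes on grids stored as lists of rows. The function names are invented here, the focal window is assumed to be 3x3, and incremental operations are left out.

```python
def local_add(a, b):
    """Local: each output cell depends only on the same cell in the inputs."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def focal_mean(grid):
    """Focal: each output cell is the mean of its 3x3 neighborhood (clipped at edges)."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [grid[i][j]
                      for i in range(max(r - 1, 0), min(r + 2, rows))
                      for j in range(max(c - 1, 0), min(c + 2, cols))]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


def zonal_mean(values, zones):
    """Zonal: summarize values over all cells that share a zone id."""
    totals, counts = {}, {}
    for vrow, zrow in zip(values, zones):
        for v, z in zip(vrow, zrow):
            totals[z] = totals.get(z, 0) + v
            counts[z] = counts.get(z, 0) + 1
    return {z: totals[z] / counts[z] for z in totals}


elevation = [[1, 2], [3, 4]]
print(local_add(elevation, elevation))            # [[2, 4], [6, 8]]
print(zonal_mean(elevation, [[1, 1], [2, 2]]))    # {1: 1.5, 2: 3.5}
```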

Compute slope to find problems, like holes, in the data
Using spatial analyst

Web Mapping with ARC IMS
Gaming environment: TORQUE, a cheap gaming engine
FGDC, federal metadata standards for spatial data
images on the geongrid web service are named as temp files using MAC address and time stamp.
That is why the ArcIMS server images look funny

Anderson land use cover codes
Land use codes are not standard but close, need an ontology to work with various land use codes
WMS (Web Map Service)
- GetCapabilities, advertises what the server can do...
- GetMap request example
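A sketch of what a GetMap request looks like, assuming WMS version 1.1.1 key-value parameters. The server URL and the layer name are hypothetical; a real server lists its layers in its GetCapabilities response.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layers, bbox, width, height,
               srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL from the standard request parameters."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",                              # empty string asks for default styles
        "SRS": srs,                                # projection requested from the server
        "BBOX": ",".join(str(v) for v in bbox),    # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer name, for illustration only.
url = getmap_url("http://example.org/wms", ["geology"],
                 (-125.0, 32.0, -114.0, 42.0), 600, 400)
print(url)
```

Swapping REQUEST to GetCapabilities (and dropping the map parameters) gives the request that asks the server to describe itself.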

OGC web Feature Server Interfaces
- two classes, Basic WFS and Transaction WFS

www.geographicnetwork.com

GetFeatureInfo to list all the features of an IMS server
ArcIMS Author: is it free???
Feature class is a set of vector data and not an image.
ARCIMS administrator
Arc XML validator


THURSDAY 21, 2005
Distributed database system
- need ability to update data stored in multiple databases
- Data Warehouse
    - legacy problems

ETL (operation) = Extract, Transform, Load

1. Load data from source to warehouse
2. Query processing interaction only between client and warehouse

Warehouse data could be "stale", i.e. out of synch with source data...
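A toy sketch of the ETL cycle and of why warehouse data goes stale, using two in-memory SQLite databases. The table layout and the unit conversion (years to Ma) are assumptions made up for the example.

```python
import sqlite3

# Source database (hypothetical schema): ages stored in years.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE samples (id TEXT, age_years REAL)")
source.execute("INSERT INTO samples VALUES ('S1', 1.2e9)")

# Warehouse (hypothetical schema): ages stored in millions of years (Ma).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE samples (id TEXT, age_ma REAL)")


def etl(src, dst):
    """Extract from the source, transform units, load into the warehouse."""
    rows = src.execute("SELECT id, age_years FROM samples").fetchall()  # extract
    converted = [(sid, years / 1e6) for sid, years in rows]            # transform
    dst.execute("DELETE FROM samples")                                  # full reload
    dst.executemany("INSERT INTO samples VALUES (?, ?)", converted)     # load
    dst.commit()


etl(source, warehouse)

# The source changes after the load. The warehouse does not see the new
# row until the next ETL run, which is what "stale" means here.
source.execute("INSERT INTO samples VALUES ('S2', 6.5e7)")
stale_count = warehouse.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
etl(source, warehouse)
fresh_count = warehouse.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
print(stale_count, fresh_count)  # 1 2
```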

Data integration via middleware (aka mediator)
- client request goes to the middleware, then the mediator queries the databases and returns the information to the client
Warehousing vs Mediation
- Warehousing uses ETL to "massage" the data
- Mediation modifies the query to get the connection working
    - Integrated view schema
    - Source "export" a view (the export schema)
- Federated database
    - Local sources belong to different "administrative domains", i.e. different owners
    - local autonomy


The Canonical Mediator / Wrapper architecture
XML mediator - do an XQuery on an XML file describing the database

Need role-based access from the mediator instead of making a bunch of users.
- make Administrator users (maybe) and read-only users

Data Integration Carts
- Integrating data sets without explicitly creating views
- Randy Keller has a gravity dataset which would be good to integrate with navdat/earthchem
- Spatial Ontology (aka gazetteer), used to determine what an area is, e.g. Rocky Mountain Region
- Need to know classification of datasets as gravity and geology

Geon meta data catalog

Data Registration
Spatial Ontology
- location
-- point
-- polygon
--- latitude
--- longitude
Item Registration (Schema registration)


Doug Greer Showing off IBM product, IBM DB2 Information Integrator
Top Level view of a Federated database...
Basically a mediator
Need to develop a view to start from
If people need different views then it will be hard/impossible to create a database application

NEED to create a GLOBAL VIEW across all databases, which defines a uniform way to query them

Four steps to connect to a federated database
1. CREATE WRAPPER
2. CREATE SERVER
3. Define USER MAPPING, includes roles/user/authentication of the local database to the wrapper
4. Define NICKNAME/ALIAS, giving the remote table a name

CREATE VIEW TABLE_NAME AS SELECT data1 UNION SELECT data2 UNION SELECT data3
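A small sketch of the global view idea, run against SQLite rather than DB2, so the three local tables only stand in for remote nicknames. Table and column names are invented, and UNION ALL is used here so that identical rows from different sources are not silently merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Three hypothetical source tables standing in for remote nicknames.
conn.executescript("""
    CREATE TABLE src_a (sample_id TEXT, rock_type TEXT);
    CREATE TABLE src_b (sample_id TEXT, rock_type TEXT);
    CREATE TABLE src_c (sample_id TEXT, rock_type TEXT);
    INSERT INTO src_a VALUES ('A-1', 'basalt');
    INSERT INTO src_b VALUES ('B-7', 'granite');
    INSERT INTO src_c VALUES ('C-3', 'basalt');

    -- The global view: one uniform schema over every source.
    CREATE VIEW global_samples AS
        SELECT sample_id, rock_type, 'a' AS source FROM src_a
        UNION ALL
        SELECT sample_id, rock_type, 'b' AS source FROM src_b
        UNION ALL
        SELECT sample_id, rock_type, 'c' AS source FROM src_c;
""")

# One query against the view reaches all three sources.
rows = conn.execute(
    "SELECT sample_id, source FROM global_samples "
    "WHERE rock_type = 'basalt' ORDER BY sample_id"
).fetchall()
print(rows)  # [('A-1', 'a'), ('C-3', 'c')]
```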

CHRONOS
Look up PaleoStrat Database
TimeScale Database

DB2 control center
Conop9 Application
Maps taxa across core samples
Uses global view to show data from different sources
Use RTRIM, CAST and CONCAT to create a sample id for a database that does not have one
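One guess at what that looks like, sketched in SQLite where concatenation is written `||` instead of CONCAT. The table, the columns and the padded core name are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cores (core_name TEXT, depth_cm INTEGER)")
# Trailing blanks mimic a value read from a fixed-width CHAR column.
conn.execute("INSERT INTO cores VALUES ('LEG138   ', 120)")

# Trim the padding, turn the number into text, and glue the parts together.
sample_id = conn.execute(
    "SELECT rtrim(core_name) || '-' || CAST(depth_cm AS TEXT) FROM cores"
).fetchone()[0]
print(sample_id)  # LEG138-120
```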

Kai Lin

Data Semantics and integrity constraints
Semantic Web
- making web-content machine-processable
- has to understand semantics or ontologies
Why Knowledge Representation
- Separate domain knowledge module from the operational module
- Configurable knowledge module
- share and reuse domain knowledge
- Analyze domain knowledge

What is ontology
A formal, explicit specification of a shared conceptualization

Parts of an ontology
- concepts
- properties
- axioms/relations
----- disjoint, covers, multiple inheritance, default inheritance, restrictions/cardinality
- reasoning tasks

RDF = Resource Description Framework
- Resource
----- URI (Uniform resource identifier)
- Property
----- lives-in, worksFor, carColor...
- Statement
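A bare-bones sketch of the resource / property / statement idea, with each statement held as a plain tuple instead of going through an RDF library. The URIs are made up; the property names are the ones from the list above.

```python
# Each statement is a (resource, property, value) triple.
# The URIs below are made up for illustration.
statements = [
    ("http://example.org/people#alice", "lives-in", "San Diego"),
    ("http://example.org/people#alice", "worksFor", "http://example.org/orgs#sdsc"),
    ("http://example.org/people#bob", "carColor", "red"),
]


def match(triples, subject=None, prop=None, value=None):
    """Return the triples that agree with every field that is not None."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (prop is None or t[1] == prop)
            and (value is None or t[2] == value)]


# Everything that is stated about one resource.
print(match(statements, subject="http://example.org/people#alice"))
```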

RDF containers
rdf:Bag
rdf:Seq
rdf:Alt

RDF schema(RDFS)
Simple ontology language
Core Class
Core Property

OWL, Web Ontology Language
RDF is too weak to describe resources in sufficient detail
- No localised range and domain constraints
- No existence/cardinality constraints
- No transitive, inverse or symmetrical properties
- No in/equality
- No boolean algebra

Three levels of OWL
- OWL DL = description logic
- OWL Lite
- OWL FULL, OWL plus RDF

Description Logic Family
http://www.hpl.hp.com/semweb/index.html

Afternoon
Ashraf Memon
Map Integration

Using dataset from different providers
Search conditions
spatial, temporal, concept
Services
Ontology service (redefine query)
Query Service (execute query)
Mapping Service (Mapping Service)
geon has developed a map assembler which pieces geological maps together.
CHRONOS runs the web service which assigns the correct color to geological map

WMS - Web Map Services
- GetCapabilities function gives out information on the server

WMS registration service

Query for concept rather than just values

Show class Jeremy's web service crawler

www.mapdex.org
http://www.wsindex.org/Semantic_Web/

Protein database citation
www.rcsb.org/pdb/citing.html


Kai Lin
Using Ontology
- better way to discover and understand datasets
- better way to query datasets
- Better way to integrate multiple datasets

give out high-level information on how to query the database without getting into details that are not important.
A search engine for data users
- metadata based
- spatial coverage
- temporal coverage
- concept based

Data integration Challenges: Heterogeneities
- syntactical
----- date format
- structural
----- how is the data stored, as time stamp or integers
- semantics
----- fuzzy data
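A sketch of smoothing over the first two kinds of heterogeneity for dates: different text layouts (syntactic) and dates stored as integer time stamps (structural). The list of accepted formats is an assumption; a real source would document its own.

```python
from datetime import datetime, timezone

# Formats assumed for this sketch.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_date(value):
    """Map several representations of a date onto one datetime.date."""
    if isinstance(value, (int, float)):
        # Structural difference: the date is stored as a Unix time stamp.
        return datetime.fromtimestamp(value, tz=timezone.utc).date()
    for fmt in KNOWN_FORMATS:
        # Syntactic difference: same date, different text layout.
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


print(normalize_date("2005-03-15"), normalize_date("03/15/2005"),
      normalize_date(1110844800))
```

Semantic differences (fuzzy data) cannot be fixed by parsing like this, which is where the ontology comes in.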

Challenges for Computer Scientists and Domain Scientists
- comp sci: build and integrate system based on ontology
- domain sci: create domain ontology
- data provider: register datasets to ontology

Register Properties
example: ROCK has AGE
Both are located in the same table

Ontological Database annotation Language (ODAL)
- annotate database connection information like username, passwd and port number

http://www.mindswap.org/2003/owl-svg/
http://www.mindswap.org/people/pages/?person=http://www.mindswap.org%2F2004%2Fowl%2Fmindswappers%23Aditya.Kalyanpur
http://projects.semwebcentral.org/softwaremap/trove_list.php?form_cat=352&discrim=356

Sriram Krishnan
Web Service Security
Private key cryptography
Public key cryptography
X.509 certificates
GAMA: Grid Account Management Architecture
MLS: Message Level Security
WS-Secure Conversation
- WS-Security is prone to replay attacks
CAS: Community Authorization Service
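A sketch of the usual defense against the replay attack noted above: sign a nonce and a time stamp together with the message, and have the receiver refuse anything old or already seen. This uses a shared-key HMAC from the Python standard library, not the WS-Security XML formats; the key, the time window and the function names are assumptions.

```python
import hashlib
import hmac
import time

SHARED_KEY = b"demo-key"      # hypothetical shared secret
MAX_AGE_SECONDS = 300         # assumed freshness window
seen_nonces = set()           # nonces already accepted by the receiver


def sign(body: str, nonce: str, timestamp: float) -> str:
    """Sender side: MAC over body, nonce and time stamp together."""
    payload = f"{nonce}|{timestamp}|{body}".encode()
    return hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()


def accept(body: str, nonce: str, timestamp: float, mac: str, now=None) -> bool:
    """Receiver side: reject forged, expired, or repeated messages."""
    now = time.time() if now is None else now
    if not hmac.compare_digest(mac, sign(body, nonce, timestamp)):
        return False                       # tampered with, or wrong key
    if abs(now - timestamp) > MAX_AGE_SECONDS:
        return False                       # too old to trust
    if nonce in seen_nonces:
        return False                       # same message seen before: replay
    seen_nonces.add(nonce)
    return True


mac = sign("run job", "n1", 1000.0)
print(accept("run job", "n1", 1000.0, mac, now=1001.0))  # True
print(accept("run job", "n1", 1000.0, mac, now=1002.0))  # False, replayed
```

In a long-running service the nonce set would need to be pruned once entries fall outside the freshness window.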


FRIDAY 22nd
Efrat Jaeger
Scientific Workflows
Gravity Model Workflow
SWF Systems - Requirements
USERS
- design tools
- Ease of Use
- reusable generic features
- Extensibility
- Registration and publication
TECHNICAL REQUIREMENTS
- Error detection and recovery
- logging
- Status
www.kepler-project.org
Ptolemy II
More geared for electrical engineering

There are kepler workflow examples and cookbooks
Igneous rock classifier, GEON Mineral Classifier Workflow

Vergil is the GUI for kepler


Tim Kaiser Parallel Computing
http://www.sdsc.edu/~tkaiser/
Types of parallelism: two extremes
- Data Parallel
----- each processor performs the same task on a different part of the data set
- Task Parallel
----- each processor performs a different task
Limits of Parallel computing
Theoretical upper limit
- Amdahl's law
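A quick sketch of the upper limit Amdahl's law gives: if a fraction p of the run time can be parallelized over n processors, the speedup is 1 / ((1 - p) + p / n). The 95% figure in the loop is just an example value.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# With 95% of the work parallel, the speedup can never reach 20,
# no matter how many processors are added.
for n in (1, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.95, n), 2))
```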
Flynn's taxonomy
- SISD
- SIMD
- MISD
- MIMD

Background on MPI, Message Passing Interface
MPI: The Complete Reference
MPI tutorial
Snir, Otto, Huss-Lederman, Walker.....
www.lam-mpi.org
www-unix.mcs.anl.gov/mpi/mpich
GLOBUS
www.globus.org/mpi
Include files
C: mpi.h
Fortran: mpif.h (A f90 module is good)
Demonstrate a lam mpi program
mpiexec -machinefile mlist (mlist is the list of machines)

Basic Communication, Synchronous and Asynchronous

www.marine-geo.org

AFTERNOON
There is a perl MPI for parallel programming
There are 6 main MPI functions that you need

MPI HAS ITS OWN DATA TYPES
Fortran
       call MPI_Bcast(buffer, COUNT,
                      datatype,
                      root,
                      communicator, ierr)

MPI matlab for parallel programming in matlab

Numerical Recipes in C (BOOK)
 
Ghost cells, edge problem
if the value needed was divided off and sent to another processor, just ask that processor for the data using MPI
TRICK 1. to avoid deadlock, order the sends and receives (send left, receive from left...) differently on odd and even processors
TRICK 2. null processors are used so if you are on the edge of the domain and have no neighbor, then send to the null processor (MPI_PROC_NULL).
MPI_PROC_NULL means you do not have to check to see if the neighbor is there, just send the data to nowhere
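A serial sketch of the ghost cell exchange with a null neighbor at the edges. It only imitates the idea in plain Python (no MPI library is used), with `PROC_NULL` and `recv_edge` as made-up stand-ins for MPI_PROC_NULL and a receive call.

```python
PROC_NULL = None  # stand-in for MPI_PROC_NULL: "no neighbor here"


def recv_edge(chunks, source, index):
    """Model of a receive: a PROC_NULL source returns at once with no data."""
    if source is PROC_NULL:
        return None
    return chunks[source][index]


def exchange_ghost_cells(chunks):
    """Give every rank one ghost cell on each side, copied from its neighbors.

    chunks[r] holds the interior cells owned by rank r. The result for each
    rank is [left_ghost] + interior + [right_ghost].
    """
    size = len(chunks)
    padded = []
    for rank, interior in enumerate(chunks):
        left = rank - 1 if rank > 0 else PROC_NULL
        right = rank + 1 if rank < size - 1 else PROC_NULL
        # The same two calls run on every rank, edge or not; the null
        # neighbor takes care of the missing side.
        left_ghost = recv_edge(chunks, left, -1)
        right_ghost = recv_edge(chunks, right, 0)
        padded.append([left_ghost] + interior + [right_ghost])
    return padded


print(exchange_ghost_cells([[1, 2], [3, 4], [5, 6]]))
# [[None, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, None]]
```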