Research Projects using the BeSTGRID Data GRID

From BeSTGRID


Auditory Scene Analysis

Project Lead
Dr Michael Hautus
Project Description

Dr Hautus (University of Auckland) and Dr Johnson (Macquarie University) are collaborating on a project to elucidate some of the brain mechanisms underlying the separation and assignment of auditory stimuli to auditory objects, a process sometimes referred to as auditory scene analysis. This requires the collection of electrophysiological and neuroimaging data (EEG, MEG and fMRI), which must be accessible to the teams working at Auckland and Macquarie.


Austronesian Basic Vocabulary and Bantu Language Databases

Project Lead
Prof Russell Gray
Usage
0.5 TB
Project Description
  • Austronesian Basic Vocabulary Database: This database contains 125,000 lexical items from 580 languages spoken throughout the Pacific region. These languages all belong to the Austronesian language family, one of the largest language families in the world, with between 1,000 and 1,200 languages.
  • Bantu Language Database: This database contains 2,388 lexical items from 6 Bantu languages.
www 
http://language.psy.auckland.ac.nz/

Ecology and Animal Behaviour

Project Lead
Dr Stuart Parsons
Usage
1 TB
Project Description
  • Acoustic identification of bats - a collaboration with Humboldt State University, funded by the US Department of Defense. The project requires the exchange of large sound files between Auckland and Humboldt as well as storage of large (high sample rate) sound files.
  • Acoustic identification of birds - a collaboration with Humboldt State University, funded by the California Department of Transportation. The project requires the exchange of large sound files between Auckland and Humboldt as well as storage of large (high sample rate) sound files.
  • Underwater noise levels in Milford and Doubtful Sounds - a collaboration between SBS, Physics and NIWA, funded by the Department of Conservation. The project requires space to store and exchange large sound files containing underwater noise recordings.

Human Immunology

Project Lead
A/Prof Rod Dunbar (Tech Contact: Olli Horlacher)
Usage
1 TB
Project Description

NZ NEES @ Auckland

Project Lead
Assoc. Prof. Jason Ingham
Usage
2 TB
Project Description
  • New Zealand Network for Earthquake Engineering Simulations, a network for collaborative earthquake engineering research aimed at mitigating the impacts of earthquakes.
  • A project with research partners from the United States, the United Kingdom, Taiwan and China.
  • Remote control of, and participation in, experiments requires large volumes of numeric and video data to be exchanged in real time.
  • Focus areas of research include: Integrated structure-foundation design of bridges, Self-centering structural systems and Distributed hybrid testing.

New Zealand Biomirror

Project Lead
Prof Allen Rodrigo
Usage
5 TB
Project Description

Bio-Mirror is a public bioinformatics service in New Zealand providing high-speed access to up-to-date DNA and protein sequence databanks. In genome research these databanks have been growing so rapidly that their distribution is hampered by existing Internet speeds. The Bio-Mirror project is devoted to facilitating timely access to these large and important data sets. High-speed access is provided over research and education networks, including the very high-speed Backbone Network Service (vBNS), the Internet2 Abilene network, TransPAC, the Australian Academic Research Network (AARNet) and the Asia-Pacific Advanced Network (APAN).

A standalone BeSTGRID BLAST server is used for high-speed searches of GenBank (the NIH genetic sequence database).

www
http://www.biomirror.org.nz/
ftp
ftp://biomirror.auckland.ac.nz/

Passive DNS

Project Lead
Bojan Zdrnja
Usage
1 TB
Research

Our sensors are deployed at various networks around the world. The sensors passively parse network traffic and collect all authoritative DNS responses, which are sent to a central collector that stores them in a database. The information collected includes the query, response, resource record type, TTL, timestamp and the sensor that collected it. The database also stores first-seen and last-seen timestamps.

This allows us to perform various analyses on the collected data. Because the database stores the full history of every DNS record seen, we are working on a reputation-based system that rates certain domains and/or IP addresses according to their history. It is also possible to correlate information received from different sensors to see the geographical spread of DNS responses. The collected information makes fast-flux hosts easy to identify, which helps with the analysis of security incidents.
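One way the stored history can expose fast-flux hosts is to look for names that resolve to many different addresses, all with short TTLs. The Python sketch below illustrates that idea only; the `DnsRecord` layout and both thresholds (at least 5 distinct A records, no TTL above 300 seconds) are hypothetical choices for the example, not the project's actual detection criteria.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class DnsRecord(NamedTuple):
    qname: str   # query name
    rrtype: str  # resource record type, e.g. "A"
    rdata: str   # answer data, e.g. an IPv4 address
    ttl: int     # TTL in seconds


def fast_flux_candidates(
    records: Iterable[DnsRecord],
    min_distinct_ips: int = 5,  # hypothetical threshold
    max_ttl: int = 300,         # hypothetical threshold, in seconds
) -> list[str]:
    """Return names seen with many distinct A records, all with short TTLs."""
    addresses: dict[str, set[str]] = defaultdict(set)
    highest_ttl: dict[str, int] = {}
    for rec in records:
        if rec.rrtype != "A":
            continue  # this heuristic only looks at address records
        addresses[rec.qname].add(rec.rdata)
        highest_ttl[rec.qname] = max(rec.ttl, highest_ttl.get(rec.qname, 0))
    return sorted(
        name
        for name, ips in addresses.items()
        if len(ips) >= min_distinct_ips and highest_ttl[name] <= max_ttl
    )


if __name__ == "__main__":
    sample = [DnsRecord("flux.example", "A", f"192.0.2.{i}", 60) for i in range(6)]
    sample.append(DnsRecord("static.example", "A", "198.51.100.1", 86400))
    print(fast_flux_candidates(sample))  # ['flux.example']
```

Real fast-flux detection would also weigh signals such as the number of distinct networks the addresses belong to; the two thresholds here are just the simplest form of the idea.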

Technical Description

DNS data is captured passively by sensors at the network edge, using an architecture designed to keep sensor implementation as simple as possible. Each sensor is connected to a router SPAN port in order to get complete access to all network traffic. Sensors run tcpdump, configured to write captured packets to a pcap file. Since we are only interested in DNS messages, we use the following tcpdump filter:

udp port 53 and ( udp[10] & 0x04 != 0 )

Note that our filter only captures UDP DNS replies from authoritative sources, since it tests the "Authoritative Answer" (AA) bit in the DNS header. We ignore TCP (for now) to simplify our parsing code, and because we observe relatively little TCP DNS traffic at the router. Since DNS replies always include the query data (in the Question section), there is little need to also collect DNS queries. Our filter can, however, cause problems with certain large responses. If a DNS reply is larger than the path MTU, the UDP message will be fragmented, and only the first fragment carries the UDP header that the filter matches on, so the remaining fragments are not captured. Fortunately, the first fragment usually contains enough information for anomaly detection.
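To make the `udp[10] & 0x04` test concrete, the Python sketch below applies the same check to a raw UDP payload. tcpdump's `udp[]` offsets count from the start of the 8-byte UDP header, so `udp[10]` is byte 2 of the DNS message: the first flags byte (QR, Opcode, AA, TC, RD), right after the 2-byte message ID. The function name and sample headers are illustrative, not part of the project's code.

```python
AA_MASK = 0x04  # Authoritative Answer bit in the first DNS flags byte


def is_authoritative_reply(udp_payload: bytes) -> bool:
    """Python equivalent of the tcpdump expression udp[10] & 0x04 != 0.

    udp_payload is the DNS message, i.e. the bytes after the UDP header,
    so tcpdump's udp[10] corresponds to udp_payload[2].
    """
    if len(udp_payload) < 3:
        return False  # too short to contain the flags byte
    return (udp_payload[2] & AA_MASK) != 0


if __name__ == "__main__":
    # 12-byte DNS headers: ID 0x1234, flags word, then four zero counts.
    authoritative = bytes.fromhex("1234" "8400" "0000" "0000" "0000" "0000")  # QR=1, AA=1
    recursive = bytes.fromhex("1234" "8180" "0000" "0000" "0000" "0000")      # QR=1, RD=1, RA=1
    print(is_authoritative_reply(authoritative), is_authoritative_reply(recursive))  # True False
```

Like the original filter, this does not check the QR bit separately; in practice the AA bit is only meaningful in responses.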

Since our sensor is placed at the network perimeter, we see two types of DNS responses: those destined for the University's local caching resolvers, and those leaving the University's own authoritative nameservers. The former are the most interesting for our purposes, but we did not attempt to filter the latter out of our database.

The sensors have a cron job that runs every hour. First, a new tcpdump process is launched. Then, the existing tcpdump process is killed. The pcap file containing data from the previous hour is compressed and sent to the collector.
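The hourly rotation can be sketched in Python as below. This is an illustration under assumptions, not the project's script: the `dns-YYYYmmddHH.pcap` naming, the state file that remembers the running tcpdump's PID and output path, and the `rotate` function are all hypothetical, and sending the compressed file to the collector is left out because the transfer method is not described. Running it needs tcpdump installed and capture privileges.

```python
import gzip
import os
import shutil
import signal
import subprocess
import time
from datetime import datetime
from pathlib import Path
from typing import Optional

# The capture filter quoted above.
BPF_FILTER = "udp port 53 and ( udp[10] & 0x04 != 0 )"


def compress(path: Path) -> Path:
    """gzip a finished capture file and delete the uncompressed original."""
    gz_path = path.with_name(path.name + ".gz")
    with path.open("rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    path.unlink()
    return gz_path


def wait_for_exit(pid: int, timeout: float = 10.0) -> None:
    """Poll until the process is gone, so tcpdump has closed its output file."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)  # signal 0 only checks that the process exists
        except ProcessLookupError:
            return
        time.sleep(0.1)


def rotate(interface: str, directory: Path, state_file: Path) -> Optional[Path]:
    """One hourly cron run; returns last hour's compressed capture, if any."""
    new_path = directory / f"dns-{datetime.now():%Y%m%d%H}.pcap"
    # 1. Launch the new tcpdump first, so no traffic is missed.
    proc = subprocess.Popen(["tcpdump", "-i", interface, "-w", str(new_path), BPF_FILTER])
    previous = state_file.read_text().split(maxsplit=1) if state_file.exists() else None
    state_file.write_text(f"{proc.pid} {new_path}")
    if not previous:
        return None  # first run: nothing to rotate yet
    old_pid, old_path = int(previous[0]), Path(previous[1])
    # 2. Stop the previous tcpdump and give it time to flush its file.
    try:
        os.kill(old_pid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # it had already exited
    wait_for_exit(old_pid)
    # 3. Compress last hour's capture, ready to be sent to the collector.
    return compress(old_path)
```

A possible alternative is tcpdump's own `-G` option, which rotates the output file every given number of seconds using a strftime-style `-w` file name.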

Our database resides on the collector. The database holds only collected DNS data relevant for our research. The relevant data includes:

  • Query name (name of the original query)
  • Resource Record (RR) type (query type, e.g. A for address records)
  • Resource Record data (answer returned by the authoritative DNS server)
  • TTL (Time To Live) - value in seconds, set by the authoritative server, that allows the client DNS server or resolver to cache the answer
  • First Seen Timestamp - timestamp showing when the sensor first saw this record
  • Last Seen Timestamp - timestamp showing when the sensor last saw this record
  • Sensor ID - identifier of the sensor, indicating its geographical location
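The fields listed above map naturally onto a single table. The page does not name the database system, so the sketch below uses Python's built-in sqlite3 module; the table name, column names and types, the choice of primary key, and the sample sensor ID `akl-1` are all hypothetical. The `history` query shows the kind of per-name lookup that the stored history makes possible.

```python
import sqlite3

# Hypothetical schema mirroring the listed fields; not the project's actual table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS dns_records (
    qname      TEXT    NOT NULL,  -- name of the original query
    rrtype     TEXT    NOT NULL,  -- RR type, e.g. 'A'
    rdata      TEXT    NOT NULL,  -- answer from the authoritative server
    ttl        INTEGER NOT NULL,  -- seconds
    first_seen INTEGER NOT NULL,  -- Unix time the sensor first saw the record
    last_seen  INTEGER NOT NULL,  -- Unix time the sensor last saw the record
    sensor_id  TEXT    NOT NULL,  -- sensor identifier (implies location)
    PRIMARY KEY (qname, rrtype, rdata, sensor_id)
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def history(conn: sqlite3.Connection, qname: str) -> list[tuple]:
    """Every answer recorded for a name, oldest first."""
    return conn.execute(
        "SELECT rrtype, rdata, first_seen, last_seen, sensor_id "
        "FROM dns_records WHERE qname = ? ORDER BY first_seen",
        (qname,),
    ).fetchall()


if __name__ == "__main__":
    conn = open_db()
    conn.executemany(
        "INSERT INTO dns_records VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            ("www.example.com", "A", "192.0.2.10", 3600, 1200000000, 1200003600, "akl-1"),
            ("www.example.com", "A", "192.0.2.20", 3600, 1100000000, 1100007200, "akl-1"),
        ],
    )
    for row in history(conn, "www.example.com"):
        print(row)
```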

Rows in the database correspond to resource records in the Answer section of the DNS reply; we do not store records from the Authority or Additional sections. Incoming pcap files are preprocessed by a program that unpacks the DNS messages and removes any duplicate entries. Duplicates typically occur for popular names with short TTLs. Since a duplicate answer describes a record that is already present, it does not add a new row to the database and can be safely discarded. After all the new pcap files have been parsed, the program imports the data into the database.
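The page describes the deduplication step but not its algorithm. The Python sketch below shows one plausible approach, which is an assumption rather than the project's code: answers already unpacked from the pcap files (the unpacking itself is not shown) are keyed on name, type, data and sensor, and repeats are merged into a single row whose First Seen and Last Seen values span all sightings in the batch. The `Answer` tuple and the `akl-1` sensor ID are hypothetical.

```python
from typing import Iterable, NamedTuple


class Answer(NamedTuple):
    qname: str
    rrtype: str
    rdata: str
    ttl: int
    seen: int       # capture time of this answer (Unix seconds)
    sensor_id: str


def deduplicate(answers: Iterable[Answer]) -> list[dict]:
    """Collapse repeated answers into one row per (qname, rrtype, rdata, sensor_id).

    A repeat adds no new record; it only widens the first/last-seen window.
    """
    rows: dict[tuple, dict] = {}
    for a in answers:
        key = (a.qname, a.rrtype, a.rdata, a.sensor_id)
        row = rows.get(key)
        if row is None:
            rows[key] = {
                "qname": a.qname, "rrtype": a.rrtype, "rdata": a.rdata,
                "ttl": a.ttl, "first_seen": a.seen, "last_seen": a.seen,
                "sensor_id": a.sensor_id,
            }
        else:
            row["first_seen"] = min(row["first_seen"], a.seen)
            row["last_seen"] = max(row["last_seen"], a.seen)
            row["ttl"] = a.ttl  # keep the TTL of the last answer processed
    return list(rows.values())


if __name__ == "__main__":
    batch = [
        Answer("www.example.com", "A", "192.0.2.10", 30, 100, "akl-1"),
        Answer("www.example.com", "A", "192.0.2.10", 30, 130, "akl-1"),
        Answer("mail.example.com", "MX", "10 mx.example.com", 3600, 115, "akl-1"),
    ]
    for row in deduplicate(batch):
        print(row)
```

Keying on the sensor as well as the record keeps per-sensor sightings separate, which preserves the geographical correlation described earlier.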


Quantum Optics

Project Lead
Prof Howard Carmichael and Dr Levente Horvath
Usage
100 GB
Project Description
  • Quantum stochastic processes for composite systems: an area of quantum optics that studies correlations and entanglement of quantum states in systems where stochastic processes are important. Beyond the basic understanding of light-matter interactions, this area is important for quantum information.

Whole Genome Association Studies

Project Lead
Dr. Sharon Browning and Dr. Brian Browning
Usage
500 GB
Project Description
  • Whole genome association scans involve genotyping genetic markers spanning the human genome in large numbers of individuals with and without a disease, with the aim of identifying genetic variants that increase disease risk. The Wellcome Trust Case Control Consortium data consist of 500,000 genotypes per individual on 19,000 individuals, with associated quality scores and other relevant information.

Ocean Biogeographic Information System (OBIS)

Project Lead
Dr Mark J. Costello
Usage
OBIS virtual machine on Pleyads, 20 GB of disk space.
Project Description
  • OBIS publishes data on behalf of scientists from government agencies, museums, universities, commercial companies, and non-governmental organisations. OBIS is always seeking new contributors.
  • OBIS is a marine biogeographic information system, meaning that we concentrate on datasets that record particular species (or higher taxonomic groups) from particular marine locations, at particular times. At present, we can only publish data where the locations are recorded as latitude and longitude, not as place names. Our focus is on high taxonomic quality, so datasets where organisms have been identified by professional or trained biologists are our priority. In the near future, we will be expanding to take in environmental datasets (i.e. coverage of physical, chemical, and geological parameters) that are relevant to understanding the distribution of species. We are therefore interested in hearing from potential contributors of these datasets and welcome your contact, but are still in the process of building this facility.
www
http://www.iobis.org

The Polyhedrin Project, School of Biological Sciences

Project Lead
Assoc Prof Peter Metcalf
Usage
500 GB
Project Description
  • The polyhedrin project is an international research collaboration based in Auckland involving research groups in Japan and Switzerland. The project was established in November 2002 with the initial aim of determining the atomic structure of cypovirus polyhedra, micron-sized protein crystals produced inside the cells of silkworms infected with this virus. The micro-crystals are formed from the viral protein polyhedrin and contain virus particles embedded within a crystalline lattice of polyhedrin molecules. The virus-containing micro-crystals are remarkably stable and can remain infectious in the environment for years after the death of the infected silkworms. The initial aim of the project was accomplished in mid-2006 and the results were published in the prestigious journal Nature in March 2007. The cypovirus atomic structure is important because it enables protein engineering methods to be used to develop the micro-crystals into a range of stable protein-based devices, including stabilised enzyme chips, biosensors and stable micro-containers for vaccine delivery.
  • The specific aims of the research currently being carried out by the collaborators include the atomic-level analysis of engineered cypovirus micro-crystals and the determination of the atomic structure of related micro-crystals produced by other insect viruses. In this work, the engineered cypovirus micro-crystals are provided by the laboratory of Professor Hajime Mori at the Kyoto Institute of Technology, and preliminary analysis of these and other samples is carried out in Auckland. Protein crystallography experiments are carried out using the specialised micro X-ray beam at the Swiss Light Source synchrotron near Zurich, where we work in collaboration with the group of Clemens Schulze-Briese. These experiments produce large amounts of data (currently ~50 GB per trip, two or three times per year), and arranging convenient international access and secure storage has become a significant problem. BeSTGRID is expected to provide an ideal solution for our data storage and access requirements.

DING Proteins

Project Lead
Dr. Ken Scott
Contact
Andrew Suh (a.suh@auckland.ac.nz)
Usage
500 GB
Project Description

DING proteins are a family of proteins with the characteristic DINGGG N-terminal sequence. They have been isolated from species of all kingdoms, and the proposed biological functions of the various members differ greatly. DING proteins may have roles in some of the most prevalent human diseases, including rheumatoid arthritis, kidney stone disease, atherosclerosis, cancer and HIV.

Our lab is currently working on the structure-function relationship of DING proteins from various Pseudomonas species. Work on eukaryotic DING isolates may provide potential therapies for a number of human diseases such as breast cancer. Genetic identification of eukaryotic DING genes is also a priority as this may allow early diagnosis of the noted diseases.


Molecular mechanisms of learning and memory

Project Lead
A/Prof Nigel Birch (n.birch@auckland.ac.nz)
Contact
Victor Borges (vbor004@aucklanduni.ac.nz)
Usage
2,000 GB
Project Description

One of the many intriguing questions in the neural sciences is: how are memories stored within the brain? Eric Kandel (Nobel Laureate, Kandel, 2001) stated “One of the most remarkable aspects of an animal’s behaviour is the ability to modify that behaviour by learning, an ability that reaches its highest form in human beings”. Many neurobiologists now strive to understand these important cognitive phenomena at the molecular level. We are interested in molecules expressed in the developing and adult nervous system that modulate nerve cell morphology and connectivity. Our approach is to manipulate levels of gene expression in cultured neurons, capture the changes by high-resolution imaging of fixed and live cells, and then quantify the changes using image analysis software tools.