Total posts in this thread: 6
This topic has been viewed 3180 times and has 5 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Human Proteome Folding Newsletter - Mid March Update

Human Proteome Folding - World Community Grid
Newsletter: Mid March Update
Richard Bonneau, Seattle, March 2005


Well, we've crossed an important milestone on the World Community Grid: 5 million results returned. Everyone pat yourselves on the back, because that means we've computed structure predictions for over 30,000 proteins and protein domains. To put that in perspective, the human genome has on the order of 30,000 protein-coding genes. It also means that we've evaluated over 840,000,000,000,000* different protein conformations. How big is that number? Well, if each of those protein conformations were the size of a hamster, then the whole calculation would be the size of Jupiter (if the hamsters are puffed up) or Saturn (if you get their fur wet). Compared to numbers from statistical mechanics or astronomy that's a small number, but given the speed of your average cluster node it is a big one. My facilities manager might be a bit upset about the electric bill if we tried to do this in the building.

* Disclaimer: I reserve the right to be off by several orders of magnitude at all points in this newsletter. Also, the hamster analogy is not related to the HMMSTR structure prediction server. For a cute picture of a hamster eating a protein see:
http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php
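For readers who like to see the arithmetic, here is a rough back-of-the-envelope sketch in Python using only the round figures quoted above (5 million results, 30,000 proteins, 840,000,000,000,000 conformations). The variable names are made up for illustration, and the same order-of-magnitude disclaimer applies.

```python
# Round figures quoted in the newsletter (approximate by the author's own disclaimer).
RESULTS_RETURNED = 5_000_000
PROTEINS_PREDICTED = 30_000
CONFORMATIONS_EVALUATED = 840_000_000_000_000

# Average number of conformations evaluated per returned result.
conformations_per_result = CONFORMATIONS_EVALUATED // RESULTS_RETURNED

# Average number of conformations evaluated per protein or domain.
conformations_per_protein = CONFORMATIONS_EVALUATED // PROTEINS_PREDICTED

# Average number of returned results per protein or domain.
results_per_protein = RESULTS_RETURNED / PROTEINS_PREDICTED

print(f"{conformations_per_result:,} conformations per result")
print(f"{conformations_per_protein:,} conformations per protein")
print(f"about {results_per_protein:.0f} results per protein")
```

Taken at face value, that works out to roughly 168 million conformations per returned result and about 28 billion per protein.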

What does this mean from the perspective of usable results and science?

This means we're roughly a third of the way through the calculation that we'll do on the grid. The process consists of three phases:

1) PICK THE GENES FROM THE SET OF ALL SEQUENCED GENES TO FOLD ON THE GRID. Deciding which proteins to put on the grid, plus the preprocessing required to make the work units (Rosetta input data).
2) WHAT THE CLIENT IS DOING ON YOUR COMPUTER. The folding, or protein structure prediction, that the Rosetta client performs on the grid.
3) GETTING THE RESULTS INTO THE HANDS OF BIOLOGISTS AND BIOMEDICAL RESEARCHERS. The post-processing step required to make sense of the results.

Step one is nearly complete, and the work units are mostly waiting for the World Community Grid to slurp them up. A lot of work went into deciding which proteins to place on the grid. First of all, we look to see if there are simpler ways to predict the protein structure (known as comparative modeling and fold recognition). If we can find a match to a known protein fold, then we can model the structure by mapping the protein in question onto the structure of a close match (when I say close match I mean sequence-sequence). This is much more efficient than folding the protein with Rosetta. Also, proteins must be shorter than 150 amino acids to be Rosetta-able. Therefore, we processed a lot more than 100,000 proteins to come up with 100,000 foldable (Rosettable) domains. All in all, we've processed nearly all protein sequences in publicly available databases. This task was carried out using a program called Ginzu (Malmstroem, Kim, Chivian, Baker) at the University of Washington in collaboration with Lars Malmstroem and David Baker. We'll continue putting the finishing touches on this process in the weeks to come, but the bulk of this task is finished.
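The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the two criteria named in the text (no close match to a known fold, and fewer than 150 amino acids); it is not Ginzu, and the `Domain` class, its `has_known_fold_match` flag, and the function names are hypothetical.

```python
from dataclasses import dataclass

# Length cutoff taken from the newsletter: "smaller than 150 amino acids".
MAX_ROSETTA_LENGTH = 150


@dataclass
class Domain:
    name: str
    sequence: str
    # Hypothetical flag: a real pipeline would set this from comparative
    # modeling / fold recognition searches, which are not shown here.
    has_known_fold_match: bool


def is_rosettable(domain: Domain) -> bool:
    """Return True if the domain should be folded on the grid."""
    if domain.has_known_fold_match:
        # Cheaper methods can model it, so it does not need the grid.
        return False
    return len(domain.sequence) < MAX_ROSETTA_LENGTH


def select_for_grid(domains):
    """Keep only the domains that pass both criteria."""
    return [d for d in domains if is_rosettable(d)]


candidates = [
    Domain("short_novel", "M" * 120, has_known_fold_match=False),
    Domain("short_known", "M" * 120, has_known_fold_match=True),
    Domain("long_novel", "M" * 300, has_known_fold_match=False),
]
print([d.name for d in select_for_grid(candidates)])
```

Only the short domain with no known fold match survives the filter, which is why far more than 100,000 proteins had to be processed to end up with 100,000 work-unit candidates.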

Step two is one third done. That means we have huge numbers of protein conformations on disk at the ISB that were predicted on the grid. Here are some pictures of some of the structures generated on the grid. Until we perform the post processing needed to distill function from these structures we won't have much to say about any single protein.

Step three is just beginning. We'll say more about progress on this final front, and about what exactly we mean by post-processing, in later newsletters.

For the reader interested in the workings of Rosetta we offer this installment of "Rosetta Corner". Each newsletter we'll cover another part of the Rosetta simulation. This part gets a tiny bit hairy. For more details see:
http://systemsbiology.org/Default.aspx?pagename=humanproteome

How Does Rosetta Work Part 1: Fragments.

The first thing to understand if we want to talk about what Rosetta is doing is how Rosetta builds proteins. Finding the correct structure in the astronomically large space of all possible structures requires a strategy to efficiently create and judge protein structures if we are to successfully predict protein topologies. This amounts to twisting the bond angles along the protein chain to get good global conformations (that is, to create favorable contacts between non-local parts of the chain). The way we do this in Rosetta is to precompute (step 1 above) libraries of local conformations, or fragments of peptide chain structure, for each protein (fig 2). The problem of building protein structures is then reduced to assembling these structure fragments.

That is what your client is showing. We start with a random chain configuration and then substitute in random fragments until we start to make good contacts and eventually good structures. When we can't make the structure any better we've converged, and we start the process over again from a different random number seed. Each client will try to make anywhere from 50 to 500 structures this way, each structure made of pieces of other proteins (local fragments). This strategy has two main advantages: 1) all local parts are "protein-like" and 2) each fragment substitution is an efficient simultaneous sampling of several bond angles (we change 18 degrees of freedom in one move in an intelligent way).
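The loop described above can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the chain is reduced to a list of (phi, psi) angle pairs, the fragment library is random rather than taken from real proteins, `toy_score` is a placeholder and not Rosetta's scoring function, and moves are accepted only when they improve the score, which is a simplification of whatever acceptance rule the real client uses. A 9-residue fragment with two angles per residue gives the 18 degrees of freedom changed per move.

```python
import random

FRAG_LEN = 9  # residues per fragment; 9 x (phi, psi) = 18 angles per move


def random_torsions(rng, n):
    """Return n random (phi, psi) pairs in degrees."""
    return [(rng.uniform(-180, 180), rng.uniform(-180, 180)) for _ in range(n)]


def build_toy_library(rng, chain_len, frags_per_window):
    """Hypothetical stand-in for the precomputed fragment library:
    one list of candidate fragments per window start position."""
    return [
        [random_torsions(rng, FRAG_LEN) for _ in range(frags_per_window)]
        for _ in range(chain_len - FRAG_LEN + 1)
    ]


def toy_score(chain):
    """Placeholder energy (lower is better). NOT Rosetta's score:
    it simply rewards helix-like angles near (-60, -45)."""
    return sum((phi + 60) ** 2 + (psi + 45) ** 2 for phi, psi in chain)


def fold_once(library, chain_len, seed, patience=200):
    """Build one structure: start from a random chain, insert random
    fragments, keep improvements, stop after `patience` failed moves."""
    rng = random.Random(seed)
    chain = random_torsions(rng, chain_len)
    best = toy_score(chain)
    stale = 0
    while stale < patience:
        start = rng.randrange(len(library))   # pick a window position
        fragment = rng.choice(library[start])  # pick a fragment for it
        trial = chain[:start] + fragment + chain[start + FRAG_LEN:]
        score = toy_score(trial)
        if score < best:
            chain, best, stale = trial, score, 0
        else:
            stale += 1
    return chain, best


def fold_many(library, chain_len, n_structures, base_seed=0):
    """Repeat the whole process from different random seeds."""
    return [fold_once(library, chain_len, base_seed + i) for i in range(n_structures)]


library = build_toy_library(random.Random(1), chain_len=30, frags_per_window=5)
structures = fold_many(library, chain_len=30, n_structures=3)
print([round(score) for _, score in structures])
```

Each call to `fold_once` corresponds to one of the 50 to 500 structures a client attempts; restarting from a new seed gives an independent attempt at the same protein.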



Step 1: Find Rosettable domains. All this happens before we get to the grid. Blue programs are sequence based, orange are structure based.


Step 1 - part two: Pick fragments of local structure. This figure (made by Kim Simons right around the time of Rosetta's birth) shows the pieces of local structure that Rosetta will put together. We're just showing you a few fragments. We use 75 fragments (deep in this picture) for every possible 3- and 9-residue window (across in this figure). There are a lot of ways to combine that many fragments at that many positions, and that is why we need the grid.
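To get a feel for "a lot of ways", here is a rough counting sketch in Python. It assumes a 150-residue chain (the length cutoff mentioned earlier) and the 75 fragments per window quoted above; the function names are invented for this example. The tiling count ignores overlapping windows, so it is a crude estimate, not an exact size of the search space.

```python
import math

FRAGS_PER_WINDOW = 75  # figure quoted in the newsletter


def window_count(chain_len, window):
    """Number of distinct start positions for a window of this size."""
    return chain_len - window + 1


def single_move_choices(chain_len, windows=(3, 9)):
    """How many different fragment insertions are available for one move."""
    return sum(FRAGS_PER_WINDOW * window_count(chain_len, w) for w in windows)


def log10_tiling_combinations(chain_len, window=9):
    """log10 of the number of ways to pick one fragment for each
    non-overlapping window along the chain (a crude estimate only)."""
    return (chain_len // window) * math.log10(FRAGS_PER_WINDOW)


print(window_count(150, 9))            # 9-residue windows in a 150-residue chain
print(single_move_choices(150))        # possible insertions for a single move
print(round(log10_tiling_combinations(150)))  # exponent of the tiling estimate
```

Even under this simplified count, choosing one 9-residue fragment per non-overlapping window gives on the order of 10^30 combinations for a single protein.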

[Mar 22, 2005 5:56:30 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Human Proteome Folding Newsletter - Mid March Update

Good Job
[Mar 22, 2005 5:59:58 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Human Proteome Folding Newsletter - Mid March Update

Thank you for the update! It is much appreciated.

What type of information processing system will you use for the post-processing?
[Mar 24, 2005 2:59:48 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Human Proteome Folding Newsletter - Mid March Update

We have several large(ish) Linux clusters at the ISB and also in the Baker lab at the UW, Seattle. The poor little disk pack is cranking away trying to keep up with the grid right now!
[Mar 25, 2005 6:57:36 AM]
Riverr
Cruncher
Joined: Nov 25, 2004
Post Count: 1
Status: Offline
Re: Human Proteome Folding Newsletter - Mid March Update

RE: Rbonneau

What are your financial resources? Do you have enough equipment to process it, or do you have a real resource problem?

- Is it possible to make a financial contribution to the proteome/WCG project if needed?

Is it possible to integrate a financial contribution/sponsor site component into the existing wcg.org site, where the whole public can contribute (for critical resources)?

If interested, I have a team who could develop it, sponsored.

Martin (DK)
[Mar 27, 2005 1:15:11 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Human Proteome Folding Newsletter - Mid March Update

It is not possible to add a contribution/sponsor component to worldcommunitygrid.org.

World Community Grid is a hosting site. As a hosting site, we invite any not-for-profit organization to run their research without charge, as long as it benefits humankind and they make the results available in the public domain. We can handle multiple research projects and are working very hard at getting the next project(s).

Other than the partnership of running research, World Community Grid has no ties to the research organizations. As such, any contribution or sponsorship arrangements would have to be made directly with the research organization, in this case, the Institute for Systems Biology, which “rbonneau” represents.

As new research starts running on World Community Grid, we’ll be adding additional forums (one for each new project) under the Active Projects category and you can ask them the same question.

The closest thing to sponsorship at World Community Grid is what we call "Partners". This is where organizations allow their employees to run the grid agent on company-owned PCs. Partnerships exist so that more devices are added to World Community Grid, which, in turn, helps run the research faster. For more information on Partners, look here: http://www.worldcommunitygrid.org/about_us/our_partners.html

In addition, if you know of any research organizations who are potential candidates for running their research on World Community Grid, point them here: http://www.worldcommunitygrid.org/projects_showcase/submit_a_proposal.html

Thank you for the offer.
[Mar 28, 2005 5:43:11 PM]