Knowledge Engineering using Bayesian Network
Bayesian Belief Network
What is a Bayesian Network?
A Bayesian belief network (BBN), or Bayesian network, is a directed acyclic graph whose nodes represent variables and whose arcs represent dependence relations among the variables. If there is an arc from node A to node B, then A is said to be a parent of B. A node whose value is known is called an evidence node. A node can represent any kind of variable, be it an observed measurement, a parameter, a latent variable, or a hypothesis. Nodes are not restricted to representing random variables; this is what is "Bayesian" about a Bayesian network.
A Bayesian network represents the joint distribution over all the variables represented by nodes in the graph. Let the variables be X(1), ..., X(n), and let parents(A) denote the parents of node A. Then the joint distribution factorizes as p(X(1), ..., X(n)) = p(X(1) | parents(X(1))) × ... × p(X(n) | parents(X(n))), that is, the product of the distributions p(X(i) | parents(X(i))) for i from 1 to n. If X(i) has no parents, its probability distribution is said to be unconditional; otherwise it is conditional. Figure 1 shows a simple BBN example.
Figure 1 BBN example
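The factorization above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-node chain (Complexity -> Defect -> TestFailure), the node names, and every probability value are hypothetical and are not taken from Figure 1.

```python
from itertools import product

# Hypothetical network (an assumption, not the one in Figure 1):
#   Complexity -> Defect -> TestFailure
parents = {
    "Complexity": (),
    "Defect": ("Complexity",),
    "TestFailure": ("Defect",),
}

# cpt[node][parent_values] = P(node = True | parents = parent_values)
# All numbers are made up for illustration.
cpt = {
    "Complexity": {(): 0.3},
    "Defect": {(True,): 0.6, (False,): 0.1},
    "TestFailure": {(True,): 0.9, (False,): 0.05},
}


def joint(assignment):
    """Return p(x1, ..., xn) as the product of p(xi | parents(xi))."""
    prob = 1.0
    for node, pars in parents.items():
        p_true = cpt[node][tuple(assignment[p] for p in pars)]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob


nodes = list(parents)
all_true = joint({name: True for name in nodes})

# Summing the joint over every assignment should give 1.
total = sum(
    joint(dict(zip(nodes, values)))
    for values in product([True, False], repeat=len(nodes))
)

print(f"p(all true) = {all_true:.3f}")  # 0.3 * 0.6 * 0.9
print(f"sum over all assignments = {total:.3f}")
```

Storing one small table per node instead of one table over all joint assignments is what keeps the representation compact when each node has only a few parents.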
Using a BBN can be divided into the following four steps:
- Determine the set of variables and their domains
This step depends on the problem domain. For early prediction of software reliability, for example, several complexity metrics are selected as the variables of the BBN.
- Determine the network topology and probability distributions from prior knowledge
The number of possible topologies grows super-exponentially with the number of nodes n: there are n! possible node orderings alone, and 3 nodes already admit 25 directed acyclic graphs while 4 nodes admit 543. Prior knowledge from a domain expert is therefore very important for narrowing the search to the most plausible topologies. If the domain expert can also supply reasonable probability distributions, much of the effort of collecting training data can be saved.
- Adjust the topology and probability distributions using training data
Prior knowledge from a domain expert is useful, but as the number of nodes increases it can become inaccurate. Training data therefore need to be collected to refine the topology and the probability distributions. This adjustment is done by a BBN learning algorithm. The purpose of learning is to use the training data D and the prior knowledge ξ to find the structure S whose posterior probability p(S | D, ξ) is maximal.
- Predict using BBN
Here, prediction means reasoning (inference). Bayesian reasoning calculates the conditional probability of some nodes given information about other nodes. According to the direction of the dependency between the nodes, BBN reasoning can be divided into three types:
- Causal reasoning, which predicts an effect from its cause. It is top-down reasoning.
- Diagnostic reasoning, which infers a cause from its observed effect. It is bottom-up reasoning.
- Support reasoning (also called intercausal reasoning), which analyzes how the causes of a common effect influence one another.
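The three reasoning types can be illustrated with brute-force inference by enumeration on a tiny network. The sketch below assumes a hypothetical structure with two causes (Complexity and Schedule) and one common effect (Defect); the names and all probabilities are invented for illustration, and real tools use more efficient algorithms such as the junction tree algorithm.

```python
from itertools import product

# Hypothetical network (an assumption): Complexity -> Defect <- Schedule
NODES = ("Complexity", "Schedule", "Defect")
P_COMPLEXITY = 0.3   # P(Complexity = True), made-up value
P_SCHEDULE = 0.4     # P(Schedule = True), made-up value
# P(Defect = True | Complexity, Schedule), made-up values
P_DEFECT = {
    (True, True): 0.9,
    (True, False): 0.6,
    (False, True): 0.5,
    (False, False): 0.05,
}


def joint(c, s, d):
    """Joint probability of one full assignment, via the BBN factorization."""
    p = (P_COMPLEXITY if c else 1 - P_COMPLEXITY)
    p *= (P_SCHEDULE if s else 1 - P_SCHEDULE)
    p_d = P_DEFECT[(c, s)]
    return p * (p_d if d else 1 - p_d)


def query(target, evidence):
    """P(target = True | evidence), summing the joint over consistent worlds."""
    num = den = 0.0
    for values in product([True, False], repeat=len(NODES)):
        world = dict(zip(NODES, values))
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(*values)
        den += p
        if world[target]:
            num += p
    return num / den


# Causal (top-down): from cause to effect.
causal = query("Defect", {"Complexity": True})
# Diagnostic (bottom-up): from effect back to cause.
diagnostic = query("Complexity", {"Defect": True})
# Support / intercausal: a second known cause "explains away" the first.
explained = query("Complexity", {"Defect": True, "Schedule": True})

print(f"causal     P(Defect | Complexity)               = {causal:.3f}")      # 0.720
print(f"diagnostic P(Complexity | Defect)               = {diagnostic:.3f}")  # 0.573
print(f"support    P(Complexity | Defect, Schedule)     = {explained:.3f}")   # 0.435
```

With these made-up numbers, learning that the schedule was tight lowers the belief that high complexity caused the defect, which is the interaction between causes that support reasoning analyzes.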
Compared with traditional regression modeling, a BBN has the following advantages:
- A BBN is closely tied to Bayesian statistics, so it is useful for combining prior knowledge with data.
- A BBN can handle data sets that are incomplete or noisy, which traditional regression models generally cannot.
- Because a BBN can depict both causality and probability, it is easy to use in decision-making.
- A BBN depicts the dependencies between variables graphically, so it is easy to understand and explain.
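The first advantage, combining expert knowledge with data, can be sketched for a single conditional probability. The example below is a minimal illustration that assumes a Beta prior on one binary probability; the function name `posterior_mean`, the "prior strength" (number of pseudo-observations the expert opinion is worth), and all numbers are hypothetical and are not part of any particular BBN tool.

```python
def posterior_mean(prior_prob, prior_strength, successes, trials):
    """Posterior mean of a probability under a Beta prior.

    prior_prob     -- the expert's estimate of the probability
    prior_strength -- how many pseudo-observations the estimate is worth
    successes      -- number of training cases where the event occurred
    trials         -- total number of training cases
    """
    alpha = prior_prob * prior_strength          # prior pseudo-count of "event"
    beta = (1.0 - prior_prob) * prior_strength   # prior pseudo-count of "no event"
    return (alpha + successes) / (alpha + beta + trials)


# Hypothetical case: an expert believes P(Defect | high complexity) = 0.6
# and that opinion is weighted like 10 observations. The training data then
# show 3 defective modules out of 20 high-complexity modules.
prior_only = posterior_mean(0.6, 10, 0, 0)
updated = posterior_mean(0.6, 10, 3, 20)

print(f"expert prior only: {prior_only:.2f}")
print(f"after 20 cases:    {updated:.2f}")
```

With no data the estimate equals the expert's value, and as more training cases arrive the estimate moves toward the observed frequency, which mirrors the "adjust using training data" step described earlier.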
Quantitative Knowledge Engineering Process using Bayesian Network
(Extracted and refined from the web material "Using Bayesian Networks for Water Quality Prediction in Sydney Harbour".)
What have Bayesian Networks been used for?
Defect Detection - software debugging, safety and risk evaluation of complex systems
...
Related topic: BNN (Bayesian Neural Network)
Detailed information can be found here.
White papers & Articles
Dealing with the Expert Inconsistency in Probability Elicitation
Knowledge Engineering for Bayesian Networks
Designing a Procedure for the Acquisition of Probability Constraints for Bayesian Networks
Generating Conditional Probabilities for Bayesian Networks: Easing the Knowledge Acquisition Problem
Induction of Bayesian Networks with a priori Domain Knowledge
Knowledge Engineering for Probabilistic Models: A tutorial
Using Sensitivity Analysis for Selective Parameter Update in Bayesian Network Learning
Gary D. Boetticher, Machine Learners Answer the 300-Billion-Dollar Question, University of Houston-Clear Lake
Fenton, N., and M. Neil, A Critique of Software Defect Prediction Research, IEEE Transactions on Software Engineering, Vol. 25, No. 5, 1999
S. Bibi, I. Stamelos, Software Process Modeling with Bayesian Belief Networks, IEEE Transactions on Software Engineering, Vol. 25, No. 5, 1999
S. Bibi, I. Stamelos, L. Angelis, Bayesian Belief Networks as a Software Productivity Estimation Tool, Department of Informatics, Aristotle University
Trevor Cockram, Gaining confidence in Software Inspection using a Bayesian Belief Model, Rolls-Royce plc and The Open University
Jilles van Gurp & Jan Bosch, Using Bayesian Belief Networks in Assessing Software Architectures, University of Karlskrona Ronneby, Department of Software Engineering and Computer Science
Hadar Ziv, Debra J. Richardson, Bayesian-network Confirmation of Software Testing Uncertainties, Department of Information and Computer Science, University of California, Irvine
Good books
Well-known tools
Genie & Smile Bayesian network tool
Genie is a program providing support for inference with Bayesian networks and
influence diagrams. It has been developed by the Decision Systems Laboratory,
University of Pittsburgh and is available for research purposes
(http://www.sis.pitt.edu/~dsl).
Genie provides a graphical development environment for editing Bayesian networks and
influence diagrams and for performing inference with them. The networks are solved using
the junction tree algorithm (like Hugin). Interactions between variables may be defined
using conditional probability tables. Models can be saved in various file formats: Genie
has a proprietary format but also supports many others, such as the Netica format and the
standard proposal format.
Additionally, an application programming interface (API) library for C programmers, called
SMILE, has been made available for integrating the system into other programs. The library
may be used for research purposes. The source code is not available.
The graphical user interface is supported in MS/Windows and Linux platforms. The API
is supported in MS/Windows and Linux.
In testing, the tool seemed very stable and easy to use. It serves a useful purpose
by making it easy to experiment with Bayesian network formalisms.
J Cheng's Bayesian Belief Network Software
BN PowerConstructor: An efficient system that learns Bayesian belief network structures & parameters from data.
BN PowerPredictor: A data mining system for data modeling/classification/prediction. It extends BN PowerConstructor to BN based classifier learning.
Data PreProcessor: A tool used with BN PowerConstructor and BN PowerPredictor for pre-processing the training data.
HUGIN Bayesian network tool (Commercial)
The HUGIN system is a tool for constructing Bayesian network based inference modules
for decision support systems. These modules are able to represent uncertainty in the status
of the variables and in the probabilistic dependencies between the variables. Influence
diagram representations are also supported.
The HUGIN system provides both an application programming interface (HUGIN API)
and a graphical environment and development facilities for interactively defining
Bayesian network structures and associated probability matrices.
Netica Bayesian network tool (Commercial)
The program provides a graphical development environment to edit Bayesian networks or
influence diagrams and to perform inferences with them. The networks are solved using
the junction tree algorithm (like Hugin). Interactions between variables may be defined
using conditional probability tables or using equations. The probabilities may also be
learned from training cases. The system supports delayed links between variables. Such
models are automatically transformed into static models.
It is possible to reverse individual links of the network (the tool updates the probabilistic
dependencies automatically) and also to remove nodes (the system updates the probability
of the other nodes as appropriate).
Additionally, an application programming interface (API) library for C programmers is
available for integrating the system into other programs.
Wonderful web resources
Mining Software Engineering Data: A Survey
Software organizations have often collected volumes of data in the hope of better understanding their processes and products. Useful information has been extracted from those large volumes of data, but it is commonly believed that large amounts of useful information remain hidden in software engineering databases.
Data mining has appeared as one of the tools of choice to better explore software engineering data. Data mining can be defined as the process of extracting new, non-trivial, and useful information from databases. This broad definition covers a wide spectrum of methods, techniques, and tools. This State of the Art Report (SOAR) discusses how data mining can be, and how it has been, used to analyze software engineering data.
The PROMISE Software Engineering Repository contains a collection of publicly available datasets and tools to serve researchers building predictive software models (PSMs) and the software engineering community at large. The repository was created to encourage repeatable, verifiable, refutable, and/or improvable predictive models of software engineering.
Software estimation, benchmarking, productivity, risk analysis, and cost information for software developers and business
Software cost estimation
Software cost estimation is the process of predicting the amount of effort required to build a software system. Models provide one or more mathematical algorithms that compute cost as a function of a number of variables. Size is a primary cost factor in most models and can be measured using lines of code or function points. Models used to estimate cost can be categorized as either cost models or constraint models. COCOMO is an example of a cost model and SLIM is an example of a constraint model. Although criteria for evaluating a model have been suggested, there are some fundamental problems with existing models. Many models are available as automated tools.
Bayesian Elicitation of Experts Probabilities
The Probability Elicitation Tool
The Third Bayesian Modeling Applications Workshop During UAI-05, Uncertainty in Artificial Intelligence 2005
Edinburgh, Scotland, UK
A prominent researcher who is working on projects related to BBN
Created by beyondtest
Last modified 2006-03-28 09:55 AM