Data Mining by Mehmed Kantardzic
- Author: Mehmed Kantardzic
In addition to data privacy issues, data mining raises other social concerns. For example, some researchers argue that data mining and the use of consumer profiles in some companies can actually exclude groups of customers from full participation in the marketplace and limit their access to information.
Good privacy protection not only can help build support for data mining and other tools that enhance security, but can also contribute to making those tools more effective. As technology designers, we should provide an information infrastructure that helps society be more certain that data-mining power is used only in legally approved ways, and that any consequences for individuals rest on inferences derived from accurate, approved, and legally available data. Future data-mining solutions that address these social issues must not only be applicable to the ever-changing technological environment but also be flexible with regard to specific contexts and disputes.
12.7 REVIEW QUESTIONS AND PROBLEMS
1. What are the benefits in modeling social networks with a graph structure? What kind of graphs would you use in this case?
2. For the given undirected graph G:
(a) compute the degree and variability parameters of the graph;
(b) find the adjacency matrix for the graph G;
(c) determine the binary code(G) for the graph;
(d) find the closeness parameter for each node of the graph; and
(e) what is the betweenness measure for node number 2?
3. For the graph given in Problem number 2, find the partial betweenness centrality using the modified graph starting with node number 5.
4. Give real-world examples of traditional analyses of temporal data (i.e., trends, cycles, seasonal patterns, outliers).
5. Given the temporal sequence S = {1 2 3 2 4 6 7 5 3 1 0 2}:
(a) find PAA for four sections of the sequence;
(b) determine SAX values for the solution in (a) if (1) α = 3, (2) α = 4;
(c) find PAA for three sections of the sequence; and
(d) determine SAX values for the solution in (c) if (1) α = 3, (2) α = 4.
6. Given the sequence S = {A B C B A A B A B C B A B A B B C B A C C}:
(a) Find the longest subsequence with frequency ≥ 3.
(b) Construct a finite-state automaton (FSA) for the subsequence found in (a).
7. Find the normalized contiguity matrix for the following table of U.S. cities:
Minneapolis    Chicago       New York
Nashville      Louisville    Charlotte
Assume that only cities that are neighbors in the table (vertically or horizontally) are close.
8. For the Bayesian network (BN) in Figure 12.38, determine:
(a) P(C, R, W)
(b) P(C, S, W)
9. Review the latest articles on privacy-preserving data mining that are available on the Internet. Discuss the trends in the field.
10. What are the largest sources of unintended personal data on the Internet? How can we make Web users more aware of the personal data about them that are available on the Web for a variety of data-mining activities?
11. Discuss an implementation of transparency and accountability mechanisms in a data-mining process. Illustrate your ideas with examples of real-world data-mining applications.
12. Give examples of data-mining applications where you would use a distributed data mining (DDM) approach. Explain the reasons.
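Problem 2 refers to a graph G that appears only as a figure, so the sketch below uses a small hypothetical graph (the EDGES list is a placeholder, not the book's G) to show how the adjacency matrix, node degrees, closeness, and betweenness can be computed. Closeness is taken here as (n - 1) divided by the sum of shortest-path distances, one common definition whose normalization may differ from the book's; betweenness uses Brandes' algorithm for unweighted, undirected graphs. All function names are illustrative.

```python
from collections import deque

# Hypothetical placeholder graph: Problem 2's G is only given as a figure.
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (4, 5)]


def build_adjacency(edges):
    """Return the sorted node labels and the 0/1 adjacency matrix of an undirected graph."""
    nodes = sorted({v for edge in edges for v in edge})
    index = {v: i for i, v in enumerate(nodes)}
    A = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        A[index[u]][index[v]] = A[index[v]][index[u]] = 1
    return nodes, A


def bfs_distances(A, source):
    """Hop distances from `source` to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, linked in enumerate(A[u]):
            if linked and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(A):
    """(n - 1) / sum of distances per node; assumes the graph is connected."""
    n = len(A)
    return [(n - 1) / sum(bfs_distances(A, s).values()) for s in range(n)]


def betweenness(A):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    n = len(A)
    scores = [0.0] * n
    for s in range(n):
        order, preds = [], [[] for _ in range(n)]
        sigma, dist = [0] * n, [-1] * n  # shortest-path counts and hop distances
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in range(n):
                if not A[u][v]:
                    continue
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = [0.0] * n
        for w in reversed(order):  # accumulate dependencies, farthest nodes first
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                scores[w] += delta[w]
    return [x / 2 for x in scores]  # each undirected pair was counted twice


nodes, A = build_adjacency(EDGES)
print("degrees:", dict(zip(nodes, (sum(row) for row in A))))
print("closeness:", dict(zip(nodes, closeness(A))))
print("betweenness:", dict(zip(nodes, betweenness(A))))
```

In this placeholder graph, node 2 lies on the only shortest path between each of nodes 1 and 3 and each of nodes 4 and 5, so its betweenness is 4; replacing EDGES with the edges of the book's figure gives the corresponding values for G.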
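For Problem 5, the sketch below computes PAA by averaging equal-length sections and then maps the section averages to SAX symbols. Following the original SAX formulation of Lin et al., it assumes the series is z-normalized (here with the population standard deviation) before PAA, and it uses breakpoints that split the standard normal distribution into α equiprobable regions; if the book's convention differs, the resulting letters may differ as well. It also assumes the sequence length is divisible by the number of sections, which holds for four and for three sections of the 12-value S. The function names are illustrative.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def paa(series, sections):
    """Piecewise Aggregate Approximation: mean of each equal-length section."""
    n = len(series)
    if n % sections:
        raise ValueError("sketch assumes len(series) is divisible by sections")
    size = n // sections
    return [fmean(series[i:i + size]) for i in range(0, n, size)]


def znormalize(series):
    """Shift to mean 0 and scale to (population) standard deviation 1."""
    mu, sd = fmean(series), pstdev(series)
    return [(x - mu) / sd for x in series]


def breakpoints(alpha):
    """Cut points dividing N(0, 1) into `alpha` equiprobable regions."""
    return [NormalDist().inv_cdf(k / alpha) for k in range(1, alpha)]


def sax(series, sections, alpha):
    """SAX word: z-normalize, apply PAA, map each mean to a letter 'a', 'b', ..."""
    cuts = breakpoints(alpha)
    return "".join(chr(ord("a") + bisect_right(cuts, v))
                   for v in paa(znormalize(series), sections))


S = [1, 2, 3, 2, 4, 6, 7, 5, 3, 1, 0, 2]
for sections in (4, 3):
    print(sections, "sections: PAA =", paa(S, sections),
          "SAX(alpha=3) =", sax(S, sections, 3),
          "SAX(alpha=4) =", sax(S, sections, 4))
```

Under these assumptions, four sections give the raw PAA values 2, 4, 5, 1 and three sections give 2, 5.5, 1.5; the SAX words are acca (α = 3) and bcda (α = 4) for four sections, and aca and bda for three sections.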
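Problem 6(a) can be checked by brute force on a sequence this short. The sketch assumes that "subsequence" means a contiguous run of symbols and that overlapping occurrences count toward the frequency; other definitions (for example, non-contiguous subsequences or episodes) would need a different counting scheme. The function name is illustrative.

```python
from collections import Counter


def longest_frequent_runs(seq, min_count):
    """Return the longest contiguous runs occurring at least `min_count` times.

    Windows may overlap; ties of the same length are returned in sorted order.
    """
    for length in range(len(seq), 0, -1):
        counts = Counter(tuple(seq[i:i + length])
                         for i in range(len(seq) - length + 1))
        frequent = sorted(run for run, c in counts.items() if c >= min_count)
        if frequent:
            return frequent
    return []


S = "A B C B A A B A B C B A B A B B C B A C C".split()
print(longest_frequent_runs(S, 3))
```

Under this interpretation, the answer is the run B C B A, which occurs three times (starting at the 2nd, 9th, and 16th symbols); each of those occurrences is followed by a different symbol, so no run of length 5 reaches frequency 3.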
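For Problem 7, a normalized contiguity matrix can be built by marking with 1 every pair of cities that are horizontal or vertical neighbors in the table, then dividing each row by its sum so that every row adds up to 1. The sketch assumes the table has two rows (Minneapolis, Chicago, New York above Nashville, Louisville, Charlotte) and that "normalized" means this row standardization; GRID and the function names are illustrative, and exact fractions are used so the entries read like hand-computed answers.

```python
from fractions import Fraction

# Assumed two-row layout of the table in Problem 7.
GRID = [["Minneapolis", "Chicago", "New York"],
        ["Nashville", "Louisville", "Charlotte"]]


def contiguity(grid):
    """Binary rook contiguity: 1 where two cells share a horizontal or vertical edge."""
    cells = [(r, c) for r, row in enumerate(grid) for c in range(len(row))]
    names = [grid[r][c] for r, c in cells]
    W = [[1 if abs(r1 - r2) + abs(c1 - c2) == 1 else 0 for r2, c2 in cells]
         for r1, c1 in cells]
    return names, W


def row_normalize(W):
    """Divide each row by its sum (assumes every cell has at least one neighbor)."""
    return [[Fraction(x, sum(row)) for x in row] for row in W]


names, W = contiguity(GRID)
for name, row in zip(names, row_normalize(W)):
    print(f"{name:12}", " ".join(f"{str(x):>4}" for x in row))
```

With this layout, the corner cities each have two neighbors (weights 1/2), while Chicago and Louisville in the middle column each have three (weights 1/3).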
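Problem 8 depends on Figure 12.38, which is not reproduced here. The sketch assumes the variable names stand for the familiar Cloudy, Sprinkler, Rain, and WetGrass network (C is a parent of S and R, which are both parents of W) and reads P(C, R, W) as the probability that all three variables are true; each answer is then the joint distribution summed over the one unmentioned variable. The conditional probability values below are made-up placeholders, not the figure's numbers, and the function names are illustrative; substitute the values from Figure 12.38 to obtain the actual answers.

```python
from itertools import product

# Placeholder CPTs (hypothetical values, NOT taken from Figure 12.38).
P_C = 0.5                                    # P(C = true)
P_S = {True: 0.1, False: 0.5}                # P(S = true | C)
P_R = {True: 0.8, False: 0.2}                # P(R = true | C)
P_W = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.9, (False, False): 0.0}  # P(W = true | S, R)


def bern(p_true, value):
    """Probability of a boolean `value` when P(true) = p_true."""
    return p_true if value else 1 - p_true


def joint(c, s, r, w):
    """Factorization P(C) P(S|C) P(R|C) P(W|S,R) implied by the assumed structure."""
    return (bern(P_C, c) * bern(P_S[c], s) * bern(P_R[c], r)
            * bern(P_W[(s, r)], w))


def marginal(**fixed):
    """Sum the joint over every variable not fixed, e.g. marginal(c=True, w=True)."""
    total = 0.0
    for c, s, r, w in product((True, False), repeat=4):
        values = {"c": c, "s": s, "r": r, "w": w}
        if all(values[k] == v for k, v in fixed.items()):
            total += joint(c, s, r, w)
    return total


print("P(C, R, W) =", marginal(c=True, r=True, w=True))
print("P(C, S, W) =", marginal(c=True, s=True, w=True))
```

With these placeholder values the two sums are 0.3636 and 0.0486, up to floating-point rounding. Full enumeration is exponential in the number of variables, which is acceptable for four binary variables but not for large networks.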
12.8 REFERENCES FOR FURTHER STUDY
Aggarwal C. C., P. S. Yu, Privacy-Preserving Data Mining: Models and Algorithms, Springer, Boston, 2008.
The book proposes a number of techniques to perform the data-mining tasks in a privacy-preserving way. These techniques generally fall into the following categories: data modification techniques, cryptographic methods and protocols for data sharing, statistical techniques for disclosure and inference control, query auditing methods, and randomization and perturbation-based techniques. This edited volume contains surveys by distinguished researchers in the privacy field. Each survey includes the key research content as well as future research directions. Privacy-Preserving Data Mining: Models and Algorithms is designed for researchers, professors, and advanced-level students in computer science, and is also suitable for industry practitioners.
Chakrabarti D., C. Faloutsos, Graph Mining: Laws, Generators, and Algorithms, ACM Computing Surveys, Vol. 38, March 2006, pp. 1–69.
How does the Web look? How could we tell an abnormal social network from a normal one? These and similar questions are important in many fields where the data can intuitively be cast as a graph; examples range from computer networks to sociology to biology and many more. Indeed, any M:N relation in database terminology can be represented as a graph. A lot of these questions boil down to the following: "How can we generate synthetic but realistic graphs?" To answer this, we must first understand what patterns are common in real-world graphs and can thus be considered a mark of normality/realism. This survey gives an overview of the incredible variety of work that has been done on these problems. One of our main contributions is the integration of points of view from physics, mathematics, sociology, and computer science. Further, we briefly describe recent advances on some related and interesting graph problems.
Da Silva J. C., et al., Distributed Data Mining and Agents, Engineering Applications of Artificial Intelligence, Vol. 18, No. 7, October 2005, pp. 791–807.
Multi-Agent Systems (MAS) offer an architecture for distributed problem solving. Distributed data mining (DDM) algorithms focus on one class of such distributed problem-solving tasks: analysis and modeling of distributed data. This paper offers a perspective on DDM algorithms in the context of multi-agent systems. It broadly discusses the connection between DDM and MAS. It provides a high-level survey of DDM, then focuses on distributed clustering algorithms and some potential applications in multi-agent-based problem-solving scenarios. It reviews algorithms for distributed clustering, including privacy-preserving ones. It describes challenges for clustering in sensor-network environments, potential shortcomings of current algorithms, and corresponding directions for future work. It also discusses confidentiality (privacy preservation) and presents a new algorithm for privacy-preserving density-based clustering.
Laxman S., P. S. Sastry, A Survey of Temporal Data Mining, Sadhana, Vol. 31, Part 2, April 2006, pp. 173–198.
Data mining is concerned with analyzing large volumes of (often unstructured) data to automatically discover interesting regularities or relationships that in turn lead to better understanding of the underlying processes. The field of temporal data mining is concerned with such analysis in the case of ordered data streams with temporal