Hancom Office is a proprietary office suite that includes a word processor, spreadsheet software, presentation software, and a PDF editor as well as their online versions accessible via a web browser. It is primarily addressed to Korean users. Hancom Office is written in Java and C++ that runs on Android, iOS, macOS and Windows platforms. == Products == Hangul - Hangul is a word processor developed by Hancom. It is a product that eliminates the inconvenience of the original Hangul word processor, which was limited to Hangul cards or PC models. Originally, the name was written using the '아래아' character, a vowel letter that is obsolete in modern Korean, and it was referred to as 'HWP' (an abbreviation for Hangul Word Processor), '아래아 한글' (Arae-a Hangul), '한/글' (Han/Geul), and so on. Hangul is currently the most widely used word processor in South Korea, often used alongside Microsoft Word. HanWord - word processor compatible with Word HanCell - spreadsheet program HanShow - presentation program Hancom Office Hanword Viewer - For viewing documents created by Hancom Office or Microsoft Office
Data exploration
Data exploration is an approach similar to initial data analysis, whereby a data analyst uses visual exploration to understand what is in a dataset and the characteristics of the data, rather than through traditional data management systems. These characteristics can include size or amount of data, completeness of the data, correctness of the data, possible relationships amongst data elements or files/tables in the data. Data exploration is typically conducted using a combination of automated and manual activities. Automated activities can include data profiling or data visualization or tabular reports to give the analyst an initial view into the data and an understanding of key characteristics. This is often followed by manual drill-down or filtering of the data to identify anomalies or patterns identified through the automated actions. Data exploration can also require manual scripting and queries into the data (e.g. using languages such as SQL or R) or using spreadsheets or similar tools to view the raw data. All of these activities are aimed at creating a mental model and understanding of the data in the mind of the analyst, and defining basic metadata (statistics, structure, relationships) for the data set that can be used in further analysis. Once this initial understanding of the data is had, the data can be pruned or refined by removing unusable parts of the data (data cleansing), correcting poorly formatted elements and defining relevant relationships across datasets. This process is also known as determining data quality. Data exploration can also refer to the ad hoc querying or visualization of data to identify potential relationships or insights that may be hidden in the data and does not require to formulate assumptions beforehand. Traditionally, this had been a key area of focus for statisticians, with John Tukey being a key evangelist in the field. Today, data exploration is more widespread and is the focus of data analysts and data scientists; the latter being a relatively new role within enterprises and larger organizations. == Interactive Data Exploration == This area of data exploration has become an area of interest in the field of machine learning. This is a relatively new field and is still evolving. As its most basic level, a machine-learning algorithm can be fed a data set and can be used to identify whether a hypothesis is true based on the dataset. Common machine learning algorithms can focus on identifying specific patterns in the data. Many common patterns include regression and classification or clustering, but there are many possible patterns and algorithms that can be applied to data via machine learning. By employing machine learning, it is possible to find patterns or relationships in the data that would be difficult or impossible to find via manual inspection, trial and error or traditional exploration techniques. == Software == Trifacta – a data preparation and analysis platform Paxata – self-service data preparation software Alteryx – data blending and advanced data analytics software Microsoft Power BI - interactive visualization and data analysis tool OpenRefine - a standalone open source desktop application for data clean-up and data transformation Tableau software – interactive data visualization software
Generalized multidimensional scaling
Generalized multidimensional scaling (GMDS) is an extension of metric multidimensional scaling, in which the target space is non-Euclidean. When the dissimilarities are distances on a surface and the target space is another surface, GMDS allows finding the minimum-distortion embedding of one surface into another. GMDS is an emerging research direction. Currently, main applications are recognition of deformable objects (e.g. for three-dimensional face recognition) and texture mapping.
Latent class model
In statistics, a latent class model (LCM) is a model for clustering multivariate discrete data. It assumes that the data arise from a mixture of discrete distributions, within each of which the variables are independent. It is called a latent class model because the class to which each data point belongs is unobserved (or latent). Latent class analysis (LCA) is a subset of structural equation modeling used to find groups or subtypes of cases in multivariate categorical data. These groups or subtypes of cases are called "latent classes". When faced with the following situation, a researcher might opt to use LCA to better understand the data: Symptoms a, b, c, and d have been recorded in a variety of patients diagnosed with diseases X, Y, and Z. Disease X is associated with symptoms a, b, and c; disease Y is linked to symptoms b, c, and d; and disease Z is connected to symptoms a, c, and d. In this context, the LCA would attempt to detect the presence of latent classes (i.e., the disease entities), thus creating patterns of association in the symptoms. As in factor analysis, LCA can also be used to classify cases according to their maximum likelihood class membership probability. The key criterion for resolving the LCA is identifying latent classes in which the observed symptom associations are effectively rendered null. This is because within each class, the diseases responsible for the symptoms create a structure of dependencies. As a result, the symptoms become conditionally independent, meaning that, given the class a case belongs to, the symptoms are no longer related to one another. == Model == Within each latent class, the observed variables are statistically independent—an essential aspect of latent class modeling. Usually, the observed variables are statistically dependent. By introducing the latent variable, independence is restored in the sense that within classes, variables are independent (local independence). Therefore, the association between the observed variables is explained by the classes of the latent variable (McCutcheon, 1987). In one form, the LCM is written as p i 1 , i 2 , … , i N ≈ ∑ t T p t ∏ n N p i n , t n , {\displaystyle p_{i_{1},i_{2},\ldots ,i_{N}}\approx \sum _{t}^{T}p_{t}\,\prod _{n}^{N}p_{i_{n},t}^{n},} where T {\displaystyle T} is the number of latent classes and p t {\displaystyle p_{t}} are the so-called recruitment or unconditional probabilities that should sum to one. p i n , t n {\displaystyle p_{i_{n},t}^{n}} are the marginal or conditional probabilities. For a two-way latent class model, the form is p i j ≈ ∑ t T p t p i t p j t . {\displaystyle p_{ij}\approx \sum _{t}^{T}p_{t}\,p_{it}\,p_{jt}.} This two-way model is related to probabilistic latent semantic analysis and non-negative matrix factorization. The probability model used in LCA is closely related to the Naive Bayes classifier. The main difference is that in LCA, the class membership of an individual is a latent variable, whereas in Naive Bayes classifiers, the class membership is an observed label. == Related methods == There are a number of methods with distinct names and uses that share a common relationship. Cluster analysis is, like LCA, used to discover taxon-like groups of cases in data. Multivariate mixture estimation (MME) is applicable to continuous data and assumes that such data arise from a mixture of distributions, such as a set of heights arising from a mixture of men and women. If a multivariate mixture estimation is constrained so that measures must be uncorrelated within each distribution, it is termed latent profile analysis. Modified to handle discrete data, this constrained analysis is known as LCA. Discrete latent trait models further constrain the classes to form from segments of a single dimension, allocating members to classes based on that dimension. An example would be assigning cases to social classes based on ability or merit. In a practical instance, the variables could be multiple choice items of a political questionnaire. In this case, the data consists of an N-way contingency table with answers to the items for a number of respondents. In this example, the latent variable refers to political opinion, and the latent classes to political groups. Given group membership, the conditional probabilities specify the chance that certain answers are chosen. == Application == LCA may be used in many fields, such as: collaborative filtering, Behavior Genetics and Evaluation of diagnostic tests.
Multiple discriminant analysis
Multiple Discriminant Analysis (MDA) is a multivariate dimensionality reduction technique. It has been used to predict signals as diverse as neural memory traces and corporate failure. MDA is not directly used to perform classification. It merely supports classification by yielding a compressed signal amenable to classification. The method described in Duda et al. (2001) §3.8.3 projects the multivariate signal down to an M−1 dimensional space where M is the number of categories. MDA is useful because most classifiers are strongly affected by the curse of dimensionality. In other words, when signals are represented in very-high-dimensional spaces, the classifier's performance is catastrophically impaired by the overfitting problem. This problem is reduced by compressing the signal down to a lower-dimensional space as MDA does. MDA has been used to reveal neural codes.
Generative design
Generative design is an iterative design process that uses software to generate outputs that fulfill a set of constraints iteratively adjusted by a designer. Whether a human, test program, or artificial intelligence, the designer algorithmically or manually refines the feasible region of the program's inputs and outputs with each iteration to fulfill evolving design requirements. By employing computing power to evaluate more design permutations than a human alone is capable of, the process is capable of producing an optimal design that mimics nature's evolutionary approach to design through genetic variation and selection. The output can be images, sounds, architectural models, animation, and much more. It is, therefore, a fast method of exploring design possibilities that is used in various design fields such as art, architecture, communication design, and product design. Generative design has become more important, largely due to new programming environments or scripting capabilities that have made it relatively easy, even for designers with little programming experience, to implement their ideas. Additionally, this process can create solutions to substantially complex problems that would otherwise be resource-exhaustive with an alternative approach, making it a more attractive option for problems with a large or unknown solution set. It is also facilitated with tools in commercially available CAD packages. Not only are implementation tools more accessible, but also tools leveraging generative design as a foundation. Recent advancements have led to the development of Deep Generative Design, a framework that integrates topology optimization with deep learning models, such as Generative Adversarial Networks (GANs). Unlike traditional evolutionary methods that primarily focus on engineering performance, this approach uses deep generative models to enhance aesthetic diversity and novelty while simultaneously satisfying engineering constraints. For instance, research by Oh et al. (2019) proposed a framework using Boundary Equilibrium GANs (BEGAN) to generate diverse design options which are then refined through density-based topology optimization, allowing for the exploration of complex design spaces that balance structural integrity with visual variation. In practice, generative design does not solely aim to produce a single optimal solution, but involves iteratively refining the design problem by modifying parameters, constraints, and evaluation criteria within a computational model, resulting in multiple design alternatives from which the designer selects. == Use in architecture == Generative design in architecture is an iterative design process that enables architects to explore a wider solution space with more possibility and creativity. Architectural design has long been regarded as a wicked problem. Compared with traditional top-down design approach, generative design can address design problems efficiently, by using a bottom-up paradigm that uses parametric-defined rules to generate complex solutions. The solution itself then evolves to a good, if not optimal, solution. The advantage of using generative design as a design tool is that it does not construct fixed geometries, but take a set of design rules that can generate an infinite set of possible design solutions. The generated design solutions can be more sensitive, responsive, and adaptive to the problem. Generative design involves rule definition and result analysis that are integrated with the design process. By defining parameters and rules, the generative approach is able to provide optimized solution for both structural stability and aesthetics. Possible design algorithms include cellular automata, shape grammar, genetic algorithm, space syntax, and most recently, artificial neural network. Due to the high complexity of the solution generated, rule-based computational tools, such as finite element method and topology optimisation, are preferred to evaluate and optimise the generated solution. The iterative process provided by computer software enables the trial-and-error approach in design, and involves architects interfering with the optimisation process. Historically precedent work includes Antoni Gaudí's Sagrada Família, which used rule based geometrical forms for structures, and Buckminster Fuller's Montreal Biosphere where the rules were designed to generate individual components, rather than the final product. More recent generative-design cases include Foster and Partners' Queen Elizabeth II Great Court, where the tessellated glass roof was designed using a geometric schema to define hierarchical relationships, and then the generated solution was optimized based on geometrical and structural requirements. == Use in sustainable design == Generative design in sustainable design is an effective approach addressing energy efficiency and climate change at the early design stage, recognizing buildings contribute to approximately one-third of global greenhouse gas emissions and 30%-40% of total building energy use. It integrates environmental principles with algorithms, enabling exploration of countless design alternatives to enhance energy performance, reduce carbon footprints, and minimize waste. A key feature of generative design in sustainable design is its ability to incorporate Building Performance Simulations (BPS) into the design process. Simulation programs such as EnergyPlus, Ladybug Tools,, and so on, combined with generative algorithms, can optimize design solutions for cost-effective energy use and zero-carbon building designs. For example, the GENE_ARCH system used a Pareto algorithm with building energy simulation for the whole building design optimization. Generative design has improved sustainable facade design, as illustrated by the algorithm of cellular automata and daylight simulations in adaptive facade design. In addition, genetic algorithms were used with radiation simulations for energy-efficient photo-voltaic (PV) modules on high-rise building facades. Generative design is also applied to life cycle analysis (LCA), as demonstrated by a framework using grid search algorithms to optimize exterior wall design for minimum environmental impact. Multi-objective optimization embraces multiple diverse sustainability goals, such as interactive kinetic louvers using biomimicry and daylight simulations to enhance daylight, visual comfort, and energy efficiency. The study of PV and shading systems can maximize on-site electricity, improve visual quality, and daylight performance. Artificial intelligence (AI) and machine learning (ML) further improve computation efficiency in complex climate-responsive sustainable design. One study employed reinforcement learning to identify the relationship between design parameters and energy use for a sustainable campus, while other studies tried hybrid algorithms, such as using the genetic algorithm and GANs to balance daylight illumination and thermal comfort under different roof conditions. Other popular AI tools were also integrated, including deep reinforcement learning (DRL) and computer vision (CV), to generate an urban block according to direct sunlight hours and solar heat gains. These AI-driven generative design methods enable faster simulations and design decision making, resulting in designs that are environmentally responsible. == Use in additive manufacturing == Additive manufacturing (AM) is a process that creates physical models directly from three-dimensional (3D) data by joining materials layer by layer. It is used in industries to produce a variety of end-use parts, which are final components designed for direct application in products or systems. AM provides design flexibility and enables material reduction in lightweight applications, such as aerospace, automotive, medical, and portable electronic devices, where minimizing weight is critical for performance. Generative design, one of the four key methods for lightweight design in AM, is commonly applied to optimize structures for specific performance requirements. Generative design can help create optimized solutions that balance multiple objectives, such as enhancing performance while minimizing cost. In design for additive manufacturing (DfAM), multi-objective topology optimization is used to generate a set of candidate solutions. Designers then assess these options using their expertise and key performance indicators (KPIs) to select the best option for implementation. However, integrating AM constraints (e.g., speed of build, materials, build envelope, and accuracy) into generative design remains challenging, as ensuring all solutions are valid is complex. Balancing multiple design objectives while limiting computational costs adds further challenges for designers. To overcome these difficulties, researchers proposed a generative design method with manufacturing validation to improve decision-making efficiency. This method starts with a cons
Abess
abess (Adaptive Best Subset Selection, also ABESS) is a machine learning method designed to address the problem of best subset selection. It aims to determine which features or variables are crucial for optimal model performance when provided with a dataset and a prediction task. abess was introduced by Zhu in 2020 and it dynamically selects the appropriate model size adaptively, eliminating the need for selecting regularization parameters. abess is applicable in various statistical and machine learning tasks, including linear regression, the Single-index model, and other common predictive models. abess can also be applied in biostatistics. == Basic Form == The basic form of abess is employed to address the optimal subset selection problem in general linear regression. abess is an l 0 {\displaystyle l_{0}} method, it is characterized by its polynomial time complexity and the property of providing both unbiased and consistent estimates. In the context of linear regression, assuming we have knowledge of n {\displaystyle n} independent samples ( x i , y i ) , i = 1 , … , n {\displaystyle (x_{i},y_{i}),i=1,\ldots ,n} , where x i ∈ R p × 1 {\displaystyle x_{i}\in \mathbb {R} ^{p\times 1}} and y i ∈ R {\displaystyle y_{i}\in \mathbb {R} } , we define X = ( x 1 , … , x n ) ⊤ {\displaystyle X=(x_{1},\ldots ,x_{n})^{\top }} and y = ( y 1 , … , y n ) ⊤ {\displaystyle y=(y_{1},\ldots ,y_{n})^{\top }} . The following equation represents the general linear regression model: y = X β + ε . {\displaystyle y=X\beta +\varepsilon .} To obtain appropriate parameters β {\displaystyle \beta } , one can consider the loss function for linear regression: L n LR ( β ; X , y ) = 1 2 n ‖ y − X β ‖ 2 2 . {\displaystyle {\mathcal {L}}_{n}^{\text{LR}}(\beta ;X,y)={\frac {1}{2n}}\|y-X\beta \|_{2}^{2}.} In abess, the initial focus is on optimizing the loss function under the l 0 {\displaystyle l_{0}} constraint. That is, we consider the following problem: min β ∈ R p × 1 L n LR ( β ; X , y ) , subject to ‖ β ‖ 0 ≤ s , {\displaystyle \min _{\beta \in \mathbb {R} ^{p\times 1}}{\mathcal {L}}_{n}^{\text{LR}}(\beta ;X,y),{\text{ subject to }}\|\beta \|_{0}\leq s,} where s {\displaystyle s} represents the desired size of the support set, and ‖ β ‖ 0 = ∑ i = 1 p I ( β i ≠ 0 ) {\displaystyle \|\beta \|_{0}=\sum _{i=1}^{p}{\mathcal {I}}_{(\beta _{i}\neq 0)}} is the l 0 {\displaystyle l_{0}} norm of the vector. To address the optimization problem described above, abess iteratively exchanges an equal number of variables between the active set and the inactive set. In each iteration, the concept of sacrifice is introduced as follows: For j in the active set ( j ∈ A ^ {\displaystyle j\in {\hat {\mathcal {A}}}} ): ξ j = L n LR ( β ^ A ∖ { j } ) − L n LR ( β ^ A ) = X j ⊤ X j 2 n ( β ^ j ) 2 {\displaystyle \xi _{j}={\mathcal {L}}_{n}^{\text{LR}}\left({\hat {\boldsymbol {\beta }}}^{{\mathcal {A}}\backslash \{j\}}\right)-{\mathcal {L}}_{n}^{\text{LR}}\left({\hat {\boldsymbol {\beta }}}^{\mathcal {A}}\right)={\frac {{\boldsymbol {X}}_{j}^{\top }{\boldsymbol {X}}_{j}}{2n}}\left({\hat {\beta }}_{j}\right)^{2}} For j in the inactive set ( j ∉ A ^ {\displaystyle j\notin {\hat {\mathcal {A}}}} ): ξ j = L n LR ( β ^ A ) − L n LR ( β ^ A + t ^ { j } ) = X j ⊤ X j 2 n ( d ^ j X j ⊤ X j / n ) 2 {\displaystyle \xi _{j}={\mathcal {L}}_{n}^{\text{LR}}\left({\hat {\boldsymbol {\beta }}}^{\mathcal {A}}\right)-{\mathcal {L}}_{n}^{\text{LR}}\left({\hat {\boldsymbol {\beta }}}^{\mathcal {A}}+{\hat {\boldsymbol {t}}}^{\{j\}}\right)={\frac {{\boldsymbol {X}}_{j}^{\top }{\boldsymbol {X}}_{j}}{2n}}\left({\frac {{\hat {\mathrm {d} }}_{j}}{{\boldsymbol {X}}_{j}^{\top }{\boldsymbol {X}}_{j}/n}}\right)^{2}} Here are the key elements in the above equations: β ^ A {\displaystyle {\hat {\beta }}^{\mathcal {A}}} : This represents the estimate of β {\displaystyle \beta } obtained in the previous iteration. A ^ {\displaystyle {\hat {\mathcal {A}}}} : It denotes the estimated active set from the previous iteration. β ^ A ∖ { j } {\displaystyle {\hat {\boldsymbol {\beta }}}^{{\mathcal {A}}\backslash \{j\}}} : This is a vector where the j-th element is set to 0, while the other elements are the same as β ^ A {\displaystyle {\hat {\beta }}^{\mathcal {A}}} . t ^ { j } = arg min t L n LR ( β ^ A + t { j } ) {\displaystyle {\hat {\boldsymbol {t}}}^{\{j\}}=\arg \min _{t}{\mathcal {L}}_{n}^{\text{LR}}\left({\hat {\boldsymbol {\beta }}}^{\mathcal {A}}+{\boldsymbol {t}}^{\{j\}}\right)} : Here, t { j } {\displaystyle t^{\{j\}}} represents a vector where all elements are 0 except the j-th element. d ^ j = X j ⊤ ( y − X β ^ ) / n {\displaystyle {\hat {d}}_{j}={\boldsymbol {X}}_{j}^{\top }({\boldsymbol {y}}-{\boldsymbol {X}}{\hat {\boldsymbol {\beta }}})/n} : This is calculated based on the equation mentioned. The iterative process involves exchanging variables, with the aim of minimizing the sacrifices in the active set while maximizing the sacrifices in the inactive set during each iteration. This approach allows abess to efficiently search for the optimal feature subset. In abess, select an appropriate s max {\displaystyle s_{\max }} and optimize the above problem for active sets size s = 1 , … , s max {\displaystyle s=1,\ldots ,s_{\max }} using the information criterion GIC = n log L n LR + s log p log log n , {\displaystyle {\text{GIC}}=n\log {\mathcal {L}}_{n}^{\text{LR}}+s\log p\log \log n,} to adaptively choose the appropriate active set size s {\displaystyle s} and obtain its corresponding abess estimator. == Generalizations == The splicing algorithm in abess can be employed for subset selection in other models. === Distribution-Free Location-Scale Regression === In 2023, Siegfried extends abess to the case of Distribution-Free and Location-Scale. Specifically, it considers the optimization problem max ϑ ∈ R P , β ∈ R J , γ ∈ R J ∑ i = 1 N ℓ i ( ϑ , x i ⊤ β , exp ( x i ⊤ γ ) − 1 ) , {\displaystyle \max _{{\boldsymbol {\vartheta }}\in \mathbb {R} ^{P},{\boldsymbol {\beta }}\in \mathbb {R} ^{J},{\boldsymbol {\gamma }}\in \mathbb {R} ^{J}}\sum _{i=1}^{N}\ell _{i}\left({\boldsymbol {\vartheta }},{\boldsymbol {x}}_{i}^{\top }{\boldsymbol {\beta }},{\sqrt {\exp \left({\boldsymbol {x}}_{i}^{\top }{\boldsymbol {\gamma }}\right)}}^{-1}\right),} subject to ‖ ( β ⊤ , γ ⊤ ) ⊤ ‖ 0 ≤ s , {\displaystyle \left\|\left({\boldsymbol {\beta }}^{\top },{\boldsymbol {\gamma }}^{\top }\right)^{\top }\right\|_{0}\leq s,} where ℓ i {\displaystyle \ell _{i}} is a loss function, ϑ {\displaystyle {\boldsymbol {\vartheta }}} is a parameter vector, β {\displaystyle {\boldsymbol {\beta }}} and γ {\displaystyle {\boldsymbol {\gamma }}} are vectors, and x i {\displaystyle {\boldsymbol {x}}_{i}} is a data vector. This approach, demonstrated across various applications, enables parsimonious regression modeling for arbitrary outcomes while maintaining interpretability through innovative subset selection procedures. === Groups Selection === In 2023, Zhang applied the splicing algorithm to group selection, optimizing the following model: min β ∈ R p L n LR ( β ; X , y ) subject to ∑ j = 1 J I ( ‖ β G j ‖ 2 ≠ 0 ) ≤ s {\displaystyle \min _{{\boldsymbol {\beta }}\in \mathbb {R} ^{p}}{\mathcal {L}}_{n}^{\text{LR}}(\beta ;X,y){\text{ subject to }}\sum _{j=1}^{J}I\left(\|{\boldsymbol {\beta }}_{G_{j}}\|_{2}\neq 0\right)\leq s} Here are the symbols involved: J {\displaystyle J} : Total number of feature groups, representing the existence of J {\displaystyle J} non-overlapping feature groups in the dataset. G j {\displaystyle G_{j}} : Index set for the j {\displaystyle j} -th feature group, where j {\displaystyle j} ranges from 1 to J {\displaystyle J} , representing the feature grouping structure in the data. s {\displaystyle s} : Model size, a positive integer determined from the data, limiting the number of selected feature groups. === Regression with Corrupted Data === Zhang applied the splicing algorithm to handle corrupted data. Corrupted data refers to information that has been disrupted or contains errors during the data collection or recording process. This interference may include sensor inaccuracies, recording errors, communication issues, or other external disturbances, leading to inaccurate or distorted observations within the dataset. === Single Index Models === In 2023, Tang applied the splicing algorithm to optimal subset selection in the Single-index model. The form of the Single Index Model (SIM) is given by y i = g ( b ⊤ x i , e i ) , i = 1 , … , n , {\displaystyle y_{i}=g({\boldsymbol {b}}^{\top }{\boldsymbol {x}}_{i},e_{i}),\quad i=1,\ldots ,n,} where b {\displaystyle {\boldsymbol {b}}} is the parameter vector, e i {\displaystyle e_{i}} is the error term. The corresponding loss function is defined as l n ( β ) = ∑ i = 1 n ( r i n − 1 2 − x i ⊤ β ) 2 , {\displaystyle l_{n}({\boldsymbol {\beta }})=\sum _{i=1}^{n}\left({\frac {r_{i}}{n}}-{\frac {1}{2}}-{\boldsymbol {x}}_{i}^{\top }{\boldsymbol {\beta }}\right)^{2},} where r {\disp