# PyUCC User Manual

**Document Version:** 1.0
**Application:** PyUCC (Python Unified Code Counter)

---

## 1. Introduction
**PyUCC** is an advanced static code analysis tool. It produces quantitative metrics for a codebase and, crucially, tracks how the code evolves over time through its **Differing** (comparison) system.

### What is it for?
1. **Counting:** Knowing exactly how many lines of code, comments, and blank lines make up your project.
2. **Measuring:** Calculating software complexity and maintainability.
3. **Comparing:** Understanding exactly what changed between two versions (added, removed, and modified files, and how complexity has shifted).

---

## 2. Core Concepts
Before starting, it is useful to understand the key terms used in the application.

### 2.1 Baseline
A **Baseline** is a "snapshot" of your project at a specific moment in time.
* When you create a baseline, PyUCC saves a copy of the files and calculates all metrics.
* Baselines serve as reference points (benchmarks) for future comparisons.

### 2.2 Supported Metrics
* **SLOC (Source Lines of Code):**
    * *Physical Lines:* Total lines in the file.
    * *Code Lines:* Lines containing executable code.
    * *Comment Lines:* Lines containing comments or documentation.
    * *Blank Lines:* Empty lines (often used for formatting).
* **Cyclomatic Complexity (CC):** Measures control-flow complexity by counting decision points (`if`, `for`, `while` statements, etc.). **Lower is better.**
* **Maintainability Index (MI):** An index from 0 to 100 estimating how easy the code is to maintain. **Higher is better** (above 85 is excellent, below 65 is problematic).

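
The SLOC categories above can be illustrated with a minimal sketch. It is not PyUCC's counter: `count_lines` is a hypothetical helper that only recognizes `#` comments and treats docstrings and lines with trailing comments as code.

```python
def count_lines(source: str) -> dict:
    """Classify each physical line as code, comment, or blank (simplified)."""
    counts = {"physical": 0, "code": 0, "comment": 0, "blank": 0}
    for line in source.splitlines():
        counts["physical"] += 1
        stripped = line.strip()
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith("#"):
            counts["comment"] += 1
        else:
            # Anything else counts as code, including lines with inline comments.
            counts["code"] += 1
    return counts


sample = "# header\n\nx = 1  # inline\nprint(x)\n"
print(count_lines(sample))
# {'physical': 4, 'code': 2, 'comment': 1, 'blank': 1}
```

A production counter also has to handle block comments, docstrings, and the comment syntax of each supported language, which this sketch deliberately leaves out.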
### 2.3 Profile
A **Profile** is a saved configuration that tells PyUCC:
* Which folders to analyze.
* Which languages to include (e.g., Python and C++ only).
* What to ignore (e.g., `venv`, `build` folders, temporary files).

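
To make the idea concrete, here is one way such a profile could be modeled and saved. This is only a sketch: the `Profile` class, its field names, and the JSON layout are assumptions for illustration, not PyUCC's actual storage format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Profile:
    # All field names are hypothetical, chosen to mirror the three bullets above.
    name: str
    paths: list = field(default_factory=list)
    extensions: list = field(default_factory=list)
    ignore_patterns: list = field(default_factory=list)


profile = Profile(
    name="My Backend Project",
    paths=["/home/user/projects/backend"],
    extensions=[".py", ".cpp"],
    ignore_patterns=["venv", "build", "*.tmp"],
)

# Round-trip through JSON, as a saved configuration would be.
serialized = json.dumps(asdict(profile), indent=2)
restored = Profile(**json.loads(serialized))
print(restored == profile)  # True
```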

---

## 3. User Interface (GUI)
The interface is divided into functional zones to keep the workflow organized.

1. **Top Bar:**
    * **Profile** selection.
    * Access to **Settings** and Profile Manager (**Manage**).
2. **Actions Bar:** The main buttons to start operations (`Scan`, `Countings`, `Metrics`, `Differing`).
3. **Progress Area:** Progress bar and file counter.
4. **Results Table:** The large central table where data appears.
5. **Log & Status:** At the bottom, a log panel to see what is happening and a status bar monitoring system resources (CPU/RAM).

---

## 4. Step-by-Step Guide

### 4.1 First Run & Profile Configuration
The first thing to do upon opening PyUCC is to define *what* to analyze.

1. Click on **⚙️ Manage...** in the top bar.
2. Click on **📝 New** to clear the fields.
3. Enter a **Name** for the profile (e.g., "My Backend Project").
4. In the **Paths** section, use **Add Folder** to select your code's root directory.
5. In the **Filter Extensions** section, select the languages you are interested in (e.g., Python, Java).
6. In the **Ignore patterns** box, you can keep the defaults (which already exclude `.git`, `__pycache__`, etc.) or add your own patterns.
7. Click **💾 Save**.

### 4.2 Simple Analysis (Scan, Countings, Metrics)
If you only want to analyze the current state without comparisons:

* **🔍 Scan:** Lists the files that match the profile filters, without analyzing them. Useful to check that you are including the right files.
* **🔢 Countings:** Analyzes every file and reports how many code, comment, and blank lines it contains.
* **📊 Metrics:** Calculates Cyclomatic Complexity and Maintainability Index for each file.

> **Tip:** You can double-click on a file in the results table to open it in the built-in **File Viewer**, which provides syntax highlighting and a colored minimap (blue=code, green=comments).

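
What a scan does can be sketched with the standard library alone. The `scan` function below is a hypothetical stand-in, not PyUCC's scanner: it assumes ignore patterns are shell-style globs matched against individual file and folder names.

```python
import fnmatch
import os
import tempfile


def scan(root, extensions, ignore_patterns):
    """Return sorted relative paths under root that pass the filters."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames
                       if not any(fnmatch.fnmatch(d, p) for p in ignore_patterns)]
        for name in filenames:
            if any(fnmatch.fnmatch(name, p) for p in ignore_patterns):
                continue
            if os.path.splitext(name)[1].lower() in extensions:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                found.append(rel.replace(os.sep, "/"))
    return sorted(found)


# Build a throwaway tree to demonstrate the filters.
with tempfile.TemporaryDirectory() as root:
    for rel in ["app/main.py", "app/notes.txt", "venv/lib.py", "app/old.py.bak"]:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    result = scan(root, {".py"}, ["venv", "*.bak"])

print(result)  # ['app/main.py']
```

Pruning `dirnames` in place is what keeps large ignored folders cheap: their contents are never visited at all.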
### 4.3 The "Differing" Workflow (Comparison)
This is PyUCC's most powerful feature.

**Step A: Create the First Baseline**

1. Select your profile.
2. Click on **🔀 Differing**.
3. If this is the first time you have analyzed this project, PyUCC will notify you: *"No baseline found"*.
4. Confirm creation. PyUCC will take a "snapshot" of the project (Baseline).

**Step B: Work on the Code**

Now you can close PyUCC and work on your code (modify files, add new ones, delete old ones).

**Step C: Compare**

1. Reopen PyUCC and select the same profile.
2. Click on **🔀 Differing**.
3. This time, PyUCC detects an existing Baseline and, if you have several, asks which one to compare against.
4. The result is a table with specific color coding:
    * **Green:** Added files or improved metrics.
    * **Red:** Deleted files or worsened metrics (e.g., increased complexity).
    * **Yellow/Orange:** Modified files.
    * **Δ (Delta) Columns:** Show numerical differences (e.g., `+50` code lines, `-2` complexity).

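
The Δ columns are simply "current minus baseline" for each metric. A minimal sketch, assuming each snapshot is a plain dictionary of metric values (the helper names and the `code_lines` key are hypothetical):

```python
def metric_deltas(baseline: dict, current: dict) -> dict:
    """Return current - baseline for every metric present in both snapshots."""
    return {key: round(current[key] - baseline[key], 2)
            for key in baseline if key in current}


def format_delta(value) -> str:
    """Render a delta with an explicit sign, as in `+50` or `-2`."""
    return f"{value:+g}" if value else "0"


baseline = {"code_lines": 110, "comment_lines": 11, "avg_cc": 2.4}
current = {"code_lines": 120, "comment_lines": 10, "avg_cc": 2.3}

deltas = metric_deltas(baseline, current)
print({key: format_delta(value) for key, value in deltas.items()})
# {'code_lines': '+10', 'comment_lines': '-1', 'avg_cc': '-0.1'}
```

Whether a delta counts as an improvement depends on the metric: a negative `avg_cc` is good, while a negative MI would be a regression.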

> **Diff Viewer:** If you double-click a row in the Differing results, a window opens showing the two files side by side, highlighting exactly which lines changed.

---

## 5. Example Use Cases

### Case 1: Refactoring
* **Goal:** You want to clean up code and ensure you haven't increased complexity.
* **Action:** Create a Baseline before starting. Perform the refactoring. Run *Differing*.
* **Verification:** Check the **Δ avg_cc** column. A negative value (e.g., `-0.5`) means you reduced complexity. A positive **Δ comment_lines** means you added comment lines, which usually indicates better documentation.

### Case 2: Code Review
* **Goal:** A colleague added a new feature. What changed?
* **Action:** Run *Differing* against the previous master/main version.
* **Verification:** Sort by "Status" to immediately see **Added** (A) and **Modified** (M) files. Open the Diff Viewer on modified files to inspect specific lines.

---

## 6. Development Philosophy (For Developers)

PyUCC was built following rigorous software engineering principles, which are reflected in its stability and usability.

### 6.1 Clean Code & PEP8 Standards
The code adheres to the Python **PEP8** standard. If you ever want to extend the tool or write automation scripts using the `core` modules, you will find readable, standardized, and predictable code.

### 6.2 Separation of Concerns (SoC)
The application is strictly divided into two parts:
1. **Core (`pyucc.core`):** Contains pure logic (scanning, metric calculation, diff algorithms). It knows nothing about the GUI.
2. **GUI (`pyucc.gui`):** Handles only visualization and user interaction.

**Philosophy:** This makes it possible to change the interface without breaking the logic, or to use the logic from the command line without launching the GUI.

### 6.3 Non-Blocking UI (Worker Manager)
You may notice the interface never freezes, even when analyzing thousands of files.
This is thanks to the **WorkerManager**. All heavy operations are executed in separate background threads. The GUI receives updates via a thread-safe `queue`.
* **User Benefit:** You can always press "Cancel" if an operation takes too long.

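
The pattern described above can be sketched with `threading` and `queue` from the standard library. This is not the WorkerManager itself: the `worker` function and the message tuples are assumptions made for illustration.

```python
import queue
import threading


def worker(files, updates, cancel):
    """Process files in the background, reporting progress through a queue."""
    for index, name in enumerate(files, start=1):
        if cancel.is_set():              # cooperative cancellation check
            updates.put(("cancelled", index - 1))
            return
        # ... heavy per-file analysis would happen here ...
        updates.put(("progress", index))
    updates.put(("done", len(files)))


updates = queue.Queue()                  # thread-safe channel to the GUI side
cancel = threading.Event()               # set by a "Cancel" button
thread = threading.Thread(target=worker,
                          args=(["a.py", "b.py", "c.py"], updates, cancel))
thread.start()
thread.join()                            # a real GUI would poll instead of joining

messages = []
while not updates.empty():
    messages.append(updates.get_nowait())
print(messages)
# [('progress', 1), ('progress', 2), ('progress', 3), ('done', 3)]
```

In a GUI, the main thread would drain the queue from a periodic timer callback, so widgets are only ever touched from the thread that owns them.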
### 6.4 Intelligent Matching Algorithm (Gale-Shapley)
In *Differing*, PyUCC doesn't just check if filenames are identical. It uses an algorithm inspired by the "Stable Marriage Problem" (Gale-Shapley) combined with Levenshtein distance on paths.
* **Philosophy:** If you move a file from one folder to another, the system attempts to recognize it as the *same* file moved, rather than marking one as "Deleted" and one as "Added".

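
A rough sketch of how the two ideas combine: old paths "propose" to new paths in order of increasing edit distance, and each new path keeps its closest proposer. The function names are hypothetical, and PyUCC's real matcher may use different preferences, tie-breaking, or thresholds.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def match_paths(old_paths, new_paths):
    """Gale-Shapley: old paths propose to new paths, closest path first."""
    dist = {(o, n): levenshtein(o, n) for o in old_paths for n in new_paths}
    prefs = {o: sorted(new_paths, key=lambda n: dist[o, n]) for o in old_paths}
    next_choice = {o: 0 for o in old_paths}
    engaged = {}                       # new path -> old path currently holding it
    free = list(old_paths)
    while free:
        old = free.pop()
        if next_choice[old] >= len(prefs[old]):
            continue                   # no candidates left: stays unmatched
        new = prefs[old][next_choice[old]]
        next_choice[old] += 1
        rival = engaged.get(new)
        if rival is None:
            engaged[new] = old
        elif dist[old, new] < dist[rival, new]:
            engaged[new] = old         # the new path prefers the closer proposer
            free.append(rival)
        else:
            free.append(old)
    return {old: new for new, old in engaged.items()}


old = ["src/utils/io.py", "src/main.py"]
new = ["src/main.py", "src/helpers/io.py"]
print(match_paths(old, new))
```

Here `src/utils/io.py` is paired with `src/helpers/io.py` and reported as moved. A practical matcher would also reject pairs whose distance is too large, so unrelated files are not forced together.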
### 6.5 Determinism
The system uses content hashing (SHA1/MD5) to optimize calculations (caching) and to determine whether a file has *truly* changed. If the content is identical, a different filesystem modification timestamp is ignored.

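
The effect is easy to demonstrate with `hashlib`. In this sketch, `content_hash` is a hypothetical helper: touching the file's timestamp leaves the hash unchanged, while editing the content changes it.

```python
import hashlib
import os
import tempfile


def content_hash(path: str) -> str:
    """SHA-1 of the file's bytes, read in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "module.py")
    with open(path, "w") as handle:
        handle.write("x = 1\n")
    before = content_hash(path)

    os.utime(path, (1_000_000_000, 1_000_000_000))  # change timestamps only
    touched = content_hash(path)

    with open(path, "a") as handle:                 # change the content
        handle.write("y = 2\n")
    edited = content_hash(path)

print(before == touched, before == edited)  # True False
```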

---

## 7. Troubleshooting Common Issues

* **Program finds no files:** Check in the Profile Manager that the file extension is selected in the language list and that the folder is not excluded by "Ignore patterns".
* **Extreme slowness:** If you included folders with thousands of small non-code files (e.g., `node_modules` or image assets), add them to "Ignore patterns".
* **Empty Diff Viewer:** Ensure the source files still exist on disk. If you deleted the project folder after creating the baseline, the viewer cannot display the current file.

---

## 8. New Features (Since v1.0)

This release adds several capabilities that improve code-quality analysis, reproducibility of baselines, and duplicate detection across a codebase. Below is a concise description of what changed and how to use the new features.

### 8.1 Duplicate Detection (GUI + CLI)
- **What it does:** Finds exact and fuzzy duplicates across the project. Exact duplicates are detected by content hashing (SHA1). Fuzzy duplicates are detected by k-gram fingerprinting with a winnowing step, and ranked by the Jaccard similarity of their fingerprint sets.
- **Parameters:** `k` (k-gram size), `window` (winnowing window), and `threshold` (percent similarity). Defaults are chosen for balanced precision/recall but can be adjusted.
- **How to run (GUI):** Use the new **Duplicates** button in the Actions bar (it appears before the **Differing** button). A dialog lets you choose extensions, the similarity threshold, and fingerprinting parameters. Settings persist between runs.
- **How to run (CLI):** `python -m pyucc duplicates <path> --threshold 5.0 --ext .py .c` prints a JSON structure with duplicates found.
- **Exports:** Results can be exported to CSV and to a UCC-style textual report placed inside baseline folders (when run during baseline creation).

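
The fuzzy-duplicate pipeline (k-grams, winnowing, Jaccard) can be sketched as follows. The function names, the whitespace normalization, and the values `k=5` and `window=4` are illustrative assumptions, not PyUCC's implementation or defaults.

```python
import hashlib


def kgram_hashes(text: str, k: int) -> list:
    """Hash every k-character window of the whitespace-stripped text."""
    normalized = "".join(text.split())
    return [int(hashlib.sha1(normalized[i:i + k].encode()).hexdigest()[:8], 16)
            for i in range(len(normalized) - k + 1)]


def winnow(hashes: list, window: int) -> set:
    """Keep the minimum hash of each sliding window as the fingerprint set."""
    if len(hashes) < window:
        return set(hashes)
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}


def similarity(a: str, b: str, k: int = 5, window: int = 4) -> float:
    """Jaccard similarity of the two fingerprint sets, as a percentage."""
    fa = winnow(kgram_hashes(a, k), window)
    fb = winnow(kgram_hashes(b, k), window)
    if not fa and not fb:
        return 100.0
    return 100.0 * len(fa & fb) / len(fa | fb)


original = "def add(a, b):\n    return a + b\n"
reformatted = "def add(a, b):\n    return a + b  \n\n"
different = "class Stack:\n    items = []\n"

print(similarity(original, reformatted))  # 100.0 (only whitespace differs)
print(similarity(original, different))
```

Winnowing keeps only a subset of the k-gram hashes, which shrinks the fingerprint sets while still guaranteeing that sufficiently long shared runs produce at least one common fingerprint.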
### 8.2 UCC-style Duplicate and Differ Reports
- **Compact UCC-style table:** Differ now produces a compact table compatible with UCC-like output, including additional Δ (delta) columns: `ΔCode`, `ΔComm`, `ΔBlank`, `ΔFunc`, `ΔAvgCC`, `ΔMI`. This helps quickly see numeric changes in code, comments, blank lines, number of functions, average cyclomatic complexity and maintainability.
- **Duplicates report:** A textual `duplicates_report.txt` is generated (when requested) that lists duplicate groups with pairwise percent similarity and the parameters used to generate them. Baselines store the parameters so results are reproducible.

Example (compact UCC-style snippet):

```
File                  Code  Comm  Blank  Func  AvgCC  MI  ΔCode  ΔComm  ΔBlank  ΔFunc  ΔAvgCC  ΔMI
--------------------------------------------------------------------------------------------------
src/module/a.py        120    10      8     5    2.3  78    +10     -1       0      0    -0.1   +2
src/module/b_copy.py   118     8     10     5    2.4  76     -2     -2      +2      0    +0.1   -2
```

### 8.3 Scanner & Baseline Improvements
- **Centralized scanning:** The `scanner` is the canonical provider of the file list. Heavy modules (Differ, Duplicates finder) accept a `file_list` produced by the scanner to avoid rescanning and to ensure consistent filtering.
- **Ignore pattern normalization:** Ignore entries like `.bak` are normalized to `*.bak` and matching is case-insensitive by default; this prevents accidental inclusion of backup files in baselines.
- **Baseline reproducibility:** Baselines now store the duplicates parameters and the file list snapshot. When a baseline is re-created or analyzed later, PyUCC attempts to re-run per-file function analysis (if `lizard` is available) so that function-level metrics in older baselines remain useful.

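
The ignore-pattern normalization described above might look like the following sketch. `normalize_pattern` and `is_ignored` are hypothetical helpers built on the standard `fnmatch` module; the exact rules PyUCC applies may differ.

```python
import fnmatch


def normalize_pattern(entry: str) -> str:
    """Turn a bare extension like `.bak` into the glob `*.bak`, lowercased."""
    entry = entry.strip().lower()
    if entry.startswith(".") and not any(ch in entry for ch in "*?[/\\"):
        return "*" + entry
    return entry


def is_ignored(name: str, entries) -> bool:
    """Case-insensitive match of a file or folder name against the entries."""
    lowered = name.lower()
    return any(fnmatch.fnmatchcase(lowered, normalize_pattern(entry))
               for entry in entries)


entries = [".bak", "__pycache__", "*.TMP"]
print([is_ignored(name, entries) for name in ["Report.BAK", "main.py", "cache.tmp"]])
# [True, False, True]
```

Because `*` also matches the empty string, an entry such as `.git` still matches a folder named exactly `.git` after normalization.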
### 8.4 Notes on Dependencies
- Function-level metrics (number of functions, per-function CC) rely on `lizard`. If `lizard` is not installed, PyUCC will still produce SLOC and coarse metrics but function details may be missing. Baseline creation records this state and will re-run function analysis if `lizard` becomes available later.

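
The optional-dependency behavior can be sketched as below. The `function_metrics` wrapper is hypothetical, and how PyUCC itself calls `lizard` is an assumption; the sketch relies only on `lizard.analyze_file.analyze_source_code`, which the `lizard` README documents.

```python
try:
    import lizard                      # optional third-party dependency
except ImportError:
    lizard = None


def function_metrics(filename: str, source: str):
    """Return [(name, cyclomatic_complexity)], or None when lizard is absent."""
    if lizard is None:
        return None
    info = lizard.analyze_file.analyze_source_code(filename, source)
    return [(func.name, func.cyclomatic_complexity) for func in info.function_list]


source = "def sign(x):\n    if x < 0:\n        return -1\n    return 1\n"
metrics = function_metrics("sign.py", source)
print("function details unavailable" if metrics is None else metrics)
```

Returning `None` rather than an empty list lets the caller distinguish "no functions found" from "function analysis was not possible", which is the state a baseline would need to record.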