updated manuals, moved configuration files and profiles

2025-12-12 10:14:37 +01:00 · 2025-12-12 10:14:37 +01:00 · 9538919374
commit 9538919374
parent 79ed9c1d72
7 changed files with 1044 additions and 34 deletions
--- a/doc/English-manual.md
+++ b/doc/English-manual.md
@@ -184,4 +184,402 @@ src/module/b_copy.py                    118     8     10     5     2.4    76   -

 ---

-If you want, I can add a short step-by-step example that shows how to create a baseline, run duplicates, and export a CSV + UCC-style report from the GUI and from the CLI. Would you like a full worked example with sample files and commands?
+## 9. Duplicate Detection: Algorithms and Technical Details
+
+This section provides a deeper understanding of how PyUCC identifies duplicate code, what the algorithms do, and how to interpret the results.
+
+### 9.1 Exact Duplicate Detection
+
+**How it works:**
+- PyUCC normalizes each file: it strips leading and trailing whitespace from every line and can optionally convert the text to lowercase.
+- It computes a SHA-1 hash of the normalized content.
+- Files with identical hashes are considered exact duplicates.
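A minimal sketch of the exact-match step follows. It assumes UTF-8 input and an in-memory `{path: text}` mapping. `normalized_hash` and `exact_duplicates` are hypothetical names and are not PyUCC's API.

```python
import hashlib
from collections import defaultdict


def normalized_hash(text: str, lowercase: bool = False) -> str:
    """Strip every line, optionally lowercase it, and hash the result with SHA-1."""
    lines = [line.strip() for line in text.splitlines()]
    if lowercase:
        lines = [line.lower() for line in lines]
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()


def exact_duplicates(files: dict[str, str], lowercase: bool = False) -> list[tuple[str, str]]:
    """Return every pair of paths whose normalized contents hash identically."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path, text in files.items():
        groups[normalized_hash(text, lowercase)].append(path)
    pairs = []
    for paths in groups.values():
        paths.sort()
        # Every pair inside a group of identical digests is an exact duplicate.
        for i in range(len(paths)):
            for j in range(i + 1, len(paths)):
                pairs.append((paths[i], paths[j]))
    return sorted(pairs)


files = {
    "utils.py": "def f():\n    return 1\n",
    "utils_backup.py": "def f():\n  return 1   \n",  # only whitespace differs
    "other.py": "def g():\n    return 2\n",
}
print(exact_duplicates(files))  # [('utils.py', 'utils_backup.py')]
```

In this example, `utils_backup.py` differs from `utils.py` only in indentation and trailing spaces, so both files normalize to the same digest. Stripping leading whitespace also hides indentation changes, which are significant in Python.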
+
+**Use case:** Finding files that were copy-pasted with no changes beyond whitespace (or letter case, when lowercasing is enabled), e.g. `utils.py` and `utils_backup.py`.
+
+**What you'll see:**
+- In the GUI table: pairs of files marked as "exact" duplicates with 100% similarity.
+- In the report: listed under the "Exact duplicates" section.
+
+### 9.2 Fuzzy Duplicate Detection (Advanced)
+
+Fuzzy detection identifies files that are *similar* but not identical. This is useful for finding:
+- Code that was copy-pasted and then slightly modified.
+- Refactored modules that share large blocks of logic.
+- Experimental branches or "almost-duplicates" that should be merged.
+
+**Algorithm Overview:**
+
+1. **K-gram Hashing (Rolling Hash with Rabin-Karp):**
+   - Each file is divided into overlapping sequences of `k` consecutive lines (k-grams).
+   - A rolling hash (Rabin-Karp polynomial hash) is computed for each k-gram.
+   - This produces a sequence of hash values, one per k-gram (a file of `n` lines yields `n - k + 1` k-grams).
+
+2. **Winnowing (Fingerprint Selection):**
+   - To reduce the number of hashes (and improve performance), PyUCC applies a "winnowing" technique.
+   - A sliding window of size `w` moves over the hash sequence.
+   - In each window, the minimum hash value is selected as a fingerprint.
+   - This creates a compact set of representative fingerprints for the file.
+   - **Key property:** If two files share a substring of at least `k + w - 1` lines, they will share at least one fingerprint.
+
+3. **Inverted Index:**
+   - All fingerprints from all files are stored in an inverted index: `{fingerprint -> [list of files containing it]}`.
+   - This allows fast lookup of which files share fingerprints.
+
+4. **Jaccard Similarity:**
+   - For each pair of files that share at least one fingerprint, PyUCC computes the Jaccard similarity:
+     ```
+     Jaccard(A, B) = |A ∩ B| / |A ∪ B|
+     ```
+   - Where A and B are the sets of fingerprints for the two files.
+   - If the Jaccard score is above the similarity cutoff (default: 0.85, i.e. 85% similarity), the pair is flagged as a fuzzy duplicate.
+
+5. **Percent Change Calculation:**
+   - PyUCC also estimates the percentage of lines that differ between the two files.
+   - The pair is reported only if `pct_change <= threshold` (e.g., ≤5%).
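Steps 1 to 4 can be sketched in plain Python. This is an illustration under stated assumptions, not PyUCC's implementation:

- Lines are mapped to integers with `zlib.crc32`.
- The base `257` and the modulus `2**61 - 1` are arbitrary choices.
- Fingerprint positions are not tracked.
- All function names are hypothetical.

Step 5 (`pct_change`) is left out.

```python
import zlib
from collections import defaultdict
from itertools import combinations

BASE = 257              # polynomial base (illustrative choice)
MOD = (1 << 61) - 1     # large prime modulus (illustrative choice)


def kgram_hashes(lines: list[str], k: int) -> list[int]:
    """Rabin-Karp rolling hash of every run of k consecutive lines."""
    if len(lines) < k:
        return []
    # Turn each whitespace-stripped line into one integer "symbol".
    values = [zlib.crc32(line.strip().encode("utf-8")) for line in lines]
    top = pow(BASE, k - 1, MOD)  # weight of the line that leaves the window
    h = 0
    for v in values[:k]:
        h = (h * BASE + v) % MOD
    hashes = [h]
    for old, new in zip(values, values[k:]):
        # Drop the oldest line, shift, and append the next line in O(1).
        h = ((h - old * top) * BASE + new) % MOD
        hashes.append(h)
    return hashes


def winnow(hashes: list[int], w: int) -> set[int]:
    """Keep the minimum hash of every window of w consecutive k-gram hashes."""
    if len(hashes) < w:
        return {min(hashes)} if hashes else set()
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def fuzzy_pairs(files: dict[str, str], k: int = 25, w: int = 4,
                min_jaccard: float = 0.85) -> list[tuple[str, str, float]]:
    """Find candidate pairs via an inverted index, then keep those above the Jaccard cutoff."""
    prints = {path: winnow(kgram_hashes(text.splitlines(), k), w)
              for path, text in files.items()}
    index: dict[int, set[str]] = defaultdict(set)  # fingerprint -> files containing it
    for path, fps in prints.items():
        for fp in fps:
            index[fp].add(path)
    candidates = set()
    for paths in index.values():
        candidates.update(combinations(sorted(paths), 2))
    results = []
    for a, b in sorted(candidates):
        score = len(prints[a] & prints[b]) / len(prints[a] | prints[b])
        if score >= min_jaccard:
            results.append((a, b, round(score, 3)))
    return results


base = [f"x{i} = {i}" for i in range(100)]
edited = list(base)
edited[50] = "x50 = -1"  # a single changed line
other = [f"y{i} = {i * 2}" for i in range(100)]
files = {"a.py": "\n".join(base), "b.py": "\n".join(edited), "c.py": "\n".join(other)}
print(fuzzy_pairs(files, k=3, w=2))
```

With `k=3, w=2`, the edited line only changes the three k-grams that overlap it. As a result, `a.py` and `b.py` keep most of their fingerprints and clear the 0.85 cutoff. `c.py` shares no fingerprint with either file, so it never becomes a candidate.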
+
+**Parameters you can adjust:**
+
+- **`k` (k-gram size):** Number of consecutive lines in each k-gram. Default: 25.
+  - Larger `k` → fewer false positives, but may miss small duplicates.
+  - Smaller `k` → more sensitive, but may produce false positives.
+  
+- **`window` (winnowing window size):** Size of the window for selecting fingerprints. Default: 4.
+  - Larger window → fewer fingerprints, faster processing, but may miss some matches.
+  - Smaller window → more fingerprints, slower, but more thorough.
+
+- **`threshold` (percent change threshold):** Maximum allowed difference (in %) to still consider two files duplicates. Default: 5.0%.
+  - Lower threshold → stricter matching (only very similar files).
+  - Higher threshold → more lenient (catches files with more differences).
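The manual does not specify how `pct_change` is computed. This sketch assumes one plausible estimate: count the lines that `difflib.SequenceMatcher` cannot align, then divide by the length of the longer file. `pct_change` and `is_duplicate` are hypothetical helpers.

```python
import difflib


def pct_change(a_lines: list[str], b_lines: list[str]) -> float:
    """Share of lines (in %) outside the matching blocks; an assumed formula."""
    a = [line.strip() for line in a_lines]
    b = [line.strip() for line in b_lines]
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    # autojunk=False keeps frequent lines from being discarded as "junk".
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * (longest - matched) / longest


def is_duplicate(a_lines: list[str], b_lines: list[str], threshold: float = 5.0) -> bool:
    """Apply the `pct_change <= threshold` rule."""
    return pct_change(a_lines, b_lines) <= threshold


original = [f"line {i}" for i in range(100)]
one_edit = list(original)
one_edit[50] = "changed"
ten_edits = [f"changed {i}" if i % 10 == 0 else line for i, line in enumerate(original)]
print(pct_change(original, one_edit), is_duplicate(original, one_edit))    # 1.0 True
print(pct_change(original, ten_edits), is_duplicate(original, ten_edits))  # 10.0 False
```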
+
+**Recommended settings:**
+
+| Use Case | k | window | threshold |
+|----------|---|--------|----------|
+| Strict duplicate finding (only near-identical files) | 30 | 5 | 3.0% |
+| Balanced (default) | 25 | 4 | 5.0% |
+| Loose matching (catch refactored code) | 20 | 3 | 10.0% |
+| Very aggressive (experimental) | 15 | 2 | 15.0% |
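The presets trade sensitivity for speed through two quantities, both properties of winnowing itself rather than PyUCC measurements:

- The guaranteed detection length `k + w - 1` (from step 2).
- The expected fingerprint density `2 / (w + 1)`, which holds for random-looking hash values.

A small helper makes the trade-off concrete (the function names are hypothetical):

```python
# Presets from the table above: (name, k, window, threshold in %).
PRESETS = [
    ("Strict", 30, 5, 3.0),
    ("Balanced (default)", 25, 4, 5.0),
    ("Loose", 20, 3, 10.0),
    ("Very aggressive", 15, 2, 15.0),
]


def guaranteed_match_length(k: int, w: int) -> int:
    """Shortest shared run of lines that winnowing is guaranteed to detect."""
    return k + w - 1


def expected_density(w: int) -> float:
    """Expected fraction of k-gram hashes kept as fingerprints (random hashes)."""
    return 2 / (w + 1)


for name, k, w, threshold in PRESETS:
    print(f"{name:20} detects shared runs >= {guaranteed_match_length(k, w)} lines, "
          f"keeps ~{expected_density(w):.0%} of hashes, threshold {threshold}%")
```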
+
+### 9.3 Understanding Duplicate Reports
+
+**GUI Table Columns:**
+
+- **File A / File B:** The two files being compared.
+- **Match Type:** "exact" or "fuzzy".
+- **Similarity (%):** For fuzzy matches, the Jaccard similarity score (0-100%).
+- **Pct Change (%):** Estimated percentage of lines that differ.
+
+**Textual Report (duplicates_report.txt):**
+
+The report is divided into two sections:
+
+1. **Exact Duplicates:**
+   ```
+   Exact duplicates: 3
+   
+   src/utils.py <=> src/backup/utils_old.py
+   src/module/helper.py <=> src/module/helper - Copy.py
+   ```
+
+2. **Fuzzy Duplicates:**
+   ```
+   Fuzzy duplicates (threshold): 5
+   
+   src/processor.py <=> src/processor_v2.py
+     Similarity: 92.5% | Pct Change: 3.2%
+   
+   src/core/engine.py <=> src/experimental/engine_new.py
+     Similarity: 88.0% | Pct Change: 4.8%
+   ```
+
+**Interpretation:**
+
+- **High similarity (>95%):** Strong candidates for deduplication. Consider keeping only one version or merging.
+- **Medium similarity (85-95%):** Review manually. May indicate refactored code or intentional variations.
+- **Threshold violations:** Pairs whose `pct_change` exceeds the threshold do not appear in the report, even if the files share some fingerprints.
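To post-process the report in a script, you can parse the fuzzy section line by line. This sketch assumes the layout matches the example above exactly: one `A <=> B` line followed by a `Similarity: ... | Pct Change: ...` line. The band labels mirror the interpretation list.

```python
import re

PAIR_RE = re.compile(r"^(?P<a>\S.*?) <=> (?P<b>.+)$")
SCORE_RE = re.compile(r"Similarity: (?P<sim>[\d.]+)% \| Pct Change: (?P<pct>[\d.]+)%")


def classify(similarity: float) -> str:
    """Map a similarity percentage to the interpretation bands."""
    if similarity > 95:
        return "high: deduplication candidate"
    if similarity >= 85:
        return "medium: review manually"
    return "below the default similarity cutoff"


def parse_fuzzy(report: str) -> list[dict]:
    """Collect (pair, similarity, pct_change, verdict) entries from the report text."""
    entries, current = [], None
    for line in report.splitlines():
        pair = PAIR_RE.match(line.strip())
        if pair:
            current = {"a": pair["a"], "b": pair["b"]}
            continue
        score = SCORE_RE.search(line)
        if score and current is not None:
            current["similarity"] = float(score["sim"])
            current["pct_change"] = float(score["pct"])
            current["verdict"] = classify(current["similarity"])
            entries.append(current)
            current = None
    return entries


SAMPLE = """\
Fuzzy duplicates (threshold): 5

src/processor.py <=> src/processor_v2.py
  Similarity: 92.5% | Pct Change: 3.2%

src/core/engine.py <=> src/experimental/engine_new.py
  Similarity: 88.0% | Pct Change: 4.8%
"""
for entry in parse_fuzzy(SAMPLE):
    print(entry["a"], "<=>", entry["b"], "->", entry["verdict"])
```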
+
+---
+
+## 10. Reading and Interpreting Differ Reports
+
+The Differ functionality produces several types of output. Understanding each helps you track code evolution accurately.
+
+### 10.1 Compact UCC-Style Table
+
+When you run *Differing*, PyUCC generates a compact summary table similar to the one produced by the original UCC tool:
+
+**Example:**
+```
+File                   Code   Comm  Blank  Func  AvgCC   MI  ΔCode  ΔComm  ΔBlank  ΔFunc  ΔAvgCC   ΔMI
+------------------------------------------------------------------------------------------------------
+src/module/a.py         120     10      8     5    2.3   78    +10     -1       0      0    -0.1    +2
+src/module/b.py         118      8     10     5    2.4   76     -2     -2      +2      0    +0.1    -2
+src/new_feature.py       45      5      3     2    1.8   82    +45     +5      +3     +2    +1.8   +82
+src/old_code.py          --     --     --    --     --   --    -30     -5      -2     -1    -2.1   -75
+```
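The Δ columns are plain differences (current minus baseline), with missing values treated as zero. An added file's deltas therefore equal its current values, and a deleted file's deltas are its negated baseline values, as in the `src/new_feature.py` and `src/old_code.py` rows. The sketch below uses hypothetical field names (`hash`, `code`, `avg_cc`, ...) and illustrative baseline numbers chosen to reproduce the rows above.

```python
from typing import Optional

METRICS = ("code", "comm", "blank", "func", "avg_cc", "mi")


def diff_row(base: Optional[dict], cur: Optional[dict]) -> dict:
    """Return the A/D/M/U status and per-metric deltas (current - baseline)."""
    if base is None and cur is None:
        raise ValueError("file must exist in at least one snapshot")
    if base is None:
        status = "A"
    elif cur is None:
        status = "D"
    else:
        # Unchanged means identical content, compared via a stored hash.
        status = "U" if base["hash"] == cur["hash"] else "M"
    before, after = base or {}, cur or {}
    deltas = {m: round(after.get(m, 0) - before.get(m, 0), 2) for m in METRICS}
    return {"status": status, **deltas}


baseline = {
    "src/module/a.py": {"hash": "h1", "code": 110, "comm": 11, "blank": 8, "func": 5, "avg_cc": 2.4, "mi": 76},
    "src/old_code.py": {"hash": "h2", "code": 30, "comm": 5, "blank": 2, "func": 1, "avg_cc": 2.1, "mi": 75},
}
current = {
    "src/module/a.py": {"hash": "h3", "code": 120, "comm": 10, "blank": 8, "func": 5, "avg_cc": 2.3, "mi": 78},
    "src/new_feature.py": {"hash": "h4", "code": 45, "comm": 5, "blank": 3, "func": 2, "avg_cc": 1.8, "mi": 82},
}
rows = {path: diff_row(baseline.get(path), current.get(path))
        for path in sorted(baseline.keys() | current.keys())}
for path, row in rows.items():
    print(path, row)
```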
+
+**Column Meanings:**
+
+| Column | Meaning |
+|--------|--------|
+| **File** | Relative path to the file |
+| **Code** | Current number of code lines |
+| **Comm** | Current number of comment lines |
+| **Blank** | Current number of blank lines |
+| **Func** | Number of functions detected (requires `lizard`) |
+| **AvgCC** | Average cyclomatic complexity per function |
+| **MI** | Maintainability Index (0-100, higher is better) |
+| **ΔCode** | Change in code lines (current - baseline) |
+| **ΔComm** | Change in comment lines |
+| **ΔBlank** | Change in blank lines |
+| **ΔFunc** | Change in function count |
+| **ΔAvgCC** | Change in average cyclomatic complexity |
+| **ΔMI** | Change in maintainability index |
+
+**Color Coding (GUI):**
+
+- **Green rows:** New files (Added) or improved metrics (e.g., ΔAvgCC < 0, ΔMI > 0).
+- **Red rows:** Deleted files or worsened metrics (e.g., ΔAvgCC > 0, ΔMI < 0).
+- **Yellow/Orange rows:** Modified files with mixed changes.
+- **Gray rows:** Unmodified files (identical to baseline).
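One possible reading of these color rules as code treats ΔAvgCC < 0 and ΔMI > 0 as improvements. The exact rules that PyUCC's GUI applies are an assumption here, for instance for a modified file whose metrics did not change.

```python
def row_color(status: str, d_avg_cc: float = 0.0, d_mi: float = 0.0) -> str:
    """Pick a row color from the A/D/M/U status and metric deltas (hypothetical rules)."""
    if status == "A":
        return "green"
    if status == "D":
        return "red"
    if status == "U":
        return "gray"
    improved = d_avg_cc < 0 or d_mi > 0
    worsened = d_avg_cc > 0 or d_mi < 0
    if improved and not worsened:
        return "green"
    if worsened and not improved:
        return "red"
    return "yellow"  # mixed signals, or a modification with no metric change


print(row_color("M", -0.1, 2), row_color("M", 0.1, -2))  # the a.py and b.py rows
```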
+
+**What to look for:**
+
+- **ΔCode >> 0:** Significant code expansion. Is it justified by new features?
+- **ΔComm < 0:** Documentation decreased. Consider adding more comments.
+- **ΔAvgCC > 0:** Complexity increased. May indicate need for refactoring.
+- **ΔMI < 0:** Maintainability worsened. Review the changes.
+- **New files with high AvgCC:** New code is already complex. Flag for review.
+
+### 10.2 Detailed Diff Report (diff_report.txt)
+
+A textual report is saved in the baseline folder:
+
+**Structure:**
+```
+PyUCC Baseline Comparison Report
+=================================
+Baseline ID: MyProject__20251205T143022_local
+Snapshot timestamp: 2025-12-05 14:30:22
+
+Summary:
+  New files: 3
+  Deleted files: 1
+  Modified files: 12
+  Unchanged files: 45
+
+Metric Changes:
+  Total Code Lines: +150
+  Total Comments: -5
+  Average CC: +0.2 (slight increase in complexity)
+  Average MI: -1.5 (slight decrease in maintainability)
+
+[Compact UCC-style table here]
+
+Legend:
+  A = Added file
+  D = Deleted file
+  M = Modified file
+  U = Unchanged file
+  ...
+```
+
+### 10.3 CSV Exports
+
+You can export any result table to CSV for further analysis in Excel, pandas, or BI tools.
+
+**Columns include:**
+- File path
+- All SLOC metrics (code, comment, blank lines)
+- Complexity metrics (CC, MI, function count)
+- Deltas (if from a Differ operation)
+- Status flags (A/D/M/U)
+
+**Use cases:**
+- Trend analysis over multiple baselines.
+- Generating charts (e.g., complexity over time).
+- Feeding into CI/CD quality gates.
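For trend analysis, summing a few columns per exported CSV is often enough. The column names (`Code`, `Comm`, `Blank`) assume that the export uses the same headers as the compact table, so adjust them to match your actual files. The two sample exports are illustrative data.

```python
import csv
import io


def totals(csv_text: str, columns: tuple[str, ...] = ("Code", "Comm", "Blank")) -> dict[str, int]:
    """Sum selected integer columns of one exported table."""
    sums = {c: 0 for c in columns}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for c in columns:
            value = (row.get(c) or "").strip()
            if value and value != "--":  # deleted files show "--" for current values
                sums[c] += int(value)
    return sums


exports = {
    "Audit_2025_01": "File,Code,Comm,Blank\nsrc/a.py,110,11,8\nsrc/b.py,120,10,8\n",
    "Audit_2025_02": "File,Code,Comm,Blank\nsrc/a.py,120,10,8\nsrc/b.py,118,8,10\n",
}
for name, text in exports.items():
    print(name, totals(text))
```

In practice, you would read each export with `Path(...).read_text(encoding="utf-8")` and pass the totals to a plotting tool.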
+
+---
+
+## 11. Practical Use Cases and Workflows
+
+### Use Case 1: Detecting Copy-Paste Code Before Code Review
+
+**Scenario:** Your team is developing a new module. You suspect some developers copy-pasted existing code instead of refactoring.
+
+**Workflow:**
+1. Create a profile for your project.
+2. Click the **Duplicates** button.
+3. Set the threshold to 5% (the balanced default; use 3% for strict matching).
+4. Review the results table.
+5. For each fuzzy duplicate pair:
+   - Double-click a pair to open both files in the diff viewer (if your version provides one).
+   - Assess whether the duplication is intentional or should be refactored into a shared utility.
+6. Export to CSV and share with the team for discussion.
+
+**Expected outcome:** You identify 3-5 near-duplicate files and create tickets to consolidate them.
+
+---
+
+### Use Case 2: Tracking Complexity During a Refactoring Sprint
+
+**Scenario:** Your team plans a 2-week refactoring sprint to reduce technical debt.
+
+**Workflow:**
+1. **Before the sprint:** Create a baseline ("Pre-Refactor").
+   - Click **Differing** → Create baseline.
+   - Name it "PreRefactor_Sprint5".
+2. **During the sprint:** Developers refactor code, extract functions, add comments.
+3. **After the sprint:** Run **Differing** against the baseline.
+4. Review the compact table:
+   - Check ΔAvgCC: Should be negative (complexity reduced).
+   - Check ΔMI: Should be positive (maintainability improved).
+   - Check ΔComm: Should be positive (more documentation).
+5. Generate a diff report and attach it to the sprint retrospective.
+
+**Expected outcome:** Quantitative proof that refactoring worked: "We reduced average CC by 15% and increased MI by 8 points."
+
+---
+
+### Use Case 3: Ensuring New Features Don't Degrade Quality
+
+**Scenario:** You're adding a new feature to a mature codebase. You want to ensure the new code doesn't introduce excessive complexity.
+
+**Workflow:**
+1. Create a baseline before starting feature development.
+2. Develop the feature in a branch.
+3. Before merging to main:
+   - Run **Differing** to compare current state vs. baseline.
+   - Filter for new files (status = "A").
+   - Check AvgCC and MI of new files.
+   - If AvgCC > 5 or MI < 70, flag the file for refactoring before merging.
+4. Use **Duplicates** to ensure new code doesn't duplicate existing utilities.
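Step 3 of this workflow can be automated as a CI gate on an exported CSV. The limits come from the workflow. The `File`, `Status`, `AvgCC`, and `MI` column names and the sample rows are assumptions.

```python
import csv
import io

MAX_AVG_CC = 5.0  # limits stated in the workflow
MIN_MI = 70.0


def gate_failures(csv_text: str) -> list[str]:
    """List new files (status "A") whose AvgCC > 5 or MI < 70."""
    failures = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Status"] != "A":
            continue
        avg_cc, mi = float(row["AvgCC"]), float(row["MI"])
        if avg_cc > MAX_AVG_CC or mi < MIN_MI:
            failures.append(f"{row['File']}: AvgCC={avg_cc}, MI={mi}")
    return failures


def main(csv_text: str) -> int:
    """Print the failures and return a process exit code (non-zero fails the job)."""
    problems = gate_failures(csv_text)
    print("\n".join(problems) or "quality gate passed")
    return 1 if problems else 0


SAMPLE = "File,Status,AvgCC,MI\nsrc/new_feature.py,A,1.8,82\nsrc/parser.py,A,7.5,64\nsrc/module/a.py,M,2.3,78\n"
print(gate_failures(SAMPLE))
```

In a pipeline, call `sys.exit(main(csv_text))` so that any failure fails the job.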
+
+**Expected outcome:** New feature code passes quality gates before merge.
+
+---
+
+### Use Case 4: Generating Compliance Reports for Audits
+
+**Scenario:** Your organization requires periodic code quality audits.
+
+**Workflow:**
+1. Create baselines monthly (e.g., "Audit_2025_01", "Audit_2025_02", ...).
+2. Each baseline automatically generates:
+   - `countings_report.txt`
+   - `metrics_report.txt`
+   - `duplicates_report.txt`
+3. Archive these reports in a compliance folder.
+4. For the audit, provide:
+   - Trend of total SLOC over time.
+   - Trend of average CC and MI.
+   - Number of duplicates detected and resolved each month.
+
+**Expected outcome:** Auditors see measurable improvement in code quality metrics over time.
+
+---
+
+### Use Case 5: Onboarding New Developers with Code Metrics
+
+**Scenario:** A new developer joins the team and needs to understand the codebase.
+
+**Workflow:**
+1. Run **Metrics** on the entire codebase.
+2. Export to CSV.
+3. Sort by AvgCC (descending) to identify the most complex modules.
+4. Share the list with the new developer:
+   - "These 5 files have the highest complexity. Be extra careful when modifying them."
+   - "These modules have low MI. They're candidates for refactoring and make good learning exercises."
+5. Use **Duplicates** to show which parts of the code contain redundancy, and explain why it exists.
+
+**Expected outcome:** The new developer identifies code hotspots and quality issues faster.
+
+---
+
+## 12. Tips for Effective Use
+
+### 12.1 Profile Management
+
+- **Create separate profiles** for different subprojects or components.
+- Use **ignore patterns** aggressively to exclude:
+  - `node_modules`, `venv`, `.venv`
+  - Build outputs (`build/`, `dist/`, `bin/`)
+  - Generated code
+  - Test fixtures or mock data
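The manual does not describe how PyUCC matches ignore patterns. This sketch assumes one common convention: test every path component with shell-style wildcards (Python's `fnmatch`). Under that convention, `build` skips a `build/` directory without skipping `build.py`. The patterns are a subset of those in `profiles.json`.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# A subset of the ignore patterns from profiles.json.
IGNORE = ("__pycache__", "*.pyc", "venv", ".venv", "build", "dist", "node_modules")


def is_ignored(rel_path: str, patterns: tuple[str, ...] = IGNORE) -> bool:
    """True if any component of rel_path matches any pattern (assumed semantics)."""
    # Accept both Windows and POSIX separators, then test each path component.
    parts = PurePosixPath(rel_path.replace("\\", "/")).parts
    # fnmatch is case-insensitive on Windows and case-sensitive elsewhere.
    return any(fnmatch(part, pattern) for part in parts for pattern in patterns)


candidates = ["src/app.py", "src/__pycache__/app.cpython-312.pyc", "dist/app.whl"]
print([path for path in candidates if not is_ignored(path)])  # ['src/app.py']
```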
+
+### 12.2 Baseline Strategy
+
+- **Naming convention:** Use descriptive names with dates or version tags:
+  - `Release_v1.2.0_20251201`
+  - `PreRefactor_Sprint10`
+  - `BeforeMerge_FeatureX`
+- **Frequency:** Create baselines at key milestones:
+  - End of each sprint
+  - Before/after major refactorings
+  - Before releases
+- **Retention:** Keep at least 3-5 recent baselines. Archive older ones.
+
+### 12.3 Interpreting Metrics
+
+**Cyclomatic Complexity (CC):**
+- **1-5:** Simple, low risk.
+- **6-10:** Moderate complexity, acceptable.
+- **11-20:** High complexity, review recommended.
+- **21+:** Very high complexity, refactoring strongly recommended.
+
+**Maintainability Index (MI):**
+- **85-100:** Highly maintainable (green zone).
+- **70-84:** Moderately maintainable (yellow zone).
+- **Below 70:** Low maintainability (red zone), needs attention.
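These bands translate directly into lookup functions, for example to annotate an exported table. The manual does not say how to bin fractional averages such as 5.5, so this sketch assumes that each band runs up to and including its upper bound.

```python
def cc_band(cc: float) -> str:
    """Classify cyclomatic complexity with the bands above."""
    if cc <= 5:
        return "simple, low risk"
    if cc <= 10:
        return "moderate, acceptable"
    if cc <= 20:
        return "high, review recommended"
    return "very high, refactoring strongly recommended"


def mi_zone(mi: float) -> str:
    """Map a 0-100 maintainability index to the zones above."""
    if mi >= 85:
        return "green"
    if mi >= 70:
        return "yellow"
    return "red"


# "src/legacy.py" is a hypothetical file used only for illustration.
for name, cc, mi in [("src/module/a.py", 2.3, 78), ("src/legacy.py", 14.0, 62)]:
    print(name, cc_band(cc), mi_zone(mi))
```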
+
+### 12.4 Duplicate Detection Best Practices
+
+- Start with **default parameters** (k=25, window=4, threshold=5%).
+- If you get too many false positives, **increase k** or **decrease threshold**.
+- If you suspect duplicates are being missed, **decrease k** or **increase threshold**.
+- Always **review fuzzy duplicates manually**: not all similarities are bad (e.g., interface implementations).
+
+---
+
+## 13. Troubleshooting and FAQs
+
+**Q: Duplicates detection is slow on large codebases.**
+
+**A:** 
+- Use profile filters to limit the file types analyzed.
+- Increase `window` (and, to a lesser extent, `k`) to reduce the number of fingerprints processed.
+- Exclude large auto-generated files or test fixtures.
+
+**Q: Why are some files missing function-level metrics?**
+
+**A:** 
+- Function-level analysis requires `lizard`. Install it: `pip install lizard`.
+- Some languages may not be fully supported by `lizard`.
+
+**Q: Differ shows files as "Modified" but I didn't change them.**
+
+**A:**
+- Check if line endings changed (CRLF ↔ LF).
+- Verify the file wasn't reformatted by an auto-formatter.
+- PyUCC uses content hashing, so any byte-level change triggers "Modified" status.
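The line-ending case is easy to reproduce. The same text with LF and CRLF endings produces different SHA-1 digests, and only a normalization step makes them equal. The manual does not say whether PyUCC's Differ normalizes line endings, so this only illustrates the effect.

```python
import hashlib

lf_bytes = b"x = 1\ny = 2\n"
crlf_bytes = lf_bytes.replace(b"\n", b"\r\n")  # same text, Windows line endings


def normalize_eol(data: bytes) -> bytes:
    """Convert CRLF line endings to LF before hashing."""
    return data.replace(b"\r\n", b"\n")


raw_equal = hashlib.sha1(lf_bytes).hexdigest() == hashlib.sha1(crlf_bytes).hexdigest()
norm_equal = (hashlib.sha1(normalize_eol(lf_bytes)).hexdigest()
              == hashlib.sha1(normalize_eol(crlf_bytes)).hexdigest())
print(raw_equal, norm_equal)  # False True
```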
+
+**Q: How do I reset all baselines?**
+
+**A:** 
+- Baselines are stored in the `baseline/` folder by default.
+- Delete the baseline folder or specific baseline subdirectories to reset.
+
+**Q: Can I run PyUCC in CI/CD pipelines?**
+
+**A:** 
+- Yes! Use the CLI mode:
+  ```bash
+  python -m pyucc differ create /path/to/repo
+  python -m pyucc differ diff <baseline_id> /path/to/repo
+  python -m pyucc duplicates /path/to/repo --threshold 5.0
+  ```
+- Parse the JSON output or text reports in your pipeline scripts.
+
+---
--- a/doc/Italian-manual.md
+++ b/doc/Italian-manual.md
@@ -184,4 +184,402 @@ src/module/b_copy.py                    118     8     10     5     2.4    76   -

 ---

-Se vuoi, posso aggiungere un esempio passo-passo che mostra come creare una baseline, eseguire la ricerca duplicati e esportare CSV + report UCC sia da GUI che da CLI. Vuoi che lo prepari con comandi e file di esempio?
+## 9. Rilevamento Duplicati: Algoritmi e Dettagli Tecnici
+
+Questa sezione fornisce una comprensione più approfondita di come PyUCC identifica il codice duplicato, cosa fanno gli algoritmi e come interpretare i risultati.
+
+### 9.1 Rilevamento Duplicati Esatti
+
+**Come funziona:**
+- PyUCC normalizza ciascun file (rimuove spazi iniziali/finali da ogni riga, converte in minuscolo opzionalmente).
+- Calcola un hash SHA1 del contenuto normalizzato.
+- I file con hash identici sono considerati duplicati esatti.
+
+**Caso d'uso:** Trovare file che sono stati copiati e incollati senza o con modifiche minime (es. `utils.py` e `utils_backup.py`).
+
+**Cosa vedrai:**
+- Nella tabella GUI: coppie di file contrassegnate come duplicati "esatti" con similarità al 100%.
+- Nel report: elencate nella sezione "Duplicati esatti".
+
+### 9.2 Rilevamento Duplicati Fuzzy (Avanzato)
+
+Il rilevamento fuzzy identifica file che sono *simili* ma non identici. È utile per trovare:
+- Codice che è stato copiato e poi leggermente modificato.
+- Moduli rifatti che condividono grandi blocchi di logica.
+- Branch sperimentali o "quasi-duplicati" che dovrebbero essere fusi.
+
+**Panoramica dell'Algoritmo:**
+
+1. **Hashing K-gram (Rolling Hash con Rabin-Karp):**
+   - Ogni file è diviso in sequenze sovrapposte di `k` righe consecutive (k-gram).
+   - Viene calcolato un rolling hash (hash polinomiale Rabin-Karp) per ogni k-gram.
+   - Questo produce un grande insieme di valori hash che rappresentano tutti i k-gram del file.
+
+2. **Winnowing (Selezione delle Impronte Digitali):**
+   - Per ridurre il numero di hash (e migliorare le prestazioni), PyUCC applica una tecnica di "winnowing".
+   - Una finestra scorrevole di dimensione `w` si sposta sulla sequenza di hash.
+   - In ogni finestra, viene selezionato il valore hash minimo come impronta digitale.
+   - Questo crea un insieme compatto di impronte rappresentative per il file.
+   - **Proprietà chiave:** Se due file condividono una sottostringa di almeno `k + w - 1` righe, condivideranno almeno un'impronta digitale.
+
+3. **Indice Invertito:**
+   - Tutte le impronte digitali di tutti i file vengono memorizzate in un indice invertito: `{impronta -> [lista di file che la contengono]}`.
+   - Questo permette una ricerca veloce di quali file condividono impronte.
+
+4. **Similarità di Jaccard:**
+   - Per ogni coppia di file che condividono almeno un'impronta digitale, PyUCC calcola la similarità di Jaccard:
+     ```
+     Jaccard(A, B) = |A ∩ B| / |A ∪ B|
+     ```
+   - Dove A e B sono gli insiemi di impronte digitali per i due file.
+   - Se il punteggio Jaccard è sopra la soglia (default: 0.85, ovvero 85% di similarità), la coppia viene segnalata come duplicato fuzzy.
+
+5. **Calcolo Percentuale di Modifica:**
+   - PyUCC stima anche la percentuale di righe che differiscono tra i due file.
+   - Se `pct_change <= threshold` (es. ≤5%), i file sono considerati duplicati.
+
+**Parametri regolabili:**
+
+- **`k` (dimensione k-gram):** Numero di righe consecutive in ogni k-gram. Default: 25.
+  - `k` maggiore → meno falsi positivi, ma può perdere duplicati piccoli.
+  - `k` minore → più sensibile, ma può produrre falsi positivi.
+  
+- **`window` (dimensione finestra winnowing):** Dimensione della finestra per selezionare le impronte. Default: 4.
+  - Finestra maggiore → meno impronte, elaborazione più veloce, ma può perdere alcune corrispondenze.
+  - Finestra minore → più impronte, più lento, ma più accurato.
+
+- **`threshold` (soglia percentuale di modifica):** Differenza massima consentita (in %) per considerare ancora due file come duplicati. Default: 5.0%.
+  - Soglia inferiore → corrispondenza più stretta (solo file molto simili).
+  - Soglia superiore → più permissiva (cattura file con più differenze).
+
+**Impostazioni consigliate:**
+
+| Caso d'Uso | k | window | threshold |
+|----------|---|--------|----------|
+| Ricerca duplicati stretta (solo file quasi identici) | 30 | 5 | 3.0% |
+| Bilanciata (default) | 25 | 4 | 5.0% |
+| Corrispondenza lassa (cattura codice rifatto) | 20 | 3 | 10.0% |
+| Molto aggressiva (sperimentale) | 15 | 2 | 15.0% |
+
+### 9.3 Interpretare i Report sui Duplicati
+
+**Colonne Tabella GUI:**
+
+- **File A / File B:** I due file confrontati.
+- **Match Type:** "exact" o "fuzzy".
+- **Similarity (%):** Per corrispondenze fuzzy, il punteggio di similarità Jaccard (0-100%).
+- **Pct Change (%):** Percentuale stimata di righe che differiscono.
+
+**Report Testuale (duplicates_report.txt):**
+
+Il report è diviso in due sezioni:
+
+1. **Duplicati Esatti:**
+   ```
+   Exact duplicates: 3
+   
+   src/utils.py <=> src/backup/utils_old.py
+   src/module/helper.py <=> src/module/helper - Copy.py
+   ```
+
+2. **Duplicati Fuzzy:**
+   ```
+   Fuzzy duplicates (threshold): 5
+   
+   src/processor.py <=> src/processor_v2.py
+     Similarity: 92.5% | Pct Change: 3.2%
+   
+   src/core/engine.py <=> src/experimental/engine_new.py
+     Similarity: 88.0% | Pct Change: 4.8%
+   ```
+
+**Interpretazione:**
+
+- **Alta similarità (>95%):** Forti candidati per la deduplicazione. Considera di mantenere solo una versione o di fonderle.
+- **Media similarità (85-95%):** Rivedi manualmente. Può indicare codice rifatto o variazioni intenzionali.
+- **Violazioni della soglia:** I file che superano la soglia `pct_change` non appariranno nel report, anche se condividono alcune impronte.
+
+---
+
+## 10. Leggere e Interpretare i Report Differ
+
+La funzionalità Differ produce diversi tipi di output. Comprendere ciascuno aiuta a tracciare l'evoluzione del codice con precisione.
+
+### 10.1 Tabella Compatta in Stile UCC
+
+Quando esegui *Differing*, PyUCC genera una tabella di riepilogo compatta simile allo strumento UCC originale:
+
+**Esempio:**
+```
+File                   Code   Comm  Blank  Func  AvgCC   MI  ΔCode  ΔComm  ΔBlank  ΔFunc  ΔAvgCC   ΔMI
+------------------------------------------------------------------------------------------------------
+src/module/a.py         120     10      8     5    2.3   78    +10     -1       0      0    -0.1    +2
+src/module/b.py         118      8     10     5    2.4   76     -2     -2      +2      0    +0.1    -2
+src/new_feature.py       45      5      3     2    1.8   82    +45     +5      +3     +2    +1.8   +82
+src/old_code.py          --     --     --    --     --   --    -30     -5      -2     -1    -2.1   -75
+```
+
+**Significato delle Colonne:**
+
+| Colonna | Significato |
+|--------|--------|
+| **File** | Percorso relativo del file |
+| **Code** | Numero corrente di righe di codice |
+| **Comm** | Numero corrente di righe di commenti |
+| **Blank** | Numero corrente di righe vuote |
+| **Func** | Numero di funzioni rilevate (richiede `lizard`) |
+| **AvgCC** | Complessità ciclomatica media per funzione |
+| **MI** | Indice di Manutenibilità (0-100, più alto è meglio) |
+| **ΔCode** | Variazione nelle righe di codice (corrente - baseline) |
+| **ΔComm** | Variazione nelle righe di commenti |
+| **ΔBlank** | Variazione nelle righe vuote |
+| **ΔFunc** | Variazione nel conteggio funzioni |
+| **ΔAvgCC** | Variazione nella complessità ciclomatica media |
+| **ΔMI** | Variazione nell'indice di manutenibilità |
+
+**Codifica Colori (GUI):**
+
+- **Righe verdi:** File nuovi (Aggiunti) o metriche migliorate (es. ΔAvgCC < 0, ΔMI > 0).
+- **Righe rosse:** File eliminati o metriche peggiorate (es. ΔAvgCC > 0, ΔMI < 0).
+- **Righe gialle/arancioni:** File modificati con cambiamenti misti.
+- **Righe grigie:** File non modificati (identici alla baseline).
+
+**Cosa cercare:**
+
+- **ΔCode >> 0:** Espansione significativa del codice. È giustificata da nuove funzionalità?
+- **ΔComm < 0:** Documentazione diminuita. Considera di aggiungere più commenti.
+- **ΔAvgCC > 0:** Complessità aumentata. Può indicare necessità di refactoring.
+- **ΔMI < 0:** Manutenibilità peggiorata. Rivedi le modifiche.
+- **Nuovi file con alto AvgCC:** Il nuovo codice è già complesso. Segnala per revisione.
+
+### 10.2 Report Diff Dettagliato (diff_report.txt)
+
+Un report testuale viene salvato nella cartella baseline:
+
+**Struttura:**
+```
+PyUCC Baseline Comparison Report
+=================================
+Baseline ID: MyProject__20251205T143022_local
+Snapshot timestamp: 2025-12-05 14:30:22
+
+Summary:
+  New files: 3
+  Deleted files: 1
+  Modified files: 12
+  Unchanged files: 45
+
+Metric Changes:
+  Total Code Lines: +150
+  Total Comments: -5
+  Average CC: +0.2 (slight increase in complexity)
+  Average MI: -1.5 (slight decrease in maintainability)
+
+[Tabella compatta in stile UCC qui]
+
+Legend:
+  A = Added file
+  D = Deleted file
+  M = Modified file
+  U = Unchanged file
+  ...
+```
+
+### 10.3 Esportazioni CSV
+
+Puoi esportare qualsiasi tabella dei risultati in CSV per ulteriori analisi in Excel, pandas o strumenti BI.
+
+**Le colonne includono:**
+- Percorso file
+- Tutte le metriche SLOC (codice, commenti, righe vuote)
+- Metriche di complessità (CC, MI, conteggio funzioni)
+- Delta (se da un'operazione Differ)
+- Flag di stato (A/D/M/U)
+
+**Casi d'uso:**
+- Analisi dei trend su più baseline.
+- Generazione di grafici (es. complessità nel tempo).
+- Integrazione in gate di qualità CI/CD.
+
+---
+
+## 11. Casi d'Uso Pratici e Workflow
+
+### Caso d'Uso 1: Rilevare Codice Copiato Prima della Code Review
+
+**Scenario:** Il tuo team sta sviluppando un nuovo modulo. Sospetti che alcuni sviluppatori abbiano copiato e incollato codice esistente invece di fare refactoring.
+
+**Workflow:**
+1. Crea un profilo per il tuo progetto.
+2. Clicca sul pulsante **Duplicates**.
+3. Imposta threshold a 5% (stretto).
+4. Rivedi la tabella dei risultati.
+5. Per ogni coppia di duplicati fuzzy:
+   - Fai doppio click per aprire entrambi i file nel visualizzatore diff (se implementato).
+   - Valuta se la duplicazione è intenzionale o dovrebbe essere rifatta in un'utility condivisa.
+6. Esporta in CSV e condividi con il team per discussione.
+
+**Risultato atteso:** Identifichi 3-5 file quasi duplicati e crei ticket per consolidarli.
+
+---
+
+### Caso d'Uso 2: Tracciare la Complessità Durante uno Sprint di Refactoring
+
+**Scenario:** Il tuo team pianifica uno sprint di refactoring di 2 settimane per ridurre il debito tecnico.
+
+**Workflow:**
+1. **Prima dello sprint:** Crea una baseline ("Pre-Refactor").
+   - Clicca **Differing** → Crea baseline.
+   - Nominala "PreRefactor_Sprint5".
+2. **Durante lo sprint:** Gli sviluppatori rifanno il codice, estraggono funzioni, aggiungono commenti.
+3. **Dopo lo sprint:** Esegui **Differing** contro la baseline.
+4. Rivedi la tabella compatta:
+   - Controlla ΔAvgCC: Dovrebbe essere negativo (complessità ridotta).
+   - Controlla ΔMI: Dovrebbe essere positivo (manutenibilità migliorata).
+   - Controlla ΔComm: Dovrebbe essere positivo (più documentazione).
+5. Genera un report diff e allegalo alla retrospettiva dello sprint.
+
+**Risultato atteso:** Prova quantitativa che il refactoring ha funzionato: "Abbiamo ridotto il CC medio del 15% e aumentato MI di 8 punti."
+
+---
+
+### Caso d'Uso 3: Assicurare che Nuove Funzionalità Non Degradino la Qualità
+
+**Scenario:** Stai aggiungendo una nuova funzionalità a una codebase matura. Vuoi assicurarti che il nuovo codice non introduca complessità eccessiva.
+
+**Workflow:**
+1. Crea una baseline prima di iniziare lo sviluppo della funzionalità.
+2. Sviluppa la funzionalità in un branch.
+3. Prima del merge su main:
+   - Esegui **Differing** per confrontare lo stato corrente vs. baseline.
+   - Filtra per nuovi file (status = "A").
+   - Controlla AvgCC e MI dei nuovi file.
+   - Se AvgCC > 5 o MI < 70, segnala per refactoring prima del merge.
+4. Usa **Duplicates** per assicurarti che il nuovo codice non duplichi utility esistenti.
+
+**Risultato atteso:** Il codice della nuova funzionalità supera i gate di qualità prima del merge.
+
+---
+
+### Caso d'Uso 4: Generare Report di Conformità per Audit
+
+**Scenario:** La tua organizzazione richiede audit periodici sulla qualità del codice.
+
+**Workflow:**
+1. Crea baseline mensili (es. "Audit_2025_01", "Audit_2025_02", ...).
+2. Ogni baseline genera automaticamente:
+   - `countings_report.txt`
+   - `metrics_report.txt`
+   - `duplicates_report.txt`
+3. Archivia questi report in una cartella di conformità.
+4. Per l'audit, fornisci:
+   - Trend di SLOC totale nel tempo.
+   - Trend di CC e MI medi.
+   - Numero di duplicati rilevati e risolti ogni mese.
+
+**Risultato atteso:** Gli auditor vedono un miglioramento misurabile nelle metriche di qualità del codice nel tempo.
+
+---
+
+### Caso d'Uso 5: Onboarding Nuovi Sviluppatori con Metriche del Codice
+
+**Scenario:** Un nuovo sviluppatore si unisce al team e ha bisogno di comprendere la codebase.
+
+**Workflow:**
+1. Esegui **Metrics** sull'intera codebase.
+2. Esporta in CSV.
+3. Ordina per AvgCC (decrescente) per identificare i moduli più complessi.
+4. Condividi l'elenco con il nuovo sviluppatore:
+   - "Questi 5 file hanno la complessità più alta. Fai particolare attenzione quando li modifichi."
+   - "Questi moduli hanno MI basso. Sono candidati per refactoring—buoni esercizi di apprendimento."
+5. Usa **Duplicates** per mostrare quali parti del codice hanno ridondanza (spiega perché).
+
+**Risultato atteso:** Il nuovo sviluppatore comprende più velocemente i punti critici e i problemi di qualità del codice.
+
+---
+
+## 12. Suggerimenti per un Uso Efficace
+
+### 12.1 Gestione Profili
+
+- **Crea profili separati** per diversi sottoprogetti o componenti.
+- Usa **pattern di ignore** in modo aggressivo per escludere:
+  - `node_modules`, `venv`, `.venv`
+  - Output di build (`build/`, `dist/`, `bin/`)
+  - Codice generato
+  - Fixture di test o dati mock
+
+### 12.2 Strategia Baseline
+
+- **Convenzione di denominazione:** Usa nomi descrittivi con date o tag di versione:
+  - `Release_v1.2.0_20251201`
+  - `PreRefactor_Sprint10`
+  - `BeforeMerge_FeatureX`
+- **Frequenza:** Crea baseline ai traguardi chiave:
+  - Fine di ogni sprint
+  - Prima/dopo refactoring importanti
+  - Prima dei rilasci
+- **Ritenzione:** Mantieni almeno 3-5 baseline recenti. Archivia quelle più vecchie.
+
+### 12.3 Interpretare le Metriche
+
+**Complessità Ciclomatica (CC):**
+- **1-5:** Semplice, basso rischio.
+- **6-10:** Complessità moderata, accettabile.
+- **11-20:** Alta complessità, revisione raccomandata.
+- **21+:** Complessità molto alta, refactoring fortemente raccomandato.
+
+**Indice di Manutenibilità (MI):**
+- **85-100:** Altamente manutenibile (zona verde).
+- **70-84:** Moderatamente manutenibile (zona gialla).
+- **Sotto 70:** Bassa manutenibilità (zona rossa), necessita attenzione.
+
+### 12.4 Best Practice per il Rilevamento Duplicati
+
+- Inizia con **parametri di default** (k=25, window=4, threshold=5%).
+- Se ottieni troppi falsi positivi, **aumenta k** o **diminuisci threshold**.
+- Se sospetti che i duplicati vengano persi, **diminuisci k** o **aumenta threshold**.
+- **Rivedi sempre manualmente i duplicati fuzzy**—non tutte le similarità sono negative (es. implementazioni di interfacce).
+
+---
+
+## 13. Risoluzione Problemi e FAQ
+
+**D: Il rilevamento duplicati è lento su grandi codebase.**
+
+**R:** 
+- Usa i filtri del profilo per limitare i tipi di file analizzati.
+- Aumenta `k` e `window` per ridurre il numero di impronte elaborate.
+- Escludi file auto-generati di grandi dimensioni o fixture di test.
+
+**Q: Why are function-level metrics missing for some files?**
+
+**A:**
+- Function-level analysis requires `lizard`. Install it with `pip install lizard`.
+- Some languages may not be fully supported by `lizard`.
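+
+When `lizard` is installed, you can call it directly to see what it reports for a piece of code. The sketch uses lizard's own `analyze_file.analyze_source_code` entry point and returns an empty list when the package is missing. How PyUCC itself invokes lizard is not shown here, and `function_metrics` is a hypothetical name.
+
+```python
+from typing import List, Tuple
+
+try:
+    import lizard  # optional dependency: pip install lizard
+except ImportError:  # without lizard, no function-level metrics are available
+    lizard = None
+
+
+def function_metrics(filename: str, source: str) -> List[Tuple[str, int]]:
+    """Return (function name, cyclomatic complexity) pairs, or [] without lizard."""
+    if lizard is None:
+        return []
+    # The file extension tells lizard which language parser to use.
+    info = lizard.analyze_file.analyze_source_code(filename, source)
+    return [(f.name, f.cyclomatic_complexity) for f in info.function_list]
+
+
+SAMPLE = """
+def sign(x):
+    if x > 0:
+        return 1
+    elif x < 0:
+        return -1
+    return 0
+"""
+
+print(function_metrics("sample.py", SAMPLE))
+```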
+
+**Q: Differ shows files as "Modified" but I didn't change them.**
+
+**A:**
+- Check whether the line endings changed (CRLF ↔ LF).
+- Check whether the file was reformatted by an auto-formatter.
+- PyUCC uses content hashing: any byte-level change triggers the "Modified" status.
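+
+To confirm that only the line endings changed, compare a hash of the raw bytes with a hash taken after normalizing the line endings. The sketch uses SHA-1, which is an assumption because the Differ's hash function is not specified here. Both helper names are hypothetical.
+
+```python
+import hashlib
+
+
+def content_hash(data: bytes) -> str:
+    """Hash raw bytes: any byte-level change yields a different digest."""
+    return hashlib.sha1(data).hexdigest()
+
+
+def normalized_hash(data: bytes) -> str:
+    """Hash after converting CRLF and lone CR line endings to LF."""
+    return content_hash(data.replace(b"\r\n", b"\n").replace(b"\r", b"\n"))
+
+
+lf = b"def f():\n    return 1\n"
+crlf = lf.replace(b"\n", b"\r\n")
+
+print(content_hash(lf) == content_hash(crlf))        # False
+print(normalized_hash(lf) == normalized_hash(crlf))  # True
+```
+
+If the raw hashes differ but the normalized hashes match, the "Modified" status comes from line endings alone.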
+
+**Q: How do I reset all baselines?**
+
+**A:**
+- By default, baselines are stored in the `baseline/` folder; the location is set by `baseline_dir` in `settings.json`.
+- Delete the baseline folder, or individual baseline subfolders, to reset them.
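+
+A scripted reset can read `baseline_dir` from `settings.json`, the key this commit stores there, and delete either everything under it or only selected baselines. This is a minimal sketch that assumes each baseline is a subfolder or a single file, for example a zip archive when `zip_baselines` is enabled. `reset_baselines` is a hypothetical helper, and the demo runs against a temporary directory only.
+
+```python
+import json
+import shutil
+import tempfile
+from pathlib import Path
+from typing import List, Optional
+
+
+def reset_baselines(settings_path: Path, only: Optional[List[str]] = None) -> List[str]:
+    """Delete baselines under `baseline_dir`: all of them, or just those in `only`."""
+    settings = json.loads(settings_path.read_text(encoding="utf-8"))
+    base = Path(settings["baseline_dir"])
+    removed: List[str] = []
+    if not base.is_dir():
+        return removed
+    for entry in sorted(base.iterdir()):
+        if only is not None and entry.name not in only:
+            continue
+        if entry.is_dir():
+            shutil.rmtree(entry)  # folder-per-baseline layout (assumed)
+        else:
+            entry.unlink()  # e.g. a zipped baseline (assumed layout)
+        removed.append(entry.name)
+    return removed
+
+
+# Demo on a throwaway directory, never on real baselines.
+with tempfile.TemporaryDirectory() as tmp:
+    bdir = Path(tmp) / "baseline"
+    for name in ("PreRefactor_Sprint10", "Release_v1.2.0_20251201"):
+        (bdir / name).mkdir(parents=True)
+    cfg = Path(tmp) / "settings.json"
+    cfg.write_text(json.dumps({"baseline_dir": str(bdir)}), encoding="utf-8")
+    print(reset_baselines(cfg, only=["PreRefactor_Sprint10"]))
+    print(sorted(p.name for p in bdir.iterdir()))
+```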
+
+**Q: Can I run PyUCC in CI/CD pipelines?**
+
+**A:**
+- Yes. Use the CLI mode:
+  ```bash
+  python -m pyucc differ create /path/to/repo
+  python -m pyucc differ diff <baseline_id> /path/to/repo
+  python -m pyucc duplicates /path/to/repo --threshold 5.0
+  ```
+- Parse the JSON output or the text reports in your pipeline scripts.
+
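+A pipeline step can run the three commands above in order and stop at the first failure. This sketch assumes PyUCC exits with a non-zero status on error, which is not documented here. It uses only the subcommands and the `--threshold` option shown above. The baseline ID is passed in because its source is not specified, and `pyucc_commands` and `run_pipeline` are hypothetical names.
+
+```python
+import subprocess
+import sys
+from typing import Callable, List, Sequence
+
+
+def pyucc_commands(repo: str, baseline_id: str, threshold: float = 5.0) -> List[List[str]]:
+    """The three documented CLI invocations, as argument lists."""
+    py = [sys.executable, "-m", "pyucc"]
+    return [
+        py + ["differ", "create", repo],
+        py + ["differ", "diff", baseline_id, repo],
+        py + ["duplicates", repo, "--threshold", str(threshold)],
+    ]
+
+
+def run_pipeline(commands: Sequence[Sequence[str]],
+                 run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> int:
+    """Run each command in order; return the first non-zero exit code, else 0."""
+    for cmd in commands:
+        result = run(list(cmd), capture_output=True, text=True)
+        print(result.stdout, end="")
+        if result.returncode != 0:
+            print(result.stderr, file=sys.stderr, end="")
+            return result.returncode
+    return 0
+```
+
+A CI job would call `sys.exit(run_pipeline(pyucc_commands(repo, baseline_id)))`. The injectable `run` parameter lets you test the sequencing without PyUCC installed.
+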
+---
--- a/profiles.json
+++ b/profiles.json
@@ -0,0 +1,189 @@
+[
+  {
+    "name": "target_simulator",
+    "paths": [
+      "C:\\src\\____GitProjects\\S1005403_RisCC\\target_simulator"
+    ],
+    "languages": [
+      "Python"
+    ],
+    "ignore": [
+      "__pycache__",
+      "*.pyc",
+      "*.pyo",
+      "*.pyd",
+      ".Python",
+      "env",
+      "venv",
+      ".venv",
+      "build",
+      "dist",
+      "*.egg-info",
+      ".eggs",
+      "node_modules",
+      ".git",
+      ".hg",
+      ".svn",
+      ".idea",
+      ".vscode",
+      ".DS_Store",
+      "*.class",
+      "*.o",
+      "*.so",
+      "*.dylib",
+      ".pytest_cache",
+      ".mypy_cache",
+      ".cache",
+      "coverage",
+      ".tox",
+      "pip-wheel-metadata",
+      "*.log",
+      "*.tmp",
+      "Thumbs.db"
+    ]
+  },
+  {
+    "name": "EIF",
+    "paths": [
+      "C:\\src\\GRIFO-E\\REP\\Projects\\GHost",
+      "C:\\src\\GRIFO-E\\REP\\Projects\\GrifoFwIF",
+      "C:\\src\\GRIFO-E\\REP\\Projects\\GrifoSdkEif",
+      "C:\\src\\GRIFO-E\\REP\\Projects\\AesaAntennaLibrary",
+      "C:\\src\\GRIFO-E\\REP\\Projects\\RpyOut\\IDD"
+    ],
+    "languages": [
+      "C",
+      "C++"
+    ],
+    "ignore": []
+  },
+  {
+    "name": "test1",
+    "paths": [
+      "C:\\src\\____GitProjects\\__test"
+    ],
+    "languages": [
+      "Python"
+    ],
+    "ignore": [
+      "__pycache__",
+      "*.pyc",
+      "*.pyo",
+      "*.pyd",
+      ".Python",
+      "env",
+      "venv",
+      ".venv",
+      "build",
+      "dist",
+      "*.egg-info",
+      ".eggs",
+      "node_modules",
+      ".git",
+      ".hg",
+      ".svn",
+      ".idea",
+      ".vscode",
+      ".DS_Store",
+      "*.class",
+      "*.o",
+      "*.so",
+      "*.dylib",
+      ".pytest_cache",
+      ".mypy_cache",
+      ".cache",
+      "coverage",
+      ".tox",
+      "pip-wheel-metadata",
+      "*.log",
+      "*.tmp",
+      "Thumbs.db"
+    ]
+  },
+  {
+    "name": "pyucc",
+    "paths": [
+      "C:\\src\\____GitProjects\\SXXXXXXX_PyUcc\\pyucc"
+    ],
+    "languages": [
+      "Python"
+    ],
+    "ignore": [
+      "__pycache__",
+      "*.pyc",
+      "*.pyo",
+      "*.pyd",
+      ".Python",
+      "env",
+      "venv",
+      ".venv",
+      "build",
+      "dist",
+      "*.egg-info",
+      ".eggs",
+      "node_modules",
+      ".git",
+      ".hg",
+      ".svn",
+      ".idea",
+      ".vscode",
+      ".DS_Store",
+      "*.class",
+      "*.o",
+      "*.so",
+      "*.dylib",
+      ".pytest_cache",
+      ".mypy_cache",
+      ".cache",
+      "coverage",
+      ".tox",
+      "pip-wheel-metadata"
+    ]
+  },
+  {
+    "name": "DSP",
+    "paths": [
+      "C:\\__temp\\Metrics\\attuale\\REP\\Projects\\DSP",
+      "C:\\__temp\\Metrics\\attuale\\REP\\Projects\\DspAlgorithms"
+    ],
+    "languages": [
+      "C",
+      "C++"
+    ],
+    "ignore": [
+      "__pycache__",
+      "*.pyc",
+      "*.pyo",
+      "*.pyd",
+      ".Python",
+      "env",
+      "venv",
+      ".venv",
+      "build",
+      "dist",
+      "*.egg-info",
+      ".eggs",
+      "node_modules",
+      ".git",
+      ".hg",
+      ".svn",
+      ".idea",
+      ".vscode",
+      ".DS_Store",
+      "*.class",
+      "*.o",
+      "*.so",
+      "*.dylib",
+      ".pytest_cache",
+      ".mypy_cache",
+      ".cache",
+      "coverage",
+      ".tox",
+      "pip-wheel-metadata",
+      "*.log",
+      "*.tmp",
+      "Thumbs.db",
+      "*.bak"
+    ]
+  }
+]
--- a/pyucc/_version.py
+++ b/pyucc/_version.py
@@ -6,10 +6,10 @@
 import re

 # --- Version Data (Generated) ---
-__version__ = "v.0.0.0.16-0-gd211a7d-dirty"
-GIT_COMMIT_HASH = "d211a7d3549a26561a54d32bc2ccd15abef5d714"
+__version__ = "v.0.0.0.19-0-g79ed9c1-dirty"
+GIT_COMMIT_HASH = "79ed9c1d728bb53ace64ca2c232eb3c038b69152"
 GIT_BRANCH = "master"
-BUILD_TIMESTAMP = "2025-12-01T13:12:51.174558+00:00"
+BUILD_TIMESTAMP = "2025-12-12T09:07:06.615288+00:00"
 IS_GIT_REPO = True

 # --- Default Values (for comparison or fallback) ---
@@ -17,7 +17,6 @@ DEFAULT_VERSION = "0.0.0+unknown"
 DEFAULT_COMMIT = "Unknown"
 DEFAULT_BRANCH = "Unknown"

-
 # --- Helper Function ---
 def get_version_string(format_string=None):
    """
@@ -45,38 +44,28 @@ def get_version_string(format_string=None):

    replacements = {}
    try:
-        replacements["version"] = __version__ if __version__ else DEFAULT_VERSION
-        replacements["commit"] = GIT_COMMIT_HASH if GIT_COMMIT_HASH else DEFAULT_COMMIT
-        replacements["commit_short"] = (
-            GIT_COMMIT_HASH[:7]
-            if GIT_COMMIT_HASH and len(GIT_COMMIT_HASH) >= 7
-            else DEFAULT_COMMIT
-        )
-        replacements["branch"] = GIT_BRANCH if GIT_BRANCH else DEFAULT_BRANCH
-        replacements["timestamp"] = BUILD_TIMESTAMP if BUILD_TIMESTAMP else "Unknown"
-        replacements["timestamp_short"] = (
-            BUILD_TIMESTAMP.split("T")[0]
-            if BUILD_TIMESTAMP and "T" in BUILD_TIMESTAMP
-            else "Unknown"
-        )
-        replacements["is_git"] = "Git" if IS_GIT_REPO else "Unknown"
-        replacements["dirty"] = (
-            "-dirty" if __version__ and __version__.endswith("-dirty") else ""
-        )
+        replacements['version'] = __version__ if __version__ else DEFAULT_VERSION
+        replacements['commit'] = GIT_COMMIT_HASH if GIT_COMMIT_HASH else DEFAULT_COMMIT
+        replacements['commit_short'] = GIT_COMMIT_HASH[:7] if GIT_COMMIT_HASH and len(GIT_COMMIT_HASH) >= 7 else DEFAULT_COMMIT
+        replacements['branch'] = GIT_BRANCH if GIT_BRANCH else DEFAULT_BRANCH
+        replacements['timestamp'] = BUILD_TIMESTAMP if BUILD_TIMESTAMP else "Unknown"
+        replacements['timestamp_short'] = BUILD_TIMESTAMP.split('T')[0] if BUILD_TIMESTAMP and 'T' in BUILD_TIMESTAMP else "Unknown"
+        replacements['is_git'] = "Git" if IS_GIT_REPO else "Unknown"
+        replacements['dirty'] = "-dirty" if __version__ and __version__.endswith('-dirty') else ""

        tag = DEFAULT_VERSION
        if __version__ and IS_GIT_REPO:
-            match = re.match(r"^(v?([0-9]+(?:\.[0-9]+)*))", __version__)
+            match = re.match(r'^(v?([0-9]+(?:\.[0-9]+)*))', __version__)
            if match:
                tag = match.group(1)
-        replacements["tag"] = tag
+        replacements['tag'] = tag

        output_string = format_string
        for placeholder, value in replacements.items():
-            pattern = re.compile(r"{{\s*" + re.escape(placeholder) + r"\s*}}")
+            pattern = re.compile(r'{{\s*' + re.escape(placeholder) + r'\s*}}')
             output_string = pattern.sub(str(value), output_string)

-        if re.search(r"{\s*\w+\s*}", output_string):
+        if re.search(r'{\s*\w+\s*}', output_string):
             pass # Or log a warning: print(f"Warning: Unreplaced placeholders found: {output_string}")

        return output_string
--- a/pyucc/config/profiles.py
+++ b/pyucc/config/profiles.py
@@ -1,7 +1,7 @@
 """Profiles persistence for PyUcc.

-Stores user profiles as JSON in the user's home directory
-(`~/.pyucc_profiles.json`). Each profile is a dict with keys:
+Stores user profiles as JSON in `profiles.json` in the application directory
+(next to the executable when frozen, otherwise the project root). Each profile is a dict with keys:
  - name: str
  - path: str
  - languages: list[str]
@@ -12,9 +12,20 @@ This module exposes simple load/save/manage helpers.

 from pathlib import Path
 import json
+import sys
 from typing import List, Dict, Optional

-_DEFAULT_PATH = Path.home() / ".pyucc_profiles.json"
+# Get the directory where the application is running from
+# If frozen (PyInstaller), use the executable's directory
+# Otherwise use the project root (three levels above this module)
+if getattr(sys, 'frozen', False):
+    # Running as compiled executable
+    _APP_DIR = Path(sys.executable).parent
+else:
+    # Running as script - go up from config/ to pyucc/ to root
+    _APP_DIR = Path(__file__).parent.parent.parent
+
+_DEFAULT_PATH = _APP_DIR / "profiles.json"


 def _read_file(path: Path) -> List[Dict]:
--- a/pyucc/config/settings.py
+++ b/pyucc/config/settings.py
@@ -1,8 +1,19 @@
 from pathlib import Path
 import json
+import sys
 from typing import Dict, Optional

-_DEFAULT_PATH = Path.home() / ".pyucc_settings.json"
+# Get the directory where the application is running from
+# If frozen (PyInstaller), use the executable's directory
+# Otherwise use the project root (three levels above this module)
+if getattr(sys, 'frozen', False):
+    # Running as compiled executable
+    _APP_DIR = Path(sys.executable).parent
+else:
+    # Running as script - go up from config/ to pyucc/ to root
+    _APP_DIR = Path(__file__).parent.parent.parent
+
+_DEFAULT_PATH = _APP_DIR / "settings.json"


 def _read_file(path: Path) -> Dict:
--- a/settings.json
+++ b/settings.json
@@ -0,0 +1,14 @@
+{
+  "baseline_dir": "C:\\src\\____GitProjects\\SXXXXXXX_PyUcc\\baseline",
+  "max_keep": 5,
+  "zip_baselines": false,
+  "duplicates": {
+    "threshold": 5.0,
+    "extensions": [
+      ".py",
+      ".pyw"
+    ],
+    "k": 25,
+    "window": 4
+  }
+}